Thursday 29 August 2024

Getting Source Code at Runtime

One surprising feature that JavaScript and Python share is that we can access a function's source code at runtime. The internals differ, though, and as a result the feature is somewhat more limited in Python than in JavaScript. Let's see.

In JavaScript, user-defined functions have a toString() method that returns the function's source code (including comments).


function sayHi(name) {
    name = name.toUpperCase();
    console.log(`Hi ${name}`)
}

console.log(sayHi.toString())
// function sayHi(name) {
//     name = name.toUpperCase();
//     console.log(`Hi ${name}`)
// }


For "native" functions and for bound functions (obtained with .bind()), toString() just returns: function xxx() { [native code] }.


function prepend(pr, msg) {
    return `${pr}${msg}${pr}`;
}
let fn = prepend.bind(null, "||");

console.log(`- bound function: ${fn.toString()}`);
//- bound function: function () { [native code] }

Notice that toString() works perfectly fine for functions created at runtime via eval():


let fn4 = eval("() => console.log('hi');");

console.log(fn4.toString());
//() => console.log('hi')


toString() also works for classes, returning the whole class. This feels a bit surprising to me, because under the covers classes are syntactic sugar, and for a class Person what we really have is a Person function that corresponds to the constructor. So I don't know how we would get just the constructor code (other than extracting its substring from Person.toString()).


class Person {
    constructor(name) {
        this.name = name;
    }

    walk() {
        console.log("I'm walking")
    }
}

console.log(Person.toString())
// class Person {
//     constructor(name) {
//         this.name = name;
//     }

//     walk() {
//         console.log("I'm walking")
//     }
// }

console.log(typeof Person); 
//function


So it seems that when we define a function, its source code is stored in some internal property of the corresponding function object. Checking the language specification, function objects have a [[SourceText]] internal slot for that.

Things in Python are a bit different. We can obtain the source code of non-native functions, classes and modules, but with one basic limitation: getsource only works if it can open the file that contains the source code. The functionality is provided by the inspect.getsource() function. inspect is a standard Python module, but the implementation feels a bit "hackish", like functionality that was not initially planned and was added by leveraging some low-level details. I've just said that in JavaScript functions have a slot holding their source code. It's not so straightforward in Python.
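As a quick illustration of the native versus non-native distinction, here is a minimal sketch. The try_getsource helper is a hypothetical name introduced just for this example, and it assumes the standard library's .py files are available on disk, which is the usual case.

```python
import inspect

def try_getsource(obj):
    """Return the source of obj, or a short note explaining why it failed."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError) as ex:
        return f"<no source: {type(ex).__name__}>"

# inspect.getsource is written in Python, so its source file can be read
pure_python = try_getsource(inspect.getsource)
# len is implemented in C, so there is no Python source to return
native = try_getsource(len)

print(pure_python.splitlines()[0])
print(native)
```

For a built-in such as len, getsource raises TypeError rather than OSError, because the object is not one of the kinds it knows how to locate.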

In Python, a function object has an associated code object (the __code__ attribute) that gives us access to the code of that function (through its co_code attribute). But that is a bytes object containing the Python bytecode, not the Python source code. Among its other attributes, the code object also has co_filename (the full path to the Python file where the function is defined) and co_firstlineno (the line in that file where the function starts) (this is well explained here). So if the file where that function was defined is available, inspect.getsource can extract its source code, like this:


import inspect

def format(txt: str):
    return f"[[{txt.upper()}]]"

print(inspect.getsource(format))
print(f"filename: {format.__code__.co_filename}")
print(f"firstlineno: {format.__code__.co_firstlineno}")

# def format(txt: str):
#     return f"[[{txt.upper()}]]"

# filename: /media/ntfsData/@MyProjects/MyPython_3.10_Playground/inspect/inspect_tests.py
# firstlineno: 27
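To make the role of co_filename and co_firstlineno more concrete, here is a rough sketch of recovering a function's source from those two attributes alone. It is an approximation, not the actual inspect implementation. The names MODULE_SRC, load_module_from_source, source_from_code_object and shout are made up for the example, and the module is written to a temporary file so that the function has a real file behind it.

```python
import importlib.util
import inspect
import linecache
import os
import tempfile

# Hypothetical module source, saved to disk so the function has a real file
MODULE_SRC = '''\
# a leading comment, so the function does not start on line 1

def shout(txt):
    return txt.upper() + "!"
'''

def load_module_from_source(src, name="demo_mod"):
    """Write src to a temporary .py file and import it from there."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(src)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module, path

def source_from_code_object(fn):
    """Approximate getsource() using only co_filename and co_firstlineno."""
    code = fn.__code__
    lines = linecache.getlines(code.co_filename)
    # co_firstlineno is 1-based; getblock() keeps just the first code block
    return "".join(inspect.getblock(lines[code.co_firstlineno - 1:]))

module, path = load_module_from_source(MODULE_SRC)
shout = module.shout

bytecode_type = type(shout.__code__.co_code)   # bytes: bytecode, not source
first_line = shout.__code__.co_firstlineno     # line of the "def" statement
recovered = source_from_code_object(shout)
official = inspect.getsource(shout)

print(bytecode_type, first_line)
print(recovered)

os.remove(path)  # clean up the temporary file
```

The recovered text matches what inspect.getsource returns for this simple case. Decorated functions and lambdas need extra handling that this sketch leaves out.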

This technique won't work for functions defined dynamically with exec/eval. There is no file to read the source from, so we get an exception: OSError: could not get source code.


format2_st = """
def format2(txt: str):
    return f'[[{txt.upper()}]]'
"""

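def create_function(fn_st, fn_name):
    # Run the definition in an explicit namespace; relying on locals()
    # after exec() inside a function is not reliable across Python versions
    namespace = {}
    exec(fn_st, namespace)
    return namespace[fn_name]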

format2 = create_function(format2_st, "format2")
print(format2("aaa"))
# [[AAA]]

try:
    print(inspect.getsource(format2))
except Exception as ex:
    print(ex)
    # OSError: could not get source code

print(f"filename: {format2.__code__.co_filename}")
print(f"firstlineno: {format2.__code__.co_firstlineno}")

# [[AAA]]
# could not get source code
# filename: <string>
# firstlineno: 2
-----------------
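Since getsource reads its lines through the linecache module, one possible workaround is to compile the dynamic code under a made-up filename and register the source text in linecache under that same name. The sketch below assumes the (size, mtime, lines, fullname) layout of linecache.cache entries, which is an implementation detail rather than a documented API. The helper name create_function_with_source and the filename "<dynamic format2>" are hypothetical.

```python
import inspect
import linecache

FORMAT2_SRC = "def format2(txt: str):\n    return f'[[{txt.upper()}]]'\n"

def create_function_with_source(src, fn_name, filename):
    """exec() src and register its text in linecache so inspect can find it."""
    # Assumed entry layout: (size, mtime, lines, fullname). With mtime=None,
    # linecache.checkcache() does not look for a real file on disk.
    linecache.cache[filename] = (
        len(src), None, src.splitlines(keepends=True), filename
    )
    namespace = {}
    # Compile with the same fake filename so co_filename points at the entry
    exec(compile(src, filename, "exec"), namespace)
    return namespace[fn_name]

format2 = create_function_with_source(FORMAT2_SRC, "format2", "<dynamic format2>")

print(format2("aaa"))
print(format2.__code__.co_filename)
print(inspect.getsource(format2))
```

Each dynamically created function needs its own unique fake filename, otherwise later registrations overwrite earlier ones.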

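inspect.getsource can also get the source code of a class. Classes do not have an associated code object, so the technique has to be a bit different: inspect finds the module where the class was defined and locates the class definition in that module's source file. You can check the inspect.py source code if you are curious about the details.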


class Person:
    def __init__(self):
        super().__init__()
        print("Person.__init__")

    def say_hi_to(self, to: str):
        return f"{self.name} says Hi to {to}"
        
print(inspect.getsource(Person))

# class Person:
#     def __init__(self):
#         super().__init__()
#         print("Person.__init__")

#     def say_hi_to(self, to: str):
#         return f"{self.name} says Hi to {to}"
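The pieces that getsource relies on for a class can be inspected directly. This is a small sketch, assuming a hypothetical Walker class stored in a temporary module named walker_demo. The module is registered in sys.modules because inspect looks the class's module up there through its __module__ attribute.

```python
import importlib.util
import inspect
import os
import sys
import tempfile

# Hypothetical module source, saved to disk so the class has a real file
MODULE_SRC = '''\
class Walker:
    def walk(self):
        return "walking"
'''

fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write(MODULE_SRC)

spec = importlib.util.spec_from_file_location("walker_demo", path)
module = importlib.util.module_from_spec(spec)
sys.modules["walker_demo"] = module   # inspect finds the module by this name
spec.loader.exec_module(module)
Walker = module.Walker

has_code = hasattr(Walker, "__code__")              # classes have no code object
module_name = Walker.__module__                     # the link back to the module
class_file = inspect.getsourcefile(Walker)          # resolved through the module
class_source = inspect.getsource(Walker)
method_line = Walker.walk.__code__.co_firstlineno   # methods do have code objects

print(has_code, module_name, method_line)
print(class_source)

# Clean up the temporary module and file
del sys.modules["walker_demo"]
os.remove(path)
```

So the class itself carries only a module name, while each of its methods still has its own code object with a filename and a first line number.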

By the way, inspect.getsource() can retrieve its own source code! nice :-)


inspect.getsource(inspect.getsource)
Out[14]: 'def getsource(object):\n    """Return the text of the source code for an object.\n\n    The argument may be a module, class, method, function, traceback, frame,\n    or code object.  The source code is returned as a single string.  An\n    OSError is raised if the source code cannot be retrieved."""\n    lines, lnum = getsourcelines(object)\n    return \'\'.join(lines)\n'
