Saturday 27 May 2023

Usage of "this"

For quite some time I have found the this keyword, present in most programming languages, not particularly helpful, and I prefer the Python approach of declaring the "receiver" in the function signature under a name of your choice (normally "self"). These are some of the reasons why I'm not very keen on the "this" keyword:

Implicit use. In languages like C#, Kotlin and C++ you can skip typing "this". The compiler will infer it where it makes sense. I find this confusing, and I prefer always referring to it explicitly. JavaScript forces you to write "this" if you want to use it.

Different meanings. In JavaScript "this" can refer to different things. "Normal" functions have a dynamic this and arrow functions have a lexical this. When we look up a function in an object and invoke it, a normal function dynamically receives as "this" the object on which it has been looked up (what in Kotlin we call "the receiver"), whereas an arrow function does not get that object: it resolves "this" in its lexical scope, so it finds the "this" that was there when the arrow function was defined. When we invoke a "normal" function not through an object lookup, it receives as "this" the global object (the window object when running in the browser, or another "global" object when running, for example, in Node), or undefined in strict mode. Additionally, a "normal" function can get a fixed "this" bound to it by means of the bind function.

Closures. In JavaScript, for "normal" functions (I mean, non-arrow functions), "this" is, as I've just explained, either the object on which the function is invoked (the receiver) or a global one, so if we want to trap "this" in a closure we have to use the trick of assigning it to another variable (the usual convention is let self = this;) and use that variable in the closure. On the other hand, in C# closures trap the "this" of the scope where the function is declared (the enclosing scope).
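For comparison, this is a minimal sketch of how the same situation looks with Python's explicit receiver. The Person class and the create_greeter method are made up for the illustration. As "self" is just an ordinary parameter, a nested function captures it like any other local variable, with no extra trick needed:

```python
class Person:
    def __init__(self, name):
        self.name = name

    def create_greeter(self):
        # "self" is a normal parameter of create_greeter, so the nested
        # function traps it in its closure like any other local variable
        def greet(greeting):
            return f"{greeting}, {self.name}"
        return greet


xuan = Person("Xuan")
greet = xuan.create_greeter()
print(greet("Hello"))  # Hello, Xuan

# Looking a method up on an instance gives a bound method that
# remembers its receiver in the __self__ attribute
bound = xuan.create_greeter
print(bound.__self__ is xuan)  # True
```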

Kotlin adds some extra complexity to the "this" world with its label qualifiers:

If this has no qualifiers, it refers to the innermost enclosing scope. To refer to this in other scopes, label qualifiers are used

So in functions with a receiver (a member of a class, an extension function, a function literal with receiver) "this" is the object on which we are invoking the function, but additionally, if we have nested scopes (inner classes, several levels of nested functions...), we can get access to the "this" of an outer scope using a "qualified this".

When there is no receiver (anonymous functions or lambdas without receiver) these functions can trap the "this" of the enclosing scope in their closure (so it's the same behaviour that we have in C#). Using a rather artificial example:



class Person constructor(var name: String, var age: Int) {
    fun createBookManager1(): (String) -> String {
        // anonymous function WITHOUT receiver, so the "this" is the one of the innermost enclosing scope 
        val manager: (String) -> String = fun(title: String): String {
            return "${this.name} has checked book: ${title}"
        }
        return manager;
    }

    fun createBookManager2(): (String) -> String {
        // lambda WITHOUT receiver, so the "this" is the one of the innermost enclosing scope 
        val manager: (String) -> String = {
            "${this.name} has checked book: ${it}"
        }
        return manager;
    }

    fun createBookManager3(): Person.(String) -> String {
        // function type WITH receiver 
        val manager: Person.(String) -> String = {
            //"this" is not trapped by the closure, it's the receiver that it receives when invoking the function
            "${this.name} has checked book: ${it}"
        }
        return manager;
    }

    fun createBookManager4(): Person.(String) -> String {
        // function type WITH receiver, to refer to the "class this" we use a qualified this 
        val manager: Person.(String) -> String = {
            //"this" is not trapped by the closure, it's the receiver that it receives when invoking the function
            "${this.name} has checked book: ${it} and has trapped ${this@Person.name}"
        }
        return manager;
    }

}


fun main() {
    val person = Person("Xuan", 22)
    val manager1 = person.createBookManager1()
    println(manager1("Book1"))

    val manager2 = person.createBookManager2()
    println(manager2("Book1"))

    val manager3 = person.createBookManager3()
    println(person.manager3("Book1"))

    val manager4 = person.createBookManager4()
    val person2 = Person("Francois", 5)
    println(person2.manager4("Book1"))

    // Xuan has checked book: Book1
    // Xuan has checked book: Book1
    // Xuan has checked book: Book1
    // Francois has checked book: Book1 and has trapped Xuan


}

Sunday 21 May 2023

Python Default Parameters Gotcha

I've recently been hit by a rather well-known Python gotcha, the mutable default parameter. Long story short, if you initialize a function default parameter with a mutable object (a list, a dictionary, an instance of a custom class), that object will be created only once and shared by all the invocations of the function! This means that most likely (unless you know really well what you are doing and plan to use this as a memoization technique) you don't want to do that. Let's see this odd behaviour:


class Country:
    def __init__(self, name: str, cities: list[str] = []):
        self.name = name
        self.cities = cities


france = Country("France")
france.cities.append("Paris")
austria = Country("Austria")

print(austria.cities)
# ['Paris']
print(france.cities is austria.cities)
# True
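The usual way around this is to use None as the default and create the list inside the function. This is a minimal sketch of that idiom applied to the Country class above (the Optional annotation is my addition):

```python
from typing import Optional


class Country:
    def __init__(self, name: str, cities: Optional[list[str]] = None):
        self.name = name
        # When no argument is given, a fresh list is created on every call
        self.cities = cities if cities is not None else []


france = Country("France")
france.cities.append("Paris")
austria = Country("Austria")

print(austria.cities)
# []
print(france.cities is austria.cities)
# False
```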


Other languages like JavaScript, Kotlin or Ruby do not come with this oddity. They initialize the default parameter to a new object in each invocation. For example in JavaScript:


class Country {
    constructor(name, cities = []) {
        this.name = name;
        this.cities = cities;
    }
}

let france = new Country("France");
france.cities.push("Paris");
console.log(france.cities);
// ['Paris']

let austria = new Country("Austria");
console.log(austria.cities);
// []

In general I would say a compiler would implement default parameters by adding some code at the start of the function that checks whether the parameter has been provided and, if not, initializes it to the default value. Creating that default value in each call seems to me simpler to implement than storing it somewhere the first time and reusing it. For example, this is a Java view of the JVM bytecode generated by the Kotlin compiler for Kotlin code using default parameters:


// Kotlin code:
fun printCity(city: City = City("Paris", 2000000)) {
    println("${city.name} - ${city.population}")
}

// The JVM bytecode for the above code, viewed as Java code:
// $FF: synthetic method
public static void printCity$default(City var0, int var1, Object var2) {
	if ((var1 & 1) != 0) {
		var0 = new City("Paris", 2000000);
	}

	printCity(var0);
}


So, a first question is how Python stores the initialized, shared value. Reading this we find:

When Python executes a “def” statement, it takes some ready-made pieces (including the compiled code for the function body and the current namespace), and creates a new function object. When it does this, it also evaluates the default values.

The various components are available as attributes on the function object

So Python stores the default parameters of a function in its __defaults__ attribute (the article refers to func_defaults, but that was in Python 2), and we can get access to them from outside the function! (In the same way that we can get access to the variables trapped by a closure through the __closure__ attribute.)
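A short sketch to see it in action. The add_city function is a made-up example. The default list lives in the function object, it is the very same object that the function body uses, and it can even be replaced from outside:

```python
def add_city(city, cities=[]):
    cities.append(city)
    return cities


print(add_city.__defaults__)
# ([],)
add_city("Paris")
print(add_city.__defaults__)
# (['Paris'],)

# The list returned by the function is the one stored in __defaults__
print(add_city("Vienna") is add_city.__defaults__[0])
# True

# __defaults__ is writable, so the default can be replaced from outside
add_city.__defaults__ = (["Xixon"],)
print(add_city("Oviedo"))
# ['Xixon', 'Oviedo']
```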

The second question is why the Python designers decided to manage default parameters this way. The idea of having them as attributes of function objects is nice and powerful (you can use it for memoizing, or access and modify them from outside), but that behaviour is not what you expect because other languages do it differently. The thing is that Python got this feature before other languages (JavaScript has not had it until recently, Ruby is a bit younger than Python, Kotlin is just a (very cute and powerful) kid...), so maybe we should rather wonder why other languages decided to do it differently.

Note that Java still does not support default parameters, and C# does, but with a limitation: you can only use compile-time constant values, so it's the same limitation that it has with "decorators" (oddly called "attributes" in C#).

I had always thought of default parameters and named parameters as 2 features that come together, but that's not the case in JavaScript, which as of today still does not support named parameters (the usual workaround, in TypeScript too, is passing an object and destructuring it in the parameter list).

Wednesday 10 May 2023

Html Select and Options

I've been doing some web development lately, after a really long time disconnected from it. It's a simple internal application and I've decided to use vanilla JS rather than going through the long and painful process of relearning Angular or learning Vue (React is not an option, I looked into it some time ago and the Hooks thing seemed absolutely ridiculous to me).

Using vanilla JS in a sort of small SPA has given me the feeling of understanding and controlling what I'm doing, something that I did not have with Angular. The thing is that, looking into how to populate a select element, I've come across something pretty interesting. This answer already shows part of it. As with any other html element, they create the option elements with document.createElement() and set their different properties. What is interesting is that HTMLSelectElement provides an add() method to add the options, so we can use it rather than the standard appendChild() method.


let cities = ["Paris", "Vienna", "Xixon"];
let selectElement = document.getElementById("usersSelect");
for (const city of cities) {
	let op = document.createElement("option");
	// option elements have no "name" property, the displayed label is "text"
	op.text = city;
	op.value = city;
	selectElement.add(op);
}

What is even more interesting is that we can create the option element using the Option constructor:


let cities = ["Paris", "Vienna", "Xixon"];
let selectElement = document.getElementById("usersSelect");
for (const city of cities) {
	selectElement.add(new Option(city, city));
}

I've been using document.createElement for centuries, and then at some point in time, thanks to the MDN documentation, I found out that each of the different html elements corresponds to a class inheriting from HTMLElement. So we have HTMLDivElement, HTMLSelectElement, HTMLOptionElement, etc. So, first, for creating an option, why do we use an Option constructor rather than HTMLOptionElement? And second, why don't we create other html elements by invoking HTMLDivElement, etc, rather than document.createElement?

Well, if we try to invoke new HTMLOptionElement() (or any other new HTMLxxxElement()) we get an error: Uncaught TypeError: Illegal constructor. If we look into the MDN documentation, each HTMLxxxElement is described as an interface. This seems odd, given that in JavaScript there's no syntax for defining interfaces, so in the end it seems like an "interface" is a class whose constructor throws an error when invoked (so it's not directly instantiable). Though we cannot directly invoke the HTMLxxxElement constructor, the constructor property of objects created with document.createElement() will point to the corresponding function, and of course instanceof will work fine. I mean:


let d1 = document.createElement("div");
d1.constructor.name
"HTMLDivElement" 

d1.constructor === HTMLDivElement
true

d1 instanceof HTMLDivElement
true

Creating an option either with new Option() or with document.createElement("option") has exactly the same effect. The instanceof operator applied to objects created either way returns true both for Option and for HTMLOptionElement, but in both cases the instance's constructor property points to the HTMLOptionElement function, not to the Option function. So the Option function is a constructor, but a particular one. It can be invoked with new (in JavaScript any ordinary function declared with the function keyword can be invoked with new and returns an object in that case) and has been designed to initialize the object that it returns, but that object is not an instance of a separate Option class; it is an instance of HTMLOptionElement.


let op1 = new Option("k", "v");
undefined
op1 instanceof Option;
true
op1 instanceof HTMLOptionElement;
true
op1.constructor === Option
false
op1.constructor === HTMLOptionElement
true

let op2 = document.createElement("option");
undefined
op2 instanceof Option
true
op2 instanceof HTMLOptionElement
true
op2.constructor === Option
false
op2.constructor === HTMLOptionElement 
true

Sunday 7 May 2023

SqlAlchemy Lazy Loading

I've made some basic use of the ORM functionality provided by SqlAlchemy, which seems to be pretty powerful. At first sight it seems to give us the same features as .Net Entity Framework. Though the documentation advises using the Declarative Mapping style, I've been using imperative mapping. It's not that I'm much into Architecture stuff like DDD, Hexagonal and so on... but I think we should aim for Persistence Ignorance even in the most basic projects.

A very interesting feature of ORMs is how they handle the loading of related entities. I mean, we have a Blog entity that has a list of Posts. When we first retrieve a Blog, in many cases it would be better not to retrieve its posts until we first access them. This is called Lazy Loading. In .Net Entity Framework a property in an entity that refers to other entities is called a Navigation Property, and Lazy Loading is implemented by means of Proxy classes. For those entities that have Navigation Properties marked as lazily loaded, the ORM gives us an instance of a Proxy class that inherits from our Entity class. The Navigation Properties in such a Proxy class are implemented in a way that on first access they run a query against the database to retrieve the related instances. You can verify that I'm not making things up here.

adds lazy loading capabilities to an entity object by:

Storing a reference to the context.
Overriding navigation properties to make them load when they're accessed, using the context.

The proxy inherits from the entity class. Therefore, the navigation properties must be virtual and the entity class can't be sealed.

SqlAlchemy ORM also supports lazy loading, and I was wondering how it does it. Python being a highly dynamic language where you can add attributes to and remove them from an object, there's no need for creating proxy classes. I was thinking that maybe the class for a lazy entity would implement this laziness in an overridden __getattribute__ method, but it's a bit different. Let's say I have these entities:


from typing import List


class Post:
    # very interesting, SqlAlchemy does not invoke the __init__ method in the entities
    def __init__(self, post_id, title, content):
        print("Post __init__")
        self.post_id = post_id
        self.title = title
        self.content = content

class Blog:
    def __init__(self, blog_id, url, title):
        print("Blog __init__")
        self.blog_id = blog_id
        self.url = url
        self.title = title
        self.posts: List[Post] = []

I use imperative mapping defining the blog to posts relation as lazy:


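from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table
from sqlalchemy.orm import registry, relationship

import entities  # the module containing the Post and Blog classes above

metadata = MetaData()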
mapper_registry = registry(metadata=metadata)

table_blog = Table(
    "Blogs",
    mapper_registry.metadata,
    Column("BlogId", Integer, primary_key=True, autoincrement=True),
    Column("Url", String),
    Column("Title", String),
)

table_post = Table(
    "Posts",
    mapper_registry.metadata,
    Column("PostId", Integer, primary_key=True, autoincrement=True),
    Column("BlogId", ForeignKey("Blogs.BlogId")),
    Column("Title", String),
    Column("Content", String),

)

def start_mappers():
    mapper_registry.map_imperatively(entities.Blog, table_blog,
      properties={
            "blog_id": table_blog.c.BlogId, 
            "url": table_blog.c.Url, 
            "title": table_blog.c.Title, 
            "posts": relationship(entities.Post, lazy="select")
            #The default value of the relationship.lazy argument is "select", which indicates lazy loading. 
        }
    )

    mapper_registry.map_imperatively(
        entities.Post,
        table_post,
        properties={
            "post_id": table_post.c.PostId, 
            "title": table_post.c.Title,
            "content": table_post.c.Content
        }      
    )

When the above mapping function is invoked some attributes are added to the classes of our entities. For our Blog class, comparing the contents of its __dict__ before the mapping (it mainly has the __init__ method) and after it shows that, for each column defined during the mapping, an attribute of type sqlalchemy.orm.attributes.InstrumentedAttribute has been added to the class.

The next interesting thing is that when creating an instance of our Entity class from a database row the __init__ method is not invoked. You can find the explanation in the documentation for old versions of SqlAlchemy, not for version 2.0, but it still holds true (the print calls that I've put in the __init__ methods above are never executed).

The SQLAlchemy ORM does not call __init__ when recreating objects from database rows. The ORM’s process is somewhat akin to the Python standard library’s pickle module, invoking the low level __new__ method and then quietly restoring attributes directly on the instance rather than calling __init__.
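The following is a minimal sketch of that pickle-like technique in plain Python. It is not SqlAlchemy's code, and the row dictionary is hypothetical data standing in for a database row:

```python
class Blog:
    def __init__(self, blog_id, url, title):
        print("Blog __init__")
        self.blog_id = blog_id
        self.url = url
        self.title = title
        self.posts = []


# Hypothetical data standing in for a row read from the database
row = {"blog_id": 1, "url": "deploytonenyures.blogspot.com", "title": "Deploy To Nenyures"}

blog = Blog.__new__(Blog)   # allocates the instance, __init__ is not run
blog.__dict__.update(row)   # restores the attributes directly on the instance

print(blog.title)
# Deploy To Nenyures
print(hasattr(blog, "posts"))
# False
```

Nothing is printed by __init__, and the instance has no posts attribute because the code that would have created it never ran.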

So though in my Blog.__init__ I'm initializing posts (the relation or navigation property) to an empty list, as the ORM does not invoke __init__ my Blog instances initially lack that attribute. This is how a Blog instance looks when first retrieved from the DB (it's an instance of the Blog class, there's no need for any sort of additional proxy class inheriting from it), with no trace of the posts attribute:

{'_sa_instance_state': , 'title': 'Deploy To Nenyures', 'blog_id': 1, 'url': 'deploytonenyures.blogspot.com'}

So how does the lazy loading work? I already talked about the complex attribute lookup process in Python. The thing is that when we first try to access myBlog.posts Python will find the posts attribute not in the instance, but in the Blog class, as an InstrumentedAttribute, which happens to be a descriptor. So here my assumption is that the __get__ method of the descriptor will query the database to obtain the Post entities and will add them as an attribute to the Blog instance, so that the next time we access myBlog.posts they are found in the instance. This is how the same Blog instance looks after the first access to posts and the lazy loading:

{'_sa_instance_state': , 'title': 'Deploy To Nenyures', 'blog_id': 1, 'url': 'deploytonenyures.blogspot.com', 'posts': [, , ]}
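To make the idea concrete, here is a simplified model of a lazy-loading descriptor in plain Python. It is only a sketch of my assumption, not SqlAlchemy's actual implementation, which is far more involved. LazyAttribute, load_posts and the queries list are hypothetical names, and the "query" is simulated:

```python
class LazyAttribute:
    """Hypothetical stand-in for a lazily loading class-level attribute."""

    def __init__(self, loader):
        self.loader = loader

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        if self.name not in instance.__dict__:
            # First access: run the (simulated) query and cache the result
            instance.__dict__[self.name] = self.loader(instance)
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


queries = []  # records each simulated database query


def load_posts(blog):
    queries.append(blog.blog_id)  # stands in for a SELECT against the DB
    return [f"post {n} of blog {blog.blog_id}" for n in (1, 2, 3)]


class Blog:
    posts = LazyAttribute(load_posts)


blog = Blog.__new__(Blog)
blog.__dict__["blog_id"] = 1

print("posts" in blog.__dict__)  # False
print(len(blog.posts))           # 3
print("posts" in blog.__dict__)  # True
blog.posts                       # second access, served from the instance
print(queries)                   # [1]
```

As this sketch defines __set__ it is a data descriptor, so its __get__ runs on every access and reads the cached value from the instance's __dict__ once it is there.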