Sunday 21 May 2023

Python Default Parameters Gotcha

I've recently been hit by a rather commonly known Python gotcha, the mutable default parameter. Long story short, if you initialize a function's default parameter with a mutable object (a list, a dictionary, an instance of a custom class), that object is created only once, when the function is defined, and is shared by all invocations of the function!!! This means that most likely (unless you know really well what you are doing and plan to use this as a memoization technique), you don't want to do that. Let's see this odd behaviour:


class Country:
    def __init__(self, name: str, cities: list[str] = []):
        self.name = name
        self.cities = cities


france = Country("France")
france.cities.append("Paris")
austria = Country("Austria")

print(austria.cities)
# ['Paris']
print(france.cities is austria.cities)
# True
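
The usual way around this is to default to None and build the list inside the function. What follows is a minimal sketch of that idiom applied to the Country class above, not the only possible fix; the None sentinel and the Optional annotation are my additions:

```python
from typing import Optional


class Country:
    def __init__(self, name: str, cities: Optional[list[str]] = None):
        self.name = name
        # Build a fresh list on every call instead of sharing one default object.
        self.cities = cities if cities is not None else []


france = Country("France")
france.cities.append("Paris")
austria = Country("Austria")

print(austria.cities)
# []
print(france.cities is austria.cities)
# False
```

The check is written as "is not None" rather than "cities or []" so that an empty list passed explicitly by the caller is kept instead of being silently replaced.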


Other languages like JavaScript, Kotlin or Ruby do not come with this oddity. They initialize the default parameter to a new object in each invocation. For example in JavaScript:


class Country {
    constructor(name, cities = []) {
        this.name = name;
        this.cities = cities;
    }
}

let france = new Country("France");
france.cities.push("Paris");
console.log(france.cities);
// ['Paris']

let austria = new Country("Austria");
console.log(austria.cities);
// []

In general I would say a compiler would implement default parameters by adding some code at the start of the function that checks whether the argument was supplied (for example, whether it is null) and, if it was not, initializes it to the default value. Creating that default value in each call seems to me simpler to implement than storing it somewhere once and reusing it. For example, this is a Java view of the JVM bytecode generated by the Kotlin compiler for Kotlin code using default parameters:


// Kotlin code:
fun printCity(city: City = City("Paris", 2000000)) {
    println("${city.name} - ${city.population}")
}

// The JVM bytecode generated for the above code, viewed as Java code:
// $FF: synthetic method
public static void printCity$default(City var0, int var1, Object var2) {
	if ((var1 & 1) != 0) {
		var0 = new City("Paris", 2000000);
	}

	printCity(var0);
}


So, a first question is: how does Python store the initialized, shared value? Reading this we find:

When Python executes a “def” statement, it takes some ready-made pieces (including the compiled code for the function body and the current namespace), and creates a new function object. When it does this, it also evaluates the default values.

The various components are available as attributes on the function object

So Python stores the default parameter values of a function in its __defaults__ attribute (the article refers to func_defaults, but that was in Python 2), and we can get access to them from outside the function! (In the same way, we can get access to the variables captured by a closure through the __closure__ attribute.)
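
A small sketch makes this visible. The add_city function is a hypothetical example of mine, not something taken from the quoted article; it only shows that the default list lives in __defaults__, gets mutated by the calls, and can even be replaced from outside:

```python
def add_city(city: str, cities: list[str] = []) -> list[str]:
    cities.append(city)
    return cities


# The default list already exists right after the def statement has run.
print(add_city.__defaults__)
# ([],)

add_city("Paris")
add_city("Vienna")

# The same list object has been mutated by both calls.
print(add_city.__defaults__)
# (['Paris', 'Vienna'],)

# The tuple of defaults can also be replaced from outside the function.
add_city.__defaults__ = (["Lyon"],)
print(add_city("Toulouse"))
# ['Lyon', 'Toulouse']
```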

The second question is: why did the Python designers decide to manage default parameters this way? The idea of having them as attributes of function objects is nice and powerful (you can use it for memoizing, or access and modify them from outside), but that behaviour is not what you expect because other languages do it differently. The thing is that Python got this feature before other languages (JavaScript did not have it until recently, Ruby is a bit younger than Python, Kotlin is just a (very cute and powerful) kid...), so indeed maybe we should wonder why other languages decided to do it differently.
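
As an illustration of the memoizing use, here is a minimal sketch that relies on the shared default on purpose. The fib function and its _cache parameter are hypothetical names of mine, and in real code functools.lru_cache is probably the more readable choice:

```python
def fib(n: int, _cache: dict[int, int] = {}) -> int:
    # _cache is created once, when the def statement runs, and survives between calls.
    if n < 2:
        return n
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]


print(fib(30))
# 832040

# The cache is reachable from outside through __defaults__ (entries for n = 2..30).
print(len(fib.__defaults__[0]))
# 29
```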

Notice that Java still does not support default parameters, and C# does, but with a limitation: you can only use compile-time constant values, so it's the same limitation that it has with "decorators" (oddly called "attributes" in C#).

I had always thought of default parameters and named parameters as 2 features that come together, but that's not the case in JavaScript, which as of today still does not support named parameters (TypeScript does not add them either; the usual workaround is to pass an options object and destructure it).
