Sunday 14 September 2014

Java Closures Limitations

Some time ago I read something about the limitations of Java 8 lambdas with regard to modifying state, but I hadn't had time to test it myself. I've finally been able to give it a go, and here are my findings.

This is one of the most typical examples of closures that I can think of, a function that keeps a counter of how many times it's been invoked:

	public static Supplier<String> getPrinterCounterFails(){
		int counter = 0;
		Supplier<String> f1 = () -> {
			System.out.println("function invocation: " + Integer.toString(counter));
			counter++;
			return "a";
		};
		return f1;
	}

The compiler will reject it with this message: local variables referenced from a lambda expression must be final or effectively final

I felt quite puzzled by this limitation. Closures are functions with state, and Java 8 does not seem to provide that; it's more like it provides functions with immutable state.

That said, it's quite easy to work around this limitation. If you're trapping a primitive value, Java 8 forces it to be final (or effectively final), so you can't modify it. But if you wrap that value in another object, the reference to that object is final while its contents can still change, and with them the primitive value that you've wrapped there.

	public static Supplier<String> getPrinterCounter(){
		//no need to explicitly declare it final: it's never reassigned, so it's effectively final
		/*final*/ int[] counterWrapper = new int[]{0};
		Supplier<String> f1 = () -> {
			System.out.println("function invocation: " + Integer.toString(counterWrapper[0]));
			counterWrapper[0]++;
			return "a";
		};
		return f1;
	}
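
If the single-element array feels a bit hacky, the same trick works with any mutable holder object. Here's a minimal sketch using java.util.concurrent.atomic.AtomicInteger from the JDK. The class name AtomicCounterExample and the method newInvocationCounter are made up for this illustration, and the supplier returns the count instead of a fixed string so that it's easier to check:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class AtomicCounterExample {
    // Hypothetical variant of getPrinterCounter: the captured reference is
    // effectively final, while the value inside the AtomicInteger changes.
    static Supplier<Integer> newInvocationCounter() {
        AtomicInteger counter = new AtomicInteger(0);
        // getAndIncrement returns the current value and then adds one
        return () -> counter.getAndIncrement();
    }

    public static void main(String[] args) {
        Supplier<Integer> f = newInvocationCounter();
        System.out.println("function invocation: " + f.get());
        System.out.println("function invocation: " + f.get());

        // Each call to newInvocationCounter creates an independent counter
        Supplier<Integer> g = newInvocationCounter();
        System.out.println("function invocation: " + g.get());
    }
}
```

Unlike the plain array, an AtomicInteger can also be incremented safely from several threads.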

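It's been many years since the last time I wrote any Python code (I quite liked the language at first, but over time I ended up moving away from it because of the syntax; seriously, I've turned really intolerant of "non-C syntax"), but I seem to remember that closures there had this same kind of limitation. That was the case in Python 2, where a nested function could read but not rebind a variable of the enclosing function; Python 3 added the nonlocal keyword for that.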

C# closures don't have this limitation, so this code will work nicely.

	public static Action GetPrinterCounter(){
		int counter = 0;
		return () => {
			Console.WriteLine("function invocation: " + counter.ToString());
			counter++;
		};
	}

Needless to say, the almighty JavaScript also lacks this limitation.

Well, if we think in terms of how closures are implemented, and not in terms of what a closure should be, things look clearer. In JavaScript, variable resolution is based on [[scope]], ExecutionContext objects and so on. Basically, a function points to an object where all its arguments and local variables are stored, and that object in turn points to the same kind of object in the "parent function". These objects form a chain (similar to the prototype chain), and variables are looked up in this chain of objects. With such an implementation, it's clear that this limitation cannot exist in JavaScript.
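
To make that chain of scope objects a bit more tangible, here's a toy model of it written in Java. It's only a sketch of the idea, not how any real JavaScript engine is implemented, and the Scope class with its declare, lookup and assign methods is invented for this illustration:

```java
import java.util.HashMap;
import java.util.Map;

class ScopeChainDemo {
    // Toy stand-in for the object that stores a function's arguments and locals
    static final class Scope {
        private final Map<String, Object> variables = new HashMap<>();
        private final Scope parent; // scope of the enclosing function, null at the top

        Scope(Scope parent) {
            this.parent = parent;
        }

        void declare(String name, Object value) {
            variables.put(name, value);
        }

        // Walk up the chain until we find the scope that owns the name
        private Scope owner(String name) {
            for (Scope s = this; s != null; s = s.parent) {
                if (s.variables.containsKey(name)) {
                    return s;
                }
            }
            throw new IllegalArgumentException("undefined variable: " + name);
        }

        Object lookup(String name) {
            return owner(name).variables.get(name);
        }

        void assign(String name, Object value) {
            owner(name).variables.put(name, value);
        }
    }

    public static void main(String[] args) {
        Scope outer = new Scope(null);   // locals of the "parent function"
        outer.declare("counter", 0);
        Scope inner = new Scope(outer);  // locals of the closure (none here)

        // counter++ executed "inside the closure"
        inner.assign("counter", (Integer) inner.lookup("counter") + 1);

        System.out.println(outer.lookup("counter")); // prints 1
    }
}
```

Since the closure and the parent function resolve the name to the very same entry, there is only one counter and nothing that could get out of sync.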

C# and Java have nothing to do with the above. While .NET languages and now Java support functions as first-class objects, their underlying platforms, the CLR and the JVM, do not; neither of them has the notion of an object being a function. In both cases lambdas (and anonymous methods) are desugared into normal (oddly named) methods. In C#, if the function needs state (i.e. it's a closure), this method is created in a separate class containing fields for the state trapped by the closure. I think Scala does the same. Java follows a more complex approach: the method is created inside the current class, and it's not until runtime (through invokedynamic and LambdaMetafactory magic) that a new class implementing the required functional interface is created. Then, in .NET a delegate object is used to invoke this method, while in Java we use the functional interface directly. Please note that what I've explained for Java is a rough approximation based on my incomplete understanding of how lambda translation, invokedynamic and MethodHandles work; I plan to write a long post about it once I've had more time to dive into it.
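
As a rough mental model of what capturing a variable amounts to in Java, here's a hand-written class next to the equivalent lambda. This is not what the compiler literally emits (the real class is spun at runtime, as said above); CounterSnapshot and the two factory methods are names I've made up, and the value 5 is arbitrary:

```java
import java.util.function.Supplier;

class CaptureByCopyDemo {
    // Hand-written stand-in for the class that ends up implementing the
    // functional interface; the captured value is copied into a final field.
    static final class CounterSnapshot implements Supplier<String> {
        private final int counter;

        CounterSnapshot(int counter) {
            this.counter = counter;
        }

        @Override
        public String get() {
            return "function invocation: " + counter;
        }
    }

    static Supplier<String> withLambda() {
        int counter = 5; // effectively final, so the lambda may capture it
        return () -> "function invocation: " + counter;
    }

    static Supplier<String> withHandWrittenClass() {
        int counter = 5;
        return new CounterSnapshot(counter); // same capture, done by hand
    }

    public static void main(String[] args) {
        System.out.println(withLambda().get());
        System.out.println(withHandWrittenClass().get());
    }
}
```

The object only holds a copy of the value, so a later change to the local variable would not be seen by it (and vice versa), which is consistent with the compiler insisting on final or effectively final variables.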

The problem comes in cases where the variable trapped by the closure could also be trapped by another closure, or be modified by the outer function where the closure is created. In that case the compiler needs to do some black magic so that these different views of the variable (a local variable in the outer function, a field in the class generated for the closure) stay in sync, meaning that if one is changed to point to another memory location or to contain a different value (reference type vs value/primitive type), the other changes too. The C# compiler achieves this by hoisting the captured variable into a field of the generated class, so that the outer function and every closure read and write that single field. The Java compiler designers decided not to go through that trouble and just force these variables to be final (or effectively final), while the C# compiler designers considered it worth the effort. This means that even code like this works nicely in C#:

	public static void ComplexTest(){
		int counter = 0;
		Action a1 = () => {
			Console.WriteLine("inside a1, counter: " + counter.ToString());
			counter++;
		};
		Action a2 = () => {
			Console.WriteLine("inside a2, counter: " + counter.ToString());
			counter++;
		};
		Console.WriteLine("ComplexTest, counter: " + counter.ToString());
		a1();
		a1();
		Console.WriteLine("ComplexTest, counter: " + counter.ToString());
		a2();
		a1();
		Console.WriteLine("ComplexTest, counter: " + counter.ToString());
	}

//then invoke ComplexTest
ComplexTest();
//and this is the output
ComplexTest, counter: 0
inside a1, counter: 0
inside a1, counter: 1
ComplexTest, counter: 2
inside a2, counter: 2
inside a1, counter: 3
ComplexTest, counter: 4

Indeed, I had already talked about this more than 2 years ago in this post. You'll find more information there about what the C# compiler is doing and the bytecode it generates.
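
For comparison, the ComplexTest behaviour can be reproduced in Java by doing by hand what the C# compiler does for you: moving the counter into a small heap object that the outer method and both lambdas share. This is just a sketch; the Captured class is my own naming, and the lines are collected in a list (and printed in main) to make the result easy to check:

```java
import java.util.ArrayList;
import java.util.List;

class HoistedCounterDemo {
    // Plays the role of the class a C#-style compiler would generate
    // to hold the captured local variable
    static final class Captured {
        int counter;
    }

    static List<String> complexTest() {
        List<String> log = new ArrayList<>();
        Captured c = new Captured(); // the "local" now lives on the heap

        Runnable a1 = () -> {
            log.add("inside a1, counter: " + c.counter);
            c.counter++;
        };
        Runnable a2 = () -> {
            log.add("inside a2, counter: " + c.counter);
            c.counter++;
        };

        log.add("ComplexTest, counter: " + c.counter);
        a1.run();
        a1.run();
        log.add("ComplexTest, counter: " + c.counter);
        a2.run();
        a1.run();
        log.add("ComplexTest, counter: " + c.counter);
        return log;
    }

    public static void main(String[] args) {
        // Prints the same seven lines as the C# version
        complexTest().forEach(System.out::println);
    }
}
```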
