Thursday, 16 December 2010

C# vs Java generics

For some dark, hidden reason I ended up reading the Wikipedia article about generics, and to my great surprise I found out that there are huge differences between Java generics and C# generics. As usual :-) C#/.NET wins (I'm joking: I don't like language wars, and I don't know enough about Java, VMs, compilers, GCs... to evaluate them thoroughly).

The difference is not in usage (which is very similar in both languages), but in implementation. I already knew how generics are implemented in C#, and I thought that this approach (called reification) was the usual one, so it was odd to see that Java uses a completely different and poorer approach, type erasure, which is little more than a compiler artifact and syntactic sugar. The JVM is absolutely unaware of generics, while the .NET IL was extended with new instructions supporting generics (so the JIT was updated too). This seems in line with Java's conservative approach (the language is evolving so slooooooooow... though of course that's not the case with all the wonderful frameworks around it) and C#'s very fast-paced evolution.
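
To make the "compiler artifact" point a bit more tangible, here is a minimal Java sketch. The class and method names are made up for this post and are not taken from any of the linked articles. It assumes nothing beyond the standard library: an Integer is pushed into a List<String> through a raw reference, the add goes through because nothing checks the element type at run time, and the failure only shows up later at the cast that the compiler inserted on the read.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Adds an Integer to a List<String> through a raw reference and reports
    // what happened. ErasureDemo and this method are hypothetical names.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String sneakIntegerIntoStringList() {
        List<String> strings = new ArrayList<>();
        List raw = strings;   // raw type: the compiler only emits a warning
        raw.add(42);          // no element type check happens at run time
        try {
            // The compiler inserted a cast to String on this read; it fails here.
            String first = strings.get(0);
            return "read ok: " + first;
        } catch (ClassCastException e) {
            return "add succeeded, read failed: ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(sneakIntegerIntoStringList());
    }
}
```

As far as I can tell there is no equivalent hole in C#, because there the element type is part of the runtime type of the list.
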
These differences are pretty well explained in this thorough article and in this StackOverflow discussion, so it's not worth my trying to sum them up badly here; just click the links and read... These articles also led me to read about an interesting topic that I had not touched again since generics were announced for C# 2: code bloat (or code explosion). Basically, instantiations of the same generic type over different reference types share the same JIT-compiled code, which is pretty natural and good!

Well, anyway, I can't help writing a bit more here to make this entry a bit longer... So, if we play around a bit with Reflector, we can see that when we declare a generic type in .NET, let's say MyCollection<T>, the generated IL has just a MyCollection<T> class. For each instantiation of that generic type with different type arguments (let's say MyCollection<string> and MyCollection<Person>) we can see that the variables, properties or whatever are typed as a new concrete type, MyCollection`1<string> and MyCollection`1<Person>. This means that those new concrete types are created at run time (not at compile time). If we call myInstance.GetType() on any instance of a generic type, we can see that its type is not MyCollection<T>, but MyCollection`1<string>, MyCollection`1<Person>...
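
For contrast, this is roughly what the same experiment looks like on the Java side. It is only a sketch: MyCollection here is a hypothetical Java counterpart of the class above, and RuntimeTypeDemo and its methods are names invented for the example. Asking an instance for its class gives back the one erased class whatever the type argument was, and reflection only knows about the declared type variable T.

```java
import java.lang.reflect.TypeVariable;

class RuntimeTypeDemo {
    // Hypothetical Java counterpart of the MyCollection<T> used in the text.
    static class MyCollection<T> { }

    // Name of the runtime class of an instance (no type argument in it).
    static String runtimeName(Object o) {
        return o.getClass().getSimpleName();
    }

    // Both instantiations are backed by one and the same Class object.
    static boolean sameClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    // Reflection exposes the declared type variable, not the actual argument.
    static String declaredTypeParameter(Object o) {
        TypeVariable<?>[] params = o.getClass().getTypeParameters();
        return params[0].getName();
    }

    public static void main(String[] args) {
        MyCollection<String> ofString = new MyCollection<>();
        MyCollection<Integer> ofInteger = new MyCollection<>();
        System.out.println(runtimeName(ofString));
        System.out.println(sameClass(ofString, ofInteger));
        System.out.println(declaredTypeParameter(ofInteger));
    }
}
```
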

I don't have a clear idea of the memory layout of .NET objects (MethodTables, InterfaceTables...), but from some WinDbg + SOS.dll debugging and some reading like this I understand that for each defined type .NET creates an EEClass structure (containing all the information about the type) and a MethodTable (with pointers to the code itself: IL, or native code once it has been compiled). MSDN explains it well here:

In fact, EEClass and MethodTable are logically one data structure (together they represent a single type)

So it seems that for each concrete type instantiated from a generic type (to me a generic type is just a template for creating concrete types) an EEClass (check the update below: the same EEClass is shared by concrete types created from reference types) and a MethodTable get created. But as explained in one of the links above, when these concrete types are created from reference type arguments, the native code generated for their methods is shared, which avoids code bloat.

There's another point of interest with regard to generics, even more so now that C# 4 is still rather fresh: variance (you know, the covariant and contravariant generic type parameters thing). Again, there are interesting differences (use-site variance vs declaration-site variance) between Java and C#, discussed here.
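
To give an idea of what use-site variance means on the Java side, here is a small sketch (VarianceDemo, sum and fill are names invented for the example). In Java the variance is chosen with a wildcard each time the type is used: `? extends` for reading (covariant use) and `? super` for writing (contravariant use). In C# it is instead declared once on the interface or delegate itself, as in IEnumerable<out T>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VarianceDemo {
    // Covariant use: accepts a list of Number or of any subtype of Number,
    // and only reads from it.
    static double sum(List<? extends Number> source) {
        double total = 0;
        for (Number n : source) {
            total += n.doubleValue();
        }
        return total;
    }

    // Contravariant use: accepts a list of Integer or of any supertype of
    // Integer, and only writes Integers into it.
    static void fill(List<? super Integer> sink, int count) {
        for (int i = 1; i <= count; i++) {
            sink.add(i);
        }
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Number> numbers = new ArrayList<>();
        fill(numbers, 3);                  // List<Number> used as a sink of Integers
        System.out.println(sum(ints));     // List<Integer> used as a source of Numbers
        System.out.println(sum(numbers));
    }
}
```
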

I should not end this entry, so full of external links, without adding another great piece of wisdom: an interview with the almighty Anders Hejlsberg.

Update: Reading this great technical article, I've confirmed that a different MethodTable is created for each concrete type, and that the native code can be shared or not depending on whether the type arguments are reference or value types. Contrary to what I'd read before, the same goes for the EEClass, which does not need to be unique and can be shared.
