Friday 15 April 2016

64 bits and Performance

I'd never thought much about the performance implications of compiling an application for 64 bits rather than 32. For .Net applications the thought process was simple: am I using any native component that forces me to target one specific architecture? If not, just set it to AnyCPU, and the runtime will use the architecture of the machine when JITting. This seems to suggest that if the machine is x64, it's always better to run as 64 bits.

Of course, if your application is heavy and may need more than 2-3 GB of RAM, you have to set it to 64 bits, but otherwise you'd better think twice. Apart from using 64-bit memory addresses and extending the existing registers to 64 bits, x64 also added 8 new general purpose registers (r8 to r15). If your application does heavy calculations, it can take advantage of these extra registers and gain performance. OK, good, so what are the downsides?

Basically, your application will consume much more memory! Why? Well, objects are made up of some data and references (pointers) to other objects. Good practice tells us to favor composition over inheritance, so our objects increasingly point to many other objects. Each reference is now 64 bits rather than 32, so that is going to make a difference (of course, overall memory consumption does not simply double, as numbers and strings take up the same space as in 32 bits).
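To put a number on that, here is a minimal sketch of the arithmetic in Python. It only counts the fields of an object (no header, no padding), and the function name and the example object (3 references plus 8 bytes of plain data) are made up for illustration.

```python
def fields_size(pointer_size, reference_count, data_bytes):
    """Bytes taken by the fields of one object: references plus plain data.

    Simplified model: ignores the object header and any alignment padding.
    """
    return reference_count * pointer_size + data_bytes


# Hypothetical object: 3 references to other objects plus 8 bytes of numbers.
size_32 = fields_size(4, 3, 8)   # 3 * 4 + 8
size_64 = fields_size(8, 3, 8)   # 3 * 8 + 8

print(size_32, size_64, size_64 / size_32)
```

The more references an object holds relative to its plain data, the closer the growth gets to a factor of 2, but it never quite reaches it as long as there is some non-pointer data.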

There's another important point to bear in mind. Each .Net instance of a reference type has a header with 2 fields: the SyncBlock address and the RTTI address (the vTable, if you want to keep it simple). Yes, I've said address, so while in 32 bits this header takes 8 bytes, in x64 it takes 16 bytes. You can read more here and here. Interestingly, they mention that a reference points to the second field rather than the first, so the first field (the sync block) sits at a negative offset.

The sync block sits at a negative offset from the object pointer. The first field at offset 0 is the method table pointer, 8 bytes on x64. So on x86 it is SB + MT + X + Y = 4 + 4 + 4 + 4 = 16 bytes. The sync block index is still 4 bytes in x64. But the object header also participates in the garbage collected heap, acting as a node in a linked list after it is released. That requires a back and a forward pointer, each 8 bytes in x64, thus requiring 8 bytes before the object pointer. 8 + 8 + 4 + 4 = 24 bytes.

So your object is likely laid out like this:

x86: (aligned to 8 bytes)

  Syncblk     TypeHandle       X            Y
------------,------------|------------,------------|
                         8                        16


x64: (aligned to 8 bytes)

         Syncblk                 TypeHandle              X            Y
-------------------------|-------------------------|------------,------------|
                         8                        16                        24
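The totals in the layout above can be reproduced with a tiny model. This is only a sketch: it treats the header as two pointer-sized words and assumes the total is rounded up to a multiple of the pointer size. The function name is hypothetical, and nothing here measures a real runtime.

```python
def object_size(pointer_size, field_bytes):
    """Estimated heap size in bytes of one reference-type instance."""
    # Header: sync block word + type handle, each modelled as pointer-sized.
    header = 2 * pointer_size
    raw = header + field_bytes
    # Assumption: round the total up to a multiple of the pointer size.
    return -(-raw // pointer_size) * pointer_size


x86_size = object_size(4, 4 + 4)   # two 4-byte fields, X and Y
x64_size = object_size(8, 4 + 4)

print(x86_size, x64_size)   # 16 24
```

Under the same assumptions, an object with a single 4-byte field would also come out at 24 bytes on x64, because of the padding.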

I've never been particularly concerned about the memory consumption of my applications, but there are things that are important to keep in mind. Let's say that you have a class with 2 Integer data fields. In a 64-bit application each instance will take 16 bytes of header + 4 + 4 (for the 2 integers), that is, 24 bytes. If you were using a struct (value type) rather than a class, as no header exists, it would be just 8 bytes, a third of the size! Furthermore, if you put these objects in an array (or a List, as it's based on arrays), there's one more difference. If you use a class, the collection holds references to the instances of that class, while if you use a struct, the values are embedded in the collection itself, so there are no 8 extra bytes per object for that level of indirection. All in all, we are using 32 bytes per element with the class, while with the struct we stay at 8 bytes (a quarter of the size). If you have many instances of these objects, the difference in memory pressure will be more than noticeable.
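As a rough back-of-the-envelope check, the per-element cost can be written down in a few lines of Python. This is a simplified model with made-up names: it ignores the array's own header and any alignment padding, and it does not measure a real .Net process.

```python
def per_element_bytes(pointer_size, field_bytes, as_class):
    """Approximate bytes used per element of an array.

    Class: reference stored in the array slot + object header + fields.
    Struct: only the fields, embedded directly in the array.
    """
    if as_class:
        header = 2 * pointer_size          # sync block + type handle
        return pointer_size + header + field_bytes
    return field_bytes


count = 1_000_000                          # hypothetical number of elements
class_total = count * per_element_bytes(8, 4 + 4, as_class=True)
struct_total = count * per_element_bytes(8, 4 + 4, as_class=False)

print(class_total, struct_total, class_total // struct_total)
```

With one million elements on x64, this model gives about 32 MB for the class version against 8 MB for the struct version.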

This article about Visual Studio sticking to 32 bits is a good link to close this post.
