Monday, 16 October 2017

Dictionaries, Keys and Hash Codes

It's (or should be) common knowledge that objects used as keys in Dictionaries/Maps (structures that are based on hash tables...) must be immutable [1], [2]. In a few words, the hash code of the object being used as key is used to compute the index of the bucket in the hash table's internal structure where the entry is stored. If you mutate the object, its hash code should also change, so it will no longer lead to the right bucket in the hash table. You can read a .Net centric explanation here

In .Net, when an object is used as a Dictionary key, its GetHashCode method is used to find the bucket, and the Equals method is then used to confirm the match and to resolve collisions (so you should override both methods in your class, because for classes the defaults inherited from Object work on reference identity rather than on the object's contents). Something similar is done in Java and Python.
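To make the hash-then-equals lookup and the mutation problem concrete, here is a minimal sketch in JavaScript. The PersonKey and ToyHashTable classes and their hashCode/equals methods are hypothetical names made up for illustration (they only imitate the GetHashCode/Equals pair); this is not how any real runtime implements its dictionaries.

```javascript
// Hypothetical key type with content-based hashing (names are assumptions).
class PersonKey {
  constructor(name) {
    this.name = name;
  }
  hashCode() {
    // Simple string hash derived from the mutable "name" field.
    let h = 0;
    for (const ch of this.name) {
      h = (h * 31 + ch.codePointAt(0)) | 0;
    }
    return h;
  }
  equals(other) {
    return other instanceof PersonKey && other.name === this.name;
  }
}

// Toy hash table: an array of buckets, each a list of [key, value] pairs.
class ToyHashTable {
  constructor(bucketCount = 16) {
    this.buckets = Array.from({ length: bucketCount }, () => []);
  }
  bucketFor(key) {
    // The hash code picks the bucket...
    const index = Math.abs(key.hashCode()) % this.buckets.length;
    return this.buckets[index];
  }
  set(key, value) {
    const bucket = this.bucketFor(key);
    // ...and equals() tells apart the keys that share it.
    const entry = bucket.find(([k]) => k.equals(key));
    if (entry) entry[1] = value;
    else bucket.push([key, value]);
  }
  get(key) {
    const entry = this.bucketFor(key).find(([k]) => k.equals(key));
    return entry ? entry[1] : undefined;
  }
}

const table = new ToyHashTable();
const key = new PersonKey("aaa");
table.set(key, "this is a value");

// A different object with equal content finds the entry.
const sameContent = table.get(new PersonKey("aaa"));

// Mutating the stored key changes its hash code: the lookup now goes to
// another bucket, and the old content no longer passes equals() either.
key.name = "bbbb";
const afterMutation = table.get(key);
const byOldContent = table.get(new PersonKey("aaa"));

console.log(sameContent, afterMutation, byOldContent);
```

With these particular names the mutated key hashes to a different bucket, so the entry becomes unreachable both through the mutated key and through a fresh key with the old content.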

ES6 introduced the Map class, and one of its advantages over using plain Objects as dictionaries is that we can use any object as a key (and not just strings). Object.prototype does not have a "getHashCode" method, so, well, it seems a bit strange. The MDN documentation on Maps says that key equality works according to the semantics of the === operator (strictly speaking it is the SameValueZero algorithm, which differs from === only in treating NaN as equal to itself). This means that if you use an object as key (rather than a primitive such as a string or a number), the key comparison is based on the identity (reference) of the object and not on its contents, so if you mutate the object, it will continue to be valid for accessing the dictionary. I mean:

let myMap = new Map();

let key1 = {name: "aaa"};

myMap.set(key1, "this is a value");

console.log("first item: " + myMap.get(key1));
//output: first item: this is a value

key1.name = "bbbb";

//the modified object that we use as key is still valid!!! 
//"===" semantics/reference equality is being used

console.log("first item: " + myMap.get(key1));
//output: first item: this is a value
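The other side of reference equality is that a second object with exactly the same content is a different key. A short sketch (the variable names are just for illustration), which also shows the NaN and -0 cases where the Map comparison departs from or matches "===":

```javascript
const registry = new Map();
const keyA = { name: "aaa" };
const keyB = { name: "aaa" }; // same content, different object

registry.set(keyA, "value for keyA");

// Only the very same reference finds the entry.
const viaSameReference = registry.get(keyA);
const viaEqualContent = registry.get(keyB);

// NaN works as a key even though NaN === NaN is false,
// and 0 and -0 are treated as the same key.
registry.set(NaN, "not a number");
registry.set(0, "zero");
const viaNaN = registry.get(NaN);
const viaNegativeZero = registry.get(-0);

console.log(viaSameReference); // value for keyA
console.log(viaEqualContent);  // undefined
console.log(viaNaN);           // not a number
console.log(viaNegativeZero);  // zero
```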

This seems like rather bizarre behaviour for a Map/Dictionary. You can see here how people implement a "more standard" one.
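As a rough idea of what such a "more standard" dictionary could look like, here is a minimal sketch that wraps a Map and derives a string key from the content of the object. ValueKeyMap is a hypothetical name, and the sketch assumes keys are flat plain objects with JSON-serializable values; it is not the implementation referred to above.

```javascript
// Hypothetical wrapper: keys are compared by content, not by reference.
// Assumes flat plain-object keys whose values are JSON-serializable.
class ValueKeyMap {
  constructor() {
    this.inner = new Map();
  }
  static keyOf(key) {
    // Passing the sorted property names as the replacer makes
    // {a: 1, b: 2} and {b: 2, a: 1} serialize to the same string.
    return JSON.stringify(key, Object.keys(key).sort());
  }
  set(key, value) {
    this.inner.set(ValueKeyMap.keyOf(key), value);
    return this;
  }
  get(key) {
    return this.inner.get(ValueKeyMap.keyOf(key));
  }
  has(key) {
    return this.inner.has(ValueKeyMap.keyOf(key));
  }
}

const people = new ValueKeyMap();
people.set({ name: "aaa", age: 30 }, "first");

// A different object with the same content (in any property order) hits.
const found = people.get({ age: 30, name: "aaa" });
// Different content misses, as it would with a content-based hash code.
const missing = people.get({ name: "bbbb", age: 30 });

console.log(found, missing);
```

Because the key string is computed when set is called, mutating the original object afterwards has the "classic" effect: the mutated object no longer finds the entry.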

Notice that MDN says "according to the semantics of ===". Obviously one expects access to a Map item to be O(1), so the runtime is surely not going to traverse the whole Map comparing the keys with "===". To simulate "===" semantics and still have near-instant access, I guess one possibility is to hash the memory address of the object used as key (or, since a garbage collector may move objects around, some identity number assigned to the object).
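JavaScript gives us no access to memory addresses, so the guess above can only be simulated. The sketch below hands each object a stable identity number the first time it is seen and uses that number as the hash code. identityHash and IdentityTable are hypothetical names, and storing the numbers in a WeakMap is admittedly circular (a WeakMap is itself this kind of structure); the point is only to show that such a hash survives mutation of the key.

```javascript
// Hypothetical identity hash: a number handed out once per object.
const identityHashes = new WeakMap();
let nextHash = 1;

function identityHash(obj) {
  let hash = identityHashes.get(obj);
  if (hash === undefined) {
    hash = nextHash++;
    identityHashes.set(obj, hash);
  }
  return hash;
}

// Toy table: bucket chosen by identity hash, keys compared with ===.
class IdentityTable {
  constructor(bucketCount = 8) {
    this.buckets = Array.from({ length: bucketCount }, () => []);
  }
  bucketFor(key) {
    return this.buckets[identityHash(key) % this.buckets.length];
  }
  set(key, value) {
    const bucket = this.bucketFor(key);
    const entry = bucket.find(([k]) => k === key);
    if (entry) entry[1] = value;
    else bucket.push([key, value]);
  }
  get(key) {
    const entry = this.bucketFor(key).find(([k]) => k === key);
    return entry ? entry[1] : undefined;
  }
}

const idTable = new IdentityTable();
const objKey = { name: "aaa" };
idTable.set(objKey, "this is a value");

const hashBefore = identityHash(objKey);
objKey.name = "bbbb"; // the content changes, the identity hash does not
const hashAfter = identityHash(objKey);

const stillFound = idTable.get(objKey);
const otherObject = idTable.get({ name: "bbbb" });

console.log(hashBefore === hashAfter, stillFound, otherObject);
```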

Related to this, given that JavaScript objects behave similarly to a Dictionary with string keys (you can add or remove items and the lookup is supposed to be almost immediate), one could assume that they are implemented as hash tables. Hash tables are very fast, but they are slower than property access in class based languages (C#, Java, C++...), where access to a specific property always uses the same offset from the beginning of the object (so you save the time of running the "getHashCode" function). Well, I can read here that V8, for example, uses dynamically created hidden classes, making property access as fast as in C++.
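The hidden class idea can be pictured with a toy model: a "shape" maps property names to slot offsets, objects built by adding the same properties in the same order share one shape, and values live in a plain array indexed by offset. Shape and ToyObject are hypothetical names and this is only a conceptual sketch, not V8's actual data structures. In particular, a real engine caches the offset at each access site, while this toy still looks it up in a Map.

```javascript
// Toy "hidden class": property name -> slot offset, plus transitions.
class Shape {
  constructor(offsets = new Map()) {
    this.offsets = offsets;       // property name -> slot index
    this.transitions = new Map(); // property name -> next Shape
  }
  // Return the shape reached by adding `name`, creating it only once.
  add(name) {
    let next = this.transitions.get(name);
    if (!next) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size); // new property gets the next slot
      next = new Shape(offsets);
      this.transitions.set(name, next);
    }
    return next;
  }
}

const EMPTY_SHAPE = new Shape();

class ToyObject {
  constructor() {
    this.shape = EMPTY_SHAPE;
    this.slots = []; // property values, indexed by offset
  }
  setProp(name, value) {
    if (!this.shape.offsets.has(name)) {
      this.shape = this.shape.add(name); // transition to another shape
    }
    this.slots[this.shape.offsets.get(name)] = value;
  }
  getProp(name) {
    const offset = this.shape.offsets.get(name);
    return offset === undefined ? undefined : this.slots[offset];
  }
}

const p1 = new ToyObject();
p1.setProp("x", 1);
p1.setProp("y", 2);

const p2 = new ToyObject();
p2.setProp("x", 10);
p2.setProp("y", 20);

const p3 = new ToyObject();
p3.setProp("y", 5); // different insertion order, different shape
p3.setProp("x", 6);

console.log(p1.shape === p2.shape); // true
console.log(p1.shape === p3.shape); // false
console.log(p1.getProp("y"), p2.getProp("y"), p3.getProp("y")); // 2 20 5
```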

I noticed this sentence in the MDN comparison: "A Map may perform better in scenarios involving frequent addition and removal of key pairs." It rather fits with the V8 policy. Each time you add a new property to an Object I guess V8 needs to move it to a new hidden class (creating one if needed), and that comes at a (small) cost that most of the time is more than compensated by the very fast property access that the new class provides. However, if you add/remove properties very frequently, maybe the time saved in property access does not compensate for the time spent on hidden class creation.
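If you want to check this guess on your own engine, a crude micro-benchmark along these lines can give a first impression. The function names and the iteration count are arbitrary choices of mine, and timings like these vary a lot between engines, versions and runs, so treat the numbers as a hint and not as a measurement.

```javascript
// Add and immediately remove many distinct keys on a plain Object.
function churnObject(iterations) {
  const store = {};
  for (let i = 0; i < iterations; i++) {
    store["k" + i] = i;
    delete store["k" + i];
  }
  return Object.keys(store).length;
}

// The same workload on a Map.
function churnMap(iterations) {
  const store = new Map();
  for (let i = 0; i < iterations; i++) {
    store.set("k" + i, i);
    store.delete("k" + i);
  }
  return store.size;
}

// Run a workload and report elapsed wall-clock time in milliseconds.
function timeIt(fn, iterations) {
  const start = Date.now();
  const remaining = fn(iterations);
  return { elapsedMs: Date.now() - start, remaining };
}

const objectRun = timeIt(churnObject, 200000);
const mapRun = timeIt(churnMap, 200000);

console.log("Object: " + objectRun.elapsedMs + " ms");
console.log("Map: " + mapRun.elapsedMs + " ms");
```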
