Sunday, 27 July 2014

Parallel Linq

Parallel Linq has been around for a long while, but I've hardly had the need nor the chance to play with it. Some discussions at work as to whether some parts of our current Perl application could get noticeable performance improvements by running them in parallel have whet my appetite for running operations like sorting, filtering... in parallel, and this has drawn me to take a look into Parallel Linq.

Apart from all the parallel magic, I find interesting the way how these parallel operations have been made available to Collections, so I'll write here some notes about it.

Given that Parallel Linq is a parallel implementation of Linq To Objects, the design is very similar. While Linq to Objects is implemented as a set of Extension Methods that apply to IEnumerable objects (I'm using IEnumerable to refer both to IEnumerable<TSource> and IEnumerable), Parallel Linq is implemented as a set of Extension Methods that apply to ParallelQuery objects. The former are located in the System.Linq.Enumerable class, and the latter in the System.Linq.ParallelEnumerable class.

If you have an IEnumerable object and want to switch from applying normal (sequential) Linq methods on it to its parallel counterpart, all you need is calling the AsParallel method on it. How does this work? AsParallel will return a ParallelQuery objet, so now the different Where, Find, First... extension methods will be taken from System.Linq.ParallelEnumerable rather than from System.Linq.Enumerable. Notice that ParallelQuery implements IEnumerable, but the resolution rules for Extension Methods guarantee that "the closest" to the given object are taken, meaning that Where(ParallelQuery... is chosen over Where(IEnumerable...

Let's say you have one collection and for whatever the reason (depending on different factors the parallel version could be slower than the sequential one) you want to apply some parallel methods and some sequential methods (and you're chaining all those calls), you can then use the AsSequential method to convert your ParallelQuery back into a plain and simple Enumerable. Thinking a bit about this, it seems clear that all what AsParallel and AsSequential classes do is wrap/unwrap your normal Enumerable Object in a ParallelQuery object. We can launch ILSpy to confirm this:

Indeed, internally the framework is using a ParallelEnumerableWrapper class, that inherits from ParallelQuery. Not sure about the rationale for this extra class.

You'll see that we also have an AsEnumerable method, that seems to do just the same as AsParallel. An yes, if you check its code, it just calls AsSequential.

You could be wondering why I'm using ILSpy to peek into the Framework code when the Framework's source code is publically available (both online and for download). Well, first I'm a creature of habit, and second, using a decompiler still gives me sort of a slight sense of "hacker" that the source code doesn't :-D

Wednesday, 23 July 2014

Order Preserving Object

Some weeks ago one coleague showed me a pretty cool perl library, Tie, that among other things allows to preserve the order in which items are added to a Hash (meaning that you can iterate the Hash in that order)

use Tie::IxHash;
my $items = {};
tie (%{$items}, 'Tie::IxHash');

Given the similarities between Perl and JavaScript I began to think of a way to accomplish the same in JasvaScript. Notice that JavaScript does not guarantee that the order in which a "for in" loop traverses an object (or Object.keys returns its keys) is the same in which those properties were inserted, the specification says it quite clearly. So far, it seems like many JavaScript engines preserve that order, but I guess this is just an effect of how they lay out objects in memory, it's an implementation detail on which we can not rely.

Achieving this feature in ES6 does not seem particularly complicated thanks to proxies. Notice that I'm using Direct Proxies, and at the time of this writing only Firefox seems to implement them, so I'm trying to make this code work just in Firefox (no node.js fun this time). Basically, we could create an OrderPreservingObject, by creating a proxy object around a plain JavaScript object and trapping the set and delete actions, so that it adds the key to an array that holds the order of that object.

function OrderPreservingObject(initObj){
	var obj = {
		orderedKeys: [],
	};
	//to prevent orderedKeys from showing up when iterating with "for in" we should declare it as non enumerable, 
	//but anyway, we should not use "for in" for iterating this object
	//ideally we would have managed to trap the "for in" and we would exclude it directly there
	var proxy = Proxy(obj, {
		
		//receiver is the proxy itself
		set: function(target,name,val,receiver){    
			//console.log("set invoked");
			target[name] = val;
			if (target.orderedKeys.findIndex(function(it){
				return it == name;
			}) == -1){
				target.orderedKeys.push(name);
			}
			return true;
		},
		deleteProperty: function(target,name){
			//console.log("delete invoked");
			delete target[name];
			var index = target.orderedKeys.findIndex(function(it){
				return it == name;
			});
			if (index != -1){
				target.orderedKeys.splice(index, 1);
			}
			return true;
		}, 
				
		keys: function(target){
			//console.log("keys invoked");
			return target.orderedKeys;
		}
	});
	//now that the proxy is set up, we can add the items in the initObj and they'll be added in order
	if (initObj){
		for (var key in initObj){
			proxy[key] = initObj[key];
		}
	}	
	return proxy;
}

Once we have the pieces to keep the order, the main problem is, how should we do to iterate the object following that order? The ideal solution would be iterating the object with "for in". "for in" iteration logic does not depend on our objects implementing a certain iteration method, so it can not be altered or hinted by users. Well, that's true for normal objects, but looking into the ES6 documentation, we see that in principle proxies are also able modify the "for in" behavior. So, I came up with this method/trap:

enumerate: function* (target){
	console.log("enumerate invoked");
	for (var i=0; i<target.orderedKeys.length; i++){
		yield target.orderedKeys[i];
	}
}

But I guess I'm missing something, cause the above does not trap the "for in". The "for in" runs as normal, iterating the object without taking into account our trap function (it iterates the object in order, but not based on our ordering extra logic, just cause for the moment Firefox's internal implementation of objects and "for in" happens to preserve the order). After a fast search I haven't found any information about how this "enumerate" trap is supposed to work, so I guess it'll have to wait.

In C# and Java iterating an object with foreach works just by obtaining an Enumerator/Iterator for that object and working with it. ES6 implements iterators, and while the "for in" loop will continue to work right the same as it does now, the new "for of" loop has been added to work with iterators and their next method. So well, maybe the best way to iterate my OrderPreservingObject would be to make it implement the iterator protocol and use it with a "for of" loop.

		__iterator__: function* (){
			for (var i=0; i<this.orderedKeys.length; i++){
				yield this.orderedKeys[i];
			}
		}

Unfortunately, the above does not work either. Trying to use it will fail in Firefox:
TypeError: dic1['@@iterator'] is not a function

So it seems like I'm left with no other option but iterating my object with a normal for loop, either using Object.keys

for (var i=0; i<Object.keys(dic1).length; i++){
	var key = Object.keys(dic1)[i];
	console.info(key + " - " + dic1[key]);
}

or directly using my orderedKeys array (that otherwise I would prefer to keep out of the Object interface)

for (var i = 0; i< dic1.orderedKeys.length; i++){
	var key = dic1.orderedKeys[i];
	console.info(key + " - " + dic1[key]);
}

Both solutions are quite verbose, so in the end, I came up with the idea of adding a forEach method (similar to the one in Array.prototype) to my Object:

forEach: function(actionFunc){
	//where actionFunc receives: item, key, tiedDic
	for(var i=0; i<this.orderedKeys.length; i++){
		var key = this.orderedKeys[i];
		actionFunc(this[key], key, this);
	}
}

so that it can be iterated like this:

dic1.forEach(function(item, key){
	console.info(key + " - " + item);
});

You can find the whole code here.

Definitely, I should revisit this in a few months as the adoption of ES6 evolves, to see if I finally make the "for in" trap or the "for of"/__iterator__ work.

Saturday, 19 July 2014

Unfamous Resistenza

Over the last months I've come across in multiple occasions (quite normal, as they are all over Toulouse) with some nice stickers with a cool drawing and reading Unfamous Resistenza, something like this:

Given that many times these stickers can be found next to left wing stickers, which fortunately are very common here (NPA, Front de Gauche, all sort of Antifa designs...) I was expecting there could be something interesting behind them. Today I've finally done a search to see what this all was about, coming across with something pretty good. Unfamous Resistenza is a French collective of cultural activists, focusing on ethical and conscious works.

Just having a fast look around their site I've found 2 really interesting interviews, this one to Gutter, a Toulouse based Street Artist with a very dark and gloomy style, and this one to Dark Taz, an excellent photographer focused on Urban Exploration and Beauty in Dereliction.

I think I'll be spending some hours in their site in the next days and become a regular visitor. The fact of being in French is not putting me off, on one side, my French reading skills are slowly improving (reading Metro, 20 Minutes and Direct Matin every morning in the tram helps a bit :-), and on the other side, I want to force me to continue to read French materials. I joked some weeks ago with one friend that one of my aspirations in life is having enough French skills to be able to sit on a Parisian or Toulousain terrace enjoying a Café Noisette (coffee in France is occasionally pretty good, even slightly reminding me the unbeatable Portuguese coffee) while reading "Le Monde Diplomatique".

Saturday, 12 July 2014

Dictionary throwing Exception

I've always found it quite bothering that trying to access a missing key in a Dictionary in .Net throws an Exception rather than returning null. Being so used to JavaScript dictionaries (and to Perl hashes now), where it will just return undefined/undef, I sometimes forget about the difference in .Net and miss using ContainsKey or TryGetValue, ending up with an Exception.

If we think about why the .Net designers took this decision, I think the explanation comes from the Reference Type/Value Type dichotomy. In a Dictionary<K,V>, V can be a Reference type or a Value type. For Reference types, if the key is not present we could return null without a problem, but this is not the case for Value types, as Value types can't be null. One solution would be having applied the same logic as in TryGetValue, and return the default for that type (0 if it's a number). I don't think that would be a good idea, cause you would not know then if the item was in the Dictionary or not. I think this behaviour is fine for TryGetValue, as it's not the standard way to access an element, but not for the the most common accessor to a Dictionary elements.

This has led me to think about how Java behaves for similar data structures. Looking at the Hashmap and HashTable documentation for the get method (yeah, not even in its almighty Java 8 does Java feature indexes... ) we can see that:

Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.

OK, sure that's cool but, how do they handle Value types (the 8 primitive types supported by Java)? After writing a few lines of code to check this and failing to compile it I realised that Java Generics do not support Primitive Types. Sure it's a serious limitation (caused I guess by the same "don't break previous code" aim that led them to choose their non reified flavour of generics), but for this case it gives you a small feature.

So this code in Java won't crash:

Integer numObj;
int numPrimitive;
Hashtable<String, Integer> myHashTable = new Hashtable<String, Integer>();
  myHashTable.put("Haute-Garonne", 31);
  
  numObj = myHashTable.get("Haute-Garonne");
  System.out.println("Haute-Garonne: " + numObj);
  
  numObj = myHashTable.get("Gironde"); //returns null
  System.out.println("Gironde: " + numObj); //works fine

but notice that obiously, trying to unbox will throw a NullPointerException:

try{
   numPrimitive = numObj; //NullPointerException when trying to unbox a null object into an int
  }
  catch(Exception ex){
   System.out.println("Exception: " + ex); 
  }

There's something rather curious here, not the fact of throwing an exception, that is pretty logical, but the name of the Exception. NullPointerException. The usage of the term "Pointer" in Java seems pretty weird to me. It looks to me that calling it NullReferenceException as .Net does would be quite more appropriate. Seems like I'm not the only one puzzled by such naming decision. Someone justifies it by the fact that pointers and references are indeed the same, the memory address of the object. I would say such a justification is quite wrong. In .Net and Java (and I guess almost all platforms) the fact that a reference holds the address of the referenced object is an implementation detail. Rather than that, a reference could just hold an ID for an entry in a table that would in turn point to the objects (something similar to Object Handles in Windows OS), and such implementation should be transparent to the programmer.

One could wonder why Reference Types are not nullable. I don't fully agree with the accepted answer here. I see it more in terms of how would you differentiate null from a valid value. I mean, I think both Java and .Net internally represent null with a 0. So, how would you distinguish an int or a float which value is 0 from one with no value (null)?

Related to this, the question of how JavaScript implementations store null and undefined comes to mind. After a fast search I haven't found an answer, I think I'll look deeper once I have a chance.