Tuesday, 31 December 2013

Cyberwar

Last month I watched an entertaining BBC documentary about computer security, Defeating the Hackers. It deals with several common topics, like the threat to security posed by quantum computers and the solution provided by quantum cryptography. It also mentions a concept that was new to me, Ultra Paranoid Computing (well, it seems quite new to everyone, as there's no Wikipedia entry), which brings up the idea of using passwords that we are not aware of and so cannot reveal. This other post explains it a bit more. Above all, though, the most interesting part of the documentary for me is the section dealing with the Stuxnet worm. Being the computing and history/social issues freak that I am, it felt quite embarrassing to have been absolutely unaware of such a "bomb". So far I think it's the most important cyberattack ever carried out, and it gives me the creeps to think how these things can evolve in the future. OK, admittedly this time the outcome of the attack seems pretty good, with Iran's nuclear program damaged and delayed, but bearing in mind that the perpetrators were 2 ultra-nationalist governments, USA and Israel, I'm quite sure they could launch an attack against any other less "evil" target at any moment, just to ensure their "supremacy" and their role as "policemen of the planet". This paragraph in Wikipedia is quite unsettling:

Sandro Gaycken from the Free University Berlin argued that the attack on Iran was a ruse to distract from Stuxnet's real purpose. According to him, its broad dissemination in more than 100,000 industrial plants worldwide suggests a field test of a cyber weapon in different security cultures, testing their preparedness, resilience, and reactions, all highly valuable information for a cyberwar unit.

Last week I stumbled upon another related documentary (even more interesting, indeed), produced by Spanish public TV (unfortunately it's only in Spanish). En Portada has been producing excellent documentaries for many years (for example, the one from a few months ago about the fascist Greek party (paramilitary unit) "Golden Dawn" was outstanding), which is really praiseworthy in a place where 99% of the TV production is stinky crap that is directly responsible for the illiteracy, passivity and amorality (or just plain degeneration) of a whole society. This program deals again with the Stuxnet case, and also mentions some other examples of cyberwar, like the attacks on Estonia (which were well known to me), and 2 cases where cyberwar was combined with "traditional" war: an Israeli attack on Syria in 2007 and a Russian attack on Georgia in 2008.
There's also some interesting info about the Bahnhof service provider. This program predates the whole NSA surveillance scandal, so it doesn't mention how in recent months many organizations have started to look at European companies to store their data, to the detriment of USA ones, which are now seen with huge suspicion. I hope this trend continues and brings some boom for European data centers and IT professionals.

Related to security, some weeks ago Martin Fowler came up with a brilliant concept, Data Austerity, pretty interesting food for thought in these times of "Big Data coolness" everywhere.

Tuesday, 24 December 2013

Some Thoughts on C# 6

I'm a Programming Languages freak. Though I've never been into compiler programming, I've always had an enormous interest in their input, languages (and even in their output, MSIL and so on). This explains why I follow the development of ES6, Java 8, and anything related to Groovy with so much attention... and why I've been quite upset by the lack of news regarding C# 6.

Fortunately, this summer we got some information from Anders himself. There he mainly highlighted the importance of Roslyn, how the next C# will come with the new managed C# compiler (as a service), and how this will allow for further development of the language in the future. I quite like what he mentions about his work on TypeScript making him more appreciative of dynamic features (I'd like to see the astonishing dynamic features added to C# in v.4 getting some more love in the near future).

Last week some more news about probable new features in C# 6 was released on the web. Well, it's quite a bunch of things, but at first I felt a bit disappointed.
The one I'm most interested in is what they call (oddly enough) "Monadic null checking", which is just the equivalent of Groovy's Safe Navigation Operator. I've missed this feature so many times that I even wrote a sort of workaround for it, both for C# and JavaScript.
On second thought, I'll say that Read Only Autoproperties and Property Expressions also seem rather appealing to me, but anyhow, as some people mention in the comments, these are mainly cosmetic/wrist friendly additions that don't involve any real change. Some people mention that if you want a really modern language for the CLR, you should switch to F#. I don't like the idea: though multiparadigm, F# favors Functional Programming over OOP, so it involves a sort of paradigm shift that I haven't made yet (I'm used to and rather keen on the soft functional features in JavaScript and C#, but I really need to devote some time to understanding the advantages of a more purely functional approach).

All this has made me think about what features I would add to a modern language like C# (or Java 8), and well, indeed, there are not so many. C# is (for the most part) a statically typed language, so some of the things I love in JavaScript (or Groovy) can't easily be applied/used (I'm referring to expanding classes or instances with new methods or altering the inheritance chain). Also, I'm not sure to what extent Python's metaclasses (which have little to do with Groovy's) could be used. Anyway, after some pondering, I've come up with this (wish) list:

  • Some kind of Multiple Inheritance of Behaviour, like the one enabled by Java 8 with its Default Interface Implementation. Scala traits seem even more powerful, so they would be more than welcome (and would probably involve just some more compiler magic, but no IL changes).
  • When writing this post, I realized that Iterators (Generators in normal Programming Languages parlance) could gain some more power (send, throw, close, delegated yield).
  • Some more investment in the DLR and dynamic. Not sure if at this point (where Lightweight Code Generation has been superseded by Expression Trees in most cases) extending the access to the IL code could get us any benefits.
  • Allow Attributes (Annotations in the Java world) to receive delegates (lambdas) as parameters. This is possible in Groovy land, but in C# all you're left with is an error prone hack.

In terms of the CLR, I think Microsoft have got it quite wrong with TypeScript. They've decided to follow that (to my mind utterly stupid) trend of avoiding JavaScript and writing something that compiles to it. Sure JavaScript is riddled with some quirks and gotchas, but it's one of the most powerful languages ever designed, and what we need is speeding up the development and adoption of ES6, as it fixes many of those problems. So, what I'd like to see is a modern JavaScript engine for the CLR (something like Oracle's Nashorn).

Saturday, 14 December 2013

Fun with ES6 proxies

Proxies have been (and still are) a long awaited addition to JavaScript. Finally it seems like they'll be one more weapon of the ES6 arsenal. Though I remember having read information about them a few years ago, they're so new that only Firefox (always miles ahead of any other browser in terms of JavaScript innovation) implements them. You'll read that you can enable them in node.js with the --harmony-proxies argument, but that's an implementation of the old Proxy specification (Harmony Proxies: Proxy.create...) that is obsolete and has been superseded by Direct Proxies.

Whenever I've used proxies in other languages it's been for method interception, so when reading the documentation my attention was mainly drawn to the get and apply traps. It seems interesting to me that they enable 2 different approaches for the same purpose.

Let's say you have an object with just one method, something like this

var cat = {
	name: "Kitty",
	sayHiTo: function _sayHiTo(subject){
		return "Hi, " + subject + " my name is " + this.name;
	}
}; 

and you want to intercept calls to its sayHiTo method. You can wrap your cat object in a Proxy and use a get trap, so that when the method is requested (got) you return the wrapping function:

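var proxiedCat = new Proxy(cat, { //the final ES6 spec requires new
	get: function(target, name, receiver){
		var value = target[name];
		if (typeof value !== "function"){
			return value; //plain properties (like name) are returned untouched
		}
		return function(){
			console.log("intercepting call to method " + name + " in cat " + target.name);
			return value.apply(target, arguments); //forward all the arguments to the real method
		};
	}
});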

Notice that target is the object being proxied, name is the property being requested, and receiver is the proxy itself.

We can use a different approach for this, just wrap the sayHiTo function in a Proxy and use the apply trap.

cat.sayHiTo = new Proxy(cat.sayHiTo, { //the final ES6 spec requires new
	apply: function(target, thisArg, args){
		console.log("intercepting call to " + target.name + " in cat " + thisArg.name );
		return target.apply(thisArg, args);
	}
});

With the first approach you would be intercepting calls to any method of the object, so the thing seems clear to me: if you want to intercept all methods (or want to use any other trap besides get), this is the way to go. On the contrary, if you just need to intercept one (or a few) specific methods, the second approach seems more appropriate.

The thing is that we've been intercepting (or decorating) function calls for years, just by doing something like this:

function formatString(st){
	return "X" + st + "X";
}
console.log(formatString("hi"));
console.log("formatString.name: " + formatString.name);

formatString = (function(){
	var target = formatString;
	return function formatString(st){ //notice that I'm naming the function expression, so that function.name continues to return "formatString"
		console.log("intercepting call to " + target.name);
		return target(st);
	};
})();
console.log(formatString("hi"));
console.log("formatString.name: " + formatString.name); //returns formatString

so one could wonder if there's any advantage in using "function proxies" (I mean wrapping a function in a proxy with just an apply trap):

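//duplicateString is not defined in this post, so this body is an assumed, illustrative one
function duplicateString(st){
	return st + st;
}
duplicateString = new Proxy(duplicateString, { //the final ES6 spec requires new
	apply: function (target, thisArg, args){
		console.log("intercepting call to " + target.name);
		return target.apply(thisArg, args);
	}
});
console.log(duplicateString("hi"));
console.log("duplicateString.name: " + duplicateString.name); 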

well, the functionality is the same, so the only advantage that I can see is that the code is more semantic (the Proxy call clearly indicates that you are intercepting that function).

One very interesting point with ES6 proxies is that they try to be as transparent as possible. For example, an instanceof check on a proxy, or myProxy.constructor, will work just as if we were applying it to the proxied object (target). Also, if we proxy a function, myProxyFunction.name will return the name of the target function. This makes me wonder if there's any way to know whether a given object is a normal object or a proxy around it. So far, I haven't found a way to distinguish them.
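The transparency described above can be checked with a small sketch. It assumes an engine implementing Direct Proxies with the final ES6 `new Proxy(target, handler)` form; `Cat`, `sayHi` and the variable names are made up for the illustration. An empty handler simply forwards every operation to the target.

```javascript
// Hypothetical constructor and function, used only for this illustration.
function Cat(name) {
  this.name = name;
}
Cat.prototype.sayHi = function sayHi() {
  return "Hi, my name is " + this.name;
};

var realCat = new Cat("Kitty");
// An empty handler forwards every operation to the target.
var proxiedCat = new Proxy(realCat, {});

console.log(proxiedCat instanceof Cat);      // true
console.log(proxiedCat.constructor === Cat); // true
console.log(proxiedCat.sayHi());             // "Hi, my name is Kitty"

function formatString(st) {
  return "X" + st + "X";
}
var proxiedFormat = new Proxy(formatString, {});
console.log(proxiedFormat.name);             // "formatString"
console.log(typeof proxiedFormat);           // "function"

// Identity still differs, but that only helps if you already hold the target.
console.log(proxiedCat === realCat);         // false
```

None of these checks tells a proxy apart from a normal object unless you also have a reference to the original target.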

You can see some code here, and run it from here (in Firefox+Firebug)

By the way, some time ago I uploaded a small interception library to GitHub based on the old approach.

Friday, 6 December 2013

One more Programming Mashup

It's been a while since my last "list of interesting programming stuff" kind of entry, so here goes a new installment.

  • JavaScript is already very well packed with reflective features: Introspection, Object Expansion, Runtime Code Evaluation (eval is not evil), so it caught my attention to find a Reflect API planned for ES6. As the document says, it does not provide new functionality, but is a convenient place for this already existing functionality (I've always considered Object an odd location for functions like freeze, seal, getOwnPropertyDescriptor). It's also interesting to see how nicely it plays with proxies.
  • Talking of proxies, it's clear that they are one of the hottest and most awaited additions to JavaScript, and we'll see all sorts of interesting uses of them. One that I like very much is this one, adding negative indexes to arrays. If you take a look at the code you'll see it's just a couple of lines!
  • I pretty much like the usage (and the code) for this Dependency Injection library for JavaScript
  • With my love for Prototype based languages, I've never been very keen on any of the many class emulation libraries for JavaScript, and indeed I don't like the idea of adding Classes syntactic sugar to ES6 (internally it'll continue to be based on prototype chains). But when one comes across so many discussions about different approaches to inheritance, with names like "parasitic inheritance":
    • http://www.benlakey.com/2013/05/javascript-prototypal-inheritance-done.html
    • http://helephant.com/2009/08/17/javascript-prototype-chaining/
    • http://stackoverflow.com/questions/7250423/javascript-parasitic-inheritance?rq=1
    • http://javascriptissexy.com/oop-in-javascript-what-you-need-to-know/
    • http://stackoverflow.com/questions/2800964/benefits-of-prototypal-inheritance-over-classical?lq=1
    • http://stackoverflow.com/questions/16929353/parasitic-combination-inheritance-in-professional-javascript-for-web-developer
    • http://ericleads.com/2013/02/fluent-javascript-three-different-kinds-of-prototypal-oo/
    one can end up changing one's mind and thinking that the homogeneity that the class construct will bring to most developments can indeed be a promising prospect.
  • I found this entry interesting, about how bad a design it is when a class requires its inheritors to call some of the parent methods. There's no way to force an inheritor to do that, so if it skips that call, the new Child class will be wrong. As the article mentions, we should change the design to use the Template Method Pattern (one of my favorite ones).
  • I've always felt quite happy using normal objects when I need a Dictionary in JavaScript, so this post has been a real eye opener.
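To give an idea of how little code the negative indexes trick mentioned in the list above needs, and of how Reflect pairs with a proxy trap, here is a minimal sketch. It is not the code of the linked library: `withNegativeIndexes` is a hypothetical helper, and it assumes an engine with both `new Proxy` and `Reflect.get` available.

```javascript
// Hypothetical helper, written only for illustration (not the linked library).
function withNegativeIndexes(array) {
  return new Proxy(array, {
    get: function (target, name, receiver) {
      // Property names reach the trap as strings (or symbols), so "-1" has to be parsed.
      if (typeof name === "string" && /^-\d+$/.test(name)) {
        name = String(target.length + Number(name));
      }
      // Reflect.get performs the default property lookup on the target.
      return Reflect.get(target, name, receiver);
    }
  });
}

var letters = withNegativeIndexes(["a", "b", "c"]);
console.log(letters[-1]);    // "c"
console.log(letters[-3]);    // "a"
console.log(letters[0]);     // "a"
console.log(letters.length); // 3
```

Only reads are handled here; a fuller version would presumably add a set trap as well.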

Wednesday, 27 November 2013

Exberliner, refugees in Germany

Exberliner is a really inspiring publication (it already prompted this post last year). The basic idea (an English-language lifestyle magazine for expats in Berlin) should not be particularly appealing to a scarcely social, not cool in the slightest individual like me... but it's crystal clear that Berlin is a different place, and what in other cities would be a hipster-oriented magazine, in Berlin involves articles about social issues, historical background and so on. For example, the April issue contained great articles about the urban (gentrification) plans for the city and the wiping out of part of the socialist architectural heritage.

November's issue made for a non-stop reading session on my flight back, and helped me understand things I'd seen the previous days both in Berlin and Hamburg. Already in October 2012 I had come across a refugee support camp in Oranienplatz (O-Platz), and information about demonstrations and a march in solidarity with refugees. The same landscape has been present in my ensuing visits to the world capital (it's very funny to see how to me Berlin has turned into the Welthauptstadt that the nazi scum dreamt about, but for just the opposite reasons: freedom, art, dynamism, convergence of ideas, alternative thinking...). This November, apart from the camp, I found many stickers, posters, and flags hanging from autonomous centers and alternative book shops... reading: Refugees Welcome and Lampedusa in Berlin/Hamburg.

Also, when paying a visit to one of my favorite Berlin spots, the desolate site between the Spree and Köpenicker Straße with its abandoned factory covered in graffiti and murals, I found a Middle Eastern family trying to build a shanty under the shelter of the factory.

Exberliner's November issue is called "Looking for Asylum in Berlin" and nicely explains the current situation through different interviews and articles. In the last few years Germany has become one of the top destinations for people seeking asylum in Europe. The reasons for this are not completely clear, but one of them is the false belief that obtaining asylum in Germany is easy. Reading the experiences of some refugees, it's pretty clear that it's not. Years of stagnation in a Heim (emergency refugee centre), without any idea of whether your request will be accepted or not, or whether you'll be moved to another Heim in another city, and the inability to work as you lack the permits needed for that... As more refugees have been arriving in Germany (and primarily Berlin), and the government has made no move to speed up the asylum request procedures or to improve the conditions in which they are kept until the final decision is made, different support initiatives have emerged, one of them being the protest camp in Oranienplatz. The "Lampedusa in Berlin" phrase comes from the fact that many of those refugees enter Europe through Lampedusa.

It's shocking to see that lately there's been an increase of Russian gays seeking asylum in Germany on the basis of the wave of homophobia that Putin's government and the Orthodox Church have unleashed. It seems like in the last months Russian neonazis and other pieces of human debris have been moving their focus from dark skinned people to gays.

Apart from these sad refugee stories, there's also an interesting column about Die Linke, their latest electoral results (unfortunately, not too impressive), their difficulties in getting rid of the past and the permanent boycott they're subjected to.

Sunday, 24 November 2013

A slight taste of Perl

Over the last few weeks I've been learning some Perl. Yes, I know it sounds odd: what the hell is someone quite obsessed with language evolution doing learning a dusty 90's language?
Well, let me explain. First, a new project sprang up at my employer involving Perl and relocating for some months to the very beautiful city of Toulouse. Spending some months in Toulouse seemed appealing enough to make Perl look as cute as ES6 or even more :-) so I just eagerly jumped into the project.
Second, Perl is way better than I would have expected. I guess I'll be blogging about it in the near future, so I'll just say here that you can do perfectly modern programming with Perl 5. This does not change the fact that the language is riddled with very idiomatic "features" that I frankly consider SHIT (the default variable/parameter thing above all), but well, it's just a matter of avoiding those things (and praying not to have to work with code written by others that consider those features cool and useful).

The thing is that while learning Perl I've come across some very familiar features that until that moment I used to identify as JavaScript specific. Probably the most obvious are shift, unshift and splice for arrays. While the functionality is common to almost any language/library, the choice of names is not (I guess Append, RemoveFirst... are rather more natural).

A fast search brings up a related entry in the excellent 2ality blog. Well, there are a few more influences not mentioned there that I can think of:

  • Perl's undef is mainly the same as JavaScript's undefined
  • While (contrary to JavaScript) Perl lacks a Boolean type, the coercion rules applied to normal values in if conditions and so on are pretty similar. 0, "", and undef are considered false. Notice though, that "0" and an empty array/list will be considered false in Perl, but true in JavaScript.
  • I would not call this an influence, but an inherited disease... the most basic function to obtain the current date, localtime, considers months from 0 to 11, rather than 1 to 12... one of the most common JavaScript gotchas.
  • Update, 2013/12/27 Removing a key from a dictionary/hash is pretty similar (and not particularly intuitive, I would prefer a .remove method), use the delete operator in JavaScript:
    delete myObj.key;
    or the delete function in Perl:
    delete($myHashRef->{key});
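The JavaScript side of the points in the list above can be verified with a short, self-contained sketch (the variable names are just illustrative; the Perl behaviour is the one described in the list, not shown here):

```javascript
// Values that Perl treats as false but JavaScript treats as true.
console.log(Boolean("0")); // true
console.log(Boolean([]));  // true

// Values that are false in both languages (undefined playing the role of undef).
console.log(Boolean(0), Boolean(""), Boolean(undefined)); // false false false

// Months are zero-based, the same gotcha as the month field of Perl's localtime.
var christmasEve = new Date(2013, 11, 24); // 11 means December
console.log(christmasEve.getMonth());      // 11

// Removing a key with the delete operator.
var myObj = { key: "value", other: 1 };
delete myObj.key;
console.log("key" in myObj);     // false
console.log(Object.keys(myObj)); // ["other"]
```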

There have also been some more recent influences. First JavaScript 1.7 (implemented by Firefox for quite a few years now) and now ES6 implement destructuring assignment, a beauty that Python also drew from Perl. Moose, a modern object system for Perl 5, spawned a JavaScript equivalent, Joose.
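For those who haven't seen destructuring assignment yet, a minimal sketch of the ES6 flavour follows (the names are made up for the example; the array form is the one closest to Perl's list assignment):

```javascript
// Array destructuring, including the classic swap without a temporary variable.
var [first, second] = ["shift", "unshift"];
[first, second] = [second, first];
console.log(first, second); // "unshift" "shift"

// Object destructuring, renaming one of the bindings.
var { catName, lives: remaining } = { catName: "Kitty", lives: 9 };
console.log(catName, remaining); // "Kitty" 9

// Unpacking several values returned by a function.
function minMax(values) {
  return [Math.min.apply(null, values), Math.max.apply(null, values)];
}
var [min, max] = minMax([3, 1, 2]);
console.log(min, max); // 1 3
```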

Sunday, 17 November 2013

Street Crap 2

While in Hamburg last week, I came across another insult to aesthetics that makes a good follow-up to this previous post. I was quite interested in paying a visit to the Bismarck memorial in Hamburg, mainly because its Berlin counterpart is one of my favorite monuments, though not for the Bismarck statue itself, but for the 4 other statues surrounding it. You can read the Wikipedia article for some basic info; I'll just say that they look incredibly imposing and powerful (reminding me in that sense of the 2 astonishing statues flanking the entrance to the Altes Museum).

Hamburg's memorial is quite interesting, Bismarck stands sober and serious watching over one of the most affluent cities of the country he managed to unify (specifically over St. Pauli, the most leftist, anarchic, party going district in Hamburg). It's quite remarkable how with his cape and sword he looks more like a Teutonic Knight than like a statesman of the end of the XIX century.

And here you have some pics of the offence, when street art turns into street crap

As for Bismarck, he seems to be a rather interesting character, and not just for founding the most important country in modern Europe. From my basic understanding of his biography, I have mixed feelings about him. On one side he was deeply anti-socialist, but on the other side he was responsible for creating the concept of the Welfare state that civilized people (i.e. Europeans, Canadians...) have today, along with trying to avoid war by all means (without him, WWI might well have occurred some decades earlier).

Thursday, 31 October 2013

Generators

It seems like there's been quite a lot of excitement lately around one of the beautiful new features with which ES6 will one day delight us, Generators. They've been added to the latest version of node.js, and have been available in Firefox since time immemorial (fuck you Chrome), but notice that Firefox will have to update its implementation to the syntax agreed for ES6 (function*...).

I first learnt about Generators back in the days when I was a spare time pythonist. Then the feature was added to C# 2.0, and all my relation with the yield statement has been in the .Net arena. Notice that unfortunately Microsoft decided to use the term Iterator for what most other languages call Generators, same as they decided to use the terms Enumerable-Enumerator for what most other languages call Iterable-Iterator... frankly a quite misleading decision.

I've been quite happy with C# Generators (Iterators) ever since, especially with all the cool black magic that the compiler does under the covers (creating an aux class implementing IEnumerator (and IEnumerable if needed), a __this property pointing to the "original" this...), but reading about JavaScript generators I've found some powerful features that are missing in the C# version.

It's no wonder that in JavaScript generator functions (functions with yield statements) and iterator objects (__iterator__) are closely related. The documentation talks about generator functions creating a generator-iterator object. Let's clarify it: what they mean by generator-iterator object is an iterator object with some additional methods apart from next:
send, throw, and close

These extra methods are quite interesting and are not present in the C# implementation. send seems particularly appealing to me. As explained in different samples, it basically allows you to pass a value to the generator object that it can use to determine the next steps in the iteration. You can use it to reset your iterator or to jump to a different stage. I've said reset; well, IEnumerator has a Reset method, but the IEnumerator/IEnumerable classes that the C# compiler generates when we define an Iterator (generator) method lack a functional Reset method (it just throws a NotSupportedException).

Pondering a bit over this, we can sort of add a Send/Reset method to our C# iterators. From the iterator method we have normal access to the members of the class where it's defined (as the generated Enumerator class keeps a __this reference to that class), so we can add the Send/Reset method directly there. This means that if we want to Reset the Enumerator created from an iterator method, we'll have to do it through the class where that iterator method is defined, rather than directly through the Enumerator. Obviously it's not a very appealing solution, but well, it can be useful on some occasions.

public class FibonacciHelper
{
 private bool reset;
 public void Reset()
 {
  this.reset = true;
 }
 public IEnumerator<int> FibonacciGeneratorSendEnabled()
 {
  int prev = -1;
  int cur = 1;
  while (true)
  {
   int aux = cur;
   cur = cur + prev;
   prev = aux;
   yield return cur;
   if (this.reset)
   {
    this.reset = false;
    prev = -1;
    cur = 1;
   }
  }
 }
}

FibonacciHelper fibHelper = new FibonacciHelper();
IEnumerator<int> fibEnumerator = fibHelper.FibonacciGeneratorSendEnabled();
for (var i=0; i<10; i++)
{
 fibEnumerator.MoveNext();
 Console.WriteLine(fibEnumerator.Current);
}
Console.WriteLine("- Reset()");
fibHelper.Reset();
for (var i=0; i<10; i++)
{
 fibEnumerator.MoveNext();
 Console.WriteLine(fibEnumerator.Current);
}

I've got a full sample here

Another difference is that while JavaScript's send will both reposition the iterator and return a value, my implementation above will just reposition the enumerator, and no value will be returned until the following call to MoveNext.
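For comparison, here is a minimal sketch of the same resettable Fibonacci sequence as a JavaScript generator. It assumes an engine with the final ES6 syntax (function*), where the value is passed in through next(value) rather than the send(value) of Firefox's older implementation, and where return() plays the role of close(). The fibonacci name and the "true means reset" convention are mine.

```javascript
function* fibonacci() {
  var prev = -1;
  var cur = 1;
  while (true) {
    var aux = cur;
    cur = cur + prev;
    prev = aux;
    // Whatever is passed to next() becomes the result of the yield expression.
    var reset = yield cur;
    if (reset) {
      prev = -1;
      cur = 1;
    }
  }
}

var fib = fibonacci();
var firstRun = [];
for (var i = 0; i < 5; i++) {
  firstRun.push(fib.next().value);
}
console.log(firstRun); // [0, 1, 1, 2, 3]

// Repositions the sequence and returns the first value in the same call.
var afterReset = fib.next(true).value;
var following = fib.next().value;
console.log(afterReset, following); // 0 1

// return() finishes the generator, much like the old close().
var closed = fib.return();
console.log(closed.done, fib.next().done); // true true
```

Notice how the resetting call already hands back a value, which is exactly the difference with the C# workaround described above.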

Another interesting feature in JavaScript generators is the additional syntax for composing generators, aka "delegated yield": yield* (seems Python's equivalent is yield from)

function* myGenerator1(){ //yield* is ES6 syntax, so the function needs the * too
	yield* myGenerator2();
	yield* myGenerator3();
}

As C# lacks that cutie, we have to write:

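//foreach needs an IEnumerable<T>, so the generators here return that rather than IEnumerator<T>
IEnumerable<T> MyGenerator1()
{
	foreach(var it in MyGenerator2()){
		yield return it;
	}
	foreach(var it in MyGenerator3()){
		yield return it;
	}
}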
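To see the delegated yield actually running, here is a small self-contained sketch in ES6 syntax. The bodies of myGenerator2 and myGenerator3 are invented just to have something to iterate over, and it assumes an engine supporting function* and for...of.

```javascript
// Illustrative inner generators; their contents are made up for the example.
function* myGenerator2() {
  yield 1;
  yield 2;
}
function* myGenerator3() {
  yield 3;
  yield 4;
}

// yield* hands control to the inner generator until it is exhausted.
function* myGenerator1() {
  yield* myGenerator2();
  yield* myGenerator3();
}

var values = [];
for (var value of myGenerator1()) {
  values.push(value);
}
console.log(values); // [1, 2, 3, 4]
```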

Saturday, 26 October 2013

Make me a German

Make me a German is an equally funny and informative BBC documentary. Curiosity about why the German economy is doing so well while the economies of the rest of the European Union are doing so dramatically badly compels a British family to move to Germany and try to turn themselves into the average German family, in an attempt to understand the country from the inside.

The experiment is pretty funny, and brings up some interesting points, at least for someone like me who, in spite of having an obsessive fascination with Berlin, having read quite a lot about German history and society and counting some German artists (Caspar David Friedrich) among my all time favorites, has never had much intense interaction with locals (I've been to Germany many times, but as a solitary person without significant social skills... I've hardly scratched the surface of the German mind). I'll list below some of their findings:

  • Germans really work fewer hours than most Europeans (at least British and Asturians), but they work in such a focused and hard way that they are much more productive. It's astonishing how outraged one German lady felt when talking about her experience in the U.K., in an office where people were checking their personal emails or talking about their private life during the work day. Quite hilarious.
  • There seems to be a very strong sense of community at work, and also an identification with the company. You are part of a team and as everyone in the team is working hard you can't fail them. I guess most people will appreciate this, but notice that taking to the extreme this feeling of unity and belonging and the denial of individualism helped set the backdrop for the Nazi regime. I'm a very individualist person, so I'm a bit biased on this point.

  • Germans are cautious with money and save more than the rest of Europe. This can be easily traced to the brutal crises after WWI and WWII. Germans are not very fond of credit cards (hum, that's a rather Germanic trait of mine). This background also explains something that I really enjoy when I'm there: supermarkets are cheap. Indeed, it turns out that German supermarkets have the tightest profit margins in Europe.
  • It's easier to be a mother in Germany. Families with kids get enormous fiscal advantages, Kindergartens are really cheap, and there's a sense of pride in being a mother that has left her job to take care of her little kids and the house. It's so common that mothers who decide to carry on with their jobs are generally seen with a certain disapproval. What seems odd to me is that, with all these advantages, the birthrate continues to be so low.
  • I'd never noticed that Sundays as a rest day were so important to Germans; well, probably it's because they're not as sacred in Anarchist Berlin as they are in Christian southern Germany. Combine this with that almost genetic obsession with abiding by the rules and civic behaviour, and making a bit more noise than expected on a Sunday morning can end up with the Polizei paying you a visit and giving you a fine. On a personal note, the unruly Astur-Gallaecian in me can't help enjoying the bad looks I get there each time I cross on a red light :-D (and what to say about travelling without a ticket on Berlin's BVG).

It quite caught my attention that a British family were looking at Germany as a better place. For an Asturian like me, that lives in a place with 27% unemployment, where youngsters are much more ignorant now than they were 100 years ago, having as their main aspiration in life to turn into a TV crap celebrity and partying as hard as possible, where politicians are mainly a bunch of thieves, where "picaresca" (that is, getting whatever you want by means of tricks and cheating rather than by effort) is a chronic illness... it's normal to perceive Germany as a better place and look at it with a certain sense of inferiority (though based on culture, history and geography Asturies is NOT Southern Europe, in the end we share and suffer too many traits with the rest of Southern Europeans), but I also perceive UK as a better place/society, so it seemed funny to me seeing the Brits envious of the Germans. Who knows, maybe Swedes are also envious of Norwegians or Danes... but for us, they're all just "first class countries".

Sunday, 20 October 2013

Stalin's Birthday Cakes

My fascination with architecture (I'm talking about buildings today, not about Software) has grown and grown over the years. My main sources of fascination are Gothic Cathedrals (and to a lesser extent Baroque structures) and slim skyscrapers (aka "Business Cathedrals"). I'll also use this entry to make public my discomfort with simplistic buildings that for some reason "self proclaimed intellectuals" decided to consider "revolutionary". I'm talking about the main current in functionalism and that sort of Bauhaus crap. For me aesthetic pleasure should be one of the main aims of architecture; indeed, it's one of its basic functions.

That said, it's easy to understand my fascination with Stalinist style skyscrapers (aka Stalin's Birthday Cakes). This Wikipedia article gives an excellent introduction to the broader subject of Socialist Realism. Though I've never been to Russia, "the Soviet sphere of influence" after WWII (aka occupation and puppet states) has meant that I've been able to indulge myself with the views of some extraordinary pieces of this style with no need to leave the European Union.

Along with the 2 best-known buildings, Warsaw's Palace of Culture and Science and its little brother (or sister), the Latvian Academy of Sciences in Riga, I've also set my eyes on 3 other beautiful (though obviously not so magnificent) constructions in:

  • Prague (the Crowne Plaza Hotel). I knew about this building through some web research, otherwise it's a bit far from city centre and I don't think I would have come across it just by chance.
  • Tallinn (nice residential building close to city centre). You can read more about Tallinn's "Soviet legacy" here.
  • and Vilnius. I just came across this building by chance. It's close to the city centre, by the Neris river, just next to the pedestrian bridge crossing to the business district (by the way, that bridge gives you a pretty nice view of that area). I haven't found any additional information about it.

I've created a new Picasa Gallery with some more related pictures.

While I find these "Birthday cake" buildings the most noticeable of the genre, the whole Socialist Classicism style seems fascinating to me. My visits to the Soviet War Memorial in Berlin have been sort of spiritual experiences (apart from the imposing architecture, it confronts you with the miserable condition of human beings when you think about how the heroes that liberated Europe from fascism turned into the brutal rapists of millions of German women...), and visiting the memorials in Tallinn or Riga, or just strolling along Karl Marx Allee in Berlin or Nowa Huta in Krakow, are absolutely recommendable activities.

Saturday, 12 October 2013

Debug, Release and PDBs

Many people (obviously I was among them) feel surprised when they build a Release version of a .Net Application with Visual Studio and find that .pdb (Program Database) files have been output to the Release folder along with the generated binaries. A better understanding of Release builds and pdb files will explain it.

Based on some online resources it seems like the Release configuration in Visual Studio invokes the compiler like this:
csc /optimize+ /debug:pdbonly

The /optimize flag tells the compiler whether to generate optimized MSIL code. My understanding is that the C# compiler applies few optimizations when generating MSIL; I think the main difference is that a good bunch of NOPs is added to non-optimized MSIL in order to make subsequent debugging easier. Take into account that most of the optimization work is done by the JIT when compiling from MSIL to native code. I'm not sure what effect unoptimized MSIL has on JIT compilation, as what I've read is that in principle the JIT always tries to generate optimized code, except when a method is decorated with MethodImplAttribute set to NoOptimization, or when debugging with the "Suppress JIT optimization on module load" option enabled. Also, I'm not sure whether the /optimize flag has any effect on the JIT behaviour (it could set some additional metadata instructing the JIT whether or not to optimize the native code). Based on this article you can also manipulate the JIT behaviour by means of a .ini file.

The /debug flag tells the compiler whether it has to generate pdb files, and how complete the debug info should be (full vs pdbonly). This excellent post gives a neat analysis. It mentions another attribute that tells the JIT to what extent it must perform optimizations, the DebuggableAttribute. Related to this, it seems like the addition of MSIL NOPs has more to do with the /debug flag than with the /optimize one.
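To make those two attributes a bit more tangible, here is a minimal sketch (my own, not taken from the linked posts) that reads the DebuggableAttribute of an assembly and marks one method with MethodImplOptions.NoOptimization. The class and method names (JitFlagsDemo, NeverOptimized) are just made up for the example, and I'm assuming the flag printed by the first method is what differs between a typical Debug and Release build.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class JitFlagsDemo
{
    // True when the assembly carries a DebuggableAttribute asking the JIT
    // not to optimize (what I'd expect from a typical Debug build).
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attribute = (DebuggableAttribute)Attribute.GetCustomAttribute(
            assembly, typeof(DebuggableAttribute));
        // No attribute at all leaves the JIT free to optimize.
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }

    // Per-method opt-out: NoOptimization asks the JIT not to optimize this
    // method, and NoInlining keeps it from being inlined into its callers.
    [MethodImpl(MethodImplOptions.NoOptimization | MethodImplOptions.NoInlining)]
    public static int NeverOptimized(int x)
    {
        return x * 2;
    }

    public static void Main()
    {
        Assembly me = typeof(JitFlagsDemo).Assembly;
        Console.WriteLine("JIT optimizer disabled: " + IsJitOptimizerDisabled(me));
        Console.WriteLine(NeverOptimized(21)); // 42
    }
}
```

Building this once with the Debug configuration and once with Release should be a quick way to check what each flag combination writes into the assembly.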

PDBs are a fundamental piece for any debugging attempt. This article will teach you almost everything you need to know about PDBs. Basically, you should always generate PDBs for your release versions and keep them stored along with your source code, in case you ever need to debug your Release binaries.

PDBs are used for a few more things other than debugging itself:

  • The .Net runtime itself uses the information in pdb files in order to generate complete stack traces (that include file names and line numbers). I guess the stack trace is built from StackFrame objects, about which we can read this:

    A StackFrame is created and pushed on the call stack for every function call made during the execution of a thread. The stack frame always includes MethodBase information, and optionally includes file name, line number, and column number information.

    StackFrame information will be most informative with Debug build configurations. By default, Debug builds include debug symbols, while Release builds do not. The debug symbols contain most of the file, method name, line number, and column information used in constructing StackFrame objects.

    I would say that where they talk about Release/Debug they should really talk about the presence or absence of pdb files, because, as explained in the previous article, both the full and pdbonly options produce complete stack traces.
  • ILSpy makes use of PDBs to get the names of the local variables (as these are not part of the Assembly Metadata). Assembly Metadata includes method names and parameter names, but not local variable names, so when decompiling an assembly into C# code ILSpy will read the variable names from the associated pdbs. I found these related paragraphs somewhere:

    Local variable names are not persisted in metadata. In Microsoft intermediate language (MSIL), local variables are accessed by their position in the local variable signature.

    The method signature is part of metadata. Just to call the method it would be enough to know the binary offset of the method and its number and size of parameters. However, .NET stores the full signature: method name, return type, each parameter's exact type and name, any attributes on the method or parameters, etc.

    Given this source code:

    ILSpy will decompile a Debug build like this when PDBs are present

    like this for a Release build also with PDBs present

    and like this when PDBs do not exist

It's interesting to note that Java does not use separate files for its debugging information; debug information (if present) is stored inside the .class files. More on this here.
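Coming back to the stack trace point above, this is a small sketch of how the dependency on pdb files can be observed from code. It only uses StackTrace and StackFrame from System.Diagnostics; the class and method names are invented for the example. The idea is to run the same binary with and without its .pdb next to it and compare the output.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class StackFrameDemo
{
    // NoInlining keeps this method as a real frame on the call stack,
    // also in optimized builds.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static StackFrame CaptureCurrentFrame()
    {
        // 'true' asks the runtime to look up file and line information,
        // which it can only find if a matching .pdb is available.
        return new StackTrace(true).GetFrame(0);
    }

    public static void Main()
    {
        StackFrame frame = CaptureCurrentFrame();
        // The method name comes from the assembly metadata, so it is always there.
        Console.WriteLine("Method: " + frame.GetMethod().Name);
        // Without a pdb I'd expect no file name and a line number of 0.
        Console.WriteLine("File:   " + (frame.GetFileName() ?? "(no pdb found)"));
        Console.WriteLine("Line:   " + frame.GetFileLineNumber());
    }
}
```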

Sunday, 6 October 2013

Windows vs Linux: Processes and Threads

I'm both a Windows and Linux (Ubuntu, of course) user, and I'm pretty happy with both systems. I find strengths and weaknesses in both of them, and love to try to understand how similar and how different both systems are. It's important to note that I don't have any sort of "moral bias" against Commercial Software. I deeply appreciate Open Source and almost all software I run on my home PCs is Open Source, but I have absolutely nothing against selling software; on the contrary, provided that it's sold at a fair price, I fully support it (until the day capitalism is overthrown and we start to live in a perfect "communist with a human face" society...). People buy and sell hardware, so what's the problem with buying/selling software?

What really annoys me (so much that it made me move away from Linux for several years) are the typical open source bigots that spend the whole day bashing Microsoft (a company where employees earn pretty decent salaries and enjoy a huge level of respect from their employer) because of the inherent evilness in selling software, but don't give a shit about wearing clothes produced by people earning 2 dollars a month under enslavement conditions... It's obvious that if you're involved in an anarchist hacklab you should avoid Closed Software, but someone with an iPhone in the pocket of his Levi's trousers is not entitled to give moral lessons to Microsoft, Adobe or whoever... well, enough philosophy, let's get down to business :-)

There are a few Windows/Linux differences that I find interesting and I'd like to touch upon, I'll start off today by Processes and Threads:

For years I've had the impression that Threads in Linux play a rather less important role than in Windows. I can think of a handful of reasons for this:

  • It seems to be common knowledge that Process creation is cheaper in Linux; this discussion makes a pretty enriching read. In short, fork and even fork + exec seem cheaper than CreateProcess, and some aspects of Windows (like security) are far more complicated (which does not necessarily mean better) than in Linux, which adds overhead. Regarding fork, when a process A starts a second copy of itself it's just a simple fork not followed by an exec, so my understanding is that no hard disk access is involved, while a CreateProcess will always involve disk access.
  • Traditionally Linux threads have been far from optimal, though all this seems to have changed since the introduction of NPTL in Kernel 2.6
  • I think we could say that for the Linux Kernel a Thread and a Process are rather more similar than they are for the Windows Kernel. In Linux both Process creation and Thread creation make use of the clone syscall (invoked by fork for the former and by pthread_create for the latter), though the two calls are done differently, so that some data structures (memory space, processor state, stack, PID, open files, etc) are shared or not. This paragraph I found somewhere is good to note:

    Most of today's operating systems provide multi-threading support and linux is not different from these operating systems. Linux support threads as they provide concurrency or parallelism on multiple processor systems. Most of the operating systems like Microsoft Windows or Sun Solaris differentiate between a process and a thread i.e. they have an explicit support for threads which means different data structures are used in the kernel to represent a thread and a process.
    Linux implementation of threads is totally different as compared to the above-mentioned operating systems. Linux implements threads as a process that shares resources among themselves. Linux does not have a separate data structure to represent a thread. Each thread is represented with task_struct and the scheduling of these is the same as that of a process. It means the scheduler does not differentiate between a thread and a process.

    Please, with respect to the last sentence notice that the Windows Scheduler does not differentiate between threads and processes either, it just schedules threads, irrespective of their process. It's nicely confirmed here:

    Scheduling in Windows is at the thread granularity. The basic idea behind this approach is that processes don't run but only provide resources and a context in which their threads run. Coming back to your question, because scheduling decisions are made strictly on a thread basis, no consideration is given to what process the thread belongs to. In your example, if process A has 1 runnable thread and process B has 50 runnable threads, and all 51 threads are at the same priority, each thread would receive 1/51 of the CPU time—Windows wouldn't give 50 percent of the CPU to process A and 50 percent to process B. To understand the thread-scheduling algorithms, you must first understand the priority levels that Windows uses.

    This is another good read about Linux Threads and Processes

One consequence of these differences in importance is that getting thread figures is more straightforward in Windows.
Viewing the threads associated with a process is pretty simple in Windows: you don't even need the almighty ProcessExplorer and can just get by with Task Manager if you add the Threads column to it. This is not that out of the box in Linux. Ubuntu's System Monitor does not have a Threads column, and most command line tools do not show the number of threads by default, so you'll need to use some additional parameters:

With ps you can use the o option to specify the nlwp column, so you can end up with something like this:
ps axo pid,ppid,rss,vsz,nlwp,cmd
When using top in principle you can pass the -H parameter so that it'll show threads rather than processes, but I find the output confusing.
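As a side note, if you happen to be inside a .Net process, the same figure can be read programmatically on either OS. This is just a small sketch using System.Diagnostics.Process (the class name ThreadCountDemo is made up), not something the tools above rely on:

```csharp
using System;
using System.Diagnostics;

public static class ThreadCountDemo
{
    // Number of OS threads that the given process currently has.
    public static int CountThreads(Process process)
    {
        return process.Threads.Count;
    }

    public static void Main()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            Console.WriteLine("pid " + self.Id + " has " + CountThreads(self) + " threads");
        }
    }
}
```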

I think another clear example of the differences in "thread culture" between the Linux and Windows communities is Node.js. Its asynchronous programming model is great for many scenarios, but it's easy to get to a point where you really need two "tasks" running in parallel (2 CPU-bound tasks, like decrypting 2 separate streams). When I first read that the only solution for those cases is spawning a new process, the answer came as a shock, as I've got mainly a Windows background. But when you consider that, though it's now massively used on Windows, Node.js started with Linux as its main target, the answer is not that surprising.

Wednesday, 2 October 2013

Delay-Loaded Dlls

Something I love about using several similar (sometimes competing) technologies (C#-Java, Windows-Linux...) is that every time I learn something new in one of them I try to find out how it's done in its counterpart.

Some days ago I came across Delay-Loaded Dlls in native Windows applications. It's sort of a middle ground between the 2 normal dll loading techniques, that is:

  • The most common/obvious case: statically linked Dlls (strictly speaking this is load-time, or implicit, dynamic linking). When you compile your application, references to the dlls that it needs get added to the Import Table of your PE. As soon as your application is loaded, these dlls will get loaded into its address space, irrespective of whether they'll end up being used in that particular execution.
  • Dynamically Loaded Dlls. In this case the Dll is not loaded when the application starts, but you decide dynamically not only when, but also what, to load (you can decide to load one or another dll based on runtime conditions). This is all done with the LoadLibrary and GetProcAddress Win32 functions.
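To keep to the language used elsewhere in this blog, the second (fully dynamic) case can be sketched from C# through P/Invoke. This is only an illustration under my own assumptions: it is Windows only, it loads kernel32.dll (which is already loaded in any process, so it's a safe guinea pig) and it looks up GetTickCount by name. The class and delegate names are made up.

```csharp
using System;
using System.Runtime.InteropServices;

public static class DynamicLoadDemo
{
    [DllImport("kernel32", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr LoadLibrary(string fileName);

    [DllImport("kernel32", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    private static extern IntPtr GetProcAddress(IntPtr module, string procName);

    [DllImport("kernel32", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FreeLibrary(IntPtr module);

    // Managed signature matching the native "DWORD GetTickCount(void)".
    [UnmanagedFunctionPointer(CallingConvention.Winapi)]
    private delegate uint GetTickCountDelegate();

    public static uint CallGetTickCount()
    {
        // Both the dll name and the function name are plain strings,
        // so they could perfectly be decided at runtime.
        IntPtr module = LoadLibrary("kernel32.dll");
        if (module == IntPtr.Zero)
            throw new InvalidOperationException("LoadLibrary failed: " + Marshal.GetLastWin32Error());
        try
        {
            IntPtr address = GetProcAddress(module, "GetTickCount");
            if (address == IntPtr.Zero)
                throw new InvalidOperationException("GetProcAddress failed: " + Marshal.GetLastWin32Error());

            var getTickCount = (GetTickCountDelegate)Marshal.GetDelegateForFunctionPointer(
                address, typeof(GetTickCountDelegate));
            return getTickCount();
        }
        finally
        {
            FreeLibrary(module); // only drops the reference taken by LoadLibrary above
        }
    }

    public static void Main()
    {
        // Windows only: on other systems the kernel32 P/Invokes cannot be resolved.
        Console.WriteLine("Milliseconds since boot: " + CallGetTickCount());
    }
}
```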

As I've said, Delay-loaded Dlls are something in between, supported by our cute linker. You have to know at build time the name of the Dll that you'll be delay-loading, and it'll be recorded in your PE (in a delay-import table of its own rather than in the regular Import Table), but it won't be loaded until one of its functions is used for the first time. This means that it's like a transparent lazy version of the first case described above. Indeed, this is very similar to how .Net applications load assemblies "statically" (I mean, not using Assembly.Load...). The Assembly needs to be present at compile time and it'll be referenced from the metadata, but it won't be loaded until the first time that one of its functions is close to being executed (it's better explained here).

Linux can load Shared Objects (SO's) also in a purely static manner (statically linked) or in a purely dynamic fashion (using dlopen rather than LoadLibrary and dlsym rather than GetProcAddress)

And now the question is, how do we delay-load Shared Objects in the Linux world?
Well, to my surprise, based on this question/answer it can't be done!

You can read more about the loading of Shared Libraries here

Sunday, 29 September 2013

Windows Services, WCF Named Pipes and Windows Sessions

In the last few months, while working on a project at work, I've come across a good number of pretty interesting things (interesting and challenging in Production projects tends to involve stress and hair loss... so well, let's say that I tend to wear a cap or a wool cap more and more often :-). One of these items has to do with the odd relationship between Windows Services and WCF Named Pipes. As usual, I'm not revealing anything new here, just putting together the different pieces of information given by others that helped us solve the problem.

So, let's say that you have a Windows Service that wants to communicate with other processes (let's call them Activities) running on the same machine (in our case the Service was also launching these processes, but that's not relevant to this specific problem). Well, this kind of IPC scenario seems a perfect fit for named pipes. Our processes (and also the Windows Service) are .Net applications, so we should be able to set up this solution pretty easily by using WCF with the NetNamedPipeBinding.

The idea seems pretty straightforward: we want the Windows Service to establish a communication with the Activity processes, so these processes will work as servers and the Windows Service as client. This means that each of our Activity processes will contain a self-hosted WCF server (System.ServiceModel.ServiceHost) with an Endpoint using a NetNamedPipeBinding, basically:

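ServiceHost host = new ServiceHost(typeof(MyActivityImplementation));
NetNamedPipeBinding binding = new NetNamedPipeBinding();
// with no base address configured, the endpoint address must be an absolute net.pipe URI
host.AddServiceEndpoint(typeof(IActivity), binding, "net.pipe://localhost/NamedPipeName");
host.Open();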

The problem with the above is that your Windows Service won't be able to open the connection to the Named Pipe. There are several posts and StackOverflow entries discussing this issue:

  • http://stackoverflow.com/questions/12072617/client-on-non-admin-user-cant-communicate-using-net-pipe-with-services?lq=1
  • http://msdn.microsoft.com/en-us/library/windows/desktop/ms717797%28v=vs.85%29.aspx
  • http://support.microsoft.com/?kbid=100843
  • http://blogs.msdn.com/b/patricka/archive/2010/05/13/if-i-m-an-administrator-why-do-i-get-access-denied.aspx
  • http://stackoverflow.com/questions/4959417/how-to-cross-a-session-boundary-using-a-wcf-named-pipe-binding?lq=1

Really valuable pieces of information, and all of them pointing to the same solution: using a Callback Contract. This Callback Contract thing is really ingenious and comes as a neat surprise when you're so used to the Web (Services/APIs) world and its connectionless nature, with requests always started by the Client. With the Callback Contract we're able to reverse the process, so that the Server can call the Client through a callback. So, rather than A always doing requests to B, we can have B do an initial request to A; that request carries a callback object that later on A can use for doing requests to B. This way we've created a bidirectional channel, where A can send requests to B and B can send requests to A. We'll have two contracts then, one for the methods that B will invoke in A, and another one for the methods that A will invoke in B. Obviously WCF makes Callbacks available only to those bindings that can really support bidirectional communication, such as NetTcpBinding and NetNamedPipeBinding (WSDualHttpBinding gets there too, by pairing two HTTP channels).

This was not intended to be a post about WCF and Callback Contracts, so if you want to see some code just google for it.
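That said, for quick reference, this is roughly what the pattern looks like. It's a minimal sketch of my own, not our production code: the contract names (IActivityHub, IActivityCallback), the address and the fact that both ends live in the same console process are all assumptions made just to keep the example short. It needs a reference to System.ServiceModel on the full .Net Framework. The service side is the one hosting the endpoint, and the Activity side connects to it and hands over its callback.

```csharp
using System;
using System.ServiceModel;

// Methods that the Activity process (the WCF client) invokes on the Service.
[ServiceContract(CallbackContract = typeof(IActivityCallback))]
public interface IActivityHub
{
    [OperationContract]
    void Register(string activityName);
}

// Methods that the Service invokes back on the Activity process.
public interface IActivityCallback
{
    // One-way, so the Service never blocks waiting for a reply from the client.
    [OperationContract(IsOneWay = true)]
    void DoWork(string payload);
}

// This class would live in the Windows Service.
public class ActivityHub : IActivityHub
{
    public void Register(string activityName)
    {
        // Channel back to the caller; it could be stored and used later on.
        IActivityCallback callback =
            OperationContext.Current.GetCallbackChannel<IActivityCallback>();
        callback.DoWork("hello " + activityName);
    }
}

// This class would live in the Activity process.
public class ActivityCallback : IActivityCallback
{
    public void DoWork(string payload)
    {
        Console.WriteLine("Service asked me to work on: " + payload);
    }
}

public static class DuplexDemo
{
    public static void Main()
    {
        const string address = "net.pipe://localhost/ActivityHub"; // hypothetical address

        using (var host = new ServiceHost(typeof(ActivityHub)))
        {
            host.AddServiceEndpoint(typeof(IActivityHub), new NetNamedPipeBinding(), address);
            host.Open();

            // Client side: the InstanceContext wraps the object receiving the callbacks.
            var factory = new DuplexChannelFactory<IActivityHub>(
                new InstanceContext(new ActivityCallback()),
                new NetNamedPipeBinding(),
                new EndpointAddress(address));
            IActivityHub hub = factory.CreateChannel();
            hub.Register("activity-1");

            Console.ReadLine(); // keep both ends alive long enough to see the callback
            factory.Close();
        }
    }
}
```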
Once we've found a workaround that allows our Windows Service and normal processes to happily talk to each other, the next point should be to understand why the initial approach was failing. In the links above there's some confusing information concerning this, but the paragraph below seems like the correct explanation to me:

If you are running on Windows Vista or later, a WCF net.pipe service will only be accessible to processes running in the same logon session (e.g. within the same interactive user's session) unless the process hosting the WCF service is running with the elevated privilege SeCreateGlobalPrivilege. Windows Services run in their own logon session, and have the privilege SeCreateGlobalPrivilege, so self-hosted and IIS-hosted WCF net.pipe services are visible to processes in other logon sessions on the same machine. This is nothing to do with "named pipe hardening". It is all about the separation of kernel object namespaces (Global and Local) introduced in Vista... and it is the change this brought about for security on shared memory sections which causes the issue, not pipe security itself. Named pipes themselves are visible across sessions; but the shared memory used by NetNamedPipeBinding to publish the current pipe name is in Local namespace, not visible to other sessions, if the server is running as normal user without SeCreateGlobalPrivilege.

Well, admittedly, those references to separation of kernel object namespaces and shared memory sections left me in shock, I had not a clue of what they were talking about.

Let's start off by understanding Kernel Objects and Kernel Object namespaces. When I think of Windows Kernel Objects I mainly think in terms of handles and the per-process handle table, but we have to notice that many of these Kernel Objects have a name, and this name can be used to get a handle to them. Because of Remote Desktop Services (bear in mind that this is not just for Remote Sessions, the switch user functionality is also based on RDS), the namespace for these named objects was split into Global and Local in order to avoid clashes.

OK, so far so good, but how does that relate to Named Pipes and Shared Memory Sections? This excellent article explains most of it. I'll summarize it below:

The uri-style name used by WCF for the endpoint addresses of NetNamedPipeBindings has little to do with the real name that the OS will assign to the Named Pipe object (which in this case will be a GUID). Obviously WCF has to go down to the Win32 level for all these pipe communications, so how does the WCF client machinery know, based on the .Net pipe name, the name of the OS pipe that it has to connect to (that GUID I've just mentioned)?

The server publishes this GUID in a Named Shared Object (a Memory Mapped File Object). The name of this Named File Mapping Object is obtained with a simple algorithm from the NetNamedPipeBinding endpoint address, so the client part uses this same algorithm to generate the name of the File Mapping Object, open it and read the GUID stored there. And it's here where the problem lies. A normal process running in a Windows Session other than 0 (and usually we'll want to have our normal processes running in the Windows Session of the logged-on user that has started them, rather than in session 0) can't create a Named File Mapping Object in the Global namespace, so it'll create it in its Local namespace (corresponding to its Windows Session). Later on, when the Windows Service (which always runs in Session 0) tries to get access to that File Mapping, it'll try to open it from the Global namespace rather than from the namespace local to that process. This means that it won't be able to find the object and the whole thing will fail.

That's why we have to sort of reverse our architecture using Callback Contracts. The Windows Service will create a File Mapping Object in the Global namespace (contrary to a normal process, a Windows Service is allowed to do this) and then the Client process will open that File Mapping in the Global namespace (it can't create a file mapping there, but it can open one). Now the Client process has the GUID for the name of the pipe and can connect to it. Once the connection is established, the Windows Service can send requests to the Client process through the Callback Contract object.

My statements above are backed by that MSDN article that I've previously linked about Kernel Object Namespaces (and also by some painful trial and error). This paragraph contains the final information that I needed to put together the whole puzzle:

The creation of a file-mapping object in the global namespace, by using CreateFileMapping, from a session other than session zero is a privileged operation. Because of this, an application running in an arbitrary Remote Desktop Session Host (RD Session Host) server session must have SeCreateGlobalPrivilege enabled in order to create a file-mapping object in the global namespace successfully. The privilege check is limited to the creation of file-mapping objects, and does not apply to opening existing ones. For example, if a service or the system creates a file-mapping object, any process running in any session can access that file-mapping object provided that the user has the necessary access.
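The create/open asymmetry described in that paragraph can be tried out from .Net with System.IO.MemoryMappedFiles, which wraps CreateFileMapping/OpenFileMapping. This is just a sketch under my own assumptions: the mapping names are invented, it only makes sense on Windows, and I'm assuming the access-denied error surfaces as an UnauthorizedAccessException when the privilege is missing.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class NamespaceDemo
{
    // Tries to create a small named file mapping and reports what happened.
    public static string TryCreate(string mapName)
    {
        try
        {
            using (MemoryMappedFile.CreateNew(mapName, 64))
            {
                return "created";
            }
        }
        catch (UnauthorizedAccessException)
        {
            // What I'd expect for "Global\..." names when the process
            // lacks SeCreateGlobalPrivilege.
            return "access denied";
        }
    }

    // Opening an existing mapping needs no privilege, just the right name
    // (namespace prefix included) and access rights.
    public static bool CanOpen(string mapName)
    {
        try
        {
            using (MemoryMappedFile.OpenExisting(mapName))
            {
                return true;
            }
        }
        catch (FileNotFoundException)
        {
            return false;
        }
    }

    public static void Main()
    {
        const string localName = @"Local\DemoMap";   // hypothetical names
        const string globalName = @"Global\DemoMap";

        using (MemoryMappedFile.CreateNew(localName, 64))
        {
            // Found through the session-local namespace...
            Console.WriteLine("Open via Local:  " + CanOpen(localName));
            // ...but not through the Global one, which is where a service would look.
            Console.WriteLine("Open via Global: " + CanOpen(globalName));
        }

        Console.WriteLine("Create in Global: " + TryCreate(globalName));
    }
}
```

Running it as a normal user in an interactive session and then from an elevated prompt (or from a service) should show the difference in the last line.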

Friday, 27 September 2013

Street Crap

For a few years now I've been very much into different forms of Street Art. Though I'm mainly interested in pieces with a social/political meaning, the truth is that I can very much appreciate Street Art works just for their aesthetic appeal, even when no message is intended (it's astonishing how much life a few stickers and a few marker strokes can bring to a plainly dead city wall). That said, there are places that are beautiful enough on their own and do not need any addition, and such an addition would be vandalism rather than art. Furthermore, if the addition is plainly bland and tasteless, it ends up being purely grotesque.

A few days ago I found one of the most flagrant displays of pure stupidity and disrespect of that kind. Some idiotic scumbag decided to leave a shitty graffiti on one of the cute old buildings on the imposing Toompea Hill in the delightful city of Tallinn.

I really felt like rewarding the author/s of such an exhibition of mental diarrhoea by shoving up his ass all the sprays used in such a felony.

Friday, 6 September 2013

Node Scope, Html Classes and more

Time for one of my typical listings of interesting notes/tricks/whatever that I want to have easily accessible for further reference.

  • If you've ever been dubious about how scope between different modules works in Node.js, you should read this brilliant explanation, it makes it crystal clear

    Unlike the browser, were variables are by default assigned to the global space (i.e. window) in Node variables are scoped to the module (the file) unless you explicitly assign them to module.exports. In fact, when you run "node myfile.js" or "require('somefile.js')" the code in your file is wrapped as follow: (function (exports, require, module, __filename, __dirname) { // your code is here });

  • I'd never been aware of how incorrectly I was using the term "CSS class" until I read this eye-opener. So I think we should speak like this: we assign html classes to html elements, and style is applied to those elements by means of CSS rules consisting of a Class Selector and a declaration block. I fully agree with the author that using the correct naming is not just a matter of being pedantic:

    This isn't just pedanticism. By using the phrases "CSS class(es)" or "CSS class name(s)" you're not only being imprecise (or just plain wrong), you're tying the presentational context/framing of "CSS" to class names which implies and even encourages the bad practice of using presentational class names.

  • One of my first thoughts after learning about CSS Animations was how useful it would be to create CSS rules from JavaScript, you can read a good explanation here.
  • Even if you've never written a jQuery plugin you'll probably have wondered what the jQuery.fn thing is. Well, it's as simple as this: jQuery.fn is an alias for jQuery.prototype (yes, in the end jQuery is just a function, so it has a prototype property). This explanation is excellent.
  • I'll start by saying that I'm not the least interested in languages that compile to JavaScript, be it Dart, CoffeeScript, TypeScript... JavaScript is a beautiful language and I can't understand people not wanting to use it for normal Web Development. Nevertheless, this asm.js thing is quite a different matter: it comes with the promise of allowing us to run in our browsers things that were not designed for the web, and at a decent speed. You can read this beautiful explanation by John Resig or this quite detailed one (but admittedly quite a bit harder to digest).
  • While doing a debugging session in Visual Studio and having a look at the Threads window I noticed that one of the threads was running with Priority set to Highest. I was not creating any thread explicitly, there was just a bunch of worker threads being created by a WCF Service Host, so what could this be? Well, pretty easy: the .Net runtime will always create at minimum 3 threads for an application, the Main Thread, a Debugger Thread (a helper thread to work along with the debugger) and the Finalizer thread (and depending on the .Net version you can also have a separate GC thread). So it struck me that it had to be the Finalizer Thread that was running at Highest priority. This question in StackOverflow confirms it.

Sunday, 1 September 2013

Some Notes on Concurrency

My last assignment at work compelled me to go through a fast and pleasant upgrade of my very basic and rusty knowledge of concurrent programming, so there are some thoughts that I'd like to write about, and even though they're rather unconnected, I'm not going to refrain from throwing them into a messy post...

A basic understanding of Concurrency (threads, locks...) should be fundamental for any software developer, and now that multi-core processors are everywhere, such knowledge should be even more important. However, as more and more of our time as developers is spent writing code in a thread-unaware language like JavaScript, I think we are less and less exposed to concurrency constructs. In a way we could say that this is a time of concurrency-ready hardware and concurrency-unready developers...

A multi-core reality

The omnipresence of multi-core processors not only increases the number of situations where using multiple threads would be beneficial (think of CPU bound scenarios), but also adds new elements to account for:
  • Spinlocks. I was totally unaware of this synchronization construct until I realized it had been added to .Net 4.0.

    In software engineering, a spinlock is a lock which causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking if the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of busy waiting.

    What should come to mind after reading that is that a thread doing a busy-wait to avoid a context switch only makes sense on a multi-core processor (or a multi-processor machine). I'll give an example. Let's say Thread1 acquires a Spinlock and gets scheduled out before completing its operation. Then Thread2 tries to acquire the Spinlock and gets stuck there doing a busy wait. As there's only one core, and this busy wait is using it, Thread1 cannot be scheduled to complete its action and release the Spinlock until the quantum of Thread2 finishes and a context switch happens. So in this case a spinlock is completely counterproductive. You can confirm here that what I'm saying is correct.

  • You should be aware that the same instruction could run at exactly the same moment on 2 different cores. The other day I found a bug in some code because, in order to generate a pseudo-random string (that would be used for creating a named pipe), the current date to the millisecond level was being used (something like yyyyMMddHHmmssFFF). Well, on occasion, under heavy load testing, a thread would fail to create the named pipe because another thread had already created it. This means that the 2 threads had run the instruction MyDateToStringFormatter(DateTime.Now); on their respective cores in just the same millisecond!!!
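Regarding the spinlock bullet, this is a minimal sketch of how the System.Threading.SpinLock struct is meant to be used, following the Enter/Exit pattern with a lockTaken flag. The counter and the class name are just made up for the example; the point is that the critical section is tiny, which is the only case where spinning instead of blocking pays off.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SpinLockDemo
{
    // SpinLock is a struct: keep it in a non-readonly field so that
    // every thread works on the same instance and never on a copy.
    private static SpinLock _lock = new SpinLock();
    private static int _counter;

    public static int IncrementInParallel(int iterations)
    {
        _counter = 0;
        Parallel.For(0, iterations, i =>
        {
            bool lockTaken = false;
            try
            {
                _lock.Enter(ref lockTaken); // busy-waits instead of blocking the thread
                _counter++;                 // keep the protected work as short as possible
            }
            finally
            {
                if (lockTaken) _lock.Exit();
            }
        });
        return _counter;
    }

    public static void Main()
    {
        Console.WriteLine(IncrementInParallel(100000)); // 100000
    }
}
```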

Atomicity, Thread Safety, Concurrent Collections

Another essential point we need to take into account when working on multithreaded applications is the atomicity of the operations. For example, it should be clear that in a multithreaded environment with several threads reading/writing from/to a normal Dictionary, this operation is unsafe:

if (myDictionary.ContainsKey("myKey"))
    myVar = myDictionary["myKey"];

cause the first instruction could return true, but before we run the second instruction another thread could remove that key (and obviously in multi-core systems chances increase).

Fortunately .Net 4 introduced a whole set of thread-safe collections, the Concurrent Collections, which means that we can easily fix that problematic code by using a ConcurrentDictionary this way:

myDictionary.TryGetValue("myKey", out myVar);
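Put into a complete (if trivial) program, the idea looks like the sketch below. The helper name GetOrDefault and the sample key/values are mine; the relevant bit is that the check and the read happen in one single TryGetValue call, so there's no window for another thread to remove the key in between.

```csharp
using System;
using System.Collections.Concurrent;

public static class LookupDemo
{
    // One atomic lookup: either we get the value or we get the fallback,
    // with no separate "does it exist?" step.
    public static int GetOrDefault(ConcurrentDictionary<string, int> dictionary,
                                   string key, int fallback)
    {
        int value;
        return dictionary.TryGetValue(key, out value) ? value : fallback;
    }

    public static void Main()
    {
        var myDictionary = new ConcurrentDictionary<string, int>();
        myDictionary.TryAdd("myKey", 42);
        Console.WriteLine(GetOrDefault(myDictionary, "myKey", -1)); // 42

        int removed;
        myDictionary.TryRemove("myKey", out removed);
        Console.WriteLine(GetOrDefault(myDictionary, "myKey", -1)); // -1
    }
}
```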

But there are cases that are not so obvious. For example, is there any problem if different threads running in parallel add elements to a normal List, with an apparently innocent myList.Add(item);? Well, that Add call is far from atomic. Adding an element to a list involves checking the size of the list and resizing it if necessary, so thread1 could be resizing the list, and before it has time to set its new size thread2 could run its own Add and start a new resize... It's a common question in Stackoverflow.

With this in mind, you set out to use a ConcurrentList but get slapped by the fact that such a collection does not exist. Well, pondering over it one realizes that such a collection would make little sense [good discussion here]. If several threads can be reading/writing from/to the List, you won't want to access its elements by index, as what you inserted as element 3 could now be at index 5 due to other threads' work. So maybe what you really need is a ConcurrentQueue or a ConcurrentStack, or maybe you just want a sort of container where you can insert/remove items in a thread-safe fashion and apply typical Linq to Objects operations... In this case, a quick look at the collections available in the System.Collections.Concurrent namespace gives us the solution, ConcurrentBag. The fact that it's optimized for situations where the same thread is both producing and consuming items from the collection should not confuse you; you can use it for other concurrent scenarios without a problem (you can read more here and here).

As I already noted here, enumerating a Concurrent Collection is safe. I mean, if another thread modifies the collection while your thread is iterating it, you won't get an InvalidOperationException on your next MoveNext. But something that has really caught my attention is that while the Enumerator returned from a ConcurrentDictionary enumerates over the "live Dictionary":

The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called.

the Enumerator returned for a ConcurrentBag iterates over a snapshot of the Bag:

The enumeration represents a moment-in-time snapshot of the contents of the bag. It does not reflect any updates to the collection after GetEnumerator was called. The enumerator is safe to use concurrently with reads from and writes to the bag.
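Both behaviours can be observed from a single thread by modifying the collection inside the loop that iterates it. This is just a minimal sketch (EnumerationDemo and its method names are hypothetical): the bag loop only visits the items that existed when the enumeration started, and the dictionary loop tolerates updates without throwing.

```csharp
using System;
using System.Collections.Concurrent;

public static class EnumerationDemo
{
    // Iterates a bag while adding to it; returns how many items the loop visited.
    public static int VisitBagWhileAdding()
    {
        var bag = new ConcurrentBag<int> { 1, 2, 3 };
        int visited = 0;
        foreach (int item in bag)
        {
            bag.Add(item + 100); // not seen by this loop: it walks a snapshot
            visited++;
        }
        return visited;
    }

    // Iterates a dictionary while updating its entries; returns how many entries were visited.
    public static int VisitDictionaryWhileUpdating()
    {
        var dict = new ConcurrentDictionary<int, string>();
        for (int i = 0; i < 3; i++) dict[i] = "original";

        int visited = 0;
        foreach (var pair in dict)
        {
            dict[pair.Key] = "updated"; // allowed: no InvalidOperationException on the next MoveNext
            visited++;
        }
        return visited;
    }

    public static void Main()
    {
        Console.WriteLine(VisitBagWhileAdding());          // 3
        Console.WriteLine(VisitDictionaryWhileUpdating()); // 3
    }
}
```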

Regarding atomicity, I'll add that it does not just refer to whether we have 1 or 2 C#/Java instructions; it comes down to the machine-level instructions. That's why we have the Interlocked class, and furthermore, that's why this class features a Read method (needed for safely reading 64-bit values on 32-bit systems; otherwise thread1 could read the first 32 bits, and thread2 could overwrite the second block of 32 bits before thread1 had read it, so thread1 would obtain a "mixed" value).
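As a minimal sketch of the Interlocked class at work (InterlockedDemo and IncrementInParallel are hypothetical names), the code below increments a 64-bit counter from parallel workers and then reads it back atomically. A plain counter++ in the same loop could lose updates.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedDemo
{
    private static long counter;

    // Increments the shared counter 'total' times from parallel workers.
    public static long IncrementInParallel(int total)
    {
        Interlocked.Exchange(ref counter, 0);

        Parallel.For(0, total, i =>
        {
            // counter++ would be a read-modify-write sequence; this is a single atomic step.
            Interlocked.Increment(ref counter);
        });

        // Atomic 64-bit read, which is what matters on 32-bit systems.
        return Interlocked.Read(ref counter);
    }

    public static void Main()
    {
        Console.WriteLine(IncrementInParallel(1000000)); // 1000000
    }
}
```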

Friday, 23 August 2013

Odd use of Explicit Interface Implementation

I was trying to write a myConcurrentDictionary.Remove line today, but Visual Studio would prevent me from doing so, saying that ConcurrentDictionary<K,V> lacks that method. Well, ConcurrentDictionary implements IDictionary, so it has to feature a Remove method! Looking into MSDN we can see that Remove comes as an Explicit Interface Implementation, so in order to use it you'll need a cast: ((IDictionary)myConcurrentDictionary).Remove(...);
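A minimal sketch of that cast follows, using the generic IDictionary<TKey, TValue> interface rather than the non-generic one. For comparison it also uses TryRemove, which is the removal method that ConcurrentDictionary does expose publicly. RemoveDemo and RemoveBothWays are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class RemoveDemo
{
    // Removes one key through the interface and another through TryRemove; returns the final count.
    public static int RemoveBothWays()
    {
        var dict = new ConcurrentDictionary<string, int>();
        dict["a"] = 1;
        dict["b"] = 2;

        // dict.Remove("a"); // does not compile: Remove is only implemented explicitly

        // Explicit interface members are reachable through the interface type.
        ((IDictionary<string, int>)dict).Remove("a");

        // The public API offers TryRemove, which also hands back the removed value.
        int removed;
        dict.TryRemove("b", out removed);

        return dict.Count;
    }

    public static void Main()
    {
        Console.WriteLine(RemoveBothWays()); // 0
    }
}
```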

OK, my understanding of Explicit Interface Implementations is that you use them when your class implements several interfaces with colliding methods and you want to give a different implementation of such a method for each interface. ConcurrentDictionary implements 3 different interfaces (IDictionary<TKey, TValue>, ICollection<KeyValuePair<TKey, TValue>>, IDictionary) that sport a Remove method. The signatures are slightly different and somehow colliding (TKey, Object...) so this Explicit implementation makes things clear, but why don't they add a non-Explicit Remove(TKey) method? That would be the most commonly used. It seems as if they were preventing the use of Remove by sort of hiding it.

Well, some searching confirms that impression. Here we can read:

Explicit interface implementations can be used to disambiguate class and interface methods that would otherwise conflict. Explicit interfaces can also be used to hide the details of an interface that the class developer considers private.

And then we find this and this excellent discussion in StackOverflow, with answers from Jon Skeet:

It allows you to implement part of an interface in a "discouraging" way - for example, ReadOnlyCollection implements IList, but "discourages" the mutating calls using explicit interface implementation. This will discourage callers who know about an object by its concrete type from calling inappropriate methods. This smells somewhat of interfaces being too broad, or inappropriately implemented - why would you implement an interface if you couldn't fulfil all its contracts? - but in a pragmatic sense, it can be useful.

and Eric Lippert:

"Discouragement" also allows you to effectively "rename" the interface methods. For example, you might have class C : IDisposable { void IDisposable.Dispose() { this.Close(); } public void Close() { ... } } -- that way you get a public Close method, you don't see the potentially confusing Dispose method, and yet you can still use the object in a context where IDisposable is expected, like a "using" statement.

The "renaming" thing for Dispose/Close seems a bit unnecessary to me, and as for the "hide to discourage" argument, I tend to see its need as denoting a wrongly designed interface (an interface that can't properly fulfill its contract).
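For reference, this is roughly what the Dispose/Close "renaming" from the quote looks like when written out in full. It is a minimal sketch: the Connection class and its IsClosed property are hypothetical, added only so the effect can be observed.

```csharp
using System;

public class Connection : IDisposable
{
    public bool IsClosed { get; private set; }

    // The member that callers see on the concrete type.
    public void Close()
    {
        IsClosed = true;
    }

    // Hidden from the concrete type, but still there for "using" and IDisposable consumers.
    void IDisposable.Dispose()
    {
        this.Close();
    }
}

public static class ExplicitDisposeDemo
{
    // Runs a using block and reports whether it ended up calling Close.
    public static bool UsingClosesConnection()
    {
        var connection = new Connection();
        using (connection)
        {
            // connection.Dispose(); // would not compile: Dispose is not visible here
        }
        return connection.IsClosed;
    }

    public static void Main()
    {
        Console.WriteLine(UsingClosesConnection()); // True
    }
}
```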

Thursday, 22 August 2013

Delegates Caching

I'm not quite sure how, but some days ago I came across this interesting question in StackOverflow, answered by no less than 2 of the C# Gods: Eric Lippert and Jon Skeet.

The method that backs the delegate for a given lambda is always the same. The method that backs the delegate for "the same" lambda that appears lexically twice is permitted to be the same, but in practice is not the same in our implementation. The delegate instance that is created for a given lambda might or might not always be the same, depending on how smart the compiler is about caching it.

  • A lambda expression which doesn't capture any variables is cached statically
  • A lambda expression which only captures "this" could be captured on a per-instance basis, but isn't
  • A lambda expression which captures a local variable can't be cached

So the C# compiler is smart enough to cache Delegate Instances when possible to avoid creating the same instance over and over. This comes to me as a really interesting revelation, as on occasion I've felt slightly uncomfortable when writing code involving many lambdas, as it seemed to me like an "object explosion".

I've done some tests to verify the above claims.

The delegate returned below is not capturing anything (it's not a closure), so we can see caching at work!

public static Func<string, string> CreateFormatter()
{
return st => st.ToUpper();
} 
 ...
var func1 = CreateFormatter();
var func2 = CreateFormatter();
Console.WriteLine("simple delegate being cached? " + Object.ReferenceEquals(func1, func2)); //true

If we take a look at the generated IL, we can see the cryptically named field "CS$<>9__CachedAnonymousMethodDelegate1" used to cache the delegate.

On the contrary, if the code returns a closure, it should be obvious that caching can't take place, as we need different instances, each one with access to the corresponding captured values (the trapped values are fields in an instance of a support class that the compiler creates under the covers and that is pointed to from the delegate's Target property).

public static Func<string, string> CreateFormatterClosure(string s)
{
 return st => s + (st.ToUpper()) + s;
}
func1 = CreateFormatterClosure("x");
func2 = CreateFormatterClosure("x");
Console.WriteLine("closure being cached? " + Object.ReferenceEquals(func1, func2)); //false

Notice that I'm using Object.ReferenceEquals rather than == to check for object identity because the == operator for delegates is overloaded to do a value comparison. From msdn:

Two delegates of the same type with the same targets, methods, and invocation lists are considered equal.
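The tests above don't cover the middle case from the list (a lambda that only captures "this"), nor the difference between identity and the overloaded ==. Here is a minimal self-contained sketch covering both. Formatter and its methods are hypothetical names, and the results in the comments are what I'd expect from the current Microsoft compilers, since caching is an implementation detail rather than something the language guarantees.

```csharp
using System;

public class Formatter
{
    private readonly string prefix = ">";

    // Captures nothing.
    public static Func<string, string> Plain()
    {
        return st => st.ToUpper();
    }

    // Captures only "this" (through the prefix field).
    public Func<string, string> WithThis()
    {
        return st => prefix + st;
    }

    // Captures a local (the parameter), so it is a real closure.
    public static Func<string, string> WithLocal(string s)
    {
        return st => s + st;
    }
}

public static class CachingDemo
{
    public static void Main()
    {
        var f = new Formatter();

        // Identity: only the non-capturing lambda is cached.
        Console.WriteLine(object.ReferenceEquals(Formatter.Plain(), Formatter.Plain()));             // True
        Console.WriteLine(object.ReferenceEquals(f.WithThis(), f.WithThis()));                       // False
        Console.WriteLine(object.ReferenceEquals(Formatter.WithLocal("x"), Formatter.WithLocal("x"))); // False

        // Value comparison: same target and same method are enough for ==.
        Console.WriteLine(f.WithThis() == f.WithThis());                       // True
        Console.WriteLine(Formatter.WithLocal("x") == Formatter.WithLocal("x")); // False, different closure objects
    }
}
```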

If we try similar code in JavaScript, we'll see that there's no hidden compiler trick and no function caching is done, so each time you create a new function, a new function object is created.

function createFunction(){
 return function(){};
}
console.log(createFunction() == createFunction());//false

(function(){}) == (function(){}); //false

To avoid this, I remember having seen in some library code something like var emptyFunc = function(){}; in order to reuse that unique function wherever a "do nothing" function was needed.

Summing up, the C# compiler does a really great job again (as it does with Closures, Iterators (yield), dynamic, async...). It's no wonder that it's taking the Roslyn guys longer than expected to rewrite the native compiler in C#.

Tuesday, 13 August 2013

Modify While Iterating II

Last month I wrote about the risks of modifying a Collection while iterating it. Today I've come across a couple of things that complement that entry, so I'll post them here.

Recent versions of .Net Framework brought along 2 very important additions in the land of Collections: Read-Only Collections and Concurrent Collections. Pretty fundamental stuff, but admittedly I hadn't made any use of them until very recently. I had some very wrong assumptions as to how these collections behave regarding the modify while iterating thing, so let's take a look:

Read-Only Collections

I guess due to some incomplete, simultaneous reading of several different pieces of information, I had the impression that when you create a Read-Only Collection from a normal collection you were taking a snapshot of that collection, which as such would be independent from the original. Nothing could be further from the truth. As clearly stated in the documentation:

A collection that is read-only is simply a collection with a wrapper that prevents modifying the collection; therefore, if changes are made to the underlying collection, the read-only collection reflects those changes. See Collection<T> for a modifiable version of this class.

I think it's quite important to have this pretty clear, as a common scenario is: your class exposes a Read-Only view of one of its internal collections, and while some consumer threads are iterating over that view your class modifies the underlying internal collection. You'll get the classical InvalidOperationException then. I've written some code to confirm it. You can also just disassemble the ReadOnlyCollection.GetEnumerator method and you will find this:

public IEnumerator GetEnumerator()
{
 return this.list.GetEnumerator();
}

So the normal Enumerator of the internal collection is being used and this enumerator will do the "have you been modified? check" based on the _version field of the internal collection...
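Both points (the view is live, and iterating it is subject to the version check of the inner list) can be confirmed from a single thread. This is a minimal sketch and not the test code mentioned above; ReadOnlyViewDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyViewDemo
{
    // The read-only wrapper is a live view: it reflects items added to the inner list afterwards.
    public static int CountSeenThroughView()
    {
        var inner = new List<string> { "a", "b" };
        ReadOnlyCollection<string> view = inner.AsReadOnly();
        inner.Add("c");
        return view.Count;
    }

    // Modifying the inner list while iterating the view trips the version check.
    public static bool ThrowsWhenInnerListChanges()
    {
        var inner = new List<string> { "a", "b" };
        ReadOnlyCollection<string> view = inner.AsReadOnly();
        try
        {
            foreach (string item in view)
            {
                inner.Add(item + "!"); // bumps the inner list's version
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountSeenThroughView());       // 3
        Console.WriteLine(ThrowsWhenInnerListChanges()); // True
    }
}
```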

Concurrent Collections

Well, for Concurrent Collections it's easy to deduce that if they allow for Adding/Removing/Updating in parallel, iterating at the same time should not be a problem. Somehow I thought I had read something different somewhere, so I did a quick test to verify that you can continue to iterate a collection that has been modified and no InvalidOperationException will happen.

You could also verify it by peeking for instance into the implementation of ConcurrentQueue, and seeing that it lacks any _version field.
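A quick test along those lines could look like the following minimal sketch, which enqueues new items from inside the loop that iterates the queue. QueueIterationDemo and EnqueueWhileIterating are hypothetical names; this is not the original test.

```csharp
using System;
using System.Collections.Concurrent;

public static class QueueIterationDemo
{
    // Enqueues from inside the loop that iterates the queue; returns the final item count.
    public static int EnqueueWhileIterating()
    {
        var queue = new ConcurrentQueue<int>();
        for (int i = 0; i < 3; i++) queue.Enqueue(i);

        foreach (int item in queue)
        {
            // No InvalidOperationException here; the loop only walks the 3 original items.
            queue.Enqueue(item + 10);
        }

        return queue.Count;
    }

    public static void Main()
    {
        Console.WriteLine(EnqueueWhileIterating()); // 6
    }
}
```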

Tuesday, 6 August 2013

Maniac

I don't feel much like writing a post now, but after watching this masterpiece I feel compelled to share it with anyone reading this blog. Maniac is one of the best horror films I've watched in a long while. It's extreme, extreme, extreme, utterly extreme... Though set in Los Angeles and starring North American actors, you could somehow associate it with the New French Extremity school (indeed the director is French).

The story is nothing new: a disturbed young man with a repressed sexuality (due to a childhood trauma owing to his mother's unrepressed sexuality) turns into a serial killer. Sure you can think of several films revolving around the same idea, but this one is spiced up by some brilliant elements, like being entirely shot from the Point of View of the murderer, the mannequins that give it an arty, "modern horror" feel, and especially the sheer brutality of some of its moments.

Really, this is an absolute must-see for anyone into horror films, but be warned that many people could find it too hard. Indeed, I for one would say there's more blood on screen than necessary, and the last sequence of the film was quite unnecessary (a gore feast that adds no value at all). But well, perfection is the biggest of horrors.... By the way, this is a remake of a film from the 80's, so I should probably give that one a try too.

Saturday, 3 August 2013

Multiple Inheritance in C#

Of the many features being added to Java 8, there's one that has really caught my eye, Default Interface Implementation aka Defender Methods (the other ones are really necessary stuff, but nothing out of the ordinary, as it's stuff that should have been in the language many years ago).

The main motivation for these Default methods is the same as for Extension Methods in C#: allowing you to add methods to an interface without breaking existing code. Let's think of C# in the pre-Linq times. You had an IEnumerable interface, and felt the need to add to it methods like "Contains, All, Any, Skip, Take...". Well, if you just add those methods, all your existing classes implementing IEnumerable would need to be updated to add an implementation of those methods there... well, quite a hard job. The solution to this in C# was Extension Methods. In Java they were about to mimic this same approach (and indeed you'll still find old references to "Java Extension Methods") but in the end they opted for a much more powerful one, Default Methods.

public interface SimpleInterface {
    public void doSomeWork();
    //A default method in the interface created using 'default' keyword
    default public void doSomeOtherWork() {
        System.out.println("DoSomeOtherWork implementation in the interface");
    }
}

You've always been able to implement multiple interfaces; now they're adding behaviour to interfaces, so you end up being able to inherit behaviour from multiple "places"!!!

Extension Methods in C# also add behaviour to interfaces, and as such you also get a sort of Multiple Inheritance, but in a much more limited, "second class" way. Extension Methods are just a compiler artifact and the method resolution is done at compile time, so you lose the runtime magic of polymorphism-overriding-vTables. When you extend an existing Interface with new methods, if a derived class then implements one of those extra methods, polymorphism won't work to invoke that overridden method. Let's see an example:


using System;

public interface IPerson
{
 string Name{get;set;}
 string SayHello();
}

public static class IPersonExtensions
{
 public static string SayBye(this IPerson person)
 {
  return person.Name + " says Bye from Extension Method";
 }
}

public class Person:IPerson
{
 public string Name {get;set;}
 public Person(string name)
 {
  this.Name = name;
 }
 
 public string SayHello()
 {
  return this.Name + " says Hello";
 }
 public string SayBye()
 {
  return this.Name + " says Bye";
 }
}



public class Program
{
 public static void Main()
 {
  //the extension method is good to add a SayBye to the IPerson interface
  //but as a compile time artifact, it will not take into account if the implementing class has "overridden" it
  IPerson p1 = new Person("Iyan");
  Console.WriteLine(p1.SayBye()); //writes "Iyan says Bye from Extension Method"
  
  Person p2 = p1 as Person;
  Console.WriteLine(p2.SayBye()); //writes "Iyan says Bye"
 }
}

In the example above, the IPerson interface has been extended with an additional method SayBye through the IPersonExtensions static class. Then the Person class tries to override SayBye with its own implementation, but polymorphism won't work when it's invoked on a Person object via IPerson, and the implementation in the Extension Method will be used rather than the one in Person.

Another limitation of Extension Methods is that they are not visible via Reflection. I mean, if you call Type.GetMethods() it won't return in that list those methods that could be accessed in that type via Extension Methods. As a consequence of this, they don't play well with dynamic either, when you expect that dynamic resolution to be done through Reflection. You'll find more information on this here and here.
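Both limitations are easy to check with a minimal sketch. The types below (IGreeter, Greeter, GreeterExtensions) are hypothetical stand-ins for the IPerson example, kept separate so that the snippet compiles on its own.

```csharp
using System;
using System.Linq;
using Microsoft.CSharp.RuntimeBinder;

public interface IGreeter
{
    string Name { get; }
}

public static class GreeterExtensions
{
    public static string SayBye(this IGreeter greeter)
    {
        return greeter.Name + " says Bye from Extension Method";
    }
}

public class Greeter : IGreeter
{
    public string Name { get; set; }
}

public static class ExtensionVisibilityDemo
{
    // Looks for SayBye among the methods Reflection reports for the class and the interface.
    public static bool ReflectionSeesSayBye()
    {
        return typeof(Greeter).GetMethods()
            .Concat(typeof(IGreeter).GetMethods())
            .Any(m => m.Name == "SayBye");
    }

    // Tries to call the extension method through dynamic dispatch.
    public static bool DynamicCallWorks()
    {
        dynamic greeter = new Greeter { Name = "Iyan" };
        try
        {
            greeter.SayBye();
            return true;
        }
        catch (RuntimeBinderException)
        {
            // The runtime binder only looks at the members of the actual type.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReflectionSeesSayBye()); // False
        Console.WriteLine(DynamicCallWorks());     // False
    }
}
```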

With all this in mind, I decided to simulate this "Multiple Inheritance of Behaviour" in C#. The idea is simple and effective, though not very wrist-friendly. For each interface to which you'd like to add behavior you create a class that implements the interface and contains those "default methods". Then, in your normal classes implementing that interface, you add a reference to that class for the default implementation, and for those methods for which you don't want to override the default implementation, you just delegate calls to that reference.

using System;
using System.Linq;
using System.Reflection;

public interface IValidable
{
 bool AmIValid();
}

public interface IPersistable
{
 string Persist();
 
 int EstimateTimeForFullPersist();
}

public class DefaultValidable: IValidable
{
 //just one single method, no calls to other methods in the class, so no need for an Implementer field
 //(beware that, as written, "this" is the DefaultValidable instance itself, not the class delegating to it)
 public bool AmIValid()
 {
  return this.GetType().GetProperties(BindingFlags.Public|BindingFlags.Instance).All(p => p.GetValue(this) != null);
 }
}

public class DefaultPersistable: IPersistable
{
 public IPersistable Implementer { get; set; }
 public DefaultPersistable()
 {
  this.Implementer = this;
 }
 
 public string Persist()
 {
  //notice how we have to use [this.Implementer.Estimate] here to allow method overriding to work,
  //cause using [this.Estimate] would invoke the Default (NotImplementedException) one.
  if (this.Implementer.EstimateTimeForFullPersist() > 1500)
   return this.ToString();
  else
  {
   //complex logic here
   return "this is the result of a complex logic";
  }
 }
 
 public int EstimateTimeForFullPersist()
 {
  throw new NotImplementedException();
 }
}

public class Book: IValidable, IPersistable
{
 protected IValidable ValidableImplementation { get; set; }
 protected IPersistable PersistableImplementation { get; set; }
 
 public Book(DefaultValidable validableImp, DefaultPersistable persistableImp)
 {
  this.ValidableImplementation = validableImp;
  this.PersistableImplementation = persistableImp;
 }
 
 public bool AmIValid()
 {
  //delegate to default implementation
  return this.ValidableImplementation.AmIValid();
 }

 public string Persist()
 {
  //delegate to default implementation
  return this.PersistableImplementation.Persist();
 }
 
 public int EstimateTimeForFullPersist()
 {
  //do not delegate to default implementation, "override" it
  return 50;
 }
}

public class Program
{
 public static void Main()
 {
  DefaultPersistable defPersistable = new DefaultPersistable();
  Book b = new Book(new DefaultValidable(), defPersistable);
  defPersistable.Implementer = b;
  
  Console.WriteLine("Is the Book valid: " + b.AmIValid().ToString());
  Console.WriteLine("Book.Persist: " + b.Persist());
 }
}

Looking at the implementation above you'll notice that the code is more straightforward for DefaultValidable than for DefaultPersistable. No default method in DefaultValidable invokes other methods in the interface, while in DefaultPersistable the Persist method invokes EstimateTimeForFullPersist, which means that in order to invoke the correct implementation if EstimateTimeForFullPersist has been overridden, we have to use the Implementer reference for those invocations.

You should also notice that while the above technique allows "Multiple Inheritance of Behavior" it does not address the real motivation of Default Methods in Java, extending the contract of an existing interface with new methods without breaking existing code. You still need to resort to Extension Methods in C# for that.

All this has reminded me of an interesting post I read months ago about using ES6 proxies as a way to implement multiple inheritance in JavaScript. The idea is pretty interesting, but I see an important flaw: the instanceof operator won't work with the "base classes". Applying instanceof to the proxy object will tell you that it's neither an instance of base1 nor of base2. This could be fixed if instanceof were also interceptable in the proxy, but it seems like (at least in the current proposal) it's not.

By the way, as it's somehow related to this article, I'll reference here my write up about Extension Methods and Mixins/Traits from last year.