Wednesday, 24 February 2010

Xml Dom, GetElementById

Let's say you need to do some HTML manipulation. Given that the .Net FCL does not provide us with an HTML parser, you can use the neat, open source Html Agility Pack. I've used it several times and it's really well thought out, providing you with a nice DOM implementation (even with XPath querying) and being able to parse rather malformed HTML documents.

If you're lucky enough to be dealing with XHTML documents, it seems like you could spare yourself the third-party library and just go with the System.Xml DOM implementation (or Linq to Xml, or whatever other option). There's one problem with that, though.

Many times all I need is to reach an element by its id, so the basic GetElementById should do the trick and that's all. But it doesn't.
When working with the Html DOM, "id" means an attribute named "id", but when dealing with the Xml DOM, "id" means an attribute that has been declared as an ID in the associated DTD or Schema. So unless you have that associated DTD (the schema doesn't work for the .Net Xml Dom), your beloved GetElementById won't work. This nuisance has made me resort to the Html Agility Pack several times...

Today, while learning a bit more XPath (I've only used it on rare occasions, so my knowledge is very basic), I thought an XPath query could be all I need, and in fact it is. All you need is something like this:

myXmlDocument.SelectSingleNode("//*[@id='" + id + "']")

so, you can do a simple helper method like this:

    public static XmlElement GetElementById(XmlDocument doc, string id)
    {
        return (XmlElement)(doc.SelectSingleNode("//*[@id='" + id + "']"));
    }

It would be even better to implement it as an extension method, but you could not name it GetElementById then.

I mean:
doc.GetElementById("myTitle")
is much more elegant than
XmlHelper.GetElementById(doc, "myTitle")

The problem is that extension methods give you no way to override an existing implementation, so with the first statement the compiler would choose the XmlDocument.GetElementById method provided by the Framework...
You would have to choose a different method name, and well, I can't come up with a good, meaningful name for it: "GetElementByIdImproved", "GetElementById2"...
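Just as a sketch of how that could look, here is the helper turned into an extension method. The name GetElementByIdAttribute is only a placeholder I'm assuming for the example (any name that doesn't clash with the Framework's method would do), and ids containing a single quote are not handled:

```csharp
using System;
using System.Xml;

public static class XmlDocumentExtensions
{
    // Hypothetical name, picked only to avoid clashing with XmlDocument.GetElementById.
    public static XmlElement GetElementByIdAttribute(this XmlDocument doc, string id)
    {
        // Same XPath as the helper above; returns null when nothing matches.
        // Assumption: the id contains no single quote.
        return doc.SelectSingleNode("//*[@id='" + id + "']") as XmlElement;
    }
}

public static class GetElementByIdDemo
{
    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<html><body><h1 id='myTitle'>Hello</h1></body></html>");

        // No DTD declares "id" as an ID attribute, so the built-in lookup finds nothing.
        Console.WriteLine(doc.GetElementById("myTitle") == null);             // True

        // The XPath-based extension method only looks at the attribute name.
        Console.WriteLine(doc.GetElementByIdAttribute("myTitle").InnerText);  // Hello
    }
}
```

The call site reads almost like the Framework's own method, which is about as close as extension methods let us get.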

Tuesday, 23 February 2010

Antikörper (Antibodies)

I've just watched this brilliant German film on TVE2 (this public channel has the good habit of showing some rather good films on Monday evenings).
Antikörper is an absolutely impressive thriller that proudly draws inspiration from masterpieces such as "The Silence of the Lambs" and "Seven" (there's a key difference with Seven, work it out yourself).

I love the visual work, which combines top notch modern horror imagery with some lovely expressionist paintings by the crazy killer, clean white rooms with blood stains, dark, cold rooms in a dark, cold metropolis (I love you Berlin)...
A good part of the story revolves around Christianity, with all its imposed moral views, tormented souls, punishment, myths (the Binding of Isaac)... and takes place in Southern Germany. This is the second film I've seen (the first was Requiem, another excellent film) that depicts the rural society there as terribly (even fanatically) Catholic.

For me, staying 127 minutes in front of the screen with no intention of getting up for a single second says a lot about how good and intense this film is.
This is a must-see film and the only thing I have to say now is THANKS to the whole crew involved in its production.

Saturday, 20 February 2010

6 Degrees and the Human Web

I've just watched on TV this interesting documentary about the Six degrees of separation idea.
I think most people have heard of this, but I guess that, just like me, they've thought about it on a social, even fun and superficial level (by the way, and just to show off a bit, I think I'm like 3 or 4 steps away from Noam Chomsky).
Research has shown that the same theory that explains how people connect with each other can be applied to computer networks, sexual relationships, the Web and even protein interactions.
All these structures come down to the same thing, a network of items. Most of these items are directly connected to a few, rather nearby items, but, and here comes the important point, a few items are directly connected to very remote items, and a few items are connected to many other items (these are called Hubs). "Long distance connectors" and Hubs are the key to the theory: the global connectivity lies in them.

This "Degrees of Separation" thing is essential to understanding how diseases, computer viruses, ideas, conspiracies... can spread. Furthermore, it can be used to study how different diseases are linked by common genes, and how proteins interact inside our cells to cause disease.

These two sentences at the end of the documentary sum it up quite well:

"Network science is the foundation of the 21st century"

"All the major problems in science today depend on understanding networks"

Tuesday, 16 February 2010

Dynamic vs Static

I'm not very keen on watching "Programming Videos". Sure, there are some very interesting ones, but they tend to bore me terribly, among other things because they tend to be too long, so I prefer the old way: a nice written article that you can even print if you find it really useful (of course on both sides and two pages per side, I try to waste as little paper as possible, and the energy and chemicals needed to recycle it also count).

Well, to the point: the other day I found a video on InfoQ with a good topic, the differences between dynamic and static languages, and when I read the bio of the lady giving the talk I can't deny that my interest grew (yes, even today it's rather uncommon to find a lady with a passion for programming, technology...), so I decided to give it a try, and it proved to be one of the best examinations of Static vs. Dynamic that I've come across.

Warning, what follows is a mix of what is stated in the presentation with my own ideas, so please, watch the video from that smart lady and give little credit to my musings below.

First, and pretty obvious, though more than once I've missed it in discussions with colleagues: Dynamic means done at runtime, Static means done at compile time.

Second, we have several terms with murky and mixed meanings:

  • Static Typing vs Dynamic Typing.

  • Strong Typing vs Weak Typing (vs Duck Typing)

For me, Strong typing means that types exist and type checks are done, either at compile time (Static Typing) or at runtime (Dynamic Typing). This article considers Python a Dynamic, Strongly typed language. Well, I don't agree with that: sure, types exist in Python, but I think no real (runtime) type checks are done, just Duck typing checks (it quacks like a duck, walks like a duck... so it's a duck), and if the field or method does not exist in the internal dictionary, you get an error...
So, I would rather classify it as Duck typed (as Wikipedia does).
As for Weak typing, I don't have a clue what it really means...

We also have Implicit Typing, which has nothing to do with Dynamic Typing (C# var is very convenient, but it's just implicit typing, that is, compiler magic saving you keystrokes or even saving you from declaring one-off classes).
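A tiny sketch to make the point about var concrete (the class and variable names are just made up for the example): the compiler infers the type once, and after that the variable is as statically typed as if you had spelled the type out.

```csharp
using System;
using System.Collections.Generic;

public static class ImplicitTypingDemo
{
    public static void Main()
    {
        // The compiler infers Dictionary<string, int>; the variable is still statically typed.
        var counts = new Dictionary<string, int>();
        counts["a"] = 1;

        // counts = "text";   // would not compile: the static type is already fixed

        // An anonymous type: a one-off class that the compiler declares for us.
        var point = new { X = 1, Y = 2 };

        Console.WriteLine(counts.GetType().Name);  // Dictionary`2
        Console.WriteLine(point.X + point.Y);      // 3
    }
}
```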

I think that when we talk nowadays about Dynamic languages we're talking about much more than the classification above.
Some of those things that to a greater or lesser extent conform what we call a Dynamic Language are:

  • Introspection. Programs can query their own internal structure at runtime. Static languages like Java or classic C# (System.Reflection) have this feature; C++ only offers a very limited form of it through RTTI.

  • Structural Reflection. Programs can query and modify their own internal structure. This means that we can add or remove methods and data fields to an Object (instance object or class object; such objects are sometimes called expandos), modify the inheritance chain... This is possible in the almighty JavaScript, Python, Ruby. Don't confuse C# Extension Methods with this, they are just a compiler trick.

  • Dynamic compilation. The program is able to add and run new code. We have two flavours here: the more traditional API style (available in languages like C# under the System.Reflection.Emit namespace), which allows a more limited form of code addition, and the almighty and straightforward "string eval" style (Javascript's eval(), Python's compile() and exec()), where the new code is able to run in the local scope!

  • Classes as first-class objects. By this I mean that Classes, the construct used as a template for new objects, are objects themselves and can be treated like any other object. This means that we have Instance Objects and Class Objects. It's a step forward from the C# Type class.

  • Support for "trendy" functional style features
    • Functions as first-class objects. That means you can pass functions as parameters, return functions (that is, you have higher-order functions)... C#'s delegates are a good step in this direction

    • Closures. This is a long topic, not much to say in this post...

    • Partial function application, binding objects to functions

    • Generators (aka iterator blocks)
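To put a few items of the list into code, here is a small sketch of what a "static" language like C# already offers: introspection through System.Reflection, functions as values through delegates, closures, and a generator written as an iterator block. All the names are invented for the example, and it says nothing about structural reflection, which is the part C# classes can't do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class DynamicFeaturesDemo
{
    // Generator (iterator block): values are produced lazily, one per iteration step.
    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 1; i <= count; i++)
            yield return i * i;
    }

    // Returns a function that closes over the local variable "total".
    public static Func<int, int> MakeAccumulator()
    {
        int total = 0;
        return amount => total += amount;
    }

    public static void Main()
    {
        // Introspection: ask the type, at runtime, which public static methods it declares.
        var names = typeof(DynamicFeaturesDemo)
            .GetMethods(BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly)
            .Select(m => m.Name);
        foreach (var name in names)
            Console.WriteLine(name);

        // Functions as values: a delegate stored in a variable and passed to Select.
        Func<int, int> twice = x => x * 2;
        Console.WriteLine(string.Join(",", Squares(3).Select(twice)));  // 2,8,18

        // Closure: the captured variable survives between calls.
        var accumulate = MakeAccumulator();
        accumulate(5);
        Console.WriteLine(accumulate(10));  // 15
    }
}
```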

There's one interesting note by the speaker that I try to apply to many other aspects in life, "it's not a matter of whether a language is dynamic or not, but of how dynamic it is". These things are not digital, but purely analog, gradients are the important part here. It's the same I think when we talk about whether animals are intelligent or not, whether they have a language or not... there are many degrees in between...

Tuesday, 9 February 2010

Some useful methods

The other day I found myself with such a simple task as turning a long list of tab-separated rows into an HTML table. In these times of Linq, functional programming, declarative programming... it seems like normal loops are no longer fun, so I came up with this solution:

    // one placeholder per column of the tab-separated input
    string lineFormat = "<tr>\n" +
                        "<td>{0}</td>\n" +
                        "<td>{1}</td>\n" +
                        "<td>{2}</td>\n" +
                        "<td>{3}</td>\n" +
                        "</tr>\n";
    // split the text into lines, format each line's fields as a row, join the rows
    result = String.Join("", txt.Split("\n".ToCharArray())
                            .Select(line => String.Format(lineFormat, line.Split("\t".ToCharArray())))
                            .ToArray());

Looking at that code, there's something that hurts a bit: those 4 <td>{n}</td> lines, a product of copy-pasting... So, thinking of a nicer way to create that string of html:

    string template = "<td>{{{0}}}</td>\n";
    String.Join("",
        Enumerable.Range(0,4)
        .Select(num => String.Format(template,num)).ToArray());

Wow, sure, at first sight it's rather less clear than the initial "hardcoded" version, but this one is cooler :-)

In both code blocks above I'm using the String.Join method. That's OK, but first, I don't like the fact that it needs an array instead of just an IEnumerable<string>, and second, it breaks a bit the "fluent style" that I like so much. So after some quick investigation I found it can be replaced with Enumerable.Aggregate, and the last line would look like this:

    res =
        Enumerable.Range(0,4)
                  .Select(num => String.Format(template,num))
                  .Aggregate((it1, it2) => it1 + it2);
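Putting the pieces together, the whole task could be sketched more or less like this. ToHtmlRows and its columns parameter are names I'm making up for the example, every line is assumed to have at least that many fields (String.Format throws otherwise), and I pass an empty-string seed to Aggregate so that an empty sequence gives "" instead of an exception:

```csharp
using System;
using System.Linq;

public static class TableDemo
{
    // Hypothetical helper: turns tab-separated lines into <tr> rows.
    public static string ToHtmlRows(string txt, int columns)
    {
        // "{{{0}}}" formats to the literal placeholders "{0}", "{1}", ...
        string cells = Enumerable.Range(0, columns)
            .Select(num => String.Format("<td>{{{0}}}</td>\n", num))
            .Aggregate(String.Empty, (acc, cell) => acc + cell);
        string lineFormat = "<tr>\n" + cells + "</tr>\n";

        // The string[] from Split is passed as the format arguments of each row.
        return txt.Split('\n')
            .Select(line => String.Format(lineFormat, line.Split('\t')))
            .Aggregate(String.Empty, (acc, row) => acc + row);
    }

    public static void Main()
    {
        // Prints two <tr> rows with two <td> cells each.
        Console.Write(ToHtmlRows("a\tb\nc\td", 2));
    }
}
```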

This coding with the String.Format and Collections made me recall an Extension method that I coded some time ago:

    /// <summary>
    /// Does a looped String.Format, that is, applies the formatting pattern to an enumeration of elements and joins the results
    /// </summary>
    /// <typeparam name="T">Type of the item that will be formatted with the pattern</typeparam>
    /// <param name="pattern">string pattern</param>
    /// <param name="enumerable">enumeration of items to be formatted</param>
    /// <param name="joiner">string placed between the formatted items</param>
    /// <param name="selectors">array of methods that select from the item the values to be used in the formatting</param>
    /// <returns>the formatted items joined into a single string</returns>
    public static string FormatEach<T>(this string pattern, IEnumerable<T> enumerable, string joiner, params Func<T, string>[] selectors)
    {
        //rather interesting, I'm using 2 nested Selects, that's equivalent to using 2 nested loops.
        return String.Join(joiner,
                           enumerable.Select(item => string.Format(
                                                         pattern,
                                                         selectors.Select(selector => selector(item)).ToArray()
                                                     )
                                            ).ToArray());
    }

It applies a format string to a collection of objects and joins together the resulting strings. The formatting of each element in the collection differs from the normal String.Format in that instead of using an Array of objects I use an object from which I extract the values to use in the formatting through the application of the corresponding projection delegate in the selectors array.
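A usage sketch may make that clearer. The method is repeated so that the example compiles on its own; the Person class and the sample data are invented just for the demonstration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringExtensions
{
    // Same idea as the method above, repeated here so the example is self-contained.
    public static string FormatEach<T>(this string pattern, IEnumerable<T> items,
                                       string joiner, params Func<T, string>[] selectors)
    {
        return String.Join(joiner,
            items.Select(item => String.Format(
                pattern,
                selectors.Select(selector => selector(item)).ToArray()))
            .ToArray());
    }
}

// Hypothetical type, used only for this example.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class FormatEachDemo
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Ana", Age = 30 },
            new Person { Name = "Xuan", Age = 25 }
        };

        // One selector per placeholder: {0} takes the name, {1} the age.
        string rows = "<tr><td>{0}</td><td>{1}</td></tr>".FormatEach(
            people, "\n",
            p => p.Name,
            p => p.Age.ToString());

        Console.WriteLine(rows);
        // <tr><td>Ana</td><td>30</td></tr>
        // <tr><td>Xuan</td><td>25</td></tr>
    }
}
```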

Extension methods have been an incredibly useful addition to .Net, and one sometimes wonders: why didn't they add more?
For example, it's rather annoying not being able to do this:

string sr = "a"*5;

OK, in 2002, when .Net was released and it looked like a much more "traditional" platform, maybe it wasn't an operation on everyone's mind, but now that almost everyone has got used to the expressive wonders of languages like Python or Javascript (yes, I know, you can't do that in JS, but anyway), it seems like a rather reasonable addition.
So, just add an extension method and we're done...
Wait, not so easy: unfortunately you can't have operator overloads in a Static Class... but well, at least it would be rather convenient to have a method like this:

string sr = "a".Multiply(5);

I've added one to NAsturLib, my small library of convenience methods, code reminders and non particularly useful code attempts...

    public static string Multiply(this string s, int times)
    {
        // the empty-string seed makes times == 0 return "" instead of throwing
        return Enumerable.Range(1, times)
                         .Select(num => s)
                         .Aggregate(String.Empty, (cur, next) => cur + next);
    }
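One caveat with the Aggregate approach is that it builds a new intermediate string on every step. As an alternative sketch (not what NAsturLib does; Repeat is a name assumed here to keep it apart from Multiply), a StringBuilder avoids that, and for single characters the string constructor already does the job:

```csharp
using System;
using System.Text;

public static class RepeatExtensions
{
    // Hypothetical name, chosen to avoid clashing with the Multiply method above.
    public static string Repeat(this string s, int times)
    {
        if (times < 0) throw new ArgumentOutOfRangeException("times");

        // Reserve the final length up front and append in a plain loop.
        var builder = new StringBuilder(s.Length * times);
        for (int i = 0; i < times; i++)
            builder.Append(s);
        return builder.ToString();
    }
}

public static class RepeatDemo
{
    public static void Main()
    {
        Console.WriteLine("ab".Repeat(3));        // ababab
        Console.WriteLine("ab".Repeat(0) == "");  // True
        Console.WriteLine(new string('a', 5));    // aaaaa (built-in, single characters only)
    }
}
```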

Well, time to close this "mosaic of damn easy methods that I want to have easily accessible for future reuse" post (yes, I'm afraid my blog will end up turning into sort of a personal Pastebin...)

Wednesday, 3 February 2010

Stalin and Hitler in a Viennese café

Until watching this interesting documentary about Hitler and Stalin I'd never thought about the fact that the two monsters never met in person (at least officially). All the contacts and negotiations between Nazi Germany and the USSR (yes, both governments were "good friends" until Hitler decided to invade the USSR in 1941) were carried out by other people, never by the Heads of State.
For example, it was Molotov and Ribbentrop who signed the Non-aggression pact (including the division of Europe between both superpowers), it was German and Russian officials who gathered between 1939 and 1941 to discuss how their respective occupations were going, it was Russian soldiers who handed over Jews to the German soldiers...

So, it's strange that they never met. Could it be that there was some hidden hatred between them (mainly from Hitler towards Stalin: it was Hitler who broke the pact, he was obsessed with wiping out Stalingrad...)?
Could it be that they ran into each other in Vienna in 1913? (hey, this is not my investigation, it's mentioned in the documentary and in Wikipedia):

In January 1913, Stalin was sent to Vienna by Lenin and spent six weeks there writing an article "Marxism and the National Question".

Hitler, after being rejected twice by the Academy of Fine Arts Vienna (1907 – 1908), tried to make a living as a painter, copying scenes from postcards and selling his paintings to merchants and tourists. He left for Munich in May 1913 when he received the final part of his father's estate.

So, I'll let my imagination fly for a while (it's something I'm rather good at). Can you imagine Stalin and Hitler, each one sitting alone in a Viennese café on a cold winter afternoon, both staring at the same chick? And then it's the Russian who picks up the girl. Can you imagine Hitler's raving hatred on seeing an "Aryan fertility goddess" choosing the (in his rotten supremacist mind) "Slavic subhuman" over him... And then, can you imagine, 25 years later, the fanatic German leader preparing his revenge against the Russian flirt?

And, what would have happened if these two shitheads had turned friends just by chance?
Maybe Hitler would have considered Slavic people the "true Aryans" and swapped all his Norse mythology for old Slavic cults? Would Atlantic Europe have turned into the "living space" for "GerRussia"? Would I be writing this post in German or Russian?

OK, enough rambling for today...