Wednesday 24 February 2010

Xml Dom, GetElementById

Let's say you need to do some HTML manipulation. Since the .NET FCL does not provide an HTML parser, you can use the neat, open-source Html Agility Pack. I've used it several times and it's really well thought out: it gives you a nice DOM implementation (even with XPath querying) and can parse fairly malformed HTML documents.

If you're lucky enough to be dealing with XHTML documents, it seems you could save yourself the third-party library and just go with the System.Xml DOM implementation (or LINQ to XML, or any other option). There's one problem with that, though.

Often all I need is to reach an element by its id, so the basic GetElementById should do the trick and that's all. But it doesn't.
In the HTML DOM, "id" means an attribute named "id", but in the XML DOM "id" means an attribute that has been declared as type ID in the associated DTD or schema. So unless the document has an associated DTD (schema-declared IDs are not honored by the .NET XML DOM), your beloved GetElementById just returns null. This nuisance has made me resort to the Html Agility Pack several times...
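To see the difference, here is a small sketch (DtdIdDemo, FindById and the sample markup are names I made up for illustration). It loads the same markup twice: once as-is, and once with an internal DTD subset that declares the h1 element and marks its id attribute as type ID. Only the second document lets XmlDocument.GetElementById find the element; the first one gets null back.

```csharp
using System.Xml;

// Illustrative sketch: DtdIdDemo and its members are hypothetical names.
public static class DtdIdDemo
{
    // Plain XHTML-like markup: here "id" is just an attribute name.
    public const string WithoutDtd =
        "<html><body><h1 id='myTitle'>Hello</h1></body></html>";

    // Same markup, but the internal DTD subset declares h1/@id as type ID,
    // which is what the XML DOM needs to index the element by id.
    public const string WithDtd =
        "<!DOCTYPE html [ <!ELEMENT h1 ANY> <!ATTLIST h1 id ID #IMPLIED> ]>" +
        "<html><body><h1 id='myTitle'>Hello</h1></body></html>";

    // Loads the markup and asks the XML DOM for the element with that ID.
    public static XmlElement FindById(string xml, string id)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementById(id);
    }
}
```

Adding such a DOCTYPE to every document you process is rarely practical, though.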

Today, while learning a bit more about XPath (I've used it only on few and far between occasions, so my knowledge is very basic), it occurred to me that an XPath query could be all I need, and in fact it is. Something like this does the job:

    myXmlDocument.SelectSingleNode("//*[@id='" + id + "']")

So you can write a simple helper method like this:



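    using System.Xml;

    // Returns the first element whose "id" attribute equals id, or null if none does.
    // Caveat: an id containing an apostrophe (') yields an invalid XPath expression.
    public static XmlElement GetElementById(XmlDocument doc, string id)
    {
        return (XmlElement)doc.SelectSingleNode("//*[@id='" + id + "']");
    }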
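One caveat with this helper: the id is concatenated straight into the XPath expression, so an id containing an apostrophe (say, it's) produces an invalid expression and SelectSingleNode throws instead of returning the element. XPath 1.0 string literals have no escape syntax, so a common workaround is to delimit the value with whichever quote character it doesn't contain, and fall back to concat() when it contains both. Here is a sketch of that idea; it keeps the XmlHelper class name used further down in this post, while BuildXPathLiteral is a hypothetical name of mine.

```csharp
using System.Xml;

// Sketch of a quote-safe variant; BuildXPathLiteral is a hypothetical helper name.
public static class XmlHelper
{
    // Returns the first element whose "id" attribute equals id, or null.
    public static XmlElement GetElementById(XmlDocument doc, string id)
    {
        return (XmlElement)doc.SelectSingleNode("//*[@id=" + BuildXPathLiteral(id) + "]");
    }

    // XPath 1.0 literals cannot escape quotes, so choose a delimiter the
    // value does not contain, or build the string with concat() if it has both.
    public static string BuildXPathLiteral(string value)
    {
        if (!value.Contains("'"))
            return "'" + value + "'";
        if (!value.Contains("\""))
            return "\"" + value + "\"";

        // Split on apostrophes and splice each one back in as a "'" argument,
        // e.g. x'y"z becomes concat('x', "'", 'y"z').
        string[] parts = value.Split('\'');
        return "concat('" + string.Join("', \"'\", '", parts) + "')";
    }
}
```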



It would be even better to implement it as an extension method, but then you could not name it GetElementById.

I mean:

    doc.GetElementById("myTitle")

is much more elegant than

    XmlHelper.GetElementById(doc, "myTitle")

The problem is that an extension method can never override or hide an existing instance method: the compiler only considers extension methods when no applicable instance method exists, so the first statement would bind to the XmlDocument.GetElementById method provided by the Framework...
You would have to choose a different method name, and, well, I can't come up with a good semantic name for it: "GetElementByIdImproved"... "GetElementById2"...
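Here is a sketch of the extension-method route. GetElementByIdAttribute is just a placeholder name (I still don't have a good one), and XmlDocumentExtensions is hypothetical too. The class also declares an extension literally called GetElementById to show the binding rule: it compiles, but doc.GetElementById(...) keeps calling the Framework's instance method, so that extension is only reachable with static-call syntax. Instead of XPath, this version walks doc.GetElementsByTagName("*") and compares the id attribute, which sidesteps quote escaping altogether.

```csharp
using System.Xml;

// Sketch: XmlDocumentExtensions and GetElementByIdAttribute are hypothetical names.
public static class XmlDocumentExtensions
{
    // Compiles fine, but doc.GetElementById(id) never reaches it: the instance
    // method XmlDocument.GetElementById(string) always wins overload resolution.
    // It can still be called as XmlDocumentExtensions.GetElementById(doc, id).
    public static XmlElement GetElementById(this XmlDocument doc, string id)
    {
        return doc.GetElementByIdAttribute(id);
    }

    // A distinct name is what lets instance-style syntax reach the extension.
    public static XmlElement GetElementByIdAttribute(this XmlDocument doc, string id)
    {
        // "*" matches every element, visited in document order.
        foreach (XmlElement element in doc.GetElementsByTagName("*"))
        {
            // HasAttribute guards against id == "" matching elements with no id.
            if (element.HasAttribute("id") && element.GetAttribute("id") == id)
                return element;
        }
        return null;
    }
}
```

With this in scope, doc.GetElementByIdAttribute("myTitle") finds the element without any DTD, while doc.GetElementById("myTitle") on a DTD-less document still returns null.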
