Saturday 23 March 2013

Casting, Conversion and Coercion

Casting, Conversion and Coercion can sometimes be confusing, even more so when the people discussing them come from different language backgrounds. C#'s "hybrid Casting/Conversion" can be particularly confusing, as the many StackOverflow questions on the subject show [1], [2]...
My own understanding has shifted a bit lately, thanks to some good reading.

I used to think of casting (upcasting and downcasting) purely in terms of the programmer advising/instructing the compiler. With upcasting you mainly tell the compiler which method you want to invoke (which only makes a difference for non-virtual methods): "hey, though I'm a Dog, I want to invoke the DoNoise method of my parent (Animal)". With downcasting, you say: "hey, though you think I'm an Animal, I'm actually a Dog, so do a runtime check to verify that this is true and that you can invoke this Dog-specific method. If I'm lying, throw an exception". In C#, this means that the IL generated by the compiler will contain a castclass instruction, that is, a type check.

using System;

class Animal
{
 public void DoNoise()
 {
  Console.WriteLine("Animal.DoNoise");
 }
 public virtual void Move()
 {
  Console.WriteLine("Animal.Move");
 }
}

class Dog: Animal
{
 public new void DoNoise()
 {
  Console.WriteLine("Dog.DoNoise");
 }
 
 public override void Move()
 {
  Console.WriteLine("Dog.Move");
 }
 
 public void Bark()
 {
  Console.WriteLine("Dog.Bark");
 }
}

class App
{
 public static void Main()
 {
  Dog d1 = new Dog();
  ((Animal)d1).DoNoise();
  //upcasting and non virtual method, prints: Animal.DoNoise

  ((Animal)d1).Move();
  //prints: Dog.Move
  //as this is a virtual method we have dynamic binding here, so the casting has no effect
  //and it still calls Dog.Move

  Animal a1 = d1;
  ((Dog)a1).Bark();
  //prints: Dog.Bark
  //the compiler adds a runtime check, so at runtime a1 is verified to be a Dog, and Bark is invoked
 }
}
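The Main above only shows the happy path of a downcast. Here is a minimal sketch of what happens when the claim is false: the runtime check fails and the CLR throws an InvalidCastException. The names FailedDowncastDemo and TryBark are hypothetical, and Animal/Dog are trimmed-down copies of the classes above.

```csharp
using System;

class Animal { }

class Dog : Animal
{
    public string Bark() => "Dog.Bark";
}

class FailedDowncastDemo
{
    // Downcasts with the cast operator; if the runtime check fails,
    // returns the name of the exception type instead of barking.
    public static string TryBark(Animal a)
    {
        try
        {
            return ((Dog)a).Bark(); // castclass: verified at runtime
        }
        catch (InvalidCastException e)
        {
            return e.GetType().Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryBark(new Dog()));    // prints: Dog.Bark
        Console.WriteLine(TryBark(new Animal())); // prints: InvalidCastException
    }
}
```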

For me, the above has nothing to do with conversion; it is about asking the compiler to treat an object of one (alleged) type as if it were of another type. The thing is that in C#, the cast operator can be used both for the hinting described above and to instruct the compiler to generate real conversions. C# predefines a set of explicit conversions (double to int, for instance), and we can define our own implicit (no cast needed) or explicit (cast syntax required) conversions. If you are used to thinking of casting in terms of subtyping, this feature can produce quite shocking code, as you can end up with something as odd as this:

class Table
{
    static public explicit operator Dog(Table t)
    {
       return new Dog();
    }
}

....
Dog d = (Dog)new Table(); //compiles: the cast calls the user-defined operator above, no type check involved
...
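The Table example uses an explicit operator. The sketch below adds an implicit one as well, and shows that a cast backed by a user-defined conversion runs the operator's code and hands back a brand-new object, unlike the reference casts in the first example. All names here (the Name property, the string-to-Dog operator, UserDefinedConversionDemo) are hypothetical.

```csharp
using System;

class Dog
{
    public string Name { get; }
    public Dog(string name) { Name = name; }

    // Hypothetical implicit user-defined conversion: no cast syntax needed.
    public static implicit operator Dog(string name) => new Dog(name);
}

class Table
{
    // Explicit user-defined conversion: callers must write (Dog)someTable.
    public static explicit operator Dog(Table t) => new Dog("Table-made dog");
}

class UserDefinedConversionDemo
{
    public static Dog FromString()
    {
        Dog d = "Rex"; // implicit: the compiler calls Dog's op_Implicit(string)
        return d;
    }

    public static bool CastCreatesNewObjects()
    {
        Table t = new Table();
        Dog d1 = (Dog)t; // explicit: the compiler calls Table's op_Explicit(Table)
        Dog d2 = (Dog)t;
        // Each cast runs the operator body, so two distinct Dog instances come back.
        return !ReferenceEquals(d1, d2);
    }

    public static void Main()
    {
        Console.WriteLine(FromString().Name);       // prints: Rex
        Console.WriteLine(CastCreatesNewObjects()); // prints: True
    }
}
```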

All this is brilliantly explained by the almighty Eric Lippert on StackOverflow and on his blog:

A "cast" is the usage of a cast operator. A cast operator instructs the compiler that either (1) this expression is not known to be of the given type, but I promise you that the value will be of that type at runtime; the compiler is to treat the expression as being of the given type, and the runtime will produce an error if it is not, or (2) the expression is of a different type entirely, but there is a well-known way to associate instances of the expression's type with instances of the cast-to type. The compiler is instructed to generate code that performs the conversion. The attentive reader will note that these are opposites, which I think is a neat trick.

  • My code has an expression of type B, but I happen to have more information than the compiler does. I claim to know for certain that at runtime, this object of type B will actually always be of derived type D. I will inform the compiler of this claim by inserting a cast to D on the expression. Since the compiler probably cannot verify my claim, the compiler might ensure its veracity by inserting a run-time check at the point where I make the claim. If my claim turns out to be inaccurate, the CLR will throw an exception.
  • I have an expression of some type T which I know for certain is not of type U. However, I have a well-known way of associating some or all values of T with an “equivalent” value of U. I will instruct the compiler to generate code that implements this operation by inserting a cast to U. (And if at runtime there turns out to be no equivalent value of U for the particular T I’ve got, again we throw an exception.)

So a cast can generate a conversion (meaning the compiler adds code for it), nothing at all (for an upcast), or a castclass instruction at the IL level, that is, a type check.
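To make these three outcomes concrete, here is a small sketch (the class and method names are hypothetical). The upcast and the downcast hand back the very same object; only the numeric cast produces a new value. The IL notes in the comments describe what the C# compiler typically emits; an IL disassembler such as ildasm is the reliable way to confirm it for a given build.

```csharp
using System;

class Animal { }
class Dog : Animal { }

class CastOutcomesDemo
{
    // Upcast: no IL needed, the same reference comes back.
    public static bool UpcastKeepsReference()
    {
        Dog d = new Dog();
        Animal a = (Animal)d;
        return ReferenceEquals(a, d);
    }

    // Downcast: a runtime type check (castclass), but still the same object.
    public static bool DowncastKeepsReference()
    {
        Animal a = new Dog();
        Dog d = (Dog)a;
        return ReferenceEquals(a, d);
    }

    // Conversion: the compiler emits code (conv.i4 in an unchecked context)
    // that produces a new value, truncated toward zero.
    public static int DoubleToInt(double x) => (int)x;

    public static void Main()
    {
        Console.WriteLine(UpcastKeepsReference());   // prints: True
        Console.WriteLine(DowncastKeepsReference()); // prints: True
        Console.WriteLine(DoubleToInt(3.9));         // prints: 3
    }
}
```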

As you can see, my explanation above only covers point (1). So, as Eric mentions in another answer, we had better think of casting as mere syntax, a syntax that can mean two very different things. Eric calls this dual behaviour a "neat trick"; honestly, I would call it slightly confusing. If I want to convert one object into another, I would prefer to say so clearly, with something like Convert.DoWhateverConversion.
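.NET already offers named conversions in the System.Convert class, so this preference can be followed today for the built-in types. The sketch below (class and method names are hypothetical) contrasts a cast with Convert.ToInt32: the two do not even agree on the result, since the cast truncates while Convert.ToInt32 rounds to the nearest integer, sending exact halves to the even neighbour.

```csharp
using System;

class ExplicitConversionDemo
{
    // Cast syntax: explicit numeric conversion, truncates toward zero.
    public static int ByCast(double x) => (int)x;

    // Named method: rounds to the nearest integer; ties go to the even number.
    public static int ByConvert(double x) => Convert.ToInt32(x);

    public static void Main()
    {
        Console.WriteLine(ByCast(3.7));    // prints: 3
        Console.WriteLine(ByConvert(3.7)); // prints: 4
        Console.WriteLine(ByConvert(2.5)); // prints: 2
    }
}
```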

People seem to use the term conversion for both cases: when we're just hinting the compiler and no physical conversion of one object into another takes place (representation-preserving conversions, such as identity and reference conversions), and when an object is transformed into a different one (a double into an int, a string into an int...).

A "conversion" is an operation by which a value of one type is treated as a value of another type -- usually a different type, though an "identity conversion" is still a conversion, technically speaking. The conversion may be "representation changing", like int to double, or it might be "representation preserving" like string to object. Conversions may be "implicit", which do not require a cast, or "explicit", which do require a cast.

All of the above helps to fully understand the difference between a cast and the as operator in C#. As explained here, the as operator only relates to the first meaning of a cast, not to the second one (so when it says "conversion", it is not referring to the user-defined or numeric conversions of the second kind).

The "as" operator only considers reference, boxing and unboxing conversions.
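A minimal sketch of those rules (AsOperatorDemo and its methods are hypothetical names): when the runtime check fails, as returns null instead of throwing, and with a nullable value type as the target it performs unboxing. User-defined operators are never considered, so with the Table class above, `new Table() as Dog` is rejected by the compiler, since Table and Dog are unrelated classes with no reference conversion between them.

```csharp
using System;

class Animal { }
class Dog : Animal { }

class AsOperatorDemo
{
    // Reference conversion: 'as' yields null instead of throwing when the check fails.
    public static Dog AsDog(Animal a) => a as Dog;

    // Unboxing conversion: 'as' also works when the target is a nullable value type.
    public static int? AsInt(object o) => o as int?;

    public static void Main()
    {
        Console.WriteLine(AsDog(new Dog()) != null);    // prints: True
        Console.WriteLine(AsDog(new Animal()) == null); // prints: True (no exception, just null)
        Console.WriteLine(AsInt(42));                   // prints: 42 (boxed, then unboxed)
        Console.WriteLine(AsInt("42").HasValue);        // prints: False (a string is not a boxed int)
    }
}
```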

Another concept related to conversions, one that I think we've mostly become acquainted with in the JavaScript arena, is coercion. I read this definition somewhere, and I think it is clear and accurate: a "coercion" is a representation-changing implicit conversion. This post does a really good job of explaining JavaScript coercion. That said, we could call C#'s representation-changing implicit conversions (short to int, int to double...) coercions. Bear in mind that these coercions really change a value from one representation to another; they are not just a hint:

short s = 5;
Console.WriteLine(s.GetType());
//System.Int16, so this value takes up 2 bytes
int i = s; //implicit conversion (a coercion) from short to int
Console.WriteLine(i.GetType());
//System.Int32, so this representation now takes up 4 bytes
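The byte counts in the comments can be checked with sizeof. The sketch below (CoercionDemo and BitsAfterCoercion are hypothetical names) also uses BitConverter.DoubleToInt64Bits to show that an implicit int-to-double conversion yields a value whose underlying bits are completely different from the original int, which is what makes it a coercion rather than a hint.

```csharp
using System;

class CoercionDemo
{
    // Raw IEEE 754 bits of the double produced by the implicit int-to-double conversion.
    public static long BitsAfterCoercion(int value)
    {
        double d = value; // implicit numeric conversion: no cast, but a new representation
        return BitConverter.DoubleToInt64Bits(d);
    }

    public static void Main()
    {
        Console.WriteLine(sizeof(short));  // prints: 2
        Console.WriteLine(sizeof(int));    // prints: 4
        Console.WriteLine(sizeof(double)); // prints: 8

        // The int 5 is stored as 0x00000005; the double 5.0 has a different bit pattern.
        Console.WriteLine(BitsAfterCoercion(5).ToString("X")); // prints: 4014000000000000
    }
}
```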

For a better understanding of all this, I recommend reading these posts [1] and [2].
