Tuesday, 11 June 2013

Java vs .Net: String.equals, interning

As someone who spends most of his programming time reading and writing C# and JavaScript, I still get puzzled each time I look at some Java code and find a String.equals(). Yes, I know the reason for it (anyone with a minimal Java background needs to know it), but it still seems odd to me, and it's even odder that using "==" can give different results depending on an implementation detail like string interning.

So, first of all, a comparison can mean 2 different things: Identity/Reference equality and Value equality. This post puts it nicely:

  • Identity (reference equality)
    Two objects are identical if they actually are the same object in memory. That is, references to them point to the same memory address.
  • Equivalence (value equality)
    Two objects are equivalent if the value or values they contain are the same.
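
A minimal Java sketch of the two notions above. The Point class is hypothetical and exists only for this illustration: "==" on references checks identity, while an overridden equals() checks equivalence.

```java
import java.util.Objects;

final class IdentityVsEquivalence {
    // Hypothetical value class, used only for this illustration.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Value equality: two points are equivalent when their coordinates match.
        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Point)) {
                return false;
            }
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        // Kept consistent with equals, as the Object contract requires.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        Point c = a;

        System.out.println(a == b);      // false: two distinct objects
        System.out.println(a.equals(b)); // true: same values
        System.out.println(a == c);      // true: the very same object
    }
}
```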

In C#, Java, or JavaScript, when we compare Value/Primitive types we expect an equivalence comparison, and when we compare Reference types (Objects) we mainly expect a reference comparison. However, a few things get in the way and make matters more complicated: boxing (and caching), and strings (and interning). Strings are Objects/Reference types (well, not in JavaScript, where they are primitive types), but when comparing them we usually prefer value semantics. I mean, you usually don't care whether 2 strings are the same object (the same bytes at the same memory address), only whether they have the same value. Java and C# follow different policies here.

C# sticks to value semantics: comparing 2 strings with "==" applies value equality. It does this by overloading the "==" operator for the String class. If it hadn't been overloaded, "==" would do a Reference comparison (as it does for other objects). Operator overloads are static methods, so the overload to call is chosen at compile time from the declared types of the operands, which has really interesting implications, as sharply explained by Eric Lippert.

//C# code:
object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;
Console.WriteLine(obj == str1); // true (Reference equality and both strings are interned)
Console.WriteLine(str1 == str2); // true (value equality)
Console.WriteLine(obj == str2); // false !? (Reference equality and the 2nd string is not interned)
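
Java has no operator overloading, but its method overloading is also resolved at compile time from the declared types, so it gives a rough analogy for the C# behaviour above. This is only a sketch; describe is a hypothetical method name invented for the example.

```java
final class StaticOverloadDemo {
    // Hypothetical overloads: the compiler picks one from the declared
    // (static) type of the argument, not from its runtime type.
    static String describe(Object o) {
        return "Object overload";
    }

    static String describe(String s) {
        return "String overload";
    }

    public static void main(String[] args) {
        Object obj = "Int32"; // a String at runtime, but declared as Object
        String str = "Int32";

        System.out.println(describe(obj)); // Object overload
        System.out.println(describe(str)); // String overload
    }
}
```

In the same way, the C# compiler only picks String's "==" overload when both operands are declared as string.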

Java does not feature operator overloading (I have mixed feelings about operator overloading, so I wouldn't necessarily say that leaving it out is a bad thing), so it would be hard to justify "==" behaving differently for Strings than for other objects. Therefore str1 == str2 does a Reference comparison, and that's why you have to use String.equals, which does a value comparison. The odd thing is that, because of interning, "==" can sometimes seem to be doing a value comparison. Let's say we have:

//Java code:
String s1 = "hi"; //literal string, so it's interned
String s2 = "hi"; //literal string, so it's interned
System.out.println(s1 == s2); //true
String s3 = new String("hi"); //new object, not interned
System.out.println(s2 == s3); //false

Being interned means that s1 and s2 point to the same object in the string pool, so the Reference comparison is true. However, s3 is not interned, so it's a different chunk of memory, and the comparison is false. This answer on Stack Overflow summarizes it pretty well:

== tests for reference equality.
.equals() tests for value equality.

Consequently, if you actually want to test whether two strings have the same value you should use .equals() (except in a few situations where you can guarantee that two strings with the same value will be represented by the same object eg: String interning).
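
To illustrate the interning exception mentioned in that answer, here is a small Java sketch. The sameObject helper is a hypothetical name added just for this example; String.intern() is the standard library method that returns the pooled instance.

```java
final class InternDemo {
    // Hypothetical helper: true when both references point to the same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hi";          // literals are interned
        String copy = new String("hi"); // a distinct object with the same value

        System.out.println(sameObject(literal, copy));          // false
        System.out.println(literal.equals(copy));               // true
        System.out.println(sameObject(literal, copy.intern())); // true
    }
}
```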

By contrast, in C# whether a string is interned or not has no effect on the result of a "==" comparison. As it performs value equality, it makes no difference whether both variables point to the same object in the intern pool or to separate objects on the heap. That said, C#'s "==" (which in the end calls String.Equals()) first does a reference check, so it can return true immediately when both operands are the same object, as 2 interned strings with the same value are, sparing a longer char by char comparison. This is brilliantly explained here (I'd always thought of interning as a memory optimization, not as a processing optimization, and the point he brings up about using a string.intern() before a switch is really interesting).
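
The "reference check first" idea can be sketched in Java as follows. To be clear, valueEquals is a hypothetical helper written for this post, not the actual source of String.Equals() or String.equals(); it only shows why 2 references to the same interned string can be compared without scanning the characters.

```java
final class FastEquals {
    // Hypothetical helper: value equality with an identity fast path.
    static boolean valueEquals(String a, String b) {
        if (a == b) {
            return true; // same object (or both null): no character scan needed
        }
        if (a == null || b == null || a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String s1 = "hi";
        String s2 = "hi";             // same interned object as s1
        String s3 = new String("hi"); // equal value, different object

        System.out.println(valueEquals(s1, s2));   // true, via the reference check
        System.out.println(valueEquals(s1, s3));   // true, via the character scan
        System.out.println(valueEquals(s1, "ho")); // false
    }
}
```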

It's also interesting to mention boxing when dealing with equality. As expected, in C# (same as in JavaScript) when boxing 2 integers (Numbers) we have a reference comparison, so comparing 2 boxed copies of the same number with "==" gives false.

I haven't done the test in Java because I don't know of any Java REPL and I don't feel like going to the trouble of writing a Program.java for this... but this discussion on Stack Overflow really bewildered me, because due to caching the result for small numbers would be true:

If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
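
A minimal Java sketch of what that rule implies. The class and method names are hypothetical; the only behaviour assumed is the guarantee quoted above for int values between -128 and 127. Outside that range the result of "==" is not specified, so the example prints it without claiming a value.

```java
final class BoxingCacheDemo {
    // Boxes the same int twice and reports whether both boxes are one object.
    static boolean boxesAreSameObject(int value) {
        Integer first = value;  // autoboxing
        Integer second = value; // autoboxing again
        return first == second; // reference comparison
    }

    public static void main(String[] args) {
        System.out.println(boxesAreSameObject(127)); // true: inside the guaranteed range

        // Outside -128..127 the outcome depends on the JVM and its settings;
        // it is typically false, but more values are allowed to be cached.
        System.out.println(boxesAreSameObject(1000));

        Integer a = 1000;
        Integer b = 1000;
        System.out.println(a.equals(b)); // true: value comparison always works
    }
}
```

So, as with strings, equals() is the safe choice for boxed numbers in Java.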

5 comments:

  1. The previously artist known as YesIusedToLikeEuropeWhat'sWrong, 11 June 2013 at 21:45

    Hi, man, well I'm afraid my comment doesn't really have much to do with your post, but taking in account we're neighbours (yes, it's me again :P) I think you're the right person to guide me through this choice.

    Looking for some courses and ways to learn I've found some really interesting thing, but I'd have to choose between:

    (Always talking about web development, of course)

    Web development focused on PHP
    or
    Web development focused on Java
    or
    Web development focused on .NET (I'm not even quite sure what this one is, indeed...)

    So the million dollar question is:

    what would you suggest me to choose? I mean, can you give me some piece of advice, please?

    Which one is "the best"? or The most requested? or The hardest to learn (in order to choose the easiest, of course :P)?

    Well, since you know me, I'm sure you'll give a very good piece of advice for someone who still is into the Plato's cavern ;-)

    Thanks¡

    Replies
    1. hi neighbour :-) I'll send you an email as I have some time

  2. YesISawEuropeLive, 11 June 2013 at 21:49

    hmmm... who still is? or who is still? I think both are correct, but still is emphasizes the issue...

  3. Ey, did you already fix the comments issue? Cool!
    As usual, really nice article Xose, keep up the good work! ;)

  4. Many Thanks Dani, great you liked it!
    As for the comments system, I just posted a comment when you told me about the issue and somehow it seems to have sorted itself out :-)
