Sunday 7 July 2013

Modify while iterating

Modifying (or trying to modify) a collection while iterating (enumerating, traversing) over it is one of those common source of errors that even if you're completely aware of it will probably continue to slip in your code once in a while, even more when there are differences between languages and platforms. So I thought it would be good to do a fast recap here.

Arrays and classic loops

Well, this is pretty clear and simple. Regardless whether you're writing C#, Java or JavaScript, if you're in the middle of a normal for loop or a while and decide to remove the current item, you'll have to be careful with the indexer or you'll end up skipping elements. The approach I always follow is iterating backwards:

for (var i=cities.length-1; i>= 0; i--){
 if (condition(cities[i]))
  cities.splice(i, 1);
}

other people prefer to stick to the forward iteration and carefully update the indexer when a removal is done:

for (var i=0; i≶cities.length; i++){
 if (condition(cities[i])){
  cities.splice(i, 1);
  i--;
 }

.Net Enumerators

If you're iterating over any of the standard .Net collections with a foreach loop and you try to Add or Remove items to it (or even update a Dictionary so that an existing entry points to a different Object), you'll end up getting an InvalidOperationException.
How does this happen?
.Net's iteration paradigm is based on the IEnumerable<T> and IEnumerator<T> interfaces. Whenever you use a foreach loop, the compiler is creating an Enumerator and using its Current property and MoveNext method. If you decompile the Framework assemblies with the wonderful ILSpy and check for example the source for List.RemoveAt and List.Enumerator.MoveNext you'll see that List has an internal numeric field: _version, that is incremented each time the List is modified. When an Enumerator is created over such collection, it saves the value for the collection's _version at that point, and in each MoveNext it compares the saved _version with the current _version in the collection, if they don't match it means that the collection has been modified and the Enumerator will throw an InvalidOperationException. This logic applies to all (I think) Collections in the BCL.

Depending on your collection (if it has a numeric indexer) you could switch to a classic loop and do as explained above, but you better stick to the foreach iteration, create a list of the items to be removed, and once the iteration is over, do a Remove of those items.

Removing while iterating a Dictionary deserves some extra attention. You'll run into the same "don't modify while enumerating" issue both with this:
foreach(KeyValuePair≶K, V> pair in myDictionary)
and with this:
foreach(K key in myDictionary.Keys())
cause the enumerator that you obtain in the second case is also watching the _version field in myDictionary.

This leads us to another strategy for removing from inside a foreach: iterating over a copy of the collection (that simple as using ToList()) but removing from the original collection. That is:

foreach (string st in myList.ToList())
{
if (condition)
myList.Remove(...);
}

foreach(string key in myDictionary.Keys().ToList())
{
if (condition)
myDictionary.Remove(...);
}

Java Iterators

Java's equivalent to C#'s foreach loop, IEnumerable and IEnumerator is the "enhanced for loop" (for (Type t : collection), Iterable<T> and Iterator<T>, and it all works right the same as in its .Net counterpart. If you try to remove an element from the collection from inside a "for :" you'll get an UnsupportedOperationException. The huge difference is that Iterators have a Remove method, which allows you to Remove while iterating if you're using a loop type other than "for :" (you don't get a reference to the iterator created by the enhanced for loop, so you can't use it directly there).

Iterator<Integer> iter = l.iterator();
while (iter.hasNext()) {
    if (condition(iter.next())) {
        iter.remove();
    }
}

JavaScript for in, and Array.forEach

In accordance with its tradition of always being slightly unlike the others, JavaScript behaves a bit differently here. JavaScript objects are sort of dictionaries, so what would happen if you try to remove a property from an object while iterating those properties with a for in?

for (var key in person){
  if (condition(person[key]))
   delete person[key];
}

well, it works without a problem. And what if we try to remove from an Array being iterated with the modern Array.forEach?

cities.forEach(function(city, index, arr){
 if (condition(city))
  arr.splice(index, 1);
});

Again, it works nicely.

No comments:

Post a Comment