Sunday 1 September 2013

Some Notes on Concurrency

My last assignment at work compelled me to do a fast and pleasant refresh of my very basic and rusty knowledge of concurrent programming, so there are some thoughts I'd like to write about, and even though they're rather unconnected, I'm not going to refrain from throwing them into a messy post...

A basic understanding of concurrency (threads, locks...) should be fundamental for any software developer, and now that multi-core processors are everywhere, such knowledge is more important than ever. However, as more and more of our time as developers is spent writing code in a thread-unaware language like JavaScript, we are less and less exposed to concurrency constructs. In a way, we could say this is a time of concurrency-ready hardware and concurrency-unready developers...

A multi-core reality

The omnipresence of multi-core processors not only increases the number of situations where using multiple threads would be beneficial (think of CPU bound scenarios), but also adds new elements to account for:
  • Spinlocks. I was totally unaware of this synchronization construct until I realized it had been added to .NET 4.0.

    In software engineering, a spinlock is a lock which causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking if the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of busy waiting.

    What should come to mind after reading that is that a thread doing a busy-wait to avoid a context switch only makes sense on a multi-core processor (or a multi-processor machine). I'll give an example. Let's say Thread1 acquires a spinlock and gets scheduled out before completing its operation. Then Thread2 tries to acquire the spinlock and gets blocked there doing a busy wait. As there's only one core, and this busy wait is using it, Thread1 cannot be scheduled to complete its action and release the spinlock until Thread2's quantum finishes and a context switch happens. So on a single core a spinlock is completely counterproductive. You can confirm here that what I'm saying is correct.

  • You should be aware that the same instruction can run at exactly the same moment on 2 different cores. The other day I found a bug in some code because, in order to generate a pseudo-random string (that would be used for creating a named pipe), the current date down to the millisecond was being used (something like yyyyMMddHHmmssFFF). Well, on occasion, under heavy load testing, a thread would fail to create the named pipe because another thread had already created it. This means that the 2 threads had run the instruction MyDateToStringFormatter(DateTime.Now); on their respective cores at just the same millisecond!!!
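To make that bug concrete, here's a minimal sketch (MyDateToStringFormatter is replaced by the format string directly, and the pipe_ prefix is made up for the example) contrasting a timestamp-based name with a collision-free Guid-based alternative:

```csharp
using System;
using System.Threading.Tasks;

class TimestampNames
{
    // Fragile: two threads can observe the very same millisecond
    public static string TimestampName()
    {
        return "pipe_" + DateTime.Now.ToString("yyyyMMddHHmmssFFF");
    }

    // A name that can't collide, no matter how many cores run this at once
    public static string GuidName()
    {
        return "pipe_" + Guid.NewGuid().ToString("N");
    }

    static void Main()
    {
        // Two tasks may well produce the same timestamp-based name...
        Task<string> t1 = Task.Run(() => TimestampName());
        Task<string> t2 = Task.Run(() => TimestampName());
        Console.WriteLine(t1.Result == t2.Result ? "collision!" : "no collision this time");

        // ...while Guid-based names are always distinct
        Console.WriteLine(GuidName() == GuidName()); // False
    }
}
```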

Atomicity, Thread Safety, Concurrent Collections

Another essential point we need to take into account when working on multithreaded applications is the atomicity of our operations. For example, it should be clear that in a multithreaded environment with several threads reading/writing from/to a normal Dictionary, this operation is unsafe:

if (myDictionary.ContainsKey("myKey"))
    myVar = myDictionary["myKey"];

because the first instruction could return true, but before we run the second instruction another thread could remove that key (and obviously on multi-core systems the chances increase).

Fortunately, .NET 4 introduced a whole set of thread-safe collections, the Concurrent Collections, which means we can easily fix that problematic code by using a ConcurrentDictionary this way:

myDictionary.TryGetValue("myKey", out myVar);
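Fleshing that one-liner out a bit, here's a self-contained sketch (the dict variable and sample keys are mine) of the two atomic patterns that replace check-then-act code on a ConcurrentDictionary: TryGetValue and GetOrAdd.

```csharp
using System;
using System.Collections.Concurrent;

class ConcurrentDictionaryDemo
{
    static void Main()
    {
        var dict = new ConcurrentDictionary<string, int>();
        dict["myKey"] = 42;

        // Atomic check-and-read: no other thread can sneak a Remove in
        // between the "contains" test and the read
        int myVar;
        if (dict.TryGetValue("myKey", out myVar))
            Console.WriteLine(myVar); // 42

        // Atomic check-and-insert, the safe counterpart of
        // "if (!ContainsKey(key)) dict[key] = ...;"
        int created = dict.GetOrAdd("otherKey", key => key.Length);
        Console.WriteLine(created); // 8
    }
}
```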

But there are cases that are not so obvious. For example, is there any problem if different threads running in parallel add elements to a normal List? An apparently innocent myList.Add(item);? That Add call is far from atomic. Adding an element to a list involves checking the size of the list and resizing it if necessary, so thread1 could be resizing the list, and before it has time to set its new size, thread2 could run its own Add and start a new resize... It's a common question on Stack Overflow.
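A quick way to see it for yourself (the 100,000-iteration figure is arbitrary): hammer a plain List from Parallel.For, which typically loses items or even throws, and compare with the classic lock-based fix:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ListRace
{
    static void Main()
    {
        // Unsynchronized adds: when two threads resize the backing array
        // at the same time, items get lost or an exception is thrown
        var unsafeList = new List<int>();
        try
        {
            Parallel.For(0, 100000, i => unsafeList.Add(i));
            Console.WriteLine("unsafe count: " + unsafeList.Count + " (often less than 100000)");
        }
        catch (Exception e)
        {
            Console.WriteLine("unsafe adds blew up: " + e.GetType().Name);
        }

        // The classic fix: serialize the Add calls with a lock
        var safeList = new List<int>();
        var gate = new object();
        Parallel.For(0, 100000, i => { lock (gate) safeList.Add(i); });
        Console.WriteLine("safe count: " + safeList.Count); // 100000
    }
}
```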

With this in mind, you set out to use a ConcurrentList, but get slapped by the fact that no such collection exists. Well, pondering over it, one realizes that such a collection would make little sense [good discussion here]. If several threads can be reading/writing from/to the List, you won't want to access its elements by index, as what you inserted as element 3 could now be at index 5 due to other threads' work. So maybe what you really need is a ConcurrentQueue or a ConcurrentStack, or maybe you just want some sort of container where you can insert/remove items in a thread-safe fashion and apply typical LINQ to Objects operations... In that case, a quick look at the collections available in the System.Collections.Concurrent namespace gives us the solution: ConcurrentBag. The fact that it's optimized for situations where the same thread is both producing and consuming items from the collection should not mislead you, you can use it in other concurrent scenarios without a problem (you can read more here and here).
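For completeness, a minimal ConcurrentBag sketch (item counts chosen arbitrarily) showing concurrent adds, LINQ to Objects over the bag, and TryTake, which removes some item in no guaranteed order:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class BagDemo
{
    static void Main()
    {
        var bag = new ConcurrentBag<int>();

        // Many threads adding at once: no lost updates, no locks in our code
        Parallel.For(0, 1000, i => bag.Add(i));
        Console.WriteLine(bag.Count); // 1000

        // LINQ to Objects works on the bag like on any IEnumerable<T>
        Console.WriteLine(bag.Sum()); // 499500

        // TryTake removes *some* item, in no particular order
        int item;
        if (bag.TryTake(out item))
            Console.WriteLine("took " + item + ", " + bag.Count + " left"); // 999 left
    }
}
```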

As I already noted here, enumerating a Concurrent Collection is safe, I mean, if another thread modifies the collection while your thread is iterating over it, you won't get an InvalidOperationException on your next MoveNext. But something I've found that really caught my attention is that while the Enumerator returned from a ConcurrentDictionary enumerates over the "live" Dictionary:

The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called.

the Enumerator returned from a ConcurrentBag iterates over a snapshot of the Bag:

The enumeration represents a moment-in-time snapshot of the contents of the bag. It does not reflect any updates to the collection after GetEnumerator was called. The enumerator is safe to use concurrently with reads from and writes to the bag.
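Both behaviours can be checked with a contrived single-threaded sketch (the 1000 key and the 99 value are arbitrary) that mutates each collection from inside its own foreach:

```csharp
using System;
using System.Collections.Concurrent;

class EnumerationSemantics
{
    static void Main()
    {
        // ConcurrentBag: GetEnumerator takes a snapshot up front
        var bag = new ConcurrentBag<int>();
        bag.Add(1);
        int bagSeen = 0;
        foreach (int item in bag)
        {
            bag.Add(99); // no InvalidOperationException...
            bagSeen++;
        }
        Console.WriteLine(bagSeen); // 1: the item added mid-loop is not seen

        // ConcurrentDictionary: the enumerator walks the live dictionary,
        // so a key added mid-enumeration may (or may not) show up
        var dict = new ConcurrentDictionary<int, int>();
        dict[1] = 1;
        int dictSeen = 0;
        bool added = false;
        foreach (var kv in dict)
        {
            if (!added) { dict.TryAdd(1000, 0); added = true; } // safe here too
            dictSeen++;
        }
        Console.WriteLine("dictionary enumeration saw " + dictSeen + " entries");
    }
}
```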

Regarding atomicity, I'll add that it does not just refer to whether we have 1 or 2 C#/Java instructions; it comes down to the machine-level instructions. That's why we have the Interlocked class, and furthermore, that's why this class features a Read method (needed for safely reading 64-bit values on 32-bit systems; otherwise thread1 could read the first 32 bits, and thread2 could overwrite the second block of 32 bits before thread1 had read it, obtaining then a "mixed" value).
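As a small illustration (single-threaded, so it only shows the API, not an actual torn read), Interlocked covers both the atomic 64-bit read and the atomic read-modify-write operations:

```csharp
using System;
using System.Threading;

class TornReads
{
    static long counter;

    static void Main()
    {
        // On a 32-bit CPU a plain read of a long is two 32-bit loads,
        // so a concurrent write could be observed half-applied ("torn")
        for (int i = 0; i < 5; i++)
            Interlocked.Increment(ref counter); // atomic read-add-write

        long value = Interlocked.Read(ref counter); // atomic 64-bit read
        Console.WriteLine(value); // 5

        // counter++ is really a read, an add and a write; Interlocked.Add
        // collapses them into one atomic operation
        Interlocked.Add(ref counter, 10);
        Console.WriteLine(Interlocked.Read(ref counter)); // 15
    }
}
```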
