Sunday 25 October 2015

Thread Return Value

Since the introduction of the TPL there is, in most cases, no reason to create threads directly via the Thread class; we should just use Tasks via one of the overloads of Task.Run. Tasks are a higher-level abstraction over low-level threads that makes use of the ThreadPool, and whenever possible we should prefer abstractions. Even so, I still occasionally create Threads directly, just out of habit. The other day I noticed an interesting difference between the two approaches that leads to quite different code in each case.

Mainly, threads started via the Thread class do not return a result. You create your Thread, start it with the Start method and at some point the thread ends, but there is no built-in place from which to get the return value of that code (that's why the Thread constructor only accepts delegates that return no value). Obviously you could work around this by storing the value in some global place (don't do that), in a property of some object passed as a parameter to the thread, or by wrapping the Thread in a class of your own that adds a Result property (Thread is sealed, so you cannot derive from it)...
Another possibility would have been for Thread.Join to return the value (this is what happens in the horrific thread API provided by Perl), but Thread.Join() returns void, and the overloads that take a timeout only return a bool saying whether the thread has finished.
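As a rough sketch of the "object passed as parameter" workaround mentioned above: ResultHolder and RunOnThread are hypothetical names made up for this example, not part of the framework, and the work being run is just a trivial multiplication.

```csharp
using System;
using System.Threading;

// Hypothetical holder type (not part of the framework): just a place to store the result.
class ResultHolder<T>
{
    public T Value;
}

static class ThreadResultDemo
{
    // Runs work on a dedicated Thread and gets its result back through the holder.
    public static T RunOnThread<T>(Func<T> work)
    {
        var holder = new ResultHolder<T>();
        // A one-parameter lambda binds to ParameterizedThreadStart, which still returns nothing.
        var th = new Thread(state => { ((ResultHolder<T>)state).Value = work(); });
        th.Start(holder);
        th.Join(); // returns void; it only tells us that the thread has finished
        return holder.Value;
    }

    static void Main()
    {
        int answer = RunOnThread(() => 6 * 7);
        Console.WriteLine(answer); // 42
    }
}
```

The holder is only read after Join, so no lock is needed here; with several threads writing to one shared object that would no longer be true.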

Well, having a Result property on the thread is basically what Task<TResult> gives us: a Result property that holds the value returned by the code. Cool. This difference can have an interesting effect on how we write the consuming code, basically letting us avoid locks. Let's see what I mean.
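First, a minimal sketch of the Result property on its own. TaskResultDemo and Compute are names invented for this example, and the work is again a trivial multiplication.

```csharp
using System;
using System.Threading.Tasks;

static class TaskResultDemo
{
    public static int Compute()
    {
        // Task.Run accepts a delegate that returns a value and gives back a Task<int>.
        Task<int> task = Task.Run(() => 6 * 7);
        // Reading Result blocks the caller until the task has completed.
        return task.Result;
    }

    static void Main()
    {
        Console.WriteLine(Compute()); // 42
    }
}
```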

Let's say we have a class that downloads a post. It's an "old school" one that works synchronously: a call to GetPost just blocks until the operation is complete. We are going to download several posts in parallel, and once all of them have finished we want the downloads in a dictionary mapping each url to its post content.

If we use classic Threads, each thread has to run both the downloading code and the code that adds the result to the dictionary. As several threads can access this Dictionary at the same time, we have to synchronize the access with a lock statement (that is, a Monitor).

  private static void ClassicThreads()
  {
      var lockHelper = new Object();
      var downloader = new PostDownloader();
      var threads = new List<Thread>();
      foreach (string url in urls)
      {
          // Since C# 5 the foreach variable is scoped to each iteration, so each closure captures its own url
          var th = new Thread(() =>
          {
              var txt = downloader.GetPost(url);
              Console.WriteLine(url + " downloaded");
              // Dictionary is not thread safe, so concurrent writes must be serialized
              lock (lockHelper)
              {
                  results.Add(url, txt);
              }
          });
          threads.Add(th);
          th.Start();
      }
      // Wait for every thread to finish; polling results.Count from here would read the Dictionary outside the lock
      foreach (var th in threads)
      {
          th.Join();
      }
      Console.WriteLine("All Done!");
      Console.WriteLine("- Results:\n" + resultsToString(results));
  }

If we use Tasks, we can run only the downloading code in the Task/thread, then wait for all the Tasks to complete and fill our dictionary from the main thread by reading each Task.Result property. No synchronization is needed.

  private static void TasksBased()
  {
      var downloader = new PostDownloader();
      var tasks = new List<Task<KeyValuePair<string, string>>>();
      foreach (string url in urls)
      {
          Task<KeyValuePair<string, string>> downloadTask = Task.Run(() => {
              var res = downloader.GetPost(url);
              Console.WriteLine(url + " downloaded");
              return new KeyValuePair<string, string>(url, res);
          });

          tasks.Add(downloadTask);
      }
      Task.WaitAll(tasks.ToArray());

      foreach (var task in tasks)
      {
          results.Add(task.Result.Key, task.Result.Value);
      }

      Console.WriteLine("All Done!");
      Console.WriteLine("- Results:\n" + resultsToString(results));
  }
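The snippets above rely on fields and helpers that are not shown here (urls, results, resultsToString, PostDownloader). For anyone who wants to try the Task-based approach in isolation, here is a self-contained sketch under some assumptions: FakePostDownloader is a hypothetical stand-in that only simulates a blocking download, DownloadAll is a made-up name, and the example.com urls are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for PostDownloader: it only simulates a blocking call.
class FakePostDownloader
{
    public string GetPost(string url)
    {
        Thread.Sleep(100); // pretend to wait for the network
        return "content of " + url;
    }
}

static class DownloadDemo
{
    public static Dictionary<string, string> DownloadAll(IEnumerable<string> urls)
    {
        var downloader = new FakePostDownloader();
        // One task per url; each task only downloads and returns its own (url, content) pair.
        Task<KeyValuePair<string, string>>[] tasks = urls
            .Select(url => Task.Run(() =>
                new KeyValuePair<string, string>(url, downloader.GetPost(url))))
            .ToArray();

        Task[] all = tasks; // array covariance: every Task<T> is also a Task
        Task.WaitAll(all);

        // Only the calling thread touches the dictionary, so no lock is needed.
        return tasks.ToDictionary(t => t.Result.Key, t => t.Result.Value);
    }

    static void Main()
    {
        var results = DownloadAll(new[] { "http://example.com/a", "http://example.com/b" });
        foreach (var pair in results)
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

As in TasksBased, the dictionary is built only after Task.WaitAll returns, so the shared state is never touched from the worker threads.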

This article gives a nice overview of Threads, ThreadPool and Tasks.
