Monday, 26 June 2017

Local Functions and Async

Following my previous post on Local Functions and Iterator methods, it's time to see what I've learnt about async thanks to local functions.

The last part of this article contains a paragraph that was a great finding for me:

This exhibits the same issue as the iterator method. This method doesn’t synchronously throw exceptions, because it is marked with the ‘async’ modifier. Instead, it will return a faulted task. That Task object contains the exception that caused the fault. Calling code will not observe the exception until the Task returned from this method is awaited (or its result is examined).

So, let's compare these 2 methods:

using System;
using System.Threading;
using System.Threading.Tasks;

public static class ProcessingService3
{
    // Not async: the validation exception is thrown synchronously to the caller.
    public static Task<string> DoProcess1(string st)
    {
        Console.WriteLine("DoProcess1 method started in: " + Thread.CurrentThread.ManagedThreadId);
        if (st.StartsWith("throw"))
        {
            throw new Exception("Can not process invalid string");
        }
        Console.WriteLine("string is valid to be processed");

        var t1 = Task.Run(() =>
        {
            Console.WriteLine("Time consuming operation running in: " + Thread.CurrentThread.ManagedThreadId);
            Thread.Sleep(1000);
            return st.ToUpper();
        });
        return t1;
    }

    // Async: the same exception ends up stored in the returned (faulted) Task.
    public static async Task<string> DoProcess2(string st)
    {
        Console.WriteLine("DoProcess2 method started in: " + Thread.CurrentThread.ManagedThreadId);
        if (st.StartsWith("throw"))
        {
            throw new Exception("Can not process invalid string");
        }
        Console.WriteLine("string is valid to be processed");

        var res = await Task.Run(() =>
        {
            Console.WriteLine("Time consuming operation running in: " + Thread.CurrentThread.ManagedThreadId);
            Thread.Sleep(1000);
            return st.ToUpper();
        });
        return res;
    }
}

Both methods are basically the same, but the second one uses await, and hence we've had to decorate it with the async keyword. In both cases the first lines of the method (up to Task.Run) are executed by the caller's thread, but while in DoProcess1 the exception is thrown immediately, in DoProcess2 the compiler-generated code captures the exception and it does not surface until the returned Task is awaited or its Result is accessed. Let's see:

Task<string> t1;
try
{
    t1 = ProcessingService3.DoProcess1("throw aaaa");

    // we DON'T reach this line: DoProcess1 throws synchronously
    Console.WriteLine("after Process1 call");
    Console.WriteLine(t1.Result);
}
catch (Exception ex)
{
    Console.WriteLine("Exception: " + ex);
}

Console.WriteLine("----------------");

try
{
    t1 = ProcessingService3.DoProcess2("throw bbbb");

    // we reach this line: DoProcess2 returns a faulted Task instead of throwing
    Console.WriteLine("after Process2 call");
    // the exception surfaces in the line below, when accessing Result (wrapped in an AggregateException)
    Console.WriteLine(t1.Result);
}
catch (Exception ex)
{
    Console.WriteLine("Exception: " + ex);
}
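The local-function pattern from the post on iterators fits here as well. A minimal sketch, assuming we want DoProcess1's eager validation while still being able to use await: the public method is not marked async, so the check throws straight to the caller, and only the awaiting part lives in an async local function. DoProcess3, processAsync and the class name are hypothetical, and ArgumentException is used instead of the plain Exception of the original methods.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ProcessingServiceWithLocalFunction
{
    // Hypothetical DoProcess3: same work as DoProcess2, but validation is eager.
    public static Task<string> DoProcess3(string st)
    {
        // This method is NOT async, so this exception is thrown synchronously
        // to the caller, exactly like in DoProcess1.
        if (st.StartsWith("throw"))
        {
            throw new ArgumentException("Can not process invalid string", nameof(st));
        }

        // Only the asynchronous part goes through the compiler-generated state machine.
        return processAsync();

        async Task<string> processAsync()
        {
            var res = await Task.Run(() =>
            {
                Console.WriteLine("Time consuming operation running in: " + Thread.CurrentThread.ManagedThreadId);
                Thread.Sleep(1000);
                return st.ToUpper();
            });
            return res;
        }
    }
}
```

With this shape, a call like DoProcess3("throw cccc") fails inside the try block right away, as DoProcess1 does, while valid input still gets the await-based implementation.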

This is interesting. In DoProcess1, zero or one Task objects are created (depending on whether we throw before reaching Task.Run). In a method marked as async, like DoProcess2, the compiler creates at least one "main" Task object. This main Task is always created; it is not tied to any particular thread, but holds the Result or the Exception of the method and is returned to its caller. If DoProcess2 does not throw, it reaches Task.Run, and the compiler-generated code attaches a continuation (think of it as a ContinueWith) to that second Task, and so on for every "await" in the async method. The last continuation sets the Result of the main Task created at the beginning of the call. If an exception happens at any point, it is stored in the main Task and no more code of the async method is run.

OK, the explanation above is confusing and not really accurate, but it helps me to get an approximate idea of what is going on in a method marked as async. If you really want to understand it, check this amazing post.

As you can see in that glorious article, for each method marked as async the compiler creates a type (a class in Debug builds, a struct in Release builds) that is basically a State Machine. This State Machine has a MoveNext method that orchestrates the sequence of asynchronous calls. Each await in the async method is like a step, where the awaited Task is asked to call back into MoveNext once it's done (conceptually a ContinueWith, although the generated code actually registers that callback through the Task's awaiter).
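To make the idea concrete, here is a heavily simplified, hand-written approximation of a state machine for DoProcess2. It is not what the compiler emits: the real generated code is driven by AsyncTaskMethodBuilder<TResult>, registers its callback through the awaiter rather than ContinueWith, and skips the callback when the awaited Task has already finished. In this sketch a TaskCompletionSource<string> plays the role of the "main" Task, and the class name is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for the compiler-generated state machine of DoProcess2.
public sealed class DoProcess2StateMachine
{
    private readonly string _st;
    // Plays the role of the "main" Task returned to the caller.
    private readonly TaskCompletionSource<string> _mainTask = new TaskCompletionSource<string>();
    private Task<string> _awaitedTask;
    private int _state; // 0 = start, 1 = resumed after the await

    private DoProcess2StateMachine(string st) => _st = st;

    public static Task<string> Start(string st)
    {
        var stateMachine = new DoProcess2StateMachine(st);
        // The first step runs synchronously, on the caller's thread.
        stateMachine.MoveNext();
        return stateMachine._mainTask.Task;
    }

    private void MoveNext()
    {
        try
        {
            if (_state == 0)
            {
                // Code before the first await.
                if (_st.StartsWith("throw"))
                {
                    throw new Exception("Can not process invalid string");
                }
                _awaitedTask = Task.Run(() =>
                {
                    Thread.Sleep(1000);
                    return _st.ToUpper();
                });
                _state = 1;
                // "await": ask the Task to call back into MoveNext once it's done.
                _awaitedTask.ContinueWith(_ => MoveNext());
                return;
            }

            // State 1: code after the await. GetResult rethrows the original
            // exception if the awaited Task faulted.
            string res = _awaitedTask.GetAwaiter().GetResult();
            _mainTask.SetResult(res);
        }
        catch (Exception ex)
        {
            // Any exception is stored in the main Task instead of reaching the caller.
            _mainTask.SetException(ex);
        }
    }
}
```

Calling Start("throw bbbb") returns an already-faulted Task without throwing, which matches the behaviour observed with DoProcess2.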

It's amazing to see how similar this compiler-generated State Machine is to the one generated for an Iterator method. In the end, both things are basically the same; the big difference is that with iterators it is the consumer of the iterator who calls MoveNext, while with async/await it is the awaited asynchronous operation that, once finished, calls MoveNext. Just as a reminder, in ES6 (while waiting for the await keyword, which finally arrived in ES2017) people have been leveraging generators to sort of simulate await and avoid callback hell.

Saturday, 24 June 2017

Local Functions and Iterators

In the end I find the introduction of Local Functions in C# 7 really useful, as I have learnt a couple of things reading about their uses. Thanks to them I've refreshed and improved my understanding of how the compiler manages Iterators and Async methods. As explained here, in both cases local functions provide a useful pattern to manage validation. This post will focus on Iterators.

Since their inception I've had a more or less clear view of the magic the compiler uses when it comes across an Iterator method. It creates a class implementing both IEnumerable and IEnumerator and moves the code of the iterator method into the MoveNext method of that class (which acts as a sort of State Machine), replacing the "yield" keyword (which does not exist at the MSIL level) with assignments to the "Current" backing field. No need to go into more detail when you have a perfect explanation here.
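A rough, hand-written sketch of that idea for a tiny iterator method that yields three names (the class name is hypothetical). The real compiler-generated class differs in details; for example, as far as I know its GetEnumerator only hands back the same instance the first time it is called on the creating thread, and returns a fresh copy otherwise.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written equivalent of:
//   IEnumerable<string> Names() { yield return "xuan"; yield return "xana"; yield return "iyan"; }
public sealed class NamesIterator : IEnumerable<string>, IEnumerator<string>
{
    // 0 = not started, 1..2 = suspended after a "yield return", -1 = finished
    private int _state;

    // Each "yield return x" becomes an assignment to Current plus a state change.
    public string Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:
                // Everything before the first yield would run here, on the first MoveNext call.
                Current = "xuan";
                _state = 1;
                return true;
            case 1:
                Current = "xana";
                _state = 2;
                return true;
            case 2:
                Current = "iyan";
                _state = -1;
                return true;
            default:
                return false;
        }
    }

    // Compiler-generated iterators also throw here.
    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }

    // Simplification: always returns this same instance.
    public IEnumerator<string> GetEnumerator() => this;
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```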

One might think that, when we have code like this:


public IEnumerable<string> GetMainDocuments(string url)
{
    if (url == null)
    {
        throw new Exception("Empty!");
    }
    for (var i = 0; i < 5; i++)
    {
        yield return downloader.download(url + "/item/" + i);
    }
}

The compiler could put the code that goes before the loop in a separate method, not in MoveNext: a method executed before we start iterating with MoveNext, invoked from the constructor of the state machine, for example. This would make sense for certain cases of validation code, where we would want an immediate crash rather than a delayed one when we decide to iterate. But we have to be aware that one of the foundations of iterators is "lazy evaluation/deferred execution". In a way they are a bit like "promises": we have a "promise of a sequence", but each of its items does not materialize until we ask for it in the next iteration step. Keep this in mind, and think that there are many cases where the "pre-iteration" code should run just when we start to iterate: maybe it returns values used during the iteration, or maybe it's still a validation, but one that depends on dynamic factors and has to be done right when iteration starts, not before. As the compiler cannot differentiate those cases, absolutely all the code in your iterator method is put inside the MoveNext of the state machine.
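A quick way to check this, as a minimal sketch: GetItems is a hypothetical stand-in for GetMainDocuments without the downloader. Calling it with null does not throw; the null check only runs on the first MoveNext.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredValidationDemo
{
    // Iterator method: its whole body, null check included, is moved into MoveNext.
    public static IEnumerable<string> GetItems(string url)
    {
        if (url == null)
        {
            throw new ArgumentNullException(nameof(url));
        }
        for (var i = 0; i < 5; i++)
        {
            yield return url + "/item/" + i;
        }
    }

    // Returns true if the exception only appears once iteration starts.
    public static bool ThrowsOnlyWhenIterated()
    {
        // No exception here: we only get the state machine object back.
        IEnumerable<string> items = GetItems(null);
        using (IEnumerator<string> enumerator = items.GetEnumerator())
        {
            try
            {
                enumerator.MoveNext(); // the null check runs now
                return false;
            }
            catch (ArgumentNullException)
            {
                return true;
            }
        }
    }
}
```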

For example, consider this case:

public IEnumerable<string> GetMainDocuments(string pattern, DBHelper dbHelper)
{
    if (pattern == null)
    {
        throw new Exception("Empty!");
    }

    if (!dbHelper.IsDbUp())
    {
        throw new Exception("DB down!!!");
    }

    IEnumerable<string> documents = dbHelper.GetDocuments("whatever query " + pattern + " whatever");

    foreach (var document in documents)
    {
        yield return document;
    }
}

For the first conditional it would be useful to run it in a non-deferred way, but for the second one it makes more sense to run it just when data start to be needed. And for the third, again it makes more sense to run it non-deferred (or maybe you wanted a snapshot at the moment when GetMainDocuments was called?). Yes, there are many possibilities, and the compiler cannot just guess, so we have to adapt to its rules. Anything inside the iterator will be deferred, so if there is code that you don't want to be lazy, you'll have to separate it yourself:

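public IEnumerable<string> GetMainDocuments(string url)
{
    // The iterator lives in a local function, so only its body is deferred.
    IEnumerable<string> getDocuments(string validUrl)
    {
        for (var i = 0; i < 5; i++)
        {
            yield return downloader.download(validUrl + "/item/" + i);
        }
    }

    // GetMainDocuments itself is not an iterator, so this check runs immediately.
    if (url == null)
    {
        throw new Exception("Empty!");
    }
    return getDocuments(url);
}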
 

I have no idea how the different JavaScript engines implement generators, but the issue with deferred execution is the same: none of the code in your generator function is executed until you start to iterate the generator object it creates.


function* nameGenerator(){
 //this console.log won't be executed until the first call to "next"
 console.log("inside iteration");
 yield "xuan";
 yield "xana";
 yield "iyan";
};

let gen = nameGenerator();

console.log("generator created\n");

console.log("starting loop");
for(let name of gen){
 console.log(name);
} 

//output:

// generator created

// starting loop
// inside iteration
// xuan
// xana
// iyan

Wednesday, 21 June 2017

A Distaste For Expression Bodied Members

I really like to see programming languages evolve, getting new features version after version. Even when some of those features are not particularly useful or revolutionary, but just a way to make code less verbose, I appreciate them, as I like evolution and dynamism, and being forced to learn something new. Microsoft has put quite a bit of effort into saving us from typing repetitive code and making our code more concise. The introduction of Automatic properties was a huge step, and the improvements in the latest versions (default values for Automatic properties, read-only auto properties, and both features combined) are pretty sweet.


//Automatic property with default value
public string FavoriteCity {get; set;} = "Paris";

//Read Only Automatic property (I'll be setting its value in the constructor)
public string BirthCity {get;}
 
//Read Only Automatic property with default value
public DateTime BirthDate {get;} = DateTime.Now;
  

Another "big feature" in terms of saving keystrokes has been Expression Bodied Members. Honestly, this feature feels a bit irritating to me.

So now you can write a method like this:

public string FormatString(string st) => $"---{st}---";

//rather than like this:
public string FormatString(string st)
{
    return $"---{st}---";
}

I don't find it particularly helpful, but I guess others will find it cool. The problem for me comes when we use it with properties. Let's see:



public string FrenchFullName => $"{this.LastName.ToUpper()}, {this.Name}";


//is equivalent to:
public string FrenchFullName 
{
 get{ return $"{this.LastName.ToUpper()}, {this.Name}";}

}

//And here an identical method

public string GetFrenchFullName() => $"{this.LastName.ToUpper()}, {this.Name}";

So in the first line I have a property and in the last line a method. At first sight it's not easy to tell whether it's a method or a property; the only difference is the extra "()". If you follow the right naming conventions (properties are nouns, methods are verbs) there's no room for confusion, but even so, I feel a bit uneasy with this similarity.

Furthermore, this syntax can also be used for local functions (added in C# 7). So you can write a local function in either of these 2 ways:

public static void InternalFunctionTest()
{
    string applyFormat(int num)
    {
        return $"[{num.ToString()}]";
    }

    string applyFormat1(int num) => $"[{num.ToString()}]";
}

The second way looks quite similar to what we would have done in the past to avoid polluting the class with a function used only from a certain method: write it as a lambda:

Func<int, string> applyFormat2 = (int num) => $"[{num.ToString()}]";

Under the covers, this is quite different from the expression-bodied local function. In this case we are creating a new object, a delegate, while with the local function no such object is needed. There are more differences in terms of where the compiler creates the underlying method and how closures are handled, but I've read some contradictory information about this, so I'll have to spend some time to get the complete picture.
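A small sketch of that difference (the class and method names are hypothetical): the lambda has to be stored in a Func<int, string> delegate before it can be called, while the local function is called directly, like any other method. A delegate for a local function only appears if we explicitly convert the local function to one.

```csharp
using System;

public static class LocalFunctionVsLambda
{
    public static string FormatAllWays(int num)
    {
        // Local function: called directly, no delegate object involved.
        string applyFormat1(int n) => $"[{n}]";

        // Lambda: invoked through a Func<int, string> delegate object.
        Func<int, string> applyFormat2 = n => $"[{n}]";

        // A local function can still be turned into a delegate, but only on request.
        Func<int, string> applyFormat3 = applyFormat1;

        return applyFormat1(num) + applyFormat2(num) + applyFormat3(num);
    }
}
```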

The "=>" syntax has been a pretty bad choice in my opinion, and is probably the source of my distaste for Expression Bodied Members. It's linked in my mind to delegates, so it takes my brain some extra cycles to break that association each time I see this symbol.

Saturday, 17 June 2017

Current Directory

The Current Directory (or Working Directory, or Current Working Directory) of a process is one of those topics that I seem to clearly understand, but after a while I partially forget and have to strive to rebuild it in my head, so I'll dump it here after this last rebuild. I'm talking only about Windows; I guess the behaviour in Linux and the different UI shells will be similar.

Every Windows process has an associated Current Directory. This is the folder the process uses as the base for any relative path it tries to access. You can see it with Process Explorer. So, how does a process get its Current Directory assigned?

The CreateProcess Win32 API function has a parameter (lpCurrentDirectory) to set the Current Directory. If it is NULL, the new process inherits the Current Directory of the parent process. Higher-level mechanisms used to create processes (which ultimately make use of CreateProcess) usually give you the option to provide the Current Directory (like ProcessStartInfo in .NET).
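From .NET, a minimal sketch could look like this. The class and method names are hypothetical, and the non-Windows "/bin/sh -c pwd" branch is only there so the sketch also runs outside Windows: the child shell is given WorkingDirectory as its Current Directory and simply prints it back.

```csharp
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class WorkingDirectoryLauncher
{
    // Starts a child shell whose Current Directory is 'workingDirectory' and
    // returns what the child reports as its Current Directory.
    public static string ReportChildCurrentDirectory(string workingDirectory)
    {
        bool isWindows = RuntimeInformation.IsOSPlatform(OSPlatform.Windows);
        // On Windows "cd" with no arguments prints the Current Directory;
        // "pwd" is the non-Windows fallback.
        var startInfo = isWindows
            ? new ProcessStartInfo("cmd.exe", "/c cd")
            : new ProcessStartInfo("/bin/sh", "-c pwd");

        // With UseShellExecute = false, WorkingDirectory becomes the child's Current Directory.
        startInfo.WorkingDirectory = workingDirectory;
        startInfo.UseShellExecute = false;
        startInfo.RedirectStandardOutput = true;

        using (var process = Process.Start(startInfo))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output.Trim();
        }
    }
}
```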

OK, the above is good to know when we are writing a program that launches another process, but how have the writers of common OS components handled this? That is, how does this work when a program is started from the command line or from Windows Explorer?

The Windows command line. The Current Directory of cmd.exe is the folder you are currently in (the one shown in the prompt). If you do a cd, the Current Directory gets updated accordingly. When you start a process from the command line, the process inherits the Current Directory from cmd. This means that if you do this:

c:\Temp>cd c:\myAppFolder
c:\myAppFolder> myApp.exe

myApp.exe will have c:\myAppFolder as its working directory.

But if you do this:

c:\Temp>c:\myAppFolder\myApp.exe

myApp.exe will have c:\Temp as its working directory.

Windows Explorer. There are several ways to launch a process from Windows Explorer (explorer.exe). Browsing to the folder containing the .exe file and double-clicking on it, or typing the path to the .exe in the Windows->Run window, have the same effect: Explorer sets the Current Directory to the folder where the .exe file is located.

Another way to launch a process from Windows Explorer is through a shortcut. If you check the properties of a shortcut you'll see a Start in field. If filled, that value is used as the Current Directory. If empty, Explorer sets the program's Current Directory to the folder where the shortcut is located (quite commonly the Desktop).

A process can easily change its working directory while running, by using the SetCurrentDirectory Windows API function. From .Net you can use the Directory.SetCurrentDirectory method.
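A minimal sketch of what that means for relative paths. The file name settings.json and the class name are hypothetical, and the original Current Directory is restored at the end: after Directory.SetCurrentDirectory, Path.GetFullPath resolves the same relative path against the new folder.

```csharp
using System.IO;

public static class CurrentDirectoryDemo
{
    // Returns the absolute path that "settings.json" resolves to before and after
    // changing this process's Current Directory.
    public static (string before, string after) ResolveBeforeAndAfter(string newDirectory)
    {
        string original = Directory.GetCurrentDirectory();
        try
        {
            string before = Path.GetFullPath("settings.json");
            Directory.SetCurrentDirectory(newDirectory);
            string after = Path.GetFullPath("settings.json");
            return (before, after);
        }
        finally
        {
            // Restore the original Current Directory so the rest of the process is unaffected.
            Directory.SetCurrentDirectory(original);
        }
    }
}
```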

As far as I know there is no public API for changing the Current Directory of another process. The option I've seen is pretty hackish: injecting a DLL into that process and invoking SetCurrentDirectory from its entry point.

As far as I know, Windows Services get C:\Windows\System32\ as their Current Directory by default, I guess because they inherit it from services.exe.

Finally, there's an interesting quirk to bear in mind with .NET applications. The standard configuration system (the old-school System.Configuration.ConfigurationManager, prior to the much more advanced Microsoft.Extensions.Configuration introduced with .NET Core) is pretty smart and does not use the Current Directory to locate the myApp.exe.config file that sits next to myApp.exe: it gets the folder containing our binary and looks for the config file there. So, in terms of configuration, you don't need to care about the Current Directory or about whether your app is started from Explorer, from the command line or from another process. If you switch to a different kind of config file, also placed next to your binary, you'll have to consider that the Current Directory can have different values and is not a reliable way to locate the file.
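A minimal sketch of the reliable alternative, assuming a hypothetical settings file named mySettings.json that is deployed next to the binary: build its path from AppContext.BaseDirectory (normally the folder the application was loaded from) instead of letting a relative path be resolved against the Current Directory.

```csharp
using System;
using System.IO;

public static class SettingsLocator
{
    // Hypothetical file name, assumed to be copied next to the application binary.
    private const string SettingsFileName = "mySettings.json";

    // Depends on how the app was launched: resolved against the Current Directory.
    public static string PathRelativeToCurrentDirectory() => Path.GetFullPath(SettingsFileName);

    // Does not depend on how the app was launched: resolved against the app's base directory.
    public static string PathNextToBinary() => Path.Combine(AppContext.BaseDirectory, SettingsFileName);
}
```

Started from Explorer, the two methods usually agree; started as c:\Temp>c:\myAppFolder\myApp.exe, only PathNextToBinary still points into c:\myAppFolder.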