Saturday 19 October 2019

Hot vs Cold Observables

Having read several articles on the Hot vs Cold observables thing, like this one, I thought I had a good understanding of it, but the other day I came across a question on StackOverflow that quite confused me. I ran my own test to verify that it really worked like that.

import { Subject } from 'rxjs';
import { map, tap } from 'rxjs/operators';

const subject1 = new Subject();

const ob1 = subject1.pipe(
    tap(it => console.log("tap: " + it)),
    map(it => {
        console.log("map: " + it);
        return it * 2;
    })
);

ob1.subscribe(it => console.log(it + " received in subscription 1"));
ob1.subscribe(it => console.log(it + " received in subscription 2"));

subject1.next(20);

We get this output:

tap: 20
map: 20
40 received in subscription 1
tap: 20
map: 20
40 received in subscription 2

I would have expected the output to be like this:

tap: 20
map: 20
40 received in subscription 1
40 received in subscription 2

I'd never checked how pipe and the operators are implemented, so I assumed that piping several operators set up a sort of chain: when subscribe runs, the different observables are created and linked. I thought this chain of observables would be created only once, regardless of how many subscriptions we made, so we would have:

subject -> tap Observable -> map Observable -> subscriber 1
                                           |
                                           |-> subscriber 2

Reading this article about creating your own custom operators helped me understand that the above chain is created each time we call subscribe, so what we really get is this:

subject -> tap Observable -> map Observable -> subscriber 1
       |
       |-> tap Observable2 -> map Observable2 -> subscriber 2
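
To convince myself of the diagram above, here is a minimal sketch with hand-rolled stand-ins for the RxJS pieces. MiniObservable, MiniSubject and these tap and map functions are hypothetical names written just for this illustration (they are not the library's real code), but they follow the same idea: each operator returns a new observable whose subscribe function subscribes to its source, so every call to subscribe builds its own chain down to the subject.

```javascript
// Hypothetical, simplified stand-ins for RxJS types; not the library's implementation.
const log = [];

class MiniObservable {
  constructor(subscribeFn) {
    this.subscribeFn = subscribeFn;
  }
  subscribe(next) {
    // Runs the subscribe logic again for every new subscriber
    return this.subscribeFn(next);
  }
  pipe(...operators) {
    // Each operator wraps the previous observable in a new one
    return operators.reduce((source, operator) => operator(source), this);
  }
}

class MiniSubject extends MiniObservable {
  constructor() {
    super(null);
    this.observers = [];
  }
  subscribe(next) {
    // A subject just remembers its observers (this is the "hot" part)
    this.observers.push(next);
  }
  next(value) {
    this.observers.forEach((observer) => observer(value));
  }
}

const tap = (sideEffect) => (source) =>
  new MiniObservable((next) =>
    source.subscribe((value) => {
      sideEffect(value);
      next(value);
    })
  );

const map = (project) => (source) =>
  new MiniObservable((next) => source.subscribe((value) => next(project(value))));

const subject = new MiniSubject();
const ob = subject.pipe(
  tap((it) => log.push("tap: " + it)),
  map((it) => {
    log.push("map: " + it);
    return it * 2;
  })
);

ob.subscribe((it) => log.push(it + " received in subscription 1"));
ob.subscribe((it) => log.push(it + " received in subscription 2"));
subject.next(20);

console.log(log.join("\n"));
console.log("observers registered on the subject: " + subject.observers.length);
```

In this sketch the subject ends up with two observers, one per subscription, and each of them is the head of its own tap -> map chain, which is why the tap and map lines show up twice.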

Sunday 13 October 2019

IEqualityComparer and GetHashCode

I've always found this a bit confusing, but it wasn't until this week that I decided to try to understand the reasons behind it. Some methods in LINQ to Objects (System.Linq.Enumerable) take an IEqualityComparer as a parameter, for example Distinct and Contains. Both methods have an overload that requires no IEqualityComparer. In that case the default EqualityComparer for the type is used, which means that if the class is a reference type that neither implements System.IEquatable nor overrides Object.Equals, we'll end up using reference equality. So using Distinct or Contains when we only care about references pointing to the same object is pretty straightforward.

When we want to compare based on some property of the object (the Id, the Name...) it's obvious that we need to provide the comparison "mechanism". In JavaScript, for functions like find or findIndex we pass a predicate function that receives an element and returns true or false. That's what in principle I would expect to be needed in .NET, a delegate, but no, we are forced to provide an IEqualityComparer, which has an Equals method and also a GetHashCode method. The reason for this is that the comparison logic in Distinct uses not only the Equals method (in that case it could have been designed to just receive a delegate), but also the hash code. It's well explained here. Hash functions are not perfect, they have collisions (the hash of 2 different values can be the same), but we need to make sure that for 2 objects, if IEqualityComparer.Equals is true, GetHashCode returns the same value for both (the reverse does not need to be true). With this, and based on some explanations I've read (I took a quick look at the source code, but got a bit lost), we can think of the Distinct method as being implemented sort of like this:

Each unique element found so far is stored in a hash table, keyed by the value returned by IEqualityComparer.GetHashCode. Now, when checking whether another element is unique, rather than comparing it with every element seen so far, we compute its hash code and look only at the stored elements that share that hash code, calling Equals on those few candidates (much faster than checking element by element). If none of them is equal, the element is new, so we add it to the table and return it; otherwise we skip it.
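
A rough sketch of that idea, written in JavaScript to keep it short. The comparer shape ({ equals, getHashCode }) is a hypothetical mirror of IEqualityComparer, and the distinct function is only my illustration of the hash-bucket approach, not the actual .NET implementation. The hash function is deliberately weak so that collisions happen and Equals has to settle them.

```javascript
// Hypothetical comparer shape mirroring IEqualityComparer: { equals(a, b), getHashCode(x) }
function* distinct(items, comparer) {
  // hash code -> elements already returned that share that hash code
  const buckets = new Map();
  for (const item of items) {
    const hash = comparer.getHashCode(item);
    let bucket = buckets.get(hash);
    if (!bucket) {
      bucket = [];
      buckets.set(hash, bucket);
    }
    // Equals only runs against the few candidates in the same bucket
    if (!bucket.some((seen) => comparer.equals(seen, item))) {
      bucket.push(item);
      yield item;
    }
  }
}

const byId = {
  equals: (a, b) => a.id === b.id,
  // Deliberately weak hash: ids 1 and 11 collide, so equals must tell them apart
  getHashCode: (x) => x.id % 10,
};

const people = [
  { id: 1, name: "Ana" },
  { id: 11, name: "Bea" },
  { id: 1, name: "Ana again" },
  { id: 2, name: "Carl" },
];

const distinctNames = [...distinct(people, byId)].map((p) => p.name);
console.log(distinctNames.join(", "));
```

Note how the rule from above matters here: if two elements that are equal returned different hash codes, they would land in different buckets and both would be returned as unique.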

So GetHashCode is clearly a way to speed up those methods where each element would otherwise have to be compared against many others (like Distinct). However, for methods like Contains, I'm not sure how it comes into play: a single pass calling Equals on each element is enough, so the hash code may not be used at all there. The odd thing is that methods similar to Contains, like First or Any, do not take an IEqualityComparer, but just a delegate...

Saturday 5 October 2019

C# Asynchronous (pull) Streams

C# 8 is finally out, with two main, long-awaited features: Default Interface Methods and Asynchronous (Pull) Streams (Asynchronous Enumeration). Regarding the latter, I posted 3 years ago about how useful it would be to have something like that, so I think it's a good time for a recap (I also posted about types of streams earlier this year).

At that time I said that in some cases we could deal with Asynchronous Enumeration by returning an IEnumerable<Task<T>>. This technique is only valid when finding out whether there is a new element (so MoveNext returns true or false) is synchronous, and the asynchronous part is obtaining the value of the Task<T> exposed by Current. An example:

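    private static Task<string> GetBlogAsync(int id)
    {
        // Simulates an asynchronous call that takes 500 ms to produce the blog
        return Task.Delay(500).ContinueWith(task => "Blog " + id.ToString());
    }

    private static IEnumerable<Task<string>> GetBlogs()
    {
        // The end condition is known synchronously; only the value is asynchronous
        for (var i = 0; i < 5; i++)
        {
            yield return GetBlogAsync(i);
        }
    }

    static async Task Main(string[] args)
    {
        foreach (var blogTask in GetBlogs())
        {
            // Await instead of blocking on .Result, since Main is already async
            Console.WriteLine(await blogTask);
        }
    }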

The above code can be written in C# 8 (thanks to the introduction of IAsyncEnumerable and IAsyncEnumerator) like this:

    private static async IAsyncEnumerable<string> GetBlogs2()
    {
        for (var i = 0; i < 5; i++)
        {
            yield return await GetBlogAsync(i);
        }
    }

    static async Task Main(string[] args)
    {
        await foreach (var blog in GetBlogs2())
        {
            Console.WriteLine(blog);
        }
    }

For this case, where we can know if we have reached the end of the iteration (i==5) without trying to get the element, both options are equally valid, but we have to understand that there's a difference in what we are doing. With IEnumerable<Task<T>>, the enumerator's MoveNext() returns a bool and Current returns a Task<T>. With IAsyncEnumerable<T>, the enumerator's MoveNextAsync() returns a ValueTask<bool> and Current returns T. Thanks to this, IAsyncEnumerable works fine whether the end condition of the iteration is known synchronously or asynchronously.

Modern JavaScript also includes this feature, known as Asynchronous Iterators, and this article is a really good read about it. By the way, I've always found it rather confusing that the Iterator protocol uses Symbol.iterator rather than Symbol.getIterator as the name of the function that defines an object as iterable and returns the iterator. It is a function, so it should have a verb name, not a noun...
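
As a small sketch of the case where the end of the iteration is only known asynchronously, here is a JavaScript async generator consumed with for await. The paged source (fakePages and fetchPage) is made up for the example and just simulates a remote call: we cannot know there are no more blogs until the awaited page comes back empty, which is exactly the situation that a sequence of promises (the IEnumerable<Task<T>> shape) cannot express.

```javascript
// Hypothetical paged source: an empty page marks the end of the data
const fakePages = [["Blog 0", "Blog 1"], ["Blog 2"], []];

function fetchPage(pageNumber) {
  // Stand-in for a network call; resolves a little later
  return new Promise((resolve) =>
    setTimeout(() => resolve(fakePages[pageNumber] || []), 10)
  );
}

async function* getBlogs() {
  for (let page = 0; ; page++) {
    const blogs = await fetchPage(page);
    // Whether there is a next element is only known after awaiting
    if (blogs.length === 0) return;
    for (const blog of blogs) {
      yield blog;
    }
  }
}

async function collectBlogs() {
  const received = [];
  for await (const blog of getBlogs()) {
    received.push(blog);
  }
  return received;
}

collectBlogs().then((blogs) => console.log(blogs.join(", ")));
```

Here the consumer awaits each step of the iteration itself, which is the JavaScript counterpart of awaiting MoveNextAsync() in C#.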