Friday, 10 May 2024

Kotlin iteration, toList vs asIterable

In this previous post I talked about Iterables and Sequences in Kotlin, and about the eager versus deferred/lazy nature of their extension functions (this StackOverflow discussion is pretty good). As I read somewhere, "Collections (classes that implement Iterable) contain items, while Sequences produce items". What I did not mention in that post is that it feels quite strange to me that Sequences are not Iterables (I see no reason for the Sequence<T> interface not to extend the Iterable<T> interface; I don't think it would interfere with picking the right extension function). In Python and JavaScript (where being iterable does not depend on implementing a particular class or interface) an iterable is a sort of "protocol": anything that can be iterated (anything we can get an iterator from) is iterable. In Kotlin we have an Iterable interface, but there are things (Sequences) that can be iterated without implementing that interface (both the Iterable and Sequence interfaces have an iterator() method). I guess what confuses me is that saying "X is iterable" sounds very similar to saying "X is (implements) an Iterable". Saying "X can be iterated" seems more appropriate: a Sequence can be iterated, but it's not an Iterable.
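
To make that distinction concrete, here is a minimal sketch. The helper name implementsIterable is made up for this example; everything else is standard library.

```kotlin
// Hypothetical helper: true only if the value implements kotlin.collections.Iterable.
fun implementsIterable(value: Any): Boolean = value is Iterable<*>

fun main() {
    val list = listOf("France", "Portugal")
    val sequence = sequenceOf("France", "Portugal")

    println(implementsIterable(list))     // true
    println(implementsIterable(sequence)) // false

    // Both can still be iterated, because both expose an iterator() operator.
    for (country in list) println(country)
    for (country in sequence) println(country)
}
```

So the for loop only cares about iterator() being available, not about the Iterable interface.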

Well, let's put semantic discussions aside and look at something more concrete. We can obtain a Sequence from an Iterable by means of the asSequence() extension function or the Sequence() function (which takes a lambda that returns an Iterator). We'll do that if we want to leverage the lazy extension functions for mapping, filtering, etc. In the other direction, we can "materialize" a sequence into a Collection using the toList() extension function. That way we force the sequence to "produce" all its items and we store them in a collection. Additionally, we also have the Sequence<T>.asIterable() extension function. When/why would we use asIterable() rather than toList()?
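
Before answering that, here is a small sketch of the two conversions just mentioned. The helper longNamesUppercased and the sample names are made up for illustration.

```kotlin
// Hypothetical helper: a lazy pipeline that only runs when toList() is called.
fun longNamesUppercased(names: Iterable<String>): List<String> =
    names.asSequence()                 // Iterable -> Sequence, nothing is processed yet
        .filter { it.length > 2 }      // lazy
        .map { it.uppercase() }        // lazy
        .toList()                      // Sequence -> List, the pipeline runs here

fun main() {
    val names = listOf("Ana", "Bo", "Cy")

    // The Sequence() function builds a Sequence from a lambda that returns an Iterator.
    val viaFunction: Sequence<String> = Sequence { names.iterator() }

    println(viaFunction.toList())       // [Ana, Bo, Cy]
    println(longNamesUppercased(names)) // [ANA]
}
```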

Let's say we have a sequence to which we will apply some filtering/mapping, and we want those functions to be applied to all the items in one go (eagerly) the first time they are invoked. BUT we don't want to do it now. We can use asIterable() to get an object that implements the Iterable interface but holds no elements of its own (it just wraps the sequence, so we are deferring the execution). Then, when we first invoke the map/filter extension function (on Iterable), the whole original sequence will be iterated and processed in one go (eagerly). If we use .toList() instead, the whole iteration of the sequence occurs immediately, before calling map/filter to process the items. I'm using the deferred/immediate execution and lazy/eager evaluation terms as explained in this excellent post.



fun createCountriesSequence() = sequence {
    println("Before")
    yield("France")
    println("next city")
    yield("Portugal")
    println("next city")
    yield("Russia")
}

    var countriesSequence = createCountriesSequence()
    // converting the sequence to a list immediately iterates the sequence
    val countries = countriesSequence.toList()
    println("after converting to list")

    // Before
    // next city
    // next city
    // after converting to list

    println()

    // what's the use of calling .asIterable() rather than directly .toList()?
    // with .asIterable() we obtain a deferred Iterable (nothing runs until we first need it)
    // but as the extension functions on Iterable are eager, the first time we apply a map or similar the whole iteration is performed
 
    countriesSequence = createCountriesSequence()
    //converting to iterable does not perform any iteration yet
    var countriesIterable = countriesSequence.asIterable()
    println("after creating iterable")

    println("for loop iteration:") // no iteration has happened until now
    for (country in countriesIterable) {
        println(country)
    }

    // after creating iterable
    // for loop iteration:
    // Before
    // France
    // next city
    // Portugal
    // next city
    // Russia
    
    println()

    countriesSequence  = createCountriesSequence()
    countriesIterable = countriesSequence.asIterable()
    println(countriesIterable::class)
    println("after creating iterable") // no iteration has happened yet (deferred execution)

    //applying an Iterable extension function performs the whole iteration (eager evaluation)
    val upperCountries = countriesIterable.map {
        println("mapping: $it")
        it.uppercase()
    }
    println("after mapping")
    println(upperCountries::class.simpleName)


    // (class name of the Iterable wrapper, implementation-specific)
    // after creating iterable
    // Before
    // mapping: France
    // next city
    // mapping: Portugal
    // next city
    // mapping: Russia
    // after mapping
    // ArrayList


One more note. The iterator() method of an Iterable or Sequence has to be marked as an operator. Why? This is the explanation I found in the documentation and in some discussion:

The for-loop requires the iterator(), next() and hasNext() methods to be marked with operator. This is another use of the operator keyword besides overloading operators.

The key point is that there is a special syntax that will make use of these functions without a syntactically visible call. Without marking them as operator, the special syntax would not be available.

Remember that in Kotlin (as in Python) we cannot define custom operators, we can only overload the predefined ones.
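
The operator requirement for the for loop can be sketched with a class that implements neither Iterable nor Sequence. The Countdown class and the collect helper are hypothetical, made up just for this example.

```kotlin
// Hypothetical class: it implements neither Iterable nor Sequence.
class Countdown(private val start: Int) {
    // 'operator' is what allows the for loop to use this function
    // without a syntactically visible call.
    operator fun iterator(): Iterator<Int> = object : Iterator<Int> {
        private var current = start
        override fun hasNext(): Boolean = current > 0
        override fun next(): Int {
            if (!hasNext()) throw NoSuchElementException()
            return current--
        }
    }
}

// Hypothetical helper: gathers whatever the for loop yields.
fun collect(countdown: Countdown): List<Int> {
    val seen = mutableListOf<Int>()
    for (n in countdown) seen.add(n)
    return seen
}

fun main() {
    println(collect(Countdown(3))) // [3, 2, 1]
}
```

If the operator modifier is removed from iterator(), the for loop over a Countdown should no longer compile, while an explicit countdown.iterator() call would still work.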
