Sunday 26 September 2021

Linux Process Memory Consumption

Properly understanding the memory consumption of a process is quite an endeavor. For Windows I've given up: over the years I've gone through different articles and discussions about the WorkingSet, Private Bytes and so on... and I've never managed to get a full understanding. In Linux things seem to make quite a bit more sense.

I have several recent posts that could join this one in a sort of "Understanding Memory" series: this one about the overall memory consumption of your system, this one about the available physical memory, and this one where, among other things, I talk about the Virtual Address Space.

I'm going to talk now about the physical memory consumed by a specific process, and about "shared things". With the top command we see 3 memory values: VIRT, RES and SHR. VIRT is the virtual address space, so as I explained in previous articles it has nothing to do with physical memory and normally it's not something we should care about. RES is the important one: it's the physical memory used by the process, and it's the same value that we see under the RSS column of the ps v pid command.
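
As a quick way to look at these numbers without top, here's a minimal Groovy sketch (my own quick check, not something from the linked articles) that reads the current process's own entry in /proc. VmSize is roughly what top shows as VIRT, and VmRSS is what it shows as RES:

// read this JVM process's own numbers straight from /proc
// VmSize roughly corresponds to top's VIRT column, VmRSS to RES
def interesting = ["VmSize", "VmRSS", "VmSwap"]
new File("/proc/self/status").readLines()
    .findAll { line -> interesting.any { line.startsWith(it + ":") } }
    .each { println it }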

Well, things are not so simple. Processes share resources among themselves. I can think of 2 main ones: SOs (shared objects/shared libraries, that is, the Linux equivalent of DLLs) and the code segment shared between different instances of the same process. From what I can read here and here, the memory taken by an SO will be included in the RSS value of each process using that library (though, at least for most of the SO, it takes up physical memory only once). The same goes for the code segment: for each instance of the same process the RSS value will include the code segment, though it's loaded only once in physical memory. This means that if you add up the RSS values of all the processes in your system you will get a far bigger number than the real physical memory consumption reported by free -m.

RSS is the Resident Set Size and is used to show how much memory is allocated to that process and is in RAM. It does not include memory that is swapped out. It does include memory from shared libraries as long as the pages from those libraries are actually in memory. It does include all stack and heap memory.

Remember that code segment pages are shared among all of the currently running instances of the program. If 26 ksh processes are running, only one copy of any given page of the ksh executable program would be in memory, but the ps command would report that code segment size as part of the RSS of each instance of the ksh program.

When a process forks, both the parent and the child will show the same RSS. However, Linux employs copy-on-write, so both processes are really using the same memory. Only when one of the processes modifies the memory will it actually be duplicated. This will cause the free number to be smaller than the sum of the RSS values shown by top.
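
You can see this over-counting for yourself with a rough Groovy sketch (nothing scientific: kernel threads report 0 and the numbers change while you measure) that adds up the RSS of every process and compares it with what free reports:

// ps reports rss in KiB; add it up for every process and compare with free
def rssSumKb = "ps -eo rss=".execute().text
        .readLines()*.trim()
        .findAll { it }
        .sum { it.toLong() }
println "sum of all RSS values: ${rssSumKb.intdiv(1024)} MiB"
println "free -m says:"
println "free -m".execute().text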

There's an additional point to take into account: Shared Memory. Contrary to what you might think, when we talk about "Shared Memory" in Linux systems we are not talking about shared libraries, but about a different concept: memory reserved via shmget or mmap. Honestly, I know almost nothing about it; I'm just relaying what I've read here and here.

"shared libraries" != "shared memory". shared memory is stuff like shmget or mmap. The wording around memory stuff is very tricky. Using the wrong word in the wrong place can totally screw up the meaning of a sentence

This Shared Memory is the SHR value on top's output. All processes sharing a block of shared memory will have it counted under their SHR value (but not under RSS).

The RSS value doesn't include shared memory. Because shared memory isn't owned by any one process, top doesn't include it in RSS.
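
If you want to play with this kind of shared memory from Groovy, one easy option (my own little experiment, not from the articles above; the file name is made up and ProcessHandle needs a reasonably recent JDK) is to map a file living under /dev/shm, which is a tmpfs mount, so the mapping is the mmap flavour of shared memory. Run the script, leave it waiting, and watch the process's memory columns in top (and delete the file afterwards if it's still there):

import java.nio.channels.FileChannel
import java.nio.file.*

// map 64 MiB of a file under /dev/shm (tmpfs) into this process: mmap-backed shared memory
def path = Paths.get("/dev/shm/groovy-shared-mem-demo")
path.toFile().deleteOnExit()
def channel = FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)
def buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 64 * 1024 * 1024)

// touch every page so the mapping actually becomes resident
for (int i = 0; i < buffer.limit(); i += 4096) {
    buffer.put(i, (byte) 1)
}

println "mapped ${buffer.limit()} bytes, now check pid ${ProcessHandle.current().pid()} in top"
System.in.read()   // keep the process alive until you press Enter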

Tuesday 14 September 2021

Truffle Interpreters

In my previous post about GraalVM I mentioned that maybe I would write an additional post about Truffle. This is how I summarized in that post what Truffle is supposed to be:

The Truffle Language Implementation Framework. It allows "easily" writing programming languages implementations as interpreters (written in Java) for self-modifying Abstract Syntax Trees (what?!). It has allowed the creation of very efficient Ruby, Python and JavaScript implementations to run in the GraalVM. As these implementations are written in Java you can run them on a standard JVM, but they'll be slow, as it's the GraalVM JIT compiler who allows these implementations to run efficiently, so that's why Truffle is so linked to GraalVM. This is a really good (and thick) read.

This idea of writing high-performance interpreters seems odd to me. I know very little about compilers, interpreters and VMs, but it seems to go against the last decades of common Computer Science knowledge. Indeed, the Java guys spent a lot of time and effort adapting the JVM, by means of the Da Vinci project and the new invokedynamic bytecode instruction, to significantly improve its performance with dynamic languages. After all that, all of a sudden Truffle comes along and you can write interpreters for languages that get turned into "self-modifying ASTs" (not compiled to Java bytecodes), and those implementations of JavaScript, Ruby... seem to be faster than the implementations that compiled those languages to Java bytecodes and leveraged that recent and magical invokedynamic instruction! Indeed, Oracle has discontinued Nashorn (which was developed along with the invokedynamic/Da Vinci work and compiled JavaScript to Java bytecodes containing invokedynamic) and replaced it with Graal.js, which is based on Truffle.

Truffle is intended to run with the GraalVM Compiler. OK, as this is an optimized JIT, your interpreter, written in Java and compiled to bytecodes, will end up compiled into highly optimized machine code thanks to all the optimizations that this cool JIT will do over time. But still, we are optimizing an interpreter... and the difference between the GraalVM Compiler and the HotSpot JIT alone cannot be big enough to account for the leap that Truffle seems to represent.

Well, I think I've finally understood what the magic behind Truffle is. This article has been quite essential (and this other one has also helped). First, the AST that gets constructed for the program you are interpreting will evolve (based on runtime profiling) from dynamically typed nodes into statically typed nodes. And then comes the real magic, the Partial Evaluation technique: your interpreter code (the Java bytecodes, not the code being interpreted), which was written to work with any type, gets specialized to work with the types actually used in this execution. This is the first Futamura projection: your interpreter gets specialized to run a specific piece of source code!
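
The node-rewriting half of that is easier to picture with a toy example. The sketch below has nothing to do with the real Truffle API (there you extend Truffle's node classes and use its DSL); it's just my own minimal illustration of an "add" node that profiles what flows through it and replaces itself with a type-specialized version:

interface ExprNode {
    Object execute(Object a, Object b)
}

// the shape the node takes once we know only ints flow through it; this is the kind
// of code a partial evaluator can boil down to little more than a machine add
class IntAddNode implements ExprNode {
    Object execute(Object a, Object b) { ((int) a) + ((int) b) }
}

// the generic, fully dynamic node: it profiles its inputs and, when it sees
// only ints, rewrites itself (i.e. replaces itself in the tree) with IntAddNode
class GenericAddNode implements ExprNode {
    AddExpr parent
    GenericAddNode(AddExpr parent) { this.parent = parent }
    Object execute(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            parent.node = new IntAddNode()
        }
        a + b
    }
}

class AddExpr {
    ExprNode node = new GenericAddNode(this)
    Object execute(a, b) { node.execute(a, b) }
}

def add = new AddExpr()
println add.execute(2, 3)                 // generic path; the node rewrites itself
println add.execute(4, 5)                 // from now on the specialized node runs
println add.node.getClass().simpleName    // IntAddNode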

So in the end your interpreter will ask Graal to compile specialized versions of the different classes that make up the interpreter down to native code. From one of the articles:

Partial evaluation is a very interesting topic in theoretical computer science – see the Wikipedia article on Futamura projections for some pretty mind-bending ideas. But for our case, it means compiling specific instances of a class, as opposed to the entire class, with aggressive optimizations, mainly inlining, constant folding and dead code elimination.

All of the above seems terribly complex (and magical) to me, and at some point I still wonder whether I'm really understanding it correctly. The TruffleRuby documentation here has a couple of paragraphs that seem to confirm that my understanding is correct.

The Truffle framework gets the bytecode representation of all of the AST interpreter methods involved in running your Ruby method, combines them into something like a single Java method, optimizes them together, and emits a single machine code function.

TruffleRuby doesn’t use invokedynamic, as it doesn’t emit bytecode. However it does have an optimizing method dispatch mechanism that achieves a similar result.

Just as Ruby now has both the "classic" JRuby (which, as far as I know, continues to use a strange mix of a Ruby interpreter written in Java and a runtime compiler to Java bytecodes, which will then be either interpreted or JIT-compiled by the JVM) and the new Truffle interpreter, it would be interesting to see whether other languages plan a similar move. For the moment Groovy has no plans for that. After the performance improvements they achieved when moving from call-site caching to indy (invokedynamic), they don't seem to see any reason to create a Groovy Truffle interpreter. Bear in mind also that, as with all Java code, they get additional performance improvements "for free" just by running with the GraalVM JIT rather than the C2 HotSpot JIT.

Monday 6 September 2021

Groovy Types, AsType

It was quite a long time ago that I expressed my profound admiration for Groovy. At that time I was impressed by its metaprogramming capabilities; more recently I've been delighted by what a beautiful platform for DSLs it provides (Jenkins pipelines...), and lately it's its approach to types that has caught my attention.

The dynamic vs static typing and strong vs weak vs duck typing debates can be a bit confusing. I talked about it at length in the past. Groovy's approach to type checking is pretty interesting, and this article is an excellent reference. In recent years it has gained support for static type checking (checks done at compile time) by means of the @TypeChecked and @CompileStatic annotations, but I'm not that interested in it. What's really appealing to me is that when working in the "dynamic mindset", its optional typing approach allows us to move between dynamic typing and duck typing. When you declare types for variables or method/function parameters, Groovy will do that type check at runtime; if you don't declare the type, it's treated as Object, no type checks are applied, and the Groovy invocation magic (call-site caching in the past, invokedynamic in modern versions) gives you duck-typing behaviour. It will try to find a method with the given name, will try to invoke it, and if it works, it works :-)
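
A tiny sketch of what I mean (the class and method names are just made up for the example): the untyped version happily accepts anything that responds to speak(), while the typed version blows up at runtime when the argument is not of the declared type.

class Duck { String speak() { "quack" } }
class Dog  { String speak() { "woof" } }

// typed parameter: the type is checked at runtime when the method is invoked
String talkTyped(Duck d) { d.speak() }

// untyped parameter: no check at all, duck typing; any object with speak() will do
String talkUntyped(d) { d.speak() }

println talkUntyped(new Duck())   // quack
println talkUntyped(new Dog())    // woof
println talkTyped(new Duck())     // quack
try {
    talkTyped(new Dog())          // fails: a Dog is not a Duck
} catch (e) {
    println "runtime check failed: ${e.getClass().simpleName}"
}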

All this thinking about type systems reminded me of my discovery of structural typing via TypeScript some time ago, and made me wonder whether, when using type declarations, Groovy would support structural typing at runtime (I mean, rather than checking if an object has a specific type, checking if the "shapes" match, that is, whether the expected type and the real type have the same methods). The answer is no (but a little bit, as we'll see).

In other words, Groovy does not define structural typing. It is however possible to make an instance of an object implement an interface at runtime, using the as coercion operator.

You can see that there are two distinct objects: one is the source object, a DefaultGreeter instance, which does not implement the interface. The other is an instance of Greeter that delegates to the coerced object.

This thing of the "as" operator (the asType method) and the creation of a new object that delegates calls to the original object... sounds like a Proxy, n'est-ce pas? So I've been investigating a bit more, and yes, Groovy dynamically creates a proxy class for you (and an instance of that proxy). Thanks to the proxy you'll get through the runtime type checks, and then it will try to invoke the requested method on the original object; if it cannot find it you'll get a runtime error.


import java.lang.reflect.*;

interface Formatter{
    String applySimpleFormat(String txt);
    String applyDoubleFormat(String txt);
}

class SimpleFormatter implements Formatter {
    private String wrap1;
    private String wrap2;
    
    SimpleFormatter(String w1, String w2){
        this.wrap1 = w1;
        this.wrap2 = w2;
    }

    String applySimpleFormat(String txt){
        return wrap1 + txt + wrap1;
    }

    String applyDoubleFormat(String txt){
        return wrap2 + wrap1 + txt + wrap1 + wrap2;
    }
}

class TextModifier{
    String applySimpleFormat(String txt){
        return "---" + txt + "---";
    }
}

void formatAndPrint(String txt, Formatter formatter){
    System.out.println(formatter.applySimpleFormat(txt));
}

def formatter1 = new SimpleFormatter("+", "*");
formatAndPrint("bonjour", formatter1);

def modifier = new TextModifier();
try{
    formatAndPrint("bonjour", modifier); //exception
}
catch (ex){
    //runtime exception, when invoking the function it checks if modifier is an instance of Formatter
     println("- Exception 1: " + ex.getMessage());
}    

Formatter formatter2;
try{
    formatter2 = modifier;
}
catch (ex){
    //runtime exception, it tries to do a cast
    // Cannot cast object 'TextModifier@1fdf1c5' with class 'TextModifier' to class 'Formatter'
    println("- Exception 2: " + ex.getMessage());
}


formatter2 = modifier.asType(Formatter); //same as: modifier as Formatter;

//this works fine thanks to the Proxy created by asType
formatAndPrint("bonjour", formatter2); 

println "formatter2.class: " + formatter2.getClass().name;
//TextModifier1_groovyProxy

//but it is not a java.lang.reflect.Proxy
println ("isProxyClass: " + Proxy.isProxyClass(formatter2.class));


As you can see in the code above, Groovy creates a "TextModifier1_groovyProxy" class, but it does so through a mechanism other than java.lang.reflect.Proxy, as the Proxy.isProxyClass check returns false.

I'll leverage this post to mention another interesting recent feature regarding types in a dynamic language: Python type hints and mypy. You can add type declarations to your Python code (type hints), but they have no effect at runtime: nothing is checked by the interpreter, your code remains as dynamic and duck typed as ever, and the type hints work just as a sort of documentation. However, you can run a static type checker like mypy on your source code to simulate compile-time static typing. This is in a sense similar to what you do with TypeScript, except that you are not transpiling from one language (TypeScript) to another (JavaScript); you are only doing the type-checking part.