Saturday, 15 August 2015

Bytecodes, ASTs and so on

These pieces of news about WebAssembly, [1], [2] and [3], have made for a pretty interesting read. It's not that I'm interested in using it; I think 99% of web development projects will continue to be done in an ever more powerful and beautiful JavaScript, but the concept is pretty interesting from a technical standpoint.

Being able to write high-performance code for the web so that game engines can run smoothly in the browser is cool but, as I've said, I guess few people will ever need it. Having your browser's virtual machine run two parsers for the same "application" (one for JavaScript and one for WebAssembly), both compiling to a common bytecode language, is cool, but nothing new (for a long while you have been able to mix Python and Ruby into a C# application, and the same goes for Java). What is funny is that in the CLR and the JVM the first-class citizens were the static, "low-level" languages (C#, Java...), and the dynamic, more feature-rich languages (Python, JavaScript, Groovy) were added later as second-class citizens; on the web it has gone exactly the opposite way.

What I find really interesting is that code written in this "binary format for the web" is not bytecode, in the sense that it is not a set of instructions for a stack-based (CLR, standard JVM) or register-based (Dalvik) virtual machine, but something a bit higher level: a binary representation of an Abstract Syntax Tree (AST). Does this ring a bell? Well, it's just what is done with the DLR languages, which are compiled at runtime into a DLR AST that is then transformed into .NET bytecode.
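To make the difference concrete, here is a minimal sketch of the idea: a tiny expression tree serialized in a made-up binary pre-order format, then decoded in a single walk into instructions for a toy stack machine. The node tags, the encoding and the instruction names are all my own assumptions for illustration; this is not WebAssembly's actual format, nor that of the CLR or the JVM.

```python
import struct

# Hypothetical node tags for a toy binary AST (not a real format).
CONST, ADD, MUL = 0, 1, 2

def encode(node):
    """Serialize a nested-tuple AST in pre-order: tag first, then children."""
    if node[0] == "const":
        # One tag byte followed by a little-endian 32-bit integer.
        return struct.pack("<Bi", CONST, node[1])
    tag = ADD if node[0] == "add" else MUL
    return bytes([tag]) + encode(node[1]) + encode(node[2])

def decode_to_stack_code(data, pos=0, out=None):
    """Walk the binary tree once, emitting stack instructions in post-order."""
    if out is None:
        out = []
    tag = data[pos]
    pos += 1
    if tag == CONST:
        (value,) = struct.unpack_from("<i", data, pos)
        out.append(("push", value))
        return pos + 4, out
    pos, _ = decode_to_stack_code(data, pos, out)  # left operand
    pos, _ = decode_to_stack_code(data, pos, out)  # right operand
    out.append(("add",) if tag == ADD else ("mul",))
    return pos, out

def run(code):
    """Evaluate the stack instructions with a plain list as the operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if instr[0] == "add" else a * b)
    return stack.pop()

# 2 + 3 * 4 as a tree, as bytes, and as stack code.
tree = ("add", ("const", 2), ("mul", ("const", 3), ("const", 4)))
blob = encode(tree)
_, code = decode_to_stack_code(blob)
result = run(code)
```

Note that the decoder never has to tokenize text or resolve operator precedence; the tree shape is already in the bytes, so lowering it to stack code is a simple recursive walk.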

The main performance advantage offered by WebAssembly over JavaScript is that turning this binary AST into bytecode is way faster than parsing JavaScript source. Though quite unrelated, my mind made a link between this and JerryScript, a JavaScript engine for small embedded devices. In the article they explain how they managed to make this engine fit into just 64 KB of memory:

From the perspective of IoT, we only focused on memory footprint.

JerryScript is a pure interpreter in contrast with multi-stage adaptive JIT engines today. So it has no overhead for storing compiled code. Even the parser doesn’t store AST. It produces bytecode line-by-line directly from the source code. For data representation, objects in JerryScript are optimized for size. JerryScript is using compressed pointers, bytecode blocks with fixed size, a pre-allocated object pool and multiple representation of Number objects to achieve both standard compliance and memory optimization. We will keep continue to reduce the memory footprint in various ways.
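The "parser doesn't store AST" part is the opposite approach to the binary AST, and it can be sketched with a single-pass recursive descent parser that appends stack instructions while it reads the source and never builds a tree. This is only a toy for arithmetic expressions; the `compile_expr` function and the instruction names are hypothetical and have nothing to do with JerryScript's real parser or bytecode.

```python
def compile_expr(src):
    """Single-pass compiler: emits stack code while parsing, builds no tree."""
    # Crude tokenizer: pad the punctuation with spaces and split.
    for ch in "()+*":
        src = src.replace(ch, f" {ch} ")
    tokens = src.split()
    pos = 0
    code = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        else:
            code.append(("push", int(tok)))

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append(("mul",))  # emitted as soon as both operands are done

    def expr():
        term()
        while peek() == "+":
            advance()
            term()
            code.append(("add",))

    expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected token")
    return code

bytecode = compile_expr("2 + 3 * (4 + 1)")
```

The only state kept during compilation is the output list and the call stack of the parser, which is the kind of trade-off that makes sense when memory footprint matters more than anything else.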
