Sunday, 28 December 2014

Metaprogramming, Reflection and MOPs

Metaprogramming, Reflection and MOPs (Metaobject Protocols) are some of the most interesting topics in Computer Science that I can think of. The idea of a program inspecting and modifying itself is certainly beautiful. The specific meaning of each of these terms can be a bit confusing though, and the borders and relations among them are rather blurry. My understanding has changed a bit over the years, so I think it's worth writing down my current view here.

When reading this excellent and very in-depth article about Metaprogramming in ES6 with Proxies, I felt a bit confused by the notion of Reflective Metaprogramming presented there and its different subparts. It was a bit different from how I used to think of Reflection and Metaprogramming and their classifications, so I've been doing some reading to clarify this nomenclature and taxonomy.

Metaprogramming is a wide topic, and from the Wikipedia entry we can read:

Metaprogramming is the writing of computer programs with the ability to treat programs as their data. It means that a program could be designed to read, generate, analyse and/or transform other programs, and even modify itself while running.

So Reflective Metaprogramming deals with the "itself" part: a program analysing, transforming and even modifying itself while running.

Reflective Metaprogramming is indeed what I'd always called just Reflection. However, I used to subdivide it a bit differently. Axel Rauschmayer talks about:

  • Introspection: you have read-only access to the structure of a program.
  • Self-modification: you can change that structure.
  • Intercession: you can redefine the semantics of some language operations.
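To make the three categories concrete, here is a small sketch in TypeScript (my choice for the examples in this post, since the ES6 article is about JavaScript; everything used is plain ES2015+ runtime behaviour). Point is just an illustrative class: Object.keys and Object.getOwnPropertyNames are introspection, adding a method to Point.prototype is self-modification, and a Proxy with a get trap is intercession.

```typescript
// Assumption: TypeScript compiled for an ES2015+ runtime (Proxy and Reflect available).
class Point {
  constructor(public x: number, public y: number) {}
  norm(): number {
    return Math.hypot(this.x, this.y);
  }
}

// Declaration merging, so the compiler knows about the method added at runtime below.
interface Point {
  scale(k: number): Point;
}

const p = new Point(3, 4);

// Introspection: read-only questions about the program's structure.
const ownProps = Object.keys(p);                                  // ["x", "y"]
const protoMembers = Object.getOwnPropertyNames(Point.prototype); // ["constructor", "norm"]

// Self-modification: change that structure while the program runs.
// The new method is visible to every instance, including ones created earlier.
Point.prototype.scale = function (this: Point, k: number): Point {
  return new Point(this.x * k, this.y * k);
};
const scaled = p.scale(2); // a Point with x = 6, y = 8

// Intercession: redefine what a language operation means for an object.
// Here reading a missing property yields 0 instead of undefined.
const defaulted: any = new Proxy(p, {
  get(target, key, receiver) {
    return Reflect.has(target, key) ? Reflect.get(target, key, receiver) : 0;
  },
});
```

Note that the proxy only changes behaviour for code that goes through `defaulted`; `p` itself is untouched.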

I used to think in these terms (obviously not something I invented: I read it years ago in some book, copy-pasted it into my personal notes and stuck to it):

  • Introspection: same as in the above classification, the ability to ask for the type of an object, the methods of a type, its parameters...
  • Structural Reflection: the ability to access properties or invoke methods based on runtime decisions, and even to create new classes (runtime metaprogramming). To a greater or lesser extent, most languages possess this feature. In .NET, for example, it has evolved from the initial Reflection API to the current Roslyn superstar, going through LCG (Lightweight Code Generation) and the DLR.
  • Computational Reflection: a far more complex beast. It would include things like modifying the semantics of a language, from how method resolution and invocation are done or how property or indexed access is performed, to adding new constructs to the language at runtime (yes, crazy stuff like adding new types of loops...). I used to associate this with the existence of a MOP.
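Structural Reflection in the sense above can be sketched like this, again in TypeScript, assuming member names only become known at runtime. invokeByName and makeClass are hypothetical helpers written for this post, built on the standard Reflect.get, Reflect.apply and Object.defineProperty.

```typescript
class Calculator {
  add(a: number, b: number): number {
    return a + b;
  }
  mul(a: number, b: number): number {
    return a * b;
  }
}

// Invoke a method whose name is only known at runtime
// (it could come from a config file, a URL route, user input...).
function invokeByName(target: object, name: string, ...args: unknown[]): unknown {
  const member: unknown = Reflect.get(target, name);
  if (typeof member !== "function") {
    throw new TypeError(`no method named "${name}"`);
  }
  return Reflect.apply(member, target, args);
}

// Create a brand-new class at runtime from a name and a table of methods.
// (Arrow functions ignore `this`; methods needing `this` should be function expressions.)
function makeClass(name: string, methods: Record<string, (...args: any[]) => unknown>) {
  const cls = class {};
  Object.defineProperty(cls, "name", { value: name });
  Object.assign(cls.prototype, methods);
  return cls;
}

const calc = new Calculator();
const op = "mul"; // in a real program this would be decided at runtime
const product = invokeByName(calc, op, 6, 7);

const Greeter = makeClass("Greeter", { greet: (who: string) => `hello, ${who}` });
const greeting = invokeByName(new Greeter(), "greet", "world");
```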

The terms Structural Reflection and Computational Reflection don't seem to be particularly popular (you'll find some references along with Behavioral Reflection), so I'll try to change my mindset and use the ones given by Axel Rauschmayer in the future.

The other big and confusing topic is the MOP. Everyone seems to have a different idea of what a MOP is, so I'll first give some definitions that I've found over time:

From Stack Overflow

The MOP exposes some or all internal structure of the interpreter to the programmer. The MOP may manifest as a set of classes and methods that allow a program to inspect the state of the supporting system and alter its behaviour. MOPs are implemented as object-oriented programs where all objects are metaobjects.

From Wikipedia

A metaobject protocol (MOP) provides the vocabulary to access and manipulate the structure and behavior of objects. Typical functions of a metaobject protocol include:

  • Creating and deleting new classes
  • Creating new methods and properties
  • Changing the class structure so that classes inherit from different classes
  • Generating or modifying the code that defines the methods for the class

The metaobject protocol is contrary to the "closed" aspect of Bertrand Meyer's open/closed principle. It reveals and allows a system to modify the internal structure of the objects. For this reason it is usually used sparingly and for special circumstances such as software that transforms other software, for example for reverse engineering.
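JavaScript happens to offer small versions of several items on that list. The sketch below (Animal, Robot and Dog are made-up classes) changes which class Dog inherits from with Object.setPrototypeOf, and then modifies the code behind an existing method by wrapping it.

```typescript
class Animal {
  speak(): string {
    return "...";
  }
}
class Robot {
  speak(): string {
    return "beep";
  }
}
class Dog extends Animal {}

const rex = new Dog();
const beforeSwap = rex.speak(); // "..."

// Change the class structure: from now on Dog inherits from Robot.
// Existing instances are affected too, because they share Dog.prototype.
Object.setPrototypeOf(Dog.prototype, Robot.prototype);
const afterSwap = rex.speak(); // "beep"

// Modify the code behind an existing method by wrapping the original one.
const originalSpeak = Robot.prototype.speak;
Robot.prototype.speak = function (this: Robot): string {
  return originalSpeak.call(this).toUpperCase();
};
const louder = rex.speak(); // "BEEP"
```

Engines generally warn that changing an object's prototype after creation is slow, which is one more reason such tricks are, as the quote says, used sparingly.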

From Moose Documentation

A meta object protocol is an API to an object system.

To be more specific, it abstracts the components of an object system (classes, object, methods, object attributes, etc.). These abstractions can then be used to inspect and manipulate the object system which they describe.

It can be said that there are two MOPs for any object system; the implicit MOP and the explicit MOP. The implicit MOP handles things like method dispatch or inheritance, which happen automatically as part of how the object system works. The explicit MOP typically handles the introspection/reflection features of the object system.

All object systems have implicit MOPs. Without one, they would not work. Explicit MOPs are much less common, and depending on the language can vary from restrictive (Reflection in Java or C#) to wide open (CLOS is a perfect example).

The distinction drawn above between implicit and explicit MOPs is unclear to me. The way Java or C# handle inheritance is hardcoded, and there's no API to modify it (excluding dynamic C#). There's no API for intercepting method calls or property access (other than creating proxy objects), no methodMissing/AUTOLOAD/__noSuchMethod__, no way to modify inheritance chains or inheritance behavior... All in all, nothing comparable to the power found in Groovy.

Groovy's MOP includes some extension/hook points that you can use to change how a class or an object behaves, mainly:
  • getProperty/setProperty: control property access
  • invokeMethod: controls method invocation; you can use it to tweak the parameters of existing methods or to intercept not-yet-existing ones
  • methodMissing: the preferred way to intercept non-existing methods
  • propertyMissing: likewise, the preferred way to intercept non-existing properties
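JavaScript has no built-in methodMissing, but a Proxy get trap can approximate it. The sketch below is only an analogue, not Groovy's API: dynamicFinders is a hypothetical helper, and since a get trap cannot tell whether the property it returns is about to be called, it relies on a naming convention (findByXxx) to decide what counts as a missing method, something Groovy does not need.

```typescript
// Assumed data shape, for illustration only.
type Person = Record<string, string>;
interface PersonRepo {
  people: Person[];
}

// Hypothetical helper: approximates methodMissing/propertyMissing with one get trap.
function dynamicFinders(repo: PersonRepo): any {
  return new Proxy(repo, {
    get(target, key, receiver) {
      // Members that really exist (and symbol keys) behave normally.
      if (typeof key === "symbol" || Reflect.has(target, key)) {
        return Reflect.get(target, key, receiver);
      }
      // "methodMissing": synthesize findByXxx(value) methods on demand.
      const match = /^findBy([A-Z]\w*)$/.exec(key);
      if (match) {
        const field = match[1].charAt(0).toLowerCase() + match[1].slice(1);
        return (value: string): Person[] =>
          target.people.filter((person) => person[field] === value);
      }
      // "propertyMissing": fail loudly instead of silently yielding undefined.
      throw new ReferenceError(`no property "${key}"`);
    },
  });
}

const repo = dynamicFinders({
  people: [
    { name: "Ada", city: "London" },
    { name: "Alan", city: "Wilmslow" },
  ],
});

const londoners = repo.findByCity("London"); // one match: Ada
```

Throwing on every unknown key mimics a missing-property error, but it can surprise code that probes objects; for example, awaiting the proxy reads its `then` property.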

And finally from the article by Rauschmayer

The ECMAScript specification describes how to execute JavaScript code. It includes a protocol for handling objects. This protocol operates at a meta level and is sometimes called the meta object protocol (MOP). The JavaScript MOP consists of own internal methods that all objects have. “Internal” means that they exist only in the specification (JavaScript engines may or may not have them) and are not accessible from JavaScript. The names of internal methods are written in double square brackets.
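Those internal methods are not callable from JavaScript, but ES6 exposes their default behaviour through the Reflect functions, and each Proxy trap carries the same name as its Reflect counterpart. A small tracing sketch (withTrace is a hypothetical helper) shows which internal method each ordinary operation goes through: get for [[Get]], set for [[Set]], has for [[HasProperty]] and deleteProperty for [[Delete]].

```typescript
const trace: string[] = [];

// Every trap logs, then forwards to the Reflect function of the same name,
// which performs the default behaviour of the corresponding internal method.
function withTrace<T extends object>(target: T): T {
  return new Proxy(target, {
    get(t, key, receiver) {
      trace.push(`[[Get]] ${String(key)}`); // String() also handles symbol keys
      return Reflect.get(t, key, receiver);
    },
    set(t, key, value, receiver) {
      trace.push(`[[Set]] ${String(key)}`);
      return Reflect.set(t, key, value, receiver);
    },
    has(t, key) {
      trace.push(`[[HasProperty]] ${String(key)}`);
      return Reflect.has(t, key);
    },
    deleteProperty(t, key) {
      trace.push(`[[Delete]] ${String(key)}`);
      return Reflect.deleteProperty(t, key);
    },
  });
}

const data = withTrace<{ x?: number; y?: number }>({ x: 1 });
const readX = data.x;     // goes through [[Get]]
const hasX = "x" in data; // goes through [[HasProperty]]
data.y = 2;               // goes through [[Set]]
delete data.x;            // goes through [[Delete]]
```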

Notice how in ES6, if you want to alter the way property access or method invocation works, you cannot directly alter the [[Get]] internal method of an object; instead you have to create a proxy and set a get trap. While the article praises this separation between the base level and the meta level, I see some disadvantages in it. In Groovy, you can directly alter "the protocol" of an existing object just by adding or redefining its invokeMethod method, so all your references to that instance get the new behaviour. In JavaScript, you would have to update all your references to point to the proxy rather than to the proxied object. For me this means losing transparency.
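The transparency issue can be shown in a few lines. In this sketch (Account is a made-up class), a reference taken before the proxy was created keeps bypassing the trap, whereas patching the method on the object itself reaches every reference. That patch is only a rough plain-JavaScript counterpart to redefining invokeMethod in Groovy, since it covers one method rather than all calls.

```typescript
class Account {
  constructor(public balance: number) {}
  withdraw(amount: number): number {
    this.balance -= amount;
    return this.balance;
  }
}

const calls: string[] = [];
const account = new Account(100);
const otherRef = account; // e.g. a reference some other module kept earlier

// Proxy approach: only code that holds the proxy sees the new behaviour.
const logged = new Proxy(account, {
  get(target, key, receiver) {
    const value: unknown = Reflect.get(target, key, receiver);
    if (typeof value !== "function") return value;
    return (...args: unknown[]) => {
      calls.push(`proxy: ${String(key)}`);
      return Reflect.apply(value, receiver, args);
    };
  },
});
logged.withdraw(10);   // intercepted
otherRef.withdraw(10); // NOT intercepted: this reference bypasses the proxy

// In-place approach: patch the object itself, so every existing reference is affected.
const originalWithdraw = account.withdraw;
account.withdraw = function (this: Account, amount: number): number {
  calls.push("patched: withdraw");
  return originalWithdraw.call(this, amount);
};
otherRef.withdraw(10); // intercepted, since otherRef and account are the same object
```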

So you can see there are different perspectives on what a MOP is. Inspired by the article's remark that "The term protocol is highly overloaded in computer science", I'd say exactly the same applies to MOP: it's highly overloaded. Here is my own practical definition of what a MOP is:

For me, a practical definition of a MOP would be any support (in the form of some sort of API) given by a language to modify how property access, method invocation or inheritance work.
Less advanced features, like adding new methods or properties to an existing class or instance (expanding it) or modifying the inheritance (or prototype) chain, would be part of that MOP.
More advanced features, like modifying or extending the syntax of the language, changing how exceptions propagate or how loops behave (think for example of calling a method that turns all or certain loops into parallel loops), changing methods from sync to async, or making those methods memoize their results or throttle their calls..., would also be part of a MOP.
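As an example of that last point, here is a sketch of turning an existing method into a memoizing one at runtime, for every instance of a class, without touching the class body. memoizeMethod and Fib are hypothetical names for this illustration, and the cache key assumes JSON-serializable arguments.

```typescript
// Hypothetical helper, not a standard or library API.
function memoizeMethod<T extends object>(proto: T, name: keyof T & string): void {
  const original: unknown = Reflect.get(proto, name);
  if (typeof original !== "function") {
    throw new TypeError(`${name} is not a method`);
  }
  // One cache per instance; the WeakMap lets instances be garbage-collected.
  const caches = new WeakMap<object, Map<string, unknown>>();
  Reflect.defineProperty(proto, name, {
    configurable: true,
    writable: true,
    value: function (this: object, ...args: unknown[]): unknown {
      let cache = caches.get(this);
      if (cache === undefined) {
        cache = new Map<string, unknown>();
        caches.set(this, cache);
      }
      // Assumption: arguments are JSON-serializable, so they can form a cache key.
      const key = JSON.stringify(args);
      if (!cache.has(key)) {
        cache.set(key, Reflect.apply(original, this, args));
      }
      return cache.get(key);
    },
  });
}

class Fib {
  calls = 0; // counts real (non-cached) executions of fib
  fib(n: number): number {
    this.calls++;
    return n < 2 ? n : this.fib(n - 1) + this.fib(n - 2);
  }
}

// Change the method's behaviour for every Fib instance.
memoizeMethod(Fib.prototype, "fib");

const f = new Fib();
const result = f.fib(30);
```

Because the recursive calls inside fib go through this.fib, they also hit the patched version, so computing fib(30) runs the original body only once per distinct n (31 times).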
