Sunday, 7 May 2023

SqlAlchemy Lazy Loading

I've done some basic use of the ORM functionality provided by SqlAlchemy, that seems to be pretty powerful. At first sight it seems to give us the same features as .Net Entitiy Framework. Though the documentation advices using the Declatative Mapping style, I've been using imperative mapping. It's not that I'm much into Architecture stuff like DDD, Hexagonal and so on... but I think we should aim for Persistence Ignorance even in the most basic projects.

A very interesting feature of ORM's is how they handle the loading of related entities. I mean, we have a Blog entity that has a list of Posts. When we first retrieve a Blog in many cases it would be better not to retrieve its posts until we first access to them. This is called Lazy Loading. In .Net Entity Framework a property in an entity that refers to other entities is called a Navigation Property, and the Lazy Loading is implemented by means of Proxy classes. For those entities that have Navigation properties that have been marked as being lazily loaded, the ORM gives us an instance of a Proxy class that inherits from our Entity class. The Navigation properties in such Proxy class are implemented in a way that on first access they run a query agains the Database to retrieve the related instances. You can verify that I'm not making up things here.

adds lazy loading capabilities to an entity object by:

Storing a reference to the context.
Overriding navigation properties to make them load when they're accessed, using the context.

The proxy inherits from the entity class. Therefore, the navigation properties must be virtual and the entity class can't be sealed.

SqlAlchemy ORM also supports lazy loading, and I was wondering how it does it. Python being a highly dynamic language where you can add/remove attributes from an object there's no need for creating proxy classes. I was thinking that maybe the class for a lazy entity would implement this laziness in an overridden __getattribute__ method, but it's a bit different. Let's say I have these entities:


class Post:
    # very interesting, SqlAlchemy does not invoke the __init__ method in the entities
    def __init__(self, post_id, title, content):
        print("Post __init__")
        self.post_id = post_id
        self.title = title
        self.content = content

class Blog:
    def __init__(self, blog_id, url, title):
        print("Blog __init__")
        self.blog_id = blog_id
        self.url = url
        self.title = title
        self.posts: List[Post] = []

I use imperative mapping defining the blog to posts relation as lazy:


metadata = MetaData()
mapper_registry = registry(metadata=metadata)

table_blog = Table(
    "Blogs",
    mapper_registry.metadata,
    Column("BlogId", Integer, primary_key=True, autoincrement=True),
    Column("Url", String),
    Column("Title", String),
)

table_post = Table(
    "Posts",
    mapper_registry.metadata,
    Column("PostId", Integer, primary_key=True, autoincrement=True),
    Column("BlogId", ForeignKey("Blogs.BlogId")),
    Column("Title", String),
    Column("Content", String),

)

def start_mappers():
    mapper_registry.map_imperatively(entities.Blog, table_blog,
      properties={
            "blog_id": table_blog.c.BlogId, 
            "url": table_blog.c.Url, 
            "title": table_blog.c.Title, 
            "posts": relationship(entities.Post, lazy="select")
            #The default value of the relationship.lazy argument is "select", which indicates lazy loading. 
        }
    )

    mapper_registry.map_imperatively(
        entities.Post,
        table_post,
        properties={
            "post_id": table_post.c.PostId, 
            "title": table_post.c.Title,
            "content": table_post.c.Content
        }      
    )

When the above mapping function is invoked some attributes are added to the classes for our entities. For our Blog class, we can see below the contents of its __dict__ before (it mainly has the __init__ method) and after doing the mapping (for each column defined during the mapping an attribute of type: sqlalchemy.orm.attributes.InstrumentedAttribute has been added to the class)

The next interesting thing is that when creating and instance of our Entity class from a database row the __init__ method is not invoked. You can find the explanation for old versions of SqlAlchemy, not for version 2.0, but it still holds true (the print commands that I've put in the __init__ above are never executed).

The SQLAlchemy ORM does not call __init__ when recreating objects from database rows. The ORM’s process is somewhat akin to the Python standard library’s pickle module, invoking the low level __new__ method and then quietly restoring attributes directly on the instance rather than calling __init__.

So though in my Blog.__init__ I'm initializing the posts (relation-navigation property) to an empty list, as the ORM is not invoking __init__ my Blog instances initially lack that attribute. This is how a Blog instance looks (it's an instance of the Blog class, there's not any sort of additional proxy-inheriting class needed) when first retrieved from the DB, no trace of the posts attribute:

{'_sa_instance_state': , 'title': 'Deploy To Nenyures', 'blog_id': 1, 'url': 'deploytonenyures.blogspot.com'}

So how does the lazy loading work? I already talked about the complex attribute lookup process in Python. So the thing is that when we first try to access to myBlog.posts Python will find the posts attribute not in the instance, but in the Blog class, as an InstrumentedAttribute, that happens to be a descriptor. So here my assumption is that the __get__ method of the descriptor will query the database to obtain the Post entities and will add them as an attribute to the Blog instance, so that next time we access to myBlog.posts we find them in the instance rather than in the class. This is how the same Blog instance looks after the first access to posts and the lazy-loading:

{'_sa_instance_state': , 'title': 'Deploy To Nenyures', 'blog_id': 1, 'url': 'deploytonenyures.blogspot.com', 'posts': [, , ]}

No comments:

Post a Comment