Asynchronous ORM

Potentially ignorant question here, but is there any reason a new ORM can’t be introduced that inherits from and overrides the old one? Why try to modify and shoehorn async functionality into what I presume to be an already massively complicated piece of software, potentially “deliberately making catastrophic/cascading breaking changes”? Thinking of the “O” in SOLID, for example.

Using async Django is apparently going to be opt-in in many ways already, so why not give the ability to swap in a different version of the ORM?

Django 4.1 will provide an async-interface to the ORM.

See the release notes:

https://docs.djangoproject.com/en/dev/releases/4.1/#asynchronous-orm-interface

Django 4.1a1 was released May 18. The final release is in August. (Version4.1Roadmap – Django)

Well now that is exciting!

I see that for the moment we have an async-compatible interface only, but that the stack underneath the interface is still actually sync for now (and thus you wouldn’t get the performance/concurrency improvements from an all-async stack). Still, having an async interface should help with actually experimenting with the functionality.

The old ORM uses sync I/O functions. A new async ORM would not be able to call into the older sync ORM while gaining the performance of an all-async stack, which is presumably why a caller would be using an async interface in the first place (i.e. for improved performance/concurrency).

To elaborate a bit more: If you call a sync function from an async function using sync_to_async(), the sync operation actually has to run on a separate thread (from a thread pool IIRC). Threads are expensive and limited in number. An all-async stack would be able to run entirely on a single thread, and have a very high number of concurrent operations (i.e. presumably hundreds of thousands, with appropriate tuning, based on numbers from other ecosystems).
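
To make the threading point concrete, here is a minimal sketch using only the standard library. It uses `asyncio.to_thread` as a stand-in for `sync_to_async()`, so it is not asgiref's actual implementation, and `blocking_query` is a hypothetical placeholder for a sync ORM call. It only shows that the blocking work ends up on a thread other than the one running the event loop.

```python
import asyncio
import threading
import time


def blocking_query():
    # Hypothetical stand-in for a sync ORM call: blocks its thread for a moment,
    # then reports which thread it ran on.
    time.sleep(0.05)
    return threading.get_ident()


async def main():
    loop_thread = threading.get_ident()
    # Each blocking call is handed to a worker thread so the event loop stays free.
    worker_threads = await asyncio.gather(
        *(asyncio.to_thread(blocking_query) for _ in range(3))
    )
    return loop_thread, worker_threads


loop_thread, worker_threads = asyncio.run(main())
print("event loop thread:", loop_thread)
print("worker threads:", worker_threads)
```

Every concurrent blocking call occupies a worker thread for its whole duration, which is the cost an all-async stack would avoid.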

Is there a good example comparing sync vs. async performance, either with the ORM or with outside API calls?

Since the async ORM in 4.1 isn’t natively async, I wouldn’t expect any performance improvements. Testing performance metrics for this version would be a moot point imo, since it’s only a stopgap measure.

Would there be any way to test a more native async implementation, to see how performance will eventually improve?

Not really, since it hasn’t been developed by the Django team yet.

I recall seeing some natively async monkey patches to Django floating around GitHub. Those might help give a ballpark of what kind of performance improvements to expect. But last I checked, none of these projects were fully developed.

Maybe you’d get a better performance guesstimate by comparing some of the async ORMs (listed above) to Django’s ORM performance.

What about addressing this issue with a new approach: an I/O-driven model?

When it comes to server implementation, we can divide it into two parts:

  1. I/O: network, local disk …
  2. Logic: how to reply to a request.

A server becomes meaningful only if it can trigger some side effects:
a game user levels up, an account is created in a system, a blog post is published on a site …
all of which use I/O.

Interestingly, we often implement our servers with I/O and logic bound to each other:
accept the user's request, process it, save some data to the database, and return a response to the user over the network.

When there is only one synchronous execution model, that's OK. But when we want to execute the same logic in
an asynchronous execution model, we must reimplement the logic in an asynchronous way, and the main difference
comes from the difference between synchronous I/O and asynchronous I/O.

So it would be great if we could implement our logic so that it runs under both synchronous and asynchronous I/O.
That requires some kind of I/O-driven execution framework that hides the difference between the I/O models.

In Python there is a concept called a generator, which can be treated as an execution routine.
It communicates with the caller through the yield keyword: it can send data to the caller and receive data
from the caller, so the caller gets a chance to affect the generator's inner state.

We can use generators to implement an I/O-driven framework in which the real I/O happens at the framework level:
the generator just sends I/O context information to the caller, and the caller performs the real I/O
and sends the result back to the generator to push it to the next step.

@tinylambda, I’m not sure what you’re actually proposing.

Yes, that’s how Python’s async support works under the hood.
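
A minimal sketch of what “under the hood” means here, using only the standard library: a coroutine can be driven by hand with `send()`, exactly like a generator. `IORequest` and `handler` are hypothetical names made up for this illustration; real event loops such as asyncio do the same stepping, but with futures instead of plain strings.

```python
class IORequest:
    """Hypothetical awaitable that hands an I/O request to whoever drives the coroutine."""

    def __init__(self, payload):
        self.payload = payload

    def __await__(self):
        # The yielded value travels out to the driver; the value the driver
        # sends back becomes the result of the `await` expression.
        result = yield self.payload
        return result


async def handler():
    reply = await IORequest("read x")
    return f"got {reply}"


coro = handler()
request = coro.send(None)  # run until the first await; receives "read x"
try:
    coro.send("42")  # the driver "performs" the I/O and sends the result back
except StopIteration as stop:
    result = stop.value  # the coroutine's return value

print(request, "->", result)
```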

Maybe something like this:

def battle():
    battle_id = yield
    print("battle_id is ", battle_id)

    io_event = {
        "io_type": "redis",
        "io_action": "set",
        "kwargs": {"key": f"battle_{battle_id}", "value": "battle started"},
    }
    # don't do I/O in this place, just send an "I/O Event" to the caller
    io_return = yield io_event
    print("io return: ", io_return)

    # some other simulations
    for i in range(5):
        io_event = {
            "io_type": "redis",
            "io_action": "set",  # same action as the first event, so a driver can dispatch on it
            "kwargs": {"key": battle_id, "value": i},
        }
        print("send io_event: ", io_event)
        io_return = yield io_event
        print("io return: ", io_return)
        if io_return != "ok":
            print("IO error! break!")
            break
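
A generator like this needs a caller that performs the I/O. The sketch below is one hypothetical way to write that caller twice, once sync and once async, around the same generator. Everything in it is an assumption for illustration: a shortened version of `battle()`, the names `run_sync`, `run_async`, `perform_io_sync` and `perform_io_async`, and a plain dict standing in for Redis.

```python
import asyncio


def battle():
    # Shortened version of the generator above: it only describes I/O.
    battle_id = yield
    reply = yield {
        "io_type": "redis",
        "io_action": "set",
        "kwargs": {"key": f"battle_{battle_id}", "value": "battle started"},
    }
    for i in range(2):
        reply = yield {
            "io_type": "redis",
            "io_action": "set",
            "kwargs": {"key": f"battle_{battle_id}_round", "value": i},
        }
        if reply != "ok":
            break
    return reply


def perform_io_sync(store, event):
    # Hypothetical blocking backend: a dict stands in for Redis.
    store[event["kwargs"]["key"]] = event["kwargs"]["value"]
    return "ok"


async def perform_io_async(store, event):
    # Hypothetical non-blocking backend: yields to the event loop first.
    await asyncio.sleep(0)
    store[event["kwargs"]["key"]] = event["kwargs"]["value"]
    return "ok"


def run_sync(gen, battle_id, store):
    next(gen)  # advance to the first bare `yield`
    try:
        event = gen.send(battle_id)
        while True:
            event = gen.send(perform_io_sync(store, event))
    except StopIteration as stop:
        return stop.value


async def run_async(gen, battle_id, store):
    next(gen)
    try:
        event = gen.send(battle_id)
        while True:
            event = gen.send(await perform_io_async(store, event))
    except StopIteration as stop:
        return stop.value


sync_store, async_store = {}, {}
sync_result = run_sync(battle(), 7, sync_store)
async_result = asyncio.run(run_async(battle(), 7, async_store))
print(sync_result, sync_store)
print(async_result, async_store)
```

The logic in `battle()` is written once; only the two small drivers differ between the sync and async cases.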

Are there any short-term plans for making the ORM easier to use in mixed sync-async contexts? This exception is currently raised in those circumstances:
django.core.exceptions.SynchronousOnlyOperation: You cannot call this from an async context - use a thread or sync_to_async.

Based on the new thread-per-view rendering, would it be possible to eliminate this exception?

For some more information, I’m calling the sync ORM API within a sync function that is itself called within an async event loop (for compatibility purposes for our software; the details are a bit complicated).

I guess Django 4.1 doesn’t include any provision for async Model methods, like save? Does anyone know what the plan is for this?

From my understanding, those should actually be trivial to implement by following what was done for QuerySet. Basically, create the following methods, which call their sync counterparts with sync_to_async:

  • asave
  • arefresh_from_db
  • adelete
  • aclean
  • afull_clean
  • aclean_fields
  • avalidate_unique

Probably also add async counterparts to the related manager for add/remove/set/etc.:

  • aadd
  • acreate
  • aget_or_create
  • aupdate_or_create
  • aremove
  • aclear
  • aset

This is actually so trivial that maybe it could be backported to the 4.1 release? I don’t know if the Django release policy allows that, but those are just my thoughts on having a really complete async layer for the ORM in 4.1.
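
A rough sketch of the pattern being described, not Django's implementation: each `a`-prefixed method simply runs its sync counterpart in a thread. `Record` is a hypothetical toy class, and `asyncio.to_thread` from the standard library stands in for `sync_to_async` so the example runs without Django installed.

```python
import asyncio


class Record:
    """Hypothetical toy model with sync methods and thin async wrappers."""

    def __init__(self):
        self.saved = False

    def save(self):
        # Stand-in for a blocking database write.
        self.saved = True

    def delete(self):
        # Stand-in for a blocking database delete.
        self.saved = False

    async def asave(self):
        # Run the sync method in a worker thread and await its completion.
        await asyncio.to_thread(self.save)

    async def adelete(self):
        await asyncio.to_thread(self.delete)


async def main():
    record = Record()
    await record.asave()
    after_save = record.saved
    await record.adelete()
    after_delete = record.saved
    return after_save, after_delete


after_save, after_delete = asyncio.run(main())
print(after_save, after_delete)
```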

I don’t understand async as well as I’d like, but assuming this question isn’t silly:

Is there a write-up anywhere of the current state of async Django? I understand there’s an async ORM interface, and I see psycopg3 support was recently merged. What more work is necessary before we can use Django asynchronously through the entire stack?

There are the official docs:

and also some tutorials, mainly on async views.