DEP0009/ORM implementation plan

I’ve taken a first pass at adding async-native support to the ORM.

Unsurprisingly, it’s going to be a big change to review, so I think we should break it into smaller phases. Ideally each phase would be its own pull request, and they could be delivered across multiple releases of Django.

Phase 1: connection-level API

This would cover the new new_connection context managers, and provide an async cursor that the user could use.

The goal of this first phase is to give users a low-level async cursor that they can use for raw SQL queries:

from django.db import new_connection

async def my_view(request):
    async with new_connection() as conn:
        async with conn.acursor() as cursor:
            await cursor.execute("SELECT * FROM my_table")
            result = await cursor.fetchone()

The scope will also include manual transaction management, such as acommit and arollback. transaction.atomic would be out of scope.
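
For illustration, manual transaction management in Phase 1 might look roughly like the sketch below. It runs against a hypothetical in-memory stand-in: FakeAsyncConnection, fake_new_connection and aexecute are invented here and are not Django code. Only the acommit and arollback names come from the plan, and calling them as connection methods is an assumption.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeAsyncConnection:
    """Hypothetical stand-in for the proposed async connection (not Django code)."""

    def __init__(self):
        self.pending = []    # statements run since the last commit/rollback
        self.committed = []  # statements made durable by acommit()

    async def aexecute(self, sql):
        await asyncio.sleep(0)  # pretend to wait on the database
        self.pending.append(sql)

    async def acommit(self):
        await asyncio.sleep(0)
        self.committed.extend(self.pending)
        self.pending.clear()

    async def arollback(self):
        await asyncio.sleep(0)
        self.pending.clear()


@asynccontextmanager
async def fake_new_connection():
    # Assumed shape: the context manager yields a connection object.
    yield FakeAsyncConnection()


async def transfer(conn, fail=False):
    """Commit on success, roll back on any error (what atomic() would automate)."""
    try:
        await conn.aexecute("UPDATE account SET balance = balance - 10 WHERE id = 1")
        if fail:
            raise RuntimeError("simulated failure between the two statements")
        await conn.aexecute("UPDATE account SET balance = balance + 10 WHERE id = 2")
    except Exception:
        await conn.arollback()
        raise
    else:
        await conn.acommit()


async def demo():
    async with fake_new_connection() as conn:
        await transfer(conn)  # both statements are committed
        try:
            await transfer(conn, fail=True)  # first statement is rolled back
        except RuntimeError:
            pass
        return conn.committed, conn.pending
```

The try/except/else in transfer() is the boilerplate that Phase 2's transaction.atomic would take over.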

Phase 2: transaction.atomic

This would build on the previous work to provide an async-compatible transaction.atomic (shown here as a context manager):

from django.db import new_connection, transaction

async def my_view(request):
    async with new_connection() as conn:
        async with transaction.atomic():
            async with conn.acursor() as cursor:
                await cursor.execute("SELECT * FROM my_table")
                result = await cursor.fetchone()

All methods in django.db.transaction would be in scope: acommit(), arollback(), asavepoint(), asavepoint_commit(), asavepoint_rollback(), etc.

Phase 3: Models and managers, django.contrib apps

In this phase we’ll convert manager methods from the current “faux-async” implementations to async-native ones. This will also include model methods such as asave() and adelete(), as well as the deletion collector.
We’ll also convert models in django.contrib that already have faux-async methods, such as contrib.auth and contrib.sessions. We will not be adding async APIs to any contrib apps that do not yet have them.

Does this sound like a good plan? Comments, questions, concerns?

How are you planning to do this at the low level - an entirely separate set of methods down to the database driver? That was always the duplication I was worried about.

an entirely separate set of methods down to the database driver?

Pretty much yes. I could only find a couple of methods that don’t need to be async’ed. It’s a lot of duplication, but I don’t see a way around it.

Yeah, that was also my conclusion last time I looked at this. Hopefully a lot of the utility methods can be reused, but there’s no way around having a core of function calls that are essentially duplicates (much like I had to do with request handling).
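
To make the shape of that duplication concrete, here is a minimal sketch with made-up classes (FakeDriverCursor and CursorWrapper are hypothetical, not Django's real cursor wrappers). The I/O-free helper is written once, while the two entry points that actually talk to the driver are near-duplicates.

```python
import asyncio


class FakeDriverCursor:
    """Hypothetical driver cursor exposing both a sync and an async execute."""

    def __init__(self):
        self.log = []

    def execute(self, sql, params):
        self.log.append((sql, params))

    async def aexecute(self, sql, params):
        await asyncio.sleep(0)  # stands in for real network I/O
        self.log.append((sql, params))


class CursorWrapper:
    """Hypothetical wrapper: one shared helper, two duplicated entry points."""

    def __init__(self, cursor):
        self.cursor = cursor

    # Shared, I/O-free helper: written once, used by both code paths.
    def _prepare(self, sql, params):
        return sql.strip(), tuple(params or ())

    # The I/O entry points are the unavoidable near-duplicates.
    def execute(self, sql, params=None):
        sql, params = self._prepare(sql, params)
        return self.cursor.execute(sql, params)

    async def aexecute(self, sql, params=None):
        sql, params = self._prepare(sql, params)
        return await self.cursor.aexecute(sql, params)
```

The more that can be pushed into helpers like _prepare, the thinner the duplicated core becomes.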

I think your phasing makes sense; just getting the first part done would honestly be useful because it would mean the existing async functions wouldn’t have a performance penalty.

I’ve been looking a bit at this problem as well, and I’m wondering about what the end goal is here.

In principle I think the “write a<method> methods all over” strategy is nice because it requires the least amount of thinking. But there’s a lot of non-IO logic spread all over the querysets in particular, so this kind of deep cut feels more risky.

A part of me is really worried about subtle bugs showing up in these changes, and also new bugs not getting properly fixed in both versions.

For example, if we look at RawModelIterable, we have a single iter(query) operation which is where the I/O is happening, and nowhere else. In a world where we need to support both __iter__ and __aiter__, we have to handle that one I/O point in both forms.

We can try to extract a bunch of stuff into helper classes when possible, but I do think there might be a real task here of making the queryset code a bit more … straightforward somehow. Like instead of having RawModelIterable be the magic iterable that not only calls iter but also applies all of these transformers, we extract as much non-IO as possible so that the async/sync duplication is more a question of “3 line function duplication” rather than “40 line function duplication”.
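
A rough sketch of what that “3 line duplication” could look like, using invented names (FakeQuery, ModelIterable and build_instance are hypothetical and much simpler than the real queryset code). All per-row work lives in one plain function, and the two iteration methods differ only in how they pull rows.

```python
import asyncio


def build_instance(columns, row):
    """Pure, I/O-free transformation shared by both iteration paths."""
    return dict(zip(columns, row))


class FakeQuery:
    """Hypothetical query object; only iterating it performs 'I/O'."""

    columns = ("id", "name")
    rows = [(1, "a"), (2, "b")]

    def __iter__(self):
        return iter(self.rows)

    async def __aiter__(self):
        for row in self.rows:
            await asyncio.sleep(0)  # stands in for waiting on the database
            yield row


class ModelIterable:
    """Hypothetical iterable: the sync/async split is just these two methods."""

    def __init__(self, query):
        self.query = query

    def __iter__(self):
        for row in self.query:
            yield build_instance(self.query.columns, row)

    async def __aiter__(self):
        async for row in self.query:
            yield build_instance(self.query.columns, row)


async def collect(iterable):
    # Drain an async iterable into a list.
    return [item async for item in iterable]
```

Both list(ModelIterable(FakeQuery())) and asyncio.run(collect(ModelIterable(FakeQuery()))) go through the same build_instance, so a fix there applies to both paths.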

But the sync perf requirements are still there, so even with async versions of everything, we can’t simply say “get is async_to_sync(aget)”, so … we just have to keep two implementation trees forever?


The “we don’t want to lose sync performance” question, combined with how everything is set up… a part of me almost wants the async versions to be load-time AST-level transformations of the sync versions (result = maybe_await(thing) transforming into result = thing or result = await thing depending on the async-ness). Keeping these things performant while reducing duplication overall feels like such a tarpit that I wish there were some automation to save us.

This is definitely a problem where having an explicit build step would open up some solutions, though that would definitely be controversial, to say the least.
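
As a toy illustration of the maybe_await idea (not a proposal for how Django would do it), the sketch below uses the standard ast module to generate a sync and an async function from one template. The maybe_await marker, the build helper and the fake cursors are all hypothetical names made up for this example.

```python
import ast
import asyncio
import textwrap


class MaybeAwaitTransformer(ast.NodeTransformer):
    """Rewrite maybe_await(x) to `x` (sync) or `await x` (async)."""

    def __init__(self, make_async):
        self.make_async = make_async

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "maybe_await":
            inner = node.args[0]
            return ast.Await(value=inner) if self.make_async else inner
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        if not self.make_async:
            return node
        # Copy every field so this works across Python versions.
        fields = {name: getattr(node, name, None) for name in node._fields}
        return ast.copy_location(ast.AsyncFunctionDef(**fields), node)


def build(source, name, make_async):
    """Compile `source` into a sync or async variant of the function `name`."""
    tree = ast.parse(textwrap.dedent(source))
    tree = MaybeAwaitTransformer(make_async).visit(tree)
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, filename="<generated>", mode="exec"), namespace)
    return namespace[name]


TEMPLATE = """
def fetch_one(cursor, sql):
    maybe_await(cursor.execute(sql))
    return maybe_await(cursor.fetchone())
"""


class SyncCursor:
    def execute(self, sql):
        self.sql = sql

    def fetchone(self):
        return (self.sql, "sync")


class AsyncCursor:
    async def execute(self, sql):
        await asyncio.sleep(0)
        self.sql = sql

    async def fetchone(self):
        await asyncio.sleep(0)
        return (self.sql, "async")


fetch_one = build(TEMPLATE, "fetch_one", make_async=False)
afetch_one = build(TEMPLATE, "fetch_one", make_async=True)
```

The sync variant contains no trace of maybe_await after the transform, so it pays no runtime cost; the price is generated code that is harder to debug and step through.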

Having said all of that, the raw connection/cursor API supporting sync and async makes all of this a lot less theoretical, and it feels necessary and also nicely scoped as a “mandatory” first step so… we should definitely go for it IMO