DEP0009/ORM implementation plan

I’ve took a first pass at adding async-native support to the ORM.

No surprises, it’s going to be a big change in terms of code review, so I think we should break it into smaller phases. Ideally each of them would be a Pull Request and they could be delivered across multiple releases of Django.

Phase1: connection-level API

This would cover the new new_connection context managers, and provide an async cursor that the user could use.

The goal of this first phase is to give users a low-level async cursor that they can use for raw SQL query:

from django.db import new_connection

async def my_view(request):
    async with new_connection() as conn:
          async with conn.acursor() as cursor:
              await cursor.execute("SELECT * FROM my_table")
              result = await cursor.fetchone()

The scope will also include manual transaction management, such as acommit and arollback. transaction.atomic would be out of scope.

Phase 2: transaction.atomic

This would build on the previous work to provide an async-compatiable transaction.atomic decorator:

from django.db import new_connection, transaction

async def my_view(request):
    async with new_connection() as conn:
        async with transaction.atomic():
            async with conn.acursor() as cursor:
                await cursor.execute("SELECT * FROM my_table")
                result = await cursor.fetchone()

All methods in django.db.transaction would be in scope: acommit(), arollback(), asavepoint(), asavepoint_commit(), asavepoint_rollback(), etc.

Phase 3: Models and managers, django.contrib apps

In this phase we’ll coonvert manager methods from the current “faux-async” to be async-native. This will also include model methods such as .asave() and adelete(), the Delete collector.
We’ll also convert models in django.contrib that already have faux-async methods, such as contrib.auth and contrib.sessions. We will not be adding async APIs to any contrib apps that do not yet have them.

Does it sounds like a good plan? Comments, Questions, Concerns?

4 Likes

How are you planning to do this at the low level - an entirely separate set of methods down to the database driver? That was always the duplication I was worried about.

an entirely separate set of methods down to the database driver?

Pretty much yes. I could only find a couple of methods that don’t need to be async’ed. It’s a lot of duplication, but I don’t see a way around it.

Yeah, that was also my conclusion last time I looked at this. Hopefully a lot of the utility methods can be reused, but there’s no way around having a core of function calls that are essentially duplicates (much like I had to do with request handling).

I think your plan phasing makes sense; just getting the first part done would honestly be useful because it would mean the existing async functions wouldn’t have a performance penalty.

I’ve been looking a bit at this problem as well, and I’m wondering about what the end goal is here.

In principle I think the “write a<method> methods all over” strategy is nice because it requires the least amount of thinking. But there’s a lot of non-IO logic spread all over the querysets in particular, so this kind of deep cut feels more risky.

A part of me is really worried about subtle bugs showing up in these changes, and also new bugs not getting properly fixed in both versions.

For example, if we look at RawModelIterator, we have a single iter(query) operation which is where the I/O is happening, and nowhere else. In a world where we need to support __iter__ and __aiter__, we now just have to deal with that.

We can try to extract a bunch of stuff into helper classes when possible, but I do think there might be a real task here of making the queryset code a bit more … straightforward somehow. Like instead of having RawModelIterable be the magic iterable that not only calls iter but also applies all of these transformers, we extract as much non-IO as possible so that the async/sync duplication is more a question of “3 line function duplication” rather than “40 line function duplication”.

But the sync perf requirements are still there, so even with async versions of everything, we can’t simply say “get is async_to_sync(aget)”, so … we just have to keep two implementation trees forever?


The “we don’t want to lose sync performance” question, combined with how everything is set up… a part of me almost wants the async versions to be load-time AST-level transformations of the sync versions (result = maybe_await(thing) transforming into result = thing or result = await thing depending on the async-ness). Having these things be performant and reducing duplication overall feels like such a tarpit if there was some automation that saves us.

This is definitely a problem where having an explicit build step would open up some solutions, though that would definitely be controversial to say the least

Having said all of that, the raw connection/cursor API supporting sync and async makes all of this a lot less theoretical, and it feels necessary and also nicely scoped as a “mandatory” first step so… we should definitely go for it IMO

Been working through implementing model methods with the ongoing branch.


async def print_value(m):
  async with new_connection():
    refetched_m = await Model.objects.aget(id=m.id)
    print(refetched_m.value)

m = Model.objects.create(value=1)
with transaction.atomic():
   Models.objects.update(value=2)
   async_to_sync(print_value)(m)

From DEP0009

Whenever a new_connections() block is entered, Django sets a new context with new database connections. Any transactions that were running outside the block continue; any ORM calls inside the block operate on a new database connection and will see the database from that perspective. If the database has transaction isolation enabled, as most do by default, this means that the new connections inside the block may not see changes made by any uncommitted transactions outside it.

So the above should print 1.


The following code would currently print 2 (Because aget just falls back to get)

async def print_value(m):
  refetched_m = await Model.objects.aget(id=m.id)
  print(refetched_m.value)

m = Model.objects.create(value=1)
with transaction.atomic():
   Models.objects.update(value=2)
   async_to_sync(print_value)(m)

Should we implement some sync fallback magic for backwards compatibility reasons? That is to say, if an async connection is not already open via new_connection, then we should fallback to sync operations?

That would maintain backwards compatibility with existing code in the wild, but unfortunately might make it harder for people to realize they’re not using async connections at times.

Backwards compatibility is the overriding concern here, though.

Current idea is very brutal: if you’re in a new_connection context, you get the new async behavior. If you’re not, you’re in “every async API is just falling back to sync APIs” behavior.

1 Like

Maybe I should have some coffee, but what backwards compatibility concerns are we trying to cover here?

aget exists in Django already. When you call it, it calls sync_to_async(self.get). This will use the connection that is held in the django.db.connections connection handler, through an asgiref Local.
So… in the sync thread we have a connection that we can share across all requests on the sync thread.

But this connection is “asgiref context”-local. By default, we can’t share the same connection across multiple async tasks (absent just passing it around between tasks). And (by my understanding) the sync thread’s connection is not at all visible when you’re inside an event loop (ConnectionHandler.thread_critical = True).

In this PR with the proposed implementation, we are adding a new connection handler, django.db.async_connections. When calling async with new_connection() as conn: ... we are creating a new connection, distinct from a request that is currently within django.db.connection.

The sync thread has a connection, but an async task can spin up new connections with async with new_connection():.

    async def test_new_connection(self):
        async with new_connection(force_rollback=True) as aconn:
            conn1 = async_connections[DEFAULT_DB_ALIAS]
            self.assertEqual(conn1, aconn)

When these connections are created, they are then stored in async_connections, but are async context locals.

What is my point here? My point is that the sync API (going through the main sync thread) and the async connection API will be using different connections. In particular they won’t share transaction context.

Back to the example:

async def print_value(m):
  refetched_m = await Model.objects.aget(id=m.id)
  print(refetched_m.value)

m = Model.objects.create(value=1)
with transaction.atomic():
   Models.objects.update(value=2)
   async_to_sync(print_value)(m)

^ in the above code, the first line of print_value, in the current version of Django, calls aget. This just calls get. So it ends up getting the sync thread’s connection to make the query to the DB.

But we’re in transaction.atomic(). So… the get query happens inside the transaction where we updated the value… so we are going to get back a model with the updated value of 2.


I am trying to get aget to be “async-native”, and not just call out to the sync thread. If I don’t call out to the sync thread, I cannot use the sync connection. So… I create a new connection. That new connection won’t be within the transaction context that updated value. So if aget just gets a new connection through async_connections/new_connection within aget… it will see the model as having the old value of 1. This is different than the current behavior in Django.


My current branch does the following:

  • if we are not within a new_connection context, aget just falls back to get. In other words, existing behavior is maintained unless you opt into the new API.

Another idea I had, but did not do:

  • Make sync connections just wrap an async connection. Psycopg’s AsyncConnection is thread-safe, so can be shared between async tasks, or between the sync thread and async tasks. In that world, we could have a “native” aget that still queries in the same transaction context as get

An idea that I find hard to justify:

  • Break backwards compatibility for users of async ORMs in the case of an async_to_sync call within an atomic block, and don’t try to maintain this behavior

Hopefully this clarifies things a bit.

@andrewgodwin I would be curious to know if you had a vision in mind when it came to this example when aget = sync_to_async(get) was done as an API introduction. I have experienced a lot of misreads of DEP0009 while experimenting on this, so I might be missing something totally obvious in the text.

1 Like
  • if we are not within a new_connection context, aget just falls back to get. In other words, existing behavior is maintained unless you opt into the new API.

This is the behavior I previously proposed, and it seemed well received.

Make sync connections just wrap an async connection. Psycopg’s AsyncConnection is thread-safe,

This option is interesting, but I don’t know if we can count and async connections being portable for every backend. Would this work on, let’s say, async SQLite?

1 Like

This option is interesting, but I don’t know if we can count and async connections being portable for every backend. Would this work on, let’s say, async SQLite?

Sorry, I suppose my actual thought was “Make sync connections just wrap an async connection for backends that support async connections.”. For backends that don’t support native async connections or portability, the existing behavior would be maintained.

I just dislike having a backend where we are opening connections through two separate APIs in the lower leve.

I did a bit more work on my branch during PyCon AU sprints implementing a variant of this idea:

def should_use_sync_fallback(async_variant):
    # this checks how many "new connection" context managers deep we're in
    # (tracked by an asgiref.Local of course)
    return async_variant and (new_connection_block_depth.value == 0)

# ... over in the Queryset API
    async def aget(self, *args, **kwargs):
        if should_use_sync_fallback(ASYNC_TRUTH_MARKER):
            return await sync_to_async(self.get)(*args, **kwargs)
        # otherwise we're going down an async-native thing

The core point here being that there is a check for determining whether to use a sync fallback here. This provides a place for resolving the issue we’re talking about, as well as the issue of people overriding save but not asave in their custom Model methods.

Anyways it sounds like here there’s good consensus on the following idea:

  • We will maintain backwards compatibility with regards to existing async queryset APIs sharing the same connection context as sync queryset APIs in code that is valid in existing versions of Django.
  • Creating an async connection through the new_connection context manager places you into new semantics regarding connection context throughout that context. Inside there, async APIs use async connections, while sync APIs will continue to use sync connections
1 Like

Currently going through getting test coverage through async code paths…

    def test_invalid_batch_size_exception(self):
        msg = "Batch size must be a positive integer."
        with self.assertRaisesMessage(ValueError, msg):
            Country.objects.bulk_create([], batch_size=-1)

    async def test_invalid_batch_size_exception_async(self):
        msg = "Batch size must be a positive integer."
        async with new_connection(force_rollback=True):
            with self.assertRaisesMessage(ValueError, msg):
                await Country.objects.abulk_create([], batch_size=-1)

^ found myself needing to write a test like this. A bit worried this will be a fairly common form of code duplication needed in future tests.

I suppose if we have sync and async APIs, then “there are tests covering those two cases” feels inevitable. But either there’s tedium, or people would be tempted to have codegen for test code as well.

For now just going to do the manual transformations given these are small snippets, while trying to think about what might be best. This issue is honestly mostly a problem for early returns in queryset code, as other parts of the async code I’ve written gets fairly good coverage in the existing tests in test_async_queryset.py

1 Like

This is my main concern with where we’re at, at the moment. Medium term, surely, this gets out of hand. It’s already a pain with (say) middleware.

The psycopg codegen approach looks interesting.

I can’t shake the feeling there should be a higher-level abstraction. But I don’t have such to hand.

I think every language system with the “we have language-level support for coroutinees” like this is struggling with this problem (or, in JS’s case, just punting on it by saying “everything’s async now”), so at least we’re probably around the status quo on this.

The only ecosystem I can think of that doesn’t have this problem is Purescript, because you can abstract over the effect type (Aff/Eff…), which is a little bit of a hurdle for Django to say the least!

Having said that, one thing that I’ve felt is that I think the problem gets better, not worse, over time. Like we have all this existing code that we have to deal with, but new code can be written with effectful separation in mind.

At the end of the day if we are saying that .get and .aget are different things, and that .aget behaves differently depending on whether it’s in a new_connection context manager or not… saying “this needs to be 3 tests” feels like the default stance.

For some of this I’ve found it possible to just write one test and then parametrize it over “wrap the test in new_connection”.

But if we can extract parameter prep steps in particular, then a lot of the early-return stuff probably doesn’t need to be covered with 3 different tests! We can write a unit test on the validation, and have our one effectful test, and be done with it.

So my belief here is I have this changeset that is aiming to get very close to 100% coverage. Once I have that I will try and figure out how to chop it down. There’s a couple of refactoring of internals that could make it more straightforward for us to have coverage. Bit of elbow grease, but that’s it.

And once we have some foundation, we can move forward. We’ll have a one-time hit in messiness, but I don’t believe it will accumulate over time.


PS If you’re interested in the current state of things, this branch includes my work so far. There’s stuff in there that is for my debugging purposes (and also some not-yet-merged things like this PR and of course the async cursor PR), but it successfully has let the following take 1 second instead of 5 seconds (when I have 5 clients in this artificial test)

async def overall_summary():
    # look up all clients
    # for each client fetch all the invoices (with an extra bit to make it nice and slow)
    clients = [client async for client in Client.objects.all()]
    results = {}
    async with asyncio.TaskGroup() as tg:
        for client in clients:
            results[client] = tg.create_task(get_invoice_summary(client))

    results = {str(k): v.result() for k, v in results.items()}
    return results
async def get_invoice_summary(client: Client):
    async with new_connection():
        my_invs = [
            inv
            async for inv in Invoice.objects.filter(client=client).annotate(
                other=RawSQL("select pg_sleep(1)", [])
            )
        ]
        return {"count": len(my_invs)}

3 Likes

Hi all,

I am new here and I didn’t any contribution to the framework.

Anyway I think I can add something to the discussion. Another thing you’ll have to consider is about implicit data loading, for example when accessing (not yet loaded) foreign key references.

About code duplication the only library I found overcoming the issue without copying code is SQLAlchemy but it uses greenlet and this adds some complex patterns and code generation paths.

1 Like