DEP 0009 / ORM implementation plan

I’ve taken a first pass at adding async-native support to the ORM.

Unsurprisingly, it’s going to be a big change in terms of code review, so I think we should break it into smaller phases. Ideally each phase would be its own pull request, and they could be delivered across multiple releases of Django.

Phase 1: connection-level API

This would cover the new new_connection context manager, and provide an async cursor that the user could use.

The goal of this first phase is to give users a low-level async cursor that they can use for raw SQL queries:

from django.db import new_connection

async def my_view(request):
    async with new_connection() as conn:
        async with conn.acursor() as cursor:
            await cursor.execute("SELECT * FROM my_table")
            result = await cursor.fetchone()

The scope will also include manual transaction management, such as acommit and arollback. transaction.atomic would be out of scope.
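To make that concrete, here is a hedged sketch of what manual transaction management could look like in Phase 1; whether acommit()/arollback() live on the connection object (as assumed below) or elsewhere is still an open detail, and the table and columns are purely illustrative:

from django.db import new_connection

async def transfer(amount):
    async with new_connection() as conn:
        async with conn.acursor() as cursor:
            try:
                # two writes that must succeed or fail together
                await cursor.execute(
                    "UPDATE accounts SET balance = balance - %s WHERE id = 1", [amount]
                )
                await cursor.execute(
                    "UPDATE accounts SET balance = balance + %s WHERE id = 2", [amount]
                )
            except Exception:
                await conn.arollback()
                raise
            else:
                await conn.acommit()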

Phase 2: transaction.atomic

This would build on the previous work to provide an async-compatible transaction.atomic decorator:

from django.db import new_connection, transaction

async def my_view(request):
    async with new_connection() as conn:
        async with transaction.atomic():
            async with conn.acursor() as cursor:
                await cursor.execute("SELECT * FROM my_table")
                result = await cursor.fetchone()

All methods in django.db.transaction would be in scope: acommit(), arollback(), asavepoint(), asavepoint_commit(), asavepoint_rollback(), etc.
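As a rough usage sketch, assuming the async variants mirror the signatures of the existing sync savepoint functions (do_risky_writes() is just a placeholder for real work):

from django.db import DatabaseError, new_connection, transaction

async def update_with_savepoint():
    async with new_connection():
        async with transaction.atomic():
            sid = await transaction.asavepoint()
            try:
                await do_risky_writes()  # placeholder for writes that might fail
            except DatabaseError:
                await transaction.asavepoint_rollback(sid)
            else:
                await transaction.asavepoint_commit(sid)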

Phase 3: Models and managers, django.contrib apps

In this phase we’ll convert manager methods from the current “faux-async” approach to be async-native. This will also include model methods such as asave() and adelete(), as well as the delete collector.
We’ll also convert models in django.contrib that already have faux-async methods, such as contrib.auth and contrib.sessions. We will not be adding async APIs to any contrib apps that do not yet have them.
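In usage terms, the Phase 3 surface would look the same as the faux-async methods that exist today; what changes is that calls like the ones below would run on an async connection instead of bouncing through sync_to_async (MyModel is just a stand-in):

from django.db import new_connection

async def rename(pk, new_name):
    async with new_connection():
        obj = await MyModel.objects.aget(pk=pk)
        obj.name = new_name
        await obj.asave(update_fields=["name"])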

Does this sound like a good plan? Comments, questions, concerns?


How are you planning to do this at the low level - an entirely separate set of methods down to the database driver? That was always the duplication I was worried about.

an entirely separate set of methods down to the database driver?

Pretty much yes. I could only find a couple of methods that don’t need to be async’ed. It’s a lot of duplication, but I don’t see a way around it.

Yeah, that was also my conclusion last time I looked at this. Hopefully a lot of the utility methods can be reused, but there’s no way around having a core of function calls that are essentially duplicates (much like I had to do with request handling).

I think your plan phasing makes sense; just getting the first part done would honestly be useful because it would mean the existing async functions wouldn’t have a performance penalty.

I’ve been looking a bit at this problem as well, and I’m wondering about what the end goal is here.

In principle I think the “write a<method> methods all over” strategy is nice because it requires the least amount of thinking. But there’s a lot of non-IO logic spread all over the querysets in particular, so this kind of deep cut feels more risky.

A part of me is really worried about subtle bugs showing up in these changes, and also new bugs not getting properly fixed in both versions.

For example, if we look at RawModelIterable, there is a single iter(query) operation where the I/O happens, and nowhere else. In a world where we need to support both __iter__ and __aiter__, that is the part we now have to deal with twice.

We can try to extract a bunch of stuff into helper classes when possible, but I do think there might be a real task here of making the queryset code a bit more … straightforward somehow. Like instead of having RawModelIterable be the magic iterable that not only calls iter but also applies all of these transformers, we extract as much non-IO as possible so that the async/sync duplication is more a question of “3 line function duplication” rather than “40 line function duplication”.
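A toy illustration of that direction (the names are made up, not Django’s actual internals): keep the row-to-object conversion in one shared non-IO helper, so the sync and async entry points are each only a few lines.

def rows_to_objects(rows, make_obj):
    # purely CPU-bound transformation, shared by both code paths
    return [make_obj(row) for row in rows]

def fetch_objects(cursor, sql, make_obj):
    cursor.execute(sql)  # the only sync I/O
    return rows_to_objects(cursor.fetchall(), make_obj)

async def afetch_objects(acursor, sql, make_obj):
    await acursor.execute(sql)  # the only async I/O
    return rows_to_objects(await acursor.fetchall(), make_obj)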

But the sync perf requirements are still there, so even with async versions of everything, we can’t simply say “get is async_to_sync(aget)”, so … we just have to keep two implementation trees forever?


The “we don’t want to lose sync performance” question, combined with how everything is set up… a part of me almost wants the async versions to be load-time AST-level transformations of the sync versions (result = maybe_await(thing) transforming into result = thing or result = await thing depending on the async-ness). Keeping these things performant while reducing duplication overall feels like such a tarpit that it would be nice if some automation saved us.

This is definitely a problem where having an explicit build step would open up some solutions, though that would be controversial to say the least.
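Purely to make the idea less abstract, here is a toy AST transform along those lines; maybe_await and the a-prefixed naming come from the idea above, everything else is invented, and a real build step would have to handle far more cases:

import ast

class Asyncify(ast.NodeTransformer):
    """Toy transform: def f -> async def af, maybe_await(x) -> await x."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        # rebuild the function as an async def with an "a" prefix on the name
        new = ast.AsyncFunctionDef(**{f: getattr(node, f) for f in node._fields})
        new.name = "a" + node.name
        return ast.copy_location(new, node)

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "maybe_await":
            return ast.Await(value=node.args[0])
        return node

src = "def get(qs):\n    return maybe_await(fetch(qs))\n"
tree = ast.fix_missing_locations(Asyncify().visit(ast.parse(src)))
print(ast.unparse(tree))
# async def aget(qs):
#     return await fetch(qs)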

Having said all of that, the raw connection/cursor API supporting sync and async makes all of this a lot less theoretical, and it feels necessary and also nicely scoped as a “mandatory” first step so… we should definitely go for it IMO

Been working through implementing model methods with the ongoing branch.


from asgiref.sync import async_to_sync
from django.db import new_connection, transaction

async def print_value(m):
    async with new_connection():
        refetched_m = await Model.objects.aget(id=m.id)
        print(refetched_m.value)

m = Model.objects.create(value=1)
with transaction.atomic():
    Model.objects.update(value=2)
    async_to_sync(print_value)(m)

From DEP 0009:

Whenever a new_connections() block is entered, Django sets a new context with new database connections. Any transactions that were running outside the block continue; any ORM calls inside the block operate on a new database connection and will see the database from that perspective. If the database has transaction isolation enabled, as most do by default, this means that the new connections inside the block may not see changes made by any uncommitted transactions outside it.

So the above should print 1.


The following code would currently print 2 (because aget just falls back to get):

async def print_value(m):
    refetched_m = await Model.objects.aget(id=m.id)
    print(refetched_m.value)

m = Model.objects.create(value=1)
with transaction.atomic():
    Model.objects.update(value=2)
    async_to_sync(print_value)(m)

Should we implement some sync fallback magic for backwards compatibility reasons? That is to say, if an async connection is not already open via new_connection, should we fall back to sync operations?

That would maintain backwards compatibility with existing code in the wild, but unfortunately might make it harder for people to realize they’re not using async connections at times.

Backwards compatibility is the overriding concern here, though.

The current idea is very crude: if you’re in a new_connection context, you get the new async behavior. If you’re not, every async API simply falls back to its sync counterpart.


Maybe I should have some coffee, but what backwards compatibility concerns are we trying to cover here?

aget exists in Django already. When you call it, it calls sync_to_async(self.get). This will use the connection that is held in the django.db.connections connection handler, through an asgiref Local.
So… in the sync thread we have a connection that we can share across all requests on the sync thread.

But this connection is “asgiref context”-local. By default, we can’t share the same connection across multiple async tasks (absent just passing it around between tasks). And (by my understanding) the sync thread’s connection is not at all visible when you’re inside an event loop (ConnectionHandler.thread_critical = True).
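For reference, the faux-async wrapper that exists today is essentially just this (paraphrased, not a verbatim quote of Django’s source):

from asgiref.sync import sync_to_async

# QuerySet.aget today: a thin wrapper that hops to the sync thread, and
# therefore ends up using the sync thread's connection from django.db.connections.
async def aget(self, *args, **kwargs):
    return await sync_to_async(self.get)(*args, **kwargs)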

In this PR with the proposed implementation, we are adding a new connection handler, django.db.async_connections. When calling async with new_connection() as conn: ... we are creating a new connection, distinct from the connection the current request is using via django.db.connection.

The sync thread has a connection, but an async task can spin up new connections with async with new_connection():.

    async def test_new_connection(self):
        async with new_connection(force_rollback=True) as aconn:
            conn1 = async_connections[DEFAULT_DB_ALIAS]
            self.assertEqual(conn1, aconn)

When these connections are created, they are then stored in async_connections, but are async context locals.

What is my point here? My point is that the sync API (going through the main sync thread) and the async connection API will be using different connections. In particular they won’t share transaction context.

Back to the example:

async def print_value(m):
    refetched_m = await Model.objects.aget(id=m.id)
    print(refetched_m.value)

m = Model.objects.create(value=1)
with transaction.atomic():
    Model.objects.update(value=2)
    async_to_sync(print_value)(m)

^ in the above code, the first line of print_value, in the current version of Django, calls aget. This just calls get. So it ends up getting the sync thread’s connection to make the query to the DB.

But we’re in transaction.atomic(). So… the get query happens inside the transaction where we updated the value… so we are going to get back a model with the updated value of 2.


I am trying to get aget to be “async-native”, and not just call out to the sync thread. If I don’t call out to the sync thread, I cannot use the sync connection. So… I create a new connection. That new connection won’t be within the transaction context that updated value. So if aget just gets a new connection through async_connections/new_connection within aget… it will see the model as having the old value of 1. This is different than the current behavior in Django.


My current branch does the following:

  • if we are not within a new_connection context, aget just falls back to get. In other words, existing behavior is maintained unless you opt into the new API.

Another idea I had, but did not do:

  • Make sync connections just wrap an async connection. Psycopg’s AsyncConnection is thread-safe, so can be shared between async tasks, or between the sync thread and async tasks. In that world, we could have a “native” aget that still queries in the same transaction context as get

An idea that I find hard to justify:

  • Break backwards compatibility for users of async ORMs in the case of an async_to_sync call within an atomic block, and don’t try to maintain this behavior

Hopefully this clarifies things a bit.

@andrewgodwin I would be curious to know if you had a vision in mind for this example when aget = sync_to_async(get) was introduced as an API. I’ve misread DEP 0009 more than once while experimenting on this, so I might be missing something totally obvious in the text.

  • if we are not within a new_connection context, aget just falls back to get. In other words, existing behavior is maintained unless you opt into the new API.

This is the behavior I previously proposed, and it seemed well received.

Make sync connections just wrap an async connection. Psycopg’s AsyncConnection is thread-safe,

This option is interesting, but I don’t know if we can count on async connections being portable for every backend. Would this work on, let’s say, async SQLite?


This option is interesting, but I don’t know if we can count on async connections being portable for every backend. Would this work on, let’s say, async SQLite?

Sorry, I suppose my actual thought was “Make sync connections just wrap an async connection for backends that support async connections.”. For backends that don’t support native async connections or portability, the existing behavior would be maintained.

I just dislike having a backend where we are opening connections through two separate APIs at the lower level.
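A very rough sketch of the shape I have in mind for backends with an async driver (all names invented; pooling, error handling and thread-safety caveats omitted):

from asgiref.sync import async_to_sync

class SyncFacade:
    """Expose a blocking API on top of one shared async driver connection."""

    def __init__(self, aconn):
        self.aconn = aconn  # e.g. a psycopg AsyncConnection, also used by async tasks

    def fetch_all(self, sql, params=None):
        # run a query on the shared async connection, blocking until it completes
        async def _run():
            async with self.aconn.cursor() as cursor:
                await cursor.execute(sql, params)
                return await cursor.fetchall()
        return async_to_sync(_run)()

    def commit(self):
        return async_to_sync(self.aconn.commit)()

    def rollback(self):
        return async_to_sync(self.aconn.rollback)()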

I did a bit more work on my branch during PyCon AU sprints implementing a variant of this idea:

def should_use_sync_fallback(async_variant):
    # this checks how many "new connection" context managers deep we're in
    # (tracked by an asgiref.Local of course)
    return async_variant and (new_connection_block_depth.value == 0)

# ... over in the Queryset API
    async def aget(self, *args, **kwargs):
        if should_use_sync_fallback(ASYNC_TRUTH_MARKER):
            return await sync_to_async(self.get)(*args, **kwargs)
        # otherwise we're going down an async-native thing

The core point is that there is a single check for deciding whether to use the sync fallback. This gives us a place to resolve the issue we’re talking about, as well as the issue of people overriding save() but not asave() on their custom models.
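For the save()/asave() case that could look something like the following; _asave_native is a hypothetical name for the async-native path, and the helpers are the same ones sketched above:

# on django.db.models.Model (sketch only)
async def asave(self, *args, **kwargs):
    if should_use_sync_fallback(ASYNC_TRUTH_MARKER):
        # honour a user-defined save() override by running it on the sync thread
        return await sync_to_async(self.save)(*args, **kwargs)
    return await self._asave_native(*args, **kwargs)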

Anyway, it sounds like there’s good consensus here on the following:

  • We will maintain backwards compatibility with regards to existing async queryset APIs sharing the same connection context as sync queryset APIs in code that is valid in existing versions of Django.
  • Creating an async connection through the new_connection context manager opts you into new connection semantics for the duration of that block: inside it, async APIs use async connections, while sync APIs continue to use sync connections.