Asynchronous ORM

Altering the Django ORM to be async will be a significant undertaking.

I had a few initial thoughts on syntax to support async ORM queries in async contexts while still allowing sync ORM queries in sync contexts (and sometimes also in async contexts when no queries are actually issued):

A. ORM queries happen explicitly when fetching specific models. For example:

question = Question.objects.first()  # explicit query

question = Question.objects.get(id=...)  # explicit query

question = Question.objects.prefetch_related(
    'choice_set'
).get(id=...)  # explicit query

I think we could make those functions also support async calls at the same time as sync calls by replacing the function object with an object that implements both __call__ and __acall__. So in an async view you could write:

question = await Question.objects.first()  # explicit query

question = await Question.objects.get(id=...)  # explicit query

question = await Question.objects.prefetch_related(
    'choice_set'
).get(id=...)  # explicit query

B. ORM queries happen implicitly via property access when accessing related single models. For example:

choice = ...
question = choice.question  # implicit query if not prefetched

Python doesn’t currently support any kind of asynchronous property syntax. Perhaps we could have a different “async namespace” (perhaps called something short like a) that could be used to access properties:

choice = ...
question = await choice.a.question  # implicit query if not prefetched

Additionally, we could still allow the regular property syntax in the common case where a view function does an early prefetch of all models it uses and the property being accessed leads to a model that has already been fetched:

choice = ...
question = choice.question  # ok only if has been prefetched

If the above syntax is used when a property has NOT been prefetched, a suitable exception can be raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnprefetchedRelatedModelAccessError: Related model Choice.question cannot be
fetched synchronously from within an async view function or other async context.
Consider using select_related() or prefetch_related() to prefetch the
model before accessing it.

C. ORM queries happen explicitly when accessing related model managers for relationships that have not been prefetched. For example:

question = ...
choices = question.choice_set.all()  # explicit query if not prefetched

In an async context we could use the same approach as A to make methods like all() support both __call__ and __acall__:

question = ...
choices = await question.choice_set.all()  # explicit query if not prefetched

Additionally, we could still allow the regular property syntax in the common case where a view function does an early prefetch of all models it uses and the property being accessed leads to a collection that has already been fetched:

question = ...
choices = question.choice_set.all()  # ok only if has been prefetched

If the above syntax is used when a collection has NOT been prefetched with prefetch_related(), a suitable exception can be raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnprefetchedRelatedModelAccessError: Related models in Question.choice_set cannot be
fetched synchronously from within an async view function or other async context.
Consider using prefetch_related() to prefetch the models before accessing them.

D. ORM queries can happen in view function decorators:

@user_passes_test(lambda user:
    user.inschooluser.type == 'teacher'  # explicit query, if not prefetched
)
def gradebook(request: HttpRequest) -> HttpResponse:
    ...

General decorators should be able to detect whether the view function they are wrapping is async or sync and thus should be able to support async view functions:

async def _user_is_teacher(user):
    return (await user.a.inschooluser).type == 'teacher'

@user_passes_test(_user_is_teacher)
async def gradebook(request: HttpRequest) -> HttpResponse:
    ...
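
To make the detection idea concrete, here is a minimal sketch of a decorator that wraps sync and async views differently. It is not Django's actual user_passes_test: requests and users are plain dicts, and the "403 Forbidden" / "200 OK" strings are hypothetical stand-ins for real responses.

```python
import asyncio
import functools
import inspect


def user_passes_test(test_func):
    """Hypothetical stand-in for Django's decorator, not its real implementation."""
    def decorator(view):
        if inspect.iscoroutinefunction(view):
            @functools.wraps(view)
            async def async_wrapper(request, *args, **kwargs):
                result = test_func(request["user"])
                # The test itself may be sync or async; await only if needed.
                if inspect.isawaitable(result):
                    result = await result
                if not result:
                    return "403 Forbidden"
                return await view(request, *args, **kwargs)
            return async_wrapper

        @functools.wraps(view)
        def sync_wrapper(request, *args, **kwargs):
            if not test_func(request["user"]):
                return "403 Forbidden"
            return view(request, *args, **kwargs)
        return sync_wrapper
    return decorator


async def _user_is_teacher(user):
    return user["type"] == "teacher"


@user_passes_test(_user_is_teacher)
async def gradebook(request):
    return "200 OK"


@user_passes_test(lambda user: user["type"] == "teacher")
def gradebook_sync(request):
    return "200 OK"


teacher_response = asyncio.run(gradebook({"user": {"type": "teacher"}}))
student_response = asyncio.run(gradebook({"user": {"type": "student"}}))
```

The async wrapper accepts either kind of test function, which would let existing sync predicates keep working on async views.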

E. ORM queries can happen in middleware. And async middleware has already been implemented.

F. ORM queries can happen in management commands. I’m not sure whether async management commands have been implemented or not, but I imagine they’d be straightforward.

G. ORM queries can happen in an interactive shell prompt. I don’t know too much about interactive async REPL prompts.

It should be possible to rewrite existing sync views as async views without significant effort, with the availability of syntaxes A-C in particular:

  • Small view functions are easy to rewrite no matter what the syntax.

  • Large view functions, which are probably already prefetching all models early anyway, will need to add some await expressions to the leading prefetch calls, but all other functions that the view function calls can continue to use the existing sync model access syntax without modification.

    • In situations where a dynamic model fetch after the initial prefetch is potentially needed, await expressions and intermediate async functions can be introduced only where necessary.

Comments?

Sadly there is no __acall__ in Python - instead, await foo() calls __call__ and expects it to return a coroutine (the await is separate and you have no idea it’s coming).

This sadly means that we can never write functions or methods that support both sync and async modes - we have to namespace anything async with a separate name from its current sync version. This includes querysets - so unfortunately, while I love your ideas, they’re never going to be possible.

It’s also sadly true for attribute access - that’s always forced synchronous.
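
A small experiment shows the call-then-await order. The Save class and its __acall__ method are made up for illustration; the point is only that Python never looks for __acall__.

```python
import asyncio

calls = []


class Save:
    """Callable that records how it was invoked."""

    def __call__(self):
        calls.append("__call__")
        return self._run()  # hands back a coroutine; nothing has run yet

    async def __acall__(self):
        # Python has no such protocol, so this is never looked up.
        calls.append("__acall__")

    async def _run(self):
        calls.append("awaited")
        return "saved"


async def main():
    save = Save()
    return await save()  # evaluated as: tmp = save(); await tmp


result = asyncio.run(main())
```

After running this, calls contains "__call__" followed by "awaited", and "__acall__" never appears.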

What is possible:

  • Separate queryset functions for things that are non-lazy (e.g. await Question.objects.first_async())
  • Preventing attribute access from working if you’re in an asynchronous context (this is already in - it’s dangerous as it blocks the thread)
  • Asynchronous iterators (e.g. async for item in Question.objects.all():)

What isn’t possible:

  • Re-using the same functions to serve both async and sync (e.g. await Question.objects.get())
  • Asynchronous attribute access (e.g. await x.foreign_key.id)
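
As a rough sketch of the "possible" column, here is an in-memory queryset with a separately named async method and async iteration. FakeQuerySet, its filter() signature, and first_async are hypothetical stand-ins, not Django's API.

```python
import asyncio


class FakeQuerySet:
    """Hypothetical in-memory stand-in for a lazy queryset."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Lazy: only records the predicate, so it can stay a plain sync method.
        return FakeQuerySet(self._rows, self._predicates + (predicate,))

    async def _fetch(self):
        await asyncio.sleep(0)  # stands in for awaiting the database
        return [r for r in self._rows if all(p(r) for p in self._predicates)]

    async def first_async(self):
        rows = await self._fetch()
        return rows[0] if rows else None

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        for row in await self._fetch():
            yield row


async def main():
    qs = FakeQuerySet([1, 2, 3, 4]).filter(lambda n: n % 2 == 0)
    first = await qs.first_async()
    seen = [n async for n in qs]
    return first, seen


first, seen = asyncio.run(main())
```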

Shucks. I had thought there was an __acall__. The lack of it certainly limits options. It sounds like then a lot more bifurcation is needed than I thought.

I had mentioned introducing an “a” namespace for async-versions of functions and properties. It’s an easy convention to remember to map synchronous function names to their async equivalents. A few examples:

question = await Question.objects.a.first()  # explicit query

choice = ...
question = await choice.a.question  # implicit query if not prefetched

question = ...
choices = await question.choice_set.a.all()  # explicit query if not prefetched

Also as mentioned before, I’d propose that synchronous query patterns still be usable in async contexts in the common case that the related objects are already prefetched:

choice = ...
question = choice.question  # still ok in async context iff has been prefetched
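
Here is one way the "a" namespace could be sketched for field access. Everything in it (AsyncNamespace, _fetch_related, the RuntimeError) is hypothetical and only meant to show that the convention is implementable with ordinary properties.

```python
import asyncio


class AsyncNamespace:
    """Hypothetical `a` namespace: attribute access returns an awaitable."""

    def __init__(self, instance):
        self._instance = instance

    def __getattr__(self, name):
        # Only reached for names not found normally, e.g. `question`.
        return self._instance._fetch_related(name)


class Choice:
    def __init__(self, question_id):
        self.question_id = question_id
        self._cache = {}
        self.queries = 0

    @property
    def a(self):
        return AsyncNamespace(self)

    async def _fetch_related(self, name):
        if name not in self._cache:
            await asyncio.sleep(0)  # stands in for the real query
            self.queries += 1
            self._cache[name] = f"Question #{self.question_id}"
        return self._cache[name]

    @property
    def question(self):
        # Sync access is allowed only when the value is already cached.
        try:
            return self._cache["question"]
        except KeyError:
            raise RuntimeError("question has not been prefetched") from None


async def main():
    choice = Choice(question_id=7)
    try:
        choice.question
    except RuntimeError:
        raised = True
    else:
        raised = False
    fetched = await choice.a.question  # implicit query
    cached = choice.question           # ok now, no query
    return raised, fetched, cached, choice.queries


outcome = asyncio.run(main())
```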

I am not sure where I can read more about past discussion about the async ORM design but IMO the limitations above could point in a completely different direction: not changing the current Manager but adding an AsyncManager (and AsyncQueryset for that matter).

Those new queryset and manager would mostly mirror the current methods on querysets and managers but only implement the async versions.

The migration for models would thus be quite straightforward, you can just opt in the new manager for your models and use async for it. The added benefit of completely separating is that by only using the async manager you can have more peace of mind regarding queries. In Django it is always tricky for beginners to know when a query will happen, and we know it often leads to inefficient behavior. If you use the async version, you should get a strong guarantee that a query will ONLY happen if you await something.

Concerning the attribute access, personally I see 2 main solutions:

  • Adding an accessor attribute to ForeignKey and to every field that creates a related accessor on the relation. By default this would point to the current FetchIfNeeded behavior. There would be other possible accessors such as AsyncFetchIfNeeded (the attribute needs to be awaited, and a query will happen if it hasn’t been prefetched) or OnlyPrefetched (the attribute will work as today, but will just fail hard if it hasn’t been prefetched). I think allowing the customization of the accessor could allow additional benefits in the future as well. For instance I have the use case of supporting views as unmanaged models with Django but if the view mirrors a ForeignKey it is not currently possible to have a simple reverse relation.
  • Simply disallowing access to the attribute in an async context if it hasn’t been prefetched. This has the strong benefit of enforcing the guarantee I mentioned earlier (queries happen only when you await something and never elsewhere).
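
To illustrate the first option, here is a minimal sketch of a pluggable accessor on a relation descriptor. SketchForeignKey, the loader argument, and the _related cache are all hypothetical; only the FetchIfNeeded and OnlyPrefetched names come from the proposal above.

```python
class FetchIfNeeded:
    """Today's behaviour: run the query on first access."""

    def get(self, instance, name, loader):
        cache = instance.__dict__.setdefault("_related", {})
        if name not in cache:
            cache[name] = loader(instance)
        return cache[name]


class OnlyPrefetched:
    """Strict behaviour: never query, fail hard if nothing was prefetched."""

    def get(self, instance, name, loader):
        cache = instance.__dict__.setdefault("_related", {})
        if name not in cache:
            raise LookupError(f"{name} was not prefetched")
        return cache[name]


class SketchForeignKey:
    """Hypothetical relation descriptor with a swappable accessor."""

    def __init__(self, loader, accessor=None):
        self.loader = loader
        self.accessor = accessor or FetchIfNeeded()

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.accessor.get(instance, self.name, self.loader)


queries = []


def load_question(choice):
    queries.append(choice.question_id)  # records each simulated query
    return f"Question #{choice.question_id}"


class LazyChoice:
    question = SketchForeignKey(load_question)

    def __init__(self, question_id):
        self.question_id = question_id


class StrictChoice:
    question = SketchForeignKey(load_question, accessor=OnlyPrefetched())

    def __init__(self, question_id):
        self.question_id = question_id
```

An AsyncFetchIfNeeded accessor would follow the same shape but return an awaitable from get().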

A trickier problem is the queries made by calling methods on model instances, such as save. Here I don’t see much of a “clean” solution. One possibility would be to have an AsyncMixin for Model that changes those methods into an async variant. The downside is that if it’s optional it will probably be a bad fit for libraries, as it would force the user to use only the async or only the non-async version depending on the library’s choice. Another possibility would be to have a setting for models that should have async methods, so that you can opt in yourself for both your own models and library ones. Of course there is always the possibility of adding a separate async variant of those methods, such as save_async.

In general my point of view regarding adding new _async variants of the existing methods is that it is dangerous for the ecosystem: it is easy at the beginning to opt in, but then what happens later? Is it going to be _async forever? If not, then at some point it would also need a _sync variant and a (potentially deep) rewrite of all libraries.

I did ponder a separate manager, but most manager methods are perfectly fine as querysets are lazy (and we’d have to mirror queryset the same way), so at the end of the day, async variants of the execution-causing queryset methods (get, first, values, etc.) are my preferred approach.

As for attribute access, I want to reduce the scope of the work there and instead, we’ll just block it if you’re in async mode and recommend people use select_related instead. You almost always want to use that anyway these days.

save is annoying because we encourage overriding of it, but we don’t have any other choice other than to provide an async variant that’s separate - I don’t want a mixin that changes the signature as a) I believe signatures of methods should not fundamentally change in subclasses and b) we want to encourage hybrid projects as the future - sync views for the simple/safe bits, async views for the performance-critical bits.

I wish there was another option but we’re a bit hemmed in by the language design here.

Seems like a good first step. Later we can open up attribute access in async context for more complex cases.

Shucks I forgot about save. For someone writing a Model in a reusable Django app, would it then be necessary to write both a sync AND an async version?

Maybe there would be a way to write just one version and use one of the async_to_sync / sync_to_async bridges? I suspect that both bridges imply a non-trivial performance hit though…

I really wonder if it would be possible to make some kind of syntax that allows a sync/async method pair to be put together… Perhaps something like how the @property decorator can be used to combine a getter and a setter:

class MyModel(models.Model):
    def _save_sync(...):
        ...
    async def _save_async(...):
        ...
    save = multisync(_save_sync, _save_async)

or:

class MyModel(models.Model):
    @multisync
    def save(...):
        ...
    @save.async
    async def save(...):
        ...

I like this syntax, although I’d have to think a bit more about how to implement it (or whether it actually can be implemented as written).

I think initially save_async will just be a sync_to_async wrapper around save, so one override will still work. If we can get the transactions happy, that is.
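
Roughly what that could look like, as a sketch only: the Model base class here is hypothetical, and asyncio.to_thread (Python 3.9+) stands in for sync_to_async so the example needs nothing beyond the standard library. Transactions are ignored entirely.

```python
import asyncio


class Model:
    """Hypothetical base class; not Django's Model."""

    def save(self):
        self.saved = True

    async def save_async(self):
        # Runs whatever save() the subclass defines, in a worker thread.
        await asyncio.to_thread(self.save)


class Article(Model):
    def __init__(self, title):
        self.title = title
        self.saved = False

    def save(self):
        # A single sync override, picked up by both entry points.
        if not self.title:
            raise ValueError("title is required")
        super().save()


async def main():
    good = Article("Async ORM")
    await good.save_async()

    try:
        await Article("").save_async()
    except ValueError:
        validation_ran = True
    else:
        validation_ran = False
    return good.saved, validation_ran


outcome = asyncio.run(main())
```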

It’s unfortunately literally impossible because both await foo() and foo() call __call__, and provide no context as to what the next move is (because await is a separate keyword, so Python doesn’t parse it as “async call”, it parses it as “sync call, then await the result”).

In order to make single-name-dispatch work, we would need either:

  • __acall__ to be implemented, which last time I talked to Python core devs is pretty unlikely because of the fact await is a separate statement.
  • To return coroutines if there is an active event loop and to run synchronously if not, which is going to cause some really nasty bugs that will be impossible for people to track down, and would break existing code.
  • To have a context manager that makes everything inside it run in an async mode, which is better than the second option but still going to be really weird if you call another function inside that with block.
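
To show why the second option worries me, here is a sketch of a function that picks its mode by checking for a running event loop. The save function and legacy_helper are hypothetical; the explicit close() is only there to keep the demo tidy, since real code would just get a "never awaited" warning.

```python
import asyncio

saved = []


def save(name):
    """Hypothetical dual-mode function that guesses its mode from the event loop."""
    async def _save_async():
        saved.append(name)

    try:
        asyncio.get_running_loop()
    except RuntimeError:
        saved.append(name)  # no loop running: behave synchronously
        return None
    return _save_async()    # loop running: hand back a coroutine


def legacy_helper():
    # Plain sync code written before async support existed. It does not await.
    return save("from helper")


async def view():
    leaked = legacy_helper()
    helper_did_nothing = "from helper" not in saved
    leaked.close()  # discard the never-awaited coroutine
    await save("from view")
    return helper_did_nothing


save("from script")  # works as before when no loop is running
helper_did_nothing = asyncio.run(view())
```

The same helper that works from a script silently stops saving once it is called from inside an async view.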

Here’s my idea: Recognize the expression form “await foo(…)” specially – that is, an await expression directly wrapping a call expression – and alter the semantics to look for an __acall__ attribute on foo and use it if possible, otherwise falling back to the usual behavior. This change would allow implementing something like the @multisync decorator I mentioned earlier in the thread.

This proposal has the downside of breaking some usual expectations around substitution. In particular the following two methods would NOT do the same thing:

async def add_like(post: Post):
    post.likes += 1
    await post.save()  # sees __acall__ on `post.save` and uses it

async def add_like(post: Post):
    post.likes += 1
    save_co = post.save()  # performs regular __call__
    await save_co  # does NOT invoke __acall__ here

I also think this proposal has the downside of making await expressions a bit slower, since they’d have to do a lookup for __acall__. Although if the lookup is done in C (nearly certain) and a new slot is added to optimize the __acall__ lookup (possible), the slowdown could perhaps be kept to an acceptable level.

Yeah, I think the separation reason is why __acall__ was not implemented - without removing await as a separate statement, you sort of break a fundamental tenet of the language.

The only other idea I had that was slightly workable was a different “kind” of call, but we’d have to do that with attributes - e.g. await post.save.async(). That won’t work directly as async is a reserved word, but it’s at least implementable.

Interesting. I’ll have to look for the prior discussion to see exactly what was said and whether the arguments still substantially apply.

I like that idea, although I might advocate for something shorter than async such as just a. So then you might write await post.save.a(). Normally I’d advocate for the longer/more-descriptive form (async rather than a) but this kind of call I expect would be so common a thing to type in async-aware Django apps that I’d lean on the side of brevity in this case.
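
For what it's worth, await post.save.a() looks implementable with a plain descriptor. This is just a sketch: multisync and BoundDual are hypothetical names, and the Post class is a toy rather than a Django model.

```python
import asyncio
import functools


class BoundDual:
    """Callable bound to an instance, with the async twin exposed as `.a`."""

    def __init__(self, sync_func, async_func, instance):
        self._sync = functools.partial(sync_func, instance)
        self.a = functools.partial(async_func, instance)

    def __call__(self, *args, **kwargs):
        return self._sync(*args, **kwargs)


class multisync:
    """Hypothetical descriptor pairing a sync method with its async twin."""

    def __init__(self, sync_func, async_func):
        self.sync_func = sync_func
        self.async_func = async_func

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return BoundDual(self.sync_func, self.async_func, instance)


class Post:
    def __init__(self):
        self.likes = 0
        self.log = []

    def _save_sync(self):
        self.log.append(("sync", self.likes))

    async def _save_async(self):
        await asyncio.sleep(0)  # stands in for real async I/O
        self.log.append(("async", self.likes))

    save = multisync(_save_sync, _save_async)


async def add_like(post):
    post.likes += 1
    await post.save.a()


post = Post()
post.save()                  # plain sync call
asyncio.run(add_like(post))  # async call through the `a` attribute
```

Because the choice is made by attribute lookup rather than by guessing the calling context, post.save() and post.save.a() each behave the same everywhere.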

On the other hand, if folks disagree and would prefer to use the longer async form instead, we might be able to advocate to Python core devs to downgrade async from a full-on language keyword and allow it to be used in identifier context (which would unlock the use of the word async as a method/property name here).

a was also something I’ve been considering due to the reserved nature of async, but it does feel kind of non-Django-ish to do things with abbreviations. That said, even if we can get Python core to downgrade async’s keyword status (which seems unlikely since it got made more strict from 3.6 to 3.7), we have to support all the current releases anyway.

Dang. Yeah now I see that PEP 492 (“Coroutines with async and await syntax”) had planned from the beginning to intentionally upgrade “async” to a reserved keyword. And then it was actually done in Python 3.7 to “cement the asynchronous constructs we’ve been using since 3.5.” So it could be difficult to advocate for a downgrade…

The usual alternative spellings that I see for async:

  • async_ (with a trailing underscore) and
  • asynchronous,

seem either too ugly or too long to use directly. Hmm.

  • asynch is probably too subtly different from async. Pronounced the same too…
  • So now I’m back to a, which does work but isn’t very descriptive (although is very easy to type).

Let’s unpack that: If we did get async downgraded to a non-reserved keyword in Python 3.X, we could put any new async-ORM supporting code inside a conditional block that only executes in Python 3.X+:

# django/db/models/base.py
class Model(...):
    def save(...):
        ...
    
    if sys.version_info >= (3, X):
        async def _save_async(...):
            ...
        save.async = _save_async

Is it necessary that all Django features be available on all Python versions supported by Django (even older Python versions)?

Nevermind. We’re dealing with a syntax error, so the usual if sys.version_info >= (3, X) trick won’t actually prevent a syntax error on say Python 3.7 (where 7 < X). So no bare async for us.
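
A quick check of that claim, as a sketch: the (3, 99) version tuple is a made-up placeholder for "3.X", and the source is compiled from a string so that the rest of the example can still run.

```python
import keyword

# The version guard only runs at run time; the parser sees every line first.
source = """
import sys

def save():
    pass

if sys.version_info >= (3, 99):
    save.async = lambda: None
"""

try:
    compile(source, "<example>", "exec")
except SyntaxError:
    guard_helped = False
else:
    guard_helped = True

is_reserved = keyword.iskeyword("async")


def save():
    pass


# Passing the name as a string sidesteps the parser, at a cost in readability.
setattr(save, "async", lambda: "saved")
result = getattr(save, "async")()
```

So an attribute literally named async can exist, but it can only be reached through getattr, which defeats the purpose of a short, readable spelling.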

As mentioned above, a still seems the best contender for a bare word that could also be used as a “namespace” for other async-related versions of methods/properties/etc.

Yes, this is why I was grudgingly expecting a as well if we didn’t go for the get_async, values_async naming variant instead.

Still not sure if I want to totally separate it though, given that most of the manager/queryset methods don’t need touching. It would be weird to do objects.filter(x=2).order_by("something").a.first()

I. Async Method Syntax Styles

A quick summary of syntax styles mentioned so far:

A. Trailing _async suffix:

await objects.filter(...).order_by(...).first_async()

Pros (of _async): Explicit. Method overriding (ex: for save_async) feels a bit more natural.
Cons (of separate method): Encourages documenting the sync/async method pair separately, which would bloat the docs.

Pros (of trailing form): The await (at the beginning) and the _async (at the end) conceptually are in a symmetric formation, which is easy to see and remember: “If you start with ‘await’ then you need to end with ‘async’.”

B1. Trailing a namespace: (new)

await objects.filter(...).order_by(...).first.a()

Pros (of a): Succinct.

B2. Leading a namespace:

await objects.filter(...).order_by(...).a.first()

Pros (of leading form): Meshes better with at least one syntax for async field access (i.e. await model.a.field). However I could see other field access syntaxes.

II. Async Field Access Syntax Styles

A. Trailing _async suffix: (new)

await choice.question_async

B1. Trailing a namespace: (new) - Not implementable

await choice.question.a

B2. Leading a namespace:

await choice.a.question

C. Leading a lookup method: (new)

await choice.a('question')
# or
await choice.a['question']

I don’t like the extra quotes and parens, which are cumbersome to type for a common operation.

III. Thoughts

  • I’m leaning toward the trailing forms.
  • I’m feeling a bit better about using _async as a general suffix.
    • The biggest caveat I see is possibly doubling the number of functions in the documentation. But then again, Sphinx has enough control that you can just document foo and foo_async as two method names with the same description. (The documentation for the typing module’s fields {IO, TextIO, BytesIO} shows that this pattern is possible.)
    • And of course I’m also a bit worried that _async is a lot of extra letters to type.

Comments?

I’m not too concerned about the extra letters - Django has always been about explicitness rather than conciseness, and I think that should continue to apply here.

And I agree, trailing is best, mostly as (if nothing else) it groups the sync and async versions of the function together in most views of functions and allows nice tab-completion if you have it.

@andrewgodwin Django already raises SynchronousOnlyOperation when querying from an async context. Instead of raising, could it return a proxy object with __await__ declared (or just an async def coroutine wrapper function), providing async capabilities?

This could be set up as a decorator for things like .get() and .save(), and could initially just return a sync_to_async(func) version of the method, but could also look for an async variant of the method (renamed appropriately, e.g. _async_methodname) on the object.

Another option is to use the inspect framework to walk the stack and see if we’re in a coroutine (instead of looking for the running event loop), but I’m not sure how efficient that is (or if you can even identify coroutines within inspect, but I’d expect you can). Not sure if this is worth doing, but I felt it might be worth listing.

This should allow full compatibility with the existing api, and could even potentially work for properties that weren’t collected with select_related.

(1) Okay let me then summarize the syntaxes in async context that are the best candidates so far:

  • Single-Model Returning Methods
    • await objects.filter(...).order_by(...).first_async()
    • await Question.objects.get_async(id=...)
  • Multiple-Model Returning Methods
    • choices = await question.choice_set.all_async() # need not be prefetched
    • choices = question.choice_set.all() # raises if not prefetched
  • Model Field Get
    • q = await choice.question_async
  • Model Field Set (Deferred)
    • choice.question = q
  • Single-Model Save
    • await question.save_async()
  • Multiple-Model Save
    • await choice_set.update_async(...)
    • await bulk_update_async(...)
    • await bulk_create_async(...)

(2) A user whose model class overrides save, and who wants that model to also be usable in an async context, should also override save_async:

class ProjectTextFile(models.Model):
    name = models.CharField(max_length=50)
    content = models.TextField(blank=True)

    def save(self, *args, **kwargs):
        if ProjectTextFile.is_content_too_big(self.name, self.content):
            raise ValidationError(...)
        super().save(*args, **kwargs)
    
    async def save_async(self, *args, **kwargs):
        if ProjectTextFile.is_content_too_big(self.name, self.content):
            raise ValidationError(...)
        await super().save_async(*args, **kwargs)

If only one of save or save_async is overridden, I’d have to think more about the consequences…

  • The built-in admin app currently would always use the synchronous save and ignore any save_async.
  • User code that was familiar with the model class would presumably invoke whichever save method was implemented.

(3) It feels a bit weird to have a method’s return type vary depending on whether it is being invoked from a sync context vs. an async context, but perhaps it could work…

I could see this trick working for methods like .get() and .save(). For model fields the getter would have to return a proxy if invoked from an async context, but the setter would continue to defer any actual set operation regardless of whether it is invoked from a sync/async context.


(4) Ick. Inspecting the current stack frame is almost certainly quite slow. Better stick with querying the event loop.

How much work is it to make the ORM solely async-safe? As in, just being able to safely invoke the ORM as it is today from within a coroutine, with no expectations of the coroutine yielding during database calls.

This work would (I believe) solely affect ORM internals, (eg: scoping db connections per-coroutine rather than per-thread), so could be done without being blocked by the tricky async API work you’re contemplating here.
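
To illustrate the scoping difference (not how Django actually stores connections), here is a sketch comparing threading.local with contextvars.ContextVar when two coroutines share one thread. The "conn-a" / "conn-b" strings are placeholders for real connection objects.

```python
import asyncio
import contextvars
import threading

thread_scoped = threading.local()
task_scoped = contextvars.ContextVar("connection")


async def handler(name, seen):
    thread_scoped.connection = name
    task_scoped.set(name)
    await asyncio.sleep(0)  # let the other task run on the same thread
    seen[name] = (thread_scoped.connection, task_scoped.get())


async def main():
    seen = {}
    # gather() wraps each coroutine in its own task, each with its own context.
    await asyncio.gather(handler("conn-a", seen), handler("conn-b", seen))
    return seen


seen = asyncio.run(main())
```

The first handler ends up reading the second handler's thread-local value, while its ContextVar value is still its own, which is the kind of isolation per-coroutine scoping would need.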

I think it could be a useful half-way house, because many database calls are very fast and there isn’t massive gain in yielding the coroutine during that time. The big gains are with heavy queries and external API calls, where you do the extra work of using sync_to_async and async http clients respectively.

Sorry if this is slightly off topic, but I was curious what new async features (if any) are slated for Django 3.2.

We have a production Django application we’d like to add some more async to, and were just curious if there was a tentative roadmap of what new async features might land in 3.2 vs 4.0, etc. We didn’t see anything in the current 3.2 release notes yet.

Thanks.