Is DEP009 ("async-capable Django") still relevant?

Should DEP009 still be pursued?

On another thread, @carltongibson and I have been discussing what would be good areas for further investment in async Django and we realized this topic might deserve a broader discussion.

The above DEP discusses the primary goal of the async project like so:

The overall goal is to have every single part of Django that could be blocking - that is, which is not just simple CPU-bound computation - be async-native (run in an asynchronous event loop without blocking).

State of the World

At the time of writing, the following components of Django have some form of async-native support:

  • Middleware
  • Views
  • The ORM
  • Caching
  • Signals
  • Decorators
  • Testing (including an async test client)
  • contrib.auth
  • contrib.contenttypes
  • contrib.sessions
  • contrib.staticfiles

There are several components called out in the DEP that do not yet have async-native support (Templating, Form validation, Emails) as well as many other components within Django that are still blocking (such as, importantly, the internals of the ORM itself / database backends).

Asyncification Experience

In thinking about how the “contrib asyncification” project has gone so far, Carlton surfaced this concern:

Which echoed my earlier concerns at the start of this project:

We haven’t found many mechanisms for reducing code duplication between sync and async components. The duplication has several negative impacts: increased fragility (what if a bug fix is only applied to the sync code path but not the async path?), reduced readability (line counts in asyncified components are effectively 2x what they used to be), and an artificially increased number of test cases (one test for the sync path and one for the async path).

The positive impacts have been fairly limited. Async has the greatest impact when code can wait on IO concurrently, but there are only marginal opportunities for this within asyncified code paths. The only code path I could find that does this is in the signals.asend method:
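For readers who haven’t looked at that code path, the dispatch pattern is roughly this (a simplified sketch of the idea, not the actual Django source; the real `Signal.asend` also adapts sync receivers and caches receiver lookups):

```python
import asyncio

async def asend_sketch(receivers, sender, **named):
    # Simplified model of Signal.asend: run every async receiver
    # concurrently and pair each receiver with its response.
    responses = await asyncio.gather(
        *(receiver(sender=sender, **named) for receiver in receivers)
    )
    return list(zip(receivers, responses))
```

Because of the `gather`, two receivers that each wait on I/O overlap instead of running back to back.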

In this instance we execute all async receivers concurrently. Other than that, the only benefit for callers of the asyncified code is reduced context switching. As discussed in the docs, this has only a marginal benefit:

This context-switch causes a small performance penalty of around a millisecond.

When comparing the costs and benefits, the process of asyncification seems somewhat hard to justify.

Further Steps in the DEP

The DEP has a further goal of converting the internals of different components to be async-native ONLY:

The principle that allows us to achieve both sync and async implementations in parallel is the ability to run one style inside of the other.
Each feature will go through three stages of implementation:

  • Sync-only (where it is today)
  • Sync-native, with an async wrapper
  • Async-native, with a sync wrapper

So far we’ve only achieved step 2 in a few places, and to my knowledge nothing has achieved step 3. Achieving step 3 would mean performance hits in the other direction as calling async code from a sync context requires a context switch.
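To make the stage-3 cost concrete, here is a minimal sketch (hypothetical names) of “async-native with a sync wrapper”: every sync caller has to cross into an event loop just to run code that does no concurrent I/O.

```python
import asyncio

async def aget_value():
    # Hypothetical async-native implementation.
    return 42

def get_value():
    # Sync wrapper: each call pays for entering and leaving an event
    # loop (Django would use asgiref's async_to_sync; asyncio.run
    # stands in here), even though nothing runs concurrently.
    return asyncio.run(aget_value())
```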

Where do we go from here?

Should we continue asyncifying various components within Django? Are the benefits worth the costs? Are there additional guidelines we should follow when considering asyncification work beyond the goals in the DEP? Should the DEP be superseded by something? Should Django eventually become async-native with sync wrappers as laid out in the DEP?

Carlton had a few ideas to spark the conversation:

Just to stake out an argument for people to argue against: I think we should continue with the spirit of the goal of the DEP and asyncify components that block inside Django. While writing a webserver in Python will never be blazing fast, I think we should avoid performance hits where possible (in this case due to the context switches). I don’t think we should ever get to “step 3” (async-native, with a sync wrapper) as that would cause harm to existing users of sync Django due to the context switch problem. Instead, we should set up stronger guidelines about what is an acceptable introduction of async code in the future to prevent code duplication. Whether or not this takes the form of a DEP I don’t know.

At the very least, I think it is worthwhile to push the async boundary down through the layers of the ORM to the database backend (and into the backend, if an async backend exists) and up to the templating system, but maybe we end up stopping there.

5 Likes

I, unsurprisingly, have some thoughts on this :slight_smile:

The function colour problem is very real, and I do think that the idea of making absolutely everything async is probably not sensible at this point; a lot of code is perfectly fine as it is, and we don’t have the people to do such a big overhaul and then test it right now anyway.

I do think, though, that a fully async ORM is worth the cost still; Django is more and more used as a place to tie together a lot of API and database calls (that’s what it’s really good at, if you ask me), and this is the place where we could get significant speedups for common query use cases with some nice wrappers - a way to run multiple queries in parallel and get the results would be the obvious first one.

That said, I do think we’ll never be able to make it fully async only in the ORM core, as the slowdown in sync mode will just be too much. Given that, I’m very realistic about the fact that we may just not be able to write and maintain what are two parallel ORM cores (the query planning could be shared, but the execution layer would have to be written twice due to the function colour problem).

I don’t think templates are worth the effort at the moment, personally; the only real use case here would be streaming things out, and we should just be making sure we can support Jinja2 and its async support for use-cases like that.

3 Likes

I’ve been looking at DEP009 and took a stab at the ORM just for fun. If there’s still interest in it, I’d love to volunteer!

2 Likes

Thank you for raising this @bigfootjon
I’m going to highlight that this increased fragility is particularly concerning for security vulnerabilities.
It would be great if we somehow had a pattern or tooling to help us mitigate the risk of exposing a security vulnerability because we fixed only one path, unaware of the other one.

Do we have concrete ideas for this?
Adding references to their sister method in every docstring (both directions, tests included) is a suggestion. Not a foolproof one :thinking:

Just to clarify, my scepticism here is about rolling async all the way through e.g. all the contrib modules. auth, yeah, OK I get that. I can even see async sprinkles on the admin being cool (thinking about notifications in multi-user situations). But the base admin views, the syndication framework, … — given the duplication issues, I’m not sure we ever need to make those async.

I’m very excited by the prospect of async cursors (and more) making their way to the ORM.

(I don’t think this contradicts anything anyone else said, I was clarifying what my position is)

2 Likes

Not particularly. I’ve had a few ideas which sounded perfect when I first thought of them but never worked when I actually wrote them down. I do have a few ideas based on work in other languages, but I think what we really want is a language feature.

The Goal

In Python the only conceptual analog I’m aware of is “sans I/O”: https://sans-io.readthedocs.io/
But it’s not terribly applicable here afaict.

The fundamental pattern we need to solve for can be represented here:

    async? def foo():
        return ...

    async? def do_foo_checked():
        result = await? foo()
        if not result:
            raise Exception("foo did not succeed")
        return result

(where the ? markers mean we want to make this callable from an async context or a sync context, I made this syntax up to explain what we WANT with syntax inspired by: Extending Rust's Effect System)

The above would be ideal, but it’s a language feature so idk how realistic it would be to expect it (and it would be years before we could adopt it in Django anyway)

In other words, we want to call an async function and then do something with the result of that function call. This cannot be handled by a “single” function right now because await cannot be used outside of async functions (among other challenges)

As far as other ideas I’m aware of from other languages:

Promises - JavaScript

Promise in Javascript: Promise - JavaScript | MDN

(caveat against this idea: You don't need promises in Python: just use async/await!)

We could translate the above example into a promise-like API like so:

    def foo():
        return Promise(...)

    def do_foo_checked():
        def or_throw(result):
            if not result:
                raise Exception("foo did not succeed")
            return result

        return foo().then(or_throw)

Then Promise could offer 2 APIs: resolve and aresolve, where resolve uses non-async IO to resolve everything and aresolve awaits anything that needs I/O. The implementation here could end up elegant or tricky; I haven’t really investigated this idea because it feels so un-Pythonic and is pretty horrible to actually write code in since Python doesn’t have multiline lambdas.
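A minimal sketch of what such a dual-resolution Promise might look like (an entirely hypothetical API, not the PyPI package; real I/O handling is elided):

```python
import asyncio
import inspect

class Promise:
    """Toy promise that can be resolved from sync or async callers."""

    def __init__(self, work):
        # work: a zero-argument callable producing either a plain value
        # (sync path) or an awaitable (async path).
        self._work = work
        self._callbacks = []

    def then(self, callback):
        self._callbacks.append(callback)
        return self

    def _run_callbacks(self, value):
        for callback in self._callbacks:
            result = callback(value)
            if result is not None:
                value = result
        return value

    def resolve(self):
        # Sync resolution: the work must not hand back an awaitable here.
        return self._run_callbacks(self._work())

    async def aresolve(self):
        # Async resolution: await the work's result if it is awaitable.
        value = self._work()
        if inspect.isawaitable(value):
            value = await value
        return self._run_callbacks(value)
```

With something like this, the do_foo_checked example above would be driven as `do_foo_checked().resolve()` from sync code or `await do_foo_checked().aresolve()` from async code.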

There’s also some open questions around how we could do concurrency in both contexts etc.

There is a promise package on PyPI but it doesn’t appear to do what we need (handle I/O based on calling context): promise · PyPI

AST Rewriting - Rust

Rust has a fascinating crate called maybe_async: maybe_async - Rust

(for those that don’t speak Rust, this package uses a Rust “macro” to transform the code at compile-time into the different variants)

We could probably implement something similar with a decorator and the ast module. But this technique kinda scares me. As an example it could look like this using the example above:

    @create_sync_version
    async def afoo():
        return ...

    foo = afoo.sync_version

    @create_sync_version
    async def ado_foo_checked():
        result = await afoo()
        if not result:
            raise Exception("foo did not succeed")
        return result

    do_foo_checked = ado_foo_checked.sync_version

(where the implementation of create_sync_version is a decorator that duplicates the ast of the wrapped function and transforms it to remove await keywords and fixes the functions called to the sync versions, then binds this transformed ast to a sync_version attribute on the original function)
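For the curious, here is roughly what that could look like (a toy sketch under big assumptions: it only strips `await` and a naive `a`-prefix naming convention, it needs the function source to be available via `inspect`, and it would fall over on anything nontrivial):

```python
import ast
import inspect
import textwrap

class _Unasync(ast.NodeTransformer):
    """Toy AST rewriter: derive a sync twin from an async function."""

    def visit_AsyncFunctionDef(self, node):
        self.generic_visit(node)
        fields = {field: getattr(node, field) for field in node._fields}
        if fields["name"].startswith("a"):
            # afoo -> foo (naive naming convention)
            fields["name"] = fields["name"][1:]
        fields["decorator_list"] = []  # drop @create_sync_version itself
        return ast.copy_location(ast.FunctionDef(**fields), node)

    def visit_Await(self, node):
        self.generic_visit(node)
        inner = node.value
        # await afoo(...) -> foo(...)
        if (
            isinstance(inner, ast.Call)
            and isinstance(inner.func, ast.Name)
            and inner.func.id.startswith("a")
        ):
            inner.func.id = inner.func.id[1:]
        return inner

def create_sync_version(async_fn):
    tree = ast.parse(textwrap.dedent(inspect.getsource(async_fn)))
    tree = _Unasync().visit(tree)
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<unasync>", "exec"), async_fn.__globals__, namespace)
    async_fn.sync_version = namespace[tree.body[0].name]
    return async_fn

@create_sync_version
async def afoo():
    return True

foo = afoo.sync_version

@create_sync_version
async def ado_foo_checked():
    result = await afoo()
    if not result:
        raise Exception("foo did not succeed")
    return result

do_foo_checked = ado_foo_checked.sync_version
```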

This is complex as heck, so I don’t love this either.

Conclusion

In other words, I have hacks instead of solutions. However, if either of these ideas (promises or ast hacks) are exciting or palatable to anyone I’m happy to work them up as PoCs (or report back on why they don’t work when actually attempted).

2 Likes

I think it would be useful to hear from other Python projects that implement both async and sync APIs. I would be particularly interested in learning from Psycopg3’s experience.

I was curious about this, and if you look at cursor and cursor_async in psycopg, you can see that basically the cursor_async implementation of various methods really is just “async/await sprinkles on the cursor methods”. Seems like they have the same sort of issue Django has.

I just came across this Python discussion (adding here as it has relevance to the discussion):

1 Like

I agree that this is very relevant to Django’s async pains. I posted something in the Python Ideas forum to try and spark a conversation but people didn’t seem to take the bait :rofl:

In all seriousness I do care about this issue a lot, and I need to find out who I can talk to Python-side to get a serious convo starting that tries to be productive

1 Like

Another detail here: I reached out to the maintainer of psycopg, he pointed out that how psycopg is handling things is through a “build step”.

Basically the async version is the “official” version, and a sync version is generated through an AST walker found here. While I think for Django there’s some asterisks to be had, I think it would make sense to explore something in this vein once async cursors and the like become available.

4 Likes

Psycopg just published a blog post on their workflow: Automatic async to sync code conversion — Psycopg

Seems a bit complicated, but perhaps not TOO complicated of a strategy

2 Likes

I have been working on trying to just add support for async cursors from postgresql, and I’ve unfortunately hit what I believe to be a major design challenge with the save/asave strategy on Model in particular.

Currently, asave just calls into save. So Model subclasses with overwritten save methods work fine.

If we want async saves “for real”, the straightforward way to do so is for asave to not call into save (which can block). But if we make that change, suddenly there’s this behavior divergence between calling save or asave with existing code!

This is of course true for overriding more or less any method in the stack here. But with save it feels acute because even in my test app I am using the venerable TimeStampedModel which… overrides save!

    def save(self, *args: Any, **kwargs: Any) -> None:
        """
        Overriding the save method in order to make sure that
        modified field is updated even if it is not given as
        a parameter to the update field argument.
        """
        update_fields = kwargs.get('update_fields', None)
        if update_fields:
            kwargs['update_fields'] = set(update_fields).union({'modified'})

        super().save(*args, **kwargs)

So in a world where asave doesn’t call into save, what is the transition strategy for TimeStampedModel? And how do we avoid exporting the function coloring problem to users?

In a world where asave calls save, what magic could we use to have multiple concurrent pieces of work?


Right now, calling aX functions will quickly get you into a sync_to_async(X) call. So async calls all will quickly get pointed “back” to their sync variants.

This means that, roughly, every time I do a top-level async operation I will pay one sync_to_async cost. I think it’s important here to say that the cost doesn’t compound on deeper trees, given that you end up “in sync mode” for the rest of your call tree.

If we inverted things, so that calling X calls aX, then we instead pay one async_to_sync cost per top-level operation.

But if we have X call aX instead of the contrary, we end up in an interesting place. Overrides for save “just work”, and it’s only people calling asave who need to figure things out.


But the cost to sync mode might be too much? In that case it might make sense to have two model variants, models.Model and models.AsyncModel. In Model, save is the “canonical” version. In AsyncModel, asave is the canonical version.

Of course if you wanted TimeStampedModel to support both, you need to implement save and asave… so you start getting into things like “I’m using signals for things like TimeStampedModel” (or we add hooks for pre-save/post-save as model methods…). But if Model subclasses get a hold of the thread of control here for I/O, we’ve already kind of lost.


On a higher level, I do think there’s value in saying something like “users should only need to override save as a last resort” (in the same way that the ORM has raw for last resorts), because then people could move code over to strategies that don’t require controlling the thread of execution. And in that model you could outright have semi-automated checks to help people figure out why their models might not be async-performant.

If we had def prep_save(self, **save_kwargs) and def do_post_save(self, instance, **save_kwargs) in the model API, that might be enough to avoid most save operation requirements… though then we again hit the function color problem.

But with such hooks inserted now, it would be, IMO, reasonable to say something like “we’ll look at whether the hook is a coroutine or not, and pay a sync/async transition cost if needed”. Or have some other magic that would let people write async-agnostic hook implementations.
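That “look at whether the hook is a coroutine” idea can be sketched in a few lines (hypothetical helper and hook names, not an actual Django API):

```python
import asyncio
import inspect

async def acall_hook(hook, *args, **kwargs):
    # Call a user-supplied hook that may be sync or async, awaiting
    # only when the hook actually returns an awaitable.
    result = hook(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result

# Hypothetical model hooks: one sync, one async. An async-native
# asave() could run either via acall_hook without the model author
# caring about function colour.
def prep_save(instance):
    return {"update_fields": {"modified"}}

async def aprep_save(instance):
    return {"update_fields": {"modified"}}
```

The sync direction (plain save calling an async hook) remains the hard part, since that still needs an event loop (e.g. asgiref’s async_to_sync).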

After having written all of this, it’s not clear to me how much custom save methods come into play (often, the custom processing people do tends to be in forms, DRF serializers etc…), but there might simply be a thing of saying “you gotta override save and asave now”. I do think it will likely require some sort of transition release to help people figure out what’s needed.

EDIT: Did notice this post offering one idea of using a context manager to indicate you really do want async-ness. There might be a way to combine this idea with my concerns to make it possible to flag where some overwritten methods might be lurking

I think it’s reasonable to say this is a backwards incompatible change, so long as we can flag it somehow with a warning and documentation in Django before we actually change the implementation of asave.

I think the complexity explosion here isn’t worth it. I think providing a phase-in transition path is the better solution because in the long term we don’t want to have to maintain 2 different classes that do the same thing purely for historical backwards compatibility reasons.

Was unsure about how to think about this, but realized that like with other cases, code could have a Django version check to either override one thing or override the other.

Release notes saying “if you override either of these, please make sure to override the a’d version instead from this version” might cover it (not a small amount of work of course)

1 Like

I’ll make a more developed proposal etc., but based on @fcurella’s branch I was able to get codegen working! This is the POC commit: Add unasync codegen helpers · rtpg/django@c4ac6ec · GitHub. The short version is that:

    @generate_unasynced_codegen
    async def ainit_connection_state(self):
        """Initialize the database connection settings."""
        global RAN_DB_VERSION_CHECK
        if self.alias not in RAN_DB_VERSION_CHECK:
            await self.acheck_database_version_supported()
            RAN_DB_VERSION_CHECK.add(self.alias)

in a class definition will, after running the codegen script, insert

    @from_codegen
    @async_unsafe
    def init_connection_state(self):
        """Initialize the database connection settings."""
        global RAN_DB_VERSION_CHECK
        if self.alias not in RAN_DB_VERSION_CHECK:
            self.check_database_version_supported()
            RAN_DB_VERSION_CHECK.add(self.alias)

just above it. Comments are preserved, the sync version “looks canonical”, the codegen script is idempotent, and at least so far I have been able to make it so that async → sync leads to no actual diff within the sync body (i.e. no behavioral changes).

Unreasonably excited about how easy this was (like a 180-line transformation script, including having if ASYNC_TRUTH_MARKER: A else: B collapse to B for the unasync version, hacking in aconnection -> connection renames…). I imagine there’ll be dragons, but given the psycopg proof of this strategy somewhat working… very pleased.
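As a flavour of what that collapse looks like, here is a minimal transformer for just the `if ASYNC_TRUTH_MARKER:` case (my own sketch of the assumed semantics, not the code from the linked commit; it doesn’t do the await-stripping or renaming passes):

```python
import ast
import textwrap

class CollapseAsyncMarker(ast.NodeTransformer):
    """Replace `if ASYNC_TRUTH_MARKER: A else: B` with just B."""

    def visit_If(self, node):
        self.generic_visit(node)
        if isinstance(node.test, ast.Name) and node.test.id == "ASYNC_TRUTH_MARKER":
            # The generated sync variant keeps only the else branch.
            return node.orelse or [ast.Pass()]
        return node

source = textwrap.dedent("""
    async def init_connection_state(self):
        if ASYNC_TRUTH_MARKER:
            await self.acheck_database_version_supported()
        else:
            self.check_database_version_supported()
""")
tree = CollapseAsyncMarker().visit(ast.parse(source))
ast.fix_missing_locations(tree)
sync_body = ast.unparse(tree)
```

After the collapse, `sync_body` contains only the sync branch; the other passes in the real script would then turn the `async def` itself into a plain `def`.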

5 Likes

This seems very interesting. And psycopg demonstrated that this approach can work well.