Should DEP009 still be pursued?
On another thread, @carltongibson and I have been discussing what would be good areas for further investment in async Django, and we realized this topic might deserve a broader discussion.
DEP009 describes the primary goal of the async project as follows:
The overall goal is to have every single part of Django that could be blocking - that is, which is not just simple CPU-bound computation - be async-native (run in an asynchronous event loop without blocking).
State of the World
At the time of writing, the following components of Django have some form of async-native support:
- Middleware
- Views
- The ORM
- Caching
- Signals
- Decorators
- Testing (including an async test client)
- contrib.auth
- contrib.contenttypes
- contrib.sessions
- contrib.staticfiles
Several components called out in the DEP do not yet have async-native support (templating, form validation, and email), and many other components within Django are still blocking (most importantly, the internals of the ORM itself and the database backends).
Asyncification Experience
Reflecting on how the “contrib asyncification” project has gone so far, Carlton raised this concern:
This echoed my own concerns at the start of the project:
We haven’t found many mechanisms for reducing code duplication between sync and async components. The duplication has several negative impacts: increased fragility (what if a bug fix is applied only to the sync code path and not to the async one?), reduced readability (line counts in asyncified components are effectively 2x what they used to be), and an artificially inflated number of test cases (one test for the sync path and one for the async path).
The positive impacts have been fairly limited. Async has the greatest impact when code can wait on IO concurrently, but asyncified code paths offer only marginal opportunities for this. The only code path I could find that does this is the signals.asend method:
Here all async receivers are executed concurrently. Apart from that, the only benefit for callers of asyncified code is reduced context switching, which, as the docs point out, is only a marginal benefit:
This context-switch causes a small performance penalty of around a millisecond.
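The concurrency benefit described above can be sketched with plain asyncio. This is not Django's `asend` implementation. `receiver`, `send_sequentially`, `send_concurrently` and `timed` are hypothetical helpers, and `asyncio.sleep` stands in for a receiver waiting on IO. Awaiting receivers one after another costs roughly the sum of their waits. Scheduling them together with `asyncio.gather` costs roughly the longest single wait.

```python
import asyncio
import time


async def receiver(delay):
    # Stand-in for an async signal receiver that waits on IO.
    await asyncio.sleep(delay)
    return delay


async def send_sequentially(delays):
    # Await each receiver in turn: total time ~ sum of the delays.
    return [await receiver(d) for d in delays]


async def send_concurrently(delays):
    # Schedule every receiver at once: total time ~ the longest delay.
    return await asyncio.gather(*(receiver(d) for d in delays))


def timed(send, delays):
    # Run one dispatch strategy in a fresh event loop and time it.
    start = time.perf_counter()
    results = asyncio.run(send(delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.1, 0.1, 0.1]
    _, seq = timed(send_sequentially, delays)
    _, conc = timed(send_concurrently, delays)
    print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")
```

This speedup only appears where several independent awaits can overlap. A component that makes one blocking call per request gains little from being asyncified beyond avoiding the context switch.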
Weighing the costs against the benefits, the process of asyncification seems somewhat hard to justify.
Further Steps in the DEP
The DEP has a further goal: converting the internals of each component to be async-native only, with sync access provided through wrappers:
The principle that allows us to achieve both sync and async implementations in parallel is the ability to run one style inside of the other.
Each feature will go through three stages of implementation:
- Sync-only (where it is today)
- Sync-native, with an async wrapper
- Async-native, with a sync wrapper
So far we’ve only achieved step 2 in a few places, and to my knowledge nothing has reached step 3. Reaching step 3 would mean performance hits in the other direction, because calling async code from a sync context requires a context switch.
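The difference between stages 2 and 3 can be sketched with the standard library alone. Django itself bridges the two worlds with asgiref's `sync_to_async` and `async_to_sync`. Here `asyncio.to_thread` and `asyncio.run` are cruder stand-ins for them. In particular, `asyncio.run` creates a fresh event loop on every call and cannot be used from inside a running loop. The `render*` functions are hypothetical.

```python
import asyncio


# Stage 2: sync-native, with an async wrapper.
def render_sync(name):
    # The real work stays synchronous.
    return f"Hello, {name}!"


async def arender(name):
    # Async callers pay for a hop to a worker thread
    # (stand-in for asgiref's sync_to_async).
    return await asyncio.to_thread(render_sync, name)


# Stage 3: async-native, with a sync wrapper.
async def arender_native(name):
    # The real work is written as a coroutine.
    await asyncio.sleep(0)  # stand-in for awaiting async IO
    return f"Hello, {name}!"


def render(name):
    # Sync callers now pay for driving an event loop
    # (stand-in for asgiref's async_to_sync).
    return asyncio.run(arender_native(name))


print(asyncio.run(arender("Ada")))  # Hello, Ada!
print(render("Ada"))                # Hello, Ada!
```

In both stages one calling style pays to cross the sync/async boundary. Stage 3 moves that cost from async callers onto sync callers, who are most of Django's existing users.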
Where do we go from here?
Should we continue asyncifying various components within Django? Are the benefits worth the costs? Are there additional guidelines we should follow when considering asyncification work beyond the goals in the DEP? Should the DEP be superseded by something? Should Django eventually become async-native with sync wrappers as laid out in the DEP?
Carlton had a few ideas to spark the conversation:
Just to stake out an argument for people to argue against: I think we should continue with the spirit of the DEP's goal and asyncify components that block inside Django. While a web server written in Python will never be blazing fast, I think we should avoid performance hits where possible (in this case, those caused by context switches). I don’t think we should ever get to “step 3” (async-native, with a sync wrapper), as that would harm existing users of sync Django because of the context-switch problem. Instead, we should set up stronger guidelines about what counts as an acceptable introduction of async code in the future, to prevent code duplication. I don’t know whether this should take the form of a DEP.
At the very least, I think it is worthwhile to push the async boundary down through the layers of the ORM to the database backend (and into the backend, if an async backend exists) and up to the templating system, but perhaps we stop there.