Async Performance

I spent some time today running performance numbers with various parts of the async stack loaded in.

My main concern is the performance impact on synchronous Django when this code lands - I don’t want sites that only use sync code to see a serious slowdown.

Sadly, right now we’re seeing a 10x slowdown in request processing (from 0.6ms to 6ms). I did some playing around, and the lowest we can possibly get (i.e. the cost of instantiating an event loop at all) is 0.8ms, which is not a huge increase and would be acceptable if we can get down to it.
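
The ~0.8ms floor mentioned above (the cost of spinning up an event loop at all) can be reproduced with nothing but the standard library. A rough sketch - the function name and numbers are mine, and results will vary by machine:

```python
import asyncio
import time

async def noop():
    return None

def loop_cost(iterations=1000):
    """Time creating a fresh event loop and running a trivial coroutine
    on it, once per iteration - roughly what a naive per-request async
    wrapper would pay. Returns mean cost per call in milliseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        asyncio.run(noop())  # creates and tears down a loop each call
    return (time.perf_counter() - start) / iterations * 1000

if __name__ == "__main__":
    print(f"per-call event loop overhead: {loop_cost():.3f} ms")
```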

This is going to need some modification of asgiref and the complex code around threadlocals; now that we can run sync code in the same thread as other sync code, a lot of it can likely be thrown away and replaced with just a safety catch that makes sure you don’t call them from async threads.
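
The "safety catch" idea can be sketched with the standard library alone: detect whether an event loop is running in the current thread, and refuse to proceed if so. This is only an illustration of the concept (Django grew a real decorator for this, `django.utils.asyncio.async_unsafe`; the name and error below are mine):

```python
import asyncio
import functools

def async_unsafe_sketch(func):
    """Illustrative sketch: wrap sync-only code so it raises if called
    from a thread that is running an event loop, rather than silently
    blocking the loop or corrupting thread-local state."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            pass  # no loop in this thread: safe to proceed
        else:
            raise RuntimeError(
                f"You cannot call {func.__name__} from an async context."
            )
        return func(*args, **kwargs)
    return wrapper
```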

Otherwise, there’s the chance we’ll have to maintain two request paths in BaseHandler - one sync and one async. It’s not a huge amount of code, but it would be nice to avoid this.

5 Likes

Hi Andrew. I am reluctant to spend too much additional time on the async middleware until we have a plan for the overall performance concern.

In particular, it would be useful to have a lightweight benchmarking tool that shows (1) the synchronous performance and (2) the asynchronous performance of a tiny Django app: just a view function returning a fixed hello-world response, with no other middleware or models involved.

Such a tool would allow experimentation (and quick cycling) to get the basic baseline performance to a reasonable value. It would also be useful for me - and probably others on the async Django project - to ensure that PRs we’re putting together don’t regress performance unacceptably.
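
The basic shape of such a tool can be sketched with the standard library alone - time a plain callable against the same callable run as a coroutine. Everything here (names, structure) is my own illustration, not an existing Django utility; a real tool would drive an actual app through WSGI/ASGI:

```python
import asyncio
import time

def hello_view():
    return b"Hello, world"

async def async_hello_view():
    return b"Hello, world"

def bench(fn, n=1000):
    """Mean call time in milliseconds for a sync callable."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1000

def bench_async(coro_fn, n=1000):
    """Run the async variant on a single long-lived loop, so we measure
    coroutine scheduling rather than per-call loop start-up."""
    async def runner():
        start = time.perf_counter()
        for _ in range(n):
            await coro_fn()
        return (time.perf_counter() - start) / n * 1000
    return asyncio.run(runner())

if __name__ == "__main__":
    print(f"sync:  {bench(hello_view):.4f} ms")
    print(f"async: {bench_async(async_hello_view):.4f} ms")
```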

Since you detected this performance issue already, I suspect you may have a leg-up on creating such a benchmarking tool. If not, then I or others may be able to put one together once we find some cycles.

1 Like

I think it may be a bit too early to focus on performance regressions. There is still a considerable global slowdown inherent to sync_to_async/async_to_sync which will probably need to be resolved independently of the work on the rest of the features.

I still wonder how the performance will look when we start comparing views that are more IO-bound (IIRC, the tests were mostly around an empty view).

It’s probably not too difficult to put together such a test with https://github.com/django/djangobench, but it would, of course, be easier if we can reuse what’s already been done.

I do indeed have a benchmarking setup, in that I have a very basic Django view that returns a string, and I run it through an HTTP benchmarking tool in both ASGI mode and WSGI mode (as well as mutating various parts of the ASGI stack, how many sync_to_async calls there are, etc.)

I’ve not been able to do it for the last few weeks because of conferences and moving house, and that will continue for at least another week or two. I don’t mind if we add a few ms to each request in async mode, but the problem was that it was adding a lot of time to projects even in WSGI mode (as the WSGI path used a single async handler). We might need two handler paths for speed.

I would like to work on async middleware Andrew. I am a beginner and I saw you say at DjangoCon that this would be a nice intro to contributing. I really want to help and contribute and learn. I just started a Django study group last night and we will be studying Django on developer.mozilla.org/learn and then we will dive into the official Django docs. 100+ people signed up and we start studying tomorrow. I hope to begin contributing soon!

So, good news, everyone - I have managed to rework BaseHandler so that it has two parallel request paths, one for sync code (that never, ever touches an async context, unless you have an async view) and one for async code.

This meant some refactoring of _get_response so I didn’t have to fully duplicate all the logic, but it was honestly time for this code to get some love anyway.

As a result:

  • Calling sync views under WSGI touches no async code and is as fast as before
  • Calling async views under WSGI is possible and invokes a single async thread for the view
  • Calling sync views under ASGI works and invokes as few synchronous transitions as possible
  • Calling async views under ASGI results in only a single sync_to_async call for the request_started signal, if there’s no synchronous middleware.

It also allows synchronous and asynchronous middleware to be mixed in either case; obviously, there are performance advantages to having all the middleware match and be the same type as the view.
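
The adaptation idea can be illustrated with a small stdlib-only sketch - this is not Django's actual implementation (which uses asgiref's adapters and capability flags), just the dispatch concept: a middleware inspects the `get_response` it is handed and matches its mode:

```python
import asyncio

class HybridMiddlewareSketch:
    """Illustrative sketch of middleware that adapts to whichever mode
    the stack hands it: if get_response is a coroutine function, calls
    return a coroutine to await; otherwise it behaves as plain sync
    middleware."""

    def __init__(self, get_response):
        self.get_response = get_response
        self.is_async = asyncio.iscoroutinefunction(get_response)

    def __call__(self, request):
        if self.is_async:
            return self._acall(request)  # coroutine for the caller to await
        return self.get_response(request)  # plain value on the sync path

    async def _acall(self, request):
        return await self.get_response(request)
```

In real Django the handler does the adapting (wrapping with sync_to_async/async_to_sync as needed), so a middleware written for one mode still works in the other - just with a thread-hop penalty.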

Next steps, in my eyes:

  • Fix up the few test failures to make this mergeable
  • Work out how we can allow middleware that is both synchronous and asynchronous. I’m tempted to say we should just implement our own version of an __acall__ and say that you should implement that and __call__ if you want to service both.
  • See if we can find a way to have the signal dispatch not need sync_to_async until it has to actually call a signal handler
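
The `__acall__` idea from the list above might look something like this - a purely hypothetical sketch, since `__acall__` is not a real Python protocol and the handler would have to look it up explicitly:

```python
import asyncio

class DualMiddlewareSketch:
    """Hypothetical illustration of the proposed convention: implement
    __call__ for the sync path and an __acall__-style coroutine method
    for the async path, so one class can service both."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        return self.get_response(request)

    async def __acall__(self, request):
        return await self.get_response(request)

def call_middleware(mw, request, *, async_mode):
    """How a handler might dispatch: prefer __acall__ in async mode."""
    if async_mode and hasattr(mw, "__acall__"):
        return mw.__acall__(request)  # coroutine to be awaited
    return mw(request)
```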

You can see the updated commit here: https://github.com/django/django/commit/a22d05324d2d3d8e652d31368688d55be34d4858 and it’s all part of the existing PR: https://github.com/django/django/pull/11650

13 Likes

Further update - I’ve modified middleware to work in much the same way as the core request stack now does, allowing middleware to be sync, async, or “hybrid” (capable of both). This should let us rewrite all core Django middleware to allow maximum performance in both modes (there’s some further docs in the commit): https://github.com/django/django/pull/11650/commits/4b91d20f96a40c3844a88b22e029ca531a6da23a

4 Likes

Instantly thinking, Why do we have this anyway? :grinning:

That sounds like the most fun.

Thank you for your work Andrew. Super excited to be looking at this.

Sadly, it’s for the database connection cleanup. It’s been the bane of my life since Channels. If we can work out a way to not need that every request and handle DB connections more transparently, it’d save a lot of people a lot of pain (it affects everyone working outside a request/response flow, too - like people writing long-running management commands)

Yes. It would be something of a change to avoid the request_started/request_finished signals…

Slight aside, but, ASGIHandler.send_response() calls request.close(). That sends request_finished, which caches and ORM subscribe to. Should request.close be @async_unsafe? :grimacing:

Ah yes, I bet we need to wrap that somehow. I’ll go look at it today and at least stick a sync_to_async around it; however, given that async_unsafe is wrapped around the important core parts of the ORM, cache, and so on, it’s likely not hitting any of that in a default configuration (but of course, until we make signals async-aware, we have to run them synchronously)
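
The "stick a sync_to_async around it" fix amounts to pushing the blocking cleanup off the event loop thread. A stdlib-only sketch of the shape (Django itself uses asgiref’s sync_to_async rather than asyncio.to_thread, and the function names here are stand-ins):

```python
import asyncio

def close_request():
    """Stand-in for a blocking cleanup step such as request.close()
    firing request_finished handlers (DB/cache cleanup)."""
    return "closed"

async def send_response_tail():
    """Run the blocking cleanup in a worker thread so it never executes
    on the event loop thread, then continue in async context."""
    return await asyncio.to_thread(close_request)  # Python 3.9+
```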

Hi @andrewgodwin. Is the benchmarking tool you have in a sharable state? (I feel duty bound to reproduce your findings… :slightly_smiling_face:)

Unfortunately not, as it’s just ab and a test Django async project I’ve had for ages strapped together. If you want to verify, I’d follow this procedure:

  • Create a Django project with a simple sync and async view that do the same thing (mine just render a basic template with request/debugging info)
  • Run ab against the sync view under runserver on both master and the async_views branch. This compares the sync performance before and after.
  • Run ab against the sync view under daphne on the async_views branch. This makes sure the performance hit for ASGI is not too high.
  • Run ab against the async views under runserver and daphne if you want to, just to see how they behave (I was doing this while modifying MIDDLEWARE to get an idea of relative differences)
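
For anyone without ab to hand, a very small Python stand-in for the steps above - sequential GETs and a mean latency, nothing more (the name `bench_url` is mine; ab adds concurrency and much better statistics, so treat this only as a quick relative check):

```python
import time
import urllib.request

def bench_url(url, n=100):
    """Issue n sequential GET requests against url and return the mean
    latency in milliseconds."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()
        timings.append((time.perf_counter() - start) * 1000)
    return sum(timings) / len(timings)
```
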

2 Likes

OK, I’ll do that. Cheers Andrew.

Hi; Can you share the results?

There’s no observed slowdown, particularly on the WSGI path, where it’s important we don’t introduce one.

We merged the PR. It’ll be Django 3.1.

6 Likes
