Django Async vs Sync is slow

I have a boilerplate app and was playing around with async Django (I’m planning to add GraphQL with subscriptions), and I ran a benchmark that shocked me.

Gunicorn with the async uvicorn worker is much slower than with gthread.

Async Code:

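from django.http import JsonResponse

async def my_async_view(request):
    # No awaits here: the view returns a small JSON body without doing any I/O.
    return JsonResponse(
        {"async accounts": "Test"},
        status=200,
    )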

Gunicorn Async command:

gunicorn --worker-class uvicorn.workers.UvicornWorker banking_api.asgi --workers=8 --threads=2

Sync Code:

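from django.http import JsonResponse

def my_sync_view(request):
    # Same response as the async view, served through the WSGI entry point.
    return JsonResponse(
        {"sync accounts": "Test"},
        status=200,
    )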

Gunicorn Sync command:

gunicorn banking_api.wsgi --workers=8 --threads=2 

Benchmark results:

Async:

Running 30s test @ http://localhost:8000/test/
  12 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    57.72ms   75.82ms   1.08s    97.25%
    Req/Sec    81.90     44.42   232.00     81.61%
  29007 requests in 30.05s, 8.08MB read
Requests/sec:    965.41

Sync:

Running 30s test @ http://localhost:8000/test/
  12 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.26ms   44.37ms 865.76ms   92.37%
    Req/Sec   281.56    146.56     0.89k    66.32%
  100051 requests in 30.08s, 30.25MB read
Requests/sec:   3326.37

What is happening? Why is there such a difference in Requests/sec: 3326.37 (sync) vs 965.41 (async)?
Is this expected, or am I doing something wrong here?

I know synthetic benchmarks are not very helpful. The only thing I want to make sure of is that if I go async, with all the hassle of writing sync_to_async and async_to_sync wrappers in my code, I at least don’t take a performance hit.

Actually, the reason I need this is that I am writing a GraphQL API and need DataLoaders for my resources (using Graphene-django), and those are async (the latest graphql-core removed Promise-based DataLoaders).

I did find an alternative for a Sync Dataloader that might work for me.

Long story short, I also want to embrace the async mindset and start writing some async Django to get used to it and learn the caveats. But taking a performance hit is clearly not the way to start, which is why it’s important for me to understand what is happening.

It’s hard to say without profiling, but you’re not actually doing any IO here: there aren’t any awaits, and awaiting is what allows the event loop to work on another task rather than blocking. As such, you’re not going to see any performance gains from async. (Indeed, traditional parallelism may be faster.)
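To illustrate the point about awaits, here is a minimal standalone sketch using only the standard library's asyncio (it does not involve Django or Gunicorn). The coroutine names `no_await` and `with_await` are made up for this example. It records the order in which two concurrent tasks make progress: without an await, each task runs to completion before the other starts; with an await, the event loop can switch between them.

```python
import asyncio


async def no_await(name, log):
    # No await inside: once started, this coroutine never yields to the event loop.
    for step in range(2):
        log.append(f"{name}:{step}")


async def with_await(name, log):
    for step in range(2):
        log.append(f"{name}:{step}")
        # Yield point: the event loop may run another task here.
        await asyncio.sleep(0)


async def run(worker):
    log = []
    # Schedule two tasks concurrently and record the order of their steps.
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


blocking_order = asyncio.run(run(no_await))
interleaved_order = asyncio.run(run(with_await))
print(blocking_order)
print(interleaved_order)
```

A view with no awaits behaves like `no_await`: the event loop has nothing to overlap, so async adds overhead without adding concurrency within a worker.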

I’d try a more realistic application and see how you get on.

Hi,
I agree that I am not doing any I/O, but even then, is it normal for the RPS to drop by more than a factor of 3 for a simple JsonResponse?

So you think traditional parallelism (sync) is 3 times faster than async?

There is one difference in the test: I am using Gunicorn with the Uvicorn worker for the async code and the gthread worker for the sync code, so that might be the reason why sync is so fast?

It’s weird how all the tests on the web show async being so much faster. How could it be 50% faster with I/O if without I/O it’s 3 times slower?

The only thing I can think of is that with a lot of I/O, traditional sync gets much slower than async? So async makes up the 3x gap and then adds another 50% on top?

Can someone with a bigger app with a lot of I/O confirm this hypothesis?

I see the CLI flags, but I’m not sure from your post that you’re really running multiple workers in the ASGI case. 1 ASGI worker vs multiple WSGI workers isn’t a fair test (especially when there’s no IO in play to take advantage of the concurrency).

I would guess you want to compare the single-worker performance for both cases, in order to speak meaningfully about the straight sync/async difference in this case.

Yes, I was using the same number of workers.

If you have time and are curious, you can just start a new Django project with the command-line tools, add 2 routes, one to a sync function and one to an async function, and test it yourself.

I haven’t installed any packages or made any changes to the generated Django app, besides adding 2 routes that need only 1 import (the JsonResponse import).

TBH not really… I’ve been running Django apps in production for years now without this kind of issue, so it’s likely an artefact of the setup, and I don’t have time to dig into that. (Likely your multiple workers aren’t actually being used… — the single worker example is what’s relevant here.)

OK, so I’ve got some interesting results here:

Sync:

Gunicorn workers=1: Requests/sec: 226.49 (using gunicorn sync worker)
Gunicorn workers=8: Requests/sec: 501.48 (using gunicorn sync worker)
Gunicorn workers=8 threads=2: Requests/sec: 3796.44 (using gunicorn gthread worker)

Async:

Gunicorn Uvicorn workers=1: Requests/sec: 550.66 (comparing 1 worker to 1 worker, it’s about double the gunicorn sync result, which is what you would expect)
Gunicorn Uvicorn workers=8: Requests/sec: 1686.96 (more than 3 times faster than gunicorn sync with 8 workers)
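The ratios behind these comparisons can be checked with a few lines of Python. The figures are copied from the results in this thread; the dictionary keys and the `ratio` helper are names chosen only for this sketch.

```python
# Requests/sec figures copied from the results posted in this thread.
results = {
    "sync_1_worker": 226.49,
    "sync_8_workers": 501.48,
    "gthread_8_workers_2_threads": 3796.44,
    "uvicorn_1_worker": 550.66,
    "uvicorn_8_workers": 1686.96,
}


def ratio(faster, slower):
    """Return how many times more requests/sec `faster` served than `slower`."""
    return results[faster] / results[slower]


# Uvicorn worker vs gunicorn sync worker, same worker count.
async_vs_sync_1 = ratio("uvicorn_1_worker", "sync_1_worker")
async_vs_sync_8 = ratio("uvicorn_8_workers", "sync_8_workers")
# gthread worker (8 workers, 2 threads) vs 8 Uvicorn workers.
gthread_vs_async_8 = ratio("gthread_8_workers_2_threads", "uvicorn_8_workers")

print(f"uvicorn vs sync, 1 worker:   {async_vs_sync_1:.2f}x")
print(f"uvicorn vs sync, 8 workers:  {async_vs_sync_8:.2f}x")
print(f"gthread vs uvicorn, 8 workers: {gthread_vs_async_8:.2f}x")
```

These are single runs on one machine, so the ratios are only a rough indication.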

The --threads param didn’t seem to affect the async performance in any way.

So the real difference comes from the “gthread” worker type that gunicorn uses when you add the “--threads” option.

For async, Gunicorn recommends the “gevent” worker, but I couldn’t make the Django app work with that worker. I get this error:
TypeError: ASGIHandler.__call__() missing 1 required positional argument: 'send'
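One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: Gunicorn's gevent worker serves WSGI applications, so it calls the application with two arguments (environ, start_response), whereas an ASGI application expects three (scope, receive, send). The toy class below is hypothetical and only mimics the call signature; it is not Django's ASGIHandler.

```python
class ToyASGIApp:
    """Hypothetical stand-in with an ASGI-style call signature."""

    async def __call__(self, scope, receive, send):
        # A real ASGI app would read events via receive() and reply via send().
        await send({"type": "http.response.start", "status": 200, "headers": []})


def wsgi_style_call(app):
    """Call the app the way a WSGI server would: with two positional arguments."""
    environ = {}

    def start_response(status, headers):
        return None

    return app(environ, start_response)


try:
    wsgi_style_call(ToyASGIApp())
except TypeError as exc:
    # Argument binding fails before any coroutine is created.
    error_message = str(exc)
    print(error_message)
```

If that reading is right, pointing the gevent worker at the project's WSGI module instead of the ASGI module would avoid the mismatch, but that is untested here.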

Anyway, @carltongibson, you might be right that I was reading too much into it. The async approach is 3 times faster after all; it’s just that “gthread” is better when you are using sync code. In real-world situations with many network calls, the async approach will probably have the edge in the end.

Thanks for the “single worker” test idea.

I think this discussion can be closed now.
