Huge performance difference when using ASGI and WSGI

I have been testing my Django and Django REST Framework backend to find the minimal possible response time for my REST services. I ran the tests under both ASGI and WSGI configurations, and the difference between the two completely shocked me.

There is ~15 ms of overhead for the same operation (same view) when using ASGI. These were laboratory conditions, with one client and one request in flight at a time, intended only to show the minimal latency potential of Django (not handling multiple requests at the same time, where ASGI would show its strength). I compared results obtained for identical requests in ASGI and WSGI modes.

My test environment was a dedicated physical machine used exclusively for testing, with plenty of free RAM and CPU resources, local NVMe storage, and a LAN connection to the client workstation.

I tested various configuration scenarios, but there was always a huge overhead for the ASGI requests. I checked different Python versions (3.9, 3.10, 3.11, up to 3.12), Django versions (4.1, 4.2, up to 5), asgiref versions (3.6 to 3.8), as well as different versions of gunicorn and uvicorn/daphne, with/without DEBUG…

The difference stayed at ~15 ms ±2 ms, so I settled on Django 5.0.4 and the latest stable versions of the other Python packages.

Below is the view which I used during the tests (it returns just a static value, without any database access):

from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView


class TestView(APIView):
    def get(self, request, *args, **kwargs):
        response_data = {'message': 'Test Message'}
        return Response(response_data, status=status.HTTP_200_OK)

Results (I used wrk and also Locust for the testing):


WSGI:

gunicorn message_broker.wsgi:application --workers 4 --bind 0.0.0.0:8000

avg ~ 1ms


ASGI:

gunicorn message_broker.asgi:application --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

avg ~ 17ms
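
For reference, a hypothetical wrk invocation of the kind used for such a single-client test (the flags and URL are my assumption, not the exact original command):

# one thread, one connection, 30 seconds; host and path are placeholders
wrk -t1 -c1 -d30s http://SERVER:8000/test/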


I tried to find the cause of the observed difference using django-cprofile-middleware. Adding django-cprofile-middleware at the end of the MIDDLEWARE configuration (in settings.py) showed the same total time of ~1 ms for both ASGI and WSGI, so there was no difference at the django-cprofile-middleware level.

I know that the optimal solution for ASGI would be to use an asynchronous view, which would run as a native coroutine (rather than a thread emulating a coroutine via sync_to_async). In my case, ASGI is used with a standard, synchronous Django REST Framework view.
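
For comparison, a minimal sketch of such a native async view (using a plain Django view here, since the DRF APIView above is synchronous):

from django.http import JsonResponse


async def test_view(request):
    # Runs as a native coroutine on the event loop under ASGI,
    # without the sync_to_async thread hop.
    return JsonResponse({'message': 'Test Message'})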

I am also aware that, for a synchronous view under ASGI, Django runs the view’s synchronous code in a thread pool using a ThreadPoolExecutor. That works fine; I analysed the details with extended logging (process, thread, thread name) and with PYTHONASYNCIODEBUG=1. It all looked good to me: same process, same thread, and a new thread pool per request (the view code runs inside ThreadPoolExecutor-NNN_0 for each request, with a different pool number NNN). I also checked whether there was an issue with middleware not supporting async, as suggested by the Django documentation:

Middleware can be built to support both sync and async contexts. Some of Django’s middleware is built like this, but not all. To see what middleware Django has to adapt for, you can turn on debug logging for the django.request logger and look for log messages about “Asynchronous handler adapted for middleware …”.

Nothing is logged by the BaseHandler adapt_method_mode() method, so there is no problem with many/extra switches between async and sync.
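
For illustration, a LOGGING configuration along these lines (formatter and handler names are my own; the format fields are standard Python logging attributes) surfaces both the process/thread details and the django.request debug messages mentioned in the quote above:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s pid=%(process)d tid=%(thread)d '
                      '%(threadName)s %(name)s %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'verbose',
        },
    },
    'loggers': {
        # DEBUG level surfaces the "Asynchronous handler adapted for
        # middleware ..." messages referenced in the documentation quote.
        'django.request': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}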

My MIDDLEWARE configuration is as follows:

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

I read the documentation and watched conference talks about ASGI and Django internals to find an answer to this issue. I also searched for the cause in the django/core/handlers sources (asgi.py, base.py), in the django.utils sources, and in the asgiref sources. I found nothing special.

The obvious differences between ASGI and WSGI in my case are:

1. The running method: gunicorn (WSGI) vs. gunicorn with the uvicorn worker class, or daphne (same results), for ASGI.

2. Django internals: get_wsgi_application() (WSGI) vs. get_asgi_application() (ASGI), where the synchronous view is processed through thread pools (the standard entry point is sketched below).
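
For context, a sketch of the standard ASGI entry point as generated by startproject (assuming the default project layout); the WSGI one is analogous with get_wsgi_application():

# message_broker/asgi.py
import os

from django.core.asgi import get_asgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'message_broker.settings')

application = get_asgi_application()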

I wonder what the cause of such a big difference is, i.e., 1 ms for WSGI vs. 15 ms for ASGI. It is hard to believe that it results from using get_asgi_application() and threading for a synchronous view.

I will be thankful for any ideas or suggestions.


The difference in response times between WSGI and ASGI is a bit complicated. However, if we reason from first principles, several underlying factors could contribute to your observations (on YOUR hardware):

1. Threading Overhead

Even though DRF manages synchronous views in ASGI by running them in a ThreadPoolExecutor, there is inherent overhead that comes with thread management. This includes the cost of creating threads, context switching, handling the thread pool, and so on. While seemingly small, these costs can add up, particularly for lightweight operations, where the relative overhead becomes more significant compared to the actual processing time (see the micro-benchmark sketch after this list).

2. Event Loop Integration

When using ASGI, Django integrates with an event loop. Scheduling synchronous code to run via the event loop can introduce additional latency compared to serving the request directly and synchronously, as WSGI does. The event loop checks the readiness of various events and manages asynchronous tasks, which introduces complexity and potential delays even when the actual view is handled synchronously.

3. Gunicorn with Uvicorn vs. Standard Gunicorn

Gunicorn configured with Uvicorn workers for ASGI can have different performance characteristics than standard Gunicorn serving WSGI apps. Uvicorn’s design is optimized for asynchronous apps, so while it can handle synchronous code by offloading it to threads, its primary optimizations are for non-blocking code. This can result in less-than-optimal handling of purely synchronous loads.

4. Initialization and Adaptation Costs

Whenever a synchronous view is called in an ASGI application, Django needs to adapt it to run asynchronously. This adaptation, although optimized, still adds some overhead compared to serving the request synchronously without any adaptation.
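
Regarding the threading overhead in point 1, here is a rough micro-benchmark sketch (my own illustration, not from the original setup) that isolates the thread-hop cost by timing a direct call against the same call routed through asgiref's sync_to_async, the primitive Django uses to run sync views under ASGI:

import asyncio
import time

from asgiref.sync import sync_to_async


def work():
    return {'message': 'Test Message'}


async def main(iterations=1000):
    wrapped = sync_to_async(work, thread_sensitive=True)

    # Direct synchronous calls.
    start = time.perf_counter()
    for _ in range(iterations):
        work()
    direct = time.perf_counter() - start

    # The same calls hopped through the thread executor.
    start = time.perf_counter()
    for _ in range(iterations):
        await wrapped()
    hopped = time.perf_counter() - start

    print(f'direct call:   {direct / iterations * 1e6:.1f} µs/call')
    print(f'sync_to_async: {hopped / iterations * 1e6:.1f} µs/call')


asyncio.run(main())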


Also, I suggest you benchmark more realistic views. Use the database, render a template, etc. Any ASGI/WSGI overhead is likely to be negligible in most cases.
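
For instance, something along these lines (the app, model, and template names are hypothetical):

from django.shortcuts import render

from myapp.models import Message  # hypothetical app and model


def message_list(request):
    # One real database query plus template rendering, so the benchmark
    # reflects the work an actual page does.
    messages = Message.objects.all()[:20]
    return render(request, 'myapp/message_list.html', {'messages': messages})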

FWIW: for apps getting up to first-round scaling limits, and then beyond, for the last few years I’ve tended to split my app in two. I’ll use WSGI for the bulk, and then reserve ASGI for cases which require async — long-lived connections, realtime updates, websockets, and so on. I’ve found this to be headache free: WSGI is very mature and the scaling patterns are well known — that (still) buys a lot.


Hi @carltongibson, how would you go about it if you have a combination of both? For example, a Django application with some async views, plus a Channels application.

In the case you mentioned, are you running two servers, i.e. one for the ASGI side and one for the WSGI side?

Thanks

You have your load balancer or frontend web server (nginx, say) route requests to different backends, usually by path, but really by any feature of the request you need.

With nginx you’d define two upstreams, one for each of WSGI and ASGI. Then, e.g., requests whose paths start with /ws/ get routed to the ASGI server. The rest go to WSGI.
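
A minimal sketch of that routing in nginx terms (ports, upstream names, and the /ws/ prefix are placeholders, not a prescribed setup):

upstream wsgi_backend { server 127.0.0.1:8000; }
upstream asgi_backend { server 127.0.0.1:8001; }

server {
    listen 80;

    # WebSocket and other async traffic goes to the ASGI server.
    location /ws/ {
        proxy_pass http://asgi_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    # Everything else goes to the WSGI server.
    location / {
        proxy_pass http://wsgi_backend;
    }
}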

Keep in mind gunicorn can also run worker types other than the default “sync” type.

A lot of folks immediately switch to ASGI without exploring the other gunicorn worker types.

In a lot of cases, these worker types can entirely eliminate the need for async views.
In particular, the “gevent” worker type is underrated: it allows you to run WSGI applications in high-I/O contexts very efficiently.

The “gthread” worker type is simpler to grasp, and it also alleviates a lot of the performance issues you run into with high-I/O tasks.
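
For example, sketches of the corresponding invocations, adapted from the commands earlier in the thread (worker and thread counts are illustrative, and the gevent worker requires the gevent package to be installed):

gunicorn message_broker.wsgi:application --workers 4 --worker-class gevent --bind 0.0.0.0:8000

gunicorn message_broker.wsgi:application --workers 4 --worker-class gthread --threads 8 --bind 0.0.0.0:8000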