Django Async: use cases and learning path

MrCordeiro · June 30, 2020, 2:24pm

I still don’t fully grasp what async Django can do, but I want to learn.

The official release notes were not very beginner friendly. They mention ASGI vs WSGI and link to an article about how to deploy an ASGI server, but they don’t mention what use cases are unlocked by using an ASGI server.

I come from an old Treehouse tutorial where Kenneth Love explains how to use Django Channels to create a real-time chat. So far, that’s the only use case I know.

Can I use Django 3.0 to implement a real-time chat too?

What about jobs? Rails has the Active Job framework for declaring jobs and making them run on queuing back-ends. With Django, you would have to rely on Celery (which I find confusing). Can I run jobs with async Django? Or will that become possible?

Looking at the Django roadmap, what are the use cases for async now and what use cases will be unlocked by future features?

Also, what resources do you recommend for learning async?

adamchainz · June 30, 2020, 2:37pm

Django 3.0’s async support is minimal. 3.1 will expand it greatly. 3.1 is in beta and you can read its release notes and try it out now: https://docs.djangoproject.com/en/3.1/releases/3.1/

For now, Channels is needed to implement most async cases. The Channels tutorial has you build a real-time chat application: https://channels.readthedocs.io/en/latest/tutorial/index.html

Django’s async is not for background jobs and there’s no movement in that direction at present. It’s only for async handling in requests, for server-sent events, websockets, and similar.
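To make "async handling in requests" a little more concrete, here is a minimal sketch of what an ASGI application looks like underneath a framework: a coroutine that receives a `scope` plus `receive` and `send` callables. The `call_once` driver is a hypothetical stand-in for a real server such as uvicorn and exists only so the example runs on its own.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: one coroutine call per connection."""
    assert scope["type"] == "http"
    body = b"Hello from ASGI"
    # The response is sent as two messages: start (status, headers), then body.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call_once(application):
    """Hypothetical driver that plays the role of the ASGI server."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    await application(scope, receive, send)
    return sent


messages = asyncio.run(call_once(app))
print(messages[0]["status"], messages[1]["body"])
```

Because the server awaits the application instead of blocking a worker on it, one process can keep many slow connections open at once, which is what websockets and server-sent events need.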

There are lots of frameworks and resources for learning more about async web in the Awesome ASGI repo: https://github.com/florimondmanca/awesome-asgi

Hope that helps!

MrCordeiro · July 1, 2020, 7:02pm

Thanks!

When you are unfamiliar with ASGI applications it’s very hard to decipher how the 3.1 release notes translate into use cases and features that will be unlocked.

That being said, the repo has some interesting talks!

ephes · July 2, 2020, 3:57pm

I’m also interested in what you can do with async-Django. In a weak moment I agreed to write something about async features of Django 3.1, so I kind of have to get to know more about them. Just started to experiment in this repository.

As far as I know the only thing where async is useful for Django 3.0 is when you have multiple file uploads. With sync Django your uploads will get blocked as soon as all of your workers are busy receiving data. Since file uploads are handled by the “handler” before views are involved they should not block if you use a server like uvicorn with asgi.

Starting with Django 3.1 async views are possible and therefore there should be a lot more use cases. Still, I struggled to come up with something interesting. The chat example is not feasible, because you would use websockets for that and for websockets you would use channels. I don’t know if it makes sense to receive data from an api with long polling or if everybody uses websockets too for that nowadays. Another example I found in the DEP for async was “slow streaming” but I don’t have a use case for that.

The most promising use case for me is aggregation API endpoints: async views that gather data from several other API endpoints. In the sync world your latencies would add up or you would have to use threads manually, whereas in Django 3.1 you can just write an async aggregation view.
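A rough sketch of the aggregation idea, using only the standard library. `fetch_endpoint` is a hypothetical stand-in for a real HTTP call (for example through an async HTTP client), and the 0.2 second delay is an arbitrary assumption; in Django 3.1 the body of `aggregate` would live inside an `async def` view.

```python
import asyncio
import time


async def fetch_endpoint(name, delay=0.2):
    # Stand-in for an awaitable HTTP request to another API endpoint.
    await asyncio.sleep(delay)
    return {"endpoint": name, "foo": "bar"}


async def aggregate(names):
    # Start all requests, then wait for them together, so the total
    # wait is close to the slowest call rather than the sum of all calls.
    return await asyncio.gather(*(fetch_endpoint(n) for n in names))


names = [f"api-{i}" for i in range(10)]
start = time.perf_counter()
responses = asyncio.run(aggregate(names))
elapsed = time.perf_counter() - start
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

`asyncio.gather` returns results in the order the awaitables were passed in, so the responses line up with `names` even though the calls overlap.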

If you stumble upon an interesting application, I would be very interested :).

KenWhitesell · July 2, 2020, 4:32pm

Just to add a slight variation on to your theme here - the aggregation idea doesn’t just need to apply to api endpoints - the same concept may apply within your application itself.

For example, let’s say you have a “dashboard” page consisting of a number of different blocks of content. You could structure your view such that each content block gets rendered asynchronously and then pulled together in your final template. Agreed, if the blocks are cpu-bound, it’s probably not going to help any. But if the retrieval of the data for that block is data-intensive, there may be some benefit to doing it that way.
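A sketch of how that might look when the data retrieval itself is blocking code. `load_block` and the block names are hypothetical; the blocking calls are pushed onto a thread pool with `run_in_executor` and awaited together. Inside Django, `asgiref.sync.sync_to_async` plays a comparable role for wrapping synchronous code such as ORM queries.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def load_block(name):
    # Blocking stand-in for a slow query or file read.
    time.sleep(0.2)
    return f"<section>{name}</section>"


async def render_dashboard(block_names):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=len(block_names)) as pool:
        # Each blocking loader runs in its own worker thread.
        futures = [loop.run_in_executor(pool, load_block, n) for n in block_names]
        blocks = await asyncio.gather(*futures)
    # Stand-in for pulling the rendered blocks together in a final template.
    return "\n".join(blocks)


html = asyncio.run(render_dashboard(["sales", "signups", "errors"]))
print(html)
```

As noted above, this only helps when the blocks wait on I/O; CPU-bound rendering would still compete for the same interpreter.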

(All conjecture on my part.)

Ken

ephes · July 3, 2020, 9:12am

Stumbled across the first obstacle trying to connect to a sync view from an async aggregation view running inside an asgi server. Asked a question on stackoverflow.

ephes · July 3, 2020, 10:39am

Ok, the problem was just me creating the simplest possible deadlock by trying to connect a single-threaded server to itself. It works in the default development server, because it’s multithreaded by default, but I didn’t know that.

ephes · July 5, 2020, 8:29am

Good idea, but after some time playing around with async api views etc. I wonder if this is a good example for async views. Maybe it’s more elegant to write this as async, but it’s also possible to do this with normal sync views and a ThreadPoolExecutor for example:

github.com

ephes/django_async/blob/master/django_async/sync_views.py

import time
import httpx
import concurrent.futures

from django.http import JsonResponse
from .viewfinder import get_all_functions


def api(request):
    time.sleep(1)
    payload = {"foo": "bar"}
    return JsonResponse(payload)


def threadpool_aggregation(request):
    num_iterations = 10
    sync_api_url = "http://localhost:8000/sync/api/"
    results = []
    urls = [sync_api_url for i in range(1, 1 + num_iterations)]
    s = time.perf_counter()

This file has been truncated.

Hmm, guess I’m still struggling to come up with a use case where async views are really needed.
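To make the comparison concrete, the thread-pool approach can be sketched without Django or httpx. This is not the rest of the truncated file above, just a self-contained illustration: `call_api` is a hypothetical stand-in for a blocking HTTP GET, and the 0.2 second sleep is an assumed latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_api(url):
    # Blocking stand-in for an HTTP GET against the slow API view.
    time.sleep(0.2)
    return {"url": url, "foo": "bar"}


def aggregate_with_threads(urls):
    # One worker per URL, so all blocking calls overlap.
    with ThreadPoolExecutor(max_workers=len(urls)) as executor:
        return list(executor.map(call_api, urls))


urls = ["http://localhost:8000/sync/api/"] * 10
start = time.perf_counter()
results = aggregate_with_threads(urls)
elapsed = time.perf_counter() - start
print(f"{len(results)} results in {elapsed:.2f}s")
```

For a handful of upstream calls this gives about the same wall-clock time as an async view, which is why the aggregation example alone does not show that async is required.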

KenWhitesell · July 5, 2020, 12:26pm

I don’t think you’re going to find a use case where async views are needed. That’s not the benefit of an async environment. Nor is it going to make any individual request faster - in fact, it’s quite possible that an individual request, made in a test/development environment will be slower than its sync counterpart.

The benefit of going async is the expected ability to remain stable with a more consistent response under load. Thread pools are not an unlimited resource, and as the number of processes and threads increases beyond the capabilities of the CPU, additional latency is added because threads can end up blocked waiting for IO to complete, threads that might otherwise be able to work on other tasks where IO has already completed.

There’s a recent blog post by Tom Christie (creator of the DRF), Python async frameworks - Beyond developer tribalism that addresses what I think is the essence of what you’re struggling with. (I found out about the article from Django News, always a useful source of information.)

Ken

ephes · July 6, 2020, 4:20pm

Hmm, it seems that support for async views is already included in the development server. I thought I had to install something like uvicorn to be able to write them, but they seem to work out of the box:

github.com

ephes/django_async/blob/master/async_in_devserver.md

# Django 3.1 Async Views Working in Development Server?

Seems that async views are working without having to start servers
like uvicorn.

# Install Poetry and Setup Project
```shell
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
mkdir mysite && cd mysite && poetry init -n
poetry add django==3.1b1 httpx
poetry shell  # switch to virtualenv created by poetry, I have to use a new shell, dunno why
```

# Initialize Django
```shell
django-admin startproject mysite .  # create django project in current directory
python manage.py migrate            # migrate sqlite
python manage.py runserver          # should start the development server now
```

This file has been truncated.

ephes · July 7, 2020, 4:59pm

Really nice article, thanks. Well, I also watched the DjangoCon talks from Tom Christie and Andrew Godwin and listened to every podcast episode about async I could find. I’m slowly getting a little more used to it, I think.

So the main advantage async has over threads is that it is:

1. Easier for developers
2. More efficient

And while I’m pretty convinced about 1., I’m still not really sure about 2. It’s usually said that threads are not as efficient because of memory overhead and context switches. I tried to find out about the memory overhead and found it to be not as bad as reported. (Effective Python says a thread will cost about 8MB, Fluent Python talks about a few MB vs about 1KB per async task; my tests say something way below 1MB.) But maybe I’m testing it wrong. Maybe I’m only seeing user space memory and overlooking kernel space structures? Context switches might be a problem, but I don’t know how to test that.

Btw this is the script I used to test thread memory usage:

github.com

ephes/django_async/blob/master/measure_threads_memory.py

import time

import concurrent.futures


def do_almost_nothing(thread_id):
    time.sleep(100)
    return thread_id


num_threads = 10000
results = []
s = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
    future_to_function = {executor.submit(do_almost_nothing, thread_id): thread_id for thread_id in range(num_threads)}
    for future in concurrent.futures.as_completed(future_to_function):
        function = future_to_function[future]
        try:
            results.append(future.result())
        except Exception as exc:

This file has been truncated.

It crashed my mac and I assumed at first it might be a hardware problem, but then tried it on a second one which also crashed. So I filed a bug against macOS. Maybe it runs out of kernel memory…
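For comparison with the thread script above, a rough asyncio counterpart might look like the following. It is only a sketch: `tracemalloc` counts Python-level allocations, so the figure is an estimate rather than a measurement of resident memory, and the sleep is shortened to 0.2 seconds (an assumption) so the script finishes quickly.

```python
import asyncio
import tracemalloc


async def do_almost_nothing(task_id):
    await asyncio.sleep(0.2)
    return task_id


async def main(num_tasks):
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tasks = [asyncio.create_task(do_almost_nothing(i)) for i in range(num_tasks)]
    # Yield once so the scheduled tasks get a chance to start and reach their sleep.
    await asyncio.sleep(0)
    after, _ = tracemalloc.get_traced_memory()
    results = await asyncio.gather(*tasks)
    tracemalloc.stop()
    return results, (after - before) / num_tasks


results, bytes_per_task = asyncio.run(main(10_000))
print(f"{len(results)} tasks, roughly {bytes_per_task:.0f} bytes each")
```

All 10,000 tasks share a single OS thread here, so no per-thread stacks or kernel structures are involved.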

KenWhitesell · July 8, 2020, 12:22am

<opinion>
Actually, I’m not at all convinced about either of them.

I don’t find async to be “easier”, because I find it puts more requirements on me to ensure I don’t create any blocking situation. If I’m working with threads, I just let the code run - I don’t need to explicitly wait for anything.

Nor do I find it necessarily more efficient - especially if I’m doing work that is CPU intensive. It seems to me that it’s only better when you’re regularly performing operations that are IO bound.
While async is likely to be more stable under higher loads than what sync would be, that doesn’t necessarily mean it’s more efficient for any individual procedure.
</opinion>

Ken

ephes · July 8, 2020, 8:53am

Hmm, probably the 8MB quote comes from ulimit -s. The default stack space for threads is 8MB (Linux, macOS). But this is virtual memory, not resident. On 32bit machines it was 2MB and imposed a low and hard limit on the number of threads because your usable virtual address space was only 3GB. But on 64bit architectures this is much larger. Found it in this really helpful article. Context switching costs should also not be that bad. Maybe python threads are kind of special or the locks you have to acquire for job and result queue are the problem (didn’t measure)…
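The limit that `ulimit -s` reports can also be read from Python. A small sketch, assuming a Unix-like system (the `resource` module is not available on Windows); the `describe` helper is only for display.

```python
import resource
import threading

# RLIMIT_STACK is the limit behind `ulimit -s`; values are in bytes.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

# 0 means threads created by this interpreter use the platform default size.
default_thread_stack = threading.stack_size()


def describe(limit):
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


print("stack soft limit:", describe(soft))
print("stack hard limit:", describe(hard))
print("threading.stack_size():", default_thread_stack)
```

This is address space reserved per stack, not memory that is actually touched, which fits the observation that measured usage per thread is far lower.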

KenWhitesell · July 8, 2020, 11:14am

Also keep in mind that the Global Interpreter Lock affects how effective threads can be.

From the Threading docs:

CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation).

This means that in a multi-core machine, a single Python process is only going to use one core at any time. From the rest of that same paragraph:

If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor . However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

Not sure how that affects your testing or your results, but it implies to me that if your threads are CPU bound, you’re not going to see any performance improvements by splitting your task out into threads vs running them sequentially.
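One way to see this for yourself is to time the same CPU-bound function run sequentially and in a thread pool. This is a sketch with an arbitrary workload size; on a standard CPython build with the GIL the two timings are usually close, but the exact numbers depend on the machine and interpreter.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    # Pure-Python loop: CPU-bound, so it holds the GIL while it runs.
    while n > 0:
        n -= 1
    return n


def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result


work = [2_000_000] * 4
sequential = timed("sequential", lambda: [count_down(n) for n in work])
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = timed("4 threads", lambda: list(pool.map(count_down, work)))
```

Replacing `count_down` with something that sleeps or waits on a socket should show the opposite picture, since the GIL is released while a thread is blocked on I/O.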

Ken

ephes · July 8, 2020, 4:37pm

Yes, writing async code is more effort upfront. But it’s also probably easier to reason about than multithreaded code. And tools/languages might be able to be more helpful, because async code is more explicit about what’s going on (when it’s giving up control). I like the metaphor Tom Christie used in the article you mentioned. Async is a little bit like writing Python code where types are enforced. You have to be more precise but also get additional safety in return.

But yes, I see your point :).

ephes · July 8, 2020, 4:45pm

Yes, I would say that threads as well as async both only improve things when your code is I/O-bound. But a view in a web application probably is.
