I still don’t fully grasp what async Django can do, but I want to learn.
The official release notes were not very beginner friendly. They mention ASGI vs. WSGI and link to an article about how to deploy with an ASGI server, but they don’t say which use cases an ASGI server unlocks.
I come from an old Treehouse tutorial where Kenneth Love explains how to use Django Channels to create a real-time chat. So far, that’s the only use case I know.
Can I use Django 3.0 to implement a real-time chat too?
What about jobs? Rails has the Active Job framework for declaring jobs and running them on queuing back-ends. With Django, you have to rely on Celery (which I find confusing). Can I run jobs with async Django? Or will I be able to in the future?
Looking at the Django roadmap, what are the use cases for async now and what use cases will be unlocked by future features?
Also, what resources do you recommend for learning async?
Django’s async support is not for background jobs, and there’s no movement in that direction at present. It’s only for handling requests asynchronously: server-sent events, websockets, and similar.
When you are unfamiliar with ASGI applications, it’s very hard to decipher how the 3.1 release notes translate into the use cases and features that will be unlocked.
That being said, the repo has some interesting talks!
I’m also interested in what you can do with async Django. In a weak moment I agreed to write something about the async features of Django 3.1, so I kind of have to get to know them better. I just started experimenting in this repository.
As far as I know, the only place where async is useful in Django 3.0 is when you have multiple file uploads. With sync Django, your uploads will block as soon as all of your workers are busy receiving data. Since file uploads are handled by the handler before any view is involved, they should not block if you use an ASGI server such as uvicorn.
Starting with Django 3.1, async views are possible, so there should be a lot more use cases. Still, I struggled to come up with something interesting. The chat example is not feasible, because you would use websockets for that, and for websockets you would use Channels. I don’t know whether it makes sense to receive data from an API with long polling, or whether everybody uses websockets for that nowadays too. Another example I found in the DEP for async was “slow streaming”, but I don’t have a use case for that.
The most promising use case for me is an aggregation API endpoint, implemented as an async view, that gathers data from several other API endpoints. In the sync world your latencies would add up, or you would have to manage threads manually, whereas in Django 3.1 you can just write an async aggregation view.
If you stumble across an interesting application, I would be very interested :).
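To make “slow streaming” a bit more concrete: the idea is a response that trickles out chunks over a long period while the server waits on a slow source. Below is a minimal stdlib-only sketch of such a producer as an async generator. The names slow_stream and consume are hypothetical, and the “data:” framing only mimics server-sent events. As far as I know, async iterators only became usable with StreamingHttpResponse (under ASGI) in Django 4.2, so under 3.1 this stays a plain asyncio illustration.

```python
import asyncio
from typing import AsyncIterator


async def slow_stream(n_chunks: int, delay: float) -> AsyncIterator[str]:
    """Hypothetical producer that yields one chunk every `delay` seconds.

    While it waits in `await asyncio.sleep`, the event loop is free to
    serve other requests, which is the whole point of "slow streaming".
    """
    for i in range(n_chunks):
        await asyncio.sleep(delay)  # stand-in for waiting on a slow source
        yield f"data: chunk {i}\n\n"  # SSE-style framing, for illustration only


async def consume(n_chunks: int = 3, delay: float = 0.01) -> list:
    # Outside Django we simply collect the chunks to show their order.
    return [chunk async for chunk in slow_stream(n_chunks, delay)]


if __name__ == "__main__":
    print(asyncio.run(consume()))
```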
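The latency argument can be sketched without Django at all. In this stdlib-only sketch, fetch_endpoint is a hypothetical stand-in for an HTTP call to another endpoint (asyncio.sleep plays the network), and the latencies are made up. In a real Django 3.1 async view you would use an async HTTP client instead and return the combined result as a response.

```python
import asyncio
import time

# Made-up latencies (seconds) for three hypothetical upstream endpoints.
ENDPOINTS = {"users": 0.2, "orders": 0.2, "stats": 0.2}


async def fetch_endpoint(name: str, latency: float) -> dict:
    # Hypothetical stand-in for an HTTP request; sleeping does not block the loop.
    await asyncio.sleep(latency)
    return {"source": name}


async def aggregate_sequential() -> list:
    # Each await finishes before the next call starts: latencies add up.
    return [await fetch_endpoint(n, lat) for n, lat in ENDPOINTS.items()]


async def aggregate_concurrent() -> list:
    # All calls start together; total time is roughly that of the slowest one.
    return await asyncio.gather(
        *(fetch_endpoint(n, lat) for n, lat in ENDPOINTS.items())
    )


def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (aggregate_sequential, aggregate_concurrent):
        result, elapsed = timed(fn)
        print(f"{fn.__name__}: {elapsed:.2f}s -> {result}")
```

With these numbers the sequential version takes roughly the sum of the latencies and the concurrent one roughly the maximum; asyncio.gather also preserves the input order of the results.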
Just to add a slight variation on your theme here: the aggregation idea doesn’t only apply to API endpoints; the same concept may apply within your application itself.
For example, let’s say you have a “dashboard” page consisting of a number of different blocks of content. You could structure your view so that each content block gets rendered asynchronously and then pulled together in your final template. Granted, if the blocks are CPU-bound, it’s probably not going to help. But if retrieving the data for a block is IO-bound (slow queries or calls to other services), there may be some benefit to doing it that way.
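Here is a stdlib-only sketch of the dashboard idea. The block loaders are hypothetical and each asyncio.sleep stands in for an IO-bound query; render_dashboard is a placeholder for the final template. In Django the resulting dict could be passed as template context, but that part is an assumption and not shown. As noted above, a loader doing CPU-heavy work would still hold up the event loop.

```python
import asyncio


# Hypothetical block loaders; each sleep stands in for an IO-bound query.
async def load_news() -> str:
    await asyncio.sleep(0.05)
    return "<section>news</section>"


async def load_weather() -> str:
    await asyncio.sleep(0.05)
    return "<section>weather</section>"


async def load_tasks() -> str:
    await asyncio.sleep(0.05)
    return "<section>tasks</section>"


BLOCKS = {"news": load_news, "weather": load_weather, "tasks": load_tasks}


async def build_dashboard_context() -> dict:
    # Start every block loader at once and wait for all of them.
    names = list(BLOCKS)
    rendered = await asyncio.gather(*(BLOCKS[n]() for n in names))
    # gather preserves input order, so names and results line up.
    return dict(zip(names, rendered))


def render_dashboard(context: dict) -> str:
    # Placeholder for the final template that stitches the blocks together.
    return "\n".join(context[name] for name in ("news", "weather", "tasks"))


if __name__ == "__main__":
    print(render_dashboard(asyncio.run(build_dashboard_context())))
```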
I stumbled across my first obstacle while trying to call a sync view from an async aggregation view running inside an ASGI server, and asked a question on Stack Overflow.
OK, the problem was just me creating the simplest possible deadlock by connecting a single-threaded server to itself. It works in the default development server because that server is multithreaded by default, but I didn’t know that.
Good idea, but after spending some time playing around with async API views, I wonder whether this is a good example for async views. Maybe it’s more elegant to write it as async, but it’s also possible to do this with normal sync views and, for example, a ThreadPoolExecutor:
Hmm, I guess I’m still struggling to come up with a use case where async views are really needed.
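The self-call deadlock can be reproduced in miniature with a thread pool. This is only an analogy, not an actual server: a pool with one worker stands in for a single-threaded server, and two workers stand in for a multithreaded one. The functions handle_inner, handle_outer and call_self are hypothetical, and the timeout exists only so the demo reports the deadlock instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def handle_inner() -> str:
    # Plays the role of the sync view being called.
    return "inner response"


def handle_outer(pool: ThreadPoolExecutor) -> str:
    # Plays the aggregation view: while occupying a worker, it sends a
    # request back to the same pool and waits for the answer.
    inner = pool.submit(handle_inner)
    try:
        return inner.result(timeout=0.5)  # timeout only keeps the demo from hanging
    except FutureTimeout:
        return "deadlock: no free worker to serve the inner request"


def call_self(workers: int) -> str:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return pool.submit(handle_outer, pool).result()


if __name__ == "__main__":
    print("1 worker: ", call_self(1))
    print("2 workers:", call_self(2))
```

With one worker the inner request can only run after the outer one has finished, which is exactly the situation of a single-threaded server calling itself.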
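A minimal sketch of that sync alternative, assuming the upstream calls are plain blocking functions. fetch_endpoint_blocking is hypothetical (time.sleep simulates network latency) and the latencies are made up; a function like aggregate_with_threads could be called from an ordinary sync Django view.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Made-up latencies (seconds) for three hypothetical upstream endpoints.
ENDPOINTS = {"users": 0.2, "orders": 0.2, "stats": 0.2}


def fetch_endpoint_blocking(name: str, latency: float) -> dict:
    # Hypothetical blocking HTTP call; time.sleep simulates network latency.
    time.sleep(latency)
    return {"source": name}


def aggregate_with_threads() -> list:
    # Each call blocks its own worker thread, so the waits overlap.
    with ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
        futures = [
            pool.submit(fetch_endpoint_blocking, n, lat)
            for n, lat in ENDPOINTS.items()
        ]
        # Collect results in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    start = time.perf_counter()
    result = aggregate_with_threads()
    print(f"{time.perf_counter() - start:.2f}s -> {result}")
```

The total time is again close to the slowest call; the cost is one OS thread per concurrent call instead of one coroutine.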
I don’t think you’re going to find a use case where async views are needed. That’s not the benefit of an async environment. Nor is it going to make any individual request faster - in fact, it’s quite possible that an individual request, made in a test/development environment will be slower than its sync counterpart.
The benefit of going async is the expected ability to remain stable, with more consistent response times, under load. Thread pools are not an unlimited resource, and as the number of processes and threads grows beyond the capabilities of the CPU, additional latency is added: threads end up blocked waiting for IO to complete, threads that could otherwise be picking up other tasks whose IO has already completed.
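A toy model of the “thread pools are not unlimited” point, using only made-up numbers: 20 simultaneous requests that each wait 0.1s on IO, served either by a fixed pool of 4 threads or by coroutines on one event loop. It deliberately ignores CPU cost, context switches and everything else a real server does, so treat it as an illustration of queueing, not a benchmark.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

N_REQUESTS = 20  # made-up burst of concurrent requests
IO_WAIT = 0.1    # made-up IO latency per request, in seconds
POOL_SIZE = 4    # made-up thread pool size


def handle_blocking(_: int) -> None:
    time.sleep(IO_WAIT)  # the thread is idle but occupied during the wait


async def handle_async(_: int) -> None:
    await asyncio.sleep(IO_WAIT)  # the wait hands control back to the event loop


def run_threaded() -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        list(pool.map(handle_blocking, range(N_REQUESTS)))
    # Requests queue behind busy threads: about ceil(20 / 4) * 0.1s = 0.5s.
    return time.perf_counter() - start


async def _run_all_async() -> None:
    await asyncio.gather(*(handle_async(i) for i in range(N_REQUESTS)))


def run_async() -> float:
    start = time.perf_counter()
    asyncio.run(_run_all_async())
    # All waits overlap on one thread: roughly 0.1s in total.
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"thread pool: {run_threaded():.2f}s, asyncio: {run_async():.2f}s")
```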
There’s a recent blog post by Tom Christie (creator of Django REST framework), Python async frameworks - Beyond developer tribalism, that addresses what I think is the essence of what you’re struggling with. (I found out about the article from Django News, which is always a useful source of information.)
Hmm, it seems that support for async views is already included in the development server. I thought I had to install something like uvicorn to be able to write them, but they seem to work out of the box.
Really nice article, thanks. I also watched the DjangoCon talks by Tom Christie and Andrew Godwin and listened to every podcast episode about async I could find. Slowly I’m getting a bit more used to it, I think.
So the main advantages async has over threads are that it’s:
1. Easier for developers
2. More efficient
And while I’m pretty convinced about 1., I’m still not really sure about 2. It’s usually said that threads are less efficient because of memory overhead and context switches. I tried to find out about the memory overhead and found it to be not as bad as reported. Effective Python says a thread costs about 8MB, and Fluent Python talks about a few MB per thread vs. about 1KB per async task, but my tests show something well below 1MB. Maybe I’m testing it wrong. Maybe I’m only seeing user-space memory and overlooking kernel-space structures? Context switches might be a problem, but I don’t know how to test that.
Btw, I also wrote a small script to test thread memory usage.
It crashed my Mac, and at first I assumed it might be a hardware problem, but then I tried it on a second one, which also crashed. So I filed a bug against macOS. Maybe it runs out of kernel memory…
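For anyone who wants to try a similar measurement, here is a hypothetical sketch (not the script mentioned above). It starts a modest number of idle threads and compares peak resident memory before and after, using the Unix-only resource module; the thread count is kept small on purpose, given the crashes described. Caveats: ru_maxrss reports the peak RSS (in KiB on Linux but in bytes on macOS), so it can under-report if the process was already larger earlier, and it only covers resident memory, not the virtual stack reservation.

```python
import resource
import sys
import threading


def peak_rss_kib() -> float:
    # ru_maxrss is the *peak* resident set size: KiB on Linux, bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / 1024 if sys.platform == "darwin" else float(rss)


def measure_idle_threads(n: int = 200) -> float:
    """Rough peak-RSS growth per idle thread, in KiB (hypothetical helper)."""
    release = threading.Event()
    before = peak_rss_kib()
    # Each thread just blocks on the event until we release it.
    threads = [threading.Thread(target=release.wait) for _ in range(n)]
    for t in threads:
        t.start()
    after = peak_rss_kib()
    release.set()  # let every thread finish
    for t in threads:
        t.join()
    return (after - before) / n


if __name__ == "__main__":
    print(f"~{measure_idle_threads():.1f} KiB of resident memory per idle thread")
```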
<opinion>
Actually, I’m not at all convinced about either of them.
I don’t find async to be “easier”, because I find it puts more requirements on me to ensure I don’t create any blocking situation. If I’m working with threads, I just let the code run - I don’t need to explicitly wait for anything.
Nor do I find it necessarily more efficient - especially if I’m doing work that is CPU intensive. It seems to me that it’s only better when you’re regularly performing operations that are IO bound.
While async is likely to be more stable than sync under higher loads, that doesn’t necessarily mean it’s more efficient for any individual procedure. </opinion>
Hmm, the 8MB quote probably comes from ulimit -s. The default stack space for threads is 8MB (Linux, macOS). But this is virtual memory, not resident memory. On 32-bit machines it was 2MB, and that imposed a low, hard limit on the number of threads because your usable virtual address space was only 3GB. On 64-bit architectures the address space is much larger. I found this in this really helpful article. Context switching costs should also not be that bad. Maybe Python threads are special in some way, or the locks you have to acquire for the job and result queues are the problem (I didn’t measure that)…
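To illustrate the “requirements to not block” point above: a single blocking call inside a coroutine stalls every other task on the event loop, while pushing it to a worker thread does not. This sketch uses asyncio.to_thread (Python 3.9+); in Django the analogous helper would be asgiref’s sync_to_async, which is not shown here. The ticker and handler functions are hypothetical, and the timings are made up.

```python
import asyncio
import time


async def ticker(ticks: list) -> None:
    # A task that wants to run roughly every 10 ms.
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def blocking_handler() -> None:
    await asyncio.sleep(0.02)  # let the ticker get going first
    time.sleep(0.2)  # blocks the whole event loop: no other task can run


async def offloaded_handler() -> None:
    await asyncio.sleep(0.02)
    await asyncio.to_thread(time.sleep, 0.2)  # waits in a worker thread instead


async def max_gap(handler) -> float:
    # Run the ticker next to the handler and report its longest pause.
    ticks: list = []
    await asyncio.gather(ticker(ticks), handler())
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    for h in (blocking_handler, offloaded_handler):
        print(f"{h.__name__}: longest ticker gap {asyncio.run(max_gap(h)):.3f}s")
```

With threads you would not have to think about this at all, which is the trade-off described in the opinion above.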
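The “low and hard limit” is simple arithmetic: address space divided by stack reservation, ignoring heap, libraries and all other mappings. The 3GB and 2MB figures come from the post above; the 128 TiB of user address space for 64-bit is my assumption (x86-64 Linux with 4-level paging). The last lines show that Python can also request a smaller stack reservation for new threads via threading.stack_size, restoring the previous value afterwards.

```python
import threading

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4


def max_stacks(address_space: int, stack_size: int) -> int:
    # Upper bound only: ignores heap, shared libraries and other mappings.
    return address_space // stack_size


# 32-bit: ~3 GiB usable address space, 2 MiB default stack per thread.
print(max_stacks(3 * GiB, 2 * MiB))    # 1536
# 64-bit: assumed 128 TiB of user address space, 8 MiB default stack.
print(max_stacks(128 * TiB, 8 * MiB))  # 16777216

# Request a 256 KiB stack for threads created from now on;
# stack_size() returns the previous setting (0 means "platform default").
previous = threading.stack_size(256 * 1024)
threading.stack_size(previous)  # restore the previous setting
```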
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation).
This means that in a multi-core machine, a single Python process is only going to use one core at any time. From the rest of that same paragraph:
If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
Not sure how that affects your testing or your results, but it implies to me that if your threads are CPU bound, you’re not going to see any performance improvements by splitting your task out into threads vs running them sequentially.
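A small experiment along those lines, assuming a standard CPython build with the GIL: the same pure-Python, CPU-bound work run sequentially and then in four threads. On such a build you should expect the threaded version to be about as slow as the sequential one (free-threaded builds may behave differently). count_primes and the workload size are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    # Deliberately naive, pure-Python CPU-bound work.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


JOBS = [20_000] * 4  # made-up workload: four identical chunks


def run_sequential() -> list:
    return [count_primes(limit) for limit in JOBS]


def run_threaded() -> list:
    # Threads take turns holding the GIL, so the work is not parallelised.
    with ThreadPoolExecutor(max_workers=len(JOBS)) as pool:
        return list(pool.map(count_primes, JOBS))


if __name__ == "__main__":
    for fn in (run_sequential, run_threaded):
        start = time.perf_counter()
        fn()
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

For IO-bound work the picture changes, because threads release the GIL while they wait.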
Yes, writing async code takes more effort upfront. But it’s also probably easier to reason about than multithreaded code. And tools and languages may be able to help more, because async code is explicit about what’s going on (when it gives up control). I like the metaphor Tom Christie used in the article you mentioned: async is a bit like writing Python code where types are enforced. You have to be more precise, but you also get additional safety in return.
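A small sketch of what “explicit about when it gives up control” means in practice: in asyncio another task can only run at an await, so a read-modify-write with no await in between cannot be interleaved, while inserting an await between the read and the write can lose updates. With preemptive threads no such visible marker exists. Counter and the two increment functions are hypothetical.

```python
import asyncio


class Counter:
    def __init__(self) -> None:
        self.value = 0


async def increment_atomic(counter: Counter) -> None:
    # No await between read and write: no other task can run in between.
    current = counter.value
    counter.value = current + 1


async def increment_with_yield(counter: Counter) -> None:
    current = counter.value
    await asyncio.sleep(0)  # explicit point where other tasks may run
    counter.value = current + 1  # may overwrite another task's update


async def run(increment, n: int = 100) -> int:
    counter = Counter()
    await asyncio.gather(*(increment(counter) for _ in range(n)))
    return counter.value


if __name__ == "__main__":
    print(asyncio.run(run(increment_atomic)))      # 100
    print(asyncio.run(run(increment_with_yield)))  # lost updates: fewer than 100
```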