Hello everybody,
I’m working on a big Django DRF application.
I have certain views that call external APIs a lot, so it would be good to make them async.
Now I’m a little uncertain what the best way to do this is. I think using plain Django async views is not the best approach, because I would lose all the automatic permission handling, authentication and so on. Although, as I’m writing this, I’m starting to think that might not be so much to reimplement, and it may be the best approach after all?
My other idea was to use the third party package ADRF. Does anybody have experience with this in production?
I would also need to use sync_to_async a lot here so I can make queries on the database, which would spawn a lot of threads. But since this is the recommended way anyway for normal Django views, I think this is not such a big issue.
Of course I’m already using Celery for long-running tasks. But some tasks require an immediate response to the user, so I can’t use that here.
Or would it be better to use FastAPI or Django Ninja in these cases?
If you have any questions, let me know.
I would appreciate any help from you; thank you in advance.
Simon
I’d wrap the API calls in a single coroutine, which can run them in parallel, and then use async_to_sync to run that from your sync view, if you want to embed this in a sync DRF view.
ADRF seems popular, so that’s worth looking at too.
Yes, I hadn’t thought about that yet, but I would prefer the view to be async, so that other requests can be handled in the meantime.
Since I already use Uvicorn workers and have it running as an ASGI app, I will keep the API calls in the main thread with aiohttp, so I think that will work.
My main concern is that sync_to_async will spawn so many threads that they become the bottleneck. But it sounds like this is unlikely?
If your code is mostly sync and uses requests, you can get concurrency by making the requests in a ThreadPoolExecutor. Python releases the GIL while waiting on I/O, so the requests will run concurrently. Then you can continue to use WSGI and normal DRF.
Are you sure? My code after the API calls needs the data I get back from them, so I have to wait for them. Therefore this would have to be async, so I would have to use ASGI anyway, or am I missing something?
It’s one of my favourite patterns to use an async view to make external API calls in an otherwise sync (WSGI) application. (Your WSGI server will be running multiple workers, and will continue to serve other requests.)
From what you’ve posted, you can’t quite do that with DRF, because you still need a sync view, but you can wrap your fetches in async_to_sync, and they can all be dispatched in parallel with asyncio.gather().
I’d suggest prototyping it and seeing how it goes. You may be pleasantly surprised by how easily and how well it works.
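Roughly like this (an untested sketch; the URLs and the view are placeholders for your own):

import asyncio

import aiohttp
from asgiref.sync import async_to_sync
from rest_framework.response import Response
from rest_framework.views import APIView


async def fetch_json(session, url):
    # One external call; awaiting here yields control to the event loop.
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.json()


async def fetch_all(urls):
    # A single coroutine that dispatches all the calls in parallel.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_json(session, url) for url in urls))


class AggregateView(APIView):
    # A normal sync DRF view: authentication and permissions work as usual.
    def get(self, request):
        urls = ["https://api.example.com/a", "https://api.example.com/b"]
        results = async_to_sync(fetch_all)(urls)
        return Response({"results": results})

async_to_sync spins up an event loop for the call when none is running, so this works fine under plain WSGI.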
Here’s a slightly modified example from the docs, specifically using requests and with more URLs to demonstrate concurrency.
# perfthreads.py
import concurrent.futures
import sys

import requests

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://google.com/',
        'http://www.bbc.co.uk/',
        'http://facebook.com/'] * 10

# Retrieve a single page
def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

def threads():
    # We can use a with statement to ensure threads are cleaned up promptly
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Start the load operations and mark each future with its URL
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # We only care that the request completed
                response = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))

def sync():
    # Fetch each URL one after another, for comparison
    for url in URLS:
        load_url(url, 60)

funcs = {"sync": sync, "threads": threads}

if __name__ == '__main__':
    # python perfthreads.py sync
    # python perfthreads.py threads
    print(sys.argv)
    funcs[sys.argv[1]]()
$ time poetry run python perfthreads.py sync
['perfthreads.py', 'sync']
poetry run python perfthreads.py sync 1.48s user 0.37s system 9% cpu 18.694 total
$ time poetry run python perfthreads.py threads
['perfthreads.py', 'threads']
poetry run python perfthreads.py threads 1.12s user 0.29s system 36% cpu 3.903 total
There are reasons to choose async over threads. I wanted to share this option because it might be low-hanging fruit to help with your performance.
Thank you for your suggestions, I really appreciate that!
They both point in the direction of not using async views themselves.
As far as I understand it, async views with Uvicorn, for example, would scale very well, because one worker could handle another request while awaiting an async API call, all of this still in the main event loop and not in a thread.
Since I have some requests which would be long-running (like 10 seconds), I would like to use this to free up the worker, so that it can work on other requests while waiting.
Therefore I think ADRF might be a good choice. I’m just concerned about too many threads being used, because all the ORM, serializer and permission work has to be wrapped in sync_to_async and is therefore done in a thread.
What do you think? Or is it better to just use polling for everything in Django?
Kind regards
Since I have some requests which would be long-running (like 10 seconds), I would like to use this to free up the worker, so that it can work on other requests while waiting.
Your understanding is correct. Neither the ThreadPoolExecutor nor async_to_sync helps when one worker is blocked on a 10-second response and you’re trying to optimize overall system requests-per-second throughput; ASGI can help with that.
Generally speaking, though, I try to avoid long HTTP responses, as they are likely to time out at a load balancer, or even at gunicorn/uvicorn in front of your view, anyway (reasonable timeouts are a good thing). A Python application will usually push heavy work to a message broker to be handled in a background task, using a lib like Celery. This adds complexity to the architecture, but it’s often necessary. Whether you need to introduce a background-task architecture, or whether ASGI will be good enough, depends on your requirements and on what you’re actually processing in your view.
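The usual shape is something like this (a sketch only; the task and view names are made up):

from celery import shared_task
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView


@shared_task
def process_heavy_request(payload):
    # The long-running work happens in a Celery worker, not in the request cycle.
    ...


class SubmitView(APIView):
    def post(self, request):
        result = process_heavy_request.delay(request.data)
        # Respond immediately; the client checks back (or is notified) later.
        return Response({"task_id": result.id}, status=status.HTTP_202_ACCEPTED)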
Yes, I have Celery in place for long-running background tasks, but there are some where I want to give the user immediate feedback rather than going through polling.
Since I already use Channels in my Django application, I’m running it with Uvicorn workers anyway. I just learned that under ASGI, sync views are executed in a thread pool anyway (I hope I understand that right). In that case my main concern would not matter anymore. My main concern is that all the sync_to_async executions for the ORM, serializers, permissions and so on would overload the thread pool. But if the thread pool already handles normal sync requests, I don’t think this will make a huge difference in performance.
What do you think?
Thank you for the conversation so far!
I think you’ll be ok using the sync_to_async functions.
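For example, something like this (an untested sketch: Item, myapp and the URL are stand-ins; it uses a plain Django async view, but the same sync_to_async idea applies inside an ADRF view):

import aiohttp
from asgiref.sync import sync_to_async
from django.http import JsonResponse

from myapp.models import Item  # hypothetical app and model


@sync_to_async
def load_items():
    # ORM access runs in a worker thread; evaluate the queryset inside it.
    return list(Item.objects.all()[:10])


async def items_view(request):
    items = await load_items()
    async with aiohttp.ClientSession() as session:
        # The external call stays on the event loop, so the worker can
        # serve other requests while this is in flight.
        async with session.get("https://api.example.com/enrich") as resp:
            extra = await resp.json()
    return JsonResponse({"count": len(items), "extra": extra})

Note that Django 4.1+ also has native async ORM methods like aget() and aiterator(), which avoid some of the thread hops.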