Parallelism across the same django DB connection?

Right now my understanding is that the ORM serializes async DB requests through sync_to_async (is this understanding correct?). My other understanding is that even if we use async-capable connection objects through psycopg3, we don’t automatically resolve this issue. After all, we have one connection to a DB, so it’s not like we can just start running queries in parallel on that one connection, right?

Have there been any ideas on enabling concurrent queries through the same “connection”, without asking users to manually load balance in their code?

For example, would we ever want the following to use two connections in parallel? (Let’s assume the action is in fact heavy enough for this to be a good idea):

await asyncio.gather(MyModel.objects.aget(pk=1), MyModel.objects.aget(pk=2))

Transactions are their own thing, of course.

My idea would be that a Django “connection” could in fact be up to N underlying connections, with queries distributed across them. N would default to 1; if you know what you’re doing you could raise it, and maybe a context manager could raise it temporarily.
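To make that concrete, here is a stdlib-only toy of what I’m imagining (every name here is hypothetical, nothing is real Django API): one wrapper object that holds N underlying connections and hands out queries round-robin.

```python
import asyncio
import itertools

# Hypothetical sketch: a "connection" that is really up to N underlying
# connections, with queries distributed round-robin across them.
class MultiConnection:
    def __init__(self, connect, n=1):
        # `connect` is a factory returning one underlying connection.
        self._conns = [connect() for _ in range(n)]
        self._rr = itertools.cycle(self._conns)

    async def run(self, query):
        # Pick the next underlying connection and run the query on it.
        conn = next(self._rr)
        return await conn.execute(query)

# Fake connection, just to show which one each query lands on.
class FakeConn:
    _ids = itertools.count()

    def __init__(self):
        self.id = next(FakeConn._ids)

    async def execute(self, query):
        await asyncio.sleep(0)  # pretend I/O
        return (self.id, query)

async def main():
    mc = MultiConnection(FakeConn, n=2)
    return await asyncio.gather(mc.run("q1"), mc.run("q2"), mc.run("q3"))

results = asyncio.run(main())
print(results)  # queries alternate across connections 0 and 1
```

A real version would obviously need to pin a task to one connection inside a transaction, which is the hard part I’m waving away here.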

There is a lot of chatter about support for async-capable libraries, but now that that is available, what is the next step? Or is it all moot anyway because, for the most part, everything goes to the same DB, which has to manage all the querying?

EDIT: to add some context, this is downstream of me working on a Django project that has the async ORM all set up but is still pretty heavily limited by the serialization of the DB requests themselves.

That’s correct.

I think a better way of saying it is that Django has no ability (yet) to use async-native connections. And if it did, it would come down to how the underlying async-native library (psycopg3 in this case) implements this.

Per the psycopg3 docs:

An AsyncConnection can be used by several asyncio.Task at the same time. However, as with threads, all the AsyncCursor on the same connection will share the same session and will have their access to the connection serialized.

(Concurrent operations - psycopg 3.2.2.dev1 documentation)
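The practical effect of that documented behavior can be simulated with plain asyncio (this is not psycopg code, just a stdlib model where a single lock stands in for the one shared connection):

```python
import asyncio
import time

# Stdlib-only simulation of the psycopg docs' claim: all cursors on one
# AsyncConnection share the connection, so their access is serialized.
# The asyncio.Lock stands in for the single connection "wire".
class SimulatedConnection:
    def __init__(self):
        self._lock = asyncio.Lock()

    async def execute(self, seconds):
        async with self._lock:       # only one query in flight at a time
            await asyncio.sleep(seconds)

async def main():
    conn = SimulatedConnection()
    start = time.monotonic()
    # Two concurrent tasks, one connection: ~0.2s total, not ~0.1s.
    await asyncio.gather(conn.execute(0.1), conn.execute(0.1))
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"two 0.1s queries on one connection took {elapsed:.2f}s")
```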

So yes I think you are correct! Only one connection to the database, no parallelism :frowning:

This would come down to database-backend support and the way Django implements it. Since Django doesn’t yet support async database backends at all, we’re a fair way from this being a reality, and it’s hard to see that far ahead to say whether we could multiplex queries over the same connection.

Connection pools are available as of Django 5.1, which might be close to what you’re asking for here:

I’m not sure exactly how the feature works or what its performance characteristics are, but you might want to investigate it.

My understanding is that connection pools are more about handing each request a connection a bit more quickly, but I should probably dive into the get_new_connection logic in the DB backends.

From your replies, though, it does feel like we haven’t really gotten far on ideas for taking advantage of async in a way that avoids serializing requests. Like with everything else, I guess the first step is just getting async DB backends to be a workable thing.

“Multiplex” is an interesting word here; it feels like some form of multiplexing support might be possible even without async connection support (idea: open up to N connections, each on its own Task, and distribute multiplexed operations across them as they come in).
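A rough sketch of that shape (all names hypothetical): N worker tasks, each of which would own one dedicated connection, pulling operations off a shared queue, so callers never touch a connection directly. Here the "connections" are faked by echoing the worker id:

```python
import asyncio

# Hypothetical multiplexer: each worker task would own one connection;
# callers enqueue (query, future) pairs and await the future.
async def worker(worker_id, queue):
    while True:
        query, fut = await queue.get()
        await asyncio.sleep(0)  # stand-in for real per-query I/O
        # A real implementation would run `query` on this worker's
        # dedicated connection; we just report who handled it.
        fut.set_result((worker_id, query))
        queue.task_done()

async def main(n=2):
    queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(i, queue)) for i in range(n)]
    futures = []
    for q in ("q1", "q2", "q3", "q4"):
        fut = asyncio.get_running_loop().create_future()
        await queue.put((q, fut))
        futures.append(fut)
    results = await asyncio.gather(*futures)
    for w in workers:
        w.cancel()
    return results

results = asyncio.run(main())
print(results)
```

Callers get results back in the order they asked, while the work itself spreads across the workers.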

This would still require Django internals to be async-compatible, but if people are explicitly opting in to having multiple open connections… it might be doable.

Yep, I’d agree with that statement! First we need the functionality, then we can talk about new features.

I think there are a few open questions here; nothing is really settled, yeah.

I haven’t tried it, and I might be wrong (the asgiref locals stuff is complicated), but I think this should work: asyncio.gather will run each of these in its own asyncio task, and each task will get its own connection. You would want to use a connection pool for this, though (available for Postgres since Django 5.1), to avoid the connection setup overhead.

Try it and let us know :slight_smile:

Is that true? I thought the current implementation serializes all requests on a single thread.

I am fairly certain that this isn’t what happens. Instead Django, through the sync_to_async helpers, funnels all database requests onto a single thread. So while you have two concurrent calls here in theory, in practice all the I/O happens on the same thread.

You can see this just by looking at the aget implementation.

Correct you are: aget uses sync_to_async(thread_sensitive=True), which means the two calls serialize onto the same thread, though interestingly, they use separate connections because they run in separate tasks!
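The thread_sensitive=True effect can be illustrated without asgiref at all: route every "database call" through the same single-worker executor, the way asgiref keeps one dedicated thread for thread-sensitive sync code. This is an analogy, not asgiref’s actual implementation:

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# Illustration of the thread_sensitive=True effect: all blocking work is
# handed to one single worker thread, so concurrent coroutines still
# serialize their "database" calls.
sensitive_executor = ThreadPoolExecutor(max_workers=1)

def blocking_query(name):
    # Stand-in for the sync ORM call; records which thread ran it.
    return (name, threading.get_ident())

async def main():
    loop = asyncio.get_running_loop()
    # Two concurrent coroutines, but both land on the same worker thread.
    return await asyncio.gather(
        loop.run_in_executor(sensitive_executor, blocking_query, "pk=1"),
        loop.run_in_executor(sensitive_executor, blocking_query, "pk=2"),
    )

r1, r2 = asyncio.run(main())
print("same thread for both queries:", r1[1] == r2[1])
sensitive_executor.shutdown()
```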

Django DB async is in an awkward place. Async DB drivers will make this better.

I took a very different approach here: GitHub - iNishant/django-querysets-single-query-fetch, a utility that executes multiple Django querysets over a single network/query call and returns the results that normal evaluation of the querysets would have returned. It works only for Postgres + Django.


Welcome @iNishant !

I must say, that does look cool. (I don’t have anything needing it, but it’s definitely an interesting approach to this type of situation.)
