Is it possible to parallelize prefetch_related?

I have a number of models (>10) that have a foreign key to MyCentralModel. When prefetching, the related tables get queried sequentially, and the time adds up. Is there a way to query all the related tables at the same time?
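Roughly, the query looks something like this (the related names below are placeholders; in reality there are more than ten related models):

# Each Related* model has a ForeignKey to MyCentralModel; every lookup passed
# to prefetch_related() becomes its own query, and the queries run one by one.
qs = MyCentralModel.objects.prefetch_related(
    "relateda_set",
    "relatedb_set",
    "relatedc_set",
    # ...and so on for the remaining related models
)
objects = list(qs)  # 1 main query + N sequential prefetch queries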

Is there a way to query all the related tables at the same time?

The short answer is that there is no way to have QuerySet.prefetch_related perform concurrent queries to parallelize object prefetching. In fact, as of today, even the async part of the ORM performs prefetching serially, just in a non-blocking manner.

The reason why this isn’t implemented is threefold. First, prefetch_related doesn’t have logic to parallelize its lookups while respecting the order of retrieval.

For example, if you do something like

Book.objects.prefetch_related("authors__location", "pages")

The fetching of authors and pages can be parallelized, but the fetching of __location depends on the completion of authors. A solution that wants to concurrently fetch sets of objects in the right order would have to account for that.

Secondly, Django ties database connections to threads, so assuming you wanted to use a thread pool you’d have to do a bit of connection management.
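As a rough illustration of what that management entails (a sketch, not an established recipe; the Author and Page models are placeholders standing in for the relations above): each worker thread implicitly opens its own connection, and nothing closes it for you when the work is done.

from concurrent.futures import ThreadPoolExecutor

from django.db import connections

def evaluate_in_thread(queryset):
    try:
        # Evaluating the queryset opens a connection owned by this thread.
        return list(queryset)
    finally:
        # Django normally closes connections at the end of a request cycle,
        # so a pooled worker thread has to clean up after itself.
        connections.close_all()

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(evaluate_in_thread, qs)
        for qs in (Author.objects.all(), Page.objects.all())
    ]
    results = [future.result() for future in futures]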

Lastly, assuming you have solved the parallelization problem and figured out an efficient way to manage connections in your concurrency solution of choice, there remains the problem of manipulating ORM objects. In other words, models.Model instances were not designed or tested to be thread safe, which means that all of the prefetched data re-assignment would have to happen serially in a single thread to avoid running into subtle issues.

In theory you might be able to build such a solution around the lower level prefetch_related_objects function that prefetch_related relies on, but your mileage may vary.

The parallelization of work could be done by a caller that splits the call to prefetch_related_objects (e.g. one call for "authors__location" and a concurrent one for "pages"), but the problem is that prefetch_related_objects does both the query fetching and the attribute assignment, and you might run into problems if you have two threads altering the same model instance (e.g. trying to assign authors and pages to the same book._prefetched_objects_cache at the same time).
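To make the hazard concrete, here is a minimal sketch of that split, reusing the hypothetical Book model from the example above. It shows where things go wrong rather than a pattern to copy, because both workers write prefetched results onto the same Book instances:

from concurrent.futures import ThreadPoolExecutor

from django.db import connections
from django.db.models import prefetch_related_objects

def prefetch_in_thread(instances, lookup):
    try:
        # Fetches the related objects on this thread's own connection, then
        # assigns them onto the shared instances (the thread-unsafe part).
        prefetch_related_objects(instances, lookup)
    finally:
        connections.close_all()

books = list(Book.objects.all())
with ThreadPoolExecutor(max_workers=2) as pool:
    # "authors__location" and "pages" are independent of each other, so their
    # queries could in principle overlap; the assignments, however, race on
    # each book's _prefetched_objects_cache.
    futures = [
        pool.submit(prefetch_in_thread, books, "authors__location"),
        pool.submit(prefetch_in_thread, books, "pages"),
    ]
    for future in futures:
        future.result()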


Wouldn’t the thread safety problems be avoided by using cooperative multitasking?
When making queries to the DB, await them and let the thread work on the next db request. When the response comes back, handle it on the main thread. You’ll still load objects into memory sequentially, but the db requests will be resolved in parallel.

We’d still need to solve connection management - having connection pools available for prefetches.
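A sketch of that idea (assuming Django 4.1+ async queryset iteration; the related models and field name are placeholders): each related table gets its own coroutine, and asyncio.gather awaits them together. Whether the queries actually overlap at the database still depends on each task getting its own connection.

import asyncio

async def fetch_all(queryset):
    # Force evaluation of one related queryset without blocking the event loop.
    return [obj async for obj in queryset]

async def prefetch_related_tables(central_ids):
    # One coroutine per related table; gather() lets the awaits interleave.
    related_a, related_b = await asyncio.gather(
        fetch_all(RelatedA.objects.filter(central__in=central_ids)),
        fetch_all(RelatedB.objects.filter(central__in=central_ids)),
    )
    return related_a, related_b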

I suggest you give it a shot and see how far you can take it. From my understanding it is far from trivial to get it working even in an async context where you can delegate to an event loop.

AFAIK even async connections cannot execute multiple queries concurrently:

When an asynchronous query is being executed, connection.isexecuting() returns True. Two cursors can’t execute concurrent queries on the same asynchronous connection.

So even in an async world you need a form of connection pooling to get it working.
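Outside the ORM, a plain psycopg 3 sketch of what that pooling buys (the connection string, table names, and pool settings are made up): each query checks out its own connection from the pool, so the two SELECTs can be in flight at the same time.

import asyncio

from psycopg_pool import AsyncConnectionPool

async def fetch(pool, query):
    # Each call checks a separate connection out of the pool.
    async with pool.connection() as conn:
        cursor = await conn.execute(query)
        return await cursor.fetchall()

async def main():
    pool = AsyncConnectionPool("dbname=example", min_size=2, open=False)
    await pool.open()
    try:
        # Two connections, so the queries can genuinely run concurrently.
        authors, pages = await asyncio.gather(
            fetch(pool, "SELECT * FROM author"),
            fetch(pool, "SELECT * FROM page"),
        )
    finally:
        await pool.close()

asyncio.run(main())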