I’ve been focusing on performance improvements to our ListView pages over the past month or so, and I’ve made great strides while doing some rapid prototyping.
I wasn’t the original developer for these pages, and they did not implement server-side pagination. Our samples page could take nearly 3 minutes to load. Now a page load with a reasonable number of results seems nearly instantaneous, and a 1000-row page takes 15s, which I’m satisfied with, though I might try to tweak it.
I’m of course using `prefetch_related` for all of the related tables, and I’m thinking about adding usage of `only` and `defer`.
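For context, the general shape of what I mean is below (a minimal sketch only; the import path and the specific field names like "name" and "description" are placeholders, not our real schema):

```python
from django.db.models import Prefetch

from myapp.models import Sample, Study  # import path illustrative

# Prefetch the related rows, but restrict the columns pulled for them.
queryset = (
    Sample.objects
    .select_related("animal")  # forward FK: fetched in the same query
    .prefetch_related(
        Prefetch(
            "animal__studies",  # M2M reached through the FK
            queryset=Study.objects.only("id", "name"),  # placeholder fields
        )
    )
    .defer("description")  # placeholder for wide columns the page never renders
)
```

The goal is just to keep the related-table lookups batched while trimming what each prefetch pulls back.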
However, there’s one page that I have tried to focus on and can’t seem to get that last bit of performance out of, and that’s our `ArchiveFileListView` page. It can take 3-6 seconds per 10-row page and 15-20s for a 1000-row page. So the 1000-row page is on par with the samples page, but it irks me that the 10-row page takes, on average, 4s to load. The reason for its slowness is the study column. If I eliminate that column, it’s nearly instantaneous. The `ArchiveFile` model is linked to from a few places, and those places are far from the `Study` model. The field paths are:
`peak_groups__msrun_sample__sample__animal__studies`
`mz_to_msrunsamples__sample__animal__studies`
`raw_to_msrunsamples__sample__animal__studies`
There are 2 many-related steps in each of these paths (only 1 is ever populated, BTW, so I use a `Case`/`When` strategy based on file type [after playing around with `Coalesce`]; a rough sketch of what I mean follows below):
`ArchiveFile.peak_groups` (`ArchiveFile`:`PeakGroup`), `ArchiveFile.raw_to_msrunsamples` (`ArchiveFile`:`MSRunSample`), and `ArchiveFile.mz_to_msrunsamples` (`ArchiveFile`:`MSRunSample`) are all reverse relations, essentially one-to-many from the perspective of `ArchiveFile`. `Animal.studies` is a many-to-many relationship.
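The `Case`/`When` I mentioned looks roughly like this (a sketch only; the discriminator field `data_type`, its values, and the `name` field on `Study` are placeholders, and the to-many joins obviously fan the rows out):

```python
from django.db.models import Case, CharField, F, When

from myapp.models import ArchiveFile  # import path illustrative

# Pick the one populated path per file type and pull the study name through it.
study_name = Case(
    When(data_type="mzxml", then=F("mz_to_msrunsamples__sample__animal__studies__name")),
    When(data_type="raw", then=F("raw_to_msrunsamples__sample__animal__studies__name")),
    default=F("peak_groups__msrun_sample__sample__animal__studies__name"),
    output_field=CharField(),
)
queryset = ArchiveFile.objects.annotate(study_name=study_name)
```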
I had initially explored using `Prefetch`’s `to_attr` argument to hold the unique `Study` objects, but since there’s another many-related model on the path before it, I couldn’t get that to work, so I overrode `paginate_queryset` to set the attribute on the `ArchiveFile` object by iterating through the page’s worth of results. That trick got me from 1m down to the 15-20s timing for 1000 rows.
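A simplified version of that override is below (the attribute name and import path are made up, and it naively walks all three paths rather than only the `Case`/`When`-selected one, so treat it as a sketch):

```python
from django.views.generic import ListView

from myapp.models import ArchiveFile  # import path illustrative


class ArchiveFileListView(ListView):
    model = ArchiveFile
    paginate_by = 10

    def paginate_queryset(self, queryset, page_size):
        # Let ListView paginate as usual first...
        paginator, page, object_list, is_paginated = super().paginate_queryset(
            queryset, page_size
        )
        # ...then resolve the unique studies for only this page's worth of rows.
        for af in object_list:
            studies = set()
            for pg in af.peak_groups.all():
                studies.update(pg.msrun_sample.sample.animal.studies.all())
            for msrs in af.mz_to_msrunsamples.all():
                studies.update(msrs.sample.animal.studies.all())
            for msrs in af.raw_to_msrunsamples.all():
                studies.update(msrs.sample.animal.studies.all())
            af.studies_list = sorted(studies, key=lambda s: s.pk)  # attr name made up
        return paginator, page, object_list, is_paginated
```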
There are lots of `PeakGroup` and `MSRunSample` records that all link to the same `ArchiveFile` records, but they all link to the same `Animal`, and there are usually only 1 or 2 studies that an animal belongs to.
Each row on the `ArchiveFile` page should be a unique `ArchiveFile` record, and the `Study` column should be a delimited list of unique (linked) study records.
Should I just maintain a “studies” many-to-many link from `ArchiveFile` to `Study`? I try to avoid redundant links, but I’m getting the feeling that either that or caching will be necessary, just due to the data relationships.
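If I did go that route, it would just be something like this (sketch; the `related_name` is made up, and the link would have to be populated and kept in sync wherever `PeakGroup`/`MSRunSample` records are loaded):

```python
from django.db import models


class ArchiveFile(models.Model):
    # ... existing fields ...

    # Denormalized shortcut straight to Study, maintained at load time.
    studies = models.ManyToManyField(
        "Study", related_name="archive_files", blank=True
    )
```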
Extra info:
I have not yet investigated how many queries are being executed. I plan to install the Django Debug Toolbar for the first time this week and see if I can discern where the bottlenecks are.
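The setup I’m planning is just the standard bits from the django-debug-toolbar docs (untested on my end yet):

```python
# settings.py (development only)
INSTALLED_APPS += ["debug_toolbar"]
MIDDLEWARE = ["debug_toolbar.middleware.DebugToolbarMiddleware"] + MIDDLEWARE
INTERNAL_IPS = ["127.0.0.1"]

# urls.py
from django.urls import include, path

urlpatterns += [path("__debug__/", include("debug_toolbar.urls"))]
```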
I also have a TSV iterator for exporting the table, and it runs satisfyingly fast on the entire model. In fact, that is the strategy I used when refactoring the view/template. It wasn’t storing objects; it was just getting fields (no keys).
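That export is basically doing this kind of thing (sketch; the field names are placeholders):

```python
from myapp.models import ArchiveFile  # import path illustrative

# values_list() never builds model objects; it just streams tuples of the
# columns being written, which is why the full-table export stays fast.
def archive_file_tsv_rows():
    rows = ArchiveFile.objects.values_list(
        "filename",
        "peak_groups__msrun_sample__sample__animal__studies__name",
    ).iterator()
    for row in rows:
        yield "\t".join("" if value is None else str(value) for value in row) + "\n"
```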