Hello,
The gist: for performance reasons, I’m trying to have the ORM generate SQL a particular way, and I’m stuck at how to make an inner join on 2 values provided by a values_list.
Abbreviated models with only relevant fields and shorter names:
class GenericContent(models.Model):
consumer_type = models.ForeignKey(ContentType, on_delete=models.CASCADE, limit_choices_to=allowed_consumers)
consumer_id = models.PositiveIntegerField(db_index=True)
consumer = GenericForeignKey('consumer_type', 'consumer_id')
class SpecificContent(GenericContent):
picture = models.Charfield(max_length=200)
video = models.Charfield(max_length=200)
sound = models.Charfield(max_length=200)
class R(models.Model):
pass
class W(models.Model):
pass
class C(models.Model):
pass
class O(models.Model):
pass
class G(models.Model):
pass
R, W, C, O and G are consumers of GenericContent. There are other types of content, but only SpecificContent is used here. My production database have about 10 million rows of Content. It’s a Standard-2 on Postgres.
I’m trying to get all Contents that match a list of consumers (there is a filter per type) by using a subquery that outputs all consumer_type_id and consumer_id couples, then do an inner join of that.
Here’s the SQL I want to generate, that I’ve tested meet my performance goal (there are other ways to write it, but other ways I’ve found are too slow):
SELECT specificcontent.picture, specificcontent.video, specificcontent.sound
FROM specificcontent WHERE specificcontent.genericcontent_ptr_id IN
(
SELECT genericcontent.id FROM genericcontent
INNER JOIN (
SELECT R."id" as "consumer_id", 28 as "consumer_type_id" FROM R WHERE [some business-specific filters here]
UNION
SELECT W."id" as "consumer_id", 92 as "consumer_type_id" FROM W WHERE [some business-specific filters here]
UNION
SELECT C."id" as "consumer_id", 17 as "consumer_type_id" FROM C WHERE [some business-specific filters here]
UNION
SELECT O."id" as "consumer_id", 11 as "consumer_type_id" FROM O WHERE [some business-specific filters here]
UNION
SELECT G."id" as "consumer_id", 90 as "consumer_type_id" FROM G WHERE [some business-specific filters here]
) AS consumers
ON genericcontent.consumer_id = consumers.consumer_id AND genericcontent.consumer_type_id = consumers.consumer_type_id
);
The building of the consumers
subquery is crucial to performance. Using a mixture of (genericcontent.consumer_id = R.consumer_id AND genericcontent.consumer_type_id = ContentType.objects.get_for_model(R).id) OR [same for other types
is not performant enough (DB query ends up taking more than 30s, which is Heroku timeout limit).
So far, I’ve managed to build the consumers
subquery using annotate and values_list:
consumers = R.objects.filter([business logic goes here]).annotate(consumer_id=F('id'), consumer_type_id=Value(ContentType.objects.get_for_model(R).id)).values_list('consumer_id', 'consumer_type_id').union(
W.objects.filter([business logic goes here]).annotate(consumer_id=F('id'), consumer_type_id=Value(ContentType.objects.get_for_model(R).id)).values_list('consumer_id', 'consumer_type_id')
).union([same for other models])
Now what I can’t figure out: how can I use the consumers
as a subquery to do the INNER JOIN with Content? Is it even possible without using raw SQL?
It would be very beneficial to me to use the ORM, as I have multiple similar queries that would benefit from that speed-up.
Note: in a previous version (with performance issues) I was using this code to filter:
filters = {
ContentType.objects.get_for_model(R).id: Q(genericcontent_ptr__consumer_id__in=R.objects.filter([business logic goes here]))
ContentType.objects.get_for_model(W).id: Q(genericcontent_ptr__consumer_id__in=W.objects.filter([business logic goes here]))
[...]
}
SpecificContent.objects.annotate(
consumer_filter=NamedFilteredRelation(
'genericcontent_ptr',
condition=Q(reduce(operator.or_, (Q(queryfilter & Q(genericcontent__consumer_type=content_type_id)) for content_type_id, queryfilter in filters.items()))))
).filter(consumer_filter=F('id'))
But I’m at a loss on how to adapt that with the consumers
subquery.