Hi everyone, I hope to get some help to translate a query into Django ORM. I tried but failed to come up with a ORM-only, fast query.
Here’s a simplified version of the problem I am facing.
I have 4 entities.
class Collection(models.Model):
items = models.ManyToManyField(Item, related_name='collections')
class Item(models.Model):
pass
class Atom(models.Model):
items = models.ManyToManyField(to='core.Item', related_name='atoms')
class Status(model.Model):
atom = models.ForeignKey(to='core.Atom', related_name='statuses')
status = models.CharField(max_length=50, choices=(('done', 'done'), ('pending', 'pending'),))
user = models.CharField(max_length=50)
I would like to annotate each collection with the number of item that is done. An item is done if all its atoms are done, i.e. the statuses related to the atoms are ‘done’.
I can for example get this done in raw SQL using the following query that takes just a few milliseconds.
with items as (
select
ci.id,
count(ca.id) as atoms_total_count,
count(ca.id) filter(where ca.status is not null) as atoms_done_count
from core_item ci
join core_atom_items cai on cai.item_id = ci.id
join core_atom ca on ca.id = cai.atom_id
left join core_status cs on cs.atom_id = ca.id
where cs.user = '123' -- this is helping performance I believe.
group by ci.id
)
select
cc.*,
count(i.id) as items_total_count,
count(i.id) filter(where atoms_total_count = atoms_seen_count) as items_done_count,
from core_collection cc
join core_collection_items cci on cci.collection_id = cc.id
join items i on i.id = cci.item_id
group by cc.id
Here’s what I have tried so far on the ORM side:
items = Item.objects.filter(collections=OuterRef("pk")).order_by()
items = items.values(
"collections",
atoms_total_count=Count('atoms', filter=Q(atoms__statuses_user='123')),
atoms_done_count=Count(
'atoms',
# Here the filter condition will have to scan the entire cartesian product of all tables?
filter=Q(
atoms__statuses_user='123',
atoms__statuses_status='done'
)
)
)
items_done = items_qs.filter(atoms_total_count=F('atoms_done_count')).values("pk")
queryset = Collection.objects.annotate(
items_total_count=Count('items', distinct=True),
items_done=Count(
'items',
filter=Q(items__in=items_done)
distinct=True
)
)
Unfortunately,
- I am not sure this is the right way to proceed with the ORM.
- this is extremely slow. I believe this is due to the fact that the where clause in the with clause is not generated by Django’s ORM, hence the DB has to scan all collections x items x atoms x statuses.
I am afraid this is a limitation of Django’s ORM. However, not being very familiar with it I hope there might be a way around it.