Hi!
I have this two models:
class Creditor(models.Model):
name = models.CharField(max_length=300, unique=True)
class Order(BaseModel):
creditor = models.ForeignKey(
Creditor,
on_delete=models.CASCADE,
related_name="orders"
)
value = models.FloatField()
I would like to annotate to each Order, the creditor_size (sum of all order values for each Creditor).
This could be done pretty straightforward with this query:
creditor_size_subquery = Subquery(
models.Creditor.objects
.filter(pk=OuterRef("creditor"))
.annotate(size=Coalesce(Sum("orders__value"), 0.0, output_field=FloatField()))
.values("size")
)
models.Order.objects.select_related("creditor").all().annotate(creditor_size=creditor_size)
Problem is, for only 37000 orders rows, this query takes 3 minutes. That’s because of this,
the raw sql:
SELECT "order"."id",
(SELECT COALESCE(SUM(U1."value"), 0.0) AS "size"
FROM "creditor" U0
LEFT OUTER JOIN "order" U1 ON (U0."id" = U1."creditor_id")
WHERE U0."id" = "order"."creditor_id"
GROUP BY U0."id") AS "creditor_size"
FROM "order"
LEFT OUTER JOIN "creditor" ON ("order"."creditor_id" = "creditor"."id")
WHERE "order"."hidden_at" IS NULL
ORDER BY "order"."created_at" DESC
The query above contains a subquery that is executed once per row, which is slowing the query, if i have 37000 orders, the subquery will be executed 37000 times.
The solution is to calculate this query once and append the results as a join, the query bellow will run in less than 1 second:
SELECT o.id, o.creditor_id, subquery.creditor_size
FROM order as o
LEFT JOIN
(
SELECT c.id, SUM(c.value) as creditor_size
from creditor c
left join agrometrics_cpr c ON (c.creditor_id = b.id)
GROUP BY b.id
) subquery ON (subquery.id = o.creditor_id)
Is it possible to achieve this query with django ORM? Basically join a subquery and use values from this subquery.