I think I need to use a union, because both subqueries may return the same parts (a part may be used in an engine component and a transmission component), but I only want to count each part once. I feel that I need to return several rows from the subqueries, so that I can achieve uniqueness in the union.
I have tried many iterations of this query, and I keep running into this error:
django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression
Is there a work around for this? Is there a better way to achieve the annotation?
I am assuming that you want to count the number of unique, active parts associated with each machine, whether those parts are connected to the engine or the transmission (or both).
If so, that could be done with a single query like this:
from django.db.models import Count, Subquery, OuterRef, Q
def annotate_machine_query_with_num_parts(machine_query):
from part.models import Part # Import the Part model
parts_subquery = Part.objects.filter(
Q(components__machine_engine=OuterRef('pk')) |
Q(components__machine_transmission=OuterRef('pk')),
is_active=True
).distinct().values('pk')
return machine_query.annotate(
num_parts=Subquery(
parts_subquery.annotate(
count=Count('pk')
).values('count')[:1]
)
)
I am assuming that you want to count the number of unique, active parts associated with each machine, whether those parts are connected to the engine or the transmission (or both).
Yes, this is exactly correct.
I am trying to understand your provided code. It looks like the Count is supposed to annotate every row with the total count value, and then the [:1] is supposed to take the first value of the annotated count column, (which should be a column of all the same total count value).
When I try implementing your proposed solution, I get a count of 1 for all my test cases.
Here is an example test case where I would expect 2 parts to be counted and annotated, and I am now getting 1 using your solution.
I can confirm that this part of the code is returning 2 rows in my test case. I think there might be a problem with how the count is being done.
parts_subquery = Part.objects.filter(
Q(components__machine_engine=r1.pk) |
Q(components__machine_transmission=r1.pk),
is_active=True
).distinct().values('pk')
# returns a queryset of 2 pks during the above test case.
Please use this code and post the output… I have added some comments and explanations:
from django.db.models import Count, Subquery, OuterRef, Q
def annotate_machine_query_with_num_parts(machine_query):
from part.models import Part
parts_subquery = Part.objects.filter(
Q(components__machine_engine=OuterRef('pk')) |
Q(components__machine_transmission=OuterRef('pk')),
is_active=True
).distinct().values('pk')
return machine_query.annotate(
num_parts=Subquery(
parts_subquery.annotate(
count=Count('pk')
# This transforms the subquery from returning Part objects
#to returning just the count values.
).values('count'),
#This specifies that the output of the subquery should be an integer.
output_field=models.IntegerField()
)
)
When I use the most recent query you provided, I am getting django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression
This is frustrating because the subquery is returning the correct Parts, but I can’t seem to count them correctly.
from django.db.models import Count, Subquery, OuterRef, Q, IntegerField
from django.db.models.functions import Coalesce
def annotate_machine_query_with_num_parts(machine_query):
from part.models import Part
parts_count_subquery = Part.objects.filter(
Q(components__machine_engine=OuterRef('pk')) |
Q(components__machine_transmission=OuterRef('pk')),
is_active=True
).distinct().values('components__machine_engine').annotate(
count=Count('pk')
).values('count')
return machine_query.annotate(
#Coalesce function ensures we always get a single value, even if the subquery returns NULL
num_parts=Coalesce(Subquery(parts_count_subquery, output_field=IntegerField()), 0)
)
To help debug print the SQL generated by this query:
Using the latest code you provided, this is the SQL generated.
SELECT
"machine""."id",
-- Other machine fields
COALESCE((
SELECT
DISTINCT COUNT(U0."id") AS "count"
FROM
"part" U0
INNER JOIN "part_components" U1 ON
(U0."id" = U1."part_id")
INNER JOIN "component" U2 ON
(U1."component_id" = U2."id")
WHERE
((U2."machine_engine_id" = "machine"."id"
OR U2."machine_transmission_id" = "machine"."id")
AND U0."is_active")
GROUP BY
U2."machine_enging_id"),
0) AS "num_products"
FROM
"machine";
When I run my test case with this code, r1 is annotated as having num_parts=3. This probably means that ps1 is being double counted.
It appears that ps1 is indeed being double-counted. This is likely because it’s associated with both f1 and f2 components, and the current query isn’t properly handling this case.
Try this:
from django.db.models import Count, Subquery, OuterRef, Q, IntegerField
from django.db.models.functions import Coalesce
def annotate_machine_query_with_num_parts(machine_query):
from part.models import Part
parts_count_subquery = Part.objects.filter(
Q(components__machine_engine=OuterRef('pk')) |
Q(components__machine_transmission=OuterRef('pk')),
is_active=True
).distinct().annotate(
dummy=Count('pk') # This is just to create a group by
).values('pk').annotate(
count=Count('pk')
).values('count')
return machine_query.annotate(
num_parts=Coalesce(
Subquery(
parts_count_subquery.order_by().aggregate(total=Count('count'))['total'],
output_field=IntegerField()
),
0
)
)
# machine.py
class Machine(models.Model):
id = HashidSaltedAutoField(primary_key=True)
Generated SQL using latest iteration of the query:
SELECT
"machine"."id",
COALESCE((
SELECT
DISTINCT COUNT(U0."id") AS "count"
FROM
"part" U0
INNER JOIN "part_components" U1 ON
(U0."id" = U1."part_id")
INNER JOIN "component" U2 ON
(U1."component_id" = U2."id")
WHERE
((U2."machine_transmisssion_id" = "machine"."id"
OR U2."machine_engine_id" = "machine"."id")
AND U0."is_active")
GROUP BY
U0."id"
LIMIT 1),
0) AS "num_parts"
FROM
"machine"
I replaced the values () with .values('machine_id') because now we’re grouping by this new machine_id field, which correctly represents all machines, whether they use the part in the engine or the transmission.