Subquery needs to return more than 1 row?

I am trying to annotate a machine query with a count of related parts.

# component.py
   machine_engine = models.ForeignKey(
        "Machine",
        null=True,
        blank=True,
        related_name="engine_components",
        on_delete=models.SET_NULL,
    )

    machine_transmission = models.ForeignKey(
        "Machine",
        null=True,
        blank=True,
        related_name="transmission_components",
        on_delete=models.SET_NULL,
    )

# part.py
    components = models.ManyToManyField("Component", related_name="parts")
def annotate_machine_query_with_num_parts(machine_query):
    engine_subquery = query.filter(
       engine_components__parts__is_active=True
    ).values('engine_components__parts__id').distinct()

    transmission_subquery = query.filter(
       transmission_components__parts__is_active=True
    ).values('transmission_components__parts__id').distinct()

    combined_subquery = engine_subquery.union(
        transmission_subquery,
    )

    return machine_query.annotate(
        num_parts=Count(
            Subquery(combined_subquery), distinct=True
        )
    )

I think I need to use a union, because both subqueries may return the same parts (a part may be used in an engine component and a transmission component), but I only want to count each part once. I feel that I need to return several rows from the subqueries, so that I can achieve uniqueness in the union.

I have tried many iterations of this query, and I keep running into this error:

django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression

Is there a work around for this? Is there a better way to achieve the annotation?

I am assuming that you want to count the number of unique, active parts associated with each machine, whether those parts are connected to the engine or the transmission (or both).

If so, that could be done with a single query like this:

from django.db.models import Count, Subquery, OuterRef, Q

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part  # Import the Part model

    parts_subquery = Part.objects.filter(
        Q(components__machine_engine=OuterRef('pk')) |
        Q(components__machine_transmission=OuterRef('pk')),
        is_active=True
    ).distinct().values('pk')

    return machine_query.annotate(
        num_parts=Subquery(
            parts_subquery.annotate(
                count=Count('pk')
            ).values('count')[:1]
        )
    )

Thanks for the response.

I am assuming that you want to count the number of unique, active parts associated with each machine, whether those parts are connected to the engine or the transmission (or both).

Yes, this is exactly correct.

I am trying to understand your provided code. It looks like the Count is supposed to annotate every row with the total count value, and then the [:1] is supposed to take the first value of the annotated count column, (which should be a column of all the same total count value).

When I try implementing your proposed solution, I get a count of 1 for all my test cases.

Here is an example test case where I would expect 2 parts to be counted and annotated, and I am now getting 1 using your solution.

        # |------------+---------------------+----------------+--------------+--------------|
        # | machine    | components          | parts          | expected     | output       |
        # |            |                     |                | num_parts    | num_parts    |
        # |------------+---------------------+----------------+--------------+--------------|
        # | r1         | f1                  | ps1            | 2            | 1            |
        # |            | f2                  | ps1            |              |              |
        # |            |                     | ps2            |              |              |
        # |------------+---------------------+----------------+--------------+--------------|

I can confirm that this part of the code is returning 2 rows in my test case. I think there might be a problem with how the count is being done.

    parts_subquery = Part.objects.filter(
        Q(components__machine_engine=r1.pk) |
        Q(components__machine_transmission=r1.pk),
        is_active=True
    ).distinct().values('pk')
    # returns a queryset of 2 pks during the above test case.

Hmm…

Please use this code and post the output… I have added some comments and explanations:

from django.db.models import Count, Subquery, OuterRef, Q

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part 
    
    parts_subquery = Part.objects.filter(
        Q(components__machine_engine=OuterRef('pk')) |
        Q(components__machine_transmission=OuterRef('pk')),
        is_active=True
    ).distinct().values('pk')
    
    return machine_query.annotate(
        num_parts=Subquery(
            parts_subquery.annotate(
                count=Count('pk')

               # This transforms the subquery from returning Part objects 
              #to returning just the count values.
            ).values('count'),

            #This specifies that the output of the subquery should be  an integer.
            output_field=models.IntegerField()
        )
    )

When I run that code for the given test case, parts_subquery is correctly a queryset of 2 rows with unique pks.

I am getting this value by running the subquery as its own query in the test code.

# `parts_subquery`
<QuerySet [{'pk': Hashid(782): 8d5e83a9b6a52d97993573eb484d2656}, {'pk': Hashid(783): edb6b438da4e65972e297d2b83592ae6}]>

But with this iteration of the query, I am getting the following error:

django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression

Sorry, I do not understand exactly what you mean here. Please try to describe the problem with more details.

I edited my previous comment slightly.

When I use the most recent query you provided, I am getting django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression

This is frustrating because the subquery is returning the correct Parts, but I can’t seem to count them correctly.

Try this:

from django.db.models import Count, Subquery, OuterRef, Q, IntegerField
from django.db.models.functions import Coalesce

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part 
    
    parts_count_subquery = Part.objects.filter(
        Q(components__machine_engine=OuterRef('pk')) |
        Q(components__machine_transmission=OuterRef('pk')),
        is_active=True
    ).distinct().values('components__machine_engine').annotate(
        count=Count('pk')
    ).values('count')

    return machine_query.annotate(
       #Coalesce function ensures we always get a single value, even if the subquery returns NULL
        num_parts=Coalesce(Subquery(parts_count_subquery, output_field=IntegerField()), 0)
    )

To help debug print the SQL generated by this query:

print(annotate_machine_query_with_num_parts(Machine.objects).query)

Using the latest code you provided, this is the SQL generated.

SELECT
	"machine""."id",
	-- Other machine fields
	COALESCE((
	SELECT
		DISTINCT COUNT(U0."id") AS "count"
	FROM
		"part" U0
	INNER JOIN "part_components" U1 ON
		(U0."id" = U1."part_id")
	INNER JOIN "component" U2 ON
		(U1."component_id" = U2."id")
	WHERE
		((U2."machine_engine_id" = "machine"."id"
			OR U2."machine_transmission_id" = "machine"."id")
		AND U0."is_active")
	GROUP BY
		U2."machine_enging_id"),
	0) AS "num_products"
FROM
	"machine";

When I run my test case with this code, r1 is annotated as having num_parts=3. This probably means that ps1 is being double counted.

Updated test results:

# |------------+---------------------+----------------+--------------+--------------|
# | machine    | components          | parts          | expected     | output       |
# |            |                     |                | num_parts    | num_parts    |
# |------------+---------------------+----------------+--------------+--------------|
# | r1         | f1                  | ps1            | 2            | 3            |
# |            | f2                  | ps1            |              |              |
# |            |                     | ps2            |              |              |
# |------------+---------------------+----------------+--------------+--------------|

It appears that ps1 is indeed being double-counted. This is likely because it’s associated with both f1 and f2 components, and the current query isn’t properly handling this case.

Try this:

from django.db.models import Count, Subquery, OuterRef, Q, IntegerField
from django.db.models.functions import Coalesce

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part 
    
    parts_count_subquery = Part.objects.filter(
        Q(components__machine_engine=OuterRef('pk')) |
        Q(components__machine_transmission=OuterRef('pk')),
        is_active=True
    ).distinct().annotate(
        dummy=Count('pk')  # This is just to create a group by
    ).values('pk').annotate(
        count=Count('pk')
    ).values('count')

    return machine_query.annotate(
        num_parts=Coalesce(
            Subquery(
                parts_count_subquery.order_by().aggregate(total=Count('count'))['total'],
                output_field=IntegerField()
            ),
            0
        )
    )

Thanks again for sticking with me and helping me try to figure this out. The last iteration you provided gives the following error

ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

On this line

                 parts_count_subquery.order_by().aggregate(total=Count('count'))['total'],

Try this:

from django.db.models import Count, Subquery, OuterRef, Q, IntegerField
from django.db.models.functions import Coalesce

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part 
    
    parts_count_subquery = Part.objects.filter(
        Q(components__machine_engine=OuterRef('pk')) |
        Q(components__machine_transmission=OuterRef('pk')),
        is_active=True
    ).distinct().values('pk').annotate(
        dummy=Count('pk')
    ).values('dummy').annotate(
        count=Count('dummy')
    ).values('count')

    return machine_query.annotate(
        num_parts=Coalesce(
            Subquery(parts_count_subquery, output_field=IntegerField()),
            0
        )
    )

It looks like this attempt is supposed to annotate all rows with “dummy” and then count the "dummy"s?

This iteration is giving the following error:

django.core.exceptions.FieldError: Cannot compute Count('dummy'): 'dummy' is an aggregate

Try this:

from django.db.models import Count, Subquery, OuterRef, Q, IntegerField
from django.db.models.functions import Coalesce

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part 
    
    parts_count_subquery = Part.objects.filter(
        Q(components__machine_engine=OuterRef('pk')) |
        Q(components__machine_transmission=OuterRef('pk')),
        is_active=True
    ).order_by().distinct().annotate(
        count=Count('pk')
    ).values('count')

    return machine_query.annotate(
        num_parts=Coalesce(
            Subquery(parts_count_subquery[:1], output_field=IntegerField()),
            0
        )
    )

and post here the output of this:

print(annotate_machine_query_with_num_parts(Machine.objects).query)

Also paste your models.py code here

Model code:

# part.py

class Part(models.Model):
    id = HashidSaltedAutoField(primary_key=True)
    is_active = models.BooleanField(default=True)
    components = models.ManyToManyField("Component", related_name="parts")
# component.py

class Component(models.Model):
    id = HashidSaltedAutoField(primary_key=True)
    engine_machine = models.ForeignKey(
        "Machine",
        null=True,
        blank=True,
        related_name="machine_engine",
        on_delete=models.SET_NULL,
    )
    transmission_machine = models.ForeignKey(
        "Machine",
        null=True,
        blank=True,
        related_name="machine_transmission",
        on_delete=models.SET_NULL,
    )
# machine.py

class Machine(models.Model):
    id = HashidSaltedAutoField(primary_key=True)

Generated SQL using latest iteration of the query:

SELECT
	"machine"."id",
	COALESCE((
	SELECT
		DISTINCT COUNT(U0."id") AS "count"
	FROM
		"part" U0
	INNER JOIN "part_components" U1 ON
		(U0."id" = U1."part_id")
	INNER JOIN "component" U2 ON
		(U1."component_id" = U2."id")
	WHERE
		((U2."machine_transmisssion_id" = "machine"."id"
			OR U2."machine_engine_id" = "machine"."id")
		AND U0."is_active")
	GROUP BY
		U0."id"
	LIMIT 1),
	0) AS "num_parts"
FROM
	"machine"

Results from test:

# |------------+---------------------+----------------+--------------+--------------|
# | machine    | components          | parts          | expected     | output       |
# |            |                     |                | num_parts    | num_parts    |
# |------------+---------------------+----------------+--------------+--------------|
# | r1         | f1                  | ps1            | 2            | 1            |
# |            | f2                  | ps1            |              |              |
# |            |                     | ps2            |              |              |
# |------------+---------------------+----------------+--------------+--------------|

Try this:

from django.db.models import Count, Subquery, OuterRef, Q, IntegerField
from django.db.models.functions import Coalesce

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part 
    
    parts_count_subquery = Part.objects.filter(
        Q(components__engine_machine=OuterRef('pk')) |
        Q(components__transmission_machine=OuterRef('pk')),
        is_active=True
    ).distinct().count()

    return machine_query.annotate(
        num_parts=Coalesce(
            Subquery(
                Part.objects.filter(
                    Q(components__engine_machine=OuterRef('pk')) |
                    Q(components__transmission_machine=OuterRef('pk')),
                    is_active=True
                ).distinct().values('components__engine_machine')
                .annotate(count=Count('pk', distinct=True))
                .values('count')[:1],
                output_field=IntegerField()
            ),
            0
        )
    )

Test it:

print(annotate_machine_query_with_num_parts(Machine.objects).query)

and post the result of this:

annotated_machines = annotate_machine_query_with_num_parts(Machine.objects)
for machine in annotated_machines:
    print(f"Machine {machine.id}: {machine.num_parts} parts")

This iteration is throwing this error again

ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

Try and this:

from django.db.models import Count, Subquery, OuterRef, Q, IntegerField
from django.db.models.functions import Coalesce

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part 
    
    return machine_query.annotate(
        num_parts=Coalesce(
            Subquery(
                Part.objects.filter(
                    Q(components__engine_machine=OuterRef('pk')) |
                    Q(components__transmission_machine=OuterRef('pk')),
                    is_active=True
                )
                .values('components__engine_machine')
                .annotate(count=Count('pk', distinct=True))
                .values('count')[:1],
                output_field=IntegerField()
            ),
            0
        )
    )

We’re getting somewhere, this iteration is passing the test cases that I have.

It looks like this iteration fails when a machine uses a part in its transmission and not its engine.

I think .values('components__engine_machine') fails to find parts which are not used in the engine.

Test this:

from django.db.models import Count, Subquery, OuterRef, Q, IntegerField, F
from django.db.models.functions import Coalesce

def annotate_machine_query_with_num_parts(machine_query):
    from part.models import Part 
    
    return machine_query.annotate(
        num_parts=Coalesce(
            Subquery(
                Part.objects.filter(
                    Q(components__engine_machine=OuterRef('pk')) |
                    Q(components__transmission_machine=OuterRef('pk')),
                    is_active=True
                )
                .annotate(
                    machine_id=Coalesce('components__engine_machine', 'components__transmission_machine')
                )
                .values('machine_id')
                .annotate(count=Count('pk', distinct=True))
                .values('count')[:1],
                output_field=IntegerField()
            ),
            0
        )
    )

This annotation creates a machine_id field that will be either the engine_machine or the transmission_machine ID, whichever is not null.

.annotate(
    machine_id=Coalesce('components__engine_machine', 'components__transmission_machine')
)

I replaced the values () with .values('machine_id') because now we’re grouping by this new machine_id field, which correctly represents all machines, whether they use the part in the engine or the transmission.

I hope that helps.

1 Like