Empty queryset when using Count result in filter expression

Hi,

I use django-4.1.1 and I’m having unexpected results to queries involving multiple Count() annotation used in a filter expression. I can reproduce this behavior with the User/Group models from django.contrib.auth.

In the following, I’m adding 2 Count() to the User queryset, with one of them using a conditional involving an empty set:

User.objects \
  .annotate(num1=Count('groups'), num2=Count('groups', filter=Q(groups__in=[]))) \
  .values('pk', 'num1', 'num2')

This yields the following result:

<QuerySet [{'pk': 1, 'num1': 0, 'num2': 0}, {'pk': 2, 'num1': 0, 'num2': 0}, {'pk': 3, 'num1': 1, 'num2': 0}]>

However, if I add the filter num1 >= num2, as in

User.objects \
  .annotate(num1=Count('groups'), num2=Count('groups', filter=Q(groups__in=[]))) \
  .filter(num1__gte=F('num2')) \
  .values('pk', 'num1', 'num2')

Then this yields an empty result set:

<QuerySet []>

I would have expected to get the same results as in the first query above, given the values in the database (the condition num1 >= num2 is true for all User instances). Also looking at len(django.db.connection.queries) before and after, I can see that this last query does not even trigger any request to the postgres database, it seems that django decides to skip the query altogether and return and empty queryset.

And if I force the Count condition to use a non-trivial queryset, although the result is also empty, it then works as expected. E.g.:

User.objects \
  .annotate(num1=Count('groups'), num2=Count('groups', filter=Q(groups__in=Group.objects.filter(pk=0)))) \
  .filter(num1__gte=F('num2')) \
  .values('pk', 'num1', 'num2')

Yields

<QuerySet [{'pk': 1, 'num1': 0, 'num2': 0}, {'pk': 2, 'num1': 0, 'num2': 0}, {'pk': 3, 'num1': 1, 'num2': 0}]>

I’m hoping to get some insight on what I’m doing wrong. Thanks.

1 Like

I’m guessing it’s because the empty “in” condition is causing some special handling within the ORM.

This shows up with an even simpler example.

If I run the following in the Django shell:
print(User.objects.filter(username__in=['ken']).query)
I get the expected query being generated.

However, if I try:
print(User.objects.filter(username__in=[]).query)

I get a stack dump with an exception “EmptyResultSet” having been raised.

Note: I just checked, an empty “in” clause appears to not be valid SQL. So Django is catching that situation before sending it to the database.
The reason that your other version works is that Django creates that expression using a subquery, which must be evaluated by the database.

In addition what was already said, your __in filter conditions are not equivalent. To get the same as groups__in=Group.objects.filter(pk=0) does, imho groups__in=[0] should work.

Thank you for the answers. I understand that it boils down to how the query is mapped to SQL, but you have to admit that, from a high-level point of view, there’s some inconsistency in the fact that the first query returns valid result, while the second returns an empty set.

It kinda makes sense when you look at the SQL generated for these 2 snippets:

User.objects \
  .annotate(num1=Count('groups'), num2=Count('groups', filter=Q(groups__in=[])))
  .values('pk', 'num1', 'num2')
SELECT
  "auth_user"."id",
  COUNT("auth_user_groups"."group_id") AS "num1",
  0 AS "num2"
FROM "auth_user"
LEFT OUTER JOIN "auth_user_groups"
  ON ("auth_user"."id" = "auth_user_groups"."user_id")
GROUP BY "auth_user"."id"
User.objects \
  .annotate(num1=Count('groups'), num2=Count('groups', filter=Q(groups__in=Group.objects.filter(pk=0))))
  .filter(num1__gt=F('num2')) \
  .values('pk', 'num1', 'num2')
SELECT
  "auth_user"."id",
  COUNT("auth_user_groups"."group_id") AS "num1",
  COUNT("auth_user_groups"."group_id")
    FILTER (WHERE "auth_user_groups"."group_id" IN (SELECT U0."id" FROM "auth_group" U0 WHERE U0."id" = 0)) AS "num2"
FROM "auth_user"
LEFT OUTER JOIN "auth_user_groups"
  ON ("auth_user"."id" = "auth_user_groups"."user_id")
GROUP BY "auth_user"."id"
HAVING
  COUNT("auth_user_groups"."group_id") > (COUNT("auth_user_groups"."group_id") FILTER (WHERE "auth_user_groups"."group_id" IN (SELECT U0."id" FROM "auth_group" U0 WHERE U0."id" = 0)))

If I were to use the empty array in the conditional Count in the second query, this would translate into an empty IN clause, which, as you said, is invalid SQL. I guess that’s what triggers the EmptyResultSet exception and makes django return an empty queryset instead. It’s just too bad that django is able to handle the first case, by using constant “0” in the query, but cannot do the same for the HAVING clause.

In practice, my use case if a bit more involved than the above sample. The filter for the condition Count is actually the cleaned_data from a ModelMultipleChoiceField form field, roughly used as follows:

groups = self.cleaned_data['groups']
query = User.objects.annotate(num1=Count('groups'), num2=Count('groups', filter=Q(groups__in=groups))).filter(num1__gt=F('num2'))
if query.exists():
  # trigger form validation error

So even though there are technically User rows matching my criteria (when the “groups” cleaned data is empty), django returns an empty queryset and my form validation does not work as expected. For the time being, I resorted to using len(groups) to select 2 different code paths.

As a side note, instead of having two different code paths, you could conditionally assign your groups variable as something like:
groups = self.cleaned_data['groups'] or [0]

@goffioul The behavior of the ORM for [] is indeed a bit surprising here, esp. the fact that it swallows the “error” which would raise on the printing attempt. There is no valid SQL repr for that kind of query and the ORM prolly shortcuts things to empty result for a quite different reason - pure convenience within OOP realms (thats a pure guess from my side, so I cannot answer that for sure).

For those kind of errors/misbehavior I tend to remember myself, that relational algebra is very different from OOP mechs, esp. in terms of cardinality/expressiveness. All ORMs are all lacking at this or that end, and django’s ORM is def. more at the easy convenience end of things, thus might run earlier into incongruencies. For that reason I kinda always look at the SQL level for more complicated constructs, and might resort to raw SQL. It is still great for common tasks, as it “get the job done” with a very convenient interface. (Well, for python projects outside of django I use SQLAlchemy most of the time.)

What I want to say with this - ORMs are a great tool to get things done within our OOP muscle brain, but should not be trusted easily outside common tasks.