Querysets, Aggregation and filtering their results using other querysets

I’ve recently been given a new requirement which involves filtering objects based on some custom user permissions. The requirement is quite straightforward but I’ve come unstuck at the following:

A user interacts with cases and every case has a group

Group

class Group(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    name = models.CharField(max_length=128, unique=True)

Case

class Case(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    title = models.CharField(max_length=128)
    group = models.ForeignKey(Group, related_name="cases", on_delete=models.CASCADE)

Based on some permissions, I filter the cases a user can see. This part is working as expected.

However, I have an endpoint which must list all groups in which there are cases a user can access along with the count of the total number of cases in each group to which a user can access.

As an example, let’s look at three users:

  1. Julie has access to four cases in the group birds
  2. Jenny has access to two cases in the group birds
  3. Juliet has access to two cases in the group birds and three cases in the group insects

When presenting a summary of available groups, the user should see every group for which they have access to cases within that group, and the count of the cases to which they have access in that group. They should not see groups in which there are no cases to which they have access.

As an example of the desired result for each of the above users.

Julie : has four cases in birds, no cases in any other group

[
 {
   "name": "birds",
   "case_count": 4
 } 
]

Jenny: has two cases in birds, no cases in any other group

[
 {
   "name": "birds",
   "case_count": 2
 }
]

Juliet: has two cases in birds and three cases in insects

[
 {
   "name": "birds",
   "case_count": 2,
 },
 {
   "name": "insects",
   "case_count": 3,
 } 
]

Until this permissions requirement was presented to me, I used to get all of the groups which had a case count greater than 0.

        return Group.objects.annotate(
          case_count=Count("cases")
        ).filter(case_count__gt=0)

Now that I have to deal with the permissions, I first filter the user’s case queryset, and for Juliet it would look something like this:

Queryset = [
   <birds case 1>,
   <birds case 2>,
   <insects case 1>,
   <insects case 2>,
   <insects case 3>,
]

Note: It is a Django Queryset that I am returning and what I am representing above. It is not an array.

Using the data from the above queryset, I want to get the group of each case, return only a single instance for a group along with the case count from the case queryset, a la the depicted JSON data for Julie, Jenny and Juliet.

What I have now is the following:

    def get_queryset(self):
        # cases is Queryset containing all of the cases in
        # all of the groups that a user has permission to access. 
        cases = case_query_filter(self.request.user)
        # cases is the related name from Case.group to Group
        case_filter = Q(cases__in=cases)

        return (
            Group.objects.filter(case_filter)
            .annotate(case_count=Count("cases", filter=case_filter))
            .filter(case_count__gt=0)
        )

This queryset correctly returns only the groups with cases that a user can access. That is, Julie and Jenny does not see an insects group, but Juliet does.

The issue that I face is that the case count that is returned is the count for Case.objects.all(group=group).count() as opposed to the count of cases in each group to which the user has access.

Result from Group endpoint

[
    {
        "name": "birds",
        "case_count": 8,  # this should be 2
    },
    {
        "name": "insects",
        "case_count": 20,  # this should be 3
    }
]

Apologies if this was a bit long winded, but I felt that it would help with a quite verbose explaination.

Super hero status to anyone who can point my in the right direction.

Cheers,

C

Hi Conor, have you tried passing distinct=True in Count?

Hi Tim,

No, I hadn’t, as I didn’t think it would be of assistance in this instance. I’m getting distinct groups in my group queryset, but the count reflects the count of all cases related to the group, not the count of cases related to the group to which the user has access.

For good measure I added distinct=True but it didn’t change the result.

        return (
            Group.objects.filter(case_filter)
            .annotate(case_count=Count("cases", filter=case_filter, distinct=True))
            .filter(case_count__gt=0)
        )

Is this along the lines of what you were thinking, or I have I misinterpreted what you were suggesting?

Yes it was. The results really look like duplicates. Strange. Can you print the QuerySet out? Sometimes I find looking at the raw SQL help reveal the problem.

print(queryset.query)

Hi Tim,

Thanks for bearing with me. The data:

[
    {
        "name": "birds",
        "case_count": 8,  # this should be 2
    },
    {
        "name": "insects",
        "case_count": 20,  # this should be 3
    }
]

is sound to my eyes with respect to grouping of the groups if my understanding of the query I am trying to write adds up. That is, there are two groups, one named birds and the other insects and each group has a total 8 and 20 cases, respectively. I expect there two be two groups in the result, but the tally of the case_count should be less as the user has only access to 2 bird cases and 3 insect cases.

The query looks like this.

SELECT          "cases_group"."id",
                "cases_group"."name",
                "cases_group"."display_name",
                Count(DISTINCT "cases_case"."uuid") filter (WHERE "cases_case"."uuid" IN
                (
                       SELECT "cases_case"."uuid"
                       FROM   "cases_case")) AS "case_count"
FROM            "cases_group"
LEFT OUTER JOIN "cases_case"
ON              (
                                "cases_group"."id" = "cases_case"."group_id")
WHERE           "cases_case"."uuid" IN
                (
                       SELECT "cases_case"."uuid"
                       FROM   "cases_case")
GROUP BY        "cases_group"."id"
HAVING          count(DISTINCT "cases_case"."uuid") filter (WHERE (
                                "cases_case"."uuid" IN
                                (
                                       SELECT "cases_case"."uuid"
                                       FROM   "cases_case"))) > 0

To my eye which last studied SQL nigh on 17 years ago, it looks like the where statement of the left out join:

LEFT OUTER JOIN "cases_case"
ON              (
                                "cases_group"."id" = "cases_case"."group_id")
WHERE           "cases_case"."uuid" IN
                (
                       SELECT "cases_case"."uuid"
                       FROM   "cases_case")

Is select all case UUIDs from the cases table.

In my initial post, I have a variable which is a queryset with the cases a user is allowed to view. In the example we’ve been discussing, the queryset has 5 cases.

cases = case_query_filter(self.request.user) # I have 5 cases
case_filter = Q(cases__in=cases) # I'm filtering the Count in the query below

queryset = (
    Group.objects.filter(case_filter)
    .annotate(case_count=Count("cases", filter=case_filter, distinct=True))
    .filter(case_count__gt=0)
)

So it appears to my confused eye that the filter in the Count() method is not behaving as I thought it would behave, or, probably more likely now that I think about it, my user of the filter, case_filter is off.

Thanks for the assistance so far, Tim.

Yeah, something seems off about what’s being returned from case_query_filter looking at your SQL.

A quick thing to test would be to force the cases returned from case_query_filter into a list, then use that in case_filter. This will verify that if you do have the correct cases being returned in a list that the count filtering works properly. And if it does give you the numbers you want, we know that the problem is in the combination of the two.

cases = list(case_query_filter(self.request.user))
case_filter = Q(cases__in=cases)

The other approaches that I can think of trying is are COUNT(CASE(WHEN())) or using trying to finagle a Subquery into returning an annotation.

Something is definitely amiss here - myself or the code - I can’t tell yet . I cast the queryset into a list, but the same result, unfortunately. I’m going to do some more digging and when I find the solution or make some progress, I’ll post it here.

Hope to be back with some good news soon.