Django DRF: how to groupby on a foreign fields?

I have a model where users can upvote other users for specific topics. Something like:

#models.py
Class Topic(models.Model):
    name = models.StringField()

    def __str__(self):
        return str(self.name)

Class UserUpvotes(models.Model):
    """Holds total upvotes by user and topic"""
    user = models.ForeignKey(User)
    topic= models.ForeignKey(Topic)
    upvotes = models.PositiveIntegerField(default=0)

Using DRF, I have an API that returns the following: topic_id, topic_name, and upvotes, which is the total upvotes for a given topic.

One of the project requirements is for the API to use these field names specifically: topic_id, topic_name, and upvotes

#serializers.py
class TopicUpvotesSerializer(serializers.ModelSerializer):
    topic_name = serializers.StringRelatedField(source="topic")

    class Meta:
        model = UserUpvotes
        fields = ["topic_id", "topic_name", "upvotes"]

My trouble is aggregating these fields. I’m filtering the UserUpvotes by user or team and then aggregating by topic.

Desired output

This is the result I want to get. When I don’t perform any aggregations (and there are views where this will be the case), it works.

[
    {
        "topic_name": 3,
        "topic_name": "Korean Studies",
        "upvotes": 14
    },
    {
        "topic_name": 12,
        "topic_name": "Inflation",
        "upvotes": 3
    },
]

At first, I tried creating a TopicSerializer, and then assigning it to the topic field in TopicUpvotesSerializer. But then, the resulting json would have a nested “topic” field and the aggragation would fail.

Attempt 1

#views.py

def get_queryset(self):
    return (
      UserUpvotes.objects.filter(user__team=team)
          .values("topic")
          .annotate(upvotes=models.Sum("upvotes"))
          .order_by("-upvotes")
      )

My problem is that the topic_id and topic_name fields are not showing. I get something like:

[
    {
        "topic_name": "3",
        "upvotes": 14
    },
    {
        "topic_name": "12",
        "upvotes": 3
    },
]

Attempt 2

Another queryset attempt:

# views.py

def get_queryset(self):
    return (
      UserUpvotes.objects.filter(user__team=team)
          .values("topic__id", "topic__name")
          .annotate(upvotes=models.Sum("upvotes"))
          .order_by("-upvotes")
      )

Which yields:

[
    {
        "upvotes": 14
    },
    {
        "upvotes": 3
    },
]

The aggregation worked on the queryset level, but the serializer failed to find the correct fields.

Attempt 3

This was the closest I got:

# views.py

def get_queryset(self):
    return (
      UserUpvotes.objects.filter(user__team=team)
          .values("topic__id", "topic__name")
          .annotate(upvotes=models.Sum("upvotes"))
          .values("topic_id", "topic", "upvotes")
          .order_by("-upvotes")[:n]
      )
[
    {
        "topic_name": 3,
        "topic_name": "3",
        "upvotes": 14
    },
    {
        "topic_name": 12,
        "topic_name": "12",
        "upvotes": 3
    },
]

I have no idea why “topic_name” is simply transforming the “topic_id” into a string, instead of calling the string method.

Another thing I tried was to customize the “topic_name” field:

class UserUpvotes(serializers.ModelSerializer):
    topic_name = serializers.SerializerMethodField()

    def get_topic_name (self, obj):
        return obj.topic.name

    class Meta:
        model = UserUpvotes
        fields = ["topic_id", "topic_name", "upvotes"]

The rub is that obj is a dict, which I wasn’t expecting. Maybe due to the values().annotate() aggregation?

There will be times there won’t be any aggregations, and then I’ll get a 'UserUpvotes' object is not subscriptable error.

This is when I REALLY miss type hints in Django. I could write a try-except, but it feels wrong. Plus I don’t know if there are any other cases I should be aware of.

I can’t help you with any of the other questions you’ve posed - but this one is rather straight-forward.

See the docs for values - when iterating over a queryset using the values function, you’re iterating over a dict, not a list of object instances. (The annotate function has nothing to do with this.)

That was not the point though. The point is that what you are essentially doing is a group by operation. The fact this aggregation returns a list of dict and not a QuerySet is a curveball: after all, you can perform the inverse operation, by aggregating topics instead, and then get a QuerySet:

(
 Topic.objects.filter(userupvote_set__user__team=team)
  .annotate(upvotes=models.Sum("userupvote_set__upvotes")
) -> QuerySet

In fact, you could also run raw SQL, and get a RawQuerySet instead. But running the canonical way yields dicts.

In fact, I even found a library for previous Django versions that tries to hack the QuerySet back to a group by operation on the previous model: GitHub - kako-nawao/django-group-by: Have your valuesquerysets return model instances instead of dicts

But that is the point.

The difference in your example here is that you’re not using the values function in this query.

It’s the values function that causes the conversion of a “queryset of object instances” to a “queryset of a list of dict” as documented at QuerySet API reference | Django documentation | Django

To change the semantics of the values function in this case to make it return a “queryset of object instances” would make it inconsistent with the documentation and with every other instance where values is being used.

Just to be curious. Where would that be preferred?

Which is the “that” you are referring to?

The second post in this thread contains:

The author has a query at the top containing a values clause, and was not expecting this behavior in that query. So in this specific situation, the author would have preferred that values would return a “queryset of instances” instead of the “queryset of a list of dict”.

I believe the question is, in which cases would you prefer dealing with dicts rather than model instances?

1 Like

That’s actually the purpose of the values clause. It saves the overhead of instantiating instances of objects when you’re only interested in the values of specific fields.

It’s also appropriate in the cases like that of the “group by” aggregation as described above, in that what the query is returning is not a set of object instances.

Technically speaking, a “group by” result set are not rows from the table being aggregated. It makes no sense to refer to model fields from a group by aggregation that aren’t the aggregated values or the groups over which the aggregation is being performed - the results would be indeterminate.

1 Like

Thank you. :slight_smile: