Grouping by ForeignKey with a limit per group

I have a scenario where I would like to query a model and produce limited groups of records in relation to a foreign key. Here’s an example to illustrate.

from django.conf import settings
from django.db import models

class Project(models.Model):
    """The model that I want to group by"""
    user = models.ForeignKey(settings.AUTH_USER_MODEL)

class Task(models.Model):
    """The model that I want to put into groups"""
    project = models.ForeignKey(Project, related_name="tasks")

If I don’t care about an N+1 query bug, then I know I can do something like:

limit = 5
projects = Project.objects.filter(user=user)  # user fetched earlier
for project in projects:
    tasks = project.tasks.all()[:limit]

What I’m trying to get here is “the next 5 tasks for each project for the user.” This example uses the default id ordering. The real constraints have order significance, but I didn’t want to throw up too much code.

I could flip this on its head and do something like:

tasks = Task.objects.filter(project__user=user).select_related('project')
tasks_dict = {}
for task in tasks:
    if task.project not in tasks_dict:
        tasks_dict[task.project] = []
    tasks_dict[task.project].append(task)

The problem I have with that query is that I can’t figure out how to group the resulting tasks by project and limit to 5 per project. My real world scenario would have hundreds of tasks per project so the query above would grab too much data.

I also can’t put a limit on the Task query because it wouldn’t guarantee that each project would have 5 tasks.

Is there some kind of aggregation option that I’m not aware of that could do this?

Hi

I don’t think there’s an easy way right now

This is the most popular Stack Overflow post for this question: https://stackoverflow.com/questions/57440893/limit-number-of-related-objects

Here’s the related ticket: https://code.djangoproject.com/ticket/26780

Both contain a workaround involving prefetch_related and Subquery, which will work but check the queryset.explain() on your database to see if it’s slow.

Another option is to do a second query using window functions and prefetch_related_objects() to join the Tasks back onto the projects in-memory.

Thanks, @adamchainz! You’ve at least help confirm my suspicions that there is not an obvious way to do this that I was missing.

I’ll check out the options you suggested.