Using Django 4.2.7 and sqlite3 as the backend.
There are two problems here, so bear with me. One is that Django will generate group by columns for every field in the model, regardless of what is actually selected, if there as at least one aggregate in the query. This creates problems because it groups by the id column and that effectively undoes all grouping. Example (in Django shell plus):
str(Person.objects.annotate(latest_dob=Max('date_of_birth')).values('first_name', 'latest_dob').query)
Produces:
'SELECT "org_person"."first_name", MAX("org_person"."date_of_birth") AS "latest_dob" FROM "org_person" GROUP BY "org_person"."id", "org_person"."created_on", "org_person"."first_name", "org_person"."middle_name", "org_person"."last_name", "org_person"."prefix", "org_person"."suffix", "org_person"."date_of_birth"'
Because of this, instead of having a row per unique first name, there is a row for every entry. This is not desired at all.
I found a trick on StackOverflow that involves changing the query to look like this:
str(Person.objects.annotate(dummy_group_by_value=models.Value(1)).values('dummy_group_by_value').annotate(latest_dob=Max('date_of_birth')).values('first_name', 'latest_dob').query)
The resulting SQL is now correct (and produces the correct number of rows when run):
'SELECT "org_person"."first_name", MAX("org_person"."date_of_birth") AS "latest_dob" FROM "org_person" GROUP BY "org_person"."first_name"'
This works fine until I introduce window functions. It attempts to group by the window function, which is invalid:
str(Person.objects.annotate(dummy_group_by_value=Value(1)).values('dummy_group_by_value').annotate(rn=Window(RowNumber(), partition_by=F('last_name'), order_by='first_name'), latest_dob=Max('date_of_birth')).values('first_name', 'latest_dob', 'rn').query)
Which produces this SQL:
'SELECT "org_person"."first_name", ROW_NUMBER() OVER (PARTITION BY "org_person"."last_name" ORDER BY "org_person"."first_name") AS "rn", MAX("org_person"."date_of_birth") AS "latest_dob" FROM "org_person" GROUP BY 2, "org_person"."first_name"'
Note the GROUP BY 2
, where 2 refers to the window function column. It is not possible to do this in SQLite or any database that I’m aware of. And running it generates an error “misuse of window function”. It is perfectly okay not to group by the column at all and indeed, that’s what I want to happen.
I can “fix” this by removing the trick. The window function field is no longer grouped by, but all of the other group by columns show back up (including id) and the results are no longer correct.
If I can solve the extraneous group by columns problem without using the trick, I feel like this window function problem goes away as well. I’m honestly not sure why Django has to add the extra columns. I explicitly specify which fields I want and which ones are aggregated. It should only included non-aggregated, non-window function fields explicitly listed in .values()
for the group by. If there is no .values()
, then I suppose it can add all the group by fields, though in that case, I’m not sure what the results are supposed to mean anyway. It feels broken no matter what.