When doing aggregates in a django query, the count()
doesn’t seem to be correct, is this expected?
Here is an example, asking it for the total number of users:
>>> User.objects.count()
29511
But if I ask for an aggregated result by year the users logged in:
>>> User.objects.values('last_login__year').annotate(total=Count('id'))
<QuerySet [{'last_login__year': None, 'total': 29188}, {'last_login__year': 2023, 'total': 110}, {'last_login__year': 2022, 'total': 69}, {'last_login__year': 2021, 'total': 142}, {'last_login__year': 2024, 'total': 2}]>
This is returning 5 rows:
>>> len(User.objects.values('last_login__year').annotate(total=Count('id')))
5
And yet, if I ask for the count()
instead:
>>> User.objects.values('last_login__year').annotate(total=Count('id')).count()
29511
Is this expected?
This is giving me some odd results when using the paginator which is giving some odd count results because of this discrepancy between len()
and count()
You are using the wrong method.
Add the filtering details you want.
or
User.objects.filter(last_login__year={year}).count()
or
from django.models import Count
User.objects.all().annotate(total=Count('id', filter=Q(last_login__year={year})))
But I want all years, not one in particular, I’m trying to make a point that count() and len() return different results
Hey @hernantz could you share the SQL generated for the last query, I’m also surprised it doesn’t return 5. If you could share your backend and the version of Django you are using it would also help.
I would expect it to look something like
SELECT COUNT(*) FROM (
SELECT EXTRACT(YEAR FROM "last_login")
FROM "auth_user"
GROUP BY 1
) subquery
But it seems like it might be eliding the total
annotation and doing
SELECT COUNT(*) FROM auth_user
I’ve just tested main
as well as the latest 4.2.x
on SQLite and Postgres and it did produce the right query
SQLite
SELECT
COUNT(*)
FROM
(
SELECT
django_datetime_extract(
'year', "auth_user"."last_login",
NULL, NULL
) AS "last_login__year"
FROM
"auth_user"
GROUP BY
1
) subquery
Postgres
SELECT
COUNT(*)
FROM
(
SELECT
EXTRACT(
YEAR
FROM
"auth_user"."last_login"
) AS "last_login__year"
FROM
"auth_user"
GROUP BY
1
) subquery
So I suspect you are using a third-party app or database backend or not the latest release of 4.2 or 5.0.
Are you not thinking of using this?
from django.models import Count
User.objects.all()\
.annotate(a=Count('id', filter=Q(last_login__year={year1})))\
.annotate(b=Count('id', filter=Q(last_login__year={year2})))
This is sql that is being generated:
>>> User.objects.values('last_login__year').annotate(total=Count('id')).count()
SELECT COUNT(*)
FROM (
SELECT "auth_user"."id" AS "col1"
FROM "auth_user"
GROUP BY EXTRACT(YEAR FROM"auth_user"."last_login" AT TIME ZONE 'UTC'),
1
) subquery
Execution time: 0.040411s [Database: default]
29511
>>> User.objects.values('last_login__year').annotate(total=Count('id'))
SELECT EXTRACT(YEAR FROM"auth_user"."last_login" AT TIME ZONE 'UTC') AS "last_login__year",
COUNT("auth_user"."id") AS "total"
FROM "auth_user"
GROUP BY 1
LIMIT 21
Execution time: 0.015709s [Database: default]
<QuerySet [{'last_login__year': None, 'total': 29188}, {'last_login__year': 2023, 'total': 110}, {'last_login__year': 2022, 'total': 69}, {'last_login__year': 2021, 'total': 142}, {'last_login__year': 2024, 'total': 2}]>
Which version of Django and database backend are you using? Can you provide the exact definition of your User
model?
I’m using Django 4.2 and PostgreSQL 12.9 with the auth
user model:
>>> django.__version__
'4.2'
>>> User
<class 'django.contrib.auth.models.User'>
As pointed out previously you are not using the latest 4.2 and are affected by a bug fixed in 4.2.1 that matches your described problem.
Please upgrade to 4.2.15 which is the latest 4.2 release.
1 Like