Hi,
I want to try and understand why the generated SQL for annotations always (?) results in the subqueries being duplicated when re-used in other annotations, instead of using the relevant AS
variables. For example:
class Person(models.Model):
created_datetime = models.DateTimeField('Created at', null=False, blank=False, default=timezone.now)
name = models.CharField('Name', max_length=128, null=True, blank=True)
parent = models.ForeignKey('self', null=True, blank=True, on_delete=models.CASCADE)
def __str__(self):
return self.name
class PersonLog(models.Model):
person = models.ForeignKey('person.Person', null=True, blank=True, on_delete=models.CASCADE)
created_datetime = models.DateTimeField('Created at', null=False, blank=False, default=timezone.now)
login = models.BooleanField('Login', null=True, blank=True)
def __str__(self):
return self.person.name
In an example used elsewhere, I build a queryset like so:
persons = Person.objects.all().annotate(
log_login_count=Count(
'personlog',
filter=Q(personlog__login=True)
),
log_login_duration_over=Now() - F('created_datetime'),
log_login_frequency=ExpressionWrapper(
F('log_login_duration_over') / F('log_login_count'),
output_field=models.DurationField(),
),
).values()
Testing this on sqlite (but the same happens when running the same query on MySQL – no idea about PG), the generated SQL is:
SELECT
"person_person"."id",
"person_person"."created_datetime",
"person_person"."name",
"person_person"."parent_id",
COUNT("person_personlog"."id") FILTER (
WHERE
"person_personlog"."login"
) AS "log_login_count",
django_timestamp_diff(
CURRENT_TIMESTAMP,
"person_person"."created_datetime"
) AS "log_login_duration_over",
(
django_format_dtdelta(
'/',
django_timestamp_diff(
CURRENT_TIMESTAMP,
"person_person"."created_datetime"
),
COUNT("person_personlog"."id") FILTER (
WHERE
"person_personlog"."login"
)
)
) AS "log_login_frequency"
FROM
"person_person"
LEFT OUTER JOIN "person_personlog" ON (
"person_person"."id" = "person_personlog"."person_id"
)
GROUP BY
"person_person"."id",
"person_person"."created_datetime",
"person_person"."name",
"person_person"."parent_id",
django_timestamp_diff(
CURRENT_TIMESTAMP,
"person_person"."created_datetime"
)
Notice this:
(
django_format_dtdelta(
'/',
django_timestamp_diff(
CURRENT_TIMESTAMP,
"person_person"."created_datetime"
),
COUNT("person_personlog"."id") FILTER (
WHERE
"person_personlog"."login"
)
)
) AS "log_login_frequency"
Given the two operands of the division are computed/declared just above, why is the complete query not generated as:
SELECT
"person_person"."id",
"person_person"."created_datetime",
"person_person"."name",
"person_person"."parent_id",
COUNT("person_personlog"."id") FILTER (
WHERE
"person_personlog"."login"
) AS "log_login_count",
django_timestamp_diff(
CURRENT_TIMESTAMP,
"person_person"."created_datetime"
) AS "log_login_duration_over",
(
django_format_dtdelta(
'/',
log_login_duration_over,
log_login_count
)
) AS "log_login_frequency"
FROM
"person_person"
LEFT OUTER JOIN "person_personlog" ON (
"person_person"."id" = "person_personlog"."person_id"
)
GROUP BY
"person_person"."id",
"person_person"."created_datetime",
"person_person"."name",
"person_person"."parent_id",
django_timestamp_diff(
CURRENT_TIMESTAMP,
"person_person"."created_datetime"
)
where both log_login_duration_over
and log_login_count
are actually used directly after being “declared”.
The advantages seem obvious:
- less bytes sent over the wire
- more performance? Would the redeclaration/recomputation be heavily cached at database level anyway?
I am sure there is a good reason for this, but would like to understand it!
Many thanks!