I am upgrading my app to Django 4.2. So far, I stumbled onto a single issue with PostgreSQL-specific queries.
Here is a queryset snippet that has a different behaviour between Django 4.1.9 and Django 4.2+:
from django.db import models
from django.contrib.postgres.fields import ArrayField
from django.contrib.postgres.aggregates import ArrayAgg
class Offer(models.Model):
keywords = ArrayField(
models.TextField(),
default=list,
blank=True,
)
@classmethod
def flatten_keywords(cls):
return cls.objects.all().annotate(keyword_lists=models.Func(models.F("keywords"), function="unnest")).values_list("keyword_lists", flat=True).aggregate(keyword_list=ArrayAgg("keyword_lists", distinct=True))
Function flatten_keywords performs a following working query in Django 4.1.9:
SELECT ARRAY_AGG(DISTINCT "keyword_lists" ) FROM (SELECT unnest("camps_offer"."keywords") AS "keyword_lists" FROM "camps_offer" subquery
However, if I upgrade to Django 4.2.0 (or 4.2.2), the statement changes:
SELECT ARRAY_AGG(DISTINCT unnest("camps_offer"."keywords") ) AS "keyword_list" FROM "camps_offer"
It is followed by the error and a hint:
ERROR: aggregate function calls cannot contain set-returning function calls at character 27
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
I am struggling to cherry-pick the change that led to it, can’t see anything related to it in the changelog.
Did anyone happen to have this as well, or know what was changed?
Django 4.2 shipped changes that avoids doing subquery pushdown on aggregation when unnecessary as these can be harmful performance wise in a lot of cases.
What it failed to account to is that when aggregate expressions refer to certain expressions the pushdown must still take place.
An example of that fixed in Django 4.2.2 is reference to subqueries but the problem you are facing is similar in essence; an aggregate referencing an expression that produces rows.
Could you confirm that using the following code instead addresses your issue?
class Unnest(Func):
subquery = True
function = "unnest"
cls.objects.annotate(
keyword_lists=Unnest("keywords"),
).values_list(
"keyword_lists", flat=True
).aggregate(
keyword_list=ArrayAgg("keyword_lists", distinct=True)
)
If it does it makes me believe we might need an Expression flag to denote either that it returns rows or maybe simply a flag that denotes that it can only be summarized by reference. The latter would be the case for aggregates, subqueries, and functions like unnest.
Thank you for the reply and extensive explanation.
This is good to know, I definitely missed this change.
Your example indeed addresses the issue. I am honestly a bit far from the Django database internals architecture to justify whether the patch in such logic is needed and what would be the right place for it. But it feels like such pushdown certainly makes sense in some cases.
Now, this code fails again with the similar error (code logic did not change, but Django version):
aggregate function calls cannot contain set-returning function calls
LINE 1: SELECT ARRAY_AGG(DISTINCT unnest("camps_offer"."keywords") )...
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
Is there any chance I could get your brain here again, @charettes ?
Otherwise, I see just two options - to go raw SQL or switch to Python-based evaluations, which would be much slower.
Oh, I should pay more attention to what is happening. This totally makes sense as I see it now.
Many thanks again for the suggestion, and please forgive me for the noise.
I just upgraded django from 4.1.7 to 5.0.3 and had a similar situation: using unnest function with an aggregate, which resulted in the same error. The solution of @charettes works.
Discussions about introducing an Expression.set_returning: bool = False flag to allow the ORM to identify user-defined Func subclasses that may return more than one row are taking place in this accepted ticket if y’all want to chime in.
It would avoid (ab)using the .contains_subquery flag and more importantly account for cases where a set-returning annotation is not referenced by the eventuated aggregation and document how it should be used.