Django 4.2 behavior change when using ArrayAgg on unnested ArrayField (PostgreSQL-specific)

Hello @amureki!

Django 4.2 shipped changes that avoids doing subquery pushdown on aggregation when unnecessary as these can be harmful performance wise in a lot of cases.

What it failed to account to is that when aggregate expressions refer to certain expressions the pushdown must still take place.

An example of that fixed in Django 4.2.2 is reference to subqueries but the problem you are facing is similar in essence; an aggregate referencing an expression that produces rows.

Could you confirm that using the following code instead addresses your issue?

class Unnest(Func):
    subquery = True
    function = "unnest"

cls.objects.annotate(
    keyword_lists=Unnest("keywords"),
).values_list(
    "keyword_lists", flat=True
).aggregate(
    keyword_list=ArrayAgg("keyword_lists", distinct=True)
)

If it does it makes me believe we might need an Expression flag to denote either that it returns rows or maybe simply a flag that denotes that it can only be summarized by reference. The latter would be the case for aggregates, subqueries, and functions like unnest.

2 Likes