Hi folks,
I’m working on a rather complicated query that requires a chain of window functions where the column generated by one window function is used for the order_by
of a subsequent window function.
Unfortunately, my attempts end up generating invalid SQL.
The (drastically simplified) query looks like this:
Thing.objects
.annotate(row_number=Window(RowNumber()))
.annotate(row_number2=Window(RowNumber(), order_by=["row_number"]))
.filter(row_number2=1)
Which fails on PostgreSQL with:
django.db.utils.ProgrammingError: window functions are not allowed in window definitions
LINE 1: ...R () AS "row_number", ROW_NUMBER() OVER (ORDER BY ROW_NUMBER...
(Also fails on SQLite with a less meaningful error message.)
I do understand why the generated SQL is not valid:
SELECT *
FROM (
SELECT my_app_thing.id AS col1,
ROW_NUMBER() OVER () AS row_number,
ROW_NUMBER() OVER (ORDER BY ROW_NUMBER() OVER ()) AS row_number2
FROM my_app_thing
) qualify
WHERE row_number2 = 1;
But I don’t know if there is a way to force the ORM to generate something more like this:
SELECT col1 as id,
row_number,
row_number2
FROM (
SELECT *,
ROW_NUMBER() OVER (ORDER BY row_number) AS row_number2
FROM (
SELECT my_app_thing.id AS col1,
ROW_NUMBER() OVER () AS row_number
FROM my_app_thing
) qualify
) qualify2
WHERE row_number2 = 1;
Is this a missing feature? OR a bug? I think the ORM should automatically generate a nested subquery, like it does when filtering on a column generated by a window function…
Hello @salomvary,
There is unfortunately no way to have the ORM automatically perform a subquery pushdown when a window function is referenced in another one as of 5.2.
Before 4.2 it wasn’t not even possible to reference a window function in a filter but a significant amount of efforts went into implementing support for it as it was known to have a bounded number of nesting (a single one) and it was done in hope that major RDMBS eventually adopt the QUALIFY
clause. You can find more details about it in this ticket.
The reason why it’s particularly hard to implement nested window function reference support is that it now becomes almost impossible to do the right things when other functions are referencing these annotations through ORDER BY
, WHERE
, or even aggregation over them. For these reasons I think that support for the feature is unlikely to land any time soon and that efforts would be better be spent on adding an explicit method to perform a subquery wrapping.
The latter would allow you to explicitly do
Thing.objects.annotate(
row_number=Window(RowNumber())
).wrap().annotate(
row_number2=Window(RowNumber(), order_by=["row_number"])
).filter(row_number2=1)
In the mean time your best shot is to use Queryset.raw
.
@charettes Thanks for the explanation. I understand there needs to be a limit on how smart an ORM is, the single level cutoff is likely good enough for 99% of the use cases here.
Interesting about QUALIFY
, was not aware this is (or rather will be) an SQL thing, always wondered about why the name “qualify” in the Django-generated queries. Seems indeed not very widely adopted yet: QUALIFY clause (window functions) | SQL
Subquery wrapping would be awesome.
In the mean time your best shot is to use Queryset.raw
.
Seems like it ![:slight_smile: :slight_smile:](https://emoji.discourse-cdn.com/twitter/slight_smile.png?v=12)
For the record I’ve also tried the third party django-cte
package to avoid writing raw queries but I hit a bug there: Duplicate CTE generated when multiple window functions are involved · Issue #105 · dimagi/django-cte · GitHub
If it can be of any help to django-cte
maintainers they likely need to adjust their CTEQueryCompiler
to specialize get_qualify_sql
method so it doesn’t include CTE’s in the inner query.
The fact the ORM doesn’t even have an internal hook when it performs subquery wrapping (aggregation, exclude
against m2m, qualify) makes it hard for tools like django-cte
to transplant CTE in queries pushed down to the outer wrapper so they adequately get compiled. This is tracked by this ticket.
1 Like