Django queryset chain annotate and distinct together

devspectre · July 9, 2021, 5:42pm

Hi!

I have queryset which uses both annotate and distinct but it throws the error.
'Error: annotate() + distinct(fields) is not implemented.'

The code looks like this.

toys = (
    Toy
    .objects
    .values('owner', 'mall_id')
    .annotate(max_price=Max('price'))
    .order_by('-max_price', 'owner')
    .distinct('owner')[:30]
)

I want to get top expensive toys but only one per owner.

How can I get this done?

Thanks

KenWhitesell · July 9, 2021, 7:46pm

try:
toys = Toy.objects.distinct('owner').order_by('owner', '-price')

KenWhitesell · July 9, 2021, 7:53pm

Sorry, misinterpreted the original objective of the query.

To make sure I’m straight - you want the 30 most expensive toys, but if the same owner appears in that list, they aren’t included.

So, if you have the situation with the following rows in toys, and assuming you’re only looking for 3 entries instead of 30:

Owner    Price
Joe      $100
Jane     $70
Jane     $50
Fred     $40
Mary     $20

The list you’re looking for would look like this:

Owner    Price
Joe      $100
Jane     $70
Fred     $40

So if the top 100 priced items are all owned by the same person, the second item on the list would be the 101st-most expensive item.

Is this the correct interpretation?

devspectre · July 9, 2021, 10:36pm

Yes, that is correct interpretation

KenWhitesell · July 9, 2021, 11:52pm

Ok, so I was on the right track:

The issue here is that it’s sorted by owner and not by price, so truncating the list gets the first “n” alphabetically and not by price, and sorting by price first doesn’t work because distinct expressions must match order by expressions.

I’m going to keep looking at this…

KenWhitesell · July 10, 2021, 12:59am

I think I’ve got something working here by using a Subquery:

toy_sub = Toys.objects.filter(owner=OuterRef('owner')).order_by('-price').values('price').annotate(max_price=Max('price')).values('max_price')[:1]
toys = Toys.objects.annotate(max_price=Subquery(toy_sub)).distinct('owner', 'max_price').order_by('-max_price', 'owner')

devspectre · July 12, 2021, 5:21pm

Hi @KenWhitesell

I will try this.

Will keep you posted

Thanks

devspectre · August 3, 2021, 5:59am

@KenWhitesell
That did not solve the problem.

sub_query = (
    scores
    .filter(player=OuterRef('player'))
    .order_by('-score')
    .values('score')
    .annotate(max_score=Max('score'))
    .values('max_score')[:1]
)

This gives the following error.

*** ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

devspectre · August 3, 2021, 6:02am

So here’s the original QuerySet

scores = (
    scores
    .values('player', 'venuemachine_id')
    .annotate(max_score=Max('score'))
    .order_by('-max_score')[:count]
)

It worked well, grouped by player_id and ordered by max_score
Then I needed to get id of scores, so I added id to values

scores = (
    scores
    .values('id', 'player', 'venuemachine_id')
    .annotate(max_score=Max('score'))
    .order_by('-max_score')[:count]
)

Then it broke grouping by id instead of player_id.

I hope this explains a bit more.

devspectre · August 3, 2021, 6:10am

FYI

This is the raw SQL query of original QuerySet which does not contain id field.

SELECT "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id", MAX("venuemachine_score"."score") AS "max_score" FROM "venuemachine_score" WHERE ("venuemachine_score"."is_shared" = True AND "venuemachine_score"."venuemachine_id" IN (SELECT U0."id" FROM "venuemachine_venuemachine" U0 WHERE U0."machine_id" = 1195)) GROUP BY "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id" ORDER BY "max_score" DESC  LIMIT 20

This is SQL Query of QuerySet that contains id field

SELECT "venuemachine_score"."id", "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id", MAX("venuemachine_score"."score") AS "max_score" FROM "venuemachine_score" WHERE ("venuemachine_score"."is_shared" = True AND "venuemachine_score"."venuemachine_id" IN (SELECT U0."id" FROM "venuemachine_venuemachine" U0 WHERE U0."machine_id" = 1195)) GROUP BY "venuemachine_score"."id" ORDER BY "max_score" DESC  LIMIT 20

Please notice that difference of Group By clause.

KenWhitesell · August 3, 2021, 12:36pm

Correct, the actual query is both lines. You can’t execute the first line alone as a query. I do not see where you’ve tried to use what I supplied.

devspectre · August 3, 2021, 10:18pm

I tried that first but it didn’t work, so I tried other stuffs

awtimmering · August 4, 2021, 2:38am

I unfortunately don’t have time right now to mock this up in Django ORM, but the way you would probably normally approach this in SQL would be: (assuming I’m understanding what you’re trying to do correctly, based on Ken’s comment:

Annotate to add a RANK() (or rather a ROW_NUMBER() to avoid duplicates) by descending price, partitioned by owner, let’s say we call this owner_price_rank
Then filter the dataset on owner_price_rank == 1 (i.e. the most expensive item by each owner)
Then ORDER BY price DESC, and select the first 30 rows

For example:

WITH 
  OwnerRanking AS (
    SELECT
      mall_id,
      owner,
      price,
      ROW_NUMBER() OVER (PARTITION BY owner ORDER BY price DESC) AS owner_price_rank
    FROM
      sometable
  )
SELECT
  mall_id,
  owner,
  price
FROM OwnerRanking
WHERE
  owner_price_rank = 1
ORDER BY price DESC
LIMIT 30;

awtimmering · August 4, 2021, 6:41am

Thought I’d give it a shot. While this is the ‘normal’ way to get to it in SQL, Django ORM currently doesn’t allow filters on top of Windows, see ticket/28333.

If/when addressed you should be able to do something like below, but alas not yet … Back to plain SQL?

  qs = (
    models.Toy.objects
      .annotate(
        owner_price_rank=Window(
          expression=RowNumber(),
          partition_by=[F('owner')],
          order_by=F('price').desc()))
      .filter(owner_price_rank=1)
    [:30]
  )

Topic		Replies	Views
At a loss with ordering by one field and making distinct on another Using Django	3	893	August 10, 2020
Django queryset filter with Rank Using the ORM	13	7117	November 3, 2022
Need Help With Using Annotate in a Query Using Django	2	828	July 19, 2020
Find objects with mix Distinct and Order_by Using the ORM	8	3548	April 5, 2022
Custom manager With DISTINCT ON Using Django	4	724	June 4, 2021

Django queryset chain annotate and distinct together

Related topics