Django queryset chain annotate and distinct together

Hi!

I have queryset which uses both annotate and distinct but it throws the error.
'Error: annotate() + distinct(fields) is not implemented.'

The code looks like this.

toys = (
    Toy
    .objects
    .values('owner', 'mall_id')
    .annotate(max_price=Max('price'))
    .order_by('-max_price', 'owner')
    .distinct('owner')[:30]
)

I want to get top expensive toys but only one per owner.

How can I get this done?

Thanks

try:
toys = Toy.objects.distinct('owner').order_by('owner', '-price')

Sorry, misinterpreted the original objective of the query.

To make sure I’m straight - you want the 30 most expensive toys, but if the same owner appears in that list, they aren’t included.

So, if you have the situation with the following rows in toys, and assuming you’re only looking for 3 entries instead of 30:

Owner    Price
Joe      $100
Jane     $70
Jane     $50
Fred     $40
Mary     $20

The list you’re looking for would look like this:

Owner    Price
Joe      $100
Jane     $70
Fred     $40

So if the top 100 priced items are all owned by the same person, the second item on the list would be the 101st-most expensive item.

Is this the correct interpretation?

Yes, that is correct interpretation

Ok, so I was on the right track:

The issue here is that it’s sorted by owner and not by price, so truncating the list gets the first “n” alphabetically and not by price, and sorting by price first doesn’t work because distinct expressions must match order by expressions.

I’m going to keep looking at this…

I think I’ve got something working here by using a Subquery:

toy_sub = Toys.objects.filter(owner=OuterRef('owner')).order_by('-price').values('price').annotate(max_price=Max('price')).values('max_price')[:1]
toys = Toys.objects.annotate(max_price=Subquery(toy_sub)).distinct('owner', 'max_price').order_by('-max_price', 'owner')
1 Like

Hi @KenWhitesell

I will try this.

Will keep you posted

Thanks

@KenWhitesell
That did not solve the problem.

sub_query = (
    scores
    .filter(player=OuterRef('player'))
    .order_by('-score')
    .values('score')
    .annotate(max_score=Max('score'))
    .values('max_score')[:1]
)

This gives the following error.

*** ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

So here’s the original QuerySet

scores = (
    scores
    .values('player', 'venuemachine_id')
    .annotate(max_score=Max('score'))
    .order_by('-max_score')[:count]
)

It worked well, grouped by player_id and ordered by max_score
Then I needed to get id of scores, so I added id to values

scores = (
    scores
    .values('id', 'player', 'venuemachine_id')
    .annotate(max_score=Max('score'))
    .order_by('-max_score')[:count]
)

Then it broke grouping by id instead of player_id.

I hope this explains a bit more.

FYI

This is the raw SQL query of original QuerySet which does not contain id field.

SELECT "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id", MAX("venuemachine_score"."score") AS "max_score" FROM "venuemachine_score" WHERE ("venuemachine_score"."is_shared" = True AND "venuemachine_score"."venuemachine_id" IN (SELECT U0."id" FROM "venuemachine_venuemachine" U0 WHERE U0."machine_id" = 1195)) GROUP BY "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id" ORDER BY "max_score" DESC  LIMIT 20

This is SQL Query of QuerySet that contains id field

SELECT "venuemachine_score"."id", "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id", MAX("venuemachine_score"."score") AS "max_score" FROM "venuemachine_score" WHERE ("venuemachine_score"."is_shared" = True AND "venuemachine_score"."venuemachine_id" IN (SELECT U0."id" FROM "venuemachine_venuemachine" U0 WHERE U0."machine_id" = 1195)) GROUP BY "venuemachine_score"."id" ORDER BY "max_score" DESC  LIMIT 20

Please notice that difference of Group By clause.

Correct, the actual query is both lines. You can’t execute the first line alone as a query. I do not see where you’ve tried to use what I supplied.

I tried that first but it didn’t work, so I tried other stuffs

I unfortunately don’t have time right now to mock this up in Django ORM, but the way you would probably normally approach this in SQL would be: (assuming I’m understanding what you’re trying to do correctly, based on Ken’s comment:

  1. Annotate to add a RANK() (or rather a ROW_NUMBER() to avoid duplicates) by descending price, partitioned by owner, let’s say we call this owner_price_rank
  2. Then filter the dataset on owner_price_rank == 1 (i.e. the most expensive item by each owner)
  3. Then ORDER BY price DESC, and select the first 30 rows

For example:

WITH 
  OwnerRanking AS (
    SELECT
      mall_id,
      owner,
      price,
      ROW_NUMBER() OVER (PARTITION BY owner ORDER BY price DESC) AS owner_price_rank
    FROM
      sometable
  )
SELECT
  mall_id,
  owner,
  price
FROM OwnerRanking
WHERE
  owner_price_rank = 1
ORDER BY price DESC
LIMIT 30;

Thought I’d give it a shot. While this is the ‘normal’ way to get to it in SQL, Django ORM currently doesn’t allow filters on top of Windows, see ticket/28333.

If/when addressed you should be able to do something like below, but alas not yet … Back to plain SQL? :slight_smile:

  qs = (
    models.Toy.objects
      .annotate(
        owner_price_rank=Window(
          expression=RowNumber(),
          partition_by=[F('owner')],
          order_by=F('price').desc()))
      .filter(owner_price_rank=1)
    [:30]
  )