Hi!
I have queryset which uses both annotate and distinct but it throws the error.
'Error: annotate() + distinct(fields) is not implemented.'
The code looks like this.
toys = (
Toy
.objects
.values('owner', 'mall_id')
.annotate(max_price=Max('price'))
.order_by('-max_price', 'owner')
.distinct('owner')[:30]
)
I want to get top expensive toys but only one per owner.
How can I get this done?
Thanks
try:
toys = Toy.objects.distinct('owner').order_by('owner', '-price')
Sorry, misinterpreted the original objective of the query.
To make sure Iâm straight - you want the 30 most expensive toys, but if the same owner appears in that list, they arenât included.
So, if you have the situation with the following rows in toys, and assuming youâre only looking for 3 entries instead of 30:
Owner Price
Joe $100
Jane $70
Jane $50
Fred $40
Mary $20
The list youâre looking for would look like this:
Owner Price
Joe $100
Jane $70
Fred $40
So if the top 100 priced items are all owned by the same person, the second item on the list would be the 101st-most expensive item.
Is this the correct interpretation?
Yes, that is correct interpretation
Ok, so I was on the right track:
The issue here is that itâs sorted by owner and not by price, so truncating the list gets the first ânâ alphabetically and not by price, and sorting by price first doesnât work because distinct
expressions must match order by
expressions.
Iâm going to keep looking at thisâŚ
I think Iâve got something working here by using a Subquery:
toy_sub = Toys.objects.filter(owner=OuterRef('owner')).order_by('-price').values('price').annotate(max_price=Max('price')).values('max_price')[:1]
toys = Toys.objects.annotate(max_price=Subquery(toy_sub)).distinct('owner', 'max_price').order_by('-max_price', 'owner')
1 Like
@KenWhitesell
That did not solve the problem.
sub_query = (
scores
.filter(player=OuterRef('player'))
.order_by('-score')
.values('score')
.annotate(max_score=Max('score'))
.values('max_score')[:1]
)
This gives the following error.
*** ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
So hereâs the original QuerySet
scores = (
scores
.values('player', 'venuemachine_id')
.annotate(max_score=Max('score'))
.order_by('-max_score')[:count]
)
It worked well, grouped by player_id
and ordered by max_score
Then I needed to get id
of scores, so I added id
to values
scores = (
scores
.values('id', 'player', 'venuemachine_id')
.annotate(max_score=Max('score'))
.order_by('-max_score')[:count]
)
Then it broke grouping by id
instead of player_id
.
I hope this explains a bit more.
FYI
This is the raw SQL query of original QuerySet which does not contain id
field.
SELECT "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id", MAX("venuemachine_score"."score") AS "max_score" FROM "venuemachine_score" WHERE ("venuemachine_score"."is_shared" = True AND "venuemachine_score"."venuemachine_id" IN (SELECT U0."id" FROM "venuemachine_venuemachine" U0 WHERE U0."machine_id" = 1195)) GROUP BY "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id" ORDER BY "max_score" DESC LIMIT 20
This is SQL Query of QuerySet that contains id
field
SELECT "venuemachine_score"."id", "venuemachine_score"."player_id", "venuemachine_score"."venuemachine_id", MAX("venuemachine_score"."score") AS "max_score" FROM "venuemachine_score" WHERE ("venuemachine_score"."is_shared" = True AND "venuemachine_score"."venuemachine_id" IN (SELECT U0."id" FROM "venuemachine_venuemachine" U0 WHERE U0."machine_id" = 1195)) GROUP BY "venuemachine_score"."id" ORDER BY "max_score" DESC LIMIT 20
Please notice that difference of Group By clause.
Correct, the actual query is both lines. You canât execute the first line alone as a query. I do not see where youâve tried to use what I supplied.
I tried that first but it didnât work, so I tried other stuffs
I unfortunately donât have time right now to mock this up in Django ORM, but the way you would probably normally approach this in SQL would be: (assuming Iâm understanding what youâre trying to do correctly, based on Kenâs comment:
- Annotate to add a
RANK()
(or rather a ROW_NUMBER()
to avoid duplicates) by descending price, partitioned by owner, letâs say we call this owner_price_rank
- Then filter the dataset on
owner_price_rank == 1
(i.e. the most expensive item by each owner)
- Then
ORDER BY price DESC
, and select the first 30 rows
For example:
WITH
OwnerRanking AS (
SELECT
mall_id,
owner,
price,
ROW_NUMBER() OVER (PARTITION BY owner ORDER BY price DESC) AS owner_price_rank
FROM
sometable
)
SELECT
mall_id,
owner,
price
FROM OwnerRanking
WHERE
owner_price_rank = 1
ORDER BY price DESC
LIMIT 30;
Thought Iâd give it a shot. While this is the ânormalâ way to get to it in SQL, Django ORM currently doesnât allow filters on top of Windows, see ticket/28333.
If/when addressed you should be able to do something like below, but alas not yet ⌠Back to plain SQL?
qs = (
models.Toy.objects
.annotate(
owner_price_rank=Window(
expression=RowNumber(),
partition_by=[F('owner')],
order_by=F('price').desc()))
.filter(owner_price_rank=1)
[:30]
)