At a loss with ordering by one field and making distinct on another

Hey everyone,

I hope this is the right place for this. I’ll try to describe it as simply as possible, but my head is spinning a bit after a few hours of googling. :slight_smile: I have a Building model, and I want to add a few annotations for a particular client, so for each Building I’m calculating distance from the client, and how many of the things the clients wants (does it have a pool, AC included, etc.) are satisfied. I can do that easily enough with something like:

buildings = Building.objects.all()
buildings = buildings.annotate(distance=Distance('location', client_location))

# This is an example because I can't share the real code, but assume this gives the number of requests matched
buildings = buildings.annotate(matches=____)

buildings.order_by('matches', 'distance')

The part that ruins it for me is that the client also doesn’t want me to recommend more than one building in the same neighborhood. So what I’m trying to do is first sort by matches and distance, and then eliminate any duplicate neighborhoods. So something like buildings.order_by('matches', 'distance').distinct('neighborhood'). That gives me this error:

django.db.utils.ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

I can solve it by starting the order_by with neighborhood, but then they’re no longer ordered correctly. There’s a Django bug report that brings up a similar problem which suggests solving it with a subquery, but if I do that I have to recompute the annotations which are somewhat slow, because they are in the subquery. So is there any way to make this work?

1 Like

Before you summarily reject a solution as being too slow, I recommend looking at the queries being generated to see if that’s really the case. You can also do some testing to see how slow it is - try to determine what exactly the effects are of each additional clause or annotation you add to the query.

Regarding your specific question, you might be able to generate the first part with one order_by, then re-sort the QuerySet by the desired order.

Your first comment is almost definitely fair - it’s slower but probably trivial compared to other parts of this endpoint, so I’ll just go with that solution. I think what’s getting me is it feels like this should be possible. But maybe it just isn’t and I’ll have to live with that and accept that life is hard. :slight_smile: I appreciate the help and the suggestion, unfortunately it still gives the same error. I also tried evaluating the queryset by calling list(buildings) prior to the second order_by but that didn’t work.

buildings.order_by('neighborhood', 'matches', 'distance').distinct('neighborhood').order_by('matches', 'distance') 

This is one of those situations where I’d be tempted to work backwards by writing my SQL query first, then working it backwards into the ORM. (Or, if I find that to be too hard to understand, I’d leave it as a raw SQL query.) I’m not one who believes that the ORM must be used for everything, however, if I can work the query back into ORM notation, I generally prefer doing that.