Distinct results for a ranked search

gunnerson · March 10, 2025, 10:31pm

I have a custom manager that returns a ranked by weights query:

class MyManager(models.Manager):
    
    def search(self, query, fa, fb=None, fc=None, fd=None, qs=None):
        try:
            if not (1 in [c in query for c in '&|()!*:"']):
                query = " & ".join([f"{q}:*" for q in query.split()])

            vector = SearchVector(*fa, weight="A", config="english")
            if fb:
                vector += SearchVector(*fb, weight="B", config="english")
            if fc:
                vector += SearchVector(*fc, weight="C", config="english")
            if fd:
                vector += SearchVector(*fd, weight="D", config="english")
            query = SearchQuery(query, search_type="raw")
            qs = (
                qs.annotate(search=vector, rank=SearchRank(vector, query))
                .filter(search=query)
                .order_by("-rank", "-id")
                .distinct("rank", "id")
            )

        except (ProgrammingError, UnboundLocalError):
            qs = qs.none()

        return qs

But if I try to search on related fields of different models, I get duplicate results. Is there a way around it?

```
class Case(models.Model):
    machine = models.ForeignKey(Machine)   
    user = models.ForeignKey(Profile)    

    objects = MyManager()

class Profile(models.Model):
    user = models.OneToOneField(User)

class CaseListView(ListView):
    model = Case

    def get_queryset(self):
        query = self.request.GET.get("query", None)
        if query:
            qs = self.model.objects.all()
            fa = (
                "id",
                "machine__serial_number",
                "machine__company__name",
                "user__user__first_name",
                "user__user__last_name",
            )
            fb = ("hashtags",)
            fc = ("caseprogress__desc",)
            qs = self.model.objects.search(
                query, fa, fb, fc, qs=qs
            )
            return qs

merwok · March 11, 2025, 4:38pm

This line does not eliminate duplicates, as the same row can be matched by multiple vectors each with a different rank. I think you are using the postgresql trick of order_by + distinct, which works great in other cases (get one record for each sub-group based on criterion), but it is not necessary for searches.

In my own code, I do something like:

        rank = SearchRank(vector, query)
        qs = self.annotate(rank=rank).distinct()
        qs = qs.filter(rank__gte=0.2)
        qs = qs.order_by("-rank", "title")

which takes care of duplicates. Strangely, in two other cases I don’t call distinct() and have no problem!

gunnerson · March 11, 2025, 6:26pm

That won’t do anything in my case. A workaround would be something like this:

pk_list = list(dict.fromkeys([i[0] for i in qs.values_list("id", "rank")]))
preserved = Case(*[When(pk=pk, then=pos) for pos, pk in enumerate(pk_list)])
qs = queryset.filter(pk__in=pk_list).order_by(preserved)

# Or on the DB level
qs = queryset.filter(pk__in=pk_list).extra(
            select={
                "ordering": f"CASE {" ".join( ["WHEN id=%s THEN %s" % (pk, i) for i, pk in enumerate(pk_list)])} END"
            },
            order_by=("ordering",),
        )

But converting queryset to a list of id’s is a lot of overhead and I wonder if there’s a better way.

merwok · March 11, 2025, 10:10pm

I don’t see why that would be. Our models and managers look similar. Have you tried my code with the distinct call?

A workaround to what problem?
You lost me with the code snippet that follows.

Topic		Replies	Views
SearchQuery and duplicate in queryset Using the ORM	4	5638	April 13, 2022
how to using `order_by` from relate model combine with `distinct` Forms & APIs	10	2643	November 17, 2022
At a loss with ordering by one field and making distinct on another Using Django	3	1115	August 10, 2020
SELECT Distinct on a many-to-many model Using the ORM	4	1242	March 4, 2024
problems with duplicates in querysets Using the ORM	1	347	June 16, 2023

Distinct results for a ranked search

Related topics