Using PostgreSQL Similarity % Operator with the ORM

I’ve been researching this subject pretty hard for the past few weeks, and I’ve yet to come up with a nice solution. I’m working on a searching tool to utilize across my entire project, and needing to add fuzzy search. To achieve this, I followed the suggestion of the Django Docs and implemented the trigram search method, pg_trgm, in my project.

Using the following query, I’m able to “fuzzily” search my DB, great!

Author.objects.annotate(
...     similarity=TrigramSimilarity('name', query),
... ).filter(similarity__gt=0.3).order_by('-similarity')

The issue with this is the speed. I found when using the TrigramSImilarity function, it does not utilize any indexes created on the DB, including the ones using the Trigram Operator indexes

            GinIndex(name='trigram_task_title_idx', fields=['title'], opclasses=['gin_trgm_ops']),
            GinIndex(name='trigram_task_desc_idx', fields=['description'], opclasses=['gin_trgm_ops']),

My solution to this was using the PostgreSQL % operator! This utilizes the index, so all is well… Until I realized the only way I can get this to work is with Raw SQL queries.

SELECT id, title, description, owner_id, %(s_type)s AS s_type, (1 - (title <-> %(query)s)) AS similarity FROM course_requirement WHERE company_id=%(company_id)s AND (title %% %(query)s OR description %% %(query)s) order by similarity limit 50

(The %% represents % in Raw SQL)

Anyway, this has caused issues since now we cannot extend off of the QuerySet it returns. Since we are given a RawQuerySet, we can’t perform any Django ORM operations, and I’m forced to convert to a list of dicts for any way to work with them, which just isn’t scalable.

Is there any way I can use the PostgreSQL % operator using the ORM, or convert the RawQuerySet to a normal QuerySet?

Thanks!

1 Like

:wave: @Kandles11

Until I realized the only way I can get this to work is with Raw SQL queries.

This should not be the case, using the % operator should be doable through custom lookups.

Did you try implementing one?

1 Like

I had never heard of this until now, so I haven’t tried it out. Upon first impressions, it looks like this would work! I’ll try it out and report back.

Thank you for your help!

e.g.

from django.db.models import CharField, Lookup, TextField

class Similar(Lookup):
    lookup_name = "sim"

    def as_postgresql(self, compiler, connection):
        lhs_sql, lhs_params = self.process_lhs(compiler, connection)
        rhs_sql, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        return f"{lhs_sql} % {rhs_sql}", params

CharField.register_lookup(Similar)
TextField.register_lookup(Similar)

And then

Requirement.objects.filter(
   Q(title__sim=query) | Q(description__sim=query)
)
2 Likes

With one minor change, (changing % to %%), this ended up working perfectly!

Checking the SQL explanation, it now uses the % operator, which uses the index!

I’ll have to refactor everything to utilize this, but it’ll all be worth it, and I know this is going to make a huge difference for everybody working on the project.

Thanks again!

1 Like

How do you set the threshold parameter in that case?

In this situation, I’m relying on the threshold set in the database. This means that it will apply for every query, and you can’t change it on the fly.

SET pg_trgm.similarity_threshold = 0.15;

More info about this can be found here:
Postgres Docs