Using PostgreSQL Similarity % Operator with the ORM

Kandles11 · June 27, 2024, 3:06pm

I’ve been researching this subject pretty hard for the past few weeks, and I’ve yet to come up with a nice solution. I’m working on a search tool to use across my entire project, and I need to add fuzzy search. To achieve this, I followed the suggestion of the Django docs and implemented trigram search with the pg_trgm extension in my project.

Using the following query, I’m able to “fuzzily” search my DB, great!

from django.contrib.postgres.search import TrigramSimilarity

Author.objects.annotate(
    similarity=TrigramSimilarity('name', query),
).filter(similarity__gt=0.3).order_by('-similarity')
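For anyone wondering what the similarity score in that query measures, here is a rough pure-Python sketch of the trigram comparison that pg_trgm documents: each word is lowercased, padded with two leading spaces and one trailing space, and split into three-character chunks, and the score is the share of trigrams the two strings have in common. The function names `trigrams` and `similarity` are made up for illustration, and the word-splitting regex is an approximation, so treat this as a way to build intuition rather than a reimplementation of the extension.

```python
import re


def trigrams(text: str) -> set[str]:
    """Return the set of trigrams for text, roughly following pg_trgm's rules."""
    result = set()
    # Assumption: treat runs of letters and digits as words (approximation).
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two spaces in front, one behind
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("word", "two words"))
```

With the 0.3 cutoff used in the query above, "word" and "two words" would count as a match in this sketch, since they share 4 of 11 distinct trigrams.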

The issue with this is the speed. I found that the TrigramSimilarity function does not use any indexes created on the DB, including indexes that use the trigram operator class:

            GinIndex(name='trigram_task_title_idx', fields=['title'], opclasses=['gin_trgm_ops']),
            GinIndex(name='trigram_task_desc_idx', fields=['description'], opclasses=['gin_trgm_ops']),

My solution to this was using the PostgreSQL % operator! This utilizes the index, so all is well… Until I realized the only way I can get this to work is with Raw SQL queries.

SELECT id, title, description, owner_id,
       %(s_type)s AS s_type,
       (1 - (title <-> %(query)s)) AS similarity
FROM course_requirement
WHERE company_id = %(company_id)s
  AND (title %% %(query)s OR description %% %(query)s)
ORDER BY similarity
LIMIT 50

(The %% is an escaped literal % in the raw SQL, since a single % introduces a query parameter placeholder.)
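The escaping rule can be illustrated with Python's own % string formatting, which treats %% the same way. This is only an analogy: the database driver does its own quoting of values, and the pre-quoted string below is a stand-in for that step, not something to do in real queries.

```python
# Analogy only: a real driver quotes and escapes parameter values itself.
sql_template = "SELECT id FROM course_requirement WHERE title %% %(query)s"

# The %(query)s placeholder is filled in, and %% collapses to a single %.
rendered = sql_template % {"query": "'django'"}

print(rendered)
# SELECT id FROM course_requirement WHERE title % 'django'
```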

Anyway, this has caused issues, since we can no longer build on the QuerySet it returns. Because we get a RawQuerySet, we can’t chain any further Django ORM operations, and I’m forced to convert the results to a list of dicts to work with them, which just isn’t scalable.

Is there any way I can use the PostgreSQL % operator using the ORM, or convert the RawQuerySet to a normal QuerySet?

Thanks!

charettes · June 27, 2024, 3:23pm

@Kandles11

“Until I realized the only way I can get this to work is with Raw SQL queries.”

This should not be the case; using the % operator should be doable through a custom lookup.

Did you try implementing one?

Kandles11 · June 27, 2024, 3:27pm

I had never heard of this until now, so I haven’t tried it out. Upon first impressions, it looks like this would work! I’ll try it out and report back.

Thank you for your help!

charettes · June 27, 2024, 3:29pm

e.g.

from django.db.models import CharField, Lookup, TextField

class Similar(Lookup):
    lookup_name = "sim"

    def as_postgresql(self, compiler, connection):
        lhs_sql, lhs_params = self.process_lhs(compiler, connection)
        rhs_sql, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        return f"{lhs_sql} % {rhs_sql}", params

CharField.register_lookup(Similar)
TextField.register_lookup(Similar)

And then

from django.db.models import Q

Requirement.objects.filter(
    Q(title__sim=query) | Q(description__sim=query)
)

Kandles11 · June 27, 2024, 3:56pm

With one minor change (changing % to %% in the SQL string the lookup returns), this ended up working perfectly!

Checking the SQL explanation, it now uses the % operator, which uses the index!

I’ll have to refactor everything to utilize this, but it’ll all be worth it, and I know this is going to make a huge difference for everybody working on the project.

Thanks again!

sea256 · July 29, 2024, 10:33am

How do you set the threshold parameter in that case?

Kandles11 · July 29, 2024, 3:17pm

In this situation, I’m relying on the threshold set in the database. This means that it will apply to every query, and you can’t change it on the fly.

SET pg_trgm.similarity_threshold = 0.15;

More info about this can be found in the Postgres docs.
