Setting trigram similarity thresholds?

chrisjdixon · April 26, 2023, 3:59am

Asked over at SO, we’ve set up a __trigram_similar lookup which works nicely, except we don’t know how to adjust “the current similarity threshold” that the docs mention.

No responses on SO and we’re unsure how to better describe the issue, so we might be out of luck. Can we adjust the threshold somewhere in queryset.filter(colname__trigram_similar= search_value) ?

Thanks for any help!

KenWhitesell · April 26, 2023, 1:17pm

If you follow those docs through to the PostgreSQL docs, you’ll see that the pg_trgm.similarity_threshold is changed by a SET command and is not a parameter to any of the query functions.

From what I can see at PostgreSQL: Documentation: 15: SET, this probably isn’t something that you can set on the fly. (In other words, I don’t think you could change this on a per-query basis.) If you really want it “project-wide”, there may be a way of specifying that setting in the OPTIONS section of your DATABASES settings.

However, if you do want to filter search results based on the similarity, see the example at Full text search | Django documentation | Django.

leandrodesouzadev · April 26, 2023, 1:33pm

Hey there!
This is from the extension docs.
This looks like a configuration done on the database, i had found this SO post on how to set it.

chrisjdixon · April 27, 2023, 1:44am

OK, thank you for the help. I’m confused by your last point about “filter[ing] search results based on the similarity”. Isn’t that what I’m asking?

This SO answer also seems to set it on the fly with .filter(similarity__gt=0.3).order_by('-similarity'), although this is applied in a different way I don’t understand. Could it be done like this somehow or do I have the wrong idea?

Setting the similarity threshold across our different environments isn’t a big deal but we’re keen to minimize the number of moving parts we have to think about. We can always set it in Postgres with no dramas.

KenWhitesell · April 27, 2023, 2:18am

I wasn’t sure. It wasn’t clear to me whether this was something you wanted on a per-query basis or a global setting.

Specifically, you asked:

and

The mechanism shown in the docs doesn’t adjust “the current similarity threshold” as you asked. It annotates each instance of the model in the query with the current similarity score. It’s then up to you filter the queryset based on the value added to the row. This results in a different SQL query being generated by the ORM

No, that not an accurate restatement of the answer provided.

Again, it’s not setting or adjusting anything - at the database layer, these are two separate and distinct operations being performed.

The trigram_similar operation only returns rows that have a similarity score greater than the defined value. It is a filter in that it only returns rows with a score greater than the defined value.

This other solution annotates every row with that similarity score, leaving it up to you to add a filter. It’s a different database operation being performed.

It actually gives you more flexibility in that you can now sort by that score, or even apply other filters to categorize the results into groupings.

Now, this distinction may not matter to you - and that’s fine. You probably don’t need to account for this difference. But you should be aware of it in case you ever encounter a situation where it’s important.

chrisjdixon · April 30, 2023, 3:08am

Thank you very much guys, and thank you for explaining the difference between setting it on the fly and what that other answer was doing. We ended up setting it manually in Postgres without issue. Should be fine until we forget about it moving environments!

Topic		Replies	Views
Using PostgreSQL Similarity % Operator with the ORM Using the ORM	6	296	July 29, 2024
`TrigramSimiliarity` on `ArrayField` Using the ORM	10	1443	November 3, 2023
order_by doesn't seem to order by UTF-8 Using the ORM	5	1933	October 5, 2023
Remove "stop words" from Postgres full text search Using the ORM	1	1028	December 22, 2022
Docs / docs (pull_request) Failing after 1m ORM	23	508	January 14, 2024

Setting trigram similarity thresholds?

Related topics