Case insensitive pattern lookups (icontains, istartswith and iendswith) in postgres don't use ILIKE and thereby miss out on potential performance boosts from Gin indexes

Steps to reproduce and explanation:

  1. Create a charfield or a textfield
  2. Query it using icontains, istartswith or iendswith
  3. Resulting query looks something like (pun unintended) …UPPER(field_name) LIKE UPPER(value) ← this can’t make use of any Gin indexes on that field
  4. On the other hand, if it compiled to field_name ILIKE value it could and would be a great performance boost!

For more details (and as a starting point towards a fix) check operators and pattern_ops fields of the class DatabaseWrapper in django/db/backends/postgresql/base.py.
My initial impression is that they should be changed to:

{
        "exact": "= %s",
        "iexact": "= UPPER(%s)",
        "contains": "LIKE %s",
        "icontains": "ILIKE %s",
        "regex": "~ %s",
        "iregex": "~* %s",
        "gt": "> %s",
        "gte": ">= %s",
        "lt": "< %s",
        "lte": "<= %s",
        "startswith": "LIKE %s",
        "endswith": "LIKE %s",
        "istartswith": "ILIKE %s",
        "iendswith": "ILIKE %s",
}

and

{
        "contains": "LIKE '%%' || {} || '%%'",
        "icontains": "ILIKE '%%' || {} || '%%'",
        "startswith": "LIKE {} || '%%'",
        "istartswith": "ILIKE {} || '%%'",
        "endswith": "LIKE '%%' || {}",
        "iendswith": "ILIKE '%%' || {}",
    }

respectively.

I’d assume this was either overlooked or intentionally left less-performant for backward-compatibility with previous versions of postgres or something like that. I’m posting this in hopes that it’s the former and not the latter

I think you might interested in the following tickets

The TL;DR is that Django never got into the business of creating LIKE specialized indices of all text columns and thus defaults to the UPPER approach (which could also take advantage of a b-tree index on UPPER(field_name)).

Now since the ORM allows you to define custom lookups or even override the existing ones the consensus was that we’d document what SQL each lookups generate by default (note that no one pushed it through the finish line yet) and refer to the custom lookups documentation if a need to override them arise.

In the case of icontains you could use the following logic to achieve what you’re after.

from django.db.models.fields import Field
from django.db.models.lookups import IContains

class ILike(IContains):
    def get_rhs_op(self, connection, rhs):
        if connection.vendor == "postgres":
            return f"ILIKE {rhs}"
        return super().get_rhs_op(connection, rhs)

# This will override `__icontains` to use `ILIKE` on Postgres
Field.register_lookup(ILike)

# This will register a new `__ilike` lookup while keeping
# `__icontains` unchanged on Postgres
Field.register_lookup(ILike, lookup_name="ilike")

Hello, I’ve tried implementing your solution, but only the part after ILIKE changes and I get something like this:
WHERE UPPER("scsapp_pack"."barcode"::text) ILIKE '%D250301%'

Do you have any solution to make the ORM also change the left part?

I’ve been working on this exact thing over here, and you’re right, the suggested solution only affects the right-hand side, but keeps the UPPER in the left-hand side. Here’s a simple working version of what I have come up so far, with the caveat that this only works for static values, and won’t work with any more advanced references or SQL expressions in the right-hand side. That part is still work-in-progress, and I can’t promise I’ll get around to it anytime soon.

from django.db.models import Field
from django.db.models.lookups import IContains, IEndsWith, IStartsWith


class ILikeMixin:
    def process_lhs(self, compiler, connection, lhs=None):
        # We pretend to be the case-sensitive counterpart to this lookup
        # while processing the left-hand side to skip the UPPER call
        # inserted for case-insensitive lookups by connection.ops, but
        # otherwise maintain the actual lookup_name.
        self.lookup_name = self.lookup_name.lstrip("i")
        try:
            return super().process_lhs(compiler, connection, lhs)
        finally:
            del self.__dict__["lookup_name"]

    def get_rhs_op(self, connection, rhs):
        return "ILIKE %s" % rhs


@Field.register_lookup
class IContainsILike(ILikeMixin, IContains):
    pass


@Field.register_lookup
class IStartsWithILike(ILikeMixin, IStartsWith):
    pass


@Field.register_lookup
class IEndsWithILike(ILikeMixin, IEndsWith):
    pass
1 Like