DRF search customization

I have a ViewSet that has two search_fields.
The problem is I want to ignore any apostrophe or quotes in search.
This means
The DB contains - (Andy’s shoes)
The user inputs

  1. (Andy"s shoes)
  2. (Andy`s shoes)
  3. (Andy’s shoes)
in the search field.

For now, the search only works for #3. I want to make it work for all of them (1, 2, 3).

What is the correct approach to get this done?

Search can be a complicated topic and there are a range of answers depending on your needs. Roughly, you could:

  1. Use LIKE queries in your db (I don’t think that will work for the scenario you’ve described)
  2. If your database is Postgres, use some of its built-in searching capabilities.
  3. Use a third party service like Algolia to do advanced searching.
  4. Add a search tool like ElasticSearch to your setup.

@wsvincent presented on this topic at DjangoCon 2019. You might want to check out that video for more details and options.

I will take a look at that

But I think there’s a way to do this without involving any third-party tools.
I want to get it done with Django.

So I think you may need to be a little more precise as to what you’re looking for.

You wrote:

The problem is I want to ignore any apostrophe or quotes in search.

That’s actually ambiguous. You mention:

The DB contains - (Andy’s shoes)

This isn’t a trivial issue. There are many edge cases and conditions that may apply, and being very precise about them may affect which solutions are considered viable. There are a lot more conditions than the subset you supplied.

You show three options for what matches. What about the following inputs?

  • Andy’'s shoes (That’s two adjacent apostrophes)
  • Andys shoes
  • Andys’ shoes
  • Andy s shoes
  • Andy-s shoes
  • Andy*s shoes (* - common wildcard matching character)
  • Andy.s shoes (. - the regex match-anything character)
  • Andy%s shoes (% - the SQL match any character)

Then, what if your database contains something like “Andys shoes” - do any of these match?

  • Andy’s shoes, Andy"s shoes, Andy`s shoes
  • Andys’ shoes
  • Andy s shoes
  • Andy-s shoes

Likewise, what if your database contains something like “Andys’ shoes” - do any of these match?

  • Andy’s shoes, Andy"s shoes, Andy`s shoes
  • Andys" shoes, Andys` shoes
  • Andy s shoes
  • Andy-s shoes
  • Andys shoes

These may all be considered different cases depending upon what you mean by “ignore”. While the ’s form is the valid possessive form for proper names not ending with s, there are other situations to consider: a proper name that ends with “s”, leading to a possessive form such as “Jones’”, or the syntactic and semantic difference between “its” (possession) and “it’s” (contraction of “it is”). How you choose to handle these conditions will affect the accuracy of your searches.

(Note: One option that I have seen done in a slightly different situation is that there is a second column containing the “search data”. The basic field contains the raw / entered data for that row, while the “search data” column contains data that has been normalized to a standard format - and all searches for data in that table are performed on the normalized data.)
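
As a rough illustration of the “search data” idea above, here is a minimal sketch of a normalization function. The name normalize_for_search and the exact set of characters to drop are assumptions made for the example, not something specified in this thread. The same function would be applied both when saving the normalized column and to the user's input before querying.

```python
import re

# Characters to ignore when searching. The curly apostrophe (U+2019) is
# included as an assumption, because the stored example uses it.
QUOTE_CHARS = "'\"`\u2019"
_QUOTES_RE = re.compile("[" + re.escape(QUOTE_CHARS) + "]")


def normalize_for_search(value: str) -> str:
    """Drop quote-like characters and fold case (hypothetical helper)."""
    return _QUOTES_RE.sub("", value).casefold()


# The stored value and every user variant reduce to the same string.
stored = normalize_for_search("Andy\u2019s shoes")
for user_input in ('Andy"s shoes', "Andy`s shoes", "Andy's shoes"):
    assert normalize_for_search(user_input) == stored
```

Note that with this choice “Andys shoes” also matches, which is one of the edge cases listed above. Whether that is acceptable depends on what “ignore” should mean for your data.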

Ah, to be precise, I only consider these 3 characters: [`, ", ']

I think this can be done using a raw SQL query with a regular expression.
Then I need to override the method that filters the data based on search_fields.
But I am not sure which method it is on ModelViewSet.

I just want to add that extra filtering to the current queryset.

I solved the issue by overriding the filter_queryset method of SearchFilter.
Here’s the code for that.

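# Imports assume the DRF release this was written against, where
# rest_framework.compat.distinct exists and construct_search() takes
# only the field name.
import operator
import re
from functools import reduce

from django.db import models
from rest_framework.compat import distinct
from rest_framework.filters import SearchFilter


class CustomSearchFilter(SearchFilter):
    def filter_queryset(self, request, queryset, view):
        search_fields = self.get_search_fields(view, request)
        search_terms = self.get_search_terms(request)

        if not search_fields or not search_terms:
            return queryset

        # Assumed reconstruction: the body of this list was cut off in the
        # post and is restored here from DRF's own SearchFilter.
        orm_lookups = [
            self.construct_search(str(search_field))
            for search_field in search_fields
        ]

        base = queryset
        conditions = []
        for search_term in search_terms:
            # Only fields declared with the '$' prefix yield an iregex lookup;
            # those get the quote-tolerant pattern, the rest get the raw term.
            queries = [
                models.Q(**{orm_lookup: self.get_regex_term(search_term)})
                if orm_lookup.endswith('iregex') else models.Q(**{orm_lookup: search_term})
                for orm_lookup in orm_lookups
            ]
            conditions.append(reduce(operator.or_, queries))
        queryset = queryset.filter(reduce(operator.and_, conditions))

        if self.must_call_distinct(queryset, search_fields):
            # Filtering against a many-to-many field requires us to
            # call queryset.distinct() in order to avoid duplicate items
            # in the resulting queryset.
            # We try to avoid this if possible, for performance reasons.
            queryset = distinct(queryset, base)
        return queryset

    def get_regex_term(self, term):
        # Replace each backtick, apostrophe or double quote with a character
        # class matching any of the three. Other regex metacharacters in the
        # term are passed through unescaped.
        return re.sub(r'[`\'"]', '[`\'"]', term)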
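
To see what kind of pattern get_regex_term builds, here is a standalone sketch that can be run without Django. It is a variant rather than the exact method above: it also escapes other regex metacharacters in the term, so that input such as Andy.s is matched literally. The module-level function and the QUOTES constant are assumptions for the example, and Python's re module stands in for the database's regex engine, whose syntax may differ.

```python
import re

QUOTES = "`'\""              # the three characters named in this thread
QUOTE_CLASS = "[`'\"]"       # regex class matching any one of them


def get_regex_term(term: str) -> str:
    """Build a pattern where each quote matches any of the three quotes."""
    return "".join(
        QUOTE_CLASS if ch in QUOTES else re.escape(ch)
        for ch in term
    )


pattern = re.compile(get_regex_term('Andy"s'), re.IGNORECASE)
assert pattern.search("Andy's shoes")
assert pattern.search("andy`s shoes")
assert not pattern.search("Andys shoes")   # a quote is still required
```

If the stored data uses the curly apostrophe (’), it would have to be added to both QUOTES and QUOTE_CLASS for those rows to match.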