How to check search terms for keywords efficiently?

I am trying to find search terms that match certain keywords. These are my models:

from django.db import models


class SearchTerm(models.Model):
    text = models.CharField(null=False, unique=True, blank=False, max_length=80)


class KeyWord(models.Model):
    text = models.CharField(null=False, unique=True, blank=False, max_length=80)

Some sample data could be:

SearchTerm
----------
smith
april smith
shirley lee
travis lopez
eric berry lee
jason meza

KeyWord
-------
smith
lee berry
lopez travis adams

The simplest way is probably to add a method to SearchTerm like:

def is_checked(self):
    …

and then iterate through all keywords and do something like

if set(key_word.text.split()).issubset(set(self.text.split())):
    return True

The result would be:

  1. True for “smith” because of keyword “smith”
  2. True for “april smith” because of keyword “smith”
  3. False for “shirley lee”
  4. False for “travis lopez”
  5. True for “eric berry lee” because of keyword “lee berry”
  6. False for “jason meza”
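For reference, the check described above can be written as a small standalone sketch. It uses plain strings instead of model instances, and the function name matches_any_keyword is only for illustration:

```python
def matches_any_keyword(search_term, keywords):
    """Return True if every word of at least one keyword occurs in the search term."""
    term_words = set(search_term.split())
    return any(set(keyword.split()) <= term_words for keyword in keywords)


keywords = ["smith", "lee berry", "lopez travis adams"]
search_terms = [
    "smith",
    "april smith",
    "shirley lee",
    "travis lopez",
    "eric berry lee",
    "jason meza",
]

# Maps each search term to True/False, in the same order as the list above.
results = {term: matches_any_keyword(term, keywords) for term in search_terms}
```

With the sample data this gives True for “smith”, “april smith” and “eric berry lee”, and False for the other three.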

This works fine, but takes a while if there are many keywords (a couple of thousand). How could this be done more efficiently?

What about filtering the keywords first instead of checking them all? That way I could discard keywords that do not contain any word of the search term. Is there an easier way to do this than splitting the search term, filtering KeyWords for each part and then taking the union of the results?
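One way to sketch this pre-filtering idea in memory is a word-to-keywords index, so that only keywords sharing at least one word with the search term are checked. This is a minimal sketch on plain strings, not a database query; build_index, candidate_keywords and is_checked_with_index are made-up names:

```python
from collections import defaultdict


def build_index(keywords):
    """Map each word to the set of keywords that contain it."""
    index = defaultdict(set)
    for keyword in keywords:
        for word in keyword.split():
            index[word].add(keyword)
    return index


def candidate_keywords(search_term, index):
    """Union of all keywords that share at least one word with the search term."""
    candidates = set()
    for word in search_term.split():
        candidates |= index.get(word, set())
    return candidates


def is_checked_with_index(search_term, index):
    """Run the subset check only on the pre-filtered candidates."""
    term_words = set(search_term.split())
    return any(
        set(keyword.split()) <= term_words
        for keyword in candidate_keywords(search_term, index)
    )


index = build_index(["smith", "lee berry", "lopez travis adams"])
```

The index is built once and reused for every search term, so a term like “jason meza” is rejected without looking at any keyword.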

Another idea was to store the number of words in KeyWord and compare it to the number of words in the search term. There is no need to check a keyword like lopez travis adams against travis lopez, for example, because it simply has too many words.
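The word-count idea could look roughly like this, again as an in-memory sketch on plain strings. The helper names group_by_word_count and is_checked_by_length are assumptions for illustration:

```python
from collections import defaultdict


def group_by_word_count(keywords):
    """Group keyword word sets by how many distinct words they contain."""
    groups = defaultdict(list)
    for keyword in keywords:
        words = frozenset(keyword.split())
        groups[len(words)].append(words)
    return groups


def is_checked_by_length(search_term, groups):
    """Skip every keyword that has more words than the search term."""
    term_words = set(search_term.split())
    for count, keyword_sets in groups.items():
        if count > len(term_words):
            continue  # too long to ever be a subset
        if any(words <= term_words for words in keyword_sets):
            return True
    return False


groups = group_by_word_count(["smith", "lee berry", "lopez travis adams"])
```

For “travis lopez” the three-word keyword is skipped entirely, and only the one-word and two-word keywords are compared.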

Actually, that’s probably quite a standard problem and there should be better solutions than mine. I would be grateful for any hints. Thanks a lot!

Hi

It looks like you’re trying to implement full text search.

Django has support for PostgreSQL’s full text search: Full text search | Django documentation | Django

See also Paolo’s talk and slides on the topic: Paolo Melchiorre - Full-Text Search in Django with PostgreSQL - YouTube / pauloxnet – Full-Text Search in Django with PostgreSQL.

Thanks a lot for your message. I already had a quick look at full text search, and my impression was that it is far more than I need. I do not need most of its features like stemming, ranking, stop words, multiple language support and so on.

What I need is more like tagging. I could save a tag for each word of my search term or keyword and then filter on these tags. Do you think it’s worth checking out a module like django-taggit for that?

OK, meanwhile I tried django-taggit and it’s quite easy to add tags for keywords with that. I also played a little bit with permutations.

Then I found a bug in my original program which I fixed and now it’s “fast enough” for my needs even without tagging. Anyway, it’s good to know that there is still room for improvement, just in case it’s getting too slow when the number of keywords grows.

Ok great!

Yes, a tag-based solution is what you were looking for, rather than full text search. You don’t necessarily need to use a package for it - you’re really just looking for a tag model to relate to with a ManyToManyField.