I am trying to find the search terms that match certain keywords. These are my models:
from django.db import models

class SearchTerm(models.Model):
    text = models.CharField(null=False, unique=True, blank=False, max_length=80)

class KeyWord(models.Model):
    text = models.CharField(null=False, unique=True, blank=False, max_length=80)
Some sample data could be:
SearchTerm
----------
smith
april smith
shirley lee
travis lopez
eric berry lee
jason meza
KeyWord
-------
smith
lee berry
lopez travis adams
The simplest way is probably to add a method to SearchTerm like:
def is_checked(self):
    …
and then iterate through all keywords and do something like
if set(key_word.text.split()).issubset(
    set(self.text.split())
):
    return True
The result would be:
- True for “smith” because of keyword “smith”
- True for “april smith” because of keyword “smith”
- False for “shirley lee”
- False for “travis lopez”
- True for “eric berry lee” because of keyword “lee berry”
- False for “jason meza”
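To make the check concrete, here is a minimal sketch of it in plain Python, without the ORM. It assumes the texts are held in plain lists rather than model instances, and it uses a free function taking the keywords as an argument instead of the model method, so it only illustrates the logic:

```python
# Sample data from above, held in plain lists instead of model instances
# (an assumption made so the sketch runs without Django).
SEARCH_TERMS = ["smith", "april smith", "shirley lee",
                "travis lopez", "eric berry lee", "jason meza"]
KEYWORDS = ["smith", "lee berry", "lopez travis adams"]


def is_checked(search_term, keywords):
    """Return True if all words of at least one keyword occur in search_term."""
    term_words = set(search_term.split())
    # any() stops at the first keyword whose words are all in the search term.
    return any(set(kw.split()).issubset(term_words) for kw in keywords)


results = {term: is_checked(term, KEYWORDS) for term in SEARCH_TERMS}
for term, checked in results.items():
    print(f"{term!r}: {checked}")
```

Every search term is still compared against every keyword here, which is exactly the part that gets slow.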
This works fine, but it takes a while if there are many keywords (a couple of thousand). How could this be done more efficiently?
What about filtering the keywords first instead of checking them all? That way I could discard keywords that share no word with the search term. Can this be done more easily than splitting the search term, filtering KeyWord for each word, and then taking the union of the results?
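A rough sketch of this filtering idea in plain Python, assuming the keyword texts fit in memory: a dictionary maps each word to the keywords containing it, so only keywords sharing at least one word with the search term are checked. The names `build_index` and `is_checked_indexed` are made up for illustration and are not part of my models:

```python
from collections import defaultdict

KEYWORDS = ["smith", "lee berry", "lopez travis adams"]


def build_index(keywords):
    """Map each word to the keywords (as frozensets of words) containing it."""
    index = defaultdict(set)
    for kw in keywords:
        words = frozenset(kw.split())
        for word in words:
            index[word].add(words)
    return index


def is_checked_indexed(search_term, index):
    """Subset check restricted to keywords sharing a word with search_term."""
    term_words = set(search_term.split())
    # "Union" step: collect the candidate keywords for every word of the term.
    candidates = set()
    for word in term_words:
        candidates |= index.get(word, set())
    # Only the candidates need the full subset test.
    return any(kw_words <= term_words for kw_words in candidates)


INDEX = build_index(KEYWORDS)
print(is_checked_indexed("eric berry lee", INDEX))
print(is_checked_indexed("jason meza", INDEX))
```

The index is built once and reused for every search term, so the per-term cost depends on how many keywords share a word with it rather than on the total number of keywords.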
Another idea is to store the number of words in KeyWord and compare it to the number of words in the search term. For example, there is no need to check a keyword like “lopez travis adams” against “travis lopez”, because the keyword is simply too long.
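The word-count idea could look roughly like this in plain Python. It is only a sketch: `group_by_length` and `is_checked_by_length` are hypothetical names, and the counts are computed in memory here rather than stored in a model field:

```python
from collections import defaultdict

KEYWORDS = ["smith", "lee berry", "lopez travis adams"]


def group_by_length(keywords):
    """Group keywords (as frozensets of words) by their number of distinct words."""
    by_length = defaultdict(list)
    for kw in keywords:
        words = frozenset(kw.split())
        by_length[len(words)].append(words)
    return by_length


def is_checked_by_length(search_term, by_length):
    """Subset check that skips keywords with more words than the search term."""
    term_words = set(search_term.split())
    for length, keyword_sets in by_length.items():
        if length > len(term_words):
            continue  # too long, cannot be a subset of the search term
        if any(kw_words <= term_words for kw_words in keyword_sets):
            return True
    return False


BY_LENGTH = group_by_length(KEYWORDS)
print(is_checked_by_length("travis lopez", BY_LENGTH))
print(is_checked_by_length("april smith", BY_LENGTH))
```

Distinct words are counted on both sides, so repeated words in a keyword or search term do not distort the comparison.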
Actually, that’s probably quite a standard problem and there should be better solutions than mine. I would be grateful for every hint. Thanks a lot!