Hi,
So I wanted to add search functionality to one of my personal projects.
The search process is a bit complicated, because it runs on an M2M field (tags), as described:
1. Fetch all Tags in the DB.
2. Find the tags whose lower-cased name equals a substring of the search string.
3. Search for resumes with the required tags.
I’m looking for ways to optimize the search process.
I would like to hear what you think about it.
The Code:
import re

from django.db.models import Count, Q
from django.db.models.functions import Lower

# Resume, Tag and OwnerListView come from the project's own modules.


class ResumeListView(OwnerListView):
    """Display all the resumes"""
    model = Resume
    ordering = ['-created_at']
    # template_name = "resumes/<modelName>_list.html"
    queryset = Resume.objects.prefetch_related('tags', 'author', 'author__profile')

    def get_queryset(self):
        self.queryset = super().get_queryset()
        # Check for searchTerm existence
        searchTerm = self.request.GET.get("search", False)
        if searchTerm:
            # Find all existing tags
            exists_tags = Tag.objects.annotate(lower_name=Lower('name'))
            # Find all existing tag names (in lower case)
            existing_tags_lower_name = exists_tags.values_list('lower_name', flat=True)
            # Build a regex that finds the tag names present in the search string
            # (re.escape keeps names containing regex metacharacters from breaking the pattern)
            look_for = "|".join(f'\\b{re.escape(p)}\\b' for p in existing_tags_lower_name)
            # Find all tag names in the search string
            required_tags_lower_name = re.findall(look_for, searchTerm.lower())
            # Find the ids of the matching Tag instances
            tags_required = exists_tags.filter(
                lower_name__in=required_tags_lower_name
            ).values_list('id', flat=True)
            # By now, I have all the tags the user searched for.
            # Query which resumes have the wanted tags, ordered by the match score.
            Q_query = Q(tags__in=tags_required)
            self.queryset = (
                self.queryset.filter(tags__isnull=False)
                .distinct()
                .annotate(score=Count('tags', filter=Q_query))
                .filter(score__gt=0)
                .order_by('-score')
            )
        return self.queryset
Please help us understand the model structure a little better.
You’ve got one model named Resume.
The Resume model has an M-M relationship with a Tag model?
The Tag model has a field with the “tag name” in it. What is the name of that field?
The input is a list of tag names? How is that input formatted? (What does it look like being sent from the browser, e.g. What does self.request.GET.get("search", False) look like.)
Your query shows that you’re scoring the Resumes based upon the number of matching tags? (Most number of matching tags comes first) (You didn’t list that in your numbered list of requirements, just wanted to confirm it was needed)
The input comes from an HTML input of type ‘text’ (so a simple str).
What separates the tags on input? (Please provide a sample of the query variable)
It’s an open text box… nothing separates the tags…
It can be any string…
This is why the search process is a bit complicated.
It comes directly from the user (security risk?)
Probably not. You’re not adding it to the database or using it directly to render output.
Thanks
Basically, I believe you should be able to:
Create a list of lower-case tags from the input.
I’m doing that right after I build a list of all existing Tags from the DB.
I need to, because I have to identify the Tag (lower-cased name) in the given string…
Create your base Resume query as .filter(tags__name__in=<list from previous step>)
I’ve done this in get_queryset.
And then annotate / score as necessary.
Yeah, but after this I need to apply the ordering defined at the class level…
I think I have a problem with the architecture…
Because the ordering is lost…
I’m not sure why I lost it…
Edit:
It’s because a call to ‘order_by’ overrides any previous call…
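A minimal sketch of one way to keep the class-level ordering as a tie-breaker, given that a later order_by call replaces the earlier ordering instead of adding to it. The helper name combined_ordering is hypothetical, and it assumes the view's ordering attribute is a list or tuple of field names; only the score annotation and ordering = ['-created_at'] come from the view above.

```python
def combined_ordering(primary, base_ordering):
    """Return order_by() arguments: `primary` first, then the view's default ordering."""
    ordering = list(primary)
    for field in base_ordering or []:
        # Skip a field that is already covered, ignoring its "-" direction prefix.
        if field.lstrip("-") not in {f.lstrip("-") for f in ordering}:
            ordering.append(field)
    return ordering


# Inside get_queryset (sketch, assuming the `score` annotation already exists):
#     queryset = queryset.order_by(*combined_ordering(["-score"], self.ordering))
print(combined_ordering(["-score"], ["-created_at"]))
```

Passing both fields in a single order_by call means resumes with equal scores still come out newest first.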
That’s not correct. There is no need to make a list of all tags. The filter I provided demonstrates that you can search against the tag names directly.
Again, please provide one or more examples of what you’re trying to explain here.
Strictly speaking, what you’re saying is that someone could enter “phpythonjavascript” and that you’re supposed to match on all of “php”, “python”, “java”, “javascript”, “script”, and “c”.
(If that’s true, then I would suggest you have a more fundamental UI issue you may want to address.)
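To make the difference concrete, here is a small sketch comparing plain substring matching with whole-word matching on that input. The tag list and both function names are made up for illustration; nothing here is taken from the project's code.

```python
import re

# Hypothetical tag names, taken from the example sentence above.
TAGS = ["php", "python", "java", "javascript", "script", "c"]


def substring_matches(text, tags):
    """Tags that occur anywhere in the text, even inside a longer word."""
    lowered = text.lower()
    return [tag for tag in tags if tag in lowered]


def whole_word_matches(text, tags):
    """Tags that occur as standalone words (no word character on either side)."""
    lowered = text.lower()
    return [
        tag for tag in tags
        if re.search(rf"(?<!\w){re.escape(tag)}(?!\w)", lowered)
    ]


print(substring_matches("phpythonjavascript", TAGS))   # all six tags
print(whole_word_matches("phpythonjavascript", TAGS))  # []
print(whole_word_matches("PHP Python JavaScript", TAGS))
```

With whole-word matching, only input that separates the tags produces hits, which is a narrower requirement than matching every tag name found anywhere in the string.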
The length of the function isn’t a problem - get_queryset can be as long as it needs to be to return the proper queryset. (I personally am not a fan of that type of comment style, but that’s a personal bias and not a judgement of quality or appropriateness.)
If there is an issue with it, it’s all the excess work you’re doing with the tags. That seems to be a lot of unnecessary work being done on every request.
The length of the function isn’t a problem - get_queryset can be as long as it needs to be to return the proper queryset. (I personally am not a fan of that type of comment style, but that’s a personal bias and not a judgement of quality or appropriateness.)
All the comments?
They’re only there because I’m still working on it…
If there is an issue with it, it’s all the excess work you’re doing with the tags. That seems to be a lot of unnecessary work being done on every request.
That’s one reason I opened the post…
How would you have done it?
Using check boxes for the tags? Or maybe some multi-select form?
I chose to do it with an open string, as a challenge…
Split the string into individual elements, converted to lower case.
Use a filter to match the elements to tags. If you’re trying to get a count, do the query in an annotation clause to annotate each Resume with the count of matching tags.
There’s no reason to preload all the tags and use a regex to search them.
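The split step described above could look roughly like this. The function name search_terms is hypothetical, and the commented Django lines assume the Tag field is called name and that tag names are stored in lower case, which the thread does not confirm.

```python
def search_terms(raw):
    """Split a raw search string on whitespace, lower-case it, and drop duplicates."""
    # `raw or ""` also covers the False default returned by GET.get("search", False).
    return list(dict.fromkeys((raw or "").lower().split()))


# With Django, the terms could then feed the query directly (sketch only):
#     terms = search_terms(self.request.GET.get("search", False))
#     matching = Q(tags__name__in=terms)
#     queryset.annotate(score=Count("tags", filter=matching)).filter(score__gt=0)
print(search_terms("Python  DeSign Patterns python"))
```

Note that splitting on whitespace breaks a multi-word tag into separate terms, so this only works as-is for single-word tags.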
This is one of the things I didn’t really understand.
You have to check the given string (example: “Python DeSign Patterns”) against the existing tags somehow…
If you do it as explained, what happens when a user enters something that is not a lower-cased tag name?
“Python Design patterns <-not a tag->”
Oh, sorry, it’s been bleached…
If you do it as explained, what happens when a user enters something that is not a lower-cased tag name?
“Python Design patterns <-not a tag->”
For now I have the tags: C, Python, Design Patterns, Java
For any given string (which may contain tag names, separated from each other), I expect to:
analyze the string and find any tags it may contain (using the regex)
search for the resumes that match those tags (if no tag was given, return the full list)
order the resumes by the calculated score (the score is roughly a matching percentage)
But the tags have to be separated, so “pythonc” will not trigger the process for “python” and “c”.
“pythonc”, or any string that does not contain a valid tag, will return empty results.
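The expected behaviour listed above can be sketched in plain Python as follows. The function name find_tags is hypothetical. It uses lookarounds rather than \b, an assumption made so that a tag ending in a non-word character would still match, and it sorts longer names first so a multi-word tag wins over a shorter one starting at the same position.

```python
import re


def find_tags(text, tag_names):
    """Return the lower-cased tag names that appear as whole words or phrases in `text`."""
    names = sorted({name.lower() for name in tag_names}, key=len, reverse=True)
    if not names:
        return []  # an empty alternation would match the empty string everywhere
    pattern = "|".join(re.escape(name) for name in names)
    found = re.findall(rf"(?<!\w)(?:{pattern})(?!\w)", text.lower())
    return list(dict.fromkeys(found))  # keep first-seen order, drop repeats


TAGS = ["C", "Python", "Design Patterns", "Java"]
print(find_tags("Python DeSign Patterns", TAGS))
print(find_tags("pythonc", TAGS))
```

The first call finds “python” and “design patterns”; the second finds nothing, because neither “python” nor “c” stands alone in “pythonc”.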
I will explain my whole process:
At first I tried to split the given string on " " (space), but then I had problems with expressions like “design patterns”, so I looked for a way to solve it and arrived at regex.
That was some progress, but then I had the problem that when I searched the string, I didn’t really know what the tag names were…
And this is why I did this Tags lookup (I also don’t like it, because it forces an additional query to the DB)
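One possible way around loading every tag name, offered as an assumption rather than something proposed in this thread: build every run of consecutive words from the search string and let the database match those candidates against the lower-cased tag names. The function name candidate_phrases and the max_words limit are hypothetical.

```python
def candidate_phrases(raw, max_words=3):
    """All runs of 1..max_words consecutive words from the search string, lower-cased."""
    words = (raw or "").lower().split()
    phrases = []
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            phrases.append(" ".join(words[start:start + size]))
    return list(dict.fromkeys(phrases))  # drop repeats, keep order


# Django sketch (assumes the Lower('name') annotation used in the view above):
#     Tag.objects.annotate(lower_name=Lower("name")).filter(
#         lower_name__in=candidate_phrases(searchTerm)
#     )
print(candidate_phrases("Python DeSign Patterns", max_words=2))
```

This keeps multi-word tags such as “design patterns” matchable and still requires the tags to be separated, at the cost of a candidate list that grows with the length of the search string.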
You’ve set yourself up with a fairly icky situation. My first inclination is to think there’s some other way of organizing your tags to make this easier to process. I’ll have to think about this a little.