Search Bar function inside a ListView

LiorA1 · August 10, 2021, 12:20pm

Hi,
So I wanted to add search functionality to one of my personal projects.

The search process is a bit complicated because it's on an M2M field (tags), as described:

Search for all Tags in the DB
Search for tags whose lowercase name matches a substring of the search string
Search for resumes with the required tags.

I'm looking for ways to optimize the search process.
I'd like to hear what you think about it.

The code:

import re

from django.db.models import Count, Q
from django.db.models.functions import Lower


class ResumeListView(OwnerListView):
    """Display all the resumes"""
    model = Resume
    ordering = ['-created_at']
    # template_name = "resumes/<modelName>_list.html"
    queryset = Resume.objects.prefetch_related('tags', 'author', 'author__profile')

    def get_queryset(self):
        self.queryset = super().get_queryset()

        # Check for searchTerm existence
        searchTerm = self.request.GET.get("search", False)
        if searchTerm:
            # Find all existing tags
            exists_tags = Tag.objects.annotate(lower_name=Lower('name'))

            # Find all existing tag names (in lower case)
            existing_tags_lower_name = exists_tags.values_list('lower_name', flat=True)

            # Build a regex to find the tag names that appear in the search string;
            # re.escape() keeps regex metacharacters in a tag name (e.g. '+') literal
            look_for = "|".join(rf'\b{re.escape(p)}\b' for p in existing_tags_lower_name)

            # Find all matching expressions in the search string
            required_tags_lower_name = re.findall(look_for, searchTerm.lower())

            # Find the Tag instances themselves
            tags_required = exists_tags.filter(lower_name__in=required_tags_lower_name).values_list('id', flat=True)

            # By now, I have all the tags the user searched for.
            # Let's look for the resumes associated with them.

            # Query which resumes have the wanted tags, ordered by the match score
            Q_query = Q(tags__in=tags_required)
            self.queryset = (
                self.queryset.filter(tags__isnull=False)
                .distinct()
                .annotate(score=Count('tags', filter=Q_query))
                .filter(score__gt=0)
                .order_by('-score')
            )

        return self.queryset
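Two details of the regex step are worth checking. `\b` only marks a position between a word character and a non-word character, so a tag name that ends in a symbol (a hypothetical "C++" tag; the tags in this thread are C, Python, Design Patterns and Java) is never matched, while the shorter tag "c" is. Alternation order also matters when one name is a prefix of another. A minimal plain-Python sketch, assuming the lowercased tag names are already in a list (as `values_list('lower_name', flat=True)` would give), that escapes each name, tries longer names first, and uses lookarounds instead of `\b`:

```python
import re


def build_tag_pattern(tag_names):
    """Compile one alternation matching any of the given lowercased tag names.

    Assumption: tag_names is a plain list of lowercased names.
    """
    # Longer names first, so "c++" wins over "c" at the same position.
    ordered = sorted(set(tag_names), key=len, reverse=True)
    # re.escape keeps '+' and similar characters literal; the lookarounds
    # require that the match is not glued to another word character.
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")


def find_tags(search_term, tag_names):
    """Return the tag names found in search_term (lowercased), in order of appearance."""
    if not tag_names:
        return []
    return build_tag_pattern(tag_names).findall(search_term.lower())


if __name__ == "__main__":
    # Naive \b version: the symbol-terminated tag is missed, only "c" is found.
    naive = "|".join(rf"\b{re.escape(n)}\b" for n in ["c", "c++"])
    print(re.findall(naive, "c++ developer"))        # ['c']
    print(find_tags("C++ developer", ["c", "c++"]))  # ['c++']
    print(find_tags("pythonc", ["python", "c"]))     # []
```

The lookarounds keep the "pythonc" behaviour the thread asks for (no match) while also working for names that start or end with punctuation.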

KenWhitesell · August 10, 2021, 12:48pm

Please help us understand the model structure a little better.

You’ve got one model named Resume.
The Resume model has an M-M relationship with a Tag model?
The Tag model has a field with the “tag name” in it. What is the name of that field?
The input is a list of tag names? How is that input formatted? (What does it look like being sent from the browser, e.g. What does self.request.GET.get("search", False) look like.)
Your query shows that you’re scoring the Resumes based upon the number of matching tags? (Most number of matching tags comes first) (You didn’t list that in your numbered list of requirements, just wanted to confirm it was needed)

LiorA1 · August 10, 2021, 1:07pm

Sorry,
I have a Resume model with a ‘tags’ M2M field pointing to a model named ‘Tag’.
The Tag model has only one field, ‘name’ (plus the automatic ‘id’).

The input is from an HTML input of type ‘text’ (so a simple str).
It doesn't contain any special separator characters…
It's directly from the user (security risk?)

I'm scoring the results… (will add it)

I think I have some issues here:

I lost the ordering attribute from the ListView.
Maybe performance issues (this is the reason I posted the question)
Security issues (not sure about it; the input comes from the user, and I search using regex…)

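The relevant parts of the models file:

import os

from django.conf import settings
from django.db import models
from django.urls import reverse


class Resume(models.Model):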
    """Resume Model"""

    resume_file = models.FileField(upload_to='uploads/resumes/')
    text = models.TextField(default="")
    tags = models.ManyToManyField('Tag', blank=True)

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    author = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)

    @property
    def filename(self):
        return os.path.basename(self.resume_file.name)

    def __str__(self):
        return f'{self.resume_file.name} File'

    def get_absolute_url(self):
        return reverse('resumes:resume_detail', kwargs={'pk': self.pk})

...


class Tag(models.Model):
    """Tag Model (been added after the initial db design)"""
    name = models.CharField(max_length=25)

    def __repr__(self):
        return f'{self.name} (id:{self.pk})'

    def __str__(self):
        return f'{self.name}'

Full project (with a link to the views file):

LiorA1/resume_reviews/blob/826d3cce4131c316800f781503766bff7009b816/resumes/views/resume_views.py#L27 (github.com)

KenWhitesell · August 10, 2021, 1:25pm

The input is from html input type - ‘text’. (so a simple str)
What separates the tags on input? (Please provide a sample of the query variable)

It's directly from the user (security risk?)
Probably not. You’re not adding it to the database or using it directly to render output.

Basically, I believe you should be able to:

Create a list of lower-case tags from the input.
Create your base Resume query as .filter(tags__name__in=<list from previous step>)
And then annotate / score as necessary.
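Ken's first two steps can be sketched in plain Python, with the database filter replaced by a list comprehension. This assumes whitespace is the only separator. The stored names are lowercased too, because depending on the database backend and collation a plain `tags__name__in` comparison may be case-sensitive; in the ORM that would mean comparing against a `Lower('name')` annotation, as the original view already does.

```python
def search_tokens(search_term):
    """Lowercase the free-text input and split it on whitespace.

    Duplicates are dropped while keeping first-seen order (dict keys are ordered).
    """
    return list(dict.fromkeys(search_term.lower().split()))


def matching_tags(search_term, tag_names):
    """In-memory stand-in for filtering tags by lowercased name __in the token list."""
    tokens = set(search_tokens(search_term))
    return [name for name in tag_names if name.lower() in tokens]


if __name__ == "__main__":
    print(search_tokens("Python DeSign Patterns"))  # ['python', 'design', 'patterns']
    print(matching_tags("Python DeSign Patterns", ["C", "Python", "Design Patterns", "Java"]))  # ['Python']
```

The second call shows the limitation that comes up later in the thread: a multi-word tag such as "Design Patterns" never equals a single whitespace-separated token.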

LiorA1 · August 10, 2021, 2:20pm

The input is from html input type - ‘text’. (so a simple str)
What separates the tags on input? (Please provide a sample of the query variable)

It's an open text box… nothing separates the tags…
It can be any string…
This is why the search process is a bit complicated.

It's directly from the user (security risk?)
Probably not. You’re not adding it to the database or using it directly to render output.

Thanks

Basically, I believe you should be able to:

Create a list of lower-case tags from the input.

I do that right after I make a list of all existing Tags from the DB.
I need to do it because I have to identify the Tag (by its lowercase name) in the given string…

Create your base Resume query as .filter(tags__name__in=<list from previous step>)

I've done this in get_queryset.

And then annotate / score as necessary.

Yeah, but after this I need to apply the ordering defined at the class level…

I think I have a problem from an architecture standpoint…
because of the lost ordering…
I'm not sure why I lost it…

Edit:
It's because each call to order_by() overrides any previous ordering…
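Because every `order_by()` call replaces the previous ordering, `.order_by('-score')` discards the class-level `ordering = ['-created_at']`. One way to keep it as a tie-breaker is to pass it again after the score, for example `.order_by('-score', *self.get_ordering())` (`get_ordering()` is the `MultipleObjectMixin` method that returns `ordering`, a list here). The plain-Python sketch below, with hypothetical resume dicts, shows the order that produces: highest score first, ties broken by newest `created_at`.

```python
from datetime import datetime


def order_results(resumes, ordering=("-created_at",)):
    """Sort like order_by('-score', *ordering).

    Each resume is a dict; the field names are assumptions for illustration.
    A leading '-' means descending, as in Django's order_by().
    """
    keys = ("-score",) + tuple(ordering)
    result = list(resumes)
    # Python's sort is stable (also with reverse=True), so applying the keys
    # from least to most significant yields the combined ordering.
    for key in reversed(keys):
        field = key.lstrip("-")
        result.sort(key=lambda r: r[field], reverse=key.startswith("-"))
    return result


if __name__ == "__main__":
    resumes = [
        {"id": 1, "score": 1, "created_at": datetime(2021, 8, 1)},
        {"id": 2, "score": 2, "created_at": datetime(2021, 7, 1)},
        {"id": 3, "score": 1, "created_at": datetime(2021, 8, 9)},
    ]
    print([r["id"] for r in order_results(resumes)])  # [2, 3, 1]
```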

KenWhitesell · August 10, 2021, 2:22pm

That’s not correct. There is no need to make a list of all tags. The filter I provided demonstrates that you can search against the tag names directly.

KenWhitesell · August 10, 2021, 2:27pm

Again, please provide one or more examples of what you’re trying to explain here.
Strictly speaking, what you’re saying is that someone could enter “phpythonjavascript” and that you’re supposed to match on all of “php”, “python”, “java”, “javascript”, “script”, and “c”.
(If that’s true, then I would suggest you have a more fundamental UI issue you may want to address.)

LiorA1 · August 10, 2021, 2:36pm

The tags need to be separated by some boundary; the regex ‘\b’ looks sufficient…

The Tags can be “Java”, “Design Patterns” and such…

I prefer to use free text, because it seems like a more general approach…
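For multi-word names such as "Design Patterns", one option (an assumption, not something the thread settled on) is to let the space inside a tag name match any run of whitespace and to compare case-insensitively, so "design   patterns" or "DESIGN Patterns" are still found. A small sketch that reports matches by their canonical tag name:

```python
import re


def tag_regex(tag_name):
    """Regex for one tag: its words escaped and joined by a whitespace run,
    not glued to other word characters on either side."""
    words = [re.escape(word) for word in tag_name.split()]
    return r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"


def find_canonical_tags(search_term, tag_names):
    """Return the canonical names of the tags that occur in search_term."""
    found = []
    for name in tag_names:
        if re.search(tag_regex(name), search_term, flags=re.IGNORECASE):
            found.append(name)
    return found


if __name__ == "__main__":
    tags = ["C", "Python", "Design Patterns", "Java"]
    print(find_canonical_tags("Looking for DESIGN   patterns and python", tags))
    # ['Python', 'Design Patterns']
```

This still needs the full tag list in memory, so it keeps the extra query that the thread is trying to avoid; it only makes the matching itself more forgiving.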

LiorA1 · August 10, 2021, 2:41pm

For each resume I count how many of the “tags” it has, and then filter by the ‘score’, which has to be greater than 0…
I also order by “score”.
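The scoring described here (count the matched tags per resume, keep scores above 0, highest first) is what `annotate(score=Count('tags', filter=Q(...)))` followed by `.filter(score__gt=0)` and `.order_by('-score')` computes in the original view. An in-memory version with hypothetical data, handy for checking expectations before touching the ORM:

```python
def score_resumes(resume_tags, required_tags):
    """Return (resume_id, score) pairs for resumes matching at least one required tag.

    resume_tags maps a resume id to its set of lowercased tag names (hypothetical data).
    Results are sorted by score, highest first; ties keep the input order.
    """
    required = set(required_tags)
    scored = [(resume_id, len(tags & required)) for resume_id, tags in resume_tags.items()]
    matched = [pair for pair in scored if pair[1] > 0]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    resume_tags = {
        1: {"python"},
        2: {"python", "design patterns"},
        3: {"java"},
        4: {"c", "python"},
    }
    print(score_resumes(resume_tags, ["python", "design patterns"]))
    # [(2, 2), (1, 1), (4, 1)]
```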

KenWhitesell · August 10, 2021, 2:45pm

Ok, no problem there. You can annotate by count and order by the annotated value.

LiorA1 · August 10, 2021, 5:11pm

What about the structure of the logic?
It feels too long for ‘get_queryset’…

KenWhitesell · August 10, 2021, 5:15pm

The length of the function isn’t a problem - get_queryset can be as long as it needs to be to return the proper queryset. (I personally am not a fan of that type of comment style, but that’s a personal bias and not a judgement of quality or appropriateness.)

If there is an issue with it, it’s all the excess work you’re doing with the tags. That seems to be a lot of unnecessary work being done on every request.

LiorA1 · August 10, 2021, 5:44pm

The length of the function isn’t a problem - get_queryset can be as long as it needs to be to return the proper queryset. (I personally am not a fan of that type of comment style, but that’s a personal bias and not a judgement of quality or appropriateness.)

The many comments?
They're only there because I'm still working on it…

If there is an issue with it, it’s all the excess work you’re doing with the tags. That seems to be a lot of unnecessary work being done on every request.

That's one reason I opened the post…
How would you do it?
Using checkboxes for the tags? Or maybe a multi-select form?
I tried doing it with an open string as a challenge…

KenWhitesell · August 10, 2021, 5:50pm

I’m fine with allowing entry as a string.

Split the string into individual elements, converted to lower case.

Use a filter to match the elements to tags. If you’re trying to get a count, do the query in an annotation clause to annotate each Resume with the count of matching tags.

There’s no reason to preload all the tags and use a regex to search them.
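A sketch of this suggestion that still handles multi-word tags such as "Design Patterns" (my assumption, not something settled in the thread): instead of single words, generate every run of 1 to `max_words` consecutive words from the lowercased input and use that candidate list in one `__in` lookup. Words that are not tags just produce candidates that match nothing, and the full tag list never has to be loaded. Here the database filter is replaced by a set lookup; `max_words` is a hypothetical parameter that must be at least the word count of the longest tag name.

```python
def candidate_phrases(search_term, max_words=2):
    """All runs of 1..max_words consecutive whitespace-separated words, lowercased."""
    words = search_term.lower().split()
    phrases = []
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            if phrase not in phrases:
                phrases.append(phrase)
    return phrases


def tags_in_phrases(search_term, tag_names, max_words=2):
    """Stand-in for filtering Tag by lowercased name __in the candidate phrases."""
    candidates = set(candidate_phrases(search_term, max_words))
    return [name for name in tag_names if name.lower() in candidates]


if __name__ == "__main__":
    tags = ["C", "Python", "Design Patterns", "Java"]
    print(candidate_phrases("Python DeSign Patterns"))
    # ['python', 'design', 'patterns', 'python design', 'design patterns']
    print(tags_in_phrases("Python Design patterns <-not a tag->", tags))
    # ['Python', 'Design Patterns']
```

In the view, the candidates would go into a single query such as `Tag.objects.annotate(lower_name=Lower('name')).filter(lower_name__in=candidates)`. The number of candidates grows roughly as the word count times `max_words`, which stays small for a search box.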

LiorA1 · August 10, 2021, 7:10pm

This is one of the things I didn't really understand.
You have to check the given string (example: “Python DeSign Patterns”) against the existing tags somehow…

If you do it as explained, what will happen when a user enters something that is not a tag's lowercase name?
“Python Design patterns <-not a tag->”

KenWhitesell · August 10, 2021, 7:50pm

You convert the entered string to lower case so that the comparison is done with lowered characters.

LiorA1 · August 10, 2021, 7:53pm

Oh, sorry, it's been bleached…
If you do it as explained, what will happen when a user enters something that is not a tag's lowercase name?
“Python Design patterns <-not a tag->”

KenWhitesell · August 10, 2021, 8:05pm

Let’s look at this question a different way - and I think I’m going to need more clarity with the data you’re working with.

Someone enters
Python Design patterns

What tags do you have, and what do you expect to have happen?

LiorA1 · August 10, 2021, 8:30pm

For now I have the tags: C, Python, Design Patterns, Java

For any given string (which may contain separated tag names), I expect to:

analyze the string and find any tags it may contain (using the regex)
search for the resumes that match those tags (if no tag was given, return the full list)
order the resumes by the calculated score (the score is roughly a matching percentage)

But the tags have to be separated, so “pythonc” will not trigger the process for “python” and “c”.
“pythonc”, or any string that does not contain a valid tag, will return empty results.
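The expected behaviour listed above boils down to three cases: an empty search returns the full list (reading "if no tag was given" as an empty search box, which is how the original view behaves when `searchTerm` is falsy), a search with no recognised tag returns nothing, and otherwise the matches are ranked. A minimal sketch of that control flow, with the tag finder and ranker passed in as hypothetical callables so any of the approaches in this thread could be plugged in:

```python
def search(all_resumes, search_term, find_tags, rank):
    """Dispatch on the three cases described in the thread.

    find_tags(search_term) -> list of tag names
    rank(resumes, tags) -> ranked list of matching resumes
    Both are hypothetical hooks standing in for the real view logic.
    """
    if not search_term or not search_term.strip():
        return list(all_resumes)  # no search given: full list
    tags = find_tags(search_term)
    if not tags:
        return []  # text given, but no valid tag in it
    return rank(all_resumes, tags)


if __name__ == "__main__":
    resumes = ["r1", "r2"]
    find = lambda s: [w for w in s.lower().split() if w in {"python", "java"}]
    rank = lambda rs, tags: [r for r in rs if r == "r1"]
    print(search(resumes, "", find, rank))         # ['r1', 'r2']
    print(search(resumes, "pythonc", find, rank))  # []
    print(search(resumes, "Python", find, rank))   # ['r1']
```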

Let me explain my whole process:
At first I tried to split the given string on " " (space), but then I had problems with expressions like “design patterns”, so I looked for a way to solve that and ended up with regex.
That was some progress, but then I had the problem that when I searched the string, I didn't really know what the Tag names are…
This is why I do the Tags lookup (which I also don't like, because it forces an additional query to the DB).

I hope this clarifies things a bit…

KenWhitesell · August 10, 2021, 9:02pm

It does, thank you.

You’ve set yourself up with a fairly icky situation. My first inclination is to think there’s some other way of organizing your tags to make this easier to process. I’ll have to think about this a little.
