Search Bar function inside a ListView

Hi,
So I wanted to add a search functionality to one of my personal projects.

The search process is a bit complicated, because it's on an m2m field (tags), as described:

  1. Fetch all Tags from the db
  2. Find the tags whose lower-case name equals a substring of the search string
  3. Search for resumes with the required tags.

I’m looking for ways to optimize the search process.
Would like to hear what you think about it.

The Code:

import re

from django.db.models import Count, Q
from django.db.models.functions import Lower


class ResumeListView(OwnerListView):
    """Display all the resumes"""
    model = Resume
    ordering = ['-created_at']
    # template_name = "resumes/<modelName>_list.html"
    queryset = Resume.objects.prefetch_related('tags', 'author', 'author__profile')

    def get_queryset(self):
        self.queryset = super(ResumeListView, self).get_queryset()

        # Check for searchTerm existence
        searchTerm = self.request.GET.get("search", False)
        if searchTerm:
            # Find all existing tags
            exists_tags = Tag.objects.annotate(lower_name=Lower('name'))

            # Find all existing tags names (in lower case)
            existing_tags_lower_name = exists_tags.values_list('lower_name', flat=True)

            # Build a regex that finds the tag names appearing in the search string
            look_for = "|".join(f'\\b{p}\\b' for p in existing_tags_lower_name)

            # find all expressions from the search string
            required_tags_lower_name = re.findall(look_for, searchTerm.lower())

            # Find the Tags instances themselves
            tags_required = exists_tags.filter(lower_name__in=required_tags_lower_name).values_list('id', flat=True)

            # By now, I have all the tags the user search for
            # Lets look for the resumes associated with them

            # Query which resumes have the wanted tags, order by the match score.
            Q_query = Q(tags__in=tags_required)
            self.queryset = self.queryset.filter(tags__isnull=False).distinct().annotate(score=Count('tags', filter=Q_query)).filter(score__gt=0).order_by('-score')

        return self.queryset

Please help us understand the model structure a little better.

You’ve got one model named Resume.
The Resume model has an M-M relationship with a Tag model?
The Tag model has a field with the “tag name” in it. What is the name of that field?
The input is a list of tag names? How is that input formatted? (What does it look like being sent from the browser, e.g. What does self.request.GET.get("search", False) look like.)
Your query shows that you’re scoring the Resumes based upon the number of matching tags? (Most number of matching tags comes first) (You didn’t list that in your numbered list of requirements, just wanted to confirm it was needed)

Sorry,
I have a Resume model with a ‘tags’ m2m field to a model named ‘Tag’.
The Tag model has only one field - ‘name’ (and the automatic ‘id’)

The input is from html input type - ‘text’. (so a simple str)
It doesn't have any special characters…
It's directly from the user (security risk?)

I’m scoring the results… (will add it)

I think that I have some issues here:

  1. I lost the ordering attribute from the ListView.
  2. Possible performance issues (this is the reason I posted the question)
  3. Security issues (not sure about it, as the input comes from the user and I search using a regex…)
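
On the regex concern, a minimal sketch of the tag-matching step in isolation may help. In the view above the user's text is only the string being searched, while the tag names form the pattern, so the names are what would need escaping. The helper names `build_tag_pattern` and `find_tags` and the "C++" tag are hypothetical, and lookarounds are swapped in for `\b` here so that names ending in punctuation can still match:

```python
import re


def build_tag_pattern(tag_names):
    """Compile one alternation that matches any tag name as a whole word."""
    # Escape every name so characters like "+" are treated literally.
    # Longer names go first so they win over shorter ones at the same position.
    escaped = sorted(
        (re.escape(name.lower()) for name in tag_names),
        key=len,
        reverse=True,
    )
    if not escaped:
        # An empty alternation would match the empty string everywhere.
        return None
    return re.compile("|".join(rf"(?<!\w){name}(?!\w)" for name in escaped))


def find_tags(search_term, tag_names):
    """Return the lower-case tag names found in the search term, in order."""
    pattern = build_tag_pattern(tag_names)
    if pattern is None:
        return []
    return pattern.findall(search_term.lower())
```

With no tags in the table this returns an empty list instead of building an empty pattern.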

The relevant parts from models file:

class Resume(models.Model):
    """Resume Model"""

    resume_file = models.FileField(upload_to='uploads/resumes/')
    text = models.TextField(default="")
    tags = models.ManyToManyField('Tag', blank=True)

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    author = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)

    @property
    def filename(self):
        return os.path.basename(self.resume_file.name)

    def __str__(self):
        return f'{self.resume_file.name} File'

    def get_absolute_url(self):
        return reverse('resumes:resume_detail', kwargs={'pk': self.pk})

...


class Tag(models.Model):
    """Tag Model (been added after the initial db design)"""
    name = models.CharField(max_length=25)

    def __repr__(self):
        return f'{self.name} (id:{self.pk})'

    def __str__(self):
        return f'{self.name}'

Full Project: (with link to the views file)

The input is from html input type - ‘text’. (so a simple str)
What separates the tags on input? (Please provide a sample of the query variable)

It's directly from the user (security risk?)
Probably not. You’re not adding it to the database or using it directly to render output.

Basically, I believe you should be able to:

  • Create a list of lower-case tags from the input.
  • Create your base Resume query as .filter(tags__name__in=<list from previous step>)
  • And then annotate / score as necessary.

The input is from html input type - ‘text’. (so a simple str)
What separates the tags on input? (Please provide a sample of the query variable)

It's an open text box… nothing separates the tags…
It can be any string…
This is why the search process is a bit complicated.

It's directly from the user (security risk?)
Probably not. You’re not adding it to the database or using it directly to render output.

Thanks

Basically, I believe you should be able to:

  • Create a list of lower-case tags from the input.

I’m doing it right after I make a list of all existing Tags from the DB.
I need to do it, because I need to identify the Tag (by its lower-case name) in the given string…

  • Create your base Resume query as .filter(tags__name__in=<list from previous step>)

I've done this in get_queryset.

  • And then annotate / score as necessary.

Yeah, but after this I need to apply the ordering defined at the class level…

I think I have a problem from the architecture aspect…
Because of the loss of ordering…
Not sure why I lost it…

Edit:
It's because a call to ‘order_by’ overrides any previous ordering…
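
One possible way around this, as a rough sketch: since each `order_by` call replaces the earlier ordering, the score and the class-level ordering can be passed together in a single call. The helper name `ranked_ordering` is hypothetical; it only builds the field tuple and assumes the view's ordering may be `None`, a single string, or a list.

```python
def ranked_ordering(view_ordering, score_field="-score"):
    """Put the score first, then fall back to the view's own ordering."""
    if view_ordering is None:
        base = ()
    elif isinstance(view_ordering, str):
        # ListView allows `ordering` to be a single field name.
        base = (view_ordering,)
    else:
        base = tuple(view_ordering)
    return (score_field, *base)


# Intended use inside get_queryset (not executed here):
#     queryset.order_by(*ranked_ordering(self.get_ordering()))
```

Resumes with the same score would then still be sorted by `-created_at`.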

That’s not correct. There is no need to make a list of all tags. The filter I provided demonstrates that you can search against the tag names directly.

Again, please provide one or more examples of what you’re trying to explain here.
Strictly speaking, what you’re saying is that someone could enter “phpythonjavascript” and that you’re supposed to match on all of “php”, “python”, “java”, “javascript”, “script”, and “c”.
(If that’s true, then I would suggest you have a more fundamental UI issue you may want to address.)

The tags need to be separated by some boundary; ‘\b’ from regex looks sufficient…
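
To make the difference concrete, here is a small sketch comparing plain substring matching with `\b` word-boundary matching on the concatenated example. The tag list is the one from the example above, not the real table, and both function names are hypothetical:

```python
import re

TAGS = ["php", "python", "java", "javascript", "script", "c"]


def substring_matches(text, tags):
    """Tags that occur anywhere in the text, even inside other words."""
    text = text.lower()
    return [tag for tag in tags if tag in text]


def whole_word_matches(text, tags):
    """Tags that occur in the text delimited by word boundaries."""
    text = text.lower()
    return [tag for tag in tags if re.search(rf"\b{re.escape(tag)}\b", text)]
```

On “phpythonjavascript” the substring version returns every tag, while the boundary version returns none; on “PHP and JavaScript” the boundary version returns only “php” and “javascript”.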

The Tags can be “Java”, “Design Patterns” and such…

I prefer to do it using free text, because it looks like a more general way to do so…

For each resume I count how many of the “tags” exist, and then filter by the ‘score’ - it has to be greater than 0…
I also order by “score”

Ok, no problem there. You can annotate by count and order by the annotated value.

What about the structure of the logic?
It feels too long for a ‘get_queryset’…

The length of the function isn’t a problem - get_queryset can be as long as it needs to be to return the proper queryset. (I personally am not a fan of that type of comment style, but that’s a personal bias and not a judgement of quality or appropriateness.)

If there is an issue with it, it’s all the excess work you’re doing with the tags. That seems to be a lot of unnecessary work being done on every request.

The length of the function isn’t a problem - get_queryset can be as long as it needs to be to return the proper queryset. (I personally am not a fan of that type of comment style, but that’s a personal bias and not a judgement of quality or appropriateness.)

The many comments?
They're there only because I'm still working on it…

If there is an issue with it, it’s all the excess work you’re doing with the tags. That seems to be a lot of unnecessary work being done on every request.

That's one reason I opened the post…
How would you have done it?
Using check boxes for the tags? Or maybe some multi-select form?
I looked at doing it with an open string, as a challenge…

I’m fine with allowing entry as a string.

Split the string into individual elements, converted to lower case.

Use a filter to match the elements to tags. If you’re trying to get a count, do the query in an annotation clause to annotate each Resume with the count of matching tags.

There’s no reason to preload all the tags and use a regex to search them.

This is one of the things I didn't really understand.
You have to check the given string (example: “Python DeSign Patterns”) against the existing tags somehow…

If you do it as explained, what will happen when a user enters something that is not a tag's lower-case name?
“Python Design patterns <-not a tag->”

You convert the entered string to lower case so that the comparison is done with lowered characters.


Oh, sorry, it's been bleached…
If you do it as explained, what will happen when a user enters something that is not a tag's lower-case name?
“Python Design patterns <-not a tag->”

Let’s look at this question a different way - and I think I’m going to need more clarity with the data you’re working with.

Someone enters
Python Design patterns

What tags do you have, and what do you expect to have happen?


For now I have the tags: C, Python, Design Patterns, Java

For any given string (that may contain tag names, as long as they are separated),
I expect to:

  1. analyze the string and find any tags it may contain (using the regex)
  2. search the resumes that match the above tags. (if no tag was given - return the full list)
  3. order the resumes by the calculated score (the score is roughly the matching percentage)

But the tags have to be separated, so “pythonc” will not trigger the process for “python” and “c”.
“pythonc”, or any string that does not contain a valid tag, will return empty results.
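
A plain-Python sketch of steps 2 and 3, with an in-memory dict standing in for the database, may make the expected behaviour easier to discuss. The function name and the sample data are hypothetical. It assumes `None` means no search was submitted (full list), and that the score is the fraction of requested tags a resume has:

```python
def rank_resumes(resumes, required_tags):
    """Rank resumes by the share of the requested tags they carry.

    resumes: mapping of resume name -> set of lower-case tag names.
    required_tags: set of lower-case tag names, or None when no search was made.
    """
    if required_tags is None:
        # No search at all: return everything, unscored.
        return [(name, 0.0) for name in resumes]
    required = set(required_tags)
    scored = []
    for name, tags in resumes.items():
        matched = len(required & set(tags))
        if matched:
            scored.append((name, matched / len(required)))
    # Highest score first; ties keep their original order.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A search that contains no valid tag arrives here as an empty set and yields an empty result.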

I will explain my whole process:
At first I tried to split the given string using " " (space), but then I had problems with expressions like “design patterns”, so I looked for a way to solve it and arrived at regex.
It was some progress, but then I had the problem that when I search the string, I don't really know what the Tag names are…
And this is why I did this Tags lookup (I also don't like it because it forces an additional query to the db)
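
One alternative that might avoid loading every tag name, offered only as a sketch: build every run of consecutive words from the input up to some maximum length, and let the database check which of those candidates are real tags. The function name `candidate_phrases` and the `max_words` limit are assumptions, and punctuation attached to a word is not stripped here.

```python
def candidate_phrases(search_term, max_words=3):
    """Return all runs of 1..max_words consecutive words, lower-cased."""
    words = search_term.lower().split()
    phrases = []
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            phrases.append(" ".join(words[start:start + size]))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(phrases))


# Intended use with the models above (not executed here):
#     Tag.objects.annotate(lower_name=Lower('name'))
#                .filter(lower_name__in=candidate_phrases(searchTerm))
```

“design patterns” then appears as a two-word candidate, and “pythonc” stays a single candidate that matches no tag.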

I hope this clarifies things a bit…

It does, thank you.

You’ve set yourself up with a fairly icky situation. My first inclination is to think there’s some other way of organizing your tags to make this easier to process. I’ll have to think about this a little.
