returning dictionary incorrectly

atheer331 · May 20, 2023, 8:28pm

I am trying to use spaCy to read a CV file and match its text against a set of skills in a dictionary I made. The function check_all_majors returns a dictionary containing the skills of only one major, instead of the skills of every major it finds in the CV.

```python
import docx2txt
import spacy
import textract
from django.contrib import messages
from django.contrib.auth.decorators import login_required
from django.core.exceptions import ValidationError
from django.shortcuts import redirect, render

# Project-specific imports (module paths are assumed)
from .forms import RecruiterAccountForm, SeekerAccountForm
from .models import Skill
from .validators import validate_word_or_text_file

nlp = spacy.load('en_core_web_sm')

skill_dict = {
    'Computer Science': {'Python', 'Java', 'C++', 'machine learning', 'data structures', 'algorithms'},
    'Electrical Engineering': {'circuit design', 'power systems', 'analog electronics', 'digital signal processing'},
    'Mechanical Engineering': {'CAD', 'mechanical design', 'materials science', 'thermodynamics'},
    'Statistics': {'metrices', 'statistic', 'algorithm', 'mathmatics'}
}

def tokenize_cv(cv_file):
    file_extension = cv_file.name.split('.')[-1]
    if file_extension == 'docx':
        cv_text = docx2txt.process(cv_file)
    elif file_extension == 'txt':
        cv_text = cv_file.read().decode('utf-8')
    elif file_extension == 'rtf':
        cv_text = textract.process(cv_file).decode('utf-8')
    else:
        raise ValueError('Unsupported file type')
    print('CV text:', cv_text)
    doc = nlp(cv_text)
    print('Spacy tokens:', [token.text for token in doc])
    # Keep lowercased tokens, dropping stop words and punctuation
    tokens = [token.text.lower() for token in doc if not token.is_stop and not token.is_punct]
    print('Filtered tokens:', tokens)
    return tokens


def check_all_majors(tokens):
    # Collect, per major, the skills that also appear among the tokens
    matches = {}
    for major, skills in skill_dict.items():
        intersect = skills.intersection(set(tokens))
        if intersect:
            matches[major] = list(intersect)
    return matches

@login_required(login_url='login')
def editAccount(request):
    if request.user.is_Seeker:
        seeker = request.user.seeker
        form = SeekerAccountForm(instance=seeker)
    elif request.user.is_Recruiter:
        recruiter = request.user.recruiter
        form = RecruiterAccountForm(instance=recruiter)

    AllSkills = []

    if request.method == 'POST':
        if request.user.is_Seeker:
            form = SeekerAccountForm(request.POST, request.FILES, instance=seeker)
            if form.is_valid():
                file = request.FILES['cv']
                try:
                    # Validate the file extension
                    validate_word_or_text_file(file)
                except ValidationError as e:
                    form.add_error('cv', e)
                    messages.error(request, 'the cv format is not accepted, Try (.docx , .txt , .rtf)')
                    return render(request, 'account-edit.html', {'form': form})

                file = request.FILES.get('cv', None)
                if file:
                    tokens = tokenize_cv(file)
                    print('tokens:', tokens)
                    matches = check_all_majors(tokens)
                    for major, skills in matches.items():
                        for skill in skills:
                            skill_obj = Skill(owner=seeker, category=major, name=skill)
                            skill_obj.save()
                            print(f'Saved skill {skill_obj.name} in category {skill_obj.category}')
                            AllSkills.append(skill_obj)
                form.save()
                messages.success(request, 'Your account has been updated!')
                return redirect('account')
        elif request.user.is_Recruiter:
            form = RecruiterAccountForm(request.POST, request.FILES, instance=recruiter)
            if form.is_valid():
                form.save()
                messages.success(request, 'Your account has been updated!')
                return redirect('account')

    context = {'form': form}
    if request.user.is_Seeker and seeker is not None:
        context['cv_skills'] = AllSkills
    return render(request, 'account-edit.html', context)
```

KenWhitesell · May 20, 2023, 11:21pm

Side note: In the future, please surround your code with lines of three backtick (`) characters. This means you’ll have a line of ```, then your code, then another line of ```. (I’ve taken the liberty of doing that with this post; this is more a reminder for the future.)

My first thought here is that you’re creating a list of tokens that are all lowercase, but you have elements in your skill_dict with capital letters.
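A minimal sketch of that mismatch, using a trimmed-down copy of skill_dict and a hand-made token list (both illustrative, no spaCy involved). Python set intersection compares strings exactly, so 'Python' never equals 'python'. With this data only the all-lowercase Statistics entry survives, which resembles the "one major only" symptom from the question. For testability this version takes the dictionary as a parameter and sorts the result; both are changes from the original function. Normalizing the dictionary once with a comprehension avoids hand-editing every entry.

```python
# Trimmed-down copy of the original dictionary (still mixed case)
skill_dict = {
    'Computer Science': {'Python', 'Java', 'C++', 'algorithms'},
    'Statistics': {'statistic', 'algorithm'},
}

# What tokenize_cv hands over: every token already lowercased
tokens = ['python', 'java', 'c++', 'algorithm']


def check_all_majors(tokens, skills_by_major):
    """Same matching logic as the original, but the dictionary is passed in."""
    token_set = set(tokens)
    matches = {}
    for major, skills in skills_by_major.items():
        intersect = skills & token_set
        if intersect:
            matches[major] = sorted(intersect)  # sorted for stable output
    return matches


# Set intersection is case-sensitive: 'Python' != 'python'
print(check_all_majors(tokens, skill_dict))
# {'Statistics': ['algorithm']}

# Normalize the dictionary once instead of hand-editing every entry
lowered = {major: {s.lower() for s in skills} for major, skills in skill_dict.items()}
print(check_all_majors(tokens, lowered))
# {'Computer Science': ['c++', 'java', 'python'], 'Statistics': ['algorithm']}
```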

My suggestion would be to create some test cases that exercise check_all_majors with known data, to verify that it operates as expected. You’ve already got some print statements showing the data being presented; it might also help if you posted the output of one of your tests.
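One possible shape for such tests, written as plain assert-based functions that pytest would discover (they can also be called directly). To keep the sketch self-contained it pastes in a copy of check_all_majors and a small, already-lowercased skill_dict; the test names and the sample tokens are illustrative only.

```python
# Small lowercased dictionary used only for these tests
skill_dict = {
    'Computer Science': {'python', 'java', 'c++', 'machine learning'},
    'Electrical Engineering': {'circuit design', 'power systems'},
    'Statistics': {'statistic', 'algorithm'},
}


def check_all_majors(tokens):
    # Copy of the function from the question
    matches = {}
    for major, skills in skill_dict.items():
        intersect = skills.intersection(set(tokens))
        if intersect:
            matches[major] = list(intersect)
    return matches


def test_skills_from_several_majors():
    result = check_all_majors(['python', 'statistic', 'cooking'])
    assert set(result) == {'Computer Science', 'Statistics'}
    assert result['Computer Science'] == ['python']


def test_no_match_returns_empty_dict():
    assert check_all_majors(['cooking', 'gardening']) == {}


def test_multi_word_skill_needs_single_phrase_token():
    # Two separate word tokens do not match the two-word skill...
    assert 'Electrical Engineering' not in check_all_majors(['power', 'systems'])
    # ...only a single token holding the whole phrase does
    assert check_all_majors(['power systems']) == {'Electrical Engineering': ['power systems']}
```

Running `pytest` on a file containing this picks up the three `test_` functions; feeding in the real output of tokenize_cv for a sample CV is a natural next test.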

atheer331 · May 20, 2023, 11:39pm

I updated my dictionary and made all of its entries lowercase, and now I can see python, java and c++ being matched between the CV and the dictionary. However, two skills still don’t appear: circuit design and power systems. From the print statements I know that tokenize_cv works correctly, so I think the issue is with check_all_majors, since it doesn’t print all the skills that match.

atheer331 · May 21, 2023, 1:43am

The problem appears to be that tokenize_cv returns a skill like "power systems" as two separate words instead of one phrase.
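One way around this, sketched in plain Python rather than with spaCy-specific APIs: build every run of 1 to N consecutive tokens (N being the word count of the longest skill) and join each run with a space, so 'power' followed by 'systems' also yields the candidate 'power systems'. The helper add_phrases and the MAX_WORDS constant are hypothetical names, and the dictionary below is a trimmed, lowercased copy. Because tokenize_cv has already removed stop words and punctuation, windows can join words that were not adjacent in the CV, and a skill that itself contains a stop word (say, "design of experiments") would still be missed. Running spaCy's PhraseMatcher with `attr="LOWER"` on the full Doc is another option that avoids both issues.

```python
# Trimmed, lowercased copy of the skill dictionary
skill_dict = {
    'Computer Science': {'python', 'java', 'c++', 'machine learning', 'data structures'},
    'Electrical Engineering': {'circuit design', 'power systems', 'analog electronics'},
    'Statistics': {'statistic', 'algorithm'},
}

# Length of the longest skill, in words (2 for this dictionary)
MAX_WORDS = max(len(s.split()) for skills in skill_dict.values() for s in skills)


def add_phrases(tokens, max_words=MAX_WORDS):
    """Return every run of 1..max_words consecutive tokens, joined by spaces."""
    candidates = set()
    for start in range(len(tokens)):
        for size in range(1, max_words + 1):
            window = tokens[start:start + size]
            if len(window) < size:
                break  # ran past the end of the token list
            candidates.add(' '.join(window))
    return candidates


def check_all_majors(tokens):
    # Match skills against single words and multi-word phrases alike
    candidates = add_phrases(tokens)
    matches = {}
    for major, skills in skill_dict.items():
        intersect = skills & candidates
        if intersect:
            matches[major] = sorted(intersect)
    return matches


tokens = ['experience', 'circuit', 'design', 'power', 'systems', 'python']
print(check_all_majors(tokens))
# {'Computer Science': ['python'], 'Electrical Engineering': ['circuit design', 'power systems']}
```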
