returning dictionary incorrectly

I am trying to use spaCy to read a CV file and match its text against a set of skills in a dictionary I made. The function check_all_majors returns a dictionary with the skills related to only one major, instead of the skills related to all the majors it has found in the CV.

import docx2txt
import spacy
import textract

nlp = spacy.load('en_core_web_sm')

skill_dict = {
    'Computer Science': {'Python', 'Java', 'C++', 'machine learning', 'data structures', 'algorithms'},
    'Electrical Engineering': {'circuit design', 'power systems', 'analog electronics', 'digital signal processing'},
    'Mechanical Engineering': {'CAD', 'mechanical design', 'materials science', 'thermodynamics'},
    'Statistics': {'metrices', 'statistic', 'algorithm', 'mathmatics'}
}

def tokenize_cv(cv_file):
    # cv_file is the uploaded file object; use its name to pick a reader
    file_extension = cv_file.name.split('.')[-1]
    if file_extension == 'docx':
        cv_text = docx2txt.process(cv_file)
    elif file_extension == 'txt':
        cv_text = cv_file.read().decode('utf-8')
    elif file_extension == 'rtf':
        cv_text = textract.process(cv_file).decode('utf-8')
    else:
        raise ValueError('Unsupported file type')
    print('CV text:', cv_text)
    doc = nlp(cv_text)
    print('Spacy tokens:', [token.text for token in doc])
    tokens = [token.text.lower() for token in doc if not token.is_stop and not token.is_punct]
    print('Filtered tokens:', tokens)
    return tokens

def check_all_majors(tokens):
    matches = {}
    for major, skills in skill_dict.items():
        intersect = skills.intersection(set(tokens))
        if intersect:
            matches[major] = list(intersect)
    return matches

def editAccount(request):

    if request.user.is_Seeker:
        seeker = request.user.seeker
        form = SeekerAccountForm(instance=seeker)

    elif request.user.is_Recruiter:
        recruiter = request.user.recruiter
        form = RecruiterAccountForm(instance=recruiter)

    AllSkills = []

    if request.method == 'POST':
        if request.user.is_Seeker:
            form = SeekerAccountForm(request.POST, request.FILES, instance=seeker)
            if form.is_valid():
                # Validate the file extension
                file = request.FILES['cv']
                try:
                    ...  # validator call not shown
                except ValidationError as e:
                    form.add_error('cv', e)
                    messages.error(request, 'the cv format is not accepted, Try (.docx , .txt , .rtf)')
                    return render(request, 'account-edit.html', {'form': form})
                file = request.FILES.get('cv', None)
                if file:
                    tokens = tokenize_cv(file)
                    print('tokens:' , tokens)
                    matches = check_all_majors(tokens)
                    for major, skills in matches.items():
                        for skill in skills:
                            skill_obj = Skill(owner=seeker, category=major, name=skill)
                            print(f'Saved skill {skill_obj.name} in category {skill_obj.category}')
                messages.success(request, 'Your account has been updated!')
                return redirect('account')
        elif request.user.is_Recruiter:
            form = RecruiterAccountForm(request.POST, request.FILES, instance=recruiter)
            if form.is_valid():
                messages.success(request, 'Your account has been updated!')
                return redirect('account')

    context = {'form': form}
    if request.user.is_Seeker and seeker is not None:
        context['cv_skills'] = AllSkills
    return render(request, 'account-edit.html', context)

Side note: In the future, please surround your code with lines of three backtick - ` characters. This means you’ll have a line of ```, then your code, then another line of ```. (I’ve taken the liberty of doing that with this post - this is more a reminder for the future.)

My first thought here is that you’re creating a list of tokens that are all lowercase, but you have elements in your skill_dict with capital letters.
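To make that concrete, here is a small sketch of the mismatch. The `skills` and `tokens` values are made-up sample data, not output from your CV; the point is only that set intersection is case-sensitive.

```python
# Sample data (assumed for illustration, not taken from a real CV)
skills = {'Python', 'Java', 'C++'}
tokens = ['python', 'java', 'c++']  # tokenize_cv lowercases every token

# Set intersection compares strings exactly, so 'Python' != 'python'
no_match = skills.intersection(tokens)
print(no_match)

# Lowercasing the skills first puts both sides in the same form
lowered = {skill.lower() for skill in skills}
matched = lowered.intersection(tokens)
print(sorted(matched))
```

Lowercasing the dictionary entries (or both sides at comparison time) should be enough to get the single-word skills matching.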

My suggestion would be to create some test cases to test check_all_majors with known data to verify that it operates as expected. You’ve got some print statements already to show data being presented. It might also help if you posted the output of one of your tests.
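A test along these lines might look like the sketch below. It copies check_all_majors from your post and feeds it a hand-written token list instead of a real CV. The trimmed-down, lowercase `skill_dict` and the `known_tokens` list are assumptions made up for the test.

```python
# Trimmed-down, lowercase copy of the dictionary (assumed test data)
skill_dict = {
    'Computer Science': {'python', 'java', 'algorithms'},
    'Mechanical Engineering': {'cad', 'thermodynamics'},
}

def check_all_majors(tokens):
    matches = {}
    for major, skills in skill_dict.items():
        intersect = skills.intersection(set(tokens))
        if intersect:
            matches[major] = list(intersect)
    return matches

# Hand-written tokens covering two majors, plus one unrelated word
known_tokens = ['python', 'cad', 'thermodynamics', 'teamwork']
result = check_all_majors(known_tokens)
print(result)
```

If both majors show up in the result, the function itself handles multiple majors correctly, and the problem is more likely in the tokens being passed to it.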

I updated my dictionary so that all the entries are lowercase, and now I can see python, java and c++ from the CV being matched against the dictionary. Two skills still didn’t appear, though: circuit design and power systems. From the print statements I know the method tokenize_cv works correctly, so I think the issue is with check_all_majors, since it didn’t print all the skills that match.

The problem appears to be that tokenize_cv returns a skill like "power systems" as two separate words instead of one phrase.
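One possible way around this, sketched below, is to rebuild short phrases from neighbouring tokens before intersecting with the skills. This is only an illustration under some assumptions: the `ngrams` helper, the `max_n` parameter, and passing `skill_dict` as an argument are all made up here and are not part of the original code. Also note that because tokenize_cv drops stop words and punctuation, two tokens that end up adjacent may not have been adjacent in the CV.

```python
def ngrams(tokens, max_n=3):
    """Return every run of 1..max_n consecutive tokens, joined by spaces."""
    phrases = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrases.add(' '.join(tokens[i:i + n]))
    return phrases

def check_all_majors(tokens, skill_dict, max_n=3):
    # Compare skills against single tokens AND multi-word phrases
    candidates = ngrams(tokens, max_n)
    matches = {}
    for major, skills in skill_dict.items():
        intersect = skills & candidates
        if intersect:
            matches[major] = sorted(intersect)
    return matches

# Assumed sample data: lowercase skills and already-filtered tokens
skill_dict = {
    'Computer Science': {'python', 'java', 'c++'},
    'Electrical Engineering': {'circuit design', 'power systems'},
}
tokens = ['experience', 'python', 'circuit', 'design', 'power', 'systems']
result = check_all_majors(tokens, skill_dict)
print(result)
```

Setting `max_n` to the word count of the longest skill (3 for "digital signal processing") keeps the candidate set small. spaCy's PhraseMatcher is another option for matching multi-word terms directly on the doc, if you would rather not rebuild phrases by hand.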