Help with n + 1 (or similar) query

I have a set of queries where at one point the db gets hit a lot. I know where the problem is but I can’t figure out how to fix, or make it more efficient.

My models are:

class Grade(models.Model):
    score = models.CharField(max_length=4, blank=True)
    assessment = models.ForeignKey(Assessment, on_delete=models.CASCADE)
    objective = models.ForeignKey(Objective, on_delete=models.CASCADE)
    student = models.ForeignKey(Student, on_delete=models.CASCADE)
    cblock = models.ForeignKey(Classroom, on_delete=models.CASCADE)
    time_created = models.DateField(default=timezone.now)

class Assessment(models.Model):
	name = models.CharField(max_length=20, blank=True)
	omit = models.BooleanField(default=False)
	
class Classroom(models.Model):
	name = models.CharField(max_length=20, blank=True)

class Objective(models.Model):
    name = models.CharField(max_length=20)
    course = models.ForeignKey(Course, on_delete=models.CASCADE)

class Course(models.Model):
    name = models.CharFied(max_length=20)

class Student(models.Model):
    name = models.CharFied(max_length=20)
    classroom = models.ForeignKey(Classroom, on_delete=models.CASCADE)

The problematic view is:

def gradebook(request, classroom_id, course_id):
    classroom = Classroom.objects.get(pk=classroom_id)
    objective_list = Objective.objects.filter(course=course_id).order_by('name')
    student_list = Student.objects.filter(classroom=classroom_id).order_by('name')	
    for student in student_list:
        for obj in objective_list:
			q = Grade.objects.filter(student=student.id, objective=obj.id, assessment__omit="False").order_by('-time_created')
			if q.exists():
				if len(q) < 3:
			            qn = q.first()
				else:
				    .....

q.exists() and qn=q.first() are repeating hits to the db. I don’t think I can prefetch_related for either of objective_list or student_list. For example, I tried:

grade_queryset = Grade.objects.filter(cblock=classroom_id, assessment__omit="False")
objective_list = Objective.objects.prefetch_related(Prefetch('grade', queryset=grade_queryset,)).order_by('objective_name')

But this gave me an error that 'grade' is an invalid parameter to prefetch_related(). I guess prefetch_related doesn’t work from the other side of of the foreignkey.

Any suggestions on what I can try?

The first if isn’t necessary.

As I read this logic, you want the first item of the list if there are only 1 or 2 items in that list.

In that case, your if could be:

if len(q) in [1, 2]:
    qn = q[0]

That should remove the extra queries from the process.

You need to use the reverse reference name grade_set.

Thank you, that removed the extra queries

If len(student_list) = 30 and len(objective_list) = 20, I hit the db 600 times with

q = Grade.objects.filter(student=student.id, objective=obj.id, assessment__omit="False").order_by('-time_created')

Is that the best I can expect to do? I think doing prefetch_related queryset on grade objects doesn’t help, because when I filter on the queryset it hits the db again?

Another idea I had was to get the values of grades before iterating:

grade_values = Grade.objects.filter(assessment__omit="False").exclude(
                    score="---").order_by('-time_created').values_list('student', 'objective', 'score')

And then in the iterations, do searches in grade_values list. This would reduce the 600 db queries to one, but there is the cost of doing 600 searches through the list of items. This number of searches would rarely go higher than 600, but the number of items in the list could be up to 5000 (typically 2000).

Is one method preferred over the other, or is this something that we try to measure (response times?) and decide from that?

Actually, you’re hitting the database 600 times because of this:

You’d be better off creating an appropriate query involving (some combination of) annotations, aggregations, and/or subqueries such that you’re not creating separate queries for each (objective, student).

1 Like

Actually, you’re hitting the database 600 times because of this:

You’d be better off creating an appropriate query involving (some combination of) annotations, aggregations, and/or subqueries such that you’re not creating separate queries for each (objective, student).

I get all the objects I need with
all_grades = Grades.objects.filter(cblock=classroom_id)

I use the iterations to mostly just produce an ordered list that I pass to the template, and the template goes through two for loops to fill a table (columns are objectives, rows are students, cells are the grades).

Maybe I can do something like:

all_grades = Grades.objects.filter(cblock=classroom_id).values_list('student, objective', 'score', 'time_created')

Now I would have a list that could be sorted and I no longer have to touch the database. I just recently learned about .values() and .values_list()