Help with n + 1 (or similar) query

shmish · April 5, 2023, 3:46am

I have a set of queries where at one point the db gets hit a lot. I know where the problem is but I can’t figure out how to fix, or make it more efficient.

My models are:

class Grade(models.Model):
    score = models.CharField(max_length=4, blank=True)
    assessment = models.ForeignKey(Assessment, on_delete=models.CASCADE)
    objective = models.ForeignKey(Objective, on_delete=models.CASCADE)
    student = models.ForeignKey(Student, on_delete=models.CASCADE)
    cblock = models.ForeignKey(Classroom, on_delete=models.CASCADE)
    time_created = models.DateField(default=timezone.now)

class Assessment(models.Model):
	name = models.CharField(max_length=20, blank=True)
	omit = models.BooleanField(default=False)
	
class Classroom(models.Model):
	name = models.CharField(max_length=20, blank=True)

class Objective(models.Model):
    name = models.CharField(max_length=20)
    course = models.ForeignKey(Course, on_delete=models.CASCADE)

class Course(models.Model):
    name = models.CharFied(max_length=20)

class Student(models.Model):
    name = models.CharFied(max_length=20)
    classroom = models.ForeignKey(Classroom, on_delete=models.CASCADE)

The problematic view is:

def gradebook(request, classroom_id, course_id):
    classroom = Classroom.objects.get(pk=classroom_id)
    objective_list = Objective.objects.filter(course=course_id).order_by('name')
    student_list = Student.objects.filter(classroom=classroom_id).order_by('name')	
    for student in student_list:
        for obj in objective_list:
			q = Grade.objects.filter(student=student.id, objective=obj.id, assessment__omit="False").order_by('-time_created')
			if q.exists():
				if len(q) < 3:
			            qn = q.first()
				else:
				    .....

q.exists() and qn=q.first() are repeating hits to the db. I don’t think I can prefetch_related for either of objective_list or student_list. For example, I tried:

grade_queryset = Grade.objects.filter(cblock=classroom_id, assessment__omit="False")
objective_list = Objective.objects.prefetch_related(Prefetch('grade', queryset=grade_queryset,)).order_by('objective_name')

But this gave me an error that 'grade' is an invalid parameter to prefetch_related(). I guess prefetch_related doesn’t work from the other side of of the foreignkey.

Any suggestions on what I can try?

KenWhitesell · April 6, 2023, 2:07am

The first if isn’t necessary.

As I read this logic, you want the first item of the list if there are only 1 or 2 items in that list.

In that case, your if could be:

if len(q) in [1, 2]:
    qn = q[0]

That should remove the extra queries from the process.

You need to use the reverse reference name grade_set.

shmish · April 6, 2023, 12:51pm

Thank you, that removed the extra queries

If len(student_list) = 30 and len(objective_list) = 20, I hit the db 600 times with

q = Grade.objects.filter(student=student.id, objective=obj.id, assessment__omit="False").order_by('-time_created')

Is that the best I can expect to do? I think doing prefetch_related queryset on grade objects doesn’t help, because when I filter on the queryset it hits the db again?

Another idea I had was to get the values of grades before iterating:

grade_values = Grade.objects.filter(assessment__omit="False").exclude(
                    score="---").order_by('-time_created').values_list('student', 'objective', 'score')

And then in the iterations, do searches in grade_values list. This would reduce the 600 db queries to one, but there is the cost of doing 600 searches through the list of items. This number of searches would rarely go higher than 600, but the number of items in the list could be up to 5000 (typically 2000).

Is one method preferred over the other, or is this something that we try to measure (response times?) and decide from that?

KenWhitesell · April 6, 2023, 3:26pm

Actually, you’re hitting the database 600 times because of this:

You’d be better off creating an appropriate query involving (some combination of) annotations, aggregations, and/or subqueries such that you’re not creating separate queries for each (objective, student).

shmish · April 7, 2023, 7:56pm

Actually, you’re hitting the database 600 times because of this:
shmish:
    for student in student_list:
        for obj in objective_list:
You’d be better off creating an appropriate query involving (some combination of) annotations, aggregations, and/or subqueries such that you’re not creating separate queries for each (objective, student).

I get all the objects I need with
all_grades = Grades.objects.filter(cblock=classroom_id)

I use the iterations to mostly just produce an ordered list that I pass to the template, and the template goes through two for loops to fill a table (columns are objectives, rows are students, cells are the grades).

Maybe I can do something like:

all_grades = Grades.objects.filter(cblock=classroom_id).values_list('student, objective', 'score', 'time_created')

Now I would have a list that could be sorted and I no longer have to touch the database. I just recently learned about .values() and .values_list()

Topic		Replies	Views
DRF and nested serialisers optimisation with prefect_related() Using Django	2	11156	September 10, 2020
Query from one side filtering on the many side. Using Django	3	551	December 20, 2020
How to use prefetch_related to retrieve multiple rows similar to SQL result Using the ORM	1	1071	September 23, 2022
Automatic prefetching of ForeignKeyFields Show & Tell	1	1534	April 6, 2020
prefetch_related display ManyToManyField values Using Django	5	978	September 20, 2021

Help with n + 1 (or similar) query

Related topics