I want to share with the community something I learned today, after spending a lot of time (inefficiently) chasing a bug.
My app implements an online school system that allows students to take tests made up of multiple questions. I’m working, among the others, with these models (I redacted most of the fields that weren’t relevant to the issue):
class Question(models.Model):
text = models.TextField()
exam = models.ForeignKey(
Exam, null=True, on_delete=models.SET_NULL, related_name="questions"
)
class ExamProgress(models.Model):
user = models.ForeignKey(
settings.AUTH_USER_MODEL,
related_name="exams_progress",
on_delete=models.CASCADE,
)
exam = models.ForeignKey(
Exam, on_delete=models.CASCADE, related_name="participations"
)
questions = models.ManyToManyField(
Question,
through="ExamProgressQuestionsThroughModel",
related_name="question_assigned_in_exams",
blank=True,
)
current_item_cursor = models.PositiveIntegerField(
default=0
) # current item being displayed
class ExamProgressQuestionsThroughModel(models.Model):
ordering = models.PositiveIntegerField()
exam_progress = models.ForeignKey(ExamProgress, on_delete=models.CASCADE)
question = models.ForeignKey(Question, on_delete=models.CASCADE)
class Meta:
ordering = ["ordering"]
The model ExamProgress
represents an instance of a student’s session during an exam. Different students will see different questions in general, because they are randomized, and only one question is shown at once. Therefore, we have a m2m field that holds the questions assigned to the user, and we use a through model to keep track of the ordering. To access a question, we get
the one that has the field ordering
equal to the value of current_item_cursor
in ExamProgress
.
After the exam, my app generates pdf reports for the participants, showing the questions they were assigned and the answers they gave.
Let’s say I get a participant’s questions like this:
# this is inside an ExamProgress method, so `self` is an instance of ExamProgress
questions = (
self.exam.questions.filter(pk__in=self.questions.all())
.prefetch_related("answers")
.prefetch_related("given_answers")
)
This is mostly fine, except I want to retain the order in which they were assigned to the user, i.e. I want to order by the through model’s ordering
field.
Fine, let’s add:
questions = (
self.exam.questions.filter(pk__in=self.questions.all())
.order_by("examprogressquestionsthroughmodel__ordering") # <-----
.prefetch_related("answers")
.prefetch_related("given_answers")
)
Now, fast forward to this morning–the exam went by, at the end I hit the button to generate the pdf and, lo and behold, I got a bunch of pdf files with about 200 questions per user! Note that the exam only contained 14 questions per user.
I was extremely confused and didn’t really know what was going on. All I knew was that the right amount of questions was returned if I omitted the order_by
method. Another clue I eventually picked up on was that the correct number of questions was also returned if only one user had taken the exam.
It was at this time that I had the idea: let’s check the generated SQL by the query with and without the order by clause.
This is the generated SQL without it:
SELECT "jsplatform_question"."id",
"jsplatform_question"."text",
"jsplatform_question"."exam_id",
FROM "jsplatform_question"
WHERE (
"jsplatform_question"."exam_id" = 10
AND "jsplatform_question"."id" IN (
SELECT U0."id"
FROM "jsplatform_question" U0
INNER JOIN "jsplatform_examprogressquestionsthroughmodel" U1 ON (U0."id" = U1."question_id")
WHERE U1."exam_progress_id" = 294
)
)
ORDER BY "jsplatform_question"."id" ASC
Now this is the generated SQL with the order_by
:
SELECT "jsplatform_question"."id",
"jsplatform_question"."text",
"jsplatform_question"."exam_id",
FROM "jsplatform_question"
LEFT OUTER JOIN "jsplatform_examprogressquestionsthroughmodel" ON (
"jsplatform_question"."id" = "jsplatform_examprogressquestionsthroughmodel"."question_id"
)
WHERE (
"jsplatform_question"."exam_id" = 10
AND "jsplatform_question"."id" IN (
SELECT U0."id"
FROM "jsplatform_question" U0
INNER JOIN "jsplatform_examprogressquestionsthroughmodel" U1 ON (U0."id" = U1."question_id")
WHERE U1."exam_progress_id" = 294
)
)
ORDER BY "jsplatform_examprogressquestionsthroughmodel"."ordering" ASC
After inspecting the second query’s SQL, I realized what was really going on: ExamProgress
was being left-joined with the through model and the results were correctly filtered by exam, but then all the rows of this joined table were being returned, meaning I would get an entry in my resulting queryset for each time a question assigned to that ExamProgress
had been assigned to any ExamProgress
. What really needed to happen was filtering on the results of the joined table too.
After gaining this knowledge thanks to inspecting the generated SQL, I went back to my python code and added this:
questions = (
self.exam.questions.filter(pk__in=self.questions.all())
.order_by("examprogressquestionsthroughmodel__ordering")
.filter(
examprogressquestionsthroughmodel__exam_progress=self
) # <---------
.prefetch_related("answers")
.prefetch_related("given_answers")
)
which in turn added a condition to the outermost WHERE
clause:
...
WHERE (
"jsplatform_question"."exam_id" = 10
AND "jsplatform_question"."id" IN (
SELECT U0."id"
FROM "jsplatform_question" U0
INNER JOIN "jsplatform_examprogressquestionsthroughmodel" U1 ON (U0."id" = U1."question_id")
WHERE U1."exam_progress_id" = 294
)
AND "jsplatform_examprogressquestionsthroughmodel"."exam_progress_id" = 294
)
So today I learned two lessons that might come in handy to some of you in the future:
- when you query a many-to-many relationship and you order by a field that is in the through model, rather than the target model of the relationship, what you get back from the db is actually the rows of a joined table between the target model and the through model (as opposed to, say, the joined table only being used to get the ordering and being discarded later)
- when your queries aren’t behaving like you expect them to, a valuable option is to check the resulting SQL and see if it gives you any clues.
Thoughts and comments are appreciated I’m putting this out here just in case anybody with a similar issue bumps into this in the future.