Where should this piece of logic live?

This is going to be less of a technical question and more of a software engineering / Django-idiomatically-specific question. I’d like some feedback as to how I organized the code pertaining to a feature of my Django app, as I’m starting to second guess the way I’ve designed it.

I’m making a school application and the part that will be discussed here is how answers to questions (from now on called exercises) are graded.

The what
Let’s summarize the problem at hand, then I’ll show the way I implemented it:

  • exercises can be of different types, from multiple choice, to open answer, to even coding exercises. Each type of exercise has a different set of criteria for grading a submission
  • quizzes can be created in order to present students a series of exercises. A quiz isn’t directly related to questions: it uses a mechanism where a rule model determines how to pick the exercises to assign to student; this allows for different students to be assigned different exercises in the same quiz.
  • each exercise carries a maximum attainable grade: scoring that grade means the exercise is 100% correctly answered; however, the actual score the student gets for that exercise is determined by a property in the rule that assigned it. If an exercise has a max score of 4, the rule that assigned it carries of weight of 3, and a student scores 2, their grade for that exercise will be 2/4*3=1.5. This allows teachers to define exercises independently of quizzes and re-use them with different weights.

The how
When a student participates in a quiz, a Participation is created. For each rule defined in the quiz by the teacher, a ParticipationSlot is created: this has a fk to the assigned exercise, and is the only interface through which the student can interact with it. This model also encapsulates the fields necessary for submitting an answer (for example a m2m of selected choices, or a text field for open answer exercises). If also has a score property that computes the obtained score/grade. This is what I’m wanting to get some feedback on.

Here’s the ParticipationSlot model:

class EventParticipationSlot(models.Model):
    participation = models.ForeignKey(
        EventParticipation,
        related_name="slots",
        on_delete=models.CASCADE,
    )
    exercise = models.ForeignKey(
        Exercise,
        related_name="in_slots",
        on_delete=models.CASCADE,
    )
    populating_rule = models.ForeignKey( # <-- this is where the slot's weight comes from
        Rule,
    )

    selected_choices = models.ManyToManyField(ExerciseChoice)
    answer_text = models.TextField(blank=True)

    _score = models.DecimalField(
        max_digits=5,
        decimal_places=2,
        null=True,
        blank=True,
    )

    @property
    def score(self):
        if self._score is None:
            return get_assessor_class()(self).assess()
        return self._score

    @score.setter
    def score(self, value):
        self._score = value

As you can see, whatever the automatically assigned grade is, it can be manually overridden by assigning to it. This is vital for types of exercises like open answer, where an automatic grade cannot really be assigned.

Now, in a separate module called logic, here’s the definition of get_assessor_class:

def get_assessor_class():
    return SubmissionAssessor

and here’s the class:

class SubmissionAssessor:
    def __init__(self, participation_slot):
        self.participation_slot = participation_slot
        self.slot_weight = self.participation_slot.populating_rule.weight

    def get_multiple_choice_submission_correctness(self, slot):
        selected_choices = slot.selected_choices.all()
        return sum([c.correctness for c in selected_choices])

    def get_programming_submission_correctness(self, slot):
        passed_testcases = len([t for t in slot.execution_results["tests"] if t["passed"]])
        return passed_testcases

    def get_manual_submission_correctness(self, slot):
        return None

    def get_submission_correctness(self, slot):
        from courses.models import Exercise

        exercise_type = slot.exercise.exercise_type

        if exercise_type in [Exercise.OPEN_ANSWER, Exercise.ATTACHMENT]:
            return self.get_manual_submission_correctness(slot)

        if exercise_type in [
            Exercise.MULTIPLE_CHOICE_SINGLE_POSSIBLE,
            Exercise.MULTIPLE_CHOICE_MULTIPLE_POSSIBLE,
        ]:
            return self.get_multiple_choice_submission_correctness(slot)

        if exercise_type in [Exercise.JS, Exercise.C]:
            return self.get_programming_submission_correctness(slot)

    def assess(self):
        exercise_max_score = self.participation_slot.exercise.get_max_score()
        if exercise_max_score is None or exercise_max_score is 0:
            return 0

        submission_correctness = self.get_submission_correctness(
            self.participation_slot
        )
        if submission_correctness is None:
            return None

        return (
            Decimal(submission_correctness)
            / Decimal(exercise_max_score)
            * Decimal(self.slot_weight or 0)
        )

As you can see, accessing the score property on a ParticipationSlot instantiates a SubmissionAssessor and calls assess passing the slot, which in turn calls a specialized assessment method depending on the type of the exercise in the slot. The reason I have a get method instead of directly instantiating the class is to allow for future extensions where I might want different behaviors depending on the type of quiz.

A “correctness” value is computed for the submission, then it’s divided by the maximum attainable score of the exercise (more on this in a second), and ultimately scaled by the weight of the slot/rule.

(as an aside, if you’re wondering by I am passing slot to all of the assess methods instead of using self.participation_slot, it’s because slots are actually recursive – although it’s omitted here – and to grade some exercises you have to recursively evaluate the sub-slots)

The last bit is how the maximum possible grade for an exercise is determined. This is a method in the model Exercise:

    def get_max_score(self):
        if self.exercise_type in [Exercise.OPEN_ANSWER, Exercise.ATTACHMENT]:
            return None
        if self.exercise_type in [Exercise.JS, Exercise.C]:
            return self.testcases.count()
        if self.exercise_type == Exercise.MULTIPLE_CHOICE_SINGLE_POSSIBLE:
            max_score = (self.choices.all().aggregate(Max("correctness")))[
                "correctness__max"
            ]
            return max_score
        if self.exercise_type == Exercise.MULTIPLE_CHOICE_MULTIPLE_POSSIBLE:
            correct_choices = self.choices.filter(correctness__gt=0)
            return sum([c.correctness for c in correct_choices])

So as you can see, we go from accessing the score property, which is inside a model, to instantiating a SubmissionAssessor, which lives in a separate module altogether, to once again tying into the models by calling get_max_score on the related exercise.

I understand that, as per the active record and fat model pattern, most business logic code should live inside the models.

Do you have any feedback as to this architecture? How could it be made clearer and tidier?

I’m thinking the get_submission_correctness method in the assessor class might become a method in ParticipationSlot, and at that point the score property would just involve calling that, dividing by the max exercise score, and scaling for the slot weight. But then I’d also have to move all the different assess methods inside the model, possibly bloating it up too much.

One more thing I don’t particularly like, althought I can’t really think of a different way, is to have all this logic “repeated” for every type of exercise: how to evalue a submission, how to get the max score depending on the type, etc.

Two really high level questions/comments, as someone who’s worked on the Open edX project:

Does the authored content change over time? What happens to student grades in the scenario that course teams made some sort of mistake that they later correct? I have a lightning talk on some of the issues that have come up over the years for us, and some of the mitigations we’ve taken. I’m trying to address some of those issues today in the openedx-learning data model, but that’s still in really early stages (warning: that is a long thread, the models.py in the PR might be a better summary).

How far does this have to scale? I wrote a blog post back in 2018 about some of the painful lessons we learned from our initial grading system, which also aimed for pluggability and recursive nesting (though in a hackier way). The short takeaway from that for us was to avoid querying smart, pluggable objects at request time; but instead to have objects like Assessors push data into a dedicated grading app that controls its own data model and can make guarantees about the queries you want to ask of it.

This project looks really interesting and powerful. Best of luck with it!

1 Like

It can change, like the example you mentioned of correcting a mistake in the exercise text/choices/something else.
The nice thing about my solution, in that regard, is that the grades will be automatically kept in sync because the grading is “re-done” every time the property is accessed. This might mean a slower response time, but in practice I haven’t experience that kind of issues even when viewing a page where a hundred exercise slots were displayed at once, thus triggering a hundred Assessors (your typical quiz will have anywhere 2-30 exercises, so that shouldn’t be an issue when viewing a single quiz attempt).

You just gave me some really interesting reading/listening material to consume! Thank you!

It’s being used by upwards of 200 students at a time with quizzes of 20 exercises on average. I’ll read your article and see if there’s any takeaway point I can get from it and apply to my own code.

Thank you! This project is actually in a late-ish stage already, as it’s already being used in production by University of Pisa to deliver online exams. Now I’m looking to take it to the next level by re-engineering some of its pain spots in order to really provide the best service possible.

Since you sure have a lot more experience than me in designing or working on these types of systems, if you have some spare time and want to take a look at my code, here’s the back-end repository for the project: GitHub - samul-1/sai_evo_backend Don’t take it as a commitment, but if you are so inclined I’d love to get some feedback from you about the general structure of the project or really just about some of the code I wrote. (I apologize in advance for the poorly commented models—I’d be happy to explain away any doubts you might have reading the code; you’ll find that other modules are better-commented though).

We had this approach in the beginning as well. The self-healing aspect of it was especially valuable to us in the early days when students were sometimes mis-scored because of bugs in our code. I think this was the right approach for us to start out with given the constraints that we were under, but it did cause us some headaches over time, like:

  1. Performance issues when generating aggregate score/grade statistics across hundreds of thousands of students. At the scale of a couple hundred, this might be ignorable…?
  2. Changes where the students got re-graded lower because of a correction. Because few things are as painful to a student as barely passing a course, thinking you’re done, and then coming back three weeks later to find out that no, in fact that was a mistake, and now you’re just barely failing. But it looks like you have a manual override option for those exceptional cases.
  3. Situations where the content changed significantly over time, like if you have a course open to many people over a long period of time.

We had a very specific use case of running courses with extremely high enrollments though, so it might be that none of these are really problems for you at all.

That’s fantastic! Congratulations!

I can’t make any promises (especially not this week), but I will try next week.

Take care.

1 Like