How and where to perform/store calculations of model fields

I’m working on a personal finance site as a Django learning project and struggling to understand a good way to calculate and store values based on model fields.

I’m dealing with this in multiple places, but as an examples, I have two models Budget and BudgetItem, where Budget is a collection of multiple BudgetItems.

class Budget(models.Model):
    date = models.DateField()
    total_spend = models.IntegerField(default=0, null=False)
    total_budgeted = models.IntegerField(default=0, null=False)
    total_difference = models.IntegerField(default=0, null=False)
    total_income = models.IntegerField(default=0, null=False)

    def calc_total_spend(self):
        return self.budgetitem_set.all().aggregate(total_spend = Sum('purchase_total'))['total_spend']

    user = models.ForeignKey(
        get_user_model(),
        on_delete=models.CASCADE,
    )

    def __str__(self):
        return str(self.date)

    def save(self, *args, **kwargs):
        if self.budgetitem_set.all().exists():
            self.total_spend = self.budgetitem_set.all().aggregate(total_spend = Sum('purchase_total'))['total_spend']
            self.total_budgeted = self.budgetitem_set.all().aggregate(total_budgeted = Sum('amount'))['total_budgeted']       
        self.total_difference = self.total_budgeted - self.total_spend

        if self.incomeitem_set.all().exists():
            self.total_income = self.incomeitem_set.all().aggregate(total_income = Sum('total_income'))['total_income']

        return super().save(*args, **kwargs)

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=['date', 'user'], name='unique_month')
        ]
        ordering = ['date']


class BudgetItem(models.Model):
    category = models.ForeignKey(
        'purchases.Category',
        null = False,
        blank = False,
        on_delete = models.CASCADE,
    )

    budget = models.ForeignKey(
        Budget,
        null = False,
        blank = False,
        on_delete = models.CASCADE
    )

    amount = models.IntegerField(default=0, blank=False, null=False)
    purchase_total = models.IntegerField(default = 0, blank = True, null = False)
    income_total = models.IntegerField(default = 0, blank = True, null = False)
    difference = models.IntegerField(default = 0, blank = True, null=False)

    user = models.ForeignKey(
        get_user_model(),
        on_delete=models.CASCADE,
    )

    def __str__(self):
        return str(self.category) + str(self.amount)

    def save(self, *args, **kwargs):
        qs_purchase = self.category.purchase_set.all().filter(date__year = self.budget.date.year, date__month = self.budget.date.month)
        if qs_purchase.exists():
            self.purchase_total = qs_purchase.aggregate(purchase_total = Sum('price'))['purchase_total']
        else:
            self.purchase_total = 0

        self.difference = self.amount - self.purchase_total
        return super().save(*args, **kwargs)
    
    
    class Meta:
        ordering = ['category__name']
        constraints = [
            models.UniqueConstraint(fields=['category', 'budget'], name='unique_budgetitem')
        ]

For each Budget instance, I want to calculate the sum of all values of the “amount” and “purchase_total” for each associated BudgetItem. I’ve managed to do this multiple ways but I don’t think any of them are very good.

One way (which is in the models above) is creating fields in the Budget model to store the calculations and then do the calculation for them every time the Budget instance is saved. I also got a this to work by doing the calculations in the view and saving the Budget instance.

I also realized I could add a method to the model to calculate it (as done with calc_total_spend) and then call that method in the template when I want to display it.

All of the calculated values will be displayed in the template and will be used in different views. So I’m not sure what is better practice. Should I have fields on my models to store the values so I’m not recalculating all the time? Or does it not really matter and should I just have the instance method be called when it is rendered in the template?

I think there’s a trade-off to consider here. Putting the calculated values in the models hits the database every time each model instance is saved, and database accesses can be slow (depending on a lot of factors). Calculated values (either using methods or properties) means that the values are overwritten with each calculation, and there’s no caching. I think this depends on what performance you’re seeing and where the bottleneck is.

I would say that if performance is not a concern, I would keep things the way you’ve written. It can always be changed later if you want to try a different way.

-Jorge

In addition to Jorge’s answer, there’s also another take on this from the perspective of data integrity.

Don’t store data that is calculated from other values, unless you can absolutely ensure that there is no way that the calculated values can end up “out of sync” with the source data.
Yes, there’s a performance hit for doing that, but I come from an environment where data integrity is the overriding concern over top of all other concerns.

Don’t believe for a minute that you can fully cover all cases within Django, at least not easily. A lot of people want to use signals to catch when an update occurs, either forgetting or not aware that the .update method won’t trigger those signals. Or handling deletes - if a budget item is deleted, is that going to trigger an update?

Worse-case scenario - what if someone is using a tool like PgAdmin to fix some data problems and makes changes to a budget item directly?

So our approach is always to calculate when a value is needed. That’s the only answer that makes sense to us.

(Having said that, we do have one special case where that’s not true. We have a reporting system, where a user can choose to save a report. The report is saved as-of the date/time it was run, with all calculated values stored. They can then bring up that report with the data as it existed at that point in time without any further calculations required - we save all data associated with that report, stored and calculated.)

1 Like

Jorge and Ken, thanks for the perspectives on this!

I hadn’t considered caching, mostly since I’m still learning and haven’t tried implementing it yet, but that is a good think for me to think about.

As for data integrity, I ran into that in an earlier (and terrible) version I had of this. And since I was contemplating using signals for this too, I’m glad you brought that to my attention Ken. Even though this isn’t a critical application by any means, I still don’t want to worry about values getting out of sync.

I’ll think about it some more but I’m leaning towards using methods at this point.

Thanks for you help!