calculated fields in models

I am new to Django and new to Python. I decided to build a project to practise my understanding.

Anyway while designing my tables what I found is that I would like to have a field in there that is calculated through say a python function. I read through and found that I could potentially use @property decorator and setter.

ref: [https://www.stavros.io/posts/how-replace-django-model-field-property/](http://How to replace a Django model field with a property)

So in my model class I have:

class CapacityRequest(models.Model):

cr = models.IntegerField(verbose_name="Change Request No")

disktier = models.ForeignKey('Disktier', on_delete=models.SET_DEFAULT,default=1)

requestedby = models.CharField(max_length=50, verbose_name='Requested By')

datacentre = models.ForeignKey('Datacentre', on_delete=models.SET_DEFAULT,default=1)

storageresource = models.CharField(max_length=50, verbose_name='VM Name/Datastore/Lun')

vcpu = models.IntegerField(default=0)

ram = models.IntegerField(default=0)

requestdate = models.DateTimeField(auto_now=True,verbose_name="Request Date")

notes = models.CharField(max_length=80)

diskspacegb = models.IntegerField(default=0)

_diskspace_gb_vswap = models.DecimalField(db_column="diskspace_gb_vswap", decimal_places=2 , max_digits=10)

@property

def diskspace_gb_vswap(self):

    return self._diskspace_gb_vswap

@diskspace_gb_vswap.setter

def diskspace_gb_vswap(self , value):

    self._diskspace_gb_vswap = self.diskspacegb * 1.5

In the shell
I import the necessary classes and after getting the object details for the foreign keys run this in the django shell to see if it will work.

capacityrequest = CapacityRequest(cr = 2353253 , disktier = ssddatastore , storageresource = ‘VMTest’ , diskspacegb = 60)

then

capacityrequest.save()

However i am getting an error that says:

django.db.utils.IntegrityError: NOT NULL constraint failed: capacityrequest_capacityrequest.diskspace_gb_vswap

This tells me that the function setters and such is not being implemented properly.

Any help to achieve what i want to do would be most appreciated.

You’re not setting the value for the diskspace_gb_vswap. variable.
Properties aren’t magic - they’re not called automatically, they just allow you to define methods to be used as if they’re variables.

If you really want to store a calculated value as a separate column (without addressing whether or not that’s a desirable pattern), you’ll still need to set the value - so you could do something like:

capacityrequest = CapacityRequest(cr = 2353253 , disktier = ssddatastore , storageresource = ‘VMTest’ , diskspacegb = 60)
# Note, the function requires a value to be passed, but it's not being used.
capacityrequest.diskspace_gb_vswap = diskspacegb 
capacityrequest.save()

Thank you @KenWhitesell for putting me on the right path.

I thought about what you said and then decided to change my approach to suit what you said.

I added this method like this.

def save(self, *args, **kwargs):
self.diskspace_gb_vswap = self.diskspacegb * 1.5

    super(CapacityRequest,self).save(*args, **kwargs)

</code
I have tested it and it works.

However you have hinted at a desirable pattern I am keen to listen to what that might be. I am sure it will help me in my Django journey

Couple little things:

  • If you’re already storing diskspacegb in the table, there’s no need to store diskspace_gb_vswap if it’s always going to be 1.5 * diskspacegb. Do the math only when/if it’s needed.

  • Use the “short form” of super - super().save(*args, **kwargs)

Otherwise, keep on truckin!

Ken

Thank you @kenwhitesell. What do you mean by doing the math only when needed? Is that what you mean by a conditional using something like if diskspacegb == 0 return 0 or something similar?

Thank you for the help.

@KenWhitesell:

I just had a relook and is this what you mean?

def save(self, *args, **kwargs):
    if self.diskspacegb > 0:

        self.diskspace_gb_vswap = self.diskspacegb * 1.5

    else:

        self.diskspace_gb_vswap = 0

    super().save(*args, **kwargs)

No, I mean that there shouldn’t even be a column for diskspace_gb_vswap in the database if it’s always going to be diskspacegb * 1.5. You’ve effectively created duplicate data in the database.

@KenWhitesell Ahhh i get what you mean. So how would i create a calculated field? What i would like is to have the user type in a value and it performs calculations on it to set the value correctly.

In this case they just type in the value 30 GB for example and Django will know that it is calculated correctly.

Hey @KenWhitesell ,

A bit of a side topic, but related to preventing duplication of data… How would you recommend compiling and displaying metrics? For example daily, weekly, monthly, yearly aggregation metrics or more complex metrics. Would you recommended calculating every time it is needed or storing/updating those metrics in the database since they would be consistently accessed?

Sorry for butting in weiyentan :grimacing:

Let’s try to shift the perspective just a little. “Django”, as the web framework, doesn’t “know” (or “care”) what is calculated when - You do.

And so the questions are, when and where do you care?

If you only have one or two views that need to display this calculated field, then you can create a class method to prepare this value to be displayed.
Small example:

class CapacityRequest(models.Model):
    ...
    diskspacegb = models.IntegerField(default=0)
   
    def diskspace_gb_vswap(self):
         return self.diskspacegb * 1.5

You can then directly reference that function in your templates as a value to be displayed in a manner similar to:

{{ some_capacity_request.diskspace_gb_vswap }}

(You can also use that method in your view if the value is needed for other calculations.)

If you’re going to need it for multiple rows at one time that you’re retrieving through a query, you can use the annotate method on the queryset to apply this calculation on every row being retrieved:

some_capacity_requests = CapacityRequest.objects.filter(...).annotate(
    diskspace_gb_vswap=F('diskspacegb') * 1.5)

Finally, if you really wanted to get fancy with this, you could create a custom manager that adds this annotation to every query on that table.

So it really comes down to figuring out how much or how often you’re going to need that calculated value.

Ken

2 Likes

Actually, this would make an excellent topic of its own - but it’s not necessarily a Django-specific topic. (It’s really a more general “data management” issue.)

As a result, my initial response to something like this is always going to be my “Standard Answer #1” - “It depends”.

Such an answer, of course, only invites the response of “What does it depend on?”

Generally, it depends on volumes, activity, and usage.

For example, I’ve got one data collection app that is working with a multi-terabyte DB collecting approximately 8 million rows of data per day. You can well bet that we’re not going to perform monthly statistics reporting by querying 250 million rows on a consistent basis.

On the other hand, I’ve got another app that over the past 16 months has created 149 rows of data. I’m not going to lose any sleep over any queries pulling data from ~150 rows.

Ken

1 Like

Thanks for both of the thorough replies. I’m certainly not in the 8 million data rows per day camp so I’ll stick to keeping it simple :grinning:

You should blog on the topic (and likely many others), I sense you have a wealth of knowledge many of us could tap in to.

thank you @KenWhitesell very much for the reply.

I am very new to Python. (I am actually come from a PowerShell background so learning about classes is a relatively new concept).

If I can summarise, from what I can see you have created a method in a class which I can then call upon in other areas of where I want to display data. First example is in your template but I suppose I could also tweak that in my api get data request by using class meta and adding in the function there to display the data back?

I am learning slowly , but having looked at other frameworks, including ones that are in the PowerShell ecosystem, I dont see anything is quite as simple and powerful as Django

You don’t need to add anything special to use that method when you’re working with an instance of that class.

For example, lets say you want to have a function-based view that takes the id of a CapacityRequest object as a parameter, then render a template based upon the details in that object, you could do something like:

def display_a_capacity_request(request, id):
    capacity_request = CapacityRequest.objects.get(id=id)
    diskspace_gb_vswap = capacity.diskspace_gb_vswap()
    context = {'capacity': capacity_request, 
               'vswap': diskspace_gb_vswap}
    return render(request, 'capacity_detail.html', context)

The third line of that snippet is what’s key here. Since the method is defined on the object, that method is available on every instance of that object.

Ken

1 Like

Hello,

Also new to Django and DB Management in general, but like to know the best way of doing things, I have some ideas:

  • I have a Table that has “area” and “rent per sf” and in several places I need to calculate “Total Rent” = “area” * “rent per sf”
  • Doing the calculation in every query works but if I have many million rows it would be better to have the calculation already there, specially to aggregate data for other uses, right?
  • However I also want data consistency, maybe in the future the DB is accessed by other tools other than Django, so I believe that the calculated Column should be part of the DB structure itself, meaning that this field is always calculated at DB level, and nobody could overwrite it with a value that is not the exact formula.
  • This makes sense to you?
  • Can the model be defined in Django to be registered at DB level?

Thanks in advance.

Maybe what I say is not supported?

https://code.djangoproject.com/ticket/31300

This sort of decision is always a tough call.

As a general principle, I always lean to the side of data consistency - this means dynamically calculating the totals. (This can be done in the database through the use of annotations, or in code by iterating over the elements of a queryset.)

For frequent calculations, you can define a view that returns the calculated value as a column. (You can’t define it in Django, it has to be created on the database, but you can use it by defining an unmanaged model that refers to the view by name.) I’d guess that it’s going to be at least as fast as an annotation added to a queryset. (It might be faster, I’ve never tried to benchmark it.)

However, the “best” solution isn’t a singular one. This table and these calculations don’t exist independently of a larger context and environment. Understanding the complete solution is part of deciding specific implementation issues.

Depending upon the exact needs of the site and what your performance window really is, there are some things you can do to speed it up.

I’m assuming that the majority of this data doesn’t change all that frequently. There are way you can construct this such that you keep “partial” totals by groups of rows. When applying updates, you then only need to update that summary data for those groups having changes applied. But making those kinds of decisions requires a much deeper understanding of the application and the environment in which it’s being run.

When I refer to the performance window, part of that involves understanding how frequently this calculation needs to be performed. For instance, if I’m only going to access the complete total once / month, I might not care how long it takes the query to run. I could schedule it as a task, and have the numbers available for later perusal.

We have what we call the 95-5 rule. Any query, report, view, etc used less than 5% of the time (1 business day / month) has no time spent on optimization. If the queries can’t be generated quickly enough “live”, then we look for ways to schedule their execution and store the results. (Monthly, Quarterly, and Annual business reports frequently fit into that category.)

That’s a great response, thank you!

I will start from the easiest approach and test for performance, in case is not that great then I can always try to implement a more complex solution that deals with this.

Regards,