How to efficiently render calculated fields on a table?

Hello everyone, me again.
I think I have some design problem.

I have a Product model with some properties:

class Product(models.Model):
    length = DecimalField
    width = DecimalField
    height = DecimalField

Also I have a Variable model acts like a singleton,
just one register to store variables used in my system.

# This model isn't related to Product
class Variable(models.Model):
    var1 = DecimalField
    var2 = DecimalField
    var3 = DecimalField
    var4 = DecimalField

For each Product, I have 4 model methods to calculate different prices:

  • def cost_price = length + width + height + var1
  • def sale_price = cost_price + var2
  • def finished_price = cost_price + var3
  • def sale_finished_price = finished_price * var4

Problem

This are not the exact calculations but you can start to see where the problem is.
Calling sale_finished_price calls finished_price which calls cost_price and so on. Also the vars count as queries to the DB.

The problem arises when I render a table of Products, with one of the fields being sale_finished_price.
This shoots a lot of similar queries for making the calculations.

Django Debug Toolbar Information
imagen

SELECT "variables_variable"."id",
       "variables_variable"."created",
       "variables_variable"."modified",
       "variables_variable"."is_removed",
       "variables_variable"."precio_hora",
       "variables_variable"."precio_pintado",
       "variables_variable"."ganancia_por_mayor",
       "variables_variable"."ganancia_por_menor",
       "variables_variable"."ganancia_fibrofacil"
  FROM "variables_variable"
 WHERE ("variables_variable"."is_removed" = false AND "variables_variable"."id" = 1)
  28 similar queries.   Duplicated 28 times. 

I think I can’t use select_related() and prefetch_related() in this case, because there are no relations between Product and Variable.

If you read until here, thanks!
Hopefully we’ll find some efficient way to do this.

Hmmm… Interesting question!

My first reaction is to think about whether or not this is really a problem. Yes, it’s a lot of queries. But assuming you’re using Postgresql as the database, there’s going to be a lot of caching going on along the way. It’s not like the second and subsequent queries are going to need to hit the physical disk.
(For example, in the results you show from DDT, it took 73,41 ms, which is less than a tenth of a second. It’s longer than it needs to be, but is it a problem? That’s not for me to say.)

It might be helpful to see your model methods to see how you’re calculating these. There’s a chance there might be a better way of doing it without making any significant changes.

My next thought would be to prefetch the values in the model methods and use the local variables in the calculations rather than trying to do the calculations at the database layer.

Depending upon how stable these variables are across the running of the application, I might also think about loading them into Django’s cache at startup and referencing them from there.

Ken

Thanks for answering Ken!

I tend to think that what I’m trying to do it’s already done by someone so, surely there’s a better way to do it than mine.

Here is the code from the models.

This is what I wanted to do from the beginning but I did it wrong and you corrected me.


Possible solutions:

  • I never implemented caching because this is a school project but I’d like to implement it if there’s the need.

  • Another solution would be to refactor the functions to use Variable.objects... only one time. (What you’ve said above)

  • Last but not least, I can exclude this fields from the table, avoiding the calculations in batch.

1 Like

Cool - just one quick question before I offer a suggestion: Do you always need to calculate the four values together, or are there situations where you might need to only call precio_costo without calculating the other values?

Not at all. Maybe I need only the last function precio_venta_terminado, which is the sale price to the client and the most important to look in a table.

I wrote it this way, by “concatenating” functions, as a way to imitate how it works on an spreadsheet. Now I know this is not like a spreadsheet :sweat_smile:

Just to update, I just tried to cache the view with
@method_decorator(cache_page(60 * 5), name='dispatch')
and the result is great.

The downside is it doesn’t invalidate the cache when I modify the Variable instance.

So the “least invasive” / “least disruptive” solution I came up with looks like this:

class Product(models.Model):
    largo = DecimalField
    ancho = DecimalField
    alto = DecimalField
    tiempo = PositiveIntegerField
    
	def precio_costo(self, vars) -> float:
        precioBase = (self.largo / 100) * (self.ancho / 100)

        precioLatCorto = (
            (self.ancho / 100) * (self.alto / 100)
        )

        precioLatLargo = (
            (self.largo / 100) * (self.alto / 100)
        )

        horas: float = self.tiempo / 60

        # Cast to Decimal to be able to multiply it with another Decimal
        precio_tiempo = Decimal(horas) * vars.precio_hora ### Repeated query

        return precioBase + precioLatCorto + precioLatLargo + precio_tiempo

    def precio_venta_crudo(self, vars) -> float:
        """ Calcula el precio de venta al pĂşblico del producto crudo """

        # Precio costo * % de ganancia
        return self.precio_costo(vars) * (
            (vars.ganancia_por_menor / 100) + 1 ### Repeated query
        )

    def precio_terminado(self, vars) -> float:
        """ Calcula el precio del producto terminado, sin la ganancia """

        tiempo_terminado = (self.tiempo * 2) / 60

        precio_tiempo_terminado = (
            Decimal(tiempo_terminado) * vars.precio_hora ### Repeated query
        )

        return (
            self.precio_costo(vars)
            + vars.precio_pintado
            + precio_tiempo_terminado
        )

    def precio_venta_terminado(self) -> float:
        """ Calcula el precio de venta al pĂşblico del producto terminado """
		vars = Variable.objects.get(pk=1)
        # Precio terminado * % de ganancia
        return self.precio_terminado(vars) * (
            (vars.ganancia_por_menor / 100) + 1 ### Repeated query
        )

(I tried to maintain the style and structure of what you’ve already done - I mostly wanted just to illustrate what I was talking about in my previous reply.)

This is not a “perfect” solution - there’s no error checking on the get, and there’s no way to call the other functions without passing a variable, or no way to use this without the Variable object being present. Just hoping it gives you some ideas.

Ken

1 Like

As usual, thanks for your answer Ken, it helped reducing the amount of similar queries and optimizing the time of the query.

If you have another idea, I’d like to know it, even if it’s invasive.
This is not being used by anyone so this is the best moment to make these kind of changes.


On caching, where should I look into to be able to invalidate the cache on certain model changes? Is this a common problem?

I cached a view with cache_page() but I’m not finding the way to invalidate it. Cache Docs

I would need to ask a couple more questions before I’m able to offer more suggestions:

Are these calculated for 1 row at a time, or are you needing to perform these calculations across a number of rows at once?

Are the changes to Variable frequent or infrequent? (How often / regularly do they change?)

Ken

At least for now they’re calculated 1 row at a time, one calculation for each product.

Normally the changes to Variable could be between 1 to 3 months.
With the current state of Argentine economy, the changes could be between weeks or days sadly.

I’m sorry, I wasn’t clear enough for what I was asking. The views that are displaying the results of these calculations - is one view showing one row (a “detail view”), or is a view showing multiple rows (a “list view”), where these calculations need to be performed on each row being displayed in that view?

Also, while I’m thinking about it regarding the cache - what I would do for updating the cache would be to hook into the save method of the variable model.

Ken

I’m using these calculations in a ListView with multiple products, each product making it’s own price calculation with precio_venta_terminado.

I was thinking and maybe it isn’t very useful to cache a list of values which could change very frequently.
And on hooking into the save method, do you reccommend to use the cache API provided by Django? Something like cache.delete('some_key').

Hi! Really sorry about the delay in getting back to you on this.

What you can do is use an annotation to perform the calculations across all elements of your queryset. For example, let’s say you’re doing something like this:
qs = Product.objects.all() to retrieve all Product.

You could, then, modify your queryset to do something like this instead:
(again, I’m trying to effectively mimic what you have done - any refactoring is up to you)

v = Variable.objects.get(pk=1)
qs = Product.objects.all().annotate(
    precioBase=(F('largo')/100) * (F('ancho')/100),
    precioLatCorto=(F('ancho')/100) * (F('alto')/100),
    horas=F('tiempo')/60,
    precio_tiempo=horas*v.precio_hora,
    precio_costo=F('precioBase')+F('precioLatCorto')+F('precioLatLargo')+F('precio_tiempo'),
    precio_venta_crudo=F('precio_costo') * ((v.ganancia_por_menor / 100) + 1),
    tiempo_terminado=F('tiempo') * 2 / 60,
    precio_tiempo_terminado=F('tiempo_terminado') * v.precio_hora
    precio_terminado=F('precio_costo')+v.precio_pintado+v.precio_tiempo_terminado,
    precio_venta_terminado=F('precio_terminado') * ((v.ganancia_por_menor / 100) + 1)
)

When this query executes, you’ll have all the values you’re looking for, plus all the intermediate values available as attributes, on each row.
(Note: I typed this live - there’s likely one or more syntax errors.)


As a separate topic; I wouldn’t ever delete the key, just rewrite it when it’s updated. (You can set a cache key even if it’s already set - the new value overwrites the old.) But, with the modifications we’ve made in these last two iterations, I really don’t think it’s necessary to bother with the cache on this.

Ken

1 Like

Hi! No problem for the delay.

I tried your approach and it ended being more efficient than my model methods, mainly in the amount of queries.

This is the query for one product with all his annotations.

Last question: Do you think, in this case, that Qs with annotations is a good replacement for the model methods, or just a different way of calculating the same information?

Thanks for your time Ken, really appreciated.

It’s different, and in part, the differences relate to how/when you’re doing these queries.

If you look closely at that generated query, you can see that it’s calculating the same value multiple times. The F clauses creating intermediate results aren’t actually saving those results - it’s effectively defining them as a formula. Later uses of that calculation don’t reuse the earlier results, it’s calculating them again.

This means that once you reach some level of complexity, it ceases to be advantageous to calculate values within the context of a query.

That you are reducing the total number of queries does not necessarily translate into faster performance, it’s something you would need to evaluate on a case-by-case basis.

<opinion>
Just as a guess, with nothing in the way of facts to back this up, I’d believe that a streamlined method (doing everything in one function and not split across 4 functions) would be faster than this method of chaining the calculations in the query.

I generally limit my use of annotation-based calculations to those situations where I may need to perform calculations like this involving multiple tables through foreign key relationships. I much prefer doing them within my “regular code”.

In your specific case, I’d be inclined to consider a different approach. I would move the get of the Variable row into the View so that the view becomes responsible for the calculations - possibly through a model method, or probably through a manager method. (Not sure exactly, I’m just thinking aloud at the moment.)

Again, most of this is likely going to be a judgement call based upon all the different situations in your application you need to perform these calculations. For just one list view, it’s not likely going to make a difference what you choose to do. However, if this table and its contents are the core of this entire application and you’re going to be calculating this thousands of times per day, I might come up with different answers.
</opinion>

I considered using a custom manager or refactoring the function to calculate the different prices. Surely it should solve some of the issues I have.

Thanks for your help Ken :+1:

1 Like

Hi there, I’m quite new to Djando… Just learning! I would like to know where this example would go? Within the models.py or could be within the views.py? Thank you!

Hi!

You can certailnly use it as model methods or properties like in this comment:

You can also insert it in a view: (I commented it for some reason I cant’ recall)

Where’s best to put it may depend on other factors.

I come from the future ?) and Generated Fields are a thing on Django 5.0!
Take a look at this post from Mariusz to see a great example on how to use it. New goodies in Django 5.0 · Django Beats