Database query optimization when a base queryset is filtered multiple times.

Hi everyone!
I have a base queryset, and the goal is to build dashboard statistics from it by filtering the same base queryset with different values for each metric.

My simplified code for better understanding:

class Transaction(models.Model):
    
    user = models.ForeignKey(CustomUser, related_name="transactions", on_delete=models.CASCADE)
    title = models.CharField(max_length=32, verbose_name="Title")
    category = models.ForeignKey(Category, related_name="transactions", on_delete=models.CASCADE, null=True, blank=True)
    operation = models.CharField(max_length=8, choices=OPERATION_TYPE, verbose_name="operation")
    value = models.DecimalField(max_digits=14, decimal_places=2, verbose_name="value")
    date_created = models.DateTimeField(auto_now_add=True, blank=True, null=True) 

class Category(MPTTModel):

    name = models.CharField(max_length=54, unique=True)
    parent = TreeForeignKey("self", on_delete=models.CASCADE, null=True, blank=True, related_name='children')

With the models above, I've created a class-based view (ListView):

class DashboardView(ListView):

    model = Transaction
    template_name = "invoices/dashboard.html"
    ordering = "-date_created"


    def get_queryset(self) -> QuerySet[Any]:
        queryset = super().get_queryset()
        queryset = queryset.filter(user=self.request.user).select_related("category__parent")
        return queryset

    def get_context_data(self, **kwargs: Any) -> dict[str, Any]:
        data = super().get_context_data(**kwargs)

        # retrieving all transactions
        transactions_all = self.get_queryset()

        # retrieving incomes/expenses summary for current month
        incomes_this_month = transactions_all.filter(transaction_filter.transaction_date_filter(month="current"), operation="incomes")
        incomes_this_month_sum = incomes_this_month.aggregate(Sum("value")).get("value__sum")
        
        expenses_this_month = transactions_all.filter(transaction_filter.transaction_date_filter(month="current"), operation="expenses")                          
        expenses_this_month_sum = expenses_this_month.aggregate(Sum("value")).get("value__sum")
        
        # retrieving incomes/expenses summary for previous month
        transactions_prev_month = transactions_all.filter(transaction_filter.transaction_date_filter(month="previous"))
   
        incomes_previous_month = transactions_prev_month.filter(operation="incomes")
        incomes_previous_month_sum = incomes_previous_month.aggregate(Sum("value")).get("value__sum")

        expenses_previous_month = transactions_prev_month.filter(operation="expenses")
        expenses_previous_month_sum = expenses_previous_month.aggregate(Sum("value")).get("value__sum")

        # pass the computed sums to the template context
        data.update({
            "incomes_this_month_sum": incomes_this_month_sum,
            "expenses_this_month_sum": expenses_this_month_sum,
            "incomes_previous_month_sum": incomes_previous_month_sum,
            "expenses_previous_month_sum": expenses_previous_month_sum,
        })
        return data

You can see transaction_date_filter inside the filter() calls. It's just a helper that returns Q() objects for date filtering.
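For context, here is a simplified sketch of what such a Q-based helper can look like (not the exact implementation; the month/year arithmetic below is only illustrative):

from datetime import timedelta

from django.db.models import Q
from django.utils import timezone


def transaction_date_filter(month="current"):
    """Return a Q object limiting transactions to the current or the previous month."""
    today = timezone.localdate()
    if month == "current":
        return Q(date_created__year=today.year, date_created__month=today.month)
    # "previous": step back from the first day of this month to land in the previous month
    prev = today.replace(day=1) - timedelta(days=1)
    return Q(date_created__year=prev.year, date_created__month=prev.month)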

Using select_related I managed to remove duplicate queries, but there are still "similar" queries, because the Django ORM issues a separate query for each of incomes_this_month, expenses_this_month, incomes_previous_month and expenses_previous_month when filtering by the date and operation fields. I'm aware that filter() returns a new queryset and discards the cached results, so I'm trying to find a more efficient way to get this data without hitting the DB with extra queries. I've attached a DjDT query counter screenshot; hope it makes the question clearer.

Appreciate any suggestion on this topic.

I think you are a bit confused about what a QuerySet is.

A QuerySet is a list of instructions to be sent to a database (server). It is NOT a list of things from the db. Furthermore, if you look at the QuerySet documentation, you will notice that some methods return a QuerySet and others don't. The ones that don't are the ones that actually execute the query.

Therefore (please see the comments):

# retrieving all transactions
transactions_all = self.get_queryset()  # you have a queryset

# retrieving incomes/expenses summary for current month
incomes_this_month = transactions_all.filter(...)   # you have another queryset
incomes_this_month_sum = incomes_this_month.aggregate(Sum("value")).get("value__sum")  # you execute the queryset

expenses_this_month = transactions_all.filter(...)   # you have yet another queryset                          
expenses_this_month_sum = expenses_this_month.aggregate(Sum("value")).get("value__sum")  # you execute this other queryset

You have the following options:

  1. Leave it as is. Doing more than 1 query is not necessarily bad.

  2. Retrieve all transactions (for this user) once and use regular Python to do your sums (see the sketch after this list):

transactions_all = list(self.get_queryset())
# transactions_all is now a LIST of Transaction objects (i.e., a plain Python list of Python objects)
  3. Read about annotate in the QuerySet docs and annotate your queryset with all the sums at once. You might want to also read about Case.
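For option 2, a rough sketch of doing the sums in plain Python inside get_context_data, once everything has been fetched in a single query (the month_sum helper is purely illustrative):

from decimal import Decimal

from django.utils import timezone

transactions_all = list(self.get_queryset())  # one query; everything else happens in memory
today = timezone.localdate()


def month_sum(transactions, operation, year, month):
    """Sum Transaction.value in plain Python for one operation type and one month."""
    return sum(
        (
            t.value
            for t in transactions
            if t.operation == operation
            and t.date_created.year == year
            and t.date_created.month == month
        ),
        Decimal("0"),
    )


incomes_this_month_sum = month_sum(transactions_all, "incomes", today.year, today.month)
expenses_this_month_sum = month_sum(transactions_all, "expenses", today.year, today.month)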

Thank you for pointing me in the right direction. I guess annotate() will do the trick.

  2. Retrieve all transactions (for this user) once and use regular Python to do your sums:

Is it a good idea to convert the QuerySet to a list, though? That could cause a performance hit if the quantity of Transaction objects grows very large.

All 3 options could be a bad idea in different contexts. Remember that no matter what, you will be using large amounts of memory if you have lots of data; it is just a matter of figuring out where it is cheaper to use that much memory.

You should be reading about annotate and Case. You could then do something like:

current_transactions_sums = transactions_all.filter(...get_transactions_for_current...).annotate(
    incomes_sum=Sum(Case(
        When(operation="incomes", then=F("value")),
        default=0,
        output_field=FloatField()
    )),
    expenses_sum=Sum(Case(
        When(operation="expenses", then=F("value")),
        default=0,
        output_field=FloatField()
    )),
).values('incomes_sum', 'expenses_sum')
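If you only need the two totals rather than a value repeated on every row, the same Case expressions should also work inside aggregate(), which collapses everything into a single query returning a plain dict; a rough sketch using the field names from the Transaction model above (DecimalField as the output type, since value is a DecimalField):

from django.db.models import Case, DecimalField, F, Sum, When

# One query returning both month totals as a plain dict,
# e.g. {"incomes_sum": Decimal("1200.00"), "expenses_sum": Decimal("800.00")}
current_totals = transactions_all.filter(
    transaction_filter.transaction_date_filter(month="current")
).aggregate(
    incomes_sum=Sum(Case(
        When(operation="incomes", then=F("value")),
        default=0,
        output_field=DecimalField(),
    )),
    expenses_sum=Sum(Case(
        When(operation="expenses", then=F("value")),
        default=0,
        output_field=DecimalField(),
    )),
)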

You’re going to encounter that either way if the number of transactions gets high enough.

You can mitigate the amount of memory needed for your application by using an iterator to process the data in chunks rather than all at once.
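For example (a sketch; the chunk_size value is just illustrative):

from decimal import Decimal

incomes_sum = Decimal("0")
expenses_sum = Decimal("0")

# iterator() streams rows from the DB in batches of chunk_size instead of
# caching the whole result set, so memory use stays roughly constant
for t in transactions_all.iterator(chunk_size=2000):
    if t.operation == "incomes":
        incomes_sum += t.value
    elif t.operation == "expenses":
        expenses_sum += t.value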

But you would need to determine whether it’s better for you to process all the rows one time in your application, or if it’s better to process subsets of the data “N” times in the database itself.
Note, if this is a frequent operation, you may wish to consider creating indexes on month and operation.
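For example, on the model side (a sketch; since the model stores date_created rather than a month column, the index goes on that field together with operation):

class Transaction(models.Model):
    ...  # fields as in the original model

    class Meta:
        indexes = [
            # covers the dashboard filters on month (via date_created) and operation
            models.Index(fields=["date_created", "operation"]),
        ]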

Or, if these transactions are “date locked”, you might want to consider creating a reference table for historical data so that only the current month is being calculated on the fly.
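For instance, a small summary model along these lines (all names here are illustrative) could hold the totals for locked months, leaving only the current month to be aggregated on the fly:

class MonthlyTransactionSummary(models.Model):
    """Illustrative pre-computed totals for months that can no longer change."""

    user = models.ForeignKey(CustomUser, related_name="monthly_summaries", on_delete=models.CASCADE)
    year = models.PositiveSmallIntegerField()
    month = models.PositiveSmallIntegerField()
    incomes_sum = models.DecimalField(max_digits=14, decimal_places=2)
    expenses_sum = models.DecimalField(max_digits=14, decimal_places=2)

    class Meta:
        constraints = [
            # one summary row per user per month
            models.UniqueConstraint(fields=["user", "year", "month"], name="unique_user_month_summary"),
        ]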
