Best practices for dealing with slow performance

I am wondering what the best practices are for optimizing view performance.

I have this view that takes over 90 seconds to load (with 5 queries, 2 of which take the bulk of the time). It has some complex queries, with a few Subqueries and a bunch of calculations.

Should I try to split all these calculations into different views, use model @property, use async for the calculations, or make some intermediary calculations and save them as fields on the models? (I am fairly new to programming and Django.)

This view is supposed to calculate the (salary) base amount and the bonus for each Dispatcher. A dispatcher can have BonusCustomers, for which they receive an extra bonus.

The base amount (per dispatcher) is the average weekly miles from the last 2 weeks * the set rate per location.

The bonus (per dispatcher) is the set bonus per customer (a field of Dispatcher) * the sum of relevant jobs (loads) from the dispatcher's BonusCustomers. If the company's weekly rate per mile for the last 2 weeks (billable amount / total_miles) is higher than the customer's rate per mile for the last week, then the relevant jobs are all loads from the past week for all BonusCustomers of this specific Dispatcher.
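To make the selection rule concrete, here is a minimal pure-Python sketch of the logic described above (the function name and signature are hypothetical, just an illustration of the rule, not code from the project):

```python
from decimal import Decimal

def jobs_for_bonus(company_rpm: Decimal, customer_rpm: Decimal,
                   all_jobs: int, jobs_excl_dispatcher: int) -> int:
    """Pick which job count feeds the bonus, per the rule above.

    If the company's 2-week rate per mile is higher than the customer's
    rate per mile, all of the past week's loads count; otherwise only
    the loads not booked by the dispatcher themselves count.
    """
    if company_rpm > customer_rpm:
        return all_jobs
    return jobs_excl_dispatcher
```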

def dispatchers_list(request):
    tenant = get_tenant(request)

    if request.method == 'GET':
        if request.GET.get('date_range') is not None:
            date_range = request.GET.get('date_range')
            date_start = datetime.datetime.strptime(date_range.split('-')[0].strip(), '%m/%d/%Y')
            date_end = datetime.datetime.strptime(date_range.split('-')[1].strip(), '%m/%d/%Y')
            custom_week_nr = get_week_number(date_end.date())
            year = int(date_end.strftime("%Y"))
            date_start_string = date_start.strftime('%m/%d/%Y')
            date_end_string = date_end.strftime('%m/%d/%Y')

        else:
            date_start = start_week_nr(CURRENT_YEAR, CURRENT_WEEK_CUSTOM-1)
            date_start = datetime.datetime.combine(date_start, datetime.datetime.min.time())
            date_end = date_start + datetime.timedelta(days=7)
            custom_week_nr = get_week_number(date_end.date())
            year = int(date_end.strftime("%Y"))
            date_start_string = date_start.strftime('%m/%d/%Y')
            date_end_string = date_end.strftime('%m/%d/%Y')
            date_range = f"{date_start_string} - {date_end_string}"
    
        date_start_miles = datetime.datetime.combine(date_start - datetime.timedelta(days=7), datetime.datetime.min.time())
        date_end_miles = date_start_miles + datetime.timedelta(days=14)

        # Calculate company rate per mile for last 2 weeks
        company_loads_last_2_weeks = Load.objects.filter(
            drop_date__gte=date_start_miles,
            drop_date__lt=date_end_miles
        ).exclude(load_status='Cancelled').aggregate(
            billable_amount=Sum('billable_amount'),
            total_miles=Sum('total_miles')
        )
        if company_loads_last_2_weeks['billable_amount'] is not None and company_loads_last_2_weeks['total_miles'] is not None:
            company_rate_per_mile = company_loads_last_2_weeks['billable_amount'] / company_loads_last_2_weeks['total_miles']
        else:
            company_rate_per_mile = 0
        loads_subquery = Load.objects.filter(
            customer=OuterRef('customer'),  # Reference the customer field of the current BonusCustomer
            drop_date__gte=date_start_miles,
            drop_date__lt=date_end_miles 
        ).exclude(load_status='Cancelled'
        ).values('customer'
        ).annotate(
            total_revenue=Sum('billable_amount', default=0),
            total_miles=Sum('total_miles', default=0)
        )

        loads_subquery_all_jobs = Load.objects.filter(
            customer=OuterRef('customer'),
            drop_date__gte=date_start,
            drop_date__lt=date_end 
        ).exclude(load_status='Cancelled'
        ).values('customer'
        ).annotate(
            all_jobs=Count('id')
        )


        loads_subquery_all_jobs_pick_city = Load.objects.filter(
            customer=OuterRef('customer'),
            pick_city=OuterRef('pick_city'),
            drop_date__gte=date_start,
            drop_date__lt=date_end 
        ).exclude(load_status='Cancelled'
        ).values('customer'
        ).annotate(
            all_jobs=Count('id')
        )


        loads_subquery_jobs_excl_dispatcher = Load.objects.filter(
            customer=OuterRef('customer'),
            drop_date__gte=date_start,
            drop_date__lt=date_end
        ).exclude(load_status='Cancelled'
        ).exclude(sales_agent=OuterRef('dispatcher__name')
        ).values('customer'
        ).annotate(
            jobs_excl_dispatcher=Count('id')
        )


        loads_subquery_jobs_excl_dispatcher_pick_city = Load.objects.filter(
            customer=OuterRef('customer'),
            pick_city=OuterRef('pick_city'),
            drop_date__gte=date_start,
            drop_date__lt=date_end
        ).exclude(load_status='Cancelled'
        ).exclude(sales_agent=OuterRef('dispatcher__name')
        ).values('customer'
        ).annotate(
            jobs_excl_dispatcher=Count('id')
        )

        # Annotate the BonusCustomer queryset with the calculated sum
        bonus_customers = BonusCustomer.objects.select_related('dispatcher').annotate(
            customer_revenue=Subquery(loads_subquery.values('total_revenue'), output_field=DecimalField(max_digits=9, decimal_places=2)),
            customer_miles=Subquery(loads_subquery.values('total_miles'), output_field=DecimalField(max_digits=9, decimal_places=2)),
            customer_rpm=F('customer_revenue') / F('customer_miles'),
            pick_city_null=ExpressionWrapper(Q(pick_city=''), output_field=BooleanField()),
            all_jobs=Coalesce(
                Case(
                    When(pick_city_null=True, then=Subquery(loads_subquery_all_jobs.values('all_jobs'), output_field=DecimalField(max_digits=9, decimal_places=2))),
                    When(pick_city_null=False, then=Subquery(loads_subquery_all_jobs_pick_city.values('all_jobs'), output_field=DecimalField(max_digits=9, decimal_places=2)))
                ), 0, output_field=DecimalField(max_digits=3, decimal_places=0)
            ),
            jobs_excl_dispatcher=Coalesce(
                Case(
                    When(pick_city_null=True, then=Subquery(loads_subquery_jobs_excl_dispatcher.values('jobs_excl_dispatcher'), output_field=DecimalField(max_digits=9, decimal_places=2))),
                    When(pick_city_null=False, then=Subquery(loads_subquery_jobs_excl_dispatcher_pick_city.values('jobs_excl_dispatcher'), output_field=DecimalField(max_digits=9, decimal_places=2)))
                ), 0, output_field=DecimalField(max_digits=3, decimal_places=0)
            ),
            jobs_for_bonus=Coalesce(
                Case(
                    When(customer_rpm__gte=Decimal(company_rate_per_mile), then='jobs_excl_dispatcher'),
                    When(customer_rpm__lt=Decimal(company_rate_per_mile), then='all_jobs')
                ), 0, output_field=DecimalField(max_digits=3, decimal_places=0)
            ),
        ).order_by('customer')

        dispatchers = Dispatcher.objects.select_related('company', 'location').annotate(
            num_of_loads=Count('load', filter=Q(load__drop_date__gte=date_start, load__drop_date__lt=date_end_miles) & ~Q(load__load_status='Cancelled')),
            loads_miles=Sum(F('load__total_miles') / 2, filter=Q(load__drop_date__gte=date_start_miles, load__drop_date__lt=date_end_miles) & ~Q(load__load_status='Cancelled'), default=0),
            base_rate=Subquery(BaseRate.objects.filter(location=OuterRef('location')).values('rate')),
            base_amount_model_null=ExpressionWrapper(Q(base_amount__isnull=True), output_field=BooleanField()),
            base_amt=Case(
                When(base_amount_model_null=False, then='base_amount'),
                When(base_amount_model_null=True, then=F('loads_miles') * F('base_rate'))
            ),
            bonus_jobs=Coalesce(
                Subquery(bonus_customers.filter(dispatcher=OuterRef('pk')).order_by().values('dispatcher').annotate(jobs_sum=Sum('jobs_for_bonus')).values('jobs_sum')), 0, output_field=DecimalField(max_digits=3, decimal_places=0)
            ),
            bonus_amount_new=Coalesce(
                F('bonus_jobs') * F('bonus_per_customer'), 0, output_field=DecimalField(max_digits=3, decimal_places=0))
        )

        context = {
            'tenant': tenant,
            'dispatchers': dispatchers,
            'bonus_customers': bonus_customers,
            'date_range': date_range,
            'date_start': date_start_string,
            'date_end': date_end_string
        }
        return render(request, template_name='dispatchers/dispatcher-list.html', context=context)

The models:

class Dispatcher(models.Model):
    name = models.CharField(max_length=50, blank=True, null=True, editable=True, unique=True)
    location = models.ForeignKey(Location, on_delete=models.CASCADE, blank=True, null=True)
    base_amount = models.DecimalField(max_digits=6, decimal_places=2, blank=True, null=True)
    bonus_amount = models.DecimalField(max_digits=6, decimal_places=2, blank=True, null=True)
    bonus_per_customer = models.DecimalField(max_digits=8, decimal_places=2, blank=True, null=True, default=25.00)


class BonusCustomer(models.Model):
    customer = models.CharField(max_length=150, blank=True, null=True)
    dispatcher = models.ForeignKey(Dispatcher, on_delete=models.DO_NOTHING, blank=True, null=True)
    pick_city = models.CharField(max_length=150, blank=True, null=True)


class Load(models.Model):
    sales_agent = models.ForeignKey(Dispatcher, on_delete=models.DO_NOTHING, to_field='name', blank=True, null=True)
    customer = models.CharField(max_length=320, null=True, blank=True)
    pick_date = models.DateTimeField(null=True, blank=True)
    drop_date = models.DateTimeField(null=True, blank=True)
    pick_city = models.CharField(max_length=120, null=True, blank=True)
    load_status = models.CharField(max_length=120, null=True, blank=True)
    billable_amount = models.DecimalField(max_digits=9, decimal_places=2, blank=True, null=True)
    empty_miles = models.FloatField(null=True, blank=True)
    total_miles = models.DecimalField(max_digits=9, decimal_places=2, null=True, blank=True, editable=False)


class BaseRate(models.Model):
    location = models.ForeignKey(Location, on_delete=models.DO_NOTHING)
    rate = models.DecimalField(max_digits=6, decimal_places=3, verbose_name='Base rate')

class Location(models.Model):
    location = models.CharField(max_length=50, blank=False, null=False, unique=True)

You should run an explain on that query. It should help highlight which clauses are consuming the bulk of the time. It may also identify cases where additional indexes may help.

Superficially, I see the biggest potential problem with your models is that you're effectively doing a join on what are probably non-indexed fields. (In principle, that's what the OuterRef calls are doing.)

Anyway, going through your ideas:

Splitting the calculations across different views: that's fine if you want to present this data on different pages. Does that satisfy your requirements?

Using model @property: not going to matter at all. The @property decorator probably doesn't do what you think it does.

Using async for the calculations: not likely to help here.

Saving intermediary calculations as model fields: that's a possible improvement.

We have seen cases where we've needed to perform multiple queries and do the calculations on that data in Python rather than as part of a query - and have it work much faster.

But really, your first step should be to take a much closer look at the query itself. Like I said at the top, explain is the best tool for understanding this query. (Yes, understanding the output of an explain does take some effort, but it's the surest way of finding out exactly what the database is doing.)

Looking at connection.queries will show you the actual query being issued to the database.

For experimentation purposes, you can try simplifying the query by selectively removing subqueries to see if there's one in particular having the greatest effect on the overall performance. That doesn't fix anything, but may help point you in the direction of identifying the biggest issue.


Ken, thank you for the detailed reply.

Can you point me to how I can do that? Django Debug Toolbar has no "Expl" button next to this query.

What would be a possible solution in this case? Indexing?

Not really. It would be great if I could do these calculations in this one view.

The problem with saving intermediate results as model field values is that the calculations depend on a time range, which should be a user input.

This is the output of connection.queries; I am not sure what it means.

[{'sql': "SET search_path = 'public'", 'time': '0.183'}, {'sql': 'SELECT "companies_domain"."id", "companies_domain"."domain", "companies_domain"."tenant_id", "companies_domain"."is_primary", "companies_company"."id", "companies_company"."schema_name", "companies_company"."name", "companies_company"."paid_until", "companies_company"."on_trial", "companies_company"."created_on", "companies_company"."full_name", "companies_company"."alvys_company_code", "companies_company"."alvys_authorization", "companies_company"."alvys_x_alvys_key" FROM "companies_domain" INNER JOIN "companies_company" ON ("companies_domain"."tenant_id" = "companies_company"."id") WHERE "companies_domain"."domain" = \'hermes.fleetdata.pro\' LIMIT 21', 'time': '0.191'}, {'sql': "SET search_path = 'hermes','public'", 'time': '0.183'}, {'sql': "SET search_path = 'hermes','public'", 'time': '0.191'}, {'sql': "SET search_path = 'hermes','public'", 'time': '0.198'}]

I'll try removing subqueries, thanks.

So I ran QuerySet.explain() on the slower query and this is the output:

GroupAggregate  (cost=2067489.82..91166732.02 rows=6865 width=1096)
  Group Key: dispatchers_dispatcher.id, ((dispatchers_dispatcher.base_amount IS NULL)), (COALESCE((SubPlan 11), '0'::numeric)), (COALESCE((COALESCE((SubPlan 20), '0'::numeric) * dispatchers_dispatcher.bonus_per_customer), '0'::numeric)), dispatchers_location.id, dispatchers_company.id
  ->  Incremental Sort  (cost=2067489.82..45484759.77 rows=6865 width=1041)
        Sort Key: dispatchers_dispatcher.id, ((dispatchers_dispatcher.base_amount IS NULL)), (COALESCE((SubPlan 11), '0'::numeric)), (COALESCE((COALESCE((SubPlan 20), '0'::numeric) * dispatchers_dispatcher.bonus_per_customer), '0'::numeric)), dispatchers_location.id, dispatchers_company.id
        Presorted Key: dispatchers_dispatcher.id
        ->  Nested Loop Left Join  (cost=5.29..45484208.21 rows=6865 width=1041)
              ->  Nested Loop Left Join  (cost=0.29..108.41 rows=22 width=945)
                    Join Filter: (dispatchers_dispatcher.company_id = dispatchers_company.id)
                    ->  Nested Loop Left Join  (cost=0.29..64.16 rows=22 width=189)
                          ->  Index Scan using dispatchers_dispatcher_pkey on dispatchers_dispatcher  (cost=0.14..12.47 rows=22 width=63)
                          ->  Index Scan using dispatchers_location_pkey on dispatchers_location  (cost=0.15..2.35 rows=1 width=126)
                                Index Cond: (id = dispatchers_dispatcher.location_id)
                    ->  Materialize  (cost=0.00..11.50 rows=100 width=756)
                          ->  Seq Scan on dispatchers_company  (cost=0.00..11.00 rows=100 width=756)
              ->  Bitmap Heap Scan on loads_load  (cost=5.00..52.48 rows=327 width=44)
                    Recheck Cond: ((dispatchers_dispatcher.name)::text = (sales_agent_id)::text)
                    ->  Bitmap Index Scan on loads_load_sales_agent_id_f07dc912_like  (cost=0.00..4.92 rows=327 width=0)
                          Index Cond: ((sales_agent_id)::text = (dispatchers_dispatcher.name)::text)
              SubPlan 11
                ->  GroupAggregate  (cost=0.14..3312.66 rows=1 width=40)
                      Group Key: v0.dispatcher_id
                      ->  Nested Loop  (cost=0.14..9.45 rows=1 width=657)
                            ->  Index Scan using dispatchers_bonuscustomer_dispatcher_id_cd9f56b3 on dispatchers_bonuscustomer v0  (cost=0.14..8.16 rows=1 width=644)
                                  Index Cond: (dispatcher_id = dispatchers_dispatcher.id)
                            ->  Seq Scan on dispatchers_dispatcher v1  (cost=0.00..1.27 rows=1 width=21)
                                  Filter: (id = dispatchers_dispatcher.id)
                      SubPlan 3
                        ->  GroupAggregate  (cost=0.00..404.32 rows=1 width=55)
                              Group Key: u0_2.customer
                              ->  Seq Scan on loads_load u0_2  (cost=0.00..404.30 rows=1 width=28)
                                    Filter: ((drop_date >= '2023-08-19 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0.customer)::text))
                      SubPlan 4
                        ->  GroupAggregate  (cost=0.00..404.32 rows=1 width=55)
                              Group Key: u0_3.customer
                              ->  Seq Scan on loads_load u0_3  (cost=0.00..404.30 rows=1 width=29)
                                    Filter: ((drop_date >= '2023-08-19 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0.customer)::text))
                      SubPlan 5
                        ->  GroupAggregate  (cost=0.00..421.48 rows=1 width=31)
                              Group Key: u0_4.customer
                              ->  Seq Scan on loads_load u0_4  (cost=0.00..421.46 rows=1 width=31)
                                    Filter: ((drop_date >= '2023-08-26 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND (((sales_agent_id)::text <> (v1.name)::text) OR ((sales_agent_id)::text IS NULL)) AND ((customer)::text = (v0.customer)::text))
                      SubPlan 6
                        ->  GroupAggregate  (cost=0.00..438.64 rows=1 width=31)
                              Group Key: u0_5.customer
                              ->  Seq Scan on loads_load u0_5  (cost=0.00..438.62 rows=1 width=31)
                                    Filter: ((drop_date >= '2023-08-26 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND (((sales_agent_id)::text <> (v1.name)::text) OR ((sales_agent_id)::text IS NULL)) AND ((customer)::text = (v0.customer)::text) AND ((pick_city)::text = (v0.pick_city)::text))
                      SubPlan 7
                        ->  GroupAggregate  (cost=0.00..404.32 rows=1 width=55)
                              Group Key: u0_6.customer
                              ->  Seq Scan on loads_load u0_6  (cost=0.00..404.30 rows=1 width=28)
                                    Filter: ((drop_date >= '2023-08-19 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0.customer)::text))
                      SubPlan 8
                        ->  GroupAggregate  (cost=0.00..404.32 rows=1 width=55)
                              Group Key: u0_7.customer
                              ->  Seq Scan on loads_load u0_7  (cost=0.00..404.30 rows=1 width=29)
                                    Filter: ((drop_date >= '2023-08-19 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0.customer)::text))
                      SubPlan 9
                        ->  GroupAggregate  (cost=0.00..404.31 rows=1 width=31)
                              Group Key: u0_8.customer
                              ->  Seq Scan on loads_load u0_8  (cost=0.00..404.30 rows=1 width=31)
                                    Filter: ((drop_date >= '2023-08-26 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0.customer)::text))
                      SubPlan 10
                        ->  GroupAggregate  (cost=0.00..421.48 rows=1 width=31)
                              Group Key: u0_9.customer
                              ->  Seq Scan on loads_load u0_9  (cost=0.00..421.46 rows=1 width=31)
                                    Filter: ((drop_date >= '2023-08-26 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0.customer)::text) AND ((pick_city)::text = (v0.pick_city)::text))
              SubPlan 20
                ->  GroupAggregate  (cost=0.14..3312.66 rows=1 width=40)
                      Group Key: v0_1.dispatcher_id
                      ->  Nested Loop  (cost=0.14..9.45 rows=1 width=657)
                            ->  Index Scan using dispatchers_bonuscustomer_dispatcher_id_cd9f56b3 on dispatchers_bonuscustomer v0_1  (cost=0.14..8.16 rows=1 width=644)
                                  Index Cond: (dispatcher_id = dispatchers_dispatcher.id)
                            ->  Seq Scan on dispatchers_dispatcher v1_1  (cost=0.00..1.27 rows=1 width=21)
                                  Filter: (id = dispatchers_dispatcher.id)
                      SubPlan 12
                        ->  GroupAggregate  (cost=0.00..404.32 rows=1 width=55)
                              Group Key: u0_10.customer
                              ->  Seq Scan on loads_load u0_10  (cost=0.00..404.30 rows=1 width=28)
                                    Filter: ((drop_date >= '2023-08-19 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0_1.customer)::text))
                      SubPlan 13
                        ->  GroupAggregate  (cost=0.00..404.32 rows=1 width=55)
                              Group Key: u0_11.customer
                              ->  Seq Scan on loads_load u0_11  (cost=0.00..404.30 rows=1 width=29)
                                    Filter: ((drop_date >= '2023-08-19 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0_1.customer)::text))
                      SubPlan 14
                        ->  GroupAggregate  (cost=0.00..421.48 rows=1 width=31)
                              Group Key: u0_12.customer
                              ->  Seq Scan on loads_load u0_12  (cost=0.00..421.46 rows=1 width=31)
                                    Filter: ((drop_date >= '2023-08-26 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND (((sales_agent_id)::text <> (v1_1.name)::text) OR ((sales_agent_id)::text IS NULL)) AND ((customer)::text = (v0_1.customer)::text))
                      SubPlan 15
                        ->  GroupAggregate  (cost=0.00..438.64 rows=1 width=31)
                              Group Key: u0_13.customer
                              ->  Seq Scan on loads_load u0_13  (cost=0.00..438.62 rows=1 width=31)
                                    Filter: ((drop_date >= '2023-08-26 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND (((sales_agent_id)::text <> (v1_1.name)::text) OR ((sales_agent_id)::text IS NULL)) AND ((customer)::text = (v0_1.customer)::text) AND ((pick_city)::text = (v0_1.pick_city)::text))
                      SubPlan 16
                        ->  GroupAggregate  (cost=0.00..404.32 rows=1 width=55)
                              Group Key: u0_14.customer
                              ->  Seq Scan on loads_load u0_14  (cost=0.00..404.30 rows=1 width=28)
                                    Filter: ((drop_date >= '2023-08-19 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-02 00:00:00'::timestamp without time zone) AND (((load_status)::text <> 'Cancelled'::text) OR ((load_status)::text IS NULL)) AND ((customer)::text = (v0_1.customer)::text))
                      SubPlan 17

If that's what explain indicates as an issue, yes, that might help. (More to follow on this below.)

That's not what I'm suggesting. Instead of trying to run one query that does everything, you run multiple queries - each using whatever parameters might be necessary - and then do the aggregation of the data in Python rather than as part of the larger query. (e.g., how Django handles the prefetch_related clause.)
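As a rough illustration of that approach (names and data shapes here are hypothetical, assuming the per-customer numbers have already been pulled with a couple of simple `.values()` queries), the per-dispatcher bonus-job totals could then be computed in plain Python:

```python
from collections import defaultdict
from decimal import Decimal

def bonus_jobs_per_dispatcher(bonus_customers, customer_stats, company_rpm):
    """Aggregate bonus jobs per dispatcher in Python.

    bonus_customers: rows like {'dispatcher_id': ..., 'customer': ...}
    customer_stats:  {customer: {'rpm': Decimal, 'all_jobs': int,
                                 'jobs_excl_dispatcher': int}}
    Returns {dispatcher_id: total jobs counting toward the bonus}.
    """
    totals = defaultdict(int)
    for bc in bonus_customers:
        stats = customer_stats.get(bc['customer'])
        if stats is None:
            continue  # customer had no loads in the window
        if stats['rpm'] < company_rpm:
            totals[bc['dispatcher_id']] += stats['all_jobs']
        else:
            totals[bc['dispatcher_id']] += stats['jobs_excl_dispatcher']
    return dict(totals)
```

The point is that each of the feeder queries stays trivial for the database to plan, and the per-row branching happens once in memory instead of as correlated subqueries per dispatcher.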

The second element of the list shows a query that has been executed. You would need to run your query, then find that query from that list.

Great!

Notice how early on you can see lines like this:

    Index Scan using dispatchers_dispatcher_pkey on dispatchers_dispatcher

This is good - it shows that this clause is using an index.

By comparison, you have a lot of lines like this:

    Seq Scan on loads_load u0_2

This (Seq Scan) is bad.

It's saying it's doing a table scan on the Load model, and the field it's referencing is load_status.

You could try indexing that field to see what it does for you.

However, I'm going to go out on a limb and guess that load_status really isn't a "free form" list of any arbitrary text up to 120 characters long. Is load_status truly anything? Or is it one of a well-defined list of options?

If the latter, then I'd consider refactoring your models to make that field a foreign key to a reference table containing the set of valid load_status values.

Also, I've noticed an inconsistency in your use of the field named customer. In BonusCustomer it's 150 characters and in Load it's 320 characters. If they are actually both references to the same type of entity (a reasonable assumption based upon the query you're trying to run), that field is another candidate for being refactored into a reference table.

This may well be a situation where properly normalizing your data models will yield significant performance benefits.
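A sketch of what that refactor could look like (model names here are illustrative, and a data migration would still be needed to backfill the reference tables from the existing strings):

```python
# Schema sketch only - assumes Django; names are hypothetical.
from django.db import models

class LoadStatus(models.Model):
    name = models.CharField(max_length=120, unique=True)

class Customer(models.Model):
    name = models.CharField(max_length=320, unique=True)

class Load(models.Model):
    # The correlated subqueries now join on integer ids instead of
    # comparing long text columns, and both FK columns get indexes
    # automatically because they are foreign keys.
    load_status = models.ForeignKey(LoadStatus, on_delete=models.PROTECT,
                                    null=True, blank=True)
    customer = models.ForeignKey(Customer, on_delete=models.PROTECT,
                                 null=True, blank=True)
    drop_date = models.DateTimeField(null=True, blank=True, db_index=True)
    # ...remaining fields as before...
```

BonusCustomer.customer would become a ForeignKey to the same Customer table, so `customer=OuterRef('customer')` compares ids rather than strings.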

Ken, thank you so much for these suggestions. I'll take some time to refactor the load_status and see if I can do that for the customer, as these are strings coming from the import and are not consistent.

Ken, I refactored both the load_status and the customer to their own tables. Also, I indexed a few columns, including both of these. Now I'm down to 35 seconds. Is there anything else I can try?

Here's the SQL explain:

GroupAggregate  (cost=267281.02..17853357.33 rows=7479 width=1096)
  Group Key: dispatchers_dispatcher.id, ((dispatchers_dispatcher.base_amount IS NULL)), (((SubPlan 20) * dispatchers_dispatcher.bonus_per_customer)), dispatchers_location.id, dispatchers_company.id
  ->  Incremental Sort  (cost=267281.02..5880181.53 rows=7479 width=1009)
        Sort Key: dispatchers_dispatcher.id, ((dispatchers_dispatcher.base_amount IS NULL)), (((SubPlan 20) * dispatchers_dispatcher.bonus_per_customer)), dispatchers_location.id, dispatchers_company.id
        Presorted Key: dispatchers_dispatcher.id
        ->  Nested Loop Left Join  (cost=4.97..5879573.75 rows=7479 width=1009)
              ->  Nested Loop Left Join  (cost=0.29..108.41 rows=22 width=945)
                    Join Filter: (dispatchers_dispatcher.company_id = dispatchers_company.id)
                    ->  Nested Loop Left Join  (cost=0.29..64.16 rows=22 width=189)
                          ->  Index Scan using dispatchers_dispatcher_pkey on dispatchers_dispatcher  (cost=0.14..12.47 rows=22 width=63)
                          ->  Index Scan using dispatchers_location_pkey on dispatchers_location  (cost=0.15..2.35 rows=1 width=126)
                                Index Cond: (id = dispatchers_dispatcher.location_id)
                    ->  Materialize  (cost=0.00..11.50 rows=100 width=756)
                          ->  Seq Scan on dispatchers_company  (cost=0.00..11.00 rows=100 width=756)
              ->  Bitmap Heap Scan on loads_load  (cost=4.68..56.07 rows=356 width=44)
                    Recheck Cond: ((dispatchers_dispatcher.name)::text = (sales_agent_id)::text)
                    ->  Bitmap Index Scan on loads_load_sales_a_63980f_idx  (cost=0.00..4.59 rows=356 width=0)
                          Index Cond: ((sales_agent_id)::text = (dispatchers_dispatcher.name)::text)
              SubPlan 20
                ->  GroupAggregate  (cost=0.00..785.95 rows=3 width=40)
                      Group Key: v0_1.dispatcher_id
                      ->  Nested Loop  (cost=0.00..2.57 rows=3 width=34)
                            ->  Seq Scan on dispatchers_dispatcher v2_1  (cost=0.00..1.27 rows=1 width=21)
                                  Filter: (id = dispatchers_dispatcher.id)
                            ->  Seq Scan on dispatchers_bonuscustomer v0_1  (cost=0.00..1.26 rows=3 width=21)
                                  Filter: (dispatcher_id = dispatchers_dispatcher.id)
                      SubPlan 12
                        ->  GroupAggregate  (cost=4.34..32.63 rows=1 width=40)
                              Group Key: u0_10.customer_id
                              ->  Bitmap Heap Scan on loads_load u0_10  (cost=4.34..32.61 rows=1 width=13)
                                    Recheck Cond: (customer_id = v0_1.customer_id)
                                    Filter: ((((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (drop_date >= '2023-09-09 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-23 00:00:00'::timestamp without time zone) AND (((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)))
                                    ->  Bitmap Index Scan on loads_load_custome_8c7b78_idx  (cost=0.00..4.34 rows=8 width=0)
                                          Index Cond: (customer_id = v0_1.customer_id)
                      SubPlan 13
                        ->  GroupAggregate  (cost=4.34..32.63 rows=1 width=40)
                              Group Key: u0_11.customer_id
                              ->  Bitmap Heap Scan on loads_load u0_11  (cost=4.34..32.61 rows=1 width=14)
                                    Recheck Cond: (customer_id = v0_1.customer_id)
                                    Filter: ((((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (drop_date >= '2023-09-09 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-23 00:00:00'::timestamp without time zone) AND (((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)))
                                    ->  Bitmap Index Scan on loads_load_custome_8c7b78_idx  (cost=0.00..4.34 rows=8 width=0)
                                          Index Cond: (customer_id = v0_1.customer_id)
                      SubPlan 14
                        ->  GroupAggregate  (cost=4.34..32.65 rows=1 width=16)
                              Group Key: u0_12.customer_id
                              ->  Bitmap Heap Scan on loads_load u0_12  (cost=4.34..32.63 rows=1 width=16)
                                    Recheck Cond: (customer_id = v0_1.customer_id)
                                    Filter: ((((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (drop_date >= '2023-09-16 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-23 00:00:00'::timestamp without time zone) AND (((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (((sales_agent_id)::text <> (v2_1.name)::text) OR ((sales_agent_id)::text IS NULL)))
                                    ->  Bitmap Index Scan on loads_load_custome_8c7b78_idx  (cost=0.00..4.34 rows=8 width=0)
                                          Index Cond: (customer_id = v0_1.customer_id)
                      SubPlan 15
                        ->  GroupAggregate  (cost=4.34..32.67 rows=1 width=16)
                              Group Key: u0_13.customer_id
                              ->  Bitmap Heap Scan on loads_load u0_13  (cost=4.34..32.65 rows=1 width=16)
                                    Recheck Cond: (customer_id = v0_1.customer_id)
                                    Filter: ((((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (drop_date >= '2023-09-16 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-23 00:00:00'::timestamp without time zone) AND (((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (((sales_agent_id)::text <> (v2_1.name)::text) OR ((sales_agent_id)::text IS NULL)) AND ((pick_city)::text = (v0_1.pick_city)::text))
                                    ->  Bitmap Index Scan on loads_load_custome_8c7b78_idx  (cost=0.00..4.34 rows=8 width=0)
                                          Index Cond: (customer_id = v0_1.customer_id)
                      SubPlan 16
                        ->  GroupAggregate  (cost=4.34..32.63 rows=1 width=40)
                              Group Key: u0_14.customer_id
                              ->  Bitmap Heap Scan on loads_load u0_14  (cost=4.34..32.61 rows=1 width=13)
                                    Recheck Cond: (customer_id = v0_1.customer_id)
                                    Filter: ((((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (drop_date >= '2023-09-09 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-23 00:00:00'::timestamp without time zone) AND (((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)))
                                    ->  Bitmap Index Scan on loads_load_custome_8c7b78_idx  (cost=0.00..4.34 rows=8 width=0)
                                          Index Cond: (customer_id = v0_1.customer_id)
                      SubPlan 17
                        ->  GroupAggregate  (cost=4.34..32.63 rows=1 width=40)
                              Group Key: u0_15.customer_id
                              ->  Bitmap Heap Scan on loads_load u0_15  (cost=4.34..32.61 rows=1 width=14)
                                    Recheck Cond: (customer_id = v0_1.customer_id)
                                    Filter: ((((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (drop_date >= '2023-09-09 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-23 00:00:00'::timestamp without time zone) AND (((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)))
                                    ->  Bitmap Index Scan on loads_load_custome_8c7b78_idx  (cost=0.00..4.34 rows=8 width=0)
                                          Index Cond: (customer_id = v0_1.customer_id)
                      SubPlan 18
                        ->  GroupAggregate  (cost=4.34..32.63 rows=1 width=16)
                              Group Key: u0_16.customer_id
                              ->  Bitmap Heap Scan on loads_load u0_16  (cost=4.34..32.61 rows=1 width=16)
                                    Recheck Cond: (customer_id = v0_1.customer_id)
                                    Filter: ((((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (drop_date >= '2023-09-16 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-23 00:00:00'::timestamp without time zone) AND (((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)))
                                    ->  Bitmap Index Scan on loads_load_custome_8c7b78_idx  (cost=0.00..4.34 rows=8 width=0)
                                          Index Cond: (customer_id = v0_1.customer_id)
                      SubPlan 19
                        ->  GroupAggregate  (cost=4.34..32.65 rows=1 width=16)
                              Group Key: u0_17.customer_id
                              ->  Bitmap Heap Scan on loads_load u0_17  (cost=4.34..32.63 rows=1 width=16)
                                    Recheck Cond: (customer_id = v0_1.customer_id)
                                    Filter: ((((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND (drop_date >= '2023-09-16 00:00:00'::timestamp without time zone) AND (drop_date < '2023-09-23 00:00:00'::timestamp without time zone) AND (((load_status_id)::text <> 'Cancelled'::text) OR ((load_status_id)::text IS NULL)) AND ((pick_city)::text = (v0_1.pick_city)::text))
                                    ->  Bitmap Index Scan on loads_load_custome_8c7b78_idx  (cost=0.00..4.34 rows=8 width=0)
                                          Index Cond: (customer_id = v0_1.customer_id)
  SubPlan 1
    ->  Sort  (cost=14.45..14.47 rows=7 width=22)
          Sort Key: u0.id
          ->  Bitmap Heap Scan on dispatchers_baserate u0  (cost=4.21..14.35 rows=7 width=22)
                Recheck Cond: (location_id = dispatchers_dispatcher.location_id)
                ->  Bitmap Index Scan on dispatchers_baserate_location_id_1fb093a2  (cost=0.00..4.21 rows=7 width=0)
                      Index Cond: (location_id = dispatchers_dispatcher.location_id)
  SubPlan 2
    ->  Sort  (cost=14.45..14.47 rows=7 width=22)
          Sort Key: u0_1.id

You haven't posted your new models (with the indexes defined) or the new query, so it's going to be tough for me to offer any specific suggestions. However, it's pretty clear that you've still got a couple of sequential scans being performed:


[quote="valeriuv, post:7, topic:23707"]
   Presorted Key: dispatchers_dispatcher.id
    ->  Nested Loop Left Join  (cost=4.97..5879573.75 rows=7479 width=1009)
          ->  Nested Loop Left Join  (cost=0.29..108.41 rows=22 width=945)
                Join Filter: (dispatchers_dispatcher.company_id = dispatchers_company.id)
                ->  Nested Loop Left Join  (cost=0.29..64.16 rows=22 width=189)
                      ->  Index Scan using dispatchers_dispatcher_pkey on dispatchers_dispatcher  (cost=0.14..12.47 rows=22 width=63)
                      ->  Index Scan using dispatchers_location_pkey on dispatchers_location  (cost=0.15..2.35 rows=1 width=126)
                            Index Cond: (id = dispatchers_dispatcher.location_id)
                ->  Materialize  (cost=0.00..11.50 rows=100 width=756)
                      ->  Seq Scan on dispatchers_company  (cost=0.00..11.00 rows=100 width=756)
[/quote]

and

[quote="valeriuv, post:7, topic:23707"]
            ->  GroupAggregate  (cost=0.00..785.95 rows=3 width=40)
                  Group Key: v0_1.dispatcher_id
                  ->  Nested Loop  (cost=0.00..2.57 rows=3 width=34)
                        ->  Seq Scan on dispatchers_dispatcher v2_1  (cost=0.00..1.27 rows=1 width=21)
                              Filter: (id = dispatchers_dispatcher.id)
                        ->  Seq Scan on dispatchers_bonuscustomer v0_1  (cost=0.00..1.26 rows=3 width=21)
                              Filter: (dispatcher_id = dispatchers_dispatcher.id)
[/quote]

which means there may be more indexes worth adding.
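As one hedged illustration (not a recommendation specific to your schema, which I haven't seen in full): the sub-plans quoted earlier all filter on `customer_id` together with a `drop_date` range, so a composite index on the load table covering both columns is one candidate worth benchmarking. Note, though, that for very small tables like the ones in the quoted seq scans, PostgreSQL will often prefer a sequential scan no matter what indexes exist.

```python
# Sketch only: a composite index matching the recurring
# (customer, drop_date-range) filter seen in the EXPLAIN sub-plans.
# Field names are assumed; benchmark with EXPLAIN ANALYZE before keeping it.
class Meta:
    indexes = [
        models.Index(fields=['customer', 'drop_date']),
    ]
```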

I also would not rule out the possibility that it may still be better to break this query down into smaller queries and perform the aggregation within your code.
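As a minimal sketch of that pattern (plain dicts standing in for grouped `.values()` rows; all names here are hypothetical): run two small grouped queries, then combine them in Python instead of forcing the database to do everything in one statement.

```python
# Each dict stands in for one row of a grouped .values() queryset,
# e.g. loads.values('customer').annotate(miles=Sum('total_miles')).
rows_a = [{"customer": 1, "miles": 120.0}, {"customer": 2, "miles": 80.0}]
rows_b = [{"customer": 1, "revenue": 300.0}, {"customer": 2, "revenue": 160.0}]

# Combine the two small result sets in memory.
miles = {r["customer"]: r["miles"] for r in rows_a}
revenue = {r["customer"]: r["revenue"] for r in rows_b}
rate_per_mile = {c: revenue[c] / miles[c] for c in miles if c in revenue}
print(rate_per_mile)  # {1: 2.5, 2: 2.0}
```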

Ken, here are the new models (added some indexes after your reply).

class Dispatcher(models.Model):
    ...
    name = models.CharField(max_length=50, blank=True, null=True, editable=True, unique=True)
    location = models.ForeignKey(Location, on_delete=models.CASCADE, blank=True, null=True)
    company = models.ForeignKey(Company, on_delete=models.CASCADE, blank=True, null=True)
    dispatcher_id = models.CharField(max_length=50, blank=True, null=True, help_text='Leave blank if no Dispatcher ID.', verbose_name='Dispatcher ID')
    base_amount = models.DecimalField(max_digits=6, decimal_places=2, blank=True, null=True)
    bonus_amount = models.DecimalField(max_digits=6, decimal_places=2, blank=True, null=True)
    bonus_per_customer = models.DecimalField(max_digits=8, decimal_places=2, blank=True, null=True, default=25.00)
    bonus_adjustment = models.DecimalField(max_digits=8, decimal_places=2, blank=True, null=True, default=0.00)

    class Meta:
        ordering = ['id']
        indexes = [
            models.Index(fields=['name']),
            models.Index(fields=['location']),
            models.Index(fields=['company'])
        ]

class BonusCustomer(models.Model):
    customer = models.ForeignKey('loads.Customer', on_delete=models.DO_NOTHING, blank=True, null=True)
    dispatcher = models.ForeignKey(Dispatcher, on_delete=models.DO_NOTHING, blank=True, null=True)
    pick_city = models.CharField(max_length=150, blank=True, null=True)

    class Meta:
        indexes = [
            models.Index(fields=["customer"]),
            models.Index(fields=["dispatcher"]),
            models.Index(fields=["pick_city"])
        ]


class LoadStatus(models.Model):
    status = models.CharField(max_length=25, null=True, blank=True, unique=True)
        
    class Meta:
        indexes = [
            models.Index(fields=['status'])
        ]
    

class Customer(models.Model):
    name = models.CharField(max_length=100, null=True, blank=True, unique=True)

    class Meta:
        ordering = ['name']
        indexes = [
            models.Index(fields=['name'])
        ]


class Load(models.Model):

    REEFER = 'Reefer'
    DRYVAN = 'DryVan'
    FLAT_BED = 'Flat Bed'
    POWER_ONLY = 'Power Only'
    NO_MATCH = 'No match'

    COMMODITY_CHOICES = (
        (REEFER, 'Reefer'),
        (DRYVAN, 'DryVan'),
        (FLAT_BED, 'Flat Bed'),
        (POWER_ONLY, 'Power Only'),
        (NO_MATCH, 'No match'),
    )

    load_id = models.CharField(max_length=20, unique=True)
    sales_agent = models.ForeignKey(Dispatcher, on_delete=models.DO_NOTHING, to_field='name', blank=True, null=True)
    owner_operator = models.CharField(max_length=120, null=True, blank=True)
    customer = models.ForeignKey(Customer, on_delete=models.DO_NOTHING, blank=True, null=True)
    equipment = models.CharField(max_length=120, null=True, blank=True)
    pick_date = models.DateTimeField(null=True, blank=True)
    drop_date = models.DateTimeField(null=True, blank=True)
    paid_loaded_miles = models.FloatField(null=True, blank=True)
    loaded_miles = models.FloatField(null=True, blank=True)
    pick_city = models.CharField(max_length=120, null=True, blank=True)
    pick_state = models.CharField(max_length=20, null=True, blank=True)
    drop_city = models.CharField(max_length=120, null=True, blank=True)
    drop_state = models.CharField(max_length=20, null=True, blank=True)
    load_status = models.ForeignKey(LoadStatus, on_delete=models.DO_NOTHING, to_field='status', blank=True, null=True)
    billable_amount = models.DecimalField(max_digits=9, decimal_places=2, blank=True, null=True)
    empty_miles = models.FloatField(null=True, blank=True)
    paid_empty_miles = models.FloatField(null=True, blank=True)
    customer_accessorial = models.DecimalField(max_digits=7, decimal_places=2, blank=True, null=True)
    dispatcher = models.CharField(max_length=120, null=True, blank=True)
    driver_1 = models.CharField(max_length=120, null=True, blank=True)
    truck = models.CharField(max_length=120, null=True, blank=True)
    date_created = models.DateTimeField(null=True, blank=True)
    date_updated = models.DateTimeField(null=True, blank=True)
    non_revenue_accessorial = models.DecimalField(max_digits=7, decimal_places=2, blank=True, null=True)
    truck_type = models.ForeignKey(Truck, on_delete=models.DO_NOTHING, to_field='type', blank=True, null=True)
    commodity = models.CharField(max_length=30, choices=COMMODITY_CHOICES, default=NO_MATCH)
    total_miles = models.DecimalField(max_digits=9, decimal_places=2, null=True, blank=True, editable=False)
    billable_amount_after_accessorial = models.DecimalField(max_digits=9, decimal_places=2, blank=True, null=True)

    class Meta:
        ordering = ['-load_id']
        indexes = [
            models.Index(fields=['sales_agent']),
            models.Index(fields=['customer']),
            models.Index(fields=['load_status']),
            models.Index(fields=['pick_date', 'drop_date'])
        ]


class Location(models.Model):
    location = models.CharField(max_length=50, blank=False, null=False, unique=True)
    
    class Meta:
        ordering = ['id']
        indexes = [
            models.Index(fields=['location'])
        ]

class BaseRate(models.Model):
    location = models.ForeignKey(Location, on_delete=models.DO_NOTHING)
    rate = models.DecimalField(max_digits=6, decimal_places=3, verbose_name='Base rate')

    class Meta:
        ordering = ['id']
        indexes = [
            models.Index(fields=['location'])
        ]

As for the query, the most time-intensive part was calculating the bonus_jobs and bonus_amount_new for each dispatcher:

bonus_jobs = Subquery(bonus_customers.filter(dispatcher=OuterRef('pk')).order_by().values('dispatcher').annotate(jobs_sum=Sum('jobs_for_bonus')).values('jobs_sum'))
bonus_amount_new = F('bonus_jobs')*F('bonus_per_customer')

bonus_jobs calculates the number of relevant loads for each BonusCustomer of this dispatcher and adds them up. This query would take about 15 seconds to perform. Interestingly, bonus_amount_new would take about the same, even though all it does is multiply bonus_jobs by the value of the model field bonus_per_customer (presumably because the ORM inlines the subquery expression again when resolving F('bonus_jobs'), so the database evaluates it a second time).

So I removed these 2 annotations from the dispatchers query and replaced them with the (dirty?) code below:

bonus_customer_bonus_jobs = bonus_customers.values('dispatcher').annotate(sum=Sum('jobs_for_bonus')).order_by()

for dispatcher in dispatchers:
    try:
        dispatcher.bonus_jobs = bonus_customer_bonus_jobs.get(dispatcher=dispatcher)['sum']
        dispatcher.bonus_amount_calculated = dispatcher.bonus_jobs * dispatcher.bonus_per_customer + dispatcher.bonus_adjustment
    except BonusCustomer.DoesNotExist:
        # No BonusCustomer rows for this dispatcher: the bonus is just the adjustment.
        dispatcher.bonus_jobs = 0
        dispatcher.bonus_amount_calculated = dispatcher.bonus_adjustment

So I'm getting the sum of jobs_for_bonus from the previous bonus_customers query and setting the value as an attribute on each dispatcher instance in the queryset.
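One caveat with this approach: calling `.get()` inside the loop issues one database query per dispatcher. Since the grouped queryset is already computed, it can be materialized once into a dict and looked up in memory. A sketch, with plain dicts standing in for the queryset rows:

```python
# rows simulate the grouped queryset: one dict per dispatcher
# with the summed jobs_for_bonus.
rows = [{"dispatcher": 1, "sum": 4}, {"dispatcher": 2, "sum": 7}]

# One pass to build the lookup table; no per-dispatcher query needed.
bonus_sums = {r["dispatcher"]: (r["sum"] or 0) for r in rows}

# Dispatcher 3 has no bonus customers and falls back to 0.
results = {pk: bonus_sums.get(pk, 0) for pk in [1, 2, 3]}
print(results)  # {1: 4, 2: 7, 3: 0}
```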

Here's the new query:

    # Loads QuerySet
    loads = Load.objects.select_related('sales_agent', 'customer', 'load_status', 'truck_type').all().exclude(load_status='Cancelled')

    # Calculate company rate per mile for the last 2 weeks
    loads_last_2_weeks = loads.filter(drop_date__gte=date_start_2_weeks, drop_date__lt=date_end_2_weeks)
    loads_last_2_weeks_aggregates = loads_last_2_weeks.aggregate(billable_amount=Sum('billable_amount'), total_miles=Sum('total_miles'))
    company_rate_per_mile = loads_last_2_weeks_aggregates['billable_amount'] / loads_last_2_weeks_aggregates['total_miles']

    # Loads last 1 week
    loads_last_1_week = loads.filter(drop_date__gte=date_start, drop_date__lt=date_end)

    # Subquery to get the customer rate per mile for last 2 weeks
    customer_loads_rate_per_mile = loads_last_2_weeks.filter(customer=OuterRef('customer')).values('customer').annotate(billable_amount=Sum('billable_amount'), total_miles=Sum('total_miles'))

    # Subquery to get all jobs and jobs ex dispatcher with and without pick_city

    customer_loads_all_jobs_no_pick_city = loads_last_1_week.filter(customer=OuterRef('customer')).exclude(pick_city__in=Subquery(BonusCustomer.objects.filter(customer=OuterRef(OuterRef('customer'))).values_list('pick_city', flat=True))
    ).values('customer').annotate(all_jobs=Count('id'))
    customer_loads_all_jobs_pick_city = loads_last_1_week.filter(customer=OuterRef('customer'), pick_city=OuterRef('pick_city')).values('customer').annotate(all_jobs=Count('id'))
    customer_loads_jobs_ex_disp_no_pick_city = loads_last_1_week.filter(customer=OuterRef('customer')).exclude(sales_agent=OuterRef('dispatcher__name')).exclude(pick_city__in=Subquery(BonusCustomer.objects.filter(customer=OuterRef(OuterRef('customer'))).values_list('pick_city', flat=True))
    ).values('customer').annotate(jobs_excl_dispatcher=Count('id'))
    customer_loads_jobs_ex_disp_pick_city = loads_last_1_week.filter(customer=OuterRef('customer'), pick_city=OuterRef('pick_city')).exclude(sales_agent=OuterRef('dispatcher__name')).values('customer').annotate(jobs_excl_dispatcher=Count('id'))

    bonus_customers = BonusCustomer.objects.select_related('customer', 'dispatcher').annotate(
        customer_revenue = Subquery(customer_loads_rate_per_mile.values('billable_amount'), output_field=DecimalField(max_digits=9, decimal_places=2)),
        customer_miles = Subquery(customer_loads_rate_per_mile.values('total_miles'), output_field=DecimalField(max_digits=9, decimal_places=2)),
        customer_rpm = F('customer_revenue') / F('customer_miles'),
        pick_city_null = ExpressionWrapper(Q(pick_city=''), output_field=BooleanField()),
        all_jobs = Case(
            When(pick_city_null=True, then=Subquery(customer_loads_all_jobs_no_pick_city.values('all_jobs'), output_field=DecimalField(max_digits=9, decimal_places=2))),
            When(pick_city_null=False, then=Subquery(customer_loads_all_jobs_pick_city.values('all_jobs'), output_field=DecimalField(max_digits=9, decimal_places=2)))
            ),
        jobs_excl_dispatcher = Case(
            When(pick_city_null=True, then=Subquery(customer_loads_jobs_ex_disp_no_pick_city.values('jobs_excl_dispatcher'), output_field=DecimalField(max_digits=9, decimal_places=2))),
            When(pick_city_null=False, then=Subquery(customer_loads_jobs_ex_disp_pick_city.values('jobs_excl_dispatcher'), output_field=DecimalField(max_digits=9, decimal_places=2)))
            ),
        jobs_for_bonus = Case(
            When(customer_rpm__gte=Decimal(company_rate_per_mile), then='jobs_excl_dispatcher'),
            When(customer_rpm__lt=Decimal(company_rate_per_mile), then='all_jobs')
            ),
    )

    bonus_customer_bonus_jobs = bonus_customers.values('dispatcher').annotate(sum=Sum('jobs_for_bonus')).order_by()

    dispatchers = Dispatcher.objects.select_related('location', 'company').annotate(
        num_of_loads = Count('load', filter=Q(load__drop_date__gte=date_start)&Q(load__drop_date__lt=date_end)&~Q(load__load_status='Cancelled')),
        loads_miles = Sum(F('load__total_miles')/2, filter=Q(load__drop_date__gte=date_start_2_weeks, load__drop_date__lt=date_end_2_weeks)&~Q(load__load_status='Cancelled')),
        base_rate = Subquery(BaseRate.objects.filter(location=OuterRef('location')).values('rate')),
        base_amount_model_null = ExpressionWrapper(Q(base_amount__isnull=True), output_field=BooleanField()),
        base_amt = Case(
            When(base_amount_model_null=False, then='base_amount'),
            When(base_amount_model_null=True, then=F('loads_miles')*F('base_rate'))
        ), 
        #bonus_jobs=Coalesce(
        #    Subquery(bonus_customers.filter(dispatcher=OuterRef('pk')).order_by().values('dispatcher').annotate(jobs_sum=Sum('jobs_for_bonus')).values('jobs_sum')), 0, output_field=DecimalField(max_digits=3, decimal_places=0)
        #),
        #bonus_amount_new = F('bonus_jobs')*F('bonus_per_customer')
    
    )

    for dispatcher in dispatchers:
        try:
            dispatcher.bonus_jobs = bonus_customer_bonus_jobs.get(dispatcher=dispatcher)['sum']
            dispatcher.bonus_amount_calculated = dispatcher.bonus_jobs * dispatcher.bonus_per_customer + dispatcher.bonus_adjustment
        except BonusCustomer.DoesNotExist:
            # No BonusCustomer rows for this dispatcher: the bonus is just the adjustment.
            dispatcher.bonus_jobs = 0
            dispatcher.bonus_amount_calculated = dispatcher.bonus_adjustment

This took the total load time from 30 seconds down to 6. The downside is that I cannot aggregate the bonus_jobs and bonus_amount_calculated attributes in the database.
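Since those values now live on the Python objects rather than in the query, any totals can still be computed with plain `sum()` in Python. A sketch, with `SimpleNamespace` objects standing in for the annotated dispatcher instances:

```python
from types import SimpleNamespace

# Stand-ins for dispatcher instances after the attribute-setting loop has run.
dispatchers = [
    SimpleNamespace(bonus_jobs=4, bonus_amount_calculated=100.0),
    SimpleNamespace(bonus_jobs=0, bonus_amount_calculated=0.0),
]

total_jobs = sum(d.bonus_jobs for d in dispatchers)
total_bonus = sum(d.bonus_amount_calculated for d in dispatchers)
print(total_jobs, total_bonus)  # 4 100.0
```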

Here's the new dispatchers.explain():

HashAggregate  (cost=894.03..1279.97 rows=184 width=1046)
  Group Key: dispatchers_dispatcher.id, (dispatchers_dispatcher.base_amount IS NULL), dispatchers_location.id, dispatchers_company.id
  ->  Hash Right Join  (cost=3.96..526.79 rows=8161 width=991)
        Hash Cond: ((loads_load.sales_agent_id)::text = (dispatchers_dispatcher.name)::text)
        ->  Seq Scan on loads_load  (cost=0.00..410.61 rows=8161 width=44)
        ->  Hash  (cost=3.67..3.67 rows=23 width=959)
              ->  Hash Left Join  (cost=2.09..3.67 rows=23 width=959)
                    Hash Cond: (dispatchers_dispatcher.company_id = dispatchers_company.id)
                    ->  Hash Left Join  (cost=1.04..2.47 rows=23 width=203)
                          Hash Cond: (dispatchers_dispatcher.location_id = dispatchers_location.id)
                          ->  Seq Scan on dispatchers_dispatcher  (cost=0.00..1.23 rows=23 width=77)
                          ->  Hash  (cost=1.02..1.02 rows=2 width=126)
                                ->  Seq Scan on dispatchers_location  (cost=0.00..1.02 rows=2 width=126)
                    ->  Hash  (cost=1.02..1.02 rows=2 width=756)
                          ->  Seq Scan on dispatchers_company  (cost=0.00..1.02 rows=2 width=756)
  SubPlan 1
    ->  Sort  (cost=1.03..1.04 rows=1 width=22)
          Sort Key: u0.id
          ->  Seq Scan on dispatchers_baserate u0  (cost=0.00..1.02 rows=1 width=22)
                Filter: (location_id = dispatchers_dispatcher.location_id)
  SubPlan 2
    ->  Sort  (cost=1.03..1.04 rows=1 width=22)
          Sort Key: u0_1.id
          ->  Seq Scan on dispatchers_baserate u0_1  (cost=0.00..1.02 rows=1 width=22)
                Filter: (location_id = dispatchers_dispatcher.location_id)
[30/Sep/2023 05:41:59] "GET /dispatchers/ HTTP/1.1" 200 32361

I am not sure how to do this. How do I break this query down into smaller queries?

PS Thank you so much. Can I buy you a coffee? :slight_smile:

Unless I'm misunderstanding something, that's what you've already done in the code segment here:

What you are calling "dirty?" is exactly what I'm referring to.

There are times when the ORM, or even raw SQL, is not the most suitable language for expressing the business logic of a calculation. Sometimes both clarity and performance are enhanced by performing calculations in Python rather than in the database.

Thank you so much, Ken! Your suggestions helped a lot!