Aggregate Max value

9Pants · February 13, 2021, 8:05pm

I have a stacked mysql database table:

company_id	parameter_id	time_stamp	value
1	2	2020-01-01	23.4
1	2	2020-02-01	24.4
2	3	2020-03-01	42.6
2	3	2020-05-01	32.4
2	4	2020-05-01	24.2

I would like to get the value for the max time_stamp for each parameter
So answer I would like is:

company_id	parameter_id	time_stamp	value
1	2	2020-02-01	24.4
2	3	2020-05-01	32.4
2	4	2020-05-01	24.2

The closest I have is:

FinancialReports.objects.values(‘company’, ‘parameter’).annotate(time_stamp=Max(‘time_stamp’))

I believe the answer lies in annotate somewhere, however I do not get the value so I suspect I am barking up the wrong tree with this format…?

CodenameTim · February 15, 2021, 4:22pm

Take a look at Subquery. I would use that to pull the ID for the company, parameter pair that has the max time stamp value and filter to that record in the parent queryset so you can get the value field.

KenWhitesell · February 15, 2021, 4:59pm

I’ll be honest here - I’m having problems getting something to work - and a bit of searching for this hasn’t helped a whole lot. I can do it with two queries trivially, but I’ve not yet been able to get a Subquery to return the pk of the table when the aggregation is done on a different field.

For example:

FinancialReports.objects.values('parameter_id').annotate(max_time_stamp=Max('time_stamp'))

returns a queryset consisting of a list of dicts, where each dict has two keys, parameter_id and max_time_stamp.

The only way I have been able to make use of it is by iterating over the list, and using those values as the filter of a second query - creating a clear N+1 query situation.

UPDATE: I have gotten it down to two queries by using the results of the first query as the filter for the second query, but I can’t see a way to get this down to a single query in the ORM.

CodenameTim · February 15, 2021, 7:19pm

So I did make the assumption there is a table PK. If there isn’t, this becomes much more difficult due to some of the ORM support. So assuming FinancialReport has an id field not listed in the post:

# Create a subquery that will fetch the most recent financial report.
# This requires a table PK that's not a part of the unique filtering set. 
max_ids = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('id')

#  Filter on the subquery's matching id.
reports = FinancialReport.objects.filter(id=Subquery(max_ids[:1]))

CodenameTim · February 15, 2021, 7:35pm

EDIT:
Ignore this, it’s Postgres:

Oh this is really flipping neat! I didn’t realize we could actually return multiple values out of the inner subquery as long as you apply the check with ROW. Note this is for postgresql only.

from django.db.models import Func, CharField, Subquery, OuterRef

class Row(Func):
    function = "ROW"
    # This is a bit of a hack
    output_field = CharField()

max_keys = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('company_id', 'parameter_id')

reports = FinancialReport.objects.annotate(
    key=Row('company_id', 'parameter_id')
).filter(key=Subquery(max_keys[:1]))

Maybe I missed the release note that allows a Subquery to return more than one column via values, but I think this is really neat.

KenWhitesell · February 15, 2021, 7:57pm

That is cool - but the very first line of the original message says that this is a mysql database.

CodenameTim · February 15, 2021, 7:58pm

Well crap. I neglected that part.

9Pants · February 15, 2021, 8:12pm

Yes MySQL…for the time being anyway, and thanks for all the suggestions! Also yes there is an id PK in the table.

A tip from someone who knows a lot more about orm queries than I do is to add another .values on the end to add on the value:

FinancialReports.objects.all().values(‘company’, ‘parameter’).annotate(max_time_stamp=Max(‘time_stamp’)).values(‘company’, ‘parameter’, ‘max_time_stamp’, ‘value’)

The first values adds in company_id and parameter_id into the group by. However this still adds the value in the groupby so giving me more rows in the return than expected. Associated SQL is:

SELECT reporting_data.company_id, reporting_data.parameter_id, reporting_data.value, MAX(reporting_data.time_stamp) AS max_time_stamp FROM reporting_data GROUP BY reporting_data.company_id, reporting_data.parameter_id, reporting_data.value ORDER BY NULL

“Just” need to get the value out of the group by…

CodenameTim · February 15, 2021, 8:30pm

I confirmed that my first approach should work for you. The Row class was the bit that was limited to postgresql.

# Create a subquery that will fetch the most recent financial report.
# This requires a table PK that's not a part of the unique filtering set. 
max_ids = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('id')

#  Filter on the subquery's matching id.
reports = FinancialReport.objects.filter(id=Subquery(max_ids[:1]))

9Pants · February 16, 2021, 8:49pm

I have tried this a few ways, but seems to hang and not sure why

9Pants · February 16, 2021, 8:57pm

You mentioned that you use the first query as the filter to a second filter query, how would I do this?

I have tried the following:

result = filter_items.filter(id__in=filter_items.id)

but get the following error which seems strange it does not have an id:

‘FinancialReportsQueryset’ object has no attribute ‘id’

KenWhitesell · February 16, 2021, 9:00pm

CodenameTim:

# Create a subquery that will fetch the most recent financial report.
# This requires a table PK that's not a part of the unique filtering set. 
max_ids = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('id')

#  Filter on the subquery's matching id.
reports = FinancialReport.objects.filter(id=Subquery(max_ids[:1]))

Try those two statements as close to as written as possible. Don’t change anything that you don’t absolutely have to (something like a field name not being right). Notice that the second statement is the filter using the first - you don’t need to add anything else.

Topic		Replies	Views
Use max aggregation in Model save with single SQL statement Using Django	1	610	December 31, 2020
Help crafting Django ORMese Using Django	1	670	February 10, 2020
QuerySet API Using Django	0	586	January 9, 2020
How to write this raw sql subquery as native django query? Using the ORM	15	4688	January 26, 2023
Simple(ish) SQL to ORM query Using the ORM	2	453	March 23, 2023

Aggregate Max value

Related topics