Aggregate Max value

I have a stacked mysql database table:

company_id parameter_id time_stamp value
1 2 2020-01-01 23.4
1 2 2020-02-01 24.4
2 3 2020-03-01 42.6
2 3 2020-05-01 32.4
2 4 2020-05-01 24.2

I would like to get the value for the max time_stamp for each parameter
So answer I would like is:

company_id parameter_id time_stamp value
1 2 2020-02-01 24.4
2 3 2020-05-01 32.4
2 4 2020-05-01 24.2

The closest I have is:

FinancialReports.objects.values(‘company’, ‘parameter’).annotate(time_stamp=Max(‘time_stamp’))

I believe the answer lies in annotate somewhere, however I do not get the value so I suspect I am barking up the wrong tree with this format…?

Take a look at Subquery. I would use that to pull the ID for the company, parameter pair that has the max time stamp value and filter to that record in the parent queryset so you can get the value field.

1 Like

I’ll be honest here - I’m having problems getting something to work - and a bit of searching for this hasn’t helped a whole lot. I can do it with two queries trivially, but I’ve not yet been able to get a Subquery to return the pk of the table when the aggregation is done on a different field.

For example:

FinancialReports.objects.values('parameter_id').annotate(max_time_stamp=Max('time_stamp'))

returns a queryset consisting of a list of dicts, where each dict has two keys, parameter_id and max_time_stamp.

The only way I have been able to make use of it is by iterating over the list, and using those values as the filter of a second query - creating a clear N+1 query situation.

UPDATE: I have gotten it down to two queries by using the results of the first query as the filter for the second query, but I can’t see a way to get this down to a single query in the ORM.

So I did make the assumption there is a table PK. If there isn’t, this becomes much more difficult due to some of the ORM support. So assuming FinancialReport has an id field not listed in the post:

# Create a subquery that will fetch the most recent financial report.
# This requires a table PK that's not a part of the unique filtering set. 
max_ids = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('id')

#  Filter on the subquery's matching id.
reports = FinancialReport.objects.filter(id=Subquery(max_ids[:1]))
1 Like

EDIT:
Ignore this, it’s Postgres:

Oh this is really flipping neat! I didn’t realize we could actually return multiple values out of the inner subquery as long as you apply the check with ROW. Note this is for postgresql only.

from django.db.models import Func, CharField, Subquery, OuterRef

class Row(Func):
    function = "ROW"
    # This is a bit of a hack
    output_field = CharField()

max_keys = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('company_id', 'parameter_id')

reports = FinancialReport.objects.annotate(
    key=Row('company_id', 'parameter_id')
).filter(key=Subquery(max_keys[:1]))

Maybe I missed the release note that allows a Subquery to return more than one column via values, but I think this is really neat.

1 Like

That is cool - but the very first line of the original message says that this is a mysql database.

Well crap. I neglected that part.

1 Like

Yes MySQL…for the time being anyway, and thanks for all the suggestions! Also yes there is an id PK in the table.

A tip from someone who knows a lot more about orm queries than I do is to add another .values on the end to add on the value:

FinancialReports.objects.all().values(‘company’, ‘parameter’).annotate(max_time_stamp=Max(‘time_stamp’)).values(‘company’, ‘parameter’, ‘max_time_stamp’, ‘value’)

The first values adds in company_id and parameter_id into the group by. However this still adds the value in the groupby so giving me more rows in the return than expected. Associated SQL is:

SELECT reporting_data.company_id, reporting_data.parameter_id, reporting_data.value, MAX(reporting_data.time_stamp) AS max_time_stamp FROM reporting_data GROUP BY reporting_data.company_id, reporting_data.parameter_id, reporting_data.value ORDER BY NULL

“Just” need to get the value out of the group by…

1 Like

I confirmed that my first approach should work for you. The Row class was the bit that was limited to postgresql.

# Create a subquery that will fetch the most recent financial report.
# This requires a table PK that's not a part of the unique filtering set. 
max_ids = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('id')

#  Filter on the subquery's matching id.
reports = FinancialReport.objects.filter(id=Subquery(max_ids[:1]))
2 Likes

I have tried this a few ways, but seems to hang and not sure why

1 Like

You mentioned that you use the first query as the filter to a second filter query, how would I do this?

I have tried the following:

result = filter_items.filter(id__in=filter_items.id)

but get the following error which seems strange it does not have an id:

‘FinancialReportsQueryset’ object has no attribute ‘id’

1 Like

Try those two statements as close to as written as possible. Don’t change anything that you don’t absolutely have to (something like a field name not being right). Notice that the second statement is the filter using the first - you don’t need to add anything else.

1 Like