Take a look at Subquery. I would use that to pull the ID of the record with the max time_stamp for each company/parameter pair, then filter to that record in the parent queryset so you can get the value field.
I’ll be honest here - I’m having problems getting something to work - and a bit of searching for this hasn’t helped a whole lot. I can do it with two queries trivially, but I’ve not yet been able to get a Subquery to return the pk of the table when the aggregation is done on a different field.
The aggregation I have returns a queryset of dicts, where each dict has two keys, parameter_id and max_time_stamp.
The only way I have been able to make use of it is by iterating over those results and using the values as the filter of a second query, creating a clear N+1 query situation.
UPDATE: I have gotten it down to two queries by using the results of the first query as the filter for the second query, but I can’t see a way to get this down to a single query in the ORM.
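Roughly, the two-query version looks like this; the grouping fields and the Q-object combination are my best sketch of it rather than the exact code, so treat them as assumptions:
from django.db.models import Max, Q

# First query: the latest time_stamp for each (company, parameter) pair.
latest = (
    FinancialReport.objects
    .values('company_id', 'parameter_id')
    .annotate(max_time_stamp=Max('time_stamp'))
)

# Second query: OR together a lookup for each pair found above.
pair_filter = Q()
for row in latest:
    pair_filter |= Q(
        company_id=row['company_id'],
        parameter_id=row['parameter_id'],
        time_stamp=row['max_time_stamp'],
    )

reports = FinancialReport.objects.filter(pair_filter)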
So I did make the assumption that there is a table PK. If there isn’t, this becomes much more difficult due to limits in what the ORM supports here. So assuming FinancialReport has an id field not listed in the post:
from django.db.models import OuterRef, Subquery

# Create a subquery that will fetch the most recent financial report.
# This requires a table PK that's not a part of the unique filtering set.
max_ids = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('id')

# Filter on the subquery's matching id.
reports = FinancialReport.objects.filter(id=Subquery(max_ids[:1]))
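As a quick usage sketch (still assuming the id field above), evaluating that second queryset issues a single SELECT with the correlated subquery, and the value field is right there on each row:
# One query is issued when the queryset is evaluated; each row returned is
# the newest report for its (company, parameter) pair.
for report in reports:
    print(report.company_id, report.parameter_id, report.value, report.time_stamp)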
Oh this is really flipping neat! I didn’t realize we could actually return multiple values out of the inner subquery as long as you apply the check with ROW. Note this is for PostgreSQL only.
from django.db.models import Func, CharField, Subquery, OuterRef

class Row(Func):
    function = "ROW"
    # This is a bit of a hack
    output_field = CharField()

max_keys = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('company_id', 'parameter_id')

reports = FinancialReport.objects.annotate(
    key=Row('company_id', 'parameter_id')
).filter(key=Subquery(max_keys[:1]))
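If you want to see what that actually generates, printing the queryset’s query attribute is a quick check; the exact SQL will vary, but the ROW(company_id, parameter_id) comparison against the single-row subquery is the part that makes this PostgreSQL-only:
# Rough preview of the generated SQL; the real statement depends on the backend.
print(reports.query)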
Maybe I missed the release note that allows a Subquery to return more than one column via values, but I think this is really neat.
The first values() call adds company_id and parameter_id into the GROUP BY. However, it also adds value into the GROUP BY, which gives me more rows back than expected. The associated SQL is:
SELECT reporting_data.company_id,
       reporting_data.parameter_id,
       reporting_data.value,
       MAX(reporting_data.time_stamp) AS max_time_stamp
FROM reporting_data
GROUP BY reporting_data.company_id,
         reporting_data.parameter_id,
         reporting_data.value
ORDER BY NULL
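For context, the queryset shape that produces SQL like that is roughly the following; this is reconstructed from the SQL above rather than the exact code, so treat it as an approximation:
from django.db.models import Max

# Because value appears in values() before the annotate(), it lands in the
# GROUP BY alongside company_id and parameter_id, splitting the groups.
rows = (
    FinancialReport.objects
    .values('company_id', 'parameter_id', 'value')
    .annotate(max_time_stamp=Max('time_stamp'))
)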
I confirmed that my first approach should work for you. The Row class was the bit that was limited to PostgreSQL.
# Create a subquery that will fetch the most recent financial report.
# This requires a table PK that's not a part of the unique filtering set.
max_ids = FinancialReport.objects.filter(
    company_id=OuterRef('company_id'),
    parameter_id=OuterRef('parameter_id'),
).order_by('-time_stamp').values('id')

# Filter on the subquery's matching id.
reports = FinancialReport.objects.filter(id=Subquery(max_ids[:1]))
Try those two statements as close to what’s written as possible. Don’t change anything that you don’t absolutely have to (something like a field name not being right). Notice that the second statement is the filter using the first; you don’t need to add anything else.
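If you want to convince yourself it worked, something like this should show exactly one report per pair; values_list plus a Counter is just one way to eyeball it:
from collections import Counter

# Each (company, parameter) pair should appear exactly once in the result.
pair_counts = Counter(reports.values_list('company_id', 'parameter_id'))
assert all(count == 1 for count in pair_counts.values())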