I have a Data
model and a Region
model (see below for definitions) where Data
has a CharField
that references instances of Region
(using a 5 digit identifier str). Regions
have a hierarchy based on their category
CharField
: both STATES and METROS are comprised of COUNTIES. Data
does not have an FK to Region
.
I want to sum COUNTY data into their parent METROs, but only if the number of observations matches the number of counties that are in the metro (which can be determined by using a Subquery involving the Region
model). It’s possible that data is missing for a given Data.date
for one of the counties in a METRO, so I don’t want the incomplete sum calculated.
My models and attempted solution are below:
# models.py
class Region(models.Model):
CATEGORIES = [
('c', 'COUNTY'),
('m', 'METRO'),
('s', 'STATE'),
]
category = models.CharField(max_length=1, choices=CATEGORIES)
name = models.CharField(max_legnth=255)
code = models.CharField(max_length=5)
children = models.ManyToManyField(self, related_name='parents', symmetrical=False)
class Data(models.Model):
region = models.CharField(max_length=5)
indicator = models.CharField(max_length=15)
date = models.DateField()
value = models.FloatField()
My best attempt to understand this so far is below:
# Aggregation attempts and notes
qs = Data.objects.annotate(
parent_region=Subquery(
Region.objects.filter(
category='METRO',
children__code=OuterRef('region'),
).values('code')
),
min_count=Subquery(
# Can't just use `Count()` since it messes with Subquery/OuterRef
# https://docs.djangoproject.com/en/4.2/ref/models/expressions/#using-aggregates-within-a-subquery-expression
Region.objects.filter(
category='COUNTY',
parents__code=OuterRef('parent_region'),
).order_by().values('parents__code').annotate(
cnt=Count('code')
).values('cnt')
),
).values(
# GroupBy
'indicator', 'parent_region', 'min_count', 'date',
).annotate(
count=Count('value'),
value=Sum('value'),
).filter(
count=F('min_count'),
)
Examining the results of qs.query
shows me that the combination of the final filter
call and the parent_region
Subquery lead to Django adding Data.region
to the GROUPBY
clause, which then causes count
to equal 1 (and thus not equal min_count
and return an empty QuerySet). If, instead of the parent_region
Subquery, I just set parent_region=Value('some_value')
, then the calculation and filters are done correctly (that is, Data.region
is not included in the GROUPBY
), but parent_region
is dynamic depending on the OuterRef('region')
.
Any advice on how I can accomplish this? Thanks