GroupBy sum with minimum count

I have a Data model and a Region model (see below for definitions) where Data has a CharField that references instances of Region (using a 5 digit identifier str). Regions have a hierarchy based on their category CharField: both STATES and METROS are comprised of COUNTIES. Data does not have an FK to Region.

I want to sum COUNTY data into their parent METROs, but only if the number of observations matches the number of counties that are in the metro (which can be determined by using a Subquery involving the Region model). It’s possible that data is missing for a given Data.date for one of the counties in a METRO, so I don’t want the incomplete sum calculated.

My models and attempted solution are below:

# models.py
class Region(models.Model):
  CATEGORIES = [
    ('c', 'COUNTY'),
    ('m', 'METRO'), 
    ('s', 'STATE'),
  ]
​
  category = models.CharField(max_length=1, choices=CATEGORIES)
  name = models.CharField(max_legnth=255)
  code = models.CharField(max_length=5)
  children = models.ManyToManyField(self, related_name='parents', symmetrical=False)
​
​
class Data(models.Model):
  region = models.CharField(max_length=5)
  indicator = models.CharField(max_length=15)
  date = models.DateField()
  value = models.FloatField()

My best attempt to understand this so far is below:

# Aggregation attempts and notes
qs = Data.objects.annotate(
  parent_region=Subquery(
    Region.objects.filter(
      category='METRO',
      children__code=OuterRef('region'),  
    ).values('code')
  ),
  min_count=Subquery(
    # Can't just use `Count()` since it messes with Subquery/OuterRef
    # https://docs.djangoproject.com/en/4.2/ref/models/expressions/#using-aggregates-within-a-subquery-expression
    Region.objects.filter(
      category='COUNTY',
      parents__code=OuterRef('parent_region'),
    ).order_by().values('parents__code').annotate(
      cnt=Count('code')
    ).values('cnt')
  ),
).values(
  # GroupBy
  'indicator', 'parent_region', 'min_count', 'date',
).annotate(
  count=Count('value'),
  value=Sum('value'),
).filter(
  count=F('min_count'),
)

Examining the results of qs.query shows me that the combination of the final filter call and the parent_region Subquery lead to Django adding Data.region to the GROUPBY clause, which then causes count to equal 1 (and thus not equal min_count and return an empty QuerySet). If, instead of the parent_region Subquery, I just set parent_region=Value('some_value'), then the calculation and filters are done correctly (that is, Data.region is not included in the GROUPBY), but parent_region is dynamic depending on the OuterRef('region').

Any advice on how I can accomplish this? Thanks :slight_smile: