Group By Multiple Values

Hi,

I have 5 classes:

class Event(models.Model):
    class EventType(models.IntegerChoices):
        CREATE = 1
        UNASSIGN = 2
        ASSIGN = 3
        CANCEL = 4

    type = models.SmallIntegerField(choices=EventType.choices)

    group = models.ForeignKey(
        Group,
        on_delete=models.PROTECT,
    )

    assignee = models.ForeignKey(
        Assignee,
        on_delete=models.PROTECT,
    )

class Group(models.Model):
    date = models.DateField()

class Deal(models.Model):
    product = models.ForeignKey(
        Product,
        on_delete=models.PROTECT,
        verbose_name="The underlying product with unit, exchange, size, period and date"
    )

    group = models.ForeignKey(
        Group,
        on_delete=models.PROTECT,
    )

    quantity = models.IntegerField()

class Product(models.Model):
    name = models.CharField()

class Assignee(models.Model):
    name = models.CharField()

I am trying to get a sum of quantities per assignee per product, where the latest event is not a cancel. The assignee is determined by the latest event for a given group.

I can achieve this relatively easily in raw SQL using GROUP BY (I can add this here if that’s helpful). I know how to get a sum on one or the other using annotate. Here is an example just per product:

  latest_events = Event.objects.order_by(
          "group_id", 
          "-id"
      ).distinct(
          "group_id"
      )
  
  valid_events = Event.objects.exclude(
      type=Event.EventType.CANCEL,
  ).filter(
      id__in=Subquery(latest_events.values("id"))
  )

  Product.objects.annotate(
      summed_quantity = Sum(
          "deal__quantity",  # quantity lives on Deal, reached via the reverse relation
          filter=Q(
              deal__group_id__in=Subquery(
                  valid_events.values("group_id")
              )
          )
      )
  )

How would I get this subdivided by assignee and product instead? Any suggestions would be appreciated, I haven’t managed to find anything similar in the documentation.

Django generates GROUP BY clauses automatically. Add a call to .values() before .annotate() and pass the two fields you want to group on:

latest_events = Event.objects.order_by('group_id', '-id').distinct('group_id')
valid_groups = Event.objects.exclude(type=Event.EventType.CANCEL).filter(id__in=latest_events.values('id'))
results = (Deal.objects.values('product__name', 'group__assignee__name')
                       .filter(group_id__in=valid_groups.values('group_id'))
                       .annotate(summed_quantity=Sum('quantity'))
                       .values('product__name', 'group__assignee__name', 'summed_quantity'))

Thanks very much - does that work even though assignee is not on group but on event (which is one-to-many from group)?

I’ve written up a query using subqueries, but I’m not sure it’s the most efficient way of doing it:

def get_quantities():
    latest_events = Event.objects.order_by(
        "group_id", 
        "-id"
    ).distinct(
        "group_id"
    )

    return Deal.objects.annotate(
        assignee_id=Subquery(
            latest_events.filter(
                group_id = OuterRef("group_id")
            ).values("assignee_id")[:1]
        ),
        type=Subquery(
            latest_events.filter(
                group_id = OuterRef("group_id")
            ).values("type")[:1]
        )
    ).exclude(
        type=Event.EventType.CANCEL
    ).values(
        "product_id",
        "assignee_id"
    ).annotate(
        summed_quantity=Sum("quantity")
    ).values(
        "product_id",
        "assignee_id",
        "summed_quantity"
    )

I’m afraid I don’t quite understand your question.

It might be helpful, if you are able, to write out the expected SQL. I can help you work backwards from there. If you are not comfortable with SQL, can you write out in more detail what the expected structure of your results should be like? For example: “I want to get a row for each assignee and product, that additionally has a column giving the number of events for each such grouping in the last month”.

Thanks, I will write up a raw SQL query with a similar pattern to the Django query above to demonstrate.

With regards to the requested structure:

“I would like the sum of quantities per product per assignee, where the assignee is taken from the latest event for each group, ignoring any group where the latest event is a cancel”

The query above does do this, but it adds a subquery for each property I want to get from the event, which scales very poorly as the number of desired properties grows.
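To pin down the expected result independently of the ORM, here is a rough plain-Python sketch of the same rule applied to in-memory rows. The tuple layouts, the function name sum_quantities, and the sample rows are all made up for illustration; only the CANCEL value of 4 comes from the models above.

```python
from collections import defaultdict

CANCEL = 4  # matches Event.EventType.CANCEL in the models above


def sum_quantities(events, deals):
    """Sum deal quantities per (assignee_id, product_id).

    events: iterable of (id, group_id, assignee_id, type)  (hypothetical layout)
    deals:  iterable of (group_id, product_id, quantity)   (hypothetical layout)
    """
    # The latest event of a group is the one with the highest id.
    latest = {}
    for event_id, group_id, assignee_id, event_type in events:
        if group_id not in latest or event_id > latest[group_id][0]:
            latest[group_id] = (event_id, assignee_id, event_type)

    totals = defaultdict(int)
    for group_id, product_id, quantity in deals:
        event = latest.get(group_id)
        # Skip groups with no events, or whose latest event is a cancel.
        if event is None or event[2] == CANCEL:
            continue
        totals[(event[1], product_id)] += quantity
    return dict(totals)


# Made-up sample rows.
events = [
    (1, 10, 100, 1),  # group 10 created for assignee 100
    (2, 10, 200, 3),  # group 10 assigned to assignee 200 (latest)
    (3, 11, 100, 1),  # group 11 created for assignee 100
    (4, 11, 100, 4),  # group 11 cancelled (latest), so it is ignored
    (5, 12, 200, 3),  # group 12 assigned to assignee 200
]
deals = [
    (10, 7, 5),
    (10, 8, 2),
    (11, 7, 50),
    (12, 7, 3),
]

result = sum_quantities(events, deals)
print(result)
```

With these rows, group 11 drops out entirely and the deals of groups 10 and 12 are both attributed to assignee 200.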

What I would probably find most natural as a query is this:

select latest_event.assignee_id, django_deal.product_id, sum(quantity)
	from django_group 
	join (
		select distinct on (group_id) group_id, assignee_id, type
			from django_event
			order by group_id asc, id desc
	) as latest_event on (django_group.id=latest_event.group_id and latest_event.type!=4)
	join django_deal on django_group.id=django_deal.group_id
	group by latest_event.assignee_id, django_deal.product_id
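
The query above relies on DISTINCT ON, which is PostgreSQL-specific. As a way to check the query shape without a Postgres instance, here is a sketch using Python's built-in sqlite3 module, with the latest event per group picked by a correlated MAX(id) subquery instead. The table and column names follow the SQL above, the sample rows are made up, and the join through django_group is left out on the assumption that it contributes nothing beyond the group id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE django_event (
        id INTEGER PRIMARY KEY,
        group_id INTEGER,
        assignee_id INTEGER,
        type INTEGER
    );
    CREATE TABLE django_deal (
        id INTEGER PRIMARY KEY,
        group_id INTEGER,
        product_id INTEGER,
        quantity INTEGER
    );
    """
)

# Made-up sample rows: (id, group_id, assignee_id, type)
conn.executemany(
    "INSERT INTO django_event VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, 1),
        (2, 10, 200, 3),  # latest event for group 10
        (3, 11, 100, 1),
        (4, 11, 100, 4),  # latest event for group 11 is a cancel
        (5, 12, 200, 3),  # latest event for group 12
    ],
)
# Made-up sample rows: (group_id, product_id, quantity)
conn.executemany(
    "INSERT INTO django_deal (group_id, product_id, quantity) VALUES (?, ?, ?)",
    [(10, 7, 5), (10, 8, 2), (11, 7, 50), (12, 7, 3)],
)

QUERY = """
    SELECT latest_event.assignee_id,
           django_deal.product_id,
           SUM(django_deal.quantity)
    FROM django_event AS latest_event
    JOIN django_deal ON django_deal.group_id = latest_event.group_id
    WHERE latest_event.id = (
              -- portable stand-in for DISTINCT ON (group_id) ... ORDER BY id DESC
              SELECT MAX(e.id)
              FROM django_event AS e
              WHERE e.group_id = latest_event.group_id
          )
      AND latest_event.type != 4
    GROUP BY latest_event.assignee_id, django_deal.product_id
    ORDER BY latest_event.assignee_id, django_deal.product_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The point of this shape is that every column needed from the latest event (assignee_id, type, and any others) comes from a single joined row, so adding more event properties does not add more subqueries.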