annotate with subquery in Django

Hi, I want to make some statistics in my template, but I’m stuck with my query.
I’m not so bad with SQL, but I’m swimming totally in Django when I need to do a advanced query.
I want to .annotate() a query with the count of the number of races a runner have made in the past, and another .annotate() to select the best chrono made by the runner in the past.

class Horse(models.Model):
    id = AutoField(primary_key=True)
    name = models.CharField(max_length=255, blank=True, null=True, default=None)

class Race(models.Model):
    id = AutoField(primary_key=True)
    datetime_start = models.DateTimeField(blank=True, null=True, default=None)
    name = models.CharField(max_length=255, blank=True, null=True, default=None)

class Runner(models.Model):
    id = AutoField(primary_key=True)
    horse = models.ForeignKey(Horse, on_delete=models.PROTECT)
    race = models.ForeignKey(Race, on_delete=models.PROTECT)
    chrono = models.DecimalField(
        max_digits=10, decimal_places=2, blank=True, null=True, default=None)

I trying to do something like this but I don’t know how to make a subquery to access the past races, filtered whit the datetime_start and the horse_id from the current race:

def statistic(request, race_id):
    race = Race.objects.get(id=race_id)
    runners = Runner.objects.filter(race_id=race.id) \
        .annotate(nb_race=Count('race'), best_chrono=Min('chrono'))

    return render(request, 'turf/statistic.html', {
        'race': race,
        'runners': runners,
    })

Thx in advance.

Please help me understand exactly what you’re looking for here.

My understanding is that, given a race, you want to find all the runners that participated in that race filter(race_id=race.id). And, when doing so, you want to annotate each runner’s record with the total number of races that the runner has previously participated in, on the same horse as for the current race.

Am I understanding this correctly? Is that what you’re trying to achieve?

(Note: If you’re comfortable enough to do this in pure SQL, you might want to post the SQL you would use - that way, I might be able to relate your solution to the ORM-based solution. That might help you understand the ORM better.)

Ken

You have totally understand. The name for model Runner is maybe not the more appropriate ( I use french names in my project, and I changed them for this post) Let say that a runner is a combination of a Horse and a Jockey (this last field is not important here, so I didn’t mention It).
The .annotate query is only focused on the Horses and their statistics from the past, in this case, the number of race they have run, and their best chrono, all for the the race filtered with the id.

PS: I will try to write the SQL query.

SELECT horse_id, (SELECT count(run.id)
                    FROM runner run
                    INNER JOIN race
                    ON run.race_id = race.id
                    WHERE run.horse_id = r.horse_id
                    AND race.datetime_start < rc.datetime_start 
                    ) AS nb_race
FROM runner r
INNER JOIN  race rc
ON rc.id = r.race_id
WHERE rc.id = 7890
GROUP BY horse_id

This is just fort the nb_race
Note : the query works without the datetime_start filter. With it, all the the values returned are equal to zero. And this query don’t return the entire runner Objets like in the Django query above , only the horse_id and the nb_race
I must admit this is the best I can do for the moment without digging into internet for more info.
Anyway I think it shows you the idea.

This gives me a starting point - I’m going to need to think about this a little.

I’ve put together some sample test data:

Horse

id name
1 Red
2 Green
3 Blue
4 Brown

Race

id date name
1 1 Mar 2020 Florida
2 1 May 2020 Kentucky
3 15 May 2020 Maryland
4 1 June 2020 New York

Runner

id horse race chrono
1 1 1 61.1
2 2 1 61.2
3 3 1 61.3
4 2 2 62.2
5 3 2 62.3
6 4 2 62.4
7 3 3 63.3
8 4 3 63.4
9 1 3 63.1
10 4 4 64.4
11 1 4 64.1
12 2 4 64.2

So, can you confirm that if I were to run your desired query on the data defined above, with race_id = 3, I should see the following results?

id horse race chrono nb_race best_chrono
7 3 3 63.3 2 61.3
8 4 3 63.4 1 62.4
9 1 3 63.1 1 61.1

Thank you,
I modified a little the SQL query (the GROUP BY was unnecessary and I use WHERE instead of JOIN, it was more efficient is this case), and it’s now fully functional, it returns the full runner Objects, and even the datetime_start filter works.

SELECT *, (SELECT count(run.id)
                    FROM runner run
                    INNER JOIN race
                    ON run.race_id = race.id
                    WHERE run.horse_id = r.horse_id
                    AND race.datetime_start < rc.datetime_start 
                    ) AS nb_race
FROM runner r, race rc
WHERE r.race_id = rc.id
AND rc.id = 7890

And if you want the minimum time from the previous results, you can extend the query like this:

SELECT *, (SELECT count(run.id)
                    FROM runner run
                    INNER JOIN race
                    ON run.race_id = race.id
                    WHERE run.horse_id = r.horse_id
                    AND race.datetime_start < rc.datetime_start 
                    ) AS nb_race, 
            (SELECT min(run.chrono)
                    FROM runner run
                    INNER JOIN race
                    ON run.race_id = race.id
                    WHERE run.horse_id = r.horse_id
                    AND race.datetime_start < rc.datetime_start 
                    ) AS best_time
FROM runner r, race rc
WHERE r.race_id = rc.id
AND rc.id = 7890

This is exact, so how do I now write the same query in Django ?

The sample you have created is good, and the responses are what we should expect.
On my side I’m trying different options : subquery or RawSQL, but the end is always the same, I just can’t access the horse_id outside the subquery, even with an OuterRef('horse_id').
This is my last attempt:

runners = Runner.objects.filter(race_id=race.id) \
        .annotate( nb_course=RawSQL(
            "SELECT count(runner.id) FROM runner INNER JOIN race ON race.course_id = race.id WHERE runner.horse_id = %s", (OuterRef('horse_id'))))

Just how do you do a " Correlated subquery" in Django ? How can I access the horse_id in the Runner.objects.filter(race_id=race.id) outside the RawSQL() ?

If you answer this, everything will become crystal clear, and I can even try a solution without RawSQL()

Well, the short answer is that I’m temporarily stumped as well. What I thought was going to work, doesn’t.
I’ve been reading (and rereading) the Subquery docs, as well as the sections on generating aggregates for each item of a QuerySet, Subquery expressions and using aggregates in subquery expressions, and I don’t have anything working yet.
I have to admit though, I do like a good puzzle so I’m going to keep working on this for a while longer.

Thank you Ken, I will try to resolve this on my side too.

I found this post on Stack Overflow and it works. This query gives me the nb_race fro every horse, independently from the datetime_start, the SQL query give the same results, so it’s correct.

runners = Runner.objects.filter(race_id=race.id)\
        .annotate(nb_race=Subquery(Runner.objects.filter(horse_id=OuterRef('horse_id')) \
         .values('horse_id').annotate(count=Count('horse_id')).values('count')))

Now I need to add the datetime_star filter, but I will study this query first, I don’t understand it totally for the moment…

OK, this one works for the nb_race:

race = Race.objects.get(id=race_id)
runners = Runner.objects.filter(race_id=race.id)\
        .annotate(nb_race=Subquery(Runner.objects.filter(
                   horse_id=OuterRef('horse_id'), race__datetime_start__lt=race.datetime_start)
                      .values('horse_id').annotate(count=Count('horse_id')).values('count')))

Now I will try the second query.

This is perfect, I got it !

race = Race.objects.get(id=race_id)
runners = Runner.objects.filter(race_id=race.id)\
        .annotate(nb_race=Subquery(Runner.objects.filter(
                horse_id=OuterRef('horse_id'), race__datetime_start__lt=race.datetime_start)
                      .values('horse_id').annotate(count=Count('horse_id')).values('count')))\
        .annotate(best_chrono=Subquery(Runner.objects.filter(
                 horse_id=OuterRef('horse_id'), race__datetime_start__lt=race.datetime_start)
                      .values('horse_id').annotate(min=Min('chrono')).values('min')))
1 Like