annotate with subquery in Django

FredericBentz · June 26, 2020, 1:53am

Hi, I want to make some statistics in my template, but I’m stuck with my query.
I’m not so bad with SQL, but I’m swimming totally in Django when I need to do a advanced query.
I want to .annotate() a query with the count of the number of races a runner have made in the past, and another .annotate() to select the best chrono made by the runner in the past.

class Horse(models.Model):
    id = AutoField(primary_key=True)
    name = models.CharField(max_length=255, blank=True, null=True, default=None)

class Race(models.Model):
    id = AutoField(primary_key=True)
    datetime_start = models.DateTimeField(blank=True, null=True, default=None)
    name = models.CharField(max_length=255, blank=True, null=True, default=None)

class Runner(models.Model):
    id = AutoField(primary_key=True)
    horse = models.ForeignKey(Horse, on_delete=models.PROTECT)
    race = models.ForeignKey(Race, on_delete=models.PROTECT)
    chrono = models.DecimalField(
        max_digits=10, decimal_places=2, blank=True, null=True, default=None)

I trying to do something like this but I don’t know how to make a subquery to access the past races, filtered whit the datetime_start and the horse_id from the current race:

def statistic(request, race_id):
    race = Race.objects.get(id=race_id)
    runners = Runner.objects.filter(race_id=race.id) \
        .annotate(nb_race=Count('race'), best_chrono=Min('chrono'))

    return render(request, 'turf/statistic.html', {
        'race': race,
        'runners': runners,
    })

Thx in advance.

KenWhitesell · June 26, 2020, 2:24am

Please help me understand exactly what you’re looking for here.

My understanding is that, given a race, you want to find all the runners that participated in that race filter(race_id=race.id). And, when doing so, you want to annotate each runner’s record with the total number of races that the runner has previously participated in, on the same horse as for the current race.

Am I understanding this correctly? Is that what you’re trying to achieve?

(Note: If you’re comfortable enough to do this in pure SQL, you might want to post the SQL you would use - that way, I might be able to relate your solution to the ORM-based solution. That might help you understand the ORM better.)

Ken

FredericBentz · June 26, 2020, 2:48am

You have totally understand. The name for model Runner is maybe not the more appropriate ( I use french names in my project, and I changed them for this post) Let say that a runner is a combination of a Horse and a Jockey (this last field is not important here, so I didn’t mention It).
The .annotate query is only focused on the Horses and their statistics from the past, in this case, the number of race they have run, and their best chrono, all for the the race filtered with the id.

PS: I will try to write the SQL query.

FredericBentz · June 26, 2020, 4:36am

SELECT horse_id, (SELECT count(run.id)
                    FROM runner run
                    INNER JOIN race
                    ON run.race_id = race.id
                    WHERE run.horse_id = r.horse_id
                    AND race.datetime_start < rc.datetime_start 
                    ) AS nb_race
FROM runner r
INNER JOIN  race rc
ON rc.id = r.race_id
WHERE rc.id = 7890
GROUP BY horse_id

This is just fort the nb_race
Note : the query works without the datetime_start filter. With it, all the the values returned are equal to zero. And this query don’t return the entire runner Objets like in the Django query above , only the horse_id and the nb_race
I must admit this is the best I can do for the moment without digging into internet for more info.
Anyway I think it shows you the idea.

KenWhitesell · June 26, 2020, 11:19am

This gives me a starting point - I’m going to need to think about this a little.

I’ve put together some sample test data:

Horse

id	name
1	Red
2	Green
3	Blue
4	Brown

Race

id	date	name
1	1 Mar 2020	Florida
2	1 May 2020	Kentucky
3	15 May 2020	Maryland
4	1 June 2020	New York

Runner

id	horse	race	chrono
1	1	1	61.1
2	2	1	61.2
3	3	1	61.3
4	2	2	62.2
5	3	2	62.3
6	4	2	62.4
7	3	3	63.3
8	4	3	63.4
9	1	3	63.1
10	4	4	64.4
11	1	4	64.1
12	2	4	64.2

So, can you confirm that if I were to run your desired query on the data defined above, with race_id = 3, I should see the following results?

id	horse	race	chrono	nb_race	best_chrono
7	3	3	63.3	2	61.3
8	4	3	63.4	1	62.4
9	1	3	63.1	1	61.1

FredericBentz · June 26, 2020, 12:51pm

Thank you,
I modified a little the SQL query (the GROUP BY was unnecessary and I use WHERE instead of JOIN, it was more efficient is this case), and it’s now fully functional, it returns the full runner Objects, and even the datetime_start filter works.

SELECT *, (SELECT count(run.id)
                    FROM runner run
                    INNER JOIN race
                    ON run.race_id = race.id
                    WHERE run.horse_id = r.horse_id
                    AND race.datetime_start < rc.datetime_start 
                    ) AS nb_race
FROM runner r, race rc
WHERE r.race_id = rc.id
AND rc.id = 7890

KenWhitesell · June 26, 2020, 12:59pm

And if you want the minimum time from the previous results, you can extend the query like this:

SELECT *, (SELECT count(run.id)
                    FROM runner run
                    INNER JOIN race
                    ON run.race_id = race.id
                    WHERE run.horse_id = r.horse_id
                    AND race.datetime_start < rc.datetime_start 
                    ) AS nb_race, 
            (SELECT min(run.chrono)
                    FROM runner run
                    INNER JOIN race
                    ON run.race_id = race.id
                    WHERE run.horse_id = r.horse_id
                    AND race.datetime_start < rc.datetime_start 
                    ) AS best_time
FROM runner r, race rc
WHERE r.race_id = rc.id
AND rc.id = 7890

FredericBentz · June 26, 2020, 1:04pm

This is exact, so how do I now write the same query in Django ?

FredericBentz · June 26, 2020, 7:36pm

The sample you have created is good, and the responses are what we should expect.
On my side I’m trying different options : subquery or RawSQL, but the end is always the same, I just can’t access the horse_id outside the subquery, even with an OuterRef('horse_id').
This is my last attempt:

runners = Runner.objects.filter(race_id=race.id) \
        .annotate( nb_course=RawSQL(
            "SELECT count(runner.id) FROM runner INNER JOIN race ON race.course_id = race.id WHERE runner.horse_id = %s", (OuterRef('horse_id'))))

Just how do you do a " Correlated subquery" in Django ? How can I access the horse_id in the Runner.objects.filter(race_id=race.id) outside the RawSQL() ?

If you answer this, everything will become crystal clear, and I can even try a solution without RawSQL()

KenWhitesell · June 26, 2020, 8:23pm

Well, the short answer is that I’m temporarily stumped as well. What I thought was going to work, doesn’t.
I’ve been reading (and rereading) the Subquery docs, as well as the sections on generating aggregates for each item of a QuerySet, Subquery expressions and using aggregates in subquery expressions, and I don’t have anything working yet.
I have to admit though, I do like a good puzzle so I’m going to keep working on this for a while longer.

FredericBentz · June 26, 2020, 8:33pm

Thank you Ken, I will try to resolve this on my side too.

FredericBentz · June 26, 2020, 10:08pm

I found this post on Stack Overflow and it works. This query gives me the nb_race fro every horse, independently from the datetime_start, the SQL query give the same results, so it’s correct.

runners = Runner.objects.filter(race_id=race.id)\
        .annotate(nb_race=Subquery(Runner.objects.filter(horse_id=OuterRef('horse_id')) \
         .values('horse_id').annotate(count=Count('horse_id')).values('count')))

Now I need to add the datetime_star filter, but I will study this query first, I don’t understand it totally for the moment…

FredericBentz · June 26, 2020, 11:02pm

OK, this one works for the nb_race:

race = Race.objects.get(id=race_id)
runners = Runner.objects.filter(race_id=race.id)\
        .annotate(nb_race=Subquery(Runner.objects.filter(
                   horse_id=OuterRef('horse_id'), race__datetime_start__lt=race.datetime_start)
                      .values('horse_id').annotate(count=Count('horse_id')).values('count')))

Now I will try the second query.

FredericBentz · June 26, 2020, 11:13pm

This is perfect, I got it !

race = Race.objects.get(id=race_id)
runners = Runner.objects.filter(race_id=race.id)\
        .annotate(nb_race=Subquery(Runner.objects.filter(
                horse_id=OuterRef('horse_id'), race__datetime_start__lt=race.datetime_start)
                      .values('horse_id').annotate(count=Count('horse_id')).values('count')))\
        .annotate(best_chrono=Subquery(Runner.objects.filter(
                 horse_id=OuterRef('horse_id'), race__datetime_start__lt=race.datetime_start)
                      .values('horse_id').annotate(min=Min('chrono')).values('min')))

Topic		Replies	Views
Using annotation to annotate Using Django	12	12562	September 11, 2020
QuerySet API Using Django	0	586	January 9, 2020
It is possible to perform a query within an annotate? Using the ORM	6	842	August 18, 2022
Using nested Subqueries Using the ORM	0	155	August 2, 2024
Join on subquery Using the ORM	5	2751	December 27, 2023

annotate with subquery in Django

Related topics