Prefetching spatially related data

I have created something similar to the following in my project:

from django.contrib.gis.db import models

class Location(models.Model):
    area = models.PolygonField()

class Plant(models.Model):
   position = models.PointField()

I’m trying to come up with a way to describe a relation between the two models such that I can get a Plant queryset with prefetched Locations that have areas that intersect Plant object positions.
i.e. :

plants = Plant.objects.prefetch_related('locations')

I know how to get the locations for one object without a relation. Beyond that I actually came up with a way to define a custom relation and manager to get the proper related records, but the private API is just a bit opaque for me to figure out exactly what is happening with the prefetched values.

Does anyone know if there is a simpler way of avoiding the N+1 select problem when it comes to spatially related data like this?

Unfortunately the ORM doesn’t allow you to define relationship between models that don’t involve foreign keys. If it was the case you could possibly do something like

class Location(models.Model):
    area = models.PolygonField()

class Plant(models.Model):
   position = models.PointField()
   locations = Relation(
        Location, Q(position__within=F("locations__area"))
   )

Knowing that it’s important to understand that prefetch related is really just a glorified __in lookup that performs assignments. In other words

Author.objects.prefetch_related("books")

Is equivalent to

authors = Author.objects.all()
authors_books = defaultdict(list)
for book in Book.objects.filter(author__in=authors):
    authors_books[book.author_id].append(book)
for author in authors:
    author._prefetch_related_cache["books"] = authors_books[author.id]

This means that in order to emulate what prefetch_related('locations') does you need to craft a query that can return all the associated Location annotated with plant_id within their defined areas from a set of Plant.

Something like the following should work

from django.contrib.postgres.expressions import ArraySubquery
from django.db.models import OuterRef, Exists

plant_ids = [plant.id for plant in plants]
locations = Location.objects.filter(
    Exists(
        Plant.objects.filter(
            id__in=plant_ids,
            position__within=OuterRef("area"),
        )
    )
).annotate(
    plant_ids=ArraySubquery(
        Plant.objects.filter(
            position__within=OuterRef("area")
        ).values("id")
    )
)
plant_locations = defaultdict(list)
for location in locations:
    for plant_id in location.plant_ids:
        plant_locations[plant_id].append(location)

Which should perform a single SQL query of the form

SELECT
   location.*,
   ARRAY(
       SELECT id
       FROM plant
       WHERE ST_Within(plant.position, location.area)
   ) plant_ids
FROM location
WHERE EXISTS (
    SELECT 1
    FROM plant
    WHERE ST_Within(plant.position, location.area)
    LIMIT 1
)

Which should be roughly equivalent to the more concise but hard to express through the ORM and likely more performant

SELECT
   location.*,
   array_agg(plant.id) plant_ids
FROM location
INNER JOIN plant ON (ST_Within(plant.position, location.area))
GROUP BY location.id
1 Like

isn’t this supposed to be:

for author in authors:
    author._prefetch_related_cache["books"] = authors_books[author.id]

Yes, it shouldn’t affect the intent of the untested pseudo code but it’s more correct.

Thank you so much for such an in-depth reply. Just for completeness, do you know if there is any way to express that kind of join using the ORM?

INNER JOIN plant ON (ST_Within(plant.position, location.area))

You’ll have to use extra as a last resort and some low level expressions to achieve that I’m afraid

SELECT
   location.*,
   array_agg(plant.id) plant_ids
FROM location, plant
WHERE ST_Contains(location.area, plant.position)
GROUP BY location.id

which is equivalent as the above is an implicit INNER JOIN through a WHERE criteria.

I believe the following should work

from django.db.models.expression import Col

plant_table = Plant._meta.db_table
plant_id_field = Plant._meta.get_field("id")
plant_location_field = Plant._meta.get_field("location")
Location.objects.extra(
    tables=[plant_table]
).annotate(
     plant_ids=ArrayAgg(Col(plant_table, plant_id_field)),
).filter(
    area__contains=Col(plant_table, plant_location_field),
)
2 Likes

Enlightening!

Thank you so much for your help.