Fetch similar Django objects in one query if the objects have no direct relation

Let’s say I have the following model:

class Book(models.Model):
    class Meta:
      constraints = [
          models.UniqueConstraint(
              fields=["name", "version"],
              name="%(app_label)s_%(class)s_unique_name_version",
          )
      ]

    name = models.CharField(max_length=255)
    version = models.PositiveIntegerField()

For each book, I need to get all higher versions:

book_to_future_book_mappings = {}
for book in Book.objects.all():
    higher_versions = Book.objects.filter(name=name, version__gt=book.version).order_by("version")
    book_to_future_book_mappings[book] = list(higher_versions)

The problem with this approach is that it doesn’t scale well. It makes a separate query for every book, which is very slow if I have millions of books in my database.

Is there anyway I can get all of this data in just one or a few queries? Perhaps I can use RawSQL if needed?

Note my database is PostgreSQL.

I think you should be able to do this using the PostgreSQL ArrayAgg function.

Just winging this, with no expectation of this being 100% correct, you could try something like:

book_futures = TeBook.objects.annotate(
  futures = ArrayAgg(
    Subquery(
      TeBook.objects.filter(
        name=OuterRef('name'),
        version__gt=OuterRef('version')
      ).order_by('version').values_list('id', flat=True)
    )
  )
)

Note that this is all operating in PostgreSQL, which means that the subquery can’t return the actual Book objects within the aggregate. I think about the best you can do is return the list of IDs of the related books, which then you can reference through your book_futures list.

Also, I’d suggest you create an index on version

Well, I had a couple minutes to play with this and it doesn’t work. (Nor did any of the half-dozen basic variations of this I tried.)
This does seem to be a PostgreSQL issue in that I tried directly running these queries within psql, and that is where the errors I’m seeing are coming from. So Django and the ORM aren’t having a problem with it, it’s an invalid query for PostgreSQL.
(I’m pointing this out to highlight that going to raw SQL is not going to help. Again, the issue is not the ORM.)
If you do come up with a raw query that does what you need, we can try translating that back to the ORM.

Here’s what I came up with on how to do this in python with a single query

from itertools import chain
all_books = Book.objects.all().order_by('version')

book_mapping = defaultdict([])
for book in all_books:
    for version, books_list in book_mapping[book.name].items():
        book_mapping[book.name][version].append(book)
    book_mapping[book.name].append({book.version: [book]})

return chain.from_iterable(chain.from_iterable(books_list.values()) for books_list in book_mapping.values())

The trouble here will be making sure the ordering on version is actually ascending as the algorithm assumes that everything is in order, hence why it can append the book to the other versions for the book name in book_mapping.

Keep in mind that as your Book table grows, this will eat up more and more memory.

I’ve got it working as a single PostgreSQL query.

The key is using the PostgreSQL Array function, not ArrayAgg. Since the PostgreSQL interface doesn’t define usage of the Array function, we have to do our own.

What I came up with was:

class MakeArray(Func):
    function = 'ARRAY'

which then allows for a solution very close to my original idea:

book_futures = Book.objects.annotate(
  futures = MakeArray(
    Subquery(
      Book.objects.filter(
        name=OuterRef('name'),
        version__gt=OuterRef('version')
      ).order_by('version').values('id')
    )
  )
)

This gives you an annotation column (futures) with the IDs of the referenced books.

Very helpful. Thank you both @KenWhitesell and @CodenameTim. I’ll try both solutions when I’m back in office later this month.

Note: It looks like this feature is added in Django 4.0 - PostgreSQL specific query expressions | Django documentation | Django

That changes my suggestion to:

book_futures = Book.objects.annotate(
  futures = ArraySubquery(
      Book.objects.filter(
        name=OuterRef('name'),
        version__gt=OuterRef('version')
      ).order_by('version').values('id')
  )
)

(Disclaimer, I have not yet tried this under Django 4.0)

1 Like

This worked perfectly. Thanks!