class Book(models.Model):
class Meta:
constraints = [
models.UniqueConstraint(
fields=["name", "version"],
name="%(app_label)s_%(class)s_unique_name_version",
)
]
name = models.CharField(max_length=255)
version = models.PositiveIntegerField()
For each book, I need to get all higher versions:
book_to_future_book_mappings = {}
for book in Book.objects.all():
higher_versions = Book.objects.filter(name=name, version__gt=book.version).order_by("version")
book_to_future_book_mappings[book] = list(higher_versions)
The problem with this approach is that it doesn’t scale well. It makes a separate query for every book, which is very slow if I have millions of books in my database.
Is there anyway I can get all of this data in just one or a few queries? Perhaps I can use RawSQL if needed?
Note that this is all operating in PostgreSQL, which means that the subquery can’t return the actual Book objects within the aggregate. I think about the best you can do is return the list of IDs of the related books, which then you can reference through your book_futures list.
Well, I had a couple minutes to play with this and it doesn’t work. (Nor did any of the half-dozen basic variations of this I tried.)
This does seem to be a PostgreSQL issue in that I tried directly running these queries within psql, and that is where the errors I’m seeing are coming from. So Django and the ORM aren’t having a problem with it, it’s an invalid query for PostgreSQL.
(I’m pointing this out to highlight that going to raw SQL is not going to help. Again, the issue is not the ORM.)
If you do come up with a raw query that does what you need, we can try translating that back to the ORM.
Here’s what I came up with on how to do this in python with a single query
from itertools import chain
all_books = Book.objects.all().order_by('version')
book_mapping = defaultdict([])
for book in all_books:
for version, books_list in book_mapping[book.name].items():
book_mapping[book.name][version].append(book)
book_mapping[book.name].append({book.version: [book]})
return chain.from_iterable(chain.from_iterable(books_list.values()) for books_list in book_mapping.values())
The trouble here will be making sure the ordering on version is actually ascending as the algorithm assumes that everything is in order, hence why it can append the book to the other versions for the book name in book_mapping.
Keep in mind that as your Book table grows, this will eat up more and more memory.
The key is using the PostgreSQL Array function, not ArrayAgg. Since the PostgreSQL interface doesn’t define usage of the Array function, we have to do our own.
What I came up with was:
class MakeArray(Func):
function = 'ARRAY'
which then allows for a solution very close to my original idea: