Hey folks,
I’m in the process of optimizing some filter queries that join span several tables and noticed a repeating pattern of including “unnecessary” tables among joins. I found no obvious way other than using raw()
, but I’d very much like to keep using filter()
if possible. Does anyone better versed in Django queries have suggestions? Thanks!
class Person(models.Model):
pass
class Dog(models.Model):
owner = models.ForeignKey(Person, on_delete=models.DO_NOTHING)
class Cat(models.Model):
owner = models.ForeignKey(Person, on_delete=models.DO_NOTHING)
person = Person.objects.create()
someone_else = Person.objects.create()
cat = Cat.objects.create(owner=person)
cat_of_someone_else = Cat.objects.create(owner=someone_else)
dog = Dog.objects.create(owner=person)
dog_of_someone_else = Dog.objects.create(owner=someone_else)
# Let's find dogs owned by the same owner as the owner of this cat
self.assertEqual(
[dog],
list(
Dog.objects.filter(
owner__cat=cat
)
)
)
# Generates a JOIN that includes owners
#
# SELECT "my_app_dog"."id", "my_app_dog"."owner_id"
# FROM "my_app_dog"
# INNER JOIN "my_app_person"
# ON ("my_app_dog"."owner_id" = "my_app_person"."id")
# INNER JOIN "my_app_cat"
# ON ("my_app_person"."id" = "my_app_cat"."owner_id")
# WHERE "my_app_cat"."id" = 1
# But the same can be achieved without joining owners
self.assertEqual(
[dog],
list(
Dog.objects.raw("""
SELECT *
FROM my_app_dog
INNER JOIN my_app_cat
ON my_app_dog.owner_id = my_app_cat.owner_id
WHERE
my_app_cat.id = %s
""", [cat.id])
)
)
# Is there a way to achieve this without raw?
# Can I set up a model relation that allows me to do
# something like this?
# Dog.objects.filter(
# cats_by_the_same_owner__id=cat.id
# )
Hi,
I think you can try:
dog = Dog.objects.filter(onwer_id=cat.owner_id)
1 Like
Thanks, I may have oversimplified the example. The actual filter is a long chain of JOIN, in the middle of which dogs are joined with cats via owners, like this:
.filter(
some__more__stuff__dog__owner__cat__even_more___relations__id=id
)
…which I would like to turn into something that does not result in JOIN person
:
.filter(
some__more__stuff__dog__cats_by_the_same_owner__cat__even_more___relations__id=id
)
…or back to my original example refined a bit: given a cat_id
, I want to get dogs without JOIN person
:
Dog.objects.filter(
owner__cat__id=cat_id
)
To directly answer your question, no. The Django ORM does not provide a facility to do joins between tables based on a non-related field.
From a practical perspective, have you done an explain analyze
on both of your queries to actually prove that one is better than the other? I wouldn’t second-guess the query planner without some solid evidence that the query being executed is actually using all the resources you think it’s using.
I have done some explain analyze
and although this kind of optimization does not usually come first, below are benchmarks from a real query I’m working on (not the dogs/cats example). The original Django-generated query joins 7 tables, the optimized raw query only joins 4:
Before:
Explain analyze in production:
Planning time: 7-10ms
Execution time: 25-70ms, avg ~35ms
pgbench with 1000 repeats, non-production system with a clone of production data:
latency average = 15.315 ms
tps = 65.295739 (without initial connection time)
After:
Explain analyze in production:
Planning time: 2ms
Execution time: 18-40ms, avg ~22ms
pgbench with 1000 repeats, non-production system with a clone of production data:
latency average = 2.580 ms
tps = 387.647384 (without initial connection time)
That seems like a significant gain. The results might of course still be different in the wild and if there was an easy way of keeping a dozen lines of Python code and not turning it into ~80 or more lines of raw SQL I’d go for it without thinking much (and would come to a final conclusion after having collected performance monitoring data).
As there seems to be no Django-way to doing this, I might look elsewhere for optimization opportunities.
Thanks for the insights everyone!
1 Like