Avoiding superflous join

mak-zascaler · October 25, 2024, 9:42pm

Hi,

I have a following problem which i don’t know how to solve

i have 3 models:

class A(models.Model):
   ...
  b  = models.OneToOneField(B)

class B(models.Model):
  ...
  m2m = models.ManyToMany(D,  through=C)

class C:
   ...
   b = models.ForeignKey(B)
   d = models.ForeignKey(D) 

class D:
   foo = models.IntegerField() # just some field it doesn't really matter

what im looking for is a query
select ... from A join C on (A.b_id = C.b_id) join D on (C.b_id = D.b_id) where D.foo = 1

unfortunately django’s typical .filter(b__d__foo=1) produces
select ... from A join B on (A.b_id = b.id) join B on (B.id = C.b_id) join D on (C.b_id = D.b_id) where D.foo = 1

which is useless is there any work around it?

CodenameTim · October 25, 2024, 11:36pm

What about using an Exists in which you can control the comparisons yourself? There’s also FilteredRelation, but I don’t think that’s going to help here.

from django.db.models import Exists, OuterRef

A.objects.filter(
    Exists(
        D.objects.filter(
            foo=1,
            c_set__b_id=OuterRef('b_id'),
        )
    )
)

mak-zascaler · October 28, 2024, 12:34pm

Yes, that seems to work quite fine - its a better version what im using currently and trying to replace with joins - thanks for that!

Once caveat tho, there is no way to express this without explicit through right?
Also while this works i guess no way to express what i need with pure joins?

KenWhitesell · October 28, 2024, 12:50pm

What is your real concern here?

When you’re working with the Django ORM, you are not logically working with tables and rows and joins - you’re working with Python objects that are used to create SQL statements being sent to the database engine. It’s an abstraction layer created precisely to allow you to ignore those underlying details in most cases. (Admittedly, it’s not a “perfect” abstraction, and there are limitations, as is the case with every ORM ever created.)

If performance is truly an issue (and not just an abstract goal), you would need to examine the output of an explain analyze of the queries being created on live data to determine if one approach or the other is truly better. Note that you can’t simply compare the SQL, because you can’t determine by visual inspection how the database engine is going to execute that SQL - and that can even be different depending upon the database engine, version, size of the tables involved, indexes, and possibly even tablespaces.

The SQL itself isn’t even really “code”. It’s more like a description of the data needing to be retrieved. The database engine is free to interpret, reinterpret, or even ignore parts of the SQL when creating its search path.

Topic		Replies	Views
How to optimize this query without raw? Using the ORM	4	680	August 19, 2022
Left join with additional condition Using the ORM	4	1505	February 22, 2024
Cannot access foreign key ID in nested join Using the ORM	4	1001	April 30, 2024
ORM filtering with nested joins Using the ORM	4	2024	August 14, 2022
Simple(ish) SQL to ORM query Using the ORM	2	437	March 23, 2023

Avoiding superflous join

Related topics