Avoiding superflous join

Hi,

I have a following problem which i don’t know how to solve

i have 3 models:

class A(models.Model):
   ...
  b  = models.OneToOneField(B)

class B(models.Model):
  ...
  m2m = models.ManyToMany(D,  through=C)

class C:
   ...
   b = models.ForeignKey(B)
   d = models.ForeignKey(D) 

class D:
   foo = models.IntegerField() # just some field it doesn't really matter

what im looking for is a query
select ... from A join C on (A.b_id = C.b_id) join D on (C.b_id = D.b_id) where D.foo = 1

unfortunately django’s typical .filter(b__d__foo=1) produces
select ... from A join B on (A.b_id = b.id) join B on (B.id = C.b_id) join D on (C.b_id = D.b_id) where D.foo = 1

which is useless is there any work around it?

What about using an Exists in which you can control the comparisons yourself? There’s also FilteredRelation, but I don’t think that’s going to help here.

from django.db.models import Exists, OuterRef

A.objects.filter(
    Exists(
        D.objects.filter(
            foo=1,
            c_set__b_id=OuterRef('b_id'),
        )
    )
)

Yes, that seems to work quite fine - its a better version what im using currently and trying to replace with joins - thanks for that!

Once caveat tho, there is no way to express this without explicit through right?
Also while this works i guess no way to express what i need with pure joins?

What is your real concern here?

When you’re working with the Django ORM, you are not logically working with tables and rows and joins - you’re working with Python objects that are used to create SQL statements being sent to the database engine. It’s an abstraction layer created precisely to allow you to ignore those underlying details in most cases. (Admittedly, it’s not a “perfect” abstraction, and there are limitations, as is the case with every ORM ever created.)

If performance is truly an issue (and not just an abstract goal), you would need to examine the output of an explain analyze of the queries being created on live data to determine if one approach or the other is truly better. Note that you can’t simply compare the SQL, because you can’t determine by visual inspection how the database engine is going to execute that SQL - and that can even be different depending upon the database engine, version, size of the tables involved, indexes, and possibly even tablespaces.

The SQL itself isn’t even really “code”. It’s more like a description of the data needing to be retrieved. The database engine is free to interpret, reinterpret, or even ignore parts of the SQL when creating its search path.

1 Like