Supporting custom JOIN query `extra(joins=['JOIN ....'])`

I’ve been using Django with my company’s project for about 3 years now, and have often wanted to create more advanced queries, without dipping into raw SQL

They often have to do with legacy decisions on model layout

A simplified example in my app:

Customer:
    name
    legacy_id: TEXT (indexed)  # ex: `"1234"`
    user = ForeignKey

User:
    username
    legacy_id: INT (indexed) # ex: `1234`

I would want to query Customer__user, not only via the user_id ForeignKey, but via the legacy_id.


I could see extra being extended to have a joins kwarg where I can inject custom SQL.

It is very similar to the tables kwarg

c = Customer.objects.extra(
    selects={'username': '"u"."username"'}, 
    # NOTE: casts `u.legacy` to `text` prior to joining
    joins=['LEFT JOIN "users" "u" ON "u"."legacy_id"::text = "customers"."legacy_id"'],
)

print(c.username)

from the docs:

This is an old API that we aim to deprecate at some point in the future

I personally often need to do something that is easy in SQL, but less intuitive in django. I wonder why the django community seems to want to move away from these.

I began my career in Rails, which had a very intuitive method for this, it would be nice to see Django support it as well.

Author.joins("INNER JOIN books ON books.author_id = authors.id AND books.out_of_print = FALSE")

I’m not clear on what you’re trying to accomplish here that can’t be handled through the ORM. Do you have a specific example you can post? (You make a reference to a phone_number, but your simplified example doesn’t include one.)

I grant that there are some very complex queries that can’t be easily handled through the ORM, or situations where the ORM isn’t a good fit, and for that there will always be the ability to write raw SQL. It is generally acknowledged that the ORM doesn’t resolve every issue or is a complete 100% replacement for SQL. But, I think the ORM may go a lot farther than you might think in what it can do.

Ken

Hi Ken,

Thanks for responding, I edited my original post – the key part being:

    # NOTE: casts `u.legacy` to `text` prior to joining
    joins=['LEFT JOIN "users" "u" ON "u"."legacy_id"::text = "customers"."legacy_id"'],

(removed comment about phone_number)

Yep, you’re right.

Depending upon a lot of things that you probably don’t want to bother with describing here, I see your point with this - and this does fit into the category of things that the Django ORM doesn’t handle well. (That general category being databases created outside of Django and/or databases not well designed by a Django perspective.)

If this is something to be used frequently or exists in sufficiently large quantities where performance is going to be an issue, I’d still consider either adding the appropriate column to the many side of the relationship or adding a join-table based upon the legacy_id field.
(Even though both those legacy_id fields are indexed, because of the need to cast the user legacy_id field to text, postgresql needs to perform that operation on every row before doing the lookup on the customer table, creating some degree of overhead. The amount and degree of overhead is going to be roughly proportional to the size of the tables. At less than 1000 rows, it’s probably negligible.)

Assuming Join.as_sql was adjusted to properly deal with mismatching column type you should be able to achieve what you’re after by using ForeignKey(to_field, from_field).

class User(models.Model):
    legacy_id = models.IntegerField(unique=True)

class Customer(models.Model):
    legacy_id = models.TextField(db_index=True)
    user = models.ForeignKey(
        User, from_field='legacy_id', to_field='legacy_id',
    )

Another solution could be to add support for transforms to ForeignObject.from_fields/.to_fields so you could do something like

user = models.ForeignKey(
    User, from_field='legacy_id', to_field='legacy_id__text',
)

Note that unless you have a functional index on users.legacy_id::text or customers.legacy_id::int I doubt that indices will be able to be used on JOINing.

In the mean time a solution could be to teach PostgreSQL about how to do an implicit cast from int -> text via CREATE CAST (integer AS text) WITH INOUT AS IMPLICIT.