Left join with additional condition

Let us assume I have these two models:

class Project(Model):
    # ...

class ProjectTask(Model):
    project = ForeignKey(Project)
    canceled = BooleanField()

I need to model this (legacy) query using the Django ORM:

SELECT * FROM project JOIN project_task ON project_task.project_id = project.id AND NOT project_task.canceled

I gave this a try like this:

Project.objects.filter(projecttask__canceled=False)

However, this is the query I get:

SELECT
    "app"."project"."id",
    -- other fields
FROM
    "app"."project"
    LEFT OUTER JOIN "app"."project_task" ON (
        "app"."project"."id" = "app"."project_task"."project_id"
    )
WHERE
        NOT "app"."project_task"."canceled"

Which yields different results.

Is there a way, at ORM level, to move the extra condition inside of the join clause as opposed to the here clause?

There are two possible interpretations and types of results for this type of situation.

It seems to me (and I could be wrong here) that the query you have written is going to give you (possibly) multiple instances of the same project instance - one for each instance of project_task where canceled == False.

If that’s what you want, then you need to “invert” your thinking on the query.
e.g.:
project_tasks = ProjectTask.objects.filter(canceled=False) will give you all the ProjectTask instances that aren’t cancelled, and access to their related Project.
(You can improve the performance of this query by adding a select_related('project') clause to the query, but it is not required.)

If this isn’t what you’re looking for, then please explain how the ORM query you have shown here generates results different from what is required.

Apparently, FilteredRelation was what I was looking for.

Project.objects.annotate(projecttask_rel=FilteredRelation("projecttask", Q(cancelled=False)))

then, any fields I would access using projecttask need to be accessed using projecttask_rel:

Project.objects.annotate(projecttask_rel=FilteredRelation("projecttask", Q(projectttask__cancelled=False)), task_id=F("projecttask_rel__id"))

This generates a LEFT JOIN statement with the condition inside of it, as opposed to being in the WHERE clause.

While working on the (badly designed) legacy db that brought me here with the original question of this thread, I noticed another issue that I wasn’t sure how to handle with the migration to Django.

Apparently (for reasons that are beyond me), there are some tables that have a fk to a certain table, but which is also used to JOIN other tables in some queries.

For example: let’s say there’s table_a, table_b, and table_c.
table_c has a field, a_id, which is a foreign key to table_a.

However, there are also some queries which are being performed on the db that look something like:

select table_b.*, table_c.some_field from table_b left join table_c on table_c.a_id = table_b.id

Of course, it becomes pretty hard to model this query with the ORM if the models look like this:

class TableA(Model):
    # ...

    class Meta:
        managed = False
        db_table = '"schema1"."table_a"'

class TableB(Model):
    # ...

    class Meta:
        managed = False
        db_table = '"schema1"."table_b"'

class TableC(Model):
    a = ForeignKey(TableA)
    some_field = ...
    # ...

    class Meta:
        managed = False
        db_table = '"schema1"."table_c"'

Have you ever found yourself in such a situation?

This is how I was able to work around this: I created a proxy model for TableC, which looks like this:

class TableCWithReferenceToTableB(TableC):
    _a = ForeignKey(TableB, db_constraint=False, db_column="a_id")
    
    class Meta:
        managed = False
        proxy = True
        db_table = '"schema1"."table_c"'

Now, I am able to use the ORM like this:

TableB.objects.filter(tablecwithreferencetotableb__some_field="some_condition")

Which generates a query similar to

select * from table_b inner join table_c on table_c.a_id = table_c.id where table_c.some_field = "some_condition"

How does this solution look? Do you think there’s a better way to go around this design flaw in the db?

I have encountered a situation that resembles the one you previously discussed. In my Django project, I have models for Author and Book as follows:

from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100, primary_key=True)
    date = models.CharField()
    other_field = models.CharField()

class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    publication_date = models.CharField()

I’m aiming to construct an SQL query akin to the following:

SELECT book.title, book.author, author.other_field AS other_field
FROM book
LEFT OUTER JOIN author ON (book.author = author.name AND book.publication_date = author.date)

However, I’m uncertain whether achieving this using Django ORM is feasible, or if resorting to a raw query is necessary. Despite conducting extensive research, I haven’t found a solution yet. I initially explored the possibility of using FilteredRelation, but it became apparent that it’s not suitable for this particular scenario.