LEFT JOIN... WHERE IS NULL clause

mariostg · April 12, 2023, 3:21pm

I know that it is strongly encouraged to use foreign key to execute such equivalent query, but in this case, I cannot.

I have an EncumbranceImport table that gets data on a daily basis through a csv file and its content has to be cleaned up before further processing. One of the cleaning process it to verify if there are any costcenter in the EncumbranceImport that do not currently exists in the CostCenters table. IF so, a list has to be made and presented to the user for further action.

I cooked up this function but clearly, the query generated does not look at all what would be expected. Tried with ~Q too. Being reasonably familiar with SQL, I wonder if I should not simply use raw query. Or that will be another source of problems?

    def missing_costcenters(self):
        qs_cc = EncumbranceImport.objects.all().values_list("costcenter")
        cc_set = set(map(lambda e: e[0], qs_cc))
        q_list = [Q(costcenter=cc) for cc in cc_set]
        print(q_list)
        qset = Costcenters.objects.filter(reduce(operator.or_, q_list))
        print(qset)
        print(qset.query)

Print output:

[<Q: (AND: ('costcenter', '8486C2'))>, <Q: (AND: ('costcenter', '8486C1'))>, <Q: (AND: ('costcenter', '8486B1'))>]
<QuerySet [<Costcenters: 8486C1 - KITCHEN>]>
SELECT "costcenters_costcenters"."id", "costcenters_costcenters"."costcenter", "costcenters_costcenters"."shortname", "costcenters_costcenters"."fund_id", "costcenters_costcenters"."source_id", "costcenters_costcenters"."isforecastable", "costcenters_costcenters"."isupdatable", "costcenters_costcenters"."note", "costcenters_costcenters"."parent_id" FROM "costcenters_costcenters" WHERE ("costcenters_costcenters"."costcenter" = 8486C2 OR "costcenters_costcenters"."costcenter" = 8486C1 OR "costcenters_costcenters"."costcenter" = 8486B1)

KenWhitesell · April 12, 2023, 5:14pm

Let me see if I understand you correctly.

You’re reading a file to be imported into EncumbranceImport.

You want to find all costcenter values in EncumbranceImport that are not in the CostCenters table.

If I am correct, do you need any other information from the EncumbranceImport data beyond those missing Costcenter objects?

What is the size of this Costcenter model? (Roughly how many rows?) If it’s not too huge (> 1M or so) I’d be looking at something like this:

import_cc = set(EncumbranceImport.objects.values_list("costcenter", flat=True))
costcenters = set(Costcenters.objects.values_list('pk', flat=True))
missing = import_cc.difference(costcenters)

mariostg · April 12, 2023, 6:48pm

Yes, you do understand exactly the situation.
In the EncumbranceImport, there is more to be done. What is done for costcenters has to be done for funds too. If unknown cost centers are found, then the process must stop and action taken before doing anything else. There are also defaults values the need to be set in other tables according to some other fields in EncumbranceImport. Exemple, if balance reaches 0, then forecast must be set to working plan. etc.

CostCenter model is les than 1000 lines, EncumbranceImport not more than 50k lines.

While waiting for someone to answer, I came up with this

        qs_cc_import = EncumbranceImport.objects.all().values_list("costcenter")
        cc_set_import = set(map(lambda e: e[0], qs_cc_import))
        qs_cc = Costcenters.objects.all().values_list("costcenter")
        cc_set = set(map(lambda e: e[0], qs_cc))
        return cc_set_import.difference(cc_set)

I don’t know what flat=True is but looks promising.
Thanks again.

KenWhitesell · April 12, 2023, 7:03pm

Side note: Beyond the point that flat=True makes either of these unnecessary, there’s a better alternative this:

would be this:

cc_set_import = {
   e.costcenter for e in EncumbranceImport.objects.distinct('costcenter').only('costcenter')
}

mariostg · April 19, 2023, 1:42pm

Good side note. I had seen object.distinct, but sqlite3 does not support this capability. It is a bit annoying because when I am in production with postgresql it wont be an issue.

KenWhitesell · April 19, 2023, 4:28pm

The distinct function in this case is really just a “micro-optimization “. Removing it does not affect the fundamental functionality of the statement.

Topic		Replies	Views
Django ORM: equivalent of SQL `NOT IN` query Using Django	5	1589	March 14, 2020
Left join with additional condition Using the ORM	4	1704	February 22, 2024
Database Query Using Django	5	922	February 7, 2020
Question about Queries Using Django	4	788	March 20, 2020
How to optimize this query without raw? Using the ORM	4	689	August 19, 2022

LEFT JOIN... WHERE IS NULL clause

Related topics