Is there any way to avoid using exclude()?

Hello! I’m just starting using Django, the task is to create large DB (millions of records, largest table has about 70 columns). Current problem is filtering data - getting values using 4 conditions, one is not equal. As I understand, I have to use .exclude(condition=value), but it makes query work more than 10 minutes instead of 3-4 secs. I’ve analyzed generated SQL query and found that instead of simple “AND field != value” it uses embedded query, which selects all of these fields equal to value, then filtering them using NOT…
Models for example:
class Model1(models.Model):
name = models.TextField()

class Model2(models.Model):
    model1 = models.ForeignKey(Model1, on_delete=models.CASCADE)
    name = models.TextField()


class Model3(models.Model):
    name = models.TextField()
    model2 = models.ForeignKey(Model2, on_delete=models.CASCADE)

Generated query:
SELECT "db_model1"."id", "db_model1"."name" FROM "db_model1" INNER JOIN "db_model2" ON ("db_model1"."id" = "db_model2"."model1_id") WHERE ("db_model2"."name" = m2 AND NOT ("db_model1"."id" IN (SELECT U1."model1_id" FROM "db_model2" U1 INNER JOIN "db_model3" U2 ON (U1."id" = U2."model2_id") WHERE U2."name" = m3)))
instead of simply:
SELECT "db_model1"."id", "db_model1"."name" FROM "db_model1" INNER JOIN "db_model2" ON ("db_model1"."id" = "db_model2"."model1_id") INNER JOIN "db_model3" ON ("db_model2"."id" = "db_model3"."model2_id") WHERE "db_model2"."name" = m2 AND "db_model3"."name" != m3
Is there any way to solve this problem otherwise than using raw SQL for all such queries?
Thanks in advance, sorry for my english))

You can define this as a Q clause and then negate that clause in your filter - something like:

Model.objects.filter(~Q(name="ABC"))

Thanks, forgot to say - tried that, it also generates the same slow subquery…

In this case, then I’d be inclined to believe there’s some aspect of this not being shown in your examples.

From a testing environment where I run manage.py shell_plus --print-sql I have:

In [2]: BTitle.objects.filter(Q(title='Abc'))
Out[2]: SELECT "bt_btitle"."id",
       "bt_btitle"."title"
  FROM "bt_btitle"
 WHERE "bt_btitle"."title" = 'Abc'
 LIMIT 21

Execution time: 0.000937s [Database: default]
<QuerySet []>

In [3]: BTitle.objects.filter(~Q(title='Abc'))
Out[3]: SELECT "bt_btitle"."id",
       "bt_btitle"."title"
  FROM "bt_btitle"
 WHERE NOT ("bt_btitle"."title" = 'Abc')
 LIMIT 21

Execution time: 0.000905s [Database: default]
<QuerySet []>

This demonstrates that the negation will generate the not clause on a Q object.

If you would share the actual models and queries being used and tested, we might be able to identify the issue.

It seems like you could use an Exists subquery here.

from django.db.models import OuterRef, Exists

Model1.objects.filter(
    ~Exists(Model3.objects.filter(
        model2__model1_id=OuterRef('id'),
    ))
)

This may not help your queries perform better though. That may require using explain analyze if it’s PostgreSQL or the equivalent for whichever DB you are using.

Edit:
You’ll need to modify that Model3 queryset to do the filtering to exclude the instances you want excluded still. This was just an example of the concept.

So it’s look like this behavior is applied to related fields… Sorry, forgot to share the filter expression for example models:

print(Model1.objects.filter(model2__name='m2').exclude(model2__model3__name='m3').query)

I don’t think there is any reason to share real database structure - the difference will be mostly in number of fields (70+ columns in ‘model3’ table) and their names, and the number of .filter() in the expression itself)) The relation structure is the same, as in these simple models I’ve wrote for example.

Thank you, will read about the OuterRef and its usage, didn’t even hear of it before))

There are two possible different interpretations that you might be looking for here, and they would have two different implementations.

You might be looking for:

I want all instances of Model1, which have at least one instance of Model2 referring to it with a name = ‘m2’, except for those instances where there’s any Model3 referring to Model2 having a name = ‘m3’.

or, the other logical interpretation is:

I want all instances of Model1, which have at least one instance of Model2 referring to it with a name = ‘m2’, and having an instance where there’s a Model3 referring to Model2 having a name other than ‘m3’.

The ORM query you’ve generated addresses the first requirement, where the query you’ve written appears to address the second. They are not the same query. (Most notably, the ORM query will return results where there are no Model3s referring to a Model2 - the exclude will yield a null set for exclusion, while your hand-crafted query will not yield results in that case because of the inner join requiring entities existing on both sides of the join.)

1 Like

Thank you for the explanation. In fact, any model1 always linked with one model2 and one or more model3’s linked to the same model2: model1 is unique case data, model2 and model3 - common data that present in many cases (model3 are ‘properties’ of model2). So the goal is to get records list by values from model1, model2 and model3, filters for m1 and m2 uses equal and lte, for model3 - equal, isin and not equal.
As I understand (will clarify tomorrow), model3 records, that doesn’t fit the filter, shouldn’t be selected, so for now looks like it’s really better to use SQL and select all 3 models as joined rows because ORM will anyway get all values of model3 for every model2 and we will need to filter wrong values manually before displaying them…
Thanks again for the answers!

This quirk also surprised me, so even though I’m late to the party, I’ve found a solution how to overcome this issue and avoid highly inefficient subquery that with big tables can be slow as hell.

Instead of this

Model.objects.exclude(submodels__name=‘Not me’)

That roughly translates to:

SELECT * FROM “model” WHERE NOT (“model”.“id” IN (SELECT U1.“model_id” FROM “submodel” U1 WHERE U1.“name” = %s))

Use this:

Model.objects.annotate(_submodel_name=F(‘submodels__name’)).exclude(_submodel_name=‘Not me’, _source_name__isnull=False)

That translates to:

SELECT * FROM “model” LEFT OUTER JOIN “submodel” ON (“model”.“id” = “submodel”.“model_id”) WHERE NOT (“submodel”.“name” = %s AND “submodel”.“name” IS NOT NULL)

This change showed drastic difference in our production environment, give it a try.