Problem filtering multiple values in ManyToMany relationship

Hi!

I made a Django app where the user can create an arbitrary filter using buttons, selects and inputs. The filter is represented in JSON format and later converted to a Django ORM query. The conversion code is based on the django.db.models.Q object, using & and | to combine the multiple filter rules to a single Q object that is passed to a Model.objects.filter.

It has been working for more than a year, but yesterday we found a case where it doesn’t: when we need to filter by multiple objects in the same ManyToMany relationship.

Using the example from the docs, with Publication and Article, I’m doing something equivalent to:

q = Q(publications__id=1) & Q(publications__id=2)
Article.objects.filter(q)

And the result is always None. I think it’s trying to find a publication with id==1==2, what is impossible.

The common workaround is to use multiple .filter calls:

Article.objects.filter(publications__id=1).filter(publications__id=2)

This also works with my models, but doesn’t seem a viable solution for my case, since mine is based on the Q object. Maybe I could have a special handling for the outer most filter rules, to use multiple .filter calls instead of combining Q objects, but these type of rules (ManyToMany relationship filtering) may appear in more deep levels of the JSON filter (inside other AND and OR rules). I would have to rewrite the entire conversion code to use only .filter and .exclude, instead of Q object, I’m not even sure it is possible…

The questions are:

  1. Is this an expected behavior? This ticket seems to state that yes. (but I’m not very convinced)
  2. How to reproduce the multiple .filter calls behavior using only Q objects?

Thanks for the attention!

I confirm that this is the expected behaviour, please refer to this part of the documentation about filtering against multi-valued relationships.

Is this an expected behavior? This ticket seems to state that yes. (but I’m not very convinced)

As stated in the documentation dealing with multi-valued relationships filtering raises the question of whether to require each attribute to coincide in the same related object.

Because Django must support both mode of filtering, that is either each attributes on the same related objects match or not, it has to provide an API for both.

I’ll state that in most reports from users confused about this differentiation the stated expectations are the opposite of yours; folks want the single filter call behaviour from multiple filter calls. This supports the current state of things where both have to exist to force the user to lift the ambiguity that arises from filtering multi-valued relationships.

How to reproduce the multiple .filter calls behavior using only Q objects?

AFAIK there is no way to do that without calling internal Querset methods. When you make a call to filter all spanned JOINs are reused, when you make distinct filter calls only single-valued relationship JOINs are.

Thanks for your response!

I see your point.

I think that, for this type of case (filter multiple “equalities” for the same field), the current behavior doesn’t make sense, because no object will ever have different values for the same field at the same time. But I also understand that this is a specific case, and that the behavior of the filtering should not change for a specific case.

Given that, any suggestions about what should I do?

  1. Is using the internal Querset methods too dangerous? Could you point me to which method should I use there?
  2. Currently I use recursion to create the Q object from the JSON, starting from the more deep rules. Would it be possible to adapt it to use the .filter way?

EDIT: Also, if the .filter API supports the two behavior, shouldn’t the Q object also support both of them?

Agreed. I understand that in your particular example the criterion are mutually exclusive which are guiding your expectations but it’s not something the ORM can infer.

Is using the internal Querset methods too dangerous? Could you point me to which method should I use there?

I wouldn’t personally use them as they can change anytime and are undocumented and I don’t feel comfortable sharing a method that could turn into an anti-pattern going forward. If you’re interested look into reused_aliases and how it’s passed around.

Also, if the .filter API supports the two behavior, shouldn’t the Q object also support both of them?

I think this is a mis-categorisation of the problem, I feel like it has little to do with the usage of Q which is only a mean to call into filter. In other words, the filter method has a way to express both joint and disjoint multi-valued filtering and Q is just a way to manipulate criterion that is more powerful than dict.

Currently I use recursion to create the Q object from the JSON, starting from the more deep rules. Would it be possible to adapt it to use the .filter way?

Hard to tell without seeing how you are actually doing it but I assume you could build a list[Q] instead of a single Q that you pass to filter and then do

criteria: list[Q] = get_filter_criteria(request.data)
qs = Article.objects.all()
for criterion in criteria:
    qs = qs.filter(criterion)
return qs

Thanks again for your reply!

The problem with this strategy, if I understand it right, is the one I tried to express in my first message: “Maybe I could have a special handling for the outer most filter rules, to use multiple .filter calls instead of combining Q objects” (that’s what you seem to suggest) “but these type of rules (ManyToMany relationship filtering) may appear in more deep levels of the JSON filter (inside other AND and OR rules).”

Example:

{
    "and": [
        {
            "and": [
                {
                    "equal": {
                        "publications": 1
                    }
                },
                {
                    "equal": {
                        "publications": 2
                    }
                }
            ]
        }
		<more rules>
    ]
}

Even if I handle the outer “and” the way you suggest, wouldn’t the inner “and” still have the same problem?

A stripped down version of my current code (removed comparison operations and error handling):

import functools
import operator
from django.db.models import Q


class FilterParser:
    def __init__(self):
        self.operations = {
            'and': lambda args: functools.reduce(operator.and_, map(self.parse, args)),
            'or': lambda args: functools.reduce(operator.or_, map(self.parse, args)),
            'not': lambda query: ~self.parse(query),
        }

    def parse(self, rule: dict) -> Q:
        operation, arguments = next(iter(rule.items()))
        if fn := self.operations.get(operation):
            return fn(arguments)

It resolves the recursion from the deepmost rules to the outermost rules.

If every time the code finds an “and” it merges the Qs with a .filter loop, creating a queryset object, in the next step it would have to merge multiple querysets, the same way it currently merges multiple Qs.

But… I did some research now, and it is possible, right? We can use & and | between querysets objects. I guess I never did it. Did a test now and the & between querysets seem to behave like multiple .filter calls, exactly what I need. So maybe I don’t even need the Q objects, and can use only querysets.

So, changing my comparison operations to return querysets, I would have something equivalent to:

q1 = Article.objects.filter(publications__id=1)
q2 = Article.objects.filter(publications__id=2)
queryset = Article.objects.all()
queryset &= q1
queryset &= q2

What about the “not”? We can’t pass a queryset to .exclude, right? Neither do ~queryset.

Maybe a flag passed down by recursion when it finds a “not”, so inner comparisons operations use .exclude instead of .filter. But I’m still not sure if it will work, or if there is a better way…

EDIT: The flag should probably be inverted each time a “not” is found in the lower levels. :thinking:

EDIT2: And the flag also switches “and” to “or” and “or” to “and”…