Filtering model instance's m2m field to exactly match another m2m field or list

Hi everyone. Second day in a row I’m struggling trying to filter instance by M2M field to match exactly another model M2M field contains similar data.
Let’s say I have models:

class Options(models.Model):
    name = model.CharField() # opt1, opt2, opt3 ... opt5

class Foo(models.Model):
    options = models.ManyToMany(Options)
    size = models.IntgerField()

class Bar(models.Model):
    options = models.ManyToMany(Options)
    size = models.IntgerField()

So, now I need filter() Foo instance to get all objects which contain exactly the same Option that Bar instance has.
For example: If Bar has options [‘opt2’, ‘opt3’, ‘opt5’] I want to get all Foo instance which have exactly 3 options above [‘opt2’, ‘opt3’, ‘opt5’]. Not one of them, not two of them.
I managed to get close with code:

bars = Bar.objects.all()
for bar in bars:
    bar_options = [option for option in bar.options.all()]

    foo_match = Foo.objects.filter(size = bar.size)
    if bar_options:
        foo_match = foo_match.filter(options__in=bar_options)\
                               .annotate(num_options=Count("options"))\
                               .filter(num_options=len(bar_options))

In most of the cases it works however in some iterations of Bar I have matched the same Foo instances even they have more options then Bar has.
Would appreciate any help.

Hello @DoOrQuit, could you share the database backend you are using?

This is not easy to achieve but depending on the backend we might be able to take shorcuts.

On Postgres the following should work in a constant number of queries (2).

# Keep a reference to foo and bar options with outer references so they can be reused.
foo_options = Foo.options.through.objects.filter(
    foo_id=OuterRef("pk")
).values("option_id")
bar_options = Bar.options.through.objects.filter(
    bar_id=OuterRef(OuterRef("pk"))
).values("option_id")
bars = Bar.objects.annotate(
    # Accumulate all the foo ids
    matching_options_foo_pks=ArraySubquery(
        Foo.objects.filter(
            # All the foo with no set difference between foo and bar options.
            ~Exists(
                foo_options.difference(bar_options)
            ),
            # ... and not difference between bar and foo options (both directions)
            ~Exists(
                bar_options.difference(foo_options)
            ),
        ).values("pk")
    )
)
# Retrieve all matching foos across all bars in a single query a bit
# like prefetch related does.
matching_foos = Foo.objects.in_bulk(
    set(itertools.chain.from_iterable(
        bar.matching_options_foo_pks for bar in bars
    ))
)
for bar in bars:
    foo_matches = [
        matching_foos[foo_pk]
        for foo_pk in bar.matching_options_foo_pks
    ]

This should generate SQL of the form

SELECT
    bar.*,
   ARRAY(
       SELECT foo.id
       FROM foo
       WHERE (
           NOT EXISTS (
               SELECT option_id FROM foo_options WHERE foo_options.foo_id = foo.id
               EXCEPT
               SELECT option_id FROM bar_options WHERE bar_options.bar_id = bar.id
           )
           AND NOT EXISTS (
               SELECT option_id FROM bar_options WHERE bar_options.bar_id = bar.id
               EXCEPT
               SELECT option_id FROM foo_options WHERE foo_options.foo_id = foo.id
           )
       )
   ) matching_options_foo_pks
FROM bar 

Thank you for a response. Your guess regarding DB backend was right.
However I’m experiencing some exceptions in code :

matching_foos = Foo.objects.in_bulk(
    set(itertools.chain.from_iterable(
        bar.matching_options_foo_pks for bar in bars
    ))
)

The SQL Exceptions:

django.db.utils.ProgrammingError: invalid reference to FROM-clause entry for table "app_foo"
LINE 1: ...app_foo_options" U0 WHERE U0."foo_id" = ("app...
                                                             ^
HINT:  Perhaps you meant to reference the table alias "v0".

Probably some naming issue but cannot resolve it yet.

So it seems to be a bug in the ORM when subqueries and composed queries are used together, the query works with the following patch to Django

diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index b33652dac5..34f9322d02 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -989,6 +989,17 @@ def change_aliases(self, change_map):
                 if alias == old_alias:
                     table_aliases[pos] = new_alias
                     break
+
+        # 3. Rename the external aliases of combined queries.
+        for combined_query in self.combined_queries:
+            external_change_map = {
+                alias: aliased
+                for alias, aliased in change_map.items()
+                if alias in combined_query.external_aliases
+            }
+            if external_change_map:
+                combined_query.change_aliases(external_change_map)
+
         self.external_aliases = {
             # Table is aliased or it's being changed and thus is aliased.
             change_map.get(alias, alias): (aliased or alias in change_map)

I’ll file a ticket later today about it and thanks to your example I already have a test handy :slight_smile:

As for making it work, albeit less efficiently, until this gets resolved in a released version of Django the following should do

bars = Bar.objects.prefetch_related("options")  # two queries
for bar in bars:
   bar_option_ids = sorted([option.id for option in bar.options])
   foos = Foo.objects.alias(  # one query per, effectively a N+1
       options_ids=ArraySubquery(
           Foo.options.through.objects.filter(
               foo_id=OuterRef("pk")
           ).order_by("option_id").values("option_id")
       ),
   ).filter(
       options_ids=bar_option_ids
   )

The actual ticket and associated PR.