Hi everyone. Second day in a row I’m struggling trying to filter instance by M2M field to match exactly another model M2M field contains similar data.
Let’s say I have models:
class Options(models.Model):
name = model.CharField() # opt1, opt2, opt3 ... opt5
class Foo(models.Model):
options = models.ManyToMany(Options)
size = models.IntgerField()
class Bar(models.Model):
options = models.ManyToMany(Options)
size = models.IntgerField()
So, now I need filter()
Foo
instance to get all objects which contain exactly the same Option
that Bar
instance has.
For example: If Bar
has options [‘opt2’, ‘opt3’, ‘opt5’] I want to get all Foo
instance which have exactly 3 options above [‘opt2’, ‘opt3’, ‘opt5’]. Not one of them, not two of them.
I managed to get close with code:
bars = Bar.objects.all()
for bar in bars:
bar_options = [option for option in bar.options.all()]
foo_match = Foo.objects.filter(size = bar.size)
if bar_options:
foo_match = foo_match.filter(options__in=bar_options)\
.annotate(num_options=Count("options"))\
.filter(num_options=len(bar_options))
In most of the cases it works however in some iterations of Bar
I have matched the same Foo
instances even they have more options then Bar
has.
Would appreciate any help.
Hello @DoOrQuit, could you share the database backend you are using?
This is not easy to achieve but depending on the backend we might be able to take shorcuts.
On Postgres the following should work in a constant number of queries (2).
# Keep a reference to foo and bar options with outer references so they can be reused.
foo_options = Foo.options.through.objects.filter(
foo_id=OuterRef("pk")
).values("option_id")
bar_options = Bar.options.through.objects.filter(
bar_id=OuterRef(OuterRef("pk"))
).values("option_id")
bars = Bar.objects.annotate(
# Accumulate all the foo ids
matching_options_foo_pks=ArraySubquery(
Foo.objects.filter(
# All the foo with no set difference between foo and bar options.
~Exists(
foo_options.difference(bar_options)
),
# ... and not difference between bar and foo options (both directions)
~Exists(
bar_options.difference(foo_options)
),
).values("pk")
)
)
# Retrieve all matching foos across all bars in a single query a bit
# like prefetch related does.
matching_foos = Foo.objects.in_bulk(
set(itertools.chain.from_iterable(
bar.matching_options_foo_pks for bar in bars
))
)
for bar in bars:
foo_matches = [
matching_foos[foo_pk]
for foo_pk in bar.matching_options_foo_pks
]
This should generate SQL of the form
SELECT
bar.*,
ARRAY(
SELECT foo.id
FROM foo
WHERE (
NOT EXISTS (
SELECT option_id FROM foo_options WHERE foo_options.foo_id = foo.id
EXCEPT
SELECT option_id FROM bar_options WHERE bar_options.bar_id = bar.id
)
AND NOT EXISTS (
SELECT option_id FROM bar_options WHERE bar_options.bar_id = bar.id
EXCEPT
SELECT option_id FROM foo_options WHERE foo_options.foo_id = foo.id
)
)
) matching_options_foo_pks
FROM bar
Thank you for a response. Your guess regarding DB backend was right.
However I’m experiencing some exceptions in code :
matching_foos = Foo.objects.in_bulk(
set(itertools.chain.from_iterable(
bar.matching_options_foo_pks for bar in bars
))
)
The SQL Exceptions:
django.db.utils.ProgrammingError: invalid reference to FROM-clause entry for table "app_foo"
LINE 1: ...app_foo_options" U0 WHERE U0."foo_id" = ("app...
^
HINT: Perhaps you meant to reference the table alias "v0".
Probably some naming issue but cannot resolve it yet.
So it seems to be a bug in the ORM when subqueries and composed queries are used together, the query works with the following patch to Django
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index b33652dac5..34f9322d02 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -989,6 +989,17 @@ def change_aliases(self, change_map):
if alias == old_alias:
table_aliases[pos] = new_alias
break
+
+ # 3. Rename the external aliases of combined queries.
+ for combined_query in self.combined_queries:
+ external_change_map = {
+ alias: aliased
+ for alias, aliased in change_map.items()
+ if alias in combined_query.external_aliases
+ }
+ if external_change_map:
+ combined_query.change_aliases(external_change_map)
+
self.external_aliases = {
# Table is aliased or it's being changed and thus is aliased.
change_map.get(alias, alias): (aliased or alias in change_map)
I’ll file a ticket later today about it and thanks to your example I already have a test handy
As for making it work, albeit less efficiently, until this gets resolved in a released version of Django the following should do
bars = Bar.objects.prefetch_related("options") # two queries
for bar in bars:
bar_option_ids = sorted([option.id for option in bar.options])
foos = Foo.objects.alias( # one query per, effectively a N+1
options_ids=ArraySubquery(
Foo.options.through.objects.filter(
foo_id=OuterRef("pk")
).order_by("option_id").values("option_id")
),
).filter(
options_ids=bar_option_ids
)
@charettes thank you for such a detailed answer which gave me some ideas.
I ended up with something like this:
bars = Bar.objects.all()
for bar in bars:
bar_options = [option for option in bar.options.all()]
foo_base_qs = Foo.objects.filter(size = bar.size)
if bar_options:
options_num = len(bar_options)
foo_match = foo_base_qs.annotate(
total_options=Count("options__name"),
matching_options=Count(
'options__name',
filter=Q(options__name__in=options)
)
).filter(
matching_options=options_num,
total_options=options_num
)
return foo_match
Probably not perfect for some cases but works so far in my particular cases.