filtering for self-referring model with a specific field value

ehudkaldor · January 12, 2024, 6:11am

i have a model representing dependencies, with a self-referring non-symmetric manytomany field to describe said dependencies(field name is “depends_on”, related name is “dependees”). the model also has a text field representing its state (in_progress, success, failed).

i have a post_save trigger so that when an object’s state is updated. if the object state is changed to success, i want to get all its dependees that have all their depends_on objects state as success.

example:
lets say i have a object A, which depends on objects B and C, whose state currently is in_progress. now object B’s state changes to “success”. if i run the query i am looking for on B, it will not return A, because A’s other depends_on (C) state is still in_progress.
but if now C’s state is set to “success” and i run the query on C, it will return A, because both A’s depends_on are now “success”.

now, i know i can probably do some list comprehension on B’s dependees and then run a query on each dependee. but i am looking to cut down on database access, so i am looking for a way to get them all in a single query.

in words, i would say “when an object state is set to ‘success’, get me all its dependees which have all their depends_on members state as ‘success’”. and for the life of me, i cannot build this query.

any ideas?

KenWhitesell · January 12, 2024, 1:22pm

I believe that what you’re looking for would be a query using what’s known as a Common Table Expression (CTE).

The Django ORM itself does not support CTEs. There is a third-party package called django-cte that adds this facility. Also see Django CTE Trees — Django CTE Trees 1.0.0b2 documentation

But, as a cautionary tale, also see #34963 (Recursive and other "combinator" queries broken in django-cte) – Django

Your “safest” route in that it’s not subject to API-related issues would be to write the query yourself as raw SQL.

Another option, depending upon the operational characteristics of your data, would be to select a different (non-normalized) structure to represent your data. (See note below)

By “operational characteristics”, I’m referring to:

Total number of rows
The width and depth of the dependency “mesh”
- Because of the M2M relationships, this isn’t a tree
The potential existance of dependency loops
The frequency of inserts or removals of these objects
Whether these dependencies are static or dynamic
- I don’t mean this in absolute terms either. “Static” in this case may also mean that the dependencies are updated so infrequently that the time to adjust the data is negligible.

And probably some other issues not coming to mind.

Note: If you’re not familiar with some of the ideas for denormalizing self-referential data, see Managing Hierarchical Data in MySQL — Mike Hillyer's Personal Webspace and django-treebeard — django-treebeard 4.7 documentation.
I’m not suggesting you use any one of these three mechanisms. I’m only presenting them to you as examples of what can be done to create data structures that facilitate optimized queries. (Also to make it clear that there are different ways to handle situations like this, and that your selection of a structure does depend upon some of the operational characteristics of your data.)

Finally, if you find none of this acceptable, you could store your data in an external graph database that is used only to manage these relationships.

Topic		Replies	Views
How to filter the Model objects on a field that is a ManyToManyField? Using the ORM	11	1500	July 20, 2022
A specific ManyToMany query crash Using Django	2	1074	July 16, 2020
how to use Relationship objects Using the ORM	2	757	September 9, 2022
"recursive" filter -- db-side Using the ORM	6	2002	March 16, 2022
Sum with filter on related queryset Using the ORM	1	2990	September 1, 2022

filtering for self-referring model with a specific field value

Related Topics