It produces these two separate SQL queries (1 SELECT + 1 UPDATE):
SELECT `polls_choice`.`id` FROM `polls_choice` INNER JOIN `polls_question` ON (`polls_ch
oice`.`question_id` = `polls_question`.`id`) WHERE (`polls_choice`.`id` = 1 AND `polls_q
uestion`.`question_text` = 'TEST')
(The UPDATE query will only be executed if the first one returns rows)
UPDATE `polls_choice` SET `votes` = 100 WHERE `polls_choice`.`id` IN (1)
If I remove this filter .filter(question__question_text="TEST"), it results in a single UPDATE query as expected:
Choice.objects.filter(pk=1).update(votes=100)
UPDATE `polls_choice` SET `votes` = 100 WHERE `polls_choice`.`id` = 1
So this means, if your QuerySet has some filter conditions based on related objects, AND then you run .update(...) on it, it results in 2 separate queries instead of a single update (it can be done with 1 single update using joins).
Is this behaviour documented?
Since this silently turns the .update(...) into two separate queries it can cause race conditions that would otherwise not happen if the operation was performed as a single query.
The reason why it happens is that the original QuerySet.update logic predates generalized support for UPDATE JOIN and thus took an approach where it turned any references to the non-target table in filter (and thus WHERE) into a subquery of the form
UPDATE
`polls_choice`
SET
`votes` = 100
WHERE
`polls_choice`.`id` IN (
SELECT
`polls_choice`.`id`
FROM
`polls_choice`
INNER JOIN `polls_question` ON (
`polls_choice`.`question_id` = `polls_question`.`id`
)
WHERE (
`polls_choice`.`id` = 1
AND `polls_question`.`question_text` = 'TEST'
)
The problem with this approach though is that MySQL doesn’t allow self-select updates (UPDATE table ... WHERE (SELECT ... FROM table)) so the ORM must materialize the list of IDs on the Python side and pass it back (opening a race condition window).
Ultimately I think the SQLUpdateCompiler logic should be refactored to make use of JOIN if supported instead and fallback to the IN (subquery) mechanism otherwise but that’s a lot work. In the mean time I think an admonition to the update documentation could be worthy.
@charettes thanks for the info! greatly appreciated.
I think it really warrants a big red warning box in the docs. We had multiple people banging their heads against the wall for multiple days chasing “the impossible race condition”. Because if you read the Python code with the assumption that .update(...) is a single UPDATE query then there’s no bug.
We thought we may be observing a MySQL transaction bug, until an inspection of MySQL general query logs revealed the issue of SELECT-then-UPDATE.
The docs even have this snippet in them as of now:
Using update() also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save() .
When under the conditions discussed, .update() causes the exact same race condition that the docs are suggesting that it prevents.
On MySQL, QuerySet.update() may execute a SELECT followed by an UPDATE instead of a single UPDATE when filtering on related tables, which can introduce a race condition if concurrent changes occur between the queries. To ensure atomicity, consider using transactions or avoiding such filter conditions on MySQL.
The third demonstrates how the MySQL flavor of UPDATE FROM (which supports updating many tables in a single statement which is pretty wild) can be used to support MTI updates in a single statement even when RETURNING isn’t supported (which is the case on MySQL).
I think 1. should be considered a strong contender for merging as-is because it addresses a technically avoidable foot-gun and 2. is a good basis for anyone interested in learning more about the ORM (3. remains more a PoC than anythings). I’d be happy to propose a feature request for 1. if it gets support here and sponsor / mentor a potential contributor with a reasonable experience with SQL to see 2. through.