It produces these two separate SQL queries (1 SELECT + 1 UPDATE):
SELECT `polls_choice`.`id` FROM `polls_choice` INNER JOIN `polls_question` ON (`polls_ch
oice`.`question_id` = `polls_question`.`id`) WHERE (`polls_choice`.`id` = 1 AND `polls_q
uestion`.`question_text` = 'TEST')
(The UPDATE query will only be executed if the first one returns rows)
UPDATE `polls_choice` SET `votes` = 100 WHERE `polls_choice`.`id` IN (1)
If I remove this filter .filter(question__question_text="TEST"), it results in a single UPDATE query as expected:
Choice.objects.filter(pk=1).update(votes=100)
UPDATE `polls_choice` SET `votes` = 100 WHERE `polls_choice`.`id` = 1
So this means, if your QuerySet has some filter conditions based on related objects, AND then you run .update(...) on it, it results in 2 separate queries instead of a single update (it can be done with 1 single update using joins).
Is this behaviour documented?
Since this silently turns the .update(...) into two separate queries it can cause race conditions that would otherwise not happen if the operation was performed as a single query.
The reason why it happens is that the original QuerySet.update logic predates generalized support for UPDATE JOIN and thus took an approach where it turned any references to the non-target table in filter (and thus WHERE) into a subquery of the form
UPDATE
`polls_choice`
SET
`votes` = 100
WHERE
`polls_choice`.`id` IN (
SELECT
`polls_choice`.`id`
FROM
`polls_choice`
INNER JOIN `polls_question` ON (
`polls_choice`.`question_id` = `polls_question`.`id`
)
WHERE (
`polls_choice`.`id` = 1
AND `polls_question`.`question_text` = 'TEST'
)
The problem with this approach though is that MySQL doesn’t allow self-select updates (UPDATE table ... WHERE (SELECT ... FROM table)) so the ORM must materialize the list of IDs on the Python side and pass it back (opening a race condition window).
Ultimately I think the SQLUpdateCompiler logic should be refactored to make use of JOIN if supported instead and fallback to the IN (subquery) mechanism otherwise but that’s a lot work. In the mean time I think an admonition to the update documentation could be worthy.
@charettes thanks for the info! greatly appreciated.
I think it really warrants a big red warning box in the docs. We had multiple people banging their heads against the wall for multiple days chasing “the impossible race condition”. Because if you read the Python code with the assumption that .update(...) is a single UPDATE query then there’s no bug.
We thought we may be observing a MySQL transaction bug, until an inspection of MySQL general query logs revealed the issue of SELECT-then-UPDATE.
The docs even have this snippet in them as of now:
Using update() also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save() .
When under the conditions discussed, .update() causes the exact same race condition that the docs are suggesting that it prevents.