QuerySet ".update(...)" silently turns into SELECT+UPDATE? (MySQL)

TL;DR: When a QuerySet is being filtered based on related objects, executing .update(...) generates SELECT-then-UPDATE instead of a single UPDATE?

I have a query like this:

Choice.objects.filter(pk=1) \
            .filter(question__question_text="TEST") \
            .update(votes=100)

It produces these two separate SQL queries (1 SELECT + 1 UPDATE):

SELECT `polls_choice`.`id` FROM `polls_choice` INNER JOIN `polls_question` ON (`polls_ch
oice`.`question_id` = `polls_question`.`id`) WHERE (`polls_choice`.`id` = 1 AND `polls_q
uestion`.`question_text` = 'TEST')

(The UPDATE query will only be executed if the first one returns rows)

UPDATE `polls_choice` SET `votes` = 100 WHERE `polls_choice`.`id` IN (1)

If I remove this filter .filter(question__question_text="TEST"), it results in a single UPDATE query as expected:

Choice.objects.filter(pk=1).update(votes=100)

UPDATE `polls_choice` SET `votes` = 100 WHERE `polls_choice`.`id` = 1

So this means, if your QuerySet has some filter conditions based on related objects, AND then you run .update(...) on it, it results in 2 separate queries instead of a single update (it can be done with 1 single update using joins).

Is this behaviour documented?

Since this silently turns the .update(...) into two separate queries it can cause race conditions that would otherwise not happen if the operation was performed as a single query.

(Django 5.2, MySQL 8.4)

:wave: @bbkfhq!

This MySQL-only behavior is effectively not documented.

The reason why it happens is that the original QuerySet.update logic predates generalized support for UPDATE JOIN and thus took an approach where it turned any references to the non-target table in filter (and thus WHERE) into a subquery of the form

UPDATE
    `polls_choice`
SET
    `votes` = 100
WHERE
    `polls_choice`.`id` IN (
        SELECT
            `polls_choice`.`id`
        FROM
            `polls_choice`
            INNER JOIN `polls_question` ON (
                `polls_choice`.`question_id` = `polls_question`.`id`
            )
        WHERE (
            `polls_choice`.`id` = 1
            AND `polls_question`.`question_text` = 'TEST'
        )

The problem with this approach though is that MySQL doesn’t allow self-select updates (UPDATE table ... WHERE (SELECT ... FROM table)) so the ORM must materialize the list of IDs on the Python side and pass it back (opening a race condition window).

Ultimately I think the SQLUpdateCompiler logic should be refactored to make use of JOIN if supported instead and fallback to the IN (subquery) mechanism otherwise but that’s a lot work. In the mean time I think an admonition to the update documentation could be worthy.

@charettes thanks for the info! greatly appreciated.

I think it really warrants a big red warning box in the docs. We had multiple people banging their heads against the wall for multiple days chasing “the impossible race condition”. Because if you read the Python code with the assumption that .update(...) is a single UPDATE query then there’s no bug.

We thought we may be observing a MySQL transaction bug, until an inspection of MySQL general query logs revealed the issue of SELECT-then-UPDATE.

The docs even have this snippet in them as of now:

Using update() also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save() .

When under the conditions discussed, .update() causes the exact same race condition that the docs are suggesting that it prevents.

I think we both agree that the docs and/or the code should be adjusted so at this point it’s a matter of filing a ticket and making the changes.

1 Like

Thanks, I have created this ticket: #36213

1 Like

Just an update for anyone following this thread,

A warning has been added to the docs (v5.2).

MySQL does not support self-select updates

On MySQL, QuerySet.update() may execute a SELECT followed by an UPDATE instead of a single UPDATE when filtering on related tables, which can introduce a race condition if concurrent changes occur between the queries. To ensure atomicity, consider using transactions or avoiding such filter conditions on MySQL.

1 Like

Thanks for seeing this through @bbkfhq :person_bowing:

For anyone interested in preventing this problem from happening in the first place on MySQL and potentially modernizing the SQL used to perform updates involving related tables (and even possibly adding support for aggregation) I’ve demonstrated that it should not be too hard to adjust SQLUpdateCompiler to take advantage of the non-standardized UPDATE FROM extension supported on Postgres, SQLite, and MySQL in some flavor.

This branch includes three improvements

  1. The first one avoids the Python-level materialization problem on MySQL entirely by forcing a database level materialization. This is a strategy we used for deletes successfully and reduces the race condition significantly (or would entirely if FOR UPDATE was used).
  2. The second dip toes in a MySQL flavor of UPDATE FROM by demonstrating it can be done. We’d want a database agnostic solution here though that is based on a feature flag (e.g. supports_update_from that would be disabled on Oracle, and SQLite < 3.33).
  3. The third demonstrates how the MySQL flavor of UPDATE FROM (which supports updating many tables in a single statement which is pretty wild) can be used to support MTI updates in a single statement even when RETURNING isn’t supported (which is the case on MySQL).

I think 1. should be considered a strong contender for merging as-is because it addresses a technically avoidable foot-gun and 2. is a good basis for anyone interested in learning more about the ORM (3. remains more a PoC than anythings). I’d be happy to propose a feature request for 1. if it gets support here and sponsor / mentor a potential contributor with a reasonable experience with SQL to see 2. through.

2 Likes