Hey! Long time no see! My current employer is running into some performance issues with bulk_update and I figured I’d take a look. When I initially added bulk_update I went with the simplest approach possible - a big CASE WHEN statement. This was because it was universally supported with a common syntax on all our backends.
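For reference, a minimal standalone sketch of that CASE WHEN shape, run via Python's stdlib sqlite3 module (simplified relative to what bulk_update actually emits, which also adds CASTs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO blog_posts VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# One big CASE WHEN per updated column, restricted by a pk filter:
conn.execute("""
    UPDATE blog_posts
    SET title = CASE
        WHEN id = 1 THEN 'post 2'
        WHEN id = 3 THEN 'post 3'
        ELSE title
    END
    WHERE id IN (1, 3)
""")
titles = [r[0] for r in conn.execute("SELECT title FROM blog_posts ORDER BY id")]
# titles is now ['post 2', 'b', 'post 3']
```

The CASE list grows linearly with the number of objects, which is where the performance problems come from.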
Specific databases have much more efficient ways of doing this, and I planned to add database-specific updates at some point (ticket). But I never got round to it (until now?).
All database-specific syntax relies on some form of static VALUES expression. For example, the syntax is almost identical between Postgres and SQLite. I haven’t tried MySQL, but it looks to work similarly (with a slightly different VALUES syntax), and the same goes for Oracle, though it’s more annoying to implement.
-- Postgres:
UPDATE blog_posts
SET title = v.title
FROM (
VALUES (1, 'post 2'), (3, 'post 3')
) as v(id, title)
WHERE blog_posts.id = v.id;
-- SQLite:
-- Annoyingly, SQLite doesn't support named columns
-- with `VALUES` expressions. We name these with a sub-select:
UPDATE blog_posts
SET title = v.title
FROM (
SELECT "column1" AS "id", "column2" AS "title"
FROM (VALUES (1, 'post 2'), (3, 'post 3'))
) AS v
WHERE blog_posts.id = v.id;
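To sanity-check the SQLite form, here's a quick standalone run with the stdlib sqlite3 module (UPDATE ... FROM only landed in SQLite 3.33, so it's guarded):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO blog_posts VALUES (?, ?)",
                 [(1, "post 1"), (2, "post 2"), (3, "old title")])

# UPDATE ... FROM requires SQLite 3.33+ (2020).
if sqlite3.sqlite_version_info >= (3, 33):
    conn.execute("""
        UPDATE blog_posts
        SET title = v.title
        FROM (
            SELECT "column1" AS "id", "column2" AS "title"
            FROM (VALUES (1, 'post 2'), (3, 'post 3'))
        ) AS v
        WHERE blog_posts.id = v.id
    """)

titles = [r[0] for r in conn.execute("SELECT title FROM blog_posts ORDER BY id")]
# On a recent SQLite, titles is ['post 2', 'post 2', 'post 3']
```

The "column1"/"column2" names are what SQLite assigns to an anonymous VALUES table, hence the renaming sub-select.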
I’ve been prototyping how to add this to Django, and I’ve got something half-working on sqlite. The relevant code is here (it’s horrible, I know). I’d love some feedback on the general approach before I spend any more time on it.
I added two new expressions: Values and ValueTuple, which can be used to construct `VALUES ('foo'), ('bar')`-style expressions.
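For flavour, here's a standalone (non-Django) sketch of the SQL those two expressions are meant to compile to; the real prototype hooks into Django's expression machinery (`as_sql(compiler, connection)`) instead, so treat the class shapes here as illustrative only:

```python
# Hypothetical, simplified stand-ins for the prototype's expressions.
class ValueTuple:
    """One parenthesised row of parameters: (%s, %s, ...)."""
    def __init__(self, *values):
        self.values = values

    def as_sql(self):
        placeholders = ", ".join(["%s"] * len(self.values))
        return f"({placeholders})", list(self.values)

class Values:
    """A VALUES (...), (...) list built from ValueTuples."""
    def __init__(self, *tuples):
        self.tuples = tuples

    def as_sql(self):
        sqls, params = [], []
        for t in self.tuples:
            sql, p = t.as_sql()
            sqls.append(sql)
            params.extend(p)
        return "VALUES " + ", ".join(sqls), params

sql, params = Values(ValueTuple(1, "post 2"), ValueTuple(3, "post 3")).as_sql()
# sql    == "VALUES (%s, %s), (%s, %s)"
# params == [1, "post 2", 3, "post 3"]
```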
In order to make this work on SQLite, and potentially other backends, it seems easiest to add a new “kind” of Table type: ValuesTable. This gives us some nice properties: we can validate the column types/counts against the base table, and it trivially allows us to surface this via a queryset method:
# Combine with values_list to select a subset of data
Note.objects.values("pk", "note").select_values(
    [(1, "test"), (2, "test2")]
).all()
# Or even do this, to get back fully-formed `Note` instances?
Note.objects.select_from_literals([(1, "test"), (2, "test2")])
And in doing so, we can treat this as a subquery and re-use a bunch of our subquery/alias logic. In order for this to work, we need to support some kind of UPDATE ... FROM in SQLUpdateCompiler:
# A bulk-update implementation
# No idea about the specific syntax
SomeModel.objects.update_from(
note=F('source__note'), # ?
source=SomeModel.objects.values("pk", "note").select_values([
(1, "foobar"), (2, "some new note")
])
)
I haven’t dug into this, but it seems that doing this might also unlock the ability to use related fields in .update() - i.e. Note.objects.update(note=F('tag__name')) - by pushing the joins into the UPDATE ... FROM subquery and returning a set of values:
SomeModel.objects.update_from(
note=F('related_field__content'),
source=SomeModel.objects.values("pk", "related_field__content")
)
UPDATE some_model
SET note = v.content
FROM (
SELECT "some_model"."id" AS "id",
"related_table"."content" AS "content"
FROM some_model
INNER JOIN related_table ...
) AS v
WHERE some_model.id = v.id;
Essentially, it boils down to this: I can probably hack something very Postgres-specific into bulk_update somewhat easily, and I might well do this as an initial change.
But it seems that adding some kind of “table from VALUES” expression would unlock a few nice things down the line, and bulk_update could be trivially built off these expressions/abstractions?
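As a rough illustration (the API names are hypothetical, and only the row-building part is real, runnable code), bulk_update would just flatten each object into one VALUES row and delegate:

```python
from types import SimpleNamespace

def rows_for_bulk_update(objs, fields):
    # One VALUES row per object: (pk, field1, field2, ...)
    return [(obj.pk, *(getattr(obj, f) for f in fields)) for obj in objs]

# Stand-ins for model instances:
posts = [SimpleNamespace(pk=1, title="post 2"),
         SimpleNamespace(pk=3, title="post 3")]
rows = rows_for_bulk_update(posts, ["title"])
# rows == [(1, "post 2"), (3, "post 3")]

# These rows would then feed the hypothetical API sketched above, e.g.:
#   Post.objects.update_from(
#       title=F("source__title"),
#       source=Post.objects.values("pk", "title").select_values(rows),
#   )
```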
Is this something we’d like to add? Are there any glaring issues or roadblocks that I’m not seeing?