Hey! Long time no see! My current employer is running into some performance issues with bulk_update
and I figured I’d take a look. When I initially added bulk_update
I went with the simplest approach possible - a big CASE WHEN
statement. This was because it was universally supported with a common syntax on all our backends.
Specific databases have much more efficient ways of doing this, and I planned to add database-specific updates at some point (ticket). But I never got round to it (until now?).
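For context, here's roughly the shape of SQL the CASE WHEN approach produces, runnable against the stdlib sqlite3 driver (the blog_posts table and data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO blog_posts VALUES (?, ?)",
    [(1, "post 1"), (2, "untouched"), (3, "post three")],
)

# Roughly what bulk_update emits today: one CASE branch per row,
# repeated for every column being updated.
conn.execute("""
    UPDATE blog_posts
    SET title = CASE id
        WHEN 1 THEN 'post 2'
        WHEN 3 THEN 'post 3'
        ELSE title
    END
    WHERE id IN (1, 3)
""")
rows = dict(conn.execute("SELECT id, title FROM blog_posts"))
print(rows)  # {1: 'post 2', 2: 'untouched', 3: 'post 3'}
```

The statement grows linearly with both the number of rows and the number of columns, which is where the performance trouble comes from.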
The database-specific syntaxes all rely on some form of static VALUES
expression. For example, the Postgres and Sqlite syntax is almost identical. I haven't tried MySQL, but it looks to work similarly (with a slightly different VALUES syntax), and the same goes for Oracle, though it's more annoying to implement.
-- Postgres:
UPDATE blog_posts
SET title = v.title
FROM (
VALUES (1, 'post 2'), (3, 'post 3')
) as v(id, title)
WHERE blog_posts.id = v.id;
-- Sqlite:
-- Annoyingly, sqlite doesn't support named columns
-- with `VALUES` expressions. We name them with a sub-select:
UPDATE blog_posts
SET title = v.title
FROM (
SELECT "column1" AS "id", "column2" AS "title"
FROM (VALUES (1, 'post 2'), (3, 'post 3'))
) AS v
WHERE blog_posts.id = v.id;
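The sqlite form above can be checked quickly against the stdlib sqlite3 driver (note that UPDATE ... FROM needs SQLite 3.33+; the table and data are the same made-up example):

```python
import sqlite3

assert sqlite3.sqlite_version_info >= (3, 33), "UPDATE ... FROM needs SQLite 3.33+"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO blog_posts VALUES (?, ?)",
    [(1, "post 1"), (2, "untouched"), (3, "post three")],
)

# sqlite names bare VALUES columns "column1", "column2", ...;
# the sub-select renames them to something usable.
conn.execute("""
    UPDATE blog_posts
    SET title = v.title
    FROM (
        SELECT "column1" AS "id", "column2" AS "title"
        FROM (VALUES (1, 'post 2'), (3, 'post 3'))
    ) AS v
    WHERE blog_posts.id = v.id
""")
rows = dict(conn.execute("SELECT id, title FROM blog_posts"))
print(rows)  # {1: 'post 2', 2: 'untouched', 3: 'post 3'}
```

Unlike the CASE WHEN form, the statement here only grows with the number of rows, not rows × columns.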
I’ve been prototyping how to add this to Django, and I’ve got something half-working on sqlite. The relevant code is here (it’s horrible, I know). I’d love some feedback on the general approach before I spend any more time on it.
I added two new expressions: Values
and ValueTuple
, which can be used to construct VALUES ('foo'), ('bar')
type expressions.
In order to make this work on sqlite, and potentially other backends, it seems easiest to try to add a new “kind” of Table type: ValuesTable
. This gives us some nice properties: we can validate the column types/counts against the base table, and it trivially allows us to surface this via a queryset method:
# Combine with values_list to select a subset of data
Note.objects.values("pk", "note").select_values(
[(1, "test"), (2, "test2")]
).all()
# Or even do this, to get back fully-formed `Note` instances?
Note.objects.select_from_literals([(1, "test"), (2, "test2")])
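For reference, here's roughly the SELECT I'd expect select_values to compile to on sqlite for the queryset above (the column names are hypothetical, matching the values() call):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Roughly what `.values("pk", "note").select_values(...)` would need to
# compile to on sqlite: a named projection over a literal VALUES table.
rows = conn.execute("""
    SELECT "column1" AS "id", "column2" AS "note"
    FROM (VALUES (1, 'test'), (2, 'test2'))
""").fetchall()
print(rows)  # [(1, 'test'), (2, 'test2')]
```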
And in doing so, we can treat this as a subquery and re-use a bunch of our subquery/alias logic. In order for this to work, we need to support some kind of UPDATE ... FROM
in SQLUpdateCompiler
:
# A bulk-update implementation
# No idea about the specific syntax
SomeModel.objects.update_from(
note=F('source__note'), # ?
source=SomeModel.objects.values("pk", "note").select_values([
(1, "foobar"), (2, "some new note")
])
)
I haven’t dug into this, but it seems that doing this might also unlock the ability to use related fields in .update()
- i.e. Note.objects.update(note=F('tag__name'))
- by pushing the joins into the FROM
subquery and returning a set of values:
SomeModel.objects.update_from(
note=F('related_field__content'),
source=SomeModel.objects.values("pk", "related_field__content")
)
UPDATE some_model
SET note = v.content
FROM (
SELECT "some_model"."id" AS "id",
"related_table"."content" AS "content"
FROM some_model
INNER JOIN related_table ...
) AS v
WHERE some_model.id = v.id;
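That plan can be sanity-checked in raw SQL via sqlite3 (again needing SQLite 3.33+; the note/tag schema is invented for illustration):

```python
import sqlite3

assert sqlite3.sqlite_version_info >= (3, 33), "UPDATE ... FROM needs SQLite 3.33+"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE note (id INTEGER PRIMARY KEY, note TEXT, "
    "tag_id INTEGER REFERENCES tag(id))"
)
conn.executemany("INSERT INTO tag VALUES (?, ?)", [(1, "django"), (2, "sql")])
conn.executemany(
    "INSERT INTO note VALUES (?, ?, ?)",
    [(10, "old", 1), (11, "old", 2)],
)

# Push the join into a FROM subquery, then update from its rows -
# the shape an ORM-level `update(note=F('tag__name'))` could target.
conn.execute("""
    UPDATE note
    SET note = v.name
    FROM (
        SELECT n.id AS id, tag.name AS name
        FROM note AS n
        INNER JOIN tag ON tag.id = n.tag_id
    ) AS v
    WHERE note.id = v.id
""")
rows = dict(conn.execute("SELECT id, note FROM note"))
print(rows)  # {10: 'django', 11: 'sql'}
```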
Essentially, it boils down to this: I can probably hack something very postgres specific into bulk_update
somewhat easily, and I might well do this as an initial change.
But it seems that adding some kind of “table from VALUES
” expression would unlock a few nice things down the line, and bulk_update
could be trivially built off these expressions/abstractions?
Is this something we’d like to add? Are there any glaring issues or roadblocks that I’m not seeing?