Could bulk_update() aggregate the pk's in WHEN clauses?

Hello @tuttle, thanks for opening a discussion on the subject.

One thing to keep in mind here is that `BaseExpression.__hash__` (which your `when_updates[attr]` lookup relies on) is not cheap by any means. We actually just uncovered and resolved a significant performance regression in it, but it still remains relatively expensive. There could be ways to circumvent this problem by storing non-expression values (e.g. literal values such as `42`) directly and only turning them into `Value` at `when_statements` creation time, but that seems like a lot of work.
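To make that idea concrete, here is a minimal, runnable sketch (the names `group_pks_by_value` and `Row` are hypothetical, not Django internals): group pks by the raw literal value, which hashes cheaply, and defer wrapping values in `Value` until the WHEN clauses are actually built.

```python
from collections import defaultdict

def group_pks_by_value(objs, attr):
    """Map each distinct literal value of `attr` to the pks sharing it."""
    groups = defaultdict(list)
    for obj in objs:
        # Literal values (e.g. 42, "a") hash cheaply; wrapping them in
        # Value() only at WHEN-clause creation time would sidestep the
        # cost of BaseExpression.__hash__.
        groups[getattr(obj, attr)].append(obj.pk)
    return dict(groups)

class Row:
    def __init__(self, pk, status):
        self.pk, self.status = pk, status

rows = [Row(1, "a"), Row(2, "b"), Row(3, "a")]
print(group_pks_by_value(rows, "status"))  # {'a': [1, 3], 'b': [2]}
```

Each entry of the resulting dict would then become a single `When(pk__in=pks, then=Value(value))`, instead of one `When` per object.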

Given that the current complaint about `bulk_update()` is how long it takes to generate the query, I'm worried that making it do even more work at query-creation time might not be desirable (see #31202 (Bulk update suffers from poor performance with large numbers of models and columns) and #29771 (Support database-specific syntax for bulk_update)).

IMO users dealing with updates that have a small cardinality of values will prefer chains of `filter(pk__in=pks).update(field=something)` over `bulk_update()`, and that small-cardinality case is the only one where this optimization would make sense in the first place. If there's no overlap between updated values, the same chain of `when_statements` will be generated anyway, at the cost of many expression hashes.
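The chain referred to above could be sketched like this (a runnable simulation using a plain dict in place of a model table, since real ORM calls need a configured Django project; `update_where_pk_in` is a hypothetical stand-in for `Model.objects.filter(pk__in=pks).update(field=value)`):

```python
# In-memory stand-in for a table: pk -> field value.
rows = {1: "draft", 2: "draft", 3: "published"}

def update_where_pk_in(table, pks, value):
    """Analogue of Model.objects.filter(pk__in=pks).update(field=value)."""
    for pk in pks:
        table[pk] = value

# One UPDATE per distinct target value, rather than one big CASE/WHEN:
for value, pks in {"published": [1, 2], "archived": [3]}.items():
    update_where_pk_in(rows, pks, value)

print(rows)  # {1: 'published', 2: 'published', 3: 'archived'}
```

With few distinct values this issues a handful of simple UPDATEs, which is exactly the scenario where aggregating pks into shared WHEN clauses would otherwise pay off.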

You mention that the query length is massive, but is that actually a problem for query planners? The ORM often generates verbose SQL, but what actually matters, particularly in the case of `bulk_update()`, is how fast the query performs.

Do you have any benchmarks demonstrating that this would make updates significantly faster? For which backends, and to what extent? Does it only apply when dealing with a certain number of fields or distinct values? And would it still be relevant if we eventually switch to database-specific syntax?

These are questions that should be answered to a certain extent before such a ticket is considered.
