Speeding up Postgres bulk_create by using unnest

Through some reading last evening I stumbled upon an article describing a technique that allegedly speeds up large INSERTs by 50% on Postgres.

The technique revolves around changing the INSERT statement from

INSERT INTO table (f0, ..., fn)
VALUES (f0_v0, ..., fn_v0), ..., (f0_vm, ..., fn_vm)

to

INSERT INTO table (f0, ..., fn)
SELECT * FROM unnest(f0v[], ..., fnv[])

For those unfamiliar with unnest: when used in the context of a table expression it can be seen as analogous to the zip function in Python.
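For illustration, here is a minimal sketch of that analogy, assuming psycopg2 and a throwaway local database (the connection string is hypothetical): unnest over several arrays yields one row per position, just like zip over several Python lists.

import psycopg2  # assumption: psycopg2 is installed and a local database is reachable

conn = psycopg2.connect("dbname=test")  # hypothetical connection string
with conn.cursor() as cur:
    # psycopg2 adapts Python lists to Postgres arrays; the casts pin down the types.
    cur.execute(
        "SELECT * FROM unnest(%s::int[], %s::text[])",
        ([1, 2, 3], ["a", "b", "c"]),
    )
    print(cur.fetchall())  # [(1, 'a'), (2, 'b'), (3, 'c')]

# The Python counterpart of what unnest does with multiple arrays:
print(list(zip([1, 2, 3], ["a", "b", "c"])))  # [(1, 'a'), (2, 'b'), (3, 'c')]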

Apparently the speed benefits come from a reduction in time spent planning the query: the unnest technique avoids having to reason about n * m binding parameters (one for each value of each row) as only n homogeneous arrays are passed over for insertion. In other words, the time spent planning the insert is no longer a function of the number of rows, which is a desirable property for large inserts.
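To make the parameter counts concrete, here is a rough sketch in plain Python (not the Django patch; the table and column names are made up) of the two statement shapes for 1000 rows of two fields:

# Hypothetical example: inserting rows of two fields into a table "t" (f0, f1).
rows = [(i, f"name-{i}") for i in range(1000)]

# VALUES form: 2 * 1000 binding parameters, so the planner's work grows with the row count.
placeholders = ", ".join(["(%s, %s)"] * len(rows))
values_sql = f"INSERT INTO t (f0, f1) VALUES {placeholders}"
values_params = [value for row in rows for value in row]

# unnest form: always 2 binding parameters (one array per column), regardless of len(rows).
unnest_sql = "INSERT INTO t (f0, f1) SELECT * FROM unnest(%s::int[], %s::text[])"
unnest_params = [[r[0] for r in rows], [r[1] for r in rows]]

# Executing either statement with its parameters inserts the same rows;
# only the number of bound parameters (2000 vs 2) differs.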

One point that caught my attention was the author’s statement that

several object-relational mappers use it under the hood

I knew for sure that Django’s ORM didn’t, so I gave it a try to see how invasive such a change would be.

Well, I’m happy to report that through this exploration work I identified a potential performance improvement to the SQL generation logic which yields a 5% improvement on the query_values_10000.benchmark.QueryValues10000.time_query_values_10000 ASV benchmark, and the gain would likely be larger if the benchmark inserted more than one field at a time.

I also managed to get unnest inserts working in a non-invasive manner under the following conditions:

  1. It can only be used if all values are literals, as a homogeneous array of values must be bound to each unnest parameter. In my experience this is the case most of the time, as it’s rare for database expressions to be provided for insertion purposes.
  2. It cannot be used when inserting ArrayField values, because unnest doesn’t allow discriminating on the depth of the unnesting (it’s more of a flatten function), so there’s no way to provide a bound parameter that is an array of arrays without creating a custom function. A rough sketch of this eligibility check follows the list.
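The names below are hypothetical and this is not the actual branch in the patch; it only illustrates the two conditions:

# Rough sketch of the eligibility check, with hypothetical names.
def can_use_unnest(fields, rows):
    for field in fields:
        # Condition 2: ArrayField columns are out, since unnest would flatten
        # nested arrays instead of inserting them one level deep.
        if field.__class__.__name__ == "ArrayField":
            return False
    for row in rows:
        for value in row:
            # Condition 1: every value must be a plain literal; database
            # expressions (anything resolvable, e.g. F() or Now()) cannot be
            # packed into a homogeneous array.
            if hasattr(value, "resolve_expression"):
                return False
    return True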

When the proper conditions are met unnest can be used, and from my local testing trying to reproduce the article author’s results I can reliably reproduce a 20-25% speed-up when using bulk_create to insert 500 rows at a time, and no slow-down when using the technique for 2-499 rows.

The great thing about this approach is that it is compatible with all the other options bulk_create supports, such as ignore_conflicts, update_fields, and friends. It’s really just a replacement for VALUES and its one binding parameter per field × row, and it also avoids generating a large SQL string to send over the wire. It also happens to pass the full test suite :slight_smile:
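As a usage example (the model and field are made up for illustration), nothing changes at the call site; the unnest path would simply kick in whenever the conditions above are met:

from myapp.models import Event  # hypothetical model with a "name" field

Event.objects.bulk_create(
    [Event(name=f"event-{i}") for i in range(500)],
    batch_size=500,
    ignore_conflicts=True,  # still supported; unnest only replaces the VALUES clause
)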

I’m happy to create an optimization ticket out of this branch, but I wanted to get some community validation on the merits of the approach and the reproducibility of the results first.

Cheers!


I like this idea. If other ORMs are doing it (which ones?), I assume it doesn’t have edge cases beyond those you’ve already considered here.

The other thing I wonder: what are the chances that Postgres “fixes” this on their side and we end up reverting this change?


When the proper conditions are met unnest can be used, and from my local testing trying to reproduce the article author’s results I can reliably reproduce a 20-25% speed-up when using bulk_create to insert 500 rows at a time, and no slow-down when using the technique for 2-499 rows.

So you’re saying the testing only started showing a speed-up once you get to 500+ rows? Kinda weird… if you have a testing framework with all of this already, it probably wouldn’t hurt to try and get detailed measurements for, say, 20/40/60/… rows.

If you have a variant of the script already, I don’t mind tweaking it and running it a bunch on my end to try and generate convincing results.

I like this idea of course; I’m just thinking that a more detailed graph could help reinforce the point.

I’d say it’s unlikely to happen, as in this context a “fix” would mean either a huge performance regression in unnest or a massive increase in the speed of binding parameters.

In the first case, we already implicitly rely on the upstream project not to ship gigantic performance regressions out of the blue, and in the second case everyone wins. In both cases we can conditionally use this depending on the Postgres version: older releases will keep the same performance characteristics even if the behaviour changes in newer releases.

I love this idea - it sounds like a great and transparent performance improvement.

I quickly ran some very hacky benchmarks (code - only god can judge me) comparing bulk_create speeds when inserting 0-10k rows into a table with 0-100 columns on Postgres 17, as I think comparing how the runtime grows with these two variables is important.
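For anyone who wants to try something similar without the linked script, a minimal sketch of the shape of such a timing loop (assuming a configured Django project and a simple hypothetical Item model with a name field) would be:

import time

from myapp.models import Item  # hypothetical model with a single "name" field

for n_rows in (100, 1_000, 10_000):
    # Start from an empty table, build the objects, then time only the insert.
    Item.objects.all().delete()
    objs = [Item(name=f"row-{i}") for i in range(n_rows)]
    start = time.perf_counter()
    Item.objects.bulk_create(objs)
    print(f"{n_rows} rows: {time.perf_counter() - start:.3f}s")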

The results are impressive! In the worst case (10k rows, 100 columns) we are 2x as fast, and even at 1k rows the time drops from 0.14s to 0.09s :rocket:

Data: results.csv · GitHub

Charts by columns: average time (red=main, orange=insert-optimizations, rebased) and max time (orange=main, red=insert-optimizations, rebased).

Charts by rows: average time (red=main, orange=insert-optimizations, rebased) and max time (orange=main, red=insert-optimizations, rebased).


It seems like a great improvement; I would be very interested in an optimization ticket.

20% for bulk_create would be a pretty huge win! I don’t often hit the speed limit of bulk inserts, but when I do it’s very frustrating. The change looks surprisingly small too, which would hopefully mean it’s easy to get merged.


Thanks for the feedback everyone, and for the benchmarks @orf (which are already better than the ones I had in a Python interpreter session at the late hour I wrote this post).

Given that the results have been validated by another ORM contributor and there is interest from the community, I’ll file a new optimization ticket and attach the proposed changes to it.


Associated ticket and PR.
