Speeding up Postgres bulk_create by using unnest

Through some reading last evening I stumbled upon an article that described a technique that allegedly sped up large INSERT performance by 50% on Postgres.

The technique revolves around changing the INSERT statement from

INSERT INTO table (f0, ..., fn)
VALUES (f0_v0, ..., fn_v0), ..., (f0_vm, ..., fn_vm)

to

INSERT INTO table (f0, ..., fn)
SELECT * FROM unnest(f0v[], ..., fnv[])

For those unfamiliar with unnest: when used in the context of a table expression, it can be seen as analogous to the zip function in Python.
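The analogy can be made concrete with a small sketch (the rows and values here are made up): zip(*rows) transposes row-oriented data into exactly the per-column arrays that the unnest form expects.

```python
# Hypothetical data: three rows with three fields each.
rows = [
    (1, "alice", 3.5),
    (2, "bob", 4.0),
    (3, "carol", 2.5),
]

# The VALUES form binds one parameter per cell; the unnest form instead
# binds one homogeneous array per column, which zip(*rows) produces.
columns = [list(col) for col in zip(*rows)]

print(columns)
# [[1, 2, 3], ['alice', 'bob', 'carol'], [3.5, 4.0, 2.5]]
```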

Apparently the speed benefits come from a reduction in time spent planning the query: the unnest technique avoids having to reason about n * m bind parameters (one for each value of each row), since only n homogeneous arrays are passed over for insertion. In other words, the amount of time spent planning the insert is no longer a function of the number of items, which is a desirable property for large inserts.
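To illustrate the difference in bind-parameter counts, here is a hedged sketch (table names, field names, and array types are made up, not Django internals) that generates both statement shapes:

```python
def values_sql(table, fields, num_rows):
    """Classic multi-VALUES INSERT: one placeholder per cell (n * m)."""
    row = "(" + ", ".join(["%s"] * len(fields)) + ")"
    return (
        f"INSERT INTO {table} ({', '.join(fields)}) "
        f"VALUES {', '.join([row] * num_rows)}"
    )

def unnest_sql(table, fields, types):
    """unnest-based INSERT: one array placeholder per column (n)."""
    casts = ", ".join(f"%s::{t}[]" for t in types)
    return (
        f"INSERT INTO {table} ({', '.join(fields)}) "
        f"SELECT * FROM unnest({casts})"
    )

fields = ["id", "name"]
# 2 fields * 3 rows = 6 placeholders for VALUES...
print(values_sql("t", fields, 3).count("%s"))               # 6
# ...but only 2 placeholders (one array per field) for unnest.
print(unnest_sql("t", fields, ["int", "text"]).count("%s"))  # 2
```

The placeholder count of the unnest form stays constant no matter how many rows are inserted, which is where the planning-time win comes from.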

One point that caught my attention was the author’s statement that

several object-relational mappers use it under the hood

while I knew for sure that Django's ORM didn't, so I gave it a try to see how invasive such a change would be.

Well, I'm happy to report that through this exploration work I identified a potential performance improvement to the SQL-generating logic which yields a 5% improvement on the query_values_10000.benchmark.QueryValues10000.time_query_values_10000 ASV benchmark and would likely improve further if the benchmark inserted more than one field at a time.

I also managed to get the unnest insert working in a non-invasive manner under the following conditions:

  1. It can only be used if all values are literals, as a homogeneous array of values must be bound to each unnest parameter. In my experience this is the case most of the time, as it's rare that database expressions are provided for insertion purposes.
  2. It cannot be used when inserting ArrayField values, because unnest doesn't allow discriminating on the depth of the unnesting (it's closer to a flatten function), so there's no way to provide a bound parameter that is an array of arrays without creating a custom function.
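A hedged sketch of how such an eligibility check might look (the classes below are simplified stand-ins for illustration, not Django's actual internals):

```python
class Value:
    """Stand-in for a wrapped literal value."""
    def __init__(self, v):
        self.value = v

class Expression:
    """Stand-in for a database expression (e.g. F() or Now())."""

def can_use_unnest(rows, has_array_field):
    """Decide whether the unnest insert path applies.

    Condition 1: every cell must be a literal, since homogeneous arrays
    of values are bound to the unnest parameters.
    Condition 2: no column may be an ArrayField, since unnest would
    flatten an array-of-arrays parameter.
    """
    if has_array_field:
        return False
    return all(isinstance(cell, Value) for row in rows for cell in row)

rows = [(Value(1), Value("a")), (Value(2), Value("b"))]
print(can_use_unnest(rows, has_array_field=False))                 # True
print(can_use_unnest([(Value(1), Expression())], False))           # False
print(can_use_unnest(rows, has_array_field=True))                  # False
```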

When the proper conditions are met unnest can be used, and from my local testing trying to reproduce the article author's results I can reliably reproduce a 20-25% speed-up when using bulk_create to insert 500 rows at a time, and no slowdown when using the technique for 2-499 rows.

The great thing about this approach is that it is compatible with all the other options bulk_create supports, such as ignore_conflicts, update_fields, and friends. It's really just a replacement for VALUES with one bind parameter per field × row, and it also avoids generating a large string to send over the wire. It also happens to pass the full test suite :slight_smile:

I’m happy to create an optimization ticket out of this branch but I wanted to get some community validation on the merit of the approach and the reproducibility of the results first.

Cheers!

I like this idea. If other ORMs are doing it (which ones?) I assume it doesn't have edge cases you haven't already considered here.

Other thing I wonder: what is the chance that Postgres “fixes” this themselves and we end up reverting this?

When the proper conditions are met unnest can be used, and from my local testing trying to reproduce the article author's results I can reliably reproduce a 20-25% speed-up when using bulk_create to insert 500 rows at a time, and no slowdown when using the technique for 2-499 rows.

So you're saying the testing only started showing a speed-up once you get to 500+ rows? Kinda weird… if you have a testing framework with all of this already, it probably wouldn't hurt to try and get detailed measurements for, say, 20/40/60/… etc. rows.

If you have a variant of a script already, I don't mind tweaking it and running it a bunch on my end to try and generate convincing results.

Like this idea of course, just thinking here that a more detailed graph could help with reinforcing the point.

I'd say it's unlikely to happen, as in this context a "fix" would be either a huge performance regression in unnest or a massive increase in the speed of binding parameters.

In the first case we implicitly rely on the upstream project to not have gigantic performance regressions out of the blue, and in the second case everyone wins. In both cases we can conditionally use this depending on the Postgres version: older releases will maintain the same performance characteristics even if they change in newer releases.

I love this idea - it sounds like a great and transparent performance improvement.

I quickly ran some of the most hacky benchmarks (code - only god can judge me) comparing bulk_create speeds when inserting 0-10k rows into a table with 0-100 columns on Postgres 17, as I think comparing the growth factor of the runtime against these two variables is important.

The results are impressive! Worst case (10k rows, 100 columns) we are 2x as fast, and even at 1k rows it's reduced from 0.14s to 0.09s :rocket:

Data: results.csv · GitHub

By columns:

[chart] Average time: red=main, orange=insert-optimizations (rebased)
[chart] Max time: orange=main, red=insert-optimizations (rebased)

By rows:

[chart] Average time: red=main, orange=insert-optimizations (rebased)
[chart] Max time: orange=main, red=insert-optimizations (rebased)

This seems like a great improvement; I would be very interested in an optimization ticket.

20% for bulk_create would be a pretty huge win! I don’t often hit the speed limit of bulk inserts, but when I do it’s very frustrating. The change looks surprisingly small too, which would hopefully mean it’s easy to get merged.

Thanks for the feedback everyone, and for the benchmarks @orf (which are already better than the ones I had in a Python interpreter session at the late hour of writing this post).

Given the results have been validated by another ORM contributor and there is interest from the community, I'll file a new optimization ticket and attach the proposed changes to it.


Associated ticket and PR.

New contributor here, let me know if this is not the right place to report this.

I’ve recently upgraded to Django 5.2 and I’m getting some failing tests while trying to bulk_create items that have PolygonField fields (GeoDjango).

Log output:

django.db.utils.InternalError: parse error - invalid geometry
LINE 1: ...0, 2)[], ('{NULL,NULL,NULL}')::numeric(10, 2)[], ('{NULL,NUL...
                                                             ^
HINT:  "NU" <-- parse error at position 2 within geometry

Also, I noticed this in the postgres logs:

INSERT INTO "model_name" (..., "geometry", ...) SELECT * FROM UNNEST(
....

# My geometry field appears like this
('{NULL,NULL}')::geography(MULTIPOLYGON,4326)[], 

# While other fields appear like this
(ARRAY[NULL, NULL])::varchar(50)[]

My model field is set up like this:

geometry = models.PolygonField(null=True, blank=True, geography=True)

I'm using: Postgres 15, PostGIS 3.5, Django 5.2, Python 3.10.
I wasn't able to create a simple reproducible example yet, but I'll keep trying.

This will be fixed by this commit in Django 5.2.1 meant to be released on May 7th.

@charettes Maybe you are interested - I have started to play around with a copy_create counterpart using COPY FROM for the data transfer. My first runtime numbers look promising, see idea - should the postgres copy path get a copy_insert/create method? · Issue #4 · netzkolchose/django-fast-update · GitHub (since I use Django 5.2.4, the unnest patch should already be included in the numbers).

Also I wonder if the unnest trick would also apply to the UPDATE … FROM VALUES pattern, speeding things up further.
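For reference, a hedged sketch of what the UPDATE analogue of the pattern could look like, expressed as SQL string generation (table, pk, and field names are hypothetical, and this sketch is not tested against Postgres here):

```python
def unnest_update_sql(table, pk, fields, types):
    """Build an UPDATE ... FROM unnest(...) statement: one bound array
    per column, joined back to the target table on the primary key."""
    cols = [pk] + fields
    casts = ", ".join(f"%s::{t}[]" for t in types)
    sets = ", ".join(f"{f} = new.{f}" for f in fields)
    return (
        f"UPDATE {table} SET {sets} "
        f"FROM unnest({casts}) AS new({', '.join(cols)}) "
        f"WHERE {table}.{pk} = new.{pk}"
    )

sql = unnest_update_sql("app_model", "id", ["name"], ["int", "text"])
print(sql)
# UPDATE app_model SET name = new.name
# FROM unnest(%s::int[], %s::text[]) AS new(id, name)
# WHERE app_model.id = new.id
```

As with the insert path, the number of bind parameters is one array per column rather than one per cell.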

Thanks for sharing. It'd be interesting to see what a patch against Django's source code that makes bulk_create faster under some circumstances might look like, rather than adding a new interface, as ultimately I suspect folks who really want to perform fast bulk creation on Postgres end up using psycopg directly instead.

As for MTI and the partitioning of objects with and without a pk, I suspect bulk_create could be adapted to do a much better job today on backends supporting RETURNING (all except MySQL).

Yes, psycopg 3 brings much better COPY support, which is ~2 times faster than my current, still psycopg2-compatible approach, but only when fully relying on the binary transport and the auto adaptation of values (that's what psycopg calls the Python-to-Postgres type conversions). The latter is tricky: to really benefit from it, late value adjustments in methods like prep_save should be avoided, or the speed benefit goes away.
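As a rough sketch of what the psycopg 3 binary COPY path looks like (the table, columns, and DSN are hypothetical; the database round-trip is guarded behind __main__ so the snippet stays importable without a running server):

```python
def copy_sql(table, fields):
    """COPY statement selecting psycopg 3's binary transport."""
    return f"COPY {table} ({', '.join(fields)}) FROM STDIN (FORMAT BINARY)"

def copy_rows(conn, table, fields, rows):
    # psycopg 3: cursor.copy() yields a writer; write_row() runs each
    # Python value through the adaptation machinery to encode it in
    # the binary COPY format.
    with conn.cursor() as cur:
        with cur.copy(copy_sql(table, fields)) as copy:
            for row in rows:
                copy.write_row(row)

if __name__ == "__main__":
    import psycopg  # requires a running Postgres; DSN is made up
    with psycopg.connect("dbname=test") as conn:
        copy_rows(conn, "app_model", ["id", "name"], [(1, "a"), (2, "b")])
        conn.commit()

print(copy_sql("app_model", ["id", "name"]))
# COPY app_model (id, name) FROM STDIN (FORMAT BINARY)
```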

For copy_update I had started to port the custom Python-to-Postgres encoding over to the auto adaptation of psycopg 3, but got kinda stuck on the question of whether things should be abstracted closer to Django internals or not (e.g. make use of prep_save and such to some degree).

Last but not least, I found plain model instantiation to be quite expensive itself (that's the reason why I came up with that from_dict=True switch) and wonder if this could be made faster somehow. I did not investigate it further yet; no clue if it is mainly CPU bound (lots of constructor logic to get through?), RAM/allocation pressure (RAM usage is a lot higher), or both.

Edit: As for MTI, I am pretty sure I have missed several important details there. While MTI brings so much convenience to modelling, I try to avoid it like the plague for anything that needs performance. Imho better bulk support would be a partial relief for those cases at least. The fact that MySQL is not able to propagate newly created pks in a reliable way is - idk, planned dumbness of the DBMS? I mean c'mon MySQL folks, even SQLite is miles ahead in this regard, and MariaDB has also got it right now…