However, one of the slowest parts of the whole bulk-insert process is still (quite surprisingly) creating the Python objects before the actual insert. This was already discussed in other threads:
(For additional reference, see this SO thread for a benchmark of creating objects in Python, or this blog post for a report of huge performance gains by avoiding creating model instances when fetching lots of data.)
So I was thinking about an option of inserting data with dictionaries. Since we already have .values() for fetching rows as dictionaries instead of objects, inserting data the same way should fit as well. It could be handled directly in the existing bulk_create method (allowing an arbitrary mix of dictionaries and objects in the list). We can simply take data from the dictionary instead of from the object, and for missing keys we can insert the model defaults.
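To make the idea concrete, here is a minimal, framework-free sketch of the extraction step: each row may be a dict or an object, and missing dict keys fall back to a per-field default. The `Field` class and `row_values` helper are hypothetical illustrations, not Django internals.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Field:
    """Hypothetical stand-in for a model field with a default."""
    attname: str
    default: Any = None

def row_values(row, fields):
    """Extract column values from either a dict or an object."""
    if isinstance(row, dict):
        # Missing keys fall back to the field default.
        return [row.get(f.attname, f.default) for f in fields]
    return [getattr(row, f.attname, f.default) for f in fields]

fields = [Field('id'), Field('name', default='unnamed')]
print(row_values({'id': 1}, fields))  # [1, 'unnamed']
```

The same dispatch could live inside bulk_create's value-collection loop, so dicts and objects can be freely mixed in one call.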
I’ve been experimenting with this, and have prepared an example implementation as a proof of concept: Faster bulk_create using dictionaries · adamsol/django@ed1ad9c · GitHub. The speedup in my tests was between 1.6x and 2x, depending on the model. It seems that models with db_default benefit the most, as we additionally avoid creating DatabaseDefault instances for each object.
Does this sound like something worth implementing in Django?
It’s of course possible to build such a helper function outside of Django (which I have done in a project I’m working on). Nonetheless, it would be convenient to have a faster method of inserting data built into the framework - especially for DB migrations, as they tend to be difficult to test automatically, so importing and using custom functions can easily lead to them breaking after some code refactoring.
@adamsol I have not looked at your approach yet, but I want to give you a few more pointers on what I have tried so far and why.
For django-computedfields I tested different approaches to cutting down ORM functionality, with these results:
This tested SELECTs with model instances vs. dictionaries retrieved via .values(). The roundtrip with updates still creates model instances, so the benefit is lower, at roughly a 1.5x speedup. For fully dictionary-based handling I expect the benefit to be somewhere in the 2-3x range for Postgres. This at least is indicated by my tests with the copy_insert implementations here: idea - should the postgres copy path get a copy_insert/create method? · Issue #4 · netzkolchose/django-fast-update · GitHub
I have not had time yet to turn everything into neatly tested library code, as I got distracted by a few psycopg issues and patched those first.
Maybe! I agree it fits with how .values() can return dictionaries—the symmetry of allowing dicts in some operations is appealing. However, it would be a big scope change, since it would logically lead to bulk_update also accepting dictionaries.
I would also like to see attempts to optimize Model.__init__ so this is less of a problem. It does a lot of work; some improvements have already been made, and there may be more yet to find.
I’m not sure if much can be done on the Django side here, since object creation overhead comes from Python itself - as benchmarked in the SO answer that I linked earlier. It’s getting better in newer Python versions (my measurements were on 3.13), but dictionaries should still win convincingly in most cases.
I found that we don’t need to create DatabaseDefault instances per model instance, leading to this ~12% optimization:
Nice, so now the advantage of avoiding objects will diminish a little, but 1.6x-1.8x should still be achievable.
Yeah, objects cannot be as fast as plain dicts. But the thread is not quite an apples-to-apples comparison, as it uses dataclasses, whose generated code may not be as efficient as a hand-written vanilla class.
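The dataclass-vs-vanilla-class difference is easy to check directly. A rough micro-benchmark sketch (class names here are made up for illustration; absolute numbers will vary by Python version and machine):

```python
import timeit
from dataclasses import dataclass

class Plain:
    """Hand-written vanilla class."""
    def __init__(self, a, b):
        self.a = a
        self.b = b

@dataclass
class Data:
    """Dataclass with a generated __init__."""
    a: int
    b: int

# Time 100k instantiations of each.
t_plain = timeit.timeit(lambda: Plain(1, 2), number=100_000)
t_data = timeit.timeit(lambda: Data(1, 2), number=100_000)
print(f'plain class: {t_plain:.3f}s  dataclass: {t_data:.3f}s')
```

Either way, both variants only move the baseline; a plain dict literal skips attribute assignment entirely and stays ahead of both.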
You’re right, after benchmarking further, I can see that Model.__init__ is actually the main culprit, and Python’s object creation overhead is less significant. The following script:
from django.db import models
import time

def measure(f):
    t = time.perf_counter()
    f()
    print(f'{(time.perf_counter() - t):.3f}')

class A:
    def __init__(self, id):
        self.id = id

class B(models.Model):
    class Meta:
        app_label = 'test'

N = 100_000
measure(lambda: [{'id': i} for i in range(N)])
measure(lambda: [A(id=i) for i in range(N)])
measure(lambda: [B(id=i) for i in range(N)])
gives results like these (times in seconds):
0.018
0.051
0.236
But I guess this doesn’t change much regarding the dictionary idea.
@adamsol I had a quick look at your demo implementation - it looks pretty straightforward, nice.
Still, I stumbled over the value adaptation you are doing in your approach:
# your direct approach
value = obj[field.attname]
# vs. django's adaptation
value = field_pre_save(obj)
I think this will lead to different behavior for certain field types and values, as some field types do advanced value adaptation in their pre_save logic (e.g. JSONField’s None behavior). Furthermore, the docs mention this as the way to do value adaptations.
While I like the idea of skipping per-field adaptation (it accounts for a lot of runtime on the Postgres backend), I think this needs to be discussed: whether it should take that route (then with a documentation hint that people have to adapt values beforehand in their dicts), or whether some basic adaptation to flatten out DB differences should still be done here.
If I’m seeing correctly, all adaptations actually happen in field_prepare (prepare_value), which my code still calls. My code indeed doesn’t call field_pre_save (pre_save_val), but that function is only for things like auto_now and files. I think I skipped it because pre_save_val is intertwined with calling getattr on the object, which cannot work in the context of dictionaries. Also, something similar already happens for raw queries: there is a condition in pre_save_val that avoids calling pre_save for them. So generally some decision would be required here: whether bulk_create via dictionaries is supposed to behave more like a raw query or a standard query. Either some more changes in the code would be necessary, or the difference would need to be documented.
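To illustrate why pre_save is hard to reconcile with dicts, here is a hypothetical stand-in (not Django code) for an auto_now-style field: its pre_save both reads and writes attributes on the instance, which a plain dict does not have in the same form.

```python
import datetime

class AutoNowField:
    """Illustrative stand-in for an auto_now field; not Django's API."""
    attname = 'updated'

    def pre_save(self, obj, add):
        # Compute the value and write it back onto the instance,
        # mirroring how auto_now fields mutate the model object.
        value = datetime.datetime.now(datetime.timezone.utc)
        setattr(obj, self.attname, value)
        return value

class FakeInstance:
    updated = None

field = AutoNowField()
inst = FakeInstance()
value = field.pre_save(inst, add=True)
print(inst.updated is value)  # True: the instance was mutated in place
```

With a dict row there is no instance to mutate, so either pre_save must be skipped (raw-query semantics) or a dict-aware equivalent would have to be added.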