I would like to update a PostgreSQL column in 1M rows as such:
MyModel.objects.all().update(bool_field=True)
Is this code efficient for large datasets?
It’s about as efficient as it’s possible to be. This should generate a single UPDATE SQL statement, so all the work is done in the database.
See the full description in the docs at update.
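For illustration, here’s a minimal sketch of how you could confirm this yourself by inspecting the query log (assuming a model MyModel with a BooleanField named bool_field, an illustrative myapp import path, and DEBUG = True so Django records queries):

```python
# Minimal sketch: inspect the SQL Django generates for a bulk update.
# Assumes a model MyModel with a BooleanField named bool_field,
# and DEBUG = True so connection.queries is populated.
from django.db import connection

from myapp.models import MyModel  # illustrative import path

MyModel.objects.all().update(bool_field=True)

# Should show a single statement, roughly:
#   UPDATE "myapp_mymodel" SET "bool_field" = true
print(connection.queries[-1]["sql"])
```

There are no per-row round trips from Python; the database applies the change to every row itself.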
Thank you!
I have read the docs and also tried searching the code; indeed, this is what I understood.
Actually, I already use it and it performs well given the size, but… I was not sure why!
Your answer suggests that the database takes care of this.
I asked the question because this table could easily grow bigger and I was not sure whether it would be better to send those updates in batches (which will take more time) or just… have faith!
Quoting this solution for efficiently updating or deleting in batches, should it be needed…
(Just ran out of memory)
If your table becomes very large or receives a lot of writes, performing a single UPDATE might be too disruptive, as it causes a lot of contention on the table while all the rows and associated indices are updated, since your database will attempt to perform the change in a single transaction.
If you can afford for the change not to be performed atomically, then the solution referenced above is a safer bet, as it gives your database time to breathe and do other work between your updates, which you can even space out in time with a time.sleep on each iteration.
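For reference, a rough sketch of that batched approach, chunking by primary-key range with a pause between batches (MyModel, bool_field, the batch size, and the sleep interval are all illustrative assumptions, and it assumes an integer auto-incrementing primary key):

```python
# Rough sketch of a batched update over primary-key ranges.
# MyModel, bool_field, BATCH_SIZE and the sleep interval are illustrative.
import time

from django.db.models import Max, Min

from myapp.models import MyModel  # illustrative import path

BATCH_SIZE = 10_000

bounds = MyModel.objects.aggregate(low=Min("pk"), high=Max("pk"))
if bounds["low"] is not None:
    for start in range(bounds["low"], bounds["high"] + 1, BATCH_SIZE):
        # Each .update() is its own statement touching at most BATCH_SIZE
        # rows, so locks are held for much shorter spans.
        MyModel.objects.filter(
            pk__gte=start, pk__lt=start + BATCH_SIZE
        ).update(bool_field=True)
        time.sleep(0.5)  # give the database room to do other work
```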
You must make sure that you don’t run the above in a transaction though (transaction.atomic), as otherwise it’s going to perform even worse than a single update. That’s one of the reasons why data migrations should explicitly set atomic = False, for example.
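A rough sketch of what that can look like in a data migration (the app label, dependency, and function name are made up for illustration; each batch commits on its own because the migration is not wrapped in a single transaction):

```python
# Rough sketch of a non-atomic data migration doing the same batched update.
# App label, dependency, and function name are illustrative assumptions.
from django.db import migrations


def set_bool_field(apps, schema_editor):
    MyModel = apps.get_model("myapp", "MyModel")
    batch_size = 10_000
    last_pk = 0  # assumes an integer auto-incrementing primary key
    while True:
        pks = list(
            MyModel.objects.filter(pk__gt=last_pk)
            .order_by("pk")
            .values_list("pk", flat=True)[:batch_size]
        )
        if not pks:
            break
        MyModel.objects.filter(pk__in=pks).update(bool_field=True)
        last_pk = pks[-1]


class Migration(migrations.Migration):
    # Run outside a single wrapping transaction so each batch commits on
    # its own; otherwise the batching buys you nothing.
    atomic = False

    dependencies = [
        ("myapp", "0001_initial"),
    ]

    operations = [
        migrations.RunPython(set_bool_field, migrations.RunPython.noop),
    ]
```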