Current behavior
As of now, bulk_create accepts an iterable of Model instances and converts it to a list of input objects internally. If it performs an insert query with a RETURNING clause, it then sets the fields of the Model instances to the values returned by that clause. This is correct in itself, but the assignment is performed directly on the objects that were passed in as input.
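To see the write-back in isolation, here is a minimal sketch, assuming a hypothetical model Thing with an auto primary key and PostgreSQL as the backend (where bulk_create can use RETURNING):

objs = [Thing(name="a"), Thing(name="b")]  # Thing is a hypothetical model
returned = Thing.objects.bulk_create(objs)
# bulk_create returns the very objects it was handed, mutated in place:
assert returned[0] is objs[0]
assert objs[0].pk is not None  # the returned pk was written onto the input instance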
Potential problem
While this may be insignificant for current bulk_create use cases, the behavior can cause problems for conditional upserts, support for which is being worked on (ticket #34277).
An example
Suppose we use PostgreSQL as the database backend and have defined the following model:
from django.db.models import IntegerField, Model, Q

class ExampleModel(Model):
    key_1 = IntegerField()
    key_2 = IntegerField()
    conditional_property = IntegerField(null=True)
    data_property_1 = IntegerField()
    data_property_2 = IntegerField()

    class Meta:
        unique_together = ['key_1', 'key_2']
Running the following code
ExampleModel.objects.create(key_1=1337, key_2=37, data_property_1=42, data_property_2=0)
ExampleModel.objects.create(key_1=1338, key_2=38, data_property_1=9000, data_property_2=1, conditional_property=5)

create_instances = [
    ExampleModel(key_1=1337, key_2=37, data_property_1=0, data_property_2=42),
    ExampleModel(key_1=1338, key_2=38, data_property_1=1, data_property_2=9000),
]

# condition is the conditional-upsert parameter proposed in ticket #34277.
ExampleModel.objects.bulk_create(
    create_instances,
    update_conflicts=True,
    update_fields=['data_property_1'],
    unique_fields=['key_1', 'key_2'],
    condition=Q(conditional_property__isnull=True),
)
results = [
    (
        result.data_property_1,
        result.data_property_2,
    )
    for result in ExampleModel.objects.bulk_create(
        create_instances,
        update_conflicts=True,
        update_fields=['data_property_2'],
        unique_fields=['key_1', 'key_2'],
    )
]
would result in results containing the following values:
[(0, 0), (9000, 1)]
In other words, after the first bulk_create is performed, data_property_1 is updated to the value from the input, after which every input instance is overwritten with the actual database values (assuming these columns are treated as db_returning_fields for the sake of this example, which they plausibly could be).
The second bulk_create then simply passes the values that were fetched from the database and set on the input list right back to the database. This effectively loses the changes we wanted to write in the second bulk_create, since our input values have been replaced by database data.
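To make the overwrite concrete, inspecting the input instances between the two calls above would show something like the following (a sketch, assuming the write-back described above):

# Immediately after the first bulk_create, the input instances hold
# database values rather than the values they were constructed with:
print(create_instances[0].data_property_2)  # 0, not the 42 we set
print(create_instances[1].data_property_2)  # 1, not the 9000 we set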
Further discussion setup
I believe bulk_create behaves this way so that it can preserve the original input order while partitioning the input into records with and without a pk and handling them separately.
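For reference, a simplified sketch of that partitioning (not the exact implementation in QuerySet.bulk_create):

objs = list(objs)
# Split the input while keeping references to the original objects, so that
# values returned from the database can be assigned back onto them and the
# caller's order is preserved when objs is returned:
objs_with_pk = [obj for obj in objs if obj.pk is not None]
objs_without_pk = [obj for obj in objs if obj.pk is None]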
One way to avoid this side effect would be to make a deep copy of the input objects and operate on that instead. However, this could be costly for large batches, since it would double the memory needed for the insert, so this approach should be treated with caution or discarded as inefficient.
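Until then, callers who need to protect their input can make the copy at the call site themselves, e.g. (a defensive sketch, trading memory for isolation):

from copy import deepcopy

# Operate on copies so the original instances keep the values we assigned,
# at the cost of temporarily doubling the memory used for the batch:
ExampleModel.objects.bulk_create(
    deepcopy(create_instances),
    update_conflicts=True,
    update_fields=['data_property_2'],
    unique_fields=['key_1', 'key_2'],
)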
With that said, I am not sure at the moment what other easy fixes there would be.
As another measure, the documentation could clarify and warn about this behavior, much as it already warns about potential queryset evaluation caused by converting the input to a list internally.
I will be happy to collaborate and provide further information if needed. As of now I am not sure this deserves a separate ticket, but I will create one if it does.