Sorry? This is not on the table.
My bad, I thought we put it on the table
It would be very nice of you to share a minimal code sample of what you have implemented in your own project to achieve the opt out mechanism. You can also add a link to that comment to your original ticket on Trac so people can view this discussion.
What I was suggesting was to introduce a setting that lets you delay the new behavior for one version or so.
Hello everyone,

Following the ongoing discussion about the changes made to `update_or_create` in Django 4.2, and considering my current work on extending the `bulk_create` method to integrate `F()` and `Q()` expressions (see Ticket #34277 and the related GitHub Pull Request), I would like to propose a similar enhancement for `update_or_create`.

The idea is to enable `update_or_create` to handle `F()` and `Q()` expressions. This would offer greater flexibility and allow for more dynamic and conditional update or creation operations. For instance, it could enable updating fields based on their current values or external conditions, or creating new objects with attributes determined by complex conditions.
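To make the idea concrete, here is a toy, Django-free sketch of the semantics that `F()` support in `defaults` would give. The `F` class and `update_or_create` function below are simplified stand-ins I made up for illustration, not Django's real implementations; the point is that the deferred expression is resolved against the row's *current* value at write time.

```python
class F:
    """Toy stand-in for django.db.models.F: a deferred column reference."""

    def __init__(self, name, delta=0):
        self.name, self.delta = name, delta

    def __add__(self, delta):
        return F(self.name, self.delta + delta)

    def resolve(self, row):
        # Evaluate against the row's current stored value, not a stale copy.
        return row[self.name] + self.delta


def update_or_create(table, key, defaults):
    """Toy version: 'table' is a dict of rows keyed by the lookup value."""
    created = key not in table
    row = table.setdefault(key, {})
    for field, value in defaults.items():
        # An F expression resolves against the current value, which is what
        # makes concurrent increments safe when done in SQL.
        row[field] = value.resolve(row) if isinstance(value, F) else value
    return row, created


table = {"home": {"hits": 10}}
row, created = update_or_create(table, "home", {"hits": F("hits") + 1})
print(row["hits"], created)  # 11 False
```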
Extending `update_or_create` in this way could potentially address some of the issues raised by the current changes in Django 4.2, offering more control and precision in managing object updates and creations.

I am eager to hear your thoughts on this proposal and whether this could be an interesting avenue to explore for enhancing the functionality of `update_or_create` in Django.

Thank you for your thoughts and suggestions.

Best regards,
Yeah sorry, I was being ambitious/facetious about the update_fields thing. I think adding a setting bypass would be a nice addition to help people in a similar situation to me make the transition to 4.2
Another question I want to ask on this topic. The documentation states that when passing `update_fields`, "There will be a slight performance benefit from preventing all of the model fields from being updated in the database". Was the behavior change of `update_or_create` in 4.2 solely to take advantage of this "slight" performance increase?
It would be nice to not have to go back over tens of thousands of lines of code. Upgrading a central package like Django will inevitably need some patching, but this seems like an opportunity to mitigate a big one with a few lines of code: preserve the functionality introduced in 4.2 while providing options for backwards compatibility, so a global search-and-replace suffices instead of reworking every extended save method.
And to be clear, I'm all for efficiency gains. I will start using `update_fields` in my `update_or_create()` calls moving forward for sure!
I'll try this and report back! I would like to use the latest version, so if this can help me bypass this new forced behavior it's worth a spin.
Aren't there all sorts of things that could cause this? This would never happen in my project or any project using ACID-compliant databases. If you have many concurrent writes and require atomic transactions, wouldn't you want to address that at the database level instead of changing the behavior of manager methods like `update_or_create`, when there are many other scenarios in which you would still have data-loss problems?
Here is our CustomManager solution:

```python
from django.db import transaction
from django.db.models import Field, Manager
from django.db.models.utils import resolve_callables


class CustomManager(Manager):
    """Custom manager that by default reverts the behavior of
    update_or_create to the pre-4.2 behavior and adds the ability to
    pass extra kwargs to save()."""

    def update_or_create(
        self, defaults=None, extra_save_kwargs=None, use_update_fields=False, **kwargs
    ):
        """
        Look up an object with the given kwargs, updating one with defaults
        if it exists, otherwise create a new one.
        Return a tuple (object, created), where created is a boolean
        specifying whether an object was created.
        """
        defaults = defaults or {}
        extra_save_kwargs = extra_save_kwargs or {}
        self._for_write = True
        with transaction.atomic(using=self.db):
            # Lock the row so that a concurrent update is blocked until
            # update_or_create() has performed its save.
            obj, created = self.select_for_update().get_or_create(defaults, **kwargs)
            if created:
                return obj, created
            for k, v in resolve_callables(defaults):
                setattr(obj, k, v)

            update_fields = set(defaults)
            concrete_field_names = self.model._meta._non_pk_concrete_field_names
            # update_fields does not support non-concrete fields.
            if concrete_field_names.issuperset(update_fields) and use_update_fields:
                # Add fields which are set on pre_save(), e.g. auto_now fields.
                # This is to maintain backward compatibility as these fields
                # are not updated unless explicitly specified in the
                # update_fields list.
                for field in self.model._meta.local_concrete_fields:
                    if not (
                        field.primary_key or field.__class__.pre_save is Field.pre_save
                    ):
                        update_fields.add(field.name)
                        if field.name != field.attname:
                            update_fields.add(field.attname)
                obj.save(
                    using=self.db, update_fields=update_fields, **extra_save_kwargs
                )
            else:
                obj.save(using=self.db, **extra_save_kwargs)
        return obj, False
```
It was not solely for performance but also for correctness. Updating all fields is simply wrong in concurrent scenarios, in my experience. When the whole object is updated, all other requests that updated parts of the object in the meantime (like incrementing a view count by one) will have their effects reversed. One could argue that this is a data-loss bug at the framework level. I for one was very much surprised when I figured out that the whole object was updated.
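The lost-update scenario described above can be simulated in plain Python, without a database. The dicts below are stand-ins for table rows; nothing here is Django API, it just shows why writing the full row loses a concurrent increment while writing only the changed columns does not:

```python
# Two "requests" load the same row, then both save.
db = {"title": "Hello", "views": 10}

a = dict(db)  # request A loads the row
b = dict(db)  # request B loads the row concurrently

a["views"] += 1
db.update(a)  # A saves the FULL row -> views == 11

b["title"] = "Hello, world"
db.update(b)  # B saves the FULL row -> silently writes views back to 10

print(db["views"])  # 10: A's increment was lost

# With update_fields-style semantics each request writes only what it changed:
db2 = {"title": "Hello", "views": 10}
a, b = dict(db2), dict(db2)
a["views"] += 1
db2["views"] = a["views"]      # A writes only the "views" column
b["title"] = "Hello, world"
db2["title"] = b["title"]      # B writes only the "title" column

print(db2["views"])  # 11: the increment survives
```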
I'm not sure I am following you. The default behavior of `save` is `update_fields=None`, meaning the entire object is re-written to the DB. `update_or_create` is now calling `save` in its non-default configuration. So, you are saying anytime I call `save()` manually using the defaults I am at risk of data loss?
" If update_fields
is passed in, only the pre_save()
methods of the update_fields
are called. For example, this means that date/time fields with auto_now=True
will not be updated unless they are included in the update_fields
."
Does update_or_create now break auto_now?
Yes, depending on how your application works, every call to `save()` is a potential data loss. One could say this might be expected for `save()`, since its purpose is usually to save the whole model. For `update_or_create()` it is less clear. There is technically not even a need to call `.save()`; it could just as well utilize `create()` or `update()`. The docs don't even guarantee that `.save()` would be called at all (it is, but that can be considered an implementation detail).

Sidenote: the only case where `save()` is actually safe, in the sense that it does something without possibly breaking something else, is when it creates a new object or is used in conjunction with `select_for_update` (though only if you don't use "SELECT FOR UPDATE OF …").
Another way around this is to have optimistic updates (optimistic locking). There you essentially have a version column, and each and every update includes "SET version = version + 1 WHERE version = known_version_we_are_updating_from". If this update then returns zero affected rows, you know that another request updated your object in the meantime. Whether your data is important enough to use this depends on your application.
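That version-column pattern can be sketched with the standard-library `sqlite3` module rather than Django; the table and function names below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counter (id INTEGER PRIMARY KEY, hits INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO counter VALUES (1, 10, 7)")


def save_if_unchanged(conn, pk, known_version, new_hits):
    # The WHERE clause only matches if nobody bumped the version since we
    # read the row; rowcount == 0 signals a concurrent update happened.
    cur = conn.execute(
        "UPDATE counter SET hits = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_hits, pk, known_version),
    )
    return cur.rowcount


first = save_if_unchanged(conn, 1, 7, 11)   # we saw version 7, write lands
second = save_if_unchanged(conn, 1, 7, 12)  # version is now 8, write rejected
print(first, second)  # 1 0
```

On a rejection the caller would typically re-read the row and retry, or surface a conflict to the user.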
No, please refer to the patch, which explicitly checks for `pre_save` overrides when building the `update_fields` set; that is why derived fields with a `pre_save` override, such as the ones with `auto_now`, just work.

This is why a `pre_save` override was suggested in a previous reply.
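For reference, that override detection boils down to a method-identity check, the same `field.__class__.pre_save is Field.pre_save` test seen in the manager code earlier in this thread. A plain-Python sketch (these classes are simplified stand-ins I wrote for illustration, not Django's `Field`):

```python
import time


class Field:
    def pre_save(self, instance, add):
        # Base behavior: just return the value already set on the instance.
        return getattr(instance, self.name)


class PlainField(Field):
    pass  # inherits Field.pre_save unchanged


class AutoNowField(Field):
    def pre_save(self, instance, add):
        # Derived value computed at save time, like auto_now=True.
        return time.time()


for cls in (PlainField, AutoNowField):
    # "Has a pre_save override" exactly when the method differs from the base.
    overridden = cls.pre_save is not Field.pre_save
    print(cls.__name__, overridden)
# PlainField False
# AutoNowField True
```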
Ok, I understand, thank you for clarifying. But this solution would require swapping out every single field (that you want excluded, which for us is a lot) with your custom field class that extends/overrides `pre_save`, right?
Well, if it's not documented how it's implemented, and the documentation makes no guarantees on its implementation, and the user has no power to influence it… it seems like a risky method to use in general in its current state…
Correct, hence why the `save` override, either manual or through a model mixin as @sarahboyce suggested, or the usage of `GeneratedField` when possible, seems like a better solution to me.

Using `pre_save` remains a good solution if you have a few derived fields that have the exact same implementation for a few of your models.
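A minimal sketch of the save-override mixin idea (the mixin and field names are hypothetical; `FakeModel` stands in for `models.Model` so the snippet runs without Django):

```python
class FakeModel:
    """Stand-in for django.db.models.Model: just records the save kwargs."""

    def save(self, *args, **kwargs):
        self.saved_with = kwargs  # what would be sent to the database


class TouchUpdatedAtMixin:
    """Ensure a derived column is written even when update_fields narrows the save."""

    def save(self, *args, **kwargs):
        update_fields = kwargs.get("update_fields")
        if update_fields is not None:
            # Force the derived field into every narrowed save.
            kwargs["update_fields"] = set(update_fields) | {"updated_at"}
        super().save(*args, **kwargs)


class Article(TouchUpdatedAtMixin, FakeModel):
    pass


article = Article()
article.save(update_fields={"title"})
print(sorted(article.saved_with["update_fields"]))  # ['title', 'updated_at']
```

With real Django models the mixin would simply sit before `models.Model` in the class's bases, so every narrowed save still touches `updated_at`.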
That is a pessimistic take. The method does what it is documented to do: it creates or updates a row. For what it is worth, basically no method in Django's docs describes how it works internally.
Yeah, but `update_or_create` could also use `QuerySet.update` instead and would accomplish the same result. Then, all of a sudden, you would lose all of the post/pre save signals that you previously banked on running when using the `update_or_create` method. So to say the implementation doesn't matter as long as it accomplishes its stated goal is not really fair. I'm not intending to be pessimistic, but the example I just gave would have still broken a lot of code even if everyone was very careful about `update_fields` handling.