update_or_create Behavior

Sorry? This is not on the table.

My bad, I thought we put it on the table

It would be very nice of you to share a minimal code sample of what you have implemented in your own project to achieve the opt out mechanism. You can also add a link to that comment to your original ticket on Trac so people can view this discussion. :+1:

1 Like

What I was suggesting was to introduce a setting that letā€™s you delay the new behavior for one version or so.

Hello everyone,

Following the ongoing discussion about the changes made to update_or_create in Django 4.2, and considering my current work on extending the bulk_create method to integrate F and Q expressions (see Ticket #34277 and the related GitHub Pull Request), I would like to propose a similar enhancement for update_or_create.

The idea is to enable update_or_create to handle F and Q expressions. This would offer greater flexibility and allow for more dynamic and conditional update or creation operations. For instance, it could enable updating fields based on their current values or external conditions, or creating new objects with attributes determined by complex conditions.

Extending update_or_create in this way could potentially address some of the issues raised by the current changes in Django 4.2, offering more control and precision in managing object updates and creations.

I am eager to hear your thoughts on this proposal and whether this could be an interesting avenue to explore for enhancing the functionality of update_or_create in Django.

Thank you for your thoughts and suggestions.

Best regards,

Yeah sorry, I was being ambitious/facetious about the update_fields thing. I think adding a setting bypass would be a nice addition to help people in a similar situation to me make the transition to 4.2

Another question I want to ask on this topic. Documentation states with passing update_fields ā€œThere will be a slight performance benefit from preventing all of the model fields from being updated in the databaseā€. Was the behavior change of update_or_create in 4.2 solely to take advantage of this ā€œslightā€ performance increase?

It would be nice to not have to go back over 10s of thousands of lines of code. Upgrading a central package like Django will inevitably need some patching but this seems like an opportunity to mitigate a big one with a few lines of code that will preserve the functionality of the change introduced in 4.2 but provide options for backwards compatibility with global search or replace versus reworking every extended save method.

1 Like

and to be clear, Iā€™m all for efficiency gains. I will start using update_fields in my update_or_create()s moving forward for sure!

Iā€™ll try this and report back! I would like to use latest version so if this can help my bypass this new forced behavior its worth a spin.

Isnā€™t there all sorts of things that could cause this? This would never happen in my project or any project using ACID compliant databases. If you are many concurrent writes and require atomic transactions wouldnā€™t you want to address that at the database level instead of changing the behavior of manager methods like update_or_create when there are many other scenarios in which you would still have data loss problems?

Here is our CustomManager solution:

class CustomManager(Manager):
    """ Custom Manager that by default reverts the behavior of update_or_create to < 4.2 behavior and adds ability to pass extra kwargs to save """
    def update_or_create(self, defaults=None, extra_save_kwargs=None, use_update_fields=False, **kwargs):
        """
        Look up an object with the given kwargs, updating one with defaults
        if it exists, otherwise create a new one.
        Return a tuple (object, created), where created is a boolean
        specifying whether an object was created.
        """
        defaults = defaults or {}
        extra_save_kwargs = extra_save_kwargs or {}
        self._for_write = True
        with transaction.atomic(using=self.db):
            # Lock the row so that a concurrent update is blocked until
            # update_or_create() has performed its save.
            obj, created = self.select_for_update().get_or_create(defaults, **kwargs)
            if created:
                return obj, created
            for k, v in resolve_callables(defaults):
                setattr(obj, k, v)

            update_fields = set(defaults)
            concrete_field_names = self.model._meta._non_pk_concrete_field_names
            # update_fields does not support non-concrete fields.
            if concrete_field_names.issuperset(update_fields) and use_update_fields:
                # Add fields which are set on pre_save(), e.g. auto_now fields.
                # This is to maintain backward compatibility as these fields
                # are not updated unless explicitly specified in the
                # update_fields list.
                for field in self.model._meta.local_concrete_fields:
                    if not (
                        field.primary_key or field.__class__.pre_save is Field.pre_save
                    ):
                        update_fields.add(field.name)
                        if field.name != field.attname:
                            update_fields.add(field.attname)
                obj.save(using=self.db, update_fields=update_fields, **extra_save_kwargs)
            else:
                obj.save(using=self.db, **extra_save_kwargs)
        return obj, False
2 Likes

It was not solely for performance but also for correctness. Updating all fields is simply wrong in concurrent scenarios in my experience. When the whole object is updated all other requests that updated parts of the object in the meantime (like incrementing a view count by one) will have their effects reversed. One could argue that this is a dataloss bug at the framework level. I for one was very much surprised when I figured out that the whole object was updated.

Iā€™m not sure I am following you. The default behavior of save is update_fields=None meaning the entire object is re-written to the DB. update_or_create is now calling save in its non-default configuration. So, you are saying anytime I call save() manually using the defaults I am at risk of data-loss?

" If update_fields is passed in, only the pre_save() methods of the update_fields are called. For example, this means that date/time fields with auto_now=True will not be updated unless they are included in the update_fields ."

Does update_or_create now break auto_now?

Yes, depending on how your application works every call to save() is a potential data loss. One could say this might be expected for save() since itā€™s purpose is to save the whole model usually. For update_or_create() it is less clear. There is technically not even a need to call .save(), it could just as well utilize create() or update(). The docs donā€™t even guarantee that .save() would be called at all (it is but that can be considered an implementation detail).

Sidenote: the only case where save() is actually safe in the sense that it does something without possibly breaking something else is when it creates a new object or is used in conjunction with select_for_update (though only if you donā€™t use ā€œSELECT FOR UPDATE OF ā€¦ā€.

Another way around this is to have opportunistic updates. There you essentially have a version column and each and every update includes ā€œset version = version + 1 where version = known_version_we_are_upfating_fromā€. If this update then return zero affected rows you know that another request updated your object in the meantime. Whether your data is important enough to use this depends on your application.

No, please refer to the patch which explicitly check for pre_save overrides when building the update_fields set hence why defining derived fields with a pre_save override, such as the ones with auto_now just works.

This is why a pre_save override was suggested in a previous reply

Ok, I understand, thank you for clarifying. But with this solution would require swapping out every single field (that you want excluded, which for us is a lot) with your custom Field Class that extends/overrides pre_save, right?

Well if its not documented how itā€™s implemented, and the documentation makes no guarantees on its implementation, and the user has no power to influence itā€¦seems like a risky method to use in general in its current stateā€¦

Correct, hence why the save override, either manual or though a model mixin as @sarahboyce suggested, or the usage of GeneratedField when possible seem like a better solution to me.

Using pre_save remains a good solution if you have a few derived fields that have the exact same implementation for a few of your models.

1 Like

That is a pessimistic take. The method does what it is documented to do - it creates or updates a row. For what it is worth basically no method in Djangoā€™s docs describe how they work internally.

Yeah but update_or_create could also use QuerySet.update instead and would also accomplish the same result. Then then all of a sudden you would lose all of your post/pre save signals that you previously banked on running when using the update_or_create method. So to say the implementation doesnā€™t matter as long as it accomplishes its stated goal is not really fair. Iā€™m not intending to be pessimisitic, but the example I just gave would have still broken a lot of code even if everyone was very careful about update_fields handling