Automatically setting update_fields when saving a model instance

After running into concurrency issues where my Django model instances were being overwritten by previous values due to .save() not having update_fields specified, I began refactoring to specify update_fields wherever possible. This process proved more difficult than anticipated for complex functions and during the process I had a thought - if we can reliably know which fields have changed in-memory during thread execution (tracked on the model instance), can we automatically set update_fields?

This would make concurrency much safer as even if the instance gets “overwritten”, only the overlapping fields are an issue.
Though I can see how this could be undesirable in some cases, it seems like majority of cases would benefit from having update_fields restricted to only fields which have been updated.

Here’s a draft of a BaseModel that can hopefully be used to achieve this functionality. Appreciate any insight into problematic side-effects this may cause.
Note: this accounts for deferred fields and does not incur additional database queries.

from django.db import models


class _Null:
    """A class to represent a Null type."""

    def __bool__(self):
        return False


NULL = _Null()


class BaseModel(models.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__original_values = self._get_loaded_field_values()

    def _get_loaded_field_names(self):
        concrete_fields = self._meta.concrete_fields
        normalized_deferred_fields = []
        for deferred_field in self.get_deferred_fields():
            if deferred_field not in concrete_fields and deferred_field.endswith("_id"):
                deferred_field = deferred_field[:-3]
            normalized_deferred_fields.append(deferred_field)
        return [
            field.name
            for field in concrete_fields
            if field.name not in normalized_deferred_fields
        ]

    def _get_loaded_field_values(self):
        field_values = {}
        for field_name in self._get_loaded_field_names():
            foreign_key_id = getattr(self, field_name + "_id", NULL)
            if foreign_key_id is not NULL:
                field_values[field_name] = foreign_key_id
            else:
                field_values[field_name] = getattr(self, field_name, NULL)
        return field_values

    def _get_update_fields(self):
        update_fields = []
        for field_name, current_field_value in self._get_loaded_field_values().items():
            original_field_value = self.__original_values.get(field_name, NULL)
            if (
                current_field_value is not NULL
                and current_field_value != original_field_value
            ):
                update_fields.append(field_name)
        return update_fields

    def refresh_from_db(self, *args, **kwargs):
        super().refresh_from_db(*args, **kwargs)
        self.__original_values = self._get_loaded_field_values()

    def save(self, *args, update_fields=None, **kwargs):
        """
        Automatically set `update_fields` (if not explicitly specified)
        based on which fields have been modified
        """
        creating = self._state.adding is True
        if not creating and update_fields is None:
            update_fields = self._get_update_fields()
        super().save(*args, update_fields=update_fields, **kwargs)
        self.__original_values = self._get_loaded_field_values()
3 Likes

I’d be very happy to see improvements in this area. I’m long time Django user and I did not realize how big this problem is until we received many support tickets about pattern IMHO common to many projects: our users changed their password in UI but it was reset to the old one because we had a very long cron job running with

for u in User.objects.all():
   do_something()
   u.save()

I can imagine that your approach hits many edge cases but if Django cannot solve this automatically, at least the docs should guide users to always pass update_fields to save (which we actually did in our project - enforce update_field to be present on every update).

1 Like

Always using update fields can lead to data inconsistency. If I had an inventory record that had quantity and name fields. I read it has 12 Pet Rocks and I change it to 12 Monster Trucks. At the same time someone else is readying 12 Pet Rocks and changes it to 15 Pet Rocks. After both partial updates go through we end up with 15 Monster Trucks. I would rather have last write wins in that scenario.

@megaman821 That would be a good example of an undesirable case. I think it can be described more generally as

initial_state = Model.objects.get()
final_state = do_stuff(initial_state)
final_state.save()

Where another process is running concurrently to change to a different final state. If update fields are restricted then you can end up with a combination of final states which could be an invalid or unexpected state. Which is definitely fair criticism.
We could always change the usage to handle both scenarios depending on what you want. For example, we could do something like .save(automatic_update_fields=True) or in the reverse default .save(update_fields=["__all__"]).

Update: I have been running this on some of my models in production for almost 2 months now without any known issues/drawbacks so far and has eliminated the concurrency problems (in the rarer case when fields are overlapping I use database locks to maintain data consistency)