Hook for extending `update_fields` in save() overrides

What do folks think about shipping an extend_update_fields() util to make manipulating the update_fields argument in in overridden save() methods less error-prone?

When upgrading a project to 4.2 I needed this in three or so models. I was in a bit of a hurry, so it took me two pull requests to do it. (The first time I forgot to handle iterables other than sets, because I was overly-focused on writing tests to capture the update_or_create() 4.2 behavior change scenario.)

Round 2 looked like this. (I’m just showing my thinking process and how I still missed cases. It’s not supposed to be optimized for performance or anything. You should use Simon’s suggestion because it doesn’t mutate the kwargs in place.)

def extend_update_fields(kwargs, field_name):
    """Update `update_field` inside `kwargs` (if present) in-place with `field_name`."""
    if (update_fields := kwargs.get("update_fields")) is not None:
        # DON'T do this: you shouldn't mutate the input
        if isinstance(update_fields, set):
            new = set(update_fields)
            kwargs["update_fields"] = new

This is a lot like Simon’s suggestion here, other than not eagerly converting to a set and only accepting one new field. EDIT: and mutating the kwargs in place.

However, both of our implementations missed *args. So the above snippet is still not compliant with the model.save() contract, because it doesn’t handle save(False, False, None, []).

Of course, Simon was sketching, not writing production code :stuck_out_tongue:. But as I’m considering whether to take a third go at this, it struck me – couldn’t we just ship a blessed util for this? Then our advice for auditing overridden save() methods would be something like: “scan for assignments to self, and pair them with extend_update_fields()”.

Hello @jacobtylerwalls!

What’s the reason to want to avoid creating a new set so bad? These objects should normally have only a few members as most models usually don’t have more than 20 fields and usually when you use update_fields you want to target a small subset of them a so a new set should have a low memory foot print.

From some local testing it takes about as much time to perform isinstance(update_fields_list, set) than to systematically do set(update_fields) and it’s also usually bad practice to alter a mutable argument passed by reference as that can unintended side effects. On CPython 3.10.x it appears to be faster with update_fields: set than other iterables so it seems like a faster path is taken already for creating shallow copies.

What if the save caller was planing to use update_fields: set for other purpose after calling save? A similar issue was caught recently with refresh_from_db.

With JSONField maturing in core without proper update_fields={'json_field__key'} support and a long standing desire to entirely replace auto_now and friends with fields that use default but have a way to opt-in into update_fields injection I think we’ll want to revisit how this feature is implemented in the future.

I’m not convinced that a core function is warranted here. I think it might be better to invest efforts in coming up with a generic solution to the problem of derived / generated fields and their interactions with update_fields.

Oh I do prefer your solution! :slight_smile:

I wasn’t saying my snippet was better, sorry–I was just trying to show that what seemed simple at first spiraled out into several further subtle points. (I’ll edit my post to clarify that.) I was just trying to suggest that each time I touched this, I found one more gotcha (i.e. *args not handled), and now you’ve helpfully called my attention to one more, re: defending against the caller doing something else with update_fields beyond using them in save().

So the journey went something like this:

  1. Read the release note or forum discussion and realize you need to patch update_fields. (In my case, I first ignored it when starting on the upgrade tasks a few months back, as it looked like a situation where if something was wrong, it had been wrong for a while, which would compete with other priorities, but I gave it another look this week, thinking “this should be quick”)
  2. You consult “overriding predefined model methods” as linked in the docs; the example uses a set, you write a regression test to build familiarity with what your save method was trying to accomplish, all is groovy
  3. You realize later any iterable is accepted, when you discover your own view (e.g. in some other project) sending update_fields as a list (ouch!)
  4. You realize later that save() allows positional arguments also, despite the release note saying “keyword argument” (also ouch, but less embarrassing)
  5. You realize later altering a mutable argument may cause unwanted side effects

That just seemes like a series of delicate things to get right for every maintainer of a project with custom save()'s, unless they have the cycles to put to a bigger refactor.

Of course nothing beats “slow down and think” :wink: , but just wanted to sound folks out to see whether there was interest in smoothing this path. I am sympathetic to this, though:

I think it might be better to invest efforts in coming up with a generic solution to the problem of derived / generated fields and their interactions with update_fields .

Another piece was determining what update_fields does with empty iterables; not the same as None. I went to the source to find out–it’s not discussed explicitly in the docs.

I might end up with something like this:

### Hypothetical current code ###
class MyModel(models.Model):
    def save(self, *args, **kwargs):
        if altering_field_one:
            self.field_one = 1
        if altering_field_two:
            self.field_two = 2
        super().save(*args, **kwargs)

### Correct version ###
def extend_update_fields(args, kwargs, fields_to_add: set):
    if len(args) >= 4 and (update_fields := args[3]) is not None:
        args[3] = set(update_fields) | fields_to_add
    elif (update_fields := kwargs.get("update_fields")) is not None:
        kwargs["update_fields"] = set(update_fields) | fields_to_add

    return args, kwargs

class MyModel(models.Model):
    def save(self, *args, **kwargs):
        if altering_field_one:
            self.field_one = 1
            extend_update_fields(args, kwargs, {"field_one"})
            args, kwargs = extend_update_fields(args, kwargs)
        elif altering_field_two:
            self.field_two = 2
            extend_update_fields(args, kwargs, {"field_two"})
            args, kwargs = extend_update_fields(args, kwargs)
        super().save(*args, **kwargs)

I think we could reasonably deprecate positional args for save(), and many other Django functions with several arguments where such calls are unclear and hopefully rare.

1 Like

For what is worth, in my head and in all the code bases I have worked with, update_fields was always a list. I never stopped thinking why until this post, and while I don’t have a clear answer, there is something about maintaining the order of the added fields to update_fields that feels more robust when debugging code/chasing bugs.

I would certainly advice to not assume that most people/code bases are using a set (not that I’m saying that this is currently proposed, but I think it’s worth saying it out loud). A quick and naïve search on GitHub returns 2.9K code matches for update_fields.append while only 163 code matches for update_fields.set.

1 Like

Thanks for sounding this out y’all! Opened a ticket to suggest making save() keyword-only.