update_or_create(defaults=None) sends update_fields=set() rather than None

I have a use case for accessing side effects trapped on a model’s save() method but aborting the underlying save with update_fields=[], as documented.

To do that, I needed to alter my update_fields helper (also documented) from:

if (update_fields := kwargs.get("update_fields")) is not None:  # wrong
    kwargs["update_fields"] = ...

to:

if update_fields := kwargs.get("update_fields"):
    ...

to avoid adding a field to update_fields in the event that it was deliberately an empty iterable. This brought my helper in line with the documentation.

However, this caused the regression test for my original helper to fail.

obj, created = Model.objects.update_or_create(pk=foo, other=bar)
baz_value = obj.baz
obj.refresh_from_db()
self.assertEqual(obj.baz, baz_value)

The baz value is calculated in save() but not persisted because update_or_create() sends set() instead of None when defaults=None. This in turn makes it impossible for my helper to distinguish whether it’s appropriate to add baz to the update_fields.

In short, it seems I can either ensure .update_or_create(defaults=None) works, or .save(update_fields=[]) works, but not both.


This is a nice demonstration of the pitfalls of implementing logic inside save(), but I’m not in a position to direct any such rework like that. Instead, I’m wondering: should update_or_create(defaults=None) be a little more careful about sending None rather than set() to update_fields? The docs say update_or_create() is shorthand for calling save(), so my impression is that it should delegate these decisions to save() instead of presumptuously passing set().


This diff passes the test suite:

diff --git a/django/db/models/query.py b/django/db/models/query.py
index 1270141619..cef769c6c2 100644
--- a/django/db/models/query.py
+++ b/django/db/models/query.py
@@ -1009,7 +1009,7 @@ class QuerySet(AltersData):
                         update_fields.add(field.name)
                         if field.name != field.attname:
                             update_fields.add(field.attname)
-                obj.save(using=self.db, update_fields=update_fields)
+                obj.save(using=self.db, update_fields=update_fields or None)
             else:
                 obj.save(using=self.db)
         return obj, False

Ideas?

I’m not entirely familiar with your use case but I’m pretty confident we can’t change update_or_create to pass update_fields or None without defeating the purpose why it was introduced in the first place.

If there are no fields to update no save should take place so it could even be argued that save should not be called at all when update_fields is empty

diff --git a/django/db/models/query.py b/django/db/models/query.py
index 9d29c47332..8c9434b055 100644
--- a/django/db/models/query.py
+++ b/django/db/models/query.py
@@ -1006,7 +1006,8 @@ def update_or_create(self, defaults=None, create_defaults=None, **kwargs):
                         update_fields.add(field.name)
                         if field.name != field.attname:
                             update_fields.add(field.attname)
-                obj.save(using=self.db, update_fields=update_fields)
+                if update_fields:
+                    obj.save(using=self.db, update_fields=update_fields)
             else:
                 obj.save(using=self.db)
         return obj, False

The baz value is calculated in save() but not persisted because update_or_create() sends set() instead of None when defaults=None.

Hard to tell without an exact test case but is there a reason why you compute baz and assign it if it’s not meant to be persisted? Can’t you gate the computation and assignment with the same conditional you already use to augment update_fields when necessary?

Thanks for having a look!

It is meant to be persisted. I noticed today I wasn’t handling the case where it shouldn’t be persisted correctly (the empty iterable case).

Here’s a fiddle with what I mean.

If there are no fields to update no save should take place so it could even be argued that save should not be called at all when update_fields is empty

I guess that’s my question – create() delegates to save() without requiring knowledge of whatever fields save() manages itself. But update_or_create() does require this knowledge post ticket-32095.

I’m pretty confident we can’t change update_or_create to pass update_fields or None without defeating the purpose

Ah, right, of course. I guess my choices are:

  • bite bullet: decouple cleaning from save
  • bite bullet: factor recalculations out of save methods into custom fields with pre_save() implemented
  • hack: audit all update_or_create() usage to impart knowledge of the fields that save() manages itself (not future-proof)
  • hack: continue mishandling empty iterables for update_fields in general, but define a custom empty iterable I can sniff for when it’s important to handle it correctly (i.e. when I truly want no fields to be updated)

1 is a huge lift. 2 would grow the number of custom fields more than I’d like. 3 doesn’t sound right. I’ll probably opt for 4, although it’s not strictly correct. I was just checking my understanding before going down that path.

I’ll probably just do the strictly incorrect thing only temporarily, emit warnings, and allow time for update_or_create() callsites to determine if they really need the injected fields from the save() call or not. Should eventually get correctness that way :smiley:

I agree with the following from the last time I asked for advice on this topic:

Maybe the next stop should be a new-features repo ticket to sound out what that generic solution might look like?


Finally, should we update the pseudocode in the docs for update_or_create() to stop showing save() given that it’s described here as an implementation detail?