save() behavior when updating model with default primary keys

Eager to hear advice about a scenario I encountered in which I felt the ORM didn’t live up to the maxim that “Django abstracts the need to use INSERT or UPDATE SQL statements.”


Example: Create data, then update that data without a model instance in hand.

class WithoutDefault(models.Model):
    name = models.CharField(primary_key=True)
    message = models.CharField(null=True)

In [1]: from my_app.models import WithoutDefault

In [2]: snow = WithoutDefault(name="snow")

In [3]: snow.save()  # imagine this instance goes away

In [4]: remade_snow = WithoutDefault(name="snow", message="fell")

In [5]: remade_snow.save()

In [6]: remade_snow.message
Out[6]: 'fell'

Same thing, but the primary key has a default.

class WithDefault(models.Model):
    name = models.CharField(primary_key=True, default="unrealistic default")
    message = models.CharField(null=True)

In [1]: from my_app.models import WithDefault

In [2]: snow = WithDefault(name="snow")

In [3]: snow.save()  # imagine this instance goes away

In [4]: remade_snow = WithDefault(name="snow", message="fell")

In [5]: remade_snow.save()
---------------------------------------------------------------------------
UniqueViolation                           Traceback (most recent call last)

UniqueViolation: duplicate key value violates unique constraint "myapp_withdefault_pkey"
DETAIL:  Key (name)=(snow) already exists.

The above exception was the direct cause of the following exception:

IntegrityError: duplicate key value violates unique constraint "myapp_withdefault_pkey"
DETAIL:  Key (name)=(snow) already exists.

Debugging narrative

  • Aha, I must need an UPDATE. But isn’t the ORM supposed to abstract that away?
  • Try force_update=True (doesn’t do the trick)
  • Try force_update=True, update_fields=None, (doesn’t do the trick)
  • Review the docs for save():

Django executes an UPDATE if it is an existing model instance and primary key is set to a value that exists in the database.

  • Didn’t grok “existing model instance” at first. I constructed an instance, and the value exists in the database. Does the instance exist? :sparkles:
  • See the discussion about select_on_save() (doesn’t do the trick)
  • Issue WithDefault.objects.filter(pk="snow").delete() before the save and carry on. (Or, I could fetch the instance and then assign attributes to it.)

Impressions

  • I was in an “update_or_create”-like scenario, where I didn’t know or care whether “snow” existed before. In fact, this works fine:
In [11]: WithDefault.objects.update_or_create(name="snow", defaults={"message": "fell"})
Out[11]: (<WithDefault: WithDefault object (snow)>, False)
  • But I was working with a model with over a dozen attributes. Shoving them all into “defaults” just to update one row felt a little funny–especially after I debugged my program and realized I was in a scenario where I would always be updating.
  • I really just wanted to be able to construct an instance and save(), and the only reason it didn’t work was that the primary key field had a default, which struck me as an orthogonal detail.

Questions

  • Should the ORM have attempted an UPDATE instead here?
  • If not, should force_update=True have at least worked?
  • How do folks feel about a reword of “existing” model instance to “fetched” model instance given that “exists” is used in two senses in the same sentence?

Notes
My more realistic default was a UUIDField with default=uuid.uuid4; I just didn’t want to uglify my shell example. :wink:

I’m moving this to the internals category because it does seem like a bug to me - apparently one that has been around for a while.

I am able to recreate these symptoms, with Django going as far back as Django 3.0

I’d like to get more knowledgeable eyes on this.

1 Like

Thanks @KenWhitesell! I just found the Trac ticket with a pretty good tl;dr

That would break the following scenario though.

a = Account(pk='known-uuid-pk')
a.title = new_title
a.save() # expects an UPDATE here.

But I would argue that force_update should be passed in this case.

I can move this over to Trac and suggest that we allow force_update to work here?

I found the section in the docs at Model instance reference | Django documentation | Django that I think covers this.

The one gotcha here is that you should be careful not to specify a primary-key value explicitly when saving new objects, if you cannot guarantee the primary-key value is unused. For more on this nuance, see Explicitly specifying auto-primary-key values above and Forcing an INSERT or UPDATE below.

So apparently this is expected behavior, and as such, I’m moving this thread back to the “Using Django” bucket.

Additional information, see the Backward incompatible changes section of the Django 3.0 release notes. Those notes cover this precise situation.

Great find on the 3.0 release notes, Ken–very helpful.

There’s an extensive discussion on the PR that added that release note that reads to me like it (understandably!) accepts the premise that there wasn’t a better underlying solution to be had.

I don’t have a strong feeling either way even if I think the changes proposed here makes the most sense as I’ve explained on Trac. It feels like in order to properly implement this optimization we’d need a way to differentiate between a user provided primary key value and default generated one as Tim mentioned.


I think hope we can do that like this, solving the Trac ticket without causing the warts:
EDIT: this doesn’t pass the tests, but I’ll keep playing with it :crossed_fingers:

diff --git a/django/db/models/base.py b/django/db/models/base.py
index 6eaa600f10..490a1ec31a 100644
--- a/django/db/models/base.py
+++ b/django/db/models/base.py
@@ -1007,6 +1007,7 @@ class Model(AltersData, metaclass=ModelBase):
             not raw
             and not force_insert
             and self._state.adding
+            and not pk_set
             and (
                 (meta.pk.default and meta.pk.default is not NOT_PROVIDED)
                 or (meta.pk.db_default and meta.pk.db_default is not NOT_PROVIDED)

Finally, about the paragraph in the docs about “adding new instances” – it doesn’t really speak to me, since I’m not intending to create a new instance. The end of the paragraph also nudges me toward force_update, but that doesn’t work here.

First, I should issue my disclaimer here. We’re well outside my scope of knowledge, and so everything beyond my very first reply shouldn’t be taken as being authoritative. We’re in an area where I’m expressing opinions based upon what I’m finding - probably at the same time as you.

Second, as a side note, I was actually less surprised that the WithDefault fails than I was to see that the WithoutDefault works!

Having the default specified does nothing in either example, because you are setting the primary key in all cases. So you have an unused attribute in the field definition creating this difference.

Confirming you’re talking about this paragraph?

Django keeps a _state variable to know whether a specific object has been saved to the database. Quoting from those docs:

Newly instantiated instances have adding=True and db=None , since they are yet to be saved.

So by creating an instance of the model as WithDefault(...), this is a new object. It may be a reference to an existing row in a table, but this object has nothing indicating whether that’s the case. (At least, that’s my current interpretation of what I’m reading.)

Still just my interpretation, but one I feel a little more confident about - This is not how I read this. To me, the first sentence is the overriding principle here. Don’t set a primary-key value when saving new objects if the primary-key value is used. The second sentence just provides supporting information to help explain why you shouldn’t do that - not that they provide a workaround.

Combine that with the release notes docs seeming to be very explicit and direct, and I end up concluding that there isn’t an alternative to using update_or_create when the status is unsure. (And that this behavior is very much intentional.)

What I’m finding frustrating is that I can’t seem to find any messages in the mailing lists (Django Users and Django Developers) from the timeframe of 29260 (Feb / March 2018) with any sort of discussion about this or why this was an issue to be solved.

I don’t think your idea here will help the situation.

What you can’t tell from this is whether the pk that is set currently exists in the database. Nor, as I’m reading that method, is it able to identify whether the pk value being set comes from the default. This just puts you back in the position of doing an update before an insert for every save on a new object having a default on the pk.

1 Like

Thanks, this is why I came to django-users first–I wasn’t sure of my footing!

What you can’t tell from this is whether the pk that is set currently exists in the database. Nor, as I’m reading that method, is it able to identify whether the pk value being set comes from the default.

Ugh, the early patch never gets the worm. :slight_smile:

I tried this another way: setting ._state.adding = False at the moment the default is not used. It passes the tests, pending some issues with passing pk= instead of fieldname=. Diff below.

I think that would actually bring things closer to the spirit of the paragraph we’re parsing. I read it to say something like:

Are you providing a primary key?
– Do you intend to be creating a new object?
---- You should be prepared for a primary-key collision to issue an update instead! Beware data loss!

Whereas my scenario was:
Are you providing a primary key? Yes.
Do you intend to create an object? idk, I’m in an overwriting (update-or-create) posture.
You should be prepared…! Wait, I thought the ORM abstracted away the difference between INSERT and UPDATE?


What I’m finding frustrating is that I can’t seem to find any messages in the mailing lists (Django Users and Django Developers) from the timeframe of 29260 (Feb / March 2018) with any sort of discussion about this or why this was an issue to be solved.

I think it was just to optimizing out one UPDATE query on models with auto-generated UUID pk’s. But now it’s costing queries for folks who now need to fetch an instance (directly, or via update_or_create()). It seems like it was implicit from the PR discussions that force_update=True would still be an option for folks. (To paraphrase Carlton, we’re trading one user’s need to call force_insert for someone else’s need to call force_update." But, alas–force_update wasn’t tested.

I think force_update=True not working here is a bug. But I think something like this draft patch would accomplish what the participants on the PRs imagined in the first place (but is it worth it… :thinking: )

diff --git a/django/db/models/base.py b/django/db/models/base.py
index 4a9150bf37..a42448dd12 100644
--- a/django/db/models/base.py
+++ b/django/db/models/base.py
@@ -489,6 +489,8 @@ class Model(AltersData, metaclass=ModelBase):
                 if val is _DEFERRED:
                     continue
                 _setattr(self, field.attname, val)
+                if field.primary_key:
+                    self._state.adding = False
         else:
             # Slower, kwargs-ready version.
             fields_iter = iter(opts.fields)
@@ -501,6 +503,8 @@ class Model(AltersData, metaclass=ModelBase):
                         f"{cls.__qualname__}() got both positional and "
                         f"keyword arguments for field '{field.name}'."
                     )
+                if field.primary_key:
+                    self._state.adding = False
 
         # Now we're left with the unprocessed fields that *must* come from
         # keywords, or default.
@@ -522,6 +526,9 @@ class Model(AltersData, metaclass=ModelBase):
                             val = kwargs.pop(field.attname)
                         except KeyError:
                             val = field.get_default()
+                        else:
+                            if field.primary_key:
+                                self._state.adding = False
                 else:
                     try:
                         val = kwargs.pop(field.attname)
@@ -531,6 +538,9 @@ class Model(AltersData, metaclass=ModelBase):
                         # get_default() to be evaluated, and then not used.
                         # Refs #12057.
                         val = field.get_default()
+                    else:
+                        if field.primary_key:
+                            self._state.adding = False
             else:
                 val = field.get_default()
 
diff --git a/tests/basic/tests.py b/tests/basic/tests.py
index ad82cffe8c..2aded16233 100644
--- a/tests/basic/tests.py
+++ b/tests/basic/tests.py
@@ -1,5 +1,6 @@
 import inspect
 import threading
+import uuid
 from datetime import datetime, timedelta
 from unittest import mock
 
@@ -172,7 +173,14 @@ class ModelInstanceCreationTests(TestCase):
         self.assertTrue(Article.objects.filter(id=a.id).exists())
 
     def test_save_primary_with_default(self):
-        # An UPDATE attempt is skipped when a primary key has default.
+        known_uuid = uuid.uuid4()
+        # An UPDATE attempt is made when a primary key is provided.
+        with self.assertNumQueries(2):
+            PrimaryKeyWithDefault(uuid=known_uuid).save()  # fails if pk= is given
+        with self.assertNumQueries(1):
+            PrimaryKeyWithDefault(uuid=known_uuid).save()
+
+        # Otherwise only an INSERT is attempted.
         with self.assertNumQueries(1):
             PrimaryKeyWithDefault().save()
1 Like

From my perspective, there are a relatively small number of likely combinations here when a default is defined for the pk field.

  1. The pk is not set, a default is provided. The nature of the default is such that the probability of a duplicate pk is effectively zero (e.g., UUID)

  2. The pk is set, the default is irrelevent. The pk is new.

  3. The pk is set, the default is irrelevent. The pk currently exists.

<opinion>
Case 1 is the most common case and works as you expect. (It can safely do an INSERT.)

Case 2 seems unusual to me. Using the analogy of an AutoField, it seems dangerous to want to manually assign a pk when you are expecting to algorithmically generate one. But, Django is still good with this, because an INSERT is still going to work.

Case 3 is the case we’re discussing here. You have a situation where you’re building an object where you don’t know if you’re going to need an update or an insert.

Now, if you don’t have a default for the pk, this does work as you’ve described - Django hides the distinction between an UPDATE and an INSERT.
</opinion>

<conjecture>
Django appears to be making the assumption that a default pk is going to be unique. What it currently does is the most performant choice in this situation.

Additionally, I think it also provides a safety net. If I had a default function to generate a pk, and I manually assign a pk, I’d want to know that I’ve created a conflict - if such a thing were to occur. I kind of cringe at the thought of mistakenly overwriting data. I’d rather make it an explicit choice to accept that happening than to have it happen by default.

So while it does create this wart in the API, I think I’m coming to the conclusion that it is the best overall solution in that it doesn’t sacrifice performance for the common case by doubling the number of SQL statements executed for a new object being saved.
</conjecture>

Thanks for the thoughtful responses, Ken.

Opened a Trac ticket for the one-liner fix to get force_update=True working in this scenario.

1 Like