From my perspective, there are a relatively small number of likely combinations here when a default is defined for the pk field.
- The pk is not set and a default is provided. The nature of the default is such that the probability of a duplicate pk is effectively zero (e.g., a UUID).
- The pk is set, so the default is irrelevant. The pk is new.
- The pk is set, so the default is irrelevant. The pk already exists.
<opinion>
Case 1 is the most common case and works as you expect. (It can safely do an INSERT.)
Case 2 seems unusual to me. Using the analogy of an AutoField, it seems dangerous to manually assign a pk when you are expecting one to be generated algorithmically. But Django is still fine with this, because the INSERT is still going to work.
Case 3 is the case we’re discussing here. You’re building an object without knowing whether it is going to need an UPDATE or an INSERT.
Now, if you don’t have a default for the pk, this does work as you’ve described - Django hides the distinction between an UPDATE and an INSERT.
</opinion>
<conjecture>
Django appears to assume that a default pk is going to be unique. Given that assumption, what it currently does (a single INSERT, with no UPDATE attempted first) is the most performant choice in this situation.
Additionally, I think it also provides a safety net. If I had a default function to generate a pk, and I manually assign a pk, I’d want to know that I’ve created a conflict - if such a thing were to occur. I kind of cringe at the thought of mistakenly overwriting data. I’d rather make it an explicit choice to accept that happening than to have it happen by default.
So while it does create this wart in the API, I’m coming to the conclusion that it is the best overall solution: it doesn’t sacrifice performance in the common case by doubling the number of SQL statements executed when a new object is saved.
</conjecture>
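To make the three cases and the statement counts concrete, here is a rough simulation in plain Python with sqlite3. This is not Django's code. The names `make_db`, `save`, the `item` table, and the `pk_has_default` flag are all made up for illustration, and the UPDATE-then-INSERT path is my assumption about how the no-default behaviour described above plays out.

```python
import sqlite3
import uuid

INSERT_SQL = "INSERT INTO item (id, name) VALUES (?, ?)"
UPDATE_SQL = "UPDATE item SET name = ? WHERE id = ?"


def make_db():
    """Create an in-memory database with one table keyed by a text pk."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id TEXT PRIMARY KEY, name TEXT)")
    return conn


def save(conn, pk, name, pk_has_default):
    """Persist one row and return (pk, list of SQL verbs issued).

    Hypothetical model of the behaviour discussed in this thread,
    not Django's actual implementation.
    """
    if pk is None:
        if not pk_has_default:
            raise ValueError("no pk given and no default to generate one")
        # Case 1: generate a pk that is assumed to be unique.
        pk = str(uuid.uuid4())

    if pk_has_default:
        # Cases 1-3 with a default: a single INSERT, nothing else.
        # In case 3 this raises sqlite3.IntegrityError instead of
        # silently overwriting the existing row.
        conn.execute(INSERT_SQL, (pk, name))
        return pk, ["INSERT"]

    # No default on the pk: try an UPDATE first, INSERT if no row matched.
    cur = conn.execute(UPDATE_SQL, (name, pk))
    if cur.rowcount:
        return pk, ["UPDATE"]
    conn.execute(INSERT_SQL, (pk, name))
    return pk, ["UPDATE", "INSERT"]
```

In this sketch, calling `save(conn, "x", "first", pk_has_default=True)` twice raises `sqlite3.IntegrityError` on the second call, which is the safety net. With `pk_has_default=False`, a new pk costs two statements (UPDATE, then INSERT), which is the doubling I'd rather not pay in the common case.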