I’ve run into a use-case in which it’d be helpful to have the ability to specify a different set of defaults for the update operation compared to the create operation. While I don’t expect my particular use case to translate, here’s a more generic one.
When a record is created, we would want to set created_by, but if it’s being updated, we’d want to set modified_by. This use case can’t be solved by using update_or_create, unless it allows for us to specify a different set of default values.
The last I checked, the code change for this is minimal and I should be able to manage it. However, I’m not sure if we want to extend this function in this manner, hence the topic.
I use a similar pattern all the time and would take advantage of this change for sure.
For backwards compatibility I’m guessing the update fields would be a new argument and the existing defaults would behave as-is unless the new argument is present?
Yes, that’s what I was thinking. defaults would work for both cases giving backwards compatibility, then if create_defaults is specified as a value other than None, then defaults is only for the update case.
+1 a useful feature for sure, and still possible within a single SQL query.
Adding an extra kwarg is technically backwards incompatible, since a model could have a field called “create_defaults”. We could perhaps use a helper obj per field:
I was under the impression that update_or_create was a combination of 1-4+ SQL queries. Could you explain what you meant a little more for me?
Thank you for raising the create_defaults incompatibility! I think I personally would prefer this route rather than an object per field. I suspect I’d be more likely to dynamically manage all of the fields for a create operation than dynamically manage the operations for a specific field.
Briefly, update_or_create calls get_or_create (1) to create a new instance, and captures an IntegrityError if that row already exists. If the row exists, the defaults is used to update the identified attributes and save (2). If the row didn’t previously exist, the create has created the row.
This is one of my top Django annoyances, so I’d love to see it addressed. I write tons of data importers and never use create() because data importers need to work with existing data. So both get_or_create and update_or_create are all I use.
That said, almost every time I use update_or_create, I need to use the return tuple to write a follow-up save() or update() based on whether or not the record was created. This would be a significant performance gain for these projects.
since a model could have a field called “create_defaults”.
It’s possible to conditionally update or insert in a single SQL statement, with different behaviour per branch, with MERGE: PostgreSQL: Documentation: 15: MERGE or alternative per-DB syntax. I guess this could be an optimisation for update_or_create, compared to its current implantation.
Alright, I had to double check, but it looks like 2 queries at a minimum with 4 SQL queries in the worst possible case:
Get - fails because it doesn’t exist
Create - integrity error due to conflict from a competing query creating one after Get #1 failed.
Get #2 - succeeds
Save to update with changes
It sounds like what @adamchainz is mentioning is merge or something like upsert. I don’t think Django supports that yet, but what I wanted to clarify. I suppose I should have asked that directly.
Edit:
It is interesting that last save operation occurs regardless if defaults empty or not. It seems like that’s an easy situation for us to avoid an unnecessary operation.