Feature idea: update_or_create to allow different defaults for create and update operations.

Hi folks,

I’ve run into a use-case in which it’d be helpful to have the ability to specify a different set of defaults for the update operation compared to the create operation. While I don’t expect my particular use case to translate, here’s a more generic one.

Given the following Record model:

class Record(models.Model):
    some_id = models.CharField(unique=True)
    created_by = models.ForeignKey(User, ...)
    modified_by = models.ForeignKey(User, null=True, blank=True, ...)

When a record is created, we would want to set created_by, but if it’s being updated, we’d want to set modified_by. This use case can’t be solved by using update_or_create, unless it allows for us to specify a different set of default values.

Record.objects.update_or_create(
    some_id=some_value,
    defaults={"modified_by": user},
    create_defaults={"created_by": user},
)

The last I checked, the code change for this is minimal and I should be able to manage it. However, I’m not sure if we want to extend this function in this manner, hence the topic.

What do folks think?

7 Likes

I use a similar pattern all the time and would take advantage of this change for sure.

For backwards compatibility I’m guessing the update fields would be a new argument and the existing defaults would behave as-is unless the new argument is present?

2 Likes

This is a feature I would absolutely use.

1 Like

Yes, that’s what I was thinking. defaults would work for both cases giving backwards compatibility, then if create_defaults is specified as a value other than None, then defaults is only for the update case.

+1 from me. This would be super useful.

1 Like

+1 a useful feature for sure, and still possible within a single SQL query.

Adding an extra kwarg is technically backwards incompatible, since a model could have a field called “create_defaults”. We could perhaps use a helper obj per field:

Record.objects.update_or_create(
    …,
    defaults={
        "modified_by": Var(create=None, update=user),
    },
)

An alternative would be changing defaults to optionally take an obj that contains separate create and update dicts.

I was under the impression that update_or_create was a combination of 1-4+ SQL queries. Could you explain what you meant a little more for me?

Thank you for raising the create_defaults incompatibility! I think I personally would prefer this route rather than an object per field. I suspect I’d be more likely to dynamically manage all of the fields for a create operation than dynamically manage the operations for a specific field.

1 Like

Briefly, update_or_create calls get_or_create (1) to create a new instance, and captures an IntegrityError if that row already exists. If the row exists, the defaults is used to update the identified attributes and save (2). If the row didn’t previously exist, the create has created the row.

So it’s 1 or 2 SQL statements, depending.

1 Like

+1

This is one of my top Django annoyances, so I’d love to see it addressed. I write tons of data importers and never use create() because data importers need to work with existing data. So both get_or_create and update_or_create are all I use.

That said, almost every time I use update_or_create, I need to use the return tuple to write a follow-up save() or update() based on whether or not the record was created. This would be a significant performance gain for these projects.

since a model could have a field called “create_defaults”.

This doesn’t feel like a common use case to me. I did a GitHub code search and found zero public projects that do this. GitHub Code Search (Preview)

I would argue this isn’t a huge risk. :person_shrugging:

1 Like

It’s possible to conditionally update or insert in a single SQL statement, with different behaviour per branch, with MERGE: PostgreSQL: Documentation: 15: MERGE or alternative per-DB syntax. I guess this could be an optimisation for update_or_create, compared to its current implantation.

1 Like

Alright, I had to double check, but it looks like 2 queries at a minimum with 4 SQL queries in the worst possible case:

  1. Get - fails because it doesn’t exist
  2. Create - integrity error due to conflict from a competing query creating one after Get #1 failed.
  3. Get #2 - succeeds
  4. Save to update with changes

It sounds like what @adamchainz is mentioning is merge or something like upsert. I don’t think Django supports that yet, but what I wanted to clarify. I suppose I should have asked that directly.

Edit:
It is interesting that last save operation occurs regardless if defaults empty or not. It seems like that’s an easy situation for us to avoid an unnecessary operation.

1 Like

Just adding on that this is almost always my use case as well. Importing data and then having to follow up with a save call for existing objects.

One project deals with about 30k csv rows each day doing this pattern.

1 Like

+1, It should be very useful

+1 I run into the case of needing to do an update after creating on a regular basis. Would definitely be a good addition from my point of view!

make a separate ticket?

2 Likes