Saving in 3.2 versus 4.2

hepcat72 · August 9, 2023, 3:33pm

Hi,

We have a django-based project and recently decided to up our django dependency up from 3.2 to 4.2, but discovered some test failures and I’d like to learn more about the change in saving behavior between the 2 versions.

I have implemented work-arounds/fixes for the new save behavior and I believe I understand the change. I can also understand potentially why the change was made, even though I disagree with it. But first, let me first see if my understanding is correct. I should probably start with what my code does that 4.2 broke…

Background

I actually have 2 Model superclasses that raise exceptions in 4.2, (HierCachedModel and MaintainedModel). The fix for both classes is simple, but since the fix for MaintainedModel involves custom functions in the derived class, which may not behave as a developer intends, I’m going to focus on MaintainedModel. The basic idea behind MaintainedModel is, if your Model isa MaintainedModel, then you can create update methods for specific model fields that you can decorate as “updater” methods. In the decorator, you can also identify related Model classes whose changes should trigger an update to the maintained field. The superclass overrides the model’s save() method (along with other methods) to catch and trigger/propagate these changes. It makes updates to fields before calling super().save().

The Error

When we updated Django, a number of tests threw exceptions such as:

ValueError: 'Animal' instance needs to have a primary key value before this relationship can be used.

The problem stemmed from:

calls to update methods that are implemented in the derived class’s updater methods which query related records
traversals of relations implemented in MaintainedModel to trigger updates to related records

My Understanding

So I get what’s happening. In the above error case, the updater method is making a query on self.samples.filter(... and since the animal object is being created and Sample has a foreign key into Animal, self.samples is null (i.e. it doesn’t exist), hence an exception (inferred from the fact that pk doesn’t exist).

In django 3.2, it didn’t throw an exception, it just returned an empty queryset. In fact, if you called .count() on it, it returned 0. That made sense to me, because if you want to know how many related records exist, or you wanted a list of related records, you could answer that question (because the relations didn’t exist) and return either 0 or an empty list. I.e. if relations haven’t been set up, you can still respond to those queries.

Now however, if a developer using my class wants to update a field named “last_serum_sample”, they would have to handle the null case, where samples is null (or rather, infer that from pk not existing).

My quick fix for this issue was to catch that exception in the superclass and just not perform the update. I.e. I could assume that if the value being computed relies on a non-existent relation, because the record is just now being created, then I can deduce that there’s no valid value to enter into the field.

But their computed value may also include data from the object (i.e. not transiting a relation) or it could include data from e.g. Sample.objects.filter(... or any other computation that would not throw an exception in 4.2. So ignoring that update may throw away valid values.

My Question

I hate to require developers using my class to have to complicate their updater method code to test whether the primary key exists, or for me to clutter the log output with warnings about un-auto-updated fields.

I thought about possibly using “post-save” functionality. It would make my base class code more complex, but I also anticipate issues with saving updates in a post-save. Would that even be allowed, like if I use update_fields to possibly avoid an IntegrityError from a second save call, especially if the result of the developers update method results in no change?

Or is there a way to tell Django to behave the way it did in 3.2 for the duration of the save and return empty querysets?

hepcat72 · August 9, 2023, 3:40pm

Incidentally, the problem in HierCachedModel is that changes to records (in a save override), invalidates cached values in related records, so caches are deleted for all relations, and to do that, it traverses those relations to get the cache record IDs (<class>.<pk>.<fieldname>). Skipping those operations has no downside, as there is with MaintainedModel. If the relations don’t exist, there are no caches to delete.

hepcat72 · August 11, 2023, 10:03pm

What is the underlying reasoning behind this commit to the Django codebase? From the standpoint of any queries through reverse relations during a create never actually being able to return anything (in 3.2 they returned an empty queryset and did not error), I understand it, but… if you understand the circumstances surrounding the query returning nothing during a create, having it return an empty queryset makes direction-blind query code really simple and straightforward. And I’m not convinced that the added complexity is worth the guard-rails to keep developers from making incorrect assumptions about queries in those circumstances.

hepcat72 · August 11, 2023, 10:30pm

So I was looking at the discussion in the ticket attached to the PR. Apparently, the change was to make queries through reverse foreign keys behave the same as m2m queries. I think that centering the discussion/fix in that limited context had an unintended affect on my code, which is not implemented as traversing m2m relations. It traverses direct reverse foreign keys in order to propagate auto-updates. Those updates can be triggered by changes to any model directly or indirectly related to the model being saved. And in that context, returning [] (as it was put in the ticket) is the expected response.

My class was written as direction-blind. In order for me to prevent those queries to not hit the exception, will very much complicate my code. And I have no control over developers using my class as a superclass. All I can do is catch the exception and advise the developer in the resulting repackaged error how to avoid the exception. IMO, they shouldn’t have to know anything about that. It 100% works in every case in 3.2 where that exception is not thrown.

Topic		Replies	Views
Django - Query on overridden save() method cached or something The Admin	11	1950	March 8, 2023
update_or_create Behavior ORM	50	4390	January 19, 2024
ORM model 'save' method and race condition / ACID Isolation Using the ORM	2	98	July 13, 2024
Hook for extending `update_fields` in save() overrides ORM	6	1105	December 22, 2023
save() behavior when updating model with default primary keys Using the ORM	9	1841	May 3, 2024

Saving in 3.2 versus 4.2

Background

The Error

My Understanding

My Question

Related topics