Allow querying RelatedManagers on unsaved objects

In commit 7ba6ebe9, to address #19580, accessing a RelatedManager on an unsaved model instance now raises a ValueError.

The initial ticket comment said, “These are slight backwards compatibility changes, but to me it seems that almost always the changes will be visible only in code that isn’t working as intended.”

Right after the change was merged, people started noticing problems from this. See comment:56 on ticket 19580. A developer reply was, “this is undocumented behavior and as far as I’m aware it should be quite rare.”

Later, once this hit the world, Alex Matos said in comment:58, “while upgrading it from version 3.2 to 4.2 a lot of code broke due to this change, where code that was once returning an empty queryset, now raises a ValueError. In my opinion, this change should be fully reverted.” See also comment:10 on #33952 and the next comment.

I had exactly the same experience. I agree with those commenters that this should be reverted.

At this point, you might be tempted to say (as was somewhat said already): It’s been multiple years now, people will have adjusted their code by now. And sure, that’s probably generally true. But the problem is, this change is so fundamentally wrong that, at my job, we keep making this mistake, even in new code.

For us, a common case for this issue is in a Model.clean(). We will need to check some condition on related objects. For example:

if not self.is_active:
    if self.related_thing_one.filter(is_active=True).exists():
        raise ValidationError(…)

Other examples might be acting upon those related objects, in an overridden save() or a signal handler.

The old approach was simple and ergonomic. Things Just Worked whether the object was saved or not. The new approach requires me to 1) remember this is a problem, and 2) add if self.pk guards every time. Despite this change being a pet peeve of mine, I am still forgetting to add those guards, and I’m not the only one. That’s how intuitive the old behavior is and how unintuitive the new behavior is.

I believe the correct behavior is that accessing a RelatedManager on an unsaved object should be functionally equivalent to .none(). That is, it should return no results, successfully, without hitting the database.

Why shouldn’t it work that way? When you are trying to access foo.related_thing, you are fundamentally asking to perform an operation on the set of Foo’s related things. If that Foo has not been saved, then it can’t possibly have any related things. So the answer is known: it is the empty set. The answer is not “You asked an impossible/nonsensical question.”, which is what ValueError means.

P.S. I had more things linked for the reader’s convienence, but I’m a “new user”, so I can only put two links in my post, despite them all going to Django GitHub or the Django ticket tracker.

I’m not sure what I think just yet, but I want to add a few points to the conversation:

  • “unsaved” is kind of a misnomer – I wish the messaging around this was more clear that we’re only talking about missing pk values. A model instance with a python default for PK (or an instance hydrated with a known PK) works just fine, as I pointed out here.
  • I assume your proposal also involves reverting this change regarding passing unsaved objects to filters. I think you need both to get this working again.
  • What is your proposal for m2m fields? Leave some drift against reverse FKs, or find a way for read queries on m2m fields to return an empty set?
  • Finally, what do you think about blessing “half-usable” related managers that work only for reads but not for writes?

For us, a common case for this issue is in a Model.clean() . We will need to check some condition on related objects.

Again, I’m not sure what I think about your proposal yet, but I’ll just note that if you favor this pattern, then python-side defaults for your primary key fields would be a great fit for you, obviating the need to check self.pk all over.

A model instance with a python default for PK … works just fine

That’s a fair point. I could use UUIDs everywhere, such that I could then generate them in Python. But that’s quite the change.

I assume your proposal also involves reverting this change regarding passing unsaved objects to filters. I think you need both to get this working again.

I don’t know if it’s conceptually necessary to allow .filter(field=obj_without_pk) to fix this. It might be necessary as an implementation detail, at least without making a lot of other changes.

I don’t have strong feelings about the .filter() case. But I think my same point applies: the result is known, so yes, it should be allowed.

The change you referenced only adds a deprecation, a note to raise a ValueError in the future, and a test for it. The underlying code obviously already handles this case fine, and there’s no indication of what code could/should be removed once this is not longer possible.

Currently, what ends up happening is that Foo.objects.filter(field=obj_without_pk) results in WHERE "foo"."field" IS NULL. This is an obvious transformation of obj_without_pk.pk being None. Nothing is special-cased here. That’s a useless database query, but it works and produces the right results. Once the .filter(field=obj_without_pk) is no longer permitted, is there some plan to rip out more code? Otherwise, what is the benefit of prohibiting this? It seems like going to more work to disallow something for no benefit.

If this was to be allowed again, in theory, perhaps something even smarter could be done. (That’s not a requirement to un-deprecate it, though.) Maybe some layer (Q, perhaps) could know that a given condition is “always False”. Such a condition could then swallow the other side in an AND operation and be thrown away on an OR operation. If, at the top level, the condition was always False, then skip the query. Likewise, “always True” could be implemented similarly: thrown away in AND, swallow the other side in OR, if left at the top-level, omit nothing in the WHERE clause.

In digging into this, there seems to be some support for this already. EmptyResultSet is “A database query predicate is impossible.” and FullResultSet is “A database query predicate is matches everything.” EmptyResultSet is now .none() works.

What is your proposal for m2m fields? Leave some drift against reverse FKs, or find a way for read queries on m2m fields to return an empty set?

Those should return an empty set too. The same logic applies: It is known that there are no such objects, so it should return an empty set.

Finally, what do you think about blessing “half-usable” related managers that work only for reads but not for writes?

If I’m understanding correctly, yes, that’s a logical consequence of what I’m asking for. self.related_thing_one would work for reads (being equivalent to self.related_thing_one.none()) but not for writes (because it can’t). I don’t have a problem with that. It can work for reads and it cannot work for writes, so yeah, that’s logical. It used to behave this way. Presumably nobody is trying to use this for writes, because that cannot work.