Is it a bad idea to rely on Django keeping model object instances alive in the QuerySet’s _result_cache after a first iteration over the QuerySet?
I have a teammate who wrote some code to the effect of:
from foo.models import Foo
from foo.utils import patch_foo_with_name, do_something_with_foo_names
foos = Foo.objects.filter(bar="bla")
name = "Luke"
for foo in foos:
    patch_foo_with_name(foo, name)
do_something_with_foo_names(foos)
Note that this example is heavily simplified. What actually gets patched onto each Foo object is the result of some fairly complex logic that cannot be expressed with QuerySet.annotate(). do_something_with_foo_names() iterates over foos and expects the patched data to be there.
Is this dangerous? I feel that the QuerySet’s cache is an implementation detail of Django that cannot be relied upon. I don’t think there is any explicit guarantee in the documentation that the cache (QuerySet._result_cache) will never drop objects from RAM, although that is how it looks in the Django 1.11 source code. A future version of Django might decide to limit the size of the cache, so that objects could be dropped from it, right?
I already know that this approach is broken if the QuerySet is paginated or further refined - e.g. foos = foos.filter(age__gt=3) - before calling do_something_with_foo_names, but let’s ignore that case for the moment.
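To make that caveat concrete, here is a minimal sketch (using the same Foo model and helpers as above): calling .filter() on an evaluated QuerySet returns a new QuerySet that runs its own query and yields freshly created instances, so the patched data is gone.

from foo.models import Foo
from foo.utils import patch_foo_with_name, do_something_with_foo_names

foos = Foo.objects.filter(bar="bla")
for foo in foos:  # first iteration fills foos._result_cache
    patch_foo_with_name(foo, "Luke")

# .filter() returns a *new* QuerySet; it does not share the cache above.
young_foos = foos.filter(age__gt=3)

# This hits the database again and iterates over freshly created
# Foo instances, so the patched data is missing here.
do_something_with_foo_names(young_foos)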
It’s a great idea to rely on the QuerySet result cache. Far from being an implementation detail, it’s documented behaviour that allows you to avoid refetching data from your database. If there’s any wording that could be improved in the docs here, please feel free to make a PR and tag me on it.
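For example (a rough sketch, assuming DEBUG=True so that connection.queries is populated), the second evaluation below is served from the cache rather than running another query:

from django.db import connection, reset_queries
from foo.models import Foo

reset_queries()
foos = Foo.objects.filter(bar="bla")

list(foos)  # first evaluation: runs the query and fills the result cache
list(foos)  # second evaluation: reuses the cached results

# Only one query was executed (connection.queries requires DEBUG=True).
assert len(connection.queries) == 1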
I understand that the purpose of the cache is to avoid refetching data from the database.
My concern is that the purpose of the cache might not include persisting, across iterations, extra things that I patched onto the objects.
Without digging into the Django source code, I would not know how Django avoids refetching data. What if Django were caching the database results in some data structure that is more compact than the full-fledged model objects, and re-instantiating the model objects every time the queryset is iterated over? I think that’s certainly a possibility, depending on how the Django core developers view the performance/memory usage trade-off. I hope I’m making sense.
Django will never change the representation of models in that way - it would break too many things, including built-in features like prefetch_related and annotations. Re-instantiating models can even have side effects due to pre/post init signals.
The result cache is implemented as a simple list of model instances and I can assure you it won’t change.
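To illustrate the behaviour being described, here is a minimal sketch (it peeks at the private _result_cache attribute purely for demonstration): re-iterating an evaluated QuerySet yields the very same instances, so attributes patched onto them persist.

from foo.models import Foo
from foo.utils import patch_foo_with_name

foos = Foo.objects.filter(bar="bla")

first_pass = list(foos)   # evaluates the QuerySet and fills foos._result_cache
for foo in first_pass:
    patch_foo_with_name(foo, "Luke")

second_pass = list(foos)  # no new query; the cached list is reused

# The cache is a plain list of the same model instances, so anything
# patched on during the first pass is still there on the second pass.
assert foos._result_cache is not None
assert first_pass[0] is second_pass[0]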