There’s a requested solution (change the __repr__()
behavior), but I still have trouble finding a clear statement of the problem. Mostly the motivation seems to be either that the query is unexpected, or that the query is overloading the DB.
I ran into this in the context of dynamic code analysis, using sys.setprofile
(sys.settrace
would be similar). In Kolo, we introspect a user’s program as it runs to extract interesting information about it. Specifically, we look at the local variables in each function call of our user’s code and we record their repr
when they’re not straightforwardly serializable as JSON. This led to a bug report about Kolo generating extra queries in a user’s code: Middleware still causes extra queries · Issue #4 · kolofordjango/kolo · GitHub (I think the earlier report was through a private channel.)
I was able to work around this by monkeypatching Queryset.__repr__
, but since other people have run into issues, I’m proposing this change upstream.
This is the same sort of problem that the original poster in the django-developers mailing list ran into: https://groups.google.com/g/django-developers/c/WzRZ0IfBipc/m/04FBC8IzAQAJ
The argument that you can work around it by not calling repr is certainly true, but in practice, I don’t think it’s reasonable. Production error reporting tools (like Sentry and New Relic) will call repr when reporting on exceptions.
And the reporters of the two tickets: https://code.djangoproject.com/ticket/20393:
In trying to track down some timeouts and strange queries in our apps we saw some strange queries being run by django. We were querying for a single record with a FOR UPDATE, getting an error, then django would run the same query 3 times but with a limit of 21 and then fetch all of the records, each time.
Eventually I tracked this down to our error-handling middleware calling repr() on a QuerySet, which then queried the database. __repr__
is supposed to be a string-representation of an object in its current state and should not gave side-effects like making database queries, especially in the case where the query uses FOR UPDATE or similar logic.
And https://code.djangoproject.com/ticket/30863:
Django will throw an exception (rightfully so).
As part of the usual error reporting process in debug mode, Django may eventually call repr() on the “base” queryset (that is essentially Result.objects.all()).
QuerySet.repr tries to be helpful by printing the first 21 results of the evaluated query. Because the base queryset orders by the un-indexed “name” column, this can easily overload the database when it does “SELECT … FROM Result ORDER BY name LIMIT 21” (trying to sort hundreds of millions of rows by an unindexed column)
Even with debug mode turned off, some error reporting tools like Sentry will call repr on the queryset, creating the same problem in production.
Personally I think that __repr__
should not do IO. I do however, acknowledge the practical benefit of the current implementation when working in interactive mode, which is why I suggested the change I did.
Which leaves only the unexpected-behavior issue. I think that could probably be handled as a documentation issue rather than a technical issue.
Documentation means that authors of tools like Kolo (and potentially Sentry and Newrelic - I haven’t checked how they handle this) will continue to need to monkeypatch Queryset
.
If there’s another problem I’m missing here, please let me know. As-is, I think the behavior is too useful for learning/debugging to get rid of, and likely not something where Django can auto-detect when and whether to do. As an absolute last resort I suppose there is the suggested option to put an explicit “don’t evaluate when __repr__()
-ing” flag on QuerySet
, but that feels non-discoverable and likely wouldn’t help with situations where Library A wants Application B’s model to have that flag set.
I think Django can detect when to do this. Specifically:
- In interactive mode with
sys.ps1
.
- In
print
debugging by moving the current implementation to __str__
.
- In
pdb
debugging by moving the implementation to __str__
- users can call str
or print
in their debugging session.
I think the rest of the time it’s preferable to avoid doing unexpected queries, or the solution to one of the above cases applies.