I also took a look at django-debug-toolbar and I found this:
- In interactive mode with
sys.ps1
.
I guess I should elaborate a bit here. The proposed solution solves your problem â you would stop needing a workaround, and people would stop filing bug reports to your tool.
But what happens when someone else maintains a tool that does end up running Python in interactive mode, and gets bug reports and starts asking for a workaround? I wouldnât be surprised if there are already tools doing that, and it feels like any solution needs to work for them just as well as for you.
If we had Django to do all over again, I agree that probably QuerySet.__repr__()
should be handled differently. But the current behavior is so long-standing and so baked in to peopleâs assumptions about how to teach and debug Django that I donât think Django can make the change, now, to differentiate __str__()
(evaluates the query) and __repr__()
(doesnât evaluate the query).
(though it might not even be usefully possible if starting over completely: the exact distinction between repr()
and str()
, and when to use each one, and why they exist as separate things, is not exactly accessible beginner-level Python knowledge, and I wouldnât be surprised if a complete do-over of Django still ended up with both methods doing the same thing on QuerySet
)
But what happens when someone else maintains a tool that does end up running Python in interactive mode, and gets bug reports and starts asking for a workaround?
I think we would then consider their proposal on its merits.
If we had Django to do all over again, I agree that probably
QuerySet.__repr__()
should be handled differently.
I agree!
But the current behavior is so long-standing and so baked in to peopleâs assumptions about how to teach and debug Django that I donât think Django can make the change, now, to differentiate
__str__()
(evaluates the query) and__repr__()
(doesnât evaluate the query).
There are several different options here, each with a different impact on teaching and debugging. Iâm hopeful that we can find a way forward that balances all these different needs.
(though it might not even be usefully possible if starting over completely: the exact distinction between
repr()
andstr()
, and when to use each one, and why they exist as separate things, is not exactly accessible beginner-level Python knowledge, and I wouldnât be surprised if a complete do-over of Django still ended up with both methods doing the same thing onQuerySet
)
I think for most beginner use-cases one would want str
, mostly via print
. The only exception I can think of is interactive mode, which is why I suggest special-casing it. For more advanced users I think itâs reasonable to expect them to understand the difference between str
and repr
and to choose the appropriate one for them.
Another reason we might benefit from changing repr
is to avoid unexpected SynchronousOnlyOperation
exceptions when working with querysets in an async context. The direction I see us going with the async ORM is to make query points as explicit as possible with await
.
This does count as a point against my idea of moving the current implementation to __str__
though.
I feel that I have responded to the criticisms of this proposal as best as I can, so Iâm not sure where to go from here.
Dear Django Development Community and Lily,
I am writing to extend our gratitude towards Lily for highlighting a critical issue surrounding the behavior of QuerySet.__repr__()
method and diligently working towards a viable solution. This conversation sheds light on a problem that we have encountered repeatedly in our production environments. The unintended database queries triggered by the __repr__()
method of QuerySets have posed significant challenges, and itâs encouraging to see concerted efforts being made to address this issue.
Our experience resonates with the points Lily has raised. It is not always practical to avoid the use of repr
in middleware, especially when Django itself invokes repr
inside the get_traceback_data
function. It is sub-optimal that get_traceback_data
evaluates repr
even in instances where the result is not utilized. This behavior deviates from the conventional expectation that repr
should not perform I/O operations, a point we wholeheartedly agree with Lily on.
Furthermore, itâs important to note that QuerySets present on the stack are not always intended for immediate evaluation. Often, they serve as intermediates that are destined for further filtering before any evaluation occurs. The additional limit imposed by the current __repr__()
implementation can, in certain corner cases, escalate the expense of executing a QuerySet compared to the original query, thereby exacerbating the problem.
One of the protective measures we have employed is the use of query timeouts to guard against runaway queries. However, the current behavior of Django undermines this safeguard by re-executing the query sans a timeout, and does so repetitively for every variable in every stack frame referencing the QuerySet. This repetition amplifies the risk and the impact of unintended database queries, contradicting the protective intent behind query timeouts.
In light of the above, we stand in support of revisiting and revising the behavior of QuerySet.__repr__()
as proposed.
Thank you once again, Lily, for your proactive engagement on this matter, and to the Django Development Community for fostering a space where such critical discussions can take place. We look forward to contributing to and witnessing the progress on this front.
Warm regards,
Bruno Thalmann
CTO
On behalf of the Development team at Intempus ApS, Copenhagen
Re-reading since it got bumped, the Mariusz/Simon suggestion was a way forward no? (Sorry if I missed a No)
The one about setting REPR_OUTPUT_SIZE
to 0
? It doesnât fix the problem for third party libraries, which is my use case, but it would be a workaround for individual projects.
OK, gotcha. Iâd imagined Kolo setting that for its needs. But OK, sorry for the noise.
This does not really solve the issue because it will still perform a query, even when it is set to 0.
At least as far as I can see here: django/django/db/models/query.py at 4a5048b036fd9e965515e31fdd70b0af72655cba ¡ django/django ¡ GitHub
Simonâs suggestion about was to adjust that:
Thanks for taking a look! I saw that suggestion.
But changing the behaviour for setting it to 0 would that be acceptable?
Perhaps a less intrusive change would be to allow REPR_OUTPUT_SIZE to be None which would result in the QuerySet not to be evaluated and print something like <QuerySet [NOT evaluated]>
?
I have tested a local version of this but this would also mean that REPR_OUTPUT_SIZE can be adjusted through the django settings - which as far as I know it can not at the moment. (monkey patching is ofc possible)
To be specific, I think itâs a non-starter for Kolo because itâs technically changing the behaviour of our userâs code - maybe theyâre depending on calling repr
on a queryset somewhere - so itâs not something we want to opt our users into silently.
I am happy to provide a patch for something like this:
diff --git a/django/db/models/query.py b/django/db/models/query.py
index de00bba8d7..9fbf00cdcd 100644
--- a/django/db/models/query.py
+++ b/django/db/models/query.py
@@ -371,6 +371,8 @@ class QuerySet(AltersData):
self.__dict__.update(state)
def __repr__(self):
+ if REPR_OUTPUT_SIZE is None:
+ return f"<{self.__class__.__name__}: [NOT evaluated]>"
data = list(self[: REPR_OUTPUT_SIZE + 1])
if len(data) > REPR_OUTPUT_SIZE:
data[-1] = "...(remaining elements truncated)..."
Creating a django settings for it as well as a follow up change I can also do. Just have to read up on how to do that.
But before spending time on writing the unit tests for this and making the Pull Request I would like to know if this is the direction we want to go in - so that we can actually can get the change in.
Note that currently there are no direct tests of QuerySet.repr so the first thing would probably be to write those.
What is the next step on getting an approval to move on with this?