Changing QuerySet.__repr__ to avoid unexpected queries

I also took a look at django-debug-toolbar and I found this:

  • In interactive mode with sys.ps1.

I guess I should elaborate a bit here. The proposed solution solves your problem – you would stop needing a workaround, and people would stop filing bug reports to your tool.

But what happens when someone else maintains a tool that does end up running Python in interactive mode, and gets bug reports and starts asking for a workaround? I wouldn’t be surprised if there are already tools doing that, and it feels like any solution needs to work for them just as well as for you.

If we had Django to do all over again, I agree that probably QuerySet.__repr__() should be handled differently. But the current behavior is so long-standing and so baked in to people’s assumptions about how to teach and debug Django that I don’t think Django can make the change, now, to differentiate __str__() (evaluates the query) and __repr__() (doesn’t evaluate the query).

(though it might not even be usefully possible if starting over completely: the exact distinction between repr() and str(), and when to use each one, and why they exist as separate things, is not exactly accessible beginner-level Python knowledge, and I wouldn’t be surprised if a complete do-over of Django still ended up with both methods doing the same thing on QuerySet)

1 Like

But what happens when someone else maintains a tool that does end up running Python in interactive mode, and gets bug reports and starts asking for a workaround?

I think we would then consider their proposal on its merits.

If we had Django to do all over again, I agree that probably QuerySet.__repr__() should be handled differently.

I agree!

But the current behavior is so long-standing and so baked in to people’s assumptions about how to teach and debug Django that I don’t think Django can make the change, now, to differentiate __str__() (evaluates the query) and __repr__() (doesn’t evaluate the query).

There are several different options here, each with a different impact on teaching and debugging. I’m hopeful that we can find a way forward that balances all these different needs.

(though it might not even be usefully possible if starting over completely: the exact distinction between repr() and str(), and when to use each one, and why they exist as separate things, is not exactly accessible beginner-level Python knowledge, and I wouldn’t be surprised if a complete do-over of Django still ended up with both methods doing the same thing on QuerySet)

I think for most beginner use-cases one would want str, mostly via print. The only exception I can think of is interactive mode, which is why I suggest special-casing it. For more advanced users I think it’s reasonable to expect them to understand the difference between str and repr and to choose the appropriate one for them.

Another reason we might benefit from changing repr is to avoid unexpected SynchronousOnlyOperation exceptions when working with querysets in an async context. The direction I see us going with the async ORM is to make query points as explicit as possible with await.

This does count as a point against my idea of moving the current implementation to __str__ though.

1 Like

I feel that I have responded to the criticisms of this proposal as best as I can, so I’m not sure where to go from here.

Dear Django Development Community and Lily,

I am writing to extend our gratitude towards Lily for highlighting a critical issue surrounding the behavior of QuerySet.__repr__() method and diligently working towards a viable solution. This conversation sheds light on a problem that we have encountered repeatedly in our production environments. The unintended database queries triggered by the __repr__() method of QuerySets have posed significant challenges, and it’s encouraging to see concerted efforts being made to address this issue.

Our experience resonates with the points Lily has raised. It is not always practical to avoid the use of repr in middleware, especially when Django itself invokes repr inside the get_traceback_data function. It is sub-optimal that get_traceback_data evaluates repr even in instances where the result is not utilized. This behavior deviates from the conventional expectation that repr should not perform I/O operations, a point we wholeheartedly agree with Lily on.

Furthermore, it’s important to note that QuerySets present on the stack are not always intended for immediate evaluation. Often, they serve as intermediates that are destined for further filtering before any evaluation occurs. The additional limit imposed by the current __repr__() implementation can, in certain corner cases, escalate the expense of executing a QuerySet compared to the original query, thereby exacerbating the problem.

One of the protective measures we have employed is the use of query timeouts to guard against runaway queries. However, the current behavior of Django undermines this safeguard by re-executing the query sans a timeout, and does so repetitively for every variable in every stack frame referencing the QuerySet. This repetition amplifies the risk and the impact of unintended database queries, contradicting the protective intent behind query timeouts.

In light of the above, we stand in support of revisiting and revising the behavior of QuerySet.__repr__() as proposed.

Thank you once again, Lily, for your proactive engagement on this matter, and to the Django Development Community for fostering a space where such critical discussions can take place. We look forward to contributing to and witnessing the progress on this front.

Warm regards,

Bruno Thalmann
CTO
On behalf of the Development team at Intempus ApS, Copenhagen

Re-reading since it got bumped, the Mariusz/Simon suggestion was a way forward no? (Sorry if I missed a No)

The one about setting REPR_OUTPUT_SIZE to 0? It doesn’t fix the problem for third party libraries, which is my use case, but it would be a workaround for individual projects.

OK, gotcha. I’d imagined Kolo setting that for its needs. But OK, sorry for the noise. :+1:

This does not really solve the issue because it will still perform a query, even when it is set to 0.
At least as far as I can see here: django/django/db/models/query.py at 4a5048b036fd9e965515e31fdd70b0af72655cba ¡ django/django ¡ GitHub

Simon’s suggestion about was to adjust that:

Thanks for taking a look! I saw that suggestion.
But changing the behaviour for setting it to 0 would that be acceptable?
Perhaps a less intrusive change would be to allow REPR_OUTPUT_SIZE to be None which would result in the QuerySet not to be evaluated and print something like <QuerySet [NOT evaluated]>?
I have tested a local version of this but this would also mean that REPR_OUTPUT_SIZE can be adjusted through the django settings - which as far as I know it can not at the moment. (monkey patching is ofc possible)

To be specific, I think it’s a non-starter for Kolo because it’s technically changing the behaviour of our user’s code - maybe they’re depending on calling repr on a queryset somewhere - so it’s not something we want to opt our users into silently.

1 Like

I am happy to provide a patch for something like this:

diff --git a/django/db/models/query.py b/django/db/models/query.py
index de00bba8d7..9fbf00cdcd 100644
--- a/django/db/models/query.py
+++ b/django/db/models/query.py
@@ -371,6 +371,8 @@ class QuerySet(AltersData):
         self.__dict__.update(state)

     def __repr__(self):
+        if REPR_OUTPUT_SIZE is None:
+            return f"<{self.__class__.__name__}: [NOT evaluated]>"
         data = list(self[: REPR_OUTPUT_SIZE + 1])
         if len(data) > REPR_OUTPUT_SIZE:
             data[-1] = "...(remaining elements truncated)..."

Creating a django settings for it as well as a follow up change I can also do. Just have to read up on how to do that.

But before spending time on writing the unit tests for this and making the Pull Request I would like to know if this is the direction we want to go in - so that we can actually can get the change in.
Note that currently there are no direct tests of QuerySet.repr so the first thing would probably be to write those.

What is the next step on getting an approval to move on with this?