Changing QuerySet.__repr__ to avoid unexpected queries

Lily-Foote · June 13, 2023, 10:10am

I also took a look at django-debug-toolbar and I found this:

github.com/jazzband/django-debug-toolbar

Should SQL queries be counted when they are generated by a logger at 'DEBUG' level?

opened 03:26PM - 13 Dec 18 UTC

I was baffled for a few hours trying to figure out where a bunch of unnecessary queries were coming from on my app. I used `django-debug-toolbar`, which was a big help, but it also threw me a significant red herring.

If your logging setup looks anything like this ...

```
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'info_to_file': {
            'level': 'DEBUG' if DEBUG else 'INFO',
            'class': 'logging.FileHandler',
            'filename': os.path.join(BASE_DIR, 'django.log'),
        },
    },
    'loggers': {
        'django': {
            'handlers': ['info_to_file'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}
```

... then you are logging every SQL query made to the database when `DEBUG == True`. I do this during development sometimes, because it is a built-in way to solve the same problems `django-debug-toolbar` addresses.

#### The problem

When I had my logger at `'level': 'DEBUG'` and `django-debug-toolbar` running, my page was generating queries at `O(n)` for `n` objects on the page, because every time this exception is raised, `django.template.base.VariableDoesNotExist: Failed lookup for key [some_template_variable] in ...`, the Django logger prints the **entire** template context. When the Django logger prints the template context, `__repr__()` and `__str__()` are called on every object in the context. This is a huge problem when you have foreign keys in your `__str__()` methods, and quickly balloons your queries to `O(n)`.

So, I solved my ballooning query problem by switching the debug level to `'INFO'` during development. Immediately, queries were `O(1)` again. I was a little frustrated, because I had just spent a lot of time debugging extra queries caused by my debugger. And I was just generally frustrated that the logger was even capable of running database queries.

#### Solving the problem

Can we make a special distinction for queries generated by the Django logger?
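A narrower variant of the workaround described in the issue might look like the sketch below. It assumes the extra queries come only from the `django.template` logger, which Django documents as logging missing context variables at `DEBUG`, and that SQL logging through `django.db.backends` should stay on. The handler name `console` is made up for the example.

```python
import logging
import logging.config

# Sketch of a settings.LOGGING value; Django applies it for you at startup.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Hypothetical handler name, chosen for this example.
        "console": {"class": "logging.StreamHandler", "level": "DEBUG"},
    },
    "loggers": {
        # SQL statements are still logged while settings.DEBUG is True.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
            "propagate": False,
        },
        # Raising this logger to INFO drops the DEBUG records for missing
        # template variables, so the template context is never formatted.
        "django.template": {
            "handlers": ["console"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

if __name__ == "__main__":
    # Outside Django, the dict can be checked by applying it directly.
    logging.config.dictConfig(LOGGING)
    print(logging.getLogger("django.template").getEffectiveLevel())
```

This only sidesteps the logging path; any other code that calls `repr()` or `str()` on the same objects would still trigger the queries.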

ubernostrum · June 14, 2023, 5:34am

In interactive mode with sys.ps1.

I guess I should elaborate a bit here. The proposed solution solves your problem – you would stop needing a workaround, and people would stop filing bug reports to your tool.

But what happens when someone else maintains a tool that does end up running Python in interactive mode, and gets bug reports and starts asking for a workaround? I wouldn’t be surprised if there are already tools doing that, and it feels like any solution needs to work for them just as well as for you.

If we had Django to do all over again, I agree that probably QuerySet.__repr__() should be handled differently. But the current behavior is so long-standing and so baked in to people’s assumptions about how to teach and debug Django that I don’t think Django can make the change, now, to differentiate __str__() (evaluates the query) and __repr__() (doesn’t evaluate the query).

(though it might not even be usefully possible if starting over completely: the exact distinction between repr() and str(), and when to use each one, and why they exist as separate things, is not exactly accessible beginner-level Python knowledge, and I wouldn’t be surprised if a complete do-over of Django still ended up with both methods doing the same thing on QuerySet)

Lily-Foote · June 16, 2023, 3:34pm

But what happens when someone else maintains a tool that does end up running Python in interactive mode, and gets bug reports and starts asking for a workaround?

I think we would then consider their proposal on its merits.

If we had Django to do all over again, I agree that probably QuerySet.__repr__() should be handled differently.

I agree!

But the current behavior is so long-standing and so baked in to people’s assumptions about how to teach and debug Django that I don’t think Django can make the change, now, to differentiate __str__() (evaluates the query) and __repr__() (doesn’t evaluate the query).

There are several different options here, each with a different impact on teaching and debugging. I’m hopeful that we can find a way forward that balances all these different needs.

(though it might not even be usefully possible if starting over completely: the exact distinction between repr() and str(), and when to use each one, and why they exist as separate things, is not exactly accessible beginner-level Python knowledge, and I wouldn’t be surprised if a complete do-over of Django still ended up with both methods doing the same thing on QuerySet)

I think for most beginner use-cases one would want str, mostly via print. The only exception I can think of is interactive mode, which is why I suggest special-casing it. For more advanced users I think it’s reasonable to expect them to understand the difference between str and repr and to choose the appropriate one for them.
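The interactive-mode special case discussed here could be sketched roughly as follows. This is not Django code: `LazyResults` and its `fetch` callable are hypothetical stand-ins for a queryset and its database query. The only real API relied on is `sys.ps1`, which Python defines only when the interpreter runs in interactive mode.

```python
import sys


class LazyResults:
    """Hypothetical stand-in for a QuerySet; fetch() plays the database query."""

    def __init__(self, fetch):
        self._fetch = fetch

    def __str__(self):
        # print(obj) always evaluates, matching what beginners expect.
        return str(list(self._fetch()))

    def __repr__(self):
        # sys.ps1 only exists in interactive mode, so only the REPL
        # triggers evaluation; loggers and debuggers get a cheap repr.
        if hasattr(sys, "ps1"):
            return f"<LazyResults {list(self._fetch())!r}>"
        return "<LazyResults [not evaluated]>"
```

As raised earlier in the thread, a tool that itself runs Python in interactive mode would still see the evaluating branch, so this check narrows the problem rather than removing it.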

Lily-Foote · June 16, 2023, 3:39pm

Another reason we might benefit from changing repr is to avoid unexpected SynchronousOnlyOperation exceptions when working with querysets in an async context. The direction I see us going with the async ORM is to make query points as explicit as possible with await.

This does count as a point against my idea of moving the current implementation to __str__ though.

Lily-Foote · June 30, 2023, 1:56pm

I feel that I have responded to the criticisms of this proposal as best as I can, so I’m not sure where to go from here.

Thalmann · October 16, 2023, 12:18pm

Dear Django Development Community and Lily,

I am writing to extend our gratitude towards Lily for highlighting a critical issue surrounding the behavior of QuerySet.__repr__() method and diligently working towards a viable solution. This conversation sheds light on a problem that we have encountered repeatedly in our production environments. The unintended database queries triggered by the __repr__() method of QuerySets have posed significant challenges, and it’s encouraging to see concerted efforts being made to address this issue.

Our experience resonates with the points Lily has raised. It is not always practical to avoid the use of repr in middleware, especially when Django itself invokes repr inside the get_traceback_data function. It is sub-optimal that get_traceback_data evaluates repr even in instances where the result is not utilized. This behavior deviates from the conventional expectation that repr should not perform I/O operations, a point we wholeheartedly agree with Lily on.

Furthermore, it’s important to note that QuerySets present on the stack are not always intended for immediate evaluation. Often, they serve as intermediates that are destined for further filtering before any evaluation occurs. The additional limit imposed by the current __repr__() implementation can, in certain corner cases, escalate the expense of executing a QuerySet compared to the original query, thereby exacerbating the problem.

One of the protective measures we have employed is the use of query timeouts to guard against runaway queries. However, the current behavior of Django undermines this safeguard by re-executing the query sans a timeout, and does so repetitively for every variable in every stack frame referencing the QuerySet. This repetition amplifies the risk and the impact of unintended database queries, contradicting the protective intent behind query timeouts.
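The repetition described above can be illustrated with a small standalone sketch. Nothing here is Django's implementation: `CountingQuerySet` is a hypothetical object whose `repr()` counts as one query, and `repr_all_locals` is an assumed, simplified model of an error reporter that reprs every local variable in every traceback frame.

```python
import sys
import traceback


class CountingQuerySet:
    """Hypothetical stand-in: every repr() call counts as one database query."""

    queries = 0

    def __repr__(self):
        type(self).queries += 1
        return "<CountingQuerySet [...]>"


def inner(qs):
    raise ValueError("boom")


def outer(qs):
    filtered = qs  # a second name for the same object in this frame
    inner(filtered)


def repr_all_locals(tb):
    """Simplified model of an error reporter: repr every local in every frame."""
    rows = []
    for frame, _lineno in traceback.walk_tb(tb):
        for name, value in frame.f_locals.items():
            rows.append((frame.f_code.co_name, name, repr(value)))
    return rows


def run():
    try:
        outer(CountingQuerySet())
    except ValueError:
        return repr_all_locals(sys.exc_info()[2])


if __name__ == "__main__":
    for function_name, variable_name, text in run():
        print(function_name, variable_name, text)
    print("queries issued while reporting:", CountingQuerySet.queries)
```

Each name that refers to the object in each frame costs one more `repr()` call, so a single queryset passed down a call stack is evaluated once per reference rather than once overall.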

In light of the above, we stand in support of revisiting and revising the behavior of QuerySet.__repr__() as proposed.

Thank you once again, Lily, for your proactive engagement on this matter, and to the Django Development Community for fostering a space where such critical discussions can take place. We look forward to contributing to and witnessing the progress on this front.

Warm regards,

Bruno Thalmann
CTO
On behalf of the Development team at Intempus ApS, Copenhagen

carltongibson · October 17, 2023, 10:56am

Re-reading since it got bumped, the Mariusz/Simon suggestion was a way forward, no? (Sorry if I missed a No)

Lily-Foote · October 17, 2023, 1:33pm

The one about setting REPR_OUTPUT_SIZE to 0? It doesn’t fix the problem for third party libraries, which is my use case, but it would be a workaround for individual projects.

carltongibson · October 17, 2023, 2:16pm

OK, gotcha. I’d imagined Kolo setting that for its needs. But OK, sorry for the noise.

Thalmann · October 17, 2023, 3:00pm

This does not really solve the issue because it will still perform a query, even when it is set to 0.
At least as far as I can see here: django/django/db/models/query.py at 4a5048b036fd9e965515e31fdd70b0af72655cba · django/django · GitHub
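To make the point concrete, here is a minimal sketch of the slicing logic visible in the patch context later in this thread, run against a hypothetical `FakeQuerySet` where each slice counts as one query. The helper `repr_with_limit` and its parameter are names invented for the example; in Django the limit is the module-level `REPR_OUTPUT_SIZE`.

```python
class FakeQuerySet:
    """Hypothetical stand-in: slicing counts as running one query."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.queries = 0

    def __getitem__(self, key):
        self.queries += 1
        return self._rows[key]


def repr_with_limit(qs, repr_output_size):
    # Same shape as the existing __repr__ body: fetch one row more than
    # the limit so that truncation can be detected.
    data = list(qs[: repr_output_size + 1])
    if len(data) > repr_output_size:
        data[-1] = "...(remaining elements truncated)..."
    return "<%s %r>" % (qs.__class__.__name__, data)


if __name__ == "__main__":
    qs = FakeQuerySet(["a", "b"])
    print(repr_with_limit(qs, 0))
    print("queries:", qs.queries)
```

With a limit of 0 the slice is still `[:1]`, so one row is fetched even though none of it is displayed.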

carltongibson · October 17, 2023, 3:12pm

Simon’s suggestion above was to adjust that:

Thalmann · October 17, 2023, 3:16pm

Thanks for taking a look! I saw that suggestion.
But would changing the behaviour when it is set to 0 be acceptable?
Perhaps a less intrusive change would be to allow REPR_OUTPUT_SIZE to be None, which would result in the QuerySet not being evaluated and printing something like <QuerySet [NOT evaluated]>?
I have tested a local version of this, but it would also mean that REPR_OUTPUT_SIZE needs to be adjustable through the Django settings, which as far as I know it cannot be at the moment. (Monkey patching is of course possible.)

Lily-Foote · October 17, 2023, 4:56pm

To be specific, I think it’s a non-starter for Kolo because it’s technically changing the behaviour of our users’ code - maybe they’re depending on calling repr on a queryset somewhere - so it’s not something we want to opt our users into silently.

Thalmann · October 19, 2023, 8:28am

I am happy to provide a patch for something like this:

diff --git a/django/db/models/query.py b/django/db/models/query.py
index de00bba8d7..9fbf00cdcd 100644
--- a/django/db/models/query.py
+++ b/django/db/models/query.py
@@ -371,6 +371,8 @@ class QuerySet(AltersData):
         self.__dict__.update(state)

     def __repr__(self):
+        if REPR_OUTPUT_SIZE is None:
+            return f"<{self.__class__.__name__}: [NOT evaluated]>"
         data = list(self[: REPR_OUTPUT_SIZE + 1])
         if len(data) > REPR_OUTPUT_SIZE:
             data[-1] = "...(remaining elements truncated)..."

I can also create a Django setting for it as a follow-up change. I just have to read up on how to do that.

But before spending time on writing the unit tests for this and making the Pull Request, I would like to know if this is the direction we want to go in, so that we can actually get the change in.
Note that currently there are no direct tests of QuerySet.__repr__, so the first thing would probably be to write those.
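As a rough idea of what such tests might cover, the sketch below exercises the proposed `None` behaviour on a hypothetical `FakeQuerySet` rather than on Django itself. The class attribute `REPR_OUTPUT_SIZE` stands in for the module-level constant in `django/db/models/query.py`, and the `evaluations` counter is an assumption used to detect whether a query would have run.

```python
import unittest


class FakeQuerySet:
    """Hypothetical stand-in for QuerySet; slicing counts as one evaluation."""

    REPR_OUTPUT_SIZE = 20  # stands in for the module-level constant

    def __init__(self, rows):
        self._rows = list(rows)
        self.evaluations = 0

    def __getitem__(self, key):
        self.evaluations += 1
        return self._rows[key]

    def __repr__(self):
        size = self.REPR_OUTPUT_SIZE
        if size is None:
            # Proposed branch from the patch above: no evaluation at all.
            return f"<{self.__class__.__name__}: [NOT evaluated]>"
        data = list(self[: size + 1])
        if len(data) > size:
            data[-1] = "...(remaining elements truncated)..."
        return "<%s %r>" % (self.__class__.__name__, data)


class ReprTests(unittest.TestCase):
    def test_short_queryset_is_shown_in_full(self):
        qs = FakeQuerySet([1, 2, 3])
        self.assertEqual(repr(qs), "<FakeQuerySet [1, 2, 3]>")
        self.assertEqual(qs.evaluations, 1)

    def test_long_queryset_is_truncated(self):
        qs = FakeQuerySet(range(30))
        self.assertIn("...(remaining elements truncated)...", repr(qs))

    def test_none_skips_evaluation(self):
        qs = FakeQuerySet([1, 2, 3])
        qs.REPR_OUTPUT_SIZE = None  # per-instance override for the test
        self.assertEqual(repr(qs), "<FakeQuerySet: [NOT evaluated]>")
        self.assertEqual(qs.evaluations, 0)


if __name__ == "__main__":
    unittest.main()
```

Real tests would need to run against an actual model and assert on the number of database queries, for example with Django's `assertNumQueries`, instead of a counter on a fake object.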

What is the next step on getting an approval to move on with this?
