Proposal: Add Performance Warnings

chre00002 · August 17, 2023, 10:47am

Hi guys,

I wanted to ask whether you think it’s a reasonable idea to add an option for performance warnings with regards to query generation to Django?

Motivating example:
So my Professor gave me the task to find out why Django created 239 sql-queries for a specific webpage. I eventually figured out that most of these queries where caused by accessing a ForeignKey attribute within the _init_() method of a model class.
I was new to Django and had no previous knowledge of descriptors in Python so it took me quite a while to figure this out.
Additionally as the definition of QuerySets and the evaluation of QuerySets happens in different locations this can also makes it hard to figure out where a select_related/prefetch_related is missing (Especially if for example the QuerySet is defined in a view method and the related object access happens in a method defined within a model class).

This led me to thinking about whether Django could make it easier for Developers to debug these performance issues so I tried implementing warnigs for this. Here is what I came up with:

I added an option for enabling performance warnings within the settings file of the project

PERFORMANCE_WARNINGS = False

Within the related_descriptors file I added the warnings (Using Python warnings) within all of the _get_() methods. E.g. from the _get_() method of the ForwardManyToOneDescriptor:

def __get__(self, instance, cls=None):
        """
        Get the related instance through the forward relation.

        With the example above, when getting ``child.parent``:

        - ``self`` is the descriptor managing the ``parent`` attribute
        - ``instance`` is the ``child`` instance
        - ``cls`` is the ``Child`` class (we don't need it)
        """
        if instance is None:
            return self

        # The related instance is loaded from the database and then cached
        # by the field on the model instance state. It can also be pre-cached
        # by the reverse accessor (ReverseOneToOneDescriptor).
        try:
            rel_obj = self.field.get_cached_value(instance)
        except KeyError:
            #Added code
            if settings.PERFORMANCE_WARNINGS:
                warnings.warn("\nPerformance Warning: A database query was created when trying to access attribute '{}' in the model \n{}\n relating to the model \n{}.\n This is either the case because '{}' is accessed within the __init__method of \n{}\n or because there was no select_related()/prefetch_related() with '{}' as parameter.".format(self.field.name, instance.__class__, self.field.related_model, self.field.name, instance.__class__, self.field.name)
                              , stacklevel=2)
            #

The reasoning here is that there is only a cache miss in _get_() in two cases if I am not mistaken:
Either select_related()/prefetch_related() was not used on this attribute and/or the attribute was accessed within the _init_() method of the model meaning that this warning makes sense here.
As the developer might want the query to happen here the PERFORMANCE_WARNINGS setting is false by default.

When I turn the performance warnings on whithin the motivating example this is what the terminal prints out:

/home/christoph/Studium/Bachelorarbeit/Django/mastersystem/mapsys2/core/models/master_application_system.py:363: UserWarning: 
Performance Warning: A database query was created when trying to access attribute 'status' in the model 
<class 'core.models.master_application_system.StudyProgramChoices'>
 relating to the model 
<class 'core.models.master_application_system.ApplicationStatus'>.
 This is either the case because 'status' is accessed within the __init__method of 
<class 'core.models.master_application_system.StudyProgramChoices'>
 or because there was no select_related()/prefetch_related() with 'status' as parameter.
  if self.status is None:
/home/christoph/Studium/Bachelorarbeit/Django/mastersystem/mapsys2/core/models/master_application_system.py:1269: UserWarning: 
Performance Warning: A database query was created when trying to access attribute 'overall_grade' in the model 
<class 'core.models.master_application_system.HumanReviews'>
 relating to the model 
<class 'core.models.master_application_system.Grades'>.
 This is either the case because 'overall_grade' is accessed within the __init__method of 
<class 'core.models.master_application_system.HumanReviews'>
 or because there was no select_related()/prefetch_related() with 'overall_grade' as parameter.
  self.__original_overall_grade = self.overall_grade
/home/christoph/Studium/Bachelorarbeit/Django/mastersystem/mapsys2/core/models/master_application_system.py:910: UserWarning: 
Performance Warning: A database query was created when trying to access attribute 'university' in the model 
<class 'core.models.master_application_system.DegreesObtained'>
 relating to the model 
<class 'core.models.master_application_system.UniversitiesWorld'>.
 This is either the case because 'university' is accessed within the __init__method of 
<class 'core.models.master_application_system.DegreesObtained'>
 or because there was no select_related()/prefetch_related() with 'university' as parameter.
  "university": self.university.name if self.university else None,
/home/christoph/Studium/Bachelorarbeit/Django/mastersystem/mapsys2/core/models/master_application_system.py:914: UserWarning: 
Performance Warning: A database query was created when trying to access attribute 'subject' in the model 
<class 'core.models.master_application_system.DegreesObtained'>
 relating to the model 
<class 'core.models.master_application_system.StudySubjects'>.
 This is either the case because 'subject' is accessed within the __init__method of 
<class 'core.models.master_application_system.DegreesObtained'>
 or because there was no select_related()/prefetch_related() with 'subject' as parameter.
  "subject": self.subject.subject_name if self.subject else None

This would definitely be very helpful for debugging especially for Developers who are not experts in Django as these warnings directly tell you which line of code caused an avoidable query and how this might be fixed.

KenWhitesell · August 17, 2023, 12:13pm

Have you taken a look at the Django Debug Toolbar?

If you haven’t, you should.

It makes a lot of performance-related information available, not just the number of queries being executed for a page. It’s also very popular. It tends to make most people’s “Top ‘x’ Django packages” lists.

chre00002 · August 17, 2023, 3:50pm

Thanks for the quick reply! I was actually aware of the Django Debug Toolbar and had used it in the past. I must have overlooked that it actually tells you where each query is generated as I thought it only tells you what queries and how many queries got generated.
I will still look into whether some more helpful information regarding this topic could be added to the Debug Toolbar but this shouldn’t concern Django itself.
Thank you for the help!

charettes · August 19, 2023, 1:53pm

Hello @chre00002, I suggest you have a look at django-seal which adds a QuerySet.seal() method that turns any attribute access that would result in a N+1 query problem into a warning.

The source and tests should pretty much self-documenting if you want to use seal.models.make_model_sealable to implement a feature alike PERFORMANCE_WARNINGS to have all models sealable and default managers automatically seal querysets.

At $WORK we’ve had the most success by making all models sealable, sealing querysets, and turning warnings into errors when running our full test suite. We get warnings logged in production and catch regression early on CI.

Happy to answer further questions you might have here.

shangxiao · August 28, 2023, 3:51am

Hmm very interesting Simon, I tried it out on a small sample project and it seems to be exactly what my $job is looking for.

Our previous CTO had setup a warning whenever a view triggered a lot of db queries but it was mostly a guess ie “if views trigger more than 50 queries, log a warning” but this nicely attacks the problem at the source. The problem with our current warning is that tests may not generate enough data to raise the warning additionally the escalation from warning to an exception for use in tests is a very nice touch!

chre00002 · August 28, 2023, 6:54am

Hi @charettes, first I wanted to thank you for the help! I tried out django-seal and think that it is quite helpful and in many ways tackles the problem that I presented. I also tried some other tools in regards to the n+1 query problem (Mostly from here).
And while a lot of them are really usefull none of them seems to work for the very specific case of related object access within the _init_() method of a model class.
Do you think that support for this case is possible (e.g. in django-seal)?

charettes · August 28, 2023, 1:02pm

That’s the origin of the package at $job also, we were developing views that had the potential to generate a tons of query (think GraphQL) and wanted a way to get better developing, CI, and production observability about them.

charettes · August 28, 2023, 1:12pm

Could you elaborate on that? Do you mean that you have model classes where you override the __init__ method and try to access related objects after calling super() or that you have post_init signals that try to access related attributes?

If that’s the case you could setup up a pre_init signal receiver that calls seal() on all SealableModel instances.

from django.db.models.signals import pre_init
from seal.models import SealableModel

def seal_at_init(instance, **kwargs):
    if isinstance(instance, SealableModel):
        instance.seal()

pre_init.connect(seal_at_init)

It makes we wonder if the newly added feature to pass seal=True at model initialization should default to that though.

Edit: it seems the above won’t work because pre_init is sent before instance._state is assigned.

It’d be great to have an example of the issue you are running into with rogue queries at initialization time as I suspect this is an uncommon case.

chre00002 · August 29, 2023, 3:12pm

Here is an example:

class Test(models.Model):
    test1 = models.ForeignKey(Test1, on_delete=models.CASCADE)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        placeholder = self.test1

Upon initialisation of an Test instance a query is created for the related Test1 object. Now I get why this is happening and also that Djangos documentation discourages developers from writing custom _init_() methods. But as I have seen this creating n+1 problems in a real world example with the developers not easily finding this performance bug I still wonder how this could be improved.

As I have explaned previously Django-seal or other 3rd party tools don’t detect/fix this. This is why I was asking whether seal (or other 3rd party tools) could implement their features for this _init_ case as well?

charettes · August 29, 2023, 3:25pm

You should be able to define your own SealedModel abstract class that calls self.seal() in __init__ and get what you’re after

from seal.models import SealableModel

class SealedModel(SealableModel):
    class Meta:
        abstract = True

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seal()

class Test(SealedModel):
    test1 = models.ForeignKey(Test1, on_delete=models.CASCADE)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        placeholder = self.test1  # Would emit a warning

KenWhitesell · August 29, 2023, 4:44pm

You’ve really piqued my curiosity here. What is the use-case for doing something like this in the __init__ method of the Model class?

chre00002 · August 30, 2023, 10:40am

So one of the use cases from the real world example looks something like this:

class HumanReviews(auto_prefetch.Model):
    overall_grade = models.ForeignKey(
        Grades, on_delete=models.DO_NOTHING)
    # copy of grade to avoid set back to "TBD" if reviewer changes his mind:
    __original_overall_grade = None

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__original_overall_grade = self.overall_grade

    def save(self, *args, **kwargs):
        # check whether we can change the grade from TBD back to something else:
        if (
            self.__original_overall_grade
            and self.__original_overall_grade.abbreviation != "TBD"
            and self.overall_grade.abbreviation == "TBD"
        ):
            raise ValidationError(
                "You cannot change the grade to TBD from something else."
            )

        # call super to save the object:
        super().save(*args, **kwargs)

At initialization the original grade from the database is stored in __original_overall_grade. After this the value of the overall_grade attribute of this object instance might be changed.

I haven’t seen any situation in Django where accessing related objects within the __init__() method is necessary or even the best solution (The above example included). However if a developer does decide to do something like this (for whatever reason) it will create queries. This performance bug might be hard to find for that developer especially if they are not that experienced with djangos ORM.
So my main point was that there are tools for finding and preventing most n+1 query problems within a django project but not for this special case (Other than the django debug toolbar wich is still usefull for debugging this case).

KenWhitesell · August 30, 2023, 11:31am

I guess my point would be to question why do that in __init__ at all, when the docs specifically recommend against it and provide three alternative mechanisms for handling this type of situation. (Two are in the green note box in Creating objects and the third is the from_db method.)

There was even a discussion here a while ago about a problem being caused by trying to save the current value of a field in the model for tracking purposes, and doing it in the __init__ method.

Our answer to developers doing this internally is “Don’t do that.”

Topic		Replies	Views
ForeignKey Load TImes Using Django	3	485	December 14, 2021
Poor performance of NCLOB on Oracle ORM	9	1609	November 22, 2020
Automatic prefetching of ForeignKeyFields Show & Tell	1	1272	April 6, 2020
issue for upgraded to Django 5.0.3 Mystery Errors	1	98	March 26, 2024
Preventing access to models during migrations (RunPython, specifically) ORM	2	721	March 17, 2020

Proposal: Add Performance Warnings

Related Topics