Add ability to capture all the db queries at once during tests

I work in a very large Django code base, and our data is crazy, we have a lot of databases. Our tests look like this:

class TestThing(TestCase):
    databases = {"db1", "db2", "db3", "db4", ...}

    def test_thing(self):
        with self.assertNumQueries(4, using="db1"), self.assertNumQueries(0, using="db2"), self.assertNumQueries(1, using="db3"), self.assertNumQueries(0, using="db4"):
            thing()

While correct, and explicit… I fear it’s… too explicit, not to mention, we have over 4 thousand tests, and it’s an absolute mess.

Actual numbers

$ grep "def test_" **/tests/*.py | wc -l
4179

I propose, we create a new function at the SimpleTestCase level called something like assertAllNumQueries that uses the class’s databases and would make the code a bit simpler to read:

class TestThing(TestCase):
    databases = {"db1", "db2", "db3", "db4", ...}

    def test_thing(self):
        with self.assertAllNumQueries(5):
            thing()

or keep what we have, and just expand using to allow for __all__ value.

class TestThing(TestCase):
    databases = {"db1", "db2", "db3", "db4", ...}

    def test_thing(self):
        # Note that __all__ in this case would be a sum() of all the queries that need 
        # to be equal to 5. Not 5 queries each database, but collectively 5 total.
        with self.assertNumQueries(5, using="__all__"):
            thing()

One clarification: I’m not concerned with which specific database (e.g., db1 versus db4) executes the query. My goal is to detect significant increases in the overall query count. For example, if a refactor causes the total number of queries to jump tenfold, that warrants further investigation. In contrast, a change from 20 to 21 queries is trivial and acceptable if the alteration is justified.

What do you guys think?

1 Like

Hey!

You may enjoy my package django-perf-rec, which records all queries and cache operations into a YAML file as a form of snapshot testing. This is automatic, and then you can detect unexpected queries or changes to their SQL at code review time.

I am unsure about adding anything Django. The using="__all__" approach has some merit because it matches other places where we use __all__ to match all databases. But it doesn’t help a project when a user wants to assert on a subset of databases, say two out of three.

You can already achieve a multi-database shortening with a quick use of contextmanager and ExitStack from contextlib - here’s something I drafted:

from contextlib import contextmanager, ExitStack
from django.test import TestCase

class TestThing(TestCase):
    databases = {"db1", "db2", "db3", "db4"}

    @contextmanager
    def assertManyQueries(self, **queries):
        with ExitStack() as stack:
            for db, num in queries.items():
                stack.enter_context(self.assertNumQueries(num, using=db))
            yield

    def test_thing(self):
        with self.assertManyQueries(db1=4, db2=0, db3=1, db4=0):
            thing()

What do you think?

Aha, I just saw the GitHub notification from your pytest-django PR comment. That’s where this proposal comes from.

I previously wrote there:

I would propose it extending assertNumQueries to allow using to accept both an iterable of database names, like using={"default", "other"}, and some special value for all databases, probably using="__all__".

I agree with myself that supporting a subset of databases would be great. But my proposal there, to accept an iterable of names, would look like:

self.assertNumQueries(4, using={"db1", "db2", "db3", "db4"})

This is still less expressive than the demo I gave above.

Maybe better would be to allow dicts for the num parameter, allowing:

self.assertNumQueries({"db1": 4, "db2": 0, "db3": 0, "db4": 0})

What do you think?

I like the proposal:

self.assertNumQueries(4, using={"db1", "db2", "db3", "db4"})

Along with

self.assertNumQueries(4, using="__all__")

Where __all__ corresponds with your TestCase.databases.


As a shortcut, I can get behind

self.assertManyQueries(db1=4, db2=0, db3=1, db4=0)

If integrated into Django itself.


PS.

I agree with myself that supporting a subset of databases would be great.

LOL

I’ve since added a new draft PR to get the ball rolling. Right now, my immediate concern/ what i’m attempting to accomplish is to be able to do:

class Test...
    databases = {"a", "b", "c", "d"}

    def test_thing(self):
        ...
        with self.assertNumQueries(4, using="__all__"):
            Model.objects.using("a").count()
            Model.objects.using("b").count()
            Model.objects.using("c").count()
            Model.objects.using("d").count()

or

class Test...
    databases = {"a", "b", "c", "d"}

    def test_thing(self):
        ...
        with self.assertNumQueries(3, using={"a", "b", "c"}):
            Model.objects.using("a").count()
            Model.objects.using("b").count()
            Model.objects.using("c").count()
            Model.objects.using("d").count()   # does not fail because we only care about a, b, and c

This would save us a lot of code/keep the code a lot more readable.


PS. this does not, in anyway, prevent anyone from continuing to write: self.assertNumQueries(3, using="c") and only capture a single db.

Hello @kingbuzzman! I’ve reviewed the feature request you submitted in ticket #36327. Before considering its inclusion, I’d like to understand if the following approach addresses your use case:

class TestThing(TestCase):
    databases = {"db1", "db2", "db3", "db4"}

    def test_thing(self):
        for db in self.databases:
            with (self.subTest(using=db), self.assertNumQueries(3, using=db)):
                Model.objects.using(db).count()

This pattern leverages a loop combined with subTest to iterate over the specified databases, providing clear and maintainable test code. It also facilitates debugging by isolating failures to specific database contexts.

If this approach doesn’t meet your requirements, could you provide more details on the specific challenges or limitations you’re encountering?

Given that this method offers a straightforward solution without necessitating changes to Django’s core, and in the absence of a more compelling need for the proposed feature, I plan to close the ticket as wontfix, but I’ll be happy to review this decision based on further explanations.

Hi @nessita,

Imagine you have the following models. Linked to their dbs… there are a lot more models. (Neighter here nor there we use a Router the using(..) is there for illustrations)

MyUser.objects.using("default")
AuditLog.objects.using("security")
Membership.objects.using("fanclub")
ComplianceAction.objects.using("compliance")
MonitorLog.objects.using("chat")

Issue we have is that when testing views, there are places in the code where we need to hit all these places and it causes the code to look like this:

class TestThing(TestCase):
    databases = {"default", "security", "fanclub", "compliance", "chat"}

    def test_thing(self):
        ...
        with self.assertNumQueries(3, using="default"), self.assertNumQueries(2, using="security"), self.assertNumQueries(2, using="fanclub"), self.assertNumQueries(1, using="compliance"), self.assertNumQueries(1, using="chat"):
            response = self.client.get(reverse("app:view"))
            ....

What i’m proposing is a “collective” assertNumQueries that would sum them all, such as:

class TestThing(TestCase):
    databases = {"default", "security", "fanclub", "compliance", "chat"}

    def test_thing(self):
        ...
        with self.assertNumQueries(9, using="__all__"):
            response = self.client.get(reverse("app:view"))
            ....

Or if we really want to narrow it:

class TestThing(TestCase):
    databases = {"default", "security", "fanclub", "compliance", "chat"}

    def test_thing(self):
        ...
        with self.assertNumQueries(5, using={"default", "compliance", "chat"):
            response = self.client.get(reverse("app:view"))
            ....

Frankly speaking, these tests are just here for regression, im trying to ensure that we don’t have a situation where someone forgets to add a select_related or prefetch_related and the query count jumps from 9 to 90 and no one notices it. So i dont care to know that the jump happened in default vs compliance; i just want to sum all the queries together.

Note that your example will not work with the examples provided.

PS. We don’t actually use django’s TestCase directly, we use pytest-django and we made a PR there, @adamchainz suggested this should be a part of django proper instead of simply being integrated directly into pytest-django.

IMO, it’s a rare edge case, and not something useful for most users. As such, it can live in your code not it Django itself.

Our tests are getting cluttered with endless per-database assertions and that’s on us – granted.

But by introducing a collective query assertion—say, using __all__—we can simplify tests, make performance issues easier to spot, and keep everyone’s code lean and clean. This change addresses a real challenge in complex, multi-database projects and benefits the entire Django community by reducing boilerplate and making tests more maintainable.

If this doesn’t nudge you guys.. nothing will.

Thank you @kingbuzzman for the extra details. I would like to mention three things:

  1. Just a small note: if you have a moment, I’d like to invite you to check out https://heyguys.cc/. It has a thoughtful take on using more inclusive language, like “folks” instead of “guys”, when referring to a group.
  2. Regarding the use of __all__, I found the suggestion a bit unclear at first. From your example, it seems the intent is to assert the total number of queries across all databases is 8, but I initially interpreted __all__ to mean “each database should have 8 queries.” If we were to consider a change like this, we would need careful consideration when defining the param name, something like any or another term that better communicates the intended behavior.
  3. Lastly, from your use case, I still think that the example with lots of assertNumQueries could be better and more cleanly be done with subTest:
class TestThing(TestCase):
    databases = {"default", "security", "fanclub", "compliance", "chat"}

    def test_thing(self):
        ...
        queries_per_db = [(3, "default"), (2, "security"), (2, "fanclub"), (1, "compliance"), (1, "chat")]
        for amount, db in queries_per_db:
            with selgf.subTest(using=db):
                with self.assertNumQueries(amount, using=db):
                    response = self.client.get(reverse("app:view"))

I agree with Mariusz that this seems a very specific need arising from a niche use case. I don’t think this applies to the broader ecosystem, and Django is a framework designed to offer robust and accurate solutions for common scenarios.

Thank you for the reply @nessita,

  1. Apologies, I appreciate you educating me on this – force of habit; no excuse, I’ll will try harder.
  2. I can see you guysfolks getting confused by this. I really think this is a semantic issue. What reads better? using="any" vs using="__all__" – i’d argue that __all__ in this context still makes the most sense. You will be using “all” of the databases you have available in your project/assigned to the test.
  3. This is still off the mark. You would make N-calls to the client.get, when only one is required. The closest thing would be:
import contextlib

class TestThing(TestCase):
    databases = {"default", "security", "fanclub", "compliance", "chat"}

    def test_thing(self):
        ...
        queries_per_db = [(3, "default"), (2, "security"), (2, "fanclub"), (1, "compliance"), (1, "chat")]
        context = contextlib.ExitStack()
        for amount, db in queries_per_db:
            context.enter_context(self.assertNumQueries(amount, using=db))
        with context:
            response = self.client.get(reverse("app:view"))

This works, and we have something very similar already. This is still off the mark – the use case is to attempt to catch anyone fat-fingering it and missing to add a select_related or prefetch_related or a misguided for-loop that balloons the queries. In which case, there is no need to know that there should be 3 queries from default and holistically just know that there are 9 total queries.

This is a good point that I hadn’t thought of.


I think leaving this ticket as wontfix is fine. It is really niche to have a bunch of databases. It’s also not asking much to make a custom assertion function that checks the query count per DB, or a total count across all DBs.