Test time doubled after Django 3.2->4.2 and Postgres 10->13 update

I have spent the entire week debugging why our Django project’s test suite has gone from running in 30m (±5m) to 60m (±15m). I have been methodically ruling out the PRs that were in progress or recently merged around the time we started seeing consistently slow test runs. We have roughly 500 tests, and they could certainly stand some optimization, but the sudden doubling in test time is what I set out this week to figure out and rectify.

The weirdest thing was that any individual test that had slowed by over 4x (according to the log) would run in its normal, expected time when run by itself. But if I ran the entire test suite, that same test would take more than 4 times as long.

Today, I happened to be watching one of the tests and noticed that it appeared to hang after it had finished loading data, which made no sense, so I added debug prints between every line of one of the affected tests:

    def test_labeled_elements_common_with_compound(self):
        """
        Test to ensure count 0 entries are not created when measured compound doesn't have that element
        """
        import time
        startTime = time.time()
        # When I run the test by itself, it takes 94 seconds
        print("Debug prints for test, because it took 383 seconds in one full test suite run")
        print("Loading animals and samples at %.3f" % (time.time() - startTime))
        self.load_multilabel_data()
        print("Loading isocorr data at %.3f" % (time.time() - startTime))
        call_command(
            "load_accucor_msruns",
            accucor_file="DataRepo/example_data/obob_fasted_glc_lac_gln_ala_multiple_labels/"
            "alafasted_cor.xlsx",
            protocol="Default",
            date="2021-04-29",
            researcher="anonymous",
            new_researcher=False,
            isocorr_format=True,
            skip_cache_updates=True,
        )
        print("Making distinct/ordered query at %.3f" % (time.time() - startTime))
        pg = (
            PeakGroup.objects.filter(msrun__sample__name="xzl5_panc")
            .filter(name__exact="serine")
            .filter(
                peak_group_set__filename="alafasted_cor.xlsx",
            )
            .order_by("id", "peak_data__labels__element")
            .distinct("id", "peak_data__labels__element")
        )

        print("Asserting peak group count at %.3f" % (time.time() - startTime))
        self.assertEqual(pg.count(), 2)
        print("Asserting carbon label count at %.3f" % (time.time() - startTime))
        self.assertEqual(pg.filter(peak_data__labels__element__exact="C").count(), 1)
        print("Asserting nitrogen label count at %.3f" % (time.time() - startTime))
        self.assertEqual(pg.filter(peak_data__labels__element__exact="N").count(), 1)
        print("Test done at %.3f" % (time.time() - startTime))

And here is the result…

When the test is run by itself…

Debug prints for test, because it took 383 seconds in one full test suite run
Loading animals and samples at 0.000
Loading isocorr data at 6.486
Making distinct/ordered query at 92.031
Asserting peak group count at 92.032
Asserting carbon label count at 92.035
Asserting nitrogen label count at 92.037
Test done at 92.040

When run with the entire suite…

Debug prints for test, because it took 383 seconds in one full test suite run
Loading animals and samples at 0.000
Loading isocorr data at 6.242
Making distinct/ordered query at 110.290
Asserting peak group count at 110.290
Asserting carbon label count at 110.329
Asserting nitrogen label count at 255.194
Test done at 396.944

Everything actually runs fast (or rather, as fast as it did with Django 3.2 and postgres 10), but the statements that run very slowly (at least in this particular case) appear to be:

pg.filter(peak_data__labels__element__exact="C").count()
pg.filter(peak_data__labels__element__exact="N").count()

So the slowness appears to be related to postgres (or Django’s interface to it). Under Django 3.2 and postgres 10, this test always ran in a time on par with what I now see when running just that one test (~90s).

What could be going on here?

I would have guessed that perhaps I needed to rebuild postgres indexes, but this also happens in our freshly built github test environment:

    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:13.4
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: tracebase
        ports:
          - 5432:5432
        # needed because the postgres container does not provide a healthcheck
        options: >-
          --health-cmd pg_isready --health-interval 10s
          --health-timeout 5s --health-retries 5

Is this a known issue with Django 4.2 and/or postgres 13?

Hello @hepcat72 :wave:

One thing that would be greatly useful for debugging the issue further, since you’ve narrowed it down to these two .count() calls, is what the generated SQL looks like.

It’d be great to know whether it changed between 3.2 and 4.2, and what your model definitions look like; otherwise it’s going to be very hard to help you further.

4.2 shipped with some improvements to aggregation, and there were also some changes in how .exclude is implemented, but from a cursory look at your report nothing jumps out.
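
For instance, here is a minimal sketch for capturing the executed SQL and per-query timing from within the test (assuming the pg queryset from the code above):

    from django.db import connection
    from django.test.utils import CaptureQueriesContext

    with CaptureQueriesContext(connection) as ctx:
        pg.filter(peak_data__labels__element__exact="C").count()

    # Each captured entry is a dict holding the raw SQL and its duration.
    for query in ctx.captured_queries:
        print(query["time"], query["sql"])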

Thanks. I will do that today and report back.

OK. I ran 4 cases (one of them twice). Interestingly, the first time I ran the one case that had repeatedly run slowly, it ran fast. When I repeated it, it ran slowly again.

Here is the code for the test I ran (compared to what I pasted above, only 2 print statements remain):

    @override_settings(DEBUG=True)
    def test_labeled_elements_common_with_compound(self):
        """
        Test to ensure count 0 entries are not created when measured compound doesn't have that element
        """
        self.load_multilabel_data()
        call_command(
            "load_accucor_msruns",
            accucor_file="DataRepo/example_data/obob_fasted_glc_lac_gln_ala_multiple_labels/"
            "alafasted_cor.xlsx",
            protocol="Default",
            date="2021-04-29",
            researcher="Xianfeng Zeng",
            new_researcher=False,
            isocorr_format=True,
            skip_cache_updates=True,
        )
        pg = (
            PeakGroup.objects.filter(msrun__sample__name="xzl5_panc")
            .filter(name__exact="serine")
            .filter(
                peak_annotation_file__filename="alafasted_cor.xlsx",
            )
            .order_by("id", "peak_data__labels__element")
            .distinct("id", "peak_data__labels__element")
        )

        self.assertEqual(pg.count(), 2)
        print("TEST CARBON COUNT")
        self.assertEqual(pg.filter(peak_data__labels__element__exact="C").count(), 1)
        print("TEST NITROGEN COUNT")
        self.assertEqual(pg.filter(peak_data__labels__element__exact="N").count(), 1)

To switch postgres versions, I changed my .env to set either DATABASE_PORT=5433 for postgres 13 or DATABASE_PORT=5432 for postgres 10. To switch Django versions, I edited the requirements to pin either Django==4.2.4 or Django==3.2.20 and ran python -m pip install -r requirements/dev.txt.

And for completeness, I set this in the settings:

LOGGING = {
    "version": 1,
    "filters": {
        "require_debug_true": {
            "()": "django.utils.log.RequireDebugTrue",
        }
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "filters": ["require_debug_true"],
            "class": "logging.StreamHandler",
        }
    },
    "loggers": {
        "django.db.backends": {
            "level": "DEBUG",
            "handlers": ["console"],
        }
    },
}

Results

Case 1 (FAST) Django 4.2, postgres 13

python manage.py test -k test_labeled_elements_common_with_compound

TEST CARBON COUNT
(0.001) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS "col1", "DataRepo_peakgroup"."name" AS "col2", "DataRepo_peakgroup"."formula" AS "col3", "DataRepo_peakgroup"."msrun_id" AS "col4", "DataRepo_peakgroup"."peak_annotation_file_id" AS "col5" FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'C') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'C'); alias=default
TEST NITROGEN COUNT
(0.002) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS "col1", "DataRepo_peakgroup"."name" AS "col2", "DataRepo_peakgroup"."formula" AS "col3", "DataRepo_peakgroup"."msrun_id" AS "col4", "DataRepo_peakgroup"."peak_annotation_file_id" AS "col5" FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'N') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'N'); alias=default

Case 2 (UNEXPECTEDLY FAST) Django 4.2, postgres 13

python manage.py test

TEST CARBON COUNT
(0.020) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS "col1", "DataRepo_peakgroup"."name" AS "col2", "DataRepo_peakgroup"."formula" AS "col3", "DataRepo_peakgroup"."msrun_id" AS "col4", "DataRepo_peakgroup"."peak_annotation_file_id" AS "col5" FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'C') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'C'); alias=default
TEST NITROGEN COUNT
(0.021) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS "col1", "DataRepo_peakgroup"."name" AS "col2", "DataRepo_peakgroup"."formula" AS "col3", "DataRepo_peakgroup"."msrun_id" AS "col4", "DataRepo_peakgroup"."peak_annotation_file_id" AS "col5" FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'N') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'N'); alias=default

Repeated Case 2 exactly (SLOW)

TEST CARBON COUNT
(147.730) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS "col1", "DataRepo_peakgroup"."name" AS "col2", "DataRepo_peakgroup"."formula" AS "col3", "DataRepo_peakgroup"."msrun_id" AS "col4", "DataRepo_peakgroup"."peak_annotation_file_id" AS "col5" FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'C') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'C'); alias=default
TEST NITROGEN COUNT
(145.653) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS "col1", "DataRepo_peakgroup"."name" AS "col2", "DataRepo_peakgroup"."formula" AS "col3", "DataRepo_peakgroup"."msrun_id" AS "col4", "DataRepo_peakgroup"."peak_annotation_file_id" AS "col5" FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'N') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'N'); alias=default

Case 3 (FAST) Django 3.2, postgres 10

python manage.py test -k test_labeled_elements_common_with_compound

TEST CARBON COUNT
(0.001) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS Col1, "DataRepo_peakgroup"."name" AS Col2, "DataRepo_peakgroup"."formula" AS Col3, "DataRepo_peakgroup"."msrun_id" AS Col4, "DataRepo_peakgroup"."peak_annotation_file_id" AS Col5 FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'C') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'C')
TEST NITROGEN COUNT
(0.002) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS Col1, "DataRepo_peakgroup"."name" AS Col2, "DataRepo_peakgroup"."formula" AS Col3, "DataRepo_peakgroup"."msrun_id" AS Col4, "DataRepo_peakgroup"."peak_annotation_file_id" AS Col5 FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'N') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'N')

Case 4 (FAST) Django 3.2, postgres 10

python manage.py test

TEST CARBON COUNT
(0.001) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS Col1, "DataRepo_peakgroup"."name" AS Col2, "DataRepo_peakgroup"."formula" AS Col3, "DataRepo_peakgroup"."msrun_id" AS Col4, "DataRepo_peakgroup"."peak_annotation_file_id" AS Col5 FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'C') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'C')
TEST NITROGEN COUNT
(0.001) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("DataRepo_peakgroup"."id", "DataRepo_peakdatalabel"."element") "DataRepo_peakgroup"."id" AS Col1, "DataRepo_peakgroup"."name" AS Col2, "DataRepo_peakgroup"."formula" AS Col3, "DataRepo_peakgroup"."msrun_id" AS Col4, "DataRepo_peakgroup"."peak_annotation_file_id" AS Col5 FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_archivefile" ON ("DataRepo_peakgroup"."peak_annotation_file_id" = "DataRepo_archivefile"."id") INNER JOIN "DataRepo_peakdata" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakdata"."peak_group_id") INNER JOIN "DataRepo_peakdatalabel" ON ("DataRepo_peakdata"."id" = "DataRepo_peakdatalabel"."peak_data_id") WHERE ("DataRepo_sample"."name" = 'xzl5_panc' AND "DataRepo_peakgroup"."name" = 'serine' AND "DataRepo_archivefile"."filename" = 'alafasted_cor.xlsx' AND "DataRepo_peakdatalabel"."element" = 'N') ORDER BY "DataRepo_peakgroup"."id" ASC, "DataRepo_peakdatalabel"."element" ASC) subquery; args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'N')

So the per-test slowness is apparently at least somewhat intermittent, which is perhaps why the running time on github has a wider range (60m ±15m). Out of the roughly half dozen times I’ve run all tests locally, this is the only time I’ve seen that test run fast under Django 4.2 and postgres 13. Otherwise, that one test runs consistently slowly when running all tests and consistently fast when run individually.

BTW, I was reading one of your other posts and saw that the query optimization in Django 4.2 is based on psycopg3, and we still had psycopg2 in the requirements, so I tried pip install "psycopg[binary]". I ran the dj4.2/pg13 combination twice and again encountered the slowness (on the second attempt).

Thank you for providing more details.

The good news is that the exact same SQL (modulo some alias casing and quoting) is generated in both cases, so something else is causing the slowness.

Do you know if the slow instances of the count queries are performed over way more rows than the first time?

Could it be that the ordering of tests changed between the two versions, and that some tests that break data isolation happen to generate more data by the time the count test runs?

The fact that the queries spend all of that time waiting for Postgres to execute the exact same SQL between versions makes me believe that the slowness might have more to do with setup differences between 10 and 13 than with the Django version bump.
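
One quick way to test the row-count hypothesis would be a throwaway helper along these lines (hypothetical, not part of your project; I’m assuming DataRepo is the app label), called right before the slow assertions:

    from django.apps import apps

    def dump_row_counts(app_label="DataRepo"):
        # Print exact row counts for every model in the app, to check
        # whether the slow runs are counting over more rows.
        for model in apps.get_app_config(app_label).get_models():
            print(f"{model.__name__}: {model.objects.count()}")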

I don’t understand your question. Could you rephrase?

I’m actually in the middle of testing the ordering. However, as of today, I have repeated this test a few times and it has been slow 50% of the time. I had just determined that this is the order of the tests each time (according to the debug output of our test suite):

STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_accucor_bad_label at 10/30/2023, 16:28:52
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_accucor_load_blank_fail at 10/30/2023, 16:28:52
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_accucor_load_blank_skip at 10/30/2023, 16:28:54
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_accucor_load_in_debug at 10/30/2023, 16:28:55
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_accucor_load_sample_prefix at 10/30/2023, 16:28:57
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_accucor_load_sample_prefix_missing at 10/30/2023, 16:28:57
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_conflicting_peak_group at 10/30/2023, 16:28:57
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_conflicting_peakgroups at 10/30/2023, 16:28:58
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_duplicate_compounds_one_msrun at 10/30/2023, 16:28:58
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_duplicate_peak_group at 10/30/2023, 16:28:58
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_multiple_accucor_labels at 10/30/2023, 16:28:59
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_multiple_accucor_one_msrun at 10/30/2023, 16:29:00
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests.test_record_missing_compound at 10/30/2023, 16:29:02
STARTING TEST: DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_labeled_elements_common_with_compound at 10/30/2023, 16:29:06

So the first test class to run is AccuCorDataLoadingTests, and the first test to run in the second class is the one that experiences the slowness. Regarding the state of the test database for each of these tests, the DB should be (as far as I understand it) reset each time:

@override_settings(CACHES=settings.TEST_CACHES)
class AccuCorDataLoadingTests(TracebaseTestCase):
    fixtures = ["data_types.yaml", "data_formats.yaml", "lc_methods.yaml"]

    @classmethod
    def setUpTestData(cls):
        call_command(
            "load_study",
            "DataRepo/example_data/small_dataset/small_obob_study_prerequisites.yaml",
        )

        call_command(
            "load_animals_and_samples",
            animal_and_sample_table_filename=(
                "DataRepo/example_data/small_dataset/"
                "small_obob_animal_and_sample_table.xlsx"
            ),
            skip_cache_updates=True,
        )

        super().setUpTestData()

@override_settings(CACHES=settings.TEST_CACHES)
class IsoCorrDataLoadingTests(TracebaseTestCase):
    fixtures = ["data_types.yaml", "data_formats.yaml", "lc_methods.yaml"]

    @classmethod
    def setUpTestData(cls):
        call_command(
            "load_study",
            "DataRepo/example_data/protocols/loading.yaml",
            verbosity=2,
        )
        call_command(
            "load_study",
            "DataRepo/example_data/tissues/loading.yaml",
            verbosity=2,
        )
        call_command(
            "load_compounds",
            compounds="DataRepo/example_data/consolidated_tracebase_compound_list.tsv",
            verbosity=2,
        )

        call_command(
            "load_animals_and_samples",
            animal_and_sample_table_filename=(
                "DataRepo/example_data/AsaelR_13C-Valine+PI3Ki_flank-KPC_2021-12_isocorr_CN-corrected/"
                "TraceBase Animal and Sample Table Templates_AR.xlsx"
            ),
            skip_researcher_check=True,
            skip_cache_updates=True,
        )

        super().setUpTestData()

Each class also has at least one test method that loads some data itself. Neither class overrides setUp or tearDown.

I could dig through some logs to evaluate test order (because our test suite prints tests with their running times), but as far as I understand it, the tests should be in the same order every time, regardless of version. (Incidentally, do you know if there is a way to just print the test order? I’ve tried installing an app called list_tests, but I’ve come to learn that it misses some tests.)
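
(Perhaps something like this would work to list the order without running anything; a sketch I haven’t fully verified, using Django’s own runner from python manage.py shell, assuming DataRepo is the right test label:)

    from django.test.runner import DiscoverRunner

    # Build the suite without running it, then print each test id in the
    # order the runner would execute them.
    suite = DiscoverRunner().build_suite(["DataRepo"])
    for test in suite:
        print(test.id())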

I determined that if I just run python manage.py test DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_labeled_elements_common_with_compound, I can reproduce the slowness. I am about to run the following to see if I can determine any clues as to the problem and see if there is an order difference when the slowness occurs/doesn’t occur:

for i in $(seq 10); do
    date
    gtimeout 130 python manage.py test DataRepo.tests.loading.test_load_accucor_msruns.AccuCorDataLoadingTests DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_labeled_elements_common_with_compound
done
date

The gtimeout is just to make this run as fast as possible. When the test is “fast”, it finishes in anywhere from 80s to 107s, but when it’s slow, it takes over 350s.

Could you add a breakpoint to your tests and run EXPLAIN SELECT COUNT(*) FROM (...) and see how different the plan is between slow and fast runs?

from django.db.models import connection
with connection.cursor() as cursor:
   cursor.execute("EXPLAIN ...")
   print(cursor.fetchone())

Also, did you try running Django 3.2 against Postgres 13 to see if you can reproduce the failure? If you can, you should likely seek help from a Postgres support channel as well. I suspect the query plan provided by EXPLAIN will help greatly here.

Also, I tried running the one test by itself about 6 or 7 times in a row and it was always fast, so you may be onto something about the database state after particular tests. But if something like that is happening, I cannot see why the database wouldn’t be re-initialized for that one test (which is the first and only test that runs in its class in this scenario). I tried supplying --reverse to see if that fixes this one case; it ran fast the one time I tried it, which supports the notion that it has to do with the state of the database between tests. But then why would repeating the exact same scenario be fast the first time and slow the next? Could there be some sort of async timing issue?

That’s a good idea, I’ll give it a shot. My timeout idea hasn’t really worked out anyway. When it kills the python process, the DB is left in a bad state.

Oh geez. Yes, it happens in Django 3.2 when using postgres 13… and apparently I’ve been unintentionally testing that scenario for the past 30m. I forgot to reinstall 4.2 when I intended to switch back to 4.2 & 13.

All sorts of things could happen at the database level. For example, the dataset could be in memory, the plan might have been JIT compiled, or a different index might be used.

Hopefully the usage of EXPLAIN will shed some more light here.
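
(One way to dig into the caching angle, as a sketch that assumes the pg queryset from your test: adding BUFFERS to the EXPLAIN options reports shared-buffer hits versus disk reads per plan node, which helps separate “the data was already in memory” from a genuinely different plan:)

    from django.db import connection

    # Build the COUNT(*) wrapper the way .count() does, then EXPLAIN it
    # with BUFFERS to see shared-buffer hits vs. reads per plan node.
    inner_sql, params = pg.filter(
        peak_data__labels__element__exact="C"
    ).query.sql_with_params()
    with connection.cursor() as cursor:
        cursor.execute(
            f"EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM ({inner_sql}) subquery",
            params,
        )
        for (line,) in cursor.fetchall():
            print(line)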

In this case I suggest comparing the output of EXPLAIN.

It’s awkward to get, as Django immediately executes aggregate() instead of returning a lazy object, so you can’t do .aggregate(...).explain(), but hopefully the cursor.execute example will allow you to get the right data.
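
(Note that the lazy queryset itself does support QuerySet.explain(), so the plan of the inner SELECT, though not of the COUNT(*) wrapper that .count() adds, can be fetched without raw SQL:)

    # On PostgreSQL, extra keyword options are passed through to EXPLAIN.
    print(pg.filter(peak_data__labels__element__exact="C").explain(analyze=True))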

Where does connection live? I get ImportError: cannot import name 'connection' from 'django.db.models'

It’s in django.db

OK. Finally worked this out. I haven’t used EXPLAIN before, so I had a few false starts. Not sure if this is exactly how you intended me to do this, but here’s what I did in the test:

        from django.db import connection
        with connection.cursor() as cursor:
            cursor.execute("EXPLAIN (ANALYZE) SELECT COUNT(*) FROM (SELECT DISTINCT ON (\"DataRepo_peakgroup\".\"id\", \"DataRepo_peakdatalabel\".\"element\") \"DataRepo_peakgroup\".\"id\" AS \"col1\", \"DataRepo_peakgroup\".\"name\" AS \"col2\", \"DataRepo_peakgroup\".\"formula\" AS \"col3\", \"DataRepo_peakgroup\".\"msrun_id\" AS \"col4\", \"DataRepo_peakgroup\".\"peak_annotation_file_id\" AS \"col5\" FROM \"DataRepo_peakgroup\" INNER JOIN \"DataRepo_msrun\" ON (\"DataRepo_peakgroup\".\"msrun_id\" = \"DataRepo_msrun\".\"id\") INNER JOIN \"DataRepo_sample\" ON (\"DataRepo_msrun\".\"sample_id\" = \"DataRepo_sample\".\"id\") INNER JOIN \"DataRepo_archivefile\" ON (\"DataRepo_peakgroup\".\"peak_annotation_file_id\" = \"DataRepo_archivefile\".\"id\") INNER JOIN \"DataRepo_peakdata\" ON (\"DataRepo_peakgroup\".\"id\" = \"DataRepo_peakdata\".\"peak_group_id\") INNER JOIN \"DataRepo_peakdatalabel\" ON (\"DataRepo_peakdata\".\"id\" = \"DataRepo_peakdatalabel\".\"peak_data_id\") WHERE (\"DataRepo_sample\".\"name\" = 'xzl5_panc' AND \"DataRepo_peakgroup\".\"name\" = 'serine' AND \"DataRepo_archivefile\".\"filename\" = 'alafasted_cor.xlsx' AND \"DataRepo_peakdatalabel\".\"element\" = 'C') ORDER BY \"DataRepo_peakgroup\".\"id\" ASC, \"DataRepo_peakdatalabel\".\"element\" ASC) subquery;")  # args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'C')")
            print(cursor.fetchone())

        # self.assertEqual(pg.filter(peak_data__labels__element__exact="C").count(), 1)
        # self.assertEqual(pg.filter(peak_data__labels__element__exact="N").count(), 1)

Here’s the output when slowness is encountered (running multiple tests):

('Aggregate  (cost=45.77..45.78 rows=1 width=8) (actual time=142972.335..142972.338 rows=1 loops=1)',)

Here is the output when it’s fast (running 1 test):

('Aggregate  (cost=99.01..99.02 rows=1 width=8) (actual time=0.674..0.677 rows=1 loops=1)',)

There should be a lot more output for those explains. You may need to do a fetchall on that cursor. But even that summary line is interesting. Clearly the planner has come up with vastly different results in the two cases.

A follow-up suggestion, too: when you print the output from that, print each element of the tuple on its own line. It’s going to be a lot easier to read that way.

1 Like

OK. I edited the test code thusly:

        from django.db import connection
        with connection.cursor() as cursor:
            cursor.execute("EXPLAIN (ANALYZE) SELECT COUNT(*) FROM (SELECT DISTINCT ON (\"DataRepo_peakgroup\".\"id\", \"DataRepo_peakdatalabel\".\"element\") \"DataRepo_peakgroup\".\"id\" AS \"col1\", \"DataRepo_peakgroup\".\"name\" AS \"col2\", \"DataRepo_peakgroup\".\"formula\" AS \"col3\", \"DataRepo_peakgroup\".\"msrun_id\" AS \"col4\", \"DataRepo_peakgroup\".\"peak_annotation_file_id\" AS \"col5\" FROM \"DataRepo_peakgroup\" INNER JOIN \"DataRepo_msrun\" ON (\"DataRepo_peakgroup\".\"msrun_id\" = \"DataRepo_msrun\".\"id\") INNER JOIN \"DataRepo_sample\" ON (\"DataRepo_msrun\".\"sample_id\" = \"DataRepo_sample\".\"id\") INNER JOIN \"DataRepo_archivefile\" ON (\"DataRepo_peakgroup\".\"peak_annotation_file_id\" = \"DataRepo_archivefile\".\"id\") INNER JOIN \"DataRepo_peakdata\" ON (\"DataRepo_peakgroup\".\"id\" = \"DataRepo_peakdata\".\"peak_group_id\") INNER JOIN \"DataRepo_peakdatalabel\" ON (\"DataRepo_peakdata\".\"id\" = \"DataRepo_peakdatalabel\".\"peak_data_id\") WHERE (\"DataRepo_sample\".\"name\" = 'xzl5_panc' AND \"DataRepo_peakgroup\".\"name\" = 'serine' AND \"DataRepo_archivefile\".\"filename\" = 'alafasted_cor.xlsx' AND \"DataRepo_peakdatalabel\".\"element\" = 'C') ORDER BY \"DataRepo_peakgroup\".\"id\" ASC, \"DataRepo_peakdatalabel\".\"element\" ASC) subquery;")  # args=('xzl5_panc', 'serine', 'alafasted_cor.xlsx', 'C')")
            for l in cursor.fetchall():
                print(l)

Here is the Django 4.2 and postgres 13 output:

('Aggregate  (cost=45.77..45.78 rows=1 width=8) (actual time=202326.210..202326.214 rows=1 loops=1)',)
('  ->  Unique  (cost=0.88..45.76 rows=1 width=1052) (actual time=10492.963..202326.205 rows=1 loops=1)',)
('        ->  Nested Loop  (cost=0.88..45.75 rows=1 width=1052) (actual time=10492.959..202326.188 rows=8 loops=1)',)
('              Join Filter: ("DataRepo_peakdata".id = "DataRepo_peakdatalabel".peak_data_id)',)
('              Rows Removed by Join Filter: 3076',)
('              ->  Nested Loop  (cost=0.62..37.47 rows=1 width=8) (actual time=10492.455..202323.935 rows=8 loops=1)',)
('                    Join Filter: ("DataRepo_peakgroup".id = "DataRepo_peakdata".peak_group_id)',)
('                    Rows Removed by Join Filter: 7888',)
('                    ->  Index Scan using "DataRepo_peakdata_peak_group_id_4dd87f4a" on "DataRepo_peakdata"  (cost=0.25..8.26 rows=1 width=8) (actual time=0.017..7.149 rows=7896 loops=1)',)
('                    ->  Nested Loop  (cost=0.38..29.20 rows=1 width=4) (actual time=2.942..25.621 rows=1 loops=7896)',)
('                          Join Filter: ("DataRepo_peakgroup".peak_annotation_file_id = "DataRepo_archivefile".id)',)
('                          ->  Nested Loop  (cost=0.38..18.43 rows=1 width=8) (actual time=2.935..25.614 rows=1 loops=7896)',)
('                                Join Filter: ("DataRepo_msrun".sample_id = "DataRepo_sample".id)',)
('                                Rows Removed by Join Filter: 83',)
('                                ->  Nested Loop  (cost=0.12..10.15 rows=1 width=12) (actual time=0.028..2.513 rows=84 loops=7896)',)
('                                      Join Filter: ("DataRepo_peakgroup".msrun_id = "DataRepo_msrun".id)',)
('                                      Rows Removed by Join Filter: 3486',)
('                                      ->  Index Scan using "DataRepo_peakgroup_peak_annotation_file_id_6dc2fc25" on "DataRepo_peakgroup"  (cost=0.12..8.14 rows=1 width=12) (actual time=0.006..0.464 rows=84 loops=7896)',)
("                                            Filter: ((name)::text = 'serine'::text)",)
('                                            Rows Removed by Filter: 1008',)
('                                      ->  Seq Scan on "DataRepo_msrun"  (cost=0.00..2.00 rows=1 width=8) (actual time=0.013..0.017 rows=42 loops=663264)',)
('                                ->  Index Scan using "DataRepo_sample_pkey" on "DataRepo_sample"  (cost=0.25..8.27 rows=1 width=4) (actual time=0.194..0.275 rows=1 loops=663264)',)
("                                      Filter: ((name)::text = 'xzl5_panc'::text)",)
('                                      Rows Removed by Filter: 281',)
('                          ->  Seq Scan on "DataRepo_archivefile"  (cost=0.00..10.75 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=7896)',)
("                                Filter: ((filename)::text = 'alafasted_cor.xlsx'::text)",)
('              ->  Index Scan using "DataRepo_peakdatalabel_peak_data_id_f6151d4a" on "DataRepo_peakdatalabel"  (cost=0.25..8.27 rows=1 width=12) (actual time=0.007..0.217 rows=386 loops=8)',)
("                    Filter: ((element)::text = 'C'::text)",)
('                    Rows Removed by Filter: 180',)
('Planning Time: 0.963 ms',)
('Execution Time: 202327.334 ms',)

And here is the Django 3.2 and postgres 10 output:

('Aggregate  (cost=48.03..48.04 rows=1 width=8) (actual time=0.207..0.209 rows=1 loops=1)',)
('  ->  Unique  (cost=48.01..48.02 rows=1 width=1052) (actual time=0.204..0.208 rows=1 loops=1)',)
('        ->  Sort  (cost=48.01..48.02 rows=1 width=1052) (actual time=0.202..0.205 rows=8 loops=1)',)
('              Sort Key: "DataRepo_peakgroup".id',)
('              Sort Method: quicksort  Memory: 25kB',)
('              ->  Nested Loop  (cost=13.01..48.00 rows=1 width=1052) (actual time=0.135..0.190 rows=8 loops=1)',)
('                    ->  Nested Loop  (cost=12.73..46.52 rows=4 width=8) (actual time=0.121..0.154 rows=8 loops=1)',)
('                          ->  Nested Loop  (cost=8.72..21.64 rows=1 width=4) (actual time=0.099..0.125 rows=1 loops=1)',)
('                                ->  Nested Loop  (cost=8.58..13.46 rows=1 width=8) (actual time=0.089..0.114 rows=1 loops=1)',)
('                                      ->  Hash Join  (cost=8.30..12.43 rows=1 width=4) (actual time=0.062..0.087 rows=1 loops=1)',)
('                                            Hash Cond: ("DataRepo_msrun".sample_id = "DataRepo_sample".id)',)
('                                            ->  Seq Scan on "DataRepo_msrun"  (cost=0.00..3.68 rows=168 width=8) (actual time=0.023..0.037 rows=84 loops=1)',)
('                                            ->  Hash  (cost=8.29..8.29 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=1)',)
('                                                  Buckets: 1024  Batches: 1  Memory Usage: 9kB',)
('                                                  ->  Index Scan using "DataRepo_sample_name_e81ec2f9_like" on "DataRepo_sample"  (cost=0.28..8.29 rows=1 width=4) (actual time=0.015..0.019 rows=1 loops=1)',)
("                                                        Index Cond: ((name)::text = 'xzl5_panc'::text)",)
('                                      ->  Index Scan using unique_peakgroup on "DataRepo_peakgroup"  (cost=0.28..0.89 rows=1 width=12) (actual time=0.026..0.026 rows=1 loops=1)',)
('                                            Index Cond: (((name)::text = \'serine\'::text) AND (msrun_id = "DataRepo_msrun".id))',)
('                                ->  Index Scan using "DataRepo_archivefile_pkey" on "DataRepo_archivefile"  (cost=0.14..4.16 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)',)
('                                      Index Cond: (id = "DataRepo_peakgroup".peak_annotation_file_id)',)
("                                      Filter: ((filename)::text = 'alafasted_cor.xlsx'::text)",)
('                          ->  Bitmap Heap Scan on "DataRepo_peakdata"  (cost=4.01..24.59 rows=29 width=8) (actual time=0.018..0.019 rows=8 loops=1)',)
('                                Recheck Cond: (peak_group_id = "DataRepo_peakgroup".id)',)
('                                Heap Blocks: exact=2',)
('                                ->  Bitmap Index Scan on "DataRepo_peakdata_peak_group_id_4dd87f4a"  (cost=0.00..4.00 rows=29 width=0) (actual time=0.016..0.016 rows=8 loops=1)',)
('                                      Index Cond: (peak_group_id = "DataRepo_peakgroup".id)',)
('                    ->  Index Only Scan using unique_peakdata on "DataRepo_peakdatalabel"  (cost=0.29..0.37 rows=1 width=12) (actual time=0.004..0.004 rows=1 loops=8)',)
('                          Index Cond: ((peak_data_id = "DataRepo_peakdata".id) AND (element = \'C\'::text))',)
('                          Heap Fetches: 8',)
('Planning time: 1.012 ms',)
('Execution time: 0.362 ms',)

And here is the Django 4.2 and postgres 13 output when running just the one test (which is always fast):

('Aggregate  (cost=99.01..99.02 rows=1 width=8) (actual time=1.399..1.403 rows=1 loops=1)',)
('  ->  Unique  (cost=98.99..99.00 rows=1 width=1052) (actual time=1.390..1.398 rows=1 loops=1)',)
('        ->  Sort  (cost=98.99..99.00 rows=1 width=1052) (actual time=1.389..1.393 rows=8 loops=1)',)
('              Sort Key: "DataRepo_peakgroup".id',)
('              Sort Method: quicksort  Memory: 25kB',)
('              ->  Nested Loop  (cost=5.36..98.98 rows=1 width=1052) (actual time=0.261..1.377 rows=8 loops=1)',)
('                    ->  Nested Loop  (cost=5.08..80.19 rows=51 width=8) (actual time=0.152..1.234 rows=8 loops=1)',)
('                          ->  Nested Loop  (cost=0.55..28.36 rows=1 width=4) (actual time=0.138..1.214 rows=1 loops=1)',)
('                                ->  Nested Loop  (cost=0.29..27.20 rows=1 width=8) (actual time=0.061..0.873 rows=84 loops=1)',)
('                                      ->  Nested Loop  (cost=0.14..18.92 rows=1 width=8) (actual time=0.048..0.667 rows=84 loops=1)',)
('                                            ->  Seq Scan on "DataRepo_archivefile"  (cost=0.00..10.75 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1)',)
("                                                  Filter: ((filename)::text = 'alafasted_cor.xlsx'::text)",)
('                                            ->  Index Scan using "DataRepo_peakgroup_peak_annotation_file_id_6dc2fc25" on "DataRepo_peakgroup"  (cost=0.14..8.16 rows=1 width=12) (actual time=0.035..0.634 rows=84 loops=1)',)
('                                                  Index Cond: (peak_annotation_file_id = "DataRepo_archivefile".id)',)
("                                                  Filter: ((name)::text = 'serine'::text)",)
('                                                  Rows Removed by Filter: 1008',)
('                                      ->  Index Scan using "DataRepo_msrun_pkey" on "DataRepo_msrun"  (cost=0.14..8.16 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=84)',)
('                                            Index Cond: (id = "DataRepo_peakgroup".msrun_id)',)
('                                ->  Index Scan using "DataRepo_sample_pkey" on "DataRepo_sample"  (cost=0.27..0.72 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=84)',)
('                                      Index Cond: (id = "DataRepo_msrun".sample_id)',)
("                                      Filter: ((name)::text = 'xzl5_panc'::text)",)
('                                      Rows Removed by Filter: 1',)
('                          ->  Bitmap Heap Scan on "DataRepo_peakdata"  (cost=4.52..51.52 rows=31 width=8) (actual time=0.010..0.012 rows=8 loops=1)',)
('                                Recheck Cond: (peak_group_id = "DataRepo_peakgroup".id)',)
('                                Heap Blocks: exact=1',)
('                                ->  Bitmap Index Scan on "DataRepo_peakdata_peak_group_id_4dd87f4a"  (cost=0.00..4.51 rows=31 width=0) (actual time=0.005..0.005 rows=8 loops=1)',)
('                                      Index Cond: (peak_group_id = "DataRepo_peakgroup".id)',)
('                    ->  Index Only Scan using unique_peakdata on "DataRepo_peakdatalabel"  (cost=0.29..0.37 rows=1 width=12) (actual time=0.017..0.017 rows=1 loops=8)',)
('                          Index Cond: ((peak_data_id = "DataRepo_peakdata".id) AND (element = \'C\'::text))',)
('                          Heap Fetches: 8',)
('Planning Time: 1.195 ms',)
('Execution Time: 1.566 ms',)

Wow, that first execution plan is really wild.

In between these runs, are you adding or deleting data from these tables?

If so, you may want to try a VACUUM FULL ANALYZE after the data has been updated (inserted, deleted, whatever) and before the next set of tests executes. That should at least ensure that the statistics tables are updated properly before the queries are planned.
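
(A rough sketch of that idea: Django’s TestCase wraps each test class in a transaction, and VACUUM cannot run inside a transaction block, so a plain ANALYZE is the practical option there:)

    from django.db import connection

    # Sketch only: mirrors the IsoCorrDataLoadingTests class shown earlier.
    class IsoCorrDataLoadingTests(TracebaseTestCase):
        @classmethod
        def setUpTestData(cls):
            # ... call_command loader calls as above ...
            with connection.cursor() as cursor:
                # ANALYZE (unlike VACUUM) is allowed inside a transaction
                # block, so it can run in TestCase's class-level transaction.
                cursor.execute("ANALYZE")
            super().setUpTestData()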

We only add data. We have a curation process that takes place entirely before data is loaded, and all the tests (in this particular case) exercise the loaders. It is scientific data, and it is fairly normalized in the database. The count query that causes the slowdown under django 4.2 and postgres 13 is:

pg.filter(peak_data__labels__element__exact="C").count()

The record is a PeakGroup record. PeakData links to PeakGroup. PeakDataLabel links to PeakData and it contains the element field.

I’m not familiar with how these queries are constructed behind the scenes, or with how to interpret this output, but I don’t understand why the “plan” appears to traverse the models MSRun, Sample, and ArchiveFile, which are either higher in the hierarchy or adjacent. Perhaps it has to do with the new ArchiveFile table, which is now linked to from multiple tables in the hierarchy? The relationship hierarchy is:

PeakDataLabel → PeakData → PeakGroup → MSRun → Sample

Since we’re starting from a PeakGroup record and each ForeignKey is a M:1, why does it appear to be doing a join with MSRun? And why is it joining PeakGroup with ArchiveFile? Those aren’t part of the peak_data__labels__element relations from a PeakGroup specified in the query.

So what you’re suggesting is that the statistics tables that analyze depends on may have stale data from other test classes and that the analyze output is inaccurate as a result? Or are you implying that that could also be affecting the performance?

I’m sorry, but I don’t have enough knowledge of the PostgreSQL query planner to even begin to offer a guess as to the heuristics used when mapping out a query, or why it may choose to use (or not use) certain tables or indexes when planning one.

What I’m saying is that the query planner uses the data in the statistics tables for mapping out the query.

The vacuum full analyze command, in addition to cleaning up the table spaces, also causes the statistics tables to be updated with the current statistics of the data. (These tables are not updated with every insert/update/delete)

So what I’m suggesting is: if you’re inserting a lot of data into these tables, and the statistics are sufficiently out of date relative to what’s actually in the database, the query planner is going to create a suboptimal query plan, and that can definitely affect performance.
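
(As a sketch for checking how stale those statistics are from inside a test: pg_stat_user_tables records per-table tuple estimates and the last manual/automatic analyze times:)

    from django.db import connection

    with connection.cursor() as cursor:
        # n_live_tup estimates the live rows; last_analyze / last_autoanalyze
        # show when each table's statistics were last refreshed.
        cursor.execute(
            "SELECT relname, n_live_tup, last_analyze, last_autoanalyze "
            "FROM pg_stat_user_tables ORDER BY relname"
        )
        for row in cursor.fetchall():
            print(row)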