I’m updating an app that is currently on Django 3.2.x to 4.2.x and have run into performance issues when iterating over the same queryset with the same data between versions (using psycopg2 with postgres). I’ve narrowed down the issue via profiling, here are the timings below (I see this consistently):
This is on psycopg2 2.9.7. Has anyone seen anything similar, or know of any settings that changed and might affect this? I’ve tried psycopg3 and see similarly poor performance. I’ve also tried disabling server-side cursors, with no luck yet.
It's kind of hard to provide any support here, as the only thing the profiling data demonstrates is that the same number of calls to psycopg2.extensions.cursor takes significantly more time to execute on 4.2.
I’d look into what particular (sql, params) are being fed into the cursor between the two Django versions to compare it and try to identify differences.
It might be that, for the same ORM calls, subtly different SQL is generated on 3.2 and 4.2, and the 4.2 version happens to be less efficient, which makes it look like psycopg2 takes more time to run.
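One way to capture what is fed into the cursor is Django's `connection.execute_wrapper()` hook, which calls a wrapper with `(execute, sql, params, many, context)` for every query. The sketch below is only an illustration: `QueryRecorder` and its field names are made up for this example, and it records the size of each statement rather than its full text, so the output from a 3.2 run and a 4.2 run can be compared side by side.

```python
class QueryRecorder:
    """Hypothetical helper: records the size of every query sent to the cursor."""

    def __init__(self):
        self.calls = []

    def __call__(self, execute, sql, params, many, context):
        # This is the signature Django uses for execute wrappers.
        # params may be None, a sequence, or a mapping.
        self.calls.append(
            {
                "sql_prefix": sql[:80],
                "sql_length": len(sql),
                "param_count": 0 if params is None else len(params),
            }
        )
        # Always hand the query on, otherwise it never reaches the database.
        return execute(sql, params, many, context)

    def report(self):
        """Return one line per recorded query, largest statements first."""
        ordered = sorted(self.calls, key=lambda c: c["sql_length"], reverse=True)
        return [
            f'{c["sql_length"]:>8} chars  {c["param_count"]:>6} params  {c["sql_prefix"]}'
            for c in ordered
        ]
```

Assuming a configured Django project, usage would look roughly like `recorder = QueryRecorder()`, then `with connection.execute_wrapper(recorder): list(queryset)`, then printing `recorder.report()` under each Django version. A statement that is much longer or carries many more parameters on one version is a good place to start looking.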
@charettes you are correct: the data going into what generates the SQL (which comes from a previous SQL query) is different. We generate a very large OR query from the results of a previous SQL query, and under Django 4 that data contained many duplicates that were not there under Django 3. This resulted in a query that returned the same results but was ~5x as big. Eliminating the duplicates fixes the performance (and improves on Django 3 as well).
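For anyone hitting the same thing, here is a minimal sketch of the fix, not the actual application code. It assumes the earlier query yields hashable tuples; the column names `col_a` and `col_b` and both function names are hypothetical. The point is only that removing duplicates before building the OR conditions shrinks the statement without changing which rows match.

```python
def unique_in_order(values):
    """Drop duplicates while keeping first-seen order (values must be hashable)."""
    return list(dict.fromkeys(values))


def build_or_clause(pairs):
    """Build a parameterized OR clause with one condition per (a, b) pair."""
    clause = " OR ".join("(col_a = %s AND col_b = %s)" for _ in pairs)
    params = [value for pair in pairs for value in pair]
    return clause, params


# Stand-in for rows returned by the earlier query, including duplicates.
rows = [("x", 1), ("y", 2), ("x", 1), ("x", 1), ("y", 2)]

clause_before, params_before = build_or_clause(rows)
clause_after, params_after = build_or_clause(unique_in_order(rows))

print(len(params_before), len(params_after))  # 10 4
```

With the ORM, the same idea applies: deduplicate the values first, then build the `Q` objects (or whatever produces the OR) from the deduplicated list.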