I’m a bit puzzled by this one.
I recently tried upgrading psycopg for 2.9.9 to 3.1.18. When running my test suite, some tests still passes, but some of them are just hanging, blocked, never ending.
I added pytest-timeout to try to understand what was up, and I can see the runner is blocked in:
Execute a query or command to the database.
"""
try:
> with self._conn.lock:
E Failed: Timeout >60.0s
../../../.pyenv/versions/3.12.2/envs/braindate-api-3.12/lib/python3.12/site-packages/psycopg/cursor.py:727: Failed
So I’m assuming the connection can’t get a lock on the database now? Some queries work though, so it seems that it’s only after X queries that I have the issue.
I’m still trying to find a good minimal reproduction, but if anyone has clues as to what I can do to debug this, it would be appreciated.
What changed between v2 and v3 that can trigger this? This could help me debug it too.
Are you running everything sync? Or are there some parts of this that are async? Are there any configuration or settings that you’ve changed? Are there any other changes you’ve made other than just changing psycopg?
I do have some async views and tests in the app, and this was my first thought as well. But the failure happens even if no async test/view is executed.
Ok, so I looked a bit more and actually, the issue seems to be when I use a JSONField with a custom encoder.
In the encoder, I do this (among other things):
For what its worth, I can confirm it seems to relate to the custom encoder attempting to access a queryset whilst the lock is in place. I solved it by manually encoding the dict in a pre_save signal as a cheesy workaround.
Thanks for your investigations so far! We’ve also been affected by this issue in our project and were also really puzzled what was happening. Your posts gave us a first idea where to look.
We did some further digging into the psycopg3 code and discovered that serializing the model to save (which in turn triggers the JSONSerializiers we are all using) happens inside a threading lock (not a database lock!). When the serialization triggers another database query (as seen in the code of @Alexerson), it uses the same connection and arrives at the same line of code trying to acquire the threading lock again, which deadlocks with the first acquisition.
We don’t understand yet why this behaviour was choosen, as psycopg2 behaved differently and the change is not listed anywhere in the docs.
I started a discussion in the psycopg3 repo and hope to get some clarification over there. We are also trying to document our findings within our own project.
Would also be happy to hear if you found out anything more apart from the workaround, which works flawlessly (as the serialization is not triggered from within the INSERT operation and therefore avoiding the deadlock).