Since Django introduced support for asynchronous tests, I started using the built-in Django TestCase to run tests for Channels consumers.
Recently I added a couple of “complex” tests involving consumers (i.e. checking what happens with 3 clients connected at the same time, whereas previously this was only tested with 2 clients), and my test suite started “randomly” failing with a RuntimeError:
RuntimeError: <asyncio.locks.Lock object at 0x77eba260dae0 [locked]> is bound to a different event loop
Digging deeper, I discovered that this error is thrown only when I run that specific test case (which has 12 tests involving consumers/channels), that it is more “random” when I run my whole test suite (about 2100 tests), and that it is never thrown if I run the tests one by one. In my environment the error is raised in roughly 1 out of 10 runs of the whole suite, and in about 4 out of 5 runs of the problematic test case alone.
After a couple of hours spent investigating, I found that the lock causing the problem is one of those defined in the ChannelLock class. By putting a couple of debug logs in the acquire and release methods, I found out that sometimes (I am still working out how this happens) a test completes leaving an unreleased lock behind, which is not cleaned up by the channel layer.
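For reference, the instrumentation was roughly the following (a sketch, not the exact change I made; it monkey-patches ChannelLock from the test settings instead of editing the installed package, and assumes channels_redis's ChannelLock has an async acquire(channel) and a sync release(channel)):

```python
import asyncio
import logging

from channels_redis.core import ChannelLock

logger = logging.getLogger("channel_lock_debug")

_orig_acquire = ChannelLock.acquire
_orig_release = ChannelLock.release


def _loop_id():
    # id() of the currently running loop, or None if called outside a loop.
    try:
        return id(asyncio.get_running_loop())
    except RuntimeError:
        return None


async def _debug_acquire(self, channel):
    logger.debug("acquire %s on loop %s", channel, _loop_id())
    return await _orig_acquire(self, channel)


def _debug_release(self, channel):
    logger.debug("release %s on loop %s", channel, _loop_id())
    return _orig_release(self, channel)


ChannelLock.acquire = _debug_acquire
ChannelLock.release = _debug_release
```

Logging the id of the running loop next to each acquire/release makes it easy to spot a lock that was taken on one loop and never released before the next test (and its new loop) starts.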
I believe the problem could be caused by TestCase, which wraps each async test in async_to_sync: since the test is launched from a sync context, a new event loop is created for each test, and because of this the channel layer may end up reusing stale state from a previous loop.
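A quick way to see this loop churn (hypothetical test names, nothing Channels-specific) is to print the id of the running loop from two async tests; if my reading of how the test case drives async tests is right, the two ids will differ:

```python
import asyncio

from django.test import TestCase


class EventLoopChurnTests(TestCase):
    """Sanity check: if each async test runs under its own event loop,
    the two printed ids will differ."""

    async def test_loop_identity_a(self):
        print("test a runs on loop", id(asyncio.get_running_loop()))

    async def test_loop_identity_b(self):
        print("test b runs on loop", id(asyncio.get_running_loop()))
```

Any asyncio.Lock that survives from one test to the next (for example inside a channel layer instance that is reused across tests) is then bound to a loop that no longer exists, which is exactly what the error above complains about.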
I already tried calling RedisChannelLayer.flush (which should also clean up the receive locks) in the setUp and tearDown methods, but the result did not change. Currently I am forcing the layers to be rebuilt in setUp (thus resetting the state of the locks).
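Concretely, the workaround looks something like this (a sketch with a hypothetical test class name; channel_layers.backends is an internal attribute of channels' ChannelLayerManager, not a documented API, so it may change between versions):

```python
from channels.layers import channel_layers
from django.test import TestCase


class MultiplexerTests(TestCase):
    """Hypothetical consumer test case using the rebuild-the-layer workaround."""

    def setUp(self):
        super().setUp()
        # Flushing the layer (async_to_sync(get_channel_layer().flush)())
        # did not change anything, so instead drop the already-built layer
        # instances: the next get_channel_layer() call rebuilds them, and
        # the new instances start with fresh (unbound) locks.
        channel_layers.backends = {}
```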
Has anyone else experienced something similar?
Am I missing something that could help find the real root cause of this issue (which to me looks like something time-related)?
I suspect this is a genuine issue, but I would like to hear whether I am the only one using Channels in this way.
Thank you @carltongibson, I will try this when I have more time to fight with this kind of asyncio strangeness!
I had already tried using pytest both with and without the xdist plugin (n=2, n=1), and also running with the Django test runner (with parallel=2 and without it); in all these cases I got the RuntimeError.
To give a bit more context, this is the final error I found in my test logs:
/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages/channels/utils.py:57: RuntimeError
> await task
E RuntimeError: <asyncio.locks.Lock object at 0x7fdc293233d0 [locked]> is bound to a different event loop
consumer_callables = [<bound method Queue.get of <Queue at 0x7fdc2bba0250 maxsize=0 _queue=[{'type': 'websocket.disconnect', 'code': 1000}]...ChannelLayer object at 0x7fdc2b6a5780>>, 'specific.93ae75bd140e4209a54ce7b2a5eb43ff!bae3dd8bbc2a4dd0ae5a17a01aa18c2f')]
dispatch = <bound method AsyncConsumer.dispatch of <project.consumers.Multiplexer object at 0x7fdc2938b940>>
i = 1
result = {'type': 'websocket.connect'}
task = <Task finished name='Task-708' coro=<RedisChannelLayer.receive() done, defined at /opt/hostedtoolcache/Python/3.10.15/...4> exception=RuntimeError('<asyncio.locks.Lock object at 0x7fdc293233d0 [locked]> is bound to a different event loop')>
tasks = [<Task cancelled name='Task-711' coro=<Queue.get() done, defined at /opt/hostedtoolcache/Python/3.10.15/x64/lib/python...> exception=RuntimeError('<asyncio.locks.Lock object at 0x7fdc293233d0 [locked]> is bound to a different event loop')>]
It is a bit hard to debug what is going on with asyncio, because the error messages just say “oh, you are on the wrong loop!” and I did not find any useful or convenient way to list how many event loops exist, when they were created, and by whom.
Also, channels does not provide many logs (I had to edit files in my venv to get more info); maybe I will open a PR against channels-redis or channels to add a bit more information, at least to help with debugging when an error occurs.