Channel Layer and Django TestCase interaction and best practice

sevdog · October 7, 2024, 2:21pm

Since django has introduced support for asyncrhonous tests I started using the django built-in TestCase to run tests for channels consumers.

Recently I added a couple of “complex” tests involving consumers (ie: checking what is going on with 3 clients connected at the same time, previously it was just tested with 2 clients) and my test suite started “randomly” returning me a RuntimeError:

RuntimeError: <asyncio.locks.Lock object at 0x77eba260dae0 [locked]> is bound to a different event loop

Digging deeper I discovered that this error is thrown only when I run that specific test-case (which has 12 tests involving consumers/channels) and is more “randomic” when I run my whole test suite (about 2100 tests) and is never thrown if a run the tests one-by-one. In my environment the error raises approximatelly in 1 out of 10 cases if the whole suite is run while it raises in 4 out of 5 cases if only the problematic test-case is run.

After a couple of our spent investigating I found that the lock which is causing the problem is one of those defined in the ChannelLock class, putting a couple of debug logs in the acquire and release methods I found out that sometimes (I am still working to get what is going one) a test is completed leaving an unreleased lock, which is not cleaned up by the channel layer.

I belive that the problem could be caused by TestCase which wraps each async test into an async_to_sync which creates a new event-loop since it was launched from a sync context, because of this the channel layer may ends-up using dirty values.

I already tried using RedisChannelLayer.flush which should have cleaned up also the receive-locks in setUp and tearDown methods, but the result did not changed. Currently I am forcyng layers to re-build in setUp (thus resetting the status of locks).

Does anyone else have experimented something similar?
Am I missing something which could have help finding the real root-cause of this issue (which to me seems something time-related)?

I suspect this to be a potential issue, however I would like to ear if I am the only one who is using it in this way.

carltongibson · October 7, 2024, 2:45pm

Hey @sevdog

This seems to ring a bell to do with use of the parallel test runner. (Can’t 100% recall the exact time I was hitting this…)

What happens is a particular asyncio object gets created on the main thread (at import time?) then a subprocess runs whatever code it’s used on.

Presumably this doesn’t come up with --parallel=1?

I wonder if using SerializeMixin would be sufficient here.

Otherwise finding a way to defer the creation of the offending object until the test is running would be the way.

HTH

sevdog · October 7, 2024, 3:04pm

Thank you @carltongibson , I will try this when I have other time to fight with this kind of strangeness of asyncio!

I had already tried using also pytest with and without the xdist plugin (n=2, n=1) and also running with django tests (with parallel=2 and without it), in all these cases I got the RuntimeError.

To give a bit more context this is the final error which I found in my test logs:

/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages/channels/utils.py:57: RuntimeError
>                   await task
E                   RuntimeError: <asyncio.locks.Lock object at 0x7fdc293233d0 [locked]> is bound to a different event loop

consumer_callables = [<bound method Queue.get of <Queue at 0x7fdc2bba0250 maxsize=0 _queue=[{'type': 'websocket.disconnect', 'code': 1000}]...ChannelLayer object at 0x7fdc2b6a5780>>, 'specific.93ae75bd140e4209a54ce7b2a5eb43ff!bae3dd8bbc2a4dd0ae5a17a01aa18c2f')]
dispatch   = <bound method AsyncConsumer.dispatch of <project.consumers.Multiplexer object at 0x7fdc2938b940>>
i          = 1
result     = {'type': 'websocket.connect'}
task       = <Task finished name='Task-708' coro=<RedisChannelLayer.receive() done, defined at /opt/hostedtoolcache/Python/3.10.15/...4> exception=RuntimeError('<asyncio.locks.Lock object at 0x7fdc293233d0 [locked]> is bound to a different event loop')>
tasks      = [<Task cancelled name='Task-711' coro=<Queue.get() done, defined at /opt/hostedtoolcache/Python/3.10.15/x64/lib/python...> exception=RuntimeError('<asyncio.locks.Lock object at 0x7fdc293233d0 [locked]> is bound to a different event loop')>]

It is a bit hard to debug what is going on using asyncio, because error messages just say “oh, you are on the wrong loop!” and I did not found any useful or convenient method to list/enum how many event-loops are there? when they were created? and by who?
Also channels does not provide many logs (I had to edit files in my venv to get more info), maybe I will add a PR to channels-redis or channels to add a bit more info at least to help debugging when an error occurs.

carltongibson · October 7, 2024, 3:32pm

Maybe you have fixture that is being created early, again in another thread or process than that the test runs in

My pytest foo isn’t enough to control that, but you could try inlining the fixture to eliminate that as a possibility.

Topic		Replies	Views
Async Django channels with multi tenant schemas Async/Channels	0	172	May 16, 2024
Testing Django Channels: Lack of documentation and examples Using Django	0	12	September 22, 2024
Django-channels Async/Channels	2	142	June 14, 2024
Adding async to Django testing Async	7	6349	November 17, 2019
[SOLVED] OperationalError and SynchronousOnlyOperation with async Using Django	0	2213	December 22, 2020

Channel Layer and Django TestCase interaction and best practice

Related topics