I'm seeing odd behavior in one of my applications. The application has a POST endpoint that receives a list of order ids and, after processing each order, must inform an external API of the monetary value of each order. This is what we did:

    order_details_tasks = [
        asyncio.create_task(
            self.launch_individual_order_details(order),
            name=f"task_order_{order.id}",
        )
        for order in active_orders
    ]
    results = await asyncio.gather(*order_details_tasks, return_exceptions=True)
    for task, result in zip(order_details_tasks, results):
        if isinstance(result, Exception):
            print(f"⚠️ Task '{task.get_name()}' raised an exception: {result}")
        else:
            print(f"✅ Task '{task.get_name()}' succeeded with result: {result}")

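One detail about a result loop of this shape: since Python 3.8, asyncio.CancelledError derives from BaseException rather than Exception, so an isinstance(result, Exception) check does not flag a cancelled task. The sketch below is a minimal, self-contained illustration of that; work, classify and the task names are hypothetical stand-ins, not the application's real code.

```python
import asyncio


async def work(delay: float) -> str:
    # Hypothetical stand-in for launch_individual_order_details.
    await asyncio.sleep(delay)
    return "done"


def classify(result: object) -> str:
    # CancelledError subclasses BaseException (not Exception) since
    # Python 3.8, so it has to be checked for explicitly.
    if isinstance(result, asyncio.CancelledError):
        return "cancelled"
    if isinstance(result, BaseException):
        return "failed"
    return "succeeded"


async def main() -> list:
    tasks = [
        asyncio.create_task(work(0.01), name="task_ok"),
        asyncio.create_task(work(10), name="task_cancelled"),
    ]
    await asyncio.sleep(0)  # let both tasks reach their first await
    tasks[1].cancel()  # simulate a cancellation coming from outside
    results = await asyncio.gather(*tasks, return_exceptions=True)
    outcomes = [classify(result) for result in results]
    for task, outcome in zip(tasks, outcomes):
        print(f"Task '{task.get_name()}' {outcome}")
    return outcomes
```

Running asyncio.run(main()) reports the second task as cancelled, whereas a plain Exception check would fall through to the "succeeded" branch for it.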
The launch_individual_order_details(order) function runs some non-I/O logic and then does the following:

    logger.debug(f"Sending order details with success for order: {order.id}")
    await order_service.send_order_request(order)

Inside send_order_request we create an entry in a table called Transaction with the order id and the corresponding order amount, in a pending state, and send an HTTP request using the aiohttp client. Afterwards we update the transaction status to Success if the request succeeds and to Error if it fails.
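To make that flow concrete, here is a minimal sketch of the three steps. It is not the real implementation: the Transaction dataclass is an in-memory stand-in for the database row, post_order is a hypothetical placeholder for the aiohttp call, and the status constants are assumed names.

```python
import asyncio
from dataclasses import dataclass

PENDING, SUCCESS, ERROR = "pending", "success", "error"


@dataclass
class Transaction:
    # Hypothetical in-memory stand-in for a row in the Transaction table.
    order_id: int
    amount: float
    status: str = PENDING


async def post_order(order_id: int, amount: float) -> bool:
    # Placeholder for the HTTP request; in this sketch odd ids "fail".
    await asyncio.sleep(0)
    return order_id % 2 == 0


async def send_order_request(order_id: int, amount: float) -> Transaction:
    transaction = Transaction(order_id, amount)  # 1. record as pending
    try:
        ok = await post_order(order_id, amount)  # 2. notify the external API
    except Exception:
        ok = False
    transaction.status = SUCCESS if ok else ERROR  # 3. record the outcome
    return transaction
```

In a flow shaped like this, a cancellation that arrives at any await between steps 1 and 3 would skip the final status update, because CancelledError is not caught by except Exception.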
The problem is that when the system is under moderate load, with a pod using about 70% of the CPU limit we give it, some of our tasks simply stop executing and never report any error to the event loop.
After further investigation, I wrapped the coroutine where the request is made in a try/except block and printed the traceback whenever an asyncio.CancelledError is raised. It turns out those tasks are in fact being cancelled while they interact with my MySQL database, as in the following traceback:

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File line 100, in launch_individual_order_details
        await database_sync_to_async(self.update_transaction_status)(
            details_transaction, Transaction.SUCCESS
        )
      File "/opt/venv/lib/python3.13/site-packages/asgiref/sync.py", line 485, in __call__
        ret = await exec_coro
              ^^^^^^^^^^^^^^^
    asyncio.exceptions.CancelledError

{}
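For anyone who wants to reproduce this kind of diagnostic, the following is a minimal sketch of a try/except wrapper that prints the traceback on cancellation and then re-raises. The names update_status, guarded and task_order_demo are hypothetical, and the sleep stands in for the awaited database update.

```python
import asyncio
import traceback


async def update_status() -> None:
    # Hypothetical stand-in for the awaited database update.
    await asyncio.sleep(10)


async def guarded() -> None:
    try:
        await update_status()
    except asyncio.CancelledError:
        traceback.print_exc()  # shows which await was interrupted
        raise  # re-raise so the task still ends up cancelled


async def main() -> bool:
    task = asyncio.create_task(guarded(), name="task_order_demo")
    await asyncio.sleep(0)  # let the task reach its first await
    task.cancel()  # simulate the unexpected cancellation
    await asyncio.gather(task, return_exceptions=True)
    return task.cancelled()
```

Re-raising matters here: swallowing CancelledError would hide the cancellation from whoever requested it and from the gather call.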
If, instead of running these tasks in the event loop, I run them through Celery, everything works as expected and they don't fail. I don't know what is happening, but I suspect that the event loop may be cancelling some of these tasks because of limits on database access. Can someone give me some hints on this? Running 100 tasks per minute in a 30-minute heavy load test, I would say about 35 tasks fail in total.