A datapoint for asgiref performance

I have been working on some perf issues in some code (that, granted, has many problems intrinsic to its design).

I’ve found that async_to_sync- and sync_to_async-wrapped calls tend to cost me 3–5 ms per call. This is obviously highly dependent on the use case, but I’m glad to have a ballpark figure now. (This number doesn’t include the initial wrapping; in general I have been declaring a wrapped function and then calling that.)

Obviously for a single call it doesn’t matter, but in our case we have calls in pretty hot loops, and it is a bit painful.

And of course, something like:

await sync_to_async(getattr)(instance, 'field_name')

to read a single field value is a good way to cause pain.
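To get a feel for where that per-call cost comes from, here is a stdlib-only sketch of the same pattern. It uses asyncio.to_thread as a stand-in for sync_to_async (both dispatch the callable to a worker thread and await the result), so the absolute numbers will differ from asgiref's, which adds its own context-propagation and coordination machinery on top; the Instance class is a placeholder.

```python
import asyncio
import time

class Instance:
    # Placeholder standing in for a model instance with a plain attribute.
    field_name = "value"

async def main() -> float:
    instance = Instance()
    n = 1_000
    start = time.perf_counter()
    for _ in range(n):
        # Each await hops to a worker thread and back, roughly what
        # sync_to_async(getattr)(instance, 'field_name') does per call.
        await asyncio.to_thread(getattr, instance, "field_name")
    elapsed = time.perf_counter() - start
    return elapsed / n  # average seconds per wrapped attribute read

per_call = asyncio.run(main())
print(f"{per_call * 1e6:.1f} µs per thread-hopped getattr")
```

Even this minimal version shows the thread hop costing orders of magnitude more than a direct attribute access, which is why it hurts in hot loops.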

Thanks for the measurement. I agree that is slow and would like to see this optimized, if possible.

Some questions for more investigation, if you want:

Are you using Python 3.12? Could you try 3.13 (nearly at final release)?

Are you using the latest asgiref, and have you checked for performance-related issues in its repo?

Could you try using cProfile to check in-depth what’s happening? Here’s a post with my preferred toolchain that I used for optimizing the system checks.
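For reference, a minimal cProfile invocation looks like the following; hot_path is a placeholder for whatever code is being investigated, and the stats are sorted by cumulative time to surface the wrappers sitting above the real work.

```python
import cProfile
import io
import pstats

def hot_path() -> int:
    # Placeholder for the code under investigation.
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    hot_path()
profiler.disable()

# Render the top 10 entries by cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by "cumulative" is usually the right first view for wrapper overhead, since the asgiref frames will dominate the cumulative column even when their own tottime is small.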


I can no longer edit my post for whatever reason, so I just wanted to add a comment here that I was actually only taking measurements of sync_to_async, and hadn’t really been measuring async_to_sync calls.

I was definitely using the latest asgiref at the time (unfortunately I no longer have access to the test environment that I was working on), and those measurements were in a “production” environment that was running fairly hot (hence me noticing).

A TODO for myself would be to build up a simple test harness to help highlight the costs we were witnessing for sync_to_async.


@rtpg This is on my backlog too (if you’d like to collaborate)

So I did start looking a bit at async_to_sync and posted a microbenchmark in another thread. I could also look at setting up a similar microbenchmark script for sync_to_async (though I’d probably need to try a bit harder to generate useful behavior, since sync_to_async is much more about coordination between threads).
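A skeleton for such a harness might compare a direct call against a thread-dispatched one. This sketch again uses asyncio.to_thread as an approximation of sync_to_async (so it captures the thread hop but not asgiref's extra bookkeeping), and the work function is a deliberately trivial placeholder so that wrapper overhead dominates the measurement.

```python
import asyncio
import time

def work() -> int:
    # Trivial sync function: overhead, not work, is what we want to measure.
    return 42

async def bench(n: int = 1_000) -> dict[str, float]:
    # Baseline: calling the sync function directly from the event loop.
    start = time.perf_counter()
    for _ in range(n):
        work()
    direct = (time.perf_counter() - start) / n

    # Wrapped: dispatch to a worker thread and await, as sync_to_async does.
    start = time.perf_counter()
    for _ in range(n):
        await asyncio.to_thread(work)
    wrapped = (time.perf_counter() - start) / n

    return {"direct": direct, "wrapped": wrapped}

results = asyncio.run(bench())
print({name: f"{seconds * 1e6:.1f} µs" for name, seconds in results.items()})
```

Swapping asyncio.to_thread for asgiref's actual sync_to_async (and varying thread_sensitive) would turn this into a real regression harness for the costs described above.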

I think that’s great. I’d like to have (e.g.) Locust running against example applications (as well as microbenchmarks).

We could likely gather various bits in a repo, and start building out?
(Wanna hit me up on your backchannel of choice?)
