As part of the process outlined in DEP 0001, I would like to start a discussion on a draft DEP I have posted here, which I am calling “Unasyncify Codegen”.
This is a strategy to generate a sync variant of async functions, similar to what is done by psycopg (though targeted at specific functions instead of entire files). I believe this would be a helpful tool for moving forward with DEP 0009.
I have discussed this idea with several members of the community, and have a reference implementation (which I found was needed to validate the idea).
I am looking for a Shepherd for this DEP, any feedback, and (of course) any bikeshedding on naming.
That sounds great. I think it would help to see and discuss the reference implementation. If people feel comfortable with it, they might be more motivated to shepherd it.
I linked a reference implementation near the end of the DEP that handles some small examples. I am comfortable with discussing that implementation.
It doesn’t show using this for queryset methods, though.
The main reason is that I want to decouple the “using code gen for sync/async” discussion from the “how to handle transitioning async queryset methods from being just a call to the sync variant” discussion. I plan on opening a thread in the Async subforum on this second point.
Unfortunately I do not have access to a keyboard for the next 6 days, so it will take me some time to write it out.
Oh, my mistake then. I was looking at the thread in my mail program on the phone which didn’t show the dep link so I assumed it wasn’t published yet. Sorry for the noise.
I have continued working on the reference implementation and set up a PR on my fork showing some of the changes.
I would appreciate eyes on the DEP itself, and even some opinions.
If there’s too much pushback in the community I am not going to investigate this much more. I’m a bit at the ‘lots of tedious test changes to try and get coverage up’ phase, and if people already have other ideas on how to tackle this issue I’d rather put my effort there.
Might be interesting to try and apply this codegen to a third party library willing to offer itself up as a sacrifice and to “prove the point” about this being feasible (my ideas on this: django-ninja perhaps? Or just look up a list of packages that use asgiref, since pervasive usage of sync_to_async and async_to_sync might be indicative of missing something here)
unasync exists as an alternative package that does something similar. Instead of inline code editing, it generates separate files. I like being able to generate stuff inline next to each other because of how easy it is to see stuff be in sync… and also for the ORM at least we want two methods on the same class, not two classes outright. But maybe unasync’s strategy is cleaner…
@steering_council this is the lightest possible ping on this DEP, since I had originally posted it right before the previous steering council disbanded.
More or less (and just my take) this is going to need a serious load of experimentation, and demonstration, before we get anywhere close to “Oh, yes, and let’s merge this into Django”.
So, absolutely, let’s see some third-party libraries doing this — whether it’s established ones, or ones set up specifically to show the proof of concept. (I take the existing psycopg example as a big plus, but even so, nothing much changes there.)
In the meantime, putting together the proposal, and having the discussion of “OK, when might we merge this (if at all)?”, is worthwhile. As I mentioned on the DEP PR, code generation has a long history in Django: that’s how it started, and all that was removed, for good reason. Are we really going back down that route? (I was looking at the PR just this morning… that folks would run a management command was like whoa… Like maybe, but…)
I think this is an interesting idea — !!! — and the code duplication from needing sync+async pathways is not great (to put it mildly) but I’d like to see a lot more actual running examples (and the problems that will arise identified) before we rush to any conclusions. (Measure twice… — let’s experiment )
Thank you for looking over this, Carlton. And thanks for the extra vote on “we should experiment” on this, because I’d also like to figure out ways to validate this idea beyond “well we have coverage”.
RE third party libs, Psycopg3 is the big example of success. For completeness’ sake, urllib3 seems to have tried to do something similar but the effort was abandoned, so would be an example of failure on this front.
But then making each test to pass in async was going to be yet another slog, and after that I would have to tackle requests. It was becoming lonely and urllib3 was continuing to move fast. So the hip has stalled, and I’m now focusing my limited time into urllib3 itself
I am a bit curious about some of your comments regarding codegen. In my vision here, the codegen commands would not need to be run by users of Django. They would be run by Django developers, and only when touching APIs that have sync and async variants. And even then, the finalized flow would be to treat the async variant as “canonical” and have codegen give us the sync version.
And like I tried to emphasize in the DEP, the codegen results would be checked in, in the packaged code… so at worst this is automation for helping keep two implementation synced up, and not codegen at runtime or setup time.
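To make the "async is canonical, sync is generated" flow concrete, here is a minimal sketch of such a transformation using only the standard library `ast` module. It is not the reference implementation or django-unasyncify's API: the `Unasyncify` class, the `generate_sync` helper, the `aread_session` method, and the naive "strip a leading `a`" naming rule are all assumptions made for illustration.

```python
import ast
import textwrap

# Hypothetical async "canonical" source; aread_session is an invented name.
ASYNC_SOURCE = textwrap.dedent(
    """
    async def aload(self):
        data = await self.aread_session()
        return data
    """
)


def sync_name(name):
    # Assumed convention: async variants carry an "a" prefix (aload -> load).
    return name[1:] if name.startswith("a") and len(name) > 1 else name


class Unasyncify(ast.NodeTransformer):
    """Illustrative transformer: async def -> def, and awaits are unwrapped."""

    def visit_AsyncFunctionDef(self, node):
        self.generic_visit(node)
        # FunctionDef and AsyncFunctionDef share the same fields, so copy them.
        fields = {field: getattr(node, field) for field in node._fields}
        fields["name"] = sync_name(node.name)
        return ast.copy_location(ast.FunctionDef(**fields), node)

    def visit_Await(self, node):
        self.generic_visit(node)
        inner = node.value
        # Only rename the method that was directly awaited, nothing else.
        if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Attribute):
            inner.func.attr = sync_name(inner.func.attr)
        return inner


def generate_sync(source):
    tree = Unasyncify().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(generate_sync(ASYNC_SOURCE))
```

In a real workflow the generated text would be written back next to the async definition and committed, so a check in CI could regenerate it and fail if the checked-in sync variant has drifted. A real tool also has to handle `async with`, `async for`, and names that do not follow the prefix rule, which this sketch ignores.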
I do not know about Django’s old codegen, but I imagine that involved codegen that had to be run by users, right? I still believe codegen is a real cost, and if there’s a path that involves no codegen, that makes me very happy. I just want to make sure there’s no misunderstanding about who would invoke codegen and at what step of the process.
Maybe I’m misunderstanding Django’s old codegen story! Cursory Googling now brings up this thread. If there’s a Django version number in mind on this, I’d love to know.
One other point here I’d like to lay out… the one thing where codegen beats everything in my mind is performance. I want there to be as close to no drawbacks as possible for existing systems in any transition, and codegen giving sync APIs completely sync code flows is huge for that.
I don’t think performance is the be-all-and-end-all, but it feels like there’s very little margin for performance penalties for sync APIs. Maybe there’s a way to have async_to_sync be performant enough, but what cost would we even be comfortable with on that front? I would be comfortable with “some” cost, but I feel like the existing async_to_sync cost is more than that “some”.
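One way to get a feel for the kind of per-call cost being discussed is a small timing harness. The sketch below is a stand-in only: it drives a trivial coroutine through a plain `asyncio` event loop with `run_until_complete`, which is not asgiref's `async_to_sync` and will not reproduce its numbers. The function names (`get_value`, `aget_value`, `get_value_via_loop`, `measure`) are invented for the example.

```python
import asyncio
import timeit


def get_value():
    # Pure sync code path, as a generated sync variant would have.
    return 42


async def aget_value():
    # The async "canonical" implementation.
    return 42


def get_value_via_loop(loop):
    # Stand-in for a sync wrapper around async code (NOT asgiref's async_to_sync).
    return loop.run_until_complete(aget_value())


def measure(number=2000):
    """Return (direct, bridged) average seconds per call."""
    loop = asyncio.new_event_loop()
    try:
        direct = timeit.timeit(get_value, number=number)
        bridged = timeit.timeit(lambda: get_value_via_loop(loop), number=number)
    finally:
        loop.close()
    return direct / number, bridged / number


if __name__ == "__main__":
    direct, bridged = measure()
    print(f"direct sync call: {direct * 1e6:.2f} us per call")
    print(f"via event loop:   {bridged * 1e6:.2f} us per call")
```

Swapping `get_value_via_loop` for a real `async_to_sync`-wrapped function would give numbers relevant to the question above; the absolute values depend heavily on the machine and Python version.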
@rtpg It was all dawn of time stuff… pre magic removal.
Just thinking: I wonder if we can target early some smaller parts where we already have significant duplication, like in the middleware, sessions and auth code… This would be relatively easy to test and verify already, I’d guess.
Yeah, I am up for pointing this at a smaller subsystem. The DB backend for sessions in particular is already doing the duplication, so we could just point it at the existing system with very few code changes (probably).
The trickier stuff is when the existing async implementations call into sync implementations.
# in django/contrib/sessions/backends/file.py
async def aload(self):
    return self.load()
The existing code calls into the sync implementation, so existing subclasses of the file-based SessionStore that override just load also get a version of aload that behaves properly. If aload is ported over to use codegen, then that breaks, unless something else is done to compensate.
I had opened up another topic regarding this issue in the past… there’s definitely some trickery we can do to avoid issues on this front, but maybe this is a case where a backwards compatibility break would be the most reasonable option.
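The subclass breakage described above can be reproduced with a toy example. The classes below are hypothetical stand-ins, not Django's `SessionStore`: `DelegatingStore` mirrors the current "aload calls load" style, while `GeneratedStore` mimics what two independent, generated bodies would look like.

```python
import asyncio


class DelegatingStore:
    """Current style: the async method delegates to the sync one."""

    def load(self):
        return {"source": "base"}

    async def aload(self):
        return self.load()


class GeneratedStore:
    """Hypothetical post-codegen style: two independent implementations."""

    def load(self):
        return {"source": "base"}

    async def aload(self):
        return {"source": "base"}


class CustomDelegating(DelegatingStore):
    # A third-party subclass that only overrides the sync method.
    def load(self):
        return {"source": "custom"}


class CustomGenerated(GeneratedStore):
    # The same override against the codegen-style base class.
    def load(self):
        return {"source": "custom"}


if __name__ == "__main__":
    # The override is picked up, because aload calls self.load().
    print(asyncio.run(CustomDelegating().aload()))  # {'source': 'custom'}
    # The override is silently ignored by the independent aload body.
    print(asyncio.run(CustomGenerated().aload()))  # {'source': 'base'}
```

The second result is the backwards compatibility concern: nothing errors, the subclass's customisation just stops applying on the async path.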
Devil is in the details but the async pathways are still new and lightly used. I’d be surprised if we can do it all without any breaking changes… — Try to avoid them, and doc them if we can’t, or similar.
I feel like if we could land @fcurella’s async cursor work, a lot would simplify here. (The existing a-prefixed methods call the sync equivalent at bottom because there’s no option to do otherwise.) I’ve been pondering putting that into a third-party package, so we can at least play with it properly.
For codegen, I have split out the codegen experiment into its own package (as django-unasyncify). I am going to try and see if I can get any bites in third party apps that might want to try it, but this might be a bit of a failed experiment.
I have been looking at async_to_sync (see here), and generally speaking I am seeing costs that are under a millisecond (often much lower!). If the cost/benefit matrix is acceptable, there might be a “straightforward” answer here of leaning on async_to_sync to offer one canonical async implementation and a sync fallback.
But I believe that for sync database connections we would need a sync_to_async operation on the other end to balance things out, and that would be costly. That requirement is not immovable: it might be worth looking at replacing async_unsafe usage inside of DB connection APIs with a context-level lock or something. Tricky though, and I don’t like paying a potential correctness cost for this.