Maybe it's just a blind spot when it comes to async django

Hi, django fellows!

@andrewgodwin says in DEP-9 that there is no easy way to make django async. I remember he gave a talk on this and asked the audience: “Why can’t we just add async and await everywhere?” The answer, of course, was that we care about compatibility.

I doubt that. I think the most straightforward way actually works. And it allows for having both blocking and async versions of django with minimal adjustments.

So, let’s suppose we add async and await everywhere. How will it work? The async case is more or less clear. The django API is not 100% convertible to async, but it’s doable. Even now django has some kind of async API (async for, aget, etc.).

Let’s take the blocking use case (remember that our codebase is all async/await now). We just make the top-level functions like Model.save regular functions (an API facade). The functions they call, like Model.save_base, will all be async.

How can we turn an async function into a regular one (given that we are using blocking I/O)? Very easily:

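class Get:
    # A minimal awaitable. Because of the unreachable `yield`, __await__ is a
    # generator function; it returns 1 without ever yielding.
    def __await__(self):
        return 1
        yield


async def f():
    return (await Get())


def run(co):
    # Drive the coroutine one step without an event loop. If nothing in the
    # await chain yields (blocking I/O only), that single step finishes it.
    try:
        co.send(None)
    except StopIteration as ex:
        return ex.value  # the coroutine's return value
    co.close()
    raise RuntimeError("the coroutine yielded: it needs an event loop")

assert run(f()) == 1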

Here we call the async function f without an event loop. We rely on the fact that, with blocking I/O, the generators underneath never yield anything.

What do you think? Is this the blind spot of the django team or myself?

@carltongibson

I believe that what Andrew identified is that the performance hit was unacceptable when trying to take a more direct approach. (I believe he documented somewhere, perhaps in a slide in a talk, just how bad it was.) So my recollection is that it would “work”, but be unacceptably slow. I do remember seeing a number of postings along the way where he mentions performance as a real concern, and the work involved in reducing the negative effects of going async.

@KenWhitesell That is not true. Keep in mind that we pay exactly the same price in async code. I did benchmarks myself: the overhead is not noticeable for queries like MyModel.objects.get(pk=pk) (I mean when you make a lot of those, of course).

Another argument may be that the blocking approach is not the most performant anyway, and that we usually turn to the async one for performance.

So, as I said: the performance hit is minimal, and is paid with the async approach anyway (the one we usually use for performance).
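The overhead being argued about can be measured directly. Below is a rough micro-benchmark sketch, assuming CPython and the standard `timeit` module: it compares a plain function call chain with the same chain written as coroutines and driven without an event loop. No numbers are claimed here; they depend on the machine and Python version, and the I/O of a real query would dwarf either figure.

```python
import timeit


def plain_leaf():
    return 1


def plain_top():
    return plain_leaf() + 1


async def async_leaf():
    return 1


async def async_top():
    # Awaiting a coroutine that never suspends: no event loop involved.
    return (await async_leaf()) + 1


def drive(co):
    """Run a never-suspending coroutine to completion and return its result."""
    try:
        co.send(None)
    except StopIteration as ex:
        return ex.value
    co.close()
    raise RuntimeError("coroutine suspended; an event loop is required")


def sync_facade():
    return drive(async_top())


if __name__ == "__main__":
    n = 200_000
    t_plain = timeit.timeit(plain_top, number=n)
    t_async = timeit.timeit(sync_facade, number=n)
    print(f"plain:  {t_plain / n * 1e9:.0f} ns per call")
    print(f"facade: {t_async / n * 1e9:.0f} ns per call")
```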

What is not true? That my recollection of what was said 4 - 6 years ago is faulty? I’ll concede that readily.

Let’s be clear, I am not making these claims, I am merely passing along what little knowledge I have in this area. I can’t address the accuracy of what I remember hearing or reading. I have not read the code involved, nor have run any benchmarks.

Personally, I have no interest in core Django going async. I have, at best, about 5% of one project that could theoretically see some improvement by going async. Beyond that one edge case for me, I find no use for it.

I won’t deny that others have different needs - great. But I would consider any change that reduces the performance of sync Django in favor of async Django to be a real negative.

Sorry, @KenWhitesell, I’m just attacking the point of view expressed; any personal remarks are only for the sake of a better debate :slight_smile:

Why would you think so? As I said, the blocking approach does not put performance first. Python itself obviously does not put performance first. I’m just curious, since, as I said, performance is not an issue here.

On the other hand, django is losing popularity very rapidly on account of its poor async support. The impact on the framework is serious. sqlalchemy, for example, does have “native” async support, albeit implemented via greenlets.

Let’s start with the base case.

I have a view:

def list_objects(request):
    object_list = SomeModel.objects.all()
    return render(request, "my_template.html", {'objects': object_list})

Please explain to me precisely how the addition of any async-related code is going to reduce the period of time between when the function is called and when the response is returned.

Hint: It doesn’t, and can’t. Asynchronous operations do not reduce the latency of any individual view, except where that view needs to perform multiple independent operations that can be performed concurrently. Their benefits generally come from a potential improvement in overall throughput when a process needs to scale to handle multiple concurrent requests.
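The point above can be illustrated with a small asyncio sketch, using `asyncio.sleep` as an assumed stand-in for real I/O. A view that awaits its I/O operations one after another gets no faster by being async; the gain appears only when one view has several independent I/O operations that can overlap, here via `asyncio.gather`.

```python
import asyncio
import time

IO_DELAY = 0.1  # seconds; simulated I/O latency


async def fake_io(result):
    await asyncio.sleep(IO_DELAY)  # stand-in for a network or database round trip
    return result


async def sequential_view():
    a = await fake_io("a")
    b = await fake_io("b")
    return [a, b]


async def concurrent_view():
    # Both operations are in flight at the same time.
    return await asyncio.gather(fake_io("a"), fake_io("b"))


def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (sequential_view, concurrent_view):
        result, elapsed = timed(fn)
        print(f"{fn.__name__}: {result} in {elapsed:.2f}s")
```

The sequential view takes roughly two I/O delays and the concurrent one roughly one; a view with a single I/O operation would take one delay either way.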

Again, I don’t deny that such environments and situations exist. Nor do I contend that those issues aren’t important to some people.

They just don’t matter to me. In the areas where we work, there’s just no benefit to it.

An assertion expressing an opinion unsupported by any evidence that I have seen. (Nor do I consider some abstract definition of “popularity” to be fundamentally important. If I did, I’d still be working with WordPress.)

Also, for some information on how providing async support affects sync Django, see “Async Performance”, “Evaluation of performance of async views - they suck?”, and “Evidence of out-of-the-box advantages of ASGI vs. WSGI”.

Oh yeah, it can. There are a lot of unknowns here. However, what I meant is that async code is more performant in general, as it allows processing more requests at once: 10k connections and all that. A relational database is something of a bottleneck, but not all code works with the database.

But let’s take your example. Suppose we have a WSGI server. A lot of requests are hitting it every second, and we can process only a few of them at once (suppose we have a few dozen threads doing that). So the first 30 requests will be processed quickly, the next 30 with a delay, and the last ones may time out. You see that the latency is not the same? Compare that to the async world: we process all the requests at once. The latency is roughly the same for all requests, and none of them times out.
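The queueing effect described above can be simulated with the standard library. This is a toy model under stated assumptions: each request only sleeps (simulated I/O), the WSGI server is modeled as a `ThreadPoolExecutor` with a fixed number of workers, and the async server as one event loop running every request concurrently. It ignores database limits and CPU cost.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

N_REQUESTS = 12
N_WORKERS = 4      # hypothetical thread count of the WSGI server
IO_DELAY = 0.05    # seconds of simulated I/O per request


def blocking_request(submitted_at):
    time.sleep(IO_DELAY)                       # blocks this worker thread
    return time.perf_counter() - submitted_at  # latency seen by the client


def threaded_latencies():
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        futures = [pool.submit(blocking_request, time.perf_counter())
                   for _ in range(N_REQUESTS)]
        return [f.result() for f in futures]


async def async_request(submitted_at):
    await asyncio.sleep(IO_DELAY)              # yields to the loop instead of blocking
    return time.perf_counter() - submitted_at


async def _run_all():
    now = time.perf_counter()
    return await asyncio.gather(*(async_request(now) for _ in range(N_REQUESTS)))


def async_latencies():
    return asyncio.run(_run_all())


if __name__ == "__main__":
    print("threads:", [round(x, 2) for x in threaded_latencies()])
    print("asyncio:", [round(x, 2) for x in async_latencies()])
```

With 12 requests and 4 workers, the threaded latencies come in three batches of roughly one, two and three I/O delays, while the asyncio latencies all stay close to one delay.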

You will argue that the requests probably need a database, and that the database cannot take all of them at once. That is true. But first, not all code needs a database. Second, with async you have more flexibility there as well: you can set the number of connections to whatever value is optimal for your database, and easily increase it, say, from 100 to 200. With the blocking approach, however, 1 thread = 1 connection, so you are less flexible.
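One way to picture the decoupling described above: in asyncio, the number of concurrent requests and the number of open database connections are independent settings. A minimal sketch, with a hypothetical pool size and `asyncio.Semaphore` standing in for a real async connection pool; changing `MAX_DB_CONNECTIONS` is a one-line change, independent of how many requests are in flight.

```python
import asyncio

MAX_DB_CONNECTIONS = 3   # hypothetical pool size; tune to what the database can take
N_REQUESTS = 20


class FakePool:
    """Stand-in for an async connection pool: caps concurrent queries."""

    def __init__(self, size):
        self._sem = asyncio.Semaphore(size)
        self.in_use = 0
        self.peak = 0

    async def query(self, sql):
        async with self._sem:                 # wait for a free "connection"
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            await asyncio.sleep(0.01)         # simulated query time
            self.in_use -= 1
            return f"result of {sql!r}"


async def handle_request(pool, i):
    # Work that doesn't need the database runs without holding a connection.
    await asyncio.sleep(0)
    return await pool.query(f"SELECT {i}")


async def main():
    pool = FakePool(MAX_DB_CONNECTIONS)
    results = await asyncio.gather(*(handle_request(pool, i) for i in range(N_REQUESTS)))
    return pool.peak, len(results)


if __name__ == "__main__":
    print(asyncio.run(main()))  # (3, 20): 20 requests served, never more than 3 connections
```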

This doesn’t relate to your example, but what if a view makes an HTTP request? In that case, it can block its thread for a long time. You could say that you can use a threadpool for the DB-related tasks and an async thread for everything else, as the latest django does. But this is a significantly more complex system than a single async thread: it’s like WSGI and ASGI combined into one.
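The mixed model described here (a thread pool for blocking DB calls, the event loop for everything else) can be sketched with `asyncio.to_thread` (Python 3.9+). This illustrates the general pattern only, not Django's internals: `blocking_db_query` and `fetch_remote` are hypothetical stand-ins, and `asyncio.sleep` substitutes for a real async HTTP client.

```python
import asyncio
import threading
import time


def blocking_db_query():
    # Hypothetical synchronous DB call; it must not run on the event loop thread.
    time.sleep(0.05)
    return threading.current_thread().name


async def fetch_remote():
    # Stand-in for an async HTTP request; it suspends without blocking any thread.
    await asyncio.sleep(0.05)
    return threading.current_thread().name


async def view():
    # The blocking call goes to a worker thread; the HTTP call stays on the loop.
    db_thread, http_thread = await asyncio.gather(
        asyncio.to_thread(blocking_db_query),
        fetch_remote(),
    )
    return db_thread, http_thread


if __name__ == "__main__":
    db_thread, http_thread = asyncio.run(view())
    print("DB query ran on:", db_thread)
    print("HTTP call ran on:", http_thread)
```

Even in this toy, two execution contexts have to be managed: the loop thread and the executor's worker threads.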

Here is the evidence: “Company Announcement | Pydantic”. See the first (and only) graph.

Please be specific. Given the situation of one request being made to one view, explain how creating an event loop in which to run this view will reduce the time required for its execution.

That is only true when the requests themselves perform operations constrained by the time needed for work external to that process (e.g. I/O).

Every context switch, every dispatch to the event loop, every change in processing context costs time. A process that is blocked waiting for an I/O to complete will resume faster when that I/O finishes than if an event loop first has to receive the notification that the I/O has finished and then dispatch to the waiting process.
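The per-dispatch cost can be measured. A rough sketch, assuming CPython's default asyncio event loop: it compares awaiting a coroutine that completes without suspending against one that goes through the loop via `await asyncio.sleep(0)`, which yields control to the loop and is then rescheduled. Absolute numbers vary by machine and are not claimed here.

```python
import asyncio
import time

N = 20_000


async def no_suspend():
    return None


async def via_loop():
    await asyncio.sleep(0)  # hand control back to the event loop, then resume


async def measure(coro_fn):
    start = time.perf_counter()
    for _ in range(N):
        await coro_fn()
    return (time.perf_counter() - start) / N


async def main():
    direct = await measure(no_suspend)
    dispatched = await measure(via_loop)
    return direct, dispatched


if __name__ == "__main__":
    direct, dispatched = asyncio.run(main())
    print(f"no suspension:    {direct * 1e9:.0f} ns per await")
    print(f"through the loop: {dispatched * 1e9:.0f} ns per await")
```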

In an extreme case, if you have a CPU-intensive view that doesn’t access the database at all, it doesn’t matter whether you run sync or async: you are effectively limited to one request per core.
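The CPU-bound case can be shown deterministically: a coroutine that never awaits anything holds the event loop until it finishes, so "concurrent" CPU-bound views actually run one after another. A small sketch with a hypothetical `cpu_view`; the log records the order in which the views start and finish.

```python
import asyncio


async def cpu_view(name, log, n=200_000):
    log.append(f"start {name}")
    total = sum(i * i for i in range(n))  # pure CPU work, no await inside
    log.append(f"end {name}")
    return total


async def main():
    log = []
    await asyncio.gather(cpu_view("A", log), cpu_view("B", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['start A', 'end A', 'start B', 'end B']: B cannot start until A is done.
```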

Again, there’s a large number of sites where this isn’t true. Not everyone is running Facebook, Twitter, Instagram, YouTube, etc.

Multiple requests / second probably translates to a minimum of 100,000 requests / day. I know more people than not that are hopeful for 100,000 requests / month.

I’ve talked to a fairly sizeable number of people at events like DjangoCon or PyCon regarding the sizes of their deployments. In terms of number of sites based on Django, my (informal and unsupported) impression is that there are a lot more that don’t see activity anywhere near the range of 1 request / second.

Most do need at least some datastore: for the User object that populates request.user, or for session support. But I do acknowledge that it’s not universally true.

Also not universally true, and not necessarily accurate. It’s also possible that they all time out because the net latency caused by the number of requests in total exceeds the capacity of the system to handle them. (e.g. A CPU-bound view as mentioned above.)

That’s one of the edge cases I referenced earlier. I continue to acknowledge the usefulness of async in those situations. I just don’t think it’s as prevalent as you appear to want to make it seem. (An opinion, yes. But in 10 years now of working almost exclusively in Django, I can still count on one hand the number of times I’ve needed to do that.)

Hmmm… It doesn’t look like django is losing anything based on that graph. Thank you for confirming my point. (That they aren’t showing the same rate of growth as a different product really isn’t relevant to me. What I see in that chart is a gradual increase, by month, of Django downloads.)

But anyway, look, I know you have been beating the drum trying to drive async development here for more than a year. I applaud your persistence.

I just have no use for it - at least not to the degree that I’m willing to sacrifice the performance and ease of development of synchronous Django for its adoption.

That’s what we, the proactive developers, do!

I already did. A view cannot be viewed in isolation (please excuse the pun). There are other views and other requests for the same view.

You can always limit the throughput (the number of requests processed simultaneously) to whatever number you want.

It’s not only the graph, I see the quickly rising popularity of FastAPI in the community, compared to django. Partly because of its own merits, partly because of the poor async support in django.

And this is where we’ll need to agree to disagree.

I fail to see how you can know the behavioral characteristics of the apps I work on better than I do - that you can assert a condition that I know not to be true.

Hi @pwtail — reading all your posts, and conversations with Ken, I’m not sure what to make of this.

It’s all very sensationalist: “Shock content” and such. It doesn’t seem to engage with the async work that’s already in place. And I’m afraid I don’t have the bandwidth to try and work out what the connection could be from what you’ve posted. :woman_shrugging:

Your Vinyl projects are all archived, so there’s no real way to experiment and see if what you’re talking about even makes sense.

If you’re serious about pushing this forward, I would suggest developing as a third-party package with clear documentation on how you add it to your Django project, and what the benefits are. Folks can then give it a run, and if it serves a purpose you build up a bit of a user base, and so support for possibly adopting your ideas in Django itself.

I hope that makes sense.

Kind Regards,

Carlton

Hi @carltongibson! I am glad you asked. No, this post is not connected to any of my other ideas.

Vinyl was about rewriting django to separate away the I/O layer. Another attempt was putting greenlets to work. And “Shock content” was about using django as it is. All totally different, and this one is no different (yes, I’m good with words).

The idea is very easy to grasp: I propose adding async and await everywhere in the django codebase. In this case, django’s sync API will be just a facade over the respective async functions (and no event loop is needed for it).

No, this is the only thing that is not possible, as I propose to add async and await to most of the functions in the django codebase.

No it doesn’t. My point is that I think it was a blind spot of the django team, that they didn’t go with the most straightforward approach, and instead went with this idea of using a threadpool.

Current async support in django, as you probably know, is not on par even with sqlalchemy, which, through greenlets, is able to use async database drivers, let alone with the async-native ORMs.

Well, Django 4.2 just added support for psycopg’s async database driver, so I’d say we’re making pretty good progress here. You’re welcome to use other tech if it suits you better.

If you’re serious here, you’re going to need a demonstration: implement it for a subset of Django and show what it can do.

The approach used is the one asyncio dictates — asyncio is the approach to async provided by Python.
If you have a much better async implementation, then a proof-of-concept to Python Ideas would be the place to go. If it were adopted there, we might use it here.

No, it added support for psycopg’s sync driver - the sync part of it.

I have no problem with making a PoC or the entire thing myself. I am not asking you to brainstorm either. The idea I expressed here is just very concrete and very straightforward: why can’t we add async and await everywhere?

There should be an obvious answer to this, and every mid-level developer should know it. When @andrewgodwin was introducing his plan for async support, he started with a question: “Well, why can’t we just add async and await everywhere?” (By the way, I lost the link to it, but I recall the answer was compatibility.)

I am asking the same question. What is the answer to it?

Existing Django code doesn’t work with asyncio (async/await), so you could just make everything async def and break everything out there. Or you can introduce async incrementally, as we’re doing.

It just feels like you’re trolling. I’m going to mute this. If you have serious suggestions for alternative approaches to async in Python, your best bet would be the Python mailing list, rather than here.

Now we’re getting somewhere. Yes, I propose adding async def everywhere. And then just wrap the top-level functions like Model.save to be regular functions.

In the code snippet, I’ve shown how you can run an async function without an event loop, as long as it doesn’t do async I/O (because it is doing blocking I/O, for example). In this case, a coroutine is a generator that never yields.

How is this an alternative approach? I propose to add async and await everywhere and have just a single thread with asyncio running (for the async use case). This is the most vanilla approach. Using a threadpool is an alternative approach. Using greenlets is an alternative approach. Mine is not.

No surprise that you feel that way, as it’s clear from the above that you didn’t get the idea, despite my trying to put it in the plainest terms.

Finally, I think I have the right version of this in mind. Yes, it can (and should) be done as a 3rd-party project, since it totally breaks compatibility. I will call it vinyl.

The project will be a fork of django 4.2. The main idea is to add “async” and “await” everywhere throughout the codebase. However, it will drop any compatibility with django from the start, so the “API facade” that I was proposing above won’t be required.

A taste of the future API:

await queryset
await obj.related_obj
async for obj in queryset2:
    await obj.save()

Every fetch will require an await, so iterating an unfetched queryset will result in an error. The rest will be pretty familiar to django users. An interesting detail: when no fetching is required, the existing django code/syntax will work!
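These semantics (fetching requires `await`, iterating an unfetched queryset is an error, plain syntax works once the data is there) can be modeled in plain Python. This is a hypothetical toy, not vinyl's or Django's code: `LazyQuerySet`, `NotFetchedError` and `_fetch` are made up for illustration, and the "database" is an in-memory list.

```python
import asyncio


class NotFetchedError(Exception):
    pass


class LazyQuerySet:
    """Hypothetical queryset: must be awaited (or async-iterated) before use."""

    def __init__(self, rows):
        self._rows = rows        # stand-in for the database table
        self._cache = None       # None means "not fetched yet"

    async def _fetch(self):
        await asyncio.sleep(0)   # stand-in for an async database round trip
        self._cache = list(self._rows)
        return self

    def __await__(self):
        # `await qs` fetches the rows and evaluates to the queryset itself.
        return self._fetch().__await__()

    def __iter__(self):
        if self._cache is None:
            raise NotFetchedError("await the queryset before iterating it")
        return iter(self._cache)

    async def __aiter__(self):
        # `async for` fetches on demand, then yields the cached rows.
        if self._cache is None:
            await self._fetch()
        for row in self._cache:
            yield row


async def main():
    qs = LazyQuerySet(["a", "b"])
    try:
        list(qs)                 # sync iteration before fetching
        error = None
    except NotFetchedError as exc:
        error = str(exc)
    await qs                     # explicit fetch
    fetched = list(qs)           # plain iteration now works
    streamed = [row async for row in LazyQuerySet(["c"])]  # fetch on demand
    return error, fetched, streamed


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ('await the queryset before iterating it', ['a', 'b'], ['c'])
```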

The vinyl framework will be async-first without doubt. But it won’t be async-only.

It will be possible to use blocking I/O with vinyl, and to deploy as a WSGI app. The views in the app will be async functions anyway. But, in the case of blocking I/O, they will be treated as just generators (and those generators will never yield).
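Serving async views over WSGI follows the same trick as the first snippet in this thread. A minimal sketch, assuming a hypothetical view signature `async def view(environ) -> bytes` and a hypothetical blocking helper in place of a real database driver; the adapter is an ordinary PEP 3333 WSGI callable that drives the view coroutine without an event loop, and fails loudly if the view ever suspends.

```python
def drive(co):
    """Run a coroutine that never suspends (blocking I/O only) to completion."""
    try:
        co.send(None)
    except StopIteration as ex:
        return ex.value
    co.close()
    raise RuntimeError("view suspended: async I/O needs an ASGI server and event loop")


def blocking_lookup(name):
    # Hypothetical blocking I/O, e.g. a query through a sync DB driver.
    return name.upper()


async def hello_view(environ):
    # Written as async, but with blocking I/O it never yields.
    who = blocking_lookup(environ.get("QUERY_STRING") or "world")
    return f"Hello, {who}!".encode()


def as_wsgi(view):
    """Wrap an async view as a PEP 3333 WSGI application."""
    def app(environ, start_response):
        body = drive(view(environ))
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


application = as_wsgi(hello_view)

if __name__ == "__main__":
    print(application({"QUERY_STRING": "django"}, lambda status, headers: None))
    # [b'Hello, DJANGO!']
```

Under an ASGI server, the same `hello_view` could be awaited directly once its I/O is async; only the adapter differs.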

Why do I need to support WSGI? First, I don’t want to provide all the async DB backends at once by myself. Second, because I want to reuse the existing django test suite, I need tested DB backends, and tested async ones don’t exist yet. In other words, I need to get past the chicken-and-egg problem. As a result, my framework, vinyl, will pass the tests, and so be production-ready, from day 1. There is no doubt, however, that asyncio will be used in 95% of cases or so.

To summarize: I am going to build vinyl framework that will be a django fork. The bulk of the changes will be just adding “async” and “await” everywhere. It will support both blocking and async I/O with a single codebase and with a single API.

I am sure django will soon become legacy and all the new features will be developed for vinyl.

What do you think? @andrewgodwin @charettes @adamchainz @carltongibson @felixxm @apollo13

Is the silence really the answer of the django team?

Hi,

Any news on the progress? It’s been a month or so… ?

I am sure django will soon become legacy and all the new features will be developed for vinyl.

Ok.