Feasibility of using subprocess for auto-reloader

Hi @orf — I hope you don’t mind me pinging you here, I wanted to pick your brain.

(@andrewgodwin cc-ing you here for interest and expertise :slight_smile:)

tl;dr — do you think it would be feasible, and do you have any preliminary pointers, for allowing the auto-reloader to use subprocess, rather than (or as well as) threading?

More detail

We had an issue with Channels’ runserver not working with the auto-reloader.

The summary is that Daphne/twisted is initialised with the asyncio event loop from the main thread, but then the auto-reloader runs it in a sub-thread, and trouble ensues. (I reduced this here — hopefully you’ll see how that’s parallel to the auto-reloader situation.)
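For readers who have not hit this before, here is a minimal standard-library sketch of why the thread matters. It is not the reduction linked above and involves neither Daphne nor twisted; the function name is made up for illustration. It only shows that asyncio's "current event loop" is tracked per thread, so a loop set up in the main thread is not what a worker thread sees.

```python
import asyncio
import threading


def loop_seen_from_worker_thread():
    """Ask asyncio for the current loop from a non-main thread (hypothetical helper)."""
    outcome = {}

    def worker():
        try:
            outcome["loop"] = asyncio.get_event_loop()
        except RuntimeError as exc:
            # No loop was ever set for this thread, so asyncio refuses to guess.
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome


# The main thread owns a loop, much as a server would at start-up.
main_loop = asyncio.new_event_loop()
asyncio.set_event_loop(main_loop)

outcome = loop_seen_from_worker_thread()

asyncio.set_event_loop(None)
main_loop.close()
```

Here `outcome` ends up holding an `"error"` entry rather than a `"loop"` entry: the worker thread never sees the loop that the main thread set.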

We patched that in Daphne, giving twisted its own event loop to work with.

That seemed to work but caused a number of unrelated issues. Essentially folks all use get_event_loop() expecting to get A) the event loop from the main thread, and B) the event loop Daphne/twisted is running (which, the expectation is, will be the same.) When that’s not true, things break. (Related issues: daphne#332, daphne#319, daphne#299)
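A rough sketch of that mismatch, assuming only the standard library (the function name is hypothetical and this is not Daphne's actual code): once the server thread creates its own loop, anything that grabbed the main thread's loop is holding a different object.

```python
import asyncio
import threading


def run_server_with_own_loop(seen):
    """Stand-in for a server thread that was given a loop of its own."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    # Code running in this thread now sees the thread's private loop.
    seen["server_loop"] = asyncio.get_event_loop()
    asyncio.set_event_loop(None)
    loop.close()


# User code typically reaches for the loop in the main thread.
main_loop = asyncio.new_event_loop()
asyncio.set_event_loop(main_loop)

seen = {}
thread = threading.Thread(target=run_server_with_own_loop, args=(seen,))
thread.start()
thread.join()

# Expectation A (main thread's loop) and B (the server's loop) no longer agree.
same_loop = seen["server_loop"] is main_loop

asyncio.set_event_loop(None)
main_loop.close()
```

Work scheduled on `main_loop` would never be run by the server's loop, which matches the kind of breakage described in the linked issues.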

I’m going to have to revert the original fix — but that means no auto-reload support, at least for now.

The solution would be if the auto-reloader would spin up Daphne as a subprocess, rather than a thread. That way it gets to run in the main thread, and everyone is happy.
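To make the idea concrete, here is a minimal sketch of a parent process that supervises a child and restarts it on request. The names `supervise`, `RELOAD_EXIT_CODE` and `max_restarts` are illustrative assumptions, not an existing Django or Daphne API, and the file watching itself is left out.

```python
import subprocess
import sys

# Assumed convention for this sketch: the child exits with this code
# when it wants to be restarted.
RELOAD_EXIT_CODE = 3


def supervise(child_args, max_restarts=5):
    """Run the child; restart it whenever it asks for a reload.

    Returns (last_exit_code, number_of_restarts).
    """
    restarts = 0
    while True:
        completed = subprocess.run(child_args)
        if completed.returncode != RELOAD_EXIT_CODE or restarts >= max_restarts:
            return completed.returncode, restarts
        restarts += 1


if __name__ == "__main__":
    # The child here is a trivial script; a real server would run in its own
    # main thread inside that child process.
    print(supervise([sys.executable, "-c", "print('serving')"]))
```

The point of the shape is that whatever runs in the child gets that process's main thread to itself.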

Can I ask for initial thoughts on the plausibility of such?

Thanks! :pray:

Hi @orf — I hope you don’t mind me pinging you here, I wanted to pick your brain.

Hey! I don’t mind at all. With work and current world events I’ve been running on empty recently, so I’ve not really had the energy to passively keep up with everything Django. But I’m always happy to be pinged when needed!

Right. So this is a tricky issue. An initial version of the autoreloader did indeed do something like this, but I immediately ran into all kinds of issues. The tl;dr is that to get a reliable set of modules we reload on we have to run the iter_modules_and_files method inside the process that’s actually doing the “work”. Imagine a case where you have a view that indirectly or directly executes import my_cool_module as part of a view. The parent process has no idea that this module has been loaded in the child process.
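A small sketch of the lazy-import problem, using a throwaway module written to a temporary directory. `watched_files` is a deliberately crude stand-in for a module scan, not Django's iter_modules_and_files, and all names here are made up for illustration.

```python
import sys
import tempfile
from pathlib import Path


def watched_files():
    """Crude stand-in for a reloader's scan: files behind the loaded modules."""
    return {
        module.__file__
        for module in list(sys.modules.values())
        if getattr(module, "__file__", None)
    }


def files_added_by_view():
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "my_cool_module.py").write_text("VALUE = 42\n")
        sys.path.insert(0, tmp)
        try:
            before = watched_files()

            def view():
                # The import only happens when the view is first called.
                import my_cool_module
                return my_cool_module.VALUE

            view()
            after = watched_files()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("my_cool_module", None)
    return {Path(name).name for name in after - before}


new_files = files_added_by_view()
```

Only a scan taken inside the process that ran the view, and after the view ran, includes `my_cool_module.py`. A parent process inspecting its own `sys.modules` would never list it.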

So I originally used multiprocessing with a message queue and had the child process send the output from iter_modules_and_files to the parent process, which would then do the actual file watching and simply terminate the child process when something changes. This got quite complex quite quickly and I abandoned that approach in favour of the far simpler “loop, check and exit” approach we had previously.
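As a rough illustration of that shape, here is a sketch in which the child reports its loaded files to the parent. Note the swap: it uses `subprocess` with a JSON line on stdout instead of multiprocessing and a message queue, purely to keep the example short and portable, and the names are hypothetical. A real implementation would report repeatedly and the parent would watch the files.

```python
import json
import subprocess
import sys

# Code run in the child: import something, report loaded files, then "serve".
CHILD_CODE = """
import json, sys, time
import colorsys  # stands in for a module that only the worker imports
files = sorted(
    m.__file__ for m in list(sys.modules.values())
    if getattr(m, "__file__", None)
)
print(json.dumps(files), flush=True)
time.sleep(60)  # pretend to be a long-running server
"""


def collect_child_files():
    """Start the child, read its one-line report, then terminate it."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        reported = json.loads(child.stdout.readline())
        # The parent decides when the child dies, e.g. on a file change.
        child.terminate()
    return reported


if __name__ == "__main__":
    print(len(collect_child_files()), "files reported by the child")
```

Even this toy version needs a protocol, lifecycle handling and error cases, which hints at why the full approach grew complex.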

I think a better approach would be to see if we can just invert the usage of threads here. The reason we run the main Django entrypoint in a thread is that (surprisingly to me) sys.exit() doesn’t actually work from a thread. It just terminates the current thread. As the autoreload loop’s main job is to terminate the current process when a change is detected, this is quite critical.
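The sys.exit() behaviour is easy to check with the standard library alone; a minimal sketch:

```python
import sys
import threading


def exit_from_thread():
    # Raises SystemExit, but only inside this thread.
    sys.exit(1)


thread = threading.Thread(target=exit_from_thread)
thread.start()
thread.join()

# Execution reaches this line: the thread ended, the process did not.
process_survived = True
```

The SystemExit raised in the thread is swallowed when the thread finishes, so a reloader loop living in a non-main thread cannot stop the process this way.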

If I remember correctly I didn’t put too much effort into looking at if we can do it another way. A quick google shows that there might be one easy way, but it has some caveats.

Off the top of my head, assuming _thread.interrupt_main() isn’t suitable, we could either:

  1. Install a signal handler to catch a signal that’s supported in Windows, Linux and MacOS
  2. A more complicated refactor that involves a threading.Event that is polled as part of the “main Django loop”, whatever that is.
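A minimal sketch of the second option, with the threads inverted: the watcher runs in a background thread and only sets a threading.Event, while the main thread does the work, polls the event and exits. All function names and the exit code 3 are assumptions made for illustration, and the "work" is a placeholder.

```python
import sys
import threading
import time


def watch_for_changes(reload_requested, detect_change):
    """Background thread: poll for a change, then flag it."""
    while not detect_change():
        time.sleep(0.01)
    reload_requested.set()


def serve_until_reload(reload_requested):
    """Main thread: do work in small units, checking the flag between them."""
    while not reload_requested.wait(timeout=0.01):
        pass  # one unit of real work would go here
    # sys.exit() works here because this is the main thread.
    sys.exit(3)


def run_demo():
    reload_requested = threading.Event()
    calls = {"count": 0}

    def fake_change():
        # Pretend a file changed on the third poll.
        calls["count"] += 1
        return calls["count"] >= 3

    watcher = threading.Thread(
        target=watch_for_changes, args=(reload_requested, fake_change), daemon=True
    )
    watcher.start()
    try:
        serve_until_reload(reload_requested)
    except SystemExit as exc:
        return exc.code
    finally:
        watcher.join()
```

The catch is the one raised in the list: the main loop has to be something that can be interrupted between units of work, which is not a given for a blocking server.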

Where possible I’m generally averse to using signals, but I think a more complicated (and possibly cleaner) refactor might be a significant amount of effort.

Hey @orf. That’s a good answer — nice links — let me chew on it for a while :grin:

Keep safe, take care!
Thanks. C.

Hupper (from the awesome Pylons people) does utilize the subprocess-IPC style reloading I ran into issues implementing.

I do wonder if we could utilize hupper in Django. It supports both Watchman and Watchdog (i.e. native fs notifications in Python) and the code looks really clean. I haven’t got any firm attachment to the autoreload implementation that I wrote, but there are certain advantages to keeping this “in-house”, especially with something as critical to the development experience as an autoreloader. However there are also big advantages to utilising a stable third party package, especially when dealing with a non-differentiating and somewhat complex feature like an auto-reloader, which simply “has to work” and that’s it.

What are your thoughts here? I know Django has a tenuous relationship with relying on third party packages, but if we can get comfortable this could greatly simplify a few things and give us some benefits to boot. Just a thought :thinking:

What are your thoughts here?

Ha! :smile: — OK, very initial…

That it’s hard, and that if we could out-source it that would be amazing. The move to Watchman wasn’t too smooth, as folks struggle to get it going, so I keep looking around. I saw uvicorn implemented a Watchgod based reload, you’ve mentioned hupper here. There seem to be lots of options, but how good/stable are any of them?

Do you think we could define a Backend API, then ship versions? Stat, Watch*, … — I don’t know if we need many in core, but having an abstraction means we’re not tied if a particular project stops being maintained. (That’s just a thought.) (I guess we already have that with the different reloaders…)
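To give the Backend API idea some shape, here is a hypothetical sketch: an abstract interface plus a stat-based implementation. `ReloadBackend`, `StatBackend` and `changed` are invented names for illustration and do not correspond to Django's existing reloader classes.

```python
import os
from abc import ABC, abstractmethod


class ReloadBackend(ABC):
    """Hypothetical interface: report which watched paths have changed."""

    @abstractmethod
    def changed(self, paths):
        """Return the subset of paths modified since the previous call."""


class StatBackend(ReloadBackend):
    """Polling backend that compares modification times."""

    def __init__(self):
        self._mtimes = {}

    def changed(self, paths):
        modified = []
        for path in paths:
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            previous = self._mtimes.get(path)
            self._mtimes[path] = mtime
            # The first sighting only records a baseline.
            if previous is not None and mtime != previous:
                modified.append(path)
        return modified
```

A notification-based backend would implement the same method on top of its own event source, so the reload loop would not need to know which one is in use.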

There seem to be lots of options, but how good/stable are any of them?

From my experience looking at these I’d definitely say that hupper is one of the most mature and “cleanest” implementations. I’ve not heard of watchgod but it seems like a somewhat simplistic implementation. Another popular one is watchdog, which is also quite mature but went through a long period of inactivity. That’s since been resolved and it is under active development again.

Do you think we could define a Backend API, then ship versions

I wouldn’t be averse to that at all, but the feature matrix could get really big and overlapping. To begin with it would be interesting to see if we can just “add hupper support” to Django, delegate watchman support to it and get non-watchman native FS watching for free. Making it more general might be the path to madness as each implementation has its own way of iterating modules and likely its own quirks around how it handles some corner cases. For example hupper doesn’t handle imports from zip files, and might not handle some of the weird issues that were reported to us with networked filesystems.

Anyway, maybe this is getting a bit off-topic. I think figuring out if we can just invert the way we handle threads would be the quickest win to the immediate problem at hand.

OK, gotcha. Thanks! Let me have a think about all you’ve said. :face_with_monocle: