Is there a way to disable the SynchronousOnlyOperation check when using the ORM in a Jupyter Notebook?

Hi,

I have a data science project that uses Django as an experiment-tracking and data-management (annotation) backend. It exposes an API, but it also has a wrapper on top of the ORM that makes it possible to import the project in a Jupyter Notebook and use the ORM interactively: creating datasets from querysets, annotating new samples in a notebook, and tracking/saving training-run artifacts.

I’m really excited about all of the new async features that are coming, so I tried to upgrade the project to 3.0, but unfortunately it breaks the Jupyter-based workflow that has worked so well for me. It turns out that ipykernel runs everything inside a running asyncio event loop, which causes Django to raise a SynchronousOnlyOperation error on any database call.
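To see why a notebook trips the check, here is a small stdlib-only helper (the name `in_running_event_loop` is hypothetical) that reports whether the current thread already has a running asyncio event loop. Django's guard keys off the same condition; in an ipykernel cell, where the kernel's loop is running, this would be expected to return True.

```python
import asyncio


def in_running_event_loop() -> bool:
    """Return True if the calling thread has a running asyncio event loop."""
    try:
        asyncio.get_running_loop()  # raises RuntimeError when no loop is running
    except RuntimeError:
        return False
    return True


async def notebook_like_cell() -> bool:
    # Code executed inside a coroutine sees the running loop, much like a cell
    # executed by a kernel that drives an event loop.
    return in_running_event_loop()


print(in_running_event_loop())            # prints False (plain script, no loop)
print(asyncio.run(notebook_like_cell()))  # prints True (inside a coroutine)
```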

Is there a flag that I can set to disable this check when using the ORM outside of the typical web server request flow?

I created a Stack Overflow question for this as well, which has a bit more info.

Looking over the code, I could probably just add the flag myself here https://github.com/django/django/blob/master/django/utils/asyncio.py#L21 and use a forked version of Django until the ORM fully supports asyncio.
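For illustration, a guard of this kind with an opt-out flag might look roughly like the sketch below. It is written from scratch, not copied from Django's `asyncio.py`; `async_unsafe_sketch`, `SynchronousOnlyOperationSketch`, `fake_query` and the `ALLOW_ASYNC_UNSAFE` environment variable are all hypothetical names used only for this example.

```python
import asyncio
import functools
import os


class SynchronousOnlyOperationSketch(Exception):
    """Hypothetical stand-in for Django's SynchronousOnlyOperation."""


def async_unsafe_sketch(message):
    """Refuse to run the wrapped function while this thread has a running event loop,
    unless the hypothetical ALLOW_ASYNC_UNSAFE environment variable is set."""
    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            if not os.environ.get("ALLOW_ASYNC_UNSAFE"):
                try:
                    asyncio.get_running_loop()
                except RuntimeError:
                    pass  # no running loop in this thread: safe to proceed
                else:
                    raise SynchronousOnlyOperationSketch(message)
            return func(*args, **kwargs)
        return inner
    return decorator


@async_unsafe_sketch("You cannot call this from an async context.")
def fake_query():
    # Stands in for a blocking ORM call.
    return "rows"


async def notebook_cell():
    # Calling the guarded function from inside a running loop, as a notebook would.
    return fake_query()
```

Checking the variable at call time, rather than at import time, is what lets it be toggled from inside an already-running notebook.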

Hey, if you’re seeing SynchronousOnlyOperation, it means you’ve been calling the ORM in an unsafe fashion the entire time; you need to access the ORM from a synchronous context, otherwise you run the risk of corrupting your database.

The easiest thing to do would be to write your database access in a separate function, and then call that function with sync_to_async, like so:

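```python
from asgiref.sync import sync_to_async

def my_db_function():
    # Do all ORM work here; this body runs synchronously, outside the event loop.
    ...

# Top-level await works in a Jupyter cell; elsewhere, await this inside an async function.
await sync_to_async(my_db_function)()
```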
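The reason this works is that `sync_to_async` runs the wrapped function outside the event loop's thread. The stdlib-only sketch below illustrates that principle with `loop.run_in_executor`; it is an illustration, not a replacement for asgiref's `sync_to_async`, which also manages which thread Django's per-thread database connections live on. `blocking_work` is a hypothetical stand-in for ORM code.

```python
import asyncio
import threading


def blocking_work():
    """Stand-in for ORM code: report which thread ran it and whether that thread has a running loop."""
    try:
        asyncio.get_running_loop()
        has_running_loop = True
    except RuntimeError:
        has_running_loop = False
    return threading.get_ident(), has_running_loop


async def main():
    loop_thread = threading.get_ident()
    # Called directly, the work runs on the event loop's own thread (this is what trips the check).
    direct = blocking_work()
    # Offloaded to the default executor, it runs on a worker thread with no running loop.
    offloaded = await asyncio.get_running_loop().run_in_executor(None, blocking_work)
    return loop_thread, direct, offloaded


loop_thread, direct, offloaded = asyncio.run(main())
print(direct[1], offloaded[1])  # prints: True False
```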

Thanks for the reply.

Unfortunately, in order to support top-level await in Jupyter, ipykernel executes code inside a running event loop, which triggers the async_unsafe check. I tried using sync_to_async, but with the way I use the ORM in a notebook I’d have to either wrap every line in sync_to_async or monkey patch my whole package.

Since I’m running a single cell at a time and making sequential synchronous requests, I’m not too worried about blocking the event loop or running into any of the other issues that come with running multiple concurrent I/O operations.
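Given that the workflow is sequential, one workaround that needs neither a fork nor top-level await is to hand ORM calls to a single dedicated worker thread and block on the result. This is a sketch under assumptions: `run_in_db_thread`, `_db_executor` and `loop_is_running_here` are hypothetical names, and using exactly one worker is a design choice so that every query runs on the same thread (Django keeps database connections per thread). Blocking the kernel's loop while waiting is only acceptable because cells run one at a time.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread, so every ORM call runs on the same thread
# (and would therefore reuse the same per-thread database connection).
_db_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="orm")


def run_in_db_thread(func, *args, **kwargs):
    """Run func(*args, **kwargs) on the dedicated worker thread and wait for the result.

    The worker thread has no running event loop, so a loop-based guard does not fire there.
    This blocks the calling thread (and any loop it is running) until func returns.
    """
    return _db_executor.submit(func, *args, **kwargs).result()


def loop_is_running_here():
    # Stand-in for a guarded ORM call: reports whether this thread has a running loop.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def notebook_cell():
    # Inside the coroutine the loop is running, but the offloaded call does not see it.
    return loop_is_running_here(), run_in_db_thread(loop_is_running_here)


print(asyncio.run(notebook_cell()))  # prints (True, False)
```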

For now I plan on just using a fork of Django that adds an ALLOW_ASYNC_UNSAFE flag that I can enable when running the project in a notebook. (https://github.com/michalwols/django/commit/b6c2d53a05d0ff720a4bc13cccb2a81571ce6078)

EDIT: I have a feeling this will come up a lot once people start upgrading to 3.0, so it might make sense to add the flag to Django with a big fat warning when it’s enabled.

Ugh, so this is a Jupyter design issue then. I agree we need to add a workaround, since I doubt we’ll get too far trying to convince them to change their execution model.

I’ll see if we can get a patch in that allows the flag to be toggled like you suggest. Would an environment variable work in a Jupyter notebook, or is a setting much easier?

Either option should work.

In my case the whole project is an installable Python package, and I abstract away the typical Django configuration by exposing a configure function that uses django.conf.settings.configure(**overrides) and django.setup() to let the user configure the project without the standard settings file.

OK, I think I’ll go with the environment variable then, just because that way normal Django users can’t put it in a settings file and forget about it.


Pull request for this is going through here: https://github.com/django/django/pull/12172
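For reference, the environment variable Django ended up with is `DJANGO_ALLOW_ASYNC_UNSAFE` (available from Django 3.0.1, as far as I know). As I understand it, Django reads it each time a guarded call runs, so the simplest option is `os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"` at the top of the notebook. The sketch below instead scopes it to one block so it is less likely to be left on; `allow_async_unsafe` is a hypothetical helper, not part of Django, and it is only appropriate for sequential, notebook-style use.

```python
import os
from contextlib import contextmanager

FLAG = "DJANGO_ALLOW_ASYNC_UNSAFE"


@contextmanager
def allow_async_unsafe():
    """Temporarily set DJANGO_ALLOW_ASYNC_UNSAFE, restoring the previous state on exit."""
    previous = os.environ.get(FLAG)
    os.environ[FLAG] = "true"
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(FLAG, None)
        else:
            os.environ[FLAG] = previous


# Usage in a notebook cell (MyModel is a hypothetical model):
# with allow_async_unsafe():
#     rows = list(MyModel.objects.all())
```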
