Has anyone had a look at using Marimo with Django?

Hi folks,

I shared a post a while back about using Django and Jupyter together, because in many cases the python methods defined on a model can be really helpful for doing analysis of the data stored in your database. It’s linked below:

However, I found Jupyter could be a bit of a pain to work with sometimes. While it’s very powerful, there’s also a fair amount of UI to learn and navigate, and sessions are saved as ipynb notebooks, a format that by default isn’t very git friendly and that also stores the results of your queries. You don’t always want this.

Enter Marimo

I recently came across the open source project Marimo - if you have ever used Observable with their nifty “reactive notebook” idea, I think you’ll be fairly comfortable with Marimo too, as you can think of it as being somewhat like the Pythonic answer to Observable.

With reactive notebooks, it’s much harder to end up with mysterious amounts of hidden state, which is what makes reproducing notebooks a real pain. There are a number of common issues people have with notebooks, and they’re outlined on the Marimo docs page.

Anyway, here’s how the makers of Marimo describe it:

Highlights.

  • reactive: run a cell, and marimo automatically updates all affected cells and outputs

  • interactive: bind sliders, tables, plots, and more to Python — no callbacks required

  • reproducible: no hidden state, deterministic execution order

  • deployable: executable as a script, deployable as an app

  • developer-friendly: git-friendly .py file format, GitHub Copilot, fast autocomplete, code formatting, and more

marimo was built from the ground up to solve many well-known problems associated with traditional notebooks.

Source: marimo docs.

This looks like it might be a nice tool for doing data exploration work with Django: you get lots of nice widgets for inspecting data, like you might with Jupyter, and the result of a saved exploratory session is just some python that can be run unattended, like any other script.

It feels like it might be an interesting middle ground between messing in a django shell (or shell_plus) session, and writing a full blown django management command.

For example, assuming you’ve added it to your dependencies, you would call the following to start a notebook session:

marimo edit 

That’ll give you a notebook UI, and you can get access to your project by running the following in a cell:

import marimo as mo
import django
django.setup()

From there you can then interact with the ORM from a notebook session, as if you were making queries using the django shell.

The advantage is that you get all the nice widgets and rendering of data provided by Marimo (their page shows loads of it). So you can query like so and, like in any notebook, the last value in a cell will be displayed.

from apps.my_app.models import SomeModel
SomeModel.objects.all()

Working with subfolders

One thing I’m appreciating is that using Marimo, I’m able to keep all kinds of notebooks and analysis files in a separate folder called analysis, which I can keep under version control and push to a separate remote repository from the main project repo.

From the project root, I just run the following, and I can interact with my data using all the methods and models defined in my project codebase.

marimo edit ./analysis/my-analysis-notebook.py

My only extra dependency is the marimo library, which you’d only need to include in a development dependencies file. Considering the functionality it offers, it doesn’t rely on that many external dependencies in its own right - useful to know.

Anyway, with this approach I’m able to have an open source django project and a private analysis repo, where I feel much less worried about the two mixing, and about me accidentally leaking data into the open repo, for example.

Has anyone else used it and had either good or bad experiences with this yet?

I’m really impressed, and I’d find it useful to compare notes with others.


Thank you for highlighting this tool. It looks really useful and I’ll put it on my list to play around with :slight_smile:


Slight update here.

While you can run analysis in an interactive notebook session like so:

marimo edit ./analysis/my-analysis-notebook.py

And even run a simplified web-app of sorts with

marimo run ./analysis/my-analysis-notebook.py

I’ve had some struggles running a python script that uses django unattended, like the following:

python ./analysis/my-analysis-notebook.py

For me, as for many people, this was one of the selling points of the tool - “it’s just python” - so it looks like I’m missing some knowledge here: either about how the dependency graph is built so that the various cells are executed after django.setup() has completed, or about how PYTHONPATH needs to be set up (I had some trouble importing libraries when running python ./analysis/my-analysis-notebook.py, vs marimo edit ./analysis/my-analysis-notebook.py).

That said, being able to check notebook sessions into a separate source control repo is very helpful for investigating bugs or carrying out some ad-hoc analysis.

I’ll link to this post in the Marimo discord, to see if there are any pointers they might have.

Oh neat.

The Marimo team have documented some key decisions about how their system works using MEPs (Marimo Enhancement Proposals).

The MEPs they list are really helpful for understanding what’s going on under the hood - MEP1 was incredibly helpful for helping me understand how the graph of reactive notebook cells is created.
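To make the idea of a cell graph concrete, here is a toy, stdlib-only sketch of ordering cells by the names they define and read. It is an illustration of the general idea only, not marimo's actual implementation; the function names, the cell names and the `ready` variable are all made up for this example.

```python
import ast
from graphlib import TopologicalSorter


def defs_and_refs(source):
    """Return (names a cell defines, names it reads but does not define)."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                refs.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds the name "a"
                defs.add((alias.asname or alias.name).split(".")[0])
    return defs, refs - defs


def execution_order(cells):
    """Order cell names so each cell runs after the cells defining what it reads."""
    analysed = {name: defs_and_refs(src) for name, src in cells.items()}
    definer = {d: name for name, (defs, _) in analysed.items() for d in defs}
    graph = {
        name: {definer[r] for r in refs if r in definer}
        for name, (_, refs) in analysed.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical cells, deliberately listed out of order.
cells = {
    "query": "rows = SomeModel.objects.all()",
    "models": "ready\nfrom apps.my_app.models import SomeModel",
    "setup": "import django\ndjango.setup()\nready = True",
}
print(execution_order(cells))
```

In this toy model the "models" cell only sorts after "setup" because it reads `ready`, a name that "setup" defines. Without a shared name like that, nothing in the graph ties the two cells together.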

Resolving the path struggles

I think I’ve figured out the weird path shenanigans too.

Basically when you run this

marimo edit ./analysis/my-analysis-notebook.py

Marimo knows that the current path is the project root you’re in. This means you can import modules as if you were in a django shell session.

However, if you call this:

python ./analysis/my-analysis-notebook.py

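Then it’s as if the current path was the one inside the analysis directory, even though you’re calling python from the project root. That’s because when Python runs a script, it puts the directory containing the script (here, analysis), not the current working directory, at the front of sys.path.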

So, none of your modules can be imported, as they’re no longer on the path.
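The behaviour above can be checked without Django or Marimo at all. The following is a minimal sketch, using only the standard library, that creates a throwaway project with an analysis folder and runs a script inside it from the project root; the file name show_path.py and the function name are made up for the illustration.

```python
import json
import os
import pathlib
import subprocess
import sys
import tempfile

# Child script: reports its working directory and the first sys.path entry.
CHILD_SOURCE = (
    "import json, os, sys\n"
    "print(json.dumps({'cwd': os.getcwd(), 'first': sys.path[0]}))\n"
)


def inspect_script_path():
    """Run analysis/show_path.py from a temporary project root and report what it sees."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp).resolve()
        analysis = root / "analysis"
        analysis.mkdir()
        script = analysis / "show_path.py"
        script.write_text(CHILD_SOURCE)

        # PYTHONSAFEPATH (Python 3.11+) disables the behaviour being shown, so drop it.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
        completed = subprocess.run(
            [sys.executable, str(script)],
            cwd=root,  # equivalent to calling python from the project root
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        report = json.loads(completed.stdout)
        return {
            "cwd_is_root": pathlib.Path(report["cwd"]).resolve() == root,
            "first_entry_is_analysis": pathlib.Path(report["first"]).resolve() == analysis,
        }


print(inspect_script_path())
```

Both values should come back true: the working directory is still the project root, but the first import location is the analysis folder, which is why project modules are not found.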

I was able to get this working by adding this snippet on startup to the python in the generated notebook.

So in the bit where your generated python looks like this:

import marimo as mo
import django
django.setup()

You make it look like this instead:

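import marimo as mo
import django

import sys
import pathlib

# Put the directory the command was run from (the project root) on the import path
project_root = str(pathlib.Path.cwd())
if project_root not in sys.path:
    sys.path.append(project_root)

django.setup()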

This makes sure that the rest of your modules are on your pythonpath, so any imports of models that are mentioned in the notebook work as you might expect.

This still feels a bit clumsy though - I’m sure there must be a nice way to run the same file without these path shenanigans.
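One possible alternative to relying on the current working directory is to search upwards from the notebook's location for a marker file. This is only a sketch under the assumption that the project root is the directory containing manage.py and that the analysis folder lives somewhere beneath it; `find_project_root` and `add_project_root_to_path` are hypothetical helper names, not part of Django or Marimo.

```python
import pathlib
import sys


def find_project_root(start, marker="manage.py"):
    """Walk up from `start` until a directory containing `marker` is found."""
    start = pathlib.Path(start).resolve()
    if start.is_file():
        start = start.parent
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found in or above {start}")


def add_project_root_to_path(start, marker="manage.py"):
    """Append the discovered project root to sys.path (once) and return it."""
    root = str(find_project_root(start, marker))
    if root not in sys.path:
        sys.path.append(root)
    return root
```

A notebook could then call `add_project_root_to_path(...)` with its own file path (or any path inside the project) before `django.setup()`, so the result no longer depends on where python was launched from.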

Still there’s at least some resolution here, even if it’s not very elegant. Hope this helps anyone else experimenting with the two.


Okay, I figured I should share an update as I realised I solved this a while ago, and have a solution that I’m finding works well enough in my job to allow me and a non-python coding co-worker to collaborate on ORM queries for various BI tasks.

Using this, they’ve been able to update and create their own Django ORM queries in notebooks, that use our own domain model, to answer questions about the data.

As mentioned before, the approach we have is to use a separate private data-analysis repo that we install in the same project directory as our main django app, that contains various notebooks.

Because Marimo notebooks are mainly just python files, we were able to set up a helper function like this that we can call in every notebook to deal with the path shenanigans:

def setup_django_for_marimo(main_project_path=None, set_async_unsafe=True):
    """
    Set up the django environment for working with the Marimo notebook.

    It adds the directory containing the main django project (by default,
    the current working directory) to sys.path and, unless disabled, sets
    the environment variable `DJANGO_ALLOW_ASYNC_UNSAFE` to "True", to
    allow for running database queries in the notebook.
    """
    import django

    import os
    import sys
    import pathlib

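    if not main_project_path:
        # Default to the directory the command was launched from.
        main_project_path = pathlib.Path.cwd()
    # Normalise to str so the membership check against sys.path works.
    main_project_path = str(main_project_path)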

    if set_async_unsafe:
        os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "True"

    if main_project_path not in sys.path:
        sys.path.append(main_project_path)

    # set up django
    return django.setup()

We’re then able to run queries in the rest of the notebook, use the handy Marimo input widgets to make simple forms for interactivity, and use the table widgets for simple export of the results of noodling around in the ORM to common formats for re-use.

Running a workshop at DjangoCon Europe in Dublin on Marimo and Django this

Using Marimo in this way is something I’ve been really happy to integrate into my workflow in my day job, and I think it would be helpful for others too, as Marimo is a lovely piece of software and plays quite nicely with Django.

I’m working on a workshop proposal, and if you’re curious you should be able to see the draft below:

I’m sharing this as I figure it might be useful to others thinking of coming, or who have been following along this thread.

Hope this is of interest to others too :slight_smile:


Quick update. This was accepted!

If you have any specific questions about this workshop, let me know, and I’ll do what I can to work it into the content.
