Django and multi-tenancy issue

Hi!

I recently implemented my own multi-tenancy app in Django. Both django-tenants and django-tenant-schemas use a shared database with multiple schemas, but I wanted a shared database with a shared schema, since it seems to scale better.

So I created a custom middleware that infers the tenant from the domain and stores it in a thread_locals variable. I use Gunicorn with Uvicorn, since I have an ASGI app.

Generally speaking it seems to work fine, though there seems to be an issue on form validation.

If I open two different browsers (one per tenant), I see different data, as expected. I then open an edit form in each browser (to edit one record for the first tenant and another for the second), submit the first form to save the changes, and do the same for the second form. Sometimes it works; other times it doesn’t, complaining that the form is not valid (either the first one or the second one). Stranger still, even after a validation error, if I keep the modal open and press the submit button several times, at some point the submission is processed correctly, without anything on the form having changed.

Looking at the logs it seems that the tenant is properly set on both forms, so I’m going crazy trying to understand what’s going on…

Do you have any suggestions?
Any hint would be appreciated.

P.S. More details in the first comment.

This: “stores it in a thread_locals variable” isn’t going to work.

In a full async environment, everything is running in the same thread. It’s not “one thread per connection”.

This also isn’t going to work with multiple workers, because each worker is running its own threads, thread pools, and event loops.

You need to either set this as a session variable or in a cache element associated with that session (effectively the same thing) or by setting a header in the request.
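
To illustrate the same-thread point, here is a minimal sketch using only the standard library (no Django involved; the `handle` coroutine and the tenant names are made up for illustration). Two coroutines running on one event loop share a single `threading.local`, so whichever one wrote last wins:

```python
import asyncio
import threading

# One thread-local object, as a tenant middleware might keep at module level.
_state = threading.local()


async def handle(tenant, delay):
    """Hypothetical request handler: store the tenant, do some I/O, read it back."""
    _state.tenant = tenant
    await asyncio.sleep(delay)  # yields to the event loop; other handlers run here
    return tenant, _state.tenant


async def main():
    # Both handlers run concurrently on the same thread and the same event loop.
    return await asyncio.gather(
        handle("tenant_a", 0.02),
        handle("tenant_b", 0.01),
    )


results = asyncio.run(main())
for expected, seen in results:
    print(f"handler for {expected} read back {seen}")
# The handler for tenant_a reads back tenant_b, because tenant_b's handler
# overwrote the shared thread-local while tenant_a's handler was waiting.
```

This only models pure async handlers; how a given Django deployment mixes sync and async code is not covered by the sketch.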

Thank you Ken.
Is ContextVar another valid alternative, or does it suffer from the same issues?

Same issues. You either need to set this in session or a request header.

Ok, I’m missing a point.
Let’s suppose I set the tenant in a request header (some users might be anonymous, so I don’t know whether sessions are a feasible idea). Since I’m also using the current tenant in places other than requests (e.g. model Managers, custom template Loaders, custom logging, …), thread_local / ContextVars was a single global place to set/get the tenant. How can I make this info live outside the request?

You really can’t. And I’m not saying that lightly.

As mentioned earlier, when you’re running async, every request is executing in the same thread within the same process.

From the docs for Async views:

The main benefits are the ability to service hundreds of connections without using Python threads.

If you’re going to “do something different” based upon an individual request, the information from that request needs to be available at every point.

(And that’s probably one of the reasons why the current existing multi-tenancy apps are structured the way they are.)
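
One way to read “available at every point” is to pass the tenant explicitly from the request down into whatever needs it. Below is a minimal, framework-free sketch of that idea; `FakeRequest`, `TENANTS_BY_DOMAIN`, `RECORDS`, and the helper functions are hypothetical stand-ins, not Django APIs:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical domain-to-tenant table; a real app would look this up in the database.
TENANTS_BY_DOMAIN = {"a.example.com": "tenant_a", "b.example.com": "tenant_b"}

# Hypothetical shared-schema data: every row carries its tenant.
RECORDS = [
    {"tenant": "tenant_a", "name": "Alpha"},
    {"tenant": "tenant_b", "name": "Beta"},
]


@dataclass
class FakeRequest:
    """Stand-in for a request object; only the fields the sketch needs."""
    host: str
    tenant: Optional[str] = None


def tenant_middleware(request):
    # Infer the tenant from the domain and attach it to the request itself,
    # instead of stashing it in a global or thread-local.
    request.tenant = TENANTS_BY_DOMAIN[request.host.split(":")[0]]
    return request


def records_for(tenant):
    # Lower layers receive the tenant as an argument; they never reach for global state.
    return [row["name"] for row in RECORDS if row["tenant"] == tenant]


def list_view(request):
    return records_for(request.tenant)


print(list_view(tenant_middleware(FakeRequest("a.example.com"))))
print(list_view(tenant_middleware(FakeRequest("b.example.com:8000"))))
```

The cost is that every layer needs a tenant parameter (or an object that carries one), which is the extra shim work mentioned in this thread.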

Just a last note (I don’t know whether it’s important or not).
Actually my app is a plain old sync app (I don’t use async anywhere), BUT it does have a chat module that uses websockets with channels (this is why I’m using Gunicorn with Uvicorn).

If I switched to plain Gunicorn (without Uvicorn) and gave up the chat would I still have the problem?
Thank you.

There are two separate issues here. First, whether you would still have the problem after the switch to gunicorn:

That I don’t know. I’m guessing you wouldn’t, as long as you configure gunicorn to be multiprocess and not multithreaded.

But this whole idea of trying to do it this way seems so fragile and subject to any number of strange errors. It has a very strong “code smell” to me.

I’d be far more inclined to do the additional work to create whatever shims or proxies might be needed to ensure that the data is passed appropriately rather than this type of dynamic monkey-patching. You have no way of ensuring that something else doesn’t step on those values - or that you’re not stepping on something else.

Most importantly, you have no way of knowing what changes might be coming down the line such that what works now is going to cause these types of strange failures in a future release.

Now, regarding the change to gunicorn, it does not necessarily mean you need to give up your chat facility. The traffic to the two different subsystems can be segregated.

In my systems, nginx is the deployed web server. It proxies the regular Django traffic to my uwsgi instance, and the websocket traffic to a Daphne instance.

Hi @KenWhitesell and @pperliti, I will add another solution I found while trying to migrate parts of an old Django project to ASGI.

from asgiref.local import Local

This class claims to be a drop-in replacement for thread locals. I found that django-simple-history already uses it in the following manner:

try:
    from asgiref.local import Local as LocalContext
except ImportError:
    from threading import local as LocalContext

Even django-multitenant has the same approach. Check this commit

Here is the source code of the ASGI local. It is very well documented and easy to read.
asgiref local.py
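
As a rough illustration of why a context-aware local can behave differently from a plain thread local in async code, here is a sketch using the standard library’s `contextvars` rather than asgiref itself (my understanding is that recent asgiref versions build `Local` on top of `contextvars`, but treat that as an assumption and check the source linked above). The `handle` coroutine and tenant names are made up for illustration:

```python
import asyncio
import contextvars

# One context variable; each asyncio Task gets its own copy of the context.
current_tenant = contextvars.ContextVar("current_tenant", default=None)


async def handle(tenant, delay):
    """Hypothetical request handler: store the tenant, do some I/O, read it back."""
    current_tenant.set(tenant)
    await asyncio.sleep(delay)  # other handlers run while this one waits
    return tenant, current_tenant.get()


async def main():
    # gather() wraps each coroutine in a Task, and each Task copies the current context,
    # so a set() inside one handler is not visible to the other.
    return await asyncio.gather(
        handle("tenant_a", 0.02),
        handle("tenant_b", 0.01),
    )


results = asyncio.run(main())
for expected, seen in results:
    print(f"handler for {expected} read back {seen}")
```

This covers only coroutines scheduled as separate tasks on one event loop. Code that hops into thread pools needs the context propagated explicitly, which is the part a helper such as asgiref’s is meant to handle.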


When there is a will there’s (often) a way :)

According to my understanding, this is always the case:
“In Django’s admin, instances are not reused between requests. Each request is completely separate and stateless, which is a fundamental part of how HTTP and Django’s request-response cycle is designed.”

This means that you can do things with the instances without worrying that it will be reused for other tenants.

I needed to get the tenant schema name into a ModelForm init method. I then have a thread-safe singleton from which I can look up all tenant data, using its schema name. This is how I did it. It is kind of a hack, but it uses only documented Django methods.

Can you see any problems with this?


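from django import forms
from django.contrib import admin
from django.forms import BaseModelFormSet


class PlaceNameListForm(forms.ModelForm):
    def __init__(self, *args, **kwargs):
        # Get the schema name from the kwargs. We must remove it before calling super(), otherwise an error will occur.
        # Keep it on the instance so the rest of the form can look up tenant data with it.
        self.tenant_schema_name = kwargs.pop('tenant_schema_name', None)
        super().__init__(*args, **kwargs)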

class PlaceNameListFormSet(BaseModelFormSet):
    """ We used this FormSet to transfer the tenant schema name over to the PlaceNameListForm"""

    def __init__(self, *args, **kwargs):
        self.tenant_schema_name = ''

        # Get the tenant schema name from our class name, see PlaceNameAdmin.get_changelist_formset()
        class_name = self.__class__.__name__
        if "_" in class_name:
            prefix, tenant_schema_name = class_name.split("_", 1)
            self.tenant_schema_name = tenant_schema_name
        super().__init__(*args, **kwargs)

    def _construct_form(self, i, **kwargs):
        # Pass the tenant schema name to the form by using the kwargs
        kwargs['tenant_schema_name'] = self.tenant_schema_name
        return super()._construct_form(i, **kwargs)


class PlaceNameAdmin(admin.ModelAdmin):
    def get_changelist_form(self, request, **kwargs):
        return PlaceNameListForm

    def get_changelist_formset(self, request, **kwargs):
        # This is a hack to get the tenant schema name into PlaceNameListFormSet. From there it is passed to
        # the PlaceNameListForm. The schema name is used to fetch correct settings for this tenant.
        # I tried many other ways but this was the only one that worked. (You can't use kwargs because
        # the factory methods in the Django code raise an exception if you pass in an unknown kwarg)

        # Override the usual formset class with PlaceNameListFormSet and then have the super() create it
        kwargs['formset'] = PlaceNameListFormSet
        FormSet = super().get_changelist_formset(request, **kwargs)

        # Dynamically inherit from PlaceNameListFormSet and create a new class with the schema name in the class name
        tenant_schema_name = request.tenant.schema_name
        class_name = f"PlaceNameListFormSet_{tenant_schema_name}"
        formset_class = type(class_name, (FormSet,), {})
        return formset_class
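
A possible simplification of the dynamic-class step, sketched with a plain stand-in class instead of Django’s formset machinery: `type()` accepts a dict of class attributes, so the schema name could be set as an attribute on the generated class rather than encoded in the class name and parsed back out. `FormSetStandIn` and `formset_for_tenant` are hypothetical names, and whether this fits the admin code above is untested:

```python
class FormSetStandIn:
    """Stand-in for the formset base class; not a Django class."""

    tenant_schema_name = ""  # default when no tenant-specific subclass is used

    def form_kwargs(self):
        # Mirrors the idea of _construct_form above: hand the schema name to each form.
        return {"tenant_schema_name": self.tenant_schema_name}


def formset_for_tenant(base, tenant_schema_name):
    # The third argument to type() becomes the class namespace, so the schema name
    # travels as a class attribute and never has to be recovered from the class name.
    return type(
        f"{base.__name__}_{tenant_schema_name}",
        (base,),
        {"tenant_schema_name": tenant_schema_name},
    )


AcmeFormSet = formset_for_tenant(FormSetStandIn, "acme_corp")
print(AcmeFormSet.__name__, AcmeFormSet().form_kwargs())
```

Because the attribute lives on a per-request subclass, the base class keeps its empty default and nothing is shared between tenants.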

P.S. I asked ChatGPT 4 about this in several different ways, but it was unable to help me.

The paragraph you are quoting refers to the Django admin - which is still completely synchronous.

And, in a synchronous environment, you should be ok.

The OP of this thread specifically mentioned that they are working toward async, in which all async views execute in the same thread.

Side note: “Those relying upon ChatGPT are forever doomed to repeat the mistakes of thousands of programmers before them.”