Multiple databases in Django (is it useable?)

This sounds a lot like the kind of problems we ran into when we needed to support multiple databases (one for each tenant) too. I wrote about it in this thread, but we have since moved on from that implementation, so it’s useful only for reference.
We then came across this excellent ebook on multi-tenant systems in Django. Unfortunately even that was not enough to support our specific use case, but our current (working) implementation was derived from the solutions presented in that book. Particularly making DRF work with it was a bit tricky. I can’t link to a repo showing it in action yet, but I will post some snippets of important bits in the following. “Tenant” and “Project” are used interchangeably. I hope it helps you figure out a working solution. It may not be what Django was intended to do, but the rich ecosystem by far outweighs the need to fiddle with DB routing, I think.

context.py:

import contextlib
from contextvars import ContextVar
from .middleware import PROJECT

# holds a global (but context-specific) state of the currently handled project,
# in particular its database name. this is used for seperate modules working
# on the same data basis for external models, for which multiple instances
# exist, one in each database.
# NOTE: this variable should not be accessed directly unless specifically
# necessary. to get the tenant of the current context, the function get_tenant()
# should be used instead
TENANT = ContextVar("DB-name of current project/tenant", default=None)

def get_tenant():
    """The current tenant, if one is set at all.

    The tenant is set either through a context manager or the middleware class
    ProjectFromRequest.

    Returns:
        str: The database name of the current tenant, assuming one is set at all.
    """
    if TENANT.get() is not None:
        return TENANT.get()

    if PROJECT.get() is not None:
        return PROJECT.get()

    return None

@contextlib.contextmanager
def project_context(project):
    """Manages the context of the current project.

    Essentially sets the given project as the one being used for the context
    so that modules dealing with multiple instances of the same model know
    which tenant/project to work with. For example:

    In module "foo.py":
        from context import project_context

        project = Project.objects.get(pk=1)
        with project_context(project):
            project.foo = bar
            project.save()

    In module "routers.py":
        from context import TENANT

        class Router(...):
            ...
            def save_model():
                db_connection(name=TENANT.get()).save(project)

    The router in "routers.py" doesn't know by itself which database to use, so
    it asks the current tenant from the global contextvar "TENANT", which in turn
    was set (and reset outside of the context) by the context manager in "foo.py".
    """
    TENANT.set(project.short_name)
    yield
    TENANT.set(None)

routers.py:

import sys
from django.apps import apps
from .context import get_tenant
from .settings import PROJECT_PREFIX, TEST_PREFIX, APPS_EXTERNALS

# per-request cache for a dict of external models. use method get_external_models
# to access the dict.
# this dict MUST be lazy-loaded since trying to load it during module loading
# would run into cyclical dependencies between modules
EXTERNAL_MODELS = {}

def get_external_models():
    """Returns a dict with the name of external models as keys.

    Note that the dict is cached in the global EXTERNAL_MODELS, however
    access should happen via this method because the data is lazy-loaded
    to avoid cyclical dependencies.

    Returns:
        dict: The name, downcast, of external models as keys in a dict. The
            values can be ignored
    """
    if EXTERNAL_MODELS == {}:
        for app in APPS_EXTERNALS:
            for model in apps.get_app_config(app).get_models():
                if hasattr(model, "is_external") and model.is_external:
                    EXTERNAL_MODELS[str(model.__name__).lower()] = True

    return EXTERNAL_MODELS

class ProjectDBRouter:
    """The router handling the task of sending requests to the project databases
    to the right place.

    This is necessary for three reasons:
        1.) The project databases have a completely different structure than
            the database used by Django and should only contain project data
        1.) The project databases in production are not to be managed by Django
        2.) The project test databases for local tests have to be constructed
            and populated completely by the testing code.

    Therefore this router has two tasks:
        1.) Sending data for reading and writing to the right databases based
            on the model (is it internal to Django or external) and if it is
            currently performing tests or is in production mode
        2.) Allowing or disallowing migrations based on what model they concern,
            if that model is internal/external and if the migration is part of
            creating/populating a test database
    """
    #pylint: disable=unused-argument,invalid-name

    def check_db(self, model):
        """Internal function to check which to Db to read/write from/to

        Args:
            model (django.models.Model): The model class to check
            **hints (dict): Carried over from the function signature of a router
                but is ignored here
        Returns:
            str: The name of the database to use or None if the router is
                indifferent about it
        Raises:
            LookupError: If no tenant was set. This can happen when a function
                in the call stack does not use the project context manager
        """
        # external models ought to have an is_external attribute set to True
        if hasattr(model, "is_external") and model.is_external:
            db_name = get_tenant()
            if db_name is None or db_name == "":
                raise LookupError("No tenant was set in ProjectDBRouter for external model")

            if "test" in sys.argv:
                return TEST_PREFIX + PROJECT_PREFIX + db_name

            return PROJECT_PREFIX + db_name

        return None

    def db_for_read(self, model, **hints):
        """@see routers.check_db"""
        return self.check_db(model)

    def db_for_write(self, model, **hints):
        """@see routers.check_db"""
        return self.check_db(model)

    def allow_relation(self, obj1, obj2, **hints):
        # return None means that router is indifferent
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        externals = get_external_models()

        # if there are hints, we can possibly use these directly
        if "target_db" in hints:
            return db == hints["target_db"]

        if "is_external" in hints and hints["is_external"]:
            if db == "default":
                return False
            if "is_test_data" in hints and hints["is_test_data"]:
                return True

        # the default database only accepts internal models or migrations
        # that are not model specific
        if db == "default":
            return model_name is None or model_name not in externals

        # the project databases only accept migrations for the external models
        # if it is a test database
        if model_name is not None and model_name in externals:
            return db[0:len(TEST_PREFIX)] == TEST_PREFIX \
                and db[len(TEST_PREFIX):len(TEST_PREFIX)+len(PROJECT_PREFIX)] == PROJECT_PREFIX

        # none of the other checks caught something, so better safe than sorry
        return False

middleware.py:

from contextvars import ContextVar
from projects.models import Project

PROJECT = ContextVar("Project loaded from request", default=None)

class ProjectFromRequest:
    """This middleware exists to save a reference to the current project as
    parsed from the request.

    This is used in the process of determining to which database external
    models belong and read/write to/from.
    Please note that this does require the project's short name to be set as
    the deepest nested subdomain of the host part of the requested URL, e.g:
        http://proj_name.sub.domain.tld/some/path/with/further/segments
    would result in the current project to be set to the instance with short
    name "proj_name". If none such project exists, other subsystems might fail
    because no project is set, but the middleware continues operation. This in
    turn is intended so that views not requiring a project do not need to set
    the subdomain in such a way.
    """
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        try:
            segments = request.get_host().split(".")
            project = Project.objects.get(short_name=segments[0])
            PROJECT.set(project.short_name)
        except:
            pass

        response = self.get_response(request)
        return response

1 Like