Multiple databases in Django (is it useable?)

I searching about multiple database support in Django, and decided that is pretty limited and useless. Is it real situation or I don’t understand basic conceptions of it?

  1. Yes, we have DatabaseRouter object. But we can choose DB only by model._meta.app_label. And app_label is a static subclass and applied to Model Class instead of Model instance. So, if I have Book table in ten databases (for multi-tenancy purposes), I need to create ten Book1Book10 models. It is insane. Here is no way to determine DB dynamically. Is my understanding correct?

  2. DatabaseRouter don’t have access to external objects at all. For example I can’t set current DB info into session data or can’t choose DB based request (url, cookie, request body, etc) information. Yes here is solution in the internet that using import threading and set global attribute into threading.local(). That is much unsecured way because one thread may be reused by multiple users. So we are returning to question 1: Here is no way to determine DB dynamically and apply it to whole view. Only one possible solution is to add instance.using('database') method for each query and that is really painful solution for already running application

  3. multiple databases is not supported in manage.py at all. Yes we can use --database=db_name parameter for migration, but only migration operations has this parameter. If we try to run manage.py custom_command, databaseRouter.db_for_read() is not called at all. I just get error raise ImproperlyConfigured("settings.DATABASES is improperly configured. ". So looks like I need to add multiple databases support from scratch for management commands.

So I have three exact questions:

  1. Is it possible to choose DB according logged in user session information (let’s assume that session DB is default and don’t need to be routed)?
  2. Is it possible to choose DB according request infomation (for example if URL will be https://example.com/db_name/api_url or https://example.com/api_url?db_name=db_name)
  3. Is --database paramter only supported for migration commands? How to call databaseRouter manually before proceeding with default DB in this case?

Ok, I solved 3rd question.

My command class was like

class Command(BaseCommand):

    def add_arguments(self, parser):
        parser.add_argument('arg1', type=str)
        ...
        parser.add_argument('argN', type=str)

    @transaction.atomic
    def handle(self, *args, **options):
        arg1 = options['arg1']
        ...

Here is transaction.atomic trying to use DB set in using parameter (and it is default if using is None), and ignores db_for_read/db_for_write of DatabaseRouter. So I changed it like

class Command(BaseCommand):

    def add_arguments(self, parser):
        parser.add_argument('arg1', type=str)
        ...
        parser.add_argument('argN', type=str)

    @transaction.atomic
    def handle(self, *args, **options):
        arg1 = options['arg1']
        db = options['database']

       with transaction.atomic(using=db)
           ...

Not sure if that is a logically correct behaviour…

Absolutely. See the using clause in Manually selecting a database. And just because the examples only show the use of a literal string as the database selector does not mean that that’s the only way to use it. You can use a variable or the result of an expression or function call.

Same answer as above.

yep! That’s what I’m talking about:

I’m already have tones of views and services inside it and I want to proceed all operations inside whole view with specific database. And looks like that is not possible, because all view’s fields and methods are not accessible from database router.

That’s correct. Why? Because not all database requests come from views. A router cannot route on information that may not exist. That’s why, if you’re going to route based on information in the view, that the selection needs to be done within the view.

There are other options - custom managers being one.

But any way you choose, you’re going to need to change the views.

Seriously, how many views do you have in your project?

I ready to handle it by myself. Something like

def db_for_read(?some_args_here?)
  if request.session.db is not None:
    return request.session.db
  else:
    return use_another_logic_to_determine_db()

Here is many enclosures (services) inside views, so maybe ~300 or more in total… Some views has 10 and more direct queries + queries inside services.

This sounds a lot like the kind of problems we ran into when we needed to support multiple databases (one for each tenant) too. I wrote about it in this thread, but we have since moved on from that implementation, so it’s useful only for reference.
We then came across this excellent ebook on multi-tenant systems in Django. Unfortunately even that was not enough to support our specific use case, but our current (working) implementation was derived from the solutions presented in that book. Particularly making DRF work with it was a bit tricky. I can’t link to a repo showing it in action yet, but I will post some snippets of important bits in the following. “Tenant” and “Project” are used interchangeably. I hope it helps you figure out a working solution. It may not be what Django was intended to do, but the rich ecosystem by far outweighs the need to fiddle with DB routing, I think.

context.py:

import contextlib
from contextvars import ContextVar
from .middleware import PROJECT

# holds a global (but context-specific) state of the currently handled project,
# in particular its database name. this is used for seperate modules working
# on the same data basis for external models, for which multiple instances
# exist, one in each database.
# NOTE: this variable should not be accessed directly unless specifically
# necessary. to get the tenant of the current context, the function get_tenant()
# should be used instead
TENANT = ContextVar("DB-name of current project/tenant", default=None)

def get_tenant():
    """The current tenant, if one is set at all.

    The tenant is set either through a context manager or the middleware class
    ProjectFromRequest.

    Returns:
        str: The database name of the current tenant, assuming one is set at all.
    """
    if TENANT.get() is not None:
        return TENANT.get()

    if PROJECT.get() is not None:
        return PROJECT.get()

    return None

@contextlib.contextmanager
def project_context(project):
    """Manages the context of the current project.

    Essentially sets the given project as the one being used for the context
    so that modules dealing with multiple instances of the same model know
    which tenant/project to work with. For example:

    In module "foo.py":
        from context import project_context

        project = Project.objects.get(pk=1)
        with project_context(project):
            project.foo = bar
            project.save()

    In module "routers.py":
        from context import TENANT

        class Router(...):
            ...
            def save_model():
                db_connection(name=TENANT.get()).save(project)

    The router in "routers.py" doesn't know by itself which database to use, so
    it asks the current tenant from the global contextvar "TENANT", which in turn
    was set (and reset outside of the context) by the context manager in "foo.py".
    """
    TENANT.set(project.short_name)
    yield
    TENANT.set(None)

routers.py:

import sys
from django.apps import apps
from .context import get_tenant
from .settings import PROJECT_PREFIX, TEST_PREFIX, APPS_EXTERNALS

# per-request cache for a dict of external models. use method get_external_models
# to access the dict.
# this dict MUST be lazy-loaded since trying to load it during module loading
# would run into cyclical dependencies between modules
EXTERNAL_MODELS = {}

def get_external_models():
    """Returns a dict with the name of external models as keys.

    Note that the dict is cached in the global EXTERNAL_MODELS, however
    access should happen via this method because the data is lazy-loaded
    to avoid cyclical dependencies.

    Returns:
        dict: The name, downcast, of external models as keys in a dict. The
            values can be ignored
    """
    if EXTERNAL_MODELS == {}:
        for app in APPS_EXTERNALS:
            for model in apps.get_app_config(app).get_models():
                if hasattr(model, "is_external") and model.is_external:
                    EXTERNAL_MODELS[str(model.__name__).lower()] = True

    return EXTERNAL_MODELS

class ProjectDBRouter:
    """The router handling the task of sending requests to the project databases
    to the right place.

    This is necessary for three reasons:
        1.) The project databases have a completely different structure than
            the database used by Django and should only contain project data
        1.) The project databases in production are not to be managed by Django
        2.) The project test databases for local tests have to be constructed
            and populated completely by the testing code.

    Therefore this router has two tasks:
        1.) Sending data for reading and writing to the right databases based
            on the model (is it internal to Django or external) and if it is
            currently performing tests or is in production mode
        2.) Allowing or disallowing migrations based on what model they concern,
            if that model is internal/external and if the migration is part of
            creating/populating a test database
    """
    #pylint: disable=unused-argument,invalid-name

    def check_db(self, model):
        """Internal function to check which to Db to read/write from/to

        Args:
            model (django.models.Model): The model class to check
            **hints (dict): Carried over from the function signature of a router
                but is ignored here
        Returns:
            str: The name of the database to use or None if the router is
                indifferent about it
        Raises:
            LookupError: If no tenant was set. This can happen when a function
                in the call stack does not use the project context manager
        """
        # external models ought to have an is_external attribute set to True
        if hasattr(model, "is_external") and model.is_external:
            db_name = get_tenant()
            if db_name is None or db_name == "":
                raise LookupError("No tenant was set in ProjectDBRouter for external model")

            if "test" in sys.argv:
                return TEST_PREFIX + PROJECT_PREFIX + db_name

            return PROJECT_PREFIX + db_name

        return None

    def db_for_read(self, model, **hints):
        """@see routers.check_db"""
        return self.check_db(model)

    def db_for_write(self, model, **hints):
        """@see routers.check_db"""
        return self.check_db(model)

    def allow_relation(self, obj1, obj2, **hints):
        # return None means that router is indifferent
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        externals = get_external_models()

        # if there are hints, we can possibly use these directly
        if "target_db" in hints:
            return db == hints["target_db"]

        if "is_external" in hints and hints["is_external"]:
            if db == "default":
                return False
            if "is_test_data" in hints and hints["is_test_data"]:
                return True

        # the default database only accepts internal models or migrations
        # that are not model specific
        if db == "default":
            return model_name is None or model_name not in externals

        # the project databases only accept migrations for the external models
        # if it is a test database
        if model_name is not None and model_name in externals:
            return db[0:len(TEST_PREFIX)] == TEST_PREFIX \
                and db[len(TEST_PREFIX):len(TEST_PREFIX)+len(PROJECT_PREFIX)] == PROJECT_PREFIX

        # none of the other checks caught something, so better safe than sorry
        return False

middleware.py:

from contextvars import ContextVar
from projects.models import Project

PROJECT = ContextVar("Project loaded from request", default=None)

class ProjectFromRequest:
    """This middleware exists to save a reference to the current project as
    parsed from the request.

    This is used in the process of determining to which database external
    models belong and read/write to/from.
    Please note that this does require the project's short name to be set as
    the deepest nested subdomain of the host part of the requested URL, e.g:
        http://proj_name.sub.domain.tld/some/path/with/further/segments
    would result in the current project to be set to the instance with short
    name "proj_name". If none such project exists, other subsystems might fail
    because no project is set, but the middleware continues operation. This in
    turn is intended so that views not requiring a project do not need to set
    the subdomain in such a way.
    """
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        try:
            segments = request.get_host().split(".")
            project = Project.objects.get(short_name=segments[0])
            PROJECT.set(project.short_name)
        except:
            pass

        response = self.get_response(request)
        return response

1 Like