Use DATABASE_ROUTERS to pick a database connection for the Transaction API by default

To give some background, I am bumping an older discussion from Google Groups that took place a couple of years ago, as well as a ticket I opened recently on the Django issue tracker; it turned out that the behaviour described there is expected and documented in Django’s Transaction API docs.

The goal of this thread is to start a discussion about changing the default behaviour of the Transaction API. In my opinion, all transaction commands should be resolved via the DATABASE_ROUTERS configuration, falling back to DEFAULT_DB_ALIAS if no router provides an answer. The resolution could either use the value returned by the db_for_write() method of the database routers, or rely on a new db_for_transaction() method introduced as part of this feature.

The reason I find the current behaviour unexpected is that the other Django ORM APIs, such as the QuerySet API, always resolve to the database alias provided via DATABASE_ROUTERS when it is not explicitly specified via the .using() queryset method or the using= argument of one of the API methods.


There’s actually another topic that describes a similar problem to the one I have faced. But there have been no replies since July 2020, so I guess it’s time to bring up this discussion again.


Hi @diachkow

The low traffic is probably due to Django’s behaviour working for nearly everyone. The main use case for database routers is shifting some read queries to a replica, so transactions being on the default/primary database probably makes little difference.

I saw on your ticket:

I am developing an application that switches the database connection on a per-tenant basis. The database configurations are added to settings.DATABASES at runtime in a middleware

That is really pushing Django outside of its design goals. Django doesn’t support runtime mutation of settings except for testing, and this is particularly true of settings.DATABASES - changing it triggers a special warning: django/django/test/signals.py at 888b9042b3598bab6557c62de82505eec9ea62ed · django/django · GitHub .

(Have you considered PostgreSQL schemas for tenant isolation? There are a number of high quality DjangoCon talks on the topic and various packages: https://www.youtube.com/results?search_query=tenant+postgresql+django .)

On the Google Groups discussion, Aymeric wrote:

You can meaningfully route individual ORM queries based on the model being queried and it’s often useful to do so. So Django supports it.

It would be convenient to do the same for transaction blocks, but it isn’t possible in a generally useful way because you don’t know yet which models will be queried. So Django doesn’t support it.

I think that summarizes the main problem I see - the router can’t meaningfully know what the transaction is for. Transactions can be for reads or writes, and what the transaction will do depends on the code within.

I think you’re probably better off using your own atomic() wrapper that calls the router, if that is what you want. Maybe call it write_atomic() and do using=router.db_for_write(...)?


Have you considered PostgreSQL schemas for tenant isolation?

Yes, I have considered using schemas for multi-tenant support. In fact, our application effectively has two-tiered multi-tenancy. To keep it simple: the organizations that are our main customers can create what we will call “projects” in our system. Each project has its own participants, and both the participants’ data and the project data itself must be stored separately from the data of other projects. At first we used the django-tenants library to organize the per-project data. Now business requirements dictate that each organization’s data must also live in a dedicated database instance, so we need to implement this dynamic database connection feature in our Django application.

We managed to implement a custom middleware + database router pair, which dynamically establishes new connections at runtime, adds them to the DATABASES setting, and routes queries to the necessary database alias using a shared context variable (a contextvars.ContextVar instance).
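For readers facing the same problem, a stripped-down sketch of such a middleware + router pair might look like this. All names here are hypothetical, the host-based alias lookup is an assumption, and the dynamic registration into settings.DATABASES is omitted; only the context-variable plumbing is shown:

```python
import contextvars

# Context variable shared between the middleware and the router.
_current_alias = contextvars.ContextVar("tenant_db_alias", default="default")


class TenantAliasMiddleware:
    """Stores the current tenant's database alias for the request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Hypothetical rule: the first host label names the tenant alias.
        token = _current_alias.set(request.get_host().split(".")[0])
        try:
            return self.get_response(request)
        finally:
            # Restore the previous alias once the request is done.
            _current_alias.reset(token)


class ContextVarRouter:
    """Routes reads and writes to whatever alias the middleware set."""

    def db_for_read(self, model, **hints):
        return _current_alias.get()

    def db_for_write(self, model, **hints):
        return _current_alias.get()
```

Because ContextVar is both thread-safe and async-safe, this works under WSGI and ASGI alike, which is one reason we preferred it over thread locals.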

I think you’re probably better off using your own atomic() wrapper that calls the router

This was the first thing we thought of when trying to figure out how to use the desired database alias for our transaction operations. But since third-party libraries, like django-tenants itself, can also call the Transaction API with the default @atomic decorator, this didn’t work.

The workaround we found to fulfill our needs is to patch django.db.transaction.DEFAULT_DB_ALIAS and set it to a string-like object that dynamically resolves to the desired database alias. This way, Django’s Transaction API now has a default fallback to our dynamic database alias.

After your explanation of why database routers can’t be used for the Transaction API - because we don’t know which models the operations will touch - I think it would be nice to have an option to override the default behaviour of the django.db.transaction.get_connection(...) function. If only we could specify, via Django settings, the path to a callable that takes an optional database alias as its only argument and returns the DatabaseWrapper (i.e. connection) object, and this callable were used to resolve the database connection for the Transaction API - that would be awesome. I am thinking of something like

```python
TRANSACTION_CONNECTION_FACTORY = 'django.db.transaction.get_connection'
```

Or we could move the current get_connection implementation into a new function and make get_connection itself rely on the TRANSACTION_CONNECTION_FACTORY setting:

```python
# ... wherever Django settings defaults are stored ...

TRANSACTION_CONNECTION_FACTORY = 'django.db.transaction.default_connection_factory'

# ... in the django.db.transaction module ...

from django.conf import settings
from django.utils.module_loading import import_string


def get_connection(using=None):
    factory = import_string(settings.TRANSACTION_CONNECTION_FACTORY)
    return factory(using)
```

This way, we could make the Transaction API use a dynamically resolved database alias for executing transactions.


Gosh, that sounds complicated. All the best with it.

I think you would instead be best doing this within a custom database backend. You can then swap your configuration on a per-instance basis. That should allow you to avoid changing settings.DATABASES, and will avoid the need for any “transaction connection factory”, since it’s more specific.


Well, maybe it’s actually a good idea to use a custom database backend for our use case. We did not consider it because it felt like a bit of overhead for the task, but given all the issues we faced with patching the settings.DATABASES object and the transaction API underneath, maybe a custom DB backend is really what we were looking for. I’ll report back here if it works out. However, I still think it would be great to have an option to change the default fallback for the DB alias in the Transaction API, ideally as a callable object so it can be resolved dynamically.


Hi Adam

I came across this comment from Florian: https://groups.google.com/g/django-developers/c/clzg6MiixFc/m/azrqPlCVCQAJ. In it he cites the custom database backend implementation you mentioned over here: ara/ara/server/db/backends/distributed_sqlite/base.py at master · ansible-community/ara · GitHub.
I tried to replicate the same, but with slight modifications: instead of using thread-local variables, I used connection.settings_dict['NAME'].

Custom database backend:

```python
# mydbengine
from django.db.backends.postgresql.base import DatabaseWrapper as BaseDatabaseWrapper
from django.db import connection


class DatabaseWrapper(BaseDatabaseWrapper):
    """
    Custom PostgreSQL database backend meant to work with multi-tenancy
    by dynamically loading different databases at runtime.
    """

    def get_new_connection(self, conn_params):
        conn_params["database"] = connection.settings_dict['NAME']
        return super().get_new_connection(conn_params)
```

Middleware:

```python
from django.conf import settings
from django.db import connection
from django.utils.deprecation import MiddlewareMixin


class ChangeDatabaseMiddleware(MiddlewareMixin):
    def process_request(self, request):
        # Change the database connection to the database for the current tenant.
        domain = request.get_host().split('.')[0]
        connection.settings_dict = settings.DATABASES[domain]
```

Router:

```python
from django.db import connection


class TenantRouter:
    """
    A router to route database requests based on the tenant.
    """

    def get_current_tenant(self):
        """
        Fetch the alias of the current tenant's database.
        """
        return connection.settings_dict['NAME']

    def db_for_read(self, model, **hints):
        """
        Read from the tenant's database.
        """
        return self.get_current_tenant()

    def db_for_write(self, model, **hints):
        """
        Write to the tenant's database.
        """
        return self.get_current_tenant()

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """
        Allow migrations on all databases.
        """
        return True

    def allow_relation(self, obj1, obj2, **hints):
        """
        Allow relations between any two objects.
        """
        return True
```

Settings:

```python
DATABASES = {
    'default': {},
    'tenanta': {
        'ENGINE': 'mydbengine',
        'NAME': 'tenanta',
        'USER': 'postgres',
        'PASSWORD': 'postgres',
        'HOST': 'localhost',
        'PORT': '5432',
    },
    'tenantb': {
        'ENGINE': 'mydbengine',
        'NAME': 'tenantb',
        'USER': 'postgres',
        'PASSWORD': 'postgres',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```

This approach works in most cases, but when we use the transaction.atomic() decorator it doesn’t pick the correct database unless the using parameter is provided.

This is because of the following:

```python
# django/db/transaction.py
def get_connection(using=None):
    """
    Get a database connection by name, or the default database connection
    if no name is provided. This is a private API.
    """
    if using is None:
        using = DEFAULT_DB_ALIAS
    return connections[using]

# django/db/utils.py
DEFAULT_DB_ALIAS = 'default'
```

Here we can see that no matter whether we set connection.settings_dict['NAME'] dynamically in a custom middleware or in a custom database backend, the transaction API falls back to DEFAULT_DB_ALIAS ('default') whenever the using parameter is not provided.

The problem is that while transaction.atomic(using=connection.settings_dict['NAME']) works well within our own project (search-and-replace is not ideal, but it works), third-party packages still call transaction.atomic() without the using parameter, so we would have to patch each and every one of them. This could be avoided by simply replacing DEFAULT_DB_ALIAS in django/db/utils.py, but that is bad coding practice.

Is there any way to override the default transaction behaviour using a custom database backend, as you mentioned?
#35349 (Transaction API does not respect the DATABASE_ROUTERS configuration) – Django @diachkow

Cheers,
Febin

@adamchainz @diachkow
I found the issue in my code: I forgot to include my custom DB engine, mydbengine, inside my default DATABASES entry.
For reference ara/ara/server/settings.py at master · ansible-community/ara · GitHub

```python
DATABASES = {
    'default': {
        'ENGINE': 'mydbengine',
    },
    'tenanta': {
        'ENGINE': 'mydbengine',
        'NAME': 'tenanta',
        'USER': 'postgres',
        'PASSWORD': 'postgres',
        'HOST': 'localhost',
        'PORT': '5432',
    },
    'tenantb': {
        'ENGINE': 'mydbengine',
        'NAME': 'tenantb',
        'USER': 'postgres',
        'PASSWORD': 'postgres',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```

Why doesn’t it work? I think more debugging/info is required. atomic() should be routed to the default database, which then picks the correct connection parameters. You shouldn’t need a database router with that custom backend, since it acts like a router itself; perhaps keeping the router is causing double connections to your tenant DB.

As a last resort, you could also use patchy.patch() to inject whatever custom database-picking code you want into transaction.atomic (more precisely than with monkey-patching).

(Btw if you want more in-depth help, I’m available for consulting work.)


Thanks for your reply

It worked. The root cause of this issue is mentioned below:

If the custom database backend is not specified, Django falls back to the default database backend

Thank you for this insight. So if we are using a custom database backend, a database router is redundant - got it. I removed the database router and it is working as expected.

This might be useful for us somewhere else as well. Once again, thank you for your valuable time; I hope this discussion thread will be beneficial to someone else.
Have a great day
Cheers,
Febin
