Allow 'Database Router' to override default `connection`

rockisch · September 5, 2024, 7:31pm

Hey,

Recently I had to use the Database Router feature, but ended up a bit surprised when I noticed that there is no way to override what DB is used when calling connection.cursor()! I get that the current API’s db_for_read and db_for_write doesn’t make sense for cursors, but adding a separate db_for_cursor or db_for_connection would easily fix the issue (in a backwards compatible way!).

I’m thinking about opening a feature request, but wanted to gather some thoughts before doing so. For reference, my use case is that I implemented a context manager for swapping what DB the Database Router uses (similar to this), so I can write code like this:

with context_db("replica"):
    user = User.objects.first()
    with connection.cursor() as cursor:
        cursor.execute(...)

Afaik there are 2 ‘supported’ ways to ensure this works as expected:

Update all usages of connection to connections[db_name]
Create a custom DB engine that does the swap on the egine level

I feel like 1 will make the calling code needlessly complex, since I would need to fetch db_name from something like a ContextVar. 2 would kinda work, and is probably the route I’ll be following for now, but I feel like this kind of change more sense in a ‘Database Router’ than on a ‘Database Engine’ context.

Thoughts?

adamchainz · September 5, 2024, 9:27pm

I think this is outside the design goal of database routers. My understanding is that database routers are for routing the model layer - queries from QuerySets. If you’re using the connection object, it’s your responsibility to do routing.

If you have a lot of pre-existing code that uses connection directly, yes, that will need a batch update. Hopefully, a choice regex find-and-replace would help. In general, it’s rare for Django projects to use connection directly, rather relying only on models and querysets.

rockisch · September 9, 2024, 7:21pm

Well, in that case there’s no need to implement the feature through routers. Maybe implementing it as a context manager, in the same way I was planning to implement it? That is:

with connection.override("replica"):
    ...  # 'connection.cursor()' will use 'replica'

I still find that having a global way to choose what DB will be used by default should be a common enough feature

In general, it’s rare for Django projects to use connection directly

Although I don’t have any data to back it up… I do find that hard to believe. There’s just too much stuff that’s outside the scope of an ORM, but that is still relevant in any decently sized application (DB locks, isolation levels, DB-specific queries that don’t translate well into a ORM model like recursive queries) that requires a cursor to be executed.

rockisch · September 9, 2024, 7:25pm

Also, when looking at my codebase, I just noticed that there are some 3rd party applications that most likely use connection.cursor under the hood, so I don’t think the ‘replace every connection.cursor’ approach would work.

adamchainz · September 9, 2024, 7:33pm

I’ve looked at ~100 projects that were deployed to production. I would guess about <10% used connection directly. Most Django projects don’t need the advanced SQL features in your list. I think you’ll be in the top 1% of Django users if you know how to use those features!

Yes, there is that. I even maintain one package that uses extensive direct connection (django-mysql). But it consistently provides a using parameter, which can at least be used with custom routing.

If we add something to Django core, it would probably be too unsafe to actually swap connection. I think it’s too risky, with regards to connection management, threads, and so on. I think we’d want a different object, like routed_connection, which third-party packages would need to interact with. So maybe not such a win…?

rockisch · September 9, 2024, 8:22pm

Yeah, that wouldn’t be that helpful xD

It would just change the current proxy to fetch a different connection when getting a cursor. As it stands, connection is defined as a ConnectionProxy initialized with alias="default":

class ConnectionProxy:
    """Proxy for accessing a connection object's attributes."""

    def __init__(self, connections, alias):
        self.__dict__['_connections'] = connections
        self.__dict__['_alias'] = alias

    def __getattr__(self, item):
        return getattr(self._connections[self._alias], item)

    ...  # Other wrapper stuff that uses 'self._connections[self._alias]'

The only change that would need to be made is to fetch alias from a ContextVar, and modify that ContextVar when the context manager is called. ContextVars are async-safe, so no issue there.

Worst that can happen is that some piece of code accesses the connection directly with connections["default"], or that stores a reference to the cursor for a long time. Still all that would happen is that it would use the default DB instead of the overwritten one, so that’s not going to break anything.

It also wouldn’t impact users not using the feature at all.

We do! (the codebase in question is for my company, so I’m not the only dev)

We also use a lot of Postgres specific stuff, and have done some DB magic in the past to handle zero-downtime model migrations in deploys with multiple servers.

And damn. I get that some of this stuff is niche, but I really though more people managed their isolation levels properly… probably tons of bugs waiting to be discovered

chris48s · January 28, 2025, 5:09pm

While I agree that using connection.cursor() directly is both discouraged and relatively uncommon/advanced, using multiple DB Connections is also relatively uncommon/advanced. They are both features you turn to when your app has DB requirements beyond the common case. I’d wager that although both groups are quite small, the overlap in the venn diagram:

“has an app with >1 DB connections” and
“uses connection.cursor()”

might be quite a big overlap.

Topic		Replies	Views
Use DATABASE_ROUTERS to pick a database connection for Transaction API by default ORM	9	1970	April 9, 2024
Force queries in a context to execute in the named database Using the ORM	6	27	January 8, 2025
Multiple django databases for different apps Using Django	7	2333	January 10, 2022
Multiple databases in Django (is it useable?) Using Django	6	3243	September 24, 2021
default database port change throws error for raw queries Mystery Errors	2	26	October 15, 2024

Allow 'Database Router' to override default `connection`

Related topics