Allow 'Database Router' to override default `connection`

Hey,

Recently I had to use the Database Router feature, but ended up a bit surprised when I noticed that there is no way to override what DB is used when calling connection.cursor()! I get that the current API’s db_for_read and db_for_write doesn’t make sense for cursors, but adding a separate db_for_cursor or db_for_connection would easily fix the issue (in a backwards compatible way!).

I’m thinking about opening a feature request, but wanted to gather some thoughts before doing so. For reference, my use case is that I implemented a context manager for swapping what DB the Database Router uses (similar to this), so I can write code like this:

with context_db("replica"):
    user = User.objects.first()
    with connection.cursor() as cursor:
        cursor.execute(...)

Afaik there are 2 ‘supported’ ways to ensure this works as expected:

  1. Update all usages of connection to connections[db_name]
  2. Create a custom DB engine that does the swap on the egine level

I feel like 1 will make the calling code needlessly complex, since I would need to fetch db_name from something like a ContextVar. 2 would kinda work, and is probably the route I’ll be following for now, but I feel like this kind of change more sense in a ‘Database Router’ than on a ‘Database Engine’ context.

Thoughts?

I think this is outside the design goal of database routers. My understanding is that database routers are for routing the model layer - queries from QuerySets. If you’re using the connection object, it’s your responsibility to do routing.

If you have a lot of pre-existing code that uses connection directly, yes, that will need a batch update. Hopefully, a choice regex find-and-replace would help. In general, it’s rare for Django projects to use connection directly, rather relying only on models and querysets.

2 Likes

Well, in that case there’s no need to implement the feature through routers. Maybe implementing it as a context manager, in the same way I was planning to implement it? That is:

with connection.override("replica"):
    ...  # 'connection.cursor()' will use 'replica'

I still find that having a global way to choose what DB will be used by default should be a common enough feature

In general, it’s rare for Django projects to use connection directly

Although I don’t have any data to back it up… I do find that hard to believe. There’s just too much stuff that’s outside the scope of an ORM, but that is still relevant in any decently sized application (DB locks, isolation levels, DB-specific queries that don’t translate well into a ORM model like recursive queries) that requires a cursor to be executed.

Also, when looking at my codebase, I just noticed that there are some 3rd party applications that most likely use connection.cursor under the hood, so I don’t think the ‘replace every connection.cursor’ approach would work.

I’ve looked at ~100 projects that were deployed to production. I would guess about <10% used connection directly. Most Django projects don’t need the advanced SQL features in your list. I think you’ll be in the top 1% of Django users if you know how to use those features!

Yes, there is that. I even maintain one package that uses extensive direct connection (django-mysql). But it consistently provides a using parameter, which can at least be used with custom routing.

If we add something to Django core, it would probably be too unsafe to actually swap connection. I think it’s too risky, with regards to connection management, threads, and so on. I think we’d want a different object, like routed_connection, which third-party packages would need to interact with. So maybe not such a win…?