I work on Open edX, an open source project that for historical reasons uses MySQL for its production/development backend, and SQLite for running tests. I’m curious if there are established patterns for simultaneously supporting collations for multiple database types in Django code. For instance:
- Applying specific character encoding and collation types on an app+database basis, e.g. “always use `utf8mb4` character encoding for my app’s models when running in MySQL”.
- Normalizing index/sorting case-sensitivity behavior between MySQL and PostgreSQL/SQLite. I want some fields to always be case sensitive, regardless of database, like institution-assigned content identifiers. I want other fields to always be case insensitive, like titles.
I know that there’s a `db_collation` setting on `CharField` now, which seems like it would do exactly what I want for a single database backend, but it wouldn’t work for more than one platform at a time. I know I can force case insensitivity in sorts and comparisons with things like `iexact` or `Lower`, but my understanding is that would have severe performance penalties, since those lookups can’t use an ordinary index (some of our models have to scale out to billions of rows).
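For concreteness, here’s roughly the sort of thing `db_collation` makes easy if MySQL were the only backend I had to care about (a sketch with illustrative model and collation choices, not our actual models):

```python
from django.db import models


class Course(models.Model):
    # Institution-assigned identifier: a binary collation gives case-sensitive
    # comparisons and index ordering on MySQL.
    identifier = models.CharField(max_length=255, db_collation="utf8mb4_bin")
    # Title: a *_ci collation gives case-insensitive comparisons on MySQL.
    title = models.CharField(max_length=255, db_collation="utf8mb4_general_ci")
```

But those collation names are MySQL-specific, so the resulting schema would fail on SQLite or PostgreSQL, where they don’t exist.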
We currently do some ad hoc detection and switching of behavior in migration files using `db.connection.vendor`.
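The switching looks roughly like this (a simplified sketch; the table, column, and collation names are placeholders rather than our real schema):

```python
from django.db import migrations


def convert_title_collation(apps, schema_editor):
    # Only rewrite the column when running against MySQL; PostgreSQL and
    # SQLite keep their default collations.
    if schema_editor.connection.vendor != "mysql":
        return
    schema_editor.execute(
        "ALTER TABLE myapp_course "
        "MODIFY title VARCHAR(255) "
        "CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci"
    )


class Migration(migrations.Migration):
    dependencies = [("myapp", "0001_initial")]

    operations = [
        migrations.RunPython(convert_title_collation, migrations.RunPython.noop),
    ]
```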
Does it make sense to try to make new subclasses like `CaseSensitiveCharField` and `CaseInsensitiveCharField`? A separate utility function to call that manually sets the collation in the migration after the normal `CharField` has been created? Are there existing patterns for this kind of thing?
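For the subclass idea, I’m picturing something along these lines, with the caveat that I’m not sure `db_parameters()` is the sanctioned hook for this, and the per-vendor collation names below are placeholders (the PostgreSQL one would have to be created separately):

```python
from django.db import models


class CaseInsensitiveCharField(models.CharField):
    """CharField that picks a case-insensitive collation per database vendor."""

    VENDOR_COLLATIONS = {
        "mysql": "utf8mb4_general_ci",  # placeholder CI collation
        "postgresql": "ci_collation",   # placeholder; would need CREATE COLLATION first
        "sqlite": "NOCASE",             # built-in, ASCII-only case folding
    }

    def db_parameters(self, connection):
        params = super().db_parameters(connection)
        params["collation"] = self.VENDOR_COLLATIONS.get(connection.vendor)
        return params
```

A `CaseSensitiveCharField` would presumably be the mirror image, mapping to binary collations.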
Thank you.