Different models for different databases

Hi all! I’m currently working on a GSoC project to improve the search backend provided in the Wagtail CMS, which is based on Django.

The issue I’ve run into is that I need to create different models for each specific database system that the user may have. That’s because different database systems implement full-text search capabilities in varying approaches.

For example, in PostgreSQL you may want to have a table with the following definition:

CREATE TABLE fts(original_text text, stemmed_text tsvector);

On top of that, you could use a GIN index on the stemmed_text column, to improve full-text search performance.

However, this could be implemented in SQLite as:

CREATE VIRTUAL TABLE fts USING FTS5(original_text);

You would not use a GIN index here. They’re not even supported by SQLite.

As you see, the syntax is fundamentally different. On PostgreSQL, the stemmed_text column needs to be created, while there’s no need to have additional columns on SQLite. Therefore, it looks like the models themselves are different.

This seems like a reason to use conditional declaration of the models, meaning that only one of the models would get defined, depending on which database system we’re using.

if connection.vendor == 'postgresql':
    class PostgreSQLModel(models.Model):
        original_text = TextField()
        stemmed_text = SearchVectorField()
elif connection.vendor == 'sqlite':
    class SQLiteModel(models.Model):
        original_text = TextField()

This means that, when the user is using a PostgreSQL database, the PostgreSQLModel will get picked up by Django, but not the SQLiteModel; and vice versa.

Is this a reasonable approach? Can you think of a better, more orthodox implementation?

I’m not sure if it helps, or if your question is past this, but have you seen the documentation on database backends? And for migration specific logic, the SchemaEditor? In the simplest case, you can have 1 Field type that will generate the appropriate sql depending on the backend configuration.

I would try to keep the same model and conditionally define the fields e.g.

if connection.vendor == 'postgresql':
    from django.contrib.postgres.search import SearchVectorField
else:
    SearchVectorField = object


# Create your models here.
class MyModel(models.Model):
    original_text = models.TextField()
    stemmed_text = SearchVectorField()

I don’t know how exactly you would like to handle VIRTUAL tables on SQLite, so my advice may not be the best.

Thanks for your suggestions! I had thought about defining custom Field types that generate the appropriate SQL types for each backend, but I had originally discarded that possibility because of the fact that some database systems use a completely different syntax, like the one I showed above (virtual tables).

However, I didn’t know about the possibility of customising the SchemaEditor. I’ve been looking into that recently, and it looks like a possible way to tackle this issue. It would enable the customization of the creation of the table, so that Django runs CREATE VIRTUAL TABLE instead of the usual CREATE TABLE. I will test it to see if it is actually a feasible approach, but it seems like a really good idea in forethought. Thank you!

Do you think that it is a better approach to define the fields conditionally, instead of creating a custom Field type that exports different types for each database (tsvectors for PostgreSQL, None for SQLite, …)?

For inspiration it might also help to look at how JSONField was implemented. While not exactly, it seems like has to deal with “each vendor requires its own implementation” sort of thing.

Also, it might help to look at the existing database backend features and see how they are handled across vendors.

I think what you’re trying to do is challenging because there is no vendor ORM agnostic way to handle search yet. there is only support for full text search for postgresql in the core.

As far as I’m aware None data type will not work. You could use SearchVectorField = models.Field on SQLite which will be ignored by migrations, however all other attempts to use MyModel via the ORM will crash with “missing column” errors. That’s why I proposed to use object.

What I’m referring to is implementing SearchVectorField like so:

class SearchVectorField(models.Field):
    def db_type(self, connection):
        if connection.vendor == 'postgresql':
            return 'tsvector'
        else:
            return None

Which I believe should work, according to the Django docs on writing custom Fields:

if your column requires truly complex SQL setup, return None from db_type() . This will cause Django’s SQL creation code to skip over this field. You are then responsible for creating the column in the right table in some other way, but this gives you a way to tell Django to get out of the way

Yes, but you don’t really want to add stemmed_text database column on SQLite you want to use a proxy VIRTUAL TABLE. According to docs “You are then responsible for creating the column in the right table in some other way”, that’s why you will not be able to use a model with such column via ORM.