Handling multiple Postgres schemas

Here’s the scenario I’m faced with:

I’m working with a legacy Postgres database that has approximately 80 different schemas, and I want to migrate an existing application that uses that db (through hand-written queries) to Django.

I’m following the advice here: I created one Django app per schema and generated unmanaged models using inspectdb, with one connection per schema that uses the search_path option to look only at that schema.
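
For reference, a single per-schema inspection entry might look like this (connection details are placeholders; only the search_path differs between the ~80 aliases):

DATABASES = {
    # ... the default connection ...
    "schema1": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "legacy_db",  # placeholder name
        "USER": "...",
        "PASSWORD": "...",
        "OPTIONS": {"options": "-c search_path=schema1"},
    },
}

The models are then generated per app with python manage.py inspectdb --database=schema1 > schema1/models.py.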

So now I’m finding myself with about 80 django apps, all with a models.py file containing their respective tables.

I need to model some queries that look like this:

select a.field1, b.field2, c.field3 
from schema1.table1 a, schema2.table1 b, schema2.table2 c 
where ...

I’m stuck here, as I’m not sure how to achieve this. If I print the query resulting from a call like table1.objects.all(), it becomes a plain select on table1. I probably want it to be a query on schema1.table1, where schema1 is also the name of the app that the table1 model is contained in.

One way I’ve gotten such a query to be generated was to modify each model’s Meta.db_table to be prefixed with the name of the schema. This seems like a hack, and I’m not even sure it works.

Is there a better way?

Also, I’m assuming that since this is all a single db, in the end I’ll have to put a single search_path option in the default connection to include all schemas. Is that accurate?

The ORM is not going to help you much here. Django does not support cross-database relations. You might be able to fake this to some degree using annotations with subqueries, but it’s going to be awkward.
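
For illustration, a minimal sketch of that subquery-annotation approach, assuming hypothetical unmanaged models generated from the two schemas, "id" as the join key, and both models queried over the same connection:

from django.db.models import OuterRef, Subquery

from schema1.models import Table1 as Schema1Table1
from schema2.models import Table1 as Schema2Table1

# Annotate each schema1.table1 row with field2 from the matching
# schema2.table1 row, without needing a declared relation.
qs = Schema1Table1.objects.annotate(
    field2=Subquery(
        Schema2Table1.objects.filter(id=OuterRef("id")).values("field2")[:1]
    )
)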

Probably not using the ORM. The easiest way to do this is likely going to be raw SQL.
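
Something along these lines, with the join conditions as placeholders since the original WHERE clause is elided:

from django.db import connection

# Run the cross-schema join directly; schema-qualified names bypass the
# ORM's table-name handling entirely.
with connection.cursor() as cursor:
    cursor.execute(
        """
        SELECT a.field1, b.field2, c.field3
        FROM schema1.table1 a
        JOIN schema2.table1 b ON b.id = a.id
        JOIN schema2.table2 c ON c.id = b.id
        """
    )
    rows = cursor.fetchall()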

That can work if no two schemas contain a table with the same name. With a multi-schema search_path, an unqualified table name resolves to the first schema in search_path order that contains it; for example, with search_path=schema1,schema2, an unqualified table1 resolves to schema1.table1 even if schema2.table1 also exists.

That’s interesting. I’ll have to try that.

If that works and results in valid queries being generated, it may well solve your issue, and it would also remove the need for all the separate database definitions.

It appears I was able to make it work.

By:

  • specifying the schema name in db_table, already quoted, like this (Django passes a name that is already quoted through to the SQL verbatim):
class Meta:
    managed = False
    db_table = '"schemaname"."tablename"'
  • including all the schemas in the db connection dict:
DATABASES = {
    # ...
    "my_db": {
        # ...
        "OPTIONS": {"options": "-c search_path=schema1,schema2,schema3"}
    }
}

I am able to perform queries that span multiple schemas on Postgres without issues.
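
As a quick sanity check (model and field names hypothetical), printing a queryset’s SQL now shows the fully qualified table name:

from schema1.models import Table1

print(Table1.objects.all().query)
# SELECT "schema1"."table1"."field1", ... FROM "schema1"."table1"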

I’ll add a small note that is loosely related to this issue and might be beneficial to anyone who stumbles over this in the future:

Given that this was a legacy db, with over 80 different schemas, I had to use the inspectdb utility to generate unmanaged models from the existing db. inspectdb uses the database introspection classes which don’t work with multiple schemas out of the box, so I monkey patched my way until it worked.

Here are the spots I had to change in django/db/backends/postgresql/introspection.py:

In the TableInfo definition, add a schema field:

TableInfo = namedtuple("TableInfo", BaseTableInfo._fields + ("comment", "schema"))

In the get_table_list function, add the table schema to the query:

    def get_table_list(self, cursor):
        """Return a list of table and view names in the current database."""
        cursor.execute(
            """
            SELECT
                c.relname,
                CASE
                    WHEN c.relispartition THEN 'p'
                    WHEN c.relkind IN ('m', 'v') THEN 'v'
                    ELSE 't'
                END,
                obj_description(c.oid, 'pg_class'),
                n.nspname -- add the schema name
            FROM pg_catalog.pg_class c
            LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
            WHERE c.relkind IN ('f', 'm', 'p', 'r', 'v')
                AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
                AND pg_catalog.pg_table_is_visible(c.oid)
        """
        )
        return [
            TableInfo(*row)
            for row in cursor.fetchall()
            if row[0] not in self.ignored_tables
        ]

Then, in django/core/management/commands/inspectdb.py, at the end of the handle_inspection method, add the table schema to your call to self.get_meta:

                yield from self.get_meta(
                    table_name,
                    constraints,
                    column_to_field_name,
                    is_view,
                    is_partition,
                    comment,
                    # add the schema from the extended TableInfo
                    table_info[table_name].schema,
                )

Lastly, modify get_meta to include the passed schema in the table name:

    def get_meta(
        self,
        table_name,
        constraints,
        column_to_field_name,
        is_view,
        is_partition,
        comment,
        schema,  # add the schema parameter
    ):

        # ...
        meta += [
            # ...
            # emit the quoted "schema"."table" form shown in the Meta example
            f"        db_table = '\"{schema}\".\"{table_name}\"'",
        ]
        # ...

Voilà. This way, inspectdb will correctly generate models for all schemas, setting each model’s db_table to the quoted schema and table name.
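
As an aside, the same changes can be applied as a runtime monkey patch rather than by editing the installed Django sources; a minimal sketch for the TableInfo part (get_table_list and the inspectdb command can be overridden the same way, e.g. from an AppConfig.ready()):

from collections import namedtuple

from django.db.backends.postgresql import introspection as pg_introspection

# Rebuild TableInfo with the extra "schema" field; get_table_list looks up
# TableInfo in this module at call time, so the patch takes effect.
pg_introspection.TableInfo = namedtuple(
    "TableInfo", pg_introspection.TableInfo._fields + ("schema",)
)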

If there are tables with the same name across different schemas, this may cause issues with model class naming and foreign keys. I haven’t tested that, as it wasn’t the case for my legacy db.


Are you able to run tests successfully?

Yes, everything has been working fine, both for retrieving models and for creating/updating them.

I cannot run tests, because Django creates a fresh test database (test_{db_name}) where the schemas don’t exist, and in this case it doesn’t work.

@yekesit I’m working on a project where we created some schemas for data we needed to store outside of Django models, because the ORM can’t handle our particular use case (to put it very briefly). I was able to make tests work with a custom test runner that creates the schemas in every test database. When tests run in parallel you have to do it for each one separately, and as far as I could find there is no easy way to loop over connections to each of the parallel test databases. Knowing that the test databases are always named the same way, the only solution I found was to connect to each one manually, open a psycopg2 connection, and create the schemas I need, all in the setup_databases method. Huge hack, but it works.

import psycopg2
from django.db import connection
from django.test.runner import DiscoverRunner

from project import settings


class ProjectTestRunner(DiscoverRunner):
    def setup_custom_schemas(self):
        # Create the schemas in the main test database.
        with connection.cursor() as cursor:
            cursor.execute(
                "CREATE SCHEMA IF NOT EXISTS master;\n"
                "CREATE SCHEMA IF NOT EXISTS cohort"
            )
        # The parallel workers' clones were created before the schemas
        # existed, so connect to each clone manually and create the same
        # schemas there as well.
        db_settings = settings.DATABASES["default"]
        if self.parallel > 1:
            for index in range(1, self.parallel + 1):
                conn = psycopg2.connect(
                    dbname=f"test_project_{index}",
                    user=db_settings["USER"],
                    password=db_settings["PASSWORD"],
                    host=db_settings["HOST"],
                    port=db_settings["PORT"],
                )
                with conn.cursor() as cursor:
                    cursor.execute(
                        "CREATE SCHEMA IF NOT EXISTS master;\n"
                        "CREATE SCHEMA IF NOT EXISTS cohort"
                    )
                conn.commit()
                conn.close()

    def setup_databases(self, **kwargs):
        config = super().setup_databases(**kwargs)
        self.setup_custom_schemas()
        return config
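
For completeness, the runner then has to be registered in settings; the dotted path is an assumption about where the class lives:

# settings.py
TEST_RUNNER = "project.test_runner.ProjectTestRunner"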