Struggling with application structure for multi-tenant system

We are developing a system that serves as a platform to work with and configure any number of other client-specific projects. This has worked so far, but now that we are implementing more complex features it has started to turn into a mess that becomes more difficult to maintain and develop over time. I am hoping someone can suggest an architecture/structure that will work better and is more pythonic/djangonic in nature. Unfortunately I cannot give a complete minimal working example, as boiling down the code and removing client information would itself take several hours to days. If absolutely necessary I might do that, but I cannot justify it at the moment.

We chose Django to handle the authentication system, session handling, MVC-like development, etc. In addition we included a model for projects

class Project(models.Model):
    name = models.CharField("Same as database name", max_length=50)

that serves to inform Django about what other projects exist. Furthermore we use a custom user model

class CustomUser(AbstractUser):
    allowed_projects = models.ManyToManyField(Project)
    is_allowed_all = models.BooleanField(default=False)

to assign users to projects. This works orthogonally to Django’s permission system (we have to check these permissions with custom code wherever needed), which is a possible point of improvement, but at least it works and requires no code changes for new projects.
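Such a check looks roughly like this (a simplified sketch of the idea, not our actual code):

from django.core.exceptions import PermissionDenied

def check_project_access(user, project):
    """Raise PermissionDenied unless the user may access the given project."""
    if user.is_allowed_all:
        return
    if user.allowed_projects.filter(pk=project.pk).exists():
        return
    raise PermissionDenied("No access to project " + project.name)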

What also works okay is having views that change depending on the project that is selected via a URL parameter, then fetching the specific content:

urlpatterns = [
    path('<int:project_id>/', views.detail, name='detail'),
]

def detail(request, project_id):
    context = get_project_specific_context(project_id)
    return render(request, "projects/detail.html", context)

Of note here is that these views might fetch data from databases that Django doesn’t know about, or at least is not allowed to manage. For this we do not use any model classes and instead fetch data with simple hardcoded SQL queries on the project’s database.
As a result of that feature, we have the problem of providing populated test databases for the projects that the fixtures create in the Django test database during tests. We solved that by rerouting the database connections to hardcoded SQLite databases and populating them with data manually. Since these views only read data, this was a sufficient solution.
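For illustration, the hardcoded queries mentioned above look roughly like this (a sketch; the alias scheme and context shape are illustrative, and it assumes the project databases are declared as extra entries in DATABASES):

from django.db import connections

def get_project_specific_context(project_id):
    project = Project.objects.get(pk=project_id)
    # read from the project's own database, which Django does not manage
    with connections["prj_" + project.name].cursor() as cursor:
        cursor.execute("SELECT name, unit FROM data_point ORDER BY id")
        rows = cursor.fetchall()
    return {"project": project, "data_points": rows}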

Now the current biggest roadblock is implementing an API for CRUD operations on the models saved in the project databases. All project databases can be expected to have the same structure (there is a separate system managing them), but this structure is supposed to be completely independent of anything the Django project does. This is one problem, in that any changes to the structure have to be reflected in the models used by the Django project, but changes are rare and can be integrated easily. To ensure the API doesn’t produce inconsistencies during the necessarily separate deployment of changes, we can check on establishing the connection that the database’s version matches the version of the model classes.
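That check could hook into Django’s connection_created signal, along these lines (a sketch, assuming a hypothetical schema_version table in each project database):

from django.db.backends.signals import connection_created
from django.dispatch import receiver

MODEL_SCHEMA_VERSION = 3  # hypothetical: the structure version our models reflect

@receiver(connection_created)
def check_project_schema_version(sender, connection, **kwargs):
    if not connection.alias.startswith("prj_"):
        return
    with connection.cursor() as cursor:
        cursor.execute("SELECT version FROM schema_version")
        (db_version,) = cursor.fetchone()
    if db_version != MODEL_SCHEMA_VERSION:
        raise RuntimeError(
            "%s is at schema version %s, but the models expect %s"
            % (connection.alias, db_version, MODEL_SCHEMA_VERSION)
        )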

To implement the API we use the Django REST Framework, which seems to do exactly what we need. We have models

class DataPoint(models.Model):
    name = models.CharField(unique=True, max_length=255)
    unit = models.CharField(max_length=20, blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'data_point'

and serializers

class DataPointSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = DataPoint
        fields = ['name', 'unit']
        ordering = ['id']

and viewsets

class DataPointViewSet(viewsets.ModelViewSet):
    project = "project_1"
    queryset = DataPoint.objects.using("project_1").all().order_by("id")
    serializer_class = DataPointSerializer

and we register routes for each viewset

IMPLEMENTED_PROJECTS = {
    "project_1": {"data_point": DataPointViewSet}
}

def get_api_routes():
    routes = []
    router = routers.DefaultRouter()

    for prj_name, prj_routes in IMPLEMENTED_PROJECTS.items():
        for route, viewset in prj_routes.items():
            router.register(prj_name + '/' + route, viewset)

    routes += router.urls
    return routes

The biggest problem with this is that the viewsets are project-specific, and hardcoded at that. That means a lot of copy-and-paste code if we want to enable the API for all projects. If we remove the using() call from the viewset, the API will try to fetch from the default database. Using a database router doesn’t work either, since it can only choose the read/write database on a model-class level and has no information about which viewset is fetching data, which it would need to select the right database.
The same database-router problem also strikes when trying to populate the test databases. There we have no working solution yet, and we want to get the API working before tackling that anyway.

It is possible that we get something working with enough head-wall-banging, but it has become clear that the problems will only multiply as more features require the project databases. I hope someone might have an idea what we could improve, especially with regard to implementing the API on the project models.

Last night I was watching Awesome Automated APIs with automagic REST from last year’s DjangoCon US, and I think it might be useful for you to get ideas. It’s not quite the same situation as yours, but it’s related: the presenter’s problem was a huge database with 6000 tables and pre-existing schemas that Django can’t manage, and they needed to provide a read-only REST API to access them. The solution was extensive metaprogramming.

If all project databases have the same structure, would something like this work? (It’s a bit simplified from your example, and I’ve not run it in the REPL.)

def make_viewset(db_name: str):
    class DataPointViewSet(viewsets.ModelViewSet):
        project = db_name
        queryset = DataPoint.objects.using(db_name).all().order_by("id")
        serializer_class = DataPointSerializer

    return DataPointViewSet

PROJECTS = ["proj1", "proj2"]
router = routers.DefaultRouter()
for project in PROJECTS:
    router.register(project + '/datapoint', make_viewset(project))

urlpatterns = router.urls

The other thing I’m wondering is if you actually need multiple databases, one for each project, or whether each model could have a ‘project’ field and everything could live in the same DB.
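Something like this, so one database holds every project’s rows (again just a sketch, not tested):

class DataPoint(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    unit = models.CharField(max_length=20, blank=True, null=True)

    class Meta:
        # names are unique per project instead of globally unique
        unique_together = [('project', 'name')]

The viewsets could then filter on a project URL parameter instead of switching databases.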


Thank you very much! The idea about meta-programming was what we needed to make headway. There are some kinks to work out, but it seems like we’re on the right track. Once everything is working and we can write tests for the API, I will post the solution for future readers to avoid a DenverCoder9 scenario.

I haven’t watched the talk yet, but will eventually as it does seem interesting and somewhat relevant for us.

The multiple database solution is necessary and unlikely to change. While a single database might be beneficial for some features, we have other business needs and processes that would suffer if everything worked off the same database.

With some work we’ve arrived at a solution that is not pretty, but it works and we can test it. The necessary steps are a bit involved and the code below is boiled down and modified from what’s actually in our code base, but I did promise to post the solution so here it is.

The first thing we did was make the external models (external to Django) project-specific. Although they are the same for all projects, this proved necessary because the database router otherwise has no idea where to send the data for reading and writing. This also means that all serializers and viewsets are project-specific. The most obvious sign that this is not ideal is that Django throws a lot of warnings about duplicate models being registered, but aside from these warnings it seems to work.
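A possible mitigation for those warnings, which we have not actually tried, would be to cache the factory (e.g. with functools.lru_cache) so each project’s class is only created, and therefore only registered, once:

from functools import lru_cache

@lru_cache(maxsize=None)  # repeated calls for a project reuse the same class
def get_data_point_class(project_name):
    ...  # same body as the factory below

The factory functions we ended up with: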


def get_data_point_class(project_name):

    class DataPoint(models.Model):
        project = project_name
        is_external = True
        name = models.CharField(unique=True, max_length=255)

        class Meta:
            managed = False
            db_table = 'data_point'

    return DataPoint


def get_data_point_serializer(project_name):
    model_class = get_data_point_class(project_name)

    class DataPointSerializer(serializers.HyperlinkedModelSerializer):
        project = project_name
        project_model = model_class

        class Meta:
            model = model_class
            fields = ['id', 'name']
            ordering = ['id']

    return DataPointSerializer


def get_data_point_viewset(project_name):

    class DataPointViewSet(viewsets.ModelViewSet):
        project = project_name
        serializer_class = get_data_point_serializer(project_name)
        queryset = serializer_class.project_model.objects.all().order_by('id')

    return DataPointViewSet


def get_api_routes():
    model_viewsets = {
        "data_point": get_data_point_viewset,
    }
    routes = []

    for project in settings.API_PROJECTS:
        router = routers.SimpleRouter()
        for model_name, viewset in model_viewsets.items():
            route = project + '/' + model_name
            router.register(route, viewset(project))
        routes += router.urls

    return routes

With the API routes now registered dynamically for each project, a database router can send the data to/from the right database:


class ProjectDBRouter:

    external_models = {
        "datapoint",
    }

    def db_for_read(self, model, **hints):
        if hasattr(model, "is_external") and model.is_external \
                and hasattr(model, 'project'):
            if "test" in sys.argv:
                return "test_prj_" + model.project
            return "prj_" + model.project
        return None

    def db_for_write(self, model, **hints):
        if hasattr(model, "is_external") and model.is_external \
                and hasattr(model, 'project'):
            if "test" in sys.argv:
                return "test_prj_" + model.project
            return "prj_" + model.project
        return None

    def allow_relation(self, obj1, obj2, **hints):
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # if there are hints, we can possibly use these directly
        if "target_db" in hints:
            return db == hints["target_db"]
        if "is_external" in hints and hints["is_external"]:
            if db == "default":
                return False
            if "is_test_data" in hints and hints["is_test_data"]:
                return True
        # the default database only accepts internal models or migrations
        # that are not model specific
        if db == "default":
            return model_name is None or model_name not in self.external_models
        # the project databases only accept migrations for the external models,
        # and only if it is a test database
        if model_name is not None and model_name in self.external_models:
            return db[0:8] == "test_prj"
        # none of the other checks caught anything, so better safe than sorry
        return False
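
For completeness, the settings this relies on look roughly like this (the aliases, engines, and paths are illustrative, not our actual configuration):

DATABASE_ROUTERS = ["ourapp.routers.ProjectDBRouter"]  # hypothetical module path

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "platform_db",
    },
    # one entry per project database
    "prj_name_of_project": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "name_of_project",
    },
    # hardcoded SQLite database used when running tests
    "test_prj_name_of_project": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "test_prj_name_of_project.sqlite3",
    },
}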

Getting the tests working right was, however, a bit more complicated. Usually we’d use fixtures to load test data and have a known state before running a test, but this proved impossible here. Instead we use migrations to create the models and populate the tables, relying on the router to make sure the migrations are only allowed under the right circumstances (internal vs. external and production vs. testing).


class Migration(migrations.Migration):

    dependencies = [
        ('some_app', 'some_migration'),
    ]

    operations = [
        migrations.CreateModel(
            name='DataPoint',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=255, unique=True)),
            ],
            options={
                'db_table': 'data_point',
                'managed': True,
            },
        ),
    ]

DATA = (
    (1, "data_point_name"),
)

def add_test_data(apps, schema_editor):
    model_class = get_data_point_class("name_of_project")
    for row in DATA:
        data_point = model_class()
        # ordinarily we'd like to set a specific ID for known behaviour, but then
        # the query turns from INSERT into UPDATE, which fails since the instance
        # doesn't exist yet
        # data_point.id = row[0]
        data_point.name = row[1]
        data_point.save()

class Migration(migrations.Migration):

    dependencies = [
        ('some_app', 'the_previous_migration_creating_the_models'),
    ]

    operations = [
        migrations.RunPython(add_test_data, hints={
            "is_external": True,
            "is_test_data": True,
            "target_db": "test_prj_name_of_project",
        })
    ]

Finally, with everything in place, the tests are fairly simple:


class ProjectAPITests(TestCase):

    fixtures = [
        "test_data_projects.json",
        "test_data_users.json",
    ]

    multi_db = True

    def test_get_project_view_structure(self):
        self.client.logout()
        self.client.login(username="some_user", password="some_password")
        expected = {
            "id": 1,
            "name": "data_point_name",
        }
        response = self.client.get("/api/v1/name_of_project/data_point/1/?format=json")
        self.assertEqual(response.status_code, 200)
        self.assertDictEqual(response.json(), expected)