SaaS Django app that pulls data from each client's legacy database

Hi, I am building a SaaS-type app, and although I am familiar with building Django apps, I am struggling with a few things.
The goal of the app is real-time tracking of metrics. For each user, it needs to access that user's legacy database to pull some historical data (on a daily basis). After that, a Python script does the calculations and stores the results in a PostgreSQL database that holds all the tenants' info.

I am struggling to wrap my mind around how to build an API that lets each user access an external database through Django. I imagine the end result as a "connect" feature in the admin interface that lets the user connect their database, but I am confused overall about how to make that happen on the backend.

UPDATE: I have seen that this could be done by granting the Django app's user access FROM the SQL database. In that case, the user would need to create that access within their SQL database. I am still confused about which tool to use after that.

Any help on this, such as experiences, tutorials, or readings, would be greatly appreciated!!

We’ve had a similar problem, which I have written about here. Two notes to that thread:
1.) The database settings for name/host/credentials are defined in the settings. That means adding a new project (=client/tenant) involves logging into the server and updating the settings file. However, I think it should be possible to update the settings from within the running instance with data pulled from the database where the Django app lives.
2.) We have since added an extended system to define read/write access to the API for each user and each project. Unfortunately I can’t post our solution here as it would expose too much code. Essentially it works with custom permissions, a custom backend checking for those permissions and decorators for views so that ensuring correct read/write access is as simple as adding that decorator to the view method. For DRF the permissions alone might be enough.
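
For illustration only, here is a minimal sketch of the decorator pattern described in note 2. It is not the code from that project. The names `require_project_access` and the `projects.read_project` / `projects.write_project` codenames are hypothetical, and it assumes a custom authentication backend that understands the second argument passed to `has_perm`. To stay dependency-free the sketch raises the built-in `PermissionError`; in a real Django project you would raise `django.core.exceptions.PermissionDenied`, which Django turns into a 403 response.

```python
from functools import wraps


def require_project_access(mode):
    """Decorator factory guarding a view; `mode` is "read" or "write"."""
    if mode not in ("read", "write"):
        raise ValueError("mode must be 'read' or 'write'")

    def decorator(view):
        @wraps(view)
        def wrapped(request, project_id, *args, **kwargs):
            # Hypothetical codename; a custom backend would decide whether
            # this user holds it for this particular project.
            perm = f"projects.{mode}_project"
            if not request.user.has_perm(perm, project_id):
                # In Django: raise django.core.exceptions.PermissionDenied
                raise PermissionError(f"{mode} access denied for project {project_id}")
            return view(request, project_id, *args, **kwargs)

        return wrapped

    return decorator


@require_project_access("read")
def project_metrics(request, project_id):
    # Placeholder view body for the sketch.
    return f"metrics for project {project_id}"
```

With this shape, protecting a view is one extra line above its definition, and all the actual permission logic stays in the backend.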

Overall our code ended up a bit undjangonic, but it’s still manageable. I’d rather have to deal with some extra complexity than miss all those goodies Django brings along.

Hi,

Thank you for your response. Your help is precious, and seeing your original post was also beneficial. I see the situation a bit more clearly, but I am still somewhat confused and realized that my post reflected my confusion. So let me try to reformulate it better.

I have a multi-tenant app with multiple schemas within a PostgreSQL database. This database stores the processed data for each tenant.

The part that does not make sense to me is how to let each tenant pull data from wherever they want (most likely their ERP database), process this data with a Python script and then store it in their own schema. Do you see my issue more clearly? Although all the Django resources on the internet have been super useful, I can’t find any help on how to do this.

If I understand it right, there are two problems to solve. 1.) Allow users to configure a connection to an arbitrary database and 2.) Automatically route the output of the script to the correct schema of that user.

The usual way to have databases in Django is to define them in the DATABASES setting, but I think this is mostly intended for different databases holding the model data that the Django app itself uses. If you add external databases you will run into trouble with migrations and with telling the Django app which data to pull from where.

Allowing the users to define extra database connections seems simple enough: have a model to encapsulate the data and a form for the user to fill out. But this then potentially saves credentials as cleartext in the database, which is bad. I’m sure this follow-up problem has solutions as well, but I’m hesitant to suggest one.

These connection settings can then be used by the transformation script to fetch the data. At that point it might be better to use manually written SQL queries as opposed to using models. In my experience it can be frustrating getting the Django app to populate models from the correct database.
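
As a rough sketch of the "manually written SQL" route: the function below runs one query through any Python DB-API 2.0 connection and returns plain dictionaries. Here `sqlite3` only stands in for the tenant's legacy database so the example runs anywhere. A real ERP database would need its own driver (for example psycopg2 for PostgreSQL), and its placeholder style may differ from `?`. The names `fetch_rows` and `demo_connect` and the `orders` table are hypothetical.

```python
import sqlite3
from contextlib import closing


def fetch_rows(connect, query, params=()):
    """Run one query via a DB-API 2.0 connection factory; return dict rows."""
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(query, params)
            columns = [col[0] for col in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]


def demo_connect():
    """Stand-in for a tenant's legacy database (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (day TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.5), ("2024-01-02", 7.0)],
    )
    return conn


daily_totals = fetch_rows(
    demo_connect,
    "SELECT day, SUM(amount) AS total FROM orders GROUP BY day ORDER BY day",
)
```

In a real setup the `connect` argument would be built from the connection settings the tenant saved, and the daily job would call `fetch_rows` once per tenant.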

For writing the output of the script to the user’s/tenant’s database it might also be more convenient to circumvent the model system and write the data directly via SQL queries. However, if you later want to poll the data for charts or whatever within the Django app, it might be better to use models after all. In that case you will need a custom database router that sends the data for these models (and no others) to the schema of the user. This does require that the database connections to the schemas are defined in the settings, which might require a method for injecting settings during runtime. It might also cause problems if the connections remain open after the script finishes, and you will need to consider migrations in the router implementation.
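
A minimal sketch of what such a router could look like, under a few assumptions: the tenant data lives in a hypothetical app labelled `metrics`, every tenant already has a connection alias in DATABASES, and the current tenant is tracked in a thread-local via the hypothetical helper `set_current_tenant_alias` (that helper is not a Django API). The four router methods are Django's standard router hooks; a router is a plain class, so the sketch needs no Django import.

```python
import threading

_state = threading.local()


def set_current_tenant_alias(alias):
    """Hypothetical helper: remember which DATABASES alias is active."""
    _state.alias = alias


class TenantRouter:
    # Assumption: only models from these apps hold per-tenant data.
    tenant_apps = {"metrics"}

    def _alias_for(self, model):
        if model._meta.app_label not in self.tenant_apps:
            return None  # no opinion; Django falls back to "default"
        alias = getattr(_state, "alias", None)
        if alias is None:
            # Fail loudly rather than write tenant data to the wrong place.
            raise RuntimeError("no tenant selected for a tenant-scoped model")
        return alias

    def db_for_read(self, model, **hints):
        return self._alias_for(model)

    def db_for_write(self, model, **hints):
        return self._alias_for(model)

    def allow_relation(self, obj1, obj2, **hints):
        return None  # no opinion in this sketch

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Tenant apps migrate only on tenant aliases, everything else
        # only on "default".
        if app_label in self.tenant_apps:
            return db != "default"
        return db == "default"
```

The router would be activated through the `DATABASE_ROUTERS` setting, and the script (or a middleware) would call `set_current_tenant_alias` before touching any tenant model.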

Overall I expect you will have to write some custom code to make Django do things it wasn’t quite intended to, but there should be a solution. Again, I wish I could just share some of our code to illuminate some things, but alas, open-sourcing closed-source code is an uphill battle.

Thank you again so much for all your advice. I think that it will be much harder than I anticipated. Overall there are not a whole lot of multi-tenancy tutorials or help on tweaking Django, so this looks like a bit of an uphill battle too ^^. Anyway, thanks again, and no worries about your source code. I am now facing a new debate. As a general rule, would you advise first building a well-functioning app without worrying about the multi-tenancy side and then integrating multi-tenancy afterwards, OR should I really focus on getting the architecture right first before moving on?

Best,
Pierre

Tough choice in that regard. Some musings:

  • Architecture should always come first. Trying to solve problems of the architecture in the implementation just leads to kludges and patches tacked onto a frame that doesn’t match the problem it should solve
  • The old adage “The plan is useless, but planning is important” holds true. The first design will run into some problems in the implementation (or operation) and will require adjustments on a base level. But having done the planning can avoid a lot of problems in the first place
  • An iterative approach to development seems like the way to go if there are no strict requirements or safety considerations. That would involve setting up a Django app that does nothing, but works, and going from there
  • Unless you’re already an experienced Django developer, you will learn things you’d need for the detailed planning only once you implement and (try to) operate them. I sure did.

Overall I don’t think there’s an optimal approach I could divine at a distance, assuming anyone could. Maybe a hybrid approach: Using the best of your knowledge, design an app that considers the data flow, configurations, functionality, libraries and (software) infrastructure necessary to create that multi-tenant app you need. Then start with an empty Django app and iteratively implement and test that design until you run into a problem. Change the design, fix the implementation accordingly and continue.

Thank you! I am on it. I’ll build the app (in a basic form) without thinking about multi-tenancy first to see what everything looks like, and then start again with multi-tenancy. I believe it will be helpful because, as you mentioned, I’ll learn new things along the way, and when I begin with the multi-tenancy architecture I will not have to worry about what to make, but rather HOW to make it. Thank you again for your kind help.