Bear with me on this. I’m learning as I go here, and I likely have some things wrong that need corrected. Our project is over a year old now and there is a large code base, so unless I can figure out a way to juggle these 2 databases, there will have to be a major refactor, which would have to be a long-term goal, so what I’m looking for is whether or not what I’m about to describe is possible or not…
There are a number of factors that lead to this question, but basically, we have a data submission validation page so that users (who want to submit files for us (the curators) to load into the database) can validate their files and fix simple issues before it gets to us.
Originally, we wanted to simply load the files using the load scripts in debug/dry-run mode in the validation view, but given the way the load scripts were developed, they have side-effects, even when an exception is raised. Applying an atomic transaction was attempted, but there are a number of methods in the load that make queries, which raises an exception during atomic transactions.
One quick fix we used was to simply create a separate database, and after the initial implementation (just being careful to use .using(db)
and .save(using=db)
, it worked for awhile - validation loads went into the validation database and errors were returned that users could act on, but we found that adding new code unpredictably lead to some confusing problems and errors from the validation load. Honestly, we don’t fully understand what is going on. Various methods called from the load scripts that make independent queries turned out to still be going to the default database, which we didn’t realize would happen (this is our first Django project and we’re learning as we go).
Everything not involving the validation interface (including the load scripts) work as designed and pass all tests. It’s just that we’re having a hard time getting control over what database is used when attempting to get the validation view to only use the validation database. For example, a problem I worked around yesterday was a call in a new load script that was calling get_or_create
(that goes through using("validation")
). I added debug prints in the django base code that showed 4 or 5 queries were going to validation
, but 2 subsequent queries were going to default
and the error we were getting was about the primary key (an AutoField) didn’t meet the unique constraint - and I knew that that table in the validation database was empty. I worked around the problem by doing separate get
and create
calls - and that worked. No more unique constraint violations.
So while trying to figure all this out, I learned of “database routers” and figured I could create a router and switch its default database whenever we run the validation loads, but as I read the docs this morning, it seems that this use-case isn’t really what its designed for. It seems like it’s designed for individual models always defaulting to a statically set database?
I started to wonder how I could implement the desired outcome of having all queries from 1 script go to the validation database and wondered if any such attempt could actually work without affecting queries of other users who are browsing the site while 1 user runs the validation code…
Is this even possible? Should I start an overhaul to try and make atomic transactions in the load scripts possible so that we don’t need a second database?