Ideal process for integrating/maintaining external catalog of data

I’m looking to build a collection manager around an open source dataset of JSON files (flesh-and-blood-cards/json/english at develop · the-fab-cube/flesh-and-blood-cards · GitHub), which also ships helper scripts that generate PostgreSQL tables from the dataset (flesh-and-blood-cards/helper-scripts/generate-sql-db at develop · the-fab-cube/flesh-and-blood-cards · GitHub). Since every item in the catalog already has a unique_id assigned to it, I’m trying to figure out the ideal way to get the data into a Django app and keep it updated going forward. Some options seem to be:

- Keep the catalog data in a separate DB that Django accesses (Multiple databases | Django documentation | Django)
- Create the DB using the helper scripts, then run inspectdb to create the Django models
- Create the Django models manually, then use data migrations or fixtures to populate the data (a rough sketch of this approach is below)
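
For the third option, here is a minimal sketch of what the manual route could look like: a model whose primary key is the dataset’s unique_id, plus a data migration that loads one of the JSON files. The field names (name, type_text), the app label catalog, and the file path are assumptions made for illustration, not the dataset’s actual schema.

```python
# catalog/models.py -- a hand-written model; field names are illustrative.
from django.db import models


class Card(models.Model):
    # The dataset's own unique_id becomes the primary key, so every
    # reload produces the same keys. Stored as a CharField here; switch
    # to UUIDField if the ids turn out to be UUIDs.
    unique_id = models.CharField(primary_key=True, max_length=36)
    name = models.CharField(max_length=255)
    type_text = models.CharField(max_length=255, blank=True)

    def __str__(self):
        return self.name


# catalog/migrations/0002_load_cards.py -- a data migration that reads
# the JSON export (the path is hypothetical) and bulk-inserts the rows.
import json
from pathlib import Path

from django.db import migrations

DATA_FILE = Path("data/english/card.json")


def load_cards(apps, schema_editor):
    Card = apps.get_model("catalog", "Card")
    records = json.loads(DATA_FILE.read_text())
    Card.objects.bulk_create(
        [
            Card(
                unique_id=rec["unique_id"],
                name=rec.get("name", ""),
                type_text=rec.get("type_text", ""),
            )
            for rec in records
        ]
    )


def unload_cards(apps, schema_editor):
    apps.get_model("catalog", "Card").objects.all().delete()


class Migration(migrations.Migration):
    dependencies = [("catalog", "0001_initial")]
    operations = [migrations.RunPython(load_cards, unload_cards)]
```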

As for updating the data: since everything already has a unique_id that is assigned before it ever reaches the database and will never change, is there any downside to simply dropping the catalog tables and rebuilding them, as opposed to writing a script that checks for changes and updates rows in place?

If you want to use those tables from Django through the ORM (as opposed to writing your own SQL queries), you’ll need to either create the models manually or use inspectdb and “tweak” the results. (The output from inspectdb always needs to be verified - it only produces completely appropriate results in trivial cases.)
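
As an illustration of the kind of tweaking involved, here is a sketch of a model after cleaning up inspectdb output. The table and column names are assumptions about what the helper scripts create - verify them against the actual schema.

```python
# A sketch of tweaked inspectdb output; names are assumptions.
from django.db import models


class Card(models.Model):
    # inspectdb usually emits this as a nullable TextField with no
    # primary key; the dataset's unique_id should be declared as the pk.
    unique_id = models.CharField(primary_key=True, max_length=36)
    # inspectdb maps text columns loosely -- tighten the types by hand.
    name = models.CharField(max_length=255)

    class Meta:
        # Keep managed = False if the helper scripts own the schema,
        # or switch to True to let Django migrations manage the table.
        managed = False
        db_table = "card"
```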

Whether you use the same DB or a different DB is entirely up to you. There’s no technical reason to pick one option over the other.

Emptying those tables and repopulating them is OK as long as none of your applications allow people to edit those tables or create foreign key links to them. (If they do, you’d need to deal with issues like table locks and management of integrity constraints.)


Thanks for your input!

Regarding the last point, the tables themselves wouldn’t ever be edited, but there will indeed be foreign keys referencing them. If I use the unique_id assigned within the dataset itself as the pk, so that every row gets the same id each time the table is recreated, in theory I shouldn’t hit any referential integrity constraint violations as long as I set “on_delete=models.DO_NOTHING” on the model referencing it?
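
For context, this is roughly the arrangement I have in mind - a user-owned collection row pointing at a catalog Card whose pk is the dataset’s unique_id. Model and field names here are hypothetical:

```python
# collection/models.py -- sketch of a model referencing the catalog table.
from django.conf import settings
from django.db import models

from catalog.models import Card  # assumed catalog model


class CollectionEntry(models.Model):
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    # DO_NOTHING only stops Django from cascading deletes in Python;
    # the database-level foreign key constraint still exists and is
    # still enforced.
    card = models.ForeignKey(Card, on_delete=models.DO_NOTHING)
    quantity = models.PositiveIntegerField(default=1)
```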

No, because during the period of time when you’re emptying the table, there is no entry in that table with the referenced ID - thus violating the integrity constraint.


That makes sense. After the initial data load-in, I’ll look into a non-destructive route of updating the tables.
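
One possible non-destructive refresh, keyed on unique_id so existing foreign keys are never orphaned, is an upsert in a management command. The file layout and field names below are assumptions carried over from the earlier sketches:

```python
# catalog/management/commands/refresh_cards.py -- upsert sketch.
import json
from pathlib import Path

from django.core.management.base import BaseCommand

from catalog.models import Card  # assumed catalog model


class Command(BaseCommand):
    help = "Upsert catalog cards from the JSON dataset without deleting rows."

    def add_arguments(self, parser):
        parser.add_argument("json_path", type=Path)

    def handle(self, *args, **options):
        records = json.loads(options["json_path"].read_text())
        for rec in records:
            # update_or_create inserts new cards and updates changed
            # ones; rows that other tables reference are never removed.
            Card.objects.update_or_create(
                unique_id=rec["unique_id"],
                defaults={
                    "name": rec.get("name", ""),
                    "type_text": rec.get("type_text", ""),
                },
            )
        self.stdout.write(self.style.SUCCESS(f"Processed {len(records)} cards"))
```

For larger datasets, bulk_create with update_conflicts=True, unique_fields, and update_fields (Django 4.1+ on PostgreSQL) can do the same upsert in far fewer queries.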

Thanks again for your help!