Ideal process for integrating/maintaining external catalog of data

I’m looking to build a collection manager around an open source dataset of JSON files (flesh-and-blood-cards/json/english at develop · the-fab-cube/flesh-and-blood-cards · GitHub), which also ships helper scripts that generate PostgreSQL tables from the dataset (flesh-and-blood-cards/helper-scripts/generate-sql-db at develop · the-fab-cube/flesh-and-blood-cards · GitHub). Since every item in the catalog already has a unique_id assigned to it, I’m trying to figure out the ideal way to get the data into a Django app and keep it updated going forward. Some options seem to be:

- Keep the catalog data in a separate DB that Django accesses (Multiple databases | Django documentation | Django)
- Create the DB using the helper scripts, then run inspectdb to create the Django models
- Create the Django models manually, then use data migrations or fixtures to populate the data (a rough sketch of this approach is below)
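
For the third option, here is a minimal sketch of what the manual route could look like: a model whose primary key is the dataset’s unique_id, plus a data migration that loads one of the JSON files. The field names (name, type_text), the app label catalog, and the file path are assumptions made for illustration, not the dataset’s actual schema.

```python
# catalog/models.py -- a hand-written model; field names are illustrative.
from django.db import models


class Card(models.Model):
    # The dataset's own unique_id becomes the primary key, so every
    # reload produces the same keys. Stored as a CharField here; switch
    # to UUIDField if the ids turn out to be UUIDs.
    unique_id = models.CharField(primary_key=True, max_length=36)
    name = models.CharField(max_length=255)
    type_text = models.CharField(max_length=255, blank=True)

    def __str__(self):
        return self.name


# catalog/migrations/0002_load_cards.py -- a data migration that reads
# the JSON export (the path is hypothetical) and bulk-inserts the rows.
import json
from pathlib import Path

from django.db import migrations

DATA_FILE = Path("data/english/card.json")


def load_cards(apps, schema_editor):
    Card = apps.get_model("catalog", "Card")
    records = json.loads(DATA_FILE.read_text())
    Card.objects.bulk_create(
        [
            Card(
                unique_id=rec["unique_id"],
                name=rec.get("name", ""),
                type_text=rec.get("type_text", ""),
            )
            for rec in records
        ]
    )


def unload_cards(apps, schema_editor):
    apps.get_model("catalog", "Card").objects.all().delete()


class Migration(migrations.Migration):
    dependencies = [("catalog", "0001_initial")]
    operations = [migrations.RunPython(load_cards, unload_cards)]
```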

As for updating the data: since everything already has a unique_id that is assigned before it ever reaches the database and will never change, is there any downside to simply dropping the catalog tables and rebuilding them, as opposed to writing a script that checks for changes and updates rows in place?

If you want to use those tables from Django through the ORM (as opposed to writing your own SQL queries), you’ll need to either create the models manually or use inspectdb and “tweak” the results. (The output from inspectdb always needs to be verified - it only produces completely appropriate results in trivial cases.)
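
As an illustration of the kind of tweaking involved, here is a sketch of a model after cleaning up inspectdb output. The table and column names are assumptions about what the helper scripts create - verify them against the actual schema.

```python
# A sketch of tweaked inspectdb output; names are assumptions.
from django.db import models


class Card(models.Model):
    # inspectdb usually emits this as a nullable TextField with no
    # primary key; the dataset's unique_id should be declared as the pk.
    unique_id = models.CharField(primary_key=True, max_length=36)
    # inspectdb maps text columns loosely -- tighten the types by hand.
    name = models.CharField(max_length=255)

    class Meta:
        # Keep managed = False if the helper scripts own the schema,
        # or switch to True to let Django migrations manage the table.
        managed = False
        db_table = "card"
```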

Whether you use the same DB or a different DB is entirely up to you. There’s no technical reason to pick one option over the other.

Emptying those tables and repopulating them is OK as long as none of your applications allow people to edit those tables or create foreign key links to them. (If they do, you’d need to deal with issues like table locks and management of integrity constraints.)


Thanks for your input!

Regarding the last point, the tables themselves wouldn’t ever be edited, but there will indeed be foreign keys referencing them. If I use the unique_id assigned within the dataset itself as the pk, so that every row gets the same id each time the table is recreated, in theory I shouldn’t hit any referential integrity constraint violations as long as I set “on_delete=models.DO_NOTHING” on the model referencing it?
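
For context, this is roughly the arrangement I have in mind - a user-owned collection row pointing at a catalog Card whose pk is the dataset’s unique_id. Model and field names here are hypothetical:

```python
# collection/models.py -- sketch of a model referencing the catalog table.
from django.conf import settings
from django.db import models

from catalog.models import Card  # assumed catalog model


class CollectionEntry(models.Model):
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    # DO_NOTHING only stops Django from cascading deletes in Python;
    # the database-level foreign key constraint still exists and is
    # still enforced.
    card = models.ForeignKey(Card, on_delete=models.DO_NOTHING)
    quantity = models.PositiveIntegerField(default=1)
```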

No, because during the period of time when you’re emptying the table, there is no entry in that table with the referenced ID - thus violating the integrity constraint.


That makes sense. After the initial data load-in, I’ll look into a non-destructive route of updating the tables.
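
One possible non-destructive refresh, keyed on unique_id so existing foreign keys are never orphaned, is an upsert in a management command. The file layout and field names below are assumptions carried over from the earlier sketches:

```python
# catalog/management/commands/refresh_cards.py -- upsert sketch.
import json
from pathlib import Path

from django.core.management.base import BaseCommand

from catalog.models import Card  # assumed catalog model


class Command(BaseCommand):
    help = "Upsert catalog cards from the JSON dataset without deleting rows."

    def add_arguments(self, parser):
        parser.add_argument("json_path", type=Path)

    def handle(self, *args, **options):
        records = json.loads(options["json_path"].read_text())
        for rec in records:
            # update_or_create inserts new cards and updates changed
            # ones; rows that other tables reference are never removed.
            Card.objects.update_or_create(
                unique_id=rec["unique_id"],
                defaults={
                    "name": rec.get("name", ""),
                    "type_text": rec.get("type_text", ""),
                },
            )
        self.stdout.write(self.style.SUCCESS(f"Processed {len(records)} cards"))
```

For larger datasets, bulk_create with update_conflicts=True, unique_fields, and update_fields (Django 4.1+ on PostgreSQL) can do the same upsert in far fewer queries.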

Thanks again for your help!