Hi kkarthik,
i do exactly what you are describing when it comes to pandas and dataframes. I have developed a backend with Django and DRF, and my colleagues, two veterinary professors, define diagnostic and clinical data in spreadsheets which is then used to populate the database.
As Ken said, I recommend you start by writing a separate program to do this. In my case, I have a base class which I call BaseImporter
which defines the common things about importing from a spreadsheet.
Then, for each tab of the spreadsheet I have a class which inherits from the BaseImporter
. Lastly, I have a separate program which puts it all together: fetch the spreadsheets (I use google sheets), read the spreadsheets, process the and sanity check dataframes before passing the data to the relevant importer.
Each importer logs its work and returns any logging information to the main program. This log information is then returned to whatever calls the main importer.
Finally, I use celery to run the importer. My colleagues visit a FormView
from which they can choose a species and then my importer will run in the background and import all the data for the chosen species.
I’ll share some basic examples of how I put this together. I can’t promise that the code is any good, and it is constantly changing, but I hope that this helps you in some way.
Base Class
class BaseImporter:
def __init__(self, species=None):
self.named_tuple = None
self.species = species
self.sheet = None
self.errors = [] # for return errors back to the importer tool
self.logging_level = logger.getEffectiveLevel()
def parse_case_sheet(self, obj, row):
return obj(**self.clean_data(row))
@staticmethod
def clean_data(row):
for key, value in row.items():
if isinstance(value, str):
value = value.strip()
if pd.isna(value) or value == "" or value == "null":
row[key] = None
return row
# and a whole bunch more
A sheet importer
class AssessmentImporter(BaseImporter):
def __init__(self, sheet, cases):
super().__init__()
self.sheet = sheet
self.cases = cases
def run_importer(self):
for index, row in self.sheet.iterrows():
data: Columns = self.parse_case_sheet(Columns, row)
if data.case is None or data.case not in self.cases or data.initial_assessment is None:
continue
case = self.cases[data.case]
assessment, created = Assessment.objects.update_or_create(
case=case,
defaults={
"initial_assessment": data.initial_assessment,
"updated_assessment": data.updated_assessment,
},
)
logging_message = "Created" if created else "updated"
self.debug(f"Case Assessment {assessment.initial_assessment} {logging_message}")
return self.errors
Main program
# get the spreadsheet in question
if species == "rabbit":
data = google_sheets(RANGES, RABBITS)
# get the tab (sheet) in question
assessment_sheet = extract_case_data(data, RANGES["assessment"])
# run the importer for the sheet
assessment_importer = AssessmentImporter(assessment_sheet, cases).run_importer()
log_importer_results("assessments", assessment_importer, importer_logs)
Celery Stuff
Be sure to check out this link: https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html
This is my task that runs the importer
from config.celery import app
from celery.utils.log import get_task_logger
from populator.import_tool import importer
logger = get_task_logger(__name__)
@app.task(name="importer")
def run_importer(species):
"""Runs the importer"""
logger.debug("Run importer")
return importer(species)
And this is my view which runs the task
# This is from the POST method of my Import FormView
# This may or may not be of use to you, but you see how I execute
# and get the importer results from celery
if form.is_valid():
species = form.cleaned_data["species"]
result = run_importer.delay(species)
return HttpResponse(json.dumps({"task_id": result.id}), status=status.HTTP_200_OK, content_type='application/json')
Righto, that’s how I do it. I’m sure there are a 1000 different ways to do it and no doubt it can be done better, but I have found this to be highly functional and robust.
Best of luck!
Conor