Model design to efficiently process user inputted data

I am creating a web app where users enter details about a patient, and an algorithm then processes these to generate treatment recommendations. My question is how best to design my models, and how best for the algorithm to access the user-entered data.

These are my current models for capturing the user-entered data. The Diagnosis, Problem, CurrentMed and PastMed models can each have a variable number of entries per patient (the user can dynamically add rows to the entry form), which is why I do not have a single larger Patient model:

models.py

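import uuid

from django.conf import settings
from django.db import models

# TimeStampedModel is not defined in this post; import it from wherever the project gets it.


class Patient(TimeStampedModel):
    # get a unique id for each patient (primary_key=True already implies unique=True)
    patient_id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)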
    
    name = models.CharField("Patient Name", max_length=255)

    age = models.IntegerField("Age", default=0)

    class Sex(models.TextChoices):
        MALE = "male", "Male"
        FEMALE = "female", "Female"
        UNSPECIFIED = "unspecified", "Unspecified"

    sex = models.CharField(
        "Sex", max_length=20,
        choices=Sex.choices, default=Sex.UNSPECIFIED)

    creator = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        null=True,
        on_delete=models.SET_NULL)

    
class Diagnosis(TimeStampedModel):
    
    DIAG_CHOICES = [
        ('cancer', 'Cancer'),
        ('influenza', 'Influenza'),
        ('unspecified', 'Unspecified'),
    ]
    
    diag_name = models.CharField(
        "diag", max_length=200,
        choices=DIAG_CHOICES, default="unspecified")

    patient = models.ForeignKey(Patient, on_delete=models.CASCADE)

class Problem(TimeStampedModel):
    
    PROB_CHOICES = [
        ('pain', 'Pain'),
        ('wheeze', 'Wheeze'),
        ('unspecified', 'Unspecified'),
    ]
    
    prob_name = models.CharField(
        "prob", max_length=200,
        choices=PROB_CHOICES, default="unspecified")

    patient = models.ForeignKey(Patient, on_delete=models.CASCADE)


class Med(TimeStampedModel):
    
    MED_CHOICES = [
        ('Antibiotics', (
                ('penicillin', 'Penicillin'),
                ('amoxicillin', 'Amoxicillin'),
            )
        ),
        ('Painkillers', (
                ('paracetamol', 'Paracetamol'),
                ('ibuprofen', 'Ibuprofen'),
            )
        ),
        ('unspecified', 'Unspecified'),]

    med_name = models.CharField(
        "", max_length=20,
        choices=MED_CHOICES, default='unspecified')
    
    dose = models.IntegerField("Dose (mg)", default=0)

    # Is this a medication currently being taken, or one taken in the past?
    timepoint = models.CharField(
        "timepoint", max_length=20,
        choices=[('current','current'), ('past', 'past')], default='unspecified')
    
    patient = models.ForeignKey(Patient, on_delete=models.CASCADE)

    class Meta:
        abstract = True


class CurrentMed(Med):
    timepoint = models.CharField(
        "", max_length=20,
        choices=[('current','current'), ('past', 'past')], default='current')


class PastMed(Med):
    timepoint = models.CharField(
        "", max_length=20,
        choices=[('current','current'), ('past', 'past')], default='past')

I think as input to the treatment selection algorithm it would be helpful to reassemble the user inputted data into a new object of a form similar to this:

new_object.py

import pandas as pd

drug_info = pd.read_csv('../data/drug_data/drug_info.csv')


class Patient2:
    def __init__(self, age, sex, meds, diagnosis, problems):
        self.age = age
        self.sex = sex
        self.meds = meds
        self.diagnosis = diagnosis
        self.problems = problems

class Meds2:
    def __init__(self, current, past):
        self.current = process_meds(current)
        self.past = process_meds(past)

class Diagnosis2:
    def __init__(self, diagnosis):
        self.diagnosis = diagnosis

class Problem2:
    def __init__(self, problems):
        self.problems = problems


def is_therapeutic_dose(drug_info, drug, dose):
    """check if prescribed dose of drug is above minimum therapeutic dose"""
    cutoff = drug_info[drug_info['drug'] == drug]['dose'].values[0]
    return dose >= cutoff


def process_meds(med_df):
    """calculate whether dose of each drug above minimum therapeutic dose, 
    and assign pharmacological class"""
    therapeutic = []
    classes = []
    for drug in med_df['drug']:
        dose = med_df[med_df['drug'] == drug]['dose'].values[0]
        therapeutic.append(is_therapeutic_dose(drug_info, drug, dose))
        classes.append(drug_info[drug_info['drug'] == drug]['class1'].values[0])
    med_df['therapeutic'] = therapeutic
    med_df['class'] = classes
    return med_df
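
One detail of the loop above: `med_df[med_df['drug'] == drug]['dose'].values[0]` always returns the dose from the first row that matches the drug name, so a drug entered twice at different doses would be checked against the first dose both times. A minimal sketch of an alternative is to work row by row and look each drug up once in a dictionary. It uses plain Python instead of pandas so it runs on its own; `annotate_meds`, `MedEntry` and the contents of `DRUG_INFO` are hypothetical names and placeholder numbers, not values from drug_info.csv.

```python
from dataclasses import dataclass

# Hypothetical stand-in for drug_info.csv: drug -> (minimum dose in mg, class).
# Placeholder numbers for illustration only, not clinical values.
DRUG_INFO = {
    "paracetamol": (500, "Painkillers"),
    "amoxicillin": (250, "Antibiotics"),
}


@dataclass
class MedEntry:
    drug: str
    dose: int  # mg


def annotate_meds(meds, drug_info=DRUG_INFO):
    """Return one dict per medication row, with its class and a therapeutic flag."""
    annotated = []
    for med in meds:
        min_dose, drug_class = drug_info[med.drug]  # one dictionary lookup per row
        annotated.append(
            {
                "drug": med.drug,
                "dose": med.dose,  # this row's own dose, even if the drug repeats
                "therapeutic": med.dose >= min_dose,
                "class": drug_class,
            }
        )
    return annotated


rows = annotate_meds([MedEntry("paracetamol", 100), MedEntry("paracetamol", 600)])
```

Because each row carries its own dose, the two paracetamol entries are evaluated separately.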

Doing this reassembly, however, seems like it will be quite long-winded and potentially inefficient:

reassembly.py

from django.db.utils import DEFAULT_DB_ALIAS
from django.contrib.admin.utils import NestedObjects
collector = NestedObjects(using=DEFAULT_DB_ALIAS)
queryset = Patient.objects.filter(patient_id='aa8e836d-f5dd-44cf-ba54-1bdc8b5e1445')
collector.collect(queryset)

patient_object_for_treatment_algo = Patient2(
    list(collector.data[Patient])[0].age,
    list(collector.data[Patient])[0].sex,
    processing_func(list(collector.data[CurrentMed])),
    processing_func2(list(collector.data[Diagnosis])),
    # Patient2 also expects a problems argument, which is not assembled here yet
)

It seems to me that I am potentially overcomplicating the approach, and I wondered whether there is a more obvious and efficient way to go about this.

You haven’t specified what the data needs to look like for whatever algorithm you’re trying to use, or whether that algorithm is internal to your process or an external API.

If this algorithm is part of your system, then you don’t need to collect any data. Pass the pk of the patient to the algorithm, and allow it to retrieve data as needed.
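
A minimal sketch of that shape, assuming nothing beyond what the thread describes: the algorithm receives only the patient's key and runs its own lookups. To keep it runnable without a Django project, the standard-library `sqlite3` module stands in for the ORM; in the app itself the two queries would correspond to something like `Patient.objects.get(pk=patient_pk)` and `CurrentMed.objects.filter(patient_id=patient_pk)`. The function name `recommend_treatment`, the table layout and the returned text are all hypothetical.

```python
import sqlite3


def recommend_treatment(conn, patient_pk):
    """Hypothetical entry point: takes only the key and fetches what it needs."""
    age, sex = conn.execute(
        "SELECT age, sex FROM patient WHERE patient_id = ?", (patient_pk,)
    ).fetchone()
    current_meds = conn.execute(
        "SELECT med_name, dose FROM currentmed WHERE patient_id = ?", (patient_pk,)
    ).fetchall()
    # Placeholder rule, only to show the data flow; not clinical logic.
    if not current_meds:
        return f"{age}-year-old {sex} patient: no current medication recorded"
    names = ", ".join(name for name, _ in current_meds)
    return f"{age}-year-old {sex} patient: review {names}"


# Minimal in-memory tables so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, age INTEGER, sex TEXT)")
conn.execute("CREATE TABLE currentmed (patient_id TEXT, med_name TEXT, dose INTEGER)")
conn.execute("INSERT INTO patient VALUES ('p1', 54, 'female')")
conn.execute("INSERT INTO currentmed VALUES ('p1', 'ibuprofen', 200)")

print(recommend_treatment(conn, "p1"))
```

The caller (a view, for instance) never builds an intermediate object; it only hands over the key.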

If that algorithm is external to your system, then your requirements are driven by what it expects to receive.

Many thanks @KenWhitesell

It would help if the data looked like the classes outlined in new_object.py.

The algorithm will be internal to my system, so I suppose, as you suggest, the algorithm could just query the database as and when it needs information. I thought it would perhaps be more efficient to create an object that contains all the relevant information than to go back and forth to the database, but perhaps not.

It really doesn’t make a difference. Or more accurately, it’s probably less efficient to retrieve all the data in function A, then reformat it to pass it along to function B, than it would be for function B to retrieve it as necessary and in the desired format when it’s needed for processing.
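
One way to picture "retrieve it as necessary" is a thin wrapper that loads each piece of data on first access and caches it, so anything the algorithm never touches is never fetched. This is only a sketch under assumptions: `PatientData` and the loader functions are hypothetical, and the loaders return canned values where a real app would run a queryset.

```python
from functools import cached_property


class PatientData:
    """Hypothetical wrapper: each attribute is loaded on first access, then cached."""

    def __init__(self, patient_pk, loaders):
        self.patient_pk = patient_pk
        self._loaders = loaders  # dict of name -> callable(patient_pk)

    @cached_property
    def diagnoses(self):
        return self._loaders["diagnoses"](self.patient_pk)

    @cached_property
    def current_meds(self):
        return self._loaders["current_meds"](self.patient_pk)


calls = []  # records which loaders actually ran


def load_diagnoses(pk):
    calls.append("diagnoses")
    return ["influenza"]


def load_current_meds(pk):
    calls.append("current_meds")
    return [("paracetamol", 500)]


data = PatientData("p1", {"diagnoses": load_diagnoses, "current_meds": load_current_meds})
data.diagnoses
data.diagnoses  # second access is served from the cache
print(calls)  # ['diagnoses']
```

Here the medication loader never runs because nothing asked for `current_meds`, which is the saving over collecting everything up front.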