Best practice: storing model details (Nested vs. Flat)

I am developing a Django application to help keep track of macro- and micronutrients in a diet. So far I am very happy with the Django documentation and have been able to resolve most of my issues. But right now I am stuck on a more conceptual question about databases and model relationships.

My current model for an Ingredient and the associated NutrientProfile (that stores the nutrients) looks like this:

# models.py
import uuid

from django.db import models


class Ingredient(models.Model):
    ingredient_id = models.UUIDField(
        verbose_name="Ingredient",
        primary_key=True,
        default=uuid.uuid4,
        editable=False,
    )
    name = models.CharField(max_length=300)
    synonyms = models.CharField(max_length=300, blank=True, null=True)
    category = models.CharField(max_length=300, blank=True, null=True)

class NutrientProfile(models.Model):
    nutrientProfileID = models.UUIDField(
        primary_key=True, default=uuid.uuid4, editable=False,
    )
    # Ingredient the NutrientProfile is associated to
    ingredient = models.ForeignKey(
        to="Ingredient",
        related_name='nutrientProfile',
        on_delete=models.CASCADE,
        blank=True,
        null=True,
    )
    fat_total = models.DecimalField(max_digits=7, decimal_places=4, blank=True, null=True)
    protein = models.DecimalField(max_digits=7, decimal_places=4, blank=True, null=True)
    cholesterol = models.DecimalField(max_digits=7, decimal_places=4, blank=True, null=True)
    ... more nutrients

For my DRF API ViewSet I use prefetch_related() to reduce the number of DB queries:

# views.py
from rest_framework import filters, viewsets

from .models import Ingredient
from .serializers import IngredientSerializer


class IngredientViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = IngredientSerializer
    # Fetch the related NutrientProfiles in one extra query
    queryset = Ingredient.objects.prefetch_related("nutrientProfile")
    filter_backends = [filters.SearchFilter]
    search_fields = ["name", "synonyms"]

So currently I am able to create Ingredients and associate them with one or more NutrientProfiles. I did not define this relation as one-to-one because there may be multiple NutrientProfiles for a single Ingredient (multiple data sources).

Now my question:
I realized that I will need some more information on each nutrient, such as the unit, a comment, the date added, etc., so I came up with the following Nutrient model to store these details.

class NutrientProfile(models.Model):
    nutrientProfileID = ... ID
    ingredient = ... FK
    # Several OneToOneFields pointing at the same model need distinct
    # related_names (or "+") to avoid reverse-accessor clashes
    fat_total = models.OneToOneField("Nutrient", on_delete=models.CASCADE, related_name="+")
    protein = models.OneToOneField("Nutrient", on_delete=models.CASCADE, related_name="+")
    cholesterol = models.OneToOneField("Nutrient", on_delete=models.CASCADE, related_name="+")
    ... more nutrients

class Nutrient(models.Model): 
    amount = models.DecimalField(max_digits=7, decimal_places=4, blank=True, null=True)
    unit = models.CharField(max_length=5, choices=allowed_units)
    comment = models.CharField(max_length=200)
    ...

So each field in NutrientProfile can have all necessary details stored in the Nutrient model. Does this make sense, or am I going in the wrong direction here? Would it be better to store all the details in the NutrientProfile itself? Like so?

class NutrientProfile(models.Model):
    nutrientProfileID = ... ID
    ingredient = ... FK
    # No Nutrient FK; every detail lives directly on the profile
    fat_total_amount = models.DecimalField(max_digits=7, decimal_places=4, blank=True, null=True)
    fat_total_unit = models.CharField(max_length=5, choices=allowed_units)
    fat_total_comment = models.CharField(max_length=200)
    protein_amount = models.DecimalField(max_digits=7, decimal_places=4, blank=True, null=True)
    ... more nutrients

Defining all the details together seems very ugly to me. But putting them in another table will increase the work for the database, right? For 10,000 NutrientProfiles with 30 nutrients each, the Nutrient table would already have 300,000 rows. Wouldn't that be far less efficient than simply storing more columns in the NutrientProfile table? I am using PostgreSQL as my database right now.
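For a rough sense of scale (the per-row byte count below is an assumed ballpark, not a measured value), 300,000 narrow rows amount to only a few tens of megabytes:

```python
# Back-of-envelope estimate for the size of the split-out Nutrient table.
# bytes_per_row is an assumption: ~23 bytes of PostgreSQL tuple header
# plus roughly 100 bytes for an FK, a numeric, and two short varchars.
profiles = 10_000
nutrients_per_profile = 30
bytes_per_row = 123

rows = profiles * nutrients_per_profile
total_mb = rows * bytes_per_row / 1024 / 1024

print(rows)                  # 300000
print(f"{total_mb:.1f} MB")  # roughly 35 MB under these assumptions
```

Tables in that size range are routine for PostgreSQL, so the raw row count alone is unlikely to be the bottleneck.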

A more flexible way of dealing with the details, in my opinion, would be to define the NutrientProfile and Nutrient models like this:

class NutrientProfile(models.Model):
    NutrientProfileID = ... ID
    ingredient = ... FK
    # No defined nutrient fields

class Nutrient(models.Model): 
    nutrient_profile = models.ForeignKey('NutrientProfile', on_delete=models.CASCADE, null=False)
    # Nutrient type. For example: fat, protein, cholesterol,...
    # Maybe limited with predefined choices?
    nutrient_type = models.CharField(max_length=50, choices=??)
    amount = models.DecimalField(max_digits=7, decimal_places=4, blank=True, null=True)
    unit = models.CharField(max_length=5, choices=allowed_units)
    comment = models.CharField(max_length=200)
    ... more details
    class Meta:
        unique_together = ['nutrient_profile', 'nutrient_type']
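One way to fill in the `choices=??` placeholder (all names here are hypothetical, just to illustrate the shape) is a plain list of `(stored_value, human_label)` tuples, which is what Django's `choices` option expects:

```python
# Hypothetical choice lists for the Nutrient model; Django's `choices`
# option accepts any iterable of (stored_value, human_label) pairs.
NUTRIENT_TYPES = [
    ("fat_total", "Total fat"),
    ("protein", "Protein"),
    ("cholesterol", "Cholesterol"),
    # ... more nutrients
]

ALLOWED_UNITS = [
    ("g", "gram"),
    ("mg", "milligram"),
    ("ug", "microgram"),
]

# The stored values must fit the fields' max_length (50 and 5 above).
assert all(len(value) <= 50 for value, _ in NUTRIENT_TYPES)
assert all(len(value) <= 5 for value, _ in ALLOWED_UNITS)
```

These would then be passed as `choices=NUTRIENT_TYPES` and `choices=ALLOWED_UNITS` on the respective fields.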

However, my question remains the same: is it preferable to store more fields on a single model, or should I split up my models, even though the resulting detail table will have more rows? How does that impact SQL performance? Is nesting models like this considered bad practice?

And where do you draw the line in making more detailed models? For example, in the Nutrient model I could introduce separate Unit and Comment models, etc.

I would also be interested in how you would approach such a problem in general! For example, a Car model that is a little more detailed than the one in the Django documentation:

  • Car => a model for a single car (not type of car)
    • some detail fields like “date_manufactured”, “is_automatic”,…
    • Manufacturer FK
    • has many Component groups
  • Component Group => e.g.: “Tires”, “Engine”, “AC”,…
    • Limited to 200 different Component Groups via choices
    • some detail fields like “date_manufactured”, “date_installed”, …
    • has many Included Parts
  • Included Parts => e.g.: “Screws”, “Nuts”, “Gasket”, …
    • some detail fields
    • has many Manufacturers
  • Manufacturer => e.g.: some companies that produce screws / gaskets, etc.
    • some detail fields
    • has many employees
  • Employee
    • detail fields about employee
    • Name

So if I have an instance of a Car model and want to know the names of all employees that worked on it, I would have to specify my queryset like this:

# Nested relations are followed with "__" inside a single
# prefetch_related() call (assuming related_names like these exist):
queryset = Car.objects.prefetch_related(
    "component_groups__included_parts__manufacturers__employees"
)

Isn’t that a very expensive query? How would you go about modeling this? What are your thoughts on very detailed models (lots of fields) vs. many models (lots of tables)? Pros/cons? Should you store details in the through model?

I am sorry if this has become a bit long, so,
TLDR: What’s your approach for storing large/detailed models? Split up into multiple models, or keep in one model?

Thank you in advance :smiley:

(Disclaimer: Perspective of an Old DB analyst who predates commercial relational databases)

Go deep. Modelling data is a well-understood exercise; the normal forms exist because they encode a well-known set of “best practices”, and database engines are optimized for schemas designed along those lines.

In real terms, I doubt you’re going to be working with a large enough database where the performance of the database itself is going to be an issue. You’re more likely going to encounter performance problems due to a poor algorithm.

Note: I was trying to look at your example to play with some ideas on how to handle your query, but the text of your model descriptions is unclear about the relationships between entities. For example, it appears to me that there would logically be a many-to-many relationship between Car and ComponentGroup. I’m less sure about the relationship between Manufacturer and IncludedParts; if the parts are really as generic as “Screws”, that would imply a many-to-many there too. (I do see a one-to-many between Manufacturer and Employee.)

The number of tables itself isn’t an issue here. I’m currently working on a system that has 235 tables, with queries that join up to 25 tables to get their results. (In these specific cases we don’t use the ORM; we’ve created custom views in the database and then expose them to Django as “unmanaged” models.)

Bottom line? I would suggest doing it according to the commonly acknowledged best practices, and if that doesn’t work, then think about denormalizing the tables.

Hi Ken!
Thanks a lot for your input. So basically I should not worry too much about DB performance :smiley: That’s great to hear.

Do you have any good resource that could help me learn best practices for model design in Django?

<opinion> There’s nothing special or unique about this being in Django. Standard relational database practices would apply. </opinion>

I’ve glanced over the Wikipedia page on database normalization. If you’re already kinda familiar with it, it’s a decent refresher, and there are a couple of links there that might be worth chasing down. Other than that, I don’t have any online references to point you to. The best suggestion I can make, in the absence of someone else jumping in to join this topic, is to search around and see what you can find in the way of books and tutorials that make sense to you.

It might also be worthwhile to find other open source projects that have complex data models and work out their schemas to see how they approach the issue.

There is some “art” to it as well as “engineering”. Any non-trivial schema can be represented in multiple ways. Rarely are the answers so clear-cut that no judgement calls need to be made.
