Relational Normalisation and Deliberate Deviations

On the subject of Third Normal Form (3NF), best practice and making one’s life a tad easier.

Ever since my CS classes of late high school, 3NF has been drilled into me. Of course, it makes a lot of sense for optimising databases and all in all its not, in my opinion, that difficult to implement given some forethought a careful consideration.

However, when working with non-technical people, and even technical people who are not familiar DB design ideas, it often appears to me that the easier approach is to deviate from 3NF.

As an example, let’s look at the following Django models.

class Treatment(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    case = models.ForeignKey(Case, related_name="treatments", on_delete=models.CASCADE)
    text = models.CharField(max_length=512)
    heading = models.CharField(max_length=512)


class Choice(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    uuid = models.UUIDField(unique=True, default=uuid.uuid4)
    treatment = models.ForeignKey(Treatment, related_name="choices", on_delete=models.CASCADE)
    points = models.IntegerField(null=False, blank=False)
    text = models.CharField(max_length=512)
    is_correct = models.BooleanField(blank=False, null=False, default=False)

    def __str__(self):
        return f"{self.text}"

The Treatment object contains data which can be displayed which is pertinent to all of its choices. For example, one might have.

Birds

When treating birds who have forgotten to fly, what is the recommended treatment?

  • A good nights sleep
  • Mandatory base jumping
  • Flight simulator lessons

The heading comes from the Treatment Object, as does the text. The options come from the Choice model.

I have several applications which have the same structure. I can normalise the model by creating say, a model called DescriptiveText which contains heading and a text attributes and have all models which use this feature to relate to DescriptiveText by way of a FK.

The challenge for some of my colleagues is they struggle to use the Django Admin and understand why we have such a design. When working with the API with standard model serializers, we’ll always have to create the DescriptiveText followed by the object in question. For me this is no biggy, but I will get some complaints.

I can of course dive into Django and create heavily customised endpoints or nested serializers. Customisation runs the risk of added complexity and nested serializers are just a pain when they’re not read-only, in my opinion.

So, what do you think? What would you do here? Ditch 3NF and just create the five or so models with text and heading attributes, or would you go the 3NF route?

The CS part of me says 3NF, use Inline admin and provide guidance and documentation of the use of the API. The part of me who can’t be bothered to deal with all the questions, feedback and pushback just says, sod it, let’s take the path of less resistance.

Having said all of this, I could be not seeing the wood for the trees or even be missing some key knowledge. Happy to be informed on both accounts.

As always, thank you for your thoughts.

C

Warning, the rant that follows comes from someone who was working with databases before SQL-as-a-language was codified as an ANSI / ISO standard and there were semi-relational databases that didn’t use SQL as the API. Things have changed since then, but I don’t think the principles have changed all that much.

The working precept for us was always “normalize until it hurts, denormalize until it works”. It was known that a true, full normalization always incurred a performance cost. Once you’ve properly distributed your database and indexes across your tablespaces to distribute the IO among your I/O channels, and determined that performance wasn’t what it needed to be, then you would look at denormalizing your table structures to improve performance.
(It’s the first case where I ran into the idea of “practicality beats purity” as being a fundamental principle.)

We also made the distinction between “User data” and “Application data”. “User data” was the term applied to the data used by the line-of-business - customer information, etc. “Application data” referred to data used by the system and was generally never seen by the end user. In Django terms, think of the ContentType table as such an example.

Keep in mind that the tradeoffs between normalized and denormalized data generally deals with the decision between optimizing data integrity, eliminating duplicate data, and minimizing update transaction times; and query performance. If you have some type of reference table that is very rarely updated, I see no reason to enforce normalization requirements.

Bottom line from this long-winded rant is, go ahead and denormalize it and don’t worry about it. This doesn’t appear to be transaction-oriented data that is going to be updated so frequently that data integrity is going to be an issue, and I doubt it’s going to be large enough that the quantity of data in case of replication is going to cause a problem either.

2 Likes

Cheers Ken! It’s nice to get straightforward advice and opinions.

I’ve got the denormalized models ready to go already and I’m ready to jump into it.

Cheers,

C