Nullable but not null

I’ve been working on a legacy project where several database fields are marked as nullable, even though they never actually contain null values. This inconsistency undermines trust in the database models. To address this, I wrote a simple script in Django to identify such fields.

What do you think about this? Have you experienced this before?

1 Like

<opinion>
I don’t accept this premise. The underlying requirements that drove the data definition is the only thing that matters.
</opinion>

The fact that you don’t have any rows with null values in them right now does not mean that it would be invalid to ever have null values in those columns - it’s strictly a data requirements issue.

You may simply have a case where there is no data with null values.

These models may be used in a transactional situation, where there are no null values now - but are null at other times within a process.

I believe the mistake would be to evaluate the tables in terms of the data currently stored in them, instead of evaluating them in the context of the business rules used to define them originally. I would consider those requirements to be binding beyond just what’s currently being used.

Certainly, a mistake may have been made with the original definitions, and that those columns actually should not allow for nulls - but again, that’s a recognition that can only be made by looking at the requirements and not the data. This can especially be true with older / legacy systems, where the original designers may no longer be available.

For example, I work with a lot of data that must adhere to certain published standards. If the standard says that field_x is optional, then I must define the database storage for it as being nullable - regardless of whether the current devices sending us that data is always going to have that field populated.

2 Likes

Hi Ken,

Thank you for your detailed response. I really appreciate it.

I think the quote you referenced does not fully capture the context. What I was trying to point out is that there may be a missing migration step for a newly required field. Nullable but not null columns are not necessarily incorrect, since they might be populated later by an asynchronous task, but they can indicate that a migration is incomplete.

I understand that, but my point is that looking at the data is not the right approach to this. The data is not going to definitively answer the question as to whether or not a column should be nullable. You need to understand the application and the functions for which the data is being used to determine this.

I completely agree with you about what you said.

You need to understand the application and the functions for which the data is being used to determine this.

Unfortunately, there may be legacy projects that were actively developed 5–10 years ago, with little documentation. The original developers have since left and there are no domain experts or deep understanding of the codebase remaining. In such situations, how can one identify fields that are technically nullable but are actually required or non-null in practice?

The company cannot afford to dedicate someone to read through the entire codebase just to identify nullable but required fields that may not even impact the end user, but still affect the developer experience. As a result, developers are left to find their own solutions and that’s what led me to this idea.

You can’t - at least not validly under the constraints you define in the following paragraph.

If you don’t understand what the code is doing with your data, you’re certainly not in a position to make a judgement regarding that schema. My general answer to such cases is “don’t touch it.” There is nothing objectively wrong with it as it exists.

Accept the situation for what it is - which would seem to be the base case that you are using regarding the rest of the code. The absolute most that I would ever recommend is something documenting the fact that it is unknown whether it is allowable for that field to be null.

But mostly, you don’t want to become a prime example of “There’s no time (budget, resources) to do it right, but there’s always time to do it over.” If they don’t want you spending the time to analyze the code, I can’t see how they would want you to have to spend time fixing bugs that occur as a result of making changes to functional schemas that simply don’t match some arbitrary aesthetic judgement.

Oh thanks for that @efe - that’s definitely a useful snippet for a large project.

Totally get the premise: A “perfectionist with deadlines” will surely have a bunch of null=True fields lying around after years of adding columns to a project.

1 Like

I agree that sometimes a null=True field can have no null values anymore, that premise of having null values is not true anymore. But I only think that would be a good idea if you have did some sort of back-population of the data, and you only would know about this if you were on the team that did it. It’s quite tricky I would say.

1 Like