Sparse table -- how to save space?

I have a polymorphic model Event. I use an event_type field to discriminate its “type”.

Some fields in that model are only meaningful when event_type is a specific value: for all the other possible “types”, their values don’t matter at all.

The problem is that, while at first it was only one or two of those fields, the model now has about 7 fields that only matter for one specific type, and the Events with that type value are roughly 1/1000 of the total number of entries.
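To make it concrete, the model looks roughly like this (field names and types are simplified placeholders, not the real ones):

```python
from django.db import models


class Event(models.Model):
    # Discriminator field: which "type" of event this row represents.
    event_type = models.CharField(max_length=20)

    # Fields shared by every event type.
    created_at = models.DateTimeField(auto_now_add=True)

    # Fields that are only meaningful for one specific event_type value,
    # which applies to roughly 1 in 1000 rows.
    special_note = models.TextField(blank=True)
    special_deadline = models.DateTimeField(null=True, blank=True)
    special_flag = models.BooleanField(default=False)
    # ...about 7 of these in total
```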

For that reason, I am afraid a lot of space might be wasted in my DB for fields whose values aren’t meaningful.

Is there a way in Django to save space when you have this kind of sparse model/DB table?

Couple different thoughts here:

  • The amount of space used depends upon the database, the field type, and whether or not the field allows nulls. For example, a NULL varchar field in PostgreSQL takes no additional space: each row header contains a null bitmap, and when the bit for a nullable column is set, no space at all is used for that column (see PostgreSQL: Documentation: 14: 70.6. Database Page Layout, and the sketch after this list).

  • Beyond a certain point, having a “sparse model” becomes a sign that the data model should be refactored: you reach the stage where those columns belong in a separate table related to the base table.

  • Both these considerations really only begin to matter once you get beyond a certain size (number of rows) in the table. If you’re talking about less than 10,000 rows I wouldn’t worry about either of these factors. And I’d probably say they don’t become a significant issue until you’re above 100,000 rows.
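As a minimal sketch of that first point: declaring the type-specific columns as nullable means the 999-in-1000 rows where they don’t apply just store NULLs, which cost nothing beyond a bit in the row’s null bitmap. (Django’s docs recommend avoiding null=True on string-based fields for other reasons, so weigh that against the space argument.)

```python
from django.db import models


class Event(models.Model):
    event_type = models.CharField(max_length=20)

    # Nullable columns: for rows where these don't apply, PostgreSQL only
    # sets a bit in the row's null bitmap instead of storing a value.
    special_note = models.TextField(null=True, blank=True)
    special_deadline = models.DateTimeField(null=True, blank=True)
    special_count = models.PositiveIntegerField(null=True, blank=True)
    # A nullable boolean ("not applicable") instead of default=False avoids
    # storing a value for rows of other types.
    special_flag = models.BooleanField(null=True, blank=True)
```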


I took a more in-depth look and it’s:
2 text fields, 2 datetime fields, 3 boolean fields, 1 many-to-many field (stored in its own join table, so not really an issue), 1 positive integer field, and a JSON field. Some of them are nullable, some not.

It’s less than 10k rows. However, looking at the size of the DB, it’s 684 MB after less than 6 months of usage of the application. I was expecting much less.

So I am thinking I might actually have to move those fields to a separate model with a one-to-one relation to the original one. The most pain is going to be in migrating the data and fixing all the serializers that expect to write values to fields of the main model, which will now have to write to fields of a related one.
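Something like this is what I have in mind, assuming Django REST Framework serializers (all names here are placeholders):

```python
from django.db import models
from rest_framework import serializers


class Event(models.Model):
    event_type = models.CharField(max_length=20)
    # ...only the fields common to every event type stay here...


class SpecialEventDetails(models.Model):
    # Created only for the ~1/1000 events that actually need these fields.
    event = models.OneToOneField(
        Event, on_delete=models.CASCADE, related_name="special_details"
    )
    special_note = models.TextField(blank=True)
    special_deadline = models.DateTimeField(null=True, blank=True)
    special_flag = models.BooleanField(default=False)


class EventSerializer(serializers.ModelSerializer):
    # Keep the old field name in the API by reading/writing through the
    # one-to-one relation; writes need explicit handling in update().
    special_note = serializers.CharField(
        source="special_details.special_note", required=False, allow_blank=True
    )

    class Meta:
        model = Event
        fields = ["id", "event_type", "special_note"]

    def update(self, instance, validated_data):
        # DRF nests dotted-source values under "special_details".
        details_data = validated_data.pop("special_details", {})
        instance = super().update(instance, validated_data)
        if details_data:
            details, _ = SpecialEventDetails.objects.get_or_create(event=instance)
            for attr, value in details_data.items():
                setattr(details, attr, value)
            details.save()
        return instance
```

The data copy itself could then be a RunPython step in a migration that creates a SpecialEventDetails row for each Event of that type.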

You don’t mention what database you’re using or what operating system it’s running on, but it’s probably worth checking out where that space is being used.

The nature of the activity is important as well. If there are a lot of inserts and deletes, there might be a lot of unused space tied up.
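If it turns out to be PostgreSQL, a quick per-table breakdown will usually point at the culprit. A minimal sketch using Django’s connection (pg_total_relation_size includes indexes and TOAST data):

```python
from django.db import connection

# Largest tables first, including their indexes and TOAST storage.
QUERY = """
    SELECT relname,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size
    FROM pg_catalog.pg_statio_user_tables
    ORDER BY pg_total_relation_size(relid) DESC
    LIMIT 20;
"""

with connection.cursor() as cursor:
    cursor.execute(QUERY)
    for table, size in cursor.fetchall():
        print(f"{table}: {size}")
```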

It’s Postgres on Ubuntu.

I actually discovered that more than 500 MB were being used by leftover logs from Django Silk, from when I was tracking down a performance issue in production.
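For reference, clearing that out shouldn’t need manual SQL; django-silk ships a management command for it (double-check the exact name against the Silk docs for your version):

```python
from django.core.management import call_command

# Assumed command name from the django-silk docs; verify for your version.
call_command("silk_clear_request_log")
```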

So, even with all those unused fields, the DB occupies about 100 MB. I’d say it’s definitely not worth it to do such a heavy refactoring at the moment, considering there really is no practical need for it.
