Choosing a data model for hierarchical date spans

Hi, I’m working on a personal project for tracking training in my sport (archery). The core purpose is to create a record of training sessions, with quantitative (dates, durations, shots taken etc) and qualitative (text descriptions) data, and then also be able to query and analyse it.

I have exisiting data that I have already been able to import into a prototype, and in the past I’ve used a fixed heriachical system to categorise dates into more personalised groupings than the natural year/month/week/day divisions, so in this case we have Seasons, Blocks and Cycles (macro-, meso- and microcycles for anyone in a sports science background). Each of which is defined originally by a half open interval of start and end date (ie the start date is part of the span, the end date is not and will be the start date of the next), and their hierarchy by where those dates align.

A minimal example dataset could be represented in a flat table like so:

date season_code block_code cycle_code session_number session_duration arrows
2022-10-01 22-23 B01 C04 1 90 100
2022-10-01 22-23 B01 C04 2 30 40
2022-10-02 22-23 B02 C05 1 75 120
2022-10-03 22-23 B02 C05 1 150 200
2022-10-03 22-23 B02 C06 1 210 230

Typically the date spans would be defined in advance when planning training for the year, and refined as I go through…

span_type code start end
season 22-23 2022-08-26 2023-08-24
block B00 2022-08-26 2022-10-01
block B01 2022-10-01 2022-12-29
cycle C04 2022-09-21 2022-10-02
cycle C05 2022-10-02 2022-10-04
cycle C06 2022-10-04 2022-11-15

So I started by modelling these date-spans as their own tables and sharing the common date fields.

# Date span models
class Span(models.Model):
    start = models.DateField()
    end = models.DateField()
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    
class Season(Span):
    description = models.TextField(blank=True, null=True)

class Block(Span):
    title = models.CharField(max_length=30)

class Cycle(Span):
    title = models.CharField(max_length=30)

# Training Session model - actual value data goes here
# Simplified, in reality arrows is its own table with a foreign key to the session to store more granular data
class Session(models.Model):
    date = models.DateField()
    duration = models.DurationField()
    arrows = models.IntegerField()
    user = models.ForeignKey(User, on_delete=models.CASCADE)

The fundamental question I have is how to model the relationships between these entities. The secondary one is on how to manage them to keep the date values well formed (there are dependencies between rows so normal instance validation doesn’t help). I’ve been bouncing about between a few different ways of structuring this and I am struggling to settle on a clear answer, everything either feels like it has redundancy or makes it significantly harder to use the ORM.

Option 1 - Simple foreign key hierarchy:

Session -> Cycle -> Block -> Span
This seems like the most obvious approach, and allows me to simply use the ORM for aggregations and lookups.

class Block(Span):
    season = models.ForeignKey(Season, on_delete=models.CASCADE)
    ...

class Cycle(Span):
    block = models.ForeignKey(Season, on_delete=models.CASCADE)
    ...

class Session(models.Model):
    cycle = models.ForeignKey(Cycle, on_delete=models.PROTECT)
    ...

But now when I change the dates of a span, the sessions need to update which one they are pointing to otherwise there is an inconsistency:

# assume these already exist so we don't have to add any higher levels
block1 = Block.objects.get(start=date(2023,11,20), end=date(2024,1,15))
block2 = Block.objects.get(start=date(2024,1,15), end=date(2024,2,28))
cycle1 = Cycle.objects.create(start=date(2024,1,1), end=date(2024,1,15), block=block1)
cycle2 = Cycle.objects.create(start=date(2024,1,15), end=date(2024,1,28), block=block2)
# so far so good everything aligns

# need to change the end date of cycle 1 for reasons
cycle1.end += timedelta(days=4)
cycle1.save()
# manually updating the start of the next span
cycle2.start = cycle1.end
cycle2.save()

# now cycle dates are inconsistent with their parent blocks

assert cycle1.end <= cycle1.block.end # AssertionError
assert cycle2.start == cycle2.block.start # AssertionError

Option 1b - redundant foreign keys

class Session(models.Model):
    season = models.ForeignKey(Season, on_delete=models.PROTECT)
    block = models.ForeignKey(Season, on_delete=models.PROTECT)
    cycle = models.ForeignKey(Cycle, on_delete=models.PROTECT)

Same issues with data redundancy with the dates, plus now these keys could get out of sync with each other, but now at least I can jump directly to any level of the hierarchy without having to pass through intermediate layers…

Option 2 - Just work everything out from dates

The dates already define the hierarchy, so including explicit foreign keys is basically denormalising the data. With some neat manager methods I can do all of the same selection logic as I could before and encapsulate it nicely ie:

class SpanManager(models.Manager):
    def within(self, other: "Span"):
        """Return lower level spans that are contained within other."""
        return self.filter(start__gte=other.start, end__lte=other.end)

class Span(models.Model):
    ...
    objects = SpanManager()

season = Season.objects.get_by_some_logic() #...
# all cycles with dates aligned within a season
cycles = Cycle.objects.within(season)

 #can even present this as a pseudo-related manager in the concrete classes:
class Season(Span):
    ...
    @property
    def cycles(self):
        return Cycle.objects.within(self)

# same as above:
cycles = season.cycles.all()

This seemed to go pretty far, I can implement the same functionality for traversing back up and for finding sessions etc. This is infact how a previous version of this system worked (markdown files with frontmatter, read via dataview queries in obsidian). But now I lose out on the ability to do related lookups at the database level with the ORM:

Cycle.objects.filter(block__start__month=10) 
# FieldError

Which is quite painful.

Other options, alternatives, open questions

  • If I enforce full coverage of every date by a season/block/cycle construct then technically the end dates are redundant (it’s just the previous spans start date)?
  • Option 1 extreme might be to say that even storing any dates on the spans is redundant information. Just define their start and end from the minimum and maximum dates of the sessions they encompass, and annotate that information on if its needed at query time.
  • Does it even make sense to model these spans as individual tables and rows? The first block and cycle in a season necessarily all share the same start date, so really that is one piece of information. Maybe a single table that defines all of the crossover dates and what level they are at. But it’s hard to imagine how I could come up with a schema for this, that still allows the convenience of having individual Season/Block/Cycle objects that I can attach other metadata to.

Fundamentally the issue here is that these span objects are identified by their dates, but I’m struggling to translate that connection into the ORM/relational model abstraction, without duplicating information.

I can’t tell without building out the application further if this is actually a real issue or if I’m overcomplicating it, but it’s hard to build it out further without defining a working model schema.

So if anyone has made it this far I’d very much appreciate any advice or pointers to examples of apps that handle similar functionality.

From the information provided, I’d create the following models:


class Span(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE
    )
    name = models.CharField(max_length=30)
    start = models.DateField()
    end = models.DateField()

    class Meta:
        abstract = True


class Season(Span):
    description = models.TextField(blank=True, null=True)


class Block(Span):
    season = models.ForeignKey(Season, on_delete=models.CASCADE)


class Cycle(Span):
    block = models.ForeignKey(Season, on_delete=models.CASCADE)


class Session(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE
    )  # May be excluded since cycle field is already aware of the user.
    cycle = models.ForeignKey(Cycle, on_delete=models.PROTECT)
    date = models.DateField()
    duration = models.DurationField()
    arrows = models.IntegerField()

This is only structural.

As for behaviour:

But now when I change the dates of a span, the sessions need to update which one they are pointing to otherwise there is an inconsistency

I’m not sure that it would be a good idea to allow manual editing of child “span” dates without well-defined rules about how this should affect its siblings and parent spans’ ranges. Does it squash future siblings’ date ranges when its own range is extended, or does it push everything else into the future? The latter seems like it would cause chaos with every “span” in the future being potentially updated due to a single row’s update.

If I enforce full coverage of every date by a season/block/cycle construct then technically the end dates are redundant (it’s just the previous spans start date)?

I would think so.

If it is the case that previous end dates are automatically the start dates of the next sibling then that might help with the issue of updating the dates, since there is only one date to move within a set of constraints (two sibling boundaries and/or the parent’s boundaries). (Siblings would have to be ordered.) But even so, I imagine that both start and end may have to stay as nullable fields since boundary spans still need them? What do you think?