A question about polymorphism in Django

Assume that we have a tracking system for packages. The tracking of a package is a table of its movements. These movements can come from different sources: for example, a user can add some steps manually, or a REST API that provides the location of the package carrier (a truck, for example) can add movements automatically.

We have a MovementModel, two of whose fields can reference very different models. One is data_provider, which can refer to either User or another model (TruckRestAPI). The other is carrier, which can be either Truck or Plane.
So to summarize we have:

class Movement(models.Model):
    field1 = ...
    field2 = ...
    data_provider = foreignKey to { User or TruckRestAPI }
    carrier = foreignKey to { Truck, AirPlane, ... }

Because we want to be able to access the MovementModel from User or Truck, JSONField is not an option.

We also don’t want to be forced to create multiple fields on the MovementModel pointing to the different models, so DjangoAbstractModel is also not an option.

Also, since performance is of great importance to us, ConcreteModel and GenericRelations are not good options for us either!
We could also use something similar to ConcreteModel with better performance, such as referencing the parent model with an explicit foreign key:

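class BaseMovement(models.Model):
    field1 = ...
    field2 = ...

class MovementUserVessel(models.Model):
    # Explicit link to the shared base row, instead of implicit inheritance.
    # on_delete is required since Django 2.0; CASCADE is only a placeholder choice.
    foreign_key = models.ForeignKey(BaseMovement, on_delete=models.CASCADE)
    data_provider = models.ForeignKey(User, on_delete=models.CASCADE)
    carrier = models.ForeignKey(Vessel, on_delete=models.CASCADE)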

which doesn’t make Django perform join operations behind the scenes. But since there are 2 fields on our MovementModel, each referencing multiple models, this would leave us with 6 different models referencing the BaseMovementModel, which is also not a good idea!

Which one is better? Are there other ways?

Some initial thoughts while I continue to read and think about this issue -

You’re setting up an initial set of contradictory conditions. You’ve identified what you don’t want to do, because you’re not sure what the performance impact may be. But, until you’ve looked at what your tables look like from a database perspective, and possibly modeled them as a proof-of-concept, you can’t know what the results are going to be.

Always remember that the ORM is a layer that sits on top of a real database structure. For example, you identify a possible field "data_provider = foreignKey to { User or TruckRestAPI }".
But, how is data_provider going to know which table it’s referring to? As an implementation detail, a ForeignKey field stores the primary key of a table. Somewhere, it needs to know which table the primary key is referring to - which effectively means a Generic Relation.
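To make that concrete, here is a rough table-level sketch using Python's built-in sqlite3 module. The table and column names (app_user, truck_rest_api, provider_type, provider_id) are made up for illustration and are not what Django generates. Django's contenttypes framework stores a content type reference plus an object id, which is the same idea: one column says which table, the other says which row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE truck_rest_api (id INTEGER PRIMARY KEY, url TEXT);
-- A "union" reference needs two pieces of data: which table, and which row.
CREATE TABLE movement (
    id INTEGER PRIMARY KEY,
    provider_type TEXT NOT NULL,   -- 'app_user' or 'truck_rest_api'
    provider_id INTEGER NOT NULL   -- primary key within that table
);
INSERT INTO app_user VALUES (1, 'alice');
INSERT INTO truck_rest_api VALUES (1, 'https://example.invalid/api');
INSERT INTO movement VALUES (1, 'app_user', 1), (2, 'truck_rest_api', 1);
""")

ALLOWED_TABLES = {"app_user", "truck_rest_api"}


def resolve_provider(conn, movement_id):
    """Return (table_name, row) for the provider of one movement.

    The target table is only known after the movement row has been read,
    so the lookup takes a second query.
    """
    table, pk = conn.execute(
        "SELECT provider_type, provider_id FROM movement WHERE id = ?",
        (movement_id,),
    ).fetchone()
    if table not in ALLOWED_TABLES:  # whitelist before building the SQL string
        raise ValueError(f"unexpected provider table: {table}")
    row = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (pk,)).fetchone()
    return table, row
```

Note that the database cannot put a real foreign key constraint on provider_id here, because it does not point at a single table.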

As a data-modeler, I would say you’re probably looking at multiple columns in this case, but I don’t know enough about your data needs to be sure.

The carrier issue may be more straightforward to solve - my first thought is multi-table inheritance: https://docs.djangoproject.com/en/3.0/topics/db/models/#multi-table-inheritance
Your movement model has an FK relationship with a Carrier model, where each instance of Carrier is subclassed (and linked via OneToOneField) with one of a Truck, AirPlane, CargoShip, etc.
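As a rough idea of what that looks like at the table level, here is a sketch using Python's built-in sqlite3 module. The table names, the plate and tail_number columns, and the sample rows are all invented for illustration. The carrier_ptr_id column follows the naming Django uses for the automatic parent link in multi-table inheritance, but the exact schema Django generates for your models may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE carrier (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Each child table shares its primary key with the base carrier row.
CREATE TABLE truck (
    carrier_ptr_id INTEGER PRIMARY KEY REFERENCES carrier(id),
    plate TEXT NOT NULL
);
CREATE TABLE airplane (
    carrier_ptr_id INTEGER PRIMARY KEY REFERENCES carrier(id),
    tail_number TEXT NOT NULL
);
-- Movement needs only one ordinary foreign key, to the base table.
CREATE TABLE movement (
    id INTEGER PRIMARY KEY,
    carrier_id INTEGER NOT NULL REFERENCES carrier(id)
);
INSERT INTO carrier VALUES (1, 'Truck 7'), (2, 'Flight 12');
INSERT INTO truck VALUES (1, 'ABC-123');
INSERT INTO airplane VALUES (2, 'N12345');
INSERT INTO movement VALUES (10, 1), (11, 2);
""")

# Fields shared by all carriers need a single join to the base table.
rows = conn.execute("""
    SELECT m.id, c.name
    FROM movement m JOIN carrier c ON c.id = m.carrier_id
    ORDER BY m.id
""").fetchall()

# Truck-specific fields need a join to the truck child table.
truck_rows = conn.execute("""
    SELECT m.id, t.plate
    FROM movement m JOIN truck t ON t.carrier_ptr_id = m.carrier_id
""").fetchall()
```

The trade-off is visible in the two queries: anything that only needs the common carrier fields stays cheap, and the extra joins appear only when type-specific columns are needed.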

But as a general rule, in these types of complex-data situations, I, personally, always start with modeling the tables and working backward to the ORM structure for any application spanning more than a dozen tables or so. (Admittedly, this approach comes from having 30 years of experience with relational databases and data modeling.)


Performance isn’t binary. GenericRelation may be performant enough for your use case - and in fact probably is?

It has its own problems of course.

This is a good approach 🙂

Also, Concrete inheritance doesn’t always kill performance. We have used this in a model that has about 25 subclasses, and it’s been fine.

You may find you need to model it in all the different ways, and then see which one works best (in terms of performance, simplicity of code, or whatever metric is important).

For multi-table inheritance I recommend checking out django-polymorphic (https://django-polymorphic.readthedocs.io/en/stable/). It makes multi-table inheritance a bit less painful and provides a few niceties like when a query is made at the base model, the inherited model classes are returned.

Example from the docs:

>>> Project.objects.all()
[ <Project:         id 1, topic "Department Party">,
  <ArtProject:      id 2, topic "Painting with Tim", artist "T. Turner">,
  <ResearchProject: id 3, topic "Swallow Aerodynamics", supervisor "Dr. Winter"> ]

Thanks for your reply. It’s pseudocode; I just want to say that data_provider can reference another table, and it’s a union {User or TruckRestAPI}, and I don’t know how I should model this field. How should I model a field that can reference either User or AnotherModel?

Great. Again, thank you very much. Your approach to modeling data is very good.

See this topic:

Can produce inefficient queries: The ORM cannot determine in advance what models are referenced by the generic foreign key. This makes it very difficult for it to optimize queries that fetch multiple types of products.

This is mentioned as a con of GenericRelation. Do you think we can ignore it in our case?

The way to model a reference to more than one other model is the use of a generic foreign key - that is the use case for that feature.
I have used GFKs in the past - they are useful for a specific set of problems. However, my opinion is that this case isn’t one of them.
I believe you’re going to encounter fewer problems in the long run by having two fields, a ForeignKey to User and a ForeignKey to TruckRestAPI, then in your model, add a constraint to ensure that one of them is populated while the other is null.
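At the table level, the two-column idea could look roughly like the sketch below, written with Python's built-in sqlite3 module. The column names (user_provider_id, api_provider_id) and the add_movement helper are hypothetical, chosen only for this example. In Django, the same rule would usually be declared as a CheckConstraint in the model's Meta.constraints; the exact spelling depends on your Django version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY);
CREATE TABLE truck_rest_api (id INTEGER PRIMARY KEY);
CREATE TABLE movement (
    id INTEGER PRIMARY KEY,
    user_provider_id INTEGER REFERENCES app_user(id),
    api_provider_id INTEGER REFERENCES truck_rest_api(id),
    -- Exactly one of the two provider columns must be populated.
    CHECK ((user_provider_id IS NULL) <> (api_provider_id IS NULL))
);
INSERT INTO app_user VALUES (1);
INSERT INTO truck_rest_api VALUES (1);
""")


def add_movement(conn, user_id=None, api_id=None):
    """Insert a movement; return True if the database accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO movement (user_provider_id, api_provider_id) "
                "VALUES (?, ?)",
                (user_id, api_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, add_movement(conn, user_id=1) and add_movement(conn, api_id=1) are accepted, while passing both ids or neither is rejected by the database itself. Both columns remain real foreign keys, so ordinary joins work in either direction.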


I prefer django-model-utils: that allows you to use the parent/base model, and then select_subclasses (often on the specific subclasses you want), when you need to, rather than on every query.

We have 25 subclasses: using select_subclasses() on every query would join in 25 tables.

Yes it can be a problem, I’d just counsel measuring things if you don’t know.

Ken’s approach sounds the most sensible.

Didn’t know about that function in django-model-utils, thanks for sharing @schinckel