Creating a DerivedField, a Field-like object that actually adds a .annotate() to every query

Hello, I wasn’t sure if this should go here or in Django internals. It’s about using Django, but I’m getting into the arcane APIs for writing your own Fields.

Environment: Django 3.2, Python 3.8, Postgres

My model has several fields that are derived from math on other fields (e.g. score, events, and avg_score, where avg_score is score / events). My normal approach is to add properties as needed. In this case, we need to sort by these fields and paginate in the database, which properties cannot do. My solution was to add .annotate() to all of my queries. That works well enough, but it’s a lot of duplicated, wordy code. So I moved those annotations to a custom manager. Now I have this:

from django.core.validators import MinValueValidator
from django.db import models
from django.db.models import ExpressionWrapper


class ScoreboardManager(models.Manager):
    """
    This adds calculated fields as annotations, so that they can be searched and sorted
    """

    derived_fields = {
        'avg_score': ExpressionWrapper(
            models.F('score') / models.F('events'),
            output_field=models.DecimalField(max_digits=9, decimal_places=2)
        )
    }

    def get_queryset(self):
        # Every queryset built by this manager carries the derived fields.
        return super().get_queryset().annotate(**self.derived_fields)


class Scoreboard(models.Model):
    score = models.PositiveIntegerField(null=False, validators=[MinValueValidator(1)])
    events = models.PositiveIntegerField(null=False, validators=[MinValueValidator(1)])

    objects = ScoreboardManager()

This is much better. Still, I can’t help feeling that I would rather have the derived fields look like Fields in the model.
Like this:

class Scoreboard(models.Model):
    score = models.PositiveIntegerField(null=False, validators=[MinValueValidator(1)])
    events = models.PositiveIntegerField(null=False, validators=[MinValueValidator(1)])

    avg_score = DerivedField(models.F('score') / models.F('events'), typefield=models.DecimalField(max_digits=9, decimal_places=2))

This seemed simple enough at first. I figured that contribute_to_class could replace all the managers on the class with subclasses that had the appropriate magic. Now I am starting to doubt that plan. A look at django.db.models.options has made me realize that the managers attribute is not at all simple. Additionally, I see possible issues with definition order.
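
To make the definition-order concern concrete, here is a minimal sketch in plain Python. HookMeta is a simplified stand-in for Django's model metaclass, not the real ModelBase, and EagerManager, LazyManager, and the _derived attribute are hypothetical names used only for illustration. It assumes hooks run in class-body order, and shows that a manager which reads the registry during class construction misses fields declared after it, while one that defers the lookup does not.

```python
class HookMeta(type):
    """Simplified stand-in for Django's model metaclass (NOT the real ModelBase)."""

    def __new__(mcls, name, bases, attrs):
        hooks = {k: v for k, v in attrs.items() if hasattr(v, "contribute_to_class")}
        plain = {k: v for k, v in attrs.items() if k not in hooks}
        cls = super().__new__(mcls, name, bases, plain)
        # Hooks run one attribute at a time, in class-body order.
        for attr_name, value in hooks.items():
            value.contribute_to_class(cls, attr_name)
        return cls


class DerivedField:
    def contribute_to_class(self, cls, name):
        # Give each class its own registry, seeded with anything inherited.
        if "_derived" not in cls.__dict__:
            cls._derived = list(getattr(cls, "_derived", []))
        cls._derived.append(name)


class EagerManager:
    """Reads the registry while the class is still being built."""

    def contribute_to_class(self, cls, name):
        self.seen = list(getattr(cls, "_derived", []))
        setattr(cls, name, self)


class LazyManager:
    """Defers the registry lookup until it is actually needed."""

    def contribute_to_class(self, cls, name):
        self.model = cls
        setattr(cls, name, self)

    @property
    def seen(self):
        return list(getattr(self.model, "_derived", []))


class Board(metaclass=HookMeta):
    eager = EagerManager()      # declared before the derived field
    lazy = LazyManager()
    avg_score = DerivedField()


print(Board.eager.seen)  # []
print(Board.lazy.seen)   # ['avg_score']
```

If this model of the hook order holds, resolving the derived fields at query time rather than in contribute_to_class sidesteps the ordering problem without touching the managers at all.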

I’m hoping that a Django internals wrangler has some pointers or ideas about how to proceed. If I get this to where I like it, I might make a library out of it and share it.

(Note: I also considered django-computedfields. That won’t work for our use case. It uses pre-save hooks to do its work, and we are loading the table using bulk_create, which skips the hooks.)

Well, doing it via a custom manager is the way to do it as far as I am aware. You could further abstract it with a proxy model, but that’s just a different way to access it from your business logic.

django-computedfields is definitely not a good way to solve your issue at hand (disclaimer: I am the author). While it would even work with bulk_create, your problem is easily expressible with the normal Django API without denormalization. django-computedfields has a very limited use case for denormalization tasks, that is, field computations that either create a lot of calculation pressure during select queries and/or cannot be expressed easily anymore with Python properties or the DB expressions (F(), Value(), etc.). Neither is the case for your issue, so the data duplication done by denormalization as with django-computedfields is not justified by any means.

I actually found someone who prototyped roughly what I am trying to do.

This takes an Expression and makes a quasi-field of it. Rather than using a Manager to call .annotate(), it’s using nasty stack inspection hackery in get_col(). It’s also several versions behind.

Maybe I can mash that up with what I’m doing with managers to get somewhere.

Well if you want some neater and less repetitive code, maybe styling something like this would do:

class MyModel(AnnotatedModel):
    score = ...
    events = ...

    @annotate(output_field=...)
    def avg_score(self):
        return <some db expression logic>

Here AnnotatedModel could provide the @annotate decorator to inject the annotations into the default manager on bootstrapping. (Just a rough idea; you definitely want to handle some edge cases here, such as unsaved object instances not being able to access those fields yet, and so on.)
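
A rough sketch of how such a decorator could collect its annotations, in plain Python so it runs without a configured project. Everything here is hypothetical: annotate, AnnotatedModel, and _annotations are illustrative names, the strings are placeholders for real database expressions and output fields, and the decorated function receives the model class instead of an instance. It also covers the unsaved-instance edge case by raising AttributeError until a value has been set on the object.

```python
class annotate:
    """Hypothetical decorator: marks a method as the recipe for a queryset annotation."""

    def __init__(self, output_field=None):
        self.output_field = output_field
        self.build = None
        self.name = None

    def __call__(self, func):
        self.build = func  # called once per model class to produce the expression
        return self

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Non-data descriptor: reached only when the instance holds no value
        # of its own, e.g. an unsaved object that never went through a query.
        raise AttributeError(f"{self.name} is only set on rows loaded from the database")


class AnnotatedModel:
    """Stand-in base class; a real version would subclass models.Model."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Collect {name: (expression, output_field)} for a manager to pass to .annotate().
        cls._annotations = {
            name: (attr.build(cls), attr.output_field)
            for name, attr in vars(cls).items()
            if isinstance(attr, annotate)
        }


class Scoreboard(AnnotatedModel):
    @annotate(output_field="DecimalField(max_digits=9, decimal_places=2)")
    def avg_score(model):
        return "F('score') / F('events')"  # placeholder for a real expression


unsaved = Scoreboard()
print(hasattr(unsaved, "avg_score"))  # False: nothing has annotated this instance

loaded = Scoreboard()
setattr(loaded, "avg_score", 2.5)     # roughly what loading an annotated row does
print(loaded.avg_score)               # 2.5
```

Because the descriptor defines no __set__, a value assigned to the instance shadows it, so annotated rows read normally while unannotated instances fail loudly.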

Compared to the direct field declaration as done by the package, this has a big advantage: you can still work on the queryset side only, which avoids nasty hacks to get rid of normal field behavior. A normal field is simply not the best choice to abstract that, as the ORM always expects it to have some sort of real column representation on a table (I guess the package needs the hacks to remove it from insert/update calls).

I ended up with something mildly Field-like, in that it looks like a Field and implements contribute_to_class, but it does not subclass Field or call cls._meta.add_field.

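from django.db import models
from django.db.models import ExpressionWrapper
from django.utils.functional import cached_property

# Attribute name under which each model class stores its DerivedField instances.
DERIVED_FIELDS_LIST = '_DF_derived_fields_list'


class DerivedFieldsManager(models.Manager):
    @cached_property
    def _derived_fields_as_annotations(self):
        # Resolved lazily, after the model class has been fully built.
        return {df.name: df.expression for df in getattr(self.model, DERIVED_FIELDS_LIST, [])}

    def get_queryset(self):
        return super().get_queryset().annotate(**self._derived_fields_as_annotations)


class DerivedField:
    def __init__(self, query_expression, output_field=None):
        self.query_expression = query_expression
        self.output_field = output_field
        self.name = None

    def contribute_to_class(self, cls, name, **kwargs):
        # Register on the model instead of calling cls._meta.add_field.
        self.name = name
        if not hasattr(cls, DERIVED_FIELDS_LIST):
            setattr(cls, DERIVED_FIELDS_LIST, [])
        getattr(cls, DERIVED_FIELDS_LIST).append(self)

    @property
    def expression(self):
        return ExpressionWrapper(self.query_expression, output_field=self.output_field)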

I still think someone should finish this concept, but I have deadlines to hit.