Abstract base model for abstracting particular save behavior

I have a few models that have pairs of fields named like: text and rendered_text.

The former one(s) can contain LaTeX formulas, and what I’m doing is I’m saving those formulas already rendered (via server-side mathjax) as svgs in the field(s) whose name begins with rendered_. This makes things way easier on the frontend. Since I have several models that implement this behavior and on differently named fields, I tried to abstract away this behavior and I thought I could use an abstract model for this. This is what I came up with.

class TexRenderable(models.Model):
    class Meta:
        abstract = True

    def _get_renderable_field_pairs(self):
        ret = []
        for field in self._meta.get_fields():
            split_field_name = field.name.split("_", maxsplit=1)
            if split_field_name[0] == "rendered": # this is a target field
                target_field = field.name
                # look for a field with the same name minus the `rendered_` prefix
                source_field = self._meta.get_field(split_field_name[1]) # target field will contain rendered text from this field
                ret.append(
                    (source_field, target_field),
                )

        return ret

    def save(self, *args, **kwargs):
        creating = self.pk is None
        renderable_field_pairs = self._get_renderable_field_pairs()

        for (source, target) in renderable_field_pairs:
            # only re-render if the field's value has changed
            value_changed = creating or getattr(
                type(self).objects.get(pk=self.pk), source
            ) != getattr(self, source)
            if value_changed:
                rendered_content = get_rendered_tex(getattr(self, source))
                setattr(self, target, rendered_content)

        return super(TexRenderable, self).save(*args, **kwargs)

I’d like to hear your thoughts on this approach.

First of all, is this a good use case for abstract model inheritance? I was thinking maybe a mixin would be more appropriate given there are no fields defined inside of the base model class, but I don’t have much experience with mixins in python so I’m not sure.

Secondly, how does the actual implementation of the method look? There are a few things that feel hacky to me, like calling type(self) to get the model class. I haven’t worked with complex inheritance-related stuff in python, so I don’t (currently) know any better.

Bonus question:

Say I want a concrete model to inherit from TexRenderable as well as this other abstract model:

class AbstractItem(models.Model):
    course = models.ForeignKey(Course)
    topic = models.ForeignKey(Topic)

    class Meta:
        abstract = True

    def save(self, *args, **kwargs):
        if self.topic not in self.course.topics.all():
            raise ValidationError("Chosen topic doesn't belong to chosen course")
        return super(Item, self).save(*args, **kwargs)

this simply adds two foreign keys and does a check inside of save before actually saving. What would be the correct order to put them inside of the parentheses when declaring the child class? Does it even matter, or will the behavior of save be the same regardless? Will this even work, or will I get an error when calling save from a child instance? I know I could just try it out but this is a novel topic for me, so I really want to grasp how multiple inheritance works in python. I’ll do my research on the side too–if you feel this part is off-topic, just ignore it.

When I have used an ‘abstract base class’ I’ve found overriding / using the init_subclass hook useful. In your case, you could use it to save the list of renderable field pairs as a value on the class, recalculated once each time you start django, to save on recalculation for every ‘save’ invocation.
Re: the bonus question, I think you are asking whether:

class ConcreteItem(AbstractItem, TexRenderable):

will give you the right combination of save functions?
I don’t think so - I think one save function will clobber the other whichever way around they are.

I think a way to do this is for AbstractItem to derive from TexRenderable: the super().save() call will invoke the TexRenderable.save() after the validation check to update the rendered fields and then super(TexRenderable).save will be the models.Model save. (I certainly don’t fully understand python class inheritance perfectly - hoping someone will correct me)

I haven’t talked about type(self).objects.get(pk=self.pk). I think its fine? I tend to use self.class but I can’t justify that. It doesn’t look to me like this would be vulnerable to race conditions (?), but maybe it would be an issue if two people were working on the same document and hit save at the same moment(?) Good luck.

1 Like

Coming from a relational-modelling background, I always start my model design by defining my tables - and building the models from that.

Do you have a data model that you’re building from, or are you starting with creating the Django Model classes? If so, I’d be concerned that you’re building something that won’t translate well into the relational model.

Only after I have a solid data model do I start looking at class implementations and possible inheritance. (I find it easier to work in that direction.)

As a matter of fact, I do. What I’m trying to do here is to abstract away some of the common behvior that the models share. A lot of the entities at play in my app happen to have text fields that may contain TeX formulas and I want to have that text saved pre-rendered. The original, raw text would be also kept to allow the users to edit it in the UI.

Maybe what puzzles you is the need of having pairs of fields that need to be kept in sync, which goes against some of the general rules of sound database design. However, I think this is actually a fair use case of denormalization in my case. Here’s the rationale:

I need to serve text that contains TeX formulas and I can’t have them rendered by the client. It just brings in too many possible issues with the user experience and I don’t want to have to account for all of that. Therefore, the only option is to do the rendering on the server. However, I can’t render the text on demand when I show the formulas to a user, because server-side mathjax rendering is hella slow. It takes up to a minute to render a couple of formulas. So the only thing I’m left with is to pay a little computational price when such a text is inserted or updated (it’s okay; it’s not on the hot path–showing them fast later is) and to keep both the raw text and the processed one in different fields.

Of course, this is all I could think of. If you have other ideas, I’m very happy to hear and try out different things.

As for the second abstract model, I was just trying to factor out two fields and validation logic that is shared between two different models. Necessary? Definitely not, but I thought if would have been cool if I managed to make it work.

Thanks for taking the time to explain. Yes, I absolutely agree with you here.

That’s generally how I approach this situation. I prefer the mixins, particularly when I have multiple different sets of functionality to add to Models on a mix & match basis. (I’m not sure I have any that don’t supply fields though.)

What I don’t think I’m following is why you think it’s preferable to not include the fields in your mixins / base classes. If you need to access that data under different names, you can always add a model method to the class to retrieve the data under a different name.

As far as identifying specifics, Python features such as type(self) and Django features like self._meta.fields are the appropriate ways to retrieve this type of introspective data.

For another approach, if you look at some of the supplied mixins in packages like Django, django-extensions, django-extra-views, django-treebeard, etc, you’ll find many of them use class variables for “configuration”. (For example, think of the ‘model’ attribute used in things like CBVs and forms to identify the associated model. The parallel isn’t perfect, because that information isn’t available any other way, but it is a common pattern in this type of situation.)

If you’re concerned about the sequence of calls within a multi-inheritance class hierarchy, read up on the “Method Resolution Order” (MRO) in the Python docs. (If you’re not familiar with it, see 3. Data model — Python 3.12.0 documentation and The Python 2.3 Method Resolution Order | Python.org, along with most anything else you can find by searching on the topic.)

I’ll add a more detailed response later, but the issue is some models have more than one pair of such fields. For example, I have a question model that has a text field and a help_text field, both with their corresponding rendered field. If I hard coded the fields in the base class, (say three pairs) I would lose the ability to have an arbitrary amount of those fields (which is fine, since I already know what my models are gonna look like), but for all the models that have less renderable fields than that number, I’m wasting those db columns.

A very knowledgeable guy I spoke to this morning advised I could remove the base class altogether, have an attribute in my models called renderable_field_names (list of strings), and then use a post_save signal to run the rendering logic. This will work well because after all, I need to render the tex only after I know everything went well with the saving, but then I lose the ability to only re-render the fields whose value changed. Is there a way to tell if a value was updated inside of a post save signal handler?

So yes, a configuration attribute could be used to identify pairs of fields for an object that your mixin makes reference to.
e.g.: field_pairs = (('help', 'help_text'), ('raw', 'converted_text'), ...)

Another option in this case is that this becomes a related table that contains a pair of fields with the raw and converted text along with a Generic ForeignKey to the base class - and possibly some meta data such as a “logical field name” and “sequence number”. (There are a couple different ways to model this - this is just one.)

1 Like

I hadn’t thought of using generic foreign keys–I’m a bit afraid this would be overkill, though. Especially as the rendered text(s) will need to be accessed frequently and therefore the access needs to be as fast as possible (which isn’t generic relationships’ forte).

Honestly, the idea I like the best and gravitate towards is having the renderable fields’ names in a configuration variable like we talked about and then using the post_save signal to do the rendering. The only thing I’m left having to figure out is how to track if the fields’ contents changed since the last save. It’d be a shame to have to redo the rendering for no reason every time the model is saved.

Do you have any ideas as far as that?

Actually, it’s not that bad if you’re doing the reverse access. (Finding the objects related to a base object from the base object.) The ContentType for your base class is almost certainly going to be cached, so there won’t be any IO required for that reference.
What you’re really going to lose is the automatic integrity of having those related items deleted if the base object instance is deleted. But yes, if the number of instances per class is a fixed number, then GFKs aren’t necessary and likely not the best fit.

There’s another post from a different discussion on this site showing how you can use the from_db method in your model to track the original value of a field when the model is retrieved. See - `RecursionError` when deleting a model instance in Admin - #12 by KenWhitesell. (Might be worth reading the entire thread for context.) You could implement that method in your code and compare the original field to what is in the model at the time of the save to determine whether or not you need to rerender the text.

(Side note - if the rendering of text is that slow, you might also want to consider spinning that process off as a celery task to not slow down the person submitting the original text, unless they need to see it “right then”.)

1 Like

That’s been on my mind for quite a while–I just couldn’t justify adding celery to the project altogether just for that. But now that I added it for the pdf generation process (as per my other recent thread), it’s easier to just move the rendering to an async task. The only thing I need to be careful with is error handling: the rendering process shouldn’t generally fail (if supplied with invalid TeX code, it just produces glitched graphics but that’s not an error), but I need to account for whether something catastrophic happens (i.e. an error with the actual python call to subprocess.check_output).

I’ll take a look at the thread you linked and report back once I tried something. Thank you!

Here’s what I did:

the models that use the rendering behavior define an attribute such as:

 renderable_tex_fields = [
        ("text", "rendered_text"),
        ("solution", "rendered_solution"),
    ]

and they inherit from this class:

class TrackRenderableFieldsMixin(models.Model):
    class Meta:
        abstract = True

    @classmethod
    def from_db(cls, db, field_names, values):
        instance = super().from_db(db, field_names, values)
        for (source, _) in cls.renderable_tex_fields:
            setattr(instance, f"_old_{source}", getattr(instance, source))

        return instance

then in the signal handler:

@receiver(post_save)
def render_tex_fields(sender, instance, created, **kwargs):
    if not hasattr(sender, "renderable_tex_fields"):
        return

    re_rendered_field_values = {}
    for (source, target) in sender.renderable_tex_fields:
        value_changed = created or (
            getattr(instance, source) != getattr(instance, f"_old_{source}")
        )
        if value_changed:
            rendered_content = render_tex(getattr(instance, source))
            re_rendered_field_values[target] = rendered_content

    # use `update` to prevent calling `save` again and entering a loop
    sender.objects.filter(pk=instance.pk).update(**re_rendered_field_values)

Seems to be working like a charm. Thank you!