How to create a virtual/proxy model that hides storage details of some fields.

I have a model that represents run of a test:

class TestRun(models.Model):
    test = models.ForeignKey('Test', on_delete=models.CASCADE)
    regression = models.ForeignKey('Regression', on_delete=models.CASCADE)
    status = models.CharField(max_length=CHAR_MAX)

    class Meta:
        db_table = 'test_run'

Then there is a need to add other optional fields - most of them are just data fields and are not used for fillering data. I realized, that I can’t add all these fields to the model directly - as it will significantly affect speed of queries.

I end up with 2 models:

class TestRun(models.Model):
    test = models.ForeignKey('Test', on_delete=models.CASCADE)
    regression = models.ForeignKey('Regression', on_delete=models.CASCADE)
    filter_fields = models.JSONField(null=False, default=dict)
    fields = models.OneToOneField('TestFields', null=True, on_delete=models.CASCADE)

    class Meta:
        db_table = 'test_run'

# Storing too many data in a table that we use for filtering significantly slows down queries.
# Use separate model/table for a data that is not needed at filtering stage - to join it later.
class TestFields(models.Model):
    items = models.JSONField(null=False, default=dict)

    class Meta:
        db_table = 'test_fields'

But the problem is that I need to refactor all the code where TestRun model is used, while I changed just fields representation in DB.

Is it possible to create models like

# model that holds main (relationship) fields
class TestRunBase(models.Model):
    test = models.ForeignKey('Test', on_delete=models.CASCADE)
    regression = models.ForeignKey('Regression', on_delete=models.CASCADE)

class TestRunWrapper(TestRunBase):
    status = # get/store this field in TestRun.filter_fields
    duration = # get/store this field in TestRun.fields.items (=> in TestFields)

So I can use TestRunWrapper as I used TestRun before - create, save, bulk_create etc.

I looked at django-virtual-models, but it looks like it is focused on read path and doesn’t have a way to refer to related model for virtual fields.

On the surface, this is rarely a true statement. Extra fields in the database itself typically won’t affect the speed of a query, and you have the option of using the only and defer methods in your ORM queries to limit the fields being retrieved from the database.

Unless you can demonstrate that there is a real issue here, my initial reaction is that you’re just creating more work for yourself.

I use Django for write path / tests and separate queries generator for read path to create PostgreSQL queries and use them in Grafana.

I need to be able to store ~30 optional fields in TestRun model, but just a couple of them will be used for filtering. We created test data, measured query times and observed 2-4x increase.

With the approach I described here it is just 1.05-1.1x increase.

Fair enough. (It would be interesting to actually see the queries and models demonstrating this.)

I guess then, if you have a solution, what is the question you’re trying to answer here?

With the solution I have there is too big diff - as I need to create models and access their fields in a new way. I’m trying to find out whether it is possible to create a virtual model, that:

  1. Under the hood store its virtual fields in JSONField of real models (base and 1-on-1 related).
  2. (Optional) Doesn’t join related model right away, but only if its fields are referenced (so, perform filtering by the base model first, and then join related model as the very last step and lazily - only if these fields are accessed).

I do not really need (2) in Django, as I do not use Django for critical path (query DB from Grafana), but it is nice to have, as if I used Django for read, it would be mandatory requirement. Unfortunately, RDBMS doesn’t always optimize it the right way.

This is handled by the only and defer functions referenced above.

I’m still not sure I understand what you’re trying to do here.

You can create model methods that expose the data physically stored in the models as any type of object desired.

You also have the from_db and from_db_value method that can be used to change how a model is loaded, and the get_prep_value to take a Python value and format it for storage in the database. (Also see the docs for Custom Model Fields and Model Instance Reference for more details on this.)

However, none of these fundamentally affect how you query the data. The ORM is still generating SQL statements being issued to the database, which means it needs to reference the database in the manner that it is defined on the database. (I guess the get_prep_value could be used in some circumstances to help this, but I’m not sure it’s a generalized solution for what I think you’re trying to achieve.