GSoC '23 - Improving the Database Caching Backend Ideas

I am interested in this project and have been researching it, primarily from the sources mentioned in the GSoC idea itself (SummerOfCode2023 – Django). Based on the Django docs on caching and the blog post by Adam Johnson (cc @adamchainz), I think I have a reasonable understanding of the current issues and what needs to be worked on. I'll try to summarise the issues with the default DatabaseCache that the blog post explains very well, and whose fixes are already in use in django-mysql. (Link to the blog post: Building a better DatabaseCache for Django on MySQL - Adam Johnson)

  1. Values are stored in a TEXT column, so pickled values have to be base64-encoded before they are saved. This is probably for historical reasons: at the time, not every database supported by Django may have had a suitable binary type. As far as I know, all of them now have some kind of binary column type, which would definitely be useful here.

  2. Another issue that Adam mentions (and has implemented a solution for in django-mysql) is that the SELECT COUNT(*) query is quite expensive, and it runs on every set() operation to check whether the cache limit has been exceeded. Adam's approach is probabilistic: only a certain fraction of set() operations perform the check, which improves overall efficiency. However, this means the limit can be exceeded for a while between checks.
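To make the second point concrete, here is a minimal sketch of probabilistic culling as I understand the idea. It is a toy in-memory stand-in, not the django-mysql implementation: the class name, the `cull_probability` parameter, and the `count_queries` counter are all made up for illustration, and `_count()` only stands in for the expensive SELECT COUNT(*).

```python
import random


class ProbabilisticCullCache:
    """Toy in-memory stand-in for a database cache table (hypothetical)."""

    def __init__(self, max_entries=300, cull_probability=0.05, rng=None):
        self.max_entries = max_entries
        self.cull_probability = cull_probability
        self.rng = rng or random.Random()
        self.rows = {}
        self.count_queries = 0  # how many times the expensive count ran

    def _count(self):
        # Stands in for SELECT COUNT(*) on the cache table.
        self.count_queries += 1
        return len(self.rows)

    def _cull(self):
        # Drop the oldest-inserted rows until we are back at the limit.
        excess = self._count() - self.max_entries
        for key in list(self.rows)[:max(excess, 0)]:
            del self.rows[key]

    def set(self, key, value):
        self.rows[key] = value
        # Only a fraction of writes pay for the count and the cull.
        if self.rng.random() < self.cull_probability:
            self._cull()


if __name__ == "__main__":
    cache = ProbabilisticCullCache(max_entries=100, cull_probability=0.05)
    for i in range(10_000):
        cache.set(f"key-{i}", i)
    # Far fewer counts than writes; size may sit above 100 between culls.
    print(cache.count_queries, len(cache.rows))
```

With `cull_probability=1.0` this behaves like a check on every set(), and with lower values the table can overshoot `max_entries` until the next cull happens, which is the trade-off mentioned above.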

I am currently going through the codebases of both Django and django-mysql to understand better what the current implementations are, and what we can do as part of improving this.

Regarding the actual implementation of the code, one approach that struck me was the following -

We create a new class, django.core.cache.backends.db.GenericDatabaseCache, which will contain the structure common to all database cache classes. The database-specific cache classes will inherit from it and customise it as needed. One advantage is that when official support for a new database is added to Django, it would only require creating a new class that inherits from this one and implementing the database-specific parts.

For the different databases, we could create classes like django.core.cache.backends.db.MySQLDatabaseCache, django.core.cache.backends.db.PostgresDatabaseCache, etc. This will again help in the future, when newer databases are introduced. Along with this, we will keep the current default cache, django.core.cache.backends.db.DatabaseCache, as it is, to preserve backwards compatibility and avoid a breaking change.
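A rough sketch of how that hierarchy could look is below. This is only an illustration under my own assumptions, not existing Django code: the class names follow the proposal above, while `value_column_type`, `encode_value()`, `decode_value()`, and `BinaryValueMixin` are hypothetical names I made up. It is written without Django imports so it can be run on its own.

```python
import base64
import pickle


class GenericDatabaseCache:
    """Hypothetical shared base holding the backend-independent logic."""

    # Column type a subclass would use for the value column (illustrative).
    value_column_type = "TEXT"

    def encode_value(self, value):
        # Text-safe default: pickle, then base64 so it fits a TEXT column.
        pickled = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
        return base64.b64encode(pickled).decode("latin1")

    def decode_value(self, stored):
        return pickle.loads(base64.b64decode(stored.encode("latin1")))


class BinaryValueMixin:
    """Hypothetical mixin for databases with a binary column type."""

    def encode_value(self, value):
        # No base64 step: the raw pickle bytes are stored directly.
        return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)

    def decode_value(self, stored):
        return pickle.loads(stored)


class PostgresDatabaseCache(BinaryValueMixin, GenericDatabaseCache):
    value_column_type = "bytea"


class MySQLDatabaseCache(BinaryValueMixin, GenericDatabaseCache):
    value_column_type = "longblob"


if __name__ == "__main__":
    value = {"user": 1, "tags": ["a", "b"]}
    for backend in (GenericDatabaseCache(), PostgresDatabaseCache()):
        stored = backend.encode_value(value)
        # Both round-trip; the base64 text form is the larger of the two.
        print(type(backend).__name__, type(stored).__name__, len(stored))
```

The idea is that a new database would only need a small subclass that sets its column type and overrides whatever differs, while the existing DatabaseCache stays untouched.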

Hi! Just wanted to ping @adamchainz once more, for some inputs regarding this topic. Thank you!