I am interested in this project and have been researching it, primarily from the sources mentioned in the GSoC idea itself (SummerOfCode2023 – Django). Based on the Django docs on caching and the blog post written by Adam Johnson (cc @adamchainz), I think I have a reasonable understanding of the current issues and of what needs to be worked on. I'll try to summarise the issues with the default DatabaseCache, which are explained very well in the blog post and whose fixes are already in use in django-mysql. (Link to the blog post: Building a better DatabaseCache for Django on MySQL - Adam Johnson)
Storage is done using the TEXT type. This was probably for historical reasons: at that point in time, some of the databases supported by Django may not have had a corresponding binary data type. As far as I know, all of them now have some kind of binary type, which would be useful here because the pickled values would no longer need to be encoded as text.
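To make the cost of text storage concrete, here is a minimal sketch using only the standard library. It assumes the text-based approach amounts to pickling the value and base64-encoding it so that it fits in a text column; the sample value is made up for illustration.

```python
import base64
import pickle

# Made-up sample value, purely for illustration.
value = {"user_id": 42, "tags": ["a", "b", "c"], "payload": b"\x00" * 300}

# Binary form: what could be written directly to a binary column.
pickled = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)

# Text form: the same bytes, base64-encoded so they are safe in a text column.
as_text = base64.b64encode(pickled).decode("latin1")

# Reading the text form back requires the extra decode step.
restored = pickle.loads(base64.b64decode(as_text.encode("latin1")))

print(len(pickled), len(as_text))
```

Base64 emits 4 characters for every 3 bytes, so the text form is roughly a third larger, and every read and write pays for the extra encode or decode step.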
Another issue that Adam mentions (and has implemented a solution to in django-mysql) is that the SELECT COUNT(*) operation is quite expensive, and it is run on every set() operation to check whether the cache limit has been breached. Adam's approach is probabilistic: only a certain percentage of set() operations perform this check, which improves the overall efficiency of set(). The trade-off is that the limit can be exceeded temporarily between checks.
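The probabilistic idea can be sketched with a toy in-memory cache. This is not the django-mysql implementation: the class name, the `cull_probability` parameter, and the oldest-first eviction are all assumptions made for illustration, and `_count()` stands in for the expensive SELECT COUNT(*).

```python
import random


class ProbabilisticCullCache:
    """Toy in-memory cache (hypothetical; not Django or django-mysql code)."""

    def __init__(self, max_entries=100, cull_probability=0.01, rng=None):
        self.max_entries = max_entries
        self.cull_probability = cull_probability
        self._rng = rng or random.Random()
        self._data = {}
        self.count_calls = 0  # how often the "expensive" count was run

    def _count(self):
        # Stand-in for SELECT COUNT(*) on the cache table.
        self.count_calls += 1
        return len(self._data)

    def _cull(self):
        # Evict the oldest-inserted keys until the cache is back at the limit
        # (dicts preserve insertion order).
        excess = len(self._data) - self.max_entries
        for key in list(self._data)[:excess]:
            del self._data[key]

    def set(self, key, value):
        self._data[key] = value
        # Only a fraction of writes pay for the count and the possible cull.
        if self._rng.random() < self.cull_probability:
            if self._count() > self.max_entries:
                self._cull()


cache = ProbabilisticCullCache(
    max_entries=100, cull_probability=0.05, rng=random.Random(0)
)
for i in range(10_000):
    cache.set(f"key-{i}", i)

print(cache.count_calls, len(cache._data))
```

With a probability of 0.05, only about one write in twenty runs the count, and the cache can grow past `max_entries` until the next check happens. Setting `cull_probability=1.0` recovers the check-on-every-write behaviour.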
I am currently going through the codebases of both Django and django-mysql to better understand the current implementations and what we can do to improve on them.
Regarding the actual implementation of the code, one approach that struck me was the following:
We create a new class, django.core.cache.backends.db.GenericDatabaseCache, which will contain the generic structure shared by all database cache classes. All other DatabaseCache classes will inherit from it and customise it as needed. One advantage is that when Django adds official support for a new database, supporting it would only require creating a new class that inherits from this one and implementing the database-specific parts.
For different databases, we could create classes like django.core.cache.backends.db.PostgresDatabaseCache, etc. This will again help in the future, when newer databases are introduced. Along with this, we will keep the existing default cache, django.core.cache.backends.db.DatabaseCache, as it is, to provide backwards compatibility and avoid a breaking change.
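A minimal sketch of the proposed hierarchy, in plain Python with a dict standing in for the cache table. The class names come from the proposal above; the `encode`/`decode` hooks, and making the existing DatabaseCache a subclass of the generic class, are assumptions for illustration only and do not reflect how Django is actually structured.

```python
import base64
import pickle


class GenericDatabaseCache:
    """Hypothetical base class: shared logic, with storage hooks to override."""

    def __init__(self):
        self._rows = {}  # stand-in for the cache table

    # Hooks that a per-database subclass could customise.
    def encode(self, value):
        return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)

    def decode(self, stored):
        return pickle.loads(stored)

    # Shared behaviour, written once in the base class.
    def set(self, key, value):
        self._rows[key] = self.encode(value)

    def get(self, key, default=None):
        if key not in self._rows:
            return default
        return self.decode(self._rows[key])


class PostgresDatabaseCache(GenericDatabaseCache):
    """Hypothetical subclass: inherits binary storage unchanged."""


class DatabaseCache(GenericDatabaseCache):
    """Stand-in for the existing default: keeps base64 text for compatibility."""

    def encode(self, value):
        return base64.b64encode(super().encode(value)).decode("latin1")

    def decode(self, stored):
        return super().decode(base64.b64decode(stored.encode("latin1")))


for backend in (PostgresDatabaseCache(), DatabaseCache()):
    backend.set("greeting", {"hello": "world"})
    print(type(backend).__name__, backend.get("greeting"))
```

Each subclass overrides only what differs for its database, while `get()` and `set()` live in one place.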
Hi! Just wanted to ping @adamchainz once more, for some inputs regarding this topic. Thank you!
Apologies for the delay… I know the proposals period is over, I hope you got one in.
I wasn’t expecting to add one cache backend per database backend. Rather, I thought we could adapt the existing DatabaseCache based on some lessons from django-mysql’s cache backend. Or, if that turns out to be unfeasible, create a single new class that works with all databases.
We should be able to use the ORM to generate appropriate SQL that will work on any database backend, even third-party database backends. I used raw SQL in django-mysql but the ORM was more limited when I wrote that, nearly 8 years ago.
Anyway, I think all such questions are open to more research, which would be part of the GSoC project. Perhaps efficiency really does need per-backend raw SQL.
Unfortunately, I did not submit a GSoC proposal. However, I am still interested in working on this particular topic. I hope that is fine!
As you mentioned, editing the existing class or making one generic class does seem to be the better option, and making individual classes might be overkill.
I’ll look into these things: modifying the current class, and using the ORM wherever SQL is required.
Hope this message finds you in good spirits. I would like to know if there is any way I could contribute to this project.