I am interested in this project and have been researching it, primarily from the sources mentioned in the GSoC idea itself (SummerOfCode2023 – Django). Based on the Django docs on caching and the blog post written by Adam Johnson (cc @adamchainz), I think I have a reasonable understanding of what the current issues are and what has to be worked on. I'll try to summarise the issues with the default `DatabaseCache`, which are explained very well in the blog post and whose fixes are already in use in `django-mysql`. (Link to the blog post: Building a better DatabaseCache for Django on MySQL - Adam Johnson)
- Storage is done using the `TEXT` type. This was probably for historical reasons: at that point in time, some of the databases supported by Django may not have had a corresponding binary data format. As far as I know, all of them now have some sort of binary column type, which will definitely be useful for us.
- Another issue that Adam mentions (and has implemented a solution to in `django-mysql`) is that the `SELECT COUNT(*)` operation is quite expensive, and it is done on every `set()` operation to check whether the cache limit has been breached. The approach Adam provides is probabilistic: only a certain percentage of `set()` operations perform this check, which improves the overall efficiency. However, this means the cache can grow past the limit between checks.
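
To make the second point concrete, here is a minimal sketch of probabilistic culling as I understand the idea. It is a toy in-memory stand-in, not the code from Django or `django-mysql`: the class name `ProbabilisticCullCache`, the `cull_probability` parameter, and the `count_queries` counter are all names I made up for illustration, and `_count()` only plays the role of the `SELECT COUNT(*)` query.

```python
import random


class ProbabilisticCullCache:
    """Toy in-memory stand-in for a database-backed cache (hypothetical)."""

    def __init__(self, max_entries=300, cull_probability=0.05, rng=None):
        self.max_entries = max_entries
        self.cull_probability = cull_probability  # assumed name, illustrative
        self._rng = rng or random.Random()
        self._store = {}  # dicts keep insertion order, so oldest keys come first
        self.count_queries = 0  # how often the "expensive" count was run

    def _count(self):
        # Stands in for SELECT COUNT(*) on the cache table.
        self.count_queries += 1
        return len(self._store)

    def _cull(self):
        # Remove the oldest entries until the cache is back at the limit.
        excess = self._count() - self.max_entries
        for key in list(self._store)[: max(excess, 0)]:
            del self._store[key]

    def set(self, key, value):
        self._store[key] = value
        # Only a fraction of set() calls pay for the count and cull.
        if self._rng.random() < self.cull_probability:
            self._cull()


if __name__ == "__main__":
    cache = ProbabilisticCullCache(max_entries=100, cull_probability=0.05)
    for i in range(10_000):
        cache.set(f"key-{i}", i)
    print("entries:", len(cache._store), "count queries:", cache.count_queries)
```

With `cull_probability=1.0` this behaves like checking on every `set()`, while lower values trade fewer count queries for the possibility that the cache sits above `max_entries` until the next check happens.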
I am currently going through the codebases of both Django and `django-mysql` to better understand the current implementations and what we can do to improve on them.