GSoC '23 - Improving the Database Caching Backend Ideas

anirudhprabhakaran3 · March 17, 2023, 1:48pm

I am interested in this project and have been researching it, mainly using the sources mentioned in the GSoC idea itself (SummerOfCode2023 – Django). Based on the Django docs on caching and the blog post by Adam Johnson (cc @adamchainz), I think I have a reasonable understanding of the current issues and what needs to be worked on. I'll try to summarise the problems with the default DatabaseCache, which the blog post explains very well and which django-mysql's cache backend already addresses. (Link to the blog post: Building a better DatabaseCache for Django on MySQL - Adam Johnson)

Values are stored in a TEXT column, so the default backend has to pickle each value and then base64-encode it. This was probably for historical reasons: at the time, some of the databases Django supported may not have had a suitable binary type. As far as I know, all supported databases now have some binary type, which would definitely be useful here.
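To make the cost of the TEXT column concrete, here is a minimal sketch that mirrors, in simplified form, the pickle-then-base64 round trip a text-based cache needs, compared with storing the pickled bytes directly in a binary column. The helper names are hypothetical and it uses only the standard library.

```python
import base64
import pickle


def encode_for_text_column(value):
    """Pickle a value, then base64-encode it so it can live in a TEXT column."""
    pickled = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
    return base64.b64encode(pickled).decode("ascii")


def decode_from_text_column(text):
    """Reverse encode_for_text_column(): base64-decode, then unpickle."""
    return pickle.loads(base64.b64decode(text))


def encode_for_binary_column(value):
    """With a binary column (e.g. BLOB or bytea) the pickled bytes can be stored as-is."""
    return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)


value = {"user_id": 42, "items": list(range(100))}
as_text = encode_for_text_column(value)
as_bytes = encode_for_binary_column(value)

# base64 produces 4 characters for every 3 input bytes (padded), so the
# TEXT form is about a third larger than the raw pickled bytes.
print(len(as_bytes), len(as_text))
assert decode_from_text_column(as_text) == value
```

Besides the size overhead, every get() and set() also pays for the extra encode/decode step, which a binary column avoids.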
Another issue Adam mentions (and has implemented a solution for in django-mysql) is that SELECT COUNT(*) is quite expensive, yet the default backend runs it on every set() operation to check whether the cache has exceeded its size limit (MAX_ENTRIES). Adam's approach is probabilistic: only a certain percentage of set() operations run this check, which improves overall efficiency. However, the cache can then temporarily grow beyond its limit.
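The trade-off can be sketched in memory, assuming a dict stands in for the cache table and len() stands in for SELECT COUNT(*). The class and its cull_probability parameter are hypothetical; max_entries and cull_frequency are loosely modelled on Django's MAX_ENTRIES and CULL_FREQUENCY cache options, and the culling itself is simplified (oldest entries first) rather than a copy of any real backend.

```python
import random


class ProbabilisticCullingCache:
    """Hypothetical in-memory stand-in for a database cache table.

    A dict plays the role of the table and len() plays the role of
    SELECT COUNT(*). The size check runs only on a random fraction
    (cull_probability) of set() calls instead of on every call.
    """

    def __init__(self, max_entries=300, cull_frequency=3,
                 cull_probability=0.01, rng=None):
        self.max_entries = max_entries
        self.cull_frequency = cull_frequency
        self.cull_probability = cull_probability
        self._rng = rng if rng is not None else random.Random()
        self._table = {}
        self.count_queries = 0  # how many times the "expensive" count ran

    def __len__(self):
        return len(self._table)

    def get(self, key, default=None):
        return self._table.get(key, default)

    def set(self, key, value):
        self._table[key] = value
        # random() returns a float in [0.0, 1.0), so 1.0 means "always check"
        # (counting on every set) and 0.0 means "never check".
        if self._rng.random() < self.cull_probability:
            self._cull_if_needed()

    def _cull_if_needed(self):
        self.count_queries += 1
        num = len(self._table)
        if num > self.max_entries:
            # Remove roughly 1/cull_frequency of the entries, oldest first
            # (by first insertion, since dicts preserve insertion order).
            for key in list(self._table)[: num // self.cull_frequency]:
                del self._table[key]


cache = ProbabilisticCullingCache(max_entries=100, cull_probability=0.01,
                                  rng=random.Random(0))
for i in range(10_000):
    cache.set(f"key:{i}", i)
# Far fewer count queries than set() calls, but between checks the
# table may hold more than max_entries rows.
print(cache.count_queries, len(cache))
```

Setting cull_probability to 1.0 recovers the check-on-every-write behaviour, which makes it easy to compare the two strategies side by side.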

I am currently going through the codebases of both Django and django-mysql to better understand the current implementations and what we can improve as part of this project.

anirudhprabhakaran3 · March 17, 2023, 1:53pm

Regarding the actual implementation, one approach that occurred to me is the following:

We create a new class, django.core.cache.backends.db.GenericDatabaseCache, containing the structure common to all database cache classes. Every database-specific cache class would inherit from it and customise it as needed. One advantage of this design is that if Django officially adds support for a new database, supporting it in the cache would only require a new subclass of this class.

For the individual databases, we could create classes such as django.core.cache.backends.db.MySQLDatabaseCache, django.core.cache.backends.db.PostgresDatabaseCache, etc., which would again make future databases easier to support. We would also keep the existing default cache, django.core.cache.backends.db.DatabaseCache, unchanged to preserve backwards compatibility, so this would not be a breaking change.
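The proposed hierarchy could look roughly like this sketch. Everything here is hypothetical: it does not import Django, the class names are the ones proposed above, and value_column_type() is an invented hook chosen to tie back to the TEXT-versus-binary point (LONGBLOB and BYTEA are MySQL's and PostgreSQL's binary types).

```python
from abc import ABC, abstractmethod


class GenericDatabaseCache(ABC):
    """Hypothetical base class holding the logic shared by all databases."""

    table_name = "cache_table"

    @abstractmethod
    def value_column_type(self):
        """Return the binary column type this database uses for cached values."""

    def create_table_sql(self):
        # The DDL is shared; only the value column's type varies per database.
        return (
            f"CREATE TABLE {self.table_name} ("
            "cache_key VARCHAR(255) PRIMARY KEY, "
            f"value {self.value_column_type()} NOT NULL, "
            "expires TIMESTAMP NOT NULL)"
        )


class MySQLDatabaseCache(GenericDatabaseCache):
    def value_column_type(self):
        return "LONGBLOB"


class PostgresDatabaseCache(GenericDatabaseCache):
    def value_column_type(self):
        return "BYTEA"


print(PostgresDatabaseCache().create_table_sql())
```

In this shape, supporting another database means writing one small subclass that fills in the hooks, while the shared logic stays in the base class.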

anirudhprabhakaran3 · March 23, 2023, 4:03am

Hi! Just wanted to ping @adamchainz once more, for some inputs regarding this topic. Thank you!

adamchainz · April 16, 2023, 7:42am

Hi!

Apologies for the delay… I know the proposal period is over; I hope you got one in.

I wasn’t expecting to add one cache backend per database backend. Rather, I expected that we could adapt the existing DatabaseCache using some lessons from django-mysql’s cache backend, or, if that turns out to be unfeasible, create a single new class that works with all databases.

We should be able to use the ORM to generate appropriate SQL that will work on any database backend, even third-party database backends. I used raw SQL in django-mysql but the ORM was more limited when I wrote that, nearly 8 years ago.

Anyway I think all such questions are open to more research, that would be part of the GSoC project. Perhaps efficiency really does need per-backend raw SQL.
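If per-backend raw SQL does turn out to be necessary, set() is a likely candidate: each database spells a single-statement "insert or overwrite" differently. As far as I know, the current DatabaseCache instead selects the existing row first and then issues an UPDATE or INSERT. The sketch below assumes a table named cache_table with cache_key, value and expires columns (hypothetical names). Only the SQLite statement is actually executed, via the standard-library sqlite3 module; the MySQL and PostgreSQL strings are shown for comparison and use the %s placeholder style of their usual drivers.

```python
import sqlite3

# Hypothetical per-vendor upsert statements for a cache table with columns
# (cache_key, value, expires). Placeholder style differs by driver.
UPSERT_SQL = {
    "mysql": (
        "INSERT INTO cache_table (cache_key, value, expires) VALUES (%s, %s, %s) "
        "ON DUPLICATE KEY UPDATE value = VALUES(value), expires = VALUES(expires)"
    ),
    "postgresql": (
        "INSERT INTO cache_table (cache_key, value, expires) VALUES (%s, %s, %s) "
        "ON CONFLICT (cache_key) DO UPDATE "
        "SET value = EXCLUDED.value, expires = EXCLUDED.expires"
    ),
    "sqlite": (  # requires SQLite 3.24+
        "INSERT INTO cache_table (cache_key, value, expires) VALUES (?, ?, ?) "
        "ON CONFLICT (cache_key) DO UPDATE "
        "SET value = excluded.value, expires = excluded.expires"
    ),
}


def cache_set(conn, vendor, key, value, expires):
    """Insert or overwrite one cache row in a single statement (no SELECT first)."""
    conn.execute(UPSERT_SQL[vendor], (key, value, expires))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache_table ("
    "cache_key TEXT PRIMARY KEY, value BLOB NOT NULL, expires REAL NOT NULL)"
)
cache_set(conn, "sqlite", "greeting", b"hello", 100.0)
cache_set(conn, "sqlite", "greeting", b"hi", 200.0)  # overwrites, no duplicate-key error
rows = conn.execute("SELECT cache_key, value, expires FROM cache_table").fetchall()
print(rows)
```

On the ORM side, QuerySet.bulk_create() with update_conflicts=True (Django 4.1+) generates this kind of upsert on the backends that support it, which may make hand-written SQL unnecessary for this particular query.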

Good luck!

anirudhprabhakaran3 · May 5, 2023, 3:15am

Hello!

Unfortunately, I did not submit a GSoC proposal. However, I am still interested in working on this topic. I hope that is fine!

As you mentioned, editing the existing class or creating one generic class does seem to be the better option; separate classes for each database might be overkill.

I’ll look into these things: modifying the current class, and using the ORM wherever SQL is required.

Thank you!

TashfikS · May 28, 2023, 8:57am

Hello everyone,
I hope this message finds you in good spirits. Is there any way I could contribute to this project?
