Management command to clear expired cache entries from DatabaseCache backend

I recently added some simple caching to an app I was working on, just using the DatabaseCache backend, but noticed that there doesn’t seem to be a built-in way to clear expired cache entries, other than when an expired entry is hit.

Could this be something we provide as a management command, similar to the clearsessions one? It seems like it could be of value to me.

As the database table isn’t a model, it’s not trivial for end users to write a custom one themselves, as all access to the cache table is done with raw SQL directly via the cursor.

I had a quick look at trac, but couldn’t see a relevant ticket previously discussed. Let me know if I’ve missed one, though!


There doesn’t seem to be an external API available to do this manually, but I don’t believe your assumption is correct.

From Django’s cache framework documentation:

Instead, expired cache entries are culled each time add(), set(), or touch() is called.

This happens every time those methods are called, not just when the expired entries themselves are referenced. (See the source code in django.core.cache.backends.db.DatabaseCache, both the _cull method and its usage in _base_set.)

Personally, I’m not sure that there’s really a need to create a separate management command for this. What do you see as the issue or problem that would be resolved by doing this? (But yes, you could either call _cull directly - or delete the rows yourself.)

I don’t think this is actually true. _cull is only called in _base_set if the number of entries is above _max_entries (which defaults to 300), not on every add/set/touch, so it’s not quite the same as periodically clearing out expired entries. Also, _cull does more than just clear expired entries: it reduces the table by 1/cull_frequency (or empties it completely) too, regardless of expiry dates.

What do you see as the issue or problem that would be resolved by doing this?

I guess originally I was imagining the cache would continuously grow larger without periodic removal of expired entries—but with a cache limit, this is only true until that limit is reached. I would expect most other backends to automatically clear up expired entries themselves regardless of cache limits (within the redis process, for example), so was surprised there isn’t a mechanism to do the same for the database backend.

Yes, this is correct - which is why I asked about the underlying concern. If it’s just a cache size issue, then this distinction is not nearly as important as it would be if there were an accuracy / reliability / data-correctness issue associated with expired cache data.

If _cull_frequency is not 0 (3 is the default), then _cull first deletes the expired entries. Then, if (and only if) the remaining number of entries is larger than max_entries, it will remove 1/cull_frequency of the entries.

So, if your _max_entries is 300 (yes, the default - but it can be set), and the cache is at 310 entries with 20 that are expired, then when one of add/set/touch is called, only those 20 will be deleted. The other 290 unexpired entries will remain unaffected.

Hmmm… On the other hand, if there are only 4 expired keys in that set, those 4 would be removed, followed by 102 unexpired entries (1/3 of 306), leaving 204. I guess that’s reasonable: if you have defined a maximum number of entries you want in the cache, you may need to remove unexpired entries to free up space. Since both of those values (max_entries and cull_frequency) are configurable, you can decide how many will be deleted with each cull.
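To make that arithmetic concrete, here’s a simplified stand-in for the culling logic in plain Python. The real _cull works in SQL against the cache table; this model just mirrors the order of operations described above:

```python
def cull(entries, now, max_entries=300, cull_frequency=3):
    """Simplified model of DatabaseCache._cull; entries maps key -> expiry time."""
    if cull_frequency == 0:
        return {}  # CULL_FREQUENCY = 0 means "empty the whole cache when full"
    # Step 1: always delete the expired entries first.
    entries = {key: exp for key, exp in entries.items() if exp >= now}
    # Step 2: only if still over the limit, drop 1/cull_frequency of the rest.
    if len(entries) > max_entries:
        num_to_delete = len(entries) // cull_frequency
        for key in list(entries)[:num_to_delete]:
            del entries[key]
    return entries


# 310 entries, 20 expired: only the 20 expired entries are removed.
entries = {f"k{i}": (5 if i < 20 else 100) for i in range(310)}
assert len(cull(entries, now=10)) == 290

# 310 entries, only 4 expired: 4 expired + 102 unexpired (1/3 of 306), leaving 204.
entries = {f"k{i}": (5 if i < 4 else 100) for i in range(310)}
assert len(cull(entries, now=10)) == 204
```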

I don’t - I would expect it to be based on the storage mechanism’s intrinsically available features. Yes, Redis and Memcached both provide options to define a life-span for entries. However, none of the database, file-system, or local-memory backends has that feature. This means that the expiration of entries does require some “trigger” to make it happen.

Note: Both Redis and Memcached actually work this way in part as well.

Redis does not automatically delete keys when they expire - they’re actually deleted when an attempt is made to retrieve them. (There is also a periodic cleanup of some portion of the keys with expiration times - see the Redis EXPIRE documentation.)

Memcached does something similar - see the FAQ in the Memcached documentation.

So essentially, all the backends do things in a similar way, from a perspective outside the cache backends. It’s just a question of whether it’s Django or the backend doing it.

Both those documents demonstrate exactly what I was asking about for the database backend: both systems implement an out-of-band periodic cleanup of expired cache items. That’s all I was suggesting - that a management command be provided to do the same for the database backend.

I hadn’t looked at the other built-in backends, but perhaps such a thing would be useful for those, too.

So essentially, all the backends do things in a similar way, from a perspective outside the cache backends. It’s just a question of whether it’s Django or the backend doing it.

Yes, they do: they all provide a periodic cleanup - at least, the ones with a third-party dependency do :wink: My question was “why not this for the Django-provided backends too?”

I do understand what you’re saying about the cache limit and keys being expired once that limit is near/reached. But if that were considered a suitable solution by itself, then neither Redis nor Memcached would bother with the periodic cleanup, surely…

To be clear:

Yes, I agree this could be done. Whether or not it should is an open question.

<opinion>
Personally, I’m not seeing the value - I don’t see what the benefit of running it would be. Even if I were using the database backend on a Raspberry Pi, I wouldn’t be personally concerned about an extra 300+ rows.

(The django-extensions package already has a clear_cache command to completely flush the cache, but I see that as filling a different need than what you’re proposing.)
</opinion>
(The above does not mean that I think it shouldn’t be done - only that I don’t foresee any situation where I would be using it.)

Side note: Other than in some limited testing situations, I have never used either of those two backends. I only use Redis / Valkey (now), or Memcached (historically). I don’t see the desirability of using either of the other two in a deployment environment.

As I see it, Redis and Memcached have fundamentally different uses than databases and file systems. While all four can be used for cache, only two are fundamentally designed for that purpose.

  • Redis and Memcached

    • Designed for temporary (non-persistent) use.
    • Designed to be constrained to a limited amount of memory
    • Provide an interface allowing for the definition of a “time-to-live” (TTL) attribute
    • Will periodically, or when needed, remove expired entries.
      • Can be configured to remove unexpired entries when necessary as an alternative to having the write operation fail.
  • Database and File System

    • Designed for persistent storage
    • Usually unconstrained by any size smaller than the file system itself, which is typically orders of magnitude larger than available memory.
    • Do not define an attribute analogous to a TTL
    • Will only delete files or rows when explicitly instructed
    • User-space applications (e.g. Django) must manage expiration.
    • Will not arbitrarily remove data when full - will always return a “write error”.

Another factor to consider with both PostgreSQL and SQLite is that deleting rows does not necessarily mean that the space is going to be released back to the operating system. It’s not until VACUUM (or VACUUM FULL on PostgreSQL) is run that the table is physically shrunk. So frequent cleaning of “expired” / “marked deleted” rows isn’t really going to minimize space utilization until the VACUUM is performed.
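This is easy to demonstrate with SQLite from the standard library (the table and file names here are invented for the demo):

```python
import os
import sqlite3
import tempfile

# Fill a throwaway table, delete every row, and compare the file size
# before and after VACUUM.
path = os.path.join(tempfile.mkdtemp(), "cache_demo.sqlite3")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE my_cache_table (cache_key TEXT, value BLOB)")
conn.executemany(
    "INSERT INTO my_cache_table VALUES (?, ?)",
    [(f"key{i}", b"x" * 1024) for i in range(1000)],
)
conn.commit()

conn.execute("DELETE FROM my_cache_table")  # rows gone...
conn.commit()
size_after_delete = os.path.getsize(path)   # ...but the pages are still allocated

conn.execute("VACUUM")                      # now the file is physically rewritten
conn.close()
size_after_vacuum = os.path.getsize(path)
```

On a typical run the file stays at roughly a megabyte after the DELETE and only shrinks once VACUUM rewrites it.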

The file-based backend is worse. The TTL is stored in the file itself, which means that Django has to open and read the file just to determine whether or not an entry has expired.
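A hedged sketch of that shape - the real file format differs in its details (Django also compresses the payload), but it illustrates why checking expiry costs an open-and-read per file:

```python
import os
import pickle
import tempfile
import time


def write_entry(path, value, timeout):
    """Mimic a file-based cache entry: expiry timestamp first, then payload."""
    with open(path, "wb") as f:
        pickle.dump(time.time() + timeout, f)  # the TTL lives inside the file
        pickle.dump(value, f)


def is_expired(path):
    """The file must be opened and read just to answer 'has this expired?'."""
    with open(path, "rb") as f:
        expiry = pickle.load(f)  # only the leading timestamp is needed
    return expiry < time.time()


path = os.path.join(tempfile.mkdtemp(), "entry.djcache")
write_entry(path, "hello", timeout=60)
fresh = is_expired(path)    # False - expires a minute from now
write_entry(path, "hello", timeout=-1)
stale = is_expired(path)    # True - already past its expiry
```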