RFC: Pre-compiling regular expressions as standard?

Most functions in the re module (e.g. re.sub) work in two stages: 1) compile the regex, and 2) run the compiled regex against the input.
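To make the two stages concrete, here is a minimal sketch using only the standard library. The pattern and input string are made up for illustration.

```python
import re

text = "a1b22c333"

# One call: compiles the pattern (or fetches it from re's internal cache),
# then runs it against the input.
one_step = re.sub(r"\d+", "#", text)

# The same two stages written out explicitly.
digits = re.compile(r"\d+")        # stage 1: compile
two_step = digits.sub("#", text)   # stage 2: run

print(one_step == two_step)
```

Both forms give the same result; the difference is only in when and how often stage 1 happens.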

It’s a fairly well-known performance boost to pre-compile regexes with re.compile, which lets step 1 happen ahead of time (usually during module import). However, Django’s internals are currently a mixture of pre-compiled patterns (using re.compile or _lazy_re_compile) and patterns compiled at usage time.

By pre-compiling patterns, we can trade import-time performance (or first-usage performance) for faster runtime performance. Some patterns, especially those in django.utils, are used quite often, so they are the most likely to benefit from pre-compilation.
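A rough sketch of the two options, eager compilation at import time versus compilation on first use. `LazyPattern` is a hypothetical helper written for this example and is not Django's `_lazy_re_compile` implementation; the pattern names are also made up.

```python
import re

# Eager: compiled once, when the module is imported.
WORD_RE = re.compile(r"\w+")


class LazyPattern:
    """Hypothetical helper: compile on first use, then reuse the result."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself,
        # e.g. .sub, .match, .findall.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return getattr(self._compiled, name)


# Lazy: nothing is compiled until the pattern is first used.
SPACES_RE = LazyPattern(r"\s+")
compiled_before_use = SPACES_RE._compiled is not None
collapsed = SPACES_RE.sub(" ", "a   b")
compiled_after_use = SPACES_RE._compiled is not None
```

The eager form pays the compile cost at import; the lazy form moves it to the first call, which keeps import time flat for patterns that are never used.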

Python’s re module keeps an internal cache of the most recently compiled patterns (up to 512). This likely means that performance is OK at the moment, especially on isolated microbenchmarks. However, even a small Django project can easily overflow that cache, resulting in churn and unnecessary recompilation.
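A small sketch of that churn. It relies on `re._MAXCACHE`, which is a private CPython implementation detail and an assumption here (the code falls back to 512 if the attribute is missing); the patterns are made up.

```python
import re

# Assumption: _MAXCACHE is a private CPython detail; fall back to 512 if absent.
cache_size = getattr(re, "_MAXCACHE", 512)

re.purge()  # start from an empty cache
first = re.compile(r"needle-\d+")
cache_hit = re.compile(r"needle-\d+") is first  # second call is served from the cache

# Compile more distinct patterns than the cache can hold.
for i in range(cache_size * 2 + 100):
    re.compile(rf"filler-{i}-\d+")

# The original entry has been pushed out, so this call compiles a new object.
evicted = re.compile(r"needle-\d+") is not first

print(cache_hit, evicted)
```

A pattern held in a module-level variable is unaffected by this, since it never goes back through the cache lookup.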

Therefore, I propose replacing patterns defined at runtime with patterns pre-compiled at module import time (or lazily). This should improve runtime performance, with minimal import-time overhead. Specifically, I suggest doing this as a single pass, rather than opening separate tickets per instance, app, etc., and updating the contributor guide accordingly. A recent PR reminded me that this is low-hanging fruit which automation will easily pick up on, rightfully or otherwise.

I’m interested in what other people think, and whether this is a useful refactor and policy change.

+1 to that

It would be possible to write a flake8 plugin that detects most uncompiled regexes through the ast.
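A minimal sketch of the detection part only, assuming the module is imported under the name `re`. It is not a registered flake8 plugin, and `UncompiledRegexVisitor`, `find_uncompiled` and `SAMPLE` are hypothetical names for illustration.

```python
import ast

# Module-level re functions that take a pattern as their first argument.
RE_FUNCS = {
    "match", "fullmatch", "search", "sub", "subn",
    "split", "findall", "finditer",
}


class UncompiledRegexVisitor(ast.NodeVisitor):
    """Collect calls of the form re.<func>(...) for the functions above."""

    def __init__(self):
        self.hits = []  # list of (line number, function name)

    def visit_Call(self, node):
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
            and func.attr in RE_FUNCS
        ):
            self.hits.append((node.lineno, func.attr))
        self.generic_visit(node)  # keep walking nested calls


def find_uncompiled(source):
    visitor = UncompiledRegexVisitor()
    visitor.visit(ast.parse(source))
    return visitor.hits


SAMPLE = '''import re
WORD = re.compile("[a-z]+")
def slug(s):
    return re.sub(" +", "-", s)
def words(s):
    return WORD.findall(s)
'''

hits = find_uncompiled(SAMPLE)
print(hits)
```

This misses aliased imports (`import re as regex`, `from re import sub`), which is part of why it would catch most rather than all cases.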

Ticket created:
