The changes in the old PR are pretty big compared to mine, and there are no tests there to motivate them, so it’s hard to tell whether my attempt is hopelessly naive or whether the previous PR was over-engineered or suffered from feature creep.
Maybe someone can come up with some nasty template test cases?
Reading the old PR, it seems there’s at least some feature creep. I think my proposal, being opt-in, will avoid some of the issues the old PR tried to handle. I also think we can do this in several steps:
1. implement the basic thing I did above
2. talk about the option and have people try it, but do NOT turn it on for startproject
3. if no problems are found, turn it on for startproject
4. deprecate the old thing
5. (very much later) hard switch and remove the old code
The regex for parsing template tags is the same for {% %}, {{ }} and {# #}. I’ve let the scope creep a bit in the PR and included all three. This simplifies the implementation, and imo increases the utility at the same time.
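To make the "same regex for all three" point concrete, here is a minimal sketch. It assumes the lexer pattern is the three-way alternation below and that the opt-in change amounts to compiling it with re.DOTALL; the variable names are only for illustration and are not taken from the PR.

```python
import re

# Assumption: one alternation covers block, variable and comment tokens.
PATTERN = r"({%.*?%}|{{.*?}}|{#.*?#})"

single_line_re = re.compile(PATTERN)            # "." stops at newlines
multi_line_re = re.compile(PATTERN, re.DOTALL)  # "." also matches newlines

template = "{% include 'a.html'\n   with x=1 %}{{ user\n|upper }}{# a\nnote #}"

# Every token in this template contains a newline, so the single-line
# pattern finds none of them and the DOTALL pattern finds all three.
single_tokens = single_line_re.findall(template)
multi_tokens = multi_line_re.findall(template)

print(single_tokens)
print(multi_tokens)
```

Because the three token types share one pattern, a single flag changes the behaviour of all of them at once, which is why including all three is the simpler implementation.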
One issue with introducing multi-line template tags is that it will exacerbate this existing issue with the verbatim tag. At the moment the issue is a fairly niche edge case, but with multi-line template tags it will apply to a much wider variety of situations.
Unlike the breaking change around template tags split between multiple lines, I can imagine situations where this may well cause problems in the wild. The whole point of the verbatim tag is that you should be able to use characters like {% etc.! (There is also nothing we can do about the other issue; it’s just part and parcel of introducing multi-line tags. The verbatim issue is fixable though, even if it’s difficult.)
The verbatim issue would be mitigated slightly if we restrict multi-line tokens to block-tokens only (and text-tokens, just not variable tokens or comment tokens)**. But even with this restriction, it still feels like we would be breaking verbatim a little more than is ideal.
I wonder if we could replace the current regex-based lexer with something a little more robust, and kill two birds with one stone. As the lexer currently stands, if you strip out comments it’s about 40 lines of logic, and it is really nicely self-contained. It would be something we could swap out without having to worry about knock-on effects.
** FWIW, I think we should make this restriction regardless - I’m not sure what the advantage of multi-line filter-expressions would be. It would certainly make the filter-expression regex, and split_contents() more complicated.
> if we restrict multi-line tokens to block-tokens only
To achieve this, the pattern should not be compiled with the re.DOTALL flag. Instead, it would be something like this:
tag_re = re.compile(r"(\{%\s*[\s\S]*?\s*%\}|{{.*?}}|{#.*?#})")
This matches single-line comment tokens and variable tokens, and multi-line block-tokens.
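A quick way to check that behaviour is to run the pattern against a small template string. This is only a sketch: the sample template is made up for illustration, and it exercises the regex on its own rather than the real lexer.

```python
import re

# The pattern proposed above, compiled WITHOUT re.DOTALL. Only the block
# alternative uses [\s\S], so only {% ... %} may contain newlines.
tag_re = re.compile(r"(\{%\s*[\s\S]*?\s*%\}|{{.*?}}|{#.*?#})")

source = (
    "{% if user.is_staff\n"
    "   and user.is_active %}"  # multi-line block token: matched
    "{{ user\n.name }}"         # multi-line variable token: not matched
    "{# one line #}"            # single-line comment token: matched
)

tokens = tag_re.findall(source)
for token in tokens:
    print(repr(token))
```

The multi-line `{{ ... }}` is not recognised as a token here, so in a lexer built on this pattern it would presumably fall through as plain text.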