Currently, the DTL lexer uses RegEx for tokenisation. This has a couple of consequences:
- In some scenarios it can cause excessive backtracking; see PR #19360.
- The current RegEx limits what the verbatim tag can handle in certain situations; see ticket #23424.
Also related is the ticket to allow multi-line template tags; see PR #18805.
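For context, the RegEx approach can be sketched as below. This is a minimal sketch, assuming a pattern of the same shape as Django's `tag_re` (one lazy `.*?` alternative per delimiter pair). `regex_tokenise` is a hypothetical helper that simply drops empty chunks instead of tracking Django's alternating text/tag state. Because `.` does not match a newline, a tag split across lines stays as plain text, which is the restriction PR #18805 is about. It also shows where the backtracking comes from: with many openers and no closer, each opener's `.*?` scans to the end of the line before failing, so the work grows roughly quadratically.

```python
import re

# Assumed to mirror the shape of Django's tag_re: {% ... %}, {{ ... }}, {# ... #},
# each with a lazy body that cannot cross a newline.
TAG_RE = re.compile(r"(\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\})")


def regex_tokenise(template):
    """Split a template into text and tag chunks (hypothetical helper)."""
    # The capturing group makes re.split keep the matched tags; empty
    # strings between adjacent tags are dropped for readability.
    return [bit for bit in TAG_RE.split(template) if bit]


print(regex_tokenise("Hello {{ name }}{% if x %}!{% endif %}"))
# ['Hello ', '{{ name }}', '{% if x %}', '!', '{% endif %}']

# "." does not match "\n", so a tag spanning two lines is left as text.
print(regex_tokenise("a {{ x\n }} b"))
# ['a {{ x\n }} b']
```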
I have investigated writing a lexer that avoids using RegEx (is ‘recursive descent’ the correct phrase?). Here’s the branch: see link.
It passes the current test suite with one failure: it allows newlines inside tags, which is the change proposed in PR #18805.
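A RegEx-free lexer can be sketched as a hand-written scanner (recursive descent usually refers to parsers, so "hand-written lexer" may be the more common term). This is not the code in the linked branch; it is a minimal sketch with hypothetical names (`scan_tokenise`, `CLOSERS`) that uses `str.find` to jump to each `{` and then to the matching closer. It keeps the lazy-match behaviour of the RegEx (a tag ends at the first closer after its opener) but, like the new lexer described above, lets tags span newlines. `{% verbatim %}` handling is deliberately left out. Caching the next position of each closer means unclosed openers do not trigger repeated scans to the end of the input, so the work stays linear.

```python
# Closing delimiter for each character that can follow "{" to open a tag.
CLOSERS = {"{": "}}", "%": "%}", "#": "#}"}


def scan_tokenise(template):
    """Split a template into text and tag chunks without using re."""
    bits = []
    n = len(template)
    text_start = 0  # start of the text chunk not yet emitted
    pos = 0  # where to look for the next "{"
    # Next known index of each closer; -1 means none occurs later.
    next_close = {}
    while True:
        start = template.find("{", pos)
        if start == -1 or start + 1 >= n:
            break
        closer = CLOSERS.get(template[start + 1])
        if closer is None:
            pos = start + 1  # a lone "{" is just text
            continue
        found = next_close.get(closer)
        # Search again only if the cached closer lies before this tag's body.
        if found is None or (found != -1 and found < start + 2):
            found = template.find(closer, start + 2)
            next_close[closer] = found
        if found == -1:
            pos = start + 1  # unclosed opener: leave it in the text
            continue
        if start > text_start:
            bits.append(template[text_start:start])
        end = found + len(closer)
        bits.append(template[start:end])  # the tag, newlines included
        text_start = pos = end
    if text_start < n:
        bits.append(template[text_start:])
    return bits


print(scan_tokenise("a {{ x\n }} b"))
# ['a ', '{{ x\n }}', ' b']
print(len(scan_tokenise("{{" * 100_000)))
# 1
```

The closer cache is the design choice that matters here: once a search for `}}` fails, no later `{{` can be closed either, so the scanner never rescans the tail of the template.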
Performance-wise, the results are mixed. In typical usage it is slower than the current RegEx implementation (about 35% slower in the var example, and roughly 2.7x slower in the mismatched example), but it can be significantly faster in some scenarios (almost 500x faster in the var_nested_open example). See the full benchmark comparison below.
I'd appreciate opinions and thoughts on whether this is worth investigating further.
Benchmark examples (see ticket #35675):

| Case | Current: total (s) | Current: per execution (s) | New: total (s) | New: per execution (s) |
|---|---|---|---|---|
| var | 0.000174 | 0.000002 | 0.000234 | 0.000002 |
| tag | 0.000169 | 0.000002 | 0.000246 | 0.000002 |
| comment | 0.000127 | 0.000001 | 0.000334 | 0.000003 |
| mismatched | 0.000084 | 0.000001 | 0.000229 | 0.000002 |
| var_nested_open | 1.661711 | 0.016617 | 0.003380 | 0.000034 |
| tag_nested_open | 0.683057 | 0.006831 | 0.003386 | 0.000034 |
| comment_nested_open | 1.653287 | 0.016533 | 0.003444 | 0.000034 |
| var_nested_closed | 0.004123 | 0.000041 | 0.000777 | 0.000008 |
| tag_nested_closed | 0.003450 | 0.000034 | 0.000377 | 0.000004 |
| comment_nested_closed | 0.003978 | 0.000040 | 0.000732 | 0.000007 |
| mismatched_nested | 0.709095 | 0.007091 | 0.003309 | 0.000033 |
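The headline ratios can be checked against the totals. The short script below is illustrative only: it re-enters the benchmark figures by hand and divides the current total by the new total for each case. With these numbers, the new lexer comes out about 1.3x to 2.7x slower on the four simple cases and roughly 5x to 490x faster on the nested ones.

```python
# Total times (seconds) copied from the benchmark results.
CURRENT = {
    "var": 0.000174, "tag": 0.000169, "comment": 0.000127,
    "mismatched": 0.000084, "var_nested_open": 1.661711,
    "tag_nested_open": 0.683057, "comment_nested_open": 1.653287,
    "var_nested_closed": 0.004123, "tag_nested_closed": 0.003450,
    "comment_nested_closed": 0.003978, "mismatched_nested": 0.709095,
}
NEW = {
    "var": 0.000234, "tag": 0.000246, "comment": 0.000334,
    "mismatched": 0.000229, "var_nested_open": 0.003380,
    "tag_nested_open": 0.003386, "comment_nested_open": 0.003444,
    "var_nested_closed": 0.000777, "tag_nested_closed": 0.000377,
    "comment_nested_closed": 0.000732, "mismatched_nested": 0.003309,
}


def speedups(current, new):
    """Return current/new per case; above 1 means the new lexer is faster."""
    return {case: current[case] / new[case] for case in current}


for case, ratio in speedups(CURRENT, NEW).items():
    if ratio >= 1:
        print(f"{case:<22} {ratio:6.1f}x faster")
    else:
        # Report slow cases as "N times slower" rather than a fraction.
        print(f"{case:<22} {1 / ratio:6.1f}x slower")
```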