Performance Improvement 5 - Cache compiled regexes #995

Open · JCWasmx86 wants to merge 13 commits into master

Conversation

JCWasmx86 (Contributor)

Even though the regex module has a cache, its access is not that fast. E.g. re.sub is a combination of re._compile and pattern.sub, and re._compile checks the flags for e.g. DEBUG or verbosity. It uses an enum for the flags, and enum.__and__ is quite slow, so even for a cache hit we have around two to three enum.__and__ calls. This patch caches commonly used regexes. The naming probably has to be adjusted.

"Commonly used" is defined as: _compile is hit with this pattern+flags combination more than 1000 times when running against the netbox repo. That somewhat balances out the time needed for compilation, the runtime speed, and the maintenance effort.
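A minimal sketch of the idea (the name and pattern below are illustrative, not the ones from the patch): hot pattern+flags pairs are compiled once at import time, and call sites use the precompiled objects directly instead of going through regex._compile and its flag checks on every call.

import regex as re

# Compiled once at import time; illustrative name and pattern only.
DOUBLE_SPACE_RE = re.compile(r"\s{2,}", flags=re.M)


def collapse_whitespace(text: str) -> str:
    # Call sites use the precompiled pattern instead of re.sub(pattern, ...).
    return DOUBLE_SPACE_RE.sub(" ", text)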

Before:

[profiler output screenshot]

regex._compile is called 392,678 times, taking 40% of the time
enum.__and__ is called 1,038,531 times, taking 20% of the time

After:

[profiler output screenshot]

regex._compile is called 7,226 times, taking 20% of the time
enum.__and__ is called 267,691 times, taking 8-9% of the time (that's still a lot)

This can probably be improved for more regexes, but I think it's reasonably balanced at this point. The compilation time for these regexes on my PC is around 0.1s.

Timings:
Netbox, parallel: 2.5-2.8s
EDX Platform, parallel: 19s

This is the final patch of the performance improvement patch series. There are probably more improvements to be had, but they are not as low-hanging as in the last 6 patches.

@JCWasmx86 JCWasmx86 changed the title perf: Cache compiled regexes Performance Improvement 5 - Cache compiled regexes Nov 1, 2024

netlify bot commented Nov 1, 2024

Deploy Preview for djlint canceled.

Latest commit: f41b27e
Latest deploy log: https://app.netlify.com/sites/djlint/deploys/6730d186b3147700080b1c18

@oliverhaas (Contributor)

oliverhaas commented Nov 2, 2024

Hi. I came here mainly to apologize that I broke your branch with my PR. Sorry for that!
I appreciate the work you've put into djlint in your last PRs; I just ran some quick tests earlier and you got some really good numbers out of djlint!

I'm not sure if I understand all the changes, but I naively thought that your approach seems a little bit complicated, and I was gonna suggest to just use lru_cache (with what I assumed was a small performance penalty), basically like this:

# Call this module something like `regex.py` or `regex_custom_wrappers.py`

import re
from functools import lru_cache


def search(regex, text, use_cache: bool = True, flags: int = 0, **kwargs):
    if use_cache:
        re_compiled = _compile_cached(regex, flags=flags)
        return re_compiled.search(text, **kwargs)
    return re.search(regex, text, flags=flags, **kwargs)


# ... more regex functions (sub, match, finditer, ...)


@lru_cache(maxsize=256)
def _compile_cached(regex, flags: int = 0) -> re.Pattern:
    return re.compile(regex, flags=flags)

and then it's basically just find & replace. I got about 20% faster formatting, I think, but I didn't want to spend too much time before checking in with you.
Do you think explicitly storing the compiled regexes, as in your approach, is worth it? Maybe it's nicer to be more explicit, but I'm quite fond of lru_cache, so just let me know.
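A sketch of the find & replace at a call site (the pattern is illustrative, and it assumes the wrapper above was saved as regex_custom_wrappers.py):

import re

import regex_custom_wrappers  # the wrapper module sketched above

line = "{% comment %} skip me {% endcomment %}"

# before: stdlib call, pays for re._compile's flag handling on every call
m1 = re.search(r"\{%-?\s*comment", line, flags=re.I)

# after: identical call shape; the compiled pattern comes from the lru_cache
m2 = regex_custom_wrappers.search(r"\{%-?\s*comment", line, flags=re.I)

assert m1.group() == m2.group()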

@JCWasmx86 (Contributor, Author)

JCWasmx86 commented Nov 2, 2024

> Hi. I came here mainly to apologize that I broke your branch with my PR. Sorry for that!

No worries :)

> I appreciate the work you've put into djlint in your last PRs; I just ran some quick tests earlier and you got some really good numbers out of djlint!

Thanks, I hope I made your workflows faster :p

> I'm not sure if I understand all the changes, but I naively thought that your approach seems a little bit complicated, and I was gonna suggest to just use lru_cache (with what I assumed was a small performance penalty), basically like this:

I have an even better idea combining both approaches:

import regex as re


def search(pattern: str, text: str, flags: int = 0, **kwargs):
    # _compile_cached is the cached compile helper (see below)
    return _compile_cached(pattern, flags=flags).search(text, **kwargs)


# Somewhere around the formatting entry point:
old_search = re.search
re.search = search
format_it()
re.search = old_search

(Just the signatures have to match.) I would make the cache as big as possible (so, I think @cache?). If we monkeypatch the regex module, it should be quite trivial, and it shouldn't matter much: djLint is first and foremost a CLI tool and not a library, so it's fine to do that.
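A minimal sketch of that cached compile helper with an unbounded cache (functools.cache is just lru_cache(maxsize=None); the name matches the snippets above):

from functools import cache

import regex as re


@cache  # unbounded, i.e. lru_cache(maxsize=None); the set of patterns stays small
def _compile_cached(pattern: str, flags: int = 0):
    return re.compile(pattern, flags)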

I have no clear preference; I just did it like this because my thought process was:

  • Ok, this regex is used a lot (21k+ times) => Let's cache it
  • Ok, this regex is used a lot => Let's cache it

And so on.

So we have three things we could do:

  • Stay with the current code
  • Use your approach
  • Use the approach I proposed here

That's probably on monosans to decide :)

@JCWasmx86 (Contributor, Author)

After thinking a bit more, I think your solution, @oliverhaas, is superior in every way. I've implemented it like you suggested (I added you as co-author, since you made substantial improvements).

@JCWasmx86 (Contributor, Author)

I've added a few extra commits that optimize things that weren't visible earlier. Based on the results for edx-platform:

clear&&git stash; time djlint . --reformat --lint >/dev/null 2>&1
1.35.2:
real	3m16.714s
user	8m42.154s
sys	0m0.378s
1.35.3:
real	0m21.490s
user	1m54.587s
sys	0m0.701s
1.35.4:
real	0m15.429s
user	1m38.236s
sys	0m0.875s
HEAD+patch:
real	0m11.467s
user	0m51.634s
sys	0m0.659s


@oliverhaas (Contributor)

oliverhaas commented Nov 2, 2024

(Ignore my previous comment if you had seen it)

Here's some stuff I've tried (average of 10 runs reformatting edx-platform):

  • Current master: 7.80s
  • This PR: 5.81s
  • Use re module instead of regex: 4.89s
  • Remove "our" manual caching/@cache: 4.90s
  • Revert using .compile() and just let re do the caching: 4.75s

So as far as I can see, it would probably be best to use re and revert most of the manual caching, but let me know if I missed something or if you see different behavior. I would be happy to take care of a PR as well :).

@JCWasmx86 (Contributor, Author)

JCWasmx86 commented Nov 2, 2024

It seems there is an issue with the re module (BTW, how many cores do you have? 6 cores / 12 threads?):

concurrent.futures.process._RemoteTraceback:                                                                                                                                                                                                  
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.12/concurrent/futures/process.py", line 263, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/__init__.py", line 459, in process
    output["format_message"] = reformat_file(config, this_file)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/reformat.py", line 64, in reformat_file
    beautified_code = formatter(config, rawcode)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/reformat.py", line 32, in formatter
    expanded = expand_html(compressed, config)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/formatter/expand.py", line 62, in expand_html
    html = regex_utils.sub(
           ^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/regex_utils.py", line 35, in sub
    return _compile_cached(regex, flags=flags).sub(repl, text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/regex_utils.py", line 46, in _compile_cached
    return re_.compile(regex, flags=flags or 0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/__init__.py", line 228, in compile
    return _compile(pattern, flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/__init__.py", line 307, in _compile
    p = _compiler.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_compiler.py", line 745, in compile
    p = _parser.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_parser.py", line 979, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_parser.py", line 460, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_parser.py", line 544, in _parse
    code = _escape(source, this, state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/re/_parser.py", line 443, in _escape
    raise source.error("bad escape %s" % escape, len(escape))
re.error: bad escape \K at position 14 (line 1, column 15)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.local/bin/djlint", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/user/.tmp/djLint/djlint/__init__.py", line 414, in main
    file_errors.append(future.result())
                       ^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
re.error: bad escape \K at position 14 (line 1, column 15)

Or if I "fix" it with ChatGPT (My regex skills are basic):
re.error: look-behind requires fixed-width pattern

You would probably have to port the regexes that don't work. And if I understand correctly, re is the old module and regex is the newer one, so I'm not sure how much sense it makes to use an older library with fewer features. Sure, more performance (if somebody can get it working) is nice, but whether to go back depends on the opinions of the maintainers. Furthermore, regex has "more thorough Unicode support" (to quote https://docs.python.org/3/library/re.html), so I think re may not make sense if it breaks some Unicode handling.
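For reference, the incompatibility in a nutshell: \K (which resets the start of the reported match) is supported by the regex module but rejected by re, which is exactly the "bad escape \K" error above.

import re

import regex

pattern = r"foo\Kbar"  # \K: report only what was matched after it

print(regex.search(pattern, "foobar").group())  # -> "bar"

try:
    re.search(pattern, "foobar")
except re.error as exc:
    print(exc)  # re.error: bad escape \K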

But I think the best way would be to first merge this PR and then build stuff upon it (imo)

@oliverhaas (Contributor)

Weird, I did not get an error, but I only reformatted the edx-platform repo on one system and haven't tested anything else.

For me it's hard to keep track of whether regex is really still the "new stuff", or whether the important stuff got ported to re, especially since both have existed for so long and are (basically?) fully compatible.

From me it's definitely a thumbs up for merging.

(Off-topic: I just got a 16-core/32-threads CPU for my old retired workstation for fairly cheap. Haven't benchmarked djlint specifically, but compiling or running tests is literally more than 4 times faster compared to my 4-core laptop, which actually is noticeably making my workflows easier.)

@monosans (Member)

monosans commented Nov 3, 2024

We will not replace regex with re, as re lacks some of the features of regex and this would be a breaking change for those who create their own rules in djlint_rules.yaml.

@JCWasmx86 (Contributor, Author)

@oliverhaas If you want to do more optimizations, there are a few things that might be worth looking into (and that are out of scope for this PR):

  • Modify child_of_unformatted_block to be smarter. After this patch is applied, it takes 40+% of the entire runtime. Maybe you could do fancy stuff with interval trees or bisection (see the rough sketch after this list).
  • Check whether replacing (x.start(0), x.end()) for x in matches with x.span() could make sense (you could probably shave off 2-3%, if you really want to do micro-optimizations).
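For the first point, a rough sketch of the bisection idea, assuming the unformatted-block regions are available as a sorted list of non-overlapping (start, end) offsets (the helper name is made up):

from bisect import bisect_right


def inside_unformatted_block(blocks: list[tuple[int, int]], pos: int) -> bool:
    # Binary-search the sorted, non-overlapping (start, end) regions instead of
    # scanning all of them linearly for every match position.
    i = bisect_right(blocks, (pos, float("inf"))) - 1
    return i >= 0 and blocks[i][0] <= pos < blocks[i][1]


# For the second point: match.span() returns (match.start(), match.end()) as one tuple.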

@monosans (Member)

monosans commented Nov 5, 2024

Hey, @JCWasmx86! There are some conflicts because of the changes I've made, sorry. Could you please resolve them and do some profiling to see how much this PR improves the performance now? Thanks!

@JCWasmx86 (Contributor, Author)

> There are some conflicts because of the changes I've made, sorry. Could you please resolve them

I've resolved them with a lot of hard work :/, though sadly the git history had to suffer for it.

HEAD: (4 runs summed up)

real   0m49.220s => 12.305s/run
user   4m15.192s => 64s/run
sys    0m3.022s

This PR: (4 runs summed up)

real   0m42.939s => 10.735s/run
user   3m25.135s => 51.28375s/run
sys    0m2.984s

So there are still improvements

@JCWasmx86 (Contributor, Author)

Hey @monosans, what is needed for this PR to get merged? Can I assist in any way?

@monosans (Member)

monosans commented Nov 7, 2024

> Hey @monosans, what is needed for this PR to get merged? Can I assist in any way?

Please see review comments

@JCWasmx86 (Contributor, Author)

@monosans I'm sorry there are none. Did you maybe forget to finalize the review?

@@ -118,6 +121,7 @@ def linter(
"match": match.group().strip()[:20],
"message": rule["message"],
})
build_flags.cache_clear()
JCWasmx86 (Contributor, Author)

Just one quick question: why was this added? The flags are just constants, so the cache shouldn't grow that big.
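(For context, build_flags presumably carries the usual functools cache API; a hypothetical sketch, the signature and body here are made up:)

from functools import lru_cache


@lru_cache(maxsize=None)
def build_flags(flags: str) -> int:  # hypothetical signature, only to show the API
    ...


build_flags.cache_info()   # hits / misses / current size of the cache
build_flags.cache_clear()  # drops all cached entries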

@oliverhaas (Contributor)

@JCWasmx86 Thanks again for the hard work. I'm at 1.2s for edx-platform with this branch on my desktop, which is crazy considering where djlint was just a while ago. I basically only have one feature/bug left on my list of pain points, and just a month ago I was looking for alternatives to djlint...

If you could share your profiling script in a gist or something, that would be awesome. I can't seem to get the profiler output quite as readable as your images.

@JCWasmx86 (Contributor, Author)

JCWasmx86 commented Nov 10, 2024

@oliverhaas I just used gprof2dot and snakeviz and temporarily modified the code as described here: #986 (comment) (parallel execution => serial execution, as otherwise the profiler gives misleading results).
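Roughly, the profiling setup looks like this (a sketch; the CLI arguments, the file name, and the serial-execution tweak from the linked comment are assumptions):

# profile_djlint.py -- run inside the repo that should be profiled
import cProfile
import pstats

from djlint import main  # djlint's click entry point


# Profile one (serial) run; the resulting .prof file can be opened in
# snakeviz or converted to a call graph with gprof2dot.
cProfile.run('main(["--lint", "--reformat", "."], standalone_mode=False)', "djlint.prof")
pstats.Stats("djlint.prof").sort_stats("cumulative").print_stats(25)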

I've profiled again with clear&&git stash; time hyperfine -i --warmup 3 --runs 10 'djlint . --lint --reformat' --output null for edx-platform

master:
Benchmark 1: djlint . --lint --reformat
  Time (mean ± σ):      5.886 s ±  0.037 s    [User: 37.871 s, System: 0.475 s]
  Range (min … max):    5.856 s …  5.981 s    10 runs
 
  Warning: Ignoring non-zero exit code.
 

real	1m16.701s
user	8m13.952s
sys	0m6.147s

PR:
Benchmark 1: djlint . --lint --reformat
  Time (mean ± σ):      4.450 s ±  0.178 s    [User: 28.246 s, System: 0.435 s]
  Range (min … max):    4.182 s …  4.587 s    10 runs
 
  Warning: Ignoring non-zero exit code.
 

real	0m57.471s
user	6m7.896s
sys	0m5.683s
