Use double caching for re._compile() #96346

Closed
serhiy-storchaka opened this issue Aug 27, 2022 · 3 comments
Labels: performance, topic-regex, type-feature

Comments

@serhiy-storchaka

The caching algorithm for re._compile() is one of the hottest sites in the stdlib. It has been rewritten many times.

The patch for using lru_cache() was applied and reverted three times! Eventually it turned out to be slower than a simple dict lookup. lru_cache() does not fit this case very well: some results must not be cached, so additional checks have to be performed before calling the cached function, which adds significant overhead.
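
For illustration only, a sketch of the shape the lru_cache() approach takes. compile_pattern() is a hypothetical stand-in for the expensive compilation step, and re.DEBUG stands in for the results that must not be cached:

from functools import lru_cache
import re

def compile_pattern(pattern, flags):
    # Hypothetical stand-in for the expensive compilation step.
    return (pattern, flags)

@lru_cache(maxsize=512)
def _compile_cached(pattern, flags):
    return compile_pattern(pattern, flags)

def _compile(pattern, flags):
    # Every call pays for this check and for the lru_cache wrapper
    # machinery, even on a cache hit.
    if flags & re.DEBUG:
        # DEBUG compilations must not be cached.
        return compile_pattern(pattern, flags)
    return _compile_cached(pattern, flags)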

But the LRU caching algorithm can have an advantage over the current simple algorithm when not all compiled patterns fit in the cache: it evicts entries more intelligently.

I tested with random keys drawn from an exponential distribution. With a cache size of 512, the largest difference is for lambda between 1/70 and 1/80: the LRU caching algorithm has about 3 times fewer misses, 0.16%–0.33% versus 0.5%–1.1% for the current algorithm. For significantly larger lambda (almost no misses) or significantly smaller lambda (tens of percent of misses), both algorithms give almost the same result.

A direct implementation of the LRU caching algorithm using OrderedDict (or just dict) would be slower, because every hit requires moving the found entry to the end. But we can use double caching: a smaller, faster primary cache that does not reorder entries, and a secondary LRU cache that is consulted only when the key is not found in the primary one. This algorithm has the same number of misses as plain LRU caching, but faster hits.
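
A minimal sketch of the double-cache lookup. The names mirror the test code below; compile_pattern() is a hypothetical stand-in for the cached computation, and the defensive exception handling around primary-cache eviction is omitted for brevity:

from collections import OrderedDict

_cache = {}              # primary cache: plain dict, FIFO eviction, no reordering
_cache2 = OrderedDict()  # secondary cache: LRU, reordered on every hit
_MAXCACHE = 512          # size limit of the secondary (LRU) cache
_MAXCACHE2 = 256         # size limit of the primary (FIFO) cache

def compile_pattern(key):
    # Hypothetical stand-in for the expensive computation being cached.
    return [key]

def lookup(key):
    # Fast path: a plain dict hit, with no reordering cost.
    try:
        return _cache[key]
    except KeyError:
        pass
    # Slow path: consult the secondary LRU cache.
    try:
        value = _cache2[key]
        _cache2.move_to_end(key)
    except KeyError:
        value = compile_pattern(key)
        if len(_cache2) >= _MAXCACHE:
            _cache2.popitem(last=False)  # evict the least recently used item
        _cache2[key] = value
    # Promote the result into the primary cache (FIFO eviction).
    if len(_cache) >= _MAXCACHE2:
        del _cache[next(iter(_cache))]  # drop the oldest item
    _cache[key] = value
    return value

On a primary-cache hit this costs a single dict lookup; only primary-cache misses pay the LRU bookkeeping.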

@serhiy-storchaka

The code that I used to count cache misses:

from random import expovariate
from collections import OrderedDict

def test(lambd, _MAXCACHE=512, *, n=10**6):
    # Current algorithm: a plain dict used as a FIFO cache.
    _cache = {}
    m = 0
    for i in range(n):
        key = int(expovariate(lambd))
        try:
            _cache[key]
        except KeyError:
            m += 1
            if len(_cache) >= _MAXCACHE:
                # Drop the oldest item
                try:
                    del _cache[next(iter(_cache))]
                except (StopIteration, RuntimeError, KeyError):
                    pass
            _cache[key] = [key]
    return m/n

def test2(lambd, _MAXCACHE=512, *, n=10**6):
    # Pure LRU: an OrderedDict reordered on every hit.
    _cache = OrderedDict()
    m = 0
    for i in range(n):
        key = int(expovariate(lambd))
        try:
            v = _cache[key]
        except KeyError:
            m += 1
            if len(_cache) >= _MAXCACHE:
                # Drop the least recently used item
                _cache.popitem(last=False)
            _cache[key] = [key]
        else:
            _cache.move_to_end(key)
    return m/n

def test3(lambd, _MAXCACHE=512, _MAXCACHE2=256, *, n=10**6):
    # Double caching: a FIFO dict (_cache, size _MAXCACHE2) in front
    # of an LRU OrderedDict (_cache2, size _MAXCACHE).
    _cache = {}
    _cache2 = OrderedDict()
    m = 0
    for i in range(n):
        key = int(expovariate(lambd))
        try:
            _cache[key]
        except KeyError:
            try:
                v = _cache2[key]
            except KeyError:
                m += 1
                if len(_cache2) >= _MAXCACHE:
                    # Drop the least recently used item
                    _cache2.popitem(last=False)
                _cache2[key] = v = [key]
            else:
                _cache2.move_to_end(key)
            if len(_cache) >= _MAXCACHE2:
                # Drop the oldest item
                try:
                    del _cache[next(iter(_cache))]
                except (StopIteration, RuntimeError, KeyError):
                    pass
                #_cache.clear()
            _cache[key] = v
    return m/n

For example:

$ ./python -i re_cache.py
>>> a=70; x=test(1/a); y=test2(1/a); z=test3(1/a); x, y, x/y, x/z, z/y
(0.005344, 0.001695, 3.152802359882006, 3.29064039408867, 0.9581120943952803)
>>> a=80; x=test(1/a); y=test2(1/a); z=test3(1/a); x, y, x/y, x/z, z/y
(0.011016, 0.003435, 3.2069868995633186, 3.178303519907675, 1.0090247452692866)
>>> a=40; x=test(1/a); y=test2(1/a); z=test3(1/a); x, y, x/y, x/z, z/y
(0.000433, 0.000423, 1.0236406619385343, 1.0212264150943395, 1.0023640661938535)
>>> a=400; x=test(1/a); y=test2(1/a); z=test3(1/a); x, y, x/y, x/z, z/y
(0.493634, 0.445853, 1.1071676090550024, 1.0859027154497298, 1.019582687567427)

Since this involves random simulation, the results vary from run to run by a few percent.

You can see that FIFO and LRU are about equally good (or bad) for large and for small lambda, and that the combined algorithm matches LRU in the number of misses.

It does not work well if the two caches have the same or nearly the same size:

>>> a=80; x=test(1/a); y=test2(1/a); z=test3(1/a, 512, 512); x, y, x/y, x/z, z/y
(0.011229, 0.003424, 3.279497663551402, 1.0396259605592075, 3.154497663551402)
>>> a=80; x=test(1/a); y=test2(1/a); z=test3(1/a, 512, 500); x, y, x/y, x/z, z/y
(0.010768, 0.003442, 3.1284137129575824, 1.8033830179199464, 1.7347472399767576)

But if the FIFO cache is 1/2 or 3/4 the size of the LRU cache, the difference is negligible:

>>> a=80; x=test(1/a); y=test2(1/a); z=test3(1/a, 512, 256); x, y, x/y, x/z, z/y
(0.011022, 0.003399, 3.2427184466019416, 3.265777777777778, 0.9929390997352162)
>>> a=80; x=test(1/a); y=test2(1/a); z=test3(1/a, 512, 384); x, y, x/y, x/z, z/y
(0.010715, 0.00333, 3.2177177177177176, 3.0977161029199194, 1.0387387387387386)

A smaller primary cache reduces the usefulness of the FIFO cache and can increase the average latency. The tests above do not measure latency; they only evaluate the miss rate.
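
(Not part of the tests above: a rough micro-benchmark sketch of the hit-latency difference between the two lookup paths, which is what motivates the FIFO front cache.)

import timeit

setup = """
from collections import OrderedDict
d = {i: None for i in range(512)}
od = OrderedDict((i, None) for i in range(512))
"""

# A plain-dict hit (the FIFO primary cache)...
print(timeit.timeit("d[256]", setup=setup))
# ...versus an LRU hit, which must also reorder the entry.
print(timeit.timeit("od[256]; od.move_to_end(256)", setup=setup))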

@serhiy-storchaka

That was only theory. Here is an artificial example showing that the optimization actually helps:

$ ./python -m timeit -n50000 -s 'from random import expovariate; from re import compile; lambd=1/80' 'compile(r"[az1\0-\uffff]"+str(int(expovariate(lambd))))'

Unpatched: 50000 loops, best of 5: 34.5 usec per loop
Patched: 50000 loops, best of 5: 17.9 usec per loop

The results are unstable, but in this case the patched variant is definitely faster than the baseline. In other examples the difference may not be as significant, and I have no estimate of what effect it has on real-world code.

@ambv

ambv commented Oct 7, 2022

Thanks, Serhiy! ✨ 🍰 ✨

@ambv closed this as completed Oct 7, 2022