Use double caching for re._compile() #96346
The code that I used to count cache misses:

from random import expovariate
from collections import OrderedDict

def test(lambd, _MAXCACHE=512, *, n=10**6):
    # FIFO cache: a plain dict that evicts its oldest entry when full
    # (the strategy currently used by re._compile()).
    _cache = {}
    m = 0
    for i in range(n):
        key = int(expovariate(lambd))
        try:
            _cache[key]
        except KeyError:
            m += 1
            if len(_cache) >= _MAXCACHE:
                # Drop the oldest item
                try:
                    del _cache[next(iter(_cache))]
                except (StopIteration, RuntimeError, KeyError):
                    pass
            _cache[key] = [key]
    return m/n
def test2(lambd, _MAXCACHE=512, *, n=10**6):
    # Pure LRU cache: an OrderedDict that moves every hit to the end
    # and evicts from the front when full.
    _cache = OrderedDict()
    m = 0
    for i in range(n):
        key = int(expovariate(lambd))
        try:
            v = _cache[key]
        except KeyError:
            m += 1
            if len(_cache) >= _MAXCACHE:
                # Drop the least recently used item
                _cache.popitem(last=False)
            _cache[key] = [key]
        else:
            _cache.move_to_end(key)
    return m/n
def test3(lambd, _MAXCACHE=512, _MAXCACHE2=256, *, n=10**6):
    # Double caching: a smaller plain-dict primary cache in front of
    # an OrderedDict secondary LRU cache.
    _cache = {}
    _cache2 = OrderedDict()
    m = 0
    for i in range(n):
        key = int(expovariate(lambd))
        try:
            _cache[key]
        except KeyError:
            try:
                v = _cache2[key]
            except KeyError:
                m += 1
                if len(_cache2) >= _MAXCACHE:
                    # Drop the least recently used item
                    _cache2.popitem(last=False)
                _cache2[key] = v = [key]
            else:
                _cache2.move_to_end(key)
            if len(_cache) >= _MAXCACHE2:
                # Drop the oldest item
                try:
                    del _cache[next(iter(_cache))]
                except (StopIteration, RuntimeError, KeyError):
                    pass
                #_cache.clear()
            _cache[key] = v
    return m/n

For example:
Since it involves random simulation, the results vary from run to run by a few percent. You can see that FIFO and LRU are equally good or bad for large and small lambda, and that the combined algorithm equals LRU in the number of misses. It does not work well if the two caches have the same or a similar size. But if the FIFO cache is 1/2 or 3/4 the size of the LRU cache, the difference is negligible.
A smaller size reduces the usefulness of the FIFO cache and can increase the average latency. The tests above do not measure latency; they only evaluate the miss rate.
That was only theory. Here is an artificial example which shows that the optimization actually helps:
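The benchmark itself does not appear above. As an illustration only, a hypothetical workload of the same general shape, reusing a hot set of patterns while a stream of one-off patterns keeps pushing the 512-entry cache over capacity, might look like this:

# Hypothetical benchmark: LRU-style eviction should keep the hot
# patterns cached, while FIFO eviction keeps throwing them out.
import itertools
import re
import timeit

hot = ["(?:hot%d)" % i for i in range(400)]   # frequently reused patterns
one_off = itertools.count()                   # endless unique patterns

def churn():
    for p in hot:
        re.search(p, "hot1")    # each call goes through re._compile()
    for _ in range(200):
        re.search("(?:one%d)" % next(one_off), "x")

print(timeit.timeit(churn, number=100))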
Unpatched: 50000 loops, best of 5: 34.5 usec per loop

The results are unstable, but in this case the patched variant is definitely faster than the baseline. In other examples the difference may not be as significant, and I have no estimate of what effect it has on real-world code.
Thanks, Serhiy! ✨ 🍰 ✨
The caching algorithm for re._compile() is one of the hottest sites in the stdlib. It has been rewritten many times (and this is not even a complete list of the changes): the patch for using lru_cache() alone was applied and reverted three times! Eventually it turned out to be slower than a simple dict lookup. It does not fit this case very well, because some results should not be cached, and additional checks have to be performed before calling the cached function, which adds significant overhead.
But the LRU caching algorithm can have an advantage over the current simple algorithm when not all compiled patterns fit in the cache: it removes entries from the cache more smartly.
I tested with random keys with an exponential distribution. With a cache size of 512, the largest difference is for lambda between 1/70 and 1/80. The LRU caching algorithm has 3 times fewer misses: 0.16% to 0.33%, against 0.5% to 1.1% misses for the current algorithm. For significantly larger lambda (almost no misses) or significantly smaller lambda (tens of percent of misses), both algorithms give almost the same result.
A direct implementation of the LRU caching algorithm using OrderedDict (or just dict) would be slower, because every hit requires moving the found entry to the end. But we can use double caching: a primary cache that is smaller and faster and does not reorder entries, backed by a secondary LRU cache that is consulted only when the key is not found in the primary cache. This scheme has the same number of misses as the plain LRU algorithm, but faster hits.
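A minimal sketch of that lookup path, using hypothetical names (_compile_cached, compile_fn); this is not the actual patch merged in python#96347:

from collections import OrderedDict

_MAXCACHE = 512    # secondary LRU cache size
_MAXCACHE2 = 256   # primary cache size (half of the LRU size)
_cache = {}                # primary: plain dict, never reordered
_cache2 = OrderedDict()    # secondary: keeps LRU order

def _compile_cached(key, compile_fn):
    try:
        return _cache[key]    # fast path: one dict lookup, no bookkeeping
    except KeyError:
        pass
    try:
        p = _cache2[key]
        _cache2.move_to_end(key)    # secondary hit: refresh LRU position
    except KeyError:
        p = compile_fn(key)         # true miss: do the expensive work
        if len(_cache2) >= _MAXCACHE:
            _cache2.popitem(last=False)    # evict least recently used
        _cache2[key] = p
    if len(_cache) >= _MAXCACHE2:
        _cache.clear()    # cheap wholesale eviction of the primary cache
    _cache[key] = p
    return p

On the common path a hit costs a single dict lookup; the move_to_end() bookkeeping happens only on the slower secondary path, and clearing the primary cache wholesale is cheap because its entries generally still live in the secondary LRU cache.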