Improve performance by caching find_spec #2408

crazybolillo · 2024-04-05T06:03:54Z

Certain checkers upstream on pylint like import-error heavily use find_spec. This method is IO intensive as it looks for files across several search paths to return a ModuleSpec.

Since imports across files may repeat themselves it makes sense to cache this method in order to speed up the linting process.

Local testing shows that caching reduces the total amount of calls to find_module methods (used by find_spec) by about 50%. Linting the test repository in the related issue goes from 40 seconds to 37 seconds. This was on a NVME disk and after warmup, so timing gains may be bigger on slower file systems like the one mentioned in the referenced issue.

Closes pylint-dev/pylint#9310.

	Type
✓	🔨 Refactoring

crazybolillo · 2024-04-05T06:08:58Z

The cache feels dirty but lru_cache and its siblings force you to cache the whole thing which is impossible (lists can't be hashed) and feels redundant, if I already have info on the module requested, who cares if the search paths this time are different. I may be horrendously wrong on that last statement, but it was my thought process.

There may be room for improvement, however I think this can get the ball rolling to discuss changes/improvements.

Here is a brief review of the profiling I did.

Before PR

Timings using hyperfine

  Time (mean ± σ):     40.231 s ±  0.429 s    [User: 36.848 s, System: 3.355 s]
  Range (min … max):   39.611 s … 40.846 s    10 runs

Syscalls

  % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.70    5.696665           3   1578640   1260035 newfstatat
 21.24    1.929284           2    756383           clock_gettime
  4.53    0.411450           5     72581     13450 openat
  3.31    0.300422           2    101300           read
  1.99    0.180693           3     59132           close
  1.89    0.172086           2     69186         5 lseek
  1.84    0.166910           2     62868           fstat
  1.69    0.153237           2     58172     58153 ioctl
  0.31    0.028427           3      8483           getcwd
  0.15    0.013854           2      5047           mmap
  0.15    0.013600           2      4724           munmap
  0.13    0.011465           6      1804           getdents64
  0.05    0.004886           4      1149           brk
  0.02    0.001471           6       230           write
  0.00    0.000408           7        56           mprotect
  0.00    0.000221         221         1           rename
  0.00    0.000092          18         5           getrandom
  0.00    0.000076           7        10         4 readlink
  0.00    0.000042          42         1           unlink
  0.00    0.000021           3         6           fcntl
  0.00    0.000015           7         2           prlimit64
  0.00    0.000014           7         2           gettid
  0.00    0.000010          10         1           futex
  0.00    0.000009           9         1           getpid
  0.00    0.000005           1         5           geteuid
  0.00    0.000004           0         5           getuid
  0.00    0.000002           0         3           getgid
  0.00    0.000002           0         3           getegid
  0.00    0.000002           1         2           setfsuid
  0.00    0.000002           1         2           setfsgid
  0.00    0.000001           0        67           rt_sigaction
  0.00    0.000001           0         2         1 access
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1           mremap
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         3           uname
  0.00    0.000000           0         1         1 mkdir
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           epoll_create1
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    9.085377           3   2779886   1331649 total

After PR

Timings using hyperfine

  Time (mean ± σ):     37.277 s ±  0.131 s    [User: 35.208 s, System: 2.043 s]
  Range (min … max):   37.022 s … 37.508 s    10 runs

Syscalls

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.62    2.141208           2    827202    642701 newfstatat
 29.37    1.151274           2    569776           clock_gettime
  4.47    0.175144           4     43641     13443 openat
  2.79    0.109267           2     53100           read
  1.77    0.069260           1     40245         5 lseek
  1.69    0.066270           2     30199           close
  1.63    0.063822           1     33933           fstat
  1.34    0.052577           1     29240     29221 ioctl
  1.16    0.045629           3     11589           getcwd
  0.41    0.016088           8      1804           getdents64
  0.31    0.012145           2      5150           mmap
  0.28    0.011102           2      4836           munmap
  0.12    0.004787           4      1059           brk
  0.02    0.000756           3       229           write
  0.01    0.000429           6        67           rt_sigaction
  0.01    0.000300           5        56           mprotect
  0.00    0.000102          10        10         4 readlink
  0.00    0.000048           8         6           fcntl
  0.00    0.000041           8         5           getrandom
  0.00    0.000016           8         2           prlimit64
  0.00    0.000008           4         2           gettid
  0.00    0.000005           2         2           mremap
  0.00    0.000002           0         3           getgid
  0.00    0.000002           0         5           geteuid
  0.00    0.000002           0         3           getegid
  0.00    0.000002           1         2           setfsuid
  0.00    0.000002           1         2           setfsgid
  0.00    0.000001           0         2         1 access
  0.00    0.000001           0         5           getuid
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         3           uname
  0.00    0.000000           0         1         1 mkdir
  0.00    0.000000           0         1           unlink
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           epoll_create1
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    3.920290           2   1652190    685376 total

crazybolillo · 2024-04-05T06:11:23Z

I also obtained this call graph to find the best place to cache. The issue refers to find_module but I thought it was better to cache up the stack in find_spec because that method calls itself all the finders available. Since I thought it was cool and may be of interest, I am attaching it. You will need to zoom into it.

crazybolillo · 2024-04-05T06:15:05Z

Also, I used the repository linked in the issue to obtain all the profiling data. I am still yet to profile the memory allocated for the process execution, will try to obtain that since I want to measure the hit an unbound cache may create.

perrinjerome · 2024-04-08T04:51:35Z

astroid/interpreter/_import/spec.py

+    def wrapper(*args):
+        key = ".".join(args[0])
+        if key not in modpath_cache:
+            modpath_cache[key] = func(*args)


maybe this cache also needs to be cleared there ?

astroid/astroid/manager.py

Line 437 in 4ccb9c4

def clear_cache(self) -> None:

Thanks for pointing that out. The cache is now cleared from there and I also renamed variables a little bit and made the code look more like existing cache code (e.g. the inference tip cache).

perrinjerome

Thanks, it seems good to clear the cache here as well. I check "Approve" here, but I don't astroid well ( I just submitted an issue yesterday for the first time ). A maintainer should approve, not me, but I can say that these changes look good to me.

DanielNoord · 2024-04-09T07:28:03Z

astroid/interpreter/_import/spec.py

+def spec_cache(func):
+    def wrapper(*args):
+        key = ".".join(args[0])
+        if key not in _spec_cache:
+            _spec_cache[key] = func(*args)
+
+        return _spec_cache[key]
+
+    return wrapper


Could you add typing to this function?

I have typed the function and also the dictionary used for caching (mypy was complaining).

DanielNoord · 2024-04-09T19:38:15Z

astroid/interpreter/_import/spec.py

@@ -423,6 +429,18 @@ def _find_spec_with_path(
    raise ImportError(f"No module named {'.'.join(module_parts)}")


+def spec_cache(func: Callable) -> Callable:


Suggested change

def spec_cache(func: Callable) -> Callable:

def spec_cache(func: Callable[[list[str], Sequence[str] | None], ModuleSpec]) -> Callable[[...], ModuleSpec]:

Changes have been applied.

Pierre-Sassoulas

Thank you for the analysis of what need to be cachec. Any reason not to use functools.cacheor functools.lru_cache instead of using a global ?

crazybolillo · 2024-04-15T03:05:13Z

Thank you for the analysis of what need to be cachec. Any reason not to use functools.cacheor functools.lru_cache instead of using a global ?

I really wanted to use something standard, however the problem I found with functools caches is that they take into consideration all function arguments while I thought it only made sense to cache the requested module because:

Search paths are lists so they need further processing to hash them
I don't think search paths should be cached? If I already know where the package is found, does it matter if you give me different search paths? This point I am not so sure about, but it was my logic.

DanielNoord

Perhaps we also need a changelog entry?

Code LGTM!

Pierre-Sassoulas

Right thank you for the explanation. In find_spec we have :type modpath: list or tuple, maybe we could force it to be a tuple with something like this:

def make_modpath_hashable(
    func: Callable[[list[str], Sequence[str] | None], ModuleSpec]
) -> Callable[..., ModuleSpec]:
    def wrapper(modpath: list[str], *args) -> ModuleSpec:
        return func(tuple(modpath), *args)
    return wrapper

@make_modpath_hashable
@lru_cache(maxsize=1024)
def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSpec:

(I didn't test if it work tbh, but it should ? )

Pierre-Sassoulas · 2024-04-15T07:22:05Z

astroid/interpreter/_import/spec.py

+    func: Callable[[list[str], Sequence[str] | None], ModuleSpec]
+) -> Callable[..., ModuleSpec]:
+    def wrapper(modpath: list[str], *args) -> ModuleSpec:
+        key = ".".join(modpath)


Suggested change

key = ".".join(modpath)

key = tuple(modpath)

I suppose it's faster to get something hashable that way ?

Looks better, thanks for the suggestion, I have applied it.

crazybolillo · 2024-04-16T19:39:55Z

I tried the approach but It does not seem to work, main reason being that find_spec uses lists, so wrapping the function to pass it a tuple in order to make it cachable fails.

We would also need to make *args hashable, but that one is tricky, since I don't think order matters for search paths, so if the search path order changes we would get different hashes and the cache wont be hit as much.

What do you think? Maybe I am missing something.

I think we could work a hack use lru_cache internals but the code would be probably more complicated than using our own cache 🤔

Pierre-Sassoulas · 2024-04-22T11:32:25Z

I'm hesitant to add a global, because pylint/astroid can be used on massive codebase and we had issues with memory constantly increasing in the past. I don't remember all cache clear API where we need to add this global to myself, and would need some time to check it (Jacob probably remember better than me if it comes to it though). Using a global is not impossible but it's easy to make a mistake and create a memory leak. So using lru_cache to set a max number of value stored would be convenient / reassuring. I suppose another (less elegant than adding two decorators) approach would be to create a new cached subfunction inside the current function with the paths we want to cache specifically, what do you think ?

crazybolillo · 2024-04-22T19:30:30Z

I think I was able to find a solution that lets us use lru_cache without getting into too much decorators or sub functions. I have basically wrapped the arguments in a class so we can control its hash and ignore problematic arguments. The benchmarks still look good as well.

Without cache:

Time (mean ± σ):     58.254 s ±  0.470 s    [User: 53.538 s, System: 4.705 s]
Range (min … max):   57.863 s … 59.150 s    10 runs

With cache:

Time (mean ± σ):     52.131 s ±  0.191 s    [User: 49.164 s, System: 2.954 s]
Range (min … max):   51.910 s … 52.550 s    10 runs

What do you think?

Pierre-Sassoulas

Amazing, thank you ! The perf increase and the design are great. I need to fix the coverage and the pipeline for pypy 3.10 on main but I think we're going to release astroid 3.2.0 for this alone !

astroid/manager.py

tests/test_manager.py

astroid/interpreter/_import/spec.py

crazybolillo · 2024-04-29T03:19:36Z

Nuked AstroidManager as suggested. Just waiting on your thoughts about exceptions preventing not found packages from being cached.

jacobtylerwalls · 2024-04-29T11:54:52Z

Thanks, I didn't notice that ImportError is raised when nothing is found. Good point.

However, I still think we should add the path to the cache key even when modules were found.

The following is unexpected, and I think it's realistic given the upstream callers like ast_from_module_name(). Pathing is one of the most sensitive parts of astroid.

>>> from astroid.interpreter._import.spec import find_spec
>>> find_spec(['brain'], ['astroid'])
ModuleSpec(name='brain', type=<ModuleType.PKG_DIRECTORY: 3>, location='astroid/brain', origin=None, submodule_search_locations=None)
>>> find_spec(['brain'], ['pylint'])
ModuleSpec(name='brain', type=<ModuleType.PKG_DIRECTORY: 3>, location='astroid/brain', origin=None, submodule_search_locations=None)

astroid/astroid/manager.py

Line 195 in 7a3b482

context_file: str | None = None,

jacobtylerwalls · 2024-04-29T11:55:49Z

For example, the path is part of the cache key for modules here:

astroid/astroid/manager.py

Lines 301 to 305 in 7a3b482

    
           def file_from_module_name( 
        
               self, modname: str, contextfile: str | None 
        
           ) -> spec.ModuleSpec: 
        
               try: 
        
                   value = self._mod_file_cache[(modname, contextfile)]

crazybolillo · 2024-04-30T05:12:16Z

I made the search path part of its cache by converting it into a frozenset however there are lots of cache misses and the performance benefit gets nullified, so maybe this is harder than I thought 🤔 . This is the diff I tried (when compared with the current PR)

diff --git a/astroid/interpreter/_import/spec.py b/astroid/interpreter/_import/spec.py
index c812a612..62bb3e6c 100644
--- a/astroid/interpreter/_import/spec.py
+++ b/astroid/interpreter/_import/spec.py
@@ -428,6 +428,7 @@ def _find_spec_with_path(
 @dataclass(frozen=True)
 class SpecArgs:
     key: tuple
+    path_cache: frozenset | None
     modpath: list[str] = field(compare=False, hash=False)
     path: Sequence[str] | None = field(compare=False, hash=False)
 
@@ -449,7 +450,7 @@ def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSp
     :return: A module spec, which describes how the module was
              found and where.
     """
-    return _find_spec(SpecArgs(tuple(modpath), modpath, path))
+    return _find_spec(SpecArgs(tuple(modpath), frozenset(path) if path else None, modpath, path))

Here are the cache specs after running it on the test repo provided in the original issue:

No cache for search path: CacheInfo(hits=39800, misses=34979, maxsize=1024, currsize=1024)
Caching search path: CacheInfo(hits=39755, misses=74619, maxsize=1024, currsize=1024)

Am I missing something? Maybe a better way to cache the path? Otherwise it seems like it may be better to optimize somewhere else.

crazybolillo · 2024-04-30T05:21:18Z

Also since the main point of this PR (perf) may take a while, I can make a new PR to apply the nuking of the Astroid test class separately.

jacobtylerwalls · 2024-04-30T12:04:34Z

About the dataclasses, did you consider doing something like this for simplicity?

diff --git a/astroid/interpreter/_import/spec.py b/astroid/interpreter/_import/spec.py
index c812a612..65d7c88d 100644
--- a/astroid/interpreter/_import/spec.py
+++ b/astroid/interpreter/_import/spec.py
@@ -16,7 +16,6 @@ import types
 import warnings
 import zipimport
 from collections.abc import Iterator, Sequence
-from dataclasses import dataclass, field
 from functools import lru_cache
 from pathlib import Path
 from typing import Any, Literal, NamedTuple, Protocol
@@ -425,13 +424,6 @@ def _find_spec_with_path(
     raise ImportError(f"No module named {'.'.join(module_parts)}")
 
 
-@dataclass(frozen=True)
-class SpecArgs:
-    key: tuple
-    modpath: list[str] = field(compare=False, hash=False)
-    path: Sequence[str] | None = field(compare=False, hash=False)
-
-
 def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSpec:
     """Find a spec for the given module.
 
@@ -449,15 +441,14 @@ def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSp
     :return: A module spec, which describes how the module was
              found and where.
     """
-    return _find_spec(SpecArgs(tuple(modpath), modpath, path))
-
+    return _find_spec(tuple(modpath), tuple(path) if path else None)
 
 @lru_cache(maxsize=1024)
-def _find_spec(args: SpecArgs) -> ModuleSpec:
-    _path = args.path or sys.path
+def _find_spec(modpath: tuple[str], path: tuple[str] | None = None) -> ModuleSpec:
+    _path = path or sys.path
 
     # Need a copy for not mutating the argument.
-    modpath = args.modpath[:]
+    modpath = list(modpath)
 
     submodule_path = None
     module_parts = modpath[:]
@@ -466,7 +457,7 @@ def _find_spec(args: SpecArgs) -> ModuleSpec:
     while modpath:
         modname = modpath.pop(0)
         finder, spec = _find_spec_with_path(
-            _path, modname, module_parts, processed, submodule_path or args.path
+            _path, modname, module_parts, processed, submodule_path or path
         )
         processed.append(modname)
         if modpath:

jacobtylerwalls · 2024-04-30T12:23:27Z

Am I missing something? Maybe a better way to cache the path? Otherwise it seems like it may be better to optimize somewhere else.

I'm wondering if the issue is in pylint. We're probably doing some unnecessary "is this import a relative import, if so provide the filepath we started from". It's providing all those filepaths that get checked first and strike out that are adding to the number of calls here.

This diff reduces a ton of calls, but it would need to be checked for correctness:

diff --git a/pylint/checkers/imports.py b/pylint/checkers/imports.py
index ac8962c50..361bd5571 100644
--- a/pylint/checkers/imports.py
+++ b/pylint/checkers/imports.py
@@ -1043,11 +1043,13 @@ class ImportsChecker(DeprecatedMixin, BaseChecker):
         module_file = node.root().file
         context_name = node.root().name
         base = os.path.splitext(os.path.basename(module_file))[0]
-
         try:
-            importedmodname = astroid.modutils.get_module_part(
-                importedmodname, module_file
-            )
+            if isinstance(node, nodes.ImportFrom):
+                importedmodname = astroid.modutils.get_module_part(
+                    importedmodname, module_file
+                )
+            else:
+                importedmodname = astroid.modutils.get_module_part(importedmodname)
         except ImportError:
             pass

crazybolillo · 2024-05-01T18:29:31Z

Yes, it seems like there are lots of calls for the same package with different paths which obliterates the cache. I collected all calls to find_spec with their arguments and grouped them by requested modpath. Here is an excerpt:

** ['jinja2'] 403
         - ['pylint-corpus/venv/lib/python3.11/site-packages/scalene']
         - ['.', 'pylint-corpus/venv/lib/python3.11/site-packages', 'pylint-corpus', 'pylint-corpus/venv/lib/python3.11/site-packages', 'pylint-corpus', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', 'pylint-corpus/venv/lib/python3.11/site-packages', 'astroid']
         - ['pylint-corpus/venv/lib/python3.11/site-packages/jinja2']
         - None
         - ['pylint-corpus/venv/lib/python3.11/site-packages/pkg_resources/_vendor/pyparsing/diagram']
         - ['pylint-corpus/venv/lib/python3.11/site-packages/pip/_vendor/pyparsing/diagram']
         - ['pylint-corpus/venv/lib/python3.11/site-packages/setuptools/_vendor/pyparsing/diagram']

jinja2 was requested 403 times and those were the search paths used. I also wonder why jinja2 is looked for since its not used in the test repo.

I will investigate in pylint.

crazybolillo · 2024-05-04T18:29:07Z

I followed the suggestion and applied a similar change in pylint. Cache specs do improve a lot. Also it seems like previous tests were running with a polluted virtualenv so that explains stuff like looking for jinja2 😅.

Run times diminished in all cases because of the clean virtualenv, however relatively speaking the performance increase seems even better than the first iteration. I also changed tuples to frozensets since apparently they are faster by a noticeable margin (according to hyperfine).

Stats

No Cache

Time (mean ± σ):     46.954 s ±  1.127 s    [User: 43.222 s, System: 3.700 s] 
Range (min … max):   45.499 s … 48.479 s    10 runs

Cache

tuple

Time (mean ± σ):     38.001 s ±  0.359 s    [User: 35.533 s, System: 2.452 s]
Range (min … max):   37.388 s … 38.509 s    10 runs

frozenset

Time (mean ± σ):     35.961 s ±  0.758 s    [User: 33.585 s, System: 2.337 s]
Range (min … max):   34.297 s … 36.718 s    10 runs

The related PR in pylint that would make this cache work is pylint#9595.

crazybolillo · 2024-05-04T18:57:18Z

Nevermind about the frozenset, when converting it back to a list ordering is sometimes lost so it can't be used, at least not for the sensitive modpath, for the search paths I don't think the order matters too much (?) so we can use frozensets there.

codecov · 2024-05-04T18:59:56Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.78%. Comparing base (7a3b482) to head (82c839e).
Report is 11 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2408      +/-   ##
==========================================
+ Coverage   92.76%   92.78%   +0.02%     
==========================================
  Files          94       94              
  Lines       11087    11095       +8     
==========================================
+ Hits        10285    10295      +10     
+ Misses        802      800       -2

Flag	Coverage Δ
linux	`92.60% <100.00%> (+0.02%)`	⬆️
pypy	`92.78% <100.00%> (+2.01%)`	⬆️
windows	`92.69% <100.00%> (+0.32%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
astroid/interpreter/_import/spec.py	`97.44% <100.00%> (+0.05%)`	⬆️
astroid/manager.py	`89.58% <100.00%> (+0.04%)`	⬆️

... and 2 files with indirect coverage changes

jacobtylerwalls · 2024-05-04T19:48:56Z

The potential indeterminacy of the frozenset does scare me a little bit. Just this week on Discord someone was asking for pointers about debugging indeterminate behavior of the cyclic-import check, and I advised them to look into nondeterminacy in astroid. Leaving a frozenset for the path sounds like exactly that kind of behavior.

I'd suggest using tuple for both. But if you want to prepare a new benchmark and discuss it, that's fine too.

Pierre-Sassoulas · 2024-05-04T20:05:53Z

the performance increase seems even better than the first iteration

Right, more than 20% faster is HUGE !

search paths I don't think the order matters too much (?)

Not sure if I understand this right but the order of discovery does matter, right ? Doesn't it affect pylint-dev/pylint#6535 (pinned issue in pylint) ?

jacobtylerwalls · 2024-05-04T20:13:38Z

Yeah, I traced what happens to that path argument, and it eventually ends up here, in an order-sensitive for loop in importlib, so I wouldn't want this to become unordered:

https://github.com/python/cpython/blob/978fba58aef347de4a1376e525df2dacc7b2fff3/Lib/importlib/_bootstrap_external.py#L1522

Side note: it seems like our constructor for Finder is actually useless? I can't find anywhere that uses _path. Could be a small gain to nuke it.

astroid/astroid/interpreter/_import/spec.py

Lines 85 to 86 in 4a8827d

    
           def __init__(self, path: Sequence[str] | None = None) -> None: 
        
               self._path = path or sys.path

Certain checkers upstream on pylint like import-error heavily use find_spec. This method is IO intensive as it looks for files across several search paths to return a ModuleSpec. Since imports across files may repeat themselves it makes sense to cache this method in order to speed up the linting process. Closes pylint-dev/pylint#9310.

This class predates efforts to have a central interface to control global state (including caches) and it is no longer needed.

crazybolillo · 2024-05-04T20:36:49Z

Ooops , I changed the search paths to tuple as well. Perf is still good, in fact since most of the time we should be getting no search paths I ran hyperfine again and there was not much of an impact.

astroid/interpreter/_import/spec.py

jacobtylerwalls

Thanks, great work!

crazybolillo · 2024-05-04T21:27:36Z

Thanks to everyone as well for the help and reviews 🤝

perrinjerome reviewed Apr 8, 2024

View reviewed changes

crazybolillo force-pushed the cache-find-spec-crazybolillo branch from 4ccb9c4 to 2dfce27 Compare April 9, 2024 04:27

crazybolillo requested a review from perrinjerome April 9, 2024 04:37

perrinjerome approved these changes Apr 9, 2024

View reviewed changes

DanielNoord reviewed Apr 9, 2024

View reviewed changes

crazybolillo force-pushed the cache-find-spec-crazybolillo branch from cc546b9 to 7fcfdac Compare April 9, 2024 21:28

crazybolillo requested a review from DanielNoord April 9, 2024 21:33

Pierre-Sassoulas added the Enhancement ✨ Improvement to a component label Apr 14, 2024

Pierre-Sassoulas reviewed Apr 14, 2024

View reviewed changes

DanielNoord reviewed Apr 15, 2024

View reviewed changes

Pierre-Sassoulas reviewed Apr 15, 2024

View reviewed changes

Pierre-Sassoulas added this to the 3.2.0 milestone Apr 16, 2024

crazybolillo force-pushed the cache-find-spec-crazybolillo branch 2 times, most recently from 38315c0 to 1b2f354 Compare April 22, 2024 19:29

Pierre-Sassoulas previously approved these changes Apr 22, 2024

View reviewed changes

Pierre-Sassoulas added the topic-performance label Apr 22, 2024

jacobtylerwalls requested changes Apr 28, 2024

View reviewed changes

astroid/manager.py Outdated Show resolved Hide resolved

tests/test_manager.py Outdated Show resolved Hide resolved

astroid/interpreter/_import/spec.py Show resolved Hide resolved

astroid/interpreter/_import/spec.py Outdated Show resolved Hide resolved

crazybolillo dismissed Pierre-Sassoulas’s stale review via e593f84 April 29, 2024 03:07

crazybolillo force-pushed the cache-find-spec-crazybolillo branch from 1b2f354 to e593f84 Compare April 29, 2024 03:07

crazybolillo requested a review from jacobtylerwalls April 29, 2024 03:19

jacobtylerwalls modified the milestones: 3.2.0, 3.3.0 May 4, 2024

crazybolillo force-pushed the cache-find-spec-crazybolillo branch from e593f84 to 53c71a7 Compare May 4, 2024 18:25

crazybolillo force-pushed the cache-find-spec-crazybolillo branch from 53c71a7 to 9984bc4 Compare May 4, 2024 18:53

jacobtylerwalls modified the milestones: 3.3.0, 3.2.0 May 4, 2024

crazybolillo added 2 commits May 4, 2024 14:31

Remove AstroidCacheSetupMixin

98a0380

This class predates efforts to have a central interface to control global state (including caches) and it is no longer needed.

crazybolillo force-pushed the cache-find-spec-crazybolillo branch from 9984bc4 to 98a0380 Compare May 4, 2024 20:32

jacobtylerwalls reviewed May 4, 2024

View reviewed changes

astroid/interpreter/_import/spec.py Show resolved Hide resolved

Fix mypy warnings and update typing

82c839e

jacobtylerwalls approved these changes May 4, 2024

View reviewed changes

jacobtylerwalls merged commit 2ec0115 into pylint-dev:main May 4, 2024
20 checks passed

		@@ -423,6 +429,18 @@ def _find_spec_with_path(
		raise ImportError(f"No module named {'.'.join(module_parts)}")


		def spec_cache(func: Callable) -> Callable:

	def spec_cache(func: Callable) -> Callable:
	def spec_cache(func: Callable[[list[str], Sequence[str] \| None], ModuleSpec]) -> Callable[[...], ModuleSpec]:

Improve performance by caching find_spec #2408

Improve performance by caching find_spec #2408

Conversation

crazybolillo commented Apr 5, 2024

crazybolillo commented Apr 5, 2024

Before PR

Timings using hyperfine

Syscalls

After PR

Timings using hyperfine

Syscalls

crazybolillo commented Apr 5, 2024

crazybolillo commented Apr 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

perrinjerome left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

crazybolillo commented Apr 15, 2024 • edited Loading

DanielNoord left a comment

Choose a reason for hiding this comment

Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crazybolillo commented Apr 16, 2024

Pierre-Sassoulas commented Apr 22, 2024

crazybolillo commented Apr 22, 2024

Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

crazybolillo commented Apr 29, 2024

jacobtylerwalls commented Apr 29, 2024

jacobtylerwalls commented Apr 29, 2024

crazybolillo commented Apr 30, 2024

crazybolillo commented Apr 30, 2024

jacobtylerwalls commented Apr 30, 2024

jacobtylerwalls commented Apr 30, 2024

crazybolillo commented May 1, 2024

crazybolillo commented May 4, 2024

Stats

No Cache

Cache

tuple

frozenset

crazybolillo commented May 4, 2024

codecov bot commented May 4, 2024 • edited Loading

Codecov Report

jacobtylerwalls commented May 4, 2024

Pierre-Sassoulas commented May 4, 2024

jacobtylerwalls commented May 4, 2024 • edited Loading

crazybolillo commented May 4, 2024

jacobtylerwalls left a comment

Choose a reason for hiding this comment

crazybolillo commented May 4, 2024

crazybolillo commented Apr 5, 2024 •

edited

Loading

crazybolillo commented Apr 15, 2024 •

edited

Loading

codecov bot commented May 4, 2024 •

edited

Loading

jacobtylerwalls commented May 4, 2024 •

edited

Loading