-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul file-system operations to use FileSystemCache #4623
Conversation
Argh: on windows python generates different error messages in the |
4da63be
to
7520f89
Compare
They'll still merge conflict, but the resolution will be trivial now (I'm trying avoid making one depend on the other)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, this should improve performance and this will make it easier to reason about correctness when there are concurrent file system updates. Also thanks for refactoring things a little while you were are it!
I have just a few minor things (also you need to fix a merge conflict). Finally, before merging, I'd like a few more steps:
- Measure the performance impact to a trivial fine-grained incremental update (preferably on macOS, since FS operations tend to be slow on macs).
- Manually verify that file system changes are still correctly picked up in some common scenarios (both with daemon and without).
mypy/server/update.py
Outdated
@@ -499,13 +505,14 @@ def get_module_to_path_map(manager: BuildManager) -> Dict[str, str]: | |||
for module, node in manager.modules.items()} | |||
|
|||
|
|||
def get_sources(modules: Dict[str, str], | |||
def get_sources(fscache: FileSystemCache, | |||
modules: Dict[str, str], | |||
changed_modules: List[Tuple[str, str]]) -> List[BuildSource]: | |||
# TODO: Race condition when reading from the file system; we should only read each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this TODO comment be removed now?
mypy/build.py
Outdated
@@ -544,6 +548,23 @@ def find_config_file_line_number(path: str, section: str, setting_name: str) -> | |||
return -1 | |||
|
|||
|
|||
class FindModuleCache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring. Mention that this never flushes the FileSystemCache and caller is responsible for that.
mypy/build.py
Outdated
|
||
|
||
def find_module(id: str, lib_path_arg: Iterable[str]) -> Optional[str]: | ||
def find_module(cache: FindModuleCache, id: str, lib_path_arg: Iterable[str]) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a method of FindModuleCache
? If not, document in FindModuleCache
that this function populates the data structures and is tightly coupled with the class.
mypy/build.py
Outdated
dirs = [] | ||
for pathitem in lib_path: | ||
# e.g., '/usr/lib/python3.4/foo/bar' | ||
isdir = find_module_isdir_cache.get((pathitem, dir_chain)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this block of code removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's maintaining an isdir
cache, which is handled by the fscache
stat cache.
OK, all of the other changes I was making that conflicted with this are in now, so I've updated it all and made the requested cleanups. Doesn't affect the performance of a trivial fine-grained incremental update at all. Manual testing seems to pick up changes fine. |
@gvanrossum, indeed. I was originally hoping to get my old PR merged before this one was opened, but that of course didn't happen. The good news is my implementation in #4693 was in a single commit, so I should be able to revert that, rebase on master and re-apply the changes (with slight modification) without too much headache. Thanks for the heads up! PS: I originally interpreted your comment to mean that this is getting merged before my PR, is that the case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few additional remaining non-cached os
operations in mypy.build
(also see comments):
getmtime
(os.path.getmtime
)remove_cwd_prefix_from_path
(os.path.isfile
)
Should these be updated to use the cache as well?
mypy/build.py
Outdated
|
||
def list_dir(path: str) -> Optional[List[str]]: | ||
"""Return a cached directory listing. | ||
Module locations and some intermediate results are cached internally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: Indents seem off unless github diff view is acting up. Just to be extra clear, indents should be like this:
class Foo:
"""First line.
Second line (4 space indent).
"""
mypy/build.py
Outdated
# use hits to avoid adding it a second time when we see x.pyi. | ||
# This also avoids both x.py and x.pyi when x/ was seen first. | ||
hits = set() # type: Set[str] | ||
for item in sorted(os.listdir(os.path.dirname(module_path))): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use FileSystemCache
here and below?
mypy/dmypy_server.py
Outdated
manager = result.manager | ||
graph = result.graph | ||
self.fine_grained_manager = FineGrainedBuildManager(manager, graph) | ||
self.fine_grained_manager = mypy.server.update.FineGrainedBuildManager(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the imports still have from mypy.server.update import FineGrainedBuildManager
so why the module prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improperly resolved a merge conflict. When this patch first went up it used a prefix, but I changed it to not in another patch and must have undone it while merging.
mypy/main.py
Outdated
@@ -527,7 +527,7 @@ def add_invertible_flag(flag: str, | |||
.format(special_opts.package)) | |||
options.build_type = BuildType.MODULE | |||
lib_path = [os.getcwd()] + build.mypy_path() | |||
targets = build.find_modules_recursive(special_opts.package, lib_path) | |||
targets = build.FindModuleCache().find_modules_recursive(special_opts.package, lib_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of sad that this creates a separate instance of all the caches, and introduces an opportunity for a race condition. Maybe add a TODO (since it was a pre-existing condition)?
mypy/fscache.py
Outdated
|
||
|
||
class FileSystemCache: | ||
def __init__(self, pyversion: Tuple[int, int]) -> None: | ||
def __init__(self, pyversion: Optional[Tuple[int, int]] = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there shouldn't be two classes here. A base class that has no python_version and no read_with_python_encoding() method, and a subclass that has both. This would avoid the Optional and the assert (I had to go on a hunt for all instantiations of this class to get comfortable with your current version, and it seems all but one call site would be happy with the proposed base class.)
mypy/build.py
Outdated
lib_path = tuple(lib_path_arg) | ||
fscache = self.fscache | ||
|
||
def find() -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're refactoring, I wonder if this should be made a regular method (with a longer name) and passed in whatever it needs from find_module(). The nested function always bothered me, and a nested function that references self bothers me more.
mypy/build.py
Outdated
|
||
|
||
def verify_module(id: str, path: str) -> bool: | ||
def verify_module(fscache: FileSystemCache, id: str, path: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this could become a method on FileSystemCache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not add more knowledge about python-specific things to FileSystemCache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
mypy/build.py
Outdated
@@ -1399,14 +1325,6 @@ def delete_cache(id: str, path: str, manager: BuildManager) -> None: | |||
d. from P import M; checks filesystem whether module P.M exists in | |||
filesystem. | |||
|
|||
e. Race conditions, where somebody modifies a file while we're |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe keep the bullet with the first sentence and replace the rest with "Solved by FileSystemClass"?
except IOError as ioerr: | ||
# ioerr.strerror differs for os.stat failures between Windows and | ||
# other systems, but os.strerror(ioerr.errno) does not, so we use that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain that we want the error to be platform-independent so tests can have predictable output?
mypy/fscache.py
Outdated
return stat.S_ISREG(st.st_mode) | ||
try: | ||
st = self.stat(path) | ||
return stat.S_ISREG(st.st_mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly move this out of the try/except -- this can't raise OSError.
|
||
On case-insensitive filesystems (like Mac or Windows) this returns | ||
False if the case of the path's last component does not exactly | ||
match the case found in the filesystem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this means that importing wrongly-cased packages will not be flagged, since we only compare the filename (which would be __init__.py
for a package). Not actionable except perhaps add a TODO comment; this was a pre-existing problem. (Python itself gets it right so that's a fallback.)
I updated |
Yay! |
This overhauls our file-system accesses to always go through a
FileSystemCache
when stat()ing or reading source files. (We don't use it for reading cache json files, since we seem to only do a single read on each anyways and so caching doesn't help us any.) This eliminates a number of potential bugs having to do with seeing inconsistent file-system state and reduces the number of file reads+decodes in fine-grained incremental mode.This also allows us to eliminate a number of caches maintained by
find_module
that duplicated functionality ofFileSystemCache
. Since I was reluctant to make theFileSystemCache
global, I took the opportunity to makefind_module
's caches non-global as well.