FindModuleCache: optionally leverage BuildSourceSet #12616
This is a duplicate of #9478, necessary due to an accidental branch deletion.
Do you know why this feature cannot/should not be enabled by default? Would some tests fail? I'm sympathetic to improving mypy performance in your use case, and even if the new logic can't fully replace the default logic, it may still be reasonable to include it behind a (at least somewhat hidden) flag.
It would still be great to have at least some test coverage for this feature, even if it's not enabled by default. Otherwise it could easily regress when somebody changes the related code.
add_invertible_flag(
    '--fast-module-lookup', default=False,
    help="Enable fast path for finding modules within input sources",
    group=code_group)
If this flag doesn't have good test coverage, I'd prefer to not advertise it in the `mypy --help` output, since it could easily break between releases. Instead, this could be discussed in the documentation, with some caveats about being an experimental feature.
You can use `help=argparse.SUPPRESS` for this.
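As a rough illustration of the suggestion above, the sketch below uses plain `argparse` rather than mypy's own `add_invertible_flag` helper, so the parser setup and the `demo` program name are assumptions made for the example. It shows that an option registered with `help=argparse.SUPPRESS` is still parsed but is left out of the generated help text.

```python
import argparse

# Hypothetical stand-alone parser; mypy builds its parser differently.
parser = argparse.ArgumentParser(prog="demo")

# A documented flag: listed in the --help output.
parser.add_argument(
    "--explicit-package-bases",
    action="store_true",
    help="Use current directory and MYPYPATH to determine module names",
)

# An experimental flag: accepted on the command line, hidden from --help.
parser.add_argument(
    "--fast-module-lookup",
    action="store_true",
    help=argparse.SUPPRESS,
)

help_text = parser.format_help()
args = parser.parse_args(["--fast-module-lookup"])
```

Hiding the flag this way keeps it usable for anyone who learns about it from the documentation, without advertising it to everyone who runs `--help`.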
Done
Can you switch to `help=argparse.SUPPRESS`?
@@ -158,6 +188,50 @@ def clear(self) -> None:
        self.initial_components.clear()
        self.ns_ancestors.clear()

    def find_module_via_source_set(self, id: str) -> Optional[ModuleSearchResult]:
Add docstring. Also mention that this is not used by default.
Done
On Wed, Apr 20, 2022, 8:15 AM Jukka Lehtosalo wrote:
> Do you know why this feature cannot/should not be enabled by default? Would some tests fail? I'm sympathetic to improving mypy performance in your use case, and even if the new logic can't fully replace the default logic, it may still be reasonable to include it behind a (at least somewhat hidden) flag.
In the previous PR Guido expressed some concerns about possible subtle breakages, so I figured putting this behind a switch would make it more palatable. Tests were passing with it enabled though, so we could instead turn it on by default and document that it can be turned off if something somehow happens to go wrong in a corner case.
Another possible option might be to run the test suite in both modes in CI to ensure adequate coverage.
I agree with this. This is something we might possibly enable by default in the future, but it seems too dangerous to do it now.
This is not a practical option, since it would slow down our CI significantly. A better option would be to have at least a handful of tests (possibly copy from existing tests and only add the flag) that would cover at least the most commonly used code paths.
To clarify: I meant configuring CI to have a separate parallel run of the whole test suite with the flag enabled, which takes more compute but should leave overall latency unchanged. This means that local test runs would not exercise this feature unless explicitly enabled, which seems fine if the feature is not publicly exposed.
I also added a handful of extra tests to verify that the fast path has similar behavior in some edge cases. Finally, I added a section in the docs to explain the purpose of the new flag, and its experimental status.
Force-pushed from 9260c8e to 3e128fa.
@JukkaL are you comfortable shipping this feature given those adjustments?
Anything more I can do to get this into master / next release?
Ping @JukkaL? This is a really big deal for us. The performance impact was so significant that we deployed this fix onto a custom build of 0.790. Having to backport the fix is a big impediment to upgrading mypy, because of the additional complication of creating mypyc-enabled builds for all supported architectures. It would be very valuable to be able to start using mainline mypy again.
Left a few additional comments.
add_invertible_flag(
    '--fast-module-lookup', default=False,
    help="Enable fast path for finding modules within input sources",
    group=code_group)
Can you switch to `help=argparse.SUPPRESS`?
docs/source/running_mypy.rst
@@ -516,6 +516,32 @@ same directory on the search path, only the stub file is used.
(However, if the files are in different directories, the one found
in the earlier directory is used.)

If a namespace package is spread across many distinct folders, for
Can you move the explanation of the new flag to `command_line.rst`? I don't think that we need to mention the option here, at least as long as we don't know how common it is to have large namespace packages that would be helped by the option. This page is more of an introduction to module lookup logic and doesn't need to cover every flag.
done
Gated behind a command line flag to assuage concerns about subtle issues in module lookup being introduced by this fast path.
Rebased on master and addressed comments.
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉
Thanks! Looks good now, and just in time for the 0.960 release.
Given a large codebase with a folder hierarchy of the form

```
foo/
    company/
        __init__.py
        foo/
bar/
    company/
        __init__.py
        bar/
baz/
    company/
        __init__.py
        baz/
...
```

with >100 toplevel folders, the time spent in load_graph is dominated by find_module. This operation is itself O(n), where n is the number of input files, which ends up being O(n**2) because it is called for every import statement in the codebase. The way find_module works, it will always scan through each and every one of those toplevel directories for each and every import statement of company.*

Introduce a fast path that leverages the fact that for imports within the code being typechecked, we already have a mapping of module import path to file path in BuildSourceSet.

Gated behind a command line flag (--fast-module-lookup) to assuage concerns about subtle issues in module lookup being introduced by this fast path.
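To make the quadratic cost described in the commit message concrete, here is a toy model of a lookup that probes every toplevel folder on each call. It is only a sketch: `make_tree`, `find_by_scanning`, and `total_probes` are hypothetical names, the "filesystem" is an in-memory set, and none of this reproduces mypy's actual `find_module` logic.

```python
from typing import List, Optional, Set, Tuple


def make_tree(n_roots: int) -> Set[str]:
    """Build a fake file set: pkgN/company/__init__.py and pkgN/company/pkgN/__init__.py."""
    files: Set[str] = set()
    for i in range(n_roots):
        name = f"pkg{i}"
        files.add(f"{name}/company/__init__.py")
        files.add(f"{name}/company/{name}/__init__.py")
    return files


def find_by_scanning(
    module: str, roots: List[str], files: Set[str]
) -> Tuple[Optional[str], int]:
    """Probe every root for the module and return (first match, number of probes)."""
    rel = module.replace(".", "/") + "/__init__.py"
    found: Optional[str] = None
    probes = 0
    for root in roots:
        probes += 1  # one probe per toplevel folder, even after a match
        candidate = f"{root}/{rel}"
        if found is None and candidate in files:
            found = candidate
    return found, probes


def total_probes(n_roots: int) -> int:
    """Resolve one import of company.pkgN per root and count all probes."""
    roots = [f"pkg{i}" for i in range(n_roots)]
    files = make_tree(n_roots)
    total = 0
    for i in range(n_roots):
        _, probes = find_by_scanning(f"company.pkg{i}", roots, files)
        total += probes
    return total
```

In this model, multiplying the number of toplevel folders by ten multiplies the total number of probes by a hundred, which is the growth pattern the commit message attributes to the default lookup.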
Great!

I saw this in the changelog and I decided to test. We use namespaces a lot (with the exact same layout as above) and we have a 1.8k-file codebase. We can't notice the speed difference. (We run mypy twice: once on the whole codebase and once on individual scripts with other flags and another cache folder.)

With:
$ time ./bin/mypy
Success: no issues found in 1807 source files
Success: no issues found in 85 source files
real 0m15.235s
user 0m13.303s
sys 0m1.836s

Without:
$ time ./bin/mypy
Success: no issues found in 1807 source files
Success: no issues found in 85 source files
real 0m14.976s
user 0m13.090s
sys 0m1.814s

The difference is really small and changes depending on the run. However, we have only 19 toplevel folders (interestingly, almost proportional to the size of our codebase compared to yours!).
Thanks for trying it out and reporting your results!

For reference, as of today the codebase that motivated this PR has 21650 files split over 982 distinct subtrees that all share the same toplevel python package name. The number of packages sharing a toplevel namespace is the most significant factor that will determine how much of a speedup you may experience. Another relevant factor is the use of

Also note that the alternate module lookup logic only takes effect for files that are part of the input sources, so depending on how you invoke mypy it may or may not trigger. We explicitly pass every toplevel folder in the input command line and in MYPYPATH and I have not done extensive testing of alternate invocation modes to see which ones actually trigger the fast path.
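The point about the fast path only applying to input sources can be sketched as a dictionary lookup with a fallback. This is an illustration under assumptions, not mypy's implementation: `make_finder`, `source_modules`, and `slow_find` are hypothetical names, and the dictionary merely stands in for the module-to-path mapping that the PR says `BuildSourceSet` already holds.

```python
from typing import Callable, Dict, List, Optional


def make_finder(
    source_modules: Dict[str, str],
    slow_find: Callable[[str], Optional[str]],
) -> Callable[[str], Optional[str]]:
    """Return a finder that tries the input-source mapping before the slow search."""

    def find(module: str) -> Optional[str]:
        path = source_modules.get(module)
        if path is not None:
            return path  # fast path: module is one of the input sources
        return slow_find(module)  # anything else goes through the regular search

    return find


# Record which modules fall through to the slow search.
slow_calls: List[str] = []


def slow_find(module: str) -> Optional[str]:
    slow_calls.append(module)
    return None  # stand-in for a directory scan that finds nothing


source_modules = {
    "company.foo.a": "foo/company/foo/a.py",
    "company.bar.b": "bar/company/bar/b.py",
}
find = make_finder(source_modules, slow_find)

hit = find("company.foo.a")  # resolved from the mapping
miss = find("requests")      # not an input source, so the slow search runs
```

Under this model, a module that was not passed as an input source never benefits from the mapping, which is consistent with the caveat that the speedup depends on how mypy is invoked.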
Given a large codebase with a folder hierarchy of the form `foo/company/foo/`, `bar/company/bar/`, `baz/company/baz/`, ... (each `company/` containing an `__init__.py`) with >100 toplevel folders, the time spent in `load_graph` is dominated by `find_module`. This operation is itself `O(n)`, where `n` is the number of input files, which ends up being `O(n**2)` because it is called for every import statement in the codebase. The way `find_module` works, it will always scan through each and every one of those toplevel directories for each and every import statement of `company.*`.

Introduce a fast path that leverages the fact that for imports within the code being typechecked, we already have a mapping of module import path to file path in `BuildSourceSet`.

In a real-world codebase with ~13k files split across hundreds of packages, this brings `load_graph` from ~180s down to ~48s, with profiling showing that `parse` is now taking the vast majority of the time, as expected.