-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Add support for editable installs to the module resolver #12307
Conversation
|
@@ -73,6 +73,7 @@ enum ModuleResolutionPathBufInner { | |||
FirstParty(SystemPathBuf), | |||
StandardLibrary(FilePath), | |||
SitePackages(SystemPathBuf), | |||
EditableInstall(SystemPathBuf), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered just using the SitePackages
variant for editable installs, since these search paths do come about as a consequence of .pth
files in the site-packages
directory. However, the search paths for editable installs aren't necessarily children of the site-packages
search path (and indeed most often won't be), and I think it would be too easy to assume that all instances of the SitePackages
variant have the site-packages
directory as their root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a problem with introducing a new enum variant for pth-added paths, though I'm also not totally convinced by it: I don't know when we'd ever handle them differently, and it requires one more chunk of duplicated code wherever we handle this enum. I don't think the confusion you mention should occur: in general search paths should never be children of other search paths (this is a really bad setup that leads to the same module being importable under two different names), so it doesn't seem to me to be a likely confusion to assume this would be the case.
I'm not sure about the name EditableInstall
. pth
files are a feature of the Python import system that are used by the editable-install feature of some package managers, but the feature predates that use, and pth
files have been (and I'm sure still are) used in many other ways as well (e.g. for the zipped-egg installation format of setuptools). So I think it's potentially a little misleading to imply that every path discovered in a .pth
file is an editable install.
I'd suggest PthEntry
or something similar instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the name
EditableInstall
.pth
files are a feature of the Python import system that are used by the editable-install feature of some package managers, but the feature predates that use, andpth
files have been (and I'm sure still are) used in many other ways as well (e.g. for the zipped-egg installation format of setuptools). So I think it's potentially a little misleading to imply that every path discovered in a.pth
file is an editable install.
Hrm, true, but the validation we do elsewhere should ensure that we only add paths that point to directories when we iterate through .pth
files to try to find paths that could potentially be editable installs. While it's true that a directoy path in a .pth
file could point to something which wasn't intended to be an editable install, the Python runtime will still consider Python modules at that path to be importable, correct (since they'll be added to sys.path
by site.py
)? So IIUC a directory path in a .pth
file is in effect an editable install, even if it... wasn't meant to be.
All of which is to say, from the perspective of the module resolver, this variant should only really be used if we think it's more-likely-than-not that a path represents a directory that has been editably installed, or a path relative to a path representing a directory that has been editably installed. We should have done sufficient validation at this point that it's more than just "an arbitrary line in a .pth
file".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with either name but it might be at the time that we add some documentation to each variant because I don't think the linked order still applies (it doesn't mention editable installs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some more docs makes sense but I won't do it just now, since I may still refactor this pretty soon according to the idea Carl suggested on one of my earlier PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly any path found in a .pth
file will be added to sys.path
, and particularly if that path is a directory, it will result in modules there being importable. I'm not sure if there is any other validation you're referring to that we do.
The logical jump you're making that I don't quite follow is "if we add a path to sys.path that makes things importable, that's an 'editable install'." I don't think that follows. It's just importable Python source code. "Editable install" is a very specific user-facing feature that implies a certain use case. For example, there's no reason the path added by the pth
file line couldn't point to some Python code on a read-only filesystem. We should still support that just fine, but it's not very editable! In that case the pth file entry is being used for some other use case that I wouldn't describe as an "editable install."
Dude, stop working 😂 I thought you needed some rest |
I have an addiction :( |
CodSpeed Performance ReportMerging #12307 will improve performances by 4.81%Comparing Summary
Benchmarks breakdown
|
#[test] | ||
fn deleting_pth_file_on_which_module_resolution_depends_invalidates_cache() { | ||
const SITE_PACKAGES: &[FileSpec] = &[("_foo.pth", "/x/src")]; | ||
const X_DIRECTORY: &[FileSpec] = &[("/x/src/foo.py", "")]; | ||
|
||
let TestCase { | ||
mut db, | ||
site_packages, | ||
.. | ||
} = TestCaseBuilder::new() | ||
.with_site_packages_files(SITE_PACKAGES) | ||
.with_other_files(X_DIRECTORY) | ||
.build(); | ||
|
||
let foo_module_name = ModuleName::new_static("foo").unwrap(); | ||
let foo_module = resolve_module(&db, foo_module_name.clone()).unwrap(); | ||
assert_eq!( | ||
Some(foo_module.file()), | ||
system_path_to_file(&db, SystemPathBuf::from("/x/src/foo.py")) | ||
); | ||
|
||
let pth_file = site_packages.join("_foo.pth"); | ||
db.memory_file_system().remove_file(&pth_file).unwrap(); | ||
File::touch_path(&mut db, &pth_file); | ||
assert_eq!(resolve_module(&db, foo_module_name.clone()), None); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failing, and I'm not sure why. To read the contents of the pth
files, I'm using system_path_to_file
in PthFileIterator::next()
, and source_text()
in PthFile::new()
. IIUC, I think that should be sufficient to indicate to Salsa that the module_search_paths()
query depends on the existence and contents of the _foo.pth
file, and that therefore deleting the _foo.pth
file should invalidate the previous query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look at the salsa events and this seems correct.
[crates/red_knot_module_resolver/src/resolver.rs:1555:9] db.take_salsa_events().debug(&db) = [
Event {
runtime_id: RuntimeId {
counter: 0,
},
kind: WillCheckCancellation,
},
Event {
runtime_id: RuntimeId {
counter: 0,
},
kind: WillCheckCancellation,
},
Event {
runtime_id: RuntimeId {
counter: 0,
},
kind: WillCheckCancellation,
},
Event {
runtime_id: RuntimeId {
counter: 0,
},
kind: DidValidateMemoizedValue {
database_key: source_text(2),
},
},
Event {
runtime_id: RuntimeId {
counter: 0,
},
kind: DidValidateMemoizedValue {
database_key: parse_typeshed_versions(2),
},
},
Event {
runtime_id: RuntimeId {
counter: 0,
},
kind: WillCheckCancellation,
},
Event {
runtime_id: RuntimeId {
counter: 0,
},
kind: WillExecute {
database_key: dynamic_module_resolution_paths(0),
},
},
Event {
runtime_id: RuntimeId {
counter: 0,
},
kind: DidValidateMemoizedValue {
database_key: resolve_module_query(0),
},
},
]
It re-runs the dynamic_module_resolution_paths
query because the dynamic module resolution paths changed (you deleted a phf
file). It then re-runs the resolve_module_query
but evaluates to the same cached result. Not sure why this is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that adding db.report_untracked_read()
to the function body, like you suggested elsehwere, makes the test start passing. But then if I start using db.system().read_to_string()
in the query to get the contents of the .pth
file rather than source_text()
(which you said should be okay if I added the report_untracked_read()
call), it starts failing again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
An unrelated comment for resolver
(and some of them might be because how I structured the code). I find it difficult to read the code because I always have to jump up and down. The ModuleResolutionSettings
aren't next to the ValidatedSearchPathSettings
, instead, the Pth
code becomes in between. We shouldn't refactor the code as part of this PR but maybe you can come up with an ordering that brings the code closer together.
Yeah, I have the same opinion. I'm planning on doing a followup where I move some more code out into a separate submodule just for finding the search paths -- but although this PR makes the problem worse, I didn't want to do it in this PR, as it would have made the diff harder to read and review. |
…ation rather than deletion of the `pth` file
Co-authored-by: Micha Reiser <[email protected]>
Co-authored-by: Carl Meyer <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Thanks for addressing all feedback.
I think we should handle deduplication in a follow up PR. It's not specifically related to editable installs. It's just that editable installs make the problem worse.
My only ask before merging is to change PthFile
to use a String
instead of SourceText
and to use IndexMap
over OrderedSet
.
@@ -73,6 +73,7 @@ enum ModuleResolutionPathBufInner { | |||
FirstParty(SystemPathBuf), | |||
StandardLibrary(FilePath), | |||
SitePackages(SystemPathBuf), | |||
EditableInstall(SystemPathBuf), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with either name but it might be at the time that we add some documentation to each variant because I don't think the linked order still applies (it doesn't mention editable installs)
I already changed the PR so that it uses |
Summary
This PR adds support for editable installations in
site-packages
to the module resolver. Still a few details to be checked vis-à-vis Python's runtime semantics.Work towards #11653.
Test Plan
cargo test -p red_knot_module_resolver