-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix joinpath on Windows and improve joinpath on linux #33477
Conversation
f5d9e9d
to
5142031
Compare
base/path.jl
Outdated
@@ -246,8 +246,70 @@ function splitpath(p::String) | |||
return out | |||
end | |||
|
|||
joinpath(path::AbstractString) = path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well make the function type stable while you're at it and always return String
.
If you're concerned about efficiency and not creating lots of intermediary strings—which you seem to be—then you may want to use the |
base/path.jl
Outdated
continue | ||
end | ||
# same drive in different case | ||
result_drive = p_drive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to normalize the drive name capitalization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are UNC drives also case insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to normalize the drive name capitalization?
Don't think so. We should just preserve the input capitalization.
Are UNC drives also case insensitive?
In this case it doesn't matter since if the user specifies a UNC path it is absolute.
@StefanKarpinski: Thank you very much for the detailed review. I went ahead and also removed the troublesome The current implementation of Hence, it is probably possible to keep the current implementation and check drive letter lowercase issues on Windows with an additional if-clause. It's also possible, if there is a preference, to use the implementation in this PR. Both options I am comfortable with. The advantage of the current implementation is speed, but as you mention it probably doesn't matter for such a non-critical path API function. The advantage of the existing |
test/path.jl
Outdated
@test joinpath(S("foo"), S(homedir())) == homedir() | ||
@test joinpath(S(abspath("foo")), S(homedir())) == homedir() | ||
|
||
if Sys.iswindows() | ||
@test joinpath(S("foo"),S("bar:baz")) == "bar:baz" | ||
@test joinpath(S("C:"),S("foo"),S("D:"),S("bar")) == "D:bar" | ||
@test joinpath(S("C:"),S("foo"),S("D:bar"),S("baz")) == "D:bar$(sep)baz" | ||
|
||
# relative folders and case-insensitive drive letters | ||
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "c:\\a\\b\\c\\e" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the earlier or later capitalization of the drive name be kept?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preference is to keep the later capitalization. This follows the general rule to prefer p2
in joinpath(p1,p2)
, or if p2
is absolute to override p1
completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does Python do here? I would think that in the case where a
is absolute and b
is relative, if they are joinable (i.e. the drives are compatible) one would want joinpath(a, b)
to start with a
and end with b
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does Python do here? I
Python keeps the later capitalization. But it doesn't really matter, since it is case-insentive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting. I was thinking that this might affect the definition of relpath
but that doesn't seem to depend on joinpath
at all in its logic, although I suspect it probably does have a bug in the case of drive letters with case mismatches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, honestly I haven't had a close look at relpath
to tell how this affects relpath
. My general thinking is that it would be a good idea to audit all of our FileSystem API, to check how we do with UNC paths and relative paths with drive specifications. We also don't support extended paths, which is another good extension in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "c:\\a\\b\\c\\e" | |
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "C:\\a\\b\\c\\e" |
And change this test.
@StefanKarpinski It seems that this function is also a candidate for early conversion. In other words I should define joinpath(path::AbstractString, paths::AbstractString...) = joinpath(String(path), String.(paths)...)
for performance reasons. |
Yes, this would be a good case for early conversion. |
Great, and I also confirmed locally that this early conversion does help a lot with performance and reducing allocations. |
Excellent. It should also make things easier on the compiler which needs to specialize less code for "weird string types". |
This seems good to me, I just need to look into #33455 (comment) and understand what's going on there. If I forget, please ping me to remind me about this because we definitely want this. |
NP. And thanks for taking a look at this. Yes the current bug in |
Do you think that if we weren't hitting that bug then the current |
If we weren't hitting that bug then the current implementation works for #33455 but not #33475 (it would require more tweaks to get it to fix that later bug) So this PR works around the compiler bug by performance optimization (on all platforms) and in the Windows case the implementation in this PR also fixes #33475 by testing another lettercasing of drive volume names. |
Ok, thanks for the explanation and reassurances. That makes me feel pretty good about this. I will leave it open for now while I look into the compiler issue. |
Maybe the compiler got smarter, or more likely my benchmarks might have been sloppy done, but now I see no performance difference between the |
The only thing I would change at this point is keeping the earlier drive name when they have a case mismatch, since I would expect |
base/path.jl
Outdated
result_path = p_path | ||
continue | ||
end | ||
result_drive = p_drive # same drive in different case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result_drive = p_drive # same drive in different case |
Just delete this line, I think.
test/path.jl
Outdated
@test joinpath(S("foo"), S(homedir())) == homedir() | ||
@test joinpath(S(abspath("foo")), S(homedir())) == homedir() | ||
|
||
if Sys.iswindows() | ||
@test joinpath(S("foo"),S("bar:baz")) == "bar:baz" | ||
@test joinpath(S("C:"),S("foo"),S("D:"),S("bar")) == "D:bar" | ||
@test joinpath(S("C:"),S("foo"),S("D:bar"),S("baz")) == "D:bar$(sep)baz" | ||
|
||
# relative folders and case-insensitive drive letters | ||
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "c:\\a\\b\\c\\e" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "c:\\a\\b\\c\\e" | |
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "C:\\a\\b\\c\\e" |
And change this test.
@StefanKarpinski I see where you are coming from. At the same time, I think the current behavior is preferred, since in |
I'm not sure I agree that |
Updated |
Fix #33475 and #33455