-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-6622: [R] Normalize paths for filesystem API on Windows #5445
Conversation
I think I'm with @pitrou here that introducing automatic path normalization into the C++ filesystem API might be a bit too much scope creep -- we could provide a built-in path normalization function for consumers that don't have it (e.g. Python has pathlib, R has its own function here) |
I can accept that, though we should document the C++ API's requirements accordingly. I created https://issues.apache.org/jira/browse/ARROW-6629 because I didn't see any filesystem docs at all, so nowhere to just add a note in this patch. |
This is failing, though differently than before: https://ci.appveyor.com/project/nealrichardson/arrow/builds/27536178/job/gu10af0vck7gum81#L4342 The path normalization is making absolute paths with respect to the current working directory in the R process, and that can clash with whatever the FileSystem object has as its path. I'll need to rethink the approach. |
bef7f7f
to
a1c1e3f
Compare
Codecov Report
@@ Coverage Diff @@
## master #5445 +/- ##
=========================================
Coverage ? 77.18%
=========================================
Files ? 58
Lines ? 4011
Branches ? 0
=========================================
Hits ? 3096
Misses ? 915
Partials ? 0
Continue to review full report at Codecov.
|
This is passing and ready for review @romainfrancois |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use fs::
here ?
r/R/filesystem.R
Outdated
shared_ptr(SubTreeFileSystem, xp) | ||
} | ||
|
||
clean_path <- function(path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like things potentially better fixed with fs::
cc @jimhester. I know fs
has been downgraded to Suggests
here, but I'm just worried we'd be missing cases with the implementation below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the fs
function that does the winslash normalization we need here, but more than happy to use it if it exists.
|
||
fs_relative_path <- function(path) { | ||
# Make sure all path separators are "/", not "\" as on Windows | ||
path_sep <- ifelse(tolower(Sys.info()[["sysname"]]) == "windows", "\\\\", "/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably use .Platform$file.sep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not sure about the name of this function, it is not really doing anything to compute a relative path, you could probably just use normalizePath(mustWork = FALSE)
, assuming path doesn't actually have to exist on the filesystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: .Platform$file.sep
, I intended to but then on my Windows VM, .Platform$file.sep
reported "/" while tempfile()
still created paths with "\" 🤷♂
I originally did normalizePath
but unfortunately for this use case, normalizePath
does path expansion, even with mustWork=FALSE
. We need to keep the path relative (to whatever the FileSystem object holds as its cwd) and just need Windows \
to be converted to /
. Regular expressions were my last resort and I'm happy to use something better or more robust if it exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bit weird to do a no-op replacement on Unix, though I suppose it doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look correct to me. The simple solution should be to start with /-separated slashes and only use that.
r/R/filesystem.R
Outdated
shared_ptr(Selector, fs___Selector__create(base_dir, allow_non_existent, recursive)) | ||
shared_ptr( | ||
Selector, | ||
fs___Selector__create(fs_relative_path(base_dir), allow_non_existent, recursive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? Will it work correctly if base_dir
is e.g. a path in an S3 filesystem, or will it produce bogus results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question for all such cases below.
@@ -83,7 +83,6 @@ test_that("SubTreeFilesystem", { | |||
expect_is(st_fs, "FileSystem") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you should simply /-normalize td
above. Should be easy :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At worse you simply string-replace \
with /
(but only when on Windows :-)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's where I ended up: https://github.com/apache/arrow/pull/5445/files#diff-3bd936ee39b8abe149e5399681610c75R255
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, then you shouldn't call it fs_relative_path
...
Also note that |
8de4cbf
to
515d710
Compare
I renamed the functions to I wouldn't be surprised if we had to revisit this in the future, but for now this seems like a step forward, so I'll merge unless there are objections. |
I fixed this in R, though I wonder if that's not enough. At a minimum, the C++ docs should note this requirement (@pitrou maybe you know the best place in the docs to add this?), and I still think it would be nice if all of this path normalization were handled in C++ (cf. https://issues.apache.org/jira/browse/ARROW-6324). Closes apache#5445 from nealrichardson/win-fs-fix and squashes the following commits: 515d710 <Neal Richardson> Rename functions 1775a6d <Neal Richardson> Fix two other absolute paths I missed 0dce0d7 <Neal Richardson> Munge paths directly on windows f03815e <Neal Richardson> Normalize paths for filesystem API on Windows Authored-by: Neal Richardson <[email protected]> Signed-off-by: Neal Richardson <[email protected]>
I fixed this in R, though I wonder if that's not enough. At a minimum, the C++ docs should note this requirement (@pitrou maybe you know the best place in the docs to add this?), and I still think it would be nice if all of this path normalization were handled in C++ (cf. https://issues.apache.org/jira/browse/ARROW-6324).