-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40074: [C++][FS][Azure] Implement DeleteFile()
for flat-namespace storage accounts
#40075
Conversation
|
@kou I added a commit making error messages a bit more detailed now and removed the |
cpp/src/arrow/filesystem/azurefs.cc
Outdated
DCHECK(!location.path.empty()); | ||
constexpr auto kFileBlobLeaseTime = std::chrono::seconds{15}; | ||
auto no_trailing_slash_location = location.RemoveTrailingSlash( | ||
/*preserve_original=*/true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I think that preserve_original
isn't a good approach. It will confuse us. Is it really needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want all
to be path the user used so error messages make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about RemoveTrailingSlashFromPath()
?
UPDATE: I added a commit renaming it and the docstring now explains why all
is always preserved. I don't think there will be any use-case for changing all
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that no_trailing_slash_location.all
isn't used by the user:
ARROW_ASSIGN_OR_RAISE(auto file_info,
GetFileInfo(container_client, no_trailing_slash_location));
GetFileInfo()
in the above code may return an error to the user. But GetFileInfo()
doesn't use no_trailing_slash_location.all
for an error message:
arrow/cpp/src/arrow/filesystem/azurefs.cc
Lines 1275 to 1277 in a03d957
return ExceptionToStatus( | |
exception, "GetProperties for '", file_client.GetUrl(), | |
"' failed. GetFileInfo is unable to determine whether the path exists."); |
It uses file_client.GetUrl()
and it's based on no_trailing_slash_location.path
not .all
.
And it seems that FileInfo::path()
of file_info
returned by GetFileInfo()
isn't used in this lambda. (file_info.path()
uses no_trailing_slash_location.all
but it's not used.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test failures when I remove this:
[ RUN ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtContainerRoot
/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:942: Failure
Failed
'fs()->DeleteFile(data.ObjectPath() + "/")' did not fail with errno=ENOTDIR: IOError: Path does not exist 'w3ay9l35qoxg0i0lqbfababvwbrrrhxq/test-object-name/'. Detail: [errno 2] No such file or directory
[ FAILED ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtContainerRoot, where TypeParam = arrow::fs::TestingScenario<arrow::fs::AzureFlatNSEnv,true> (3185 ms)
[ RUN ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtSubdirectory
/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1000: Failure
Value of: _st.ToStringWithoutContextLines()
Expected: has substring "Not a directory: 'sco3n8q67r3cd2mas6lx53nlc36uw7f9/dir/file0/'"
Actual: "IOError: Path does not exist 'sco3n8q67r3cd2mas6lx53nlc36uw7f9/dir/file0/'. Detail: [errno 2] No such file or directory"
/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1000: Failure
Value of: _st.ToStringWithoutContextLines()
Expected: has substring "Not a directory: '3nw7fyp8whjb3r8e6g7eoyg2zav8nr8y/dir/file0/'"
Actual: "IOError: Path does not exist '3nw7fyp8whjb3r8e6g7eoyg2zav8nr8y/dir/file0/'. Detail: [errno 2] No such file or directory"
[ FAILED ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtSubdirectory, where TypeParam = arrow::fs::TestingScenario<arrow::fs::AzureFlatNSEnv,true> (12108 ms)
I'm going to implement this by copying location
and removing the trailing slashes manually.
...instead of taking a bool parameter.
…t found as a file
1891222
to
965fcba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
/// \pre location.path is not empty. | ||
Status DeleteFileOnContainer(const Blobs::BlobContainerClient& container_client, | ||
const AzureLocation& location, bool require_file_to_exist, | ||
const char* operation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we don't need to receive operation
as an argument. How about defining it as a local variable instead of an argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might use Move()
with it, so I will keep it.
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit a2d0729. There were 2 benchmark results with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…mespace storage accounts (apache#40075) ### Rationale for this change It was not implemented yet. ### What changes are included in this PR? - An implementation of `DeleteFile()` that is specialized to storage accounts that don't have HNS support enabled - This fixes a semantic issue: deleting a file should not delete the parent directory when the file deleted was the last one - Increased test coverage - Fix of a bug in the version that deletes files in HNS-enabled accounts (we shouldn't let `DeleteFile` delete directories even if they are empty) ### Are these changes tested? Yes. Tests were re-written and moved to `TestAzureFileSystemOnAllScenarios`. * Closes: apache#40074 Lead-authored-by: Felipe Oliveira Carvalho <[email protected]> Co-authored-by: jerry.adair <[email protected]> Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
…mespace storage accounts (apache#40075) ### Rationale for this change It was not implemented yet. ### What changes are included in this PR? - An implementation of `DeleteFile()` that is specialized to storage accounts that don't have HNS support enabled - This fixes a semantic issue: deleting a file should not delete the parent directory when the file deleted was the last one - Increased test coverage - Fix of a bug in the version that deletes files in HNS-enabled accounts (we shouldn't let `DeleteFile` delete directories even if they are empty) ### Are these changes tested? Yes. Tests were re-written and moved to `TestAzureFileSystemOnAllScenarios`. * Closes: apache#40074 Lead-authored-by: Felipe Oliveira Carvalho <[email protected]> Co-authored-by: jerry.adair <[email protected]> Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
Rationale for this change
It was not implemented yet.
What changes are included in this PR?
DeleteFile()
that is specialized to storage accounts that don't have HNS support enabledDeleteFile
delete directories even if they are empty)Are these changes tested?
Yes. Tests were re-written and moved to
TestAzureFileSystemOnAllScenarios
.DeleteFile()
for flat-namespace storage accounts #40074