GH-40074: [C++][FS][Azure] Implement DeleteFile() for flat-namespace storage accounts #40075

Merged: felipecrv merged 15 commits into apache:main from azure_delete_blob on Feb 21, 2024

@felipecrv felipecrv commented Feb 14, 2024

Rationale for this change

DeleteFile() was not yet implemented for flat-namespace storage accounts.

What changes are included in this PR?

  • An implementation of DeleteFile() that is specialized to storage accounts that don't have HNS support enabled
  • This fixes a semantic issue: deleting a file should not delete the parent directory when the file deleted was the last one
  • Increased test coverage
  • Fix of a bug in the version that deletes files in HNS-enabled accounts (we shouldn't let DeleteFile delete directories even if they are empty)

Are these changes tested?

Yes. Tests were re-written and moved to TestAzureFileSystemOnAllScenarios.
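
To make the intended semantics concrete, here is a minimal sketch of a toy flat-namespace store. It is not the code from this PR: `FlatStore`, `DeleteResult`, `DirExists`, and the convention that a blob name ending in `/` marks an empty directory are all assumptions made for illustration. It only shows the two behaviors listed above: deleting the last file keeps the parent directory, and `DeleteFile` never removes a directory.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model, not Arrow code: a flat namespace is just a sorted set of blob
// names. A name ending in '/' is an empty-directory marker (an assumption).
using FlatStore = std::set<std::string>;

enum class DeleteResult { kOk, kNotFound, kNotAFile };  // hypothetical

// A directory "exists" if its marker or any blob under its prefix exists.
bool DirExists(const FlatStore& store, const std::string& dir) {
  const std::string prefix = dir + "/";
  auto it = store.lower_bound(prefix);
  return it != store.end() && it->compare(0, prefix.size(), prefix) == 0;
}

// Deletes exactly one file; `path` must be non-empty with no trailing slash.
DeleteResult DeleteFile(FlatStore& store, const std::string& path) {
  assert(!path.empty() && path.back() != '/');
  if (store.count(path) == 0) {
    // Directories, even empty ones, are never removed by DeleteFile.
    return DirExists(store, path) ? DeleteResult::kNotAFile
                                  : DeleteResult::kNotFound;
  }
  store.erase(path);
  // If that was the last blob under the parent, leave a marker behind so the
  // parent directory does not vanish implicitly.
  const auto slash = path.rfind('/');
  if (slash != std::string::npos) {
    const std::string parent = path.substr(0, slash);
    if (!DirExists(store, parent)) store.insert(parent + "/");
  }
  return DeleteResult::kOk;
}
```

In this model, deleting `dir/file0` from a store that holds only that blob leaves a `dir/` marker behind, and a later `DeleteFile(store, "dir")` reports `kNotAFile` instead of removing the directory.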

⚠️ GitHub issue #40074 has been automatically assigned in GitHub to PR creator.

@felipecrv
Contributor Author

@av8or1

@felipecrv
Contributor Author

@Tom-Newton

felipecrv commented Feb 16, 2024

@kou I added a commit that makes error messages a bit more detailed and removes the WithHierarchicalNamespace() that was needed because of differences between the HNS and flat-namespace implementations.

@felipecrv felipecrv requested a review from kou February 16, 2024 02:06
DCHECK(!location.path.empty());
constexpr auto kFileBlobLeaseTime = std::chrono::seconds{15};
auto no_trailing_slash_location = location.RemoveTrailingSlash(
    /*preserve_original=*/true);
Member

Hmm. I think that preserve_original isn't a good approach. It will confuse us. Is it really needed?

Contributor Author

I want all to be the path the user passed in, so that error messages make sense.

Contributor Author

@felipecrv felipecrv Feb 16, 2024

What about RemoveTrailingSlashFromPath()?

UPDATE: I added a commit renaming it and the docstring now explains why all is always preserved. I don't think there will be any use-case for changing all.

Member

It seems that no_trailing_slash_location.all isn't used by the user:

      ARROW_ASSIGN_OR_RAISE(auto file_info,
                            GetFileInfo(container_client, no_trailing_slash_location));

GetFileInfo() in the above code may return an error to the user. But GetFileInfo() doesn't use no_trailing_slash_location.all for an error message:

      return ExceptionToStatus(
          exception, "GetProperties for '", file_client.GetUrl(),
          "' failed. GetFileInfo is unable to determine whether the path exists.");

It uses file_client.GetUrl() and it's based on no_trailing_slash_location.path not .all.

And it seems that FileInfo::path() of file_info returned by GetFileInfo() isn't used in this lambda. (file_info.path() uses no_trailing_slash_location.all but it's not used.)

Contributor Author

Test failures when I remove this:

[ RUN      ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtContainerRoot
/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:942: Failure
Failed
'fs()->DeleteFile(data.ObjectPath() + "/")' did not fail with errno=ENOTDIR: IOError: Path does not exist 'w3ay9l35qoxg0i0lqbfababvwbrrrhxq/test-object-name/'. Detail: [errno 2] No such file or directory

[  FAILED  ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtContainerRoot, where TypeParam = arrow::fs::TestingScenario<arrow::fs::AzureFlatNSEnv,true> (3185 ms)
[ RUN      ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtSubdirectory
/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1000: Failure
Value of: _st.ToStringWithoutContextLines()
Expected: has substring "Not a directory: 'sco3n8q67r3cd2mas6lx53nlc36uw7f9/dir/file0/'"
  Actual: "IOError: Path does not exist 'sco3n8q67r3cd2mas6lx53nlc36uw7f9/dir/file0/'. Detail: [errno 2] No such file or directory"

/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1000: Failure
Value of: _st.ToStringWithoutContextLines()
Expected: has substring "Not a directory: '3nw7fyp8whjb3r8e6g7eoyg2zav8nr8y/dir/file0/'"
  Actual: "IOError: Path does not exist '3nw7fyp8whjb3r8e6g7eoyg2zav8nr8y/dir/file0/'. Detail: [errno 2] No such file or directory"

[  FAILED  ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtSubdirectory, where TypeParam = arrow::fs::TestingScenario<arrow::fs::AzureFlatNSEnv,true> (12108 ms)

I'm going to implement this by copying location and removing the trailing slashes manually.
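
A rough sketch of that idea follows. The `Location` struct is a hypothetical stand-in for `AzureLocation` (only the `all` and `path` field names come from this thread), and `WithoutTrailingSlashInPath` is an illustrative name, not the function that was merged.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for AzureLocation: `all` is the full string the user
// passed in, `path` is the part inside the container.
struct Location {
  std::string all;
  std::string container;
  std::string path;
};

// Returns a copy of `location` whose `path` has no trailing slashes.
// `all` is left untouched so error messages can quote the user's input.
Location WithoutTrailingSlashInPath(const Location& location) {
  assert(!location.path.empty());  // mirrors the DCHECK precondition above
  Location copy = location;
  while (!copy.path.empty() && copy.path.back() == '/') {
    copy.path.pop_back();
  }
  return copy;
}
```

With a copy like this, lookups can use the stripped `path` to find the blob, while messages such as "Not a directory: '.../dir/file0/'" can still quote `all` exactly as the user typed it, trailing slash included.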

...instead of taking a bool parameter.
@felipecrv felipecrv requested a review from kou February 20, 2024 18:42
Member

@kou kou left a comment

+1

cpp/src/arrow/filesystem/azurefs.cc
/// \pre location.path is not empty.
Status DeleteFileOnContainer(const Blobs::BlobContainerClient& container_client,
                             const AzureLocation& location, bool require_file_to_exist,
                             const char* operation) {
Member

It seems that we don't need to receive operation as an argument. How about defining it as a local variable instead of an argument?

Contributor Author

I might use Move() with it, so I will keep it.

@felipecrv felipecrv merged commit a2d0729 into apache:main Feb 21, 2024
32 of 33 checks passed
@felipecrv felipecrv deleted the azure_delete_blob branch February 21, 2024 01:04

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit a2d0729.

There were 2 benchmark results with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
…mespace storage accounts (apache#40075)

### Rationale for this change

It was not implemented yet.

### What changes are included in this PR?

 - An implementation of `DeleteFile()` that is specialized to storage accounts that don't have HNS support enabled
 - This fixes a semantic issue: deleting a file should not delete the parent directory when the file deleted was the last one
 - Increased test coverage
 - Fix of a bug in the version that deletes files in HNS-enabled accounts (we shouldn't let `DeleteFile` delete directories even if they are empty)

### Are these changes tested?

Yes. Tests were re-written and moved to `TestAzureFileSystemOnAllScenarios`.
* Closes: apache#40074

Lead-authored-by: Felipe Oliveira Carvalho <[email protected]>
Co-authored-by: jerry.adair <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024