Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38703: [C++][FS][Azure] Implement DeleteFile() #39840

Merged
merged 29 commits into from
Feb 8, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
6947360
[C++][FS][Azure] Implement DeleteFile() #38703
av8or1 Jan 29, 2024
7df1f8d
GH-38703: [C++][FS][Azure] Implement DeleteFile() #39840
av8or1 Jan 29, 2024
a08d5a1
Merge branch 'delete-file-branch' of github.com:av8or1/arrow into del…
av8or1 Jan 30, 2024
79e6206
Merge branch 'apache:main' into delete-file-branch
av8or1 Jan 30, 2024
8559881
Merge branch 'delete-file-branch' of github.com:av8or1/arrow into del…
av8or1 Jan 30, 2024
6e082dd
Merge branch 'delete-file-branch' of github.com:av8or1/arrow into del…
av8or1 Jan 31, 2024
4863b20
Merge branch 'apache:main' into delete-file-branch
av8or1 Jan 31, 2024
fdde91a
Updated the RandomBlobName() function.
av8or1 Jan 31, 2024
b812aaa
Merge branch 'delete-file-branch' of github.com:av8or1/arrow into del…
av8or1 Jan 31, 2024
3330284
GH-38703: [C++][FS][Azure] Implement DeleteFile()
av8or1 Feb 1, 2024
b89a07a
Merge branch 'apache:main' into delete-file-branch
av8or1 Feb 1, 2024
4b1c635
Update cpp/src/arrow/filesystem/azurefs_test.cc
av8or1 Feb 1, 2024
516be74
Update cpp/src/arrow/filesystem/azurefs_test.cc
av8or1 Feb 1, 2024
6c0fa18
Update cpp/src/arrow/filesystem/azurefs_test.cc
av8or1 Feb 1, 2024
110932b
GH-38703: [C++][FS][Azure] Implement DeleteFile()
av8or1 Feb 1, 2024
953da8e
GH-38703: [C++][FS][Azure] Implement DeleteFile()
av8or1 Feb 1, 2024
6512c12
Merge branch 'apache:main' into delete-file-branch
av8or1 Feb 1, 2024
d2e6f9b
GH-38703: [C++][FS][Azure] Implement DeleteFile()
av8or1 Feb 1, 2024
58888b4
Merge branch 'delete-file-branch' of github.com:av8or1/arrow into del…
av8or1 Feb 1, 2024
5af7159
Merge branch 'apache:main' into delete-file-branch
av8or1 Feb 1, 2024
b96af56
Update cpp/src/arrow/filesystem/azurefs_test.cc
av8or1 Feb 2, 2024
79e80b7
Merge branch 'apache:main' into delete-file-branch
av8or1 Feb 2, 2024
022bd57
GH-38703: [C++][FS][Azure] Implement DeleteFile()
av8or1 Feb 2, 2024
58c4ccd
Merge branch 'apache:main' into delete-file-branch
av8or1 Feb 2, 2024
3cce85c
GH-38703: [C++][FS][Azure] Implement DeleteFile()
av8or1 Feb 2, 2024
168356e
Merge branch 'apache:main' into delete-file-branch
av8or1 Feb 7, 2024
dd60704
GH-38703: [C++][FS][Azure] Implement DeleteFile()
av8or1 Feb 7, 2024
39eb5ae
GH-38703: [C++][FS][Azure] Implement DeleteFile()
av8or1 Feb 7, 2024
fc7ada2
Update cpp/src/arrow/filesystem/azurefs_test.cc
av8or1 Feb 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 28 additions & 1 deletion cpp/src/arrow/filesystem/azurefs.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1690,6 +1690,32 @@ class AzureFileSystem::Impl {
}
}

Status DeleteFile(const AzureLocation& location) {
if (location.path.empty()) {
return Status::Invalid("Cannot delete an empty path");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use ValidateFileLocation() instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done. Those three lines have been replaced with:
RETURN_NOT_OK(ValidateFileLocation(location));


auto file_client =
datalake_service_client_->GetFileSystemClient(location.container)
.GetFileClient(location.path);

try {
auto response = file_client.Delete();
// Only the "*IfExists" functions ever set Deleted to false.
// All the others either succeed or throw an exception.
DCHECK(response.Value.Deleted);
} catch (const Storage::StorageException& exception) {
if (exception.ErrorCode == "FilesystemNotFound" ||
exception.ErrorCode == "PathNotFound") {
return PathNotFound(location);
}
return ExceptionToStatus(exception,
"Failed to delete a file: ", location.path, ": ",
file_client.GetUrl());
}
return Status::OK();
}

Status CopyFile(const AzureLocation& src, const AzureLocation& dest) {
RETURN_NOT_OK(ValidateFileLocation(src));
RETURN_NOT_OK(ValidateFileLocation(dest));
Expand Down Expand Up @@ -1875,7 +1901,8 @@ Status AzureFileSystem::DeleteRootDirContents() {
}

Status AzureFileSystem::DeleteFile(const std::string& path) {
return Status::NotImplemented("The Azure FileSystem is not fully implemented");
ARROW_ASSIGN_OR_RAISE(auto location, AzureLocation::FromString(path));
return impl_->DeleteFile(location);
}

Status AzureFileSystem::Move(const std::string& src, const std::string& dest) {
Expand Down
35 changes: 32 additions & 3 deletions cpp/src/arrow/filesystem/azurefs_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -376,6 +376,10 @@ culpa qui officia deserunt mollit anim id est laborum.
return s;
}

std::string RandomFileName(RNG &rng) { return RandomChars(10, rng); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use "blob" not "file" because Azure Blob Storage uses "blob"?

Copy link
Contributor Author

@av8or1 av8or1 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, yes. Note that my use case will be to access ADLS, which is why I referred to this as a file. Also because the method is named DeleteFile(), not DeleteBlob().


std::string RandomFileNameExtension(RNG &rng) { return RandomChars(3, rng); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to define separated functions?
Can RandomFileName() generate XXXXXXXXXX.ZZZ?

Copy link
Contributor Author

@av8or1 av8or1 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it generates the filename only, sans extension. I suppose that it could be modified to generate XXXXXXX.ZZZ however, if that would be preferred. See commentary above. I could just place everything inline and delete these functions if that is the route that is recommended. I'm open to suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined.


static int RandomIndex(int end, RNG& rng) {
return std::uniform_int_distribution<int>(0, end - 1)(rng);
}
Expand Down Expand Up @@ -1382,6 +1386,32 @@ TEST_F(TestAzuriteFileSystem, DeleteDirContentsFailureNonexistent) {
this->TestDeleteDirContentsFailureNonexistent();
}

TEST_F(AzuriteFileSystemTest, DeleteFileSuccessHaveFile) {
void CreateFile(FileSystem* fs, const std::string& path, const std::string& data) {
const auto file_name = PreexistingData::RandomFileName() +
PreexistingData::RandomFileNameExtension();
ASSERT_OK(CreateFile(fs_.get(), file_name, "abc"));
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::File);
ASSERT_OK(fs_->DeleteFile(file_name));
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::NotFound);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Is this a valid test? (Why do you define a function in TEST_F() {...}?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not certain that I understand the question. The idea behind this test was to create a file to delete, then delete it. Thus the pseudo-code was:

  1. Create a filename
  2. Create the file and verify that the file creation was successful
  3. Verify that the file type was a file (could be removed if considered redundant)
  4. Delete the file and verify that the delete attempt did not assert
  5. Verify that the file is not found (because it has been deleted)
    As for the functions RandomFileName() and RandomFileNameExtension(), I created those to clean the code a bit and placed them with the RandomChars() method. This is just a choice, to abstract the implementation details into a function. It doesn't have to be that way. The content of those two methods could be placed inline at lines 1391 and 1392 and these two methods deleted if that is the preferred manner of going about it. Open to suggestion/guidance/feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the choice to create the RandomFileName() and RandomFileNameExtension() methods was done in part because I prefer abstractions (generally) and also because there was a similar abstraction for the container name, which is found at line 369, thus:
static std::string RandomContainerName(RNG& rng) { return RandomChars(32, rng); }
And so I decided to follow suit by creating abstractions for file names, just as there was one for container names. Again, I'll pull those back inline since that seems to be the preference.

Copy link
Contributor Author

@av8or1 av8or1 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, now I understand why kou asked why I defined a function in TEST_F() {...} . The short answer is that this was a lack of deletion error on my part. I had copied the definition of the CreateFile() method temporarily while I wrote the test to ensure that I wrote the invocation parms correctly. Then I simply forgot to delete the reference. Apologies. I have deleted that from the most recent version. Good catch, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, due to the structure of how the RandomChars() function is written within the PreexistingData class, I'll have to leave the random-name functionality that I need within this class. However, I'll take kou's suggestion and combine the two methods, as well as changing the name to blob.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we create random containers is to not have test runs interfere with later runs when the tests crash and containers can't be deleted from the storage account. The blobs inside the containers don't have to be random. Every test case is creating a new random container that you can freely add named blobs to.


TEST_F(AzuriteFileSystemTest, DeleteFileSuccessNonexistent) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Success -> Failure?

const auto file_name = PreexistingData::RandomFileName() +
PreexistingData::RandomFileNameExtension();
ASSERT_OK(fs_->DeleteFile(file_name));
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::NotFound);
}

TEST_F(AzuriteFileSystemTest, DeleteFileSuccessNotAFile) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

const auto container_name = RandomContainerName();
ASSERT_OK(fs_->CreateDir(container_name));
arrow::fs::AssertFileInfo(fs_.get(), container_name, FileType::Directory);
ASSERT_RAISES(IOError, fs_->DeleteFile(container_name));
ASSERT_OK(fs_->DeleteDir(container_name));
arrow::fs::AssertFileInfo(fs_.get(), container_name, FileType::NotFound);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need them for DeleteFile() test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point to this test was to attempt to delete a target that was not a file. So I created a directory (container), then attempted to delete it, which should fail, which I verify via:
ASSERT_RAISES(IOError, fs_->DeleteFile(container_name));
At line 1410. Then I clean up the mess by deleting the directory (container) via lines 1411 and 1412 (which is just a verification that the directory (container) was deleted).
If there is a different approach to this type of test that would be preferred, let me know.

}

av8or1 marked this conversation as resolved.
Show resolved Hide resolved
TEST_F(TestAzuriteFileSystem, CopyFileSuccessDestinationNonexistent) {
auto data = SetUpPreexistingData();
const auto destination_path = data.ContainerPath("copy-destionation");
Expand Down Expand Up @@ -1868,6 +1898,5 @@ TEST_F(TestAzuriteFileSystem, OpenInputFileClosed) {
ASSERT_RAISES(Invalid, stream->ReadAt(1, 1));
ASSERT_RAISES(Invalid, stream->Seek(2));
}

} // namespace fs
} // namespace arrow
} // namespace fs
} // namespace arrow
Loading