-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-38703: [C++][FS][Azure] Implement DeleteFile() #39840
Conversation
Modifications to support the deletion of a file.
OK, I took a look at the two checks that had failures. I didn't see anything in the Python check that applied to my push, but I did find the following in the Dev / Lint:
I have made corrections for these and will update. I haven't done that in a Github context, but I'll figure it out. I'll post again when that is successful. Thanks! |
Modifications to support the deletion of a file. Updated azurefs.cc and azurefs_test.cc to address Lint issues.
Alright, I have committed the changes that I mentioned earlier today. These changes address the spacing issue for the DeleteFile() method, the line length issue in the regression tests that I wrote, as well as some housekeeping that was located at the end of the azurefs_test.cc file that I did not write (there was an extra space between the closing brace of a namespace and the beginning of the commentary to note the end of that namespace ... I deleted the extra space as well as the extra blank line above them, per the Linter's feedback). Now waiting for further word regarding the contribution. Thanks! |
The CI AppVeyor states that there was a failure, but I saw the following in the "Details": 2620 100% tests passed, 0 tests failed out of 110 Thus it's not clear where the issue lies here. I'll defer to kou for felipecrv for guidance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix style?
https://github.com/apache/arrow/actions/runs/7716487109/job/21045075365?pr=39840#step:5:795
cmake --build BUILD_DIRECTORY format
will fix style.
cpp/src/arrow/filesystem/azurefs.cc
Outdated
if (location.path.empty()) { | ||
return Status::Invalid("Cannot delete an empty path"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use ValidateFileLocation()
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done. Those three lines have been replaced with:
RETURN_NOT_OK(ValidateFileLocation(location));
void CreateFile(FileSystem* fs, const std::string& path, const std::string& data) { | ||
const auto file_name = PreexistingData::RandomFileName() + | ||
PreexistingData::RandomFileNameExtension(); | ||
ASSERT_OK(CreateFile(fs_.get(), file_name, "abc")); | ||
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::File); | ||
ASSERT_OK(fs_->DeleteFile(file_name)); | ||
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::NotFound); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Is this a valid test? (Why do you define a function in TEST_F() {...}
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not certain that I understand the question. The idea behind this test was to create a file to delete, then delete it. Thus the pseudo-code was:
- Create a filename
- Create the file and verify that the file creation was successful
- Verify that the file type was a file (could be removed if considered redundant)
- Delete the file and verify that the delete attempt did not assert
- Verify that the file is not found (because it has been deleted)
As for the functions RandomFileName() and RandomFileNameExtension(), I created those to clean the code a bit and placed them with the RandomChars() method. This is just a choice, to abstract the implementation details into a function. It doesn't have to be that way. The content of those two methods could be placed inline at lines 1391 and 1392 and these two methods deleted if that is the preferred manner of going about it. Open to suggestion/guidance/feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, the choice to create the RandomFileName() and RandomFileNameExtension() methods was done in part because I prefer abstractions (generally) and also because there was a similar abstraction for the container name, which is found at line 369, thus:
static std::string RandomContainerName(RNG& rng) { return RandomChars(32, rng); }
And so I decided to follow suit by creating abstractions for file names, just as there was one for container names. Again, I'll pull those back inline since that seems to be the preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, now I understand why kou asked why I defined a function in TEST_F() {...}
. The short answer is that this was a lack of deletion error on my part. I had copied the definition of the CreateFile() method temporarily while I wrote the test to ensure that I wrote the invocation parms correctly. Then I simply forgot to delete the reference. Apologies. I have deleted that from the most recent version. Good catch, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, due to the structure of how the RandomChars() function is written within the PreexistingData class, I'll have to leave the random-name functionality that I need within this class. However, I'll take kou's suggestion and combine the two methods, as well as changing the name to blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we create random containers is to not have test runs interfere with later runs when the tests crash and containers can't be deleted from the storage account. The blobs inside the containers don't have to be random. Every test case is creating a new random container that you can freely add named blobs to.
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::NotFound); | ||
} | ||
|
||
TEST_F(AzuriteFileSystemTest, DeleteFileSuccessNonexistent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Success
-> Failure
?
std::string RandomFileName(RNG &rng) { return RandomChars(10, rng); } | ||
|
||
std::string RandomFileNameExtension(RNG &rng) { return RandomChars(3, rng); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to define separated functions?
Can RandomFileName()
generate XXXXXXXXXX.ZZZ
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it generates the filename only, sans extension. I suppose that it could be modified to generate XXXXXXX.ZZZ however, if that would be preferred. See commentary above. I could just place everything inline and delete these functions if that is the route that is recommended. I'm open to suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combined.
@@ -376,6 +376,10 @@ culpa qui officia deserunt mollit anim id est laborum. | |||
return s; | |||
} | |||
|
|||
std::string RandomFileName(RNG &rng) { return RandomChars(10, rng); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use "blob" not "file" because Azure Blob Storage uses "blob"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can, yes. Note that my use case will be to access ADLS, which is why I referred to this as a file. Also because the method is named DeleteFile(), not DeleteBlob().
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::NotFound); | ||
} | ||
|
||
TEST_F(AzuriteFileSystemTest, DeleteFileSuccessNotAFile) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
ASSERT_OK(fs_->DeleteDir(container_name)); | ||
arrow::fs::AssertFileInfo(fs_.get(), container_name, FileType::NotFound); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need them for DeleteFile()
test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point to this test was to attempt to delete a target that was not a file. So I created a directory (container), then attempted to delete it, which should fail, which I verify via:
ASSERT_RAISES(IOError, fs_->DeleteFile(container_name));
At line 1410. Then I clean up the mess by deleting the directory (container) via lines 1411 and 1412 (which is just a verification that the directory (container) was deleted).
If there is a different approach to this type of test that would be preferred, let me know.
Sure, I have made the modifications recommended in that log (clang format). Specifically those found at lines 803-804 and 815-816. |
Alright, I have updated both azurefs.cc and azurefs_test.cc to include the modifications based on the feedback from kou. Thank you for the guidance. I await further word. |
Cleaned issues with azurefs_test.cc that were found in the logs.
I checked in tonight and found issues with azurefs_test.cc in the logs. So I have corrected those and updated the commit with the new modifications. |
Co-authored-by: Sutou Kouhei <[email protected]>
Yeah, funny thing that. During my first submission attempts, I encountered errors that complained about the fact that there were two spaces after the closing brace and prior to the namespace comments. Therefore I decided to delete one of those spaces on each line. After doing that, we see this error. It doesn't seem to like either implementation. I will however see what this formatter generates and then cross fingers. Thanks |
Updated ~cpp/src/arrow/filesystem/azurefs_test.cc to include an additional test for deleting files and cleaned the spacing on the last two lines in the file.
So the formatter changed the last two lines to include two spaces. And so I have resubmitted with the new test as recommended by kou, as well as this formatting change. We'll see what the checks produce. |
Updated the new regression test.
CreateFile(fs(), "abc", "data"); | ||
arrow::fs::AssertFileInfo(fs(), "abc", FileType::File); | ||
ASSERT_OK(fs()->DeleteFile("abc")); | ||
arrow::fs::AssertFileInfo(fs(), "abc", FileType::NotFound); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test failed:
[ RUN ] TestAzuriteFileSystem.DeleteFileSuccess
/arrow/cpp/src/arrow/filesystem/test_util.cc:120: Failure
Failed
'_error_or_value24.status()' failed with IOError: Not a regular file: 'abc'
/arrow/cpp/src/arrow/filesystem/test_util.cc:143: Failure
Expected equality of these values:
info.type()
Which is: FileType::NotFound
type
Which is: FileType::File
For path 'abc'
/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1388: Failure
Failed
'fs()->DeleteFile("abc")' failed with IOError: Not a regular file: 'abc'
Could you run tests on your local environment? If you're on Linux, the following will work (you also need to install Azurite by sudo npm install -g azurite
):
cd CPP_BUILD_DIRECTORY
cmake --build .
LD_LIBRARY_PATH=$PWD/debug debug/arrow-azurefs-test
(Or you can use Docker instead.)
I think that we need to do something like the following:
const auto container_name = PreexistingData::RandomContainerName(rng_);
ASSERT_OK(fs()->CreateDir(container_name));
const auto file_name = ConcatAbstractPath(container_name, "abc");
CreateFile(fs(), file_name, "data");
arrow::fs::AssertFileInfo(fs(), file_name, FileType::File);
ASSERT_OK(fs()->DeleteFile(file_name));
arrow::fs::AssertFileInfo(fs(), file_name, FileType::NotFound);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also:
- https://arrow.apache.org/docs/developers/cpp/development.html#running-unit-tests and related documents
- https://arrow.apache.org/docs/developers/continuous_integration/docker.html to run
UBUNTU=22.04 archery docker run ubuntu-cpp-sanitizer
that is used for the "AMD64 Ubuntu 22.04 C++ ASAN UBSAN" CI job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I have incorporated that suggestion into the DeleteFileSuccess regression test. Thanks!
I did not see this failure now, but I will re-run again tomorrow to verify that I ran the tests correctly.
} | ||
|
||
TEST_F(TestAzuriteFileSystem, DeleteFileFailureNonexistent) { | ||
ASSERT_RAISES(IOError, fs()->DeleteFile("nonexistent")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deletes a container not a blob. Could you create a container and delete an nonexistent blob in the container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I have changed the test to the following:
TEST_F(TestAzuriteFileSystem, DeleteFileFailureNonexistent) {
const auto container_name = PreexistingData::RandomContainerName(rng_);
auto container_client = CreateContainer(container_name);
const auto blob_path = ConcatAbstractPath(container_name, "someblob");
ASSERT_RAISES(IOError, fs()->DeleteFile(blob_path));
}
I also updated the PR description. Could you check the change? |
We need to use |
Modified regression test per suggestion
Modified the deletion of a non-existent file regression test.
Good update, thanks! |
OK, I'll take a look, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Co-authored-by: Sutou Kouhei <[email protected]>
kou, I have applied your suggestion to azurefs_test.cc and taken a look at the logs. I didn't see any failures related to the commits that I have proposed. Awaiting further word. Thanks! |
Thanks! |
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit fa26fa5. There were 4 benchmark results indicating a performance regression:
The full Conbench report has more details. |
### Rationale for this change `DeleteFile()` API isn't implemented yet. ### What changes are included in this PR? Implement `DeleteFile()` by the "Delete Blob" API: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob ### Are these changes tested? I tested the modification by creating a file via the web browser on our internal ADLS, then ran a sample program that deleted the file. I added three regression tests to cover the use case scenarios of: * A valid delete attempt, where "valid" means that the file exists and is indeed a file * An intentional failure where a file delete is attempted, but the file does not exist * An intentional failure where a file delete is attempted, but the target is a container * An intentional failure where a file delete is attempted, but the target is a directory ### Are there any user-facing changes? Yes. * Closes: apache#38703 Lead-authored-by: av8or1 <[email protected]> Co-authored-by: jerry.adair <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change `DeleteFile()` API isn't implemented yet. ### What changes are included in this PR? Implement `DeleteFile()` by the "Delete Blob" API: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob ### Are these changes tested? I tested the modification by creating a file via the web browser on our internal ADLS, then ran a sample program that deleted the file. I added three regression tests to cover the use case scenarios of: * A valid delete attempt, where "valid" means that the file exists and is indeed a file * An intentional failure where a file delete is attempted, but the file does not exist * An intentional failure where a file delete is attempted, but the target is a container * An intentional failure where a file delete is attempted, but the target is a directory ### Are there any user-facing changes? Yes. * Closes: apache#38703 Lead-authored-by: av8or1 <[email protected]> Co-authored-by: jerry.adair <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change `DeleteFile()` API isn't implemented yet. ### What changes are included in this PR? Implement `DeleteFile()` by the "Delete Blob" API: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob ### Are these changes tested? I tested the modification by creating a file via the web browser on our internal ADLS, then ran a sample program that deleted the file. I added three regression tests to cover the use case scenarios of: * A valid delete attempt, where "valid" means that the file exists and is indeed a file * An intentional failure where a file delete is attempted, but the file does not exist * An intentional failure where a file delete is attempted, but the target is a container * An intentional failure where a file delete is attempted, but the target is a directory ### Are there any user-facing changes? Yes. * Closes: apache#38703 Lead-authored-by: av8or1 <[email protected]> Co-authored-by: jerry.adair <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
Rationale for this change
DeleteFile()
API isn't implemented yet.What changes are included in this PR?
Implement
DeleteFile()
by the "Delete Blob" API: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blobAre these changes tested?
I tested the modification by creating a file via the web browser on our internal ADLS, then ran a sample program that deleted the file. I added three regression tests to cover the use case scenarios of:
Are there any user-facing changes?
Yes.
DeleteFile()
#38703