Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38703: [C++][FS][Azure] Implement DeleteFile() #39840

Merged
merged 29 commits into from
Feb 8, 2024
Merged

GH-38703: [C++][FS][Azure] Implement DeleteFile() #39840

merged 29 commits into from
Feb 8, 2024

Conversation

av8or1
Copy link
Contributor

@av8or1 av8or1 commented Jan 29, 2024

Rationale for this change

DeleteFile() API isn't implemented yet.

What changes are included in this PR?

Implement DeleteFile() by the "Delete Blob" API: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob

Are these changes tested?

I tested the modification by creating a file via the web browser on our internal ADLS, then ran a sample program that deleted the file. I added three regression tests to cover the use case scenarios of:

  • A valid delete attempt, where "valid" means that the file exists and is indeed a file
  • An intentional failure where a file delete is attempted, but the file does not exist
  • An intentional failure where a file delete is attempted, but the target is a container
  • An intentional failure where a file delete is attempted, but the target is a directory

Are there any user-facing changes?

Yes.

Modifications to support the deletion of a file.
@av8or1
Copy link
Contributor Author

av8or1 commented Jan 30, 2024

OK, I took a look at the two checks that had failures. I didn't see anything in the Python check that applied to my push, but I did find the following in the Dev / Lint:

cd /tmp/arrow-lint-p9tofs7a/cpp-build && /usr/local/bin/python /arrow/cpp/build-support/run_cpplint.py --cpplint_binary /arrow/cpp/build-support/cpplint.py --exclude_globs /arrow/cpp/build-support/lint_exclusions.txt --source_dir /arrow/cpp/src --source_dir /arrow/cpp/examples --source_dir /arrow/cpp/tools --quiet
/arrow/cpp/src/arrow/filesystem/azurefs.cc:1693:  Weird number of spaces at line-start.  Are you using a 2-space indent?  [whitespace/indent] [3]
Done processing /arrow/cpp/src/arrow/filesystem/azurefs.cc
Total errors found: 1

/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1391:  Lines should be <= 90 characters long  [whitespace/line_length] [2]
/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1399:  Lines should be <= 90 characters long  [whitespace/line_length] [2]
/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1899:  Redundant blank line at the end of a code block should be deleted.  [whitespace/blank_line] [3]
/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1901:  Namespace should be terminated with "// namespace fs"  [readability/namespace] [5]
/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:61:  Failed to find complete declaration of namespace arrow  [build/namespaces] [5]
Done processing /arrow/cpp/src/arrow/filesystem/azurefs_test.cc
Total errors found: 5

ninja: build stopped: subcommand failed.

I have made corrections for these and will update. I haven't done that in a Github context, but I'll figure it out. I'll post again when that is successful. Thanks!

av8or1 and others added 3 commits January 30, 2024 14:25
Modifications to support the deletion of a file.
Updated azurefs.cc and azurefs_test.cc to address
Lint issues.
…ete-file-branch

GH-38703: [C++][FS][Azure] Implement DeleteFile() #39840
@av8or1
Copy link
Contributor Author

av8or1 commented Jan 30, 2024

Alright, I have committed the changes that I mentioned earlier today. These changes address the spacing issue for the DeleteFile() method, the line length issue in the regression tests that I wrote, as well as some housekeeping that was located at the end of the azurefs_test.cc file that I did not write (there was an extra space between the closing brace of a namespace and the beginning of the commentary to note the end of that namespace ... I deleted the extra space as well as the extra blank line above them, per the Linter's feedback). Now waiting for further word regarding the contribution. Thanks!

@av8or1
Copy link
Contributor Author

av8or1 commented Jan 30, 2024

The CI AppVeyor states that there was a failure, but I saw the following in the "Details":

2620 100% tests passed, 0 tests failed out of 110

Thus it's not clear where the issue lies here. I'll defer to kou for felipecrv for guidance.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix style?
https://github.com/apache/arrow/actions/runs/7716487109/job/21045075365?pr=39840#step:5:795

cmake --build BUILD_DIRECTORY format will fix style.

Comment on lines 1694 to 1696
if (location.path.empty()) {
return Status::Invalid("Cannot delete an empty path");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use ValidateFileLocation() instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done. Those three lines have been replaced with:
RETURN_NOT_OK(ValidateFileLocation(location));

Comment on lines 1390 to 1397
void CreateFile(FileSystem* fs, const std::string& path, const std::string& data) {
const auto file_name = PreexistingData::RandomFileName() +
PreexistingData::RandomFileNameExtension();
ASSERT_OK(CreateFile(fs_.get(), file_name, "abc"));
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::File);
ASSERT_OK(fs_->DeleteFile(file_name));
arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::NotFound);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Is this a valid test? (Why do you define a function in TEST_F() {...}?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not certain that I understand the question. The idea behind this test was to create a file to delete, then delete it. Thus the pseudo-code was:

  1. Create a filename
  2. Create the file and verify that the file creation was successful
  3. Verify that the file type was a file (could be removed if considered redundant)
  4. Delete the file and verify that the delete attempt did not assert
  5. Verify that the file is not found (because it has been deleted)
    As for the functions RandomFileName() and RandomFileNameExtension(), I created those to clean the code a bit and placed them with the RandomChars() method. This is just a choice, to abstract the implementation details into a function. It doesn't have to be that way. The content of those two methods could be placed inline at lines 1391 and 1392 and these two methods deleted if that is the preferred manner of going about it. Open to suggestion/guidance/feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the choice to create the RandomFileName() and RandomFileNameExtension() methods was done in part because I prefer abstractions (generally) and also because there was a similar abstraction for the container name, which is found at line 369, thus:
static std::string RandomContainerName(RNG& rng) { return RandomChars(32, rng); }
And so I decided to follow suit by creating abstractions for file names, just as there was one for container names. Again, I'll pull those back inline since that seems to be the preference.

Copy link
Contributor Author

@av8or1 av8or1 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, now I understand why kou asked why I defined a function in TEST_F() {...} . The short answer is that this was a lack of deletion error on my part. I had copied the definition of the CreateFile() method temporarily while I wrote the test to ensure that I wrote the invocation parms correctly. Then I simply forgot to delete the reference. Apologies. I have deleted that from the most recent version. Good catch, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, due to the structure of how the RandomChars() function is written within the PreexistingData class, I'll have to leave the random-name functionality that I need within this class. However, I'll take kou's suggestion and combine the two methods, as well as changing the name to blob.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we create random containers is to not have test runs interfere with later runs when the tests crash and containers can't be deleted from the storage account. The blobs inside the containers don't have to be random. Every test case is creating a new random container that you can freely add named blobs to.

arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::NotFound);
}

TEST_F(AzuriteFileSystemTest, DeleteFileSuccessNonexistent) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Success -> Failure?

Comment on lines 379 to 381
std::string RandomFileName(RNG &rng) { return RandomChars(10, rng); }

std::string RandomFileNameExtension(RNG &rng) { return RandomChars(3, rng); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to define separated functions?
Can RandomFileName() generate XXXXXXXXXX.ZZZ?

Copy link
Contributor Author

@av8or1 av8or1 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it generates the filename only, sans extension. I suppose that it could be modified to generate XXXXXXX.ZZZ however, if that would be preferred. See commentary above. I could just place everything inline and delete these functions if that is the route that is recommended. I'm open to suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined.

@@ -376,6 +376,10 @@ culpa qui officia deserunt mollit anim id est laborum.
return s;
}

std::string RandomFileName(RNG &rng) { return RandomChars(10, rng); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use "blob" not "file" because Azure Blob Storage uses "blob"?

Copy link
Contributor Author

@av8or1 av8or1 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, yes. Note that my use case will be to access ADLS, which is why I referred to this as a file. Also because the method is named DeleteFile(), not DeleteBlob().

arrow::fs::AssertFileInfo(fs_.get(), file_name, FileType::NotFound);
}

TEST_F(AzuriteFileSystemTest, DeleteFileSuccessNotAFile) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Comment on lines 1411 to 1412
ASSERT_OK(fs_->DeleteDir(container_name));
arrow::fs::AssertFileInfo(fs_.get(), container_name, FileType::NotFound);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need them for DeleteFile() test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point to this test was to attempt to delete a target that was not a file. So I created a directory (container), then attempted to delete it, which should fail, which I verify via:
ASSERT_RAISES(IOError, fs_->DeleteFile(container_name));
At line 1410. Then I clean up the mess by deleting the directory (container) via lines 1411 and 1412 (which is just a verification that the directory (container) was deleted).
If there is a different approach to this type of test that would be preferred, let me know.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jan 31, 2024
@av8or1
Copy link
Contributor Author

av8or1 commented Jan 31, 2024

Could you fix style? https://github.com/apache/arrow/actions/runs/7716487109/job/21045075365?pr=39840#step:5:795

cmake --build BUILD_DIRECTORY format will fix style.

Sure, I have made the modifications recommended in that log (clang format). Specifically those found at lines 803-804 and 815-816.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 31, 2024
@av8or1
Copy link
Contributor Author

av8or1 commented Jan 31, 2024

Alright, I have updated both azurefs.cc and azurefs_test.cc to include the modifications based on the feedback from kou. Thank you for the guidance. I await further word.

av8or1 and others added 2 commits February 1, 2024 03:19
@av8or1
Copy link
Contributor Author

av8or1 commented Feb 1, 2024

I checked in tonight and found issues with azurefs_test.cc in the logs. So I have corrected those and updated the commit with the new modifications.

cpp/src/arrow/filesystem/azurefs_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs_test.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 1, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 1, 2024
@av8or1
Copy link
Contributor Author

av8or1 commented Feb 2, 2024

Could you run formatter?

https://github.com/apache/arrow/actions/runs/7744737025/job/21135532299?pr=39840#step:5:3012

--- /arrow/cpp/src/arrow/filesystem/azurefs_test.cc
+++ /arrow/cpp/src/arrow/filesystem/azurefs_test.cc (after clang format)
@@ -1887,5 +1887,5 @@
   ASSERT_RAISES(Invalid, stream->ReadAt(1, 1));
   ASSERT_RAISES(Invalid, stream->Seek(2));
 }
-} // namespace fs
-} // namespace arrow
+}  // namespace fs
+}  // namespace arrow
/arrow/cpp/src/arrow/filesystem/azurefs_test.cc had clang-format style issues

Yeah, funny thing that. During my first submission attempts, I encountered errors that complained about the fact that there were two spaces after the closing brace and prior to the namespace comments. Therefore I decided to delete one of those spaces on each line. After doing that, we see this error. It doesn't seem to like either implementation. I will however see what this formatter generates and then cross fingers. Thanks

Updated ~cpp/src/arrow/filesystem/azurefs_test.cc to include
an additional test for deleting files and cleaned the spacing
on the last two lines in the file.
@av8or1
Copy link
Contributor Author

av8or1 commented Feb 2, 2024

So the formatter changed the last two lines to include two spaces. And so I have resubmitted with the new test as recommended by kou, as well as this formatting change. We'll see what the checks produce.

Comment on lines 1386 to 1389
CreateFile(fs(), "abc", "data");
arrow::fs::AssertFileInfo(fs(), "abc", FileType::File);
ASSERT_OK(fs()->DeleteFile("abc"));
arrow::fs::AssertFileInfo(fs(), "abc", FileType::NotFound);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test failed:

https://github.com/apache/arrow/actions/runs/7761414467/job/21171628885?pr=39840#step:6:3519

[ RUN      ] TestAzuriteFileSystem.DeleteFileSuccess
/arrow/cpp/src/arrow/filesystem/test_util.cc:120: Failure
Failed
'_error_or_value24.status()' failed with IOError: Not a regular file: 'abc'
/arrow/cpp/src/arrow/filesystem/test_util.cc:143: Failure
Expected equality of these values:
  info.type()
    Which is: FileType::NotFound
  type
    Which is: FileType::File
For path 'abc'
/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1388: Failure
Failed
'fs()->DeleteFile("abc")' failed with IOError: Not a regular file: 'abc'

Could you run tests on your local environment? If you're on Linux, the following will work (you also need to install Azurite by sudo npm install -g azurite):

cd CPP_BUILD_DIRECTORY
cmake --build .
LD_LIBRARY_PATH=$PWD/debug debug/arrow-azurefs-test

(Or you can use Docker instead.)

I think that we need to do something like the following:

const auto container_name = PreexistingData::RandomContainerName(rng_);
ASSERT_OK(fs()->CreateDir(container_name));
const auto file_name = ConcatAbstractPath(container_name, "abc");
CreateFile(fs(), file_name, "data");
arrow::fs::AssertFileInfo(fs(), file_name, FileType::File);
ASSERT_OK(fs()->DeleteFile(file_name));
arrow::fs::AssertFileInfo(fs(), file_name, FileType::NotFound);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I have incorporated that suggestion into the DeleteFileSuccess regression test. Thanks!
I did not see this failure now, but I will re-run again tomorrow to verify that I ran the tests correctly.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 2, 2024
}

TEST_F(TestAzuriteFileSystem, DeleteFileFailureNonexistent) {
ASSERT_RAISES(IOError, fs()->DeleteFile("nonexistent"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deletes a container not a blob. Could you create a container and delete an nonexistent blob in the container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I have changed the test to the following:

TEST_F(TestAzuriteFileSystem, DeleteFileFailureNonexistent) {
  const auto container_name = PreexistingData::RandomContainerName(rng_);
  auto container_client = CreateContainer(container_name);
  const auto blob_path = ConcatAbstractPath(container_name, "someblob");
  ASSERT_RAISES(IOError, fs()->DeleteFile(blob_path));
}

@kou
Copy link
Member

kou commented Feb 2, 2024

I also updated the PR description. Could you check the change?

@kou
Copy link
Member

kou commented Feb 2, 2024

We need to use clang-format-14 for formatter.
See also: https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 7, 2024
Modified regression test per suggestion
Modified the deletion of a non-existent file regression test.
@av8or1
Copy link
Contributor Author

av8or1 commented Feb 7, 2024

I also updated the PR description. Could you check the change?

Good update, thanks!

@av8or1
Copy link
Contributor Author

av8or1 commented Feb 7, 2024

We need to use clang-format-14 for formatter. See also: https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci

OK, I'll take a look, thanks!

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

cpp/src/arrow/filesystem/azurefs_test.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Feb 8, 2024
@av8or1
Copy link
Contributor Author

av8or1 commented Feb 8, 2024

kou, I have applied your suggestion to azurefs_test.cc and taken a look at the logs. I didn't see any failures related to the commits that I have proposed. Awaiting further word. Thanks!

@kou kou merged commit fa26fa5 into apache:main Feb 8, 2024
33 of 35 checks passed
@kou kou removed the awaiting merge Awaiting merge label Feb 8, 2024
@kou
Copy link
Member

kou commented Feb 8, 2024

Thanks!

Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit fa26fa5.

There were 4 benchmark results indicating a performance regression:

The full Conbench report has more details.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
### Rationale for this change

`DeleteFile()` API isn't implemented yet.

### What changes are included in this PR?

Implement `DeleteFile()` by the "Delete Blob" API: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob

### Are these changes tested?

I tested the modification by creating a file via the web browser on our internal ADLS, then ran a sample program that deleted the file. I added three regression tests to cover the use case scenarios of:

* A valid delete attempt, where "valid" means that the file exists and is indeed a file
* An intentional failure where a file delete is attempted, but the file does not exist
* An intentional failure where a file delete is attempted, but the target is a container
* An intentional failure where a file delete is attempted, but the target is a directory

### Are there any user-facing changes?

Yes.

* Closes: apache#38703

Lead-authored-by: av8or1 <[email protected]>
Co-authored-by: jerry.adair <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
### Rationale for this change

`DeleteFile()` API isn't implemented yet.

### What changes are included in this PR?

Implement `DeleteFile()` by the "Delete Blob" API: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob

### Are these changes tested?

I tested the modification by creating a file via the web browser on our internal ADLS, then ran a sample program that deleted the file. I added three regression tests to cover the use case scenarios of:

* A valid delete attempt, where "valid" means that the file exists and is indeed a file
* An intentional failure where a file delete is attempted, but the file does not exist
* An intentional failure where a file delete is attempted, but the target is a container
* An intentional failure where a file delete is attempted, but the target is a directory

### Are there any user-facing changes?

Yes.

* Closes: apache#38703

Lead-authored-by: av8or1 <[email protected]>
Co-authored-by: jerry.adair <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024
### Rationale for this change

`DeleteFile()` API isn't implemented yet.

### What changes are included in this PR?

Implement `DeleteFile()` by the "Delete Blob" API: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob

### Are these changes tested?

I tested the modification by creating a file via the web browser on our internal ADLS, then ran a sample program that deleted the file. I added three regression tests to cover the use case scenarios of:

* A valid delete attempt, where "valid" means that the file exists and is indeed a file
* An intentional failure where a file delete is attempted, but the file does not exist
* An intentional failure where a file delete is attempted, but the target is a container
* An intentional failure where a file delete is attempted, but the target is a directory

### Are there any user-facing changes?

Yes.

* Closes: apache#38703

Lead-authored-by: av8or1 <[email protected]>
Co-authored-by: jerry.adair <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][FS][Azure] Implement DeleteFile()
3 participants