GH-39069: [C++][FS][Azure] Use the generic filesystem tests #40567

Merged (10 commits) on Apr 1, 2024

kou (Member) commented Mar 15, 2024

Rationale for this change

We should provide a common spec for all filesystem APIs.

What changes are included in this PR?

Enable the generic filesystem tests.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

⚠️ GitHub issue #39069 has been automatically assigned in GitHub to PR creator.

kou (Member Author) commented Mar 15, 2024

We need to fix some methods:

https://github.com/apache/arrow/actions/runs/8292331910/job/22693446087?pr=40567#step:6:6004

[----------] 26 tests from TestAzureFileSystemGeneric
...
[ RUN      ] TestAzureFileSystemGeneric.CreateDir
/arrow/cpp/src/arrow/filesystem/test_util.cc:244: Failure
Failed
Expected 'fs->CreateDir("AB/def/EF/GH", true )' to fail with IOError, but got OK
[  FAILED  ] TestAzureFileSystemGeneric.CreateDir (107 ms)
[ RUN      ] TestAzureFileSystemGeneric.DeleteDir
/arrow/cpp/src/arrow/filesystem/test_util.cc:77: Failure
Expected equality of these values:
  paths
    Which is: { "AB" }
  expected_paths
    Which is: { "AB", "AB/GH" }
Errors while running CTest
/arrow/cpp/src/arrow/filesystem/test_util.cc:77: Failure
Expected equality of these values:
  paths
    Which is: { "AB" }
  expected_paths
    Which is: { "AB", "AB/GH" }
/arrow/cpp/src/arrow/filesystem/test_util.cc:77: Failure
Expected equality of these values:
  paths
    Which is: { "AB" }
  expected_paths
    Which is: { "AB", "AB/GH" }
[  FAILED  ] TestAzureFileSystemGeneric.DeleteDir (153 ms)
[ RUN      ] TestAzureFileSystemGeneric.DeleteDirContents
/arrow/cpp/src/arrow/filesystem/test_util.cc:307: Failure
Failed
Expected 'fs->DeleteDirContents("abc", true)' to fail with IOError, but got OK
[  FAILED  ] TestAzureFileSystemGeneric.DeleteDirContents (149 ms)
...
[ RUN      ] TestAzureFileSystemGeneric.MoveFile
/arrow/cpp/src/arrow/filesystem/test_util.cc:396: Failure
Failed
'fs->Move("abc", "def")' failed with NotImplemented: FileSystem::Move() is not implemented for Azure Storage accounts without Hierarchical Namespace support (see arrow/issues/40405).
[  FAILED  ] TestAzureFileSystemGeneric.MoveFile (34 ms)
...
[ RUN      ] TestAzureFileSystemGeneric.CopyFile
/arrow/cpp/src/arrow/filesystem/test_util.cc:570: Failure
Failed
Expected 'fs->CopyFile("AB/abc", "def/mno")' to fail with IOError, but got OK
[  FAILED  ] TestAzureFileSystemGeneric.CopyFile (164 ms)
...
[ RUN      ] TestAzureFileSystemGeneric.OpenOutputStream
/arrow/cpp/src/arrow/filesystem/test_util.cc:922: Failure
Expected equality of these values:
  "x-arrow/filesystem-test"
  _actual
    Which is: "application/octet-stream"
[  FAILED  ] TestAzureFileSystemGeneric.OpenOutputStream (71 ms)
...
[----------] 26 tests from TestAzureFileSystemGeneric (2670 ms total)

kou (Member Author) commented Mar 15, 2024

I've changed the current implementation to pass the generic filesystem tests but it's still dirty...

if (!recursive) {
if (recursive) {
std::vector<AzureLocation> target_locations;
// Recursive CreateDir calls require there is no file in parents.

Contributor:

clearer wording:

// Recursive CreateDir calls require that all path segments be either a directory or not found

Member Author:

Thanks! Applied!

github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label on Mar 17, 2024

if (info.type() == FileType::NotFound) {
target_locations.push_back(parent);
}
parent = parent.parent();

Contributor:

A better name for the parent variable is prefix: a "prefix" can be the entire location, whereas starting the loop with parent = location; reads oddly.

Member Author:

OK. I've changed it to prefix.
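
The prefix walk discussed in this thread can be sketched in isolation. This is a minimal illustration, not the code from the PR: EntryType, FakeStore, Lookup, and CollectMissingPrefixes are hypothetical names, and an in-memory map stands in for the Azure container. It assumes, as the failing CreateDir assertion in the first log suggests, that "AB/def" is a file when CreateDir("AB/def/EF/GH", true) is called.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a file type enum; not Arrow's actual FileType.
enum class EntryType { NotFound, File, Directory };

// Hypothetical in-memory store mapping a path to its entry type.
using FakeStore = std::map<std::string, EntryType>;

EntryType Lookup(const FakeStore& store, const std::string& path) {
  auto it = store.find(path);
  return it == store.end() ? EntryType::NotFound : it->second;
}

// Walks from the full path up to its first segment. Appends every prefix that
// does not exist yet to `missing` (deepest first) and returns false as soon as
// a prefix turns out to be a file.
bool CollectMissingPrefixes(const FakeStore& store, const std::string& path,
                            std::vector<std::string>* missing) {
  std::string prefix = path;  // the first prefix is the entire path
  while (!prefix.empty()) {
    const EntryType type = Lookup(store, prefix);
    if (type == EntryType::File) {
      return false;  // a file blocks recursive creation
    }
    if (type == EntryType::NotFound) {
      missing->push_back(prefix);
    }
    const std::string::size_type slash = prefix.find_last_of('/');
    prefix = (slash == std::string::npos) ? std::string() : prefix.substr(0, slash);
  }
  return true;
}
```

With a store holding a directory "AB" and a file "AB/def", the walk for "AB/def/EF/GH" returns false, while the walk for "AB/CD/EF" succeeds and reports "AB/CD/EF" and "AB/CD" as the prefixes still to be created.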

Comment on lines +1840 to +1856
return ExceptionToStatus(exception, "Failed to create directory '",
location.all, "': ", container_client.GetUrl());

Contributor:

After the first iteration of this loop, an exception shouldn't report a failure of the entire operation: the full path creates all the prefixes (they are "implied directories"). All the other functions handle implied directories, so it should be OK to rely on them.
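
To illustrate what "implied directories" means in a flat namespace, here is a small sketch under stated assumptions. BlobNames and IsImpliedDirectory are hypothetical names, and a sorted set of blob names stands in for a container listing; the real implementation talks to Azure instead.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical flat-namespace container: nothing but a sorted set of blob names.
using BlobNames = std::set<std::string>;

// A directory is "implied" when at least one blob name starts with `dir + "/"`.
// lower_bound() lands on the first name >= the prefix; if any name carries the
// prefix, that first name does too.
bool IsImpliedDirectory(const BlobNames& blobs, const std::string& dir) {
  const std::string prefix = dir + "/";
  auto it = blobs.lower_bound(prefix);
  return it != blobs.end() && it->compare(0, prefix.size(), prefix) == 0;
}
```

With a single blob named "AB/CD/EF/data.bin", the directories "AB", "AB/CD", and "AB/CD/EF" are all implied without any of them having been created individually, which is why creating the deepest path first makes the remaining prefixes exist as a side effect.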

Comment on lines +171 to +172
// - Whether the filesystem allows moving a file
virtual bool allow_move_file() const { return true; }

Contributor:

We need a way to run these generic tests against the real Azure Data Lake environment.

Member Author:

I've added support for running with AzureFlatNSEnv and AzureHierarchicalNSEnv.
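
The capability hook quoted above lends itself to per-environment overrides. The following sketch only illustrates that pattern: GenericTestBase, FlatNamespaceTest, HierarchicalNamespaceTest, and EnabledMoveTests are hypothetical names, and allow_move_dir() is assumed by analogy with allow_move_file() and the "Filesystem doesn't allow moving directories" skip message. The flag values are inferred from the logs in this conversation (Move is not implemented without Hierarchical Namespace; MoveDir is skipped with it), not taken from the PR's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, trimmed-down base class; only the shape of the hooks mirrors the diff.
class GenericTestBase {
 public:
  virtual ~GenericTestBase() = default;
  virtual bool allow_move_file() const { return true; }
  virtual bool allow_move_dir() const { return true; }

  // Names of the move tests that would run rather than be skipped.
  std::vector<std::string> EnabledMoveTests() const {
    std::vector<std::string> names;
    if (allow_move_file()) names.push_back("MoveFile");
    if (allow_move_dir()) names.push_back("MoveDir");
    return names;
  }
};

// Assumed flags for an account without Hierarchical Namespace support.
class FlatNamespaceTest : public GenericTestBase {
 public:
  bool allow_move_file() const override { return false; }
  bool allow_move_dir() const override { return false; }
};

// Assumed flags for an account with Hierarchical Namespace support.
class HierarchicalNamespaceTest : public GenericTestBase {
 public:
  bool allow_move_dir() const override { return false; }
};
```

Keeping the flags as virtual functions with permissive defaults means a new environment only overrides what it cannot do, and the shared test bodies decide whether to run or skip.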

@kou kou force-pushed the cpp-azure-generic-test branch from 78e5a1a to d4dd4f1 Compare March 28, 2024 04:29
github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Mar 28, 2024
github-actions bot added the "awaiting changes" and "awaiting review" labels and removed the "awaiting change review", "awaiting review", and "awaiting changes" labels on Mar 28, 2024
kou (Member Author) commented Mar 28, 2024

The generic filesystem tests pass with Azurite but fail with an Azure account that has hierarchical namespace enabled:

[==========] Running 26 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 26 tests from TestAzureHierarchicalNSGeneric
[ RUN      ] TestAzureHierarchicalNSGeneric.Empty
[       OK ] TestAzureHierarchicalNSGeneric.Empty (227 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.NormalizePath
[       OK ] TestAzureHierarchicalNSGeneric.NormalizePath (38 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.CreateDir
[       OK ] TestAzureHierarchicalNSGeneric.CreateDir (1657 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.DeleteDir
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/test_util.cc:277: Failure
Failed
Expected 'fs->DeleteDir("AB/def")' to fail with IOError, but got OK
[  FAILED  ] TestAzureHierarchicalNSGeneric.DeleteDir (1434 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.DeleteDirContents
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/test_util.cc:306: Failure
Failed
Expected 'fs->DeleteDirContents("abc")' to fail with IOError, but got OK
[  FAILED  ] TestAzureHierarchicalNSGeneric.DeleteDirContents (1535 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.DeleteRootDirContents
[       OK ] TestAzureHierarchicalNSGeneric.DeleteRootDirContents (574 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.DeleteFile
[       OK ] TestAzureHierarchicalNSGeneric.DeleteFile (977 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.DeleteFiles
[       OK ] TestAzureHierarchicalNSGeneric.DeleteFiles (1240 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.MoveFile
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/azurefs.cc:1371: LeaseGuard::WaitUntilLatestKnownExpiryTime for 2794ms...
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/azurefs.cc:1371: LeaseGuard::WaitUntilLatestKnownExpiryTime for 2790ms...
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/azurefs.cc:1371: LeaseGuard::WaitUntilLatestKnownExpiryTime for 2774ms...
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/azurefs.cc:1371: LeaseGuard::WaitUntilLatestKnownExpiryTime for 2618ms...
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/azurefs.cc:1371: LeaseGuard::WaitUntilLatestKnownExpiryTime for 2734ms...
[       OK ] TestAzureHierarchicalNSGeneric.MoveFile (17419 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.MoveDir
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/test_util.cc:463: Skipped
Filesystem doesn't allow moving directories
[  SKIPPED ] TestAzureHierarchicalNSGeneric.MoveDir (50 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.CopyFile
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/test_util.cc:531: Failure
Failed
'fs->CopyFile("AB/abc", "def")' failed with IOError: Failed to copy a blob. (https://XXX.blob.core.windows.net/a2giu5a9ij1szn7tsx4l0z6194h5zqww/AB/abc -> https://clearcodearrow.blob.core.windows.net/a2giu5a9ij1szn7tsx4l0z6194h5zqww/def) Azure Error: [CannotVerifyCopySource] 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:2c08ee06-f01e-0062-12d2-807ca0000000
Time:2024-03-28T05:40:28.1650700Z
Request ID: 2c08ee06-f01e-0062-12d2-807ca0000000
[  FAILED  ] TestAzureHierarchicalNSGeneric.CopyFile (1007 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.GetFileInfo
[       OK ] TestAzureHierarchicalNSGeneric.GetFileInfo (800 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.GetFileInfoVector
[       OK ] TestAzureHierarchicalNSGeneric.GetFileInfoVector (686 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.GetFileInfoSelector
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/test_util.cc:149: Failure
Expected equality of these values:
  info.mtime()
    Which is: 8-byte object <00-8C 22-1E 29-D7 C0-17>
  mtime
    Which is: 8-byte object <FF-FF FF-FF FF-FF FF-FF>
For path 'AB'
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/test_util.cc:753: Failure
Failed
Expected 'fs->GetFileInfo(s)' to fail with IOError, but got OK
[  FAILED  ] TestAzureHierarchicalNSGeneric.GetFileInfoSelector (860 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.GetFileInfoSelectorWithRecursion
[       OK ] TestAzureHierarchicalNSGeneric.GetFileInfoSelectorWithRecursion (1626 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.GetFileInfoAsync
[       OK ] TestAzureHierarchicalNSGeneric.GetFileInfoAsync (683 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.GetFileInfoGenerator
[       OK ] TestAzureHierarchicalNSGeneric.GetFileInfoGenerator (771 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.OpenOutputStream
[       OK ] TestAzureHierarchicalNSGeneric.OpenOutputStream (779 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.OpenAppendStream
[       OK ] TestAzureHierarchicalNSGeneric.OpenAppendStream (496 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.OpenInputStream
[       OK ] TestAzureHierarchicalNSGeneric.OpenInputStream (543 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.OpenInputStreamWithFileInfo
[       OK ] TestAzureHierarchicalNSGeneric.OpenInputStreamWithFileInfo (543 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.OpenInputStreamAsync
[       OK ] TestAzureHierarchicalNSGeneric.OpenInputStreamAsync (419 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.OpenInputFile
[       OK ] TestAzureHierarchicalNSGeneric.OpenInputFile (509 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.OpenInputFileWithFileInfo
[       OK ] TestAzureHierarchicalNSGeneric.OpenInputFileWithFileInfo (549 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.OpenInputFileAsync
[       OK ] TestAzureHierarchicalNSGeneric.OpenInputFileAsync (394 ms)
[ RUN      ] TestAzureHierarchicalNSGeneric.SpecialChars
/home/kou/work/cpp/arrow.kou/cpp/src/arrow/filesystem/test_util.cc:1178: Failure
Failed
'fs->CopyFile("Blank Char/Special%Char.txt", "Special and%different.txt")' failed with IOError: Failed to copy a blob. (https://XXX.blob.core.windows.net/armiaz82i30wliew7324v0l421a4uhop/Blank%20Char/Special%25Char.txt -> https://XXX.blob.core.windows.net/armiaz82i30wliew7324v0l421a4uhop/Special%20and%25different.txt) Azure Error: [CannotVerifyCopySource] 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:dbfb99e1-801e-008c-4ad2-80d689000000
Time:2024-03-28T05:40:38.3406267Z
Request ID: dbfb99e1-801e-008c-4ad2-80d689000000
[  FAILED  ] TestAzureHierarchicalNSGeneric.SpecialChars (475 ms)
[----------] 26 tests from TestAzureHierarchicalNSGeneric (36306 ms total)

[----------] Global test environment tear-down
[==========] 26 tests from 1 test suite ran. (36306 ms total)
[  PASSED  ] 20 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] TestAzureHierarchicalNSGeneric.MoveDir
[  FAILED  ] 5 tests, listed below:
[  FAILED  ] TestAzureHierarchicalNSGeneric.DeleteDir
[  FAILED  ] TestAzureHierarchicalNSGeneric.DeleteDirContents
[  FAILED  ] TestAzureHierarchicalNSGeneric.CopyFile
[  FAILED  ] TestAzureHierarchicalNSGeneric.GetFileInfoSelector
[  FAILED  ] TestAzureHierarchicalNSGeneric.SpecialChars

 5 FAILED TESTS

Can we work on this as a separate task?

github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Mar 28, 2024
kou (Member Author) commented Mar 29, 2024

@felipecrv Do you want to review again before we merge this?

@kou kou merged commit 9e320d7 into apache:main Apr 1, 2024
36 checks passed
@kou kou deleted the cpp-azure-generic-test branch April 1, 2024 21:15
kou removed the "awaiting change review" label on Apr 1, 2024
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 9e320d7.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.

felipecrv (Contributor) commented

> @felipecrv Do you want to review again before we merge this?

Sorry for the delay in responding. No, this is fine. Thank you!

tolleybot pushed a commit to tmct/arrow that referenced this pull request May 2, 2024
…ache#40567)

### Rationale for this change

We should provide common spec for all filesystem API.

### What changes are included in this PR?

Enable the generic filesystem tests.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#39069

Authored-by: Sutou Kouhei
Signed-off-by: Sutou Kouhei
tolleybot pushed a commit to tmct/arrow that referenced this pull request May 4, 2024
rok pushed a commit to tmct/arrow that referenced this pull request May 8, 2024
rok pushed a commit to tmct/arrow that referenced this pull request May 8, 2024
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024