-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-29847: [C++] Build with Azure SDK for C++ #36835
Conversation
|
4e3e7a5
to
d0f5b65
Compare
d0f5b65
to
8915352
Compare
8915352
to
9d14edd
Compare
…C++ filesystem (#36988) ### Rationale for this change We need to write tests for #18014. azurite is like a fake Azure blob storage so it can be used to write integration tests ### What changes are included in this PR? Extract the `azurite` related changes from #12914 to create a smaller PR that's easier to review. I have made very minimal changes compared to that PR. Currently `azurite` is configured for all the environments where `ARROW_AZURE` was enabled by #35701. I assume its deliberate that its not enabled yet for windows, alpine, conda, debian or fedora builds. ### Are these changes tested? Its tested by there aren't really any good tests in this PR. I used this `azurite` config in #36835 to make an integration test that uses the Azure C++ SDK. On its own we can't really write tests for this `azurite` setup PR. ### Are there any user-facing changes? No * Closes: #36886 Lead-authored-by: Thomas Newton <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
9d14edd
to
6ec7910
Compare
8aa551a
to
2560caf
Compare
So what do we think is the best option today? Is disabling |
I'll provide a patch for |
This is an ad-hoc patch but this will work. We need a real improvement later. diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 1dfaf71b4..9ff91f978 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -5082,6 +5082,13 @@ function(build_azure_sdk)
set(CMAKE_EXPORT_NO_PACKAGE_REGISTRY TRUE)
set(DISABLE_AZURE_CORE_OPENTELEMETRY TRUE)
set(ENV{AZURE_SDK_DISABLE_AUTO_VCPKG} TRUE)
+ if(MSVC)
+ string(REPLACE "/WX" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+ string(REPLACE "/WX" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
+ else()
+ string(REPLACE "-Werror" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+ string(REPLACE "-Werror" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
+ endif()
fetchcontent_makeavailable(azure_sdk)
set(AZURE_SDK_VENDORED
TRUE |
@github-actions crossbow submit -g cpp |
Revision: e3fb84c Submitted crossbow builds: ursacomputing/crossbow @ actions-ea66ccd89b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I'll merge this tomorrow if nobody objects it.
Thanks for your help on this @kou. I feel kind of bad about how much work I created for you during review, with my lack of C++ experience. Hopefully the native Azure support is worth it 🙂. |
Thank you for this PR @Tom-Newton! |
Don't worry. :-) |
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3cb4481. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…Azure C++ filesystem (apache#36988) ### Rationale for this change We need to write tests for apache#18014. azurite is like a fake Azure blob storage so it can be used to write integration tests ### What changes are included in this PR? Extract the `azurite` related changes from apache#12914 to create a smaller PR that's easier to review. I have made very minimal changes compared to that PR. Currently `azurite` is configured for all the environments where `ARROW_AZURE` was enabled by apache#35701. I assume its deliberate that its not enabled yet for windows, alpine, conda, debian or fedora builds. ### Are these changes tested? Its tested by there aren't really any good tests in this PR. I used this `azurite` config in apache#36835 to make an integration test that uses the Azure C++ SDK. On its own we can't really write tests for this `azurite` setup PR. ### Are there any user-facing changes? No * Closes: apache#36886 Lead-authored-by: Thomas Newton <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change We want to use the Azure SDK for C++ to read/write to Azure blob storage. Obviously this is pretty important for building an `AzureFileSystem`. ### What changes are included in this PR? Builds the the relevant parts of the azure SDK as a cmake external project. Adds a couple of simple tests that just assert that the Azure SDK is working and a couple of lines in `AzureFileSystem` to initialise the blob storage client to ensure the build is working correctly in all environments. I started with the build setup from apache#12914 but I did make few changes. 1. Although its atypical for this project we chose to switch from cmake's `ExternalProject` to `FetchContent`. `FetchContent` is recomended by the Azure docs https://github.com/Azure/azure-sdk-for-cpp#cmake-project--fetch-content. It also solves a few problems including: automatically linking system curl and ssl instead of bootstrapping vcpkg and installing curl and ssl from there. 2. Only build one version of the Azure SDK for C++ because it contains all the components. Previously we were unnecessarily building 5 different versions of the whole thing on top of each other. This created race conditions for which version each component came from. 3. We are using `azure-core_1.10.2` which is a very recent version. There are a couple of important reasons for this 1. [an important managed identity fix](Azure/azure-sdk-for-cpp#4723), 2. [fixed support for curl versions < 7.71.0](Azure/azure-sdk-for-cpp#4792). There will be follow up PRs to enable Azure in the manylinux builds. We need to update `vcpkg` first so we can get a version of the Azure SDK which contains [an important managed identity fix](Azure/azure-sdk-for-cpp#4723). ### Are these changes tested? Yes. There is a simple test that just runs the Azure client against azurite. Additionally just initialising the client in `AzureFileSystem` goes a long way towards ensuring the build is working. ### Are there any user-facing changes? No * Closes: apache#29847 Lead-authored-by: Thomas Newton <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Co-authored-by: shefali singh <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change We want to use the Azure SDK for C++ to read/write to Azure blob storage. Obviously this is pretty important for building an `AzureFileSystem`. ### What changes are included in this PR? Builds the the relevant parts of the azure SDK as a cmake external project. Adds a couple of simple tests that just assert that the Azure SDK is working and a couple of lines in `AzureFileSystem` to initialise the blob storage client to ensure the build is working correctly in all environments. I started with the build setup from apache#12914 but I did make few changes. 1. Although its atypical for this project we chose to switch from cmake's `ExternalProject` to `FetchContent`. `FetchContent` is recomended by the Azure docs https://github.com/Azure/azure-sdk-for-cpp#cmake-project--fetch-content. It also solves a few problems including: automatically linking system curl and ssl instead of bootstrapping vcpkg and installing curl and ssl from there. 2. Only build one version of the Azure SDK for C++ because it contains all the components. Previously we were unnecessarily building 5 different versions of the whole thing on top of each other. This created race conditions for which version each component came from. 3. We are using `azure-core_1.10.2` which is a very recent version. There are a couple of important reasons for this 1. [an important managed identity fix](Azure/azure-sdk-for-cpp#4723), 2. [fixed support for curl versions < 7.71.0](Azure/azure-sdk-for-cpp#4792). There will be follow up PRs to enable Azure in the manylinux builds. We need to update `vcpkg` first so we can get a version of the Azure SDK which contains [an important managed identity fix](Azure/azure-sdk-for-cpp#4723). ### Are these changes tested? Yes. There is a simple test that just runs the Azure client against azurite. Additionally just initialising the client in `AzureFileSystem` goes a long way towards ensuring the build is working. ### Are there any user-facing changes? No * Closes: apache#29847 Lead-authored-by: Thomas Newton <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Co-authored-by: shefali singh <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
Rationale for this change
We want to use the Azure SDK for C++ to read/write to Azure blob storage. Obviously this is pretty important for building an
AzureFileSystem
.What changes are included in this PR?
Builds the the relevant parts of the azure SDK as a cmake external project. Adds a couple of simple tests that just assert that the Azure SDK is working and a couple of lines in
AzureFileSystem
to initialise the blob storage client to ensure the build is working correctly in all environments.I started with the build setup from #12914 but I did make few changes.
ExternalProject
toFetchContent
.FetchContent
is recomended by the Azure docs https://github.com/Azure/azure-sdk-for-cpp#cmake-project--fetch-content. It also solves a few problems including: automatically linking system curl and ssl instead of bootstrapping vcpkg and installing curl and ssl from there.azure-core_1.10.2
which is a very recent version. There are a couple of important reasons for this 1. an important managed identity fix, 2. fixed support for curl versions < 7.71.0.There will be follow up PRs to enable Azure in the manylinux builds. We need to update
vcpkg
first so we can get a version of the Azure SDK which contains an important managed identity fix.Are these changes tested?
Yes. There is a simple test that just runs the Azure client against azurite. Additionally just initialising the client in
AzureFileSystem
goes a long way towards ensuring the build is working.Are there any user-facing changes?
No