Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-29847: [C++] Build with Azure SDK for C++ #36835

Merged
merged 93 commits into from
Aug 30, 2023

Conversation

Tom-Newton
Copy link
Contributor

@Tom-Newton Tom-Newton commented Jul 24, 2023

Rationale for this change

We want to use the Azure SDK for C++ to read/write to Azure blob storage. Obviously this is pretty important for building an AzureFileSystem.

What changes are included in this PR?

Builds the the relevant parts of the azure SDK as a cmake external project. Adds a couple of simple tests that just assert that the Azure SDK is working and a couple of lines in AzureFileSystem to initialise the blob storage client to ensure the build is working correctly in all environments.

I started with the build setup from #12914 but I did make few changes.

  1. Although its atypical for this project we chose to switch from cmake's ExternalProject to FetchContent. FetchContent is recomended by the Azure docs https://github.com/Azure/azure-sdk-for-cpp#cmake-project--fetch-content. It also solves a few problems including: automatically linking system curl and ssl instead of bootstrapping vcpkg and installing curl and ssl from there.
  2. Only build one version of the Azure SDK for C++ because it contains all the components. Previously we were unnecessarily building 5 different versions of the whole thing on top of each other. This created race conditions for which version each component came from.
  3. We are using azure-core_1.10.2 which is a very recent version. There are a couple of important reasons for this 1. an important managed identity fix, 2. fixed support for curl versions < 7.71.0.

There will be follow up PRs to enable Azure in the manylinux builds. We need to update vcpkg first so we can get a version of the Azure SDK which contains an important managed identity fix.

Are these changes tested?

Yes. There is a simple test that just runs the Azure client against azurite. Additionally just initialising the client in AzureFileSystem goes a long way towards ensuring the build is working.

Are there any user-facing changes?

No

@github-actions
Copy link

⚠️ GitHub issue #29847 has been automatically assigned in GitHub to PR creator.

@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from 4e3e7a5 to d0f5b65 Compare July 24, 2023 07:43
@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from d0f5b65 to 8915352 Compare August 1, 2023 22:36
@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from 8915352 to 9d14edd Compare August 7, 2023 08:08
kou added a commit that referenced this pull request Aug 7, 2023
…C++ filesystem (#36988)

### Rationale for this change
We need to write tests for #18014. azurite is like a fake Azure blob storage so it can be used to write integration tests

### What changes are included in this PR?
Extract the `azurite` related changes from #12914 to create a smaller PR that's easier to review. I have made very minimal changes compared to that PR. 

Currently `azurite` is configured for all the environments where `ARROW_AZURE` was enabled by #35701. I assume its deliberate that its not enabled yet for windows, alpine, conda, debian or fedora builds. 

### Are these changes tested?
Its tested by there aren't really any good tests in this PR. I used this `azurite` config in #36835 to make an integration test that uses the Azure C++ SDK. On its own we can't really write tests for this `azurite` setup PR. 

### Are there any user-facing changes?
No

* Closes: #36886

Lead-authored-by: Thomas Newton <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from 9d14edd to 6ec7910 Compare August 8, 2023 07:39
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 12, 2023
@Tom-Newton Tom-Newton changed the title WIP GH-29847: [C++] Build with azure C++ sdk GH-29847: [C++] Build with azure C++ sdk Aug 12, 2023
@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from 8aa551a to 2560caf Compare August 13, 2023 12:06
@Tom-Newton Tom-Newton marked this pull request as ready for review August 13, 2023 12:07
@kou kou changed the title GH-29847: [C++] Build with azure C++ sdk GH-29847: [C++] Build with Azure SDK for C++ Aug 14, 2023
ci/scripts/python_wheel_manylinux_build.sh Outdated Show resolved Hide resolved
ci/scripts/python_wheel_manylinux_build.sh Outdated Show resolved Hide resolved
ci/vcpkg/vcpkg.json Outdated Show resolved Hide resolved
ci/docker/ubuntu-20.04-cpp.dockerfile Outdated Show resolved Hide resolved
cpp/cmake_modules/ThirdpartyToolchain.cmake Outdated Show resolved Hide resolved
cpp/cmake_modules/ThirdpartyToolchain.cmake Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs_test.cc Outdated Show resolved Hide resolved
cpp/thirdparty/versions.txt Outdated Show resolved Hide resolved
cpp/vcpkg.json Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels Aug 14, 2023
@Tom-Newton
Copy link
Contributor Author

Tom-Newton commented Aug 25, 2023

I agree dependencies should not be build with -Werror etc. This is an issue with the use of the flag variables directly. As we can now use cmake 3.16 we should move to target based properties vs global flags but that is of course a major refactor...

So what do we think is the best option today? Is disabling -Werror at a global level an acceptable solution? Being able to disable -Werror would also be helpful for supporting Ubuntu 20 #36835 (comment)

@kou
Copy link
Member

kou commented Aug 26, 2023

I'll provide a patch for -Werror later. Please wait for a few days...

@kou
Copy link
Member

kou commented Aug 26, 2023

This is an ad-hoc patch but this will work. We need a real improvement later.

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 1dfaf71b4..9ff91f978 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -5082,6 +5082,13 @@ function(build_azure_sdk)
   set(CMAKE_EXPORT_NO_PACKAGE_REGISTRY TRUE)
   set(DISABLE_AZURE_CORE_OPENTELEMETRY TRUE)
   set(ENV{AZURE_SDK_DISABLE_AUTO_VCPKG} TRUE)
+  if(MSVC)
+    string(REPLACE "/WX" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+    string(REPLACE "/WX" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
+  else()
+    string(REPLACE "-Werror" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+    string(REPLACE "-Werror" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
+  endif()
   fetchcontent_makeavailable(azure_sdk)
   set(AZURE_SDK_VENDORED
       TRUE

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 26, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Aug 26, 2023
@kou
Copy link
Member

kou commented Aug 28, 2023

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: e3fb84c

Submitted crossbow builds: ursacomputing/crossbow @ actions-ea66ccd89b

Task Status
test-alpine-linux-cpp Github Actions
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-fedora-35-cpp Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-minimal-with-formats Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-22.04-cpp Github Actions
test-ubuntu-22.04-cpp-20 Github Actions
test-ubuntu-22.04-cpp-no-threading Github Actions

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I'll merge this tomorrow if nobody objects it.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Aug 29, 2023
@kou kou merged commit 3cb4481 into apache:main Aug 30, 2023
@kou kou removed the awaiting merge Awaiting merge label Aug 30, 2023
@Tom-Newton
Copy link
Contributor Author

Thanks for your help on this @kou. I feel kind of bad about how much work I created for you during review, with my lack of C++ experience. Hopefully the native Azure support is worth it 🙂.

@felipecrv
Copy link
Contributor

Thank you for this PR @Tom-Newton!

@kou
Copy link
Member

kou commented Aug 31, 2023

Don't worry. :-)
I'm happy that we have more contributors like you!

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3cb4481.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…Azure C++ filesystem (apache#36988)

### Rationale for this change
We need to write tests for apache#18014. azurite is like a fake Azure blob storage so it can be used to write integration tests

### What changes are included in this PR?
Extract the `azurite` related changes from apache#12914 to create a smaller PR that's easier to review. I have made very minimal changes compared to that PR. 

Currently `azurite` is configured for all the environments where `ARROW_AZURE` was enabled by apache#35701. I assume its deliberate that its not enabled yet for windows, alpine, conda, debian or fedora builds. 

### Are these changes tested?
Its tested by there aren't really any good tests in this PR. I used this `azurite` config in apache#36835 to make an integration test that uses the Azure C++ SDK. On its own we can't really write tests for this `azurite` setup PR. 

### Are there any user-facing changes?
No

* Closes: apache#36886

Lead-authored-by: Thomas Newton <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
### Rationale for this change
We want to use the Azure SDK for C++ to read/write to Azure blob storage. Obviously this is pretty important for building an `AzureFileSystem`. 

### What changes are included in this PR?
Builds the the relevant parts of the azure SDK as a cmake external project. Adds a couple of simple tests that just assert that the Azure SDK is working and a couple of lines in `AzureFileSystem` to initialise the blob storage client to ensure the build is working correctly in all environments.

I started with the build setup from apache#12914 but I did make few changes.

1. Although its atypical for this project we chose to switch from cmake's `ExternalProject` to `FetchContent`. `FetchContent` is recomended by the Azure docs https://github.com/Azure/azure-sdk-for-cpp#cmake-project--fetch-content. It also solves a few problems including: automatically linking system curl and ssl instead of bootstrapping vcpkg and installing curl and ssl from there. 
2. Only build one version of the Azure SDK for C++ because it contains all the components. Previously we were unnecessarily building 5 different versions of the whole thing on top of each other. This created race conditions for which version each component came from.
3. We are using `azure-core_1.10.2` which is a very recent version. There are a couple of important reasons for this 1.  [an important managed identity fix](Azure/azure-sdk-for-cpp#4723), 2. [fixed support for curl versions < 7.71.0](Azure/azure-sdk-for-cpp#4792).

There will be follow up PRs to enable Azure in the manylinux builds. We need to update `vcpkg` first so we can get a version of the Azure SDK which contains [an important managed identity fix](Azure/azure-sdk-for-cpp#4723).

### Are these changes tested?
Yes. There is a simple test that just runs the Azure client against azurite. Additionally just initialising the client in `AzureFileSystem` goes a long way towards ensuring the build is working. 

### Are there any user-facing changes?
No
* Closes: apache#29847

Lead-authored-by: Thomas Newton <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: shefali singh <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
### Rationale for this change
We want to use the Azure SDK for C++ to read/write to Azure blob storage. Obviously this is pretty important for building an `AzureFileSystem`. 

### What changes are included in this PR?
Builds the the relevant parts of the azure SDK as a cmake external project. Adds a couple of simple tests that just assert that the Azure SDK is working and a couple of lines in `AzureFileSystem` to initialise the blob storage client to ensure the build is working correctly in all environments.

I started with the build setup from apache#12914 but I did make few changes.

1. Although its atypical for this project we chose to switch from cmake's `ExternalProject` to `FetchContent`. `FetchContent` is recomended by the Azure docs https://github.com/Azure/azure-sdk-for-cpp#cmake-project--fetch-content. It also solves a few problems including: automatically linking system curl and ssl instead of bootstrapping vcpkg and installing curl and ssl from there. 
2. Only build one version of the Azure SDK for C++ because it contains all the components. Previously we were unnecessarily building 5 different versions of the whole thing on top of each other. This created race conditions for which version each component came from.
3. We are using `azure-core_1.10.2` which is a very recent version. There are a couple of important reasons for this 1.  [an important managed identity fix](Azure/azure-sdk-for-cpp#4723), 2. [fixed support for curl versions < 7.71.0](Azure/azure-sdk-for-cpp#4792).

There will be follow up PRs to enable Azure in the manylinux builds. We need to update `vcpkg` first so we can get a version of the Azure SDK which contains [an important managed identity fix](Azure/azure-sdk-for-cpp#4723).

### Are these changes tested?
Yes. There is a simple test that just runs the Azure client against azurite. Additionally just initialising the client in `AzureFileSystem` goes a long way towards ensuring the build is working. 

### Are there any user-facing changes?
No
* Closes: apache#29847

Lead-authored-by: Thomas Newton <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: shefali singh <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Add azure-sdk-for-cpp to ThirdpartyToolchain
6 participants