Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-40212: [R][CI] Add a C++ with gcc 14 build #40244

Merged
merged 3 commits into from
Mar 1, 2024
Merged

Conversation

jonkeane
Copy link
Member

@jonkeane jonkeane commented Feb 26, 2024

This is an attempt to make sure we catching things like the issue we ran into in #40009 in CI so that we could confirm that we don't run into this in the future. CRAN does runs using pre-release compilers, and we've hit this a time or two. We can wait for them to come and tell us we need to move in order to stay up, but it would be nice if we could detect this ourselves. And more importantly: it gives us a hopefully easier we to replicate the error and confirm we've fixed it so that we can have confidence when we submit.

  143 |   if (std::find(supported_schemes.begin(), supported_schemes.end(), scheme) == [0m
      |  [0;1;32m      ^~~~~~~~~
 [0m [1m/usr/bin/../lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14/bits/streambuf_iterator.h:435:5:  [0m [0;1;30mnote:  [0mcandidate template ignored: could not match 'istreambuf_iterator' against '__normal_iterator' [0m
  435 |     find(istreambuf_iterator<_CharT> __first, [0m
      |  [0;1;32m    ^
 [0m1 error generated.

#40244 (comment) is a run before our fix showing the same failure.

I've also downloaded + saved the log from CRAN since it will be overwritten soon now that we have a new release up: Install log for 'arrow' with clang dev.txt

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Feb 26, 2024
# https://github.com/r-hub/containers/issues/23
case "$PACKAGE_MANAGER" in
apt-get)
apt-get install -y libcurl4-openssl-dev texlive-latex-base texlive-fonts-extra
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened an issue against the rhub/containers repo to add in libcurl4-openssl. But we need this to be installed for curl to install on this machine.

texlive installing is also probably not what we want here, but it was necessary to get the build to be clean. We might consider turning off pdf manual checking on these if tex isn't available or installing tinytex in some other way.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Feb 26, 2024
Comment on lines 36 to 37
- clang17
- clang18
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the new rhub container approach https://github.com/r-hub/containers we might be able to change most of our other rhub images to these.

It's nice that they are built daily so we don't have to build them ourselves.

@kou kou changed the title GH-40212: [R] [CI] Add newer clang builds from rhub/containers GH-40212: [R][CI] Add newer clang builds from rhub/containers Feb 26, 2024
@kou
Copy link
Member

kou commented Feb 26, 2024

clang v gcc might be totally wrong

Right. This is related to underlying C++ libraries (libstdc++ provided by GCC and libc++ provided by LLVM) not frontend compilers (g++ and clang++).

We need to use libstdc++ 14 (provided by GCC 14) to reproduce this. We can use either g++ or clang++.

@kou
Copy link
Member

kou commented Feb 27, 2024

It seems that Ubuntu 24.04 LTS will ship GCC 14: https://packages.ubuntu.com/search?keywords=gcc-14

How about testing Apache Arrow C++ with Ubuntu 24.04 for now something like the following?

diff --git a/dev/tasks/tasks.yml b/dev/tasks/tasks.yml
index cfc333c6b2..6e85d5b95e 100644
--- a/dev/tasks/tasks.yml
+++ b/dev/tasks/tasks.yml
@@ -1055,7 +1055,7 @@ tasks:
     params:
       image: conda-cpp-valgrind
 
-{% for ubuntu_version in ["20.04", "22.04"] %}
+{% for ubuntu_version in ["20.04", "22.04", "24.04"] %}
   test-ubuntu-{{ ubuntu_version }}-cpp:
     ci: github
     template: docker-tests/github.linux.yml
@@ -1073,6 +1073,15 @@ tasks:
         UBUNTU: 20.04
       image: ubuntu-cpp-bundled
 
+  test-ubuntu-24.04-cpp-gcc-14:
+    ci: github
+    template: docker-tests/github.linux.yml
+    params:
+      env:
+        UBUNTU: "24.04"
+      flags: -e CC=gcc-14 -e CXX=g++-14
+      image: ubuntu-cpp
+
   test-skyhook-integration:
     ci: github
     template: docker-tests/github.linux.yml
diff --git a/docker-compose.yml b/docker-compose.yml
index aec685775a..26a42fa139 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -373,7 +373,7 @@ services:
     #   docker-compose run --rm ubuntu-cpp
     # Parameters:
     #   ARCH: amd64, arm64v8, s390x, ...
-    #   UBUNTU: 20.04, 22.04
+    #   UBUNTU: 20.04, 22.04, 24.04
     image: ${REPO}:${ARCH}-ubuntu-${UBUNTU}-cpp
     build:
       context: .

I think that https://github.com/r-hub/containers/blob/main/containers/ubuntu-next/Dockerfile will use Ubuntu 24.04 after Ubuntu 24.04 is released. Then we can use it to test with GCC 14 (libstdc++ 14).

@jonkeane
Copy link
Member Author

clang v gcc might be totally wrong

Right. This is related to underlying C++ libraries (libstdc++ provided by GCC and libc++ provided by LLVM) not frontend compilers (g++ and clang++).

We need to use libstdc++ 14 (provided by GCC 14) to reproduce this. We can use either g++ or clang++.

AAAAAAH this is the part I was missing. Thank you so much for this!

And thank you for the suggestion, that looks like it's good. I'll push changes to this (though it might take me a little bit).

Thank you again for the help!

@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-ubuntu-24.04-cpp-gcc-14

Copy link

Revision: fa78edb

Submitted crossbow builds: ursacomputing/crossbow @ actions-f0d7b93964

Task Status
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@jonkeane jonkeane changed the title GH-40212: [R][CI] Add newer clang builds from rhub/containers GH-40212: [R][CI] Add a cpp gcc 14 build Feb 28, 2024
@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-ubuntu-24.04-cpp-gcc-14

Copy link

Revision: 8815aa7

Submitted crossbow builds: ursacomputing/crossbow @ actions-d5b819f13c

Task Status
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-ubuntu-24.04-cpp-gcc-14

Copy link

Revision: 682db3e

Submitted crossbow builds: ursacomputing/crossbow @ actions-cf7bda84e3

Task Status
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-ubuntu-24.04-cpp-gcc-14

Copy link

Revision: 10ef8ef

Submitted crossbow builds: ursacomputing/crossbow @ actions-a40193e951

Task Status
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@jonkeane
Copy link
Member Author

Ok, now with the fixes from #40010 we build (🎉 ) but I am seeing some timezone-related(?) errors in the C++ tests:

[ RUN      ] ScalarTemporalTest.TestAssumeTimezone
/arrow/cpp/src/arrow/compute/kernels/test_util.cc:79: Failure
Failed
'_error_or_value25.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:86: Failure
Failed
'_error_or_value26.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:86: Failure
Failed
'_error_or_value26.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:79: Failure
Failed
'_error_or_value25.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:86: Failure
Failed
'_error_or_value26.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:86: Failure
Failed
'_error_or_value26.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:79: Failure
Failed
'_error_or_value25.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:86: Failure
Failed
'_error_or_value26.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:86: Failure
Failed
'_error_or_value26.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:79: Failure
Failed
'_error_or_value25.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:86: Failure
Failed
'_error_or_value26.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

/arrow/cpp/src/arrow/compute/kernels/test_util.cc:86: Failure
Failed
'_error_or_value26.status()' failed with Invalid: Cannot locate timezone 'US/Central': US/Central not found in timezone database

[  FAILED  ] ScalarTemporalTest.TestAssumeTimezone (18 ms)

https://github.com/ursacomputing/crossbow/actions/runs/8075079852/job/22061406834#step:6:8027

I'm also trying to build with not-gcc14 but still 24.04 (locally in docker) to see if those same failures happen there too. But in case someone sees these and knows what's up

@jonkeane
Copy link
Member Author

I'm also trying to build with not-gcc14 but still 24.04 (locally in docker) to see if those same failures happen there too. But in case someone sees these and knows what's up

Confirmed (locally) that gcc13 on 24.04 has similar timezone issues. I wonder if the timezone file or package changed in some way?

@jonkeane
Copy link
Member Author

Ha! Indeed. I just looked at the timezone info, and there are differences.

On 24.04:

ls /usr/share/zoneinfo
Africa      Asia       CST6CDT  Etc      HST     MST7MDT  WET                localtime   zone1970.tab
America     Atlantic   EET      Europe   Indian  PST8PDT  iso3166.tab        posixrules  zonenow.tab
Antarctica  Australia  EST      Factory  MET     Pacific  leap-seconds.list  tzdata.zi
Arctic      CET        EST5EDT  GMT      MST     UTC      leapseconds        zone.tab

On 22.04:

ls /usr/share/zoneinfo
Africa      Cuba     GMT+0        Japan              NZ-CHAT     Turkey
America     EET      GMT-0        Kwajalein          Pacific     tzdata.zi
Antarctica  Egypt    GMT0         leapseconds        Poland      UCT
Arctic      Eire     Greenwich    leap-seconds.list  Portugal    Universal
Asia        EST      Hongkong     Libya              posix       US
Atlantic    EST5EDT  HST          localtime          posixrules  UTC
Australia   Etc      Iceland      MET                PRC         WET
Brazil      Europe   Indian       Mexico             PST8PDT     W-SU
Canada      Factory  Iran         MST                right       zone1970.tab
CET         GB       iso3166.tab  MST7MDT            ROC         zonenow.tab
Chile       GB-Eire  Israel       Navajo             ROK         zone.tab
CST6CDT     GMT      Jamaica      NZ                 Singapore   Zulu

I'll figure out if there's some config we can do to make the US/... style available or change the tests.

@kou
Copy link
Member

kou commented Feb 29, 2024

We may need tzdata-legacy package.

@jonkeane
Copy link
Member Author

@github-actions crossbow submit test-ubuntu-24.04-cpp-gcc-14

Copy link

Revision: 4efaa60

Submitted crossbow builds: ursacomputing/crossbow @ actions-3350d88c0f

Task Status
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@jonkeane
Copy link
Member Author

This is ready for review. The Python failures are #40288 I'm re-running the Java jobs which was something up with flightsql. I can't imagine the changes I made here are actually impacting that Java jobs which failed given what's changed here.

# The following dependencies will be downloaded due to missing/invalid packages
# provided by the distribution:
# - Abseil is old
# - libc-ares-dev does not install CMake config files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that it installs CMake config files now: https://packages.ubuntu.com/noble/amd64/libc-ares-dev/filelist

Could you remove this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 169 to 172
# - Abseil is old
# - libc-ares-dev does not install CMake config files
ENV absl_SOURCE=BUNDLED \
ARROW_ACERO=ON \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that libabsl-dev is updated: https://packages.ubuntu.com/search?keywords=libabsl-dev

Can we try it?

Suggested change
# - Abseil is old
# - libc-ares-dev does not install CMake config files
ENV absl_SOURCE=BUNDLED \
ARROW_ACERO=ON \
# - libc-ares-dev does not install CMake config files
ENV ARROW_ACERO=ON \

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kou kou changed the title GH-40212: [R][CI] Add a cpp gcc 14 build GH-40212: [R][CI] Add a C++ with gcc 14 build Feb 29, 2024
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 29, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 1, 2024
@jonkeane
Copy link
Member Author

jonkeane commented Mar 1, 2024

@github-actions crossbow submit test-ubuntu-24.04-cpp-gcc-14

One more time here, though I expect this to pass — I ran it locally as well.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Mar 1, 2024
Copy link

github-actions bot commented Mar 1, 2024

Revision: 7659c2c

Submitted crossbow builds: ursacomputing/crossbow @ actions-109c5bd189

Task Status
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit 2308e40 into apache:main Mar 1, 2024
59 of 60 checks passed
@kou kou removed the awaiting changes Awaiting changes label Mar 1, 2024
@github-actions github-actions bot added the awaiting merge Awaiting merge label Mar 1, 2024
@jonkeane jonkeane deleted the clang18_in_CI branch March 1, 2024 02:24
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2308e40.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024
This is an attempt to make sure we catching things like the issue we ran into in apache#40009 in CI so that we could confirm that we don't run into this in the future. CRAN does runs using pre-release compilers, and we've hit this a time or two. We can wait for them to come and tell us we need to move in order to stay up, but it would be nice if we could detect this ourselves. And more importantly: it gives us a hopefully easier we to replicate the error and confirm we've fixed it so that we can have confidence when we submit. 

``` [1m/tmp/RtmpLtR2pg/R.INSTALL1d415a4f31ad3b/arrow/tools/cpp/src/arrow/filesystem/util_internal.cc:143:7:  [0m [0;1;31merror:  [0m [1mno matching function for call to 'find' [0m
  143 |   if (std::find(supported_schemes.begin(), supported_schemes.end(), scheme) == [0m
      |  [0;1;32m      ^~~~~~~~~
 [0m [1m/usr/bin/../lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14/bits/streambuf_iterator.h:435:5:  [0m [0;1;30mnote:  [0mcandidate template ignored: could not match 'istreambuf_iterator' against '__normal_iterator' [0m
  435 |     find(istreambuf_iterator<_CharT> __first, [0m
      |  [0;1;32m    ^
 [0m1 error generated.
``` 

apache#40244 (comment) is a run before our fix showing the same failure.

I've also downloaded + saved the log from CRAN since it will be overwritten soon now that we have a new release up: [Install log for 'arrow' with clang dev.txt](https://github.com/apache/arrow/files/14407630/Install.log.for.arrow.with.clang.dev.txt)

* GitHub Issue: apache#40212

Authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants