-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35140: [R] Rewrite configure script and ensure we don't use mismatched libarrow #35147
Conversation
|
|
If this is in fact the fix for the conda package issues (conda-forge/r-arrow-feedstock#56 (comment), conda-forge/r-arrow-feedstock#62), then we would have a way to test this in the nightly conda packaging jobs. We currently build the packages but don't install them and assert anything about their capabilities. Perhaps we can add something to https://github.com/apache/arrow/blob/main/dev/tasks/conda-recipes/r-arrow/build.sh to load the installed package and test that it has parquet, dataset, etc. |
@github-actions crossbow submit conda-linux--r4 conda-osx--r4 |
Revision: 1bc1a01 Submitted crossbow builds: ursacomputing/crossbow @ actions-02dfd6461d
|
r/configure
Outdated
if [ "$LIB_DIR" = "" ]; then | ||
# In case LIB_DIR wasn't previously set, infer from PKG_DIRS | ||
LIB_DIR=`echo $PKG_DIRS | sed -e 's/^-L//'` | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the following approach (we must set LIB_DIR
when PKG_*
are set) is better than this approach:
diff --git a/r/configure b/r/configure
index 65528bdc3..f452fffb1 100755
--- a/r/configure
+++ b/r/configure
@@ -126,10 +126,11 @@ else
if [ "$UNAME" = "Darwin" ] && [ "$FORCE_BUNDLED_BUILD" != "true" ]; then
if [ "$FORCE_AUTOBREW" != "true" ] && [ "`command -v brew`" ] && [ "`brew ls --versions ${PKG_BREW_NAME}`" != "" ]; then
echo "*** Using Homebrew ${PKG_BREW_NAME}"
- BREWDIR=`brew --prefix`
+ BREWDIR=`brew --prefix ${PKG_BREW_NAME}`
+ LIB_DIR="$BREWDIR/lib"
PKG_LIBS="-larrow -larrow_bundled_dependencies"
- PKG_DIRS="-L$BREWDIR/opt/$PKG_BREW_NAME/lib $PKG_DIRS"
- PKG_CFLAGS="-I$BREWDIR/opt/$PKG_BREW_NAME/include $PKG_CFLAGS"
+ PKG_DIRS="-L$LIB_DIR $PKG_DIRS"
+ PKG_CFLAGS="-I$BREWDIR/include $PKG_CFLAGS"
else
echo "*** Downloading ${PKG_BREW_NAME}"
if [ -f "autobrew" ]; then
@@ -144,7 +145,9 @@ else
if [ $? -ne 0 ]; then
echo "Failed to retrieve binary for ${PKG_BREW_NAME}"
fi
- # autobrew sets `PKG_LIBS`, `PKG_DIRS`, and `PKG_CFLAGS`
+ # autobrew sets `PKG_LIBS`, `PKG_DIRS`, `PKG_CFLAGS`, and `BREWDIR`
+ # but `LIB_DIR` isn't set.
+ LIB_DIR="$BREWDIR/lib"
fi
else
if [ "${NOT_CRAN}" = "true" ]; then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I didn't take this approach is that r/configure
has gotten so complex that I didn't think I could reason about all of the different branches of the code to know that I had found them all (see also #35049). Other, more concrete issues:
- I did confirm what @paleolimbot suggested, which is that if you have
pkg-config
available and have installedapache-arrow
viabrew
, you won't get to this*** Using Homebrew
block, you'll see that you're using arrow found by pkg-config. So at a minimum, we (also) needLIB_DIR
set in the pkg-config section above this. - Likewise, there have been similar reports, like the conda-forge one and [R] Improve package installation (on Fedora 36) with arrow system packages #31989, not on macOS/brew, so a brew-only fix wouldn't be sufficient.
- We actually don't need the autobrew one because (for "historical reasons", I assume) we explicitly set all of the feature flags in the autobrew script https://github.com/apache/arrow/blob/apache-arrow-11.0.0/r/tools/autobrew#L77
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kou is this possibly relevant? PKG_LIBS
and PKG_DIRS
seem to be set correctly but for some reason LIB_DIR
isn't.
diff --git a/r/configure b/r/configure
index 65528bdc38..8e130c53b4 100755
--- a/r/configure
+++ b/r/configure
@@ -110,7 +110,7 @@ else
PKG_CFLAGS="$PKGCONFIG_CFLAGS $PKG_CFLAGS"
PKG_LIBS="${PKGCONFIG_LIBS}"
PKG_DIRS="${PKGCONFIG_DIRS}"
- LIB_DIR=${FOUND_LIB_DIR}
+ LIB_DIR="${FOUND_LIB_DIR}"
# Check for version mismatch
PC_LIB_VERSION=`pkg-config --modversion arrow`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. Could you explain more the context for it? Can I see a build log for the case? This case #35140 ? Or other case?
If this is for #35140, I think that the isn't related because FOUND_LIB_DIR
doesn't have any space:
*** Arrow C++ libraries found via pkg-config at /opt/homebrew/Cellar/apache-arrow/11.0.0_3/lib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, #35140 uses Apache Arrow 11.0.0.
I was seeing the main branch.
Yes. The change #35147 (comment) will fix #35140. And it's already in the main branch: #34229
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the conda-side, we were able to fix it by specifying LIB_DIR=$PREFIX/lib
in our build script, see conda-forge/r-arrow-feedstock#63
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 my mistake, sorry I missed #34229, that explains some of the confusion. I'll tidy this PR up with any remaining useful changes and get it merged.
@@ -14,3 +14,5 @@ fi | |||
|
|||
# ${R_ARGS} necessary to support cross-compilation | |||
${R} CMD INSTALL --build r/. ${R_ARGS} | |||
# Ensure that features are enabled in the R build (feel free to add others) | |||
${R} -s -e 'library(arrow); stopifnot(arrow_with_dataset(), arrow_with_parquet(), arrow_with_s3())' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@h-vetinari Could you review conda part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part is fine and already working (as in: shows the build fails) in conda-forge/r-arrow-feedstock#63.
However, I don't see how the BREW-specific patch above would apply to conda-forge (we'd need $PREFIX
), nor do I know what changed between 10 & 11. While taking a first look, my comment in that conda-forge PR was:
Do we need to do something like in the upstream R Makefile to discover the deps correctly?
I've applied the patches and added a bunch of comments to |
@github-actions crossbow submit -g r |
Revision: 9dfa86d Submitted crossbow builds: ursacomputing/crossbow @ actions-8ccc8bc92d |
I will inspect the successful builds to make sure features are enabled as they should be. Of the failures:
|
@github-actions crossbow submit -g r |
I think that this problem is related to |
Ah, wait. The job was passed at #35147 (comment) . |
It passed there because I changed it not to use the bundled build. That said, it didn't pick up the wrong absl, so maybe there's still something to learn from it. |
Hmm. It seems that autobrew not Homebrew was used in the passed job. |
Ah, OK. Thanks. |
In the failed build, |
autobrew does not use pkg-config to set PKG_CFLAGS, so |
I think that this is the same problem as #34523 .
Abseil was mixed when we build C++'s |
@github-actions crossbow submit -g r |
Revision: a769f6a Submitted crossbow builds: ursacomputing/crossbow @ actions-674657ab73 |
I don't fully understand the devdocs failure, but it has to do with the fact that arrow 12.0.0 has been released (https://github.com/ursacomputing/crossbow/actions/runs/4862485406/jobs/8668915213#step:9:3878 vs. https://github.com/ursacomputing/crossbow/actions/runs/4833910722/jobs/8614497534#step:9:3880 where it passed before). The cflags/libs are identical between the two builds. Rebasing to pull in the new version number would probably solve it, though I'm not sure it's worth the electricity to confirm it (that's an expensive build for very little value, it seems). |
Works great on Ubuntu via Docker (arm64 and amd64)!
It doesn't seem to be pulling the hosted static libraries for development, which I expected on Ubuntu/x86 (but maybe it's not supposed to?). |
We don't host binaries for x.y.z.9000, the nightly versions are like x.y.z.1000123, so if you wanted it to pull a nightly for a .9000 dev build, we'd have to teach it how to find the most recent one. Something for you (or whoever) to consider on #35049 I suppose. |
Made #35391 for the segfault on the ubuntu force tests job, which I see on |
Benchmark runs are scheduled for baseline = f794b20 and contender = ec89360. ec89360 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
['Python', 'R'] benchmarks have high level of regressions. |
…mismatched libarrow (apache#35147) I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. `configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file: ``` # * Find libarrow on the system. If it is present, make sure # that its version is compatible with the R package. # * If no suitable libarrow is found, download it (where allowed) # or build it from source. # * Determine what features this libarrow has and what other # flags it requires, and set them in src/Makevars for use when # compiling the bindings. # * Run a test program to confirm that arrow headers are found ``` All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy). Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. ### Behavior changes * If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. * If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output. * autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version). * The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead. * Closes: apache#35140 * Closes: apache#31989 Lead-authored-by: Neal Richardson <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Neal Richardson <[email protected]>
…mismatched libarrow (apache#35147) I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. `configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file: ``` # * Find libarrow on the system. If it is present, make sure # that its version is compatible with the R package. # * If no suitable libarrow is found, download it (where allowed) # or build it from source. # * Determine what features this libarrow has and what other # flags it requires, and set them in src/Makevars for use when # compiling the bindings. # * Run a test program to confirm that arrow headers are found ``` All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy). Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. ### Behavior changes * If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. * If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output. * autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version). * The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead. * Closes: apache#35140 * Closes: apache#31989 Lead-authored-by: Neal Richardson <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Neal Richardson <[email protected]>
…mismatched libarrow (apache#35147) I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. `configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file: ``` # * Find libarrow on the system. If it is present, make sure # that its version is compatible with the R package. # * If no suitable libarrow is found, download it (where allowed) # or build it from source. # * Determine what features this libarrow has and what other # flags it requires, and set them in src/Makevars for use when # compiling the bindings. # * Run a test program to confirm that arrow headers are found ``` All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy). Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. ### Behavior changes * If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. * If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output. * autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version). * The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead. * Closes: apache#35140 * Closes: apache#31989 Lead-authored-by: Neal Richardson <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Neal Richardson <[email protected]>
I've significantly rewritten
r/configure
to make it easier to reason about and harder for issues like #34229 and #35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor.configure
has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:All of the detection of CFLAGS and
-L
dirs etc. happen in one place now, and they all prefer usingpkg-config
to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add-larrow_bundled_dependencies
to the libs, but now it will), and it may eliminate the redundant-l
and-D
setting I've observed in some builds (not harmful but definitely sloppy).Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying
test-check-versions.R
added. These are run on all the builds that useci/scripts/r_test.sh
.Behavior changes
If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems.
If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.
autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).
The path where you could set
LIB_DIR
andINCLUDE_DIR
env vars has been removed. UseARROW_HOME
instead.Closes: [R] configure script for R fails to set PKG_CFLAGS #35140
Closes: [R] Improve package installation (on Fedora 36) with arrow system packages #31989