From 7fc54b8f1f2766d67c3ac44ce36e8c69b7b3af23 Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Wed, 11 Oct 2023 16:36:29 -0300 Subject: [PATCH 1/6] remove windows --- r/vignettes/developers/setup.Rmd | 72 -------------------------------- 1 file changed, 72 deletions(-) diff --git a/r/vignettes/developers/setup.Rmd b/r/vignettes/developers/setup.Rmd index 479af577aa848..1611d5bb55a75 100644 --- a/r/vignettes/developers/setup.Rmd +++ b/r/vignettes/developers/setup.Rmd @@ -38,12 +38,6 @@ set -e set -x ``` - -```{bash, save=run & windows, hide=TRUE} -# For some reason CRAN Mirror goes missing in CI -echo 'options(repos=structure(c(CRAN="https://cloud.r-project.org")))' > $HOME/.Rprofile -``` - Windows and macOS users who wish to contribute to the R package and don't need to alter libarrow (Arrow's C++ library) may be able to obtain a recent version of the library without building from source. @@ -103,43 +97,6 @@ sudo apt install -y cmake libcurl4-openssl-dev libssl-dev brew install cmake openssl ``` -#### Windows - -The package can be built on Windows using [RTools 4](https://cran.r-project.org/bin/windows/Rtools/). It can be built for mingw32 (i386), mingw64 (x64), or ucrt64 (UCRT x64). mingw64 is the recommended 64-bit installation. - -Open the corresponding RTools Bash, for example "Rtools MinGW 64-bit" for mingw64. - -Install CMake, ccache, and Ninja with: - -```{bash, save=run & windows} -pacman --sync --refresh --noconfirm \ - ${MINGW_PACKAGE_PREFIX}-{ccache,cmake,ninja,openssl} -export CMAKE_GENERATOR=Ninja -``` - -You will need to add R to your path. For a user-level installation, R will be at something like `~/Documents/R/R-4.1.2/bin`. For a global installation, R will be at something like `/c/Program\ Files/R/R-4.1.2/bin`. The R on your path needs to match the architecture you are compiling for, so if you are compiling on 32-bit specify `.../bin/i386` instead of `.../bin/x64`. 
- -```{bash} -export PATH=~/Documents/R/R-4.1.2/bin/x64:$PATH -``` - -You can install additional dependencies like so. Note that you are limited to the packages in [the RTools repo](https://github.com/r-windows/rtools-packages), which does not contain every dependency used by Arrow. - -```{bash, save=run & windows} -pacman --sync --refresh --noconfirm \ - ${MINGW_PACKAGE_PREFIX}-boost \ - ${MINGW_PACKAGE_PREFIX}-brotli \ - ${MINGW_PACKAGE_PREFIX}-lz4 \ - ${MINGW_PACKAGE_PREFIX}-protobuf \ - ${MINGW_PACKAGE_PREFIX}-snappy \ - ${MINGW_PACKAGE_PREFIX}-thrift \ - ${MINGW_PACKAGE_PREFIX}-zlib \ - ${MINGW_PACKAGE_PREFIX}-zstd \ - ${MINGW_PACKAGE_PREFIX}-aws-sdk-cpp \ - ${MINGW_PACKAGE_PREFIX}-re2 \ - ${MINGW_PACKAGE_PREFIX}-libutf8proc -``` - ### Step 2 - Configure the libarrow build We recommend that you configure libarrow to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn't overwrite a released version of libarrow you may already have installed, and so that you are also able work with more than one version of libarrow (by using different `ARROW_HOME` directories for the different versions). @@ -158,13 +115,6 @@ export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH" >> ~/.bash_profile ``` -_Special instructions on Windows:_ You will need to add `$ARROW_HOME/bin` to your `PATH` if you are using dynamic libraries (which is recommended). - -```{bash, save=run & windows} -export PATH=$ARROW_HOME/bin:$PATH -echo "export PATH=\"$ARROW_HOME/bin:$PATH\"" >> ~/.bash_profile -``` - Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). 
Next, change directories to be inside `cpp/build`: ```{bash, save=run & !sys_install} @@ -197,32 +147,10 @@ cmake \ .. ``` -##### Windows - -```{bash, save=run & !sys_install & windows} -cmake \ - -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ - -DCMAKE_INSTALL_LIBDIR=lib \ - -DARROW_COMPUTE=ON \ - -DARROW_CSV=ON \ - -DARROW_DATASET=ON \ - -DARROW_EXTRA_ERROR_CONTEXT=ON \ - -DARROW_FILESYSTEM=ON \ - -DARROW_MIMALLOC=ON \ - -DARROW_JSON=ON \ - -DARROW_PARQUET=ON \ - -DARROW_WITH_SNAPPY=OFF \ - -DARROW_WITH_ZLIB=ON \ - .. -``` - #### {-} `..` refers to the C++ source directory: you're in `cpp/build` and the source is in `cpp`. -**For Windows**: some options, including `-DARROW_JEMALLOC`, are not supported on Windows. - - ```{bash, save=run & !sys_install, hide=TRUE} # For testing purposes, build with only shared libraries cmake \ From c7513173975ab8b1791601bacf32e4b566490ee3 Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Wed, 11 Oct 2023 16:38:19 -0300 Subject: [PATCH 2/6] first stab at documenting what we currently would recommend --- r/vignettes/developers/setup.Rmd | 53 +++++++++++++++++--------------- 1 file changed, 29 insertions(+), 24 deletions(-) diff --git a/r/vignettes/developers/setup.Rmd b/r/vignettes/developers/setup.Rmd index 1611d5bb55a75..9c8760cf8f77b 100644 --- a/r/vignettes/developers/setup.Rmd +++ b/r/vignettes/developers/setup.Rmd @@ -38,44 +38,49 @@ set -e set -x ``` -Windows and macOS users who wish to contribute to the R package and -don't need to alter libarrow (Arrow's C++ library) may be able to obtain a -recent version of the library without building from source. +The Arrow R package is unique compared to other R packages that you may have +contributed to because it builds on top of the large and feature-rich Arrow C++ +implementation. 
-### Linux +## Option 1: Using pre-built libarrow binaries -On Linux, you can download a .zip file containing libarrow from the +On Linux and MacOS, you can download a .zip file containing libarrow from the [nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/). The directory names correspond to the OpenSSL version the binaries built with: -- "linux-openssl-1.0" (OpenSSL 1.0) -- "linux-openssl-1.1" (OpenSSL 1.1) -- "linux-openssl-3.0" (OpenSSL 3.0) -Version numbers in that repository correspond to dates. +- "linux-openssl-1.0" (OpenSSL 1.0, e.g., Centos7) +- "linux-openssl-1.1" (OpenSSL 1.1, e.g., Ubuntu 18.04) +- "linux-openssl-3.0" (OpenSSL >=3.0, e.g., MacOS or Ubuntu >= 20.04) -You'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled libarrow binary files into it. +Version numbers in that repository correspond to dates. Use the version with +the highest number (i.e., that was modified most frequently). -### macOS -On macOS, you can install libarrow using [Homebrew](https://brew.sh/): +You'll need to create a `libarrow` directory inside the R package directory and +unzip the zip file containing the compiled libarrow binary files into it. -```bash -# For the released version: -brew install apache-arrow -# Or for a development version, you can try: -brew install apache-arrow --HEAD -``` +On Windows, the binary can be found in the +[corresponding subdirectory of the nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/windows/) +but must be extracted to `windows/libarrow`. -### Windows +Most of the time, you won't need to update your version of libarrow because +the R package rarely changes with updates to the C++ library; however, if you +start to get errors when rebuilding the R package, you may have to remove the +`libarrow` directory and repeat the above steps with a newer nightly build. 
-On Windows, you can download a .zip file containing libarrow from the -[nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/windows/). +## Option 2: Using the bundled build -Version numbers in that repository correspond to dates. +On Linux and MacOS, you can use the same workflow you might use for another +package that contains compiled code (e.g., `R CMD INSTALL . --preclean` from +the shell, `devtools::load_all()` from an R prompt, or `Install & Restart` from +RStudio). If the `libarrow` directory is not populated, the configure script will +attempt to build the bundled version of the C++ library. This will take a few +minutes the first time you try to install or load the package; however, subsequent +reloading will skip this step. -You can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing libarrow before installing the arrow R package. +This option is not currently supported on Windows. -## R and C++ +## Option 3: Use a local Arrow C++ development build If you need to alter both libarrow and the R package code, or if you can't get a binary version of the latest libarrow elsewhere, you'll need to build it from source. This section discusses how to set up a C++ libarrow build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html). 
From f0ec82842a2655bae9aebb37efe5729c827b2075 Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Thu, 12 Oct 2023 12:09:03 -0300 Subject: [PATCH 3/6] update instructions --- r/vignettes/developers/setup.Rmd | 44 +++++++++----------------------- 1 file changed, 12 insertions(+), 32 deletions(-) diff --git a/r/vignettes/developers/setup.Rmd b/r/vignettes/developers/setup.Rmd index 9c8760cf8f77b..1ea42571ad8cd 100644 --- a/r/vignettes/developers/setup.Rmd +++ b/r/vignettes/developers/setup.Rmd @@ -42,45 +42,25 @@ The Arrow R package is unique compared to other R packages that you may have contributed to because it builds on top of the large and feature-rich Arrow C++ implementation. -## Option 1: Using pre-built libarrow binaries - -On Linux and MacOS, you can download a .zip file containing libarrow from the -[nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/). - -The directory names correspond to the OpenSSL version the binaries built with: - -- "linux-openssl-1.0" (OpenSSL 1.0, e.g., Centos7) -- "linux-openssl-1.1" (OpenSSL 1.1, e.g., Ubuntu 18.04) -- "linux-openssl-3.0" (OpenSSL >=3.0, e.g., MacOS or Ubuntu >= 20.04) - -Version numbers in that repository correspond to dates. Use the version with -the highest number (i.e., that was modified most frequently). - -You'll need to create a `libarrow` directory inside the R package directory and -unzip the zip file containing the compiled libarrow binary files into it. - -On Windows, the binary can be found in the -[corresponding subdirectory of the nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/windows/) -but must be extracted to `windows/libarrow`. - -Most of the time, you won't need to update your version of libarrow because -the R package rarely changes with updates to the C++ library; however, if you -start to get errors when rebuilding the R package, you may have to remove the -`libarrow` directory and repeat the above steps with a newer nightly build. 
- -## Option 2: Using the bundled build +## Option 1: Using the bundled build with nightly libarrow binaries On Linux and MacOS, you can use the same workflow you might use for another package that contains compiled code (e.g., `R CMD INSTALL . --preclean` from the shell, `devtools::load_all()` from an R prompt, or `Install & Restart` from RStudio). If the `libarrow` directory is not populated, the configure script will -attempt to build the bundled version of the C++ library. This will take a few -minutes the first time you try to install or load the package; however, subsequent -reloading will skip this step. +attempt to download the latest nightly libarrow binary, extract it to the +`arrow/r/libarrow` directory (MacOS, Linux) or `arrow/r/windows/libarrow` +directory (Windows), and continue building the R package as usual. -This option is not currently supported on Windows. +Most of the time, you won't need to update your version of libarrow because +the R package rarely changes with updates to the C++ library; however, if you +start to get errors when rebuilding the R package, you may have to remove the +`libarrow` directory (MacOS, Linux) or `windows/libarrow` directory (Windows) +and do a "clean" rebuild. You can do this from a terminal with +`R CMD INSTALL . --preclean` or from RStudio using the "Clean and Install" +option from "Build" tab. -## Option 3: Use a local Arrow C++ development build +## Option 2: Use a local Arrow C++ development build If you need to alter both libarrow and the R package code, or if you can't get a binary version of the latest libarrow elsewhere, you'll need to build it from source. This section discusses how to set up a C++ libarrow build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html). 
From 824bcde883160c285e6174fd9b3be055bd79e904 Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Thu, 12 Oct 2023 12:20:51 -0300 Subject: [PATCH 4/6] one more pass --- r/vignettes/developers/setup.Rmd | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/r/vignettes/developers/setup.Rmd b/r/vignettes/developers/setup.Rmd index 1ea42571ad8cd..c2d4505e57da6 100644 --- a/r/vignettes/developers/setup.Rmd +++ b/r/vignettes/developers/setup.Rmd @@ -40,13 +40,15 @@ set -x The Arrow R package is unique compared to other R packages that you may have contributed to because it builds on top of the large and feature-rich Arrow C++ -implementation. +implementation. Because the R package integrates tightly with Arrow C++, +it typically requires a dedicated copy of the library (i.e., it is usually +not possible to link to a system version of libarrow during development). -## Option 1: Using the bundled build with nightly libarrow binaries +## Option 1: Using nightly libarrow binaries -On Linux and MacOS, you can use the same workflow you might use for another -package that contains compiled code (e.g., `R CMD INSTALL . --preclean` from -the shell, `devtools::load_all()` from an R prompt, or `Install & Restart` from +On Linux, MacOS, and Windows you can use the same workflow you might use for another +package that contains compiled code (e.g., `R CMD INSTALL .` from +a terminal, `devtools::load_all()` from an R prompt, or `Install & Restart` from RStudio). 
If the `libarrow` directory is not populated, the configure script will attempt to download the latest nightly libarrow binary, extract it to the `arrow/r/libarrow` directory (MacOS, Linux) or `arrow/r/windows/libarrow` From 21501f4b09c359a02d7b9d084a17e972293d8e08 Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Thu, 12 Oct 2023 14:31:16 -0300 Subject: [PATCH 5/6] Update r/vignettes/developers/setup.Rmd Co-authored-by: Jacob Wujciak-Jens --- r/vignettes/developers/setup.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/r/vignettes/developers/setup.Rmd b/r/vignettes/developers/setup.Rmd index c2d4505e57da6..3afb6090e0956 100644 --- a/r/vignettes/developers/setup.Rmd +++ b/r/vignettes/developers/setup.Rmd @@ -51,7 +51,7 @@ package that contains compiled code (e.g., `R CMD INSTALL .` from a terminal, `devtools::load_all()` from an R prompt, or `Install & Restart` from RStudio). If the `libarrow` directory is not populated, the configure script will attempt to download the latest nightly libarrow binary, extract it to the -`arrow/r/libarrow` directory (MacOS, Linux) or `arrow/r/windows/libarrow` +`arrow/r/libarrow` directory (MacOS, Linux) or `arrow/r/windows` directory (Windows), and continue building the R package as usual. Most of the time, you won't need to update your version of libarrow because From 3279dc32b79cad6c80245a7b1f9d141fa78ca41a Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Thu, 12 Oct 2023 15:35:32 -0300 Subject: [PATCH 6/6] apply suggestions --- r/vignettes/developers/setup.Rmd | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/r/vignettes/developers/setup.Rmd b/r/vignettes/developers/setup.Rmd index 3afb6090e0956..de33e72407792 100644 --- a/r/vignettes/developers/setup.Rmd +++ b/r/vignettes/developers/setup.Rmd @@ -49,7 +49,7 @@ not possible to link to a system version of libarrow during development). 
On Linux, MacOS, and Windows you can use the same workflow you might use for another package that contains compiled code (e.g., `R CMD INSTALL .` from a terminal, `devtools::load_all()` from an R prompt, or `Install & Restart` from -RStudio). If the `libarrow` directory is not populated, the configure script will +RStudio). If the `arrow/r/libarrow` directory is not populated, the configure script will attempt to download the latest nightly libarrow binary, extract it to the `arrow/r/libarrow` directory (MacOS, Linux) or `arrow/r/windows` directory (Windows), and continue building the R package as usual. @@ -57,10 +57,11 @@ directory (Windows), and continue building the R package as usual. Most of the time, you won't need to update your version of libarrow because the R package rarely changes with updates to the C++ library; however, if you start to get errors when rebuilding the R package, you may have to remove the -`libarrow` directory (MacOS, Linux) or `windows/libarrow` directory (Windows) +`libarrow` directory (MacOS, Linux) or `windows` directory (Windows) and do a "clean" rebuild. You can do this from a terminal with -`R CMD INSTALL . --preclean` or from RStudio using the "Clean and Install" -option from "Build" tab. +`R CMD INSTALL . --preclean`, from RStudio using the "Clean and Install" +option from the "Build" tab, or using `make clean` if you are using the `Makefile` +located in the root of the R package. ## Option 2: Use a local Arrow C++ development build