GH-37945: [R] Update developer documentation #38220

Merged
merged 6 commits into from
Oct 13, 2023
Changes from 5 commits
135 changes: 25 additions & 110 deletions r/vignettes/developers/setup.Rmd
@@ -38,50 +38,31 @@ set -e
set -x
```


```{bash, save=run & windows, hide=TRUE}
# For some reason CRAN Mirror goes missing in CI
echo 'options(repos=structure(c(CRAN="https://cloud.r-project.org")))' > $HOME/.Rprofile
```

Windows and macOS users who wish to contribute to the R package and
don't need to alter libarrow (Arrow's C++ library) may be able to obtain a
recent version of the library without building from source.

### Linux

On Linux, you can download a .zip file containing libarrow from the
[nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/).

The directory names correspond to the OpenSSL version the binaries were built with:
- "linux-openssl-1.0" (OpenSSL 1.0)
- "linux-openssl-1.1" (OpenSSL 1.1)
- "linux-openssl-3.0" (OpenSSL 3.0)

Version numbers in that repository correspond to dates.

You'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled libarrow binary files into it.
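
The Linux steps above can be sketched as two small shell helpers. This is only an illustration: the function names are hypothetical, the mapping covers just the three directory names listed above, and the zip path in the example is a placeholder for whichever nightly file you downloaded.

```bash
# Map an OpenSSL version string to the matching nightly directory name.
# Only the three directory names listed above are recognised.
libarrow_dir_for_openssl() {
  case "$1" in
    1.0*) echo "linux-openssl-1.0" ;;
    1.1*) echo "linux-openssl-1.1" ;;
    3.0*) echo "linux-openssl-3.0" ;;
    *)    echo "unsupported OpenSSL version: $1" >&2; return 1 ;;
  esac
}

# Unpack a downloaded zip into <r-package-dir>/libarrow.
# Both arguments are placeholders supplied by the caller.
unpack_libarrow() {
  libarrow_zip="$1"
  r_pkg_dir="$2"
  mkdir -p "$r_pkg_dir/libarrow"
  unzip -q -o "$libarrow_zip" -d "$r_pkg_dir/libarrow"
}

# Example (hypothetical paths):
#   libarrow_dir_for_openssl "$(openssl version | awk '{print $2}')"
#   unpack_libarrow "$HOME/Downloads/libarrow.zip" "$HOME/arrow/r"
```

Keeping the version lookup separate from the unpacking step makes it easy to check which directory applies before downloading anything.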

### macOS
On macOS, you can install libarrow using [Homebrew](https://brew.sh/):

```bash
# For the released version:
brew install apache-arrow
# Or for a development version, you can try:
brew install apache-arrow --HEAD
```

### Windows

On Windows, you can download a .zip file containing libarrow from the
[nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/windows/).

Version numbers in that repository correspond to dates.

You can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing libarrow before installing the arrow R package.
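
As a rough sketch of the Windows step above, the helper below sets `RWINLIB_LOCAL` only if the zip file actually exists. The function name and the example path are hypothetical, and the sketch assumes a Bash shell such as the one shipped with RTools.

```bash
# Point RWINLIB_LOCAL at a downloaded libarrow zip, refusing paths that
# do not exist so a typo is caught before the R package build starts.
use_local_libarrow_zip() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  export RWINLIB_LOCAL="$1"
}

# Example (hypothetical path), run from the arrow/r directory:
#   use_local_libarrow_zip "$HOME/Downloads/libarrow.zip" && R CMD INSTALL .
```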

## R and C++
The Arrow R package is unique compared to other R packages that you may have
contributed to because it builds on top of the large and feature-rich Arrow C++
implementation. Because the R package integrates tightly with Arrow C++,
it typically requires a dedicated copy of the library (i.e., it is usually
not possible to link to a system version of libarrow during development).

## Option 1: Using nightly libarrow binaries

On Linux, macOS, and Windows you can use the same workflow you might use for another
package that contains compiled code (e.g., `R CMD INSTALL .` from
a terminal, `devtools::load_all()` from an R prompt, or `Install & Restart` from
RStudio). If the `libarrow` directory is not populated, the configure script will
attempt to download the latest nightly libarrow binary, extract it to the
`arrow/r/libarrow` directory (macOS, Linux) or `arrow/r/windows`
directory (Windows), and continue building the R package as usual.

Most of the time, you won't need to update your version of libarrow because
the R package rarely changes with updates to the C++ library; however, if you
start to get errors when rebuilding the R package, you may have to remove the
`libarrow` directory (macOS, Linux) or `windows/libarrow` directory (Windows)
and do a "clean" rebuild. You can do this from a terminal with
`R CMD INSTALL . --preclean` or from RStudio using the "Clean and Install"
option from the "Build" tab.
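
A minimal sketch of the clean rebuild on macOS or Linux, assuming it is run against the `arrow/r` directory. The helper name is hypothetical, and it only removes the `libarrow` directory; the Windows directory is not handled here.

```bash
# Remove a previously downloaded libarrow so the next install fetches
# a fresh nightly. The argument is the path to the R package directory
# (defaults to the current directory).
remove_downloaded_libarrow() {
  r_pkg_dir="${1:-.}"
  rm -rf "$r_pkg_dir/libarrow"
}

# Example, run from the arrow/r directory on macOS or Linux:
#   remove_downloaded_libarrow . && R CMD INSTALL . --preclean
```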
Member

There's also a clean target in the Makefile — though I can appreciate wanting to give folks a single way to do this.

Member Author

Nice!


## Option 2: Use a local Arrow C++ development build

If you need to alter both libarrow and the R package code, or if you can't get a binary version of the latest libarrow elsewhere, you'll need to build it from source. This section discusses how to set up a C++ libarrow build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).

@@ -103,43 +84,6 @@ sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
brew install cmake openssl
```

#### Windows

The package can be built on Windows using [RTools 4](https://cran.r-project.org/bin/windows/Rtools/). It can be built for mingw32 (i386), mingw64 (x64), or ucrt64 (UCRT x64). mingw64 is the recommended 64-bit installation.

Open the corresponding RTools Bash, for example "Rtools MinGW 64-bit" for mingw64.

Install CMake, ccache, Ninja, and OpenSSL with:

```{bash, save=run & windows}
pacman --sync --refresh --noconfirm \
${MINGW_PACKAGE_PREFIX}-{ccache,cmake,ninja,openssl}
export CMAKE_GENERATOR=Ninja
```

You will need to add R to your path. For a user-level installation, R will be at something like `~/Documents/R/R-4.1.2/bin`. For a global installation, R will be at something like `/c/Program\ Files/R/R-4.1.2/bin`. The R on your path needs to match the architecture you are compiling for, so if you are compiling for 32-bit, specify `.../bin/i386` instead of `.../bin/x64`.

```{bash}
export PATH=~/Documents/R/R-4.1.2/bin/x64:$PATH
```

You can install additional dependencies like so. Note that you are limited to the packages in [the RTools repo](https://github.com/r-windows/rtools-packages), which does not contain every dependency used by Arrow.

```{bash, save=run & windows}
pacman --sync --refresh --noconfirm \
${MINGW_PACKAGE_PREFIX}-boost \
${MINGW_PACKAGE_PREFIX}-brotli \
${MINGW_PACKAGE_PREFIX}-lz4 \
${MINGW_PACKAGE_PREFIX}-protobuf \
${MINGW_PACKAGE_PREFIX}-snappy \
${MINGW_PACKAGE_PREFIX}-thrift \
${MINGW_PACKAGE_PREFIX}-zlib \
${MINGW_PACKAGE_PREFIX}-zstd \
${MINGW_PACKAGE_PREFIX}-aws-sdk-cpp \
${MINGW_PACKAGE_PREFIX}-re2 \
${MINGW_PACKAGE_PREFIX}-libutf8proc
```

### Step 2 - Configure the libarrow build

We recommend configuring libarrow to install into a user-level directory rather than a system directory for your development work. This way, the development version you are using doesn't overwrite a released version of libarrow you may already have installed, and you can work with more than one version of libarrow (by using a different `ARROW_HOME` directory for each version).
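
One way to switch between several user-level installs is a small helper that points `ARROW_HOME` at the chosen prefix for the current shell session. This is a sketch under assumptions: the function name and the directory names in the example are hypothetical, and `LD_LIBRARY_PATH` is the Linux variable.

```bash
# Select one libarrow install prefix for the current shell session and
# put its lib directory first on the library search path.
use_arrow_home() {
  export ARROW_HOME="$1"
  export LD_LIBRARY_PATH="$ARROW_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example (hypothetical directory names):
#   use_arrow_home "$HOME/arrow-main"      # build tracking the main branch
#   use_arrow_home "$HOME/arrow-feature"   # a second, independent build
```

Because each call prepends to `LD_LIBRARY_PATH`, the most recently selected prefix takes precedence; start a fresh shell if you want a path without the earlier entries.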
@@ -158,13 +102,6 @@ export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH" >> ~/.bash_profile
```

_Special instructions on Windows:_ You will need to add `$ARROW_HOME/bin` to your `PATH` if you are using dynamic libraries (which is recommended).

```{bash, save=run & windows}
export PATH=$ARROW_HOME/bin:$PATH
echo "export PATH=\"$ARROW_HOME/bin:$PATH\"" >> ~/.bash_profile
```

Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). Next, change directories to be inside `cpp/build`:

```{bash, save=run & !sys_install}
@@ -197,32 +134,10 @@ cmake \
..
```

##### Windows

```{bash, save=run & !sys_install & windows}
cmake \
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
-DARROW_COMPUTE=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_EXTRA_ERROR_CONTEXT=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_MIMALLOC=ON \
-DARROW_JSON=ON \
-DARROW_PARQUET=ON \
-DARROW_WITH_SNAPPY=OFF \
-DARROW_WITH_ZLIB=ON \
..
```

#### {-}

`..` refers to the C++ source directory: you're in `cpp/build` and the source is in `cpp`.

**For Windows**: some options, including `-DARROW_JEMALLOC`, are not supported on Windows.


```{bash, save=run & !sys_install, hide=TRUE}
# For testing purposes, build with only shared libraries
cmake \