Skip to content

Latest commit

 

History

History
126 lines (99 loc) · 5.24 KB

README.md

File metadata and controls

126 lines (99 loc) · 5.24 KB

Java API for cudf

This project provides java bindings for cudf, to be able to process large amounts of data on a GPU. This is still a work in progress so some APIs may change until the 1.0 release.

Behavior on Systems with Multiple GPUs

The cudf project currently works with a single GPU per process. The CUDA runtime assigns the default GPU to all new operating system threads when they start to interact with CUDA. This means that if you use a multi-threaded environment, like Java processes tend to be, and try to use a non-default GPU for cudf you can run into hard to debug issues including resource leaks, invalid states, application crashes, and poor performance.

To prevent this the Java cudf API will remember the device used to initialize the Rapids Memory Manager (RMM), and automatically set the thread's active device to it, if needed. It will not set the device back when the cudf call completes. This is different from most CUDA libraries and can result in unexpected behavior if you try to mix these libraries using the same thread.

Dependency

This is a fat jar with the binary dependencies packaged in the jar. This means the jar will only run on platforms the jar was compiled for. When this is in an official Maven repository we will list the platforms that it is compiled and tested for. In the mean time you will need to build it yourself. In official releases there should be no classifier on the jar and it should run against most modern cuda drivers.

<dependency>
    <groupId>ai.rapids</groupId>
    <artifactId>cudf</artifactId>
    <version>${cudf.version}</version>
</dependency>

In some cases there may be a classifier to indicate the version of cuda required. See the Build From Source section below for more information about when this can happen. No official release of the jar will have a classifier on it.

CUDA 11.0:

<dependency>
    <groupId>ai.rapids</groupId>
    <artifactId>cudf</artifactId>
    <classifier>cuda11</classifier>
    <version>${cudf.version}</version>
</dependency>

Build From Source

Build libcudf first, and make sure the JDK is installed and available. Specify the cmake option -DCUDF_USE_ARROW_STATIC=ON -DCUDF_ENABLE_ARROW_S3=OFF when building so that Apache Arrow is linked statically to libcudf, as this will help create a jar that does not require Arrow and its dependencies to be available in the runtime environment.

After building libcudf, the Java bindings can be built via Maven, e.g.:

mvn clean install

If you have a compatible GPU on your build system the tests will use it. If not you will see a lot of skipped tests.

Using the Java CI Docker Image

If you are interested in building a Java cudf jar that is similar to the official releases that can run on all modern Linux systems, see the Java CI README for instructions on how to build within a Docker environment using devtoolset. Note that building the jar without the Docker setup and script will likely produce a jar that can only run in environments similar to that of the build machine.

If you decide to build without Docker and the build script, examining the cmake and Maven settings in the Java CI build script can be helpful if you are encountering difficulties during the build.

Statically Linking the CUDA Runtime

If you use the default cmake options libcudart will be dynamically linked to libcudf and libcudfjni. To build with a static CUDA runtime, build libcudf with the -DCUDA_STATIC_RUNTIME=ON as a cmake parameter, and similarly build with -DCUDA_STATIC_RUNTIME=ON when building the Java bindings with Maven.

Building with a libcudf Archive

When statically linking the CUDA runtime, it is recommended to build cuDF as an archive rather than a shared library, as this allows the Java bindings to only have a single shared library that uses the CUDA runtime. To build libcudf as an archive, specify -DBUILD_SHARED_LIBS=OFF as a cmake parameter when building libcudf, then specify -DCUDF_JNI_LIBCUDF_STATIC=ON when building the Java bindings with Maven.

Per-thread Default Stream

The JNI code can be built with per-thread default stream (PTDS), which gives each host thread its own default CUDA stream, and can potentially increase the overlap of data copying and compute between different threads (see blog post).

Since the PTDS option is for each compilation unit, it should be done at the same time across the whole codebase. To enable PTDS, first build cuDF:

cd src/cudf/cpp/build
cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCUDF_USE_PER_THREAD_DEFAULT_STREAM=ON
make -j`nproc`
make install

then build the jar:

cd src/cudf/java
mvn clean install -DCUDF_USE_PER_THREAD_DEFAULT_STREAM=ON

GPUDirect Storage (GDS)

The JNI code can be built with GPUDirect Storage (GDS) support, which enables direct copying between GPU device buffers and supported filesystems (see https://docs.nvidia.com/gpudirect-storage/).

To enable GDS support, first make sure GDS is installed (see https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html), then run:

cd src/cudf/java
mvn clean install -DUSE_GDS=ON