Merge issue #260

Merged
merged 29 commits on May 2, 2019

Changes from all commits (29 commits)
2a9c3eb
Updated test paths
afender Apr 25, 2019
dee078c
added a script to get test datasets from s3
afender Apr 25, 2019
ebedef5
added readme for datasets
afender Apr 25, 2019
237d265
Update README.md
afender Apr 25, 2019
fb0217a
improved the s3 script to set the dataset path
afender Apr 25, 2019
f25b128
Merge branch 'cpp_test_datasets' of github.com:afender/cugraph into c…
afender Apr 25, 2019
b64b192
Update README.md
afender Apr 25, 2019
526ff8c
changelog remove extra new line
Apr 26, 2019
5c05049
changelog
Apr 26, 2019
a847327
changing tar.gz to tgz
afender Apr 29, 2019
4db0b31
Merge branch 'cpp_test_datasets' of github.com:afender/cugraph into c…
afender Apr 29, 2019
54b85b5
Merge remote-tracking branch 'upstream/branch-0.7' into localbuild
Apr 29, 2019
1e01f70
Update CONTRIBUTING
Apr 29, 2019
2f5a018
Add local build script
Apr 29, 2019
8b37c21
Updated the main readme
afender Apr 29, 2019
90dd4f0
Add style check example
Apr 29, 2019
9fea2dd
Remove extra character in README
Apr 29, 2019
c958b6d
Merge branch 'branch-0.7' into cpp_test_datasets
afender Apr 29, 2019
d61a6b1
Update CHANGELOG.md
afender Apr 29, 2019
294cfc2
add rmm to the conda configuration
ChuckHastings Apr 30, 2019
55f08e3
update changelog
ChuckHastings Apr 30, 2019
ad5ddf8
try the technique cudf is using for managing conda environment
ChuckHastings Apr 30, 2019
afac161
Merge pull request #250 from dillon-cullinan/localbuild
BradReesWork Apr 30, 2019
e27af17
Merge pull request #252 from afender/cpp_test_datasets
BradReesWork Apr 30, 2019
5e21bc7
Merge pull request #253 from ChuckHastings/bug_add_rmm
BradReesWork Apr 30, 2019
109252c
add pip to the install, remove nightly files, update readme
ChuckHastings May 1, 2019
98414b4
update CHANGELOG
ChuckHastings May 1, 2019
7b47256
Merge pull request #256 from ChuckHastings/bug_add_pip
afender May 1, 2019
7dbf41e
Merge remote-tracking branch 'upstream/branch-0.8' into merge_issue
afender May 1, 2019
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -8,7 +8,7 @@
- PR #210 Expose degree calculation kernel via python API
- PR #220 Added bindings for Nvgraph triangle counting
- PR #234 Added bindings for renumbering, modify renumbering to use RMM

- PR #250 Add local build script to mimic gpuCI
Member: This should be in a new Release 0.8 section

Member: the same for the Improvements and Bug Fixes

Member (author): I don't think so; these are 0.7 changes that were already merged in 0.7. Recall that this PR just does manually the 0.7->0.8 integration that GitHub couldn't do automatically.


## Improvements
- PR #157 Removed cudatoolkit dependency in setup.py
@@ -21,8 +21,11 @@
- PR #215 Simplified get_rapids_dataset_root_dir(), set a default value for the root dir
- PR #233 Added csv datasets and edited test to use cudf for reading graphs
- PR #247 Added some documentation for renumbering
- PR #252 cpp test upgrades for more convenient testing on large input

## Bug Fixes
- PR #256 Add pip to the install, clean up conda instructions
- PR #253 Add rmm to conda configuration
- PR #226 Bump cudf dependencies to 0.7
- PR #169 Disable terminal output in sssp
- PR #191 Fix double upload bug
5 changes: 5 additions & 0 deletions CONTRIBUTING.md
@@ -49,5 +49,10 @@ contributing to. Start with _Step 3_ from above, commenting on the issue to let
others know you are working on it. If you have any questions related to the
implementation of the issue, ask them in the issue instead of the PR.

### Building and Testing on a gpuCI image locally

Before submitting a pull request, you can do a local build and test on your machine that mimics our gpuCI environment using the `ci/local/build.sh` script.
For detailed information on usage of this script, see [here](ci/local/README.md).
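
A sketch of a typical invocation from the repository root (the image tag is the script's documented default and may have changed):

```bash
# Run the gpuCI-style local build; fall back to a message when run
# outside a cugraph checkout so the sketch is safe to copy-paste.
IMAGE="gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6"
if [ -f ci/local/build.sh ]; then
    bash ci/local/build.sh -r "$PWD" -i "${IMAGE}"
else
    echo "ci/local/build.sh not found; run this from the cugraph repo root"
fi
```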

## Attribution
Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md
62 changes: 34 additions & 28 deletions README.md
@@ -126,7 +126,6 @@ To install cuGraph from source, ensure the dependencies are met and follow the s

2) Create the conda development environment

A) Building the `master` branch uses the `cugraph_dev` environment

```bash
# create the conda environment (assuming in base `cugraph` directory)
@@ -145,22 +144,6 @@ conda deactivate



B) Create the conda development environment `cugraph_nightly`

If you are on the latest development branch then you must use the `cugraph_nightly` environment. The latest cuGraph code uses the latest cuDF features that might not yet be in the master branch. To work off of the latest development branch, which could be unstable, use the nightly build environment.

```bash
# create the conda environment (assuming in base `cugraph` directory)
conda env create --name cugraph_nightly --file conda/environments/cugraph_nightly.yml

# activate the environment
conda activate cugraph_nightly

```




- The environment can be updated as development includes/changes the dependencies. To do so, run:


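A sketch of what that update step typically looks like (an assumption mirroring the create command above; pick the environment file that matches your CUDA version):

```bash
# Refresh an existing conda environment in place when dependencies change.
ENV_FILE="conda/environments/cugraph_dev.yml"
if command -v conda >/dev/null 2>&1 && [ -f "${ENV_FILE}" ]; then
    conda env update --name cugraph_dev --file "${ENV_FILE}"
else
    echo "Would run: conda env update --name cugraph_dev --file ${ENV_FILE}"
fi
```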
@@ -218,26 +201,48 @@

#### Run tests

6. Run either the standalone tests or the Python tests with datasets
- **C++ stand alone tests**
6. Run either the C++ or the Python tests with datasets

From the build directory:
- **Python tests with datasets**

```bash
# Run the cugraph tests
cd $CUGRAPH_HOME
cd cpp/build
gtests/GDFGRAPH_TEST # this is an executable file
cd python
pytest
```
- **C++ stand alone tests**

- **Python tests with datasets**
From the build directory:

```bash
# Run the cugraph tests
cd $CUGRAPH_HOME
cd python
pytest
cd cpp/build
gtests/GDFGRAPH_TEST # this is an executable file
```

- **C++ tests with larger datasets**

If you already have the datasets:

```bash
export RAPIDS_DATASET_ROOT_DIR=<path_to_cpp_test_and_reference_data>
```
If you do not have the datasets:

```bash
cd $CUGRAPH_HOME/datasets
source get_test_data.sh # This takes about 10 minutes and downloads ~1 GB of data (>5 GB uncompressed)
```

Run the C++ tests on large input:

```bash
cd $CUGRAPH_HOME/cpp/build
# test one particular analytic (e.g. pagerank)
gtests/PAGERANK_TEST
# test everything
make test
```
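
Putting the steps above together, here is a sketch of a filtered run (the default dataset location and the filter pattern are assumptions for illustration; `--gtest_filter` is a standard GoogleTest flag):

```bash
# Point the tests at the datasets, then run a single suite with a
# GoogleTest filter instead of the whole `make test` sweep.
CUGRAPH_HOME="${CUGRAPH_HOME:-$PWD}"
export RAPIDS_DATASET_ROOT_DIR="${RAPIDS_DATASET_ROOT_DIR:-${CUGRAPH_HOME}/datasets}"
TEST_BIN="${CUGRAPH_HOME}/cpp/build/gtests/PAGERANK_TEST"
if [ -x "${TEST_BIN}" ]; then
    "${TEST_BIN}" --gtest_filter='*Pagerank*'
else
    echo "PAGERANK_TEST not built yet; complete the build steps above first"
fi
```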

Note: This conda installation only applies to Linux and Python versions 3.6/3.7.

@@ -322,4 +327,5 @@ The RAPIDS suite of open source software libraries aim to enable execution of en

### Apache Arrow on GPU

The GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported.

57 changes: 57 additions & 0 deletions ci/local/README.md
@@ -0,0 +1,57 @@
## Purpose

This script is designed for developer and contributor use. It mimics the actions of gpuCI on your local machine, allowing you to test and even debug your code inside a gpuCI base container before pushing a GitHub commit.
The script can be helpful for locally triaging and debugging RAPIDS continuous integration failures.

## Requirements

```
nvidia-docker
```
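
A quick way to verify the prerequisite before launching the script (a sketch; newer Docker setups may instead provide the equivalent through `docker` with the NVIDIA runtime):

```bash
# Check whether the nvidia-docker wrapper is on PATH.
if command -v nvidia-docker >/dev/null 2>&1; then
    STATUS="nvidia-docker found"
else
    STATUS="nvidia-docker not found; install it before running build.sh"
fi
echo "${STATUS}"
```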

## Usage

```
bash build.sh [-h] [-H] [-s] [-r <repo_dir>] [-i <image_name>]
Build and test your local repository using a base gpuCI Docker image

where:
-H Show this help text
-r Path to repository (defaults to working directory)
-i Use Docker image (default is gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6)
-s Skip building and testing and start an interactive shell in a container of the Docker image
```

Example Usage:
`bash build.sh -r ~/rapids/cugraph -i gpuci/cuda9.2-ubuntu16.04-gcc5-py3.6`

For a full list of available gpuCI docker images, visit our [DockerHub](https://hub.docker.com/r/gpuci/rapidsai-base/tags) page.

Style Check:
```bash
$ bash ci/local/build.sh -r ~/rapids/cugraph -s
$ source activate gdf  # Activate the gpuCI conda environment
$ cd rapids
$ flake8 python
```

## Information

There are some caveats to be aware of when using this script, especially if you plan on developing from within the container itself.


### Docker Image Build Repository

The docker image will generate build artifacts in a folder on your machine located in the root of the repository you passed to the script. For the above example, the directory is named `~/rapids/cugraph/build_rapidsai-base_cuda9.2-ubuntu16.04-gcc5-py3.6/`. Feel free to remove this directory after the script is finished.
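
The directory name can be derived ahead of time; this mirrors the substitution `build.sh` applies to the image tag:

```bash
# Reproduce the build-directory naming used by ci/local/build.sh:
# "build_" + image basename, with ":" replaced by "_".
DOCKER_IMAGE="gpuci/rapidsai-base:cuda9.2-ubuntu16.04-gcc5-py3.6"
REPO_PATH="${HOME}/rapids/cugraph"
BUILD_DIR="${REPO_PATH}/build_$(basename "${DOCKER_IMAGE}" | sed -e 's/:/_/g')"
echo "${BUILD_DIR}"
# basename ends up as: build_rapidsai-base_cuda9.2-ubuntu16.04-gcc5-py3.6
```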

*Note*: The script *will not* overwrite your local build artifacts. Your local environment stays intact.


### Where The User is Dumped

The script will build your repository and run all tests. If any tests fail, it dumps the user into the docker container itself to allow you to debug from within the container. If all the tests pass as expected the container exits and is automatically removed. Remember to exit the container if tests fail and you do not wish to debug within the container itself.


### Container File Structure

Your repository will be located in the `/rapids/` folder of the container. This folder is volume-mounted from the local machine, so any changes to the code in the container are replicated onto the local machine. The `cpp/build` and `python/build` directories within your repository are on separate mounts to avoid conflicting with your local build artifacts.
104 changes: 104 additions & 0 deletions ci/local/build.sh
@@ -0,0 +1,104 @@
#!/bin/bash

DOCKER_IMAGE="gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6"
REPO_PATH=${PWD}
RAPIDS_DIR_IN_CONTAINER="/rapids"
CPP_BUILD_DIR="cpp/build"
PYTHON_BUILD_DIR="python/build"
CONTAINER_SHELL_ONLY=0

SHORTHELP="$(basename $0) [-h] [-H] [-s] [-r <repo_dir>] [-i <image_name>]"
LONGHELP="${SHORTHELP}
Build and test your local repository using a base gpuCI Docker image

where:
-H Show this help text
-r Path to repository (defaults to working directory)
-i Use Docker image (default is ${DOCKER_IMAGE})
-s Skip building and testing and start an interactive shell in a container of the Docker image
"

# Limit GPUs available to container based on CUDA_VISIBLE_DEVICES
if [[ -z "${CUDA_VISIBLE_DEVICES}" ]]; then
NVIDIA_VISIBLE_DEVICES="all"
else
NVIDIA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}
fi

while getopts ":hHr:i:s" option; do
case ${option} in
r)
REPO_PATH=${OPTARG}
;;
i)
DOCKER_IMAGE=${OPTARG}
;;
s)
CONTAINER_SHELL_ONLY=1
;;
h)
echo "${SHORTHELP}"
exit 0
;;
H)
echo "${LONGHELP}"
exit 0
;;
*)
echo "ERROR: Invalid flag"
echo "${SHORTHELP}"
exit 1
;;
esac
done

REPO_PATH_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename ${REPO_PATH})"
CPP_BUILD_DIR_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename ${REPO_PATH})/${CPP_BUILD_DIR}"
PYTHON_BUILD_DIR_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename ${REPO_PATH})/${PYTHON_BUILD_DIR}"


# BASE_CONTAINER_BUILD_DIR is named after the image name, allowing for
# multiple image builds to coexist on the local filesystem. This will
# be mapped to the typical BUILD_DIR inside of the container. Builds
# running in the container generate build artifacts just as they would
# in a bare-metal environment, and the host filesystem is able to
# maintain the host build in BUILD_DIR as well.
BASE_CONTAINER_BUILD_DIR=${REPO_PATH}/build_$(echo $(basename ${DOCKER_IMAGE})|sed -e 's/:/_/g')
CPP_CONTAINER_BUILD_DIR=${BASE_CONTAINER_BUILD_DIR}/cpp
PYTHON_CONTAINER_BUILD_DIR=${BASE_CONTAINER_BUILD_DIR}/python


BUILD_SCRIPT="#!/bin/bash
set -e
WORKSPACE=${REPO_PATH_IN_CONTAINER}
PREBUILD_SCRIPT=${REPO_PATH_IN_CONTAINER}/ci/gpu/prebuild.sh
BUILD_SCRIPT=${REPO_PATH_IN_CONTAINER}/ci/gpu/build.sh
cd ${WORKSPACE}
if [ -f \${PREBUILD_SCRIPT} ]; then
source \${PREBUILD_SCRIPT}
fi
yes | source \${BUILD_SCRIPT}
"

if (( ${CONTAINER_SHELL_ONLY} == 0 )); then
COMMAND="${CPP_BUILD_DIR_IN_CONTAINER}/build.sh || bash"
else
COMMAND="bash"
fi

# Create the build dir for the container to mount, generate the build script inside of it
mkdir -p ${BASE_CONTAINER_BUILD_DIR}
mkdir -p ${CPP_CONTAINER_BUILD_DIR}
mkdir -p ${PYTHON_CONTAINER_BUILD_DIR}
echo "${BUILD_SCRIPT}" > ${CPP_CONTAINER_BUILD_DIR}/build.sh
chmod ugo+x ${CPP_CONTAINER_BUILD_DIR}/build.sh

# Run the generated build script in a container
docker pull ${DOCKER_IMAGE}
docker run --runtime=nvidia --rm -it -e NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES} \
--user $(id -u):$(id -g) \
-v ${REPO_PATH}:${REPO_PATH_IN_CONTAINER} \
-v ${CPP_CONTAINER_BUILD_DIR}:${CPP_BUILD_DIR_IN_CONTAINER} \
-v ${PYTHON_CONTAINER_BUILD_DIR}:${PYTHON_BUILD_DIR_IN_CONTAINER} \
--cap-add=SYS_PTRACE \
${DOCKER_IMAGE} bash -c "${COMMAND}"
20 changes: 10 additions & 10 deletions conda/environments/cugraph_dev.yml
@@ -1,25 +1,26 @@
name: cugraph_dev
channels:
- nvidia
- rapidsai
- numba
- rapidsai/label/cuda9.2
- nvidia/label/cuda9.2
- rapidsai-nightly/label/cuda9.2
- conda-forge
- defaults
dependencies:
- cudf>=0.6
- cudf=0.7.*
- nvstrings=0.7.*
- rmm=0.7.*
- scipy
- networkx
- python-louvain
- nccl
- cudatoolkit
- cmake>=3.12
- python>=3.6,<3.8
- numba>=0.40
- numba>=0.41
- pandas>=0.23.4
- pyarrow=0.12.1
- notebook>=0.5.0
- boost
- nvstrings>=0.3,<0.4
- cffi>=1.10.0
- distributed>=1.23.0
- cython>=0.29,<0.30
- pytest
@@ -30,7 +31,6 @@ dependencies:
- numpydoc
- ipython
- recommonmark
- pandoc=<2.0.0
- pip
- pip:
- sphinx-markdown-tables

15 changes: 8 additions & 7 deletions conda/environments/cugraph_dev_cuda10.yml
@@ -1,25 +1,25 @@
name: cugraph_dev
channels:
- nvidia/label/cuda10.0
- rapidsai/label/cuda10.0
- numba
- nvidia/label/cuda10.0
- rapidsai-nightly/label/cuda10.0
- conda-forge
- defaults
dependencies:
- cudf>=0.6
- cudf=0.7.*
- nvstrings=0.7.*
- rmm=0.7.*
- scipy
- networkx
- python-louvain
- nccl
- cudatoolkit
- cmake>=3.12
- python>=3.6,<3.8
- numba>=0.40
- numba>=0.41
- pandas>=0.23.4
- pyarrow=0.12.1
- notebook>=0.5.0
- boost
- nvstrings>=0.3,<0.4
- cffi>=1.10.0
- distributed>=1.23.0
- cython>=0.29,<0.30
@@ -31,5 +31,6 @@ dependencies:
- numpydoc
- ipython
- recommonmark
- pip
- pip:
- sphinx-markdown-tables