Merge issue #260

Merged
merged 29 commits on May 2, 2019

Changes from all commits (29 commits)
2a9c3eb
Updated test paths
afender Apr 25, 2019
dee078c
added a script to get test datasets from s3
afender Apr 25, 2019
ebedef5
added readme for datasets
afender Apr 25, 2019
237d265
Update README.md
afender Apr 25, 2019
fb0217a
improved the s3 script to set the dataset path
afender Apr 25, 2019
f25b128
Merge branch 'cpp_test_datasets' of github.com:afender/cugraph into c…
afender Apr 25, 2019
b64b192
Update README.md
afender Apr 25, 2019
526ff8c
changelog remove extra new line
Apr 26, 2019
5c05049
changelog
Apr 26, 2019
a847327
changing tar.gz to tgz
afender Apr 29, 2019
4db0b31
Merge branch 'cpp_test_datasets' of github.com:afender/cugraph into c…
afender Apr 29, 2019
54b85b5
Merge remote-tracking branch 'upstream/branch-0.7' into localbuild
Apr 29, 2019
1e01f70
Update CONTRIBUTING
Apr 29, 2019
2f5a018
Add local build script
Apr 29, 2019
8b37c21
Updated the main readme
afender Apr 29, 2019
90dd4f0
Add style check example
Apr 29, 2019
9fea2dd
Remove extra character in README
Apr 29, 2019
c958b6d
Merge branch 'branch-0.7' into cpp_test_datasets
afender Apr 29, 2019
d61a6b1
Update CHANGELOG.md
afender Apr 29, 2019
294cfc2
add rmm to the conda configuration
ChuckHastings Apr 30, 2019
55f08e3
update changelog
ChuckHastings Apr 30, 2019
ad5ddf8
try the technique cudf is using for managing conda environment
ChuckHastings Apr 30, 2019
afac161
Merge pull request #250 from dillon-cullinan/localbuild
BradReesWork Apr 30, 2019
e27af17
Merge pull request #252 from afender/cpp_test_datasets
BradReesWork Apr 30, 2019
5e21bc7
Merge pull request #253 from ChuckHastings/bug_add_rmm
BradReesWork Apr 30, 2019
109252c
add pip to the install, remove nightly files, update readme
ChuckHastings May 1, 2019
98414b4
update CHANGELOG
ChuckHastings May 1, 2019
7b47256
Merge pull request #256 from ChuckHastings/bug_add_pip
afender May 1, 2019
7dbf41e
Merge remote-tracking branch 'upstream/branch-0.8' into merge_issue
afender May 1, 2019
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -8,7 +8,7 @@
- PR #210 Expose degree calculation kernel via python API
- PR #220 Added bindings for Nvgraph triangle counting
- PR #234 Added bindings for renumbering, modify renumbering to use RMM

- PR #250 Add local build script to mimic gpuCI
Member: This should be in a new Release 0.8 section

Member: the same for the Improvements and Bug Fixes

Member (author): I don't think so; these are 0.7 changes that were already merged in 0.7. Recall that this PR just does manually the 0.7->0.8 integration that GitHub couldn't do automatically.


## Improvements
- PR #157 Removed cudatoolkit dependency in setup.py
@@ -21,8 +21,11 @@
- PR #215 Simplified get_rapids_dataset_root_dir(), set a default value for the root dir
- PR #233 Added csv datasets and edited test to use cudf for reading graphs
- PR #247 Added some documentation for renumbering
- PR #252 cpp test upgrades for more convenient testing on large input

## Bug Fixes
- PR #256 Add pip to the install, clean up conda instructions
- PR #253 Add rmm to conda configuration
- PR #226 Bump cudf dependencies to 0.7
- PR #169 Disable terminal output in sssp
- PR #191 Fix double upload bug
5 changes: 5 additions & 0 deletions CONTRIBUTING.md
@@ -49,5 +49,10 @@ contributing to. Start with _Step 3_ from above, commenting on the issue to let
others know you are working on it. If you have any questions related to the
implementation of the issue, ask them in the issue instead of the PR.

### Building and Testing on a gpuCI image locally

Before submitting a pull request, you can do a local build and test on your machine that mimics our gpuCI environment using the `ci/local/build.sh` script.
For detailed information on usage of this script, see [here](ci/local/README.md).
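
A sketch of a typical invocation from the repository root (the image tag is the script's documented default and may have changed):

```bash
# Run the gpuCI-style local build; fall back to a message when run
# outside a cugraph checkout so the sketch is safe to copy-paste.
IMAGE="gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6"
if [ -f ci/local/build.sh ]; then
    bash ci/local/build.sh -r "$PWD" -i "${IMAGE}"
else
    echo "ci/local/build.sh not found; run this from the cugraph repo root"
fi
```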

## Attribution
Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md
62 changes: 34 additions & 28 deletions README.md
@@ -126,7 +126,6 @@ To install cuGraph from source, ensure the dependencies are met and follow the s

2) Create the conda development environment

A) Building the `master` branch uses the `cugraph_dev` environment

```bash
# create the conda environment (assuming in base `cugraph` directory)
@@ -145,22 +144,6 @@ conda deactivate



B) Create the conda development environment `cugraph_nightly`

If you are on the latest development branch then you must use the `cugraph_nightly` environment. The latest cuGraph code uses the latest cuDF features that might not yet be in the master branch. To work off of the latest development branch, which could be unstable, use the nightly build environment.

```bash
# create the conda environment (assuming in base `cugraph` directory)
conda env create --name cugraph_nightly --file conda/environments/cugraph_nightly.yml

# activate the environment
conda activate cugraph_nightly

```




- The environment can be updated as development includes/changes the dependencies. To do so, run:


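A sketch of what that update step typically looks like (an assumption mirroring the create command above; pick the environment file that matches your CUDA version):

```bash
# Refresh an existing conda environment in place when dependencies change.
ENV_FILE="conda/environments/cugraph_dev.yml"
if command -v conda >/dev/null 2>&1 && [ -f "${ENV_FILE}" ]; then
    conda env update --name cugraph_dev --file "${ENV_FILE}"
else
    echo "Would run: conda env update --name cugraph_dev --file ${ENV_FILE}"
fi
```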
@@ -218,26 +201,48 @@

#### Run tests

6. Run either the standalone tests or the Python tests with datasets
- **C++ stand alone tests**
6. Run either the C++ or the Python tests with datasets

From the build directory:
- **Python tests with datasets**

```bash
# Run the cugraph tests
cd $CUGRAPH_HOME
cd cpp/build
gtests/GDFGRAPH_TEST # this is an executable file
cd python
pytest
```
- **C++ stand alone tests**

- **Python tests with datasets**
From the build directory:

```bash
# Run the cugraph tests
cd $CUGRAPH_HOME
cd python
pytest
cd cpp/build
gtests/GDFGRAPH_TEST # this is an executable file
```

- **C++ tests with larger datasets**

If you already have the datasets:

```bash
export RAPIDS_DATASET_ROOT_DIR=<path_to_cpp_test_and_reference_data>
```
If you do not have the datasets:

```bash
cd $CUGRAPH_HOME/datasets
source get_test_data.sh # This takes about 10 minutes and downloads ~1 GB of data (>5 GB uncompressed)
```

Run the C++ tests on large input:

```bash
cd $CUGRAPH_HOME/cpp/build
# test one particular analytic (e.g. pagerank)
gtests/PAGERANK_TEST
# test everything
make test
```
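
Putting the steps above together, here is a sketch of a filtered run (the default dataset location and the filter pattern are assumptions for illustration; `--gtest_filter` is a standard GoogleTest flag):

```bash
# Point the tests at the datasets, then run a single suite with a
# GoogleTest filter instead of the whole `make test` sweep.
CUGRAPH_HOME="${CUGRAPH_HOME:-$PWD}"
export RAPIDS_DATASET_ROOT_DIR="${RAPIDS_DATASET_ROOT_DIR:-${CUGRAPH_HOME}/datasets}"
TEST_BIN="${CUGRAPH_HOME}/cpp/build/gtests/PAGERANK_TEST"
if [ -x "${TEST_BIN}" ]; then
    "${TEST_BIN}" --gtest_filter='*Pagerank*'
else
    echo "PAGERANK_TEST not built yet; complete the build steps above first"
fi
```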

Note: This conda installation only applies to Linux and Python versions 3.6/3.7.

@@ -322,4 +327,5 @@ The RAPIDS suite of open source software libraries aim to enable execution of en

### Apache Arrow on GPU

The GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported.

57 changes: 57 additions & 0 deletions ci/local/README.md
@@ -0,0 +1,57 @@
## Purpose

This script is designed for developer and contributor use. It mimics the actions of gpuCI on your local machine, allowing you to test and even debug your code inside a gpuCI base container before pushing a GitHub commit.
The script can be helpful for locally triaging and debugging RAPIDS continuous integration failures.

## Requirements

```
nvidia-docker
```
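
A quick way to verify the prerequisite before launching the script (a sketch; newer Docker setups may instead provide the equivalent through `docker` with the NVIDIA runtime):

```bash
# Check whether the nvidia-docker wrapper is on PATH.
if command -v nvidia-docker >/dev/null 2>&1; then
    STATUS="nvidia-docker found"
else
    STATUS="nvidia-docker not found; install it before running build.sh"
fi
echo "${STATUS}"
```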

## Usage

```
bash build.sh [-h] [-H] [-s] [-r <repo_dir>] [-i <image_name>]
Build and test your local repository using a base gpuCI Docker image

where:
-H Show this help text
-r Path to repository (defaults to working directory)
-i Use Docker image (default is gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6)
-s Skip building and testing and start an interactive shell in a container of the Docker image
```

Example Usage:
`bash build.sh -r ~/rapids/cugraph -i gpuci/cuda9.2-ubuntu16.04-gcc5-py3.6`

For a full list of available gpuCI docker images, visit our [DockerHub](https://hub.docker.com/r/gpuci/rapidsai-base/tags) page.

Style Check:
```bash
$ bash ci/local/build.sh -r ~/rapids/cugraph -s
$ source activate gdf  # Activate the gpuCI conda environment
$ cd rapids
$ flake8 python
```

## Information

There are some caveats to be aware of when using this script, especially if you plan on developing from within the container itself.


### Docker Image Build Repository

The docker image will generate build artifacts in a folder on your machine located in the root of the repository you passed to the script. For the above example, the directory is named `~/rapids/cugraph/build_rapidsai-base_cuda9.2-ubuntu16.04-gcc5-py3.6/`. Feel free to remove this directory after the script is finished.
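
The directory name can be derived ahead of time; this mirrors the substitution `build.sh` applies to the image tag:

```bash
# Reproduce the build-directory naming used by ci/local/build.sh:
# "build_" + image basename, with ":" replaced by "_".
DOCKER_IMAGE="gpuci/rapidsai-base:cuda9.2-ubuntu16.04-gcc5-py3.6"
REPO_PATH="${HOME}/rapids/cugraph"
BUILD_DIR="${REPO_PATH}/build_$(basename "${DOCKER_IMAGE}" | sed -e 's/:/_/g')"
echo "${BUILD_DIR}"
# basename ends up as: build_rapidsai-base_cuda9.2-ubuntu16.04-gcc5-py3.6
```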

*Note*: The script *will not* overwrite your local build artifacts. Your local environment stays intact.


### Where The User is Dumped

The script will build your repository and run all tests. If any tests fail, it dumps the user into the docker container itself to allow you to debug from within the container. If all the tests pass as expected the container exits and is automatically removed. Remember to exit the container if tests fail and you do not wish to debug within the container itself.


### Container File Structure

Your repository will be located in the `/rapids/` folder of the container. This folder is volume-mounted from the local machine, so any changes to the code in the container are replicated onto the local machine. The `cpp/build` and `python/build` directories within your repository are on separate mounts to avoid conflicting with your local build artifacts.
104 changes: 104 additions & 0 deletions ci/local/build.sh
@@ -0,0 +1,104 @@
#!/bin/bash

DOCKER_IMAGE="gpuci/rapidsai-base:cuda10.0-ubuntu16.04-gcc5-py3.6"
REPO_PATH=${PWD}
RAPIDS_DIR_IN_CONTAINER="/rapids"
CPP_BUILD_DIR="cpp/build"
PYTHON_BUILD_DIR="python/build"
CONTAINER_SHELL_ONLY=0

SHORTHELP="$(basename $0) [-h] [-H] [-s] [-r <repo_dir>] [-i <image_name>]"
LONGHELP="${SHORTHELP}
Build and test your local repository using a base gpuCI Docker image

where:
-H Show this help text
-r Path to repository (defaults to working directory)
-i Use Docker image (default is ${DOCKER_IMAGE})
-s Skip building and testing and start an interactive shell in a container of the Docker image
"

# Limit GPUs available to container based on CUDA_VISIBLE_DEVICES
if [[ -z "${CUDA_VISIBLE_DEVICES}" ]]; then
NVIDIA_VISIBLE_DEVICES="all"
else
NVIDIA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}
fi

while getopts ":hHr:i:s" option; do
case ${option} in
r)
REPO_PATH=${OPTARG}
;;
i)
DOCKER_IMAGE=${OPTARG}
;;
s)
CONTAINER_SHELL_ONLY=1
;;
h)
echo "${SHORTHELP}"
exit 0
;;
H)
echo "${LONGHELP}"
exit 0
;;
*)
echo "ERROR: Invalid flag"
echo "${SHORTHELP}"
exit 1
;;
esac
done

REPO_PATH_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename ${REPO_PATH})"
CPP_BUILD_DIR_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename ${REPO_PATH})/${CPP_BUILD_DIR}"
PYTHON_BUILD_DIR_IN_CONTAINER="${RAPIDS_DIR_IN_CONTAINER}/$(basename ${REPO_PATH})/${PYTHON_BUILD_DIR}"


# BASE_CONTAINER_BUILD_DIR is named after the image name, allowing for
# multiple image builds to coexist on the local filesystem. This will
# be mapped to the typical BUILD_DIR inside of the container. Builds
# running in the container generate build artifacts just as they would
# in a bare-metal environment, and the host filesystem is able to
# maintain the host build in BUILD_DIR as well.
BASE_CONTAINER_BUILD_DIR=${REPO_PATH}/build_$(echo $(basename ${DOCKER_IMAGE})|sed -e 's/:/_/g')
CPP_CONTAINER_BUILD_DIR=${BASE_CONTAINER_BUILD_DIR}/cpp
PYTHON_CONTAINER_BUILD_DIR=${BASE_CONTAINER_BUILD_DIR}/python


BUILD_SCRIPT="#!/bin/bash
set -e
WORKSPACE=${REPO_PATH_IN_CONTAINER}
PREBUILD_SCRIPT=${REPO_PATH_IN_CONTAINER}/ci/gpu/prebuild.sh
BUILD_SCRIPT=${REPO_PATH_IN_CONTAINER}/ci/gpu/build.sh
cd ${WORKSPACE}
if [ -f \${PREBUILD_SCRIPT} ]; then
source \${PREBUILD_SCRIPT}
fi
yes | source \${BUILD_SCRIPT}
"

if (( ${CONTAINER_SHELL_ONLY} == 0 )); then
COMMAND="${CPP_BUILD_DIR_IN_CONTAINER}/build.sh || bash"
else
COMMAND="bash"
fi

# Create the build dir for the container to mount, generate the build script inside of it
mkdir -p ${BASE_CONTAINER_BUILD_DIR}
mkdir -p ${CPP_CONTAINER_BUILD_DIR}
mkdir -p ${PYTHON_CONTAINER_BUILD_DIR}
echo "${BUILD_SCRIPT}" > ${CPP_CONTAINER_BUILD_DIR}/build.sh
chmod ugo+x ${CPP_CONTAINER_BUILD_DIR}/build.sh

# Run the generated build script in a container
docker pull ${DOCKER_IMAGE}
docker run --runtime=nvidia --rm -it -e NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES} \
--user $(id -u):$(id -g) \
-v ${REPO_PATH}:${REPO_PATH_IN_CONTAINER} \
-v ${CPP_CONTAINER_BUILD_DIR}:${CPP_BUILD_DIR_IN_CONTAINER} \
-v ${PYTHON_CONTAINER_BUILD_DIR}:${PYTHON_BUILD_DIR_IN_CONTAINER} \
--cap-add=SYS_PTRACE \
${DOCKER_IMAGE} bash -c "${COMMAND}"
20 changes: 10 additions & 10 deletions conda/environments/cugraph_dev.yml
@@ -1,25 +1,26 @@
name: cugraph_dev
channels:
- nvidia
- rapidsai
- numba
- rapidsai/label/cuda9.2
- nvidia/label/cuda9.2
- rapidsai-nightly/label/cuda9.2
- conda-forge
- defaults
dependencies:
- cudf>=0.6
- cudf=0.7.*
- nvstrings=0.7.*
- rmm=0.7.*
- scipy
- networkx
- python-louvain
- nccl
- cudatoolkit
- cmake>=3.12
- python>=3.6,<3.8
- numba>=0.40
- numba>=0.41
- pandas>=0.23.4
- pyarrow=0.12.1
- notebook>=0.5.0
- boost
- nvstrings>=0.3,<0.4
- cffi>=1.10.0
- distributed>=1.23.0
- cython>=0.29,<0.30
- pytest
@@ -30,7 +31,6 @@ dependencies:
- numpydoc
- ipython
- recommonmark
- pandoc=<2.0.0
- pip
- pip:
- sphinx-markdown-tables

15 changes: 8 additions & 7 deletions conda/environments/cugraph_dev_cuda10.yml
@@ -1,25 +1,25 @@
name: cugraph_dev
channels:
- nvidia/label/cuda10.0
- rapidsai/label/cuda10.0
- numba
- nvidia/label/cuda10.0
- rapidsai-nightly/label/cuda10.0
- conda-forge
- defaults
dependencies:
- cudf>=0.6
- cudf=0.7.*
- nvstrings=0.7.*
- rmm=0.7.*
- scipy
- networkx
- python-louvain
- nccl
- cudatoolkit
- cmake>=3.12
- python>=3.6,<3.8
- numba>=0.40
- numba>=0.41
- pandas>=0.23.4
- pyarrow=0.12.1
- notebook>=0.5.0
- boost
- nvstrings>=0.3,<0.4
- cffi>=1.10.0
- distributed>=1.23.0
- cython>=0.29,<0.30
@@ -31,5 +31,6 @@ dependencies:
- numpydoc
- ipython
- recommonmark
- pip
- pip:
- sphinx-markdown-tables