cuGraph - GPU Graph Analytics

The RAPIDS cuGraph library is a collection of GPU-accelerated graph algorithms that process data found in GPU DataFrames. The vision of cuGraph is to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks. To realize that vision, cuGraph operates, at the Python layer, on GPU DataFrames, thereby allowing for seamless passing of data between ETL tasks in cuDF and machine learning tasks in cuML. Data scientists familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, users familiar with NetworkX will quickly recognize the NetworkX-like API provided in cuGraph, with the goal of allowing existing code to be ported with minimal effort into RAPIDS.

While the high-level cugraph Python API provides an easy-to-use and familiar interface for data scientists that's consistent with other RAPIDS libraries in their workflow, some use cases require access to lower-level graph theory concepts. For these users, we provide an additional Python API called pylibcugraph, intended for applications that require a tighter integration with cuGraph at the Python layer with fewer dependencies. Users familiar with C/C++/CUDA and graph structures can access libcugraph and libcugraph_c for low-level integration outside of Python.

For more project details, see rapids.ai.

NOTE: For the latest stable README.md, ensure you are on the latest branch.

As an example, the following Python snippet loads graph data and computes PageRank:

import cudf
import cugraph

# read data into a cuDF DataFrame using read_csv
gdf = cudf.read_csv("graph_data.csv", names=["src", "dst"], dtype=["int32", "int32"])

# We now have data as edge pairs
# create a Graph using the source (src) and destination (dst) vertex pairs
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst')

# Let's now get the PageRank score of each vertex by calling cugraph.pagerank
df_page = cugraph.pagerank(G)

# Let's look at the 10 vertices with the highest PageRank scores
df_page.sort_values('pagerank', ascending=False).head(10)
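For readers without a GPU at hand, the following plain-Python sketch illustrates what a PageRank score represents. It is not how cuGraph computes the result; the function name pagerank_sketch, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions.

```python
def pagerank_sketch(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (src, dst) pairs (illustrative only)."""
    vertices = sorted({v for edge in edges for v in edge})
    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for src in vertices:
            targets = out_neighbors[src]
            if targets:
                # Each vertex splits its damped rank evenly among its out-neighbors.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Small directed example: vertex 2 receives links from both 0 and 1.
scores = pagerank_sketch([(0, 1), (0, 2), (1, 2), (2, 0)])
top = sorted(scores, key=scores.get, reverse=True)
```

The scores sum to 1, and sorting them in descending order corresponds to the sort_values call in the cuGraph snippet above.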

Getting cuGraph

There are three ways to get cuGraph:

Quick start with Docker Repo
Conda Installation
Build from Source

cuGraph News

Scaling to 1 Trillion Edges

At GTC Spring '22 we presented results of running cuGraph on the Selene supercomputer using 2,048 GPUs and processing a graph with 1.1 trillion edges. The synthetic data was created with the RMAT generator found in cuGraph.
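As a rough illustration of how an R-MAT generator produces edges, here is a plain-Python sketch of the recursive quadrant selection. It is not cuGraph's generator: the function name rmat_edges and its parameters are hypothetical, and the probabilities a=0.57, b=0.19, c=0.19 are the values commonly used by the Graph500 benchmark, not values stated for the run described above.

```python
import random


def rmat_edges(scale, num_edges, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate (src, dst) pairs over 2**scale vertices with the R-MAT recursion."""
    rng = random.Random(seed)
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for _bit in range(scale):
            # Pick one quadrant of the adjacency matrix per bit of the vertex ID:
            # a = top-left, b = top-right, c = bottom-left, d = bottom-right.
            r = rng.random()
            src_bit = 1 if r >= a + b else 0                           # quadrants c, d
            dst_bit = 1 if (a <= r < a + b) or r >= a + b + c else 0   # quadrants b, d
            src = (src << 1) | src_bit
            dst = (dst << 1) | dst_bit
        edges.append((src, dst))
    return edges


# 32 edges over 2**4 = 16 vertices; the fixed seed makes the output repeatable.
edges = rmat_edges(scale=4, num_edges=32)
```

Because quadrant a is the most likely choice at every level, low-numbered vertices tend to collect more edges, which gives the skewed degree distribution typical of R-MAT graphs.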

Figure: cuGraph Scaling

cuGraph Software Stack

cuGraph has a new multi-layer software stack that allows users and system integrators to access cuGraph at different layers.

Figure: cuGraph Software Stack

Currently Supported Features

As of Release 22.06

Supported Data Types

cuGraph supports graph creation with Source and Destination being expressed as:

cuDF DataFrame
Pandas DataFrame

cuGraph supports execution of graph algorithms from different graph objects:

cuGraph Graph classes
NetworkX graph classes
CuPy sparse matrix
SciPy sparse matrix

cuGraph tries to match the return type to the input type. For example, an algorithm called with a NetworkX graph returns the same data type that NetworkX would return.

Supported Graph

Type	Description
Graph	An undirected Graph by default
	directed=True yields a Directed Graph
Multigraph	A Graph with multiple edges between a vertex pair

All algorithms support Graph and MultiGraph objects (directed and undirected).

Supported Algorithms

Italic algorithms are planned for future releases.

Category	Algorithm	Scale	Notes
Centrality
	Katz	Multi-GPU
	Betweenness Centrality	Single-GPU
	Edge Betweenness Centrality	Single-GPU
	Eigenvector Centrality	Multi-GPU
	Degree Centrality	Multi-GPU	Python only
Community
	Leiden	Single-GPU
	Louvain	Multi-GPU
	Ensemble Clustering for Graphs	Single-GPU
	Spectral-Clustering - Balanced Cut	Single-GPU
	Spectral-Clustering - Modularity	Single-GPU
	Subgraph Extraction	Single-GPU
	Triangle Counting	Multi-GPU
	K-Truss	Single-GPU
Components
	Weakly Connected Components	Multi-GPU
	Strongly Connected Components	Single-GPU
Core
	K-Core	Single-GPU
	Core Number	Single-GPU
Flow
	MaxFlow	---
Influence
	Influence Maximization	---
Layout
	Force Atlas 2	Single-GPU
Linear Assignment
	Hungarian	Single-GPU	README
Link Analysis
	Pagerank	Multi-GPU	C++ README
	Personal Pagerank	Multi-GPU	C++ README
	HITS	Multi-GPU
Link Prediction
	Jaccard Similarity	Single-GPU
	Weighted Jaccard Similarity	Single-GPU
	Overlap Similarity	Single-GPU
	Sorensen Coefficient	Single-GPU	Python only
	Local Clustering Coefficient	---
Sampling
	Random Walks (RW)	Single-GPU	Biased and Uniform
	Egonet	Single-GPU	multi-seed
	Node2Vec	Single-GPU
	Neighborhood sampling	Multi-GPU
Traversal
	Breadth First Search (BFS)	Multi-GPU	with cutoff support C++ README
	Single Source Shortest Path (SSSP)	Multi-GPU	C++ README
	ASSP / APSP
Tree
	Minimum Spanning Tree	Single-GPU
	Maximum Spanning Tree	Single-GPU
Other
	Renumbering	Multi-GPU	multiple columns, any data type
	Symmetrize	Multi-GPU
	Path Extraction		Extract paths from BFS/SSSP results in parallel
Data Generator
	RMAT	Multi-GPU
	Barabasi-Albert	---

cuGraph Notice

Vertex IDs are expected to be contiguous integers starting from 0. If your data doesn't match that restriction, we have a solution. cuGraph provides the renumber function, which is by default automatically called when data is added to a graph. Input vertex IDs for the renumber function can be any type, can be non-contiguous, can be multiple columns, and can start from an arbitrary number. The renumber function maps the provided input vertex IDs to either 32- or 64-bit contiguous integers starting from 0.

Additionally, when using the auto-renumbering feature, vertices are automatically un-renumbered in results.
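The idea behind renumbering and un-renumbering can be sketched in plain Python. This is only a conceptual illustration, not cuGraph's renumber implementation; the function name renumber_sketch and the sample data are hypothetical.

```python
def renumber_sketch(edges):
    """Map arbitrary hashable vertex IDs to contiguous integers starting from 0."""
    id_of = {}        # external ID -> internal ID
    external = []     # internal ID -> external ID (used to un-renumber results)
    renumbered = []
    for src, dst in edges:
        pair = []
        for vertex in (src, dst):
            if vertex not in id_of:
                id_of[vertex] = len(external)
                external.append(vertex)
            pair.append(id_of[vertex])
        renumbered.append(tuple(pair))
    return renumbered, external


# Vertex IDs may be non-contiguous or span multiple columns (tuples here).
raw_edges = [
    (("us", 1001), ("uk", 7)),
    (("uk", 7), ("de", 42)),
    (("de", 42), ("us", 1001)),
]
internal_edges, external_ids = renumber_sketch(raw_edges)

# Results computed on internal IDs are translated back to the original IDs.
internal_out_degree = {}
for src, _dst in internal_edges:
    internal_out_degree[src] = internal_out_degree.get(src, 0) + 1
out_degree = {external_ids[v]: d for v, d in internal_out_degree.items()}
```

Here the algorithm only ever sees the integers 0, 1, and 2, while the caller gets results keyed by the original vertex IDs.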

cuGraph is constantly being updated and improved. Please see the Transition Guide if errors are encountered with newer versions.

Graph Sizes and GPU Memory Size

The amount of memory required depends on the graph structure and the analytics being executed. As a simple rule of thumb, the amount of GPU memory should be about twice the size of the data. That leaves overhead for the CSV reader and other transform functions. There are ways around this rule, such as processing the data in smaller chunks.

Size	Recommended GPU Memory
500 million edges	32 GB
250 million edges	16 GB
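Both rows of the table work out to roughly 64 bytes of GPU memory per edge (treating 1 GB as 10^9 bytes). The sketch below turns that ratio into a quick estimator. Extrapolating linearly to other sizes is an assumption, and the function name recommended_gpu_memory_gb is hypothetical; actual requirements depend on the graph structure and the analytics being run.

```python
# 32 GB for 500 million edges, taken from the table above: 64 bytes per edge.
BYTES_PER_EDGE = 32e9 / 500e6


def recommended_gpu_memory_gb(num_edges):
    """Linear estimate of GPU memory in GB (10**9 bytes) for a given edge count."""
    return num_edges * BYTES_PER_EDGE / 1e9


# Rough estimate for a graph with 100 million edges (about 6.4 GB).
estimate_100m = recommended_gpu_memory_gb(100_000_000)
```

The estimator reproduces the two table rows and gives a first guess for sizes in between.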

Managed memory can also be used for oversubscription to exceed the above memory limits. See the recent blog on Tackling Large Graphs with RAPIDS cuGraph and CUDA Unified Memory on GPUs: https://medium.com/rapids-ai/tackling-large-graphs-with-rapids-cugraph-and-unified-virtual-memory-b5b69a065d4

Quick Start

Please see the Docker Repository, choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready-to-run Docker container with example notebooks and data, showcasing how you can use all of the RAPIDS libraries: cuDF, cuML, and cuGraph.

Conda

It is easy to install cuGraph using conda. You can get a minimal conda installation with Miniconda or get the full installation with Anaconda.

Install and update cuGraph using the conda command:

# CUDA 11.4
conda install -c nvidia -c rapidsai -c numba -c conda-forge cugraph cudatoolkit=11.4

# CUDA 11.5
conda install -c nvidia -c rapidsai -c numba -c conda-forge cugraph cudatoolkit=11.5

For CUDA > 11.5, please use the 11.5 environment.

Note: This conda installation only applies to Linux and Python versions 3.8/3.9.

Build from Source and Contributing

Please see our guide for building cuGraph from source.

Please see our guide for contributing to cuGraph.

Documentation

Python API documentation can be generated from the docs directory.

Projects that use cuGraph

(alphabetical order)

ArangoDB - a free and open-source native multi-model database system - https://www.arangodb.com/
CuPy - "NumPy/SciPy-compatible Array Library for GPU-accelerated Computing with Python" - https://cupy.dev/
Memgraph - In-memory database - https://memgraph.com/
ScanPy - a scalable toolkit for analyzing single-cell gene expression data - https://scanpy.readthedocs.io/en/stable/

Open GPU Data Science

The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Apache Arrow on GPU

The GPU version of Apache Arrow is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow is supported.
