PyGDF implements the Python interface for accessing and manipulating the GPU DataFrame of the GPU Open Analytics Initiative (GoAi). We aim to provide a simple interface that is similar to the Pandas DataFrame and hides the details of GPU programming.
Read more about GoAi and the GDF at the end of this document.
You can get a minimal conda installation with Miniconda or get the full installation with Anaconda.
You can install and update PyGDF using the conda command:
conda install -c numba -c conda-forge -c gpuopenanalytics/label/dev -c defaults pygdf=0.1.0a3
You can create and activate a development environment using the conda command:
conda env create --name pygdf_dev --file conda_environments/testing_py35.yml
source activate pygdf_dev
To install PyGDF from source, clone the repository, change into it, and run the Python install command:
git clone https://github.com/gpuopenanalytics/pygdf.git
cd pygdf
python setup.py install
Note: This assumes dependencies including libgdf are already installed, so it is recommended to use the conda environment.
A Dockerfile is provided for building and installing LibGDF and PyGDF from their respective master branches.
Notes:
- We test with, and recommend installing, nvidia-docker2.
- The NVIDIA driver installed on the host must support a CUDA version greater than or equal to the one specified (9.2 by default).
- An alternative CUDA_VERSION can be specified as a Docker build-arg.
- Alternate branches for libgdf and pygdf may be specified as Docker build-args LIBGDF_REPO and PYGDF_REPO. See Dockerfile for example.
- Ubuntu 16.04 is the default OS for this container. Alternate OSes may be specified as Docker build-arg LINUX_VERSION. See list of available images.
- Python 3.6 is the default, but other versions may be specified via the PYTHON_VERSION build-arg.
- GCC and G++ 5.x are the default compilers, but other versions (as supplied by the OS package manager) may be specified via the CC and CXX build-args respectively.
- The numba (0.40.0), numpy (1.14.3), and pandas (0.20.3) versions are also configurable as build-args.
From pygdf project root, to build with defaults:
docker build -t pygdf .
...
---> ec65aaa3d4b1
Successfully built ec65aaa3d4b1
Successfully tagged pygdf:latest
docker run --runtime=nvidia -it pygdf bash
/# source activate gdf
(gdf) root@3f689ba9c842:/# python -c "import pygdf"
(gdf) root@3f689ba9c842:/#
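To override the defaults, each setting from the notes above is passed to docker build as a separate --build-arg NAME=VALUE pair. The sketch below only assembles and prints such a command. The build-arg names CUDA_VERSION and PYTHON_VERSION come from the notes, while the values 9.1 and 3.5 and the helper docker_build_command are illustrative assumptions, not tested combinations.

```python
import shlex

# Build-arg names are taken from the notes above; the values are
# illustrative assumptions and have not been verified against the Dockerfile.
build_args = {
    "CUDA_VERSION": "9.1",
    "PYTHON_VERSION": "3.5",
}


def docker_build_command(tag, build_args, context="."):
    """Return the docker build invocation as a list of arguments.

    Hypothetical helper for illustration; it is not part of PyGDF.
    """
    cmd = ["docker", "build"]
    for name, value in build_args.items():
        cmd += ["--build-arg", f"{name}={value}"]
    cmd += ["-t", tag, context]
    return cmd


command = docker_build_command("pygdf", build_args)

# Print a copy-pasteable shell command line.
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be pasted into a shell or handed to subprocess.run from the pygdf project root. Any build-arg that is left out keeps its default.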
Installing with pip is not yet supported. Please use conda for the time being.
This project uses py.test.
In the source root directory and with the development environment activated, run:
py.test
Please see the Demo Docker Repository for example notebooks on how you can utilize the GPU DataFrame.
The GPU Open Analytics Initiative (GoAi) seeks to foster and develop open collaboration between GPU analytics projects and products to enable data scientists to efficiently combine the best tools for their workflows. The first project of GoAi is the GPU DataFrame (GDF), which enables tabular data to be directly exchanged between libraries and applications on the GPU.
The GPU DataFrame is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. The GPU DataFrame uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Arrow is supported.
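As a rough illustration of what a columnar format means, the sketch below packs row-oriented records into one contiguous values buffer per column plus a validity bitmap that marks missing entries, which is the general layout Arrow uses for primitive columns. It runs in host memory with the Python standard library only. The to_columnar helper and the sample records are hypothetical, and this is a simplification, not the actual libgdf or PyGDF implementation.

```python
from array import array

# Row-oriented records, as an application might hold them on the host.
rows = [
    {"id": 1, "score": 0.5},
    {"id": 2, "score": None},  # missing value
    {"id": 3, "score": 2.25},
]


def to_columnar(rows, name, typecode):
    """Pack one field into a contiguous values buffer and a validity bitmap.

    Hypothetical helper for illustration only.
    """
    values = array(typecode)
    bitmap = bytearray((len(rows) + 7) // 8)  # one bit per row, rounded up
    for i, row in enumerate(rows):
        value = row[name]
        if value is None:
            values.append(0)  # placeholder slot; the bitmap marks it as null
        else:
            values.append(value)
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant-bit numbering
    return values, bytes(bitmap)


ids, ids_valid = to_columnar(rows, "id", "q")           # 64-bit integers
scores, scores_valid = to_columnar(rows, "score", "d")  # 64-bit floats

print(list(ids), ids_valid)
print(list(scores), scores_valid)
```

Because each column is a single typed buffer, a library that understands the layout can read it in place instead of converting row by row, which is what makes direct exchange between GPU libraries practical.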