Migrate from RAFT to CUVS (#3549)
Summary:
Remove the dependency on `raft::compiled` and modify GPU implementations to use cuVS backend in place of RAFT.

A deeper insight into the dependency:
FAISS gets the ANN algorithm implementations such as IVF-Flat and IVF-PQ from cuVS. RAFT is meant to be a lightweight C++ header-only template library that cuVS relies on for the more fundamental / low-level utilities. Some examples of these are RAFT's device mdarray and mdspan objects; the RAFT resource object (`raft::resource`) that takes care of the stream ordering of device functions; linear algebra functions such as mapping, reduction, BLAS routines etc. A lot of the cuVS functions take the RAFT mdspan objects as arguments (for example `raft::device_matrix_view`). Therefore FAISS relies on both cuVS and RAFT. FAISS gets RAFT headers through cuVS and uses them to create the function arguments that can be consumed by cuVS. Note that we are not explicitly linking FAISS against `raft::raft` or `raft::compiled`. Only the required headers are included and compiled rather than compiling the whole RAFT shared library. This is the reason we still see mentions of `raft` in FAISS.

Pull Request resolved: #3549

Reviewed By: ramilbakhshyiev

Differential Revision: D62041013

Pulled By: asadoughi

fbshipit-source-id: 7230dcc06cf47baf95873adc1dec2adca4a8f82a
tarang-jain authored and facebook-github-bot committed Nov 14, 2024
1 parent 0fb56d9 commit 1349220
Showing 65 changed files with 1,095 additions and 1,054 deletions.
18 changes: 9 additions & 9 deletions .github/actions/build_cmake/action.yml
@@ -8,8 +8,8 @@ inputs:
description: 'Enable GPU support.'
required: false
default: OFF
raft:
description: 'Enable RAFT support.'
cuvs:
description: 'Enable cuVS support.'
required: false
default: OFF
rocm:
@@ -50,11 +50,11 @@ runs:
if [ "${{ inputs.rocm }}" = "ON" ]; then
:
# regular CUDA for GPU builds
elif [ "${{ inputs.gpu }}" = "ON" ] && [ "${{ inputs.raft }}" = "OFF" ]; then
elif [ "${{ inputs.gpu }}" = "ON" ] && [ "${{ inputs.cuvs }}" = "OFF" ]; then
conda install -y -q cuda-toolkit -c "nvidia/label/cuda-12.4.0"
# and CUDA from RAFT channel for RAFT builds
elif [ "${{ inputs.raft }}" = "ON" ]; then
conda install -y -q libraft=24.06 cuda-version=12.4 cuda-toolkit -c rapidsai -c "nvidia/label/cuda-12.4.0" -c conda-forge
# and CUDA from cuVS channel for cuVS builds
elif [ "${{ inputs.cuvs }}" = "ON" ]; then
conda install -y -q libcuvs=24.08 cuda-version=12.4 cuda-toolkit -c rapidsai -c conda-forge -c "nvidia/label/cuda-12.4.0"
fi
# install test packages
@@ -102,7 +102,7 @@ runs:
sudo apt-get -qq clean >/dev/null
sudo rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
- name: Symblink system dependencies
if: inputs.raft == 'ON' || inputs.rocm == 'ON'
if: inputs.rocm == 'ON'
shell: bash
run: |
# symblink system libraries for HIP compiler
@@ -119,7 +119,7 @@ runs:
-DBUILD_TESTING=ON \
-DBUILD_SHARED_LIBS=ON \
-DFAISS_ENABLE_GPU=${{ inputs.gpu }} \
-DFAISS_ENABLE_RAFT=${{ inputs.raft }} \
-DFAISS_ENABLE_CUVS=${{ inputs.cuvs }} \
-DFAISS_ENABLE_ROCM=${{ inputs.rocm }} \
-DFAISS_OPT_LEVEL=${{ inputs.opt_level }} \
-DFAISS_ENABLE_C_API=ON \
@@ -174,5 +174,5 @@ runs:
if: always()
uses: actions/upload-artifact@v4
with:
name: test-results-arch=${{ runner.arch }}-opt=${{ inputs.opt_level }}-gpu=${{ inputs.gpu }}-raft=${{ inputs.raft }}-rocm=${{ inputs.rocm }}
name: test-results-arch=${{ runner.arch }}-opt=${{ inputs.opt_level }}-gpu=${{ inputs.gpu }}-cuvs=${{ inputs.cuvs }}-rocm=${{ inputs.rocm }}
path: test-results
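The toolchain selection in the shell step of this action can be paraphrased in Python to make the branch order explicit. This is only an illustrative sketch: `conda_install_args` is a hypothetical helper that is not part of the repository, and the package pins and channels are copied from the hunk above.

```python
def conda_install_args(gpu: str, cuvs: str, rocm: str) -> list[str]:
    """Mirror the if/elif chain of the CI step (hypothetical helper).

    Inputs are the "ON"/"OFF" strings used by the action; an empty list
    means the step installs no CUDA packages.
    """
    common = ["conda", "install", "-y", "-q"]
    nvidia = "nvidia/label/cuda-12.4.0"
    if rocm == "ON":
        # ROCm builds take the ':' no-op branch.
        return []
    if gpu == "ON" and cuvs == "OFF":
        # Regular CUDA for plain GPU builds.
        return common + ["cuda-toolkit", "-c", nvidia]
    if cuvs == "ON":
        # CUDA plus libcuvs for cuVS builds.
        return common + [
            "libcuvs=24.08", "cuda-version=12.4", "cuda-toolkit",
            "-c", "rapidsai", "-c", "conda-forge", "-c", nvidia,
        ]
    # CPU-only builds fall through without installing anything.
    return []


if __name__ == "__main__":
    print(" ".join(conda_install_args("ON", "ON", "OFF")))
```

Because the ROCm check comes first, `rocm: ON` suppresses the CUDA install regardless of the other two inputs.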
20 changes: 10 additions & 10 deletions .github/actions/build_conda/action.yml
@@ -9,8 +9,8 @@ inputs:
description: "CUDA toolkit version to use."
default: ""
required: false
raft:
description: "Enable RAFT support."
cuvs:
description: "Enable cuVS support."
default: ""
required: false
runs:
@@ -59,34 +59,34 @@ runs:
run: |
conda build faiss --user pytorch --label ${{ inputs.label }} -c pytorch
- name: Conda build (GPU)
if: inputs.label == '' && inputs.cuda != '' && inputs.raft == ''
if: inputs.label == '' && inputs.cuda != '' && inputs.cuvs == ''
shell: ${{ steps.choose_shell.outputs.shell }}
working-directory: conda
run: |
conda build faiss-gpu --variants '{ "cudatoolkit": "${{ inputs.cuda }}" }' \
-c pytorch -c nvidia/label/cuda-${{ inputs.cuda }} -c nvidia
- name: Conda build (GPU) w/ anaconda upload
if: inputs.label != '' && inputs.cuda != '' && inputs.raft == ''
if: inputs.label != '' && inputs.cuda != '' && inputs.cuvs == ''
shell: ${{ steps.choose_shell.outputs.shell }}
working-directory: conda
env:
PACKAGE_TYPE: ${{ inputs.label }}
run: |
conda build faiss-gpu --variants '{ "cudatoolkit": "${{ inputs.cuda }}" }' \
--user pytorch --label ${{ inputs.label }} -c pytorch -c nvidia/label/cuda-${{ inputs.cuda }} -c nvidia
- name: Conda build (GPU w/ RAFT)
if: inputs.label == '' && inputs.cuda != '' && inputs.raft != ''
- name: Conda build (GPU w/ cuVS)
if: inputs.label == '' && inputs.cuda != '' && inputs.cuvs != ''
shell: ${{ steps.choose_shell.outputs.shell }}
working-directory: conda
run: |
conda build faiss-gpu-raft --variants '{ "cudatoolkit": "${{ inputs.cuda }}" }' \
conda build faiss-gpu-cuvs --variants '{ "cudatoolkit": "${{ inputs.cuda }}" }' \
-c pytorch -c nvidia/label/cuda-${{ inputs.cuda }} -c nvidia -c rapidsai -c rapidsai-nightly -c conda-forge
- name: Conda build (GPU w/ RAFT) w/ anaconda upload
if: inputs.label != '' && inputs.cuda != '' && inputs.raft != ''
- name: Conda build (GPU w/ cuVS) w/ anaconda upload
if: inputs.label != '' && inputs.cuda != '' && inputs.cuvs != ''
shell: ${{ steps.choose_shell.outputs.shell }}
working-directory: conda
env:
PACKAGE_TYPE: ${{ inputs.label }}
run: |
conda build faiss-gpu-raft --variants '{ "cudatoolkit": "${{ inputs.cuda }}" }' \
conda build faiss-gpu-cuvs --variants '{ "cudatoolkit": "${{ inputs.cuda }}" }' \
--user pytorch --label ${{ inputs.label }} -c pytorch -c nvidia/label/cuda-${{ inputs.cuda }} -c nvidia -c rapidsai -c rapidsai-nightly -c conda-forge
18 changes: 9 additions & 9 deletions .github/workflows/build.yml
@@ -78,8 +78,8 @@ jobs:
uses: ./.github/actions/build_cmake
with:
gpu: ON
linux-x86_64-GPU-w-RAFT-cmake:
name: Linux x86_64 GPU w/ RAFT (cmake)
linux-x86_64-GPU-w-CUVS-cmake:
name: Linux x86_64 GPU w/ cuVS (cmake)
needs: linux-x86_64-cmake
runs-on: 4-core-ubuntu-gpu-t4
steps:
@@ -89,7 +89,7 @@ jobs:
uses: ./.github/actions/build_cmake
with:
gpu: ON
raft: ON
cuvs: ON
linux-x86_64-GPU-w-ROCm-cmake:
name: Linux x86_64 GPU w/ ROCm (cmake)
needs: linux-x86_64-cmake
@@ -199,8 +199,8 @@ jobs:
with:
label: main
cuda: "11.4.4"
linux-x86_64-GPU-RAFT-packages-CUDA11-8-0:
name: Linux x86_64 GPU w/ RAFT packages (CUDA 11.8.0)
linux-x86_64-GPU-CUVS-packages-CUDA11-8-0:
name: Linux x86_64 GPU w/ cuVS packages (CUDA 11.8.0)
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v')
runs-on: 4-core-ubuntu-gpu-t4
env:
@@ -217,7 +217,7 @@ jobs:
ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_API_TOKEN }}
with:
label: main
raft: "ON"
cuvs: "ON"
cuda: "11.8.0"
linux-x86_64-GPU-packages-CUDA-12-1-1:
name: Linux x86_64 GPU packages (CUDA 12.1.1)
@@ -238,8 +238,8 @@ jobs:
with:
label: main
cuda: "12.1.1"
linux-x86_64-GPU-RAFT-packages-CUDA12-1-1:
name: Linux x86_64 GPU w/ RAFT packages (CUDA 12.1.1)
linux-x86_64-GPU-CUVS-packages-CUDA12-1-1:
name: Linux x86_64 GPU w/ cuVS packages (CUDA 12.1.1)
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v')
runs-on: 4-core-ubuntu-gpu-t4
env:
@@ -256,7 +256,7 @@ jobs:
ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_API_TOKEN }}
with:
label: main
raft: "ON"
cuvs: "ON"
cuda: "12.1.1"
windows-x86_64-packages:
name: Windows x86_64 packages
12 changes: 6 additions & 6 deletions .github/workflows/nightly.yml
@@ -38,8 +38,8 @@ jobs:
with:
label: nightly
cuda: "11.4.4"
linux-x86_64-GPU-RAFT-CUDA11-8-0-nightly:
name: Linux x86_64 GPU w/ RAFT nightlies (CUDA 11.8.0)
linux-x86_64-GPU-CUVS-CUDA11-8-0-nightly:
name: Linux x86_64 GPU w/ cuVS nightlies (CUDA 11.8.0)
runs-on: 4-core-ubuntu-gpu-t4
env:
CUDA_ARCHS: "70-real;72-real;75-real;80;86-real"
@@ -54,7 +54,7 @@ jobs:
ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_API_TOKEN }}
with:
label: nightly
raft: "ON"
cuvs: "ON"
cuda: "11.8.0"
linux-x86_64-GPU-CUDA-12-1-1-nightly:
name: Linux x86_64 GPU nightlies (CUDA 12.1.1)
@@ -73,8 +73,8 @@ jobs:
with:
label: nightly
cuda: "12.1.1"
linux-x86_64-GPU-RAFT-CUDA12-1-1-nightly:
name: Linux x86_64 GPU w/ RAFT nightlies (CUDA 12.1.1)
linux-x86_64-GPU-CUVS-CUDA12-1-1-nightly:
name: Linux x86_64 GPU w/ cuVS nightlies (CUDA 12.1.1)
runs-on: 4-core-ubuntu-gpu-t4
env:
CUDA_ARCHS: "70-real;72-real;75-real;80;86-real"
@@ -89,7 +89,7 @@ jobs:
ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_API_TOKEN }}
with:
label: nightly
raft: "ON"
cuvs: "ON"
cuda: "12.1.1"
windows-x86_64-nightly:
name: Windows x86_64 nightlies
10 changes: 5 additions & 5 deletions CMakeLists.txt
@@ -33,7 +33,7 @@ if(FAISS_ENABLE_GPU)
endif()
endif()

if(FAISS_ENABLE_RAFT)
if(FAISS_ENABLE_CUVS)
include(cmake/thirdparty/fetch_rapids.cmake)
include(rapids-cmake)
include(rapids-cpm)
@@ -60,7 +60,7 @@ list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")
# Valid values are "generic", "avx2", "avx512", "sve".
option(FAISS_OPT_LEVEL "" "generic")
option(FAISS_ENABLE_GPU "Enable support for GPU indexes." ON)
option(FAISS_ENABLE_RAFT "Enable RAFT for GPU indexes." OFF)
option(FAISS_ENABLE_CUVS "Enable cuVS for GPU indexes." OFF)
option(FAISS_ENABLE_ROCM "Enable ROCm for GPU indexes." OFF)
option(FAISS_ENABLE_PYTHON "Build Python extension." ON)
option(FAISS_ENABLE_C_API "Build C API." OFF)
@@ -81,9 +81,9 @@ if(FAISS_ENABLE_GPU)
endif()
endif()

if(FAISS_ENABLE_RAFT AND NOT TARGET raft::raft)
find_package(raft COMPONENTS compiled distributed)
endif()
if(FAISS_ENABLE_CUVS AND NOT TARGET cuvs::cuvs)
find_package(cuvs)
endif()

add_subdirectory(faiss)

2 changes: 1 addition & 1 deletion INSTALL.md
@@ -118,7 +118,7 @@ Several options can be passed to CMake, among which:
values are `ON` and `OFF`),
- `-DFAISS_ENABLE_PYTHON=OFF` in order to disable building python bindings
(possible values are `ON` and `OFF`),
- `-DFAISS_ENABLE_RAFT=ON` in order to enable building the RAFT implementations
- `-DFAISS_ENABLE_CUVS=ON` in order to enable building the cuVS implementations
of the IVF-Flat and IVF-PQ GPU-accelerated indices (default is `OFF`, possible
values are `ON` and `OFF`)
- `-DBUILD_TESTING=OFF` in order to disable building C++ tests,
62 changes: 31 additions & 31 deletions benchs/bench_ivfflat_raft.py → benchs/bench_ivfflat_cuvs.py
@@ -4,7 +4,7 @@
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
#
# Copyright (c) 2023, NVIDIA CORPORATION.
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -44,8 +44,8 @@ def aa(*args, **kwargs):
help='whether to benchmark add operation on GPU index')
aa('--bm_search', default=True,
help='whether to benchmark search operation on GPU index')
aa('--raft_only', default=False, action='store_true',
help='whether to only produce RAFT enabled benchmarks')
aa('--cuvs_only', default=False, action='store_true',
help='whether to only produce cuVS enabled benchmarks')


group = parser.add_argument_group('IVF options')
@@ -70,9 +70,9 @@ def aa(*args, **kwargs):
mr = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource())
rmm.mr.set_current_device_resource(mr)

def bench_train_milliseconds(index, trainVecs, use_raft):
def bench_train_milliseconds(index, trainVecs, use_cuvs):
co = faiss.GpuMultipleClonerOptions()
co.use_raft = use_raft
co.use_cuvs = use_cuvs
index_gpu = faiss.index_cpu_to_gpu(res, 0, index, co)
t0 = time.time()
index_gpu.train(trainVecs)
@@ -89,21 +89,21 @@ def bench_train_milliseconds(index, trainVecs, use_raft):
for n_cols in dataset_dims:
index = faiss.index_factory(n_cols, "IVF{},Flat".format(args.n_centroids))
trainVecs = rs.rand(n_rows, n_cols).astype('float32')
raft_gpu_train_time = bench_train_milliseconds(
cuvs_gpu_train_time = bench_train_milliseconds(
index, trainVecs, True)
if args.raft_only:
print("Method: IVFFlat, Operation: TRAIN, dim: %d, n_centroids %d, numTrain: %d, RAFT enabled GPU train time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_rows, raft_gpu_train_time))
if args.cuvs_only:
print("Method: IVFFlat, Operation: TRAIN, dim: %d, n_centroids %d, numTrain: %d, cuVS enabled GPU train time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_rows, cuvs_gpu_train_time))
else:
classical_gpu_train_time = bench_train_milliseconds(
index, trainVecs, False)
print("Method: IVFFlat, Operation: TRAIN, dim: %d, n_centroids %d, numTrain: %d, classical GPU train time: %.3f milliseconds, RAFT enabled GPU train time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_rows, classical_gpu_train_time, raft_gpu_train_time))
print("Method: IVFFlat, Operation: TRAIN, dim: %d, n_centroids %d, numTrain: %d, classical GPU train time: %.3f milliseconds, cuVS enabled GPU train time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_rows, classical_gpu_train_time, cuvs_gpu_train_time))


def bench_add_milliseconds(index, addVecs, use_raft):
def bench_add_milliseconds(index, addVecs, use_cuvs):
co = faiss.GpuMultipleClonerOptions()
co.use_raft = use_raft
co.use_cuvs = use_cuvs
index_gpu = faiss.index_cpu_to_gpu(res, 0, index, co)
index_gpu.copyFrom(index)
t0 = time.time()
@@ -125,20 +125,20 @@ def bench_add_milliseconds(index, addVecs, use_raft):
for n_rows in addset_sizes:
for n_cols in dataset_dims:
addVecs = rs.rand(n_rows, n_cols).astype('float32')
raft_gpu_add_time = bench_add_milliseconds(index, addVecs, True)
if args.raft_only:
print("Method: IVFFlat, Operation: ADD, dim: %d, n_centroids %d, numAdd: %d, RAFT enabled GPU add time: %.3f milliseconds" % (
n_train, n_rows, n_cols, args.n_centroids, raft_gpu_add_time))
cuvs_gpu_add_time = bench_add_milliseconds(index, addVecs, True)
if args.cuvs_only:
print("Method: IVFFlat, Operation: ADD, dim: %d, n_centroids %d, numAdd: %d, cuVS enabled GPU add time: %.3f milliseconds" % (
n_train, n_rows, n_cols, args.n_centroids, cuvs_gpu_add_time))
else:
classical_gpu_add_time = bench_add_milliseconds(
index, addVecs, False)
print("Method: IVFFlat, Operation: ADD, dim: %d, n_centroids %d, numAdd: %d, classical GPU add time: %.3f milliseconds, RAFT enabled GPU add time: %.3f milliseconds" % (
n_train, n_rows, n_cols, args.n_centroids, classical_gpu_add_time, raft_gpu_add_time))
print("Method: IVFFlat, Operation: ADD, dim: %d, n_centroids %d, numAdd: %d, classical GPU add time: %.3f milliseconds, cuVS enabled GPU add time: %.3f milliseconds" % (
n_train, n_rows, n_cols, args.n_centroids, classical_gpu_add_time, cuvs_gpu_add_time))


def bench_search_milliseconds(index, addVecs, queryVecs, nprobe, k, use_raft):
def bench_search_milliseconds(index, addVecs, queryVecs, nprobe, k, use_cuvs):
co = faiss.GpuMultipleClonerOptions()
co.use_raft = use_raft
co.use_cuvs = use_cuvs
index_gpu = faiss.index_cpu_to_gpu(res, 0, index, co)
index_gpu.copyFrom(index)
index_gpu.add(addVecs)
@@ -163,19 +163,19 @@ def bench_search_milliseconds(index, addVecs, queryVecs, nprobe, k, use_raft):
addVecs = rs.rand(n_add, n_cols).astype('float32')
for n_rows in queryset_sizes:
queryVecs = rs.rand(n_rows, n_cols).astype('float32')
raft_gpu_search_time = bench_search_milliseconds(
cuvs_gpu_search_time = bench_search_milliseconds(
index, addVecs, queryVecs, args.nprobe, args.k, True)
if args.raft_only:
print("Method: IVFFlat, Operation: SEARCH, dim: %d, n_centroids: %d, numVecs: %d, numQuery: %d, nprobe: %d, k: %d, RAFT enabled GPU search time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_add, n_rows, args.nprobe, args.k, raft_gpu_search_time))
if args.cuvs_only:
print("Method: IVFFlat, Operation: SEARCH, dim: %d, n_centroids: %d, numVecs: %d, numQuery: %d, nprobe: %d, k: %d, cuVS enabled GPU search time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_add, n_rows, args.nprobe, args.k, cuvs_gpu_search_time))
else:
classical_gpu_search_time = bench_search_milliseconds(
index, addVecs, queryVecs, args.nprobe, args.k, False)
print("Method: IVFFlat, Operation: SEARCH, dim: %d, n_centroids: %d, numVecs: %d, numQuery: %d, nprobe: %d, k: %d, classical GPU search time: %.3f milliseconds, RAFT enabled GPU search time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_add, n_rows, args.nprobe, args.k, classical_gpu_search_time, raft_gpu_search_time))
print("Method: IVFFlat, Operation: SEARCH, dim: %d, n_centroids: %d, numVecs: %d, numQuery: %d, nprobe: %d, k: %d, classical GPU search time: %.3f milliseconds, cuVS enabled GPU search time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_add, n_rows, args.nprobe, args.k, classical_gpu_search_time, cuvs_gpu_search_time))

print("=" * 40)
print("Large RAFT Enabled Benchmarks")
print("Large cuVS Enabled Benchmarks")
print("=" * 40)
# Avoid classical GPU Benchmarks for large datasets because of OOM for more than 500000 queries and/or large dims as well as for large k
queryset_sizes = [100000, 500000, 1000000]
@@ -188,7 +188,7 @@ def bench_search_milliseconds(index, addVecs, queryVecs, nprobe, k, use_raft):
addVecs = rs.rand(n_add, n_cols).astype('float32')
for n_rows in queryset_sizes:
queryVecs = rs.rand(n_rows, n_cols).astype('float32')
raft_gpu_search_time = bench_search_milliseconds(
cuvs_gpu_search_time = bench_search_milliseconds(
index, addVecs, queryVecs, args.nprobe, args.k, True)
print("Method: IVFFlat, Operation: SEARCH, numTrain: %d, dim: %d, n_centroids: %d, numVecs: %d, numQuery: %d, nprobe: %d, k: %d, RAFT enabled GPU search time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_add, n_rows, args.nprobe, args.k, raft_gpu_search_time))
print("Method: IVFFlat, Operation: SEARCH, numTrain: %d, dim: %d, n_centroids: %d, numVecs: %d, numQuery: %d, nprobe: %d, k: %d, cuVS enabled GPU search time: %.3f milliseconds" % (
n_cols, args.n_centroids, n_add, n_rows, args.nprobe, args.k, cuvs_gpu_search_time))
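For callers, the rename in this benchmark comes down to one field on the cloner options: `co.use_raft` becomes `co.use_cuvs`. The sketch below isolates that pattern. `clone_to_gpu` and `time_call_ms` are hypothetical helpers written for illustration; only `GpuMultipleClonerOptions`, `use_cuvs`, and `index_cpu_to_gpu` are taken from the diff. The faiss module is passed in as an argument so the helper can be exercised without a GPU build, and real use is assumed to need a faiss build configured with `-DFAISS_ENABLE_CUVS=ON`.

```python
import time


def clone_to_gpu(faiss_mod, res, cpu_index, use_cuvs, device=0):
    """Clone a CPU index to one GPU with the cuVS backend on or off.

    faiss_mod is the imported faiss module (hypothetical parameter, so the
    helper can also be driven with a stand-in object in tests).
    """
    co = faiss_mod.GpuMultipleClonerOptions()
    co.use_cuvs = use_cuvs  # this field was `use_raft` before the commit
    return faiss_mod.index_cpu_to_gpu(res, device, cpu_index, co)


def time_call_ms(fn, *args, repeats=1):
    """Return the mean wall-clock time of fn(*args) in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - t0) * 1000.0 / repeats
```

With a GPU build, a comparison in the spirit of the benchmark would call `clone_to_gpu(faiss, res, index, True)` and `clone_to_gpu(faiss, res, index, False)` and time `train`, `add`, or `search` on each result.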