Eval On AMD 3GHz/16-Core + 125GB RAM + NVMe SSD (Bare Metal)

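| rank | filter | qps | rank | sparse | qps | rank | ood | qps |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | zilliz | 213.3K | 1 | zilliz | 34.8K | 1 | scann | 107.4K |
| 2 | pinecone | 146.7K | 2 | pyanns | 26.9K | 2 | pinecone-ood | 76.9K |
| 3 | puck | 62.3K | 3 | pinecone_smips | 12.0K | 3 | zilliz | 73.5K |
| 4 | parlayivf | 55.0K | 4 | shnsw | 8.2K | 4 | pyanns | 55.5K |
| 5 | wm_filter | 20.9K | 5 | nle | 2.9K | 5 | sustech-ood | 28.5K |
| 6 | pyanns | 9.0K | 6 | cufe | 0.1K | 6 | mysteryann-dif | 27.9K |
| 7 | faissplus | 8.5K | 7 | linscan | 0.1K | 7 | mysteryann | 26.6K |
| 8 | faiss | 7.3K | | spmat | | 8 | vamana | 20.0K |
| 9 | cufe | 6.3K | | sustech-whu | | 9 | puck | 19.0K |
| | dhq | | | | | 10 | ngt | 11.9K |
| | fdufilterdiskann | | | | | 11 | epsearch | 7.7K |
| | hwtl_sdu_anns_filter | | | | | 12 | diskann | 6.3K |
| | | | | | | 13 | cufe | 5.4K |
| | | | | | | | puck-fizz | |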

Introduction

The NeurIPS2023 Practical Vector Search Challenge evaluated participating algorithms on Azure and EC2 CPU-based hardware instances.

To broaden the evaluation, we are also running the algorithms on other generally available hardware configurations.

Shown here are results run on the following hardware:

  • AMD EPYC 9124 16-Core 3GHz processor
  • 125GB RAM
  • 440GB NVMe SSD
  • Bare-metal "m4-metal-medium" instance provided by Latitude

Results

The calculated rankings are shown at the top.

Notes:

  • Evaluations were run in late August 2024.
  • In each track, qualifying algorithms are ranked by their highest qps among runs with recall/ap >= 0.9.
  • All participating algorithms are shown for each track, but only qualifying algorithms are ranked.
  • Each track algorithm links to the build and run command used (or disqualifying errors, if any).
  • Pareto graphs for each track are shown below.
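
The ranking rule in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not the code the benchmark itself uses: the `Run` tuple, the `rank_track` function, and the sample measurements are all hypothetical, and the sketch assumes each algorithm contributes several (recall, qps) measurements per track.

```python
from typing import NamedTuple

RECALL_THRESHOLD = 0.9  # qualifying bar stated in the notes (recall/ap >= 0.9)


class Run(NamedTuple):
    """One measurement for one algorithm (hypothetical structure)."""
    algorithm: str
    recall: float  # recall or AP, depending on the track
    qps: float     # queries per second


def rank_track(runs):
    """Return (ranked, unranked) for one track.

    ranked   -- list of (rank, algorithm, best_qps), highest qps first
    unranked -- sorted names of algorithms with no qualifying run
    """
    best = {}    # algorithm -> best qps among qualifying runs
    seen = set() # every algorithm that took part
    for run in runs:
        seen.add(run.algorithm)
        if run.recall >= RECALL_THRESHOLD:
            best[run.algorithm] = max(best.get(run.algorithm, 0.0), run.qps)
    ordered = sorted(best.items(), key=lambda item: item[1], reverse=True)
    ranked = [(i, name, qps) for i, (name, qps) in enumerate(ordered, start=1)]
    unranked = sorted(seen - set(best))
    return ranked, unranked


# Made-up measurements, for illustration only.
sample = [
    Run("algo_a", 0.95, 1000.0),
    Run("algo_a", 0.91, 1500.0),
    Run("algo_a", 0.85, 4000.0),  # fastest run, but below the bar
    Run("algo_b", 0.92, 1200.0),
    Run("algo_c", 0.70, 9000.0),  # never qualifies, so it stays unranked
]
ranked, unranked = rank_track(sample)
```

Note that a fast run below the threshold does not count: in the sample, `algo_a` is ranked by its 1500 qps run, and `algo_c` is listed without a rank.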

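Track: Filter

[Figure: Pareto graph for the Filter track]

Track: Sparse

[Figure: Pareto graph for the Sparse track]

Track: OOD

[Figure: Pareto graph for the OOD track]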

Track: Streaming

TODO

Data Export

The full data export CSV file can be found here.

Hardware_Inventory

How_To_Reproduce

This section shows the steps you can use to reproduce the results shown above, from scratch.

System Preparation

  • Sign up for, or sign in to, your Latitude account.
  • Provision an "m4-metal-medium" instance with at least 100GB of NVMe SSD storage, running Ubuntu Linux 20.04.6 LTS.
  • SSH into the instance.
  • Update the package lists with the command sudo apt-get update
  • Install Anaconda for Linux.
  • Run the following commands:
git clone git@github.com:harsha-simhadri/big-ann-benchmarks.git
cd big-ann-benchmarks
conda create -n bigann-latitude-m4-metal-medium python=3.10
conda activate bigann-latitude-m4-metal-medium
python -m pip install -r requirements_py3.10.txt 
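
Before starting the long-running dataset and benchmark steps, it may help to confirm that the environment matches the preparation steps above. The following is a minimal sketch, not part of the repository: `preflight` is a hypothetical helper, and treating the 100GB figure as a free-space requirement on the working directory is an assumption of this sketch.

```python
import shutil
import sys


def preflight(path=".", min_free_gb=100, required_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper. The defaults mirror the steps above
    (python=3.10, at least 100GB of storage), but checking *free*
    space is an assumption made here for illustration.
    """
    problems = []

    # Free space on the filesystem that holds `path`, in decimal gigabytes.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.0f} GB free under {path!r}, want >= {min_free_gb} GB"
        )

    # The conda environment above is created with python=3.10.
    current = tuple(sys.version_info[:2])
    if current != tuple(required_python):
        problems.append(
            f"running Python {current[0]}.{current[1]}, "
            f"expected {required_python[0]}.{required_python[1]}"
        )
    return problems
```

Calling `preflight()` from the top-level directory of the repository, inside the activated conda environment, returns an empty list when both checks pass.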

Sparse Track

Prepare the track dataset by running the following command in the top-level directory of the repository:

python create_dataset.py --dataset sparse-full

See the latitude/commands directory for individual algorithm scripts.

Filter Track

Prepare the track dataset by running the following command in the top-level directory of the repository:

python create_dataset.py --dataset yfcc-10M

See the latitude/commands directory for individual algorithm scripts.

OOD Track

Prepare the track dataset by running the following command in the top-level directory of the repository:

python create_dataset.py --dataset text2image-10M 

See the latitude/commands directory for individual algorithm scripts.

Streaming Track

Prepare the track dataset by running the following command in the top-level directory of the repository:

python create_dataset.py --dataset msturing-30M-clustered
python -m benchmark.streaming.download_gt --runbook_file neurips23/streaming/final_runbook.yaml  --dataset msturing-30M-clustered

See the latitude/commands directory for individual algorithm scripts.
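
The four dataset preparation steps above follow the same pattern, so they can be driven from one small script. This is a sketch under stated assumptions: the dataset names, the runbook path, and the two command lines are taken from the sections above, while `TRACK_DATASETS`, `dataset_commands`, and `prepare` are hypothetical names introduced here. It is meant to be run from the top-level directory of the repository.

```python
import subprocess

# Track -> dataset name, as listed in the track sections above.
TRACK_DATASETS = {
    "sparse": "sparse-full",
    "filter": "yfcc-10M",
    "ood": "text2image-10M",
    "streaming": "msturing-30M-clustered",
}

# Runbook used by the streaming track's ground-truth download step.
STREAMING_RUNBOOK = "neurips23/streaming/final_runbook.yaml"


def dataset_commands(track, python="python"):
    """Build the argv lists needed to prepare one track's dataset."""
    dataset = TRACK_DATASETS[track]
    commands = [[python, "create_dataset.py", "--dataset", dataset]]
    if track == "streaming":
        # Only the streaming track has the extra ground-truth download.
        commands.append([
            python, "-m", "benchmark.streaming.download_gt",
            "--runbook_file", STREAMING_RUNBOOK,
            "--dataset", dataset,
        ])
    return commands


def prepare(track, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in dataset_commands(track):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failure
```

For example, `prepare("streaming")` prints the two streaming commands without running them, and `prepare("streaming", dry_run=False)` executes them in order.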

Analysis

To extract the data as CSV:

sudo chmod ugo+rw -R ./results/ # recursively add read/write permissions to directories and files under the results directory.
python data_export.py --recompute --output neurips23/latitude/data_export_m4-metal-medium.csv

To plot individual tracks:

python plot.py --neurips23track sparse --output neurips23/latitude/sparse.png --raw --recompute --dataset sparse-full
python plot.py --neurips23track filter --output neurips23/latitude/filter.png --raw --recompute --dataset yfcc-10M
python plot.py --neurips23track ood --output neurips23/latitude/ood.png --raw --recompute --dataset text2image-10M
# TODO: streaming track

To render the ranking table, see this notebook.
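
For readers without access to the notebook, the general shape of the rendering step can be sketched as follows. This is not the notebook's code: `format_qps` and `render_table` are hypothetical helpers, the one-decimal "K" formatting is inferred from how qps values appear in the ranking table, and the input rows are made up.

```python
def format_qps(qps):
    """Format a raw qps value in thousands with one decimal, e.g. 1500 -> '1.5K'."""
    return f"{qps / 1000:.1f}K"


def render_table(ranked, unranked=()):
    """Render one track as pipe-separated text rows.

    ranked   -- iterable of (rank, algorithm, qps) tuples
    unranked -- iterable of algorithm names shown without rank or qps
    """
    lines = ["rank | algorithm | qps", "--- | --- | ---"]
    for rank, name, qps in ranked:
        lines.append(f"{rank} | {name} | {format_qps(qps)}")
    for name in unranked:
        # Non-qualifying algorithms are listed but carry no rank or qps.
        lines.append(f"- | {name} | -")
    return "\n".join(lines)


# Made-up input, for illustration only.
table = render_table([(1, "algo_a", 1500.0), (2, "algo_b", 1200.0)], ["algo_c"])
```

Printing `table` gives a header, a separator, one row per ranked algorithm, and a final row for each unranked algorithm.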

Disclaimers_And_Credits

  • The hardware systems were graciously donated by Latitude.
  • None of the NeurIPS2021/23 organizers is employed by or affiliated with Latitude.
  • George Williams, an organizer for both the NeurIPS2021 and NeurIPS2023 Competitions, ran the evaluations described above.
  • Our main contact at Latitude is Victor Chiea, to whom we were introduced by Harald Carlens from MLContests.
  • Latitude logo for sponsorship attribution below (note: it has a transparent background):