Commit

Merge branch 'facebookresearch:main' into convert-circleci-to-github-actions
robandpdx authored Mar 26, 2024
2 parents a2902fb + 14b8af6 commit af5b52f
Showing 16 changed files with 137 additions and 971 deletions.
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -168,7 +168,7 @@ jobs:
command: |
cd conda
conda build faiss-gpu-raft --variants '{ "cudatoolkit": "<<parameters.cuda>>", "c_compiler_version": "<<parameters.compiler_version>>", "cxx_compiler_version": "<<parameters.compiler_version>>" }' \
-c pytorch -c nvidia/label/cuda-<<parameters.cuda>> -c nvidia -c rapidsai -c conda-forge
-c pytorch -c nvidia/label/cuda-<<parameters.cuda>> -c nvidia -c rapidsai -c rapidsai-nightly -c conda-forge
- when:
condition:
and:
@@ -182,7 +182,7 @@ jobs:
command: |
cd conda
conda build faiss-gpu-raft --variants '{ "cudatoolkit": "<<parameters.cuda>>", "c_compiler_version": "<<parameters.compiler_version>>", "cxx_compiler_version": "<<parameters.compiler_version>>" }' \
--user pytorch --label <<parameters.label>> -c pytorch -c nvidia/label/cuda-<<parameters.cuda>> -c nvidia -c rapidsai -c conda-forge
--user pytorch --label <<parameters.label>> -c pytorch -c nvidia/label/cuda-<<parameters.cuda>> -c nvidia -c rapidsai -c rapidsai-nightly -c conda-forge
build_cmake:
parameters:
137 changes: 2 additions & 135 deletions benchs/link_and_code/README.md
@@ -21,138 +21,5 @@ graph to improve the reconstruction. It is described in

ArXiV [here](https://arxiv.org/abs/1804.09996)

Code structure
--------------

The test runs with 3 files:

- `bench_link_and_code.py`: driver script

- `datasets.py`: code to load the datasets. The example code runs on the
deep1b and bigann datasets. See the [toplevel README](../README.md)
on how to download them. They should be put in a directory, edit
datasets.py to set the path.

- `neighbor_codec.py`: this is where the representation is trained.

The code runs on top of Faiss. The HNSW index can be extended with a
`ReconstructFromNeighbors` C++ object that refines the distances. The
training is implemented in Python.

Update: 2023-12-28: the current Faiss dropped support for reconstruction with
this method.

Reproducing Table 2 in the paper
--------------------------------

The results of table 2 (accuracy on deep100M) in the paper can be
obtained with:

```bash
python bench_link_and_code.py \
--db deep100M \
--M0 6 \
--indexkey OPQ36_144,HNSW32_PQ36 \
--indexfile $bdir/deep100M_PQ36_L6.index \
--beta_nsq 4 \
--beta_centroids $bdir/deep100M_PQ36_L6_nsq4.npy \
--neigh_recons_codes $bdir/deep100M_PQ36_L6_nsq4_codes.npy \
--k_reorder 0,5 --efSearch 1,1024
```

Set `bdir` to a scratch directory.

Explanation of the flags:

- `--db deep100M`: dataset to process

- `--M0 6`: number of links on the base level (L6)

- `--indexkey OPQ36_144,HNSW32_PQ36`: Faiss index key to construct the
HNSW structure. It means that vectors are transformed by OPQ and
encoded with PQ 36x8 (with an intermediate size of 144D). The HNSW
level>0 nodes have 32 links (these are "cheap" to store
because there are fewer nodes in the upper levels).

- `--indexfile $bdir/deep100M_PQ36_L6.index`: name of the index file
(without information for the L&C extension)

- `--beta_nsq 4`: number of bytes to allocate for the codes (M in the
paper)

- `--beta_centroids $bdir/deep100M_PQ36_L6_nsq4.npy`: filename to store
the trained beta centroids

- `--neigh_recons_codes $bdir/deep100M_PQ36_L6_nsq4_codes.npy`: filename
for the encoded weights (beta) of the combination

- `--k_reorder 0,5`: number of results to reorder. 0 = baseline
without reordering, 5 = value used throughout the paper

- `--efSearch 1,1024`: number of nodes to visit (T in the paper)
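
As a rough illustration of how the index key and `--beta_nsq` translate into per-vector storage, the sketch below decodes the key string by hand. The helper `describe_index_key` is hypothetical (it is not part of Faiss or of the benchmark script) and only handles keys shaped like the one above. It assumes 8 bits per PQ sub-quantizer, as in the "PQ 36x8" description, and the byte total it reports excludes the graph links.

```python
import re

def describe_index_key(key, beta_nsq):
    """Hypothetical helper: decode a key such as 'OPQ36_144,HNSW32_PQ36'."""
    m = re.fullmatch(r"OPQ(\d+)_(\d+),HNSW(\d+)_PQ(\d+)", key)
    if m is None:
        raise ValueError(f"unsupported index key: {key!r}")
    opq_m, opq_dim, hnsw_links, pq_m = map(int, m.groups())
    return {
        "opq_output_dim": opq_dim,                  # intermediate size (144D)
        "upper_level_links": hnsw_links,            # links for level>0 nodes
        "pq_bytes": pq_m,                           # assumes 8 bits per sub-quantizer
        "dims_per_subquantizer": opq_dim // pq_m,
        "beta_bytes": beta_nsq,                     # bytes for the beta codes
        "code_bytes_per_vector": pq_m + beta_nsq,   # excludes graph links
    }

info = describe_index_key("OPQ36_144,HNSW32_PQ36", beta_nsq=4)
print(info)
```

Under these assumptions each vector takes 36 bytes of PQ code plus 4 bytes for the beta codes, and each PQ sub-quantizer covers 4 of the 144 intermediate dimensions.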

The script will proceed with the following steps:

0. load dataset (and possibly compute the ground-truth if the
ground-truth file is not provided)

1. train the OPQ encoder

2. build the index and store it

3. compute the residuals and train the beta vocabulary to do the reconstruction

4. encode the vertices

5. search and evaluate the search results.

With option `--exhaustive` the results of the exhaustive column can be
obtained.

The run above should output:
```bash
...
setting k_reorder=5
...
efSearch=1024 0.3132 ms per query, R@1: 0.4283 R@10: 0.6337 R@100: 0.6520 ndis 40941919 nreorder 50000

```
which matches the paper's table 2.
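
For readers unfamiliar with the `R@1`, `R@10` and `R@100` columns, here is a minimal sketch of how such a measure can be computed. It assumes that R@k is the fraction of queries whose true nearest neighbor appears among the first k results, which is the usual convention in Faiss benchmarks. The function `recall_at` and the toy data are illustrative only and are not taken from `bench_link_and_code.py`.

```python
def recall_at(ranks, results, ground_truth):
    """Fraction of queries whose true nearest neighbor is in the top-k results."""
    nq = len(results)
    return {
        k: sum(gt in row[:k] for row, gt in zip(results, ground_truth)) / nq
        for k in ranks
    }

# Toy data: 4 queries, each with a ranked list of 3 result ids.
results = [[7, 2, 9], [4, 1, 3], [5, 8, 6], [0, 2, 1]]
ground_truth = [7, 3, 0, 2]  # true nearest neighbor id for each query

recalls = recall_at([1, 2, 3], results, ground_truth)
print(recalls)
```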

Note that in multi-threaded mode, the building of the HNSW structure
is not deterministic. Therefore, the results across runs may not be exactly the same.

Reproducing Figure 5 in the paper
---------------------------------

Figure 5 just evaluates the combination of HNSW and PQ. For example,
the operating point L6&OPQ40 can be obtained with

```bash
python bench_link_and_code.py \
--db deep1M \
--M0 6 \
--indexkey OPQ40_160,HNSW32_PQ40 \
--indexfile $bdir/deep1M_PQ40_M6.index \
--beta_nsq 1 --beta_k 1 \
--beta_centroids $bdir/deep1M_PQ40_M6_nsq0.npy \
--neigh_recons_codes $bdir/deep1M_PQ36_M6_nsq0_codes.npy \
--k_reorder 0 --efSearch 16,64,256,1024
```

The arguments are similar to the previous table. Note that nsq = 0 is
simulated by setting beta_nsq = 1 and beta_k = 1 (i.e., a code with a single
reproduction value).

The output should look like:

```bash
setting k_reorder=0
efSearch=16 0.0147 ms per query, R@1: 0.3409 R@10: 0.4388 R@100: 0.4394 ndis 2629735 nreorder 0
efSearch=64 0.0122 ms per query, R@1: 0.4836 R@10: 0.6490 R@100: 0.6509 ndis 4623221 nreorder 0
efSearch=256 0.0344 ms per query, R@1: 0.5730 R@10: 0.7915 R@100: 0.7951 ndis 11090176 nreorder 0
efSearch=1024 0.2656 ms per query, R@1: 0.6212 R@10: 0.8722 R@100: 0.8765 ndis 33501951 nreorder 0
```
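
To compare operating points across runs, the output lines can be turned into records. The following is a sketch that assumes the line format shown above; the regular expression and the name `parse_line` are illustrative and not part of the benchmark script.

```python
import re

# Pattern for lines such as:
# efSearch=16 0.0147 ms per query, R@1: 0.3409 R@10: 0.4388 R@100: 0.4394 ndis 2629735 nreorder 0
LINE_RE = re.compile(
    r"efSearch=(?P<ef>\d+)\s+(?P<ms>[\d.]+)\s+ms per query,\s+"
    r"R@1:\s+(?P<r1>[\d.]+)\s+R@10:\s+(?P<r10>[\d.]+)\s+R@100:\s+(?P<r100>[\d.]+)\s+"
    r"ndis\s+(?P<ndis>\d+)\s+nreorder\s+(?P<nreorder>\d+)"
)

def parse_line(line):
    """Return a dict for a result line, or None for any other line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "efSearch": int(m["ef"]),
        "ms_per_query": float(m["ms"]),
        "R@1": float(m["r1"]),
        "R@10": float(m["r10"]),
        "R@100": float(m["r100"]),
        "ndis": int(m["ndis"]),
        "nreorder": int(m["nreorder"]),
    }

log = """setting k_reorder=0
efSearch=16 0.0147 ms per query, R@1: 0.3409 R@10: 0.4388 R@100: 0.4394 ndis 2629735 nreorder 0
efSearch=64 0.0122 ms per query, R@1: 0.4836 R@10: 0.6490 R@100: 0.6509 ndis 4623221 nreorder 0
efSearch=256 0.0344 ms per query, R@1: 0.5730 R@10: 0.7915 R@100: 0.7951 ndis 11090176 nreorder 0
efSearch=1024 0.2656 ms per query, R@1: 0.6212 R@10: 0.8722 R@100: 0.8765 ndis 33501951 nreorder 0
"""

rows = [r for r in map(parse_line, log.splitlines()) if r is not None]
for row in rows:
    print(row["efSearch"], row["R@1"], row["ms_per_query"])
```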

The results with k_reorder=5 are not reported in the paper; they
represent the performance of a "free coding" version of the algorithm.
The necessary code for this paper was removed from Faiss in version 1.8.0.
For a functioning version, use Faiss 1.7.4.