Skip to content

gaurav16gupta/constrainedANN

Repository files navigation

With the surging popularity of approximate near-neighbor search (ANNS), driven by advances in neural representation learning, the ability to serve queries accompanied by a set of constraints has become an area of intense interest. While the community has recently proposed several algorithms for constrained ANNS, almost all of these methods focus on integration with graph-based indexes, the predominant class of algorithms achieving state-of-the-art performance in latency-recall tradeoffs. In this work, we take a different approach and focus on developing a constrained ANNS algorithm via space partitioning as opposed to graphs. To that end, we introduce Constrained Approximate Partitioned Search (CAPS), an index for ANNS with filters via space partitions that not only retains the benefits of a partition-based algorithm but also outperforms state-of-the-art graph-based constrained search techniques in recall-latency tradeoffs, with only 10% of the index size.

Install (only for Faiss Kmeans clustering)

  • Faiss

    cd ..
    git clone https://github.com/facebookresearch/faiss.git
    cd faiss
    cmake -B build .
    make -C build -j faiss
    make -C build install
    

    This will generate the libfaiss.a

  • OpenBLAS

    git clone https://github.com/xianyi/OpenBLAS.git
    make
    

    This will generate the libopenblas.a

Provide path at INC and LFLAGS in Makefile

  • INC=-I faiss -I include/
  • LFLAGS=faiss/build/faiss/libfaiss.a OpenBLAS/libopenblas.a -lpthread -lm -ldl -lgfortran -fopenmp

Get Data:

  • Download from https://github.com/AshenOn3/NHQ
  • To genearate synthetic data: Run generateRandomTokens.py then getGT-filterSearch.py, each time changing the attribute length (default =3) of attributes to generate the synthetic attributes and groundtruth.

For your own data

  • base vectors and query vectors are stored in .fvecs format
  • base and query attributes are stored in .txt files.
  • Example -
<num points> <num attributes>
2 outdoor night
1 indoor daytime
3 outdoor night
2 indoor daytime
3 outdoor daytime

Where "2 outdoor night" is an example of space seperated 3 attributes.

Make sure to have these files in the data folder

data/sift/base.fvecs 
data/sift/label_base_3.txt
data/sift/query.fvecs 
data/sift/label_query_3.txt 
data/sift/label_3_hard_groundtruth.ivecs

If using bliss run

python3 include/bliss/dataPrepare_constrained.py --data="data/sift"
python3 include/bliss/construct.py --index='sift_epc40_K10_B1024_R1' --hdim=256 --mode=1 --kn=10
make index
./index data/sift/base.fvecs data/sift/label_base_3.txt indices/sift1024blissMode1 1024 bliss 1
make query
./query data/sift/base.fvecs data/sift/label_base_3.txt data/sift/query.fvecs data/sift/label_query_3.txt indices/sift1024blissMode1 data/sift/label_3_hard_groundtruth.ivecs 1024 bliss 1 500

If using faiss kmeans run

make index
./index data/sift/base.fvecs data/sift/label_base_3.txt indices/sift1024blissMode1 1024 kmeans 1
make kmeans
./query data/sift/base.fvecs data/sift/label_base_3.txt data/sift/query.fvecs data/sift/label_query_3.txt indices/sift1024blissMode1 data/sift/label_3_hard_groundtruth.ivecs 1024 kmeans 1 500

Functionalities:

  • Variable number of attributes
  • AND among attributes
  • Large number of attributes

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published