
Commit

ashvardanian committed Jul 29, 2023
2 parents cffe507 + 96baa09 commit 3500812
Showing 3 changed files with 60 additions and 57 deletions.
23 changes: 22 additions & 1 deletion docs/benchmarks.md
@@ -56,7 +56,7 @@ Also worth noting, 8-bit quantization results in almost no quantization loss and
Within this repository you will find two commonly used utilities:

- `cpp/bench.cpp` that produces the `bench` binary for broad USearch benchmarks.
- `python/bench.py` for simple benchmarks against FAISS.
- `python/bench.py` and `python/bench.ipynb` for interactive charts against FAISS.

To achieve the best results, we suggest compiling locally for the target architecture.

@@ -112,6 +112,27 @@ OPTIONS
-h, --help Print this help information on this tool and exit
```

Here is an example of running the C++ benchmark:

```sh
./build_release/bench \
--vectors datasets/wiki_1M/base.1M.fbin \
--queries datasets/wiki_1M/query.public.100K.fbin \
--neighbors datasets/wiki_1M/groundtruth.public.100K.ibin

./build_release/bench \
--vectors datasets/t2i_1B/base.1B.fbin \
--queries datasets/t2i_1B/query.public.100K.fbin \
--neighbors datasets/t2i_1B/groundtruth.public.100K.ibin \
--output datasets/t2i_1B/index.usearch \
--cos
```

> Optional parameters include `connectivity`, `expansion_add`, `expansion_search`.

For Python, just open the Jupyter Notebook and start playing around.

## Datasets

The BigANN benchmark is a good starting point if you are searching for large collections of high-dimensional vectors.
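The `*.fbin` and `*.ibin` files used in the benchmarks above follow the common big-ann-benchmarks layout: two little-endian `int32` header fields (row count, then column count), followed by the row-major matrix data. A minimal sketch of writing and reading such a file, assuming that layout (the helper names here are illustrative, not part of USearch):

```python
import struct
import tempfile

import numpy as np


def write_fbin(path: str, matrix: np.ndarray) -> None:
    """Write a float32 matrix preceded by a (rows, cols) int32 header."""
    matrix = np.ascontiguousarray(matrix, dtype=np.float32)
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", *matrix.shape))
        f.write(matrix.tobytes())


def read_fbin(path: str) -> np.ndarray:
    """Read a float32 matrix, recovering its shape from the header."""
    with open(path, "rb") as f:
        rows, cols = struct.unpack("<ii", f.read(8))
        data = np.frombuffer(f.read(), dtype=np.float32)
    return data.reshape(rows, cols)


vectors = np.random.rand(100, 64).astype(np.float32)
with tempfile.NamedTemporaryFile(suffix=".fbin", delete=False) as tmp:
    write_fbin(tmp.name, vectors)
restored = read_fbin(tmp.name)
assert np.array_equal(vectors, restored)
```

Integer ground-truth files (`*.ibin`) use the same header, with `int32` payloads instead of `float32`.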
35 changes: 0 additions & 35 deletions docs/compilation.md
@@ -56,22 +56,6 @@ Testing:
cmake -DCMAKE_CXX_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 -DCMAKE_BUILD_TYPE=Debug -B ./build_debug && make -C ./build_debug && ./build_debug/test
```

Benchmarking:

```sh
./build_release/bench \
--vectors datasets/wiki_1M/base.1M.fbin \
--queries datasets/wiki_1M/query.public.100K.fbin \
--neighbors datasets/wiki_1M/groundtruth.public.100K.ibin

./build_release/bench \
--vectors datasets/t2i_1B/base.1B.fbin \
--queries datasets/t2i_1B/query.public.100K.fbin \
--neighbors datasets/t2i_1B/groundtruth.public.100K.ibin \
--output datasets/t2i_1B/index.usearch \
--cos
```

## Python 3

Use PyTest to validate the build.
@@ -96,25 +80,6 @@ pip install cibuildwheel
cibuildwheel --platform linux
```

Benchmarking:

```sh
pip install faiss-cpu
python python/scripts/bench.py speed \
--vectors datasets/wiki_1M/base.1M.fbin \
--queries datasets/wiki_1M/query.public.100K.fbin \
--neighbors datasets/wiki_1M/groundtruth.public.100K.ibin
```

> Optional parameters include `connectivity`, `expansion_add`, `expansion_search`.
Checking the effect of different embedding dimensions on construction speed:

```sh
python python/scripts/bench.py dimensions ...
python python/scripts/bench.py connectivity ...
```

## JavaScript

Node.js:
59 changes: 38 additions & 21 deletions python/usearch/io.py
@@ -5,11 +5,43 @@
import numpy as np


def numpy_scalar_size(dtype) -> int:
return {
np.float64: 8,
np.int64: 8,
np.uint64: 8,
np.float32: 4,
np.int32: 4,
np.uint32: 4,
np.float16: 2,
np.int16: 2,
np.uint16: 2,
np.int8: 1,
np.uint8: 1,
}[dtype]


def guess_numpy_dtype_from_filename(filename) -> typing.Optional[type]:
if filename.endswith(".fbin"):
return np.float32
elif filename.endswith(".dbin"):
return np.float64
elif filename.endswith(".hbin"):
return np.float16
elif filename.endswith(".ibin"):
return np.int32
elif filename.endswith(".bbin"):
return np.uint8
else:
return None


def load_matrix(
filename: str,
start_row: int = 0,
count_rows: int = None,
view: bool = False,
dtype: typing.Optional[type] = None,
) -> typing.Optional[np.ndarray]:
"""Read *.ibin, *.bbin, *.hbin, *.fbin, *.dbin files with matrices.
@@ -21,25 +53,11 @@ def load_matrix(
:return: parsed matrix
:rtype: numpy.ndarray
"""
dtype = np.float32
scalar_size = 4
if filename.endswith(".fbin"):
dtype = np.float32
scalar_size = 4
elif filename.endswith(".dbin"):
dtype = np.float64
scalar_size = 8
elif filename.endswith(".hbin"):
dtype = np.float16
scalar_size = 2
elif filename.endswith(".ibin"):
dtype = np.int32
scalar_size = 4
elif filename.endswith(".bbin"):
dtype = np.uint8
scalar_size = 1
else:
raise Exception("Unknown file type")
if dtype is None:
dtype = guess_numpy_dtype_from_filename(filename)
if dtype is None:
raise Exception("Unknown file type")
scalar_size = numpy_scalar_size(dtype)

if not os.path.exists(filename):
return None
@@ -74,7 +92,6 @@ def save_matrix(vectors: np.ndarray, filename: str):
:param filename: path to the matrix file
:type filename: str
"""
dtype = np.float32
if filename.endswith(".fbin"):
dtype = np.float32
elif filename.endswith(".dbin"):
@@ -86,7 +103,7 @@ elif filename.endswith(".bbin"):
elif filename.endswith(".bbin"):
dtype = np.uint8
else:
raise Exception("Unknown file type")
dtype = vectors.dtype

assert len(vectors.shape) == 2, "Input array must have 2 dimensions"
with open(filename, "wb") as f:
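The behavioral change to `save_matrix` above replaces the `raise` for unknown extensions with a fallback to the array's own dtype. That resolution logic can be sketched in isolation (a simplified stand-in to show the behavior, not the actual USearch code):

```python
import numpy as np

# Extension → dtype map, mirroring the known *bin formats.
EXTENSION_DTYPES = {
    ".fbin": np.float32,
    ".dbin": np.float64,
    ".hbin": np.float16,
    ".ibin": np.int32,
    ".bbin": np.uint8,
}


def resolve_save_dtype(filename: str, vectors: np.ndarray):
    """Pick the dtype from the file extension, else keep the array's own dtype."""
    for extension, dtype in EXTENSION_DTYPES.items():
        if filename.endswith(extension):
            return dtype
    # New behavior: unknown extensions no longer raise an exception.
    return vectors.dtype


matrix = np.zeros((2, 3), dtype=np.float64)
assert resolve_save_dtype("base.1M.fbin", matrix) is np.float32
assert resolve_save_dtype("vectors.bin", matrix) == np.float64
```

The upside is that arbitrary file names become usable for serialization; the downside is that a typo in an extension silently changes the on-disk scalar type instead of failing fast.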
