Commit

add docs for index_arguments
olirice committed Nov 7, 2023
1 parent 2bb7c95 commit 95c31f3
Showing 4 changed files with 11 additions and 2 deletions.
5 changes: 4 additions & 1 deletion docs/api.md
@@ -101,16 +101,19 @@ Available options for index `method` are:

Where `auto` selects the best available index method, `hnsw` uses the [HNSW](https://github.com/pgvector/pgvector#hnsw) method and `ivfflat` uses [IVFFlat](https://github.com/pgvector/pgvector#ivfflat).

HNSW and IVFFlat indexes both allow for parameterization to control the speed/accuracy tradeoff. vecs provides sane defaults for these parameters. For a greater level of control you can optionally pass an instance of `vecs.IndexArgsIVFFlat` or `vecs.IndexArgsHNSW` to `create_index`'s `index_arguments` argument. Descriptions of the impact for each parameter are available in the [pgvector docs](https://github.com/pgvector/pgvector).

When using IVFFlat indexes, the index must be created __after__ the collection has been populated with records. Building an IVFFlat index on an empty collection will result in significantly reduced recall. You can continue upserting new documents after the index has been created, but should rebuild the index if the size of the collection more than doubles since the last index operation.
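The rebuild guideline above can be sketched as a small check. This is only an illustration: `needs_reindex` and its arguments are hypothetical names, not part of vecs, and the caller is assumed to track the collection size recorded at the last index build.

```python
def needs_reindex(size_at_last_index: int, current_size: int) -> bool:
    """Return True when an IVFFlat index is due for a rebuild.

    Hypothetical helper (not part of vecs): applies the rule of thumb that
    the index should be rebuilt once the collection has more than doubled
    in size since the last index operation.
    """
    if size_at_last_index <= 0:
        # An index built on an empty collection has poor recall, so rebuild
        # as soon as any records exist.
        return current_size > 0
    return current_size > 2 * size_at_last_index


# Example: index built at 1,000 records, collection has since grown.
print(needs_reindex(1_000, 1_500))  # still within 2x of the indexed size
print(needs_reindex(1_000, 2_500))  # more than doubled
```

When the check returns `True`, calling `create_index` again would rebuild the index, since `replace` defaults to `True`.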

HNSW indexes can be created immediately after the collection without populating records.

To manually specify `method` and `measure`, add them as arguments to `create_index` for example:
To manually specify `method`, `measure`, and `index_arguments`, add them as arguments to `create_index`, for example:

```python
docs.create_index(
    method=IndexMethod.hnsw,
    measure=IndexMeasure.cosine_distance,
    index_arguments=IndexArgsHNSW(m=8),
)
```

5 changes: 4 additions & 1 deletion docs/concepts_indexes.md
@@ -40,16 +40,19 @@ Available options for index `method` are:

Where `auto` selects the best available index method, `hnsw` uses the [HNSW](https://github.com/pgvector/pgvector#hnsw) method and `ivfflat` uses [IVFFlat](https://github.com/pgvector/pgvector#ivfflat).

HNSW and IVFFlat indexes both allow for parameterization to control the speed/accuracy tradeoff. vecs provides sane defaults for these parameters. For a greater level of control you can optionally pass an instance of `vecs.IndexArgsIVFFlat` or `vecs.IndexArgsHNSW` to `create_index`'s `index_arguments` argument. Descriptions of the impact for each parameter are available in the [pgvector docs](https://github.com/pgvector/pgvector).
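As an illustration of choosing a non-default parameter, the sketch below encodes the starting point suggested in the pgvector README for the IVFFlat list count: roughly `rows / 1000` for up to 1M rows and `sqrt(rows)` above that. The helper name `suggest_n_lists` is hypothetical and not part of vecs. The comment about passing the result to `IndexArgsIVFFlat` assumes that class exposes an `n_lists` field.

```python
import math


def suggest_n_lists(row_count: int) -> int:
    """Suggest a starting IVFFlat list count for a collection size.

    Hypothetical helper (not part of vecs). Follows the pgvector README
    rule of thumb: rows / 1000 up to 1M rows, sqrt(rows) beyond that.
    Always returns at least 1 so the value is usable for tiny collections.
    """
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, math.isqrt(row_count))


n_lists = suggest_n_lists(250_000)
print(n_lists)

# Assumption: the result would then be passed along the lines of
#   docs.create_index(
#       method=IndexMethod.ivfflat,
#       index_arguments=IndexArgsIVFFlat(n_lists=n_lists),
#   )
```

Treat the result as a first guess to tune against measured recall and query latency, not as a fixed recommendation.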

When using IVFFlat indexes, the index must be created __after__ the collection has been populated with records. Building an IVFFlat index on an empty collection will result in significantly reduced recall. You can continue upserting new documents after the index has been created, but should rebuild the index if the size of the collection more than doubles since the last index operation.

HNSW indexes can be created immediately after the collection without populating records.

To manually specify `method` and `measure`, add them as arguments to `create_index` for example:
To manually specify `method`, `measure`, and `index_arguments`, add them as arguments to `create_index`, for example:

```python
docs.create_index(
    method=IndexMethod.hnsw,
    measure=IndexMeasure.cosine_distance,
    index_arguments=IndexArgsHNSW(m=8),
)
```

2 changes: 2 additions & 0 deletions docs/support_changelog.md
@@ -32,3 +32,5 @@
- Bugfix: removed errant print statement

## master

- Feature: Parameterized IVFFlat and HNSW indexes
1 change: 1 addition & 0 deletions src/vecs/collection.py
@@ -684,6 +684,7 @@ def create_index(
Args:
measure (IndexMeasure, optional): The measure to index for. Defaults to 'cosine_distance'.
method (IndexMethod, optional): The indexing method to use. Defaults to 'auto'.
index_arguments (IndexArgsIVFFlat | IndexArgsHNSW, optional): Index-type-specific arguments.
replace (bool, optional): Whether to replace the existing index. Defaults to True.
Raises: