Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add index build arguments #58

Merged
merged 3 commits into from
Nov 7, 2023

Conversation

leothomas
Copy link
Contributor

What kind of change does this PR introduce?

Feature

What is the current behavior?

See supabase/vecs/issues/56

What is the new behavior?

Indexes can now be built with specific parameters (n_lists for IndexIVFFlat and m,ef_construction for IndexHSNW). PGvector default values are used in absence of user-supplied parameters and a warning will be raised if the user requests an index of one type but supplies arguments corresponding to a different type (eg: create index with type vecs.IndexMethod.hsnw and supply index_arguments=vecs.IndexArgsIVFFlat(n_lists=123)).

Note: this PR also changes the naming scheme for indexes slightly. Currently the indexes are named:
index ix_{ops}_ivfflat_{n_lists}_{unique_string} and ix_{ops}_hnsw_{unique_string} for IndexIVFFlat and IndexHNSW , respectively. Since the values of m and ef_construction are customizable and not easily accessible from the vecs client, it makes sense to encode these in the index name, similarly to the value of n_lists in the IndexIVFFlat index.

In order to avoid confusion between the two IndexHNSW build values in the index name, the values have been prepended with m and efc respectively. To maintain consistency, the n_lists value in the IndexIVFFLat is prepended with nl. So index names now looks like: ix_{ops}_ivfflat_nl{n_lists}_{unique_string} for IndexIVFFlat and ix_{ops}_hnsw_m{m}_efc{ef_construction}_{unique_string} for IndexHSNW.

Examples:

# c = client.get_or_create_collection(...)
# c.upsert(...)

# IVFFlat: no index build args, use default 
c.create_index(method=vecs.IndexMethod.ivfflat)
c.index
>>> 'ix_vector_cosine_ops_ivfflat_nl30_4ae41f2'

# IVFFlat: custom build args  
c.create_index(method=vecs.IndexMethod.ivfflat, index_arguments=vecs.IndexArgsIVFFlat(n_lists=123))
c.index
>>> 'ix_vector_cosine_ops_ivfflat_nl123_771ceda'

# HNSW: no index build args, use default 
c.create_index(method=vecs.IndexMethod.hnsw)
c.index
>>> 'ix_vector_cosine_ops_hnsw_m16_efc64_0ad32a1'

# HNSW: custom build args: 
c.create_index(method=vecs.IndexMethod.hnsw, index_arguments=vecs.IndexArgsHNSW(m=12, ef_construction=123))
c.index
>>> 'ix_vector_cosine_ops_hnsw_m12_efc123_bdcd632'

# Warning when index of one type is request but the index build arguments are supplied, use default values
c.create_index(method=vecs.IndexMethod.ivfflat, index_arguments=vecs.IndexArgsHNSW(m=12, ef_construction=123))
>>> /supabase/vecs/src/vecs/collection.py:726: UserWarning: IndexArgsHNSW build parameters were supplied but IndexMethod.ivfflat index was specified. Default parameters for IndexMethod.ivfflat index will be used instead.
c.index
>>> 'ix_vector_cosine_ops_ivfflat_nl30_32337d4'

c.create_index(method=vecs.IndexMethod.hnsw, index_arguments=vecs.IndexArgsIVFFlat(n_lists=123))
>>> /supabase/vecs/src/vecs/collection.py:726: UserWarning: IndexArgsIVFFlat build parameters were supplied but IndexMethod.hnsw index was specified. Default parameters for IndexMethod.hnsw index will be used instead.
c.index
>>> 'ix_vector_cosine_ops_hnsw_m16_efc64_be3f38f'

Copy link
Collaborator

@olirice olirice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nicely done!
a few tweaks to the logic and tests and we can get this merged. thanks!

src/tests/test_collection.py Outdated Show resolved Hide resolved
src/tests/test_collection.py Outdated Show resolved Hide resolved
src/tests/test_collection.py Outdated Show resolved Hide resolved
src/vecs/collection.py Outdated Show resolved Hide resolved
src/vecs/collection.py Outdated Show resolved Hide resolved
@leothomas
Copy link
Contributor Author

leothomas commented Nov 6, 2023

@olirice thanks for the review. Made a couple changes:

  • if index build args are supplied and either the incorrect index type or the auto index method is specified an error is now raised
  • Added test cases for setting the HNSW index build parameters separately (allowing the other to default)

That should address your comments. Please let me know if there are any further changes you'd like to see to this PR!

@olirice olirice self-requested a review November 7, 2023 19:58
@olirice
Copy link
Collaborator

olirice commented Nov 7, 2023

thats great, thanks for the contribution!

@olirice olirice merged commit ec961c4 into supabase:main Nov 7, 2023
10 checks passed
@leothomas
Copy link
Contributor Author

Thanks for getting it over the finish line! I'm looking forward to testing out some different index configurations!

@leothomas leothomas deleted the feature/index-build-arguments branch November 7, 2023 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants