Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ef_construction and m as optional parameters when building HSNW index #56

Closed
leothomas opened this issue Nov 1, 2023 · 3 comments
Closed

Comments

@leothomas
Copy link
Contributor

leothomas commented Nov 1, 2023

Summary

Currently the HSNW index creation does not specify (or allow specifying) the parameters m (max number of connections per layer), and ef_construction (size of the dynamic candidate list when constructing the graph) when creating the index. This means that HSNW index is built with the default pgvector values of m=16 and ef_construction=64. It would be great to be able to set these when instantiating the HSNW index using the vecs collection.

Rationale

These parameters can have important effects on the index performance.

Design

One possible implementation would be to create an IndexParameters class, which would be accepted by the create_index() function (alongside the IndexMeasure and IndexMethod) which would contain optional parameters m:int=16 and ef_construction:int=64 (pgvector defaults).

And then inserted into the SQL command sent to pgvector as:

f"""
create index ix_{ops}_hnsw_{unique_string}
on vecs."{self.table.name}"
using hnsw (vec {ops}) with (m={m} and ef_construction={ef_construction});
"""

The IndexParamters class could also be used to afford users fine grain control on the number of lists used when creating the IVFFlat index. An n_lists:Optional[int]=None parameter would be added to IndexParamters class and, in absence of a value supplied by the use, the n_lists value would be calculated as it currently is:

n_lists = (
    int(max(n_records / 1000, 30))
    if n_records < 1_000_000
    else int(math.sqrt(n_records))
)

Lastly a warning can be raised if the user supplies m and ef_construction but specifies and ivfflat index or inversely, if the user supplies a value for n_lists when specifying the hsnw index type. Only a warning is needed since the default values would be applied in this case.

It's a fairly small/self-contained feature so I'm happy to submit a PR!

Cheers!

@olirice
Copy link
Collaborator

olirice commented Nov 1, 2023

Hi Leo,

thanks for opening, that's very clearly laid out

IndexParameters sounds good though we may want a separate (data?)class for each one and the new index_arguments arg to the create_index function could take a union of e.g.

create_index(self, ..., index_arguments: Optional[Union[IndexArgsIVFFlat, IndexArgsHNSW]]):

or something similar

If that sounds good to you a PR would be amazing

@leothomas
Copy link
Contributor Author

leothomas commented Nov 6, 2023

Awesome! PR is open ^. Thanks!

@leothomas
Copy link
Contributor Author

Closing after merging PR - thanks for getting that in so quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants