Summary
Currently the HNSW index creation does not specify (or allow specifying) the parameters m (max number of connections per layer) and ef_construction (size of the dynamic candidate list used when constructing the graph). This means that the HNSW index is always built with the default pgvector values of m=16 and ef_construction=64. It would be great to be able to set these when creating the HNSW index through the vecs collection.
Rationale
These parameters can have important effects on the index performance.
Design
One possible implementation would be to create an IndexParameters class, accepted by the create_index() function (alongside IndexMeasure and IndexMethod), containing the optional parameters m: int = 16 and ef_construction: int = 64 (the pgvector defaults).
The values would then be inserted into the SQL command sent to pgvector as:
f"""create index ix_{ops}_hnsw_{unique_string} on vecs."{self.table.name}" using hnsw (vec {ops}) with (m = {m}, ef_construction = {ef_construction});"""
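As a rough illustration of the idea above, here is a minimal sketch. The build_hnsw_sql helper and its arguments are hypothetical and not part of vecs; only the m and ef_construction names and their defaults come from the proposal. Note that pgvector expects the options in the with (...) clause to be comma-separated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexParameters:
    """Proposed container for HNSW build options (pgvector defaults)."""
    m: int = 16
    ef_construction: int = 64


def build_hnsw_sql(
    table_name: str,
    ops: str,
    unique_string: str,
    params: Optional[IndexParameters] = None,
) -> str:
    """Hypothetical helper: render the create index statement for HNSW."""
    params = params or IndexParameters()
    # int() guards against non-integer values ending up in the SQL text.
    m = int(params.m)
    ef_construction = int(params.ef_construction)
    return (
        f"create index ix_{ops}_hnsw_{unique_string} "
        f'on vecs."{table_name}" '
        f"using hnsw (vec {ops}) "
        f"with (m = {m}, ef_construction = {ef_construction});"
    )
```

Calling build_hnsw_sql("docs", "vector_cosine_ops", "abc123", IndexParameters(m=32)) would keep ef_construction at its default and only override m.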
The IndexParameters class could also be used to give users fine-grained control over the number of lists used when creating the IVFFlat index. An n_lists: Optional[int] = None parameter would be added to the IndexParameters class and, in the absence of a value supplied by the user, the n_lists value would be calculated as it currently is.
Lastly, a warning can be raised if the user supplies m and ef_construction but specifies an ivfflat index or, conversely, supplies a value for n_lists when specifying the hnsw index type. Only a warning is needed, since the default values would be applied in this case.
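The warning behaviour could look roughly like the sketch below. The function name warn_on_unused_parameters is hypothetical, and it assumes that None means "not supplied by the user", which differs slightly from giving m and ef_construction concrete defaults on the class itself.

```python
import warnings
from typing import List, Optional


def warn_on_unused_parameters(
    method: str,
    m: Optional[int] = None,
    ef_construction: Optional[int] = None,
    n_lists: Optional[int] = None,
) -> List[str]:
    """Warn about parameters that do not apply to the chosen index method.

    Returns the sorted names of the ignored parameters.
    """
    # Which parameters each index method actually uses.
    relevant = {
        "hnsw": {"m", "ef_construction"},
        "ivfflat": {"n_lists"},
    }
    if method not in relevant:
        raise ValueError(f"unknown index method: {method!r}")

    supplied = {"m": m, "ef_construction": ef_construction, "n_lists": n_lists}
    ignored = sorted(
        name
        for name, value in supplied.items()
        if value is not None and name not in relevant[method]
    )
    if ignored:
        warnings.warn(
            f"{', '.join(ignored)} ignored for {method} index; defaults apply",
            UserWarning,
            stacklevel=2,
        )
    return ignored
```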
It's a fairly small/self-contained feature so I'm happy to submit a PR!
Cheers!
IndexParameters sounds good, though we may want a separate (data?)class for each one, and the new index_arguments arg to the create_index function could take a union of e.g.
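One way to read this suggestion is sketched below. The class names HNSWArguments and IVFFlatArguments, the with_clause helper, and the default_n_lists argument are all hypothetical, chosen only to illustrate one dataclass per index method combined through a Union. The lists option name is the one pgvector uses for IVFFlat.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class HNSWArguments:
    m: int = 16
    ef_construction: int = 64


@dataclass
class IVFFlatArguments:
    # None means "let the library compute the number of lists".
    n_lists: Optional[int] = None


# The type an index_arguments parameter could accept.
IndexArguments = Union[HNSWArguments, IVFFlatArguments]


def with_clause(index_arguments: IndexArguments, default_n_lists: int) -> str:
    """Render the with (...) part of the create index statement."""
    if isinstance(index_arguments, HNSWArguments):
        m = int(index_arguments.m)
        ef = int(index_arguments.ef_construction)
        return f"with (m = {m}, ef_construction = {ef})"
    if isinstance(index_arguments, IVFFlatArguments):
        n_lists = index_arguments.n_lists
        if n_lists is None:
            n_lists = default_n_lists
        return f"with (lists = {int(n_lists)})"
    raise TypeError(f"unsupported index arguments: {index_arguments!r}")
```

With separate classes, passing HNSW-only options to an IVFFlat index becomes a type error that static checkers can catch, rather than something that has to be detected at runtime.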