-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for configuring HNSW parameters #79193
Conversation
This PR extends the dense_vector type to allow configure HNSW params in `index_options`: `m` – max number of connections for each node, `ef_construction` – number of candidate neighbors to track while searching the graph for each newly inserted node. ``` "mappings": { "properties": { "my_vector": { "type": "dense_vector", "dims": 128, "index": true, "similarity": "l2_norm", "index_options": { "type" : "hnsw", "m" : 15, "ef_construction" : 50 } } } } ``` index_options as an object, and all parameters underneath are optional. If `m` or `ef_contruction` are not provided, the default values from the current codec will be used. Relates to elastic#78473
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me as a short-term solution. (We discussed offline that in the medium-term, we want to add a more general/ elegant way for field mappers to add codec configuration.)
I left some small comments and had these bigger ones:
- Right now the marker interface
VectorFieldMapper
contains important logic, so now parsing logic is split acrossDenseVectorFieldMapper
andVectorFieldMapper
. Could we just contain it all inDenseVectorFieldMapper
to keep it simple? Then the marker interface would have just one methodgetKnnVectorsFormatForField
. - Instead of a very general name like
VectorFieldMapper
, we could name it something specific likePerFieldKnnVectorsFormatFieldMapper
. This makes it clear it exists for just one purpose and is not a general vector interface that other field mappers should extend. We could even put a big comment like "for internal use only" so no plugin authors are tempted to use it. This is not elegant but feels like an okay short-term solution.
x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/vectors/10_dense_vector_basic.yml
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/PerFieldMappingCodec.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/VectorFieldMapper.java
Outdated
Show resolved
Hide resolved
...ectors/src/test/java/org/elasticsearch/xpack/vectors/mapper/DenseVectorFieldMapperTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/VectorFieldMapper.java
Outdated
Show resolved
Hide resolved
One last thought: I think we could keep the config really simple and always require all parameters ( |
@jtibshirani Thanks for a great feedback. I've tried to address your comments in d5cc59f. Please continue to review when you have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good to me, I left some final small comments.
server/src/main/java/org/elasticsearch/index/codec/PerFieldMappingCodec.java
Outdated
Show resolved
Hide resolved
...gin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/DenseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
...gin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/DenseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
...gin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/DenseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
@jtibshirani Thanks for another round of review. I've tried to addressed your second round of feedback in db231e9. Please continue to review when you have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mayya-sharipova !
@elasticmachine run elasticsearch-ci/part-1 |
* upstream/master: Changing test keytab to use aes256-cts-hmac-sha1-96 instead of des3-cbc-sha1-kd (elastic#78703) Add support for configuring HNSW parameters (elastic#79193) Deprecate resolution loss on date field (elastic#78921) Add Optional to Configure bind user (elastic#78303) Adapt BWC after backporting elastic#78765 (elastic#79350) [DOCS] Add deprecation notice for reset password tool (elastic#78793) added test for flattened type in top_metrics.yml (elastic#78960) [DOCS] Fixes indentation issue in GET trained models API docs. (elastic#79347) Fix parsing of PBES2 encrypted PKCS#8 keys (elastic#78904) Mute testReindex (elastic#79343) Node level can match action (elastic#78765) Fix duplicate license header in source files (elastic#79236) AllowAll for indicesAccessControl (elastic#78498) Better logging and internal user handling for operator privileges (elastic#79331) # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/MappingParser.java
Test the output of toString method as it is available now. Relates to elastic#79193
Test the output of toString method as it is available now. Relates to #79193
Test the output of toString method as it is available now. Relates to elastic#79193
This PR extends the dense_vector type to allow configure HNSW params in
index_options
:m
– max number of connections for each node,ef_construction
– number of candidate neighbors to track while searchingthe graph for each newly inserted node.
index_options as an object is optional. If not provided, the default values from the
current codec will be used.
Relates to #78473