Add support for configuring HNSW parameters #79193

Merged: 4 commits, Oct 18, 2021

mayya-sharipova
Contributor

@mayya-sharipova mayya-sharipova commented Oct 14, 2021

This PR extends the dense_vector type to allow configuring HNSW parameters in
index_options:
m: the maximum number of connections for each node,
ef_construction: the number of candidate neighbors to track while searching
the graph for each newly inserted node.

"mappings": {
  "properties": {
    "my_vector": {
      "type": "dense_vector",
      "dims": 128,
      "index": true,
      "similarity": "l2_norm",
      "index_options": {
        "type" : "hnsw",
        "m" : 15,
        "ef_construction" : 50
      }
    }
  }
}

The index_options object is optional. If it is not provided, the default values from the
current codec are used.
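
As a rough illustration of how the mapping above might be assembled programmatically, here is a minimal Python sketch. The helper name `dense_vector_mapping` and its arguments are hypothetical and not part of Elasticsearch or any client library; it only builds the request body as a dict and leaves out `index_options` when none is given, so that the codec defaults would apply.

```python
import json


def dense_vector_mapping(field, dims, similarity, index_options=None):
    """Build a create-index body with one indexed dense_vector field.

    Hypothetical helper for illustration only. When index_options is None
    the key is omitted entirely, which leaves the codec defaults in effect.
    """
    field_mapping = {
        "type": "dense_vector",
        "dims": dims,
        "index": True,
        "similarity": similarity,
    }
    if index_options is not None:
        # Copy so that later changes to the caller's dict do not leak in.
        field_mapping["index_options"] = dict(index_options)
    return {"mappings": {"properties": {field: field_mapping}}}


body = dense_vector_mapping(
    "my_vector",
    dims=128,
    similarity="l2_norm",
    index_options={"type": "hnsw", "m": 15, "ef_construction": 50},
)
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the mapping shown in the description; how it is sent to the cluster is left out on purpose.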

Relates to #78473

@mayya-sharipova mayya-sharipova added the :Search/Search Search-related issues that do not fall into other categories label Oct 14, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 14, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@mayya-sharipova mayya-sharipova added v8.0.0 :Search Foundations/Mapping Index mappings, including merging and defining field types and removed Team:Search Meta label for search team :Search/Search Search-related issues that do not fall into other categories labels Oct 14, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 14, 2021
@jtibshirani jtibshirani mentioned this pull request Oct 14, 2021
17 tasks
Contributor

@jtibshirani jtibshirani left a comment

This makes sense to me as a short-term solution. (We discussed offline that in the medium term, we want to add a more general/elegant way for field mappers to add codec configuration.)

I left some small comments and had these bigger ones:

  • Right now the marker interface VectorFieldMapper contains important logic, so now parsing logic is split across DenseVectorFieldMapper and VectorFieldMapper. Could we just contain it all in DenseVectorFieldMapper to keep it simple? Then the marker interface would have just one method getKnnVectorsFormatForField.
  • Instead of a very general name like VectorFieldMapper, we could name it something specific like PerFieldKnnVectorsFormatFieldMapper. This makes it clear it exists for just one purpose and is not a general vector interface that other field mappers should extend. We could even put a big comment like "for internal use only" so no plugin authors are tempted to use it. This is not elegant but feels like an okay short-term solution.
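
The shape suggested in the two points above can be sketched roughly as follows. This is a Python analogue written only for illustration, not the Java code in the PR: the interface name and its single method come from the comment, while `DEFAULT_FORMAT`, `KeywordFieldMapper`, `format_for`, and the use of strings to stand in for vector formats are all assumptions.

```python
from typing import Protocol, runtime_checkable

# Stand-in for the codec's default KNN vectors format (assumption).
DEFAULT_FORMAT = "hnsw(codec defaults)"


@runtime_checkable
class PerFieldKnnVectorsFormatFieldMapper(Protocol):
    """Single-purpose interface with exactly one method. For internal use only."""

    def get_knn_vectors_format_for_field(self) -> str: ...


class DenseVectorFieldMapper:
    """Keeps all index_options handling in one place; only the result is exposed."""

    def __init__(self, index_options=None):
        self.index_options = index_options

    def get_knn_vectors_format_for_field(self) -> str:
        if self.index_options is None:
            return DEFAULT_FORMAT
        # Extra keys such as "type" are ignored by str.format.
        return "hnsw(m={m}, ef_construction={ef_construction})".format(
            **self.index_options
        )


class KeywordFieldMapper:
    """Hypothetical mapper that does not implement the interface."""


def format_for(mapper) -> str:
    """Pick the per-field format, falling back to the default for other mappers."""
    if isinstance(mapper, PerFieldKnnVectorsFormatFieldMapper):
        return mapper.get_knn_vectors_format_for_field()
    return DEFAULT_FORMAT
```

With this layout the caller only needs the one `isinstance` check, and mappers that do not implement the interface fall through to the default format.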

@jtibshirani
Contributor

If m or ef_construction are not provided, the default values from the
current codec will be used.

One last thought: I think we could keep the config really simple and always require all parameters (type, m, ef_construction) to be specified if there is an index_options section. I don't see a benefit in allowing just one to be overridden: if you are tuning parameters, you will almost always be considering both?
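
A minimal sketch of the "all or nothing" rule proposed here, assuming the options arrive as a plain dict. The function name `parse_hnsw_index_options` and the error messages are hypothetical; this is not the parsing code in Elasticsearch.

```python
REQUIRED_KEYS = ("type", "m", "ef_construction")


def parse_hnsw_index_options(index_options):
    """Validate an index_options object, requiring every parameter.

    Returns (m, ef_construction) on success and raises ValueError otherwise.
    """
    missing = [key for key in REQUIRED_KEYS if key not in index_options]
    if missing:
        raise ValueError(
            "index_options is missing required parameter(s): " + ", ".join(missing)
        )
    unknown = sorted(set(index_options) - set(REQUIRED_KEYS))
    if unknown:
        raise ValueError("unknown index_options parameter(s): " + ", ".join(unknown))
    if index_options["type"] != "hnsw":
        raise ValueError("unsupported index_options type: " + str(index_options["type"]))
    return index_options["m"], index_options["ef_construction"]
```

Under this rule a mapping either omits `index_options` and gets the codec defaults, or spells out `type`, `m`, and `ef_construction` together; a partial override such as `{"type": "hnsw", "m": 15}` is rejected.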

@mayya-sharipova
Contributor Author

@jtibshirani Thanks for the great feedback. I've tried to address your comments in d5cc59f. Please continue to review when you have time.

Contributor

@jtibshirani jtibshirani left a comment

This is looking good to me, I left some final small comments.

@mayya-sharipova
Contributor Author

@jtibshirani Thanks for another round of review. I've tried to address your second round of feedback in db231e9. Please continue to review when you have time.

Contributor

@jtibshirani jtibshirani left a comment

Thanks @mayya-sharipova !

@mayya-sharipova
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@mayya-sharipova mayya-sharipova merged commit bdf8ca9 into elastic:master Oct 18, 2021
@mayya-sharipova mayya-sharipova deleted the hnsw-param branch October 18, 2021 12:54
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Oct 21, 2021
Test the output of toString method as it is available now.

Relates to elastic#79193
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.0.0-beta1