-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the signature of vector script functions. #48260
Conversation
Pinging @elastic/es-search (:Search/Search) |
cfa1b3b
to
df79308
Compare
Previously the functions accepted a doc values reference, whereas they now accept the name of the vector field. Here's an example of how a vector function was called before and after the change. Before: `cosineSimilarity(params.query_vector, doc['vector'])` After: `cosineSimilarity(params.query_vector, 'vector')` This seems more intuitive, since we don't allow direct access to vector doc values and the the meaning of `doc['vector_field']` is a bit unclear. It also allows us to avoid retrieving the vector doc values on every function invocation. This can give a tiny speed-up, some example results on the glove-100-angular dataset: ``` Algorithm Recall QPS EsBruteforce() 1.000 5.390 EsBruteforce(cache_docvalues) 1.000 5.564 ```
df79308
to
1a832c5
Compare
@elasticmachine run elasticsearch-ci/packaging-sample-matrix |
I'm glad we are moving towards making the doc values lookup an implementation detail. However, this would be a hard break for all current users of cosineSimilarity. Could we possibly instead deprecate the current method and create a new method (completely different name)? Do we know there are sufficiently small numbers of this method that the break is "ok", or are we considering this break ok because vectors are experimental? |
I echo @rjernst 's concerns here. We should follow a deprecation model in the same way as we do with anything in Elastic. There was some work to make this possible to deprecate whitelisted methods previously, but it still needs to be completed. So I would add a new method and deprecate this one in the docs to begin with, and we can talk about finishing the ability to deprecate whitelisted methods with an "annotation" on the whitelist. |
After gathering some suggestions on how to handle the deprecation, I opened #48604 as a proposal. I'm closing this PR, it would be great to get your feedback on the new one. |
NOTE: not ready for review. This PR is meant to help our discussion around the best way to deprecate + remove these functions. The straightforward deprecation path is difficult, because Painless doesn't allow having two methods defined with the same name and arity.
Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.
Before:
cosineSimilarity(params.query_vector, doc['vector'])
After:
cosineSimilarity(params.query_vector, 'vector')
This seems more intuitive, since we don't allow direct access to vector doc
values and the the meaning of
doc['vector_field']
is a bit unclear.It also allows us to avoid retrieving the vector doc values on every function
invocation. This can give a tiny speed-up, some example results on the
glove-100-angular dataset: