Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid empty doc values on vector functions #43944

Conversation

mayya-sharipova
Copy link
Contributor

Currently when a document misses a vector value, vector function
returns 0 as a score for this document. We think this is incorrect
and unexpected behaviour.
With this change, an error will be thrown if vector functions are
used with docs that are missing vector doc values.
Also VectorScriptDocValues is modified to allow size() function,
which can be used to check if a document has a value for the
vector field.

Currently when a document misses a vector value, vector function
returns 0 as a score for this document. We think this is incorrect
behaviour.
With this change, an error will be thrown if vector functions are
used with docs that are missing vector doc values.
Also VectorScriptDocValues is modified to allow size() function,
which can be used to check if a document has a value for the
vector field.
@mayya-sharipova mayya-sharipova added the :Search Relevance/Ranking Scoring, rescoring, rank evaluation. label Jul 3, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@mayya-sharipova
Copy link
Contributor Author

@jpountz I wanted to ask if you are fine if we expose size() function in VectorScriptDocValues?

Before we made a decision not to leak the internal representations of vectors.
But after thinking about it with @jtibshirani, we think that we need to expose size() function which will allow users to check if a document has a value for a vector field.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, my comment in the other PR was about the fact that users could have access to the binary representation of the vector. Exposing size() as a way to know whether the document has a value or not sounds good to me. 👍

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks good to me.

@@ -46,7 +46,11 @@ public BytesRef get(int index) {

@Override
public int size() {
throw new UnsupportedOperationException("vector fields may only be used via vector functions in scripts");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment: now that you can call size on a vector field, perhaps we could update the error message above to say "accessing a vector field's value through 'get' or 'value' is not supported".

@mayya-sharipova mayya-sharipova merged commit 5255eb3 into elastic:master Jul 5, 2019
mayya-sharipova added a commit that referenced this pull request Jul 5, 2019
Currently when a document misses a vector value, vector function
returns 0 as a score for this document. We think this is incorrect
behaviour.
With this change, an error will be thrown if vector functions are
used with docs that are missing vector doc values.
Also VectorScriptDocValues is modified to allow size() function,
which can be used to check if a document has a value for the
vector field.
mayya-sharipova added a commit that referenced this pull request Jul 5, 2019
@jtibshirani jtibshirani added :Search Relevance/Vectors Vector search and removed :Search Relevance/Ranking Scoring, rescoring, rank evaluation. labels Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants