-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
token_count datatype should handle null value #25046
Conversation
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fred84 thanks for opening this PR, the test already looks good to me. My only concern at this point is whether the token count of a null
value should be 0 or simply be an empty field instead. I will ask around for more opinions on this.
@@ -136,7 +136,7 @@ protected void parseCreateField(ParseContext context, List<IndexableField> field | |||
|
|||
final int tokenCount; | |||
if (value == null) { | |||
tokenCount = (Integer) fieldType().nullValue(); | |||
tokenCount = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure returning 0 if the value is null
here is the right thing to do. This would mean we cannot differentiate this from the token count of an empty string (""). I'm not sure what we do in other cases like this, but maybe we shouldn't add any IndexableField at all in this case. Maybe @jpountz or @jimczi have an opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the handling of nullValue
. Instead we should check if nullValue()
is also null and in that case we can ignore the document completely (return without adding any field), otherwise we should count the number of tokens in the value that replace the null field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbuescher I've restored nullValue handling and added test
|
||
public void testParseNullValue() throws Exception { | ||
DocumentMapper mapper = createIndexWithTokenCountField(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: test is very short, so maybe we don't need any strutcturing empty lines in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@fred84 there seems to be test failures unrelated to your PR, but I'd like to get a clean CI run before merging this. Can you rebase or merge in master once again so I can kick of another build? It looks like its related to the recent branching of the 5.5 branch. Thanks. |
@cbuescher I merged master into my branch. |
@fred84 thanks. |
@elasticmachine test this again please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I will merge this to master and the current 5.x branch
Fixes an issue with the handling of null values for the token_count data type. Closes #24928
@fred84 thanks a lot for this fix |
@cbuescher thanks for review! |
* master: (53 commits) Log checkout so SHA is known Add link to community Rust Client (elastic#22897) "shard started" should show index and shard ID (elastic#25157) await fix testWithRandomException Change BWC versions on create index response Return the index name on a create index response Remove incorrect bwc branch logic from master Correctly format arrays in output [Test] Extending parsing checks for SearchResponse (elastic#25148) Scripting: Change keys for inline/stored scripts to source/id (elastic#25127) [Test] Add test for custom requests in High Level Rest Client (elastic#25106) nested: In case of a single type the _id field should be added to the nested document instead of _uid field. `type` and `id` are lost upon serialization of `Translog.Delete`. (elastic#24586) fix highlighting docs Fix NPE in token_count datatype with null value (elastic#25046) Remove the postings highlighter and make unified the default highlighter choice (elastic#25028) [Test] Adding test for parsing SearchShardFailure leniently (elastic#25144) Fix typo in shards.asciidoc (elastic#25143) List Hibernate Search (elastic#25145) [DOCS] update maxRetryTimeout in java REST client usage page ...
* master: (80 commits) Test: remove faling test that relies on merge order Log checkout so SHA is known Add link to community Rust Client (elastic#22897) "shard started" should show index and shard ID (elastic#25157) await fix testWithRandomException Change BWC versions on create index response Return the index name on a create index response Remove incorrect bwc branch logic from master Correctly format arrays in output [Test] Extending parsing checks for SearchResponse (elastic#25148) Scripting: Change keys for inline/stored scripts to source/id (elastic#25127) [Test] Add test for custom requests in High Level Rest Client (elastic#25106) nested: In case of a single type the _id field should be added to the nested document instead of _uid field. `type` and `id` are lost upon serialization of `Translog.Delete`. (elastic#24586) fix highlighting docs Fix NPE in token_count datatype with null value (elastic#25046) Remove the postings highlighter and make unified the default highlighter choice (elastic#25028) [Test] Adding test for parsing SearchShardFailure leniently (elastic#25144) Fix typo in shards.asciidoc (elastic#25143) List Hibernate Search (elastic#25145) ...
* master: (1889 commits) Test: remove faling test that relies on merge order Log checkout so SHA is known Add link to community Rust Client (elastic#22897) "shard started" should show index and shard ID (elastic#25157) await fix testWithRandomException Change BWC versions on create index response Return the index name on a create index response Remove incorrect bwc branch logic from master Correctly format arrays in output [Test] Extending parsing checks for SearchResponse (elastic#25148) Scripting: Change keys for inline/stored scripts to source/id (elastic#25127) [Test] Add test for custom requests in High Level Rest Client (elastic#25106) nested: In case of a single type the _id field should be added to the nested document instead of _uid field. `type` and `id` are lost upon serialization of `Translog.Delete`. (elastic#24586) fix highlighting docs Fix NPE in token_count datatype with null value (elastic#25046) Remove the postings highlighter and make unified the default highlighter choice (elastic#25028) [Test] Adding test for parsing SearchShardFailure leniently (elastic#25144) Fix typo in shards.asciidoc (elastic#25143) List Hibernate Search (elastic#25145) ...
Fix NPE in token_count datatype with null value (#24928)