token_count datatype should handle null value #25046

fred84 · 2017-06-04T06:01:01Z

Fix NPE in token_count datatype with null value (#24928)

elasticmachine · 2017-06-04T06:01:03Z

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

cbuescher

@fred84 thanks for opening this PR, the test already looks good to me. My only concern at this point is whether the token count of a null value should be 0 or simply be an empty field instead. I will ask around for more opinions on this.

cbuescher · 2017-06-06T14:32:00Z

core/src/main/java/org/elasticsearch/index/mapper/TokenCountFieldMapper.java

@@ -136,7 +136,7 @@ protected void parseCreateField(ParseContext context, List<IndexableField> field

        final int tokenCount;
        if (value == null) {
-            tokenCount = (Integer) fieldType().nullValue();
+            tokenCount = 0;


I'm not sure returning 0 if the value is null here is the right thing to do. This would mean we cannot differentiate this from the token count of an empty string (""). I'm not sure what we do in other cases like this, but maybe we shouldn't add any IndexableField at all in this case. Maybe @jpountz or @jimczi have an opinion on this.

jimczi · 2017-06-06

This breaks the handling of nullValue. Instead we should check whether nullValue() is also null, and in that case we can ignore the field completely (return without adding any field). Otherwise we should count the number of tokens in the value that replaces the null field.

cbuescher · 2017-06-06

@jimczi thanks, that's what I thought. @fred84 would you mind changing this and also adding another test that checks values that return a real 0 token count (e.g. empty Strings, or a test where the analyzer doesn't return any tokens)?

fred84 · 2017-06-06

@cbuescher I've restored the nullValue handling and added a test.
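For readers following the thread, the behaviour discussed above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code that was merged: the class `TokenCountNullHandlingSketch`, the whitespace-based `countTokens` helper, and the use of a plain `List<Integer>` in place of `List<IndexableField>` are all hypothetical stand-ins. It assumes, based on the `(Integer) fieldType().nullValue()` cast in the hunk above, that the configured null value is an integer used directly as the count.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenCountNullHandlingSketch {

    /** Hypothetical stand-in for the real analyzer: counts whitespace-separated tokens. */
    static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Hypothetical stand-in for parseCreateField. Returns the "fields" that
     * would be added to the document: an empty list means nothing is indexed.
     *
     * @param value     the field value from the document, possibly null
     * @param nullValue the configured null value, possibly null (assumed Integer)
     */
    static List<Integer> parseCreateField(String value, Integer nullValue) {
        List<Integer> fields = new ArrayList<>();
        if (value == null && nullValue == null) {
            // No value and no replacement configured: add no field at all,
            // so a missing value stays distinguishable from a count of 0.
            return fields;
        }
        // A null value falls back to the configured null value;
        // anything else, including "", is counted normally.
        int tokenCount = (value == null) ? nullValue : countTokens(value);
        fields.add(tokenCount);
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseCreateField(null, null));   // no field added
        System.out.println(parseCreateField("", null));     // real count of 0
        System.out.println(parseCreateField(null, 7));      // configured null value
        System.out.println(parseCreateField("a b c", null)); // three tokens
    }
}
```

The point of the early return is the one raised in review: a null value with no configured replacement produces no field, while an empty string still produces a field with a count of 0.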

cbuescher · 2017-06-06T14:37:08Z

core/src/test/java/org/elasticsearch/index/mapper/TokenCountFieldMapperTests.java

+
+    public void testParseNullValue() throws Exception {
+        DocumentMapper mapper = createIndexWithTokenCountField();
+


nit: the test is very short, so maybe we don't need any structuring empty lines in it.

cbuescher · 2017-06-07T16:29:26Z

@fred84 there seem to be test failures unrelated to your PR, but I'd like to get a clean CI run before merging this. Can you rebase or merge in master once again so I can kick off another build? It looks like it's related to the recent branching of the 5.5 branch. Thanks.

fred84 · 2017-06-08T06:59:39Z

@cbuescher I merged master into my branch.

cbuescher · 2017-06-08T08:03:05Z

@fred84 thanks.
@elasticmachine test this please.

cbuescher · 2017-06-08T13:58:17Z

@elasticmachine test this again please

cbuescher

LGTM, I will merge this to master and the current 5.x branch

Fixes an issue with the handling of null values for the token_count data type. Closes #24928

cbuescher · 2017-06-09T12:20:16Z

@fred84 thanks a lot for this fix

fred84 · 2017-06-09T12:58:51Z

@cbuescher thanks for the review!

* master: (53 commits) Log checkout so SHA is known Add link to community Rust Client (elastic#22897) "shard started" should show index and shard ID (elastic#25157) await fix testWithRandomException Change BWC versions on create index response Return the index name on a create index response Remove incorrect bwc branch logic from master Correctly format arrays in output [Test] Extending parsing checks for SearchResponse (elastic#25148) Scripting: Change keys for inline/stored scripts to source/id (elastic#25127) [Test] Add test for custom requests in High Level Rest Client (elastic#25106) nested: In case of a single type the _id field should be added to the nested document instead of _uid field. `type` and `id` are lost upon serialization of `Translog.Delete`. (elastic#24586) fix highlighting docs Fix NPE in token_count datatype with null value (elastic#25046) Remove the postings highlighter and make unified the default highlighter choice (elastic#25028) [Test] Adding test for parsing SearchShardFailure leniently (elastic#25144) Fix typo in shards.asciidoc (elastic#25143) List Hibernate Search (elastic#25145) [DOCS] update maxRetryTimeout in java REST client usage page ...


Fix NPE in token_count datatype with null value (elastic#24928)

574fb4a

cbuescher requested changes Jun 6, 2017


fred84 added 2 commits June 6, 2017 18:58

differentiate null values and empty values in token_count field datatype

25a5247

Merge branch 'master' into 24928_token_count_mapper_npe

7cc69b4

Merge branch 'master' into 24928_token_count_mapper_npe

8372f73

cbuescher self-assigned this Jun 8, 2017

cbuescher approved these changes Jun 9, 2017


cbuescher merged commit dc5aa99 into elastic:master Jun 9, 2017

cbuescher pushed a commit that referenced this pull request Jun 9, 2017

Fix NPE in token_count datatype with null value (#25046)

d188f55

Fixes an issue with the handling of null values for the token_count data type. Closes #24928

cbuescher added :Search Foundations/Mapping Index mappings, including merging and defining field types >bug v5.6.0 v6.0.0 labels Jun 9, 2017

fred84 deleted the 24928_token_count_mapper_npe branch June 22, 2017 19:09

colings86 added v6.0.0-beta1 and removed v6.0.0 labels Aug 3, 2017
