Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

token_count datatype should handle null value #25046

Merged
merged 4 commits into from
Jun 9, 2017

Conversation

fred84
Copy link
Contributor

@fred84 fred84 commented Jun 4, 2017

Fix NPE in token_count datatype with null value (#24928)

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fred84 thanks for opening this PR, the test already looks good to me. My only concern at this point is whether the token count of a null value should be 0 or simply be an empty field instead. I will ask around for more opinions on this.

@@ -136,7 +136,7 @@ protected void parseCreateField(ParseContext context, List<IndexableField> field

final int tokenCount;
if (value == null) {
tokenCount = (Integer) fieldType().nullValue();
tokenCount = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure returning 0 if the value is null here is the right thing to do. This would mean we cannot differentiate this from the token count of an empty string (""). I'm not sure what we do in other cases like this, but maybe we shouldn't add any IndexableField at all in this case. Maybe @jpountz or @jimczi have an opinion on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the handling of nullValue. Instead we should check if nullValue() is also null and in that case we can ignore the document completely (return without adding any field), otherwise we should count the number of tokens in the value that replace the null field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi thanks, thats what I thought. @fred84 would you mind changing this and also adding another test that checks values that return a real 0 token count (e.g. empty Stings, or test where the analyzer doesn't return any tokens)?

Copy link
Contributor Author

@fred84 fred84 Jun 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher I've restored nullValue handling and added test


public void testParseNullValue() throws Exception {
DocumentMapper mapper = createIndexWithTokenCountField();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: test is very short, so maybe we don't need any strutcturing empty lines in it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@cbuescher
Copy link
Member

@fred84 there seems to be test failures unrelated to your PR, but I'd like to get a clean CI run before merging this. Can you rebase or merge in master once again so I can kick of another build? It looks like its related to the recent branching of the 5.5 branch. Thanks.

@fred84
Copy link
Contributor Author

fred84 commented Jun 8, 2017

@cbuescher I merged master into my branch.

@cbuescher
Copy link
Member

@fred84 thanks.
@elasticmachine test this please.

@cbuescher cbuescher self-assigned this Jun 8, 2017
@cbuescher
Copy link
Member

@elasticmachine test this again please

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I will merge this to master and the current 5.x branch

@cbuescher cbuescher merged commit dc5aa99 into elastic:master Jun 9, 2017
cbuescher pushed a commit that referenced this pull request Jun 9, 2017
Fixes an issue with the handling of null values for the token_count data type.

Closes #24928
@cbuescher cbuescher added :Search Foundations/Mapping Index mappings, including merging and defining field types >bug v5.6.0 v6.0.0 labels Jun 9, 2017
@cbuescher
Copy link
Member

@fred84 thanks a lot for this fix

@fred84
Copy link
Contributor Author

fred84 commented Jun 9, 2017

@cbuescher thanks for review!

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 10, 2017
* master: (53 commits)
  Log checkout so SHA is known
  Add link to community Rust Client (elastic#22897)
  "shard started" should show index and shard ID (elastic#25157)
  await fix testWithRandomException
  Change BWC versions on create index response
  Return the index name on a create index response
  Remove incorrect bwc branch logic from master
  Correctly format arrays in output
  [Test] Extending parsing checks for SearchResponse (elastic#25148)
  Scripting: Change keys for inline/stored scripts to source/id (elastic#25127)
  [Test] Add test for custom requests in High Level Rest Client (elastic#25106)
  nested: In case of a single type the _id field should be added to the nested document instead of _uid field.
  `type` and `id` are lost upon serialization of `Translog.Delete`. (elastic#24586)
  fix highlighting docs
  Fix NPE in token_count datatype with null value (elastic#25046)
  Remove the postings highlighter and make unified the default highlighter choice (elastic#25028)
  [Test] Adding test for parsing SearchShardFailure leniently (elastic#25144)
  Fix typo in shards.asciidoc (elastic#25143)
  List Hibernate Search (elastic#25145)
  [DOCS] update maxRetryTimeout in java REST client usage page
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 10, 2017
* master: (80 commits)
  Test: remove faling test that relies on merge order
  Log checkout so SHA is known
  Add link to community Rust Client (elastic#22897)
  "shard started" should show index and shard ID (elastic#25157)
  await fix testWithRandomException
  Change BWC versions on create index response
  Return the index name on a create index response
  Remove incorrect bwc branch logic from master
  Correctly format arrays in output
  [Test] Extending parsing checks for SearchResponse (elastic#25148)
  Scripting: Change keys for inline/stored scripts to source/id (elastic#25127)
  [Test] Add test for custom requests in High Level Rest Client (elastic#25106)
  nested: In case of a single type the _id field should be added to the nested document instead of _uid field.
  `type` and `id` are lost upon serialization of `Translog.Delete`. (elastic#24586)
  fix highlighting docs
  Fix NPE in token_count datatype with null value (elastic#25046)
  Remove the postings highlighter and make unified the default highlighter choice (elastic#25028)
  [Test] Adding test for parsing SearchShardFailure leniently (elastic#25144)
  Fix typo in shards.asciidoc (elastic#25143)
  List Hibernate Search (elastic#25145)
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 10, 2017
* master: (1889 commits)
  Test: remove faling test that relies on merge order
  Log checkout so SHA is known
  Add link to community Rust Client (elastic#22897)
  "shard started" should show index and shard ID (elastic#25157)
  await fix testWithRandomException
  Change BWC versions on create index response
  Return the index name on a create index response
  Remove incorrect bwc branch logic from master
  Correctly format arrays in output
  [Test] Extending parsing checks for SearchResponse (elastic#25148)
  Scripting: Change keys for inline/stored scripts to source/id (elastic#25127)
  [Test] Add test for custom requests in High Level Rest Client (elastic#25106)
  nested: In case of a single type the _id field should be added to the nested document instead of _uid field.
  `type` and `id` are lost upon serialization of `Translog.Delete`. (elastic#24586)
  fix highlighting docs
  Fix NPE in token_count datatype with null value (elastic#25046)
  Remove the postings highlighter and make unified the default highlighter choice (elastic#25028)
  [Test] Adding test for parsing SearchShardFailure leniently (elastic#25144)
  Fix typo in shards.asciidoc (elastic#25143)
  List Hibernate Search (elastic#25145)
  ...
@fred84 fred84 deleted the 24928_token_count_mapper_npe branch June 22, 2017 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search Foundations/Mapping Index mappings, including merging and defining field types v5.6.0 v6.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants