Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish between simple matches with and without the terms index #63945

Merged
merged 8 commits into from
Oct 27, 2020

Conversation

romseygeek
Copy link
Contributor

We currently use TextSearchInfo to let query parsers know when a field
will support match queries. Some field types (numeric, constant, range)
can produce simple match queries that don't use the terms index, and
it is useful to distinguish between these fields on the one hand and
keyword/text-type fields on the other. In particular, the
SignificantTextAggregation only works on fields that have indexed terms,
but there is currently no efficient way to see this at search time and so
the factory falls back on checking to see if an index analyzer has been
defined, with the result that some nonsensical field types are permitted.

This commit adds a new static TextSearchInfo implementation called
SIMPLE_MATCH_WITHOUT_TERMS that can be returned by field
types with no corresponding terms index. It changes significant text
to check for this rather than for the presence of an index analyzer.

This is a breaking change, in that the significant text agg will now throw
an error up-front if you try and apply it to a numeric field, whereas before
you would get an empty result.

@romseygeek romseygeek added >bug >breaking :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0 v7.11.0 labels Oct 20, 2020
@romseygeek romseygeek self-assigned this Oct 20, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 20, 2020
Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I stumbled into that oddity when working up tests for SigText after the VS refactoring, but didn't really have an idea how to fix it cleanly at the time :)

Probably best to wait on Mark's review since he knows the agg better, but 👍 from me

Do we need to add a note to the breaking-changes doc page?

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of minor comments/questions, LGTM otherwise

} else {
r = client().prepareSearch("cache_test_idx").setSize(0).addAggregation(significantTerms("foo").field("s")).get();
}
r = client().prepareSearch("cache_test_idx").setSize(0).addAggregation(significantTerms("foo").field("s")).get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we lose coverage with these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, you're right, I had thought this was just testing a useless agg but actually I should change it instead to also have some text and to run the sigText agg over the text field instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@@ -42,7 +42,7 @@
public static final VersionFieldType INSTANCE = new VersionFieldType();

private VersionFieldType() {
super(NAME, false, false, true, TextSearchInfo.SIMPLE_MATCH_ONLY, Collections.emptyMap());
super(NAME, false, false, true, TextSearchInfo.NONE, Collections.emptyMap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have slipped when I double checked all the mappers, maybe it does not have a test so we don't enforce its consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there's no test for VersionFieldMapper - I'll add one as a follow up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

if (ft.getTextSearchInfo() == TextSearchInfo.NONE) {
return false;
}
return ft.getTextSearchInfo() != TextSearchInfo.SIMPLE_MATCH_WITHOUT_TERMS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be return ft.getTextSearchInfo != NONE && gt.getTextSearchIngo != SIMPLE_MATCH_WITHOUT_TERMS ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@romseygeek
Copy link
Contributor Author

Do we need to add a note to the breaking-changes doc page?

We do, but I think I can only do that on backport as we don't have per-7.x-version changes in master?

@markharwood
Copy link
Contributor

markharwood commented Oct 21, 2020

Looks good but I'm not sure about the terms part of naming TextSearchInfo.SIMPLE_MATCH_WITHOUT_TERMS.
While it relates to Lucene's termsEnum it may be misinterpreted as relating to supporting term queries.

Number fields support a term query e.g. this works:

"term":{"num_tags" : "1"}

I recall they actually do a range query behind the scenes.
I don't have a better suggestion for SIMPLE_MATCH_WITHOUT_TERMS currently.

Copy link
Contributor

@markharwood markharwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that checks annotated_text is supported by significant_text?

@polyfractal
Copy link
Contributor

We do, but I think I can only do that on backport as we don't have per-7.x-version changes in master?

Doh, yes, ignore me. :)

@romseygeek
Copy link
Contributor Author

it may be misinterpreted as relating to supporting term queries.

I'm less worried about this, because it's an internal name and I think the javadoc is clear enough?

Is there a test that checks annotated_text is supported by significant_text?

There isn't, I'll see if I can add one.

@romseygeek
Copy link
Contributor Author

Adding aggs tests to annotated text turns out to not be trivial. In the interest of not blocking this PR, I've opened a separate issue to deal with that.

@romseygeek
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample-unix (gradle download failure)

@romseygeek romseygeek merged commit 4191c72 into elastic:master Oct 27, 2020
@romseygeek romseygeek deleted the mapper/numericsearchinfo branch October 27, 2020 12:08
romseygeek added a commit that referenced this pull request Oct 27, 2020
…63945)

We currently use TextSearchInfo to let query parsers know when a field
will support match queries. Some field types (numeric, constant, range)
can produce simple match queries that don't use the terms index, and
it is useful to distinguish between these fields on the one hand and
keyword/text-type fields on the other. In particular, the
SignificantTextAggregation only works on fields that have indexed terms,
but there is currently no efficient way to see this at search time and so
the factory falls back on checking to see if an index analyzer has been
defined, with the result that some nonsensical field types are permitted.

This commit adds a new static TextSearchInfo implementation called
SIMPLE_MATCH_WITHOUT_TERMS that can be returned by field
types with no corresponding terms index. It changes significant text
to check for this rather than for the presence of an index analyzer.

This is a breaking change, in that the significant text agg will now throw
an error up-front if you try and apply it to a numeric field, whereas before
you would get an empty result.
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Jan 13, 2021
romseygeek added a commit that referenced this pull request Jan 13, 2021
romseygeek added a commit that referenced this pull request Jan 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking >bug :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants