I noticed behavior that was unexpected (at least to me) in the field resolution of the significant_terms aggregation when the document type is prepended to the field name. In ES 1.1.0 this leads to an NPE:
java.lang.NullPointerException
    at org.elasticsearch.search.aggregations.bucket.significant.SignificantTermsAggregatorFactory.getBackgroundFrequency(SignificantTermsAggregatorFactory.java:190)
    at org.elasticsearch.search.aggregations.bucket.significant.SignificantStringTermsAggregator.buildAggregation(SignificantStringTermsAggregator.java:87)
    at org.elasticsearch.search.aggregations.bucket.significant.SignificantStringTermsAggregator$WithOrdinals.buildAggregation(SignificantStringTermsAggregator.java:129)
    at org.elasticsearch.search.aggregations.AggregationPhase.execute(AggregationPhase.java:135)
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:136)
    ...
In the current master, the same request instead produces "Infinity" scores.
Thanks for raising this issue.
Adding support for doctype prefixes in field names probably brings the expectation that background frequencies are also filtered by the doc type. For example, if an index contains tweet and email doc types, then tweet.text should filter out any counts for email.text occurrences in the single indexed text field held by the ES index.
This scenario will be expensive, as we'll need to drop down a level into the postings to count docs that match a doctype filter (ideally only if the index contains more than one doctype with that field).
I'm not sure what would happen if the query is on an unqualified text field (so querying both email and tweet doc types) but the significant_terms analysis is requested on a qualified email.text field. A quick test with a plain terms agg suggests the counts produced in this case are not filtered by doc type. So if the foreground stats obtained from the FieldData cache are unfiltered, then there is a case for making the background stats unfiltered too.
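To make the trade-off concrete, here is a minimal sketch of the two ways a background frequency could be counted. It is not Elasticsearch code: the `Doc` class, the sample index, and both method names are hypothetical, and an in-memory list stands in for the postings of the single shared text field.

```java
import java.util.Arrays;
import java.util.List;

public class BackgroundFrequencySketch {
    // Hypothetical stand-in for an indexed document: its doc type plus the
    // terms of the single shared "text" field.
    static final class Doc {
        final String type;
        final List<String> terms;

        Doc(String type, String... terms) {
            this.type = type;
            this.terms = Arrays.asList(terms);
        }
    }

    // Unfiltered background frequency: every doc containing the term counts,
    // whatever its type. This mirrors a plain document-frequency lookup.
    static long unfiltered(List<Doc> index, String term) {
        return index.stream().filter(d -> d.terms.contains(term)).count();
    }

    // Filtered background frequency: only docs of the requested type count,
    // so each matching doc has to be checked individually.
    static long filtered(List<Doc> index, String type, String term) {
        return index.stream()
                .filter(d -> d.type.equals(type) && d.terms.contains(term))
                .count();
    }

    // Small sample index with two doc types sharing one text field.
    static List<Doc> sampleIndex() {
        return Arrays.asList(
                new Doc("tweet", "elasticsearch", "rocks"),
                new Doc("email", "elasticsearch", "meeting"),
                new Doc("email", "meeting", "agenda"));
    }

    public static void main(String[] args) {
        List<Doc> index = sampleIndex();
        System.out.println(unfiltered(index, "elasticsearch"));
        System.out.println(filtered(index, "tweet", "elasticsearch"));
    }
}
```

With this sample data the unfiltered count for "elasticsearch" is 2 and the count restricted to the tweet type is 1, which is the gap a user of tweet.text would presumably expect to see reflected in the background stats.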
Hi,
I noticed behavior that was unexpected (at least to me) in the field resolution of the significant_terms aggregation when the document type is prepended to the field name. Using the type-qualified field name instead of the plain field name leads to an NPE in ES 1.1.0, and in the current master it leads to "Infinity" scores.
Without the document type it works fine in both ES versions. Here is a gist to reproduce it: https://gist.github.com/hkorte/9974567
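One possible explanation for the "Infinity" scores, offered as an assumption rather than something confirmed in this thread: if the score divides the foreground rate by the background rate, and the background frequency for the type-qualified field resolves to zero, the floating-point division yields Infinity. The sketch below uses a hypothetical ratio-based score of the form (fg - bg) * (fg / bg); the class, method, and parameter names are illustrative and not taken from the Elasticsearch source.

```java
public class SignificanceScoreSketch {
    // Hypothetical ratio-based score: absolute change in term probability
    // multiplied by the relative change between foreground and background.
    static double score(long subsetFreq, long subsetSize,
                        long supersetFreq, long supersetSize) {
        if (subsetSize == 0 || supersetSize == 0) {
            return 0.0; // no documents to compare against
        }
        double foreground = (double) subsetFreq / subsetSize;
        double background = (double) supersetFreq / supersetSize;
        double absoluteChange = foreground - background;
        if (absoluteChange <= 0) {
            return 0.0; // term is not over-represented in the foreground
        }
        // Dividing a positive double by 0.0 gives Infinity under IEEE 754,
        // so a background frequency of zero propagates into the score.
        double relativeChange = foreground / background;
        return absoluteChange * relativeChange;
    }

    public static void main(String[] args) {
        // Background frequency resolved correctly: finite score.
        System.out.println(score(5, 10, 20, 1000));
        // Background frequency resolved to zero: score becomes Infinity.
        System.out.println(score(5, 10, 0, 1000));
    }
}
```

If this is indeed the mechanism, the NPE in 1.1.0 and the Infinity in master could be two symptoms of the same failed background lookup for the qualified field name.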