[Transform] add support for terms agg in transforms #56696

Merged

benwtrent
Member

This adds support for `terms` and `rare_terms` aggs in transforms.

By default, the results are collapsed as follows:
`<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...`
Or, if no sub-aggs exist:
`<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`
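A minimal sketch of the collapsing rule in plain Java, assuming simplified inputs: the `Bucket` record, `collapse`, and the sub-agg names `avg_time` and `by_user` are hypothetical stand-ins, not the actual `MultiBucketsAggExtractor` or the Elasticsearch aggregation classes. A bucket without sub-aggs maps straight to its doc count; otherwise it maps to an object of sub-agg values, recursing when a sub-agg is itself a list of buckets (nested terms).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermsCollapseSketch {

    // Hypothetical stand-in for a terms/rare_terms bucket: key, doc count,
    // and sub-aggregation results keyed by sub-agg name.
    public record Bucket(String key, long docCount, Map<String, Object> subAggs) {}

    // Builds the <BUCKET_NAME> -> value object stored under <AGG_NAME>.
    public static Map<String, Object> collapse(List<Bucket> buckets) {
        Map<String, Object> nested = new LinkedHashMap<>();
        for (Bucket bucket : buckets) {
            if (bucket.subAggs().isEmpty()) {
                // No sub-aggs: the bucket key maps directly to its doc count.
                nested.put(bucket.key(), bucket.docCount());
            } else {
                // Sub-aggs present: the bucket key maps to an object of sub-agg values.
                Map<String, Object> bucketObject = new LinkedHashMap<>();
                for (Map.Entry<String, Object> sub : bucket.subAggs().entrySet()) {
                    bucketObject.put(sub.getKey(), collapseValue(sub.getValue()));
                }
                nested.put(bucket.key(), bucketObject);
            }
        }
        return nested;
    }

    // Recurses only when a sub-agg result is itself a list of buckets.
    @SuppressWarnings("unchecked")
    private static Object collapseValue(Object value) {
        if (value instanceof List<?> list && list.stream().allMatch(item -> item instanceof Bucket)) {
            return collapse((List<Bucket>) list);
        }
        return value;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(
            new Bucket("AAL", 3, Map.of()),
            new Bucket("JZA", 2, Map.of("avg_time", 12.5)),
            new Bucket("ASA", 1, Map.of("by_user", List.of(new Bucket("u1", 1, Map.of())))));
        // prints {vals={AAL=3, JZA={avg_time=12.5}, ASA={by_user={u1=1}}}}
        System.out.println(Map.of("vals", collapse(buckets)));
    }
}
```

`LinkedHashMap` keeps buckets in the order the aggregation returned them, which makes output easier to compare in tests; a plain `HashMap` would serve equally well for indexing.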

By default, the output field is also mapped as `flattened`. This avoids a mapping explosion (one new field per distinct bucket key) while still providing limited search and aggregation capabilities.
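Every bucket key becomes part of a field path, so with a regular object mapping a high-cardinality `terms` agg would add one mapped field per distinct key. The hypothetical sketch below (plain Java, no Elasticsearch dependency; `FlattenedPathsSketch` and `leafPaths` are illustrative names) lists the dotted leaf paths a collapsed result produces. With a `flattened` mapping, all of these leaves live under a single mapped field (`vals` here) and are indexed as keywords, which is one reason numeric values such as doc counts only get limited search and aggregation support.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenedPathsSketch {

    // Collects every leaf of a nested result object under its dotted path,
    // stringifying the leaf value (a flattened field indexes leaves as keywords).
    public static Map<String, String> leafPaths(String prefix, Map<String, Object> object) {
        Map<String, String> paths = new LinkedHashMap<>();
        collect(prefix, object, paths);
        return paths;
    }

    private static void collect(String path, Object value, Map<String, String> out) {
        if (value instanceof Map<?, ?> map) {
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                collect(path + "." + entry.getKey(), entry.getValue(), out);
            }
        } else {
            out.put(path, String.valueOf(value));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> vals = new LinkedHashMap<>();
        vals.put("AAL", 3L);
        vals.put("JZA", Map.of("avg_time", 12.5));
        // Each distinct bucket key adds a new path; with an object mapping each
        // path would become its own field, with flattened they share "vals".
        // prints {vals.AAL=3, vals.JZA.avg_time=12.5}
        System.out.println(leafPaths("vals", vals));
    }
}
```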

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@benwtrent
Member Author

Note for reviewers: in the backport, I will add a check on PUT that the `flattened` field type is enabled before `terms` or `rare_terms` aggs can be used. The ability to disable `flattened` is removed in 8.0.
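A minimal sketch of the kind of PUT-time check described above. The class, the `flattenedEnabled` flag, `validateAggTypes`, and the error message are all hypothetical; the actual backport code may look quite different.

```java
import java.util.List;
import java.util.Set;

public class FlattenedCheckSketch {

    // Agg types whose output is mapped as flattened by default.
    private static final Set<String> NEEDS_FLATTENED = Set.of("terms", "rare_terms");

    // Hypothetical validation on PUT: reject terms/rare_terms aggs
    // when the flattened field type is disabled.
    public static void validateAggTypes(List<String> aggTypes, boolean flattenedEnabled) {
        if (flattenedEnabled) {
            return;
        }
        for (String type : aggTypes) {
            if (NEEDS_FLATTENED.contains(type)) {
                throw new IllegalArgumentException(
                    "aggregation type [" + type + "] requires the flattened field type to be enabled");
            }
        }
    }

    public static void main(String[] args) {
        validateAggTypes(List.of("avg", "terms"), true); // accepted
        try {
            validateAggTypes(List.of("avg", "terms"), false);
        } catch (IllegalArgumentException e) {
            // prints: aggregation type [terms] requires the flattened field type to be enabled
            System.out.println(e.getMessage());
        }
    }
}
```

Since `flattened` can no longer be disabled in 8.0, a check like this would only be needed on the backport branch.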

@benwtrent force-pushed the feature/ml-add-terms-support-transforms branch from 1c8f47c to c18238c May 13, 2020 16:15

@hendrikmuhs left a comment

Great! It would be good to cover the nesting case; otherwise LGTM.

```diff
@@ -265,12 +265,12 @@ setup:
 "group_by": {
 "time": {"date_histogram": {"fixed_interval": "1h", "field": "time"}}},
 "aggs": {
-"vals": {"terms": {"field":"airline"}}
+"vals": {"significant_terms": {"field":"airline"}}
```


😄
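The swap appears to keep this test exercising an aggregation that transforms still reject, now that `terms` is supported. A hypothetical allow-list illustrating that reading; the set is deliberately tiny and is not the transform's real list of supported aggregations.

```java
import java.util.Set;

public class SupportedAggSketch {

    // Hypothetical allow-list containing a few types, including the two added in this PR.
    private static final Set<String> SUPPORTED = Set.of("avg", "max", "terms", "rare_terms");

    public static boolean isSupported(String aggType) {
        return SUPPORTED.contains(aggType);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("terms"));             // true after this change
        System.out.println(isSupported("significant_terms")); // false: still usable as the unsupported example
    }
}
```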

```diff
@@ -246,6 +250,35 @@ public Object value(Aggregation agg, Map<String, String> fieldTypeMap, String lo
 }
 }

+static class MultiBucketsAggExtractor implements AggValueExtractor {
```


👍 this is almost exactly the way I implemented it, too (well, there is probably no other way).

@benwtrent
Member Author


Great minds think alike.

```java
if (bucket.getAggregations().iterator().hasNext() == false) {
    nested.put(bucket.getKeyAsString(), bucket.getDocCount());
} else {
    HashMap<String, Object> nestedBucketObject = new HashMap<>();
```


It would be good to cover this branch with a test using nested terms aggs, like your common user example broken down by, e.g., businesses or filtered by something.

@benwtrent
Member Author


OK, will do.

@hendrikmuhs left a comment

LGTM!

@benwtrent merged commit fd812d2 into elastic:master May 15, 2020
@benwtrent deleted the feature/ml-add-terms-support-transforms branch May 15, 2020 11:11
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request May 15, 2020
benwtrent added a commit that referenced this pull request May 15, 2020
…56809)

* [Transform] add support for terms agg in transforms (#56696)


4 participants