Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML-Dataframe] Feature/fib multi aggs and sources #34525

Merged
merged 5 commits into from
Oct 19, 2018

Conversation

hendrikmuhs
Copy link

FEATURE BRANCH PR

implement support for multiple sources and aggregations

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@hendrikmuhs
Copy link
Author

retest this please

@hendrikmuhs hendrikmuhs force-pushed the feature/fib-multi-aggs branch from ed77c6a to cdcfba8 Compare October 17, 2018 07:14
}
return document;
});
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we are needlessly creating a leaky/incomplete abstraction. We should return a stream of objects that knows how to transform themselves into an XContentObject instead of creating a HashMap and then requiring the caller to know how to do the transformation (even thought the caller is us). This will afford us a nice separation of duties and an OK abstraction, though definitely not perfect as aggregations are so flexible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on what is leaky/incomplete?

Using a map is intentional, will add some reasoning as documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that all of this building could be self contained. As it stands, we have multiple functions passing around not well defined objects and just happen to know the inner workings of it. Maybe that is just how aggregations are, but seems to me a very easy way to introduce latent bugs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer the questions:

We have no schema for the output, it could be anything, having that said, this is an internal helper class, not exposed to the outside world. Before it goes out XContent::map ensures it is proper.

Regarding using maps as intermediate format: I checked for a more high-level construct and it turns out, there isn't one. The concept of a document in ingest is Map<String, Object>, so I choose the same here. Eventually we want to support running post-pivot ingest transformations. So this allows us to easily hook in ingest. (LBNL the map makes testing easier, but that isn't the primary reason.)

XContentBuilder builder;
try {
builder = jsonBuilder();
builder.startObject();
builder.field(destinationFieldName, b.getKey().get(destinationFieldName));
builder.field(aggName, aggResult.value());
builder.map(document);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, I think whatever object we return should know how to take a builder and add itself appropriately.

import java.util.Map;
import java.util.stream.Stream;

public class AggregationResultUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this class is only ever going to contain static utility methods it's usual to make it final and add a private no-op constructor.

List<CompositeValuesSourceBuilder<?>> sources, Collection<AggregationBuilder> aggregationBuilders) {
return agg.getBuckets().stream().map(bucket -> {
Map<String, Object> document;
document = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The declaration and initial assignment could all be one statement.

} else {
// should never be reached as we should prevent creating
// jobs with unsupported aggregations
logger.error("Unsupported aggregation, ignoring");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this ever did happen it's going to be frustrating as it doesn't say what the unsupported aggregation was. I think the message should include the type of the aggregation. Also, you could assert we don't get here. (The CI runs with assertions enabled but production runs with assertions disabled.)

public void testExtractCompositeAggregationResults() throws IOException {
String targetField = randomAlphaOfLengthBetween(5, 10);

List<CompositeValuesSourceBuilder<?>> sources = asList(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Arrays.asList with 1 element shows up as a warning in IntelliJ. You can avoid this by using Collections.singletonList for 1 element lists.

@hendrikmuhs hendrikmuhs force-pushed the feature/fib-multi-aggs branch from 4af0d68 to f331658 Compare October 18, 2018 08:30
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hendrikmuhs hendrikmuhs merged commit ad8cc92 into elastic:feature/fib Oct 19, 2018
@hendrikmuhs
Copy link
Author

Merging this to get some progress, I expect this part to be revisited, not the final version.

Note on CI: Broken as the new module is unknown and causes some tests to fail, so no green CI at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :ml Machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants