-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML-Dataframe] Feature/fib multi aggs and sources #34525
[ML-Dataframe] Feature/fib multi aggs and sources #34525
Conversation
Pinging @elastic/ml-core |
retest this please |
ed77c6a
to
cdcfba8
Compare
} | ||
return document; | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we are needlessly creating a leaky/incomplete abstraction. We should return a stream of objects that knows how to transform themselves into an XContentObject
instead of creating a HashMap
and then requiring the caller to know how to do the transformation (even thought the caller is us). This will afford us a nice separation of duties and an OK abstraction, though definitely not perfect as aggregations are so flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on what is leaky/incomplete?
Using a map is intentional, will add some reasoning as documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that all of this building could be self contained. As it stands, we have multiple functions passing around not well defined objects and just happen to know the inner workings of it. Maybe that is just how aggregations are, but seems to me a very easy way to introduce latent bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer the questions:
We have no schema for the output, it could be anything, having that said, this is an internal helper class, not exposed to the outside world. Before it goes out XContent::map
ensures it is proper.
Regarding using maps as intermediate format: I checked for a more high-level construct and it turns out, there isn't one. The concept of a document in ingest is Map<String, Object>
, so I choose the same here. Eventually we want to support running post-pivot ingest transformations. So this allows us to easily hook in ingest. (LBNL the map makes testing easier, but that isn't the primary reason.)
XContentBuilder builder; | ||
try { | ||
builder = jsonBuilder(); | ||
builder.startObject(); | ||
builder.field(destinationFieldName, b.getKey().get(destinationFieldName)); | ||
builder.field(aggName, aggResult.value()); | ||
builder.map(document); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, I think whatever object we return should know how to take a builder and add itself appropriately.
import java.util.Map; | ||
import java.util.stream.Stream; | ||
|
||
public class AggregationResultUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this class is only ever going to contain static
utility methods it's usual to make it final
and add a private
no-op constructor.
List<CompositeValuesSourceBuilder<?>> sources, Collection<AggregationBuilder> aggregationBuilders) { | ||
return agg.getBuckets().stream().map(bucket -> { | ||
Map<String, Object> document; | ||
document = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The declaration and initial assignment could all be one statement.
} else { | ||
// should never be reached as we should prevent creating | ||
// jobs with unsupported aggregations | ||
logger.error("Unsupported aggregation, ignoring"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this ever did happen it's going to be frustrating as it doesn't say what the unsupported aggregation was. I think the message should include the type of the aggregation. Also, you could assert
we don't get here. (The CI runs with assertions enabled but production runs with assertions disabled.)
public void testExtractCompositeAggregationResults() throws IOException { | ||
String targetField = randomAlphaOfLengthBetween(5, 10); | ||
|
||
List<CompositeValuesSourceBuilder<?>> sources = asList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Arrays.asList
with 1 element shows up as a warning in IntelliJ. You can avoid this by using Collections.singletonList
for 1 element lists.
4af0d68
to
f331658
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merging this to get some progress, I expect this part to be revisited, not the final version. Note on CI: Broken as the new module is unknown and causes some tests to fail, so no green CI at the moment. |
FEATURE BRANCH PR
implement support for multiple sources and aggregations