-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] verify that there are no duplicate leaf fields in aggs #41895
[ML] verify that there are no duplicate leaf fields in aggs #41895
Conversation
Pinging @elastic/ml-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some comments
// TODO this will need changed once we allow multi-bucket aggs + field merging | ||
aggregationConfig.getAggregatorFactories().forEach(agg -> addAggNames(agg, usedNames)); | ||
aggregationConfig.getPipelineAggregatorFactories().forEach(agg -> addAggNames(agg, usedNames)); | ||
usedNames.addAll(groups.getGroups().keySet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might miss something, but wouldn't it be simpler to sort and then compare adjacent name pairs?
(of course you need logic to handle the dots)
} | ||
|
||
for (String fullName : usedNames) { | ||
String[] tokens = fullName.split("\\."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you omit the dots, so what if I have foo.bar.baz
and foobar
? If I get it right, this would fail validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. I need to fix that
} | ||
|
||
|
||
private static void addAggNames(AggregationBuilder aggregationBuilder, List<String> names) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be just Collection<String>
?
aggregationBuilder.getPipelineAggregations().forEach(agg -> addAggNames(agg, names)); | ||
} | ||
|
||
private static void addAggNames(PipelineAggregationBuilder pipelineAggregationBuilder, List<String> names) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: as above, could be just Collection ?
@@ -136,6 +139,74 @@ public void testDoubleAggs() throws IOException { | |||
expectThrows(IllegalArgumentException.class, () -> createPivotConfigFromString(pivot, false)); | |||
} | |||
|
|||
public void testAggNameValidations() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests are good, but I wonder if aggFieldValidation(...)
could be re-factored in a way to test it at the unit test level with less boiler plate and more coverage?
...src/main/java/org/elasticsearch/xpack/core/dataframe/action/PutDataFrameTransformAction.java
Show resolved
Hide resolved
|
||
List<String> validationFailures = new ArrayList<>(); | ||
List<String> usedNames = new ArrayList<>(); | ||
// TODO this will need changed once we allow multi-bucket aggs + field merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"need changed" -> "need to be changed"?
|
||
for (String fullName : usedNames) { | ||
String[] tokens = fullName.split("\\."); | ||
for (int i = tokens.length - 1; i > 0; i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why do you iterate backwards and create a separate StringBuilder in each iteration?
I would go fo something like:
for (String fullName : usedNames) {
String[] tokens = fullName.split("\.");
StringBuilder prefix = new StringBuilder();
for each token:
prefix.append(token)
check in "leafNames"
I believe this code will be both more performant and easier to read. Please LMK if I'm missing something here.
.../core/src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/PivotConfig.java
Show resolved
Hide resolved
assertFalse(fieldValidation.isEmpty()); | ||
assertThat(fieldValidation.get(0), equalTo("field [user] cannot be both an object and a field")); | ||
|
||
pivotAggs = "{" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually it is a good idea to split such a long test method with independent tests into a few (here, 4) shorter methods. This makes the tests more "unit", thus increasing readability.
run elasticsearch-ci/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
...src/main/java/org/elasticsearch/xpack/core/dataframe/action/PutDataFrameTransformAction.java
Show resolved
Hide resolved
.../core/src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/PivotConfig.java
Outdated
Show resolved
Hide resolved
.../core/src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/PivotConfig.java
Show resolved
Hide resolved
.../core/src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/PivotConfig.java
Show resolved
Hide resolved
.../src/test/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/PivotConfigTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…41895) * [ML] verify that there are no duplicate leaf fields in aggs * addressing pr comments * addressing PR comments * optmizing duplication check
…41895) * [ML] verify that there are no duplicate leaf fields in aggs * addressing pr comments * addressing PR comments * optmizing duplication check
This PR adds two validations for the data frame pivot config:
group_by
or theaggs
definitionsfoo.bar.baz
andfoo.bar
The best case scenario before this PR is that we can automatically determine the mapped type and we prevent the transform from even being started. However, if we rely on the dynamic mapping, index mapping failures will spam the logs until the task eventually fails due to the indexing failures.