Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RemoveProcessor updated to support fieldsToKeep #83665

Merged
merged 10 commits into from
Mar 7, 2022

Conversation

zembrzuski
Copy link
Contributor

@zembrzuski zembrzuski commented Feb 8, 2022

Relates to #83010

- Enhancement related to issue 83010 [elastic#83010]
@elasticsearchmachine elasticsearchmachine added v8.2.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Feb 8, 2022
@danhermann danhermann self-assigned this Feb 8, 2022
@danhermann danhermann added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Feb 8, 2022
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Feb 8, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@danhermann
Copy link
Contributor

@elasticmachine ok to test

@danhermann
Copy link
Contributor

@zembrzuski, thank you for working on this. In thinking a bit about it, I think the fields to keep should always include the metadata fields on the document. These are necessary for ingestion to work properly and their removal causes a variety of problems. I'll probably have a few more suggestions after a more thorough review, but I do know this change will be necessary.

@zembrzuski
Copy link
Contributor Author

@zembrzuski, thank you for working on this. In thinking a bit about it, I think the fields to keep should always include the metadata fields on the document. These are necessary for ingestion to work properly and their removal causes a variety of problems. I'll probably have a few more suggestions after a more thorough review, but I do know this change will be necessary.

Hi @danhermann
I have just pushed another patch. Now, only fields that are not metadata are eligible for being removed from the document. I also added some unit tests to make sure that every metadata is present in the copy of the document.

Tks so much for reviewing my PR. This is one of my first PRs, and I am very excited about contributing to ElasticSearch.

@danhermann
Copy link
Contributor

@elasticmachine update branch

Copy link
Contributor

@danhermann danhermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zembrzuski, thank you for working on this. It looks pretty good. There are just a few changes to be made below and then I think we'll be able to get it merged.

Comment on lines 47 to 51
RemoveProcessor(String tag, String description, List<TemplateScript.Factory> fieldsToRemove, boolean ignoreMissing) {
super(tag, description);
this.fieldsToRemove = new ArrayList<>(fieldsToRemove);
this.fieldsToKeep = new ArrayList<>();
this.ignoreMissing = ignoreMissing;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, this constructor has only one use in a test class. I'd suggest removing it and updating that one reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

.filter(documentField -> IngestDocument.Metadata.isMetadata(documentField) == false)
.filter(documentField -> shouldKeep(documentField, fieldsToKeep, document) == false)
.forEach(documentField -> removeWhenPresent(document, documentField));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an else block here. I know that one of either fieldsToKeep or fieldsToRemove must be empty, but I prefer not to skip the entire block rather than iterate over an empty list in the latter case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 882 to 884
public static boolean isMetadata(String field) {
return Arrays.stream(Metadata.values()).anyMatch(e -> e.fieldName.equals(field));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's statically initialize a list or set of metadata field names so we're not creating a new stream every time the processor is executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 801 to 803
if (v instanceof Map) {
allFields.addAll(getAllFields((Map) v, prefix + k + "."));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: let's use instanceof pattern matching here

Suggested change
if (v instanceof Map) {
allFields.addAll(getAllFields((Map) v, prefix + k + "."));
}
if (v instanceof Map mapValue) {
allFields.addAll(getAllFields(mapValue, prefix + k + "."));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 108 to 118
if (compiledTemplatesToRemove.isEmpty() && compiledTemplatesToKeep.isEmpty()) {
throw new IllegalArgumentException(
"missing field [processors.remove.keep] or [processors.remove.field]. Please specify one of them."
);
}

if (compiledTemplatesToRemove.isEmpty() == false && compiledTemplatesToKeep.isEmpty() == false) {
throw new IllegalArgumentException(
"Too many fields specified. Please specify either [processors.remove.keep] or [processors.remove.field]."
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of consistency, let's use the same exception and message style as other processors with mutually exclusive configuration options. See the network direction processor for an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -0,0 +1,5 @@
pr: 83665
summary: "RemoveProcessor updated to support fieldsToKeep"
area: Infra/Scripting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this falls into the "ingest" category:

Suggested change
area: Infra/Scripting
area: Ingest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@zembrzuski zembrzuski force-pushed the issue_83010 branch 2 times, most recently from 23e71f2 to 42f610a Compare March 2, 2022 18:25
@zembrzuski
Copy link
Contributor Author

Hi @danhermann
Thanks for reviewing this PR! I've just pushed the code changes asked in the code review.

@danhermann
Copy link
Contributor

@zembrzuski, I think you just need to update the error message in the RemoveProcessorFactory tests so they pass.

@zembrzuski
Copy link
Contributor Author

@zembrzuski, I think you just need to update the error message in the RemoveProcessorFactory tests so they pass.

Ohh, my bad. I've just fixed these tests.

@danhermann
Copy link
Contributor

@elasticmachine update branch

@danhermann
Copy link
Contributor

@elasticmachine run elasticsearch-ci/rest-compatibility

@danhermann
Copy link
Contributor

@zembrzuski, everything here looks good. One minor change needed and then we can get this merged:

On the following two pairs of lines, processors.remove.keep should just be keep and processors.remove.field should just be field:
https://github.com/elastic/elasticsearch/pull/83665/files#diff-e44059a95b4557dd2e3c310c64cca96ce1c0dc1f479cfa40c2c32634a91c8512R116-R117

https://github.com/elastic/elasticsearch/pull/83665/files#diff-e44059a95b4557dd2e3c310c64cca96ce1c0dc1f479cfa40c2c32634a91c8512R125-R126

That will change the error messages a bit, so a couple of the unit tests will likely need to be updated accordingly.

@zembrzuski
Copy link
Contributor Author

Hi @danhermann
I've just updated the messages and the unit tests! Tks!

@danhermann
Copy link
Contributor

Hi @danhermann I've just updated the messages and the unit tests! Tks!

Thanks, @zembrzuski. It looks like the formatting check is complaining about a couple lines now. You should be able to fix that by running ./gradlew :modules:ingest-common:spotlessApply.

@zembrzuski
Copy link
Contributor Author

Hi @danhermann I've just updated the messages and the unit tests! Tks!

Thanks, @zembrzuski. It looks like the formatting check is complaining about a couple lines now. You should be able to fix that by running ./gradlew :modules:ingest-common:spotlessApply.

Ohh, I'm sorry. I've read about checkstyle in the contribution guideline but I forgot to check it this time.

@danhermann
Copy link
Contributor

Looks great, @zembrzuski! I've merged this in and it will be available in the next release. Thanks again for your contribution!

@danhermann danhermann merged commit eadb566 into elastic:master Mar 7, 2022
@ddolcimascolo
Copy link

Hey guys, so this landed in 8.2 right? Is it possible that the documentation was forgotten? I can't find anything related to the new keep configuration option in https://www.elastic.co/guide/en/elasticsearch/reference/current/remove-processor.html

I'll give it a try anyway :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants