Remove all fields other than a defined list #83010

Closed
markwalkom opened this issue Jan 25, 2022 · 10 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement good first issue low hanging fruit Team:Data Management Meta label for data/management team

Comments

@markwalkom
Contributor

This came up in a forum thread - https://discuss.elastic.co/t/reindex-identify-fields-to-remove-with-a-regular-expression/295275

The idea: if you want to remove a bunch of fields whose names you may not know (or don't care about), a config option on the remove processor that lets you define the set of fields to keep, thereby removing everything else, would be a simple solution.
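To make the request concrete, here is a minimal sketch of the "keep only these fields" semantics, written against a plain Python dict standing in for a document. The function name `keep_only` and the sample field names are hypothetical, and the sketch only handles top-level fields; it is not how the remove processor is implemented.

```python
from typing import Any, Dict, Iterable


def keep_only(doc: Dict[str, Any], keep: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of doc holding only the top-level fields named in keep.

    Hypothetical helper for illustration; nested fields and document
    metadata are not modeled here.
    """
    keep_set = set(keep)
    return {name: value for name, value in doc.items() if name in keep_set}


# Sample document with two fields we care about and two we do not know in advance.
doc = {"message": "hello", "host": "web-1", "tmp_a": 1, "tmp_b": 2}
trimmed = keep_only(doc, ["message", "host"])
print(trimmed)
```

The caller never has to name `tmp_a` or `tmp_b`, which is the point of the request: the list describes what survives, not what gets dropped.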

@markwalkom markwalkom added >enhancement needs:triage Requires assignment of a team area label labels Jan 25, 2022
@nik9000 nik9000 added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP team-discuss and removed needs:triage Requires assignment of a team area label labels Jan 26, 2022
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jan 26, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@jakelandis
Contributor

Sounds like a good enhancement request. However, we probably don't want to actually use regexes here. Either a new keep processor (the opposite of remove) or a new field all_except_fields (mutually exclusive with fields) would likely be the best implementation.
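The "mutually exclusive" constraint in the second option could be checked roughly as follows. This is a sketch under assumptions: the option names `fields` and `all_except_fields` are taken from the comment above, while `parse_remove_config` and its return shape are hypothetical and do not reflect the processor's actual code or the option names that were eventually merged.

```python
from typing import Any, Dict, List, Tuple


def parse_remove_config(config: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Validate a remove-style config and return (mode, field_names).

    Hypothetical helper: exactly one of 'fields' or 'all_except_fields'
    must be present, mirroring the mutual exclusivity suggested above.
    """
    has_fields = "fields" in config
    has_except = "all_except_fields" in config
    if has_fields == has_except:
        # Both set or neither set: the config is ambiguous or empty.
        raise ValueError(
            "exactly one of 'fields' or 'all_except_fields' must be set"
        )
    if has_fields:
        return "remove", list(config["fields"])
    return "keep", list(config["all_except_fields"])


print(parse_remove_config({"all_except_fields": ["message", "host"]}))
```

Rejecting the both-set case up front avoids having to define which option wins when a field appears in both lists.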

@jakelandis jakelandis added good first issue low hanging fruit and removed team-discuss labels Jan 27, 2022
@harshlancer

If this issue is beginner-friendly I would like to start my open source journey with this issue.
Please tell me if I can start working on this...

@danhermann
Contributor

@harshlancer, you are welcome to start working on this if you would like. I would suggest adding a new option to the existing remove processor rather than creating a whole new processor to perform this function.

@harshlancer

Sorry, I don't think this issue is for a complete newbie...

@zembrzuski
Contributor

Hi @danhermann,
Since @harshlancer is not going to work on this issue, I will give it a try.

@zembrzuski
Contributor

Just to keep you posted: I am already working on the implementation for this issue, and I expect to raise a PR in 2 or 3 days.
Thanks!

zembrzuski added a commit to zembrzuski/elasticsearch that referenced this issue Feb 8, 2022
- Enhancement related to issue 83010 [elastic#83010]
@zembrzuski
Contributor

Hi guys!
I've created a PR to try to address this issue:
Please let me know if I am on track to get it done :)

@danhermann
Contributor

Closed by #83665

@zez3

zez3 commented Sep 8, 2023

Wouldn't it make more sense to have a correct parser (or whatever integration) in the first place, rather than wasting computational resources on ingesting the wrong fields first and deleting them afterwards?

8 participants