Drop document in ingest pipeline #23726

Closed
clintongormley opened this issue Mar 23, 2017 · 10 comments · Fixed by #32278
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement

Comments

@clintongormley
Contributor

It'd be really useful to be able to drop a document in an ingest pipeline, i.e. to not index it at all.

Perhaps by setting _index to ""?

@clintongormley clintongormley added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement labels Mar 23, 2017
@dadoonet
Member

Is this for the case of an exception in the pipeline? I mean, we don't have conditionals, so why would you not index a document that you sent for indexing?

@clintongormley
Contributor Author

Conditionals are next ;) (although there are scripts anyway)

But e.g. a document is going to an index that I no longer want to use: I want to be able to drop the document instead.

@dadoonet
Member

I wonder if we should have a drop processor to implement that, as we have the same feature in Logstash: https://www.elastic.co/guide/en/logstash/current/plugins-filters-drop.html

@talevy
Contributor

talevy commented Mar 23, 2017

@clintongormley I know Beats requested this feature a while back... I was not aware that an empty string _index results in a successful non-index?

One way to achieve this is to run a fail processor. The difference here is that we would not return a 2XX for this request, so the client would think there was a problem instead of "dropping was successful".
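
As a rough illustration of the fail-processor workaround, here is a minimal sketch in Python that builds a pipeline body as a plain dictionary. The `fail` processor and its `message` option are the only ingest features used; the helper name, description, and message text are made up for illustration. Nothing here is conditional, so every document sent through such a pipeline would be rejected with an error rather than silently dropped.

```python
import json


def build_fail_pipeline(message):
    """Return an ingest pipeline body whose only step is a `fail` processor."""
    return {
        # Free-text description, purely illustrative.
        "description": "Reject documents instead of indexing them",
        "processors": [
            # The fail processor raises an error carrying this message.
            {"fail": {"message": message}}
        ],
    }


pipeline = build_fail_pipeline("document rejected by pipeline")
print(json.dumps(pipeline, indent=2))
```

The printed JSON is the kind of body one would register as a pipeline; the client then sees a failed item, which is exactly the drawback described above.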

Would ES need to change to support this? I tried indexing a document with an empty _index and got this:

{
  "took": 0,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "",
        "_type": "type",
        "_id": "id",
        "status": 500,
        "error": {
          "type": "string_index_out_of_bounds_exception",
          "reason": "String index out of range: 0"
        }
      }
    }
  ]
}

Maybe we could introduce a new status for index items that are intended to be dropped? Maybe the pipeline can update metadata so that the index request carries a drop flag? We would still have to return a response to the user about this operation, so that the items array aligns with the request body.

We could then respond with:

{
  "took": 884,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "",
        "_type": "type",
        "_id": "id",
        "_version": 1,
        "result": "dropped",
        "created": false,
        "status": 200
      }
    }
  ]
}
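
To show why keeping one item per request line matters, here is a small client-side sketch in Python that tallies bulk items by outcome. It assumes the response shape proposed above, including the hypothetical "dropped" result value, which is a suggestion from this thread and not a confirmed API value; the function name is also made up.

```python
from collections import Counter


def summarize_bulk_items(response):
    """Count bulk items by `result`, using "error" for failed items."""
    counts = Counter()
    for item in response.get("items", []):
        # Each item is keyed by its action type, e.g. {"index": {...}}.
        for action_result in item.values():
            if "error" in action_result:
                counts["error"] += 1
            else:
                counts[action_result.get("result", "unknown")] += 1
    return dict(counts)


# The response proposed in this comment, written as a Python dict.
proposed = {
    "took": 884,
    "errors": False,
    "items": [
        {"index": {"_index": "", "_type": "type", "_id": "id",
                   "_version": 1, "result": "dropped",
                   "created": False, "status": 200}}
    ],
}
print(summarize_bulk_items(proposed))
```

Because every request line still gets an entry in items, a client like this can tell dropped documents apart from failures without any positional bookkeeping.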

@clintongormley
Contributor Author

I was not aware that an empty string _index results in a successful non-index?

No, it was a suggestion. I see the same string length exception that you do.

I like your suggested response for the drop processor.

The one problem is that I don't see how to trigger the drop processor without conditionals. I want to say: "if the index is Foo, then drop this document". There's no easy way to do this. The only conditionals I have are in a script, but then I can't use that to call the drop processor unless I throw some dummy exception to trigger an on_failure handler.

@dadoonet
Member

For now (before we have conditionals, which I was not aware were planned, BTW), on_failure seems to be the way to go.
I wonder if a script could throw a NoOpException that does not print any warning in the logs but just triggers the on_failure handler.

@dadoonet
Member

That said, we can also support both: if _index is empty or null, skip the operation.

@PhaedrusTheGreek
Contributor

+1 for a clean way to drop messages

@talevy
Contributor

talevy commented Mar 15, 2018

Closing due to the lack of infrastructure for properly handling this.

Feel free to re-open if this comes up again.

@dadoonet
Member

Hurray! We are going to support it finally! See #32278
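
For the "if the index is Foo, then drop this document" case raised earlier in the thread, a pipeline body might look like the sketch below, written in Python as a plain dictionary. It assumes an Elasticsearch version that ships both the drop processor and per-processor `if` conditions; the helper name and description are hypothetical, and the index name is assumed to contain no quote characters.

```python
import json


def build_drop_pipeline(index_name):
    """Return a pipeline body that drops documents addressed to `index_name`."""
    return {
        # Free-text description, purely illustrative.
        "description": "Drop documents bound for " + index_name,
        "processors": [
            # `if` holds a script condition; ctx._index is the target index.
            {"drop": {"if": "ctx._index == '" + index_name + "'"}}
        ],
    }


drop_pipeline = build_drop_pipeline("foo")
print(json.dumps(drop_pipeline, indent=2))
```

Unlike the fail-processor approach, a drop is not meant to surface as an error to the client, which is the behaviour originally asked for.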

@original-brownbear original-brownbear self-assigned this Jul 30, 2018
original-brownbear added a commit that referenced this issue Sep 5, 2018
* INGEST: Implement Drop Processor
* Adjust Processor API
* Implement Drop Processor
* Closes #23726
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 18, 2018
* INGEST: Implement Drop Processor
* Adjust Processor API
* Implement Drop Processor
* Closes elastic#23726
original-brownbear added a commit that referenced this issue Oct 19, 2018
* INGEST: Implement Drop Processor (#32278)
* Adjust Processor API
* Closes #23726
5 participants