INGEST: Document Processor Conditional #33388

Merged

original-brownbear
Member

Relates #33188

@original-brownbear original-brownbear added the >docs, :Data Management/Ingest Node, v7.0.0, and v6.5.0 labels Sep 4, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@original-brownbear original-brownbear requested review from talevy, jasontedor, rjernst and jakelandis and removed request for jasontedor September 4, 2018 16:44
[source,js]
--------------------------------------------------
{
"bytes": {
Contributor

Can we use the set processor for this example?

Member Author

Sure that seems less confusing :)

then the processor will be executed for the given document otherwise it will be skipped.
The `if` field takes a map with the script settings used defined in <<script-processor, script-options>>
and accesses a read only version of the document via the same `ctx` variable used by scripts in the
<<script-processor>>.
Contributor

Should we link to the Painless docs?

Do you think it would be useful to also have additional examples (not necessarily the full processor) of how to implement Logstash's `=~`? It seemed to be the most common operator to use for this type of check.

Member Author

Hmm yea it could make sense to add more examples (I agree, it's way too hard to figure out how to do that in Painless from what we currently have in the docs).

WDYT @rjernst ?
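
A minimal sketch of the kind of example being asked for, assuming Painless's find operator (`=~`) as the counterpart to a Logstash regex condition. The field names `message` and `level` are hypothetical, and regex support in Painless is assumed to be enabled on the node (at the time of this PR, via `script.painless.regex.enabled: true` in `elasticsearch.yml`):

```json
{
  "set": {
    "if": "ctx.message != null && ctx.message =~ /error/",
    "field": "level",
    "value": "error"
  }
}
```

The null check guards against documents that lack the field. Painless also has a match operator (`==~`), which requires the whole string to match the pattern rather than finding it anywhere in the string.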

Member

@rjernst rjernst left a comment

Looks good, my only concern is making sure the example works in docs tests.

@@ -721,12 +721,29 @@ All processors are defined in the following way within a pipeline definition:
// NOTCONSOLE

Each processor defines its own configuration parameters, but all processors have
-the ability to declare `tag` and `on_failure` fields. These fields are optional.
+the ability to declare `tag` ,`on_failure` and `if` fields. These fields are optional.
Member

nit: space before comma should be after


A `tag` is simply a string identifier of the specific instantiation of a certain
processor in a pipeline. The `tag` field does not affect the processor's behavior,
but is very useful for bookkeeping and tracing errors to specific processors.

The `if` field must contain a script that returns a boolean value. If the script evaluates to `true`
then the processor will be executed for the given document otherwise it will be skipped.
The `if` field takes a map with the script settings used defined in <<script-processor, script-options>>
Member

`if` doesn't take a map, right? I think you mean "object" (since we are talking about JSON), and that is optional (the example you have below has `if` as the script source directly). You would only need an object if you wanted to pass params (unlikely for ingest, I think) or use a different scripting language (not really possible since expressions don't support this, but theoretically a native script could be written as a custom script engine).
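
For illustration, the two forms described in this comment might look as follows when placed in a pipeline's `processors` array. This is a sketch, not the PR's final example: the parameter name `expected` is hypothetical, and the object form is assumed to accept the same `lang`, `source`, and `params` keys as the script processor.

```json
[
  {
    "set": {
      "if": "ctx.bar == 'expectedValue'",
      "field": "foo",
      "value": "bar"
    }
  },
  {
    "set": {
      "if": {
        "lang": "painless",
        "source": "ctx.bar == params.expected",
        "params": { "expected": "expectedValue" }
      },
      "field": "foo",
      "value": "bar"
    }
  }
]
```

The first processor uses the string shorthand; the second spells out the same condition as an object, which is only needed when passing `params` or naming a script language explicitly.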

Member Author

Right .. "object" wins :)

Hmm on that note and kinda off topic but: maybe we should/could enable expressions here the same way we do for bucket selector aggregation (interpret 1.0 as true) to become consistent there?

Member

I think it would be difficult and not very useful, since expressions only have access to numerics. Expressions operate on doc values, but ingest does not have doc values, so it would require a lot of hacking.

Member Author

ok makes sense :)

}
}
--------------------------------------------------
// NOTCONSOLE
Member

Why is this omitted from tests? Can we make an example that will work?

Member Author

I figured just with the example config we don't have enough to go by for running anything, we'd need to actually index a document against a concrete pipeline here to make a test out of it right?
(that may be a little confusing given we only want an example of the if field here? Idk, you decide :))

Member

IIRC we can have extra setup that is hidden from the generated documentation. I think having the examples always "work" is key to keeping the documentation up to date as apis change.

Member Author

Ok :) Will look into that tomorrow morning :)

Member Author

I can't add a test for a snippet like this, it seems. I get this error:

Execution failed for task ':docs:buildRestTests'.
> reference/ingest/ingest-node.asciidoc[737:745](js)// CONSOLE: Didn't match (?:(?<comment>#.+)|(?:(?:(?<method>GET|PUT|POST|HEAD|OPTIONS|DELETE)\s+(?<pathAndQuery>[^\n]+)(?<body>(?:\n(?!GET|PUT|POST|HEAD|OPTIONS|DELETE|startyaml|#)[^\n]+)+)?)|(?:startyaml(?s)(?<yaml>.+?)(?-s)endyaml)))\n+: {
    "set": {
      "if": "ctx.bar == 'expectedValue'",
      "field": "foo",
      "value": "bar"
    }
  }

if I try anything here. => Seems I at least need a snippet that includes an actual request. That would break with the style of the following docs that also just show a quick outline of the configuration (without tests) for each processor?
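
One shape a testable snippet could take is a full simulate request rather than a bare processor fragment. The sketch below is the JSON body that would be sent to `POST _ingest/pipeline/_simulate`; it reuses the `set` example from the error output, and the two sample documents are made up for illustration. Whether the docs test framework would accept this in place of the `// NOTCONSOLE` fragment is not settled in this thread.

```json
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "ctx.bar == 'expectedValue'",
          "field": "foo",
          "value": "bar"
        }
      }
    ]
  },
  "docs": [
    { "_source": { "bar": "expectedValue" } },
    { "_source": { "bar": "somethingElse" } }
  ]
}
```

Under the semantics described in the docs text, only the first document should come back with `foo` set to `bar`; the second should pass through with the processor skipped.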

@original-brownbear
Member Author

@rjernst @jakelandis

  • Fixed space around comma
  • Fixed wording "map" -> "object"
  • Made the example use "set" instead of "bytes" proc

... but:

  • Tests from the snippet in the form used here seem impossible with our test framework (unless I'm still misunderstanding something about it) ... we could use a different snippet though :)?

@rjernst
Member

rjernst commented Sep 7, 2018

@nik9000 Can you provide any suggestions on making the snippet test discussed here work?

@jakelandis
Contributor

@original-brownbear @rjernst - I think a standalone page (in addition to what is here), something like "Conditional Execution", would be beneficial, with specific mentions of the regex requirements, conditional dropping of events, and conditional pipelines. I can take the TODO to create that content (and implement the tests there). Hopefully that will unblock this PR.

Thoughts?
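
A rough sketch of two of the cases such a page could cover, conditional dropping of events and conditional pipelines. It assumes the `drop` and `pipeline` processors are available alongside this change and that the `pipeline` processor takes a `name` parameter; the field names `level` and `service` and the pipeline name `nginx-pipeline` are hypothetical.

```json
{
  "description": "Sketch: drop debug events, route nginx events to another pipeline",
  "processors": [
    {
      "drop": {
        "if": "ctx.level == 'debug'"
      }
    },
    {
      "pipeline": {
        "if": "ctx.service == 'nginx'",
        "name": "nginx-pipeline"
      }
    }
  ]
}
```

Ordering matters in a sketch like this: a document dropped by the first processor never reaches the second.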

@original-brownbear
Member Author

@jakelandis I think that (standalone page) would make sense. Putting the example scripts and the how-to use the conditional processor could be pretty helpful for people trying to convert a LS config (or just their problem) into our code without having to jump between the Painless docs and pipeline docs :)

Member

@rjernst rjernst left a comment

This looks fine to get in since we need conditional processor docs, but I would still like suggestions from @nik9000 on ways to test this so we don't need NOTCONSOLE.

@original-brownbear
Member Author

original-brownbear commented Oct 23, 2018

@rjernst alright thanks! Merged master into this PR since it's been a while => will merge once green then.

@original-brownbear original-brownbear merged commit f0f7329 into elastic:master Oct 23, 2018
@original-brownbear original-brownbear deleted the ingest-conditional-docs branch October 23, 2018 15:37
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 23, 2018
* master: (24 commits)
  ingest: better support for conditionals with simulate?verbose (elastic#34155)
  [Rollup] Job deletion should be invoked on the allocated task (elastic#34574)
  [DOCS] .Security index is never auto created (elastic#34589)
  CCR: Requires soft-deletes on the follower (elastic#34725)
  re-enable bwc tests (elastic#34743)
  Empty GetAliases authorization fix (elastic#34444)
  INGEST: Document Processor Conditional (elastic#33388)
  [CCR] Add total fetch time leader stat (elastic#34577)
  SQL: Support pattern against compatible indices (elastic#34718)
  [CCR] Auto follow pattern APIs adjustments (elastic#34518)
  [Test] Remove dead code from ExceptionSerializationTests (elastic#34713)
  A small typo in migration-assistance doc (elastic#34704)
  ingest: processor stats (elastic#34724)
  SQL: Implement IN(value1, value2, ...) expression. (elastic#34581)
  Tests: Add checks to GeoDistanceQueryBuilderTests (elastic#34273)
  INGEST: Rename Pipeline Processor Param. (elastic#34733)
  Core: Move IndexNameExpressionResolver to java time (elastic#34507)
  [DOCS] Force Merge: clarify execution and storage requirements (elastic#33882)
  TESTING.asciidoc fix examples using forbidden annotation (elastic#34515)
  SQL: Implement `CONVERT`, an alternative to `CAST` (elastic#34660)
  ...
jakelandis pushed a commit to jakelandis/elasticsearch that referenced this pull request Oct 23, 2018
* INGEST: Document Processor Conditional

Relates elastic#33188
jakelandis pushed a commit that referenced this pull request Oct 24, 2018
* INGEST: Document Processor Conditional

Relates #33188
@jakelandis
Contributor

6.5 backport: fcad4e7

kcm pushed a commit that referenced this pull request Oct 30, 2018
* INGEST: Document Processor Conditional

Relates #33188
Labels
:Data Management/Ingest Node, >docs, v6.5.0, v7.0.0-beta1
5 participants