INGEST: Add Pipeline Processor #32473
* Adds Processor capable of invoking other pipelines
* Closes elastic#31842
Pinging @elastic/es-core-infra
@rjernst @tsg @talevy @jakelandis just a suggestion for the inter-pipeline communication we talked about a while back. Maybe take a look at whether this kind of API is what we're looking for here; if so, I'd add tests and clean things up a bit. This version already works fine functionally, at least I think :)
I think there are a couple things needed before this could be merged:
Additionally, we need to think about how this can be exposed as a method in Painless. I've started discussions on tweaking how SPI works to possibly allow functors to be bound to a method in Painless through the whitelist, so that the method call to say
Sure, that's a trivial
Definitely, that sounds good but would be a much larger refactoring. I just went with the most straightforward approach here for the POC. I can look into that refactoring if this is something we want now?
I like the approach; it is pretty intuitive. I assume that if you have a pipeline processor in the middle of your (outer) pipeline, control will eventually come back to the original (outer) pipeline and finish out that pipeline's execution? Also, we should probably detect loops, preferably at time of creation... however, that will be much more complicated when we get the
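To make that control flow concrete, here is a minimal sketch in Java. `Document`, `Processor`, `Pipeline`, `PipelineProcessor` and the `set` helper are hypothetical stand-ins, not the Elasticsearch ingest classes. Because the pipeline processor simply runs the inner pipeline to completion on the same document, control returns to the outer pipeline's loop, which then executes its remaining processors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class NestedPipelineSketch {

    // Hypothetical stand-in for an ingest document: a mutable field map plus a trace.
    static final class Document {
        final Map<String, Object> fields = new HashMap<>();
        final List<String> trace = new ArrayList<>(); // names of fields set, in order
    }

    // Hypothetical processor contract: mutate the document in place.
    interface Processor {
        void execute(Document doc);
    }

    // A pipeline runs its processors one after another on the same document.
    static final class Pipeline {
        final List<Processor> processors;

        Pipeline(List<Processor> processors) {
            this.processors = processors;
        }

        void execute(Document doc) {
            for (Processor p : processors) {
                p.execute(doc);
            }
        }
    }

    // Delegates to another pipeline. When the inner pipeline returns, the
    // outer pipeline's loop simply continues with its next processor.
    static final class PipelineProcessor implements Processor {
        final Pipeline target;

        PipelineProcessor(Pipeline target) {
            this.target = target;
        }

        @Override
        public void execute(Document doc) {
            target.execute(doc);
        }
    }

    // Hypothetical helper: a processor that sets a field and records that it ran.
    static Processor set(String field, Object value) {
        return doc -> {
            doc.fields.put(field, value);
            doc.trace.add(field);
        };
    }

    // Runs outer = [set "before", call inner, set "after"] and returns the trace.
    static List<String> runDemo() {
        Pipeline inner = new Pipeline(List.of(set("inner_field", true)));
        Pipeline outer = new Pipeline(List.of(
                set("before", 1),
                new PipelineProcessor(inner),
                set("after", 2)));
        Document doc = new Document();
        outer.execute(doc);
        return doc.trace;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [before, inner_field, after]
    }
}
```

`runDemo()` returns the order in which fields were set: the outer pipeline's first processor, then the inner pipeline, then the outer pipeline's last processor.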
yea that should work just fine and transparently :)
Yea, this is a good one: I guess detecting loops is impossible if we also want to factor in conditionals (that's hypercomputation imo :P). I think theoretically we have the choice between not detecting this and allowing some crazy recursive pipelines, or simply walking the pipeline graph and not allowing cycles even if conditionals are present.
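The second option, walking the pipeline graph and rejecting any cycle even when conditionals are present, is plain cycle detection on a directed graph. A minimal sketch, assuming pipelines are keyed by id and that every pipeline-processor reference counts as an edge whether or not it sits behind a condition (`PipelineCycleCheck` and `hasCycle` are hypothetical names):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class PipelineCycleCheck {

    private enum State { VISITING, DONE }

    /**
     * Returns true if the reference graph contains a cycle.
     * refs maps a pipeline id to the ids of the pipelines it invokes; every
     * reference is treated as an edge, conditional or not.
     */
    static boolean hasCycle(Map<String, List<String>> refs) {
        Map<String, State> state = new HashMap<>();
        for (String id : refs.keySet()) {
            if (visit(id, refs, state)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first search; reaching a VISITING node means we found a back edge.
    private static boolean visit(String id, Map<String, List<String>> refs, Map<String, State> state) {
        State s = state.get(id);
        if (s == State.VISITING) {
            return true; // id is already on the current DFS path
        }
        if (s == State.DONE) {
            return false; // fully explored earlier, no cycle through it
        }
        state.put(id, State.VISITING);
        for (String next : refs.getOrDefault(id, List.of())) {
            if (visit(next, refs, state)) {
                return true;
            }
        }
        state.put(id, State.DONE);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ok = Map.of("outer", List.of("inner"), "inner", List.of());
        Map<String, List<String>> loop = Map.of("a", List.of("b"), "b", List.of("a"));
        System.out.println(hasCycle(ok));   // false
        System.out.println(hasCycle(loop)); // true
    }
}
```

The check is deliberately conservative: it rejects a cycle even if a condition would never let it be taken at runtime, which sidesteps the undecidability mentioned above.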
We could, in addition to walking the pipeline at creation, provide runtime checks such that no single processor can execute more than N times in the context of a single document (a circuit breaker of sorts). This would, however, require maintaining some additional per-document state accessible to all processors in the pipeline. I think this may be warranted, since "shooting yourself in the foot" here could mean a full stop of all ingest.
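A minimal sketch of such a per-document circuit breaker, assuming a hypothetical `ExecutionBudget` object created once per document and consulted before each processor runs; the limit is an arbitrary constructor parameter, not an existing Elasticsearch setting. Counting by processor identity (`IdentityHashMap`) keeps two distinct processors with equal configuration from sharing a counter.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public final class ExecutionBudget {

    private final int maxExecutionsPerProcessor;
    // Keyed by processor identity, so distinct processor instances are counted separately.
    private final Map<Object, Integer> counts = new IdentityHashMap<>();

    ExecutionBudget(int maxExecutionsPerProcessor) {
        this.maxExecutionsPerProcessor = maxExecutionsPerProcessor;
    }

    /** Call before running a processor on this document; trips once the limit is exceeded. */
    void recordExecution(Object processor) {
        int n = counts.merge(processor, 1, Integer::sum);
        if (n > maxExecutionsPerProcessor) {
            throw new IllegalStateException("processor executed " + n
                    + " times for one document (limit " + maxExecutionsPerProcessor + ")");
        }
    }

    public static void main(String[] args) {
        ExecutionBudget budget = new ExecutionBudget(3);
        Object processor = new Object();
        try {
            for (int i = 0; i < 10; i++) {
                budget.recordExecution(processor); // simulates a processor caught in a loop
            }
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```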
Right, now that you mention it... we have to have this imo. You can update pipelines, so an up-front check isn't sufficient to prevent infinite loops.
I don't think we should have or need stack depth limits like that. Why not simply disallow recursion altogether? If we use IngestDocument to track the stack and do not allow any recursion, then an identity hash set of the processors being executed should work: when a processor starts, it adds itself to the set (erroring if it is already there), then removes itself when it is done.
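A minimal sketch of this identity-set guard, assuming a hypothetical `Document` that carries the set of processors currently executing (this is not the actual `IngestDocument` API). `Collections.newSetFromMap(new IdentityHashMap<>())` yields a set that compares by reference, and the `finally` block removes the entry so the same processor can still run again sequentially on the same document. Only the pipeline-invoking processor needs the check in this sketch, since it is the only processor that can re-enter a pipeline.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public final class RecursionGuardSketch {

    static final class Document {
        // Processors currently on the execution stack for this document, compared by identity.
        final Set<Processor> executing = Collections.newSetFromMap(new IdentityHashMap<>());
    }

    interface Processor {
        void execute(Document doc);
    }

    static final class Pipeline {
        List<Processor> processors = List.of();

        void execute(Document doc) {
            for (Processor p : processors) {
                p.execute(doc);
            }
        }
    }

    static final class PipelineProcessor implements Processor {
        final Pipeline target;

        PipelineProcessor(Pipeline target) {
            this.target = target;
        }

        @Override
        public void execute(Document doc) {
            // Adding fails if this processor is already running for this document: recursion.
            if (doc.executing.add(this) == false) {
                throw new IllegalStateException("recursive pipeline invocation detected");
            }
            try {
                target.execute(doc);
            } finally {
                doc.executing.remove(this); // done: allow later, non-nested use
            }
        }
    }

    public static void main(String[] args) {
        Pipeline a = new Pipeline();
        Pipeline b = new Pipeline();
        a.processors = List.of(new PipelineProcessor(b));
        b.processors = List.of(new PipelineProcessor(a)); // a -> b -> a
        try {
            a.execute(new Document());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```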
I'm fine either way. I don't see a good use case for recursion.
* Moves all pipeline state into the ingest service
* Retains the existing pipeline store and pipeline execution service as inner classes to make the review easier, they should be flattened out in the next step
* All tests for these classes were copied (and adapted) to the ingest service tests
* This is a refactoring step to enable a clean implementation of a pipeline processor (See elastic#32473)
From the user perspective, I think this is a wonderful enhancement, enabling, for example, code reuse between Beats modules. It also fits nicely with the per-processor conditionals. I guess it's going to be possible to add a condition to the +1 on forbidding recursion; that seems like the right decision to me.
* INGEST: Move all Pipeline State into IngestService
  * Moves all pipeline state into the ingest service
  * Retains the existing pipeline store and pipeline execution service as inner classes to make the review easier, they should be flattened out in the next step
  * All tests for these classes were copied (and adapted) to the ingest service tests
  * This is a refactoring step to enable a clean implementation of a pipeline processor (See #32473)
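With all pipeline state held by one service, a pipeline processor can resolve its target by id when it executes. A minimal sketch under that assumption: `Registry` is a hypothetical stand-in for the ingest service (not its real API), and the processor stores only the target id, so updating the inner pipeline changes what the outer pipeline does on the next document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class PipelineRegistrySketch {

    interface Processor {
        void execute(Map<String, Object> doc);
    }

    static final class Pipeline {
        final List<Processor> processors;

        Pipeline(List<Processor> processors) {
            this.processors = processors;
        }

        void execute(Map<String, Object> doc) {
            for (Processor p : processors) {
                p.execute(doc);
            }
        }
    }

    // Hypothetical stand-in for the service that owns all pipeline state, keyed by id.
    static final class Registry {
        private final Map<String, Pipeline> pipelines = new ConcurrentHashMap<>();

        void put(String id, Pipeline pipeline) {
            pipelines.put(id, pipeline); // replaces any previous definition
        }

        Pipeline get(String id) {
            return pipelines.get(id);
        }
    }

    // Holds only the target id; the lookup happens on every execution.
    static final class PipelineProcessor implements Processor {
        private final Registry registry;
        private final String pipelineId;

        PipelineProcessor(Registry registry, String pipelineId) {
            this.registry = registry;
            this.pipelineId = pipelineId;
        }

        @Override
        public void execute(Map<String, Object> doc) {
            Pipeline target = registry.get(pipelineId);
            if (target == null) {
                throw new IllegalArgumentException("pipeline [" + pipelineId + "] does not exist");
            }
            target.execute(doc);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.put("inner", new Pipeline(List.of(doc -> doc.put("version", 1))));
        registry.put("outer", new Pipeline(List.of(new PipelineProcessor(registry, "inner"))));

        Map<String, Object> first = new HashMap<>();
        registry.get("outer").execute(first);

        // Updating "inner" changes what "outer" does on the next document.
        registry.put("inner", new Pipeline(List.of(doc -> doc.put("version", 2))));
        Map<String, Object> second = new HashMap<>();
        registry.get("outer").execute(second);

        System.out.println(first + " " + second); // {version=1} {version=2}
    }
}
```

A lookup per execution also means an update can introduce a loop that did not exist when the outer pipeline was created, which is why the thread argues for a runtime check rather than only an up-front one.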
A couple of comments, and yes, unit tests would be good.
import java.util.HashMap;
import java.util.Map;

public final class PipelineHolder {
Was this accidentally added back? It shouldn't be necessary now, right?
@rjernst yea I apparently suck at merging :) Will remove
 * @throws Exception On exception in pipeline execution
 */
public boolean executePipeline(Pipeline pipeline) throws Exception {
    if (this.executedPipelines.add(pipeline) == false) {
Instead of using a set, I think a call stack could be passed through that doesn't need to be a member variable? I don't like IngestDocument becoming more mutable than it already is. This method signature is also odd: I would expect this to be an exception, but it looks like you are avoiding that because it would collide with exceptions that could be thrown from the pipeline itself? This should fail the pipeline anyway, so I think it is OK to use an exception.
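A minimal sketch of the call-stack alternative, assuming hypothetical `Pipeline`, `Processor` and `PipelineProcessor` types whose `execute` methods take the stack as an extra parameter: nothing is stored on the document, and recursion is reported by throwing rather than by a `false` return value.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class CallStackSketch {

    interface Processor {
        // The call stack is threaded through every call instead of stored on the document.
        void execute(Map<String, Object> doc, Deque<String> callStack);
    }

    static final class Pipeline {
        final String id;
        List<Processor> processors = List.of();

        Pipeline(String id) {
            this.id = id;
        }

        void execute(Map<String, Object> doc, Deque<String> callStack) {
            if (callStack.contains(id)) {
                // Fail the whole document instead of returning a flag.
                throw new IllegalStateException("cycle detected: " + callStack + " -> " + id);
            }
            callStack.push(id);
            try {
                for (Processor p : processors) {
                    p.execute(doc, callStack);
                }
            } finally {
                callStack.pop(); // unwind even if a processor throws
            }
        }
    }

    static final class PipelineProcessor implements Processor {
        final Pipeline target;

        PipelineProcessor(Pipeline target) {
            this.target = target;
        }

        @Override
        public void execute(Map<String, Object> doc, Deque<String> callStack) {
            target.execute(doc, callStack);
        }
    }

    public static void main(String[] args) {
        Pipeline a = new Pipeline("a");
        Pipeline b = new Pipeline("b");
        a.processors = List.of(new PipelineProcessor(b));
        b.processors = List.of(new PipelineProcessor(a)); // a -> b -> a
        try {
            a.execute(new HashMap<>(), new ArrayDeque<>());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The cost, as the reply points out, is that every processor's `execute` signature has to carry the extra parameter.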
@rjernst I can't really pass the call stack along with the document, can I? I only have the `execute` method available to pass things through (because any called pipeline could itself contain additional pipeline processors).
The reason I made this return `boolean` instead of throwing right away was mostly a style thing, to make it clear that the exception was triggered by the pipeline processor. But in hindsight this may be a little needlessly complex :) Moving it in here.
@rjernst questions answered, unit tests added :)
I was originally suggesting changing the signature of execute on the ingest service to take in the stack, but I don't feel that strongly about it. LGTM.
@rjernst thanks! Will merge once CI passes :)
* master:
  * Painless: Add Bindings (#33042)
  * Update version after client credentials backport
  * Fix forbidden apis on FIPS (#33202)
  * Remote 6.x transport BWC Layer for `_shrink` (#33236)
  * Test fix - Graph HLRC tests needed another field adding to randomisation exception list
  * HLRC: Add ML Get Records API (#33085)
  * [ML] Fix character set finder bug with unencodable charsets (#33234)
  * TESTS: Fix overly long lines (#33240)
  * Test fix - Graph HLRC test was missing field name to be excluded from randomisation logic
  * Remove unsupported group_shard_failures parameter (#33208)
  * Update BucketUtils#suggestShardSideQueueSize signature (#33210)
  * Parse PEM Key files leniantly (#33173)
  * INGEST: Add Pipeline Processor (#32473)
  * Core: Add java time xcontent serializers (#33120)
  * Consider multi release jars when running third party audit (#33206)
  * Update MSI documentation (#31950)
  * HLRC: create base timed request class (#33216)
  * [DOCS] Fixes command page titles
  * HLRC: Move ML protocol classes into client ml package (#33203)
  * Scroll queries asking for rescore are considered invalid (#32918)
  * Painless: Fix Semicolon Regression (#33212)
  * ingest: minor - update test to include dissect (#33211)
  * Switch remaining LLREST usage to new style Requests (#33171)
  * HLREST: add reindex API (#32679)
* INGEST: Add Pipeline Processor
  * Adds Processor capable of invoking other pipelines
  * Closes elastic#31842
Example:
Put inner and outer pipeline (with new processor):
Add document:
Works:
=>