fix(perf): Unexpanding pipeline without StageContext #3307
Conversation
When calling the endpoint `v2/applications/{}/pipelines` with `expand=false`, the `TaskController` tries to "unexpand" the pipelines by removing "context" from all the stages. However, there is a wrinkle: while removing the context we want to preserve the `group` key (if it exists). To check whether the key is there, we call `context.group`. The problem is that `context` is actually a `StageContext`, and `.group` resolves to `StageContext::get("group")`. That call traverses the entire execution graph to see whether `group` exists not only in the current stage context but also in the outputs of any of its parent stages. If the stage graph is large, this is an O(n^2) operation. The kicker is that `group` should never occur in the outputs (and if it does, we don't care).

This change replaces the implicit Groovy call to `.get` with an explicit call to `.containsKey`, which doesn't traverse the stage hierarchy.

The result (on a 300-stage Kayenta canary pipeline):
Before: 17.2s to fetch the last 2 executions
After: 0.2s to fetch the last 2 executions
Also see the semi-related #3306: they are related and could go together, but I want to merge them separately since the other change is a bit more risky.
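For reviewers unfamiliar with the mechanics: Groovy resolves property access on a `Map` as a call to `get(...)`, so `stage.context.group` ends up in `StageContext.get("group")`, which falls back to ancestor stage outputs, while `containsKey("group")` only looks at the local map. Below is a minimal, self-contained sketch of that behaviour; the `ChainedContext` class and its fields are hypothetical stand-ins for illustration, not Orca's actual `StageContext`:

```groovy
// Hypothetical stand-in for StageContext: a map whose get() also searches a
// chain of ancestor stage outputs, so a lookup can scan the whole stage graph.
class ChainedContext extends HashMap<String, Object> {
  private List<Map<String, Object>> ancestorOutputs

  ChainedContext(List<Map<String, Object>> ancestors) {
    ancestorOutputs = ancestors   // bare name = direct field access inside the class
  }

  @Override
  Object get(Object key) {
    def local = super.get(key)
    // fall back to scanning every ancestor's outputs (expensive on large graphs)
    local != null ? local : ancestorOutputs.findResult { it[key] }
  }
}

// Pretend there are 300 upstream stages, none of which contains "group".
def ctx = new ChainedContext((1..300).collect { [foo: it] })

assert ctx.group == null            // Groovy turns .group into get("group"): scans all 300 ancestors
assert !ctx.containsKey("group")    // HashMap.containsKey: only looks at this stage's own context map
```

On a graph the size of the 300-stage pipeline above, the property-style lookup pays that ancestor scan for every stage being unexpanded, which is where the O(n^2) cost comes from; `containsKey` avoids it entirely.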
@@ -685,7 +685,7 @@ class TaskController {
       pipelineExecutions.each { pipelineExecution ->
         clearTriggerStages(pipelineExecution.trigger.other) // remove from the "other" field - that is what Jackson works against
         pipelineExecution.getStages().each { stage ->
-          if (stage.context?.group) {
+          if (stage.context?.containsKey("group")) {
I expected to get my mind blown and this was even better 👏
Orca - it aims to please
Would it be better to check other `stage.context.prop`-style accesses and change them if needed?
Line 501 also does the same with `stage.context.manualSkip`?
> Would it be better to check other `stage.context.prop`-style accesses and change them if needed?
ugh - this is virtually impossible... because we have no typed stage contexts it's impossible to know what a stage expects... but yes, it probably is almost always wrong... but it's a dangerous change.
ironically... this was also caused by the same bug (#3224)
also why i introduced https://github.com/spinnaker/orca/blob/master/orca-core/src/main/java/com/netflix/spinnaker/orca/pipeline/model/StageContext.java#L74...
For `manualSkip`, I am not actually sure what the intention is (but 80% sure we shouldn't be looking upstream). I would rather not change it because, historically, that kind of change means we break someone, whereas this change is only used in the retrieval codepath.