fix(perf): Make getAncestors call faster #3306
Conversation
also see semi-related change: #3307
  ancestors.add(curStage.getParent());
} catch (IllegalArgumentException e) {
  // It's not really possible for the parent not to exist.. But unittests beg to differ
  // Not logging anything here since not having a parent would have failed and loffed in
loffed
... logged?
thanks!
I presume we rely on the existing unit tests to ensure that functionality is the same?
Mostly looks good! I've added a few comments on where I think the clarity/performance could be improved. I also echo @dreynaud's question about how this will be tested. I didn't see any tests (in my quick check) that would explicitly test edge cases for this algorithm, so I would be a bit nervous to merge this without adding some (unless they exist and I haven't found them).
@@ -56,7 +57,10 @@ public Object get(@Nullable Object key) {
    if (delegate().containsKey(key)) {
      return super.get(key);
    } else {
      return stage.ancestors().stream()
      if (ancestors == null) {
It seems like this caching might be better off in `Stage`. The reason I suggest this is because `Stage` is mutable, and some mutations of `Stage` will cause the result of `stage.ancestors()` to change, which will cause `StageContext` to have a stale cached value.

There's not really any way for `StageContext` to invalidate this cache, as it doesn't have the visibility into the `Stage` to know when relevant fields change. On the other hand, `Stage` is in a better position to know this (though it is tricky as the implementation depends not on a single stage but on the values of multiple stages).

All this to say, I'm a bit nervous about how we're going to invalidate this cache... if the performance benefits are enough and we're sure that there are no relevant changes to any of the stages during the lifetime of `StageContext`, this might be worth the performance improvement, but it does come at the risk of some subtle potential bugs.
That's true. I think caching in the stage is just as fraught, since the stage wouldn't know when a new stage is inserted into an execution upstream, which could happen during stage planning.

The interesting bit is that, technically (today), no mutation will alter the behavior of this getter. The reason is that all this "lookup" in prior stage "outputs" is really only used in bake and evalvars stages today, and it's not like an upstream bake stage will complete while a stage's task is being evaluated (which is the lifetime of `StageContext`).

This is all WAY too convoluted. But in the end, I agree with you that even though (I believe) this is safe today, it might not be tomorrow and the code has a considerable "surprise" factor. So I will take it out for now, and if I think of a better way of doing this I will put up another PR (the perf improvement from recursion removal is pretty considerable already).
Oh, I have an idea... @ezimanyi what do you think about caching the total number of stages in the execution and invalidating the cache when that changes? That actually seems safe. It's not immune to stages updating their `requisiteStageRefIds`, but that doesn't happen AFAIK.
I think that sounds reasonable...

I was just looking at how stages get added to an `Execution`, and if I'm understanding correctly, there's no `addStage` directly, but rather callers use `getStages` and mutate the resulting array. Based on that, we really don't have a lot of control over how/when to invalidate the cache (from anywhere... either from stage or from `Execution`). So I still think it would be a bit risky to cache the value... but given that the current usage pattern appears to be to only ever add stages to the array, caching based on the number of stages is probably a reasonable solution if the performance benefits warrant it.
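For illustration, here is a minimal sketch of that stage-count-based invalidation idea. It is not code from this PR; the cache holder, its field names, and the `getExecution().getStages().size()` / `computeAncestors()` calls in the usage comment are assumptions about how it could be wired into `Stage`.

```java
import java.util.List;
import java.util.function.Supplier;

/**
 * Schematic cache: memoizes a computed list and invalidates it whenever the
 * observed stage count differs from the count recorded when the cache was filled.
 * This is the "invalidate when the total number of stages changes" rule from the
 * discussion above; it does not protect against stages mutating requisiteStageRefIds.
 */
final class StageCountInvalidatedCache<T> {
  private List<T> cached;
  private int cachedStageCount = -1;

  List<T> get(int currentStageCount, Supplier<List<T>> compute) {
    if (cached == null || cachedStageCount != currentStageCount) {
      cached = compute.get();
      cachedStageCount = currentStageCount;
    }
    return cached;
  }
}

// Hypothetical usage from Stage.ancestors():
//   return ancestorCache.get(getExecution().getStages().size(), this::computeAncestors);
```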
// 1. Find all stages we depend on through requisiteStageRefIds and push them on the queue
if (!curStage.requisiteStageRefIds.isEmpty() && !directParentOnly) {
  List<Stage> refIdPriorStages =
I suspect the performance can be further improved by removing this (and the following) iteration over all stages from the loop... this is still an O(N^2) algorithm because we iterate (potentially twice) over all stages for every visited stage. (I presume the issues are mostly for executions with a lot of stages, where changing this to an O(N) algorithm would have a lot of benefit.)

Instead you could build:

ImmutableMap<String, Stage> stageByRefId = execution.getStages().stream()
    .collect(toImmutableMap(...))

ImmutableSetMultimap<String, Stage> stageByParentId = execution.getStages().stream()
    .filter(it -> it.getSyntheticStageOwner() == SyntheticStageOwner.STAGE_BEFORE)
    .collect(toImmutableSetMultimap(...))

And then look up the required values in these pre-computed structures from within the while loop.
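Spelled out a bit further, that suggestion might look roughly like the following. This is only a sketch: it assumes Guava's `toImmutableMap` / `toImmutableSetMultimap` collectors are available, and that `Stage` exposes `getRefId()`, `getParentStageId()` and `getSyntheticStageOwner()` accessors.

```java
import static com.google.common.collect.ImmutableMap.toImmutableMap;
import static com.google.common.collect.ImmutableSetMultimap.toImmutableSetMultimap;

import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSetMultimap;
import java.util.function.Function;

// Built once before the while loop, so each visited stage does O(1) lookups
// instead of scanning every stage in the execution.
ImmutableMap<String, Stage> stageByRefId =
    execution.getStages().stream()
        .collect(toImmutableMap(Stage::getRefId, Function.identity()));

// Synthetic STAGE_BEFORE stages keyed by the id of the stage that owns them.
ImmutableSetMultimap<String, Stage> stagesByParentId =
    execution.getStages().stream()
        .filter(it -> it.getSyntheticStageOwner() == SyntheticStageOwner.STAGE_BEFORE)
        .collect(toImmutableSetMultimap(Stage::getParentStageId, Function.identity()));
```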
that's a good point, thanks!
well... it turns out this is not that easy... there is an expectation that the stages are returned in the order they appear in the execution, and using a map doesn't preserve that order. I am tempted to say no one should care, but then again, this is orca and I am sure I will break something if I change it. So I'm going to leave it as is for now. Ugh, this makes me upset...
Ah, yes I see what you're saying... if some stage has `requisiteStageRefIds = ["1", "2"]`, we are expecting that we'll add the two stages to `toVisit` in the order that the stages appear in `getStages()`, rather than in the order their refIds appear in `requisiteStageRefIds` (which is actually using the `Collection` interface and so may be unordered anyway).
I guess one solution might be to make a wrapper class `OrderedStage` that contains a `Stage` and its order in the result of `getStages()`, and store that in the map; then the call to build `refIdPriorStages` could order on the order field, then map to just get out the `Stage`. That would replace an O(N) operation (where N is the total number of stages) with one that is M log(M) (where M is the average size of `requisiteStageRefIds`). So the overall algorithm would go from O(N^2) to O(N M log(M)). I think that would be better, as the average size of dependencies would generally be much smaller than the number of stages (i.e., most stages would in general only depend on one or two other stages).

Obviously that does add some complexity, though it might have a reasonable impact for large enough N (though you'd definitely want to profile to be sure). Either way, it's completely up to you to either leave this as is (which is better than it was) or try this here. If the change already committed fixes the issue so it's not really a problem anymore, it might not be worth adding this complexity.
That makes sense. Seeing as I haven't returned to this PR in a few months... I am going to merge it as is and then noodle on these additional improvements
there are decent tests in ...
This change unrolls the recursion of the `getAncestorImpl` call into an imperative call. `getAncestors` is actually called A TON! (especially since all `context.get` evaluations depend on it). If the execution depth is large (e.g. a canary with many stages) this call can take a while and, in some cases, throw a StackOverflow exception.

Additionally, I added some logic to cache the `getAncestors` calls in the expression evaluation, as it can't change there.

For context: `getAncestors` is executed 64 times for a simple wait stage execution.

Results:
Time for `getAncestors` was reduced by a factor of 7x (1000 executions on a 100-stage pipeline went from 7.2s to 0.9s).
Furthermore, due to caching, the number of calls to `getAncestors` is reduced by ~3x (from 64 to 19).
So the total improvement is about 21x.

As another example, planning a kayenta stage with 300 intervals went from 23.6s to 1.2s.
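For readers without the diff open, the unrolled traversal amounts to something like the sketch below. This is schematic rather than the exact code in the PR: `getExecution()`, `getStages()`, `getId()`, `getRefId()`, `getParentStageId()` are assumed accessor names and the `directParentOnly` flag is omitted, while the queue of `requisiteStageRefIds` dependencies and the `getParent()` / `IllegalArgumentException` handling mirror the diff fragments quoted above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Iterative replacement for the recursive ancestor walk: an explicit work queue
// means deep executions can no longer overflow the call stack.
public List<Stage> getAncestors() {
  List<Stage> ancestors = new ArrayList<>();
  Set<String> visited = new HashSet<>();
  Deque<Stage> toVisit = new ArrayDeque<>();
  toVisit.add(this);

  while (!toVisit.isEmpty()) {
    Stage curStage = toVisit.remove();
    if (!visited.add(curStage.getId())) {
      continue; // already processed this stage
    }
    ancestors.add(curStage);

    // 1. Find all stages we depend on through requisiteStageRefIds and push them on the queue.
    if (!curStage.requisiteStageRefIds.isEmpty()) {
      for (Stage candidate : getExecution().getStages()) {
        if (curStage.requisiteStageRefIds.contains(candidate.getRefId())) {
          toVisit.add(candidate);
        }
      }
    }

    // 2. Push the synthetic parent, if any (mirrors the try/catch in the diff above).
    if (curStage.getParentStageId() != null) {
      try {
        toVisit.add(curStage.getParent());
      } catch (IllegalArgumentException e) {
        // It's not really possible for the parent not to exist, but unit tests beg to differ.
      }
    }
  }
  return ancestors;
}
```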
This reverts commit 90c113b.
…pinnaker#3473) This reverts commit 90c113b. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>