fix(perf): Make getAncestors call faster #3306

Merged
3 commits merged into spinnaker:master on Feb 27, 2020

Conversation

marchello2000
Contributor

This change unrolls the recursion of the `getAncestorImpl` call into an imperative loop.
`getAncestors` is actually called A TON! (especially since all `context.get` evaluations depend on it).
If the execution depth is large (e.g. a canary with many stages) this call can take a while and, in some cases, throw a StackOverflow exception.
Additionally, I added some logic to cache the `getAncestors` calls in the expression evaluation, since the result can't change there.

For context: `getAncestors` is executed 64 times for a simple wait stage execution.

Results:
Time for `getAncestors` was reduced by a factor of 7x (1,000 executions on a 100-stage pipeline went from 7.2s to 0.9s).
Furthermore, due to caching, the number of calls to `getAncestors` is reduced by ~3x (from 64 to 19).

So the total time improvement is about 21x.

As another example, planning a Kayenta stage with 300 intervals went from 23.6s to 1.2s.
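
Roughly, the rewrite replaces the recursive walk with an explicit work queue. Here is a minimal sketch of the shape of that loop, not the exact implementation: `getAncestorsIterative` and `findByRefIds` are hypothetical names, the accessor names are approximate, and only the requisite-refId and parent edges are shown (the real code also follows synthetic STAGE_BEFORE children).

private List<Stage> getAncestorsIterative(Stage start) {
  List<Stage> ancestors = new ArrayList<>();
  Set<String> visited = new HashSet<>();
  Deque<Stage> toVisit = new ArrayDeque<>();
  toVisit.add(start);

  while (!toVisit.isEmpty()) {
    Stage curStage = toVisit.remove();
    if (!visited.add(curStage.getId())) {
      continue; // already reached this stage via another path
    }
    ancestors.add(curStage);

    // 1. Stages this one depends on through requisiteStageRefIds
    toVisit.addAll(findByRefIds(curStage.getExecution(), curStage.requisiteStageRefIds));

    // 2. The synthetic parent, if any
    if (curStage.getParentStageId() != null) {
      try {
        toVisit.add(curStage.getParent());
      } catch (IllegalArgumentException e) {
        // A missing parent shouldn't happen outside of unit tests; skip it
      }
    }
  }
  return ancestors;
}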

@marchello2000
Contributor Author

also see semi-related change: #3307

ancestors.add(curStage.getParent());
} catch (IllegalArgumentException e) {
// It's not really possible for the parent not to exist.. But unittests beg to differ
// Not logging anything here since not having a parent would have failed and loffed in
Contributor

loffed... logged?

Contributor Author

thanks!

Contributor

@dreynaud dreynaud left a comment

I presume we rely on the existing unit tests to ensure that functionality is the same?

Contributor

@ezimanyi ezimanyi left a comment

Mostly looks good! I've added a few comments on where I think the clarity/performance could be improved. I also echo @dreynaud's question about how this will be tested; I didn't see any tests (in my quick check) that would explicitly test edge cases for this algorithm, so I would be a bit nervous to merge this without adding some (unless they exist and I haven't found them).

@@ -56,7 +57,10 @@ public Object get(@Nullable Object key) {
     if (delegate().containsKey(key)) {
       return super.get(key);
     } else {
-      return stage.ancestors().stream()
+      if (ancestors == null) {
Contributor

It seems like this caching might be better off in Stage. The reason I suggest this is because Stage is mutable, and some mutations of Stage will cause the result of stage.ancestors() to change, which will cause StageContext to have a stale cached value.

There's not really any way for StageContext to invalidate this cache, as it doesn't have the visibility into the Stage to know when relevant fields change. On the other hand, Stage is in a better position to know this (though it is tricky as the implementation depends not on a single stage but on the values of multiple stages).

All this to say, I'm a bit nervous about how we're going to invalidate this cache...if the performance benefits are enough and we're sure that there are no relevant changes to any of the stages during the lifetime of StageContext this might be worth the performance improvement, but it does come at the risk of some subtle potential bugs.

Contributor Author

That's true. I think caching in the stage is just as fraught, since the stage wouldn't know when a new stage is inserted into an execution upstream, which could happen during stage planning.

The interesting bit is that, technically (today), no mutation will alter the behavior of this getter. The reason is that all this "lookup" into prior stages' "outputs" is really only used in bake and evalvars stages today, and it's not like an upstream bake stage will complete while a stage's task is being evaluated (which is the lifetime of StageContext).

This is all WAY too convoluted. But in the end, I agree with you that even though (I believe) this is safe today, it might not be tomorrow, and the code has a considerable "surprise" factor. So I will take it out for now, and if I think of a better way of doing this I will put up another PR (the perf improvement from removing the recursion is already pretty considerable).

Contributor Author

Oh, I have an idea... @ezimanyi what do you think about caching the total number of stages in the execution and invalidating the cache when that changes? That actually seems safe. It's not immune to stages updating their requisiteStageRefIds, but that doesn't happen AFAIK.
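
A minimal sketch of that idea, assuming it lives in StageContext; the field names are illustrative, while stage.ancestors() and getExecution().getStages() are the calls already discussed above:

// Hypothetical sketch: cache the ancestor list keyed on the execution's stage count,
// and recompute whenever a stage has been added since the cache was filled.
private List<Stage> cachedAncestors;
private int cachedStageCount = -1;

private List<Stage> ancestors() {
  int stageCount = stage.getExecution().getStages().size();
  if (cachedAncestors == null || stageCount != cachedStageCount) {
    cachedAncestors = stage.ancestors();
    cachedStageCount = stageCount;
  }
  return cachedAncestors;
}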

Contributor

I think that sounds reasonable...

I was just looking at how stages get added to an Execution, and if I'm understanding correctly, there's no addStage directly; rather, callers use getStages and mutate the resulting array. Based on that, we really don't have a lot of control over how/when to invalidate the cache (from anywhere... either from Stage or from Execution). So I still think it would be a bit risky to cache the value... but given that the current usage pattern appears to be to only ever add stages to the array, caching based on the number of stages is probably a reasonable solution if the performance benefits warrant it.


// 1. Find all stages we depend on through requisiteStageRefIds and push them on the queue
if (!curStage.requisiteStageRefIds.isEmpty() && !directParentOnly) {
List<Stage> refIdPriorStages =
Contributor

I suspect the performance can be further improved by removing this (and the following) iteration over all stages from the loop. This is still an O(N^2) algorithm because we iterate (potentially twice) over all stages for every visited stage. (I presume the issues are mostly for executions with a lot of stages, where changing this to an O(N) algorithm would have a lot of benefit.)

Instead you could build:

ImmutableMap<String, Stage> stageByRefId = execution.getStages().stream()
  .collect(toImmutableMap(...));
ImmutableSetMultimap<String, Stage> stageByParentId = execution.getStages().stream()
  .filter(it -> it.getSyntheticStageOwner() == SyntheticStageOwner.STAGE_BEFORE)
  .collect(toImmutableSetMultimap(...));

And then look up the required values in these pre-computed structures from within the while loop.
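
For instance, assuming stageByRefId is keyed on refId and stageByParentId on the synthetic stage's parent stage id (field and accessor names mirror the diff excerpt above and are approximate), the loop body could replace the scans with lookups roughly like:

// Inside the while loop, replacing the full scans over execution.getStages():
for (String refId : curStage.requisiteStageRefIds) {
  Stage prior = stageByRefId.get(refId);
  if (prior != null) {
    toVisit.add(prior);
  }
}
// Synthetic STAGE_BEFORE children of the current stage (empty set if none)
toVisit.addAll(stageByParentId.get(curStage.getId()));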

Contributor Author

that's a good point, thanks!

Contributor Author

Well... it turns out this is not that easy... there is an expectation that the stages are returned in the order they appear in the execution, and using a map doesn't preserve that order. I am tempted to say no one should care but, then again, this is orca and I am sure I will break something if I change it. So I'm going to leave it as is for now. Ugh, this makes me upset...

Contributor

Ah, yes I see what you're saying... if some stage has requisiteStageRefIds = ["1", "2"] we are expecting that we'll add the two stages to toVisit in the order that the stages appear in getStages() rather than in the order the refIds appear in requisiteStageRefIds (which actually uses the Collection interface and so may be unordered anyway).

I guess one solution might be to make a wrapper class OrderedStage that contains a Stage and its order in the result of getStages() and store that in the map; then the call that builds refIdPriorStages could sort on the order field, then map to just get out the Stage. That would replace an O(N) operation (where N is the total number of stages) with one that is O(M log M) (where M is the average size of requisiteStageRefIds). So the overall algorithm would go from O(N^2) to O(N M log M). I think that would be better, as the average number of dependencies would generally be much smaller than the number of stages (i.e., most stages would in general only depend on one or two other stages).

Obviously that does add some complexity, though it might have a reasonable impact for large enough N (though we'd definitely want to profile to be sure). Either way, it's completely up to you to either leave this as is (which is better than it was) or try this here. If the change already committed fixes the issue so it's not really a problem anymore, it might not be worth adding this complexity.
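
A rough sketch of that wrapper idea, illustrative only (OrderedStage is the hypothetical class described above; accessor names are approximate):

// Remember each stage's position in getStages() so refId lookups can be
// re-sorted back into execution order.
class OrderedStage {
  final Stage stage;
  final int order; // index of this stage in execution.getStages()

  OrderedStage(Stage stage, int order) {
    this.stage = stage;
    this.order = order;
  }
}

// Built once, before the loop:
Map<String, OrderedStage> stageByRefId = new HashMap<>();
List<Stage> allStages = execution.getStages();
for (int i = 0; i < allStages.size(); i++) {
  stageByRefId.put(allStages.get(i).getRefId(), new OrderedStage(allStages.get(i), i));
}

// Inside the loop, restoring execution order for the requisite stages:
List<Stage> refIdPriorStages =
    curStage.requisiteStageRefIds.stream()
        .map(stageByRefId::get)
        .filter(Objects::nonNull)
        .sorted(Comparator.comparingInt((OrderedStage os) -> os.order))
        .map(os -> os.stage)
        .collect(Collectors.toList());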

Contributor Author

That makes sense. Seeing as I haven't returned to this PR in a few months... I am going to merge it as is and then noodle on these additional improvements.

@marchello2000
Contributor Author

There are decent tests in StageSpec and StageNavigatorSpec. They seemed reasonably adequate to me, but I will look them over some more.

@marchello2000 marchello2000 added the ready to merge label (Approved and ready for merge) Feb 27, 2020
@mergify mergify bot added the auto merged label (Merged automatically by a bot) Feb 27, 2020
@mergify mergify bot merged commit 90c113b into spinnaker:master Feb 27, 2020
@marchello2000 marchello2000 deleted the mark/stage_ancestor_perf2 branch February 27, 2020 21:14
marchello2000 added a commit to marchello2000/orca that referenced this pull request Feb 27, 2020
mergify bot pushed a commit that referenced this pull request Feb 28, 2020
KathrynLewis pushed a commit to KathrynLewis/orca that referenced this pull request Jan 31, 2021
KathrynLewis pushed a commit to KathrynLewis/orca that referenced this pull request Jan 31, 2021