fix(perf): Make getAncestors call faster #3306

Merged
3 commits merged into spinnaker:master on Feb 27, 2020

Conversation

marchello2000
Contributor

This change unrolls the recursion of the `getAncestorImpl` call into an imperative loop.
`getAncestors` is actually called A TON! (especially since all `context.get` evaluations depend on it).
If the execution depth is large (e.g. a canary with many stages) this call can take a while and, in some cases, throw a StackOverflow exception.
Additionally, I added some logic to cache the `getAncestors` calls in the expression evaluation, since the result can't change there.

For context: `getAncestors` is executed 64 times for a simple wait stage execution.

Results:
Time for `getAncestors` was reduced by a factor of 7x (1,000 executions on a 100-stage pipeline went from 7.2s to 0.9s).
Furthermore, due to caching, the number of calls to `getAncestors` is reduced by ~3x (from 64 to 19).

So the total time improvement is about 21x.

As another example, planning a Kayenta stage with 300 intervals went from 23.6s to 1.2s.
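
Roughly, the rewrite replaces the recursive walk with an explicit work queue. Here is a minimal sketch of the shape of that loop, not the exact implementation: `getAncestorsIterative` and `findByRefIds` are hypothetical names, the accessor names are approximate, and only the requisite-refId and parent edges are shown (the real code also follows synthetic STAGE_BEFORE children).

private List<Stage> getAncestorsIterative(Stage start) {
  List<Stage> ancestors = new ArrayList<>();
  Set<String> visited = new HashSet<>();
  Deque<Stage> toVisit = new ArrayDeque<>();
  toVisit.add(start);

  while (!toVisit.isEmpty()) {
    Stage curStage = toVisit.remove();
    if (!visited.add(curStage.getId())) {
      continue; // already reached this stage via another path
    }
    ancestors.add(curStage);

    // 1. Stages this one depends on through requisiteStageRefIds
    toVisit.addAll(findByRefIds(curStage.getExecution(), curStage.requisiteStageRefIds));

    // 2. The synthetic parent, if any
    if (curStage.getParentStageId() != null) {
      try {
        toVisit.add(curStage.getParent());
      } catch (IllegalArgumentException e) {
        // A missing parent shouldn't happen outside of unit tests; skip it
      }
    }
  }
  return ancestors;
}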

@marchello2000
Contributor Author

also see semi-related change: #3307

ancestors.add(curStage.getParent());
} catch (IllegalArgumentException e) {
// It's not really possible for the parent not to exist.. But unittests beg to differ
// Not logging anything here since not having a parent would have failed and loffed in
Contributor

loffed... logged?

Contributor Author

thanks!

Contributor

@dreynaud dreynaud left a comment

I presume we rely on the existing unit tests to ensure that functionality is the same?

Contributor

@ezimanyi ezimanyi left a comment

Mostly looks good! I've added a few comments on where I think the clarity/performance could be improved. I also echo @dreynaud's question about how this will be tested; I didn't see any tests (in my quick check) that would explicitly test edge cases for this algorithm, so I would be a bit nervous to merge this without adding some (unless they exist and I haven't found them).

@@ -56,7 +57,10 @@ public Object get(@Nullable Object key) {
     if (delegate().containsKey(key)) {
       return super.get(key);
     } else {
-      return stage.ancestors().stream()
+      if (ancestors == null) {
Contributor

It seems like this caching might be better off in Stage. The reason I suggest this is because Stage is mutable, and some mutations of Stage will cause the result of stage.ancestors() to change, which will cause StageContext to have a stale cached value.

There's not really any way for StageContext to invalidate this cache, as it doesn't have the visibility into the Stage to know when relevant fields change. On the other hand, Stage is in a better position to know this (though it is tricky as the implementation depends not on a single stage but on the values of multiple stages).

All this to say, I'm a bit nervous about how we're going to invalidate this cache...if the performance benefits are enough and we're sure that there are no relevant changes to any of the stages during the lifetime of StageContext this might be worth the performance improvement, but it does come at the risk of some subtle potential bugs.

Contributor Author

That's true. I think caching in the stage is just as fraught, since the stage wouldn't know when a new stage is inserted into an execution upstream, which could happen during stage planning.

The interesting bit is that, technically (today), no mutation will alter the behavior of this getter. The reason is that all this "lookup" into prior stages' "outputs" is really only used in bake and evalvars stages today, and it's not like an upstream bake stage will complete while a stage's task is being evaluated (which is the lifetime of StageContext).

This is all WAY too convoluted. But in the end, I agree with you that even though (I believe) this is safe today, it might not be tomorrow, and the code has a considerable "surprise" factor. So I will take it out for now, and if I think of a better way of doing this I will put up another PR (the perf improvement from removing the recursion is already pretty considerable).

Contributor Author

Oh, I have an idea... @ezimanyi what do you think about caching the total number of stages in the execution and invalidating the cache when that changes? That actually seems safe. It's not immune to stages updating their requisiteStageRefIds, but that doesn't happen AFAIK.
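
A minimal sketch of that idea, assuming it lives in StageContext; the field names are illustrative, while stage.ancestors() and getExecution().getStages() are the calls already discussed above:

// Hypothetical sketch: cache the ancestor list keyed on the execution's stage count,
// and recompute whenever a stage has been added since the cache was filled.
private List<Stage> cachedAncestors;
private int cachedStageCount = -1;

private List<Stage> ancestors() {
  int stageCount = stage.getExecution().getStages().size();
  if (cachedAncestors == null || stageCount != cachedStageCount) {
    cachedAncestors = stage.ancestors();
    cachedStageCount = stageCount;
  }
  return cachedAncestors;
}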

Contributor

I think that sounds reasonable...

I was just looking at how stages get added to an Execution, and if I'm understanding correctly, there's no addStage directly; rather, callers use getStages and mutate the resulting array. Based on that, we really don't have a lot of control over how/when to invalidate the cache (from anywhere... either from Stage or from Execution). So I still think it would be a bit risky to cache the value... but given that the current usage pattern appears to be to only ever add stages to the array, caching based on the number of stages is probably a reasonable solution if the performance benefits warrant it.


// 1. Find all stages we depend on through requisiteStageRefIds and push them on the queue
if (!curStage.requisiteStageRefIds.isEmpty() && !directParentOnly) {
List<Stage> refIdPriorStages =
Contributor

I suspect the performance can be further improved by removing this (and the following) iteration over all stages from the loop. This is still an O(N^2) algorithm because we iterate (potentially twice) over all stages for every visited stage. (I presume the issues are mostly for executions with a lot of stages, where changing this to an O(N) algorithm would have a lot of benefit.)

Instead you could build:

ImmutableMap<String, Stage> stageByRefId = execution.getStages().stream()
  .collect(toImmutableMap(...));
ImmutableSetMultimap<String, Stage> stageByParentId = execution.getStages().stream()
  .filter(it -> it.getSyntheticStageOwner() == SyntheticStageOwner.STAGE_BEFORE)
  .collect(toImmutableSetMultimap(...));

And then look up the required values in these pre-computed structures from within the while loop.
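
For instance, assuming stageByRefId is keyed on refId and stageByParentId on the synthetic stage's parent stage id (field and accessor names mirror the diff excerpt above and are approximate), the loop body could replace the scans with lookups roughly like:

// Inside the while loop, replacing the full scans over execution.getStages():
for (String refId : curStage.requisiteStageRefIds) {
  Stage prior = stageByRefId.get(refId);
  if (prior != null) {
    toVisit.add(prior);
  }
}
// Synthetic STAGE_BEFORE children of the current stage (empty set if none)
toVisit.addAll(stageByParentId.get(curStage.getId()));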

Contributor Author

that's a good point, thanks!

Contributor Author

Well... it turns out this is not that easy... there is an expectation that the stages are returned in the order they appear in the execution, and using a map doesn't preserve that order. I am tempted to say no one should care but, then again, this is orca and I am sure I will break something if I change it. So I'm going to leave it as is for now. Ugh, this makes me upset...

Contributor

Ah, yes I see what you're saying... if some stage has requisiteStageRefIds = ["1", "2"] we are expecting that we'll add the two stages to toVisit in the order that the stages appear in getStages() rather than in the order the refIds appear in requisiteStageRefIds (which actually uses the Collection interface and so may be unordered anyway).

I guess one solution might be to make a wrapper class OrderedStage that contains a Stage and its order in the result of getStages() and store that in the map; then the call that builds refIdPriorStages could sort on the order field, then map to just get out the Stage. That would replace an O(N) operation (where N is the total number of stages) with one that is O(M log M) (where M is the average size of requisiteStageRefIds). So the overall algorithm would go from O(N^2) to O(N M log M). I think that would be better, as the average number of dependencies would generally be much smaller than the number of stages (i.e., most stages would in general only depend on one or two other stages).

Obviously that does add some complexity, though it might have a reasonable impact for large enough N (though we'd definitely want to profile to be sure). Either way, it's completely up to you to either leave this as is (which is better than it was) or try this here. If the change already committed fixes the issue so it's not really a problem anymore, it might not be worth adding this complexity.
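
A rough sketch of that wrapper idea, illustrative only (OrderedStage is the hypothetical class described above; accessor names are approximate):

// Remember each stage's position in getStages() so refId lookups can be
// re-sorted back into execution order.
class OrderedStage {
  final Stage stage;
  final int order; // index of this stage in execution.getStages()

  OrderedStage(Stage stage, int order) {
    this.stage = stage;
    this.order = order;
  }
}

// Built once, before the loop:
Map<String, OrderedStage> stageByRefId = new HashMap<>();
List<Stage> allStages = execution.getStages();
for (int i = 0; i < allStages.size(); i++) {
  stageByRefId.put(allStages.get(i).getRefId(), new OrderedStage(allStages.get(i), i));
}

// Inside the loop, restoring execution order for the requisite stages:
List<Stage> refIdPriorStages =
    curStage.requisiteStageRefIds.stream()
        .map(stageByRefId::get)
        .filter(Objects::nonNull)
        .sorted(Comparator.comparingInt((OrderedStage os) -> os.order))
        .map(os -> os.stage)
        .collect(Collectors.toList());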

Contributor Author

That makes sense. Seeing as I haven't returned to this PR in a few months... I am going to merge it as is and then noodle on these additional improvements.

@marchello2000
Contributor Author

There are decent tests in StageSpec and StageNavigatorSpec. They seemed reasonably adequate to me, but I will look them over some more.

@marchello2000 marchello2000 added the ready to merge label (Approved and ready for merge) Feb 27, 2020
@mergify mergify bot added the auto merged label (Merged automatically by a bot) Feb 27, 2020
@mergify mergify bot merged commit 90c113b into spinnaker:master Feb 27, 2020
@marchello2000 marchello2000 deleted the mark/stage_ancestor_perf2 branch February 27, 2020 21:14
marchello2000 added a commit to marchello2000/orca that referenced this pull request Feb 27, 2020
mergify bot pushed a commit that referenced this pull request Feb 28, 2020
KathrynLewis pushed a commit to KathrynLewis/orca that referenced this pull request Jan 31, 2021
KathrynLewis pushed a commit to KathrynLewis/orca that referenced this pull request Jan 31, 2021