Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(perf): Make getAncestors call faster #3306

Merged
merged 3 commits into from
Feb 27, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -38,25 +38,14 @@
import com.google.common.base.Strings;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Sets;
import com.netflix.spinnaker.kork.exceptions.SpinnakerException;
import com.netflix.spinnaker.orca.ExecutionStatus;
import com.netflix.spinnaker.orca.jackson.OrcaObjectMapper;
import com.netflix.spinnaker.orca.pipeline.model.support.RequisiteStageRefIdDeserializer;
import de.huxhorn.sulky.ulid.ULID;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Stream;
import javax.annotation.Nonnull;
Expand Down Expand Up @@ -366,14 +355,10 @@ public Task taskById(String taskId) {
*/
public List<Stage> ancestors() {
if (execution != null) {
Set<String> visited = Sets.newHashSetWithExpectedSize(execution.getStages().size());
return ImmutableList.<Stage>builder()
.add(this)
.addAll(getAncestorsImpl(visited, false))
.build();
} else {
return emptyList();
return getAncestorsImpl(false);
}

return emptyList();
}

/**
Expand All @@ -385,81 +370,73 @@ public List<Stage> ancestors() {
*/
public List<Stage> directAncestors() {
if (execution != null) {
Set<String> visited = Sets.newHashSetWithExpectedSize(execution.getStages().size());
return ImmutableList.<Stage>builder()
.add(this)
.addAll(getAncestorsImpl(visited, true))
.build();
} else {
return emptyList();
return getAncestorsImpl(true);
}

return emptyList();
}

/**
* Worker method to get the list of all ancestors (parents and, optionally, prerequisite stages)
* of the current stage.
*
* @param visited list of visited nodes
* @param directParentOnly true to only include direct parents of the stage, false to also include
* stages this stage depends on (via requisiteRefIds)
* @return list of ancestor stages
*/
private List<Stage> getAncestorsImpl(Set<String> visited, boolean directParentOnly) {
visited.add(this.refId);
private ImmutableList<Stage> getAncestorsImpl(boolean directParentOnly) {
Set<String> visited = new HashSet<>();
ImmutableList.Builder<Stage> allAncestorStages = ImmutableList.builder();
Queue<Stage> toVisit = new ArrayDeque<>();

if (!requisiteStageRefIds.isEmpty() && !directParentOnly) {
// Get stages this stage depends on via requisiteStageRefIds:
List<Stage> previousStages =
execution.getStages().stream()
.filter(it -> requisiteStageRefIds.contains(it.refId))
.filter(it -> !visited.contains(it.refId))
.collect(toList());
List<Stage> syntheticStages =
execution.getStages().stream()
.filter(
s ->
previousStages.stream()
.map(Stage::getId)
.anyMatch(id -> id.equals(s.parentStageId)))
.collect(toList());
return ImmutableList.<Stage>builder()
.addAll(previousStages)
.addAll(syntheticStages)
.addAll(
previousStages.stream()
.flatMap(it -> it.getAncestorsImpl(visited, directParentOnly).stream())
.collect(toList()))
.build();
} else if (parentStageId != null && !visited.contains(parentStageId)) {
// Get parent stages, but exclude already visited ones:

List<Stage> ancestors = new ArrayList<>();
if (getSyntheticStageOwner() == SyntheticStageOwner.STAGE_AFTER) {
ancestors.addAll(
toVisit.add(this);

while (!toVisit.isEmpty()) {
Stage curStage = toVisit.remove();

if (visited.contains(curStage.id)) {
continue;
}

visited.add(curStage.id);
allAncestorStages.add(curStage);

// 1. Find all stages we depend on through requisiteStageRefIds and push them on the queue
if (!curStage.requisiteStageRefIds.isEmpty() && !directParentOnly) {
List<Stage> refIdPriorStages =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the performance can be further improved by removing this (and the following) iteration over all stages from the loop...this is still an O(N^2) algorithm because we iterate (potentially twice) over all stages for every visited stage. (I presume the issues are mostly for executions with a lot of stages, where changing this to an O(N) algorithm would have a lot of benefit.)

Instead you could build :

ImmutableMap<String, Stage> stageByRefId = execution.getStages().stream()
  .collect(toImmutableMap(...))`
ImmutableSetMultiMap<String, Stage> stageByParentId = execution.getStages().stream()
  .filter(it -> it.getSyntheticStageOwner() == SyntheticStageOwner.STAGE_BEFORE)
  .collect(toImmutableSetMultimap(...))`

And then look up the required values in these pre-computed structures from within the while loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well... it turns out this is not that easy... there is an expectation that the stages are returned in the order they appear in the execution and using a map doesn't preserve that order. I am tempted to say - noone should care but, then again, this is orca and I am sure I will break something if i change it. So going to leave as is for now. Ugh, this makes me upset...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes I see what you're saying...if some stage has requisiteStageRefIds = ["1", "2"] we are expecting that we'll add the two stages to toVisit in the order that the stages appear in getStages() rather than in the order they refIds appear in requisiteStageRefIds (which is actually using the Collection interface and so may be unordered anyway).

I guess one solution might be to make a wrapper class OrderedStage that contains a Stage and its order in the result of getStages() and store that in the map; then the call to build refIdPriorStages could order on the order field, then map to just get out the Stage. That would replace an O(N) operation (where N is the total number of stages) with one that is M log(M) (where M is the average size of requisiteStageRefIds). So the overall algorithm would go from O(N^2) to O(N M log (M)). I think that would be better as the average size of dependencies generally be much smaller than the number of stages (ie, most stages would in general only depend on one or two other stages).

Obviously that does add some complexity, though might have a reasonable impact for large enough N (though definitely would want to profile to be sure). Either way, completely up to you to either leave this as is (which is better than it was) or try this here. If the change already committed fixes the issue so it's not really a problem anymore, it might not be worth adding this complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Seeing as I haven't returned to this PR in a few months... I am going to merge it as is and then noodle on these additional improvements

execution.getStages().stream()
.filter(
it ->
parentStageId.equals(it.parentStageId)
&& it.getSyntheticStageOwner() == SyntheticStageOwner.STAGE_BEFORE)
.collect(toList()));
.filter(it -> curStage.requisiteStageRefIds.contains(it.refId))
.collect(toList());

toVisit.addAll(refIdPriorStages);
}

ancestors.addAll(
execution.getStages().stream()
.filter(it -> it.id.equals(parentStageId))
.findFirst()
.<List<Stage>>map(
parent ->
ImmutableList.<Stage>builder()
.add(parent)
.addAll(parent.getAncestorsImpl(visited, directParentOnly))
.build())
.orElse(emptyList()));

return ancestors;
} else {
return emptyList();
// 2. If we have a parent stage
if (curStage.parentStageId != null && !visited.contains(curStage.parentStageId)) {
// 2.1 AND we are an STAGE_AFTER, find all the stages with same parent as us but that are
// marked STAGE_BEFORE
if (curStage.getSyntheticStageOwner() == SyntheticStageOwner.STAGE_AFTER) {
toVisit.addAll(
execution.getStages().stream()
.filter(
it ->
curStage.parentStageId.equals(it.parentStageId)
&& it.getSyntheticStageOwner() == SyntheticStageOwner.STAGE_BEFORE)
.collect(toList()));
}

// 2.2 Add our direct parent to the queue
try {
toVisit.add(curStage.getParent());
} catch (IllegalArgumentException e) {
// It's not really possible for the parent not to exist.. But unittests beg to differ
// Not logging anything here since not having a parent would have failed and logged in
// other places
}
}
}

return allAncestorStages.build();
}

/**
Expand Down