Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent ForkScanner from visiting nodes more than once in some cases for in-progress builds with nested parallelism #287

Merged
merged 3 commits into from
Jun 26, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@
import java.util.List;
import java.util.ListIterator;
import java.util.Set;
import java.util.stream.Collectors;

/**
* Scanner that will scan down all forks when we hit parallel blocks before continuing, but generally runs in linear order
Expand Down Expand Up @@ -443,8 +444,15 @@ ArrayDeque<ParallelBlockStart> leastCommonAncestor(@NonNull final Set<FlowNode>
throw new IllegalStateException("No least common ancestor found from " + heads);
}

// If we hit issues with the ordering of blocks by depth, apply a sorting to the parallels by depth
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpreted this as a kind of TODO comment, and debugging showed that in existing test cases with nested parallelism, the result was ordered from innermost to outermost, but the ordering was opposite for my new test, and making the order stable fixes my test, so...

To be clear, I do not fully understand the specifics of this method or really this class, so there could still be other problems, but the current behavior is pretty bad and the new behavior should at least be better in some cases.

return convertForksToBlockStarts(parallelForks);
// The result must be in reverse topological order, i.e. inner branches must be visited before outer branches.
return convertForksToBlockStarts(parallelForks).stream().sorted((pbs1, pbs2) -> {
if (pbs1.forkStart.getEnclosingBlocks().contains(pbs2.forkStart)) {
return -1;
} else if (pbs2.forkStart.getEnclosingBlocks().contains(pbs1.forkStart)) {
return 1;
}
return 0;
Comment on lines +448 to +454
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Comparator.html this is not “consistent with equals”. That is generally harmless when sorting a list (and I suppose also a stream) but I do not think you can reliably implement a topological sort this way. For example in jshell

import java.util.stream.*;
Stream.of("axx a ax".split(" ")).sorted((x, y) -> x.startsWith(y) ? 1 : y.startsWith(x) ? -1 : 0).collect(Collectors.joining(" "))

works (a ax axx) but on the input axx b a ax you get back the same list, whereas you might expect a b ax axx or similar.

Whether there is any observable bug in this function as a result is another question.

Copy link
Member Author

@dwnusbaum dwnusbaum Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yes, I can look into this.

Whether there is any observable bug in this function as a result is another question.

Perhaps, if you have more than 3 levels of nested parallel steps, or if you have multiple parallel step children under a single parallel step, although I wonder if the latter case would work regardless of this change because of other issues.

In most cases though this list should just have one or two elements, which should work ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From some quick testing, I have only been able to reproduce cases where the outermost parallel step is the second to last element before the sort, when it needs to always be the final element. Other than that, the pre-sort order is fine. Perhaps a more targeted fix could be made further up in the method, and it seems plausible that with some really unusual Pipelines there is some reachable case that I was not able to trigger, but I am not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would recommend at least leaving in a code comment warning that this logic would not work for exotic cases which may or may not be reachable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #288.

}).collect(Collectors.toCollection(ArrayDeque::new));
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -992,4 +992,39 @@ public void testParallelCorrectEndNodeForVisitor() throws Exception {
testParallelFindsLast(jobPauseSecond, "wait2");
testParallelFindsLast(jobPauseMiddle, "wait3");
}

@Test
public void inProgressParallelInParallel() throws Exception {
WorkflowJob p = r.createProject(WorkflowJob.class);
p.setDefinition(new CpsFlowDefinition(
"stage('MyStage') {\n" + // 3, 4
" parallel(\n" + // 5
" outerA: {\n" + // 8
" semaphore('outerA')\n" + // 11
" },\n" +
" outerB: {\n" + // 9
" echo('outerB')\n" + // 12
" },\n" + // 13
" outerC: {\n" + // 10
" parallel(\n" + // 14
" innerA: {\n" + // 16
" semaphore('innerA')\n" + // 18
" },\n" +
" innerB: {\n" + // 17
" echo('innerB')\n" +
" }\n" +
" )\n" +
" }\n" +
" )\n" +
"}\n", true));
WorkflowRun b = p.scheduleBuild2(0).waitForStart();
SemaphoreStep.waitForStart("outerA/1", b);
SemaphoreStep.waitForStart("innerA/1", b);
ForkScanner scanner = new ForkScanner();
sanityTestIterationAndVisiter(b.getExecution().getCurrentHeads());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the parallel step start node (5), as well as the stage body start node (4), the stage step start node (3), and the the flow start node (2) were visited twice.

With the right structure you could end up visiting almost the whole Pipeline twice.

SemaphoreStep.success("outerA/1", null);
SemaphoreStep.success("innerA/1", null);
r.assertBuildStatusSuccess(r.waitForCompletion(b));
sanityTestIterationAndVisiter(b.getExecution().getCurrentHeads());
}
}