Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] Split the 'SW build & test' job into build and test jobs #19577

Merged
merged 2 commits into from
Nov 14, 2023

Conversation

jwnrt
Copy link
Contributor

@jwnrt jwnrt commented Sep 1, 2023

This PR attempts to cut down total CI length by splitting the "SW build and test" into "build" and "test" jobs to shorten the critical path.

Here's a Gantt chart of a typical CI run before this PR:

23-09-01_14:42:14

You can see that "software build & test" (SWB&T) takes the longest and is a dependency of the three FPGA tests (and a short packaging test).

The FPGA jobs only depend on the software build, not the test part. This PR split the build & test job so that the FPGA tests can start just after the build has finished.

@jwnrt jwnrt force-pushed the ci-split-build-and-test branch from 632356e to 84beeb8 Compare September 1, 2023 09:51
@pamaury
Copy link
Contributor

pamaury commented Sep 1, 2023

I you want to parallelize the SW tests with other tests (like CW310), probably you will need to move the SW tests to its own job, not just its own step within a job.

@jwnrt
Copy link
Contributor Author

jwnrt commented Sep 1, 2023

I you want to parallelize the SW tests with other tests (like CW310), probably you will need to move the SW tests to its own job, not just its own step within a job.

Thanks @pamaury, I'm still learning how Azure is structured.

I'll let this run finish to see how long build vs test takes. You're right we'll need to split them into multiple jobs and I think this will need a bit of work to get the build artefacts / bazel cache from the build job to the test job.

EDIT: it may also be the case that the ROM tests don't actually depend on the SW build artefacts, in which case we don't need to split this at all.

Update: build took 1h 9m, test took 1h 18m. Typical runs combined appear to take between 2h 30m and 3h.

@jwnrt jwnrt changed the title [ci] Split SW build and test [ci] Optimise critical path Sep 1, 2023
@jwnrt jwnrt force-pushed the ci-split-build-and-test branch 2 times, most recently from 5e4fd8f to cd315bc Compare September 11, 2023 09:25
@jwnrt jwnrt marked this pull request as ready for review September 19, 2023 13:20
@jwnrt jwnrt force-pushed the ci-split-build-and-test branch 2 times, most recently from cd315bc to b02d86f Compare September 19, 2023 13:27
Copy link
Contributor

@rswarbrick rswarbrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Should the commits be merged, though? (Or have I misunderstood? What happens with just the 1st one?)

@jwnrt
Copy link
Contributor Author

jwnrt commented Sep 19, 2023

Should the commits be merged, though?

Yes, I think maybe I created two commits by accident. I'll merge them once I get this working, thanks

@jwnrt jwnrt force-pushed the ci-split-build-and-test branch 4 times, most recently from f845743 to 075cf7b Compare September 25, 2023 08:37
@jwnrt jwnrt marked this pull request as draft September 26, 2023 08:35
@jwnrt jwnrt force-pushed the ci-split-build-and-test branch from 075cf7b to de0415c Compare September 26, 2023 10:17
@jwnrt
Copy link
Contributor Author

jwnrt commented Sep 26, 2023

Apologies for all the force pushes here, it seems the only realistic way to iterate on pipeline changes is to just keep pushing new commits. I'll mark as review when it's ready.

@jwnrt jwnrt force-pushed the ci-split-build-and-test branch 4 times, most recently from 27a6256 to 9fd240e Compare September 27, 2023 12:27
@jwnrt
Copy link
Contributor Author

jwnrt commented Oct 17, 2023

In the meantime, the new Bazel rule work done by @cfrantz and others has improved dependency tracking and dramatically reduced the runtime of SWB&T. Thank you for that!

SWB&T is no longer the huge outlier it used to be. Here's what a typical CI run looks like now:

23-10-17_13:57:44

This PR could still be useful (once fixed) as we could still move those ROM tests that are on the right of the graph to begin after the SW build rather than the SW test. Those ROM tests still depend on the CW310 bitstream though, which will probably be the new critical path.

Something like this:

23-10-17_13:57:44-edited

Above: SWB&T is split into a build and test step, not sure what the ratios are with the new improved dependency tracking. The bitstream build times become the new limiting factor.

@jwnrt jwnrt force-pushed the ci-split-build-and-test branch 2 times, most recently from 4a645e2 to 58e3f4e Compare November 10, 2023 20:03
@jwnrt jwnrt changed the title [ci] Optimise critical path [ci] Split the 'SW build & test' job into build and test jobs Nov 10, 2023
@jwnrt jwnrt requested a review from pamaury November 10, 2023 22:26
@jwnrt jwnrt marked this pull request as ready for review November 10, 2023 22:26
Copy link
Contributor

@pamaury pamaury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really nice @jwnrt :)

The test step will be moved to another job depending on this one.

Signed-off-by: James Wainwright <[email protected]>
This is now a separate job that others don't have to depend on.

Signed-off-by: James Wainwright <[email protected]>
@jwnrt jwnrt force-pushed the ci-split-build-and-test branch from 58e3f4e to 0aff3c0 Compare November 14, 2023 08:38
@jwnrt
Copy link
Contributor Author

jwnrt commented Nov 14, 2023

Here's a Gantt of the CI with these changes (well, before the most recent no-op force push):

image

This run did not use cached bitstreams, so those ROM tests started quite late. I'm hoping to get a CI run that uses cached bitstreams to show what that looks like. EDIT: nevermind, changes to the azure-pipelines.yml file trigger a bitstream rebuild (not sure why) so we won't see that in this PR.

The "CW310 ROM tests" job is now the standout in the critical path. Although it looks quite long in that chart, the duration matches the average we usually get.

@jwnrt jwnrt merged commit 2e519a5 into lowRISC:master Nov 14, 2023
31 checks passed
@jwnrt
Copy link
Contributor Author

jwnrt commented Nov 14, 2023

Here's a run which hit the bitstream cache:

Before

1h 49m

23-11-14_15:44:07

After

1h 34m

23-11-14_16:03:05


Was hoping for more than 15 minutes, but still something! Weirdly, the build job + test job take longer than the SW B&T job used to. Maybe some part of the cache isn't being passed between the two? I will look into it soon since we're still wasting cycles rebuilding even if it's not affecting the critical path.

@pamaury
Copy link
Contributor

pamaury commented Nov 14, 2023

I suspect the two steps are duplicating the otp image builds. I am not sure why they are not cached, we need to dig into to understand that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants