-
Notifications
You must be signed in to change notification settings - Fork 792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ci] Split the 'SW build & test' job into build and test jobs #19577
Conversation
632356e
to
84beeb8
Compare
I you want to parallelize the SW tests with other tests (like CW310), probably you will need to move the SW tests to its own job, not just its own step within a job. |
Thanks @pamaury, I'm still learning how Azure is structured. I'll let this run finish to see how long build vs test takes. You're right we'll need to split them into multiple jobs and I think this will need a bit of work to get the build artefacts / bazel cache from the build job to the test job. EDIT: it may also be the case that the ROM tests don't actually depend on the SW build artefacts, in which case we don't need to split this at all. Update: build took 1h 9m, test took 1h 18m. Typical runs combined appear to take between 2h 30m and 3h. |
5e4fd8f
to
cd315bc
Compare
cd315bc
to
b02d86f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! Should the commits be merged, though? (Or have I misunderstood? What happens with just the 1st one?)
Yes, I think maybe I created two commits by accident. I'll merge them once I get this working, thanks |
f845743
to
075cf7b
Compare
075cf7b
to
de0415c
Compare
Apologies for all the force pushes here, it seems the only realistic way to iterate on pipeline changes is to just keep pushing new commits. I'll mark as review when it's ready. |
27a6256
to
9fd240e
Compare
In the meantime, the new Bazel rule work done by @cfrantz and others has improved dependency tracking and dramatically reduced the runtime of SWB&T. Thank you for that! SWB&T is no longer the huge outlier it used to be. Here's what a typical CI run looks like now: This PR could still be useful (once fixed) as we could still move those ROM tests that are on the right of the graph to begin after the SW build rather than the SW test. Those ROM tests still depend on the CW310 bitstream though, which will probably be the new critical path. Something like this: Above: SWB&T is split into a build and test step, not sure what the ratios are with the new improved dependency tracking. The bitstream build times become the new limiting factor. |
4a645e2
to
58e3f4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's really nice @jwnrt :)
The test step will be moved to another job depending on this one. Signed-off-by: James Wainwright <[email protected]>
This is now a separate job that others don't have to depend on. Signed-off-by: James Wainwright <[email protected]>
58e3f4e
to
0aff3c0
Compare
Here's a Gantt of the CI with these changes (well, before the most recent no-op force push): This run did not use cached bitstreams, so those ROM tests started quite late. I'm hoping to get a CI run that uses cached bitstreams to show what that looks like. EDIT: nevermind, changes to the azure-pipelines.yml file trigger a bitstream rebuild (not sure why) so we won't see that in this PR. The "CW310 ROM tests" job is now the standout in the critical path. Although it looks quite long in that chart, the duration matches the average we usually get. |
Here's a run which hit the bitstream cache: BeforeAfterWas hoping for more than 15 minutes, but still something! Weirdly, the build job + test job take longer than the SW B&T job used to. Maybe some part of the cache isn't being passed between the two? I will look into it soon since we're still wasting cycles rebuilding even if it's not affecting the critical path. |
I suspect the two steps are duplicating the otp image builds. I am not sure why they are not cached, we need to dig into to understand that. |
This PR attempts to cut down total CI length by splitting the "SW build and test" into "build" and "test" jobs to shorten the critical path.
Here's a Gantt chart of a typical CI run before this PR:
You can see that "software build & test" (SWB&T) takes the longest and is a dependency of the three FPGA tests (and a short packaging test).
The FPGA jobs only depend on the software build, not the test part. This PR split the build & test job so that the FPGA tests can start just after the build has finished.