[ci] CI critical path tracking issue #20350

Open
jwnrt opened this issue Nov 15, 2023 · 3 comments
Labels: Component:CI, Help Wanted, Type:Task

jwnrt commented Nov 15, 2023

Description

This issue is for tracking work towards optimising the critical path of our CI runs to reduce the start-to-end time.

Data

These dashboards are useful for analysing the critical path:

  • Job Explorer - enter the ID of an Azure Pipelines run (the ?buildid= parameter in the run's URL) to see its Gantt chart.
  • Total CI Duration - time from creation to finish of all successful master-branch CI runs.
  • CI Dashboard - tracks the durations of individual jobs and tests.
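
For scripting against the Job Explorer, the build ID can be pulled out of a run URL rather than copied by hand. This is a minimal sketch using only the Python standard library; the function name and the example URL are hypothetical, and the parameter name is matched case-insensitively because the exact casing in the URL is an assumption here.

```python
from urllib.parse import urlparse, parse_qs


def build_id_from_url(run_url):
    """Return the numeric build ID from an Azure Pipelines run URL."""
    query = parse_qs(urlparse(run_url).query)
    # Match the parameter name case-insensitively (buildid / buildId).
    for key, values in query.items():
        if key.lower() == "buildid":
            return int(values[0])
    raise ValueError(f"no build ID in URL: {run_url}")


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    example = "https://dev.azure.com/example-org/example-project/_build/results?buildId=12345&view=results"
    print(build_id_from_url(example))
```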

You can view these dashboards by signing in with an OpenTitan account. Apologies if you're reading this and don't have one.

Analysis

Typical CI runs looked like this a few months ago (3+ hours):

image

Among these jobs are:

  • Bitstream builds which take about an hour each and run in parallel to one another.
  • SW Build & Test which takes by far the longest at around 2 hours.
  • FPGA tests over on the right, which depend on both bitstreams and SWB&T.

Progress

Some improvements so far:

  1. [bazel] Rewrite binary & test rules #19650
    • Improved dependency tracking has cut the SWB&T job from around 2 hours to ~50m.
  2. [ci] Address @bitstreams//BUILD.bazel nullifying action cache #20307
    • Fixes that came out of investigating the bitstream cache give us more cache hits, avoiding 1-hour rebuilds.
  3. [ci] Split the 'SW build & test' job into build and test jobs #19577
    • SWB&T was split into separate build and test jobs so that FPGA tests can depend on just the build and not the tests. This saves ~15 minutes when the bitstream cache hits.

Typical CI runs now look like this (1h 34m):

image

Further work

The critical path is now:

Quick lint --+--> bitstreams (if not cached) --+--> CW310 ROM tests
              \                               /
               '--> SW build (if cached) ----'

Work is ongoing to improve bitstream cache hits further, and I suspect the SW build job isn't using as many cached artifacts as it could, but both of these need investigation.
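
The critical path in the diagram above is the longest chain of dependent jobs, so it can be recomputed whenever job durations change. The sketch below is a generic longest-path calculation over a job graph, not how the dashboards work. The job names mirror the diagram, the 10-minute lint and 60-minute bitstream figures come from this issue, and the remaining durations (SW build, CW310 ROM tests, cached bitstream load) are placeholder assumptions.

```python
from functools import lru_cache


def critical_path(durations, deps):
    """Return (total_minutes, [jobs]) for the longest dependency chain."""

    @lru_cache(maxsize=None)
    def finish(job):
        # Longest path ending at `job`: the slowest prerequisite chain
        # plus the job's own duration.
        best_time, best_path = 0, ()
        for dep in deps.get(job, ()):
            dep_time, dep_path = finish(dep)
            if dep_time > best_time:
                best_time, best_path = dep_time, dep_path
        return best_time + durations[job], best_path + (job,)

    total, path = max(finish(job) for job in durations)
    return total, list(path)


# Dependencies as drawn in the diagram.
DEPS = {
    "bitstreams": ["quick_lint"],
    "sw_build": ["quick_lint"],
    "cw310_rom_tests": ["bitstreams", "sw_build"],
}

# Minutes. sw_build and cw310_rom_tests are placeholder assumptions.
UNCACHED = {"quick_lint": 10, "bitstreams": 60, "sw_build": 30, "cw310_rom_tests": 20}
# Assumed 5-minute load when the bitstream cache hits (placeholder).
CACHED = dict(UNCACHED, bitstreams=5)

if __name__ == "__main__":
    print(critical_path(UNCACHED, DEPS))
    print(critical_path(CACHED, DEPS))
```

With these placeholder numbers the path runs through the bitstream build on a cache miss and through the SW build on a cache hit, which is why both branches need attention.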

The Quick Lint job takes ~10 minutes, 5 of which are spent on a documentation build. I'm moving that build to Slow Lints in #20339.

jwnrt added the Type:Task, Component:CI, and Help Wanted labels on Nov 15, 2023

jwnrt commented Nov 16, 2023

If you're reading this issue and have ideas on how to reduce the duration of the critical path, it would be great to hear!

johngt commented Nov 16, 2023

Assigning @charles-at-lowrisc - just for awareness - not necessarily to work on this.

jwnrt commented Jan 18, 2024

Some updates:

  • We're working on getting the Bazel cache working again. This is tracked in [ci] Bazel cache not being used #20844.
  • Now that RTL changes are unblocked and we're rebuilding bitstreams again, we should probably move the check that decides whether to build (or load from cache) off of our bitstream builder pool. Bitstream jobs can queue for a long time on that pool just to load from the cache.
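
One way to picture the second point: a cheap agent computes a cache key from the bitstream inputs and only sends the job to the builder pool on a miss. The sketch below illustrates that idea only. How the real bitstream cache is keyed is not described in this issue, so the hashing scheme, function names, and return values are all assumptions.

```python
import hashlib


def cache_key(inputs):
    """Hash a mapping of {path: file bytes} into a stable hex key."""
    digest = hashlib.sha256()
    # Sort by path so the key does not depend on dictionary order.
    for path in sorted(inputs):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(inputs[path])
        digest.update(b"\0")
    return digest.hexdigest()


def plan_bitstream_job(inputs, cached_keys):
    """Decide, without a builder-pool agent, where the job should run."""
    key = cache_key(inputs)
    if key in cached_keys:
        # Cache hit: any general-purpose agent can fetch the artifact.
        return "load_from_cache", key
    # Cache miss: only now queue on the (scarce) bitstream builder pool.
    return "build_on_builder_pool", key


if __name__ == "__main__":
    # Hypothetical input files, for illustration only.
    rtl = {"hw/top.sv": b"module top; endmodule\n"}
    known = {cache_key(rtl)}
    print(plan_bitstream_job(rtl, known)[0])
    print(plan_bitstream_job({"hw/top.sv": b"module top2; endmodule\n"}, known)[0])
```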

24-01-18_19:53:44
