Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bazel] allocate additional core per verilator test to allow for harness and overhead #16915

Merged

Conversation

drewmacrae
Copy link
Contributor

@drewmacrae drewmacrae commented Dec 21, 2022

An alternative to #16913 to prevent harnesses from getting pre-empted by a simulation load.

Running 17 verilator tests and 1 fpga test on 72 cores will cause problems unless the test harnesses are prioritized above the verilated model. An alternative to this prioritization is to recognize that verilated tests require 4 cpus for the simulation alone and further resources for appropriate overhead. Empirically 17/72ths of a core is too little to prevent issues but 16/72nds is safe, but bazel requires cpu: (it parses up to the decimal and stops without throwing an error).

Won't fix: #16831 because it uses local_test_job based scheduling that overrides resource constrained scheduling. For #16831, similar methods would cause a 30% performance hit.

Signed-off-by: Drew Macrae [email protected]

rules/opentitan_test.bzl Outdated Show resolved Hide resolved
@drewmacrae
Copy link
Contributor Author

drewmacrae commented Dec 21, 2022

I think a 20% performance hit might be worth the simplicity this offers relative to #16913

@dmcardle
Copy link
Contributor

I'm kind of surprised that you can allocate a fractional number of CPUs. What does that mean? Is this the best reference for the cpu:n tag? https://bazel.build/reference/test-encyclopedia#other-resources

@drewmacrae
Copy link
Contributor Author

I was confused, the resources folks at bazel want the extra resources to be a float, but the cpus appears to be parsed as an int, now I need to figure out why this appeared to work. :/

@drewmacrae
Copy link
Contributor Author

Looks like it runs 18 simultaneously, so it's parsing as 4, not 4.5. There's a bit of validation code that appears to be having some trouble with this.

@drewmacrae
Copy link
Contributor Author

@drewmacrae drewmacrae changed the title [bazel] allocate 4.5 cores per verilator test to allow for harness and overhead [bazel] allocate 5 cores per verilator test to allow for harness and overhead Dec 22, 2022
@drewmacrae drewmacrae changed the title [bazel] allocate 5 cores per verilator test to allow for harness and overhead [bazel] allocate additional core per verilator test to allow for harness and overhead Dec 22, 2022
@drewmacrae drewmacrae force-pushed the conservative_resource_allocation branch from 5fa02a0 to e99a6f2 Compare December 22, 2022 14:30
@drewmacrae
Copy link
Contributor Author

ci/scripts/run-verilator-tests.sh overrides the effect of this commit so CI verilator jobs wont be protected by it. They would be safer if they limited --local_test_jobs to 4 or allocated 5 CPUs per test runner instead of 4.

@drewmacrae drewmacrae force-pushed the conservative_resource_allocation branch from e99a6f2 to e6bd406 Compare December 22, 2022 14:45
@drewmacrae drewmacrae marked this pull request as ready for review December 22, 2022 15:15
@@ -152,7 +152,7 @@ def verilator_params(
"@//hw:verilator",
"@//hw:fusesoc_ignore",
]
required_tags = ["verilator", "cpu:4"]
required_tags = ["verilator", "cpu:5"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably ought to depend on the number of threads for which the verilated model is compiled, but it's fine for now.

Copy link
Contributor Author

@drewmacrae drewmacrae Dec 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely ought to, but the tags don't like being updated when that information is known to Bazel as far as I can tell. So I got stuck trying to implement that.

Copy link
Contributor

@dmcardle dmcardle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w/ one nit! Thanks, Drew!

.bazelrc Outdated Show resolved Hide resolved
…overhead

An alternative to lowRISC#16913 to prevent harnesses from getting pre-empted by
a simulation load.

We also stop overriding resource based allocation by removing
--local_test_jobs arg and update documentation

Signed-off-by: Drew Macrae <[email protected]>
@drewmacrae drewmacrae force-pushed the conservative_resource_allocation branch from e6bd406 to 441b09c Compare December 22, 2022 18:02
@drewmacrae drewmacrae added the Status:Ready to merge PR is ready to be merged by a committer. label Dec 22, 2022
@drewmacrae drewmacrae merged commit 5603629 into lowRISC:master Dec 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status:Ready to merge PR is ready to be merged by a committer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ci/test] sw_strap_value_sim_verilator fails periodically in CI
3 participants