-
Notifications
You must be signed in to change notification settings - Fork 6.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
boards: arm: mps2_an385: Enable QEMU icount mode #24811
boards: arm: mps2_an385: Enable QEMU icount mode #24811
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind updating the commit message to explain what the various arguments to -icount
mean.
well, one should refer to the QEMU documentation for that ... the semantic of what it does is already explained in the commit message: "decouple the host clock from the emulated guest clock." |
This commit enables the QEMU icount mode for `mps2_an385`, in order to decouple the host clock from the emulated guest clock. This prevents guest timing instability from causing test failures when the host CPU load is very high. The icount `shift` value of 7 was empirically chosen to allow the tests to complete in both realistic and reasonable amount of time. The following are quick notes on the parameters used: * -icount shift=7: Execute one instruction every 128ns of virtual time * -icount align=off: Do not synchronise the host and guest clocks * -icount sleep=off: Advance virtual time without sleeping/waiting * -rtc clock=vm: Isolate the guest RTC time from the host Signed-off-by: Stephanos Ioannidis <[email protected]>
I don't think we need to duplicate what's explained in the QEMU documentation. |
6315d22
to
f98a882
Compare
Added quick notes on the parameters used; for more details, one should reference the QEMU documentation. |
I guess I meant more why |
Gotcha, yeah that's useful. |
just see this PR, but why the shift value is 7? |
@wentongwu the shift value of 5 was making tests run too slow (e.g. 1000ms sleep was taking over 5000ms IIRC) and causing them to time out. 7 turned out to be the closest to the expected time. |
@stephanosio The virtual time of shift = 5 is the most closet clock compared with the real platform, there will be failed cases when running in heavy loaded host with shift = 7 (e.g. with 10 different sanity-check processes running at the same time on 10 different Zephyr source code folders). But it's true as I said in #14173 comment that it will take long wall time to complete the tests if shift value is low. So I add some changes in sanity-check in #22904 to calculate QEMU's cpu time. |
Since the guest does not interact with external time sources and there are no external time constraints, the shift value used does not really matter -- we should choose a value that is the most advantageous for testing. Moreover, the "virtual clock" produced by the shift value of 5 is also off by tens of percent from the actual clock (25MHz).
I am pretty sure sanitycheck is not meant to be run that way. If you run 10 instances of anything that is supposed to utilise 100% of CPU, that will likely fail from various complications (e.g. timeout). I have tested sanitycheck with 32 parallel "jobs" (equal to the CPU thread count) and did not see any failures; the CPU load was at 100% for the entire time. |
@stephanosio did you attempt/run into any issues making the same change to mps2_an521, qemu_cortex_m3, qemu_cortex_m0, or qemu_cortex_a53? Seems like we ought to apply this to all ARM QEMU targets. |
@andrewboie It does seem that |
@stephanosio this issue actually appears to affect, to varying degrees, all QEMU targets. See #14173 |
@andrewboie sure, I meant another "major" offender :) I locally ran sanitycheck for As for the rest of ARM platforms, I was thinking @wentongwu's #22904 could address them, but if he does not have a short-term plan to get that PR in, then I could look into them as well. |
This fixes the infamous
mps2_an385
-related "intermittent" CI failures.I have locally run
mps2_an385
sanitycheck to verify that this fixes the current problem -- it passed 10/10 times. Without this fix, it is practically impossible to get the sanitycheck to pass in the first run.https://gist.github.com/stephanosio/03c38bb39deca2efa96eda6bc68de5b2
Provides a partial fix for #14173.