
fix: Always restore MSR_IA32_TSC_DEADLINE after MSR_IA32_TSC #4666

Merged
5 commits merged into main on Jul 8, 2024

Conversation

roypat
Contributor

@roypat roypat commented Jul 5, 2024

When restoring MSRs, we currently do not enforce any sort of relative ordering. However, some MSRs have implicit dependencies on other MSRs being restored before them, and failing to fulfill these dependencies can result in incorrect VM behavior after resuming.

One example of such an implicit dependency between MSRs is the pair (MSR_IA32_TSC_DEADLINE, MSR_IA32_TSC). When restoring MSR_IA32_TSC_DEADLINE, KVM internally checks whether the restored value implies that the guest was waiting for the tsc_deadline timer to expire at the time it was paused for snapshotting. If so, it primes a timer (hardware or software, depending on support) on the host to make sure the guest will receive the expected interrupt after restoration. To determine whether this is needed, KVM looks at the guest's time stamp counter (TSC) and compares it with the requested tsc_deadline value. The value KVM reads for the guest's TSC depends on the value of MSR_IA32_TSC [6][7]. Thus, if MSR_IA32_TSC_DEADLINE is set before MSR_IA32_TSC is restored, this comparison will yield a wrong result, as the deadline value is compared with something uninitialized [5]. This can either result in KVM determining that the guest wasn't waiting for a timer expiry at the time of snapshotting, or cause it to schedule the timer interrupt too far into the future.

Note that the former is only a problem on AMD platforms, which do not support the TSC_DEADLINE feature at the hardware level. Here, KVM falls back to a software timer [1], which explicitly does not notify the vCPU if the deadline value is "in the past" [2][3]. The hardware timer used on other x86 platforms on the other hand always fires (potentially firing immediately if the deadline value is in the past) [4].
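The ordering sensitivity described above can be illustrated with a toy model. This is not KVM code; `VcpuState`, its fields, and the arming rule are simplified stand-ins for KVM's internal deadline-timer check:

```rust
// Toy model (not KVM's actual code) of why restore order matters: the
// deadline check compares against whatever guest TSC value is current
// at the moment the deadline MSR is written.
struct VcpuState {
    tsc: u64,
    timer_armed: bool,
}

impl VcpuState {
    fn restore_tsc(&mut self, value: u64) {
        self.tsc = value;
    }

    // Mimics KVM arming a timer only if the deadline is still in the future.
    fn restore_tsc_deadline(&mut self, deadline: u64) {
        self.timer_armed = deadline > self.tsc;
    }
}

fn main() {
    let snapshot_tsc = 1_000; // guest TSC captured in the snapshot
    let deadline = 2_000; // guest was waiting for this deadline

    // Wrong order: the deadline is restored while the TSC still holds a
    // large, uninitialized (host-derived) value, so it looks "in the past"
    // and no timer is armed -- the guest never gets its interrupt.
    let mut bad = VcpuState { tsc: 1_000_000, timer_armed: false };
    bad.restore_tsc_deadline(deadline);
    bad.restore_tsc(snapshot_tsc);
    assert!(!bad.timer_armed);

    // Correct order: restoring the TSC first makes the comparison meaningful.
    let mut good = VcpuState { tsc: 1_000_000, timer_armed: false };
    good.restore_tsc(snapshot_tsc);
    good.restore_tsc_deadline(deadline);
    assert!(good.timer_armed);
}
```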

This commit fixes the above by ensuring we always restore MSR_IA32_TSC before restoring MSR_IA32_TSC_DEADLINE. We achieve this by splitting the lists of MSRs that KVM gives us, moving all "deferred" MSRs that need to be restored "as late as possible" into one additional chunk. This splitting happens at snapshot creation time, to keep it off the hot path.
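A minimal sketch of this splitting, assuming a flat list of MSR indices. The constant values, function names, and chunking here are illustrative, not Firecracker's actual implementation:

```rust
// Hypothetical sketch of deferring order-sensitive MSRs at snapshot
// creation time. The MSR index value is from the Intel SDM; the
// partitioning scheme is illustrative.
const MSR_IA32_TSC_DEADLINE: u32 = 0x6e0;

/// MSRs whose restoration must happen after all others.
fn is_deferred(index: u32) -> bool {
    index == MSR_IA32_TSC_DEADLINE
}

/// Split a flat list of MSR indices into (normal, deferred) chunks,
/// preserving relative order within each chunk.
fn split_deferred(msrs: &[u32]) -> (Vec<u32>, Vec<u32>) {
    msrs.iter().copied().partition(|&idx| !is_deferred(idx))
}

fn main() {
    // Suppose KVM happens to list the deadline MSR first.
    let msrs = [0x6e0, 0x10, 0x1a0];
    let (normal, deferred) = split_deferred(&msrs);
    // MSR_IA32_TSC (0x10) lands in the normal chunk; the deadline is last.
    assert_eq!(normal, vec![0x10, 0x1a0]);
    assert_eq!(deferred, vec![0x6e0]);
}
```

At restore time, the normal chunk would be written first and the deferred chunk last, which guarantees the TSC is in place before the deadline comparison happens.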

Fixes #4099 (validated by running the reproducer script for 50 iterations on an m6a.metal and not observing the issue anymore)

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.


codecov bot commented Jul 5, 2024

Codecov Report

Attention: Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.

Project coverage is 82.12%. Comparing base (9103fba) to head (38d4d72).

Files Patch % Lines
src/vmm/src/vstate/vcpu/x86_64.rs 92.00% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4666   +/-   ##
=======================================
  Coverage   82.11%   82.12%           
=======================================
  Files         255      255           
  Lines       31261    31281   +20     
=======================================
+ Hits        25671    25689   +18     
- Misses       5590     5592    +2     
Flag Coverage Δ
4.14-c5n.metal 79.62% <92.85%> (+<0.01%) ⬆️
4.14-m5n.metal 79.61% <92.85%> (+<0.01%) ⬆️
4.14-m6a.metal 78.82% <92.85%> (+<0.01%) ⬆️
4.14-m6g.metal 76.63% <ø> (ø)
4.14-m6i.metal 79.60% <92.85%> (+<0.01%) ⬆️
4.14-m7g.metal 76.63% <ø> (ø)
5.10-c5n.metal 82.13% <92.85%> (+<0.01%) ⬆️
5.10-m5n.metal 82.12% <92.85%> (+<0.01%) ⬆️
5.10-m6a.metal 81.43% <92.85%> (+<0.01%) ⬆️
5.10-m6g.metal 79.41% <ø> (+<0.01%) ⬆️
5.10-m6i.metal 82.12% <92.85%> (+<0.01%) ⬆️
5.10-m7g.metal 79.41% <ø> (ø)
6.1-c5n.metal 82.13% <92.85%> (+<0.01%) ⬆️
6.1-m5n.metal 82.12% <92.85%> (+0.01%) ⬆️
6.1-m6a.metal 81.42% <92.85%> (+<0.01%) ⬆️
6.1-m6g.metal 79.40% <ø> (-0.01%) ⬇️
6.1-m6i.metal 82.11% <92.85%> (-0.01%) ⬇️
6.1-m7g.metal 79.41% <ø> (ø)

Flags with carried forward coverage won't be shown.

@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Jul 5, 2024
Contributor

@kalyazin kalyazin left a comment


  1. Well done resolving it!
  2. We should mention this in the changelog.

src/vmm/src/vstate/vcpu/x86_64.rs
@roypat
Contributor Author

roypat commented Jul 8, 2024

  1. Well done resolving it!

thanks!!

2. We should mention this in the changelog.

True, done :)

kalyazin
kalyazin previously approved these changes Jul 8, 2024
zulinx86
zulinx86 previously approved these changes Jul 8, 2024
Contributor

@zulinx86 zulinx86 left a comment


Well done, Patrick!!!

When restoring MSRs, we currently do not enforce any sort of relative
ordering. However, some MSRs have implicit dependencies on other MSRs
being restored before them, and failing to fulfill these dependencies
can result in incorrect VM behavior after resuming.

One example of such an implicit dependency between MSRs is the pair
(`MSR_IA32_TSC_DEADLINE`, `MSR_IA32_TSC`). When restoring
`MSR_IA32_TSC_DEADLINE`, KVM internally checks whether the value of this
restoration implies that the guest was waiting for the tsc_deadline
timer to expire at the time of being paused for snapshotting. If yes, it
primes a (either hardware or software depending on support) timer on
the host to make sure the guest will receive the expected interrupt
after restoration. To determine whether this is needed, KVM looks at the
guest's timestamp counter (TSC) and compares it with the requested
tsc_deadline value. The value KVM reads for the guest's TSC depends on
the value of MSR_IA32_TSC. Thus, if MSR_IA32_TSC_DEADLINE is set before
MSR_IA32_TSC is restored, this comparison will yield a wrong result (as
the deadline value is compared with something uninitialized). This can
either result in KVM determining the guest wasn't waiting for a timing
expiry at the time of snapshotting, or cause it to schedule the timer
interrupt too far into the future.

Note that the former is only a problem on AMD platforms, which do not
support the TSC_DEADLINE feature at the hardware level. Here, KVM falls
back to a software timer, which explicitly does not notify the vCPU if
the deadline value is "in the past". The hardware timer used on other
x86 platforms on the other hand always fires (potentially firing
immediately if the deadline value is in the past).

This commit fixes the above by ensuring we always restore MSR_IA32_TSC
before restoring MSR_IA32_TSC_DEADLINE. We realize this by splitting the
lists of MSRs that KVM gives us into one additional chunk containing all
"deferred" MSRs that need to be restored "as late as possible". This
splitting happens at snapshot creation time, to get it off the hot-path.

Fixes firecracker-microvm#4099

Signed-off-by: Patrick Roy <[email protected]>
add changelog entry about firecracker-microvm#4099

Signed-off-by: Patrick Roy <[email protected]>
zulinx86
zulinx86 previously approved these changes Jul 8, 2024
As the previous commit shows, the order of restoration of MSRs can be
important. Thus, we should not effectively randomly shuffle them
whenever we construct a VM. With this commit, we preserve the order in
which KVM tells us about MSRs in `KVM_GET_MSR_INDEX_LIST`, under the
assumption that if any dependencies exist, KVM will account for them as
part of the above IOCTL.

Signed-off-by: Patrick Roy <[email protected]>
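The commit above preserves the order in which KVM reports MSR indices rather than collecting them into an unordered set. A minimal sketch of order-preserving deduplication, assuming a flat index list; the function name and inputs are illustrative:

```rust
use std::collections::HashSet;

// Hypothetical sketch: deduplicate MSR indices while preserving the order
// KVM reported them in via KVM_GET_MSR_INDEX_LIST. Collecting directly
// into a HashSet would effectively shuffle them on every VM construction.
fn dedup_preserving_order(kvm_index_list: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::new();
    kvm_index_list
        .iter()
        .copied()
        // HashSet::insert returns false for duplicates, filtering them out
        // while the surviving elements keep their original relative order.
        .filter(|idx| seen.insert(*idx))
        .collect()
}

fn main() {
    let from_kvm = [0x10, 0x6e0, 0x10, 0x1a0];
    assert_eq!(dedup_preserving_order(&from_kvm), vec![0x10, 0x6e0, 0x1a0]);
}
```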
@pb8o pb8o merged commit 3f969b3 into firecracker-microvm:main Jul 8, 2024
7 checks passed
@roypat roypat mentioned this pull request Jul 9, 2024
9 tasks
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Processes get stuck after resuming VM from snapshot
5 participants