runtime: problems with rdtsc in VMs moving backward #8976
Comments
On older Intel (or possibly only AMD, I can't remember) multi-core processors, RDTSC is not kept synchronized between different cores. The different cores increment their counters independently and are only periodically synchronized. My understanding is that the Linux kernel has some moderately complex code to ensure that RDTSC always gives monotonically increasing results even when called from different cores. Presumably the Solaris kernel does not have that code. |
Sounds like we should probably audit our code to make sure that we're not depending on monotonic time. I just fixed the blocking profile, but I'm sure there are others. Can we move Solaris to gethrvtime instead of RDTSC? Do any of our other OSes have this problem? I would imagine we can cope with an RDTSC that goes backwards a little bit, but if it is totally unsynchronized we're in trouble. |
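As an illustration of "coping with RDTSC that goes backwards a little bit", here is a minimal sketch of a tick delta that is clamped to zero instead of going negative. The name `elapsedTicks` is hypothetical and is not taken from the Go runtime; it only shows the shape of a caller that does not assume monotonic ticks.

```go
package main

import "fmt"

// elapsedTicks returns end-start, or 0 if the counter appears to have
// run backward between the two readings (for example because the second
// reading was taken on a different core).
// Hypothetical helper for illustration; not part of the Go runtime.
func elapsedTicks(start, end int64) int64 {
	if end < start {
		return 0
	}
	return end - start
}

func main() {
	fmt.Println(elapsedTicks(1000, 1500)) // counter moved forward
	fmt.Println(elapsedTicks(1500, 1000)) // counter moved backward, clamped
}
```

Clamping hides small regressions but does nothing for counters that are totally unsynchronized, which is the case the comment above worries about.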
In a previous job I spent a lot of time on timekeeping stuff. I'm of the mind that for newer Intel CPUs (basically all of the 64-bit CPUs) RDTSC _can_ be guaranteed to at least be monotonically increasing at a given rate, irrespective of changes in power management and clock speeds. This is a good thing. However, when you have multiple cores on a die, or multiple CPU packages in a machine, RDTSC is not guaranteed to report a monotonically increasing value. That is, if you call RDTSC and then your thread is rescheduled onto another core, the second RDTSC instruction _may_ report a value which is slightly behind the first (where "slightly" is on the scale of a small % of a counter moving in the gigahertz range). My knowledge of this is about three years out of date, but when I looked into the Linux kernel code, it was not possible to set the TSC counter accurately, so the Linux kernel could only zero the counter on each CPU as it came online, which gets the counters close (again, within a small % of something free-running in the GHz range). At the time, the PPTP software I was using ran in kernel mode and pinned itself to one CPU core, thereby eliminating the possibility that TSC counters may drift between cores or may not be monotonic with respect to other cores. This also explains why the build is failing on the SmartOS Solaris builder but not on the sol11 builder: the former is running on real hardware, a big machine with multiple sockets, whereas the latter is running inside VMware, which goes to some lengths to 'fix' RDTSC by virtualising the instruction. |
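To make "slightly behind" concrete, the sketch below scans a sequence of tick readings in the order they were taken and reports how often, and by how much, a reading fell below its predecessor. The function name `backwardSteps` and the sample values are made up for illustration; real readings would come from whatever tick source is under test.

```go
package main

import "fmt"

// backwardSteps walks tick samples in the order they were taken and
// returns the number of times a sample was smaller than the one before
// it, together with the largest such regression.
// Hypothetical helper for illustration only.
func backwardSteps(samples []int64) (count int, maxRegress int64) {
	for i := 1; i < len(samples); i++ {
		if d := samples[i-1] - samples[i]; d > 0 {
			count++
			if d > maxRegress {
				maxRegress = d
			}
		}
	}
	return count, maxRegress
}

func main() {
	// Made-up readings: two of them step backward (220->210, 340->300).
	samples := []int64{100, 220, 210, 340, 300, 450}
	n, worst := backwardSteps(samples)
	fmt.Println(n, worst)
}
```

A check like this only sees regressions between consecutive readings, so it says nothing about which core each reading came from.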
Comment 7 by [email protected]: The other posters are correct; on Solaris, you should use gethrvtime() or gethrtime() as appropriate. It is guaranteed monotonic, and is optimised (along with gettimeofday() in newer releases) to be done entirely in userspace avoiding the kernel context switch. |
This also badly affects runtime tracer. |
@alexbrainman On my 6 core AMD, I tried running the following: I've tried running these on the following VM: In the different combinations, they all exhibited non-monotonic behavior of the supposedly monotonic time if the core count was 2 or more. If I only allocated a single core to the VM, the pprof tests pass on VirtualBox, but they still fail on VMware. I have not observed this issue running on bare-metal 4 core AMD Windows 7 boxes. |
Virtual machines really cannot tell time. I don't expect RDTSC to work.
|
@davecheney If that's the expectation, should we define an IN_VM or NO_MONOTONIC_TICKS environment variable for builders that are running in VMs? |
FWIW, we already use |
@kardianos, do you observe non-monotonic RDTSCs when running Windows
on that 6-core AMD box physically?
I imagine that the VMM will just expose the underlying timestamp counter of the
physical CPUs, so I can't understand why the timestamps are not monotonic
in the VM whereas on the physical machine they are.
Does the Windows VM detect that it's running on AMD (virtual) CPUs?
|
@minux When I run Windows XP SP3 on the 6 core box physically, I observe monotonic time; all tests pass repeatably. The Windows VM detects the make and model of my host AMD CPUs. |
Can someone explain why this is important? |
Block profile and runtime tracer use RDTSC. |
If the block profile and runtime tracer require monotonic time, why don't they use the OS-specific APIs that return monotonic time? |
OS timers have too low precision. |
From my comment on https://go-review.googlesource.com/#/c/8736/2 :
|
The TSC counter is not monotonic unless your process is pinned to a CPU.
|
It's not just VMs, either. Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz |
I think this issue is old and can be closed. The profiler issues have largely been solved on Solaris and Windows by combining a stable sort with a manual sequence order to work around this. |
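The thread does not show the actual fix, so the following is only a sketch of the general idea, under the assumption that each event carries a tick value plus a manually assigned sequence number. Events are stably sorted by ticks, and the sequence number breaks ties so that events with equal timestamps keep a deterministic order. The `event` type, its fields, and `orderEvents` are hypothetical names, not the runtime's.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical trace record: a tick timestamp plus a
// manually assigned sequence number.
type event struct {
	ticks int64
	seq   uint64
	name  string
}

// orderEvents sorts events by ticks and uses seq as a tie-breaker.
// The stable sort keeps the original order for events that compare
// equal on both fields.
func orderEvents(evs []event) {
	sort.SliceStable(evs, func(i, j int) bool {
		if evs[i].ticks != evs[j].ticks {
			return evs[i].ticks < evs[j].ticks
		}
		return evs[i].seq < evs[j].seq
	})
}

func main() {
	evs := []event{
		{ticks: 200, seq: 2, name: "b"},
		{ticks: 100, seq: 3, name: "c"},
		{ticks: 100, seq: 1, name: "a"},
	}
	orderEvents(evs)
	for _, e := range evs {
		fmt.Print(e.name, " ")
	}
	fmt.Println()
}
```

This handles ties only; it does not repair events whose ticks genuinely went backward across cores, which is the residual problem raised in the follow-up comments.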
@kardianos @bradfitz This bug is absolutely not fixed, I can trigger it just as before with Go 2f76c19. Details for that specific scenario are still at #16755 |
(Well, I guess that depends on your definition of "this bug". Over-reliance on RDTSC to be monotonic across cores still definitely exists.) |
Yes, Solaris is still affected. Let's continue in #16755. |