runtime: network blips during concurrent GC #20457
Comments
\cc @aclements |
I note that the |
I forgot to add that I instrumented the Go runtime and can see lots of calls to |
Linux shows similar behavior: I'm not sure what to make of the 3 idle procs (5, 6 and 7). |
Probably unrelated, but #20307 also observed that GC had a detrimental effect on goroutine scheduling. |
@petermattis if you can reproduce easily, would you mind trying out tip (and maybe 1.7.5) to see whether anything has changed? |
@josharian Tip as of The networking blip is still there during GC, but it is no longer precisely 10ms. The network blips during sweep seem to be much reduced: |
I'm reasonably convinced that what I'm seeing is CPU starvation of my load generator. I tweaked the output of
The I tried disabling idle GC workers which significantly reduced the CPU spikes during GC, but made latencies much worse: In the above trace, CPU usage during marking is much lower than stock go1.8.1:
The downside to removing the idle GC workers is that the GC periods are about twice as long. Seems like there is a high-level question to be answered about how much CPU GC should use when the process is not CPU limited. |
Dup of #17969 ("aggressive GC completion is disruptive to co-tenants") then? |
Yep, that seems like the same issue I'm seeing. |
You're right that at the moment it's possible for not all idle Ps to help with GC. This is #14179 (which we fixed once, but the fix was broken, so then we unfixed it).
Yes. :) But I don't know what the answer is. Ideally we'd get the OS scheduler involved, since it's the only thing that knows what CPU resources are really idle. But (right now) we need global coordination between the GC workers, so we can't just run them at OS idle priority. Thanks for all the traces and debugging! Closing as a dup of #17969. |
What version of Go are you using (go version)?
go version go1.8.1 darwin/amd64
What operating system and processor architecture are you using (go env)?
What did you do?
While investigating tail latencies in CockroachDB, I noticed network blips during concurrent GC. The trace below is from a single-node cockroach cluster with a single client sending it requests as fast as possible. Note that the load is modest. My test machine is ~75% idle while gathering these traces.
trace.8.out.zip
Every time GC runs, there are no networking events for ~10ms, which is quite close to the length of the GC run itself. This behavior occurs very quickly after starting the cluster and load generator. The heap size is modest (O(100 MB)).
Here is a zoom in on the first GC run from the above trace:
In addition to the blip while GC is running, there is a smaller blip after each GC run while sweeping. The image below shows G611 being called on to assist in sweep work from 62ms-69ms in the trace.
Note that I'm running both the cockroach process and the load generator (kv) on the same machine. Are the GC runs in the cockroach process starving the kv process of CPU? Running both processes with GOMAXPROCS=4 (I'm using an 8 core machine) shows much better behavior.
trace.4.out.zip