-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: multi-ms sweep termination pauses (second edition) #42642
Comments
Thank you for filing this bug @rafael, welcome to the Go project, and great to catch you here! I shall kindly loop in the runtime crew @dr2chase @mknyszek @aclements @randall77 @cherrymui. |
That looks a lot to me like #10958 which should have been fixed by #24543... but it could be one of the points listed at #36365. Do you happen to know if your service is generating a lot of fairly large allocations and therefore doing a lot of zeroing in the runtime? That's one place where the scheduler still can't quite preempt that we've seen is still an issue for other folks (having trouble finding the GitHub issue for it). What are the blue and orange goroutines doing during that "STW"? |
Several pause-causes are described in #27732 ; you might want to give that a look, and I'll stare hard at you traces for a bit. We fixed some of those hiccups, but not all. |
Thanks @odeke-em! Great to see you over here 🙌 @mknyszek / @dr2chase thanks for the pointers. I will start looking at those to see if it could explain what we are seeing.
What would be considered big? This is doing a fairly large of allocations but most of them seem small to me. I have uploaded a heap profile here so you can take a look. This is the overall amount of allocation reported by this host. It is important to call out that we are trying to move a lot of data. During some of these tests we are pushing 200MB/s of data through the network:
It's always kind of similar. From the application perspective, around these pauses what I've seen is that it's either reading from a unix socket or marshaling data into protocol buffers. It's always stopping at a malloc call, but I'm assuming this is just where golang was able to stop the goroutine. Here detailed example from a bigger pause: The blue here is reading from a unix socket: Then for the purple one, this is the end stack trace: |
The fact that the stack trace has a I suspect that these large buffers aren't showing up in the heap profile that you shared because that's an |
Thinking how can we instrument this. Is there something in the golang profiling toolkit that I can leverage to confirm this? |
@mknyszek was going to make a suggestion, decided to make a CL instead. Incoming.... Is a megabyte something that we can zero "quickly"? That was the lump size I chose. I tested it with 1k, of course. |
Change https://golang.org/cl/270943 mentions this issue: |
Anecdotally, after @mknyszek comment about large buffers, I tried a change in Vitess codebase that seemed to have helped. I need to gather more data to confirm. I'll post more info about this tomorrow. |
The second version of the CL above might also help; it addresses one instance of uninterruptibly zeroing large buffers that would have caused the traces you provided. |
I used the stats from Memstats by size allocations and this is the distribution:
Overall, it doesn't look like the service is doing large allocations. Is there something I'm still missing here? |
@rafael It's also still possible this is something else entirely, and that we're just getting "lucky" and the first possible preemption is at an allocation. We may need to just instrument the runtime to try to confirm this. |
Cool! Let me do that. FWIW, I also manually instrumented the area of the vitess codebase were most of the allocations are happening in this program and also didn't notice large allocations. |
@rafael Cool! Thanks. If I'm reading this right, the following functions produce 6.12 MiB allocations:
The top two appear at the top of the goroutine stacks you shared above. This is pretty good evidence that the non-preemptible allocation (and zeroing) is to blame. It would be nice to try and get closer to understanding whether its zeroing specifically that's the problem, or the operation as a whole. Could you run |
Oh, oh this is interesting indeed! Here the results of the benchmark:
|
I have one more insight to share. I think I have confirmed that indeed this 6MB allocation is the one to blame for the long pauses. I did some tests where I'm not allowing allocations of these size and the long GC pauses go away. Here a before, during and after the test: Basically, P75 goes from 2ms to less than 1ms. @mknyszek is the benchmarck is telling you anything about the zeroing problem? I don't have a point of reference there, so not sure how to interpret the results 😅 |
Any interest/ability in trying that CL? The latest version is good (it breaks up zeroing of non-pointer allocations into 1MiB chunks, correctly). There are other such loops to consider (slice copying, which includes if one is reallocated), but that is probably the one biting you. |
For sure, I should be able to test it. Do you have a canonical guide I can follow to compile this patch locally? |
I guess it should be this: https://golang.org/doc/install/source |
Thanks for the update! The benchmark results are interesting; it's probably worth noting that the I wrote a little benchmark for the large allocation hot path and I'm getting like 500ns at tip (which is a bit faster than Go 1.15 but on the order of tens of nanoseconds) on my VM so it's probably not that. I thought that it could be getting stuck in sweeping, but that's not it either since it would show up in the execution trace (we have explicit tracepoints for that). Regarding your comment above (#42642 (comment)), I have a couple questions:
|
@mknyszek, for sure. Thank y'all for all the help here! Answers to the questions:
In this test that we are running, there was a query with a single row that returns 6MB of data. In the tablet implementation, this ends up being a single allocation. I stopped sending this query to the tablet. So they never get to MySQL and we never have to marshal/unmarshal responses of 6MB in a single row. In terms of QPS, this query is less than 1% of all the load that we are generating on this test. Let me know if this clarifies it.
I posted that one, but I was also watching the GC traces. This is what I saw there:
Before I was never able to get < 1ms STW Sweep termination in a consistent way. Does this seem like a good test to you? Should be watching something else? |
Re: just disabling a specific kind of query and it being only 1% of total load is very useful information. I assume other queries go down a similar code path? And the GC trace I think confirms it because it actually includes the pause times. I think that's pretty good evidence that it's the large allocation itself. |
Correct, all the other queries go down the exact same path. |
@rafael Oh, sorry, forgot. If compiling and testing is local to you:
then put |
@dr2chase - Testing the compiled version in the exact environment will take me a bit longer. However, now that we know that it's this specific workload, I'm able to easily reproduce the issue in my local environment. This is how the gc traces look without your change:
Now a similar test with your changes (
The jumps between 10 and 15 is when I start the test. But after that it seems to be stable with STW pauses taking less than 1ms. It seems that your patch fixes this problem 😀 |
To be clear, the local test, it's a vitess benchmark that reads from a table that returns a row of 6MB. Besides the environment, the main difference with the other test I've been running is that I'm only generating workloads with the problematic query. |
Wonderful, thanks for testing this. I may (this is allegedly not my highest priority, but I hate these GC-hiccup-bugs) take a shot at those, too. Easy case is pointer-free, but that's what gets used for things like "I have a 10 MB file to read, here is a buffer, do it". |
…tion If something "huge" is allocated, and the zeroing is trivial (no pointers involved) then zero it by lumps in a loop so that preemption can occur, not all in a single non-preemptible call. Updates golang#42642. Change-Id: I94015e467eaa098c59870e479d6d83bc88efbfb4
Random question, heading for 1.17 freeze. |
Hi! We were testing in a box with many processors (64 to be exact). |
…tion If something "huge" is allocated, and the zeroing is trivial (no pointers involved) then zero it by chunks in a loop so that preemption can occur, not all in a single non-preemptible call. Benchmarking suggests that 256K is the best chunk size. Updates #42642. Change-Id: I94015e467eaa098c59870e479d6d83bc88efbfb4 Reviewed-on: https://go-review.googlesource.com/c/go/+/270943 Trust: David Chase <[email protected]> Run-TryBot: David Chase <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Michael Knyszek <[email protected]>
I think this is closed, since the CL solved the problem when tested, and the CL is in (and has been for some time). |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
N/A
What operating system and processor architecture are you using (
go env
)?What did you do?
This is in the context of vttablet process in Vitess. VTablet provides a grpc service that runs as a sidecar to a MySQL server. It is mostly forwarding requests to mysql and then marshal/unmarshal data using protocol buffers. Some stats gathered during this test:
go_memstats_alloc_bytes_total
is around 800MB/s.What did you expect to see?
I expected sub-millisecond stop-the-world times during garbage collections. This seems very similar to what is reported in #17831. It seems like this shouldn't be happening.
What did you see instead?
Similar to what is described in #17831, I'm seeing mark termination STW times of over 1ms somewhat regularly:
Here a look of the trace in detail:
Would love to see if you have some pointers on how to debug this further.
The text was updated successfully, but these errors were encountered: