
runtime: multi-ms sweep termination pauses (second edition) #42642

Closed
rafael opened this issue Nov 16, 2020 · 31 comments
Labels: FrozenDueToAge, NeedsInvestigation

Comments

@rafael

rafael commented Nov 16, 2020

What version of Go are you using (go version)?

go1.15.5 linux/amd64

Does this issue reproduce with the latest release?

N/A

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

This is in the context of the vttablet process in Vitess. VTablet provides a gRPC service that runs as a sidecar to a MySQL server. It mostly forwards requests to MySQL and marshals/unmarshals data using protocol buffers. Some stats gathered during this test:

  • It is processing around 5k requests per second.
  • The live heap is around 100MB.
  • The rate of allocation reported by go_memstats_alloc_bytes_total is around 800MB/s.
  • ~1k goroutines per second are running.
  • ~14 GC operations per second.

What did you expect to see?

I expected sub-millisecond stop-the-world times during garbage collections. This seems very similar to what is reported in #17831. It seems like this shouldn't be happening.

What did you see instead?

Similar to what is described in #17831, I'm seeing sweep termination STW times of over 1ms somewhat regularly:

gc 734 @86.464s 1%: 1.2+6.9+0.074 ms clock, 121+29/89/24+7.1 ms cpu, 113->133->82 MB, 121 MB goal, 96 P
gc 735 @86.532s 1%: 3.2+4.4+0.15 ms clock, 314+10/86/19+15 ms cpu, 147->151->80 MB, 165 MB goal, 96 P
gc 736 @86.618s 1%: 0.14+5.8+0.087 ms clock, 13+8.8/122/71+8.4 ms cpu, 149->153->61 MB, 160 MB goal, 96 P
gc 737 @86.694s 1%: 0.28+12+0.12 ms clock, 27+14/141/79+11 ms cpu, 117->119->72 MB, 122 MB goal, 96 P
gc 738 @86.819s 1%: 1.0+7.4+0.12 ms clock, 103+37/153/102+12 ms cpu, 142->149->90 MB, 144 MB goal, 96 P
gc 739 @86.914s 1%: 0.80+4.8+0.13 ms clock, 77+7.4/90/70+12 ms cpu, 176->179->71 MB, 180 MB goal, 96 P
gc 740 @87.026s 1%: 0.10+4.5+0.17 ms clock, 10+33/85/28+16 ms cpu, 142->148->49 MB, 143 MB goal, 96 P
gc 741 @87.114s 1%: 0.090+3.9+0.10 ms clock, 8.7+2.2/66/38+10 ms cpu, 93->95->42 MB, 99 MB goal, 96 P
gc 742 @87.241s 1%: 0.19+3.7+0.10 ms clock, 18+19/66/10+10 ms cpu, 87->89->49 MB, 88 MB goal, 96 P
gc 743 @87.331s 1%: 1.1+4.6+0.14 ms clock, 113+19/80/35+13 ms cpu, 99->103->48 MB, 100 MB goal, 96 P
gc 744 @87.438s 1%: 0.094+4.0+0.11 ms clock, 9.0+3.5/67/44+10 ms cpu, 91->95->42 MB, 97 MB goal, 96 P
gc 745 @87.477s 1%: 0.11+3.3+0.069 ms clock, 10+2.7/66/41+6.6 ms cpu, 81->83->48 MB, 85 MB goal, 96 P
gc 746 @87.564s 1%: 2.0+5.6+0.11 ms clock, 194+40/82/18+11 ms cpu, 93->100->49 MB, 96 MB goal, 96 P
gc 747 @87.590s 1%: 1.8+6.8+0.099 ms clock, 179+24/75/23+9.5 ms cpu, 99->109->67 MB, 100 MB goal, 96 P
gc 748 @87.650s 1%: 7.9+13+0.072 ms clock, 759+37/86/23+6.9 ms cpu, 126->142->57 MB, 135 MB goal, 96 P
gc 749 @87.697s 1%: 0.71+3.6+0.11 ms clock, 68+2.2/66/40+11 ms cpu, 107->110->54 MB, 114 MB goal, 96 P
gc 750 @87.756s 1%: 0.20+3.5+0.10 ms clock, 19+7.0/70/8.0+9.6 ms cpu, 100->105->44 MB, 109 MB goal, 96 P
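Per the `GODEBUG=gctrace=1` format documented in the `runtime` package, the three `ms clock` values are the wall-clock times of STW sweep termination, concurrent mark and scan, and STW mark termination, so in the lines above it is the first value that regularly exceeds 1ms. A minimal sketch for pulling those phases out of pasted trace lines and flagging long pauses; the `parseClock` helper, the `gcPhases` type, and the 1ms threshold are illustrative choices, not part of any Go tool.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// clockRe matches the "A+B+C ms clock" field of a gctrace line:
// A = STW sweep termination, B = concurrent mark and scan, C = STW mark termination.
var clockRe = regexp.MustCompile(`^gc (\d+) .*?: ([\d.]+)\+([\d.]+)\+([\d.]+) ms clock`)

// gcPhases holds the wall-clock durations (in ms) of one GC cycle's phases.
type gcPhases struct {
	Cycle            int
	SweepTermMS      float64
	ConcurrentMarkMS float64
	MarkTermMS       float64
}

// parseClock extracts the phase durations from one gctrace line.
func parseClock(line string) (gcPhases, bool) {
	m := clockRe.FindStringSubmatch(line)
	if m == nil {
		return gcPhases{}, false
	}
	cycle, err := strconv.Atoi(m[1])
	if err != nil {
		return gcPhases{}, false
	}
	var vals [3]float64
	for i := range vals {
		v, err := strconv.ParseFloat(m[i+2], 64)
		if err != nil {
			return gcPhases{}, false
		}
		vals[i] = v
	}
	return gcPhases{Cycle: cycle, SweepTermMS: vals[0], ConcurrentMarkMS: vals[1], MarkTermMS: vals[2]}, true
}

func main() {
	lines := []string{
		"gc 747 @87.590s 1%: 1.8+6.8+0.099 ms clock, 179+24/75/23+9.5 ms cpu, 99->109->67 MB, 100 MB goal, 96 P",
		"gc 748 @87.650s 1%: 7.9+13+0.072 ms clock, 759+37/86/23+6.9 ms cpu, 126->142->57 MB, 135 MB goal, 96 P",
		"gc 749 @87.697s 1%: 0.71+3.6+0.11 ms clock, 68+2.2/66/40+11 ms cpu, 107->110->54 MB, 114 MB goal, 96 P",
	}
	const thresholdMS = 1.0 // illustrative pause budget
	for _, l := range lines {
		p, ok := parseClock(l)
		if !ok {
			continue
		}
		// Only the two STW phases pause the application.
		if p.SweepTermMS > thresholdMS || p.MarkTermMS > thresholdMS {
			fmt.Printf("gc %d: sweep termination %.2fms, mark termination %.3fms\n",
				p.Cycle, p.SweepTermMS, p.MarkTermMS)
		}
	}
}
```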

Here is a detailed look at the trace:

[Image: detailed execution trace]

I would appreciate any pointers on how to debug this further.

@odeke-em
Member

Thank you for filing this bug @rafael, welcome to the Go project, and great to catch you here!

I shall kindly loop in the runtime crew @dr2chase @mknyszek @aclements @randall77 @cherrymui.

@odeke-em added the NeedsInvestigation label Nov 17, 2020
@mknyszek
Contributor

That looks a lot to me like #10958, which should have been fixed by #24543... but it could be one of the points listed at #36365. Do you happen to know if your service is generating a lot of fairly large allocations and therefore doing a lot of zeroing in the runtime? That's one place where the scheduler still can't quite preempt, and we've seen it remain an issue for other folks (I'm having trouble finding the GitHub issue for it). What are the blue and orange goroutines doing during that "STW"?

@dr2chase
Contributor

Several pause causes are described in #27732; you might want to give that a look, and I'll stare hard at your traces for a bit. We fixed some of those hiccups, but not all.

@rafael
Author

rafael commented Nov 17, 2020

Thanks @odeke-em! Great to see you over here 🙌

@mknyszek / @dr2chase thanks for the pointers. I will start looking at those to see if it could explain what we are seeing.

Do you happen to know if your service is generating a lot of fairly large allocations and therefore doing a lot of zeroing in the runtime?

What would be considered big? This is doing a fairly large number of allocations, but most of them seem small to me. I have uploaded a heap profile here so you can take a look.

This is the overall amount of allocation reported by this host. It is important to call out that we are trying to move a lot of data. During some of these tests we are pushing 200MB/s of data through the network:

[Image: overall allocation graph for this host]

What are the blue and orange goroutines doing during that "STW"?

It's always kind of similar. From the application perspective, around these pauses what I've seen is that it's either reading from a unix socket or marshaling data into protocol buffers. It's always stopping at a malloc call, but I'm assuming this is just where golang was able to stop the goroutine.

Here is a detailed example from a longer pause:
[Image: execution trace of a longer pause]

The blue goroutine here is reading from a Unix socket:
[Image: stack trace of the blue goroutine]

For the purple one, this is the end of the stack trace:

[Image: end of the purple goroutine's stack trace]

@mknyszek
Contributor

The fact that the stack trace has a makeslice in it suggests very strongly to me that the scheduler got stuck trying to preempt allocation of some large buffer (specifically the zeroing; the allocation itself is likely relatively fast, definitely much less than several milliseconds, even for large things). These pauses occur relatively often judging by the GC trace, so I don't think it's the runtime talking to the OS or anything, either.

I suspect that these large buffers aren't showing up in the heap profile that you shared because that's an inuse_space profile and there probably aren't too many of these buffers live at the same time. It would be great if we could confirm this is the problem by identifying how large these buffers are.

@rafael
Author

rafael commented Nov 17, 2020

I'm thinking about how we can instrument this. Is there something in the Go profiling toolkit that I can leverage to confirm this?

@dr2chase
Contributor

dr2chase commented Nov 18, 2020

@mknyszek was going to make a suggestion, decided to make a CL instead. Incoming....

Is a megabyte something that we can zero "quickly"? That was the lump size I chose. I tested it with 1k, of course.

@gopherbot
Contributor

Change https://golang.org/cl/270943 mentions this issue: runtime: break up large calls to memclrNoHeapPointers to allow preemption

@rafael
Author

rafael commented Nov 18, 2020

Anecdotally, after @mknyszek's comment about large buffers, I tried a change in the Vitess codebase that seems to have helped. I need to gather more data to confirm. I'll post more info about this tomorrow.

@dr2chase
Contributor

The second version of the CL above might also help; it addresses one instance of uninterruptibly zeroing large buffers that would have caused the traces you provided.

@rafael
Author

rafael commented Nov 19, 2020

I used the MemStats.BySize allocation stats, and this is the distribution:

0      ▏ 0
8      ▌ 1.557397117e+09
16     ███████████████████▏ 6.0339252894e+10
32     █████▌ 1.7407392825e+10
48     ████▏ 1.3326983655e+10
64     ███████▏ 2.2233258888e+10
80     ██████ 1.8959178597e+10
96     ██ 6.076362153e+09
112    ▏ 5.83462182e+08
128    ▌ 1.317646087e+09
144    █▎ 3.903451002e+09
160    █▏ 3.587883411e+09
176    ▏ 1.02819092e+08
192    ▏ 2.58323406e+08
208    ▍ 1.211457619e+09
224    █▎ 3.962731206e+09
240    █▏ 3.651110021e+09
256    ▎ 7.86321473e+08
288    ▊ 2.355958895e+09
320    ▎ 7.27910111e+08
352    ▏ 4.7019461e+08
384    ▏ 2.21418366e+08
416    ▏ 1.26196223e+08
448    ▏ 1.14244214e+08
480    ▏ 1.36488985e+08
512    ▏ 8.1271424e+07
576    ▏ 1.69730361e+08
640    ▏ 3.29448594e+08
704    ▏ 5.3287569e+07
768    ▏ 1.42214736e+08
896    ▍ 1.072132732e+09
1024   ▍ 1.009615789e+09
1152   ▏ 1.28482225e+08
1280   ▏ 8.6099698e+07
1408   ▏ 6.0357244e+07
1536   ▏ 8.4883717e+07
1792   ▏ 2.05473615e+08
2048   ▏ 3.66724011e+08
2304   ▏ 2.66266386e+08
2688   ▏ 2.37473527e+08
3072   ▏ 1.24092779e+08
3200   ▏ 7.841361e+06
3456   ▏ 3.0968758e+07
4096   ▏ 1.07436131e+08
4864   ▏ 2.5853161e+07
5376   ▏ 1.8418524e+07
6144   ▏ 3.2067115e+07
6528   ▏ 1.407301e+07
6784   ▏ 1.0100237e+07
6912   ▏ 3.334964e+06
8192   ▏ 3.1100912e+07
9472   ▏ 1.5581866e+07
9728   ▏ 1.963955e+06
10240  ▏ 6.2289e+06
10880  ▏ 3.8646e+06
12288  ▏ 2.1823942e+07
13568  ▏ 7.014784e+06
14336  ▏ 2.612276e+06
16384  ▏ 1.9442245e+07
18432  ▏ 3.851829e+06
19072  ▏ 1.08006e+06

Overall, it doesn't look like the service is doing large allocations. Is there something I'm still missing here?
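A minimal sketch of reading the same per-size-class counters with `runtime.ReadMemStats`. `MemStats.BySize` is a fixed-size array of `{Size, Mallocs, Frees}` entries covering small size classes only, so a multi-MiB allocation never shows up in any of its rows. The `readBySize` and `largestBySizeClass` helpers and the 6 MiB `sink` allocation are illustrative assumptions, not Vitess code.

```go
package main

import (
	"fmt"
	"runtime"
)

// bySizeRow is one non-empty size class from MemStats.BySize.
type bySizeRow struct {
	Size    uint32
	Mallocs uint64
}

// readBySize returns the size classes that have seen at least one allocation,
// in ascending size order.
func readBySize() []bySizeRow {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	var rows []bySizeRow
	for _, c := range ms.BySize {
		if c.Mallocs > 0 {
			rows = append(rows, bySizeRow{Size: c.Size, Mallocs: c.Mallocs})
		}
	}
	return rows
}

// largestBySizeClass reports the biggest object size that BySize describes.
func largestBySizeClass() uint32 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.BySize[len(ms.BySize)-1].Size
}

var sink []byte // keeps the allocation below on the heap

func main() {
	// A 6 MiB allocation is far above every BySize class, so it is not
	// counted in any of the rows printed below.
	sink = make([]byte, 6<<20)
	for _, r := range readBySize() {
		fmt.Printf("%6d %d\n", r.Size, r.Mallocs)
	}
	fmt.Println("largest size class in BySize:", largestBySizeClass())
}
```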

@mknyszek
Contributor

@rafael MemStats.BySize doesn't include allocations larger than 32 KiB and the runtime doesn't track them exactly, only a total. Fishing this out is going to be a bit tricky. We're probably looking for something on the order of tens of MiB. It might be more likely to show up in an alloc_space view of a heap profile? It's just another view of the heap profile whose screenshot you shared earlier, which is an inuse_space view; look for the --sample_index flag in the pprof tool's help section.

It's also still possible this is something else entirely, and that we're just getting "lucky" and the first possible preemption is at an allocation. We may need to just instrument the runtime to try to confirm this.

@rafael
Author

rafael commented Nov 19, 2020

Cool! Let me do that. FWIW, I also manually instrumented the area of the Vitess codebase where most of the allocations happen in this program and didn't notice large allocations there either.

@rafael
Author

rafael commented Nov 19, 2020

@mknyszek uploaded the alloc_space view here for the same heap profile I shared earlier.

@mknyszek
Contributor

mknyszek commented Nov 19, 2020

@rafael Cool! Thanks. If I'm reading this right, the following functions produce 6.12 MiB allocations:

proto.(*Buffer).grow
mysql.(*Conn).readOnePacket
mysql.(*Conn).parseRow
mysql.readLenEncStringAsBytesCopy

The top two appear at the top of the goroutine stacks you shared above. This is pretty good evidence that the non-preemptible allocation (and zeroing) is to blame. It would be nice to get closer to understanding whether it's the zeroing specifically that's the problem, or the operation as a whole.

Could you run go test -run="^$" -bench="BenchmarkMemclr/8M" runtime on the hardware/platform (or equivalent) you're running your application on and share the results? That will give us a ballpark estimate of how long the zeroing takes in your application.

@rafael
Copy link
Author

rafael commented Nov 19, 2020

Oh, oh this is interesting indeed!

Here are the results of the benchmark:

goos: linux
goarch: amd64
pkg: runtime
BenchmarkMemclr/8M-96 	    3358	    356781 ns/op	23511.96 MB/s
PASS
ok  	runtime	1.251s
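As a back-of-the-envelope reading of this result: the `8M` case clears 8 MiB (8<<20 bytes) per op, which matches the reported MB/s (Go benchmarks use 1 MB = 10^6 bytes). A minimal sketch that extrapolates linearly to the 6.12 MiB allocations from the alloc_space profile, assuming the best-case memclr throughput holds; the `zeroingEstimate` helper is hypothetical, and the result is only a rough lower bound for real zeroing cost.

```go
package main

import "fmt"

// zeroingEstimate converts a memclr benchmark result (benchBytes cleared per
// op in nsPerOp) into MB/s and a linear estimate of the time to clear n bytes.
func zeroingEstimate(benchBytes, nsPerOp, n float64) (mbPerSec, estNs float64) {
	bytesPerNs := benchBytes / nsPerOp
	// bytes/ns * 1e9 ns/s / 1e6 bytes/MB = bytes/ns * 1e3
	return bytesPerNs * 1e3, n / bytesPerNs
}

func main() {
	const benchBytes = 8 << 20       // the "8M" case clears 8 MiB
	const nsPerOp = 356781           // BenchmarkMemclr/8M-96 result above
	const target = 6.12 * (1 << 20) // 6.12 MiB allocations from the profile
	mb, est := zeroingEstimate(benchBytes, nsPerOp, target)
	fmt.Printf("throughput ≈ %.0f MB/s; clearing 6.12 MiB ≈ %.0f µs\n", mb, est/1e3)
}
```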

@rafael
Author

rafael commented Nov 20, 2020

I have one more insight to share. I think I have confirmed that this 6MB allocation is indeed the one to blame for the long pauses. I did some tests where I don't allow allocations of this size, and the long GC pauses go away. Here is the graph before, during, and after the test:

[Image: GC duration graph before, during, and after the test]

Basically, P75 goes from 2ms to less than 1ms.

@mknyszek, is the benchmark telling you anything about the zeroing problem? I don't have a point of reference there, so I'm not sure how to interpret the results 😅

@dr2chase
Contributor

Any interest in (or ability to) trying that CL? The latest version is good (it correctly breaks up zeroing of non-pointer allocations into 1MiB chunks). There are other such loops to consider (slice copying, including when a slice is reallocated), but that is probably the one biting you.
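The CL changes runtime internals that user code cannot call, so the following is only a user-level analogy of the idea, not the CL's code: bound each uninterruptible step by clearing a large buffer in fixed 1 MiB chunks, with a point between chunks where the scheduler can switch goroutines. The `clearChunked` helper is hypothetical, and the explicit `runtime.Gosched` call merely stands in for the preemption opportunity the runtime gets between chunks.

```go
package main

import (
	"fmt"
	"runtime"
)

const chunkSize = 1 << 20 // 1 MiB, the chunk size mentioned for the CL

// clearChunked zeroes b in chunkSize pieces and returns how many chunks it
// cleared, yielding to the scheduler between chunks.
func clearChunked(b []byte) int {
	chunks := 0
	for len(b) > 0 {
		n := chunkSize
		if n > len(b) {
			n = len(b)
		}
		part := b[:n]
		for i := range part {
			part[i] = 0 // the compiler turns this idiom into a single memclr call
		}
		b = b[n:]
		chunks++
		runtime.Gosched() // stand-in for a preemption point between chunks
	}
	return chunks
}

func main() {
	buf := make([]byte, 6<<20+123) // hypothetical 6 MiB buffer plus a partial chunk
	for i := range buf {
		buf[i] = 0xff
	}
	n := clearChunked(buf)
	fmt.Printf("cleared %d bytes in %d chunks\n", len(buf), n)
}
```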

@rafael
Author

rafael commented Nov 20, 2020

For sure, I should be able to test it. Do you have a canonical guide I can follow to compile this patch locally?

@rafael
Author

rafael commented Nov 20, 2020

I guess it should be this: https://golang.org/doc/install/source

@mknyszek
Contributor

Thanks for the update! The benchmark results are interesting; it's probably worth noting that the memclr microbenchmark is kind of a "best case scenario" though it's a little more expensive than I expected.

I wrote a little benchmark for the large allocation hot path, and I'm getting about 500ns at tip on my VM (a bit faster than Go 1.15, but only by tens of nanoseconds), so it's probably not that. I thought it could be getting stuck in sweeping, but that's not it either, since it would show up in the execution trace (we have explicit tracepoints for that).

Regarding your comment above (#42642 (comment)), I have a couple questions:

  1. When you say "I'm not allowing these allocations" what do you mean exactly? I worry that if it involves preventing certain kinds of work then it could be some other aspect of that work that's causing the problem.
  2. Your graph is titled "GC duration" which for a concurrent GC like ours generally doesn't mean much since the application is able to make progress during a GC. That is, aside from two short stop-the-world pauses during the GC cycle, one of which is definitely the actual cause of the problem here. What's the exact name of the metric you're plotting there? If it's actually "GC duration" and not "pause duration" under the hood, I'd instead look around for a metric with "pause" in the name. (With that being said, if the pauses are a major component of the full GC duration, which seems like it could be true here, then "GC duration" might still be useful, but it's somewhat indirect.)

@rafael
Author

rafael commented Nov 20, 2020

@mknyszek, for sure. Thank y'all for all the help here!

Answers to the questions:

When you say "I'm not allowing these allocations" what do you mean exactly? I worry that if it involves preventing certain kinds of work then it could be some other aspect of that work that's causing the problem.

In this test that we are running, there was a query with a single row that returns 6MB of data. In the tablet implementation, this ends up being a single allocation. I stopped sending this query to the tablet, so it never reaches MySQL and we never have to marshal/unmarshal 6MB responses in a single row.

In terms of QPS, this query is less than 1% of all the load that we are generating on this test.

Let me know if this clarifies it.

Your graph is titled "GC duration" which for a concurrent GC like ours generally doesn't mean much since the application is able to make progress during a GC. That is, aside from two short stop-the-world pauses during the GC cycle, one of which is definitely the actual cause of the problem here. What's the exact name of the metric you're plotting there? If it's actually "GC duration" and not "pause duration" under the hood, I'd instead look around for a metric with "pause" in the name. (With that being said, if the pauses are a major component of the full GC duration, which seems like it could be true here, then "GC duration" might still be useful, but it's somewhat indirect.)

I posted that one, but I was also watching the GC traces. This is what I saw there:

gc 22 @214.977s 0%: 0.29+9.1+0.048 ms clock, 28+53/185/86+4.6 ms cpu, 2589->2597->122 MB, 2718 MB goal, 96 P
gc 23 @223.546s 0%: 0.95+9.3+0.063 ms clock, 91+21/153/46+6.0 ms cpu, 2444->2454->128 MB, 2567 MB goal, 96 P
gc 24 @233.406s 0%: 0.30+6.8+0.10 ms clock, 29+16/127/80+9.8 ms cpu, 2565->2572->120 MB, 2693 MB goal, 96 P
gc 25 @241.453s 0%: 0.24+6.7+0.076 ms clock, 23+29/126/92+7.3 ms cpu, 2418->2425->121 MB, 2538 MB goal, 96 P
gc 26 @251.056s 0%: 0.34+6.3+0.042 ms clock, 33+19/126/79+4.0 ms cpu, 2428->2435->120 MB, 2550 MB goal, 96 P
gc 27 @260.094s 0%: 0.25+8.0+0.090 ms clock, 24+21/151/37+8.6 ms cpu, 2401->2410->133 MB, 2521 MB goal, 96 P
gc 28 @270.569s 0%: 0.32+9.5+0.21 ms clock, 31+40/191/220+20 ms cpu, 2674->2684->134 MB, 2808 MB goal, 96 P
gc 29 @278.307s 0%: 0.28+13+0.34 ms clock, 27+50/276/280+33 ms cpu, 2688->2702->124 MB, 2822 MB goal, 96 P
gc 30 @285.458s 0%: 0.33+5.8+0.12 ms clock, 31+27/112/62+11 ms cpu, 2492->2497->121 MB, 2617 MB goal, 96 P
gc 31 @294.365s 0%: 0.40+8.0+0.13 ms clock, 38+38/165/117+12 ms cpu, 2436->2445->124 MB, 2558 MB goal, 96 P
gc 32 @302.676s 0%: 0.33+7.6+0.084 ms clock, 32+33/140/103+8.1 ms cpu, 2483->2493->128 MB, 2608 MB goal, 96 P
gc 33 @311.432s 0%: 0.30+6.9+0.27 ms clock, 29+12/135/75+26 ms cpu, 2563->2569->132 MB, 2690 MB goal, 96 P
gc 34 @320.904s 0%: 0.36+6.3+0.12 ms clock, 34+24/132/110+11 ms cpu, 2656->2664->131 MB, 2788 MB goal, 96 P
gc 35 @331.057s 0%: 0.27+6.6+0.10 ms clock, 25+32/138/73+10 ms cpu, 2636->2645->136 MB, 2768 MB goal, 96 P
gc 36 @340.559s 0%: 0.29+7.4+0.10 ms clock, 28+32/147/140+10 ms cpu, 2722->2731->135 MB, 2858 MB goal, 96 P
gc 37 @350.143s 0%: 0.37+12+0.077 ms clock, 35+15/148/112+7.4 ms cpu, 2719->2732->147 MB, 2855 MB goal, 96 P
gc 38 @360.296s 0%: 0.31+7.3+0.071 ms clock, 30+14/148/154+6.9 ms cpu, 2953->2960->128 MB, 3101 MB goal, 96 P
gc 39 @370.494s 0%: 0.30+7.2+0.11 ms clock, 29+39/149/101+10 ms cpu, 2570->2582->127 MB, 2699 MB goal, 96 P
gc 40 @379.534s 0%: 0.29+7.3+0.12 ms clock, 28+39/147/113+12 ms cpu, 2559->2566->138 MB, 2687 MB goal, 96 P
gc 41 @389.935s 0%: 0.31+7.5+0.12 ms clock, 30+17/153/81+12 ms cpu, 2761->2770->132 MB, 2899 MB goal, 96 P
gc 42 @398.867s 0%: 0.35+7.7+0.095 ms clock, 34+34/152/128+9.1 ms cpu, 2653->2662->130 MB, 2785 MB goal, 96 P
gc 43 @408.573s 0%: 0.28+6.7+0.26 ms clock, 27+15/140/91+25 ms cpu, 2612->2620->133 MB, 2743 MB goal, 96 P
gc 44 @418.369s 0%: 0.33+7.1+0.10 ms clock, 32+22/143/62+9.7 ms cpu, 2671->2675->134 MB, 2804 MB goal, 96 P
gc 45 @429.916s 0%: 0.30+6.3+0.15 ms clock, 29+24/128/94+15 ms cpu, 2694->2701->132 MB, 2828 MB goal, 96 P
gc 46 @440.502s 0%: 0.32+7.9+0.10 ms clock, 31+19/156/121+10 ms cpu, 2656->2663->128 MB, 2789 MB goal, 96 P
gc 47 @449.657s 0%: 0.28+14+0.17 ms clock, 27+26/142/131+17 ms cpu, 2579->2592->130 MB, 2708 MB goal, 96 P
gc 48 @459.646s 0%: 0.27+6.4+0.10 ms clock, 26+15/130/126+10 ms cpu, 2610->2615->130 MB, 2741 MB goal, 96 P
gc 49 @468.934s 0%: 0.46+7.7+0.11 ms clock, 44+39/155/80+10 ms cpu, 2610->2620->131 MB, 2741 MB goal, 96 P
gc 50 @479.104s 0%: 0.28+6.9+0.073 ms clock, 27+25/139/102+7.0 ms cpu, 2631->2639->129 MB, 2763 MB goal, 96 P
gc 51 @489.255s 0%: 0.29+8.5+0.15 ms clock, 28+15/158/130+14 ms cpu, 2581->2590->133 MB, 2710 MB goal, 96 P
gc 52 @498.449s 0%: 0.26+6.9+0.17 ms clock, 25+21/137/113+16 ms cpu, 2678->2684->131 MB, 2812 MB goal, 96 P
gc 53 @508.348s 0%: 0.38+7.2+0.17 ms clock, 37+30/148/112+16 ms cpu, 2621->2628->131 MB, 2752 MB goal, 96 P
gc 54 @517.727s 0%: 0.35+6.9+0.058 ms clock, 34+13/132/135+5.5 ms cpu, 2623->2628->135 MB, 2754 MB goal, 96 P
gc 55 @526.814s 0%: 0.29+7.7+0.087 ms clock, 27+31/160/78+8.3 ms cpu, 2707->2716->134 MB, 2842 MB goal, 96 P
gc 56 @535.603s 0%: 0.30+6.8+0.12 ms clock, 28+29/139/96+12 ms cpu, 2685->2696->131 MB, 2819 MB goal, 96 P
gc 57 @544.671s 0%: 0.29+8.0+0.11 ms clock, 28+26/134/80+11 ms cpu, 2634->2648->132 MB, 2766 MB goal, 96 P
gc 58 @553.706s 0%: 0.32+6.8+0.10 ms clock, 31+26/133/44+9.8 ms cpu, 2647->2654->135 MB, 2779 MB goal, 96 P
gc 59 @562.722s 0%: 0.30+8.4+0.10 ms clock, 29+19/154/92+10 ms cpu, 2707->2717->132 MB, 2842 MB goal, 96 P
gc 60 @572.005s 0%: 0.38+7.0+0.12 ms clock, 37+21/138/110+12 ms cpu, 2642->2649->132 MB, 2774 MB goal, 96 P
gc 61 @582.054s 0%: 0.34+7.8+0.19 ms clock, 32+50/156/157+18 ms cpu, 2658->2668->133 MB, 2791 MB goal, 96 P
gc 62 @591.478s 0%: 0.34+13+0.059 ms clock, 33+84/189/120+5.7 ms cpu, 2661->2672->129 MB, 2794 MB goal, 96 P
gc 63 @601.242s 0%: 0.38+8.6+0.079 ms clock, 36+16/148/112+7.6 ms cpu, 2589->2598->145 MB, 2719 MB goal, 96 P
gc 64 @612.071s 0%: 0.24+6.6+0.18 ms clock, 23+30/137/84+17 ms cpu, 2910->2918->136 MB, 3055 MB goal, 96 P

Before, I was never able to consistently get sweep termination STW pauses under 1ms.

Does this seem like a good test to you? Should I be watching something else?

@mknyszek
Contributor

Re: just disabling a specific kind of query and it being only 1% of total load is very useful information. I assume other queries go down a similar code path? And the GC trace I think confirms it because it actually includes the pause times. I think that's pretty good evidence that it's the large allocation itself.

@rafael
Author

rafael commented Nov 20, 2020

Correct, all the other queries go down the exact same path.

@dr2chase
Contributor

@rafael Oh, sorry, forgot. If compiling and testing is local to you:

cd someplace
git clone https://go.googlesource.com/go
cd go
git fetch https://go.googlesource.com/go refs/changes/43/270943/2 && git checkout FETCH_HEAD
cd src
./make.bash

then put someplace/go/bin on your path.

@rafael
Author

rafael commented Nov 20, 2020

@dr2chase - Testing the compiled version in the exact environment will take me a bit longer. However, now that we know that it's this specific workload, I'm able to easily reproduce the issue in my local environment.

This is how the gc traces look without your change:

gc 4 @0.722s 0%: 0.11+1.4+0.006 ms clock, 0.91+1.4/2.6/2.7+0.055 ms cpu, 6->7->4 MB, 7 MB goal, 8 P
gc 5 @15.255s 0%: 0.047+3.5+0.028 ms clock, 0.38+1.6/5.7/3.2+0.22 ms cpu, 8->9->7 MB, 9 MB goal, 8 P
gc 6 @15.268s 0%: 0.007+1.7+0.006 ms clock, 0.058+0/3.1/7.0+0.048 ms cpu, 15->15->14 MB, 16 MB goal, 8 P
gc 7 @15.286s 0%: 0.23+39+0.24 ms clock, 1.9+12/17/0+1.9 ms cpu, 126->182->174 MB, 135 MB goal, 8 P
gc 8 @15.327s 0%: 7.2+27+0.20 ms clock, 58+7.4/17/0+1.6 ms cpu, 182->302->270 MB, 348 MB goal, 8 P
gc 9 @15.363s 0%: 6.4+106+2.2 ms clock, 51+2.0/31/0+17 ms cpu, 334->888->799 MB, 540 MB goal, 8 P
gc 10 @15.480s 0%: 0.79+14+1.2 ms clock, 6.3+4.4/10/4.4+10 ms cpu, 815->863->535 MB, 1599 MB goal, 8 P
gc 11 @17.021s 0%: 0.010+6.1+0.014 ms clock, 0.086+0/3.2/8.2+0.11 ms cpu, 784->800->455 MB, 1070 MB goal, 8 P
gc 12 @17.442s 0%: 0.016+9.7+0.031 ms clock, 0.13+0/6.0/9.2+0.24 ms cpu, 784->865->190 MB, 910 MB goal, 8 P
gc 13 @17.483s 0%: 1.4+7.8+0.065 ms clock, 11+2.5/6.9/0.36+0.52 ms cpu, 343->383->286 MB, 381 MB goal, 8 P
gc 14 @17.519s 0%: 2.6+12+0.58 ms clock, 20+2.3/10/0+4.7 ms cpu, 511->600->447 MB, 573 MB goal, 8 P
gc 15 @18.520s 0%: 0.012+4.1+1.1 ms clock, 0.10+0/3.1/6.5+9.5 ms cpu, 784->816->511 MB, 894 MB goal, 8 P
gc 16 @19.341s 0%: 5.2+1.8+0.007 ms clock, 41+1.9/3.4/4.2+0.060 ms cpu, 953->961->230 MB, 1023 MB goal, 8 P
gc 17 @19.380s 0%: 2.0+5.0+0.019 ms clock, 16+2.2/6.6/1.0+0.15 ms cpu, 440->456->319 MB, 461 MB goal, 8 P
gc 18 @19.417s 0%: 1.6+13+0.11 ms clock, 12+8.7/18/0+0.94 ms cpu, 616->712->575 MB, 638 MB goal, 8 P
gc 19 @20.528s 0%: 0.018+2.1+0.009 ms clock, 0.14+0/3.9/7.9+0.076 ms cpu, 1033->1033->487 MB, 1151 MB goal, 8 P
gc 20 @21.192s 0%: 1.4+11+0.015 ms clock, 11+5.6/8.5/0+0.12 ms cpu, 955->1011->400 MB, 975 MB goal, 8 P
gc 21 @21.338s 0%: 5.9+12+0.041 ms clock, 47+4.6/14/0+0.33 ms cpu, 761->793->544 MB, 801 MB goal, 8 P
gc 22 @22.358s 0%: 3.5+8.1+0.009 ms clock, 28+1.5/5.1/3.5+0.077 ms cpu, 1025->1042->520 MB, 1088 MB goal, 8 P
gc 23 @22.900s 0%: 0.51+2.9+0.007 ms clock, 4.1+0.22/5.8/8.2+0.056 ms cpu, 1002->1002->391 MB, 1040 MB goal, 8 P
gc 24 @23.166s 0%: 0.12+3.7+0.007 ms clock, 0.98+0/4.8/6.2+0.061 ms cpu, 777->777->504 MB, 783 MB goal, 8 P
gc 25 @24.212s 0%: 0.13+3.5+0.008 ms clock, 1.1+0.37/6.6/3.4+0.071 ms cpu, 1010->1010->520 MB, 1011 MB goal, 8 P
gc 26 @24.677s 0%: 1.0+7.7+0.013 ms clock, 8.6+1.9/7.9/0.072+0.10 ms cpu, 1018->1042->408 MB, 1041 MB goal, 8 P
gc 27 @25.224s 0%: 0.91+5.3+1.2 ms clock, 7.2+0.39/6.0/4.0+10 ms cpu, 818->826->512 MB, 819 MB goal, 8 P
gc 28 @26.130s 0%: 1.3+9.2+0.013 ms clock, 11+5.2/6.3/0.48+0.10 ms cpu, 1002->1042->552 MB, 1025 MB goal, 8 P
gc 29 @26.529s 0%: 6.5+19+0.020 ms clock, 52+4.3/6.6/1.7+0.16 ms cpu, 1082->1139->536 MB, 1105 MB goal, 8 P
gc 30 @27.587s 0%: 0.78+3.9+0.027 ms clock, 6.2+0/4.9/5.8+0.22 ms cpu, 1026->1026->513 MB, 1073 MB goal, 8 P
gc 31 @28.112s 0%: 0.011+1.7+0.008 ms clock, 0.092+0/3.1/6.9+0.065 ms cpu, 1002->1002->384 MB, 1026 MB goal, 8 P
gc 32 @28.383s 0%: 0.18+9.2+0.034 ms clock, 1.5+2.4/6.3/3.7+0.27 ms cpu, 754->778->521 MB, 769 MB goal, 8 P
gc 33 @29.514s 0%: 0.010+16+1.4 ms clock, 0.087+1.1/4.6/4.8+11 ms cpu, 1019->1115->545 MB, 1042 MB goal, 8 P
gc 34 @29.938s 0%: 1.4+14+0.010 ms clock, 11+5.5/21/4.4+0.083 ms cpu, 1035->1091->529 MB, 1090 MB goal, 8 P
gc 35 @30.940s 0%: 0.009+8.2+0.011 ms clock, 0.079+0/6.2/6.5+0.094 ms cpu, 1003->1067->529 MB, 1058 MB goal, 8 P
gc 36 @31.416s 0%: 1.7+8.2+0.008 ms clock, 13+3.2/7.3/0+0.070 ms cpu, 1011->1067->545 MB, 1059 MB goal, 8 P
gc 37 @32.007s 0%: 0.012+11+0.075 ms clock, 0.10+0/4.1/7.5+0.60 ms cpu, 1027->1091->529 MB, 1091 MB goal, 8 P
gc 38 @32.919s 0%: 0.17+3.9+0.009 ms clock, 1.4+0.58/5.7/3.5+0.073 ms cpu, 1011->1011->489 MB, 1059 MB goal, 8 P
gc 39 @33.307s 0%: 1.5+3.8+0.034 ms clock, 12+0.33/4.4/7.7+0.27 ms cpu, 955->955->481 MB, 979 MB goal, 8 P
gc 40 @34.245s 0%: 0.13+10+1.6 ms clock, 1.1+2.7/6.7/0.21+13 ms cpu, 939->1003->529 MB, 963 MB goal, 8 P
gc 41 @34.691s 0%: 0.55+6.2+0.031 ms clock, 4.4+4.2/6.8/0+0.25 ms cpu, 1027->1059->537 MB, 1059 MB goal, 8 P
gc 42 @35.232s 0%: 0.014+2.1+0.007 ms clock, 0.11+0/3.7/7.1+0.061 ms cpu, 1027->1027->465 MB, 1075 MB goal, 8 P
gc 43 @36.122s 0%: 1.8+2.6+0.008 ms clock, 14+0.34/4.4/8.1+0.065 ms cpu, 915->915->441 MB, 931 MB goal, 8 P
gc 44 @36.447s 0%: 0.012+2.4+0.044 ms clock, 0.10+0/4.2/5.9+0.35 ms cpu, 867->867->449 MB, 883 MB goal, 8 P
gc 45 @36.859s 0%: 1.8+3.0+0.032 ms clock, 15+0.52/5.7/10+0.26 ms cpu, 883->891->497 MB, 899 MB goal, 8 P
gc 46 @37.824s 0%: 1.8+11+0.010 ms clock, 15+5.9/9.5/0.050+0.083 ms cpu, 979->1035->529 MB, 995 MB goal, 8 P
gc 47 @38.195s 0%: 1.4+3.1+0.005 ms clock, 11+0.63/5.1/7.1+0.047 ms cpu, 1012->1012->474 MB, 1059 MB goal, 8 P
gc 48 @38.964s 0%: 1.1+6.6+0.011 ms clock, 9.0+0.13/8.3/8.3+0.094 ms cpu, 932->940->529 MB, 949 MB goal, 8 P
gc 49 @39.666s 0%: 1.7+3.3+0.007 ms clock, 13+0.44/5.9/2.2+0.063 ms cpu, 1035->1035->449 MB, 1059 MB goal, 8 P
gc 50 @40.027s 0%: 0.013+2.5+0.006 ms clock, 0.10+0/4.6/8.0+0.050 ms cpu, 883->883->473 MB, 899 MB goal, 8 P
gc 51 @40.857s 0%: 0.50+3.3+0.006 ms clock, 4.0+0.39/6.4/6.7+0.050 ms cpu, 931->931->521 MB, 947 MB goal, 8 P

Now a similar test with your changes (go version at devel +affa92dc9d):

gc 170 @118.010s 0%: 0.011+8.3+0.010 ms clock, 0.092+3.0/8.7/2.2+0.082 ms cpu, 987->1035->505 MB, 1011 MB goal, 8 P
gc 171 @118.932s 0%: 1.1+7.6+1.2 ms clock, 8.9+2.4/5.1/0.027+9.9 ms cpu, 979->1035->529 MB, 1011 MB goal, 8 P
gc 1 @0.023s 1%: 0.031+0.64+0.17 ms clock, 0.25+0.26/0.62/0.77+1.3 ms cpu, 4->4->1 MB, 5 MB goal, 8 P
gc 2 @0.030s 1%: 0.057+1.0+0.037 ms clock, 0.46+0.12/1.3/2.5+0.30 ms cpu, 4->4->2 MB, 5 MB goal, 8 P
gc 3 @0.067s 1%: 0.026+0.91+0.035 ms clock, 0.21+0.20/1.3/2.5+0.28 ms cpu, 4->4->2 MB, 5 MB goal, 8 P
gc 4 @0.100s 1%: 0.024+1.7+0.004 ms clock, 0.19+0.22/2.2/1.7+0.037 ms cpu, 5->5->3 MB, 6 MB goal, 8 P
gc 5 @0.213s 0%: 0.035+2.0+0.081 ms clock, 0.28+1.0/2.8/3.1+0.65 ms cpu, 6->6->5 MB, 7 MB goal, 8 P
gc 6 @37.454s 0%: 0.093+9.0+0.005 ms clock, 0.74+0.29/9.7/0.19+0.043 ms cpu, 9->9->7 MB, 10 MB goal, 8 P
gc 7 @77.168s 0%: 0.048+1.6+0.032 ms clock, 0.38+0/3.0/8.5+0.26 ms cpu, 19->19->15 MB, 20 MB goal, 8 P
gc 8 @77.186s 0%: 0.27+16+0.005 ms clock, 2.1+14/27/0+0.046 ms cpu, 47->48->38 MB, 48 MB goal, 8 P
gc 9 @77.204s 0%: 0.38+44+5.8 ms clock, 3.0+5.0/24/0+46 ms cpu, 158->294->294 MB, 159 MB goal, 8 P
gc 10 @77.279s 0%: 8.7+49+0.035 ms clock, 70+3.4/58/0.019+0.28 ms cpu, 471->647->543 MB, 589 MB goal, 8 P
gc 11 @77.385s 0%: 2.3+8.6+0.014 ms clock, 18+0.39/10/6.6+0.11 ms cpu, 872->872->495 MB, 1086 MB goal, 8 P
gc 12 @79.250s 0%: 0.19+33+0.054 ms clock, 1.5+0.64/45/6.1+0.43 ms cpu, 898->922->159 MB, 991 MB goal, 8 P
gc 13 @79.325s 0%: 2.7+17+0.008 ms clock, 21+0/16/0.070+0.067 ms cpu, 312->345->199 MB, 319 MB goal, 8 P
gc 14 @79.382s 0%: 5.2+16+0.007 ms clock, 42+5.7/17/0.15+0.063 ms cpu, 368->440->359 MB, 398 MB goal, 8 P
gc 15 @79.435s 0%: 0.41+14+0.007 ms clock, 3.3+0.29/16/0+0.056 ms cpu, 624->640->519 MB, 718 MB goal, 8 P
gc 16 @80.943s 0%: 0.23+48+0.006 ms clock, 1.8+4.5/86/2.0+0.053 ms cpu, 985->1049->528 MB, 1039 MB goal, 8 P
gc 17 @81.318s 0%: 1.0+22+0.16 ms clock, 8.4+1.3/7.0/0+1.3 ms cpu, 969->1074->415 MB, 1056 MB goal, 8 P
gc 18 @81.907s 0%: 0.052+3.5+0.16 ms clock, 0.42+0/5.2/8.2+1.3 ms cpu, 769->777->504 MB, 831 MB goal, 8 P
gc 19 @82.600s 0%: 0.36+8.4+0.26 ms clock, 2.8+0.16/15/8.1+2.0 ms cpu, 962->978->520 MB, 1008 MB goal, 8 P
gc 20 @83.198s 0%: 0.046+35+0.086 ms clock, 0.36+0.89/14/43+0.69 ms cpu, 1010->1154->504 MB, 1040 MB goal, 8 P
gc 21 @84.089s 0%: 0.061+3.9+0.26 ms clock, 0.49+0/5.2/7.8+2.0 ms cpu, 938->970->464 MB, 1008 MB goal, 8 P
gc 22 @84.681s 0%: 0.19+10+0.067 ms clock, 1.5+0.98/14/0.12+0.53 ms cpu, 898->930->528 MB, 928 MB goal, 8 P
gc 23 @84.959s 0%: 0.16+6.8+0.003 ms clock, 1.3+1.2/11/4.5+0.028 ms cpu, 1018->1026->520 MB, 1057 MB goal, 8 P
gc 24 @85.836s 0%: 0.20+4.6+0.17 ms clock, 1.6+0.26/8.2/3.8+1.3 ms cpu, 1010->1018->520 MB, 1041 MB goal, 8 P
gc 25 @86.524s 0%: 0.046+4.5+1.6 ms clock, 0.37+0/5.4/9.9+12 ms cpu, 1018->1050->472 MB, 1041 MB goal, 8 P
gc 26 @87.167s 0%: 0.38+2.2+0.003 ms clock, 3.0+6.7/4.2/0.53+0.026 ms cpu, 938->938->553 MB, 945 MB goal, 8 P
gc 27 @87.953s 0%: 0.12+4.1+0.003 ms clock, 1.0+0/7.6/6.6+0.026 ms cpu, 1083->1083->505 MB, 1106 MB goal, 8 P
gc 28 @88.366s 0%: 0.49+18+0.15 ms clock, 3.9+2.0/6.5/0+1.2 ms cpu, 986->1099->561 MB, 1010 MB goal, 8 P
gc 29 @89.229s 0%: 0.20+8.3+0.040 ms clock, 1.6+1.7/15/9.7+0.32 ms cpu, 1059->1075->529 MB, 1122 MB goal, 8 P
gc 30 @89.933s 0%: 0.36+14+0.005 ms clock, 2.9+3.1/17/0+0.043 ms cpu, 1019->1091->553 MB, 1058 MB goal, 8 P
gc 31 @90.770s 0%: 0.34+10+2.3 ms clock, 2.7+0.42/9.4/14+19 ms cpu, 1051->1091->497 MB, 1107 MB goal, 8 P
gc 32 @91.379s 0%: 0.34+7.7+2.7 ms clock, 2.7+1.1/5.6/4.1+21 ms cpu, 955->995->505 MB, 994 MB goal, 8 P
gc 33 @91.731s 0%: 0.44+6.1+1.2 ms clock, 3.5+2.4/11/4.2+10 ms cpu, 995->1003->529 MB, 1011 MB goal, 8 P
gc 34 @92.562s 0%: 0.54+3.3+0.15 ms clock, 4.3+0.70/4.0/6.2+1.2 ms cpu, 1027->1035->513 MB, 1059 MB goal, 8 P
gc 35 @93.265s 0%: 0.16+9.7+0.62 ms clock, 1.2+0/12/13+4.9 ms cpu, 1003->1028->481 MB, 1027 MB goal, 8 P
gc 36 @94.091s 0%: 0.042+5.6+0.14 ms clock, 0.34+0/2.9/10+1.1 ms cpu, 947->955->433 MB, 963 MB goal, 8 P
gc 37 @94.566s 0%: 0.26+9.4+0.006 ms clock, 2.1+1.7/9.7/0+0.054 ms cpu, 851->891->537 MB, 867 MB goal, 8 P
gc 38 @95.017s 0%: 0.18+4.9+0.004 ms clock, 1.4+0.22/8.9/5.3+0.037 ms cpu, 1051->1051->465 MB, 1075 MB goal, 8 P
gc 39 @95.827s 0%: 0.21+12+0.20 ms clock, 1.7+3.8/5.7/0.22+1.6 ms cpu, 908->1004->506 MB, 931 MB goal, 8 P
gc 40 @96.415s 0%: 0.25+8.3+0.004 ms clock, 2.0+0/15/8.5+0.039 ms cpu, 972->972->513 MB, 1013 MB goal, 8 P
gc 41 @97.072s 0%: 0.11+1.9+0.004 ms clock, 0.90+0/3.6/6.5+0.034 ms cpu, 987->987->457 MB, 1027 MB goal, 8 P
gc 42 @97.577s 0%: 0.86+15+0.59 ms clock, 6.9+0/10/0+4.7 ms cpu, 907->947->529 MB, 915 MB goal, 8 P
gc 43 @98.274s 0%: 0.26+7.0+0.003 ms clock, 2.0+0/11/6.0+0.028 ms cpu, 1027->1051->521 MB, 1059 MB goal, 8 P
gc 44 @99.170s 0%: 0.091+2.7+0.003 ms clock, 0.72+4.8/4.0/0.34+0.029 ms cpu, 1011->1011->481 MB, 1043 MB goal, 8 P
gc 45 @99.800s 0%: 0.044+3.5+0.40 ms clock, 0.35+0/5.6/9.0+3.2 ms cpu, 947->955->449 MB, 963 MB goal, 8 P
gc 46 @100.090s 0%: 0.24+7.6+0.092 ms clock, 1.9+0/12/5.1+0.73 ms cpu, 883->891->489 MB, 899 MB goal, 8 P
gc 47 @100.974s 0%: 0.063+2.0+0.011 ms clock, 0.50+1.5/3.4/5.2+0.088 ms cpu, 955->955->425 MB, 979 MB goal, 8 P
gc 48 @101.436s 0%: 0.18+6.0+0.005 ms clock, 1.5+0/9.0/8.2+0.045 ms cpu, 835->835->505 MB, 851 MB goal, 8 P
gc 49 @101.823s 0%: 0.050+2.7+0.15 ms clock, 0.40+0/4.0/7.6+1.2 ms cpu, 987->995->449 MB, 1011 MB goal, 8 P
gc 50 @102.615s 0%: 0.44+5.5+0.14 ms clock, 3.5+5.0/7.2/6.0+1.1 ms cpu, 883->899->489 MB, 899 MB goal, 8 P
gc 51 @103.176s 0%: 0.54+6.7+0.17 ms clock, 4.3+9.0/11/0.34+1.4 ms cpu, 955->971->537 MB, 979 MB goal, 8 P
gc 52 @103.791s 0%: 0.19+10+0.13 ms clock, 1.5+1.4/12/0+1.0 ms cpu, 1051->1099->537 MB, 1075 MB goal, 8 P
gc 53 @104.618s 0%: 0.26+6.0+0.15 ms clock, 2.1+0.54/10/2.4+1.2 ms cpu, 1051->1059->529 MB, 1075 MB goal, 8 P
gc 54 @105.068s 0%: 0.069+2.4+0.005 ms clock, 0.55+0/4.1/6.3+0.043 ms cpu, 1035->1035->481 MB, 1059 MB goal, 8 P
gc 55 @105.883s 0%: 0.31+3.9+0.004 ms clock, 2.5+4.2/6.2/0+0.037 ms cpu, 971->971->497 MB, 972 MB goal, 8 P
gc 56 @106.499s 0%: 0.060+3.0+0.006 ms clock, 0.48+0/4.7/6.6+0.052 ms cpu, 971->971->465 MB, 995 MB goal, 8 P
gc 57 @106.787s 0%: 0.20+7.2+0.30 ms clock, 1.6+0.75/4.5/4.3+2.4 ms cpu, 915->939->481 MB, 931 MB goal, 8 P
gc 58 @107.611s 0%: 0.27+5.4+0.004 ms clock, 2.2+6.3/10/0.16+0.033 ms cpu, 963->963->481 MB, 964 MB goal, 8 P
gc 59 @108.194s 0%: 0.043+10+0.17 ms clock, 0.35+0/6.4/7.7+1.3 ms cpu, 939->1011->529 MB, 963 MB goal, 8 P
gc 60 @108.798s 0%: 0.62+4.0+0.007 ms clock, 5.0+1.2/5.2/4.3+0.056 ms cpu, 1012->1020->506 MB, 1059 MB goal, 8 P
gc 61 @109.608s 0%: 0.45+5.0+0.17 ms clock, 3.6+4.4/6.7/2.4+1.4 ms cpu, 1004->1020->529 MB, 1013 MB goal, 8 P
gc 62 @110.037s 0%: 3.8+5.2+0.52 ms clock, 30+0.54/9.6/1.9+4.2 ms cpu, 1027->1027->497 MB, 1059 MB goal, 8 P
gc 63 @110.886s 0%: 0.27+10+0.006 ms clock, 2.2+5.6/12/0+0.049 ms cpu, 979->1027->537 MB, 995 MB goal, 8 P
gc 64 @111.547s 0%: 0.25+7.2+0.18 ms clock, 2.0+0.38/11/8.3+1.5 ms cpu, 1043->1051->489 MB, 1075 MB goal, 8 P
gc 65 @112.164s 0%: 0.062+11+0.93 ms clock, 0.50+0/3.8/12+7.4 ms cpu, 955->1035->537 MB, 979 MB goal, 8 P
gc 66 @113.007s 0%: 0.17+6.2+0.19 ms clock, 1.4+0.97/11/4.5+1.5 ms cpu, 1035->1051->521 MB, 1075 MB goal, 8 P
gc 67 @113.403s 0%: 0.53+7.0+0.004 ms clock, 4.2+2.7/13/4.0+0.038 ms cpu, 1003->1019->505 MB, 1043 MB goal, 8 P
gc 68 @114.276s 0%: 0.26+6.1+0.051 ms clock, 2.1+0.48/8.2/10+0.41 ms cpu, 995->995->513 MB, 1011 MB goal, 8 P
gc 69 @114.938s 0%: 0.36+11+0.005 ms clock, 2.8+2.7/5.1/3.2+0.043 ms cpu, 1003->1075->537 MB, 1027 MB goal, 8 P
gc 70 @115.509s 0%: 0.24+7.6+0.16 ms clock, 1.9+0.49/11/9.8+1.3 ms cpu, 1051->1059->529 MB, 1075 MB goal, 8 P

The jumps between gc 10 and gc 15 are from when I started the test. After that, it seems stable, with STW pauses mostly under 1 ms.

It seems that your patch fixes this problem 😀
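One quick way to check the STW numbers in a trace like this is to pull the first and third wall-clock figures out of each line. This sketch assumes the `GODEBUG=gctrace=1` format documented in the Go `runtime` package, where `A+B+C ms clock` are the STW sweep termination, concurrent mark and scan, and STW mark termination phases. `parseSTW` and `stwPause` are hypothetical helper names, and the 1 ms threshold is simply the target discussed in this issue.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// clockRe captures the GC number and the wall-clock phase triple "A+B+C ms clock"
// from a GODEBUG=gctrace=1 line. A is the STW sweep termination, B the concurrent
// mark and scan, and C the STW mark termination.
var clockRe = regexp.MustCompile(`^gc (\d+) @[\d.]+s \d+%: ([\d.]+)\+([\d.]+)\+([\d.]+) ms clock`)

// stwPause holds the two stop-the-world wall-clock pauses of one GC cycle, in ms.
type stwPause struct {
	GC        int
	SweepTerm float64
	MarkTerm  float64
}

// parseSTW extracts the STW pauses from a single gctrace line.
func parseSTW(line string) (stwPause, error) {
	m := clockRe.FindStringSubmatch(line)
	if m == nil {
		return stwPause{}, fmt.Errorf("not a gctrace line: %q", line)
	}
	gc, err := strconv.Atoi(m[1])
	if err != nil {
		return stwPause{}, err
	}
	sweep, err := strconv.ParseFloat(m[2], 64)
	if err != nil {
		return stwPause{}, err
	}
	mark, err := strconv.ParseFloat(m[4], 64)
	if err != nil {
		return stwPause{}, err
	}
	return stwPause{GC: gc, SweepTerm: sweep, MarkTerm: mark}, nil
}

func main() {
	const limitMs = 1.0 // flag any STW phase longer than this
	lines := []string{
		"gc 23 @84.959s 0%: 0.16+6.8+0.003 ms clock, 1.3+1.2/11/4.5+0.028 ms cpu, 1018->1026->520 MB, 1057 MB goal, 8 P",
		"gc 31 @90.770s 0%: 0.34+10+2.3 ms clock, 2.7+0.42/9.4/14+19 ms cpu, 1051->1091->497 MB, 1107 MB goal, 8 P",
		"gc 62 @110.037s 0%: 3.8+5.2+0.52 ms clock, 30+0.54/9.6/1.9+4.2 ms cpu, 1027->1027->497 MB, 1059 MB goal, 8 P",
	}
	for _, line := range lines {
		p, err := parseSTW(line)
		if err != nil {
			fmt.Println(err)
			continue
		}
		if p.SweepTerm > limitMs || p.MarkTerm > limitMs {
			fmt.Printf("gc %d: sweep termination %.2f ms, mark termination %.2f ms\n",
				p.GC, p.SweepTerm, p.MarkTerm)
		}
	}
}
```

Of the three sample lines, gc 31 (mark termination) and gc 62 (sweep termination) exceed 1 ms. Applied to every line of the trace above, the same check flags gc 25, 31, 32, 33 and 62.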

@rafael
Author

rafael commented Nov 20, 2020

To be clear, the local test is a Vitess benchmark that reads a single 6 MB row from a table. Apart from the environment, the main difference from the other test I've been running is that this workload issues only the problematic query.

@dr2chase
Contributor

Wonderful, thanks for testing this. I may take a shot at those, too (this is allegedly not my highest priority, but I hate these GC-hiccup bugs). The easy case is pointer-free memory, but that is exactly what gets used for things like "I have a 10 MB file to read, here is a buffer, do it".

bobotu pushed a commit to bobotu/go that referenced this issue Dec 2, 2020
…tion

If something "huge" is allocated, and the zeroing is trivial (no pointers
involved) then zero it by lumps in a loop so that preemption can occur,
not all in a single non-preemptible call.

Updates golang#42642.

Change-Id: I94015e467eaa098c59870e479d6d83bc88efbfb4
@dr2chase
Contributor

Random question as we head into the 1.17 freeze:
How many processors are on the box doing the benchmarking?
Useful answers are 1, 2, or many. In some experiments there are performance regressions at 1 or 2 processors, but these are admittedly weird microbenchmarks with two goroutines hammering on the garbage collector.

@rafael
Author

rafael commented Apr 29, 2021

Hi!

We were testing on a box with many processors (64, to be exact).

gopherbot pushed a commit that referenced this issue Apr 30, 2021
…tion

If something "huge" is allocated, and the zeroing is trivial (no pointers
involved) then zero it by chunks in a loop so that preemption can occur,
not all in a single non-preemptible call.

Benchmarking suggests that 256K is the best chunk size.

Updates #42642.

Change-Id: I94015e467eaa098c59870e479d6d83bc88efbfb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/270943
Trust: David Chase <[email protected]>
Run-TryBot: David Chase <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
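The idea in this commit can be illustrated at user level. This is a sketch under stated assumptions, not the runtime's implementation: the runtime clears the new allocation internally and checks for a pending preemption request between chunks, whereas here `runtime.Gosched()` merely stands in for that preemption point. `zeroChunked` and `chunkSize` are hypothetical names, and the commit's "256K" is read as 256 KiB.

```go
package main

import (
	"fmt"
	"runtime"
)

// chunkSize follows the "256K" figure from the commit message, read as 256 KiB.
const chunkSize = 256 << 10

// zeroChunked is a hypothetical user-level analogue of chunked zeroing: it
// clears buf one chunk at a time and yields to the scheduler between chunks,
// so no single uninterrupted clear covers the whole buffer. It returns the
// number of chunks cleared.
func zeroChunked(buf []byte) int {
	chunks := 0
	for start := 0; start < len(buf); start += chunkSize {
		end := start + chunkSize
		if end > len(buf) {
			end = len(buf)
		}
		part := buf[start:end]
		for i := range part { // the compiler lowers this idiom to a memclr
			part[i] = 0
		}
		chunks++
		runtime.Gosched() // stand-in for the runtime's "is preemption requested?" check
	}
	return chunks
}

func main() {
	// A 10 MiB pointer-free buffer, like the file-read buffer mentioned earlier.
	buf := make([]byte, 10<<20)
	for i := range buf {
		buf[i] = 0xff
	}
	n := zeroChunked(buf)
	fmt.Println(n, "chunks")
}
```

Clearing the 10 MiB buffer this way takes 40 chunks. The trade-off is the chunk size: smaller chunks shorten each non-preemptible stretch but add per-chunk overhead, which is presumably why the commit settled on a benchmarked value.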
@dr2chase
Contributor

I think this can be closed: the CL solved the problem when tested, and it has been merged for some time.

@golang golang locked and limited conversation to collaborators Jul 15, 2022

5 participants