util/log: Speed up vmodule support and warn about it in help text #29017
Conversation
This bit me pretty badly when doing performance testing. I turned on vmodule for a file which hardly logs at all, but all the calls to log.VDepth slowed the system down to a crawl. Alternatively, we could improve log.VDepth to make this less of a concern. It's already been handcrafted to avoid allocations, but holding the logging lock while calling runtime.Callers is not good.

Release note: None
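For context, here is a minimal sketch of the pattern the description criticizes, with hypothetical stand-in names (the actual clog.go internals may differ): every verbosity check takes the global logging mutex before walking the stack, so concurrent call sites serialize on one lock even when vmodule ultimately filters the message out.

```go
package main

import (
	"runtime"
	"sync"
)

var loggingMu sync.Mutex // hypothetical stand-in for logging.mu
var pcs [1]uintptr       // shared PC buffer; the lock protects it

func vDepthSlow(depth int) uintptr {
	loggingMu.Lock() // every call site serializes here...
	defer loggingMu.Unlock()
	// ...and the stack walk runs while the lock is held.
	if runtime.Callers(depth+2, pcs[:]) == 0 {
		return 0
	}
	return pcs[0]
}

func main() {
	_ = vDepthSlow(0)
}
```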
Pro-tip: run with TESTFLAGS="-count 10" and then compare the output of the two runs with benchstat.
pkg/util/log/clog.go, line 1401 at r2 (raw file):

```
// Here is another cheap but safe test to see if vmodule is enabled.
if atomic.LoadInt32(&logging.filterLength) > 0 {
	pcs := logging.pcsPool.Get().([1]uintptr)
```

I don't see a call to `logging.pcsPool.Put`, which makes me suspect that you're allocating on every call here. You can also run with TESTFLAGS="-benchmem" to see the number of allocations per iteration.
pkg/util/log/clog.go, line 1410 at r2 (raw file):

```
	v = logging.setV(pcs[0])
}
logging.mu.Unlock()
```

I think there should be a call to `logging.pcsPool.Put(pcs)` here.
Benchstat output with `-count 10`:

```
name                   old time/op    new time/op    delta
VDepthWithVModule-8     790ns ± 1%     254ns ± 6%   -67.83%  (p=0.000 n=9+10)

name                   old alloc/op   new alloc/op   delta
VDepthWithVModule-8     0.00B          0.00B           ~     (all equal)

name                   old allocs/op  new allocs/op  delta
VDepthWithVModule-8     0.00           0.00            ~     (all equal)
```
pkg/util/log/clog.go, line 1401 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

> I don't see a call to `logging.pcsPool.Put`, which makes me suspect that you're allocating on every call here. You can also run with TESTFLAGS="-benchmem" to see the number of allocations per iteration.
🤦 Thanks, fixed.

Oddly enough, `-benchmem` actually doesn't show any allocations without the `Put` calls. Seems like an accounting bug of some sort, although we could just be misunderstanding something. Including the `Put`s is obviously what we want, though, and it performs 10-20% better than without them, so they clearly have some effect.
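To make the `-benchmem` discussion concrete, here is a small benchmark sketch (a hypothetical stand-in, not the repo's actual BenchmarkVDepthWithVModule) showing how allocations per iteration get reported for a pool round-trip; `b.ReportAllocs()` has the same effect as passing `-benchmem` on the command line.

```go
package log_test

import (
	"sync"
	"testing"
)

var pool = sync.Pool{
	// Pool a pointer to the array: boxing a plain [1]uintptr into the
	// interface{} handled by Get/Put would itself allocate.
	New: func() interface{} { return new([1]uintptr) },
}

func BenchmarkPoolRoundTrip(b *testing.B) {
	b.ReportAllocs() // report allocs/op, like running with -benchmem
	for i := 0; i < b.N; i++ {
		buf := pool.Get().(*[1]uintptr)
		pool.Put(buf)
	}
}
```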
pkg/util/log/clog.go, line 1410 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

> I think there should be a call to `logging.pcsPool.Put(pcs)` here.

Done. And in the early-return case.
By using a sync.Pool to manage the PC buffers for us. This avoids locking while calling runtime.Callers without needing to allocate a new buffer every time VDepth is called. Take the benchmarks with a grain of salt since they're just from my laptop.

Before the change:

```
$ make bench PKG=./pkg/util/log/ BENCHES=BenchmarkVDepthWithVModule
BenchmarkVDepthWithVModule-8    2000000    825 ns/op
```

After the change:

```
$ make bench PKG=./pkg/util/log/ BENCHES=BenchmarkVDepthWithVModule
BenchmarkVDepthWithVModule-8    5000000    284 ns/op
```

Release note: None
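A self-contained sketch of the approach the commit message describes, with illustrative names (pcsPool, callerPC) rather than the exact clog.go code: a sync.Pool hands out small PC buffers, so runtime.Callers runs without holding any logging lock and without a fresh allocation per call.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

var pcsPool = sync.Pool{
	New: func() interface{} { return new([1]uintptr) },
}

// callerPC fetches the caller's program counter using a pooled buffer.
// No lock is held during the stack walk, and steady-state calls do not
// allocate because buffers are recycled through the pool.
func callerPC(depth int) uintptr {
	buf := pcsPool.Get().(*[1]uintptr)
	if runtime.Callers(depth+2, buf[:]) == 0 {
		pcsPool.Put(buf) // return the buffer on the early-exit path too
		return 0
	}
	pc := buf[0]
	pcsPool.Put(buf)
	return pc
}

func main() {
	fmt.Printf("caller pc: %#x\n", callerPC(0))
}
```

Note the explicit Put on both exit paths, matching the review thread above; a defer would also work but adds a small constant cost on this hot path.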
bors r+
29017: util/log: Speed up vmodule support and warn about it in help text r=a-robinson a=a-robinson

It's possible that with the second commit the help text change is no longer needed, but I'm curious what you think about that. I haven't tried re-running tpc-c 10k with this, so while I'm pretty sure this is a lot better, I don't know how much of a performance hit there still is.

Co-authored-by: Alex Robinson <[email protected]>
Build succeeded
29049: backport-2.1: util/log: Speed up vmodule support and warn about it in help text r=a-robinson a=a-robinson

Backport 2/2 commits from #29017.

/cc @cockroachdb/release

---

It's possible that with the second commit the help text change is no longer needed, but I'm curious what you think about that. I haven't tried re-running tpc-c 10k with this, so while I'm pretty sure this is a lot better, I don't know how much of a performance hit there still is.

Co-authored-by: Alex Robinson <[email protected]>
Fixes cockroachdb#17770.

This commit fixes the last user of `runtime.Callers` that misused the stdlib function by translating the PC values it returned directly into symbolic information (see https://golang.org/pkg/runtime/#Callers). Go's documentation warns that this is a recipe for disaster when mixed with mid-stack inlining.

The other concern in cockroachdb#17770 was this comment: cockroachdb#17770 (comment). This was discussed in golang/go#29582 and addressed in golang/go@956879d.

An alternative would be to use `runtime.Caller` here, but that would force an allocation that was hard-earned in cockroachdb#29017. Instead, this commit avoids any performance hit.

```
name                   old time/op    new time/op    delta
Header-4                267ns ± 1%     268ns ± 0%      ~     (p=0.584 n=10+10)
VDepthWithVModule-4     260ns ± 3%     255ns ± 1%    -1.87%  (p=0.018 n=10+9)

name                   old alloc/op   new alloc/op   delta
Header-4                0.00B          0.00B           ~     (all equal)
VDepthWithVModule-4     0.00B          0.00B           ~     (all equal)

name                   old allocs/op  new allocs/op  delta
Header-4                0.00           0.00            ~     (all equal)
VDepthWithVModule-4     0.00           0.00            ~     (all equal)
```

Release note: None
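A hedged sketch of what the fix described above looks like in general (not the actual clog.go diff): PCs returned by runtime.Callers are resolved through runtime.CallersFrames, which understands mid-stack inlining, instead of being fed directly to runtime.FuncForPC.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerFileLine resolves the caller's file and line via CallersFrames,
// which correctly accounts for inlined frames.
func callerFileLine() (string, int) {
	var pcs [1]uintptr
	n := runtime.Callers(2, pcs[:])
	if n == 0 {
		return "unknown", 0
	}
	frames := runtime.CallersFrames(pcs[:n])
	frame, _ := frames.Next()
	return frame.File, frame.Line
}

func main() {
	file, line := callerFileLine()
	fmt.Printf("%s:%d\n", file, line)
}
```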
35014: kv: introduce new "max outstanding size" setting to txnPipeliner r=nvanbenschoten a=nvanbenschoten

Fixes #32522.

This change creates a new cluster setting called:

```
kv.transaction.write_pipelining_max_outstanding_size
```

It limits the size in bytes that can be dedicated to tracking in-flight writes that have been pipelined. Once this limit is hit, no more writes will be pipelined by a transaction until some of the writes are proven and removed from the outstanding write set.

This change once again illustrates the need for periodic proving of outstanding writes. We touch upon that in the type definition's comment and in #35009.

Release note: None

35199: log: fix remaining misuse of runtime.Callers/runtime.FuncForPC r=nvanbenschoten a=nvanbenschoten

Fixes #17770.

This commit fixes the last user of `runtime.Callers` that misused the stdlib function by translating the PC values it returned directly into symbolic information (see https://golang.org/pkg/runtime/#Callers) [1]. Go's documentation warns that this is a recipe for disaster when mixed with mid-stack inlining.

The other concern in #17770 was this comment: #17770 (comment). This was discussed in golang/go#29582 and addressed in golang/go@956879d.

An alternative would be to use `runtime.Caller` here, but that would force an allocation that was hard-earned in #29017. Instead, this commit avoids any performance hit.

```
name                   old time/op    new time/op    delta
Header-4                267ns ± 1%     268ns ± 0%      ~     (p=0.584 n=10+10)
VDepthWithVModule-4     260ns ± 3%     255ns ± 1%    -1.87%  (p=0.018 n=10+9)

name                   old alloc/op   new alloc/op   delta
Header-4                0.00B          0.00B           ~     (all equal)
VDepthWithVModule-4     0.00B          0.00B           ~     (all equal)

name                   old allocs/op  new allocs/op  delta
Header-4                0.00           0.00            ~     (all equal)
VDepthWithVModule-4     0.00           0.00            ~     (all equal)
```

[1] I went through and verified that this was still correct.

Release note: None

35203: closedts: react faster to config changes r=danhhz a=tbg

This approximately halves the duration of `make test PKG=./pkg/ccl/changefeedccl TESTS=/sinkless`, from ~60s to ~30s.

Touches #34455.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
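To make the 35014 setting above concrete, a hypothetical sketch of the budget check it describes (names and structure are illustrative, not the actual txnPipeliner code): once the bytes tracked for in-flight pipelined writes exceed the configured limit, further writes are not pipelined until earlier writes are proven and removed.

```go
package main

import "fmt"

// pipeliner tracks the total size of in-flight pipelined writes against a
// configurable budget, standing in for the
// kv.transaction.write_pipelining_max_outstanding_size cluster setting.
type pipeliner struct {
	outstandingBytes int64
	maxOutstanding   int64
}

// canPipeline reports whether a write of the given size fits in the budget.
func (p *pipeliner) canPipeline(size int64) bool {
	return p.outstandingBytes+size <= p.maxOutstanding
}

func (p *pipeliner) record(size int64) { p.outstandingBytes += size }
func (p *pipeliner) prove(size int64)  { p.outstandingBytes -= size }

func main() {
	p := &pipeliner{maxOutstanding: 1 << 18} // hypothetical 256 KiB budget
	fmt.Println(p.canPipeline(4096))         // true: fits within the budget
}
```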