Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/log: Speed up vmodule support and warn about it in help text #29017

Merged
merged 2 commits into from
Aug 24, 2018

Conversation

a-robinson
Copy link
Contributor

It's possible that with the second commit the help text change is no longer needed, but I'm curious what you think about that. I haven't tried re-running tpc-c 10k with this, so while I'm pretty sure this is a lot better I don't know how much of a performance hit there still is.

This bit me pretty bad when doing performance testing. I turned on
vmodule for a file which hardly logs at all, but all the calls to
log.VDepth slowed the system down to a crawl.

Alternatively we could improve log.VDepth to make this less of a
concern. It's already been handcrafted to avoid allocations, but holding
the logging lock while calling runtime.Callers is not good.

Release note: None
@a-robinson a-robinson requested a review from jordanlewis August 23, 2018 20:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pro-tip: run with TESTFLAGS="-count 10" and then compare the output of the two runs with benchstat.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/util/log/clog.go, line 1401 at r2 (raw file):

	// Here is another cheap but safe test to see if vmodule is enabled.
	if atomic.LoadInt32(&logging.filterLength) > 0 {
		pcs := logging.pcsPool.Get().([1]uintptr)

I don't see a call to logging.pcsPool.Put which makes me suspect that you're allocating on every call here. You can also run with TESTFLAGS="-benchmem" to see the number of allocations per iteration.


pkg/util/log/clog.go, line 1410 at r2 (raw file):

			v = logging.setV(pcs[0])
		}
		logging.mu.Unlock()

I think there should be a call to logging.pcsPool.Put(pcs) here.

Copy link
Contributor Author

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchstat output with -count 10:

name                 old time/op    new time/op    delta
VDepthWithVModule-8     790ns ± 1%     254ns ± 6%  -67.83%  (p=0.000 n=9+10)

name                 old alloc/op   new alloc/op   delta
VDepthWithVModule-8     0.00B          0.00B          ~     (all equal)

name                 old allocs/op  new allocs/op  delta
VDepthWithVModule-8      0.00           0.00          ~     (all equal)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/util/log/clog.go, line 1401 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I don't see a call to logging.pcsPool.Put which makes me suspect that you're allocating on every call here. You can also run with TESTFLAGS="-benchmem" to see the number of allocations per iteration.

🤦Thanks, fixed.

Oddly enough, -benchmem actually doesn't show any allocations without the Put calls. Seems like an accounting bug of some sort, although we could just be misunderstanding something.

Including the Puts is obviously what we want, though, and it performs 10-20% better than without them so they clearly have some effect.


pkg/util/log/clog.go, line 1410 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think there should be a call to logging.pcsPool.Put(pcs) here.

Done. And in the early return case.

By using a sync.Pool to manage the PC buffers for us. This avoids
locking while calling runtime.Callers without needing to allocate a new
buffer every time VDepth is called.

Take the benchmarks with a grain of salt since they're just from my
laptop, but

Before the change:
$ make bench PKG=./pkg/util/log/ BENCHES=BenchmarkVDepthWithVModule
BenchmarkVDepthWithVModule-8     2000000               825 ns/op

After the change:
$ make bench PKG=./pkg/util/log/ BENCHES=BenchmarkVDepthWithVModule
BenchmarkVDepthWithVModule-8     5000000               284 ns/op

Release note: None
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@a-robinson
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Aug 24, 2018
29017: util/log: Speed up vmodule support and warn about it in help text r=a-robinson a=a-robinson

It's possible that with the second commit the help text change is no longer needed, but I'm curious what you think about that. I haven't tried re-running tpc-c 10k with this, so while I'm pretty sure this is a lot better I don't know how much of a performance hit there still is.

Co-authored-by: Alex Robinson <[email protected]>
@craig
Copy link
Contributor

craig bot commented Aug 24, 2018

Build succeeded

@craig craig bot merged commit 5ba7b88 into cockroachdb:master Aug 24, 2018
craig bot pushed a commit that referenced this pull request Aug 27, 2018
29049: backport-2.1: util/log: Speed up vmodule support and warn about it in help text r=a-robinson a=a-robinson

Backport 2/2 commits from #29017.

/cc @cockroachdb/release

---

It's possible that with the second commit the help text change is no longer needed, but I'm curious what you think about that. I haven't tried re-running tpc-c 10k with this, so while I'm pretty sure this is a lot better I don't know how much of a performance hit there still is.


Co-authored-by: Alex Robinson <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 26, 2019
Fixes cockroachdb#17770.

This commit fixes the last user of `runtime.Callers` that misused
the stdlib function by translating the PC values it returned directly
into symbolic information (see https://golang.org/pkg/runtime/#Callers).
Go's documentation warns that this is a recipe for disaster when mixed
with mid-stack inlining.

The other concern in cockroachdb#17770 was this comment:
cockroachdb#17770 (comment).
This was discussed in golang/go#29582 and
addressed in golang/go@956879d.

An alternative would be to use `runtime.Caller` here, but that would
force an allocation that was hard-earned in cockroachdb#29017. Instead, this commit
avoids any performance hit.

```
name                 old time/op    new time/op    delta
Header-4                267ns ± 1%     268ns ± 0%    ~     (p=0.584 n=10+10)
VDepthWithVModule-4     260ns ± 3%     255ns ± 1%  -1.87%  (p=0.018 n=10+9)

name                 old alloc/op   new alloc/op   delta
Header-4                0.00B          0.00B         ~     (all equal)
VDepthWithVModule-4     0.00B          0.00B         ~     (all equal)

name                 old allocs/op  new allocs/op  delta
Header-4                 0.00           0.00         ~     (all equal)
VDepthWithVModule-4      0.00           0.00         ~     (all equal)
```

Release note: None
craig bot pushed a commit that referenced this pull request Feb 26, 2019
35014: kv: introduce new "max outstanding size" setting to txnPipeliner r=nvanbenschoten a=nvanbenschoten

Fixes #32522.

This change creates a new cluster setting called:
```
kv.transaction.write_pipelining_max_outstanding_size
```

It limits the size in bytes that can be dedicated to tracking in-flight
writes that have been pipelined. Once this limit is hit, not more writes
will be pipelined by a transaction until some of the writes are proven
and removed from the outstanding write set.

This change once again illustrates the need for periodic proving of
outstanding writes. We touch upon that in the type definition's comment
and in #35009.

Release note: None

35199: log: fix remaining misuse of runtime.Callers/runtime.FuncForPC r=nvanbenschoten a=nvanbenschoten

Fixes #17770.

This commit fixes the last user of `runtime.Callers` that misused
the stdlib function by translating the PC values it returned directly
into symbolic information (see https://golang.org/pkg/runtime/#Callers) [1].
Go's documentation warns that this is a recipe for disaster when mixed
with mid-stack inlining.

The other concern in #17770 was this comment: #17770 (comment).
This was discussed in golang/go#29582 and addressed in golang/go@956879d.

An alternative would be to use `runtime.Caller` here, but that would
force an allocation that was hard-earned in #29017. Instead, this commit
avoids any performance hit.

```
name                 old time/op    new time/op    delta
Header-4                267ns ± 1%     268ns ± 0%    ~     (p=0.584 n=10+10)
VDepthWithVModule-4     260ns ± 3%     255ns ± 1%  -1.87%  (p=0.018 n=10+9)

name                 old alloc/op   new alloc/op   delta
Header-4                0.00B          0.00B         ~     (all equal)
VDepthWithVModule-4     0.00B          0.00B         ~     (all equal)

name                 old allocs/op  new allocs/op  delta
Header-4                 0.00           0.00         ~     (all equal)
VDepthWithVModule-4      0.00           0.00         ~     (all equal)
```

[1] I went through and verified that this was still correct.

Release note: None

35203: closedts: react faster to config changes r=danhhz a=tbg

This approximately halves the duration of

`make test PKG=./pkg/ccl/changefeedccl TESTS=/sinkless`,

from ~60s to ~30s.

Touches #34455.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants