Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: broadcast MsgApp on auto-leave joint config proposal #14538

Merged
merged 1 commit into from
Sep 30, 2022

Conversation

nvanbenschoten
Copy link
Contributor

@nvanbenschoten nvanbenschoten commented Sep 29, 2022

This commit ensures that the raft leader eagerly broadcasts a MsgApp to each follower when initiating an automatic transition out of a joint configuration. This had been missed previously, which could lead to delayed completion of an auto-transition.

cc. @tbg

This commit ensures that the raft leader eagerly broadcasts a MsgApp to
each follower when initiating an automatic transition out of a joint
configuration. This had been missed previously, which could lead to
delayed completion of an auto-transition.

Signed-off-by: Nathan VanBenschoten <[email protected]>
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/broadcastOnLeave branch from 3db9806 to bd34388 Compare September 29, 2022 16:33
@codecov-commenter
Copy link

Codecov Report

Merging #14538 (bd34388) into main (4513671) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main   #14538   +/-   ##
=======================================
  Coverage   75.26%   75.26%           
=======================================
  Files         457      457           
  Lines       37262    37195   -67     
=======================================
- Hits        28046    27996   -50     
+ Misses       7446     7433   -13     
+ Partials     1770     1766    -4     
Flag Coverage Δ
all 75.26% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
raft/raft.go 90.23% <100.00%> (+0.01%) ⬆️
server/etcdserver/api/v3lock/lock.go 61.53% <0.00%> (-8.47%) ⬇️
client/pkg/v3/tlsutil/tlsutil.go 83.33% <0.00%> (-8.34%) ⬇️
client/v3/leasing/util.go 91.66% <0.00%> (-6.67%) ⬇️
api/etcdserverpb/raft_internal_stringer.go 76.78% <0.00%> (-4.94%) ⬇️
client/v3/concurrency/session.go 88.63% <0.00%> (-4.55%) ⬇️
client/v3/leasing/cache.go 87.77% <0.00%> (-3.89%) ⬇️
server/etcdserver/api/v3rpc/util.go 70.96% <0.00%> (-3.23%) ⬇️
server/storage/mvcc/watchable_store.go 86.23% <0.00%> (-2.90%) ⬇️
client/v3/experimental/recipes/key.go 75.34% <0.00%> (-2.74%) ⬇️
... and 18 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tbg tbg self-requested a review September 30, 2022 06:28
Copy link
Contributor

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @nvanbenschoten

@ahrtr ahrtr merged commit c102c07 into etcd-io:main Sep 30, 2022
nvanbenschoten added a commit to nvanbenschoten/etcd that referenced this pull request Oct 3, 2022
This commit simplifies the logic added in 37c7e4d to auto-leave joint
configurations. It does so by making the following adjustments to the
code:

- remove the `oldApplied <= r.pendingConfIndex` condition. This does
  not seem necessary. When a node first attempts to auto-leave a joint
  config, it will bump `r.pendingConfIndex` when proposing. In cases
  where `oldApplied >= r.pendingConfIndex`, the proposal must have
  already been applied. Reviewers should double check this.
- use raft.Step instead of custom proposal code. This code was already
  present in stepLeader, so there was no reason to duplicate it. This
  would have avoided bugs like the one we fixed in etcd-io#14538.
- use `confChangeToMsg` to generate message, to centralize the creation
  of all `MsgProp{EntryConfChange}` messages.
nvanbenschoten added a commit to nvanbenschoten/etcd that referenced this pull request Oct 3, 2022
This commit simplifies the logic added in 37c7e4d to auto-leave joint
configurations. It does so by making the following adjustments to the
code:

- remove the `oldApplied <= r.pendingConfIndex` condition. This does
  not seem necessary. When a node first attempts to auto-leave a joint
  config, it will bump `r.pendingConfIndex` when proposing. In cases
  where `oldApplied >= r.pendingConfIndex`, the proposal must have
  already been applied. Reviewers should double check this.
- use raft.Step instead of custom proposal code. This code was already
  present in stepLeader, so there was no reason to duplicate it. This
  would have avoided bugs like the one we fixed in etcd-io#14538.
- use `confChangeToMsg` to generate message, to centralize the creation
  of all `MsgProp{EntryConfChange}` messages.

Signed-off-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/etcd that referenced this pull request Oct 3, 2022
This commit simplifies the logic added in 37c7e4d to auto-leave joint
configurations. It does so by making the following adjustments to the
code:

- remove the `oldApplied <= r.pendingConfIndex` condition. This does
  not seem necessary. When a node first attempts to auto-leave a joint
  config, it will bump `r.pendingConfIndex` when proposing. In cases
  where `oldApplied >= r.pendingConfIndex`, the proposal must have
  already been applied. Reviewers should double check this.
- use raft.Step instead of custom proposal code. This code was already
  present in stepLeader, so there was no reason to duplicate it. This
  would have avoided bugs like the one we fixed in etcd-io#14538.
- use `confChangeToMsg` to generate message, to centralize the creation
  of all `MsgProp{EntryConfChange}` messages.

Signed-off-by: Nathan VanBenschoten <[email protected]>
@tbg tbg mentioned this pull request Oct 10, 2022
3 tasks
tbg added a commit to tbg/cockroach that referenced this pull request Oct 10, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Oct 13, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Oct 14, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Oct 17, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Nov 2, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Nov 3, 2022
89632: go.mod: bump raft r=nvanbenschoten a=tbg

```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Compared single-node performance on gceworker via

```bash
#!/bin/bash
set -euxo pipefail
pkill -9 cockroach || true
rm -rf cockroach-data
cr=./cockroach-$1
$cr start-single-node --background --insecure
$cr workload init kv
$cr workload run kv --splits 100 --max-rate 2000 --duration 10m --read-percent 0 --min-block-bytes 10 --max-block-bytes 10 | tee $1.txt
```

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0        1199604         1999.3      6.7      7.1     10.0     11.0     75.5  write #master
  600.0s        0        1199614         1999.4      6.8      7.1     10.0     11.0     79.7  write #PR
```

Closes #87264.

- [x] [make it build](#88985 (comment))
- [x] remove the maxIndex param and handling from Task.AckCommittedEntriesBeforeApplication
- [x] check that single node write latencies don't regress

Release note: None


91117: sql: reduce the overhead of EXPLAIN ANALYZE r=yuzefovich a=yuzefovich

In order to propagate the execution stats across the distributed query plan we use the tracing infrastructure, where each stats object is added as "structured metadata" to the trace. Thus, whenever we're collecting the exec stats for a statement, we must enable tracing. Previously, in many cases we would enable it at the highest verbosity level which has non-trivial overhead. In some cases this was an overkill (e.g. in `EXPLAIN ANALYZE` we don't really care about the trace containing all of the gory details - we won't expose it anyway), so this is now fixed by using the less verbose "structured" verbosity level. As a concrete example of the difference: for a stmt that without `EXPLAIN ANALYZE` takes around 190ms, with `EXPLAIN ANALYZE` it would previously run for about 1.8s and now it takes around 210ms.

This required some minor changes to the row-by-row outbox and router
setups to collect thats even if the recording is not verbose.

Addresses: #90739.

Epic: None

Release note (performance improvement): The overhead of running `EXPLAIN ANALYZE` and `EXPLAIN ANALYZE (DISTSQL)` has been significantly reduced. The overhead of `EXPLAIN ANALYZE (DEBUG)` didn't change.

91119: roachprod: improve error in ParallelE r=smg260 a=tbg

Prior to this commit, the error's stack trace did not link back
to the caller of `ParallelE`. Now it does.

Epic: none
Release note: None


91126: dev: allow whitespace separated regexps for testlogic files r=ajwerner a=ajwerner

This was a feature of `make testlogic` and it was liked.

Fixes #91125

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants