-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: re-work leader self-ack mechanism #87264
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
tbg
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv-replication
labels
Sep 1, 2022
cc @cockroachdb/replication |
Draft PR etcd-io/etcd#14413 |
Upstream is now ready for review: etcd-io/etcd#14413 |
Now that etcd-io/etcd#14413 has landed, what's next for this issue? Pull in the dependency? Should we also remove the |
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Sep 29, 2022
This picks up etcd-io/etcd#14413. Closes cockroachdb#87264. Release note: None
See checklist on #88985 |
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Oct 10, 2022
``` go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead. go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0 go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 ``` This picks up - etcd-io/etcd#14413 - etcd-io/etcd#14538 Closes cockroachdb#87264. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Oct 13, 2022
``` go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead. go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0 go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 ``` This picks up - etcd-io/etcd#14413 - etcd-io/etcd#14538 Closes cockroachdb#87264. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Oct 14, 2022
``` go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead. go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0 go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 ``` This picks up - etcd-io/etcd#14413 - etcd-io/etcd#14538 Closes cockroachdb#87264. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Oct 17, 2022
``` go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead. go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0 go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 ``` This picks up - etcd-io/etcd#14413 - etcd-io/etcd#14538 Closes cockroachdb#87264. Release note: None
craig bot
pushed a commit
that referenced
this issue
Nov 3, 2022
89632: go.mod: bump raft r=nvanbenschoten a=tbg ``` go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead. go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0 go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66 ``` This picks up - etcd-io/etcd#14413 - etcd-io/etcd#14538 Compared single-node performance on gceworker via ```bash #!/bin/bash set -euxo pipefail pkill -9 cockroach || true rm -rf cockroach-data cr=./cockroach-$1 $cr start-single-node --background --insecure $cr workload init kv $cr workload run kv --splits 100 --max-rate 2000 --duration 10m --read-percent 0 --min-block-bytes 10 --max-block-bytes 10 | tee $1.txt ``` ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 600.0s 0 1199604 1999.3 6.7 7.1 10.0 11.0 75.5 write #master 600.0s 0 1199614 1999.4 6.8 7.1 10.0 11.0 79.7 write #PR ``` Closes #87264. - [x] [make it build](#88985 (comment)) - [x] remove the maxIndex param and handling from Task.AckCommittedEntriesBeforeApplication - [x] check that single node write latencies don't regress Release note: None 91117: sql: reduce the overhead of EXPLAIN ANALYZE r=yuzefovich a=yuzefovich In order to propagate the execution stats across the distributed query plan we use the tracing infrastructure, where each stats object is added as "structured metadata" to the trace. Thus, whenever we're collecting the exec stats for a statement, we must enable tracing. Previously, in many cases we would enable it at the highest verbosity level which has non-trivial overhead. In some cases this was an overkill (e.g. in `EXPLAIN ANALYZE` we don't really care about the trace containing all of the gory details - we won't expose it anyway), so this is now fixed by using the less verbose "structured" verbosity level. As a concrete example of the difference: for a stmt that without `EXPLAIN ANALYZE` takes around 190ms, with `EXPLAIN ANALYZE` it would previously run for about 1.8s and now it takes around 210ms. This required some minor changes to the row-by-row outbox and router setups to collect thats even if the recording is not verbose. Addresses: #90739. Epic: None Release note (performance improvement): The overhead of running `EXPLAIN ANALYZE` and `EXPLAIN ANALYZE (DISTSQL)` has been significantly reduced. The overhead of `EXPLAIN ANALYZE (DEBUG)` didn't change. 91119: roachprod: improve error in ParallelE r=smg260 a=tbg Prior to this commit, the error's stack trace did not link back to the caller of `ParallelE`. Now it does. Epic: none Release note: None 91126: dev: allow whitespace separated regexps for testlogic files r=ajwerner a=ajwerner This was a feature of `make testlogic` and it was liked. Fixes #91125 Release note: None Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Nov 4, 2022
91226: batcheval: deflake `TestAddSSTableSSTTimestampToRequestTimestampRespectsClosedTS` r=erikgrinaker a=erikgrinaker Flake introduced by #87264. `@tbg` Can you just quickly verify that this flake is expected given your Raft changes? Seems reasonable to me. Resolves #91211. Release note: None Co-authored-by: Erik Grinaker <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Describe the problem
The
raft.Ready
contract has some deficiencies that have caused trouble upstream, and which are in our interest to help resolve.etcd-io/etcd#14370 (comment)
I believe we're not affected, since we are aware of this behavior and handle it:
cockroach/pkg/kv/kvserver/apply/task.go
Lines 182 to 193 in d064059
To Reproduce
See the thread above.
Expected behavior
Make the single-node case look like any other case. Don't emit
Ready
with.CommittedEntries
that are contingent on handling.Entries
. This reduces single-node throughput, but the performance could be clawed back on our end by special casing single nodes (which we probably don't need to, since if performance is a goal we can more completely elide raft).In practice, this means shepherding etcd-io/etcd#14411.
Also revive etcd-io/etcd#10861, which among other things points out the above problem.
When thinking about issues such as these:
#38322
#17500 (comment)
it is important to have all of the invariants properly documented and discoverable.
Jira issue: CRDB-19244
The text was updated successfully, but these errors were encountered: