Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: re-work leader self-ack mechanism #87264

Closed
tbg opened this issue Sep 1, 2022 · 5 comments · Fixed by #89632
Closed

raft: re-work leader self-ack mechanism #87264

tbg opened this issue Sep 1, 2022 · 5 comments · Fixed by #89632
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@tbg
Copy link
Member

tbg commented Sep 1, 2022

Describe the problem

The raft.Ready contract has some deficiencies that have caused trouble upstream, and which are in our interest to help resolve.

etcd-io/etcd#14370 (comment)

I believe we're not affected, since we are aware of this behavior and handle it:

// 4. etcd/raft may provided a series of CommittedEntries in a Ready struct that
// haven't actually been appended to our own log. This is most common in single
// node replication groups, but it is possible when a follower in a multi-node
// replication group is catching up after falling behind. In the first case,
// the entries are not yet committed so acknowledging them would be a lie. In
// the second case, the entries are committed so we could acknowledge them at
// this point, but doing so seems risky. To avoid complications in either case,
// the method takes a maxIndex parameter that limits the indexes that it will
// acknowledge. Typically, callers will supply the highest index that they have
// durably written to their raft log for this upper bound.
//
func (t *Task) AckCommittedEntriesBeforeApplication(ctx context.Context, maxIndex uint64) error {

To Reproduce

See the thread above.

Expected behavior

Make the single-node case look like any other case. Don't emit Ready with .CommittedEntries that are contingent on handling .Entries. This reduces single-node throughput, but the performance could be clawed back on our end by special casing single nodes (which we probably don't need to, since if performance is a goal we can more completely elide raft).

In practice, this means shepherding etcd-io/etcd#14411.

Also revive etcd-io/etcd#10861, which among other things points out the above problem.

When thinking about issues such as these:

#38322
#17500 (comment)

it is important to have all of the invariants properly documented and discoverable.

Jira issue: CRDB-19244

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-replication labels Sep 1, 2022
@tbg tbg self-assigned this Sep 1, 2022
@blathers-crl
Copy link

blathers-crl bot commented Sep 1, 2022

cc @cockroachdb/replication

@tbg
Copy link
Member Author

tbg commented Sep 1, 2022

Draft PR etcd-io/etcd#14413

@tbg
Copy link
Member Author

tbg commented Sep 8, 2022

Upstream is now ready for review: etcd-io/etcd#14413

@nvanbenschoten
Copy link
Member

Now that etcd-io/etcd#14413 has landed, what's next for this issue? Pull in the dependency? Should we also remove the maxIndex param and handling from Task.AckCommittedEntriesBeforeApplication?

@tbg tbg mentioned this issue Sep 29, 2022
3 tasks
tbg added a commit to tbg/cockroach that referenced this issue Sep 29, 2022
This picks up etcd-io/etcd#14413.

Closes cockroachdb#87264.

Release note: None
@tbg
Copy link
Member Author

tbg commented Sep 29, 2022

See checklist on #88985

@tbg tbg mentioned this issue Oct 10, 2022
3 tasks
tbg added a commit to tbg/cockroach that referenced this issue Oct 10, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Oct 13, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Oct 14, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Oct 17, 2022
```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Closes cockroachdb#87264.

Release note: None
craig bot pushed a commit that referenced this issue Nov 3, 2022
89632: go.mod: bump raft r=nvanbenschoten a=tbg

```
go get go.etcd.io/etcd/raft/v3@d19116e6ee66e52a5fd8cce2e10f9422fb80e42f

go: downloading go.etcd.io/etcd/raft/v3 v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
go: module github.com/golang/protobuf is deprecated: Use the "google.golang.org/protobuf" module instead.
go: upgraded go.etcd.io/etcd/api/v3 v3.5.0 => v3.6.0-alpha.0
go: upgraded go.etcd.io/etcd/raft/v3 v3.0.0-20210320072418-e51c697ec6e8 => v3.6.0-alpha.0.0.20221009201006-d19116e6ee66
```

This picks up

- etcd-io/etcd#14413
- etcd-io/etcd#14538

Compared single-node performance on gceworker via

```bash
#!/bin/bash
set -euxo pipefail
pkill -9 cockroach || true
rm -rf cockroach-data
cr=./cockroach-$1
$cr start-single-node --background --insecure
$cr workload init kv
$cr workload run kv --splits 100 --max-rate 2000 --duration 10m --read-percent 0 --min-block-bytes 10 --max-block-bytes 10 | tee $1.txt
```

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0        1199604         1999.3      6.7      7.1     10.0     11.0     75.5  write #master
  600.0s        0        1199614         1999.4      6.8      7.1     10.0     11.0     79.7  write #PR
```

Closes #87264.

- [x] [make it build](#88985 (comment))
- [x] remove the maxIndex param and handling from Task.AckCommittedEntriesBeforeApplication
- [x] check that single node write latencies don't regress

Release note: None


91117: sql: reduce the overhead of EXPLAIN ANALYZE r=yuzefovich a=yuzefovich

In order to propagate the execution stats across the distributed query plan we use the tracing infrastructure, where each stats object is added as "structured metadata" to the trace. Thus, whenever we're collecting the exec stats for a statement, we must enable tracing. Previously, in many cases we would enable it at the highest verbosity level which has non-trivial overhead. In some cases this was an overkill (e.g. in `EXPLAIN ANALYZE` we don't really care about the trace containing all of the gory details - we won't expose it anyway), so this is now fixed by using the less verbose "structured" verbosity level. As a concrete example of the difference: for a stmt that without `EXPLAIN ANALYZE` takes around 190ms, with `EXPLAIN ANALYZE` it would previously run for about 1.8s and now it takes around 210ms.

This required some minor changes to the row-by-row outbox and router
setups to collect thats even if the recording is not verbose.

Addresses: #90739.

Epic: None

Release note (performance improvement): The overhead of running `EXPLAIN ANALYZE` and `EXPLAIN ANALYZE (DISTSQL)` has been significantly reduced. The overhead of `EXPLAIN ANALYZE (DEBUG)` didn't change.

91119: roachprod: improve error in ParallelE r=smg260 a=tbg

Prior to this commit, the error's stack trace did not link back
to the caller of `ParallelE`. Now it does.

Epic: none
Release note: None


91126: dev: allow whitespace separated regexps for testlogic files r=ajwerner a=ajwerner

This was a feature of `make testlogic` and it was liked.

Fixes #91125

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@craig craig bot closed this as completed in 2c1c354 Nov 3, 2022
craig bot pushed a commit that referenced this issue Nov 4, 2022
91226: batcheval: deflake `TestAddSSTableSSTTimestampToRequestTimestampRespectsClosedTS` r=erikgrinaker a=erikgrinaker

Flake introduced by #87264. `@tbg` Can you just quickly verify that this flake is expected given your Raft changes? Seems reasonable to me.

Resolves #91211.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants