Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: sysbench/oltp_write_only/nodes=3/cpu=32/conc=256 failed [tocommit out of range] #97926

Closed
cockroach-teamcity opened this issue Mar 2, 2023 · 17 comments · Fixed by #98721
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Mar 2, 2023

roachtest.sysbench/oltp_write_only/nodes=3/cpu=32/conc=256 failed with artifacts on master @ 20e2adda3c76c7172dd986c871df0ae9a346918f:

test artifacts and logs in: /artifacts/sysbench/oltp_write_only/nodes=3/cpu=32/conc=256/run_1
(cluster.go:1969).Run: output in run_161729.508110704_n4_sysbench-dbdriverpgs: sysbench \
		--db-driver=pgsql \
		--pgsql-host={pghost:1} \
		--pgsql-port=26257 \
		--pgsql-user=root \
		--pgsql-password= \
		--pgsql-db=sysbench \
		--report-interval=1 \
		--time=600 \
		--threads=256 \
		--tables=10 \
		--table_size=10000000 \
		--auto_inc=false \
		oltp_write_only prepare returned: context canceled
(monitor.go:127).Wait: monitor failure: monitor command failure: unexpected node event: 1: dead (exit status 7)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=32 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-24968

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 2, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Mar 2, 2023
@blathers-crl blathers-crl bot added the T-testeng TestEng Team label Mar 2, 2023
@renatolabs
Copy link
Contributor

Quite a few liveness errors and eventually node 1 crashes with a panic from etcd/raft:

go.etcd.io/raft/v3/log.go:324 ⋮ [T1,n1,s1,r101/1:‹/Table/11{1/1/3781…-2}›] 739 
tocommit(489) is out of range [lastIndex(427)]. Was the raft log corrupted, truncated, or lost?

cc @cockroachdb/replication

@blathers-crl
Copy link

blathers-crl bot commented Mar 2, 2023

cc @cockroachdb/replication

@renatolabs renatolabs removed the T-testeng TestEng Team label Mar 2, 2023
@erikgrinaker
Copy link
Contributor

@pavelkalinnikov This seems like a high-priority problem, would appreciate if you could have an initial look.

@pav-kv pav-kv self-assigned this Mar 6, 2023
@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2023

Node 1 got behind, and fails on an empty MsgApp in the probing state:

{
  "span": {
    "start_key": "/Table/111/1/3781091",
    "end_key": "/Table/112"
  },
  "raft_state": {
    "replica_id": 2,
    "hard_state": {
      "term": 6,
      "vote": 2,
      "commit": 821
    },
    "lead": 2,
    "state": "StateLeader",
    "applied": 821,
    "progress": {
      "1": {
        "match": 427,
        "next": 428,
        "state": "StateProbe",
        "paused": true
      },
      "2": {
        "match": 821,
        "next": 822,
        "state": "StateReplicate"
      },
      "3": {
        "match": 821,
        "next": 822,
        "state": "StateReplicate"
      }
    }
  },
  "state": {
    "state": {
      "raft_applied_index": 821,
      "lease_applied_index": 508,

@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2023

Possibly this is the bug that @tbg mentioned in this comment (see below "Also, isn't there a (pre-existing) bug here?").

@pav-kv pav-kv added A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Mar 6, 2023
@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2023

Same symptoms in another failure here: #97389 (comment)

@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2023

The message causing the panic is a MsgApp sent from here:

pb.Message{
	Type:    pb.MsgApp,
	To:      1,
	From:    2,
	Term:    6,
	Index:   489,
	LogTerm: 0,  // <--- Why is this 0?
	Entries: [], // <--- Looks like no entries were found with index >= 489 (or max-inflight is saturated)?
	Commit:  660,
}

raftLog.maybeAppend checks matchTerm, but the latter silently accepts this 0 term as matching: https://github.com/etcd-io/raft/blob/d9907d6ac6baaebc3c9fd4e67acaa4154d2b3cd3/log.go#L391-L396. Hence we're entering the branch which asserts.

LogTerm isn't supposed to be 0 in the first place, we need to understand why this is the case. It can happen if an entry is not found. But how can it be not found in maybeSendAppend where this value is populated?

@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2023

@nvanbenschoten Could something like this be caused by async log appends introduced in #94165? We're seeing raftLog.term returns 0 (i.e. "not found") for an entry index which is already known to be committed.

@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2023

Possibly index 489 got truncated on the leader by the time raftLog.term was called. There is a line like this:

I230302 16:20:52.380104 83436 kv/kvserver/raft_log_queue.go:723 ⋮ [T1,n3,raftlog,s3,r101/2:‹/Table/11{1/1/3781…-2}›] 160  should truncate: true [truncate 31 entries to first index 510 (chosen via: last index); log too large (16 MiB > 16 MiB); implies 1 Raft snapshot]

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Mar 6, 2023

raftLog.maybeAppend checks matchTerm, but the latter silently accepts this 0 term as matching: https://github.com/etcd-io/raft/blob/d9907d6ac6baaebc3c9fd4e67acaa4154d2b3cd3/log.go#L391-L396. Hence we're entering the branch which asserts.

I came to the same conclusion. If it is legitimate for a MsgApp to carry a LogTerm of 0, then this receiver-side code is broken. Is this legitimate when sendIfEmpty = true? That depends on the relationship between a raft leader's own log and the Progress.Next of all of its followers.

If it's not legitimate then the receiver-side code could probably be improved, but it's not the root of the problem. Instead, we'll need to look at the leader and understand whether we're hitting the pr.Next < l.firstIndex() case or the pr.Next - 1 > l.lastIndex() case. @pavelkalinnikov have you seen any indication of which case we're hitting here? We're the raft leader, so presumably the first case.

@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2023

@nvanbenschoten Yes, likely we're hitting the first case, see the log truncation message above.

@nvanbenschoten
Copy link
Member

Interesting. When we're dealing with some kind of race with log truncation, do we have any reason to expect raftLog.term to reach the call to storage.Term and return raft.ErrCompacted, vs. only reaching the call to storage.FirstIndex and hitting this problematic return 0, nil return path?

@pav-kv
Copy link
Collaborator

pav-kv commented Mar 6, 2023

I'm struggling to see how at all we would hit the ErrCompacted branch here. I think we would always see this return 0.

raftLog.firstIndex will always return an index:

  1. either (rarely) the index of the unstable snapshot in memory about to be applied,
  2. or (most likely) the Storage.FirstIndex for an already applied to storage snapshot.

To get an ErrCompacted from Storage.Term, we should first have seen that raftLog.firstIndex()-1 <= i <= lastIndex, which in case 1 and 2 means:

  1. unstable snapshot index <= i <= lastIndex, but unstable snapshot index is probably >= stored snapshot index, so we won't get ErrCompacted
  2. Storage.FirstIndex()-1 <= i <= lastIndex, which is again a prereq for not getting ErrCompacted.

@erikgrinaker erikgrinaker changed the title roachtest: sysbench/oltp_write_only/nodes=3/cpu=32/conc=256 failed roachtest: sysbench/oltp_write_only/nodes=3/cpu=32/conc=256 failed [tocommit out of range] Mar 7, 2023
@pav-kv
Copy link
Collaborator

pav-kv commented Mar 7, 2023

Culprit: etcd-io/raft@42419da

I couldn't repro this panic yet, but found another one that this commit causes (bisected to verify). Trying to repro this one too, it seems to have the same underlying cause - sending a zero LogTerm after log truncation.

@pav-kv
Copy link
Collaborator

pav-kv commented Mar 7, 2023

Found a repro, working in upstream to fix it: etcd-io/raft#31.

@nvanbenschoten
Copy link
Member

@pavelkalinnikov nice find! Could you explain why etcd-io/raft@42419da is the culprit? We've primarily been looking at raftLog.term and that seems broken independent of your change. Do we need to fix this return 0, nil path?

@pav-kv
Copy link
Collaborator

pav-kv commented Mar 7, 2023

@nvanbenschoten In etcd-io/raft#31 there is a test which simulates the behaviour in this issue: a Raft log truncation + a bit of slowness on a follower so that truncation on the leader overtakes the appends flow to the follower. I bisected, and this test starts panicking (with the same message) right at the culprit commit that I linked.

The reason why my change broke it is: previously we unconditionally called raftLog.entries which would return ErrCompacted; after my change there is the Inflights.Full() case in which we skip fetching entries, and won't get this ErrCompacted (so will proceed to sending the zero term).

Yes, broadly speaking we need to fix or workaround the return 0, nil path.

craig bot pushed a commit that referenced this issue Mar 16, 2023
98574: sql: support tenant configuration templates r=stevendanna,ecwall a=knz

Fixes #98573.
Epic: CRDB-23559
First commit from #98726.

This change introduces the LIKE clause to CREATE TENANT, which makes CREATE TENANT copy the parameters (but not the storage keyspace) from the tenant selected by LIKE.

Also if LIKE is not specified, but the (new) cluster setting `sql.create_tenant.default_template` is not empty, the value of the cluster setting is used implicitly as LIKE clause.

A proposed use of this is cluster-to-cluster replication, considering
cutover as well. On the target (sink) cluster, the operator would do:

```
CREATE TENANT application LIKE app_template
  FROM REPLICATION OF application ON ....
```

And then cutover would look something like the following if they
wanted the tenant to still be named "application"

```
ALTER TENANT application CUTOVER TO LATEST;
DROP TENANT application; -- if there's one already
ALTER TENANT application START SERVICE SHARED;
```

Release note: None

98721: go.mod: bump etcd-io/raft to 5fe1c31 r=tbg a=pavelkalinnikov

Fixes #97926
Epic: none
Release note (bug fix): fixed a rare panic in upstream etcd-io/raft when message appends race with log compaction

98747: kvserver: deflake TestReplicaProbeRequest r=pavelkalinnikov a=tbg

When we ignored an ambigous result but the probe didn't actually happen,
a later condition in the test would fail.

Retry the probe on ambiguous results instead; the test already only
expects the probe to happen "at least once", so we don't introduce
any new issues should a successful probe end up being retried.

Fixes #97136.

Epic: none
Release note: None


Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
@craig craig bot closed this as completed in 6e4dd83 Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants