Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvflowcontrol: observed raft log position less than per-stream lower bound #106123

Closed
Tracked by #98703
irfansharif opened this issue Jul 4, 2023 · 1 comment · Fixed by #107412
Closed
Tracked by #98703

kvflowcontrol: observed raft log position less than per-stream lower bound #106123

irfansharif opened this issue Jul 4, 2023 · 1 comment · Fixed by #107412
Assignees
Labels
A-admission-control C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Jul 4, 2023

Describe the problem

In #105955 we saw the following:

۩ cat logs/3.unredacted/cockroach.log | grep "observed raft log position less than per-stream lower bound" | head -n 10
E230701 07:05:49.168518 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4421  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=7/252461)
E230701 07:05:49.171144 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4434  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=6/23546)
E230701 07:05:49.171165 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4435  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=6/23546)
E230701 07:05:49.171178 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4436  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=6/23546)
E230701 07:05:49.171190 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4437  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=6/23546)
E230701 07:05:49.171201 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4438  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=6/23546)
E230701 07:05:49.171215 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4439  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=6/23546)
E230701 07:05:49.171363 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4440  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=7/20430)
E230701 07:05:49.171381 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4441  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=7/20430)
E230701 07:05:49.172008 265 kv/kvserver/kvflowcontrol/kvflowtokentracker/tracker.go:95 ⋮ [-] 4445  observed raft log position less than per-stream lower bound (log-position=0/0 <= log-position=6/33144)

Which is an indication of a kvflowcontrol integration bug:

if !(dt.lowerBound.Less(pos)) {
// We're trying to track a token deduction at a position less than the
// stream's lower-bound. Shout loudly but ultimately no-op. This
// regression indicates buggy usage since:
// - token deductions are done so with monotonically increasing log
// positions (see Handle.DeductTokensFor);
// - the monotonically increasing log positions for token deductions
// also extends to positions at which streams are connected,
// which typically happen when (a) members are added to the raft
// group, (b) previously crashed follower nodes restart, (c)
// replicas are unpaused, or (d) caught back up via snapshots (see
// Handle.ConnectStream).
// - token returns upto some log position don't precede deductions at
// lower log positions (see Handle.ReturnTokensUpto);
logFn := log.Errorf
if buildutil.CrdbTestBuild {
logFn = log.Fatalf
}
logFn(ctx, "observed raft log position less than per-stream lower bound (%s <= %s)",
pos, dt.lowerBound)
return false
}

Jira issue: CRDB-29414

@irfansharif irfansharif added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-admission-control labels Jul 4, 2023
@irfansharif irfansharif self-assigned this Jul 4, 2023
@irfansharif
Copy link
Contributor Author

In #105955 there were disk stalls, and the raft group stopped being ticked, which might have had something to do with these zero log positions being read from raft.

craig bot pushed a commit that referenced this issue Jul 24, 2023
107412: kvserver: fix 'observed raft log position' assertion r=irfansharif a=irfansharif

Fixes #107336.
Fixes #106123.
Fixes #107156.
Fixes #106589.

It's possible to hit this assertion under --stress --race when the proposing replica is starved enough for raft ticks that it loses leadership right when it steps proposals through raft. We're relying on undocumented API semantics in the etcd raft library whereby it mutates stepped entries with the term+index its to end up in. But that's only applicable if stepping through entries as a leader. Simply relax this assertion instead.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
@craig craig bot closed this as completed in deb538f Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-admission-control C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant