Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: respect closed timestamp in tryReproposeWithNewLeaseIndex #42939

Merged
merged 1 commit into from
Dec 13, 2019

Conversation

tbg
Copy link
Member

@tbg tbg commented Dec 4, 2019

Prior to this commit, tryReproposeWithNewLeaseIndex would not consult
with the closed timestamp tracker. This meant that a proposal may end up
with a new lease index at a timestamp that consumers of closed timestamp
updates had already considered immutable. In other words, this broke
closed timestamp guarantees and would by extension also have the
potential to break CDC's.

This path is fairly rare and it isn't one likely to cause problems
in practice, though it certainly could have.

See #42821 for details.

Fixes #42821

Release note (bug fix): fix a bug which could lead to follower reads or
CDC updates that did not reflect the full set of data at the timestamp.
This bug was never observed in practice and should be rare to cause
issues, one of the necessary ingredients being an aggressive closed
timestamp interval.

@tbg tbg requested a review from nvanbenschoten December 4, 2019 12:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg requested a review from ajwerner December 4, 2019 12:52
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic idea makes sense. Good find and nice fix.

Reviewed 1 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/roachpb/batch.go, line 62 at r1 (raw file):

	if txn := ba.Txn; txn != nil {
		ts := txn.ReadTimestamp
		ts.Forward(txn.DeprecatedOrigTimestamp)

Can this be removed in 20.2?


pkg/storage/replica_application_result.go, line 214 at r1 (raw file):

	maxLeaseIndex, pErr := r.propose(ctx, p)
	if pErr != nil {
		log.Warningf(ctx, "failed to repropose with new lease index: %s", pErr)

Can you help me understand why you didn't want to log here and you did want to log above?


pkg/storage/replica_application_result.go, line 220 at r1 (raw file):

		// will go to the client, so send something it can digest.
		//
		// WIP(tbg): is this the right error?

This is a bummer. We want an error which will be retried in distsender. Not leaseholder error will do the trick but seems wrong. I think I’d be in favor of adding a new error over this.

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/roachpb/batch.go, line 62 at r1 (raw file):

Previously, ajwerner wrote…

Can this be removed in 20.2?

I think so, but let's remove this thing in its own PR.


pkg/storage/replica_application_result.go, line 214 at r1 (raw file):

Previously, ajwerner wrote…

Can you help me understand why you didn't want to log here and you did want to log above?

I removed this and added logging at the caller, to avoid introducing extra log.Warningf for the new error return in this method.


pkg/storage/replica_application_result.go, line 220 at r1 (raw file):

Previously, ajwerner wrote…

This is a bummer. We want an error which will be retried in distsender. Not leaseholder error will do the trick but seems wrong. I think I’d be in favor of adding a new error over this.

Hmm - this needs a backport, so I don't think we should be adding a new error. The leaseholder error has a "custom message" field, I can use it to disambiguate this error should it pop up somewhere. Agreed that there's not much to love about returning this error. More generally though, there's lots that's not to love about this whole method in general. I chatted with @nvanbenschoten yesterday and we both thought that there's something we should do around handling ErrProposalDropped (#21849) and avoiding raft forwarding entirely (#37906). The idea being that illegal lease applied index && proposal hasn't applied yet just can't happen if you don't forward proposals. This needs some fleshing out, but I'd very much like to remove all of this reordering, and thus this subtle code, altogether. (The bug count for this code is >5).

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_application_result.go, line 220 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Hmm - this needs a backport, so I don't think we should be adding a new error. The leaseholder error has a "custom message" field, I can use it to disambiguate this error should it pop up somewhere. Agreed that there's not much to love about returning this error. More generally though, there's lots that's not to love about this whole method in general. I chatted with @nvanbenschoten yesterday and we both thought that there's something we should do around handling ErrProposalDropped (#21849) and avoiding raft forwarding entirely (#37906). The idea being that illegal lease applied index && proposal hasn't applied yet just can't happen if you don't forward proposals. This needs some fleshing out, but I'd very much like to remove all of this reordering, and thus this subtle code, altogether. (The bug count for this code is >5).

Wrote up the comment: #21849 (comment)

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we write tests to detect closed timestamp violations such as this one? Something that might show up had we set the closed timestamp interval to something short and then added some chaos to the raft transport perhaps?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/roachpb/batch.go, line 62 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I think so, but let's remove this thing in its own PR.

You need this while in 20.1.I think I was mostly just asking for a TODO to get rid of this forwarding in 20.2 but I suppose that will just fall out of the removal of that field.


pkg/storage/replica_application_result.go, line 214 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I removed this and added logging at the caller, to avoid introducing extra log.Warningf for the new error return in this method.

Ack.


pkg/storage/replica_application_result.go, line 220 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Wrote up the comment: #21849 (comment)

Thanks. Makes sense.

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we write tests to detect closed timestamp violations such as this one? Something that might show up had we set the closed timestamp interval to something short and then added some chaos to the raft transport perhaps?

I think there are two ingredients we need:

  1. an invariant that detects CT violations (i.e. something like: query the tracker in .propose to validate that the MLAI is tracked; and/or with each applied Raft command, verify that the MVCC snapshot being changed isn't immutable yet)
  2. run through situations in which a violation is more likely.

Honestly to establish confidence that we have fixed this particular bug I would rather spend energy on simplifying our architecture so that we just handle things in a simpler way that doesn't even have this problem (i.e. there's only one path into Raft, and it "obviously" goes through the tracker). But even after that having some way to establish confidence that our closed timestamps are in fact properly closed does seem desirable. An always-on assertion to that end could be good, coupled with tests in which we run with an aggressive closed timestamp duration as you suggested.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@ajwerner
Copy link
Contributor

ajwerner commented Dec 9, 2019

Honestly to establish confidence that we have fixed this particular bug I would rather spend energy on simplifying our architecture so that we just handle things in a simpler way that doesn't even have this problem (i.e. there's only one path into Raft, and it "obviously" goes through the tracker).

Fair. Thanks for the thoughts on the testing strategy. I'll keep those in mind as I look towards moving some of the closed timestamp subsystem into the raft transport and work to make closed timestamps something more of a per-range concept than today's store level concept.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a test that indicates that this works: #43121. In general, the assertions in rangefeed are actually pretty good at catching these issues.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/roachpb/batch.go, line 59 at r1 (raw file):

// GetActiveTimestamp returns the timestamp at which the request will read.
func (ba *BatchRequest) GetActiveTimestamp() hlc.Timestamp {

Isn't ba.Timestamp set to this timestamp in SetActiveTimestamp? Why do we need this accessor?


pkg/storage/replica_application_result.go, line 223 at r1 (raw file):

		lhErr := roachpb.NewError(newNotLeaseHolderError(
			r.mu.state.Lease,
			cmd.proposal.command.ProposerReplica.StoreID,

I think I just removed this. If we're here then don't we know that this was proposed locally?


pkg/storage/replica_application_result.go, line 238 at r1 (raw file):

	// NB: The caller already promises that the lease check succeeded, meaning
	// the sequence numbers match, implying that the lease epoch hasn't changed
	// from what it was under the proposal-time lease.

This all checks out, but I still think it would look less troubling if we pulled the lease epoch off of cmd. Is that possible?

Prior to this commit, tryReproposeWithNewLeaseIndex would not consult
with the closed timestamp tracker. This meant that a proposal may end up
with a new lease index at a timestamp that consumers of closed timestamp
updates had already considered immutable. In other words, this broke
closed timestamp guarantees and would by extension also have the
potential to break CDC's.

This path is fairly rare and it isn't one likely to cause problems
in practice, though it certainly could have.

See cockroachdb#42821 for details.

Fixes cockroachdb#42821

Release note (bug fix): fix a bug which could lead to follower reads or
CDC updates that did not reflect the full set of data at the timestamp.
This bug was never observed in practice and should be rare to cause
issues, one of the necessary ingredients being an aggressive closed
timestamp interval.
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, thanks for writing that test!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_application_result.go, line 223 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think I just removed this. If we're here then don't we know that this was proposed locally?

Yes, we do. Updated.


pkg/storage/replica_application_result.go, line 238 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This all checks out, but I still think it would look less troubling if we pulled the lease epoch off of cmd. Is that possible?

It's not available there, or are you saying I should stash it on ProposalData? I agree that it'd be nice not to have to do mental yoga over the epoch.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for writing that test!

It wasn't meant to test this. So I should be thanking you for deflaking my test.

:lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/replica_application_result.go, line 238 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

It's not available there, or are you saying I should stash it on ProposalData? I agree that it'd be nice not to have to do mental yoga over the epoch.

Oh you need the epoch, not the sequence, never mind.

@tbg
Copy link
Member Author

tbg commented Dec 13, 2019

TFTR!

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Dec 13, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Dec 13, 2019

Build failed

@tbg
Copy link
Member Author

tbg commented Dec 13, 2019

bors r=nvanbenschoten

flake was #38428

craig bot pushed a commit that referenced this pull request Dec 13, 2019
42939: storage: respect closed timestamp in tryReproposeWithNewLeaseIndex r=nvanbenschoten a=tbg

Prior to this commit, tryReproposeWithNewLeaseIndex would not consult
with the closed timestamp tracker. This meant that a proposal may end up
with a new lease index at a timestamp that consumers of closed timestamp
updates had already considered immutable. In other words, this broke
closed timestamp guarantees and would by extension also have the
potential to break CDC's.

This path is fairly rare and it isn't one likely to cause problems
in practice, though it certainly could have.

See #42821 for details.

Fixes #42821

Release note (bug fix): fix a bug which could lead to follower reads or
CDC updates that did not reflect the full set of data at the timestamp.
This bug was never observed in practice and should be rare to cause
issues, one of the necessary ingredients being an aggressive closed
timestamp interval.

Co-authored-by: Tobias Schottdorf <[email protected]>
@craig
Copy link
Contributor

craig bot commented Dec 13, 2019

Build succeeded

@craig craig bot merged commit 20600dc into cockroachdb:master Dec 13, 2019
@tbg tbg deleted the lai branch December 13, 2019 12:38
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Dec 13, 2019
I noticed that there weren't any storage-level integration tests that
verified that an active rangefeed processor would push transactions that
were holding up its resolved timestamp from progressing. This commit adds
such a test.

Periodically flakes with the following panic unless rebased on cockroachdb#42939:
```
panic: resolved timestamp 1576109922.890013000,0 equal to or above timestamp of operation {<nil> txn_id:d599eb68-cf29-4e15-90be-554b04f10a8b txn_key:"\275\211\211\210" txn_min_timestamp:<wall_time:1576109922877135000 > timestamp:<wall_time:1576109922883114000 logical:2 >  <nil> <nil> <nil> <nil>}
```

Release note: None
craig bot pushed a commit that referenced this pull request Dec 13, 2019
43121: storage: create TestReplicaRangefeedPushesTransactions r=nvanbenschoten a=nvanbenschoten

I noticed that there weren't any storage-level integration tests that verified that an active rangefeed processor would push transactions that were holding up its resolved timestamp from progressing. This commit adds such a test.

Periodically flakes under stress with the following panic unless rebased on #42939:
```
panic: resolved timestamp 1576109922.890013000,0 equal to or above timestamp of operation {<nil> txn_id:d599eb68-cf29-4e15-90be-554b04f10a8b txn_key:"\275\211\211\210" txn_min_timestamp:<wall_time:1576109922877135000 > timestamp:<wall_time:1576109922883114000 logical:2 >  <nil> <nil> <nil> <nil>}
```

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@nvanbenschoten nvanbenschoten mentioned this pull request Feb 4, 2020
19 tasks
craig bot pushed a commit that referenced this pull request Jan 16, 2024
117801: kvserver: acquire `Replica.mu` when returning reproposal error r=erikgrinaker a=erikgrinaker

Discovered while working on #117612. Looks like it's been there since #42939.

This is a rare error, and these fields are unlikely to change while we're holding the raft mutex, so seems very unlikely to have caused any problems -- I could be convinced we shouldn't backport this, on the off-chance I've missed a potential deadlock.

Epic: none
Release note: None

117833: *: un-skip many tests that were skipped over the last month r=rail a=rickystewart

This partially reverts the following commits:

```
e615aec
99884f8
99f9760
75c91c2
5c64a22
c1778b7
381c96b
f9b20ae
8608499
eb71a83
337777f
```

These tests were skipped due to failures under RBE, typically under `deadlock` and `race`. With the addition of the `heavy` pool for these tests, we anticipate the decreased load will cause these tests to stop failing, therefore we un-skip the skipped tests.

Epic: CRDB-8308
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: tryReproposeWithNewLeaseIndex may violate closed timestamp
4 participants