storage: don't leak committed protos to pushers on reproposal #34659

tbg · 2019-02-06T12:25:10Z

TODO: test

A recent commit (master only) reintroduced a bug that, ironically, we had
spent a lot of time on before (#30792 (comment)). In summary, it allowed
the result of an EndTransaction that did not itself apply to leak,
resulting in intents being committed even though their transaction
ultimately did not commit:

#34025 (comment)

We diagnosed this pretty quickly the second time around, but clearly
we didn't do a good job of preventing the regression. I can see how this
happened: the method this code lives in is notoriously difficult to unit
test because it interfaces with so much else. One needs to jump through
lots of hoops to target it, and so we do it less than we ought to.

I believe this wasn't released in any alpha (nor backported anywhere),
so no release note is necessary.

Fixes #34025.

Release note: None

cockroach-teamcity · 2019-02-06T12:25:18Z

This change is Reviewable.

petermattis

Nice find! Definitely need to write a test for this. This also seems worthy of listing in the Core fault lines: our inability to unit test complex logic.

tbg · 2019-02-06T14:54:14Z

How do we proceed regarding the alpha? I won't be able to write the test today. Either someone else can pick this up, or I can commit to making a test for this my priority tomorrow. The latter would give me a little breathing room to see about making this kind of thing easier to test.

petermattis · 2019-02-06T14:57:38Z

I'm not super familiar with this code, so I'd want to have a few other folks sign off on this change (@nvanbenschoten, @bdarnell, @andreimatei). Assuming that happens, I'm fine with merging for the alpha along with a promise to get a test written as soon as possible.

tbg · 2019-02-06T15:58:22Z

👍 @bdarnell and/or @nvanbenschoten, your turn.

bdarnell

LGTM to merge for the alpha with my suggested change below, and add tests later.

As I'm looking at this code, I think we can get rid of the whole alwaysReturn concept in the eval results by adding a path to the intent resolver in the eval context. Whenever we would add alwaysReturn intents/txns to the result, we'd instead hand them directly to the resolver at evaluation time. If it makes no difference whether the command applies, there's no reason to defer this until it does. Then we can simplify the apply-time logic to only look at intents/txns in the eval result if the command fully succeeded.

Reviewed 1 of 1 files at r1.

pkg/storage/replica_raft.go, line 1930 at r1 (raw file):

			}
			response.Intents = proposal.Local.DetachIntents()
			response.EndTxns = proposal.Local.DetachEndTxns(pErr != nil)

These two error variables are extremely subtle. I think this should be pErr != nil && response.Err != nil. Looking back at the introduction of this alwaysReturn feature (#17074), it was concerned with response errors, not raft/"forced" errors.

I don't think the response.Err clause actually matters because EndTransaction never returns both a non-always updated txn and a response error, but including this clause would be the more conservative option while we're fast-tracking this into the alpha.
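To make the boolean passed to DetachEndTxns at this call site concrete, here is a minimal, self-contained Go model of what it appears to control. All names (endTxnIntents, localResult, the AlwaysReturn field, detachEndTxns) are hypothetical simplifications, not CockroachDB's actual types: when the proposal failed, only entries explicitly marked as safe to return may escape, so a COMMITTED status from a command that never applied cannot reach pushers.

```go
package main

import "fmt"

// endTxnIntents is a hypothetical, simplified stand-in for the
// per-transaction record produced by evaluating an EndTransaction.
type endTxnIntents struct {
	TxnID  string
	Status string // e.g. "COMMITTED" or "ABORTED"
	// AlwaysReturn marks entries that may be handed out even if the command
	// does not apply (hypothetical field modeled on the alwaysReturn concept).
	AlwaysReturn bool
}

// localResult is a hypothetical model of side effects produced at
// evaluation time and consumed after the command applies (or fails to).
type localResult struct {
	endTxns []endTxnIntents
}

// detachEndTxns removes the end-transaction records from the result and
// returns them. If alwaysOnly is true (the proposal failed, pErr != nil),
// only AlwaysReturn entries are returned; all others are dropped.
func (lr *localResult) detachEndTxns(alwaysOnly bool) []endTxnIntents {
	if lr == nil {
		return nil
	}
	all := lr.endTxns
	lr.endTxns = nil
	if !alwaysOnly {
		return all
	}
	var out []endTxnIntents
	for _, et := range all {
		if et.AlwaysReturn {
			out = append(out, et)
		}
	}
	return out
}

func main() {
	lr := &localResult{endTxns: []endTxnIntents{
		{TxnID: "txn1", Status: "COMMITTED"},
		{TxnID: "txn2", Status: "ABORTED", AlwaysReturn: true},
	}}
	// Simulate a proposal rejected below Raft.
	pErr := fmt.Errorf("forced error: invalid lease applied index")
	got := lr.detachEndTxns(pErr != nil)
	fmt.Println(len(got), got[0].TxnID) // prints "1 txn2"
}
```

In this model, passing the wrong error variable (one that is nil even though the proposal failed) would return txn1's COMMITTED record, which is the kind of leak this PR is about.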

tbg

pkg/storage/replica_raft.go, line 1930 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

These two error variables are extremely subtle. I think this should be pErr != nil && response.Err != nil. Looking back at the introduction of this alwaysReturn feature (#17074), it was concerned with response errors, not raft/"forced" errors.

I don't think the response.Err clause actually matters because EndTransaction never returns both a non-always updated txn and a response error, but including this clause would be the more conservative option while we're fast-tracking this into the alpha.

response isn't the original proposal; it's populated from it. Can you double-check, too? AFAICT this is the only write to response.Err. The proposal itself is in proposal, but we copy its fields over piecemeal.

bdarnell

pkg/storage/replica_raft.go, line 1930 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

response isn't the original proposal; it's populated from it. Can you double-check, too? AFAICT this is the only write to response.Err. The proposal itself is in proposal, but we copy its fields over piecemeal.

You're right; ignore this.

andreimatei

LGTM good find!
Have you looked into why TestFailureToProcessCommandClearsLocalResult didn't catch this?

tbg · 2019-02-06T19:01:17Z

Yeah, it doesn't inject a terrible proposal reason. I'm not near a computer, but feel free to merge.

asubiotto · 2019-02-06T19:22:54Z

bors r=bdarnell,andreimatei

craig · 2019-02-06T19:52:42Z

Build failed (retrying...)

GitHub CI (Cockroach)

34651: server: rework TestClusterVersionBootstrapStrict r=andreimatei a=andreimatei

This test... I'm not entirely sure what it was supposed to test, to be honest, but it seemed to be more complicated than it needed to be. It forced and emphasized MinSupportedVersion being equal to BinaryServerVersion (which is generally not a thing). I've simplified it, making it not muck with the versions, while keeping (I think) the things it was testing (to the extent that it was testing anything). This test was also in my way because it created servers that pretended to be versions that are not technically supported by the binary, and this kind of funkiness is making my life hard as I'm trying to rework the way in which versions are propagated and what knobs servers have, etc.

Release note: None

34659: storage: don't leak committed protos to pushers on reproposal r=bdarnell,andreimatei a=tbg

TODO: test

----

A recent commit (master only) reintroduced a bug that, ironically, we had spent a lot of time on [before]. In summary, it allowed the result of an EndTransaction that did *not* itself apply to leak, resulting in intents being committed even though their transaction ultimately did not commit: #34025 (comment)

We diagnosed this pretty quickly the second time around, but clearly we didn't do a good job of preventing the regression. I can see how this happened: the method this code lives in is notoriously difficult to unit test because it interfaces with so much else. One needs to jump through lots of hoops to target it, and so we do it less than we ought to.

I believe this wasn't released in any alpha (nor backported anywhere), so no release note is necessary.

Fixes #34025.

[before]: #30792 (comment)

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>

craig · 2019-02-06T20:42:20Z

Build succeeded

GitHub CI (Cockroach)

nvanbenschoten · 2019-02-06T22:53:05Z

LGTM, fantastic find!

34733: storage: regression test leaked intents on bounced proposal r=petermattis a=tbg

This adds the test promised in the PR below. When a transaction committed but the commit applied at an invalid lease applied index, we'd formerly (due to a recent change) leak the intents as committed, which would cause dirty writes.

Adapt an existing test to roughly do the following to prevent regression. The test (now) sets up two ranges and lets a transaction (anchored on the left) write to both of them. It then starts readers for both keys written by the txn and waits for them to enter the txn wait queue. Next, it lets the txn attempt to commit but injects a forced error below Raft. The bugs would formerly notify the txn wait queue that the transaction had committed (not true) and that its external intent (i.e. the one on the right range) could be resolved (not true). Verify that neither occurs.

See #34659.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>

tbg requested a review from a team February 6, 2019 12:25

tbg force-pushed the fix/response-perr branch from c956ab9 to 1fb8d87 February 6, 2019 12:26

petermattis reviewed Feb 6, 2019

asubiotto mentioned this pull request Feb 6, 2019

release: v2.2.0-alpha.20190211 #34288

Closed

tbg requested review from bdarnell and nvanbenschoten February 6, 2019 15:58

bdarnell approved these changes Feb 6, 2019

tbg commented Feb 6, 2019

bdarnell approved these changes Feb 6, 2019

andreimatei approved these changes Feb 6, 2019

craig bot merged commit 1fb8d87 into cockroachdb:master Feb 6, 2019

tbg mentioned this pull request Feb 8, 2019

storage: regression test leaked intents on bounced proposal #34733

Merged

tbg mentioned this pull request Feb 11, 2019

release: periodically run TPCC checks #34788

Closed

tbg deleted the fix/response-perr branch March 13, 2019 11:51
