Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvclient: fiddle with assertions and rollbacks #48297

Merged
merged 5 commits into from
May 4, 2020

Conversation

andreimatei
Copy link
Contributor

See individual commits.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@blathers-crl
Copy link

blathers-crl bot commented May 1, 2020

❌ The GitHub CI (Cockroach) build has failed on 640f4d04.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented May 1, 2020

❌ The GitHub CI (Cockroach) build has failed on 8a105744.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r1, 10 of 10 files at r2, 2 of 2 files at r3, 1 of 1 files at r4, 3 of 3 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @irfansharif, and @tbg)


pkg/kv/txn_external_test.go, line 117 at r5 (raw file):

			atomic.StoreInt64(&filterSet, 1)

			// This test uses a cluster with two nodes. It'll create a transaction

Is this going to be flaky if the lease moves around?


pkg/kv/txn_external_test.go, line 137 at r5 (raw file):

			require.NoError(t, txn.Put(ctx, key, "val"))

			// If the test we want the transaction to be committed and cleaned up,

"If the test we want" sounds wrong.


pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 806 at r5 (raw file):

}

func sanityCheckCommittedErr(ctx context.Context, pErr *roachpb.Error, ba roachpb.BatchRequest) {

Give this a comment indicating what it's sanity checking.

We had an assertion that only a TransactionAbortedError can carry an
ABORTED transaction. This assertion seems pretty dubious though - we've
just seen failed rollbacks trigger it. While that was a bug - and in
fact we caught it due to this assertion (see cockroachdb#48245), it seems hard to
ensure that rollbacks will not fail in the future.

Release note: None
I'll need it in a future commit to check wait for ctx cancelation in the
filter.

Release note: None
I'll use them in another package in a future commit.

Release note: None
Switch this utility method from collecting local traces to collecting
distributed traces. The method is mostly used in tests, with a handfull
of production uses. I don't think those production uses care.
I'll need distributed traces in a future commit, when this function is
used indirectly through the CheckPushResult() testing utility.

Release note: None
This patch improves the assertion that requests don't get errors
informing them that their transaction has been committed from underneath
them. There is a semi-legitimate case that's hitting that assertion: a
rollback sent after a commit returned an ambiguous result to the client.
This happens easily when the commit's ctx times out - the client gets an
ambiguous error and is likely to send a rollback attempt afterwards.
This rollback attempt might find a transaction record with the COMMITTED
status, in which case the assertion was hit.
This patch makes the assertion white-list this case. We were already
white-listing this specific case in other places. For other, unexpected
cases, the assertion (re-)becomes a Fatalf (from an Errorf), since an
unepectedly-committed transaction is no joke.

This patch also adds a test for the case that was triggering the
assertion. Also, it adds tests for another two most troubling cases of
rollback-after-ambiguous-commit that we handle incorrectly. Those
behaviors are not fixed; the test just reproduces and highlights them.

Touches cockroachdb#48301
Touches cockroachdb#48302

Release note: None
@blathers-crl
Copy link

blathers-crl bot commented May 4, 2020

❌ The GitHub CI (Cockroach) build has failed on dd208c0e.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @nvanbenschoten, and @tbg)


pkg/kv/txn_external_test.go, line 117 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this going to be flaky if the lease moves around?

well yeah, but why would the lease move? It'd better not move; we have plenty of tests like this.


pkg/kv/txn_external_test.go, line 137 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"If the test we want" sounds wrong.

done


pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 806 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Give this a comment indicating what it's sanity checking.

done

@craig
Copy link
Contributor

craig bot commented May 4, 2020

Build succeeded

@craig craig bot merged commit b8ecf62 into cockroachdb:master May 4, 2020
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 802 at r6 (raw file):

			// bug. Requests are not supposed to be sent on transactions after they
			// are committed.
			log.Errorf(ctx, "transaction unexpectedly committed: %s. ba: %s. txn: %s.", pErr, ba, errTxn)

Sorry for not getting to it earlier, so feel free to ignore this (doubly so because it's nothing to do with this diff): I'm surprised this isn't a log.Fatal given it's a "serious bug".

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @nvanbenschoten, and @tbg)


pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 802 at r6 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Sorry for not getting to it earlier, so feel free to ignore this (doubly so because it's nothing to do with this diff): I'm surprised this isn't a log.Fatal given it's a "serious bug".

see what's going in later commits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants