Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvclient/kvcoord: TestTxnCoordSenderPipelining failed #68948

Closed
cockroach-teamcity opened this issue Aug 14, 2021 · 3 comments · Fixed by #69019
Closed

kv/kvclient/kvcoord: TestTxnCoordSenderPipelining failed #68948

cockroach-teamcity opened this issue Aug 14, 2021 · 3 comments · Fixed by #69019
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

Comments

@cockroach-teamcity
Copy link
Member

kv/kvclient/kvcoord.TestTxnCoordSenderPipelining failed with artifacts on master @ ed9411c488e93dca28a61a2abafc087aac00946e:

=== RUN   TestTxnCoordSenderPipelining
    test_log_scope.go:73: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestTxnCoordSenderPipelining320077597
    test_log_scope.go:74: use -show-logs to present logs inline
    txn_coord_sender_test.go:2239: 
        	Error Trace:	txn_coord_sender_test.go:2239
        	Error:      	Not equal: 
        	            	expected: []roachpb.Method{1, 22, 10, 1, 10}
        	            	actual  : []roachpb.Method{1, 22, 10, 1, 17, 10}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,2 @@
        	            	-([]roachpb.Method) (len=5) {
        	            	+([]roachpb.Method) (len=6) {
        	            	  (roachpb.Method) 1,
        	            	@@ -5,2 +5,3 @@
        	            	  (roachpb.Method) 1,
        	            	+ (roachpb.Method) 17,
        	            	  (roachpb.Method) 10
        	Test:       	TestTxnCoordSenderPipelining
    panic.go:613: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestTxnCoordSenderPipelining320077597
--- FAIL: TestTxnCoordSenderPipelining (1.41s)
Reproduce

To reproduce, try:

make stressrace TESTS=TestTxnCoordSenderPipelining PKG=./pkg/kv/kvclient/kvcoord TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-json

Internal log

benesch marked as alumn{us/a}; resolving to nvanbenschoten instead

/cc @cockroachdb/kv nvanbenschoten

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Aug 14, 2021
@erikgrinaker
Copy link
Contributor

Ran 30k stress runs, can't reproduce. Worth a closer look though.

@cockroach-teamcity
Copy link
Member Author

kv/kvclient/kvcoord.TestTxnCoordSenderPipelining failed with artifacts on master @ 04a41e7915f4a89dcc1d0dbd92466c6adf79ec9f:

=== RUN   TestTxnCoordSenderPipelining
    test_log_scope.go:73: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestTxnCoordSenderPipelining669158218
    test_log_scope.go:74: use -show-logs to present logs inline
    txn_coord_sender_test.go:2239: 
        	Error Trace:	txn_coord_sender_test.go:2239
        	Error:      	Not equal: 
        	            	expected: []roachpb.Method{1, 22, 10, 1, 10}
        	            	actual  : []roachpb.Method{1, 22, 10, 1, 17, 10}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,2 @@
        	            	-([]roachpb.Method) (len=5) {
        	            	+([]roachpb.Method) (len=6) {
        	            	  (roachpb.Method) 1,
        	            	@@ -5,2 +5,3 @@
        	            	  (roachpb.Method) 1,
        	            	+ (roachpb.Method) 17,
        	            	  (roachpb.Method) 10
        	Test:       	TestTxnCoordSenderPipelining
    panic.go:613: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestTxnCoordSenderPipelining669158218
--- FAIL: TestTxnCoordSenderPipelining (1.08s)
Reproduce

To reproduce, try:

make stressrace TESTS=TestTxnCoordSenderPipelining PKG=./pkg/kv/kvclient/kvcoord TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Parameters in this failure:

  • GOFLAGS=-parallel=4

Internal log

benesch marked as alumn{us/a}; resolving to nvanbenschoten instead

/cc @cockroachdb/kv nvanbenschoten

This test on roachdash | Improve this report!

@nvanbenschoten
Copy link
Member

roachpb.Method 17 is a TxnHeartbeat request. I can pretty easily hit this with the following diff:

diff --git a/pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go b/pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go
index 3bc03b6806..61f9942388 100644
--- a/pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go
+++ b/pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go
@@ -2212,10 +2212,11 @@ func TestTxnCoordSenderPipelining(t *testing.T) {

        ambientCtx := log.AmbientContext{Tracer: tracing.NewTracer()}
        tsf := NewTxnCoordSenderFactory(TxnCoordSenderFactoryConfig{
-               AmbientCtx: ambientCtx,
-               Settings:   s.Cfg.Settings,
-               Clock:      s.Clock,
-               Stopper:    s.Stopper(),
+               AmbientCtx:        ambientCtx,
+               Settings:          s.Cfg.Settings,
+               Clock:             s.Clock,
+               Stopper:           s.Stopper(),
+               HeartbeatInterval: 1 * time.Millisecond,
        }, senderFn)
        db := kv.NewDB(ambientCtx, tsf, s.Clock, s.Stopper())

I figured that may point at #67215 being a culprit for the recent flakiness, but the transactions in this test would have launched a heartbeat loop even before #67215. So this may hint at some slowdown in CI or some slowdown in something below this test that causes these transactions to block for up to a second.

craig bot pushed a commit that referenced this issue Aug 17, 2021
68834: bazel: build http docs w bazel r=rail a=rickystewart

We generate this in a different way than the `Makefile` does -- we take
all the proto descriptor sets (that we need to generate anyway as part
of the build) and pass them in to `protoc` rather than passing a bunch
of include paths, which wouldn't work terribly well in the sandbox.

Also teach CI and `dev` to start building these docs.

I deleted a stray comment in `pkg/server/serverpb/admin.proto` that was
causing a diff in the generated file. I also had to patch
`@com_github_pseudomuto_protoc_gen_doc` to quash a compiler error (I
just deleted a dependency that isn't actually necessary).

Closes #65814.

Release note: None

69019: kv: deflake TestTxnCoordSenderPipelining r=nvanbenschoten a=nvanbenschoten

Fixes #68948.

This commit disables the transaction heartbeat loop in the test, which
removes a dependency on timing.

However, this doesn't explain why the test became slower over the past
few days. We see in CI that on the two separate occasions where the
test flaked, it took 1.08s and 1.41s, both above the 1s necessary for
the test to flake.

Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in a609f93 Aug 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants