Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvnemesis: TestKVNemesisMultiNode failed #101721

Closed
cockroach-teamcity opened this issue Apr 18, 2023 · 18 comments · Fixed by #107882
Closed

kv/kvnemesis: TestKVNemesisMultiNode failed #101721

cockroach-teamcity opened this issue Apr 18, 2023 · 18 comments · Fixed by #107882
Assignees
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Apr 18, 2023

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ 1cd507a7c6582fd91bb56123e62bd4fde45c4d22:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1865204116
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 7605291541804074861
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis2082477957
    kvnemesis.go:165: range 65 (/Table/100/"00532edf314db371") RANGE_CONSISTENT_STATS_INCORRECT:
        stats: {ContainsEstimates:0 LastUpdateNanos:1681814839621095004 IntentAge:0 GCBytesAge:13445 LiveBytes:212 LiveCount:4 KeyBytes:435 KeyCount:11 ValBytes:290 ValCount:17 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:1 RangeKeyBytes:51 RangeValCount:1 RangeValBytes:10 SysBytes:10401 SysCount:20 AbortSpanBytes:1199}
        delta (stats-computed): {ContainsEstimates:0 LastUpdateNanos:1681814839621095004 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:-6 SysCount:0 AbortSpanBytes:0}
    kvnemesis.go:165: range 78 (/Table/100/"0d1b2742d6b6062f") RANGE_CONSISTENT_STATS_INCORRECT:
        stats: {ContainsEstimates:0 LastUpdateNanos:1681814839812403396 IntentAge:0 GCBytesAge:41393 LiveBytes:53 LiveCount:1 KeyBytes:750 KeyCount:14 ValBytes:583 ValCount:38 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:1 RangeKeyBytes:51 RangeValCount:1 RangeValBytes:10 SysBytes:2488 SysCount:21 AbortSpanBytes:1199}
        delta (stats-computed): {ContainsEstimates:0 LastUpdateNanos:1681814839812403396 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:6 SysCount:0 AbortSpanBytes:0}
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis2082477957/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis2082477957/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis2082477957/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis2082477957/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 2
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1865204116
--- FAIL: TestKVNemesisMultiNode (70.36s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-27104

Epic CRDB-27234

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Apr 18, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Apr 18, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Apr 18, 2023
@erikgrinaker
Copy link
Contributor

This is reminiscent of #93896, where we saw SysBytes discrepancies caused by range merges racing with lease requests (fixed in #99017). However, in this case there is a symmetric 6-byte discrepancy between two ranges -- does this indicate a similar issue during range splits, where SysBytes is mis-attributed to the wrong range after the split?

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ a4ab6b6243f509e62993b4d81855f23470d9901a:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1884782417
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 7148618387478836565
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1659314035
    kvnemesis.go:165: range 75 (/Table/100/"bb01c544f45f5f14") RANGE_CONSISTENT_STATS_INCORRECT:
        stats: {ContainsEstimates:0 LastUpdateNanos:1681899478668119526 IntentAge:0 GCBytesAge:73101 LiveBytes:53 LiveCount:1 KeyBytes:693 KeyCount:13 ValBytes:559 ValCount:35 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:3 RangeKeyBytes:171 RangeValCount:5 RangeValBytes:51 SysBytes:4683 SysCount:18 AbortSpanBytes:941}
        delta (stats-computed): {ContainsEstimates:0 LastUpdateNanos:1681899478668119526 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:6 SysCount:0 AbortSpanBytes:0}
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1659314035/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1659314035/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1659314035/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1659314035/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 1
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1884782417
--- FAIL: TestKVNemesisMultiNode (109.79s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@erikgrinaker
Copy link
Contributor

Both of these failures are with deadlock and kv.expiration_leases_only.enabled = true. This probably tickles the stats race because leases are more likely to expire, so the merge/split requests will trigger a concurrent lease acquisition.

@erikgrinaker erikgrinaker self-assigned this Apr 19, 2023
@erikgrinaker erikgrinaker added T-kv-replication and removed T-kv KV Team labels Apr 19, 2023
@blathers-crl
Copy link

blathers-crl bot commented Apr 19, 2023

cc @cockroachdb/replication

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ 10665f9856d9c7ca51557bdf5b48fe9df796228b:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1227749579
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 7800460224026642564
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3322945106
    kvnemesis.go:165: range 73 (/Table/100/"b155b47704015b5a") RANGE_CONSISTENT_STATS_INCORRECT:
        stats: {ContainsEstimates:0 LastUpdateNanos:1682157779371627510 IntentAge:0 GCBytesAge:76056 LiveBytes:53 LiveCount:1 KeyBytes:1173 KeyCount:25 ValBytes:839 ValCount:54 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:4 RangeKeyBytes:222 RangeValCount:6 RangeValBytes:62 SysBytes:3593 SysCount:14 AbortSpanBytes:598}
        delta (stats-computed): {ContainsEstimates:0 LastUpdateNanos:1682157779371627510 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:18 SysCount:0 AbortSpanBytes:0}
    kvnemesis.go:165: range 106 (/Table/100/"557a2358987d8669") RANGE_CONSISTENT_STATS_INCORRECT:
        stats: {ContainsEstimates:0 LastUpdateNanos:1682157779926807589 IntentAge:0 GCBytesAge:46609 LiveBytes:53 LiveCount:1 KeyBytes:519 KeyCount:11 ValBytes:384 ValCount:24 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:2 RangeKeyBytes:137 RangeValCount:5 RangeValBytes:51 SysBytes:1076 SysCount:14 AbortSpanBytes:683}
        delta (stats-computed): {ContainsEstimates:0 LastUpdateNanos:1682157779926807589 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:-18 SysCount:0 AbortSpanBytes:0}
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3322945106/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3322945106/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3322945106/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3322945106/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 2
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1227749579
--- FAIL: TestKVNemesisMultiNode (119.19s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ 1f3419e178bdba544f74d9c9e14a4682efd18028:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1377543860
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 1024415409345674575
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis752674863
    kvnemesis.go:165: range 115 (/Table/100/"4f372447fddf9089") RANGE_CONSISTENT_STATS_INCORRECT:
        stats: {ContainsEstimates:0 LastUpdateNanos:1682248667918326218 IntentAge:0 GCBytesAge:39780 LiveBytes:0 LiveCount:0 KeyBytes:624 KeyCount:12 ValBytes:490 ValCount:31 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:1 RangeKeyBytes:51 RangeValCount:1 RangeValBytes:10 SysBytes:3401 SysCount:21 AbortSpanBytes:1293}
        delta (stats-computed): {ContainsEstimates:0 LastUpdateNanos:1682248667918326218 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:-6 SysCount:0 AbortSpanBytes:0}
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis752674863/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis752674863/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis752674863/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis752674863/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 1
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode1377543860
--- FAIL: TestKVNemesisMultiNode (96.03s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ 1f3419e178bdba544f74d9c9e14a4682efd18028:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode4065196892
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 7646607177314613521
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3681986409
    kvnemesis.go:165: range 147 (/Table/100/"7bc4d1e4075c7520") RANGE_CONSISTENT_STATS_INCORRECT:
        stats: {ContainsEstimates:0 LastUpdateNanos:1682329766680715445 IntentAge:0 GCBytesAge:34428 LiveBytes:0 LiveCount:0 KeyBytes:285 KeyCount:5 ValBytes:243 ValCount:15 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:1169 SysCount:10 AbortSpanBytes:420}
        delta (stats-computed): {ContainsEstimates:0 LastUpdateNanos:1682329766680715445 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:6 SysCount:0 AbortSpanBytes:0}
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3681986409/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3681986409/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3681986409/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3681986409/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 1
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode4065196892
--- FAIL: TestKVNemesisMultiNode (172.17s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ 9fb03ee9b365d656cc26728cc87f682a385d511b:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode2090673101
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 753312517745099591
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1232282953
    kvnemesis.go:165: error applying x.AdminMerge(ctx, tk(6314721792752128086)) // kv/kvserver/replica_command.go:808: merge failed: waiting for all right-hand replicas to initialize: operation "wait for replicas init" timed out after 5.002s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]: kv/kvserver/replica_command.go:808: merge failed: waiting for all right-hand replicas to initialize: operation "wait for replicas init" timed out after 5.002s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1232282953/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1232282953/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1232282953/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1232282953/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 1
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode2090673101
--- FAIL: TestKVNemesisMultiNode (19.84s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@erikgrinaker erikgrinaker added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 27, 2023
@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ 9e0b06bf61700d5192f171c6ebaec7e8a10fa6e6:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode2226663341
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 8827703835256234507
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis190548753
    kvnemesis.go:165: range 66 (/Table/100/"43181f1ddecdf899") RANGE_CONSISTENT_STATS_INCORRECT:
        stats: {ContainsEstimates:0 LastUpdateNanos:1683110290381382106 IntentAge:0 GCBytesAge:12442 LiveBytes:0 LiveCount:0 KeyBytes:1281 KeyCount:25 ValBytes:975 ValCount:63 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:2 RangeKeyBytes:115 RangeValCount:3 RangeValBytes:31 SysBytes:4629 SysCount:24 AbortSpanBytes:1532}
        delta (stats-computed): {ContainsEstimates:0 LastUpdateNanos:1683110290381382106 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:-10 SysCount:0 AbortSpanBytes:0}
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis190548753/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis190548753/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis190548753/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis190548753/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 1
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode2226663341
--- FAIL: TestKVNemesisMultiNode (16.69s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ 69875da00208dd792cb92aee54811e8a660773a9:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode2895614279
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 2840821478841235807
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1782552098
    kvnemesis.go:165: error applying x.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
          txn.SetIsoLevel(isolation.Snapshot)
          txn.ReverseScan(ctx, tk(2472807650999776716), tk(15928832316702100637), 0) // kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=40ecf13e key=/Table/100/"08ce982a34f906c6" iso=Snapshot pri=0.02441066 epo=1 ts=1684310680.015681917,0 min=1684310679.797745821,0 seq=0} lock=true stat=STAGING rts=1684310680.015681917,0 wto=false gul=1684310679.807745821,0; heartbeat observed status: PENDING
          txn.Get(ctx, tk(9210135055451966804)) // omitted
          txn.Put(ctx, tk(634611904550602438), sv(495)) // omitted
          b := &kv.Batch{}
          b.Put(tk(3897933467779442998), sv(496)) // TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "unnamed" meta={id=40ecf13e key=/Table/100/"08ce982a34f906c6" iso=Snapshot pri=0.02441066 epo=0 ts=1684310679.905093068,1 min=1684310679.797745821,0 seq=3} lock=true stat=STAGING rts=1684310679.807745821,0 wto=false gul=1684310679.807745821,0 ifw=2
          txn.CommitInBatch(ctx, b) // TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "unnamed" meta={id=40ecf13e key=/Table/100/"08ce982a34f906c6" iso=Snapshot pri=0.02441066 epo=0 ts=1684310679.905093068,1 min=1684310679.797745821,0 seq=3} lock=true stat=STAGING rts=1684310679.807745821,0 wto=false gul=1684310679.807745821,0 ifw=2
          return nil
        }) // kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=40ecf13e key=/Table/100/"08ce982a34f906c6" iso=Snapshot pri=0.02441066 epo=1 ts=1684310680.015681917,0 min=1684310679.797745821,0 seq=0} lock=true stat=STAGING rts=1684310680.015681917,0 wto=false gul=1684310679.807745821,0; heartbeat observed status: PENDING: kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=40ecf13e key=/Table/100/"08ce982a34f906c6" iso=Snapshot pri=0.02441066 epo=1 ts=1684310680.015681917,0 min=1684310679.797745821,0 seq=0} lock=true stat=STAGING rts=1684310680.015681917,0 wto=false gul=1684310679.807745821,0; heartbeat observed status: PENDING
    kvnemesis.go:165: error applying x.ReverseScan(ctx, tk(2472807650999776716), tk(15928832316702100637), 0) // kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=40ecf13e key=/Table/100/"08ce982a34f906c6" iso=Snapshot pri=0.02441066 epo=1 ts=1684310680.015681917,0 min=1684310679.797745821,0 seq=0} lock=true stat=STAGING rts=1684310680.015681917,0 wto=false gul=1684310679.807745821,0; heartbeat observed status: PENDING: kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=40ecf13e key=/Table/100/"08ce982a34f906c6" iso=Snapshot pri=0.02441066 epo=1 ts=1684310680.015681917,0 min=1684310679.797745821,0 seq=0} lock=true stat=STAGING rts=1684310680.015681917,0 wto=false gul=1684310679.807745821,0; heartbeat observed status: PENDING
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1782552098/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1782552098/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1782552098/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1782552098/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 2
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode2895614279
--- FAIL: TestKVNemesisMultiNode (28.67s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ 69875da00208dd792cb92aee54811e8a660773a9:

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode3901150746
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 3837436284393373776
    kvnemesis_test.go:124: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3813137540
    kvnemesis.go:165: error applying x.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
          txn.SetIsoLevel(isolation.Snapshot)
          {
            b := &kv.Batch{}
            txn.Run(ctx, b) // <nil>
          }
          txn.Scan(ctx, tk(1698754368228448779), tk(12972774191588256712), 0) // kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=32117260 key=/Table/100/"f50e741697ca5f12" iso=Snapshot pri=0.01451727 epo=1 ts=1684310860.032580778,0 min=1684310859.418820247,0 seq=0} lock=true stat=STAGING rts=1684310860.032580778,0 wto=false gul=1684310859.428820247,0; heartbeat observed status: PENDING
          txn.Scan(ctx, tk(8721585885451996246), tk(11377696540311594206), 0) // omitted
          b := &kv.Batch{}
          b.Del(tk(17658178829351018258) /* @s545 */) // TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "unnamed" meta={id=32117260 key=/Table/100/"f50e741697ca5f12" iso=Snapshot pri=0.01451727 epo=0 ts=1684310859.578805914,1 min=1684310859.418820247,0 seq=3} lock=true stat=STAGING rts=1684310859.418820247,0 wto=false gul=1684310859.428820247,0 int=1 ifw=1
          b.Del(tk(9373485187807277138) /* @s546 */) // TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "unnamed" meta={id=32117260 key=/Table/100/"f50e741697ca5f12" iso=Snapshot pri=0.01451727 epo=0 ts=1684310859.578805914,1 min=1684310859.418820247,0 seq=3} lock=true stat=STAGING rts=1684310859.418820247,0 wto=false gul=1684310859.428820247,0 int=1 ifw=1
          txn.CommitInBatch(ctx, b) // TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "unnamed" meta={id=32117260 key=/Table/100/"f50e741697ca5f12" iso=Snapshot pri=0.01451727 epo=0 ts=1684310859.578805914,1 min=1684310859.418820247,0 seq=3} lock=true stat=STAGING rts=1684310859.418820247,0 wto=false gul=1684310859.428820247,0 int=1 ifw=1
          return nil
        }) // kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=32117260 key=/Table/100/"f50e741697ca5f12" iso=Snapshot pri=0.01451727 epo=1 ts=1684310860.032580778,0 min=1684310859.418820247,0 seq=0} lock=true stat=STAGING rts=1684310860.032580778,0 wto=false gul=1684310859.428820247,0; heartbeat observed status: PENDING: kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=32117260 key=/Table/100/"f50e741697ca5f12" iso=Snapshot pri=0.01451727 epo=1 ts=1684310860.032580778,0 min=1684310859.418820247,0 seq=0} lock=true stat=STAGING rts=1684310860.032580778,0 wto=false gul=1684310859.428820247,0; heartbeat observed status: PENDING
    kvnemesis.go:165: error applying x.Scan(ctx, tk(1698754368228448779), tk(12972774191588256712), 0) // kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=32117260 key=/Table/100/"f50e741697ca5f12" iso=Snapshot pri=0.01451727 epo=1 ts=1684310860.032580778,0 min=1684310859.418820247,0 seq=0} lock=true stat=STAGING rts=1684310860.032580778,0 wto=false gul=1684310859.428820247,0; heartbeat observed status: PENDING: kv/kvclient/kvcoord/txn_coord_sender.go:754: unexpected txn state: "unnamed" meta={id=32117260 key=/Table/100/"f50e741697ca5f12" iso=Snapshot pri=0.01451727 epo=1 ts=1684310860.032580778,0 min=1684310859.418820247,0 seq=0} lock=true stat=STAGING rts=1684310860.032580778,0 wto=false gul=1684310859.428820247,0; heartbeat observed status: PENDING
    kvnemesis.go:185: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3813137540/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3813137540/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3813137540/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3813137540/kvs-scan.txt
    kvnemesis_test.go:207: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
        	Error:      	Should be zero, but was 2
        	Test:       	TestKVNemesisMultiNode
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode3901150746
--- FAIL: TestKVNemesisMultiNode (104.84s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@aliher1911
Copy link
Contributor

aliher1911 commented May 22, 2023

My current hypothesis is that the discrepancy comes from the fact that lease extensions doesn't obtain latches and could could change lease and mvcc stats in parallel to other requests. The only requests that touches lease are likely range split and range merge which are done as a commit trigger when executing EndTxnRequest. There was a fix for merge op already improving that and I think current failure is happening during splits.

Current repro of the failure shows discrepancy of 6 bytes on SystemBytes stat. This always happen when there's a lease extension happening in parallel to split. This change itself is mostly harmless as size of lease value shouldn't change, but it changes if we have a local component to the timestamp. Leases that have local component on commit and expiration timestamps are 6 bytes larger than without and that matches the size difference. Moreover failure always happens when lease extension happening close to the split changes from using local to local == 0. As an experiment, resetting local component to 0 makes failure go away. We can't use it as a fix as it is not correct and could cause problems.

Current understanding of the cause:

// Scan to compute the stats for the first side.
var absPostSplitFirst enginepb.MVCCStats
var err error
if h.in.ScanRightFirst {
absPostSplitFirst, err = input.AbsPostSplitRightFn()
h.absPostSplitRight = &absPostSplitFirst
} else {
absPostSplitFirst, err = input.AbsPostSplitLeftFn()
h.absPostSplitLeft = &absPostSplitFirst
}
if err != nil {
return splitStatsHelper{}, err
}
if h.in.AbsPreSplitBothEstimated.ContainsEstimates == 0 &&
h.in.DeltaBatchEstimated.ContainsEstimates == 0 {
// We have CombinedErrorDelta zero, so use arithmetic to compute the
// stats for the second side.
ms := h.in.AbsPreSplitBothEstimated
ms.Subtract(absPostSplitFirst)
ms.Add(h.in.DeltaBatchEstimated)
ms.Add(h.in.DeltaRangeKey)
if h.in.ScanRightFirst {
h.absPostSplitLeft = &ms
} else {
h.absPostSplitRight = &ms
}
return h, nil
}

h.in.AbsPreSplitBothEstimated contains stats from replica obtained by:
// GetMVCCStats returns a copy of the MVCC stats object for this range.
// This accessor is thread-safe, but provides no guarantees about its
// synchronization with any concurrent writes.
func (r *Replica) GetMVCCStats() enginepb.MVCCStats {
r.mu.RLock()
defer r.mu.RUnlock()
return *r.mu.state.Stats
}

which is an in-memory most recent state of stats.
We then compute(scan) LHS stats of snapshot and we end up with full stats that are inconsistent with LHS stats. We then compute RHS using full and LHS which refer different points in time that can have lease extension inbetween. That would produce skewed RHS.
We then use absPostSplitLeft in
func (h splitStatsHelper) DeltaPostSplitLeft() enginepb.MVCCStats {
// NB: if we ever wanted to also write to the left hand side after init'ing
// the helper, we can make that work, too.
// NB: note how none of this depends on mutations to absPostSplitRight.
ms := *h.absPostSplitLeft
ms.Subtract(h.in.AbsPreSplitBothEstimated)
return ms
}

to produce final stats delta that we return as a result of execution. This would contain delta between split time snapshot and newer version of stats from replica.

aliher1911 added a commit to aliher1911/cockroach that referenced this issue May 22, 2023
Previouly split trigger was using in-memory copy of mvccstats from replica.
This value could diverge from store snapshot used by request in special cases
of lease extensions that don't hold latches.
This was causing stats for the replica to diverge.
This PR changes base stats to be read from request reader which provides
consistent stats value with content of splitted ranges.

Touches: cockroachdb#101721

Release note: None
@aliher1911
Copy link
Contributor

@nvanbenschoten can you look at #101721 (comment) if it looks familiar in any way? This issue has bunch of different failures. Top level ones come are for mvcc stats, but transaction commit might be connected with changes you merged recently.

aliher1911 added a commit to aliher1911/cockroach that referenced this issue May 22, 2023
Previouly split trigger was using in-memory copy of mvccstats from replica.
This value could diverge from store snapshot used by request in special cases
of lease extensions that don't hold latches.
This was causing stats for the replica to diverge.
This PR changes base stats to be read from request reader which provides
consistent stats value with content of splitted ranges.

Touches: cockroachdb#101721

Release note: None
aliher1911 added a commit to aliher1911/cockroach that referenced this issue May 23, 2023
Previouly split trigger was using in-memory copy of mvccstats from replica.
This value could diverge from store snapshot used by request in special cases
of lease extensions that don't hold latches.
This was causing stats for the replica to diverge.
This PR changes base stats to be read from request reader which provides
consistent stats value with content of splitted ranges.

Touches: cockroachdb#101721

Release note: None
craig bot pushed a commit that referenced this issue May 23, 2023
103719: kvserver/batcheval: fix mvcc stats computaion on split r=nvanbenschoten a=aliher1911

Previouly split trigger was using in-memory copy of mvccstats from replica. This value could diverge from store snapshot used by request in special cases of lease extensions that don't hold latches.
This was causing stats for the replica to diverge. This PR changes base stats to be read from request reader which provides consistent stats value with content of splitted ranges.

Touches: #101721
Fixes: #102410

Release note: None

Co-authored-by: Oleg Afanasyev <[email protected]>
@aliher1911
Copy link
Contributor

We havn't seen many failures after stats were fixed. I managed to repro transaction push timeout with 50k reruns while testing stats:

--- FAIL: TestKVNemesisMultiNode (13.16s)
    test_log_scope.go:161: test logs captured to: /tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode3315994629
    test_log_scope.go:79: use -show-logs to present logs inline
    kvnemesis_test.go:180: seed: 910877016883269571
    kvnemesis_test.go:124: kvnemesis logging to /tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1584584814
    kvnemesis.go:165: error applying x.Scan(ctx, tk(4919960733718109041), tk(8091360691588176074), 0) // failed to push meta={id=d01cdc27 key=/Table/100/"480aacd5586decfa" iso=Snapshot pri=0.06480878 epo=1 ts=1684954142.358901691,1 min=1684954142.358901691,0 seq=2} lock=true stat=PENDING rts=0,0 wto=false gul=0,0: {<nil> 0 {0xc003a2caf8} index:0 1684954142.383387488,0}
    kvnemesis.go:185: failures(verbose): /tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1584584814/failures
        repro steps: /tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1584584814/repro.go
        rangefeed KVs: /tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1584584814/kvs-rangefeed.txt
        scan KVs: /tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1584584814/kvs-scan.txt
    kvnemesis_test.go:207:
                Error Trace:    github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:207
                                                        github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:160
                Error:          Should be zero, but was 1
                Test:           TestKVNemesisMultiNode
                Messages:       kvnemesis detected failures
    panic.go:522: -- test log scope end --
FAIL
I230524 18:49:03.146045 1 (gostd) testmain.go:100  [-] 1  Test //pkg/kv/kvnemesis:kvnemesis_test exited with error code 1

@aliher1911
Copy link
Contributor

I'll put issue back into backlog.

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisMultiNode failed with artifacts on master @ e57e9742146b7fac7f7530fa6224e11121ada92f:

Fatal error:

panic: use of Span after Finish. Span: /cockroach.roachpb.Internal/Batch. Finish previously called at: <stack not captured. Set debugUseAfterFinish> [recovered]
	panic: use of Span after Finish. Span: /cockroach.roachpb.Internal/Batch. Finish previously called at: <stack not captured. Set debugUseAfterFinish>

Stack:

goroutine 7767151 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0x1f8006a?, {0x69690a8, 0xc012f9d3b0})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:230 +0x6a
panic({0x4b26d80, 0xc0087e0730})
	GOROOT/src/runtime/panic.go:884 +0x212
github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).detectUseAfterFinish(0x69690a8?)
	github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:182 +0x12e
github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).Tracer(...)
	github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:225
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaDecoder).createTracingSpans(0xc0066c7658, {0x69690a8, 0xc0029e5980})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_decoder.go:154 +0x4f0
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaDecoder).DecodeAndBind(0xc0066c7200?, {0x69690a8, 0xc0029e5980}, {0xc0153d2750?, 0x2?, 0x10?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_decoder.go:63 +0x66
github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).Decode(...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:142
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked(_, {_, _}, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:878 +0x73a
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady(_, {_, _}, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:728 +0x1b0
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc006662000, 0x2?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:689 +0x14b
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftSchedulerShard).worker(0xc00de44540, {0x69690a8, 0xc012f9d3b0}, {0x694eee0, 0xc006662000}, 0xc006581500)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:416 +0x274
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start.func2({0x69690a8?, 0xc012f9d3b0?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:321 +0x45
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x146
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x43b
Log preceding fatal error

=== RUN   TestKVNemesisMultiNode
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisMultiNode2457498475
    test_log_scope.go:81: use -show-logs to present logs inline
    kvnemesis_test.go:283: seed: 8481106198570085994
    test_server_shim.go:98: 
        Test server was configured to route SQL queries to a secondary tenant (virtual cluster).
        If you are only seeing a test failure when this message appears, there may be a problem
        specific to cluster virtualization or multi-tenancy.
        
        To investigate, consider using "COCKROACH_TEST_TENANT=true" to force-enable just
        the secondary tenant in all runs (or, alternatively, "false" to force-disable), or use
        "COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=false" to disable all random test variables altogether.
*
* WARNING: test tenant requested by configuration, but code organization prevents start!
* OSS binaries do not include enterprise features
*
*
* WARNING: test tenant requested by configuration, but code organization prevents start!
* OSS binaries do not include enterprise features
*
*
* WARNING: test tenant requested by configuration, but code organization prevents start!
* OSS binaries do not include enterprise features
*
*
* WARNING: test tenant requested by configuration, but code organization prevents start!
* OSS binaries do not include enterprise features
*
    kvnemesis_test.go:197: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis4034049211

Parameters: TAGS=bazel,gss,deadlock , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@erikgrinaker
Copy link
Contributor

The last span use after finish might be the same issue as #107242.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 31, 2023
…iority

DNM: needs more testing.

Fixes cockroachdb#105330.
Fixes cockroachdb#101721.

This commit updates the transaction push logic to stop pushes from completing
successfully when the pushee is STAGING and the pusher has equal priority. Prior
to this change, the pusher would win in these cases when using a PUSH_TIMESTAMP
if at least one of the two transactions involved used a weak isolation level.

This had two undesirable effects:
- if the pushee was still possibly committable and requiring recovery
  (`!(knownHigherTimestamp || knownHigherEpoch)` in the code) then the pusher
  would immediately begin parallel commit recovery, attempting to disrupt the
  commit and abort the pushee. This is undesirable because the pushee may still
  be in progress and in cases of equal priority, we'd like to wait for the
  parallel commit to complete before kicking off recovery, deferring recovery to
  only the cases where the pushee/committers's heartbeat has expired.
- if the pushee was known too be uncommittable (`knownHigherTimestamp || knownHigherEpoch`
  in the code), then txn recovery was not kicked off but the pusher still could
  not perform the PUSH_TIMESTAMP (see e40c1b4), so it would return a
  `TransactionPushError`. This confused logic in `handleTransactionPushError`,
  allowing the error to escape to the client.

This commit resolves both issues by considering the pushee's transaction status
in `txnwait.ShouldPushImmediately`.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 2, 2023
Fixes cockroachdb#105330.
Fixes cockroachdb#101721.

This commit updates the transaction push logic to stop pushes from completing
successfully when the pushee is STAGING and the pusher has equal priority. Prior
to this change, the pusher would win in these cases when using a PUSH_TIMESTAMP
if at least one of the two transactions involved used a weak isolation level.

This had two undesirable effects:
- if the pushee was still possibly committable and requiring recovery
  (`!(knownHigherTimestamp || knownHigherEpoch)` in the code) then the pusher
  would immediately begin parallel commit recovery, attempting to disrupt the
  commit and abort the pushee. This is undesirable because the pushee may still
  be in progress and in cases of equal priority, we'd like to wait for the
  parallel commit to complete before kicking off recovery, deferring recovery to
  only the cases where the pushee/committers's heartbeat has expired.
- if the pushee was known too be uncommittable (`knownHigherTimestamp || knownHigherEpoch`
  in the code), then txn recovery was not kicked off but the pusher still could
  not perform the PUSH_TIMESTAMP (see e40c1b4), so it would return a
  `TransactionPushError`. This confused logic in `handleTransactionPushError`,
  allowing the error to escape to the client.

This commit resolves both issues by considering the pushee's transaction status
in `txnwait.ShouldPushImmediately`.

Release note: None
craig bot pushed a commit that referenced this issue Aug 3, 2023
107882: kv: don't let pusher win when pushing STAGING txn with equal priority r=miraradeva a=nvanbenschoten

Fixes #105330.
Fixes #101721.

This commit updates the transaction push logic to stop pushes from completing successfully when the pushee is STAGING and the pusher has equal priority. Prior to this change, the pusher would win in these cases when using a PUSH_TIMESTAMP if at least one of the two transactions involved used a weak isolation level.

This had two undesirable effects:
- if the pushee was still possibly committable and requiring recovery (`!(knownHigherTimestamp || knownHigherEpoch)` in the code) then the pusher would immediately begin parallel commit recovery, attempting to disrupt the commit and abort the pushee. This is undesirable because the pushee may still be in progress and in cases of equal priority, we'd like to wait for the parallel commit to complete before kicking off recovery, deferring recovery to only the cases where the pushee/committers's heartbeat has expired.
- if the pushee was known to be uncommittable (`knownHigherTimestamp || knownHigherEpoch` in the code), then txn recovery was not kicked off but the pusher still could not perform the PUSH_TIMESTAMP (see e40c1b4), so it would return a `TransactionPushError`. This confused logic in `handleTransactionPushError`, allowing the error to escape to the client.

This commit resolves both issues by considering the pushee's transaction status in `txnwait.ShouldPushImmediately`.

Release note: None

107962: docs: do not generate docs for crdb_internal functions r=knz a=rafiss

Epic: None
Release note: None

108062: server: Add rangefeed metrics r=miretskiy a=miretskiy

Add rangefeed related metrics to keep track of the number of actively running rangefeeds on the server.

Epic: CRDB-26372
Release note: None

108071: stmtdiagnostics: use background context when building the bundle r=yuzefovich a=yuzefovich

When the context is canceled, we still want to build the bundle as best as possible. Over in 532274b we introduced the usage of the background context in order to insert the bundle into the system tables, but we still built the bundle with the canceled context. This commit fixes that oversight - in particular, we should now get `env.sql` correctly.

Informs: https://github.com/cockroachlabs/support/issues/2494.
Epic: None

Release note: None

108085: builtins: improve docs and refactor code for unordered_unique_id r=rafiss,rharding6373 a=andyyang890

**builtins: improve documentation for unordered_unique_rowid**

This patch clarifies that `unordered_unique_rowid` generates unique IDs
that are statistically likely to be unordered because it bit-reverses
the insert timestamp for use as the ID's timestamp bit segment.

Release note: None

----

**builtinconstants: define constants for unique int bit segments**

This patch defines constants for the sizes and bitmasks for each
bit segment in a unique int generated for row IDs.

Release note: None

----

**builtins: refactor bit shifting logic in mapToUnorderedUniqueInt**

This patch refactors the bit shifting logic in `mapToUnorderedUniqueInt`
to use constants instead of magic numbers.

Release note: None

----

Epic: None


Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
@craig craig bot closed this as completed in 90516db Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants