kv/kvserver: TestStoreRangeSplitAndMergeWithGlobalReads failed #119230

Closed
cockroach-teamcity opened this issue Feb 15, 2024 · 14 comments · Fixed by #122710
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). no-test-failure-activity O-robot Originated from a bot. P-3 Issues/test failures with no fix SLA T-kv KV Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Feb 15, 2024

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed with artifacts on master @ cc6ca026319024800395293b0fb18f05dd8eb50e:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/3c632028e5b231a582105881dfd5d4b5/logTestStoreRangeSplitAndMergeWithGlobalReads54447473
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (53.05s)
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-36100

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels Feb 15, 2024
@cockroach-teamcity cockroach-teamcity added this to the 24.1 milestone Feb 15, 2024
@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed on master @ 617bf347978dcc0d711399b1a76402d7f88de958:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads331702955
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads331702955
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (52.99s)

Parameters:

  • attempt=1
  • run=9
  • shard=11
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed on master @ 0b7ae19e2b94b851ed8812914f57032aab699811:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads2767376744
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads2767376744
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (53.01s)

Parameters:

  • attempt=1
  • run=12
  • shard=11
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@pav-kv
Collaborator

pav-kv commented Feb 19, 2024

Reproduces eagerly on master @ b8cca1b:

dev test --stress --filter=TestStoreRangeSplitAndMergeWithGlobalReads pkg/kv/kvserver

Trying to bisect.

@pav-kv
Collaborator

pav-kv commented Feb 19, 2024

This flake is not recent, e.g. the same failure occurs on 2c3b07e from Jan 9.

@pav-kv pav-kv added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv Anything in KV that doesn't belong in a more specific category. labels Feb 19, 2024
@pav-kv
Collaborator

pav-kv commented Feb 19, 2024

It looks like we're waiting for a metric counter to be equal to 1, but it jumps to 2, so the condition can never become true and we wait until SucceedsSoon times out after 45s. Not sure if the extra increment is legit (and we should just make the condition laxer) or a bug. Maybe we should wait on a more robust condition.

// The commit wait count is 1 due to the split above since global reads are
// set for the default config.
var splitCount = int64(1)
testutils.SucceedsSoon(t, func() error {
	if splitCount != store.Metrics().CommitWaitsBeforeCommitTrigger.Count() {
		return errors.Errorf("commit wait count is %d", store.Metrics().CommitWaitsBeforeCommitTrigger.Count())
	}
	if splitCount != atomic.LoadInt64(&splits) {
		return errors.Errorf("num splits is %d", atomic.LoadInt64(&splits))
	}
	return nil
})

@andrewbaptist @nvanbenschoten can you take a look since you last modified this test?

@pav-kv pav-kv added the A-testing Testing tools and infrastructure label Feb 19, 2024
@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed on master @ a36097be277adef635f55d317579ca79b450bfef:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads3287400935
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads3287400935
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (53.01s)

Parameters:

  • attempt=1
  • run=23
  • shard=11
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed with artifacts on master @ a78e1972a82f5b1bbb50d715aff46f7b668036fe:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/3c632028e5b231a582105881dfd5d4b5/logTestStoreRangeSplitAndMergeWithGlobalReads549076652
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/3c632028e5b231a582105881dfd5d4b5/logTestStoreRangeSplitAndMergeWithGlobalReads549076652
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (53.06s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@andrewbaptist
Collaborator

@nvanbenschoten I can take a look at this. It definitely appears to be a test-only change, and I didn't fully understand what was going on with this metric. The test is uglier with this change: 124aaa3#diff-fb879e37911d655620c725817f9e37ffe56b226570268a3d31fabffd568a6c12L3678 since it now waits on a testutils.SucceedsSoon. A better structure would set GlobalReads on only the one range under test.

@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed on master @ e50b0ec4d3a53f81e26f3776ae3f3be55d435a9a:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads4231620982
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads4231620982
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (53.00s)

Parameters:

  • attempt=1
  • run=15
  • shard=11
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed with artifacts on master @ 31acb7a07a4e6e1e96ceb8533cfa042ea80514a8:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/93fbc720f61b7e85f714b9750f754229/logTestStoreRangeSplitAndMergeWithGlobalReads180609409
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/93fbc720f61b7e85f714b9750f754229/logTestStoreRangeSplitAndMergeWithGlobalReads180609409
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (53.01s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed with artifacts on master @ ee3168ac3e0286a63dd49ab8b9f14b036ad23bde:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/93fbc720f61b7e85f714b9750f754229/logTestStoreRangeSplitAndMergeWithGlobalReads1817694607
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/93fbc720f61b7e85f714b9750f754229/logTestStoreRangeSplitAndMergeWithGlobalReads1817694607
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (53.37s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed on master @ 04f0416d526a43741d22fd03966758dcccdeb79f:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads3546424585
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads3546424585
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (52.99s)

Parameters:

  • attempt=1
  • run=19
  • shard=12
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

kv/kvserver.TestStoreRangeSplitAndMergeWithGlobalReads failed on master @ 6d65201b9b603e0b3fcf1d509ec23edfdd68de45:

=== RUN   TestStoreRangeSplitAndMergeWithGlobalReads
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads3385134434
    test_log_scope.go:81: use -show-logs to present logs inline
    client_split_test.go:3691: condition failed to evaluate within 45s: from client_split_test.go:3693: commit wait count is 2
    panic.go:523: -- test log scope end --
test logs left over in: outputs.zip/logTestStoreRangeSplitAndMergeWithGlobalReads3385134434
--- FAIL: TestStoreRangeSplitAndMergeWithGlobalReads (52.98s)

Parameters:

  • attempt=1
  • run=24
  • shard=12
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

rickystewart added a commit to rickystewart/cockroach that referenced this issue Feb 23, 2024
craig bot pushed a commit that referenced this issue Feb 23, 2024
119599: kvserver: skip `TestStoreRangeSplitAndMergeWithGlobalReads` r=rail a=rickystewart

Flaky. See #119230

Epic: none
Release note: None

Co-authored-by: Ricky Stewart <[email protected]>
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Feb 26, 2024
Previously the test validated the behavior with global reads. However,
since the test was not in the ccl package, it required complicated
mangling of the span configs. Now it simply uses the standard SQL
commands to create the table.

Epic: none
Fixes: cockroachdb#119230

Release note: None
@exalate-issue-sync exalate-issue-sync bot added the P-3 Issues/test failures with no fix SLA label Mar 6, 2024
@shralex shralex removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Mar 6, 2024

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 19, 2024
The original attempt to fix this test introduced timing issues that
made the test flaky. This commit reverts all the changes to
TestStoreRangeSplitAndMergeWithGlobalReads.

Epic: none
Fixes: cockroachdb#119230

Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 22, 2024
Directly set the span config for the range under test rather than
setting the ZoneConfig and waiting for it to propagate. In addition to
simplifying the test, this also makes it run faster.

Fixes: cockroachdb#119230

Epic: none

Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 22, 2024
The original attempt to fix this test introduced timing issues that
made the test flaky. This commit reverts all the changes to
TestStoreRangeSplitAndMergeWithGlobalReads.

Epic: none
Fixes: cockroachdb#119230

Release note: None
craig bot pushed a commit that referenced this issue Apr 22, 2024
122710: kvserver: fix TestStoreRangeSplitAndMergeWithGlobalReads r=arulajmani a=andrewbaptist

Directly set the span config for the range under test rather than setting the ZoneConfig and waiting for it to propagate. In addition to simplifying the test, this also makes it run faster.

Fixes: #119230

Epic: none

Release note: None

122720: sql/delegate: don't include external connections in SHOW SYSTEM GRANTS r=rafiss a=rafiss

Epic: None
Release note (bug fix): Privileges granted for external connections were incorrectly showing up in SHOW SYSTEM GRANTS, but were not useful since there is no associated object name. Now they do not appear there. Instead, the SHOW GRANTS ON EXTERNAL CONNECTION syntax should be used.

Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
@craig craig bot closed this as completed in 47ca87d Apr 22, 2024
@github-project-automation github-project-automation bot moved this to Closed in KV Aug 28, 2024
Status: Closed
5 participants