kv/kvserver/allocator/allocatorimpl: TestAllocatorFullDisks failed #100033

Closed
cockroach-teamcity opened this issue Mar 30, 2023 · 4 comments · Fixed by #100589
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). GA-blocker O-robot Originated from a bot. T-kv KV Team

@cockroach-teamcity
Member

cockroach-teamcity commented Mar 30, 2023

kv/kvserver/allocator/allocatorimpl.TestAllocatorFullDisks failed with artifacts on release-23.1 @ aec78f33d45a8376a0ecec885688bae60dbfb85c:

=== RUN   TestAllocatorFullDisks
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5445000f6d8e70a16462d5297e3e2235/logTestAllocatorFullDisks162706032
    test_log_scope.go:79: use -show-logs to present logs inline
    allocator_test.go:8511: testStore 11 ran out of space during generation 60 (rangesAdded=1199/1200): disk (capacity=1.0 GiB, available=-16 MiB, used=1.0 GiB, logicalBytes=1.0 GiB), ranges=65, leases=0, queries=0.00, writes=0.00, ioThreshold={{0 0 0 0}} bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
    allocator_test.go:8573: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5445000f6d8e70a16462d5297e3e2235/logTestAllocatorFullDisks162706032
--- FAIL: TestAllocatorFullDisks (2.00s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-26187

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Mar 30, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Mar 30, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 30, 2023
@kvoli
Collaborator

kvoli commented Apr 4, 2023

This test failed after 20 minutes of stressing on master. I'm going to look further into it now.

=== RUN   TestAllocatorFullDisks
    test_log_scope.go:161: test logs captured to: /home/kvoli/go/src/github.com/cockroachdb/testoutput/_tmp/5445000f6d8e70a16462d5297e3e2235/logTestAllocatorFullDisks342762789
    test_log_scope.go:79: use -show-logs to present logs inline
    allocator_test.go:8511: testStore 11 ran out of space during generation 58 (rangesAdded=1197/1200): disk (capacity=1.0 GiB, available=-16 MiB, used=1.0 GiB, logicalBytes=1.0 GiB), ranges=65, leases=0, queries=0.00, writes=0.00, ioThreshold={{0 0 0 0}} bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
    allocator_test.go:8573: -- test log scope end --
test logs left over in: /home/kvoli/go/src/github.com/cockroachdb/testoutput/_tmp/5445000f6d8e70a16462d5297e3e2235/logTestAllocatorFullDisks342762789
--- FAIL: TestAllocatorFullDisks (4.14s)
FAIL
I230403 21:36:38.434111 1 (gostd) testmain.go:240  [-] 1  Test //pkg/kv/kvserver/allocator/allocatorimpl:allocatorimpl_test exited with error code 1


ERROR: exit status 1

7835 runs completed, 1 failures, over 20m2s
context canceled

@kvoli
Collaborator

kvoli commented Apr 4, 2023

I think this is caused by incorrectly using the shed threshold instead of the rebalance threshold when calculating the total number of ranges for the test. The mistake was introduced in #97409:

rangesPerNode := int(math.Floor(capacity * do.ShedAndBlockAllThreshold / rangeSize))
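To illustrate why the choice of threshold matters, here is a minimal standalone sketch of the same calculation. It is not the test's actual code: the constant names, the 16 MiB range size (guessed from `available=-16 MiB` in the log), the store count of 20 (guessed from `rangesAdded=1199/1200`), and the threshold values 0.95 and 0.925 are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// All values below are illustrative assumptions, not taken from the test source.
const (
	capacityBytes      = 1 << 30  // 1 GiB, matching "capacity=1.0 GiB" in the log
	rangeSizeBytes     = 16 << 20 // assumed 16 MiB per range
	numStores          = 20       // assumed number of test stores
	shedThreshold      = 0.95     // assumed disk fullness at which replicas are shed
	rebalanceThreshold = 0.925    // assumed disk fullness limit for rebalancing towards a store
)

// rangesPerNode mirrors the shape of the calculation quoted above: how many
// whole ranges fit on one store before it reaches the given fullness threshold.
func rangesPerNode(capacity, rangeSize int64, threshold float64) int {
	return int(math.Floor(float64(capacity) * threshold / float64(rangeSize)))
}

func main() {
	cases := []struct {
		name      string
		threshold float64
	}{
		{"shed", shedThreshold},
		{"rebalance", rebalanceThreshold},
	}
	for _, c := range cases {
		n := rangesPerNode(capacityBytes, rangeSizeBytes, c.threshold)
		fmt.Printf("%-9s threshold=%.3f rangesPerNode=%d totalRanges=%d\n",
			c.name, c.threshold, n, n*numStores)
	}
}
```

Under these assumed values, the shed threshold gives 60 ranges per node (1200 in total) while the rebalance threshold gives 59 (1180 in total). Sizing the workload with the higher threshold leaves stores less headroom than the allocator is willing to rebalance into, which is consistent with a store occasionally running out of space.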

craig bot pushed a commit that referenced this issue Apr 4, 2023
100189:  kvcoord: Restart ranges on a dedicated goroutine. r=miretskiy a=miretskiy

Restart ranges on a dedicated goroutine (if needed).
Fix logic bug in stuck range handling.
Increase verbosity of logging to help debug mux rangefeed issues.

Informs #99560
Informs #99640
Informs #99214
Informs #98925
Informs #99092
Informs #99212
Informs #99910
Informs #99560

Release note: None

100525: rpc: Handle closed error r=erikgrinaker a=andrewbaptist

We close the listener before closing the connection. This can result in a spurious failure due to the Listener also closing our connection.

Epic: none
Fixes: #100391
Fixes: #77754
Informs: #80034

Release note: None

100528: sql: fix flaky TestSQLStatsCompactor r=j82w a=j82w

The test failure is showing more total wide scans
than expected. Change the compact stats job to run
once a year to avoid it running at the same time
as the test.

The interceptor is disabled right after delete
reducing the possibility of another operation
causing a conflict.

Epic: none
closes: #99653

Release note: none

100589: allocator: deflake full disk test r=andrewbaptist a=kvoli

In #97409 we introduced cluster settings to control the disk fullness threshold for rebalancing towards a store and shedding replicas off of the store. `TestAllocatorFullDisks` assumes that the total number of range bytes is less than or equal to the rebalance threshold of the nodes; however, the test was updated to use the shed threshold instead. This caused the test to flake occasionally, as there were more total range bytes than expected.

This patch changes the ranges per node calculation to use the rebalance threshold again, instead of the shed threshold.

```
dev test pkg/kv/kvserver/allocator/allocatorimpl -f TestAllocatorFullDisks -v --stress
...
15714 runs so far, 0 failures, over 39m45s
```

Fixes: #100033

Release note: None

100610: roachtest: set config.Quiet to true r=herkolategan a=srosenberg

After the refactoring in [1], the default of config.Quiet was set to false, since the roachprod CLI option is intended to set it to true. This had an unwanted side effect: roachtests ran with the new default. Consequently, test_runner's log ended up with a bunch of (terminal) escape codes from the (status) spinner.

This change ensures roachtest explicitly sets config.Quiet to true.

[1] #99133

Epic: none

Release note: None

Co-authored-by: Yevgeniy Miretskiy
Co-authored-by: Andrew Baptist
Co-authored-by: j82w
Co-authored-by: Austen McClernon
Co-authored-by: Stan Rosenberg
@craig craig bot closed this as completed in 3b94693 Apr 4, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 4, 2023
In #97409 we introduced cluster settings to control the disk fullness
threshold for rebalancing towards a store and shedding replicas off of
the store. `TestAllocatorFullDisks` assumes that the total number of
range bytes is less than or equal to the rebalance threshold of the
nodes; however, the test was updated to use the shed threshold instead.
This caused the test to flake occasionally, as there were more total
range bytes than expected.

This patch changes the ranges per node calculation to use the rebalance
threshold again, instead of the shed threshold.

Fixes: #100033

Release note: None
@kvoli
Collaborator

kvoli commented Apr 4, 2023

Will be closed on #100646

@kvoli kvoli reopened this Apr 4, 2023
@kvoli
Collaborator

kvoli commented Apr 5, 2023

Closed on #100646

@kvoli kvoli closed this as completed Apr 5, 2023