roachtest: disk-stalled/wal-failover/among-stores failed #129922
Comments
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 4142920c2d5c50c0520c124764aeeda94ba043ae:
Parameters:
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ fa9c0528fc0d06be1b4cfc534ec0501448111fbe:
Parameters:
The second failure (#129922 (comment)) is a test flake caused by injecting too long a stall. The test attempts to inject a 30s stall; a 60s stall would result in a fatal error in the node, since COCKROACH_LOG_MAX_SYNC_DURATION is set to 60s. However, the test injected a longer stall, from 11:17:45 to 11:19:02 (77s), and n1 died due to this stall.
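The arithmetic behind this diagnosis can be sketched in Go. This is a minimal illustration, not code from the test: `stallDuration`, `exceedsMaxSync`, `intendedStall`, and `maxSyncDuration` are hypothetical names, the timestamps are the ones quoted above, and treating a stall equal to the limit as fatal is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Values quoted in the comment above; the constant names are hypothetical.
const (
	intendedStall   = 30 * time.Second // stall the test means to inject
	maxSyncDuration = 60 * time.Second // COCKROACH_LOG_MAX_SYNC_DURATION
)

// stallDuration returns the time elapsed between two wall-clock timestamps
// in "15:04:05" form. Both are assumed to fall on the same day.
func stallDuration(start, end string) (time.Duration, error) {
	const layout = "15:04:05"
	s, err := time.Parse(layout, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(layout, end)
	if err != nil {
		return 0, err
	}
	return e.Sub(s), nil
}

// exceedsMaxSync reports whether a stall of length d reaches the limit at
// which the node would hit a fatal error (assumed to be inclusive).
func exceedsMaxSync(d, limit time.Duration) bool {
	return d >= limit
}

func main() {
	d, err := stallDuration("11:17:45", "11:19:02")
	if err != nil {
		panic(err)
	}
	fmt.Println("observed stall:", d)
	fmt.Println("intended stall fatal:", exceedsMaxSync(intendedStall, maxSyncDuration))
	fmt.Println("observed stall fatal:", exceedsMaxSync(d, maxSyncDuration))
}
```

The intended 30s stall leaves a 30s margin below the limit, while the observed stall overshoots it by 17s.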
In the first failure, n1 loses leases, has no disk reads, and has slot exhaustion. This is similar to the failure in #124399 (comment). One thing to note is that multiple stalls have a p100 of 10+s, but the failure happens during a stall where lower percentiles are also slow. That suggests that our disk read bytes (which are always 0) are not telling the whole story of what gets stuck: if nothing were getting stuck, even the p100 would consistently stay low.
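The distinction between a stall where only the p100 is high and one where lower percentiles are slow too can be sketched as follows. This is an illustration under assumptions: the sample values are made up (they are not taken from the test artifacts), and `percentile`, `broadlySlow`, `tailOnly`, and `allSlow` are hypothetical names using a nearest-rank percentile.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Made-up latency samples for two stall windows (illustrative only).
var (
	// Only the slowest operation is affected: high p100, low p50.
	tailOnly = []time.Duration{
		1 * time.Millisecond, 2 * time.Millisecond, 3 * time.Millisecond,
		4 * time.Millisecond, 12 * time.Second,
	}
	// Most operations are affected: high p100 and high p50.
	allSlow = []time.Duration{
		2 * time.Second, 3 * time.Second, 4 * time.Second,
		5 * time.Second, 12 * time.Second,
	}
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
// It returns 0 for an empty input and does not modify samples.
func percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// broadlySlow reports whether the median, not just the maximum, is above
// the threshold, i.e. whether the slowness reaches the lower percentiles.
func broadlySlow(samples []time.Duration, threshold time.Duration) bool {
	return percentile(samples, 50) > threshold
}

func main() {
	for name, s := range map[string][]time.Duration{"tailOnly": tailOnly, "allSlow": allSlow} {
		fmt.Println(name, "p50:", percentile(s, 50), "p100:", percentile(s, 100),
			"broadly slow:", broadlySlow(s, time.Second))
	}
}
```

Both windows share the same p100, so the maximum alone cannot tell them apart; the median separates them.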
Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 833dadd212fa4b12b1442ae8e00e85ee80a8cdce:
Parameters:
Same failure on other branches
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 472ea07a5232c98536293d13bb46cca59f9f2cd0:
Parameters:
Same failure on other branches
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 472ea07a5232c98536293d13bb46cca59f9f2cd0:
Parameters:
Same failure on other branches
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 472ea07a5232c98536293d13bb46cca59f9f2cd0:
Parameters:
Same failure on other branches
Update the default logging configuration used for roachprod clusters to disable auditable logs on logs going to file sinks. Some roachtests use the buffered:true configuration to withstand disk stall events. This setting is incompatible with auditable logs on file sinks and recently introduced validation (cockroachdb#132742) prohibits the settings from being used together. Release note: none Informs cockroachdb#129922. Informs cockroachdb#132988. Epic: none
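The incompatibility described in this commit message can be pictured with a small Go sketch. It is not CockroachDB's validation code: `fileSinkConfig`, its fields, `validate`, and `errIncompatible` are hypothetical names chosen only to show the rule that a file sink cannot be both auditable and buffered.

```go
package main

import (
	"errors"
	"fmt"
)

// fileSinkConfig is a hypothetical, simplified stand-in for the settings
// of a single file sink. It is not the real logging configuration type.
type fileSinkConfig struct {
	Auditable bool // auditable logs enabled on this sink
	Buffered  bool // buffering enabled, used to withstand disk stalls
}

// errIncompatible is returned when both settings are enabled together.
var errIncompatible = errors.New("file sink: auditable and buffered cannot both be enabled")

// validate rejects the combination that the commit message describes as
// prohibited and accepts every other combination.
func (c fileSinkConfig) validate() error {
	if c.Auditable && c.Buffered {
		return errIncompatible
	}
	return nil
}

func main() {
	// Buffering requested while auditable is still on: rejected.
	before := fileSinkConfig{Auditable: true, Buffered: true}
	// Auditable disabled by default, as the change does: accepted.
	after := fileSinkConfig{Auditable: false, Buffered: true}
	fmt.Println("before:", before.validate())
	fmt.Println("after:", after.validate())
}
```

Under this reading, disabling auditable logs in the default configuration is what lets tests that turn on buffering pass validation.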
132916: kvserver: clear rac2 token metrics prior to integration testing r=sumeerbhola a=kvoli
`TestFlowControl.*V2` tests assert on exact counters. This can be problematic if benign deltas occur while setting up the test, such as a send queue forming when adding a new learner, but being quickly resolved. Clear the token metrics prior to commencing these tests, in order to prevent flakes that result from such deltas in setup.
Fixes: #132642
Release note: None
133089: roachprod: update default CockroachDB logging configuration r=dhartunian a=jbowens
Update the default logging configuration used for roachprod clusters to disable auditable logs on logs going to file sinks. Some roachtests use the buffered:true configuration to withstand disk stall events. This setting is incompatible with auditable logs on file sinks and recently introduced validation (#132742) prohibits the settings from being used together.
Release note: none
Informs #129922.
Informs #132988.
Epic: none
Co-authored-by: Austen McClernon
Co-authored-by: Jackson Owens
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 1e5b3c212b45419c960038718c48a5dd75a111a0:
Parameters:
Same failure on other branches
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 787f2e3fe5f73b33fcd65485908cbb71e0991222:
Parameters:
Same failure on other branches
In roachprod clusters, default to using buffering in file sinks. This is required by a subsequent change that will default to using WAL failover in roachprod clusters. Informs cockroachdb#133248 Informs cockroachdb#129922 Epic: CRDB-37534 Release note: none
133130: sqlccl: deflake TestExplainGist when run with concurrent ALTER PK r=rafiss,michae2 a=spilchen
TestExplainGist occasionally fails when a query using a secondary index tries to fetch a column not included in that index (see issue #130282). This change doesn’t address the root cause, but instead ignores the error when it occurs. I've also created a more reliable reproducer in the TestDMLInjectionTest, which we can use to validate the eventual fix (#133129).
Epic: none
Closes #130282
Release note: none
133256: roachprod: default to buffering file sinks in roachprod r=jbowens a=jbowens
In roachprod clusters, default to using buffering in file sinks. This is required by a subsequent change that will default to using WAL failover in roachprod clusters.
Informs #133248
Informs #129922
Epic: CRDB-37534
Release note: none
Co-authored-by: Matt Spilchen
Co-authored-by: Jackson Owens
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 5a7850a72f941992b1bb4b23a73b5fa5e9f15a68:
Parameters:
Same failure on other branches
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ 9354770c7c6eb5a89437068d8c6a4accf8031b67:
Parameters:
Same failure on other branches
roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on master @ dafb6dd507b38fb3d6eb8b7e2493c7b8abed34d2:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=16
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_runtimeAssertionsBuild=false
ROACHTEST_ssd=2
Help
See: roachtest README
See: How To Investigate (internal)
See: Grafana
This test on roachdash | Improve this report!
Jira issue: CRDB-41774