roachtest: tpccbench/nodes=9/cpu=4/multi-region failed #89100
Comments
Same as #86987. |
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on release-22.2 @ 600df9f5c387a07fd9b4ba5e54f8b8240645176f:
Parameters: |
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on release-22.2 @ 36e4cb1677e055e67fa3cfdce81dea18cf10df33:
Parameters: |
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on release-22.2 @ fe99dbf5702d1dedeb97d0bac7cb646dc2ec379c:
Parameters: |
Disk stall detected on n6:
This error is preceded by likely related timed-out heartbeat warnings:
This seems like a test flake (hardware problem?). I'll let @cockroachdb/storage take a quick look to confirm if that makes sense. |
@renatolabs yes, that just looks like the hardware / infrastructure got overwhelmed and we were waiting 20s for a write. Likely nothing to do here as this usually happens due to things outside our control (AWS/GCP issues, noisy neighbour, something like that), and it seems like a one-off case too. We could consider bumping the disk stall threshold back up (it used to be 60s), but maybe all it'd have done here is delayed the same outcome by roughly a minute. |
This was fairly heavily litigated over in #81075. I don't think it's worth going back to 60s, unless this is becoming very noisy / toilsome. |
Ah yeah. Didn't mean to suggest it should go all the way up to 60s, there could be a sweet spot in between - but since this isn't a noisy failure at all I think sticking with the status quo makes sense. |
Looked in the Pebble logs for n6, and I see a bunch of stalled writes, to separate files. Agree with Bilal's assessment - infra flake. If we keep seeing these due to the 20s thresholds, let's reconsider. |
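To make the threshold trade-off discussed above concrete, here is a minimal sketch of how a 20s versus 60s stall threshold changes the outcome. It is only an illustration, not Pebble's or CockroachDB's actual detection code: the function name `stall_detected_at` is hypothetical, and the 25s write duration is an invented example value. Only the 20s and 60s thresholds come from this thread.

```python
from typing import Optional


def stall_detected_at(write_duration_s: Optional[float],
                      threshold_s: float) -> Optional[float]:
    """Return how many seconds after a write starts a stall would be flagged.

    write_duration_s is None for a write that never completes.
    Returns None when the write finishes within the threshold.
    (Hypothetical helper for illustration only.)
    """
    if write_duration_s is None or write_duration_s > threshold_s:
        # The detector fires as soon as the threshold elapses.
        return threshold_s
    return None


# A slow write that eventually finishes after 25s (made-up value)
# trips the 20s threshold but not a 60s one.
print(stall_detected_at(25.0, 20.0))  # 20.0
print(stall_detected_at(25.0, 60.0))  # None

# A write that never completes trips both thresholds; the larger
# threshold only delays detection.
print(stall_detected_at(None, 60.0) - stall_detected_at(None, 20.0))  # 40.0
```

Under these assumptions, a higher threshold only helps when the slow write would have completed somewhere between the two values; for a write that is truly stuck, it just postpones the same failure.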
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on release-22.2 @ c69571de022f48d95d76fb39e378cc9ab9a30afe:
Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-20118
Epic CRDB-20293