roachtest: c2c/tpcc/warehouses=1000/duration=60/cutover=30 failed #106022
roachtest.c2c/tpcc/warehouses=1000/duration=60/cutover=30 failed with artifacts on master @ 428dc9da6a320de218460de6c6c8807caa4ded98:
Parameters:
roachtest.c2c/tpcc/warehouses=1000/duration=60/cutover=30 failed with artifacts on master @ 34699bb9c1557fce449e08a68cd259efec94926f:
Parameters:
This test has been failing every night due to dead node detection. Node 1 is consistently panicking because it has detected tracing span use after finish within the replica application decoder in kvserver. One next step could be to rerun this test with the relevant env variable set. See the trace found in node 1's logs:
This occurred during the initial scan of the test.
cc @cockroachdb/replication
Adding @cockroachdb/replication because I'm wondering if something in your neck of the woods changed in the past week that could be related to this panic. See my read of the logs here.
Might be fallout from the recent reproposal refactor, cc @tbg.
This might also be related to the trace logging that DR added in #102793, which appears immediately before the panic in the logs. More likely, though, the caller finishes the span and we keep using it during application. I'm not familiar with the details here. (See lines 1316 to 1321 at 66c9f93.)
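For anyone unfamiliar with this class of panic, here is a minimal, hypothetical Go sketch of the "use after finish" pattern being described: the proposer finishes a tracing span while the application path still holds a reference and records into it. The `span` type and function names below are illustrative only and are not the actual pkg/util/tracing or kvserver code.

```go
package main

import "fmt"

// span is a stand-in for a tracing span that detects use after Finish.
// It is NOT the real CockroachDB tracing.Span; it only illustrates the
// "crash on use after finish" assertion mentioned in the panic.
type span struct {
	op       string
	finished bool
	events   []string
}

// Record adds an event; it panics if the span was already finished,
// mirroring the assertion that took down node 1.
func (s *span) Record(ev string) {
	if s.finished {
		panic(fmt.Sprintf("use of span %q after Finish", s.op))
	}
	s.events = append(s.events, ev)
}

// Finish marks the span as done; any later Record is a bug.
func (s *span) Finish() { s.finished = true }

// applyCommand models the replica application decoder recording into a
// span it was handed by the proposer.
func applyCommand(sp *span) {
	sp.Record("applying raft command")
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("detected:", r)
		}
	}()

	sp := &span{op: "raft application"}

	// The proposer gives up on (or reproposes) the command and finishes
	// its span...
	sp.Finish()

	// ...but the application path still holds the span and records into
	// it, which is the suspected failure mode in this issue.
	applyCommand(sp)
}
```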
roachtest.c2c/tpcc/warehouses=1000/duration=60/cutover=30 failed with artifacts on master @ 43c26aec0072f76e02e6d5ffc1b7079026b24630:
Parameters:
roachtest.c2c/tpcc/warehouses=1000/duration=60/cutover=30 failed with artifacts on master @ 0207c613fa7c8f3ab66c4518ee1e52dabb863426:
Parameters:
@erikgrinaker this test hasn't failed in 3 days. Are you aware of some patch in replication land that may have fixed this?
No. I won't have time to look into this, @pavelkalinnikov is on test flake duty for replication.
cc @cockroachdb/replication
Given that this consistently flaked for 4 days and now hasn't flaked for a week, I'm inclined to close this. @pavelkalinnikov I'll let you make the final call.
The closest I can think of is #105877, but that merged on June 30. The test last failed on July 10.
roachtest.c2c/tpcc/warehouses=1000/duration=60/cutover=30 failed with artifacts on master @ aacba20d325e5702836e9a76be646b5f1bd922af:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_cpu=8
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash
Jira issue: CRDB-29347