roachtest: ssh_problem failed #136285

Open
cockroach-teamcity opened this issue Nov 27, 2024 · 16 comments
Labels
  • branch-release-24.2.6-rc
  • O-roachtest
  • O-robot: Originated from a bot.
  • T-testeng: TestEng Team
  • X-infra-flake: the automatically generated issue was closed due to an infrastructure problem, not a product issue

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Nov 27, 2024

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test disk-stalled/wal-failover/among-stores failed: (cluster.go:2398).Run: full command output in run_103035.501445936_n1_sudo-dmsetup-resume-.log: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/disk-stalled/wal-failover/among-stores/run_1/ssh/ssh_103035.599869792_n1_sudo-dmsetup-resume-.log): TRANSIENT_ERROR(ssh_problem): exit status 255
(cluster.go:2398).Run: context canceled
(cluster.go:2398).Run: context canceled
test artifacts and logs in: /artifacts/disk-stalled/wal-failover/among-stores/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=16
  • encrypted=true
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=true
  • ssd=2
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-44955
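
The report above says the error "persisted after 3 attempts" and ended with exit status 255, the code `ssh` returns when the failure is in ssh itself rather than in the remote command. The retry pattern this implies can be sketched roughly as follows. This is a minimal illustration only, not roachtest's implementation: `run_with_retries`, its parameters, and the exponential backoff are assumptions made for the example.

```python
from typing import Callable, Tuple

# ssh exits with 255 when ssh itself fails (connection, auth, and so on);
# any other code comes from the remote command.
SSH_ERROR_EXIT = 255


def run_with_retries(
    run: Callable[[], int],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = lambda seconds: None,
    base_delay: float = 1.0,
) -> Tuple[int, int]:
    """Call run() until it returns a non-255 exit code or attempts run out.

    Returns (last_exit_code, attempts_used). The default sleep is a no-op so
    the sketch is easy to test; pass time.sleep for real delays.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    code = SSH_ERROR_EXIT
    for attempt in range(1, max_attempts + 1):
        code = run()
        if code != SSH_ERROR_EXIT:
            # Success or a genuine remote failure: do not retry.
            return code, attempt
        if attempt < max_attempts:
            # Hypothetical backoff schedule: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return code, max_attempts
```

Only exit status 255 is retried here, so a real failure of the remote command surfaces on the first attempt instead of being masked as a flake.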

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test allocbench/nodes=7/cpu=8/kv/r=95/access=skew failed: (test_runner.go:823).func4: cluster.PutE: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/_runner-logs/ssh/ssh_065632.515687971_n1_if-hostname-teamcity.log): TRANSIENT_ERROR(ssh_problem): exit status 255 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test backup/KMS/aws/n3cpu4 failed: (cluster.go:2285).Start: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/backup/KMS/aws/n3cpu4/run_1/ssh/ssh_065727.591661280_n2_upload-start-script.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/backup/KMS/aws/n3cpu4/run_1

Parameters:

  • arch=amd64
  • cloud=aws
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test sql-stats/mixed-version failed: (mixedversion.go:710).Run: mixed-version test failure while running step 10 (restart node 1 with binary version v23.1.22): _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/sql-stats/mixed-version/cpu_arch=arm64/run_1/ssh/ssh_080423.259719900_n1_run-start-script.log): TRANSIENT_ERROR(ssh_problem): exit status 255 [owner=test-eng]
test artifacts and logs in: /artifacts/sql-stats/mixed-version/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=aws
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=system-only
  • mvtVersions=v23.1.22 → v23.2.6 → v24.1.2 → release-24.2.6-rc
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test cdc/initial-scan-rolling-restart/shutdown-checkpoint failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64" failed: error persisted after 3 attempts: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64 [email protected]:./cockroach
client_loop: send disconnect: Broken pipe
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test change-replicas/mixed-version failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64" failed: error persisted after 2 attempts: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64 [email protected]:./cockroach
client_loop: send disconnect: Broken pipe
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test cdc/filtering/session/ignore-filtering=true failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach-ea.linux-amd64" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach-ea.linux-amd64 [email protected]:./cockroach
ssh: connect to host 34.75.234.27 port 22: Connection timed out
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=true
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test acceptance/version-upgrade failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64 [email protected]:./cockroach
ssh: connect to host 34.28.108.141 port 22: Connection timed out
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 9dab6cb9f71f21ef1e9c98640193035783ba9a4e:

test loqrecovery/half-online/workload=movr/rangeSize=default failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64 [email protected]:./cockroach
ssh: connect to host 35.225.218.218 port 22: Connection timed out
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 74f60821a82c570ab58b8d7716c1d74b85e87df0:

test alterpk-tpcc-250 failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64" failed: error persisted after 2 attempts: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-amd64 [email protected]:./cockroach
Warning: Permanently added '34.75.84.176' (ECDSA) to the list of known hosts.
client_loop: send disconnect: Broken pipe
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=32
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 74f60821a82c570ab58b8d7716c1d74b85e87df0:

test kv0/enc=false/nodes=1/cpu=32 failed: (cluster.go:2398).Run: context canceled
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
failed to list cockroach processes: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/kv0/enc=false/nodes=1/cpu=32/cpu_arch=arm64/run_1/ssh/ssh_070610.252016941_n1_list-processes.log): TRANSIENT_ERROR(ssh_problem): exit status 255 [owner=test-eng]
test artifacts and logs in: /artifacts/kv0/enc=false/nodes=1/cpu=32/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=aws
  • coverageBuild=false
  • cpu=32
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 74f60821a82c570ab58b8d7716c1d74b85e87df0:

test jepsen/g2/majority-ring-subcritical-skews failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64 [email protected]:./cockroach
ssh: connect to host 34.67.241.114 port 22: Connection timed out
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 74f60821a82c570ab58b8d7716c1d74b85e87df0:

test cdc/workload/kv0/nodes=5/cpu=16/ranges=100k/server=scheduler/protocol=mux/format=json/sink=null failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64" failed: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64 [email protected]:./cockroach
ssh: connect to host 35.193.221.38 port 22: Connection timed out
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=16
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 74f60821a82c570ab58b8d7716c1d74b85e87df0:

test alterpk-tpcc-500 failed: (test_runner.go:823).func4: cluster.PutE: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/_runner-logs/ssh/ssh_065620.183337869_n1_if-hostname-teamcity.log): TRANSIENT_ERROR(ssh_problem): exit status 255 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=16
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 74f60821a82c570ab58b8d7716c1d74b85e87df0:

test import/tpcc/warehouses=1000/nodes=32 failed: (test_runner.go:823).func4: cluster.PutE: error persisted after 3 attempts: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/_runner-logs/ssh/ssh_112047.261871963_n1_if-hostname-teamcity.log): TRANSIENT_ERROR(ssh_problem): exit status 255 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 74f60821a82c570ab58b8d7716c1d74b85e87df0:

test follower-reads/survival=zone/locality=regional/reads=bounded-staleness/insufficient-quorum failed: (cluster.go:2285).Start: _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/follower-reads/survival=zone/locality=regional/reads=bounded-staleness/insufficient-quorum/run_1/ssh/ssh_111943.875672695_n1_run-start-script.log): TRANSIENT_ERROR(ssh_problem): exit status 255
test artifacts and logs in: /artifacts/follower-reads/survival=zone/locality=regional/reads=bounded-staleness/insufficient-quorum/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • runtimeAssertionsBuild=true
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.ssh_problem failed with artifacts on release-24.2.6-rc @ 74f60821a82c570ab58b8d7716c1d74b85e87df0:

test allocbench/nodes=7/cpu=8/kv/r=50/ops=skew failed: (test_runner.go:823).func4: cluster.PutE: put "/go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64" failed: error persisted after 2 attempts: TRANSIENT_ERROR(ssh_problem): ~ scp -r -C -o StrictHostKeyChecking=no -o ConnectTimeout=10 -i /home/roach/.ssh/id_rsa -i /home/roach/.ssh/google_compute_engine /go/src/github.com/cockroachdb/cockroach/bin/cockroach.linux-arm64 [email protected]:./cockroach
ssh: connect to host 34.72.215.20 port 22: Connection refused
lost connection: exit status 1 [owner=test-eng]

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=8
  • encrypted=false
  • fs=ext4
  • localSSD=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!
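
The reports in this thread show a small set of recurring failure signatures: connection timeouts and refusals on port 22, broken pipes during `scp`, and bare `exit status 255` results from `ssh`. For triage, a rough classifier over the quoted messages might look like the sketch below. The patterns are copied from the messages above, but the labels and the `classify` and `tally` helpers are hypothetical names chosen for this example and are not part of roachtest.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns taken from the failure messages quoted in this thread.
# The labels are illustrative, not roachtest terminology.
SIGNATURES = [
    ("connect_timeout", re.compile(r"port 22: Connection timed out")),
    ("connect_refused", re.compile(r"port 22: Connection refused")),
    ("broken_pipe", re.compile(r"client_loop: send disconnect: Broken pipe")),
    ("ssh_exit_255", re.compile(r"TRANSIENT_ERROR\(ssh_problem\): exit status 255")),
]


def classify(message: str) -> str:
    """Return the label of the first signature found in message."""
    for label, pattern in SIGNATURES:
        if pattern.search(message):
            return label
    return "unknown"


def tally(messages: Iterable[str]) -> Counter:
    """Count how many messages fall under each label."""
    return Counter(classify(m) for m in messages)
```

Feeding each comment's failure text to `tally` gives a quick count per signature, which can help separate unreachable hosts (timed out, refused) from connections that dropped mid-transfer (broken pipe).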
