roachtest: rebalance/by-load/leases/mixed-version failed #122084

Closed
cockroach-teamcity opened this issue Apr 10, 2024 · 4 comments · Fixed by #122187
Labels
A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-kv KV Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Apr 10, 2024

roachtest.rebalance/by-load/leases/mixed-version failed with artifacts on master @ 911f1ce389342459592d89fe7b78bcba3a2d265a:

(mixedversion.go:592).Run: mixed-version test failure while running step 9 (run "rebalance load run"): CPU not evenly balanced after timeout: within bounds mean=69.0 tolerance=20.0% (±13.8) bounds=[55.2, 82.8]
	stores=[s1: 66 (-4.1%), s2: 78 (+14.4%), s3: 61 (-10.3%)]
test artifacts and logs in: /artifacts/rebalance/by-load/leases/mixed-version/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-37692

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels Apr 10, 2024
@cockroach-teamcity cockroach-teamcity added this to the 24.1 milestone Apr 10, 2024
@kvoli
Collaborator

kvoli commented Apr 10, 2024

mean=69.0 tolerance=20.0% (±13.8) bounds=[55.2, 82.8]
stores=[s1: 66 (-4.1%), s2: 78 (+14.4%), s3: 61 (-10.3%)]

All stores appear to be within the bounds, so this might be a testing issue.
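As a quick sanity check of the numbers in the failure message, here is a minimal Go sketch that recomputes the bounds from the printed mean and tolerance and checks each store against them. The function names are illustrative and are not taken from the roachtest code; the store values are the rounded ones printed in the message.

```go
package main

import "fmt"

// bounds returns the inclusive [lo, hi] interval around mean for a
// fractional tolerance (0.20 means ±20%).
func bounds(mean, tolerance float64) (lo, hi float64) {
	delta := mean * tolerance
	return mean - delta, mean + delta
}

// outOfBounds returns the indexes of the values that fall outside [lo, hi].
func outOfBounds(values []float64, lo, hi float64) []int {
	var out []int
	for i, v := range values {
		if v < lo || v > hi {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// Numbers copied from the failure message above.
	mean, tolerance := 69.0, 0.20
	stores := []float64{66, 78, 61} // s1, s2, s3 (rounded, as printed)

	lo, hi := bounds(mean, tolerance)
	fmt.Printf("bounds=[%.1f, %.1f]\n", lo, hi)
	fmt.Printf("stores out of bounds: %d\n", len(outOfBounds(stores, lo, hi)))
}
```

This prints `bounds=[55.2, 82.8]` and `stores out of bounds: 0`, which matches the reading that the reported values do not violate the tolerance.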

@kvoli kvoli added P-2 Issues/test failures with a fix SLA of 3 months and removed P-2 Issues/test failures with a fix SLA of 3 months labels Apr 10, 2024
@kvoli
Collaborator

kvoli commented Apr 10, 2024

The test is unable to gather the CPU usage information, causing it to time out before balancing is achieved (or before it is noticed to be achieved).

06:34:15 cluster.go:2429: running cmd `./cockroach auth-session lo...` on nodes [:4]; details in run_063415.317963092_n4_cockroach-authsessio.log
06:34:15 httpclient.go:210: ./cockroach auth session login failed on node 4: failed to authenticate: COMMAND_PROBLEM: exit status 1

This may be an issue with mixed version secure clusters and the roachtest framework?

run_063413.087691684_n1_cockroach-authsessio: 06:34:13 cluster.go:2432: > ./cockroach auth-session login root --port={pgport:1} --certs-dir ./certs --format raw
teamcity-14772140-1712728228-03-n4cpu4:[1]: ./cockroach auth-session lo...
ERROR: connection lost.

EOF
Failed running "auth-session login"
run_063413.087691684_n1_cockroach-authsessio: 06:34:13 cluster.go:2457: > result: Error for Node 1: COMMAND_PROBLEM: exit status 1
(1) Node 1. Command with error:
  | ```
  | ./cockroach auth-session login root --port={pgport:1} --certs-dir ./certs --format raw
  | ```
  | stdout: <empty>
  | stderr:ERROR: connection lost.
  |
  | EOF
  | Failed running "auth-session login"
Wraps: (2) COMMAND_PROBLEM
Wraps: (3) exit status 1
Error types: (1) *hintdetail.withDetail (2) errors.Cmd (3) *exec.ExitError
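To illustrate the failure mode described in this comment, here is a rough Go sketch of a polling loop in which a failing collection step ends in a timeout. It is not the roachtest implementation: `collectFn`, `waitForBalance`, `failingCollect`, and both error values are hypothetical names made up for this example. The sketch keeps "never collected a sample" separate from "collected samples but they were unbalanced", so that a timeout like the one above would point at the collection failure instead of at the balance check.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// collectFn is a hypothetical stand-in for whatever gathers per-store CPU.
type collectFn func() ([]float64, error)

var (
	errNoSamples  = errors.New("no CPU samples collected before timeout")
	errUnbalanced = errors.New("CPU not evenly balanced after timeout")
)

// failingCollect simulates a collection step that always fails,
// for example because the login step never succeeds.
func failingCollect() ([]float64, error) {
	return nil, errors.New("simulated login failure")
}

func allWithin(values []float64, lo, hi float64) bool {
	for _, v := range values {
		if v < lo || v > hi {
			return false
		}
	}
	return true
}

// waitForBalance polls collect until every value is within [lo, hi] or the
// timeout elapses. It remembers the last collection error so that a timeout
// with zero successful samples is reported as a collection problem.
func waitForBalance(collect collectFn, lo, hi float64, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	var lastErr error
	sawSample := false
	for {
		values, err := collect()
		if err != nil {
			lastErr = err
		} else {
			sawSample = true
			if allWithin(values, lo, hi) {
				return nil
			}
		}
		if !time.Now().Before(deadline) {
			break
		}
		time.Sleep(interval)
	}
	if !sawSample {
		return fmt.Errorf("%w: last error: %v", errNoSamples, lastErr)
	}
	return errUnbalanced
}

func main() {
	err := waitForBalance(failingCollect, 55.2, 82.8, 30*time.Millisecond, 10*time.Millisecond)
	fmt.Println(err)
}
```

With the always-failing collector, the loop never sees a sample and returns the collection error wrapped in `errNoSamples`.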

@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure P-2 Issues/test failures with a fix SLA of 3 months and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Apr 10, 2024
@kvoli
Collaborator

kvoli commented Apr 11, 2024

This is a similar issue to #119064. The fix there was to run the test in insecure mode. Should we do the same here @DarrylWong?

@DarrylWong
Contributor

👍 That sounds reasonable to me. A little surprised it took this long to fail with this though.

craig bot pushed a commit that referenced this issue Apr 11, 2024
122187: roachtest: run rebalance/by-load/*/mixed-version on insecure mode r=DarrylWong a=kvoli

Authentication fails if migrating from 22.2 to 23.1. Disable secure mode to avoid failing the test on this unrelated issue.

Fixes: #122179
Fixes: #122084

Release note: None

Co-authored-by: Austen McClernon <[email protected]>
@craig craig bot closed this as completed in 7256197 Apr 11, 2024
blathers-crl bot pushed a commit that referenced this issue Apr 11, 2024
Authentication fails if migrating from 22.2 to 23.1. Disable secure mode
to avoid failing the test on this unrelated issue.

Fixes: #122179
Fixes: #122084

Release note: None
@github-project-automation github-project-automation bot moved this to roachtest/unit test backlog in KV Aug 28, 2024