roachtest: rust-postgres failed [scheduled backups fail] #96799

Closed
cockroach-teamcity opened this issue Feb 8, 2023 · 6 comments · Fixed by #96985
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
@cockroach-teamcity
Member

cockroach-teamcity commented Feb 8, 2023

roachtest.rust-postgres failed with artifacts on master @ ff67a4ba86d710db090ce700f229020365851183:

test artifacts and logs in: /artifacts/rust-postgres/run_1
(cluster.go:1864).Start: parallel execution failure: ~ ./cockroach sql --url 'postgres://root@localhost:26257?sslmode=disable' "-e
CREATE SCHEDULE IF NOT EXISTS test_only_backup FOR BACKUP INTO 'gs://cockroachdb-backup-testing/roachprod-scheduled-backups/teamcity-8626723-1675836996-118-n1cpu16/1675873651445454863?AUTH=implicit' RECURRING '*/15 * * * *'
FULL BACKUP '@hourly'
WITH SCHEDULE OPTIONS first_run = 'now'"
ERROR: cannot dial server.
Is the server running?
If the server is running, check --host client-side and --advertise server-side.
dial tcp 127.0.0.1:26257: connect: connection refused
Failed running "sql": COMMAND_PROBLEM: ssh verbose log retained in ssh_162731.445558948_n1_run-sql.log: exit status 1

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/sql-sessions

This test on roachdash | Improve this report!

Jira issue: CRDB-24340

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Feb 8, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Feb 8, 2023
@rafiss rafiss removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Feb 8, 2023
@cockroach-teamcity
Member Author

roachtest.rust-postgres failed with artifacts on master @ 09188370d82e163ff1d44c62fe611104502c548d:

test artifacts and logs in: /artifacts/rust-postgres/run_1
(cluster.go:1864).Start: parallel execution failure: ~ ./cockroach sql --url 'postgres://root@localhost:26257?sslmode=disable' "-e
CREATE SCHEDULE IF NOT EXISTS test_only_backup FOR BACKUP INTO 'gs://cockroachdb-backup-testing/roachprod-scheduled-backups/teamcity-8641928-1675923360-118-n1cpu16/1675958629426465617?AUTH=implicit' RECURRING '*/15 * * * *'
FULL BACKUP '@hourly'
WITH SCHEDULE OPTIONS first_run = 'now'"
ERROR: cannot dial server.
Is the server running?
If the server is running, check --host client-side and --advertise server-side.
dial tcp 127.0.0.1:26257: connect: connection refused
Failed running "sql": COMMAND_PROBLEM: ssh verbose log retained in ssh_160349.426569484_n1_run-sql.log: exit status 1

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.rust-postgres failed with artifacts on master @ e51ffa013c81212870891001f0328912550fa75d:

test artifacts and logs in: /artifacts/rust-postgres/run_1
(cluster.go:1864).Start: parallel execution failure: ~ ./cockroach sql --url 'postgres://root@localhost:26257?sslmode=disable' "-e
CREATE SCHEDULE IF NOT EXISTS test_only_backup FOR BACKUP INTO 'gs://cockroachdb-backup-testing/roachprod-scheduled-backups/teamcity-8655926-1676009739-111-n1cpu16/1676043014453489051?AUTH=implicit' RECURRING '*/15 * * * *'
FULL BACKUP '@hourly'
WITH SCHEDULE OPTIONS first_run = 'now'"
ERROR: cannot dial server.
Is the server running?
If the server is running, check --host client-side and --advertise server-side.
dial tcp 127.0.0.1:26257: connect: connection refused
Failed running "sql": COMMAND_PROBLEM: ssh verbose log retained in ssh_153014.453601621_n1_run-sql.log: exit status 1

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@rafiss rafiss changed the title from "roachtest: rust-postgres failed" to "roachtest: rust-postgres failed [scheduled backups fail]" Feb 10, 2023
@rafiss
Collaborator

rafiss commented Feb 10, 2023

cc @msbutler something seems off with the scheduled backups here, but i can't tell why this test is affected.

@blathers-crl

blathers-crl bot commented Feb 10, 2023

cc @cockroachdb/disaster-recovery

@msbutler msbutler self-assigned this Feb 10, 2023
@msbutler
Collaborator

I think I know where the problem is. When the single node initially starts up, it successfully creates a backup schedule, but when it's restarted with a new port for the rust orm, the schedule backup cmd fails. I confirmed the new port seems to be the problem by commenting this line out and rerunning the test a few times; with it commented out, the 2nd schedule backup cmd invocation works fine.
https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/tests/rust_postgres.go#L130

roachprod is naive to this port change, as the cockroach port is hardcoded at cluster creation, via setting this field:
https://github.com/cockroachdb/cockroach/blob/master/pkg/roachprod/vm/vm.go#L83

I suspect that any sql cmd executed via roachprod's sql interface will fail after this port change. I think the easiest fix here is to replace option.DefaultStartOpts() with option.DefaultStartOptsNoBackups(). @rafiss if this sounds good to you, i'll open a quick pr.

I'm unsure why i didn't catch this in my roachtest nightly run.

@rafiss
Collaborator

rafiss commented Feb 10, 2023

Turning off backups for this test sgtm! Thanks for looking at this.

craig bot pushed a commit that referenced this issue Feb 10, 2023
96881: servercontroller: serve http from default tenant on error r=aadityasondhi a=dhartunian

Previously, when an HTTP request arrived with a tenant cookie attached, we would attempt to connect to that tenant and fail if the tenant couldn't be started or if it didn't exist. This caused a bad user-facing experience when someone's browser contained stale tenant cookies and they attempted to load DB Console. The error returned a 500 to the browser, and the user saw no UI, couldn't try to log in again, and was basically stuck until they cleared their cookies.

This change falls back to serving HTTP from the default tenant when the requested one can't be reached. This guarantees that *something* is served, so unauthenticated endpoints can still be loaded in these scenarios and users can attempt to log in again.

Epic: CRDB-12100

Release note: None

96947: kvserver: fix rebalance obj test on arm mac r=pavelkalinnikov a=kvoli

Previously, `TestRebalanceObjectiveManager` and
`TestLoadBasedRebalancingObjective` would assert assuming that it was possible for the test host to use `grunning`; however, this is not true for ARM Mac.

This patch amends these tests to test a subset of behavior when `grunning` isn't supported and then exit.

Fixes: #96934

Release note: None

96957: sql: fix SHOW GRANTS FOR public r=ecwall a=rafiss

fixes #96948

Release note (bug fix): Fixed the `SHOW GRANTS FOR public` command so it works correctly. Previously, this would return an error saying that the `public` role does not exist.

96964: roachtest: set range tombstones flag accordingly r=jbowens,msbutler a=nicktrav

Fix a bug in the `import-cancellation` roachtest whereby range tombstones are always disabled.

Release note: None.

Epic: CRDB-20465

96985: roachtest: disable scheduled backup in rust-postgres node restart r=rafiss a=msbutler

Fixes #96799

Release note: None

Co-authored-by: David Hartunian
Co-authored-by: Austen McClernon
Co-authored-by: Rafi Shamim
Co-authored-by: Nick Travers
Co-authored-by: Michael Butler
@craig craig bot closed this as completed in f9cf805 Feb 10, 2023