ccl/backupccl: TestFullClusterBackup failed #100094

Closed
cockroach-teamcity opened this issue Mar 30, 2023 · 2 comments · Fixed by #100121
Labels: branch-release-23.1, C-test-failure, O-robot, T-disaster-recovery
Comments

cockroach-teamcity commented Mar 30, 2023

ccl/backupccl.TestFullClusterBackup failed with artifacts on release-23.1 @ 3f2adec1b34a86de342fea2811da936d1ae72a43:

=== RUN   TestFullClusterBackup
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/47447a7ed84475b6aaa4b9399a882ce0/logTestFullClusterBackup494608081
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestFullClusterBackup
    testutils.go:199: no Invalid Descriptors
    testutils.go:199: no Invalid Descriptors
    full_cluster_backup_restore_test.go:350: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/47447a7ed84475b6aaa4b9399a882ce0/logTestFullClusterBackup494608081
--- FAIL: TestFullClusterBackup (253.48s)
=== RUN   TestFullClusterBackup/ensure_system_table_data_restored
    full_cluster_backup_restore_test.go:296: query 'SELECT * FROM system.scheduled_jobs': expected:
        852319971431219201, sql-schema-telemetry, 2023-03-30 12:03:33.142521 +0000 UTC, node, 2023-04-06 12:06:00 +0000 UTC, , 6 12 * * 4, ��, scheduled-schema-telemetry-executor, 
        I
        Gtype.googleapis.com/cockroach.sql.ScheduledSchemaTelemetryExecutionArgs
        852319971448324097, sql-stats-compaction, 2023-03-30 12:03:33.157522 +0000 UTC, node, NULL, 
        �pending, @hourly, ��, scheduled-sql-stats-compaction-executor, 
        K
        Itype.googleapis.com/cockroach.sql.ScheduledSQLStatsCompactorExecutionArgs
        852320675762765825, BACKUP 1680178028, 2023-03-30 12:07:08.070458 +0000 UTC, root, NULL, NULL, @hourly, , scheduled-backup-executor, 
        ��
        Htype.googleapis.com/cockroach.ccl.backupccl.ScheduledBackupExecutionArgs�F�DBACKUP TABLE data.public.bank INTO 'nodelocal://0/foo' WITH detached
        
        got:
        852319971431219201, sql-schema-telemetry, 2023-03-30 12:03:33.142521 +0000 UTC, node, 2023-03-30 12:06:00 +0000 UTC, 
        �pending, 6 12 * * 4, ��, scheduled-schema-telemetry-executor, 
        I
        Gtype.googleapis.com/cockroach.sql.ScheduledSchemaTelemetryExecutionArgs
        852319971448324097, sql-stats-compaction, 2023-03-30 12:03:33.157522 +0000 UTC, node, NULL, 
        �pending, @hourly, ��, scheduled-sql-stats-compaction-executor, 
        K
        Itype.googleapis.com/cockroach.sql.ScheduledSQLStatsCompactorExecutionArgs
        852320675762765825, BACKUP 1680178028, 2023-03-30 12:07:08.070458 +0000 UTC, root, NULL, NULL, @hourly, , scheduled-backup-executor, 
        ��
        Htype.googleapis.com/cockroach.ccl.backupccl.ScheduledBackupExecutionArgs�F�DBACKUP TABLE data.public.bank INTO 'nodelocal://0/foo' WITH detached
        
    --- FAIL: TestFullClusterBackup/ensure_system_table_data_restored (0.05s)
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-26245

cockroach-teamcity added the branch-release-23.1, C-test-failure, and O-robot labels Mar 30, 2023
cockroach-teamcity added this to the 23.1 milestone Mar 30, 2023
msbutler self-assigned this Mar 30, 2023
msbutler (Collaborator) commented:

I'll take a look at this.

msbutler (Collaborator) commented:

lol, this is another one of those silly schedule race test flakes.

The assertion failed because the sql-schema-telemetry schedule's next run in the backup was 2023-03-30 12:06:00 +0000 UTC, while the restore's was 2023-04-06 12:06:00 +0000 UTC. The crontab is 6 12 * * 4, i.e. the schedule runs at 12:06 UTC every Thursday. The test's backup executed right before 12:06 UTC today (Thursday), and the restore executed after it. QED.

To prevent this flake, I'll remove the prev_run and next_run columns from the test's validation query.

craig bot pushed a commit that referenced this issue Mar 30, 2023
100121: backupccl: deflake TestFullClusterBackup r=adityamaru a=msbutler

TestFullClusterBackup would flake if the test ran at a specific time during the week due to #100094. This patch prevents this flake.

Fixes #100094

Release note: none

100182: roachprod: set COCKROACH_CONNECT_TIMEOUT to 1200s during start r=renatolabs,jbowens a=msbutler

Previously the sql cmds invoked within roachprod start had an infinite timeout. This patch reduces the timeout to 10 minutes -- if a node can't start within 10 minutes because a roachprod client can't connect to the new crdb node, there's likely a problem.

We decided to make this change after a roachtest with a 10 hr time out hung for 10 hours due to an internal start cmd. See #99280.

Informs #99280

Release note: none

Co-authored-by: Michael Butler
@craig craig bot closed this as completed in 0778e24 Mar 30, 2023
blathers-crl bot pushed a commit that referenced this issue Mar 30, 2023
TestFullClusterBackup would flake if the test ran at a specific time during the
week due to #100094. This patch prevents this flake.

Fixes #100094

Release note: none