-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] Flacky test PgCloneColocationTest.NoColocatedChildTables #24549
Labels
area/docdb
YugabyteDB core features
kind/enhancement
This is an enhancement of an existing feature
kind/failing-test
Tests and testing infra
priority/medium
Medium priority issue
Comments
yamen-haddad
added
area/docdb
YugabyteDB core features
status/awaiting-triage
Issue awaiting triage
labels
Oct 21, 2024
yugabyte-ci
added
kind/failing-test
Tests and testing infra
priority/medium
Medium priority issue
kind/enhancement
This is an enhancement of an existing feature
labels
Oct 21, 2024
yamen-haddad
removed
kind/enhancement
This is an enhancement of an existing feature
status/awaiting-triage
Issue awaiting triage
labels
Oct 21, 2024
yugabyte-ci
added
the
kind/enhancement
This is an enhancement of an existing feature
label
Oct 21, 2024
SrivastavaAnubhav
added a commit
that referenced
this issue
Nov 12, 2024
Summary: This diff re-adds the retry logic removed in https://phorge.dev.yugabyte.com/D35120 to `GetSuitableSnapshotForRestore`. Without this logic, YbAdminSnapshotSchedule tests sometimes fail with the following error: ``` Creating snapshot in progress: 0b84f78a-4a44-4f55-901e-78eccbefa8f0 (passed 0.003s): PARALLEL_SNAPSHOT_OPERATION (master error 25) ``` This mostly affects tests because we take snapshots much more frequently than in production (currently, every 6s). I prefer the retry approach to checking for this error on the client because: 1. The approach was already used before I removed it 2. Checking the error on the client means we have to do so for clone and backup/restore separately, and it's annoying to do so because the tests using the yb-admin interface for restore (so the error output would have to be string-parsed) 3. This fixes the issue for production as well (even though there re-running is a simple fix). Jira: DB-13584 Test Plan: Created a YbAdminSnapshotScheduleTest which sets an artificial 1s delay in the create snapshot path by setting `TEST_delay_tablet_export_metadata_ms=1000`, and with a snapshot interval of 1s. The test just creates a table and restores to the current time. This reliably reproduced the PARALLEL_SNAPSHOT_ERROR message. ``` TEST_P(YbAdminSnapshotScheduleTestWithYsqlColocationRestoreParam, Anubhav) { auto schedule_id = ASSERT_RESULT(PreparePgWithColocatedParam()); ASSERT_OK(cluster_->SetFlagOnTServers("TEST_delay_tablet_export_metadata_ms", "1000")); auto conn = ASSERT_RESULT(PgConnect(client::kTableName.namespace_name())); ASSERT_OK(conn.Execute("CREATE TABLE test_table (key INT PRIMARY KEY, value TEXT)")); Timestamp time = ASSERT_RESULT(GetCurrentTime()); ASSERT_OK(RestoreSnapshotSchedule(schedule_id, time)); } ``` This test passed -n 100 locally. I don't think it's worth committing this as a regression test. Reviewers: zdrudi Reviewed By: zdrudi Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D39850
SrivastavaAnubhav
added a commit
that referenced
this issue
Dec 1, 2024
…store tests Summary: Original commit: b1b63ee / D39850 This diff re-adds the retry logic removed in https://phorge.dev.yugabyte.com/D35120 to `GetSuitableSnapshotForRestore`. Without this logic, YbAdminSnapshotSchedule tests sometimes fail with the following error: ``` Creating snapshot in progress: 0b84f78a-4a44-4f55-901e-78eccbefa8f0 (passed 0.003s): PARALLEL_SNAPSHOT_OPERATION (master error 25) ``` This mostly affects tests because we take snapshots much more frequently than in production (currently, every 6s). I prefer the retry approach to checking for this error on the client because: 1. The approach was already used before I removed it 2. Checking the error on the client means we have to do so for clone and backup/restore separately, and it's annoying to do so because the tests using the yb-admin interface for restore (so the error output would have to be string-parsed) 3. This fixes the issue for production as well (even though there re-running is a simple fix). Jira: DB-13584 Test Plan: Created a YbAdminSnapshotScheduleTest which sets an artificial 1s delay in the create snapshot path by setting `TEST_delay_tablet_export_metadata_ms=1000`, and with a snapshot interval of 1s. The test just creates a table and restores to the current time. This reliably reproduced the PARALLEL_SNAPSHOT_ERROR message. ``` TEST_P(YbAdminSnapshotScheduleTestWithYsqlColocationRestoreParam, Anubhav) { auto schedule_id = ASSERT_RESULT(PreparePgWithColocatedParam()); ASSERT_OK(cluster_->SetFlagOnTServers("TEST_delay_tablet_export_metadata_ms", "1000")); auto conn = ASSERT_RESULT(PgConnect(client::kTableName.namespace_name())); ASSERT_OK(conn.Execute("CREATE TABLE test_table (key INT PRIMARY KEY, value TEXT)")); Timestamp time = ASSERT_RESULT(GetCurrentTime()); ASSERT_OK(RestoreSnapshotSchedule(schedule_id, time)); } ``` This test passed -n 100 locally. I don't think it's worth committing this as a regression test. Reviewers: zdrudi Reviewed By: zdrudi Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D40254
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/docdb
YugabyteDB core features
kind/enhancement
This is an enhancement of an existing feature
kind/failing-test
Tests and testing infra
priority/medium
Medium priority issue
Jira Link: DB-13584
Description
I saw this test being flacky in multiple builds with the following error:
I believe this error (despite very rare) is causing most clone tests to be flacky. We should address this.
Issue Type
kind/failing-test
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: