Skip to content

Commit

Permalink
importer: only check import *atomicity* in TestImportWorkerFailure
Browse files Browse the repository at this point in the history
Five years ago, in cockroachdb#26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked-in skipped, because though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flakey.

Two months ago, in cockroachdb#105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

This PR also completely unskips `TestImportWorkerFailure` so that we can
test the atomicity of imports more thoroughly.

Fixes: cockroachdb#102839

Release note: None
  • Loading branch information
michae2 committed Aug 11, 2023
1 parent 308a60b commit f1fc1a2
Showing 1 changed file with 8 additions and 7 deletions.
15 changes: 8 additions & 7 deletions pkg/sql/importer/import_stmt_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5417,10 +5417,6 @@ func TestImportWorkerFailure(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

skip.UnderStressWithIssue(t, 102839, "flaky test")
skip.UnderDeadlockWithIssue(t, 102839, "flaky test")
skip.UnderRaceWithIssue(t, 102839, "flaky test")

allowResponse := make(chan struct{})
params := base.TestClusterArgs{}
params.ServerArgs.Knobs.JobsTestingKnobs = jobs.NewTestingKnobsWithShortIntervals()
Expand Down Expand Up @@ -5468,10 +5464,15 @@ func TestImportWorkerFailure(t *testing.T) {
tc.StopServer(1)

close(allowResponse)
// We expect the statement to retry since it should have encountered a
// retryable error.
// We expect the IMPORT statement to usually retry since it should have
// encountered a retryable error. We don't currently catch all such retryable
// errors, however, so in some cases the IMPORT statement will return the
// error to the client. In this case we verify that the import was completely
// rolled back.
if err := <-errCh; err != nil {
t.Fatal(err)
t.Logf("%s failed, checking that imported data was completely removed: %q", query, err)
sqlDB.CheckQueryResults(t, `SELECT * FROM t ORDER BY i`, [][]string{})
return
}

// But the job should be restarted and succeed eventually.
Expand Down

0 comments on commit f1fc1a2

Please sign in to comment.