Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl: Correctly handle errors and cancellations during import. #49979

Merged
merged 1 commit into from
Jun 9, 2020

Conversation

miretskiy
Copy link
Contributor

@miretskiy miretskiy commented Jun 8, 2020

Fixes #49977

Parallel importer could get stuck due to a race between emitted
import batches and checking for context cancellation (either due to an
unforeseen error, or due to explicit context cancallation).

Fix the race condition, and add tests verifying correct behavior.

Release notes (bug fix): correctly handle import cancellation and errors.

@miretskiy miretskiy requested review from nvanbenschoten, a team and pbardea and removed request for a team June 8, 2020 19:30
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thanks for the quick fix!

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy and @pbardea)


pkg/ccl/importccl/read_import_base.go, line 580 at r1 (raw file):

}

// Flush flushes currently accumulated data.

nit: s/Flush/flush/


pkg/ccl/importccl/read_import_base_test.go, line 57 at r1 (raw file):

// nilDataProducer produces infinite stream of nulls.
// It implements importRowProducer

Missing period.


pkg/ccl/importccl/read_import_base_test.go, line 85 at r1 (raw file):

// errorReturningConsumer always returns a errConsumerAbortedError.
// it implements importRowConsumer

nit: s/it implemented importRowConsumer/It implemented importRowConsumer./


pkg/ccl/importccl/read_import_base_test.go, line 86 at r1 (raw file):

// errorReturningConsumer always returns a errConsumerAbortedError.
// it implements importRowConsumer
type errorReturningConsumer struct{}

It might be a little cleaner to add an err error field here so that we don't need the errConsumerAbortedError to be global. That also makes the consumer type a little more general.


pkg/ccl/importccl/read_import_base_test.go, line 97 at r1 (raw file):

// nilDataConsumer consumes and emits infinite stream of null.
// it implements importRowConsumer.

nit: s/it/It/


pkg/ccl/importccl/read_import_base_test.go, line 177 at r1 (raw file):

	require.NoError(t, ctxgroup.GroupWorkers(context.Background(), 100,
		func(_ context.Context, _ int) error {
			timeout := time.Millisecond * time.Duration(250+rand.Intn(1000))

750ms on average seems pretty long for a unit test. Does it need to be this long?

Copy link
Contributor Author

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pbardea)


pkg/ccl/importccl/read_import_base_test.go, line 177 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

750ms on average seems pretty long for a unit test. Does it need to be this long?

Nah... Lowered it to 250-500ms

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: @pbardea should also give this a quick 👍 though.

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pbardea)

Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pbardea
Copy link
Contributor

pbardea commented Jun 8, 2020

Ooops - sorry, I just noticed something: I think that we usually format release notes as:
Release note (bug fix): correctly handle import cancellation and errors. I think the formatting is important for the script that they run (- but I thought we had a linter to catch this?). It may also be useful to give the docs folks a quick description of what would happen if the race occurs.

@miretskiy
Copy link
Contributor Author

miretskiy commented Jun 8, 2020 via email

Fixes cockroachdb#49977

Parallel importer could get stuck  due to a race between emitted
import batches and checking for context cancellation (either due to an
unforeseen error, or due to explicit context cancallation).

Fix the race condition, and add tests verifying correct behavior.

Release notes (bug fix): correctly handle import cancellation and errors.
@miretskiy
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 9, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jun 9, 2020

Build succeeded

@craig craig bot merged commit f098913 into cockroachdb:master Jun 9, 2020
craig bot pushed a commit that referenced this pull request Jun 12, 2020
50089: release-20.1: bulkio:  import no longer gets stuck due to errors encountered during import r=miretskiy a=miretskiy

Backport:
  * 1/1 commits from "importccl: Correctly handle errors and cancellations during import." (#49979)
  * 1/1 commits from "bulkio: Correctly group producer/consumers when importing data" (#49995)

Please see individual PRs for details.

/cc @cockroachdb/release


Co-authored-by: Yevgeniy Miretskiy <[email protected]>
@miretskiy miretskiy deleted the import branch July 12, 2020 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bulkio: parallelImporter can get stuck
4 participants