Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import: check readability earlier #74863

Merged
merged 1 commit into from
Jan 19, 2022
Merged

Conversation

benbardin
Copy link
Collaborator

@benbardin benbardin commented Jan 14, 2022

Release note (sql change): Import now checks readability earlier for multiple files, to fail sooner if e.g. permissions are invalid.

@benbardin benbardin requested review from dt, stevendanna and a team January 14, 2022 18:03
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@benbardin
Copy link
Collaborator Author

I don't think there are reasonable unit tests to add here - behavior should be the same as before, just faster. But I'm open to suggestions if you have any! (Conceivably I could break out the "pre-import" stanza into a separate, named function, and test that, but that doesn't seem super-consistent with how we organize/test these files right now anyway)

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ~fine without a test but if we wanted to write one, I think one easy-ish one to write would be to add a "debug pause point" to the ingestion phase, then send and import for one good file URI plus then one or more bad URIs.

In theory, without this change, the job would enter ingestion phase for the good URI and then pause on the pause point, but then with this change it'd fail rather than pause. Entering the ingestion phase at all is what we want to detect, and avoid, when we know the job is doomed due to the bad URI.

However, the fact we import files in random (go map iteration) order is a wrinkle in that, since it could spuriously pass (that is fail on the bad URI, rather than pause in ingestion) even without this change if it random'ed the bad file first, and it doesn't really seem worth putting lots of time into changing this just for a test to me. Maybe we test by hand once or twice and call it good?

pkg/ccl/importccl/read_import_base.go Outdated Show resolved Hide resolved
@benbardin
Copy link
Collaborator Author

Do we typically build unit tests in the way you describe? Seems odd to me, but if it's customary then I'll dive in! It sounds like (a) The test-fail condition would be hanging indefinitely until test timeout, and (b) we'd be making pretty unusual use of breakpoints and debuggers (unless you didn't mean that literally?). Maybe I misunderstand? But otherwise I don't love that solution, I think a manual test or two might be more fitting for this circumstance.

@dt
Copy link
Member

dt commented Jan 18, 2022

making pretty unusual use of breakpoints and debuggers (unless you didn't mean that literally?).

Yeah, sorry, I didn't mean that literally at all, that was really unclear: I added a little helper thing last month in this PR #73890

Basically, you could add a line to the import job, right when it moves in to ingestion, that looks like if err := r.CheckpausePoint("import-start-ingest"); err != nil { return err } or something like that, and then when that line is hit, instead of proceeding to ingest, it'd return an error that the job framework recognizes as meaning the job should be updated to be showed as "paused", and the caller that started the job gets an error saying so. I added it to be able to make it easier to write tests like this, that want to introspect into some behavior in the middle of a job, that the job would otherwise just pass though on its own. In your case, you'd know your job should always return your readability failure error, and never the pause error, since it should never reach ingestion phase, and that pause error, if your readability check catches the bad file.

Anyway, this is all academic -- a manual test sounds great and much simpler.

@dt
Copy link
Member

dt commented Jan 18, 2022

nit: IMPORT isn't an enterprise feature so release note should be sql change I think

@benbardin benbardin force-pushed the import-perms branch 2 times, most recently from 63fa010 to 67d6e66 Compare January 18, 2022 20:02
Copy link
Collaborator Author

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @stevendanna)


pkg/ccl/importccl/read_import_base.go, line 175 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I'm not sure you pushed the amended commit yet, but sure, that sounds fine.

Re defer in loop with/without anon function it still "works" just fine, it just changes when it'd run -- they'd all be queued up to run at the end of the outer function vs after each loop, but they'll still run. I'm fine with it either way though. Maybe a comment though to explain the anon function is to fun the deferred close immediately.

Ah, got it. Thanks! Added a comment.

Copy link
Collaborator Author

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @stevendanna)

@dt
Copy link
Member

dt commented Jan 18, 2022

Sorry I should have nitted this at the same time earlier as the release not category, but it isn't technically "user permissions" which we're checking, which could be reasonably interpreted as the SQL user's permissions inside the DB to import into the table or database being written to -- those are already checked much earlier than this before we create the IMPORT job -- but rather it is the accessibility of the provided input URIs that is being verified in this change.

If you want to mention permissions, I might clarify that we mean the cloud provider's bucket permissions, so something like:

Release note (sql change): IMPORT now verifies that it can open and read from all of the provided input URIs before it starts processing them to detect and report any URIs that are malformed or lack the required bucket permission grants sooner.

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @benbardin and @stevendanna)


pkg/ccl/importccl/read_import_base.go, line 175 at r1 (raw file):

Previously, benbardin (Ben Bardin) wrote…

Ah, got it. Thanks! Added a comment.

Thanks!

(But of course, no good deed goes unpunished... Our style guide says to wrap comments at 80. Not blocking but would be nice if you need to push again for something else)

@benbardin benbardin changed the title import: check permissions earlier import: check readability earlier Jan 18, 2022
Release note (sql change): Import now checks readability earlier for multiple files, to fail sooner if e.g. permissions are invalid.
Copy link
Collaborator Author

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! Fixed.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @benbardin and @stevendanna)

Copy link
Collaborator Author

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @stevendanna)


pkg/ccl/importccl/read_import_base.go, line 175 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Thanks!

(But of course, no good deed goes unpunished... Our style guide says to wrap comments at 80. Not blocking but would be nice if you need to push again for something else)

done!

@benbardin
Copy link
Collaborator Author

bors r+

@benbardin benbardin linked an issue Jan 19, 2022 that may be closed by this pull request
@craig
Copy link
Contributor

craig bot commented Jan 19, 2022

Build succeeded:

@craig craig bot merged commit 419167d into cockroachdb:master Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IMPORT - Verify credentials have access to resource sooner.
3 participants