Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl: Support mysql's OUTFILE format in IMPORT #25615

Merged
merged 3 commits into from
May 21, 2018

Conversation

dt
Copy link
Member

@dt dt commented May 17, 2018

This adds support for MySQL's OUTFILE (as created by mysqldump --tab) to IMPORT.

Without this field it falls back to the legacy opts which assume CSV only.

Release note: none.
@dt dt requested review from maddyblue, bobvawter and a team May 17, 2018 15:07
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dt dt force-pushed the import-mysqlout branch 2 times, most recently from fcece6a to 80195e6 Compare May 17, 2018 17:06
@dt dt force-pushed the import-mysqlout branch from 80195e6 to e88994b Compare May 17, 2018 17:30
@@ -337,6 +337,12 @@ func newReadImportDataProcessor(
return cp, nil
}

type inputConverter interface {
start(ctx context.Context, group *errgroup.Group)
readFile(context.Context, io.Reader, int32, string, func(finished bool) error) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these arguments should be named


"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
_ "github.com/go-sql-driver/mysql"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this requires us to vendor. is there a way around doing that? like moving writeMysqlOutfileTestdata to a comment or a file with a build ignore tag? not sure. or should we just vendor it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input io.Reader,
inputIdx int32,
inputName string,
progressFn func(finished bool) error,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this type is used a bunch now and maybe should get its own type? don't care too much.

}

// If we have a defined enclose char, check if we're starting or stopping
// an enclosed field. Technically, the enclose char can be made mandatory,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the flag described in this comment d.opts.Enclose? If so fix this comment.

// so potentially we could have a flag to enforce that, returning an error
// if it is missing, but we don't need to care for simply decoding.
if d.opts.Enclose != roachpb.MySQLOutfileOptions_Never && c == d.opts.Encloser {
readingField = !readingField
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if d.opts.Encloser appears multiple times in a row they will just keep flipping readingField. Is that intended and correct?

@@ -0,0 +1,312 @@
// Copyright 2018 The Cockroach Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be CCL?

{
name: "escape-and-enclose",
opts: roachpb.MySQLOutfileOptions{
FieldSeparator: '\t', RowSeparator: '\n', HasEscape: true, Escape: '\\', Enclose: roachpb.MySQLOutfileOptions_Always, Encloser: '"',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these would be easier to read separated by newlines

ctx := context.TODO()
for _, config := range configs {
t.Run(config.name, func(t *testing.T) {
converter := newMysqloutfileReader(ctx, nil, config.opts, nil, 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this 3 be a named const?

// If the csv writing part of our conversion encodes as hex, we can
// round-trip bytes though the go csv parser.
orig := r[2]
r[2] = hex.EncodeToString([]byte(r[2]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand what's going on here. why don't we have to append \x to this?

expected = []byte(config.null)
}

actual := []byte(res[i][2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see where res[i][0] is verified

return errors.Errorf("unsupported import format: %q", importStmt.FileFormat)
}

if format.Format != roachpb.IOFileFormat_CSV {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to move these checks to be per format instead of anything not CSV? When we add mysql SQL support we'll need another version bump and we'll have to split it apart there again.

}

if expected != actual {
t.Fatalf("expected rowi=%s string to be %q, got %q", row[0], expected, actual)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling? rowi

if r == utf8.RuneError {
return 0, errors.Errorf("invalid character: %s", s)
}
if sz != len(s) {
return r, errors.New("must be only one character")
if l := len(s); 1 != l {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was correct before. Add a descriptive comment if this is indeed intended. The length of a string is the number of bytes, not the number of runes. See https://play.golang.org/p/SE0ybC7jWoN

@dt dt force-pushed the import-mysqlout branch from e88994b to 30a1607 Compare May 18, 2018 04:35
@dt
Copy link
Member Author

dt commented May 18, 2018

Review status: 0 of 21 files reviewed at latest revision, 14 unresolved discussions.


pkg/ccl/importccl/import_stmt.go, line 393 at r3 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Is it better to move these checks to be per format instead of anything not CSV? When we add mysql SQL support we'll need another version bump and we'll have to split it apart there again.

I think we just need to check for not CSV: if all the nodes are aware of the IOFormat proto, they'll know to look at the enum, and correctly fail if they don't support a format. I added a better error for that in the processor.


pkg/ccl/importccl/import_stmt_test.go, line 1181 at r3 (raw file):

Previously, mjibson (Matt Jibson) wrote…

spelling? rowi

Done.


pkg/ccl/importccl/read_import_mysqlout.go, line 47 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

this type is used a bunch now and maybe should get its own type? don't care too much.

Done.


pkg/ccl/importccl/read_import_mysqlout.go, line 147 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Is the flag described in this comment d.opts.Enclose? If so fix this comment.

Done.


pkg/ccl/importccl/read_import_mysqlout.go, line 151 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

if d.opts.Encloser appears multiple times in a row they will just keep flipping readingField. Is that intended and correct?

Eh, that's more or less my read of the docs. They say it has to be escaped if it occurs in the content.


pkg/ccl/importccl/read_import_mysqlout.go, line 162 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

there's a nice error message here, but not in any other error return path. i think we should make a general solution (like a struct with an Error method) to not introduce more things as described in #25532. doesn't need to be done in this pr.

hm, there aren't actually many error returns mid-file -- I think it is just this one and then io errors from reading bytes from the reader. There are two error returns that can happen at end-of-file.


pkg/ccl/importccl/read_import_mysqlout_test.go, line 3 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Should this be CCL?

Done.


pkg/ccl/importccl/read_import_mysqlout_test.go, line 170 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

these would be easier to read separated by newlines

Done.


pkg/ccl/importccl/read_import_mysqlout_test.go, line 211 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

could this 3 be a named const?

Done.


pkg/ccl/importccl/read_import_mysqlout_test.go, line 295 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

i don't see where res[i][0] is verified

res[i][0] wasn't explicitly verified, was just being used as a label in case the other cols didn't line up, but I guess I can verify it too.


pkg/ccl/importccl/read_import_proc.go, line 342 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

all of these arguments should be named

Done.


pkg/util/strings.go, line 32 at r3 (raw file):

Previously, mjibson (Matt Jibson) wrote…

I think it was correct before. Add a descriptive comment if this is indeed intended. The length of a string is the number of bytes, not the number of runes. See https://play.golang.org/p/SE0ybC7jWoN

Done.


Comments from Reviewable

@maddyblue
Copy link
Contributor

Review status: 0 of 21 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/ccl/importccl/read_import_mysqlout.go, line 162 at r2 (raw file):

Previously, dt (David Taylor) wrote…

hm, there aren't actually many error returns mid-file -- I think it is just this one and then io errors from reading bytes from the reader. There are two error returns that can happen at end-of-file.

i'm specifically worried about the finished block above where it checks for expected enclosure chars and stuff. in any case can happen later.


pkg/ccl/importccl/read_import_mysqlout_test.go, line 34 at r2 (raw file):

Previously, dt (David Taylor) wrote…

We already vendor it: https://github.com/cockroachdb/cockroach/blob/master/Gopkg.lock#L413

TIL


pkg/ccl/importccl/read_import_mysqlout_test.go, line 245 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

i don't understand what's going on here. why don't we have to append \x to this?

reminder about this question


Comments from Reviewable

@dt
Copy link
Member Author

dt commented May 21, 2018

Review status: 0 of 21 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/ccl/importccl/read_import_mysqlout_test.go, line 245 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

reminder about this question

This is writing to a CSV, and later we read that CSV.

This was particularly useful when we were testing mysqlout->csv->cockroach. Now that we're doing mysqlout->cockroach, I'm not sure if we need to read and write csvs at all -- I kept them though since they're more human readable than the raw mysqlout.


Comments from Reviewable

dt added 2 commits May 21, 2018 17:46
Release note (sql change): Add support for MySQL's tabbed OUTFILE format to IMPORT.
@dt dt force-pushed the import-mysqlout branch from 30a1607 to b24d85e Compare May 21, 2018 17:47
@dt dt requested a review from a team May 21, 2018 17:47
@dt
Copy link
Member Author

dt commented May 21, 2018

bors r=mjibson

craig bot pushed a commit that referenced this pull request May 21, 2018
25615: importccl: Support mysql's OUTFILE format in IMPORT r=mjibson a=dt

This adds support for MySQL's OUTFILE (as created by `mysqldump --tab`) to IMPORT.

Co-authored-by: David Taylor <[email protected]>
@craig
Copy link
Contributor

craig bot commented May 21, 2018

Build succeeded

@craig craig bot merged commit b24d85e into cockroachdb:master May 21, 2018
@dt dt deleted the import-mysqlout branch June 5, 2018 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants