-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
importccl: Support mysql's OUTFILE format in IMPORT #25615
Conversation
Without this field it falls back to the legacy opts which assume CSV only. Release note: none.
fcece6a
to
80195e6
Compare
@@ -337,6 +337,12 @@ func newReadImportDataProcessor( | |||
return cp, nil | |||
} | |||
|
|||
type inputConverter interface { | |||
start(ctx context.Context, group *errgroup.Group) | |||
readFile(context.Context, io.Reader, int32, string, func(finished bool) error) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of these arguments should be named
|
||
"github.com/cockroachdb/cockroach/pkg/roachpb" | ||
"github.com/cockroachdb/cockroach/pkg/util/leaktest" | ||
_ "github.com/go-sql-driver/mysql" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this requires us to vendor. is there a way around doing that? like moving writeMysqlOutfileTestdata to a comment or a file with a build ignore tag? not sure. or should we just vendor it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already vendor it: https://github.com/cockroachdb/cockroach/blob/master/Gopkg.lock#L413
input io.Reader, | ||
inputIdx int32, | ||
inputName string, | ||
progressFn func(finished bool) error, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this type is used a bunch now and maybe should get its own type? don't care too much.
} | ||
|
||
// If we have a defined enclose char, check if we're starting or stopping | ||
// an enclosed field. Technically, the enclose char can be made mandatory, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the flag described in this comment d.opts.Enclose? If so fix this comment.
// so potentially we could have a flag to enforce that, returning an error | ||
// if it is missing, but we don't need to care for simply decoding. | ||
if d.opts.Enclose != roachpb.MySQLOutfileOptions_Never && c == d.opts.Encloser { | ||
readingField = !readingField |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if d.opts.Encloser appears multiple times in a row they will just keep flipping readingField. Is that intended and correct?
@@ -0,0 +1,312 @@ | |||
// Copyright 2018 The Cockroach Authors. | |||
// | |||
// Licensed under the Apache License, Version 2.0 (the "License"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be CCL?
{ | ||
name: "escape-and-enclose", | ||
opts: roachpb.MySQLOutfileOptions{ | ||
FieldSeparator: '\t', RowSeparator: '\n', HasEscape: true, Escape: '\\', Enclose: roachpb.MySQLOutfileOptions_Always, Encloser: '"', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these would be easier to read separated by newlines
ctx := context.TODO() | ||
for _, config := range configs { | ||
t.Run(config.name, func(t *testing.T) { | ||
converter := newMysqloutfileReader(ctx, nil, config.opts, nil, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this 3 be a named const?
// If the csv writing part of our conversion encodes as hex, we can | ||
// round-trip bytes though the go csv parser. | ||
orig := r[2] | ||
r[2] = hex.EncodeToString([]byte(r[2])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't understand what's going on here. why don't we have to append \x
to this?
expected = []byte(config.null) | ||
} | ||
|
||
actual := []byte(res[i][2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't see where res[i][0] is verified
return errors.Errorf("unsupported import format: %q", importStmt.FileFormat) | ||
} | ||
|
||
if format.Format != roachpb.IOFileFormat_CSV { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to move these checks to be per format instead of anything not CSV? When we add mysql SQL support we'll need another version bump and we'll have to split it apart there again.
} | ||
|
||
if expected != actual { | ||
t.Fatalf("expected rowi=%s string to be %q, got %q", row[0], expected, actual) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling? rowi
pkg/util/strings.go
Outdated
if r == utf8.RuneError { | ||
return 0, errors.Errorf("invalid character: %s", s) | ||
} | ||
if sz != len(s) { | ||
return r, errors.New("must be only one character") | ||
if l := len(s); 1 != l { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was correct before. Add a descriptive comment if this is indeed intended. The length of a string is the number of bytes, not the number of runes. See https://play.golang.org/p/SE0ybC7jWoN
Review status: 0 of 21 files reviewed at latest revision, 14 unresolved discussions. pkg/ccl/importccl/import_stmt.go, line 393 at r3 (raw file): Previously, mjibson (Matt Jibson) wrote…
I think we just need to check for not CSV: if all the nodes are aware of the IOFormat proto, they'll know to look at the enum, and correctly fail if they don't support a format. I added a better error for that in the processor. pkg/ccl/importccl/import_stmt_test.go, line 1181 at r3 (raw file): Previously, mjibson (Matt Jibson) wrote…
Done. pkg/ccl/importccl/read_import_mysqlout.go, line 47 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
Done. pkg/ccl/importccl/read_import_mysqlout.go, line 147 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
Done. pkg/ccl/importccl/read_import_mysqlout.go, line 151 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
Eh, that's more or less my read of the docs. They say it has to be escaped if it occurs in the content. pkg/ccl/importccl/read_import_mysqlout.go, line 162 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
hm, there aren't actually many error returns mid-file -- I think it is just this one and then io errors from reading bytes from the reader. There are two error returns that can happen at end-of-file. pkg/ccl/importccl/read_import_mysqlout_test.go, line 3 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
Done. pkg/ccl/importccl/read_import_mysqlout_test.go, line 170 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
Done. pkg/ccl/importccl/read_import_mysqlout_test.go, line 211 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
Done. pkg/ccl/importccl/read_import_mysqlout_test.go, line 295 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
res[i][0] wasn't explicitly verified, was just being used as a label in case the other cols didn't line up, but I guess I can verify it too. pkg/ccl/importccl/read_import_proc.go, line 342 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
Done. pkg/util/strings.go, line 32 at r3 (raw file): Previously, mjibson (Matt Jibson) wrote…
Done. Comments from Reviewable |
Review status: 0 of 21 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/ccl/importccl/read_import_mysqlout.go, line 162 at r2 (raw file): Previously, dt (David Taylor) wrote…
i'm specifically worried about the finished block above where it checks for expected enclosure chars and stuff. in any case can happen later. pkg/ccl/importccl/read_import_mysqlout_test.go, line 34 at r2 (raw file): Previously, dt (David Taylor) wrote…
TIL pkg/ccl/importccl/read_import_mysqlout_test.go, line 245 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
reminder about this question Comments from Reviewable |
Review status: 0 of 21 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/ccl/importccl/read_import_mysqlout_test.go, line 245 at r2 (raw file): Previously, mjibson (Matt Jibson) wrote…
This is writing to a CSV, and later we read that CSV. This was particularly useful when we were testing mysqlout->csv->cockroach. Now that we're doing mysqlout->cockroach, I'm not sure if we need to read and write csvs at all -- I kept them though since they're more human readable than the raw mysqlout. Comments from Reviewable |
Release note: none.
Release note (sql change): Add support for MySQL's tabbed OUTFILE format to IMPORT.
bors r=mjibson |
25615: importccl: Support mysql's OUTFILE format in IMPORT r=mjibson a=dt This adds support for MySQL's OUTFILE (as created by `mysqldump --tab`) to IMPORT. Co-authored-by: David Taylor <[email protected]>
Build succeeded |
This adds support for MySQL's OUTFILE (as created by
mysqldump --tab
) to IMPORT.