-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
importer: fix scanning of newlines in enclosed fields for DELIMITED DATA #97545
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
1e60ec9
to
6aa2c3a
Compare
6aa2c3a
to
bec4f76
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, glad this could get fixed! i had some small comments
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andyyang890)
pkg/sql/importer/read_import_mysqlout.go
line 87 at r1 (raw file):
row []rune err error eof bool
nice cleanup! this eof
field really seems needlessly complicated
pkg/sql/importer/read_import_mysqlout.go
line 96 at r1 (raw file):
var r rune var w int var gotEncloser, inEnclosedField, inEscapeSeq, inField bool
nit: could you add a small descriptive comment for each one of these flags? i think it will make the logic of the for loop easier to understand. specifically, the definitions of "encloser" and the difference between inEnclosedField vs inField would help.
pkg/sql/importer/read_import_mysqlout.go
line 101 at r1 (raw file):
r, w, d.err = d.reader.ReadRune() if d.err == io.EOF { d.err = nil
should we add the if nextLiteral
check here and set ErrUnexpectedEOF
if true?
pkg/sql/importer/read_import_mysqlout.go
line 141 at r1 (raw file):
} inField = !(r == d.opts.FieldSeparator)
nit: use r != d.opts.FieldSeparator
Previously, when using `IMPORT INTO ... DELIMITED DATA`, unescaped newlines within quoted fields were treated as row terminators instead of as part of the field. This patch refactors the scanning logic to correctly interpret these newlines. Release note (bug fix): `IMPORT INTO ... DELIMITED DATA` will now correctly handle quoted fields that contain unescaped newlines.
bec4f76
to
748a496
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/importer/read_import_mysqlout.go
line 96 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: could you add a small descriptive comment for each one of these flags? i think it will make the logic of the for loop easier to understand. specifically, the definitions of "encloser" and the difference between inEnclosedField vs inField would help.
Done.
pkg/sql/importer/read_import_mysqlout.go
line 101 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
should we add the
if nextLiteral
check here and setErrUnexpectedEOF
if true?
I think that would actually be incorrect since it would disallow ending a file with the escape character. From my reading of the logic before my change, the check here would never actually be reached since it would mean d.row == nil
(i.e. the row is empty) yet nextLiteral
is true (i.e. the row had an escape character), which is why I deleted it.
pkg/sql/importer/read_import_mysqlout.go
line 141 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: use
r != d.opts.FieldSeparator
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, the comments help me understand better now. i just had one request for another test, and should be good to merge after that
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andyyang890)
pkg/sql/importer/read_import_mysqlout.go
line 101 at r1 (raw file):
I think that would actually be incorrect since it would disallow ending a file with the escape character
hm, but it seems correct to disallow ending a file with the escape character. at least, no matter what we decide, let's have a test case for that situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/importer/read_import_mysqlout.go
line 101 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
I think that would actually be incorrect since it would disallow ending a file with the escape character
hm, but it seems correct to disallow ending a file with the escape character. at least, no matter what we decide, let's have a test case for that situation.
Oh you're right, this is indeed an error case and there's already a test for it here:
cockroach/pkg/sql/importer/import_stmt_test.go
Lines 478 to 487 in 1c3e46d
{ | |
name: "unmatched literal", | |
create: `i int8`, | |
with: `WITH fields_escaped_by = '\'`, | |
typ: "DELIMITED", | |
data: `\`, | |
err: "row 1: unmatched literal", | |
rejected: "\\\n", | |
query: map[string][][]string{`SELECT * from t`: {}}, | |
}, |
I believe the check for this is deferred to FillDatums
to return a more informative error message (i.e. unmatched literal
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for checking! lgtm
TFTR! bors r=rafiss |
Build succeeded: |
Previously, when using
IMPORT INTO ... DELIMITED DATA
, unescapednewlines within quoted fields were treated as row terminators instead
of as part of the field. This patch refactors the scanning logic to
correctly interpret these newlines.
Fixes #95906
Release note (bug fix):
IMPORT INTO ... DELIMITED DATA
will nowcorrectly handle quoted fields that contain unescaped newlines.