-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlbase: add round-trippable export and parsing for arrays and collated strings #44464
Conversation
I had to move Since our sql parser already does this work of parsing arrays, I used that instead of trying to adjust the existing array parser, which was meant for strictly casting strings to arrays (in pg compat style). |
4ada21f
to
99469eb
Compare
Thanks for putting this together @rohany. If possible, it would be great to get this reviewed and merged soon so that we can de-risk the cdc to upsert work needed for several customers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somewhat trust the code change to do what it says on the label. I would feel more comfortable with a more thorough test suite (The one included here is very "lightweight") but I also want to admit I don't have concrete recommendations. Ideally, someone (or multiple folk) from the Bulk I/O team could make recommendations about testing best practices.
Reviewed 13 of 13 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rohany)
pkg/sql/sem/tree/parse_string.go, line 22 at r1 (raw file):
// TODO (rohany): why do the various import functions depend on this // and not ParseDatumStringAs? What is the "power" of not parsing // the input string as bytes but just keeping it as raw bytes?
This is a good question. I think this question should get an answer as part of the review process. Ideally MattJ would answer but he's on leave. Maybe ask @dt who is most likely to remember/know about this.
pkg/sql/sem/tree/parse_string.go, line 50 at r1 (raw file):
// because there are some things outside of this package in tree (like tree_test) // that need the functionality of this (basically the old ParseDatumStringAs). // However, I don't think the name is great.
Can you place more words in the comment to explain what this function assumes from its input parameters and what guarantees it provides on its output?
Based on that understanding we can work on a better name.
pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):
// TestParseDatumStringAs tests that datums are roundtrippable between // printing with FmtExport and ParseDatumStringAs. func TestParseDatumStringAs(t *testing.T) {
So this test is about ParseDatumStringAs
. Where are the equivalent tests for the new function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)
pkg/sql/sem/tree/parse_string.go, line 50 at r1 (raw file):
Previously, knz (kena) wrote…
Can you place more words in the comment to explain what this function assumes from its input parameters and what guarantees it provides on its output?
Based on that understanding we can work on a better name.
Ack, i removed the original comment for some reason.
// parseStringAs parses s as type t for simple types. Arrays and collated
// strings are not handled. nil, nil is returned if t is not a supported type.
It differs from ParseStringAs
in that it calls the bytes parsing method when types.Bytes
is requested while ParseStringAs
doesn't.
pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):
Previously, knz (kena) wrote…
So this test is about
ParseDatumStringAs
. Where are the equivalent tests for the new function?
What do you mean? ParseDatumStringAs
and it's test already existed, I just moved it from tree/parse_string_test.go
into here and added a case for arrays. There is a randomized test for it above in this same file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)
pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
What do you mean?
ParseDatumStringAs
and it's test already existed, I just moved it fromtree/parse_string_test.go
into here and added a case for arrays. There is a randomized test for it above in this same file.
*its
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rohany)
pkg/sql/sem/tree/parse_string.go, line 50 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Ack, i removed the original comment for some reason.
// parseStringAs parses s as type t for simple types. Arrays and collated // strings are not handled. nil, nil is returned if t is not a supported type.
It differs from
ParseStringAs
in that it calls the bytes parsing method whentypes.Bytes
is requested whileParseStringAs
doesn't.
ParseAndRequireString
? I think we already have XAndRequireY
in other places?
pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
*its
thanks for pointing it out.
For curiosity where is ParseStringMustBe() tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)
pkg/sql/sem/tree/parse_string.go, line 50 at r1 (raw file):
Previously, knz (kena) wrote…
ParseAndRequireString
? I think we already haveXAndRequireY
in other places?
That makes sense to me.
pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):
Previously, knz (kena) wrote…
thanks for pointing it out.
For curiosity where is ParseStringMustBe() tested?
It was unexported before, and is used in ParseStringAs
. ParseStringAs
was used in a few places and tested there. However, I don't think it had any unit tests. The answer to the above question (about why using ParseStringAs
rather than ParseDatumStringAs
) might answer why that is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)
pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
It was unexported before, and is used in
ParseStringAs
.ParseStringAs
was used in a few places and tested there. However, I don't think it had any unit tests. The answer to the above question (about why usingParseStringAs
rather thanParseDatumStringAs
) might answer why that is the case.
Ok that makes sense. I'll be ok wrapping up this review once we have more insight about why this function wasn't used. Please approach your favorite TLs to get the information you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @knz)
pkg/sql/sem/tree/parse_string.go, line 22 at r1 (raw file):
Previously, knz (kena) wrote…
This is a good question. I think this question should get an answer as part of the review process. Ideally MattJ would answer but he's on leave. Maybe ask @dt who is most likely to remember/know about this.
Hm, I think it might be a mistake? I see csv
uses ParseDatumStringAs so I can't imagine the others actually intended to use something different. I'll try switching all the calls in importccl to be ParseDatumStringAs and see if anything fails.
I tried changing them and some of our mysql import tests failed with |
Can you explain why we use the same parsing code for pg and mysql data?
Do you think there's a chance our pg parsing code has ever been tweaked to become less strict to accommodate mysql input?
(if so that would be a problem that needs fixing)
I think the proximate way forward is to split the code paths, make a special case for mysql that needs it then fix /homogenize the pg case.
--
Verstuurd vanaf mijn Android apparaat met K-9 Mail. Excuseer mijn beknoptheid.
|
99469eb
to
cc6ad2d
Compare
cc6ad2d
to
3ca040f
Compare
We don't exactly use the same parsing code for mysql and pg -- there's a lot of mysql-specific lexing and some parsing -- but in some places, yes, we do indeed reuse our (pg) parsing functions to get to actual cockroach datums if they do the job, rather than re-implementing parallel versions of them. I think it makes sense to do so long as they work as is, and switch to a bespoke one only when we need different behavior. |
This is RFAL (now that tests have passed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 13 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)
pkg/sql/sem/tree/parse_string.go, line 21 at r2 (raw file):
// unchanged. Otherwise s is parsed with the given type's Parse func. This function is // used when the input string might be unescaped raw bytes, so we don't want to run // a bytes parsing routine on that input.
Maybe you can refer to the other function in this comment and also explain in this comment the difference between the two.
pkg/sql/sem/tree/parse_string.go, line 29 at r2 (raw file):
d = NewDBytes(DBytes(s)) case types.CollatedStringFamily: d, err = NewDCollatedString(s, t.Locale(), &evalCtx.CollationEnv)
Collated strings can also contain escape sequences. Is this not supported?
pkg/sql/sqlbase/roundtrip_format.go, line 26 at r2 (raw file):
return parseArray(evalCtx, t, s) case types.CollatedStringFamily: return tree.NewDCollatedString(s, t.Locale(), &evalCtx.CollationEnv)
huh are you sure about this one. That doesn't seem right. See my previous comment.
Also where are the tests for this.
3ca040f
to
899796b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, RFAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)
pkg/sql/sem/tree/parse_string.go, line 21 at r2 (raw file):
Previously, knz (kena) wrote…
Maybe you can refer to the other function in this comment and also explain in this comment the difference between the two.
Done
pkg/sql/sem/tree/parse_string.go, line 29 at r2 (raw file):
Previously, knz (kena) wrote…
Collated strings can also contain escape sequences. Is this not supported?
Good eye -- this was existing code, but it was incorrect (especially for roundtripping). Updated and added to the randomized tester.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, but this PR is missing a couple of release notes. The bug fix should include a description of the symptoms a user would encounter prior to your fix.
Reviewed 6 of 6 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)
899796b
to
c741f0f
Compare
c741f0f
to
ae7fc1b
Compare
Thanks for the careful review! Updated. |
ae7fc1b
to
134c2f6
Compare
…ed strings Work for cockroachdb#44322. Fixes cockroachdb#33429. Fixes cockroachdb#44414. Closes cockroachdb#44669. This PR adds support for ARRAY's and collated strings to be dumped with the `tree.FmtExport` flag, which dumps datums in a string format that can be round-tripped with `sqlbase.ParseDatumStringAs`. This PR additionally performs some code cleanup and renaming around various parsing functions within the `tree` package. In particular, this PR removes the function `tree.ParseStringAs` which was confusingly used sometimes instead of `sqlbase.ParseDatumStringAs` due to it's handling of raw bytes. Instead, we introduce a new function `sqlbase.ParseDatumStringAsWithRawBytes` to be more explicit about when to use one parsing function over the other. Release note (bug fix): This PR fixes bugs around `cockroach dump` and `IMPORT`/`EXPORT` where columns of arrays or collated strings were not able to be roundtripped from cockroach to the dump and back to cockroach.
134c2f6
to
dbfd995
Compare
bors r=knz |
44464: sqlbase: add round-trippable export and parsing for arrays and collated strings r=knz a=rohany Work for #44322. Fixes #33429. Fixes #44414. Closes #44669. This PR adds support for ARRAY's and collated strings to be dumped with the `tree.FmtExport` flag, which dumps datums in a string format that can be round-tripped with `sqlbase.ParseDatumStringAs`. This PR additionally performs some code cleanup and renaming around various parsing functions within the `tree` package. In particular, this PR removes the function `tree.ParseStringAs` which was confusingly used sometimes instead of `sqlbase.ParseDatumStringAs` due to it's handling of raw bytes. Instead, we introduce a new function `sqlbase.ParseDatumStringAsWithRawBytes` to be more explicit about when to use one parsing function over the other. Release note (bug fix): This PR fixes bugs around `cockroach dump` and `IMPORT`/`EXPORT` where columns of arrays or collated strings would not be able to be roundtripped from cockroach to the dump and back to cockroach. Co-authored-by: Rohan Yadav <[email protected]>
Build succeeded |
Work for #44322.
Fixes #33429.
Fixes #44414.
Closes #44669.
This PR adds support for ARRAY's and collated strings to be dumped with the
tree.FmtExport
flag, which dumps datums in a string format that can beround-tripped with
sqlbase.ParseDatumStringAs
.This PR additionally performs some code cleanup and renaming
around various parsing functions within the
tree
package.In particular, this PR removes the function
tree.ParseStringAs
whichwas confusingly used sometimes instead of
sqlbase.ParseDatumStringAs
due to it's handling of raw bytes. Instead, we introduce a new function
sqlbase.ParseDatumStringAsWithRawBytes
to be more explicit about whento use one parsing function over the other.
Release note (bug fix): This PR fixes bugs around
cockroach dump
andIMPORT
/EXPORT
where columns of arrays or collated strings wouldnot be able to be roundtripped from cockroach to the dump and back to cockroach.