Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlbase: add round-trippable export and parsing for arrays and collated strings #44464

Merged
merged 1 commit into from
Feb 5, 2020

Conversation

rohany
Copy link
Contributor

@rohany rohany commented Jan 28, 2020

Work for #44322.
Fixes #33429.
Fixes #44414.
Closes #44669.

This PR adds support for ARRAY's and collated strings to be dumped with the
tree.FmtExport flag, which dumps datums in a string format that can be
round-tripped with sqlbase.ParseDatumStringAs.

This PR additionally performs some code cleanup and renaming
around various parsing functions within the tree package.
In particular, this PR removes the function tree.ParseStringAs which
was confusingly used sometimes instead of sqlbase.ParseDatumStringAs
due to it's handling of raw bytes. Instead, we introduce a new function
sqlbase.ParseDatumStringAsWithRawBytes to be more explicit about when
to use one parsing function over the other.

Release note (bug fix): This PR fixes bugs around cockroach dump and
IMPORT/EXPORT where columns of arrays or collated strings would
not be able to be roundtripped from cockroach to the dump and back to cockroach.

@rohany rohany requested review from justinj and knz January 28, 2020 22:22
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rohany
Copy link
Contributor Author

rohany commented Jan 28, 2020

I had to move ParseDatumStringAs out of tree to avoid an import cycle with parser.

Since our sql parser already does this work of parsing arrays, I used that instead of trying to adjust the existing array parser, which was meant for strictly casting strings to arrays (in pg compat style).

@rohany rohany force-pushed the fmt-export-array branch 4 times, most recently from 4ada21f to 99469eb Compare January 29, 2020 18:06
@awoods187
Copy link
Contributor

Thanks for putting this together @rohany. If possible, it would be great to get this reviewed and merged soon so that we can de-risk the cdc to upsert work needed for several customers.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat trust the code change to do what it says on the label. I would feel more comfortable with a more thorough test suite (The one included here is very "lightweight") but I also want to admit I don't have concrete recommendations. Ideally, someone (or multiple folk) from the Bulk I/O team could make recommendations about testing best practices.

Reviewed 13 of 13 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rohany)


pkg/sql/sem/tree/parse_string.go, line 22 at r1 (raw file):

// TODO (rohany): why do the various import functions depend on this
//  and not ParseDatumStringAs? What is the "power" of not parsing
//  the input string as bytes but just keeping it as raw bytes?

This is a good question. I think this question should get an answer as part of the review process. Ideally MattJ would answer but he's on leave. Maybe ask @dt who is most likely to remember/know about this.


pkg/sql/sem/tree/parse_string.go, line 50 at r1 (raw file):

//  because there are some things outside of this package in tree (like tree_test)
//  that need the functionality of this (basically the old ParseDatumStringAs).
//  However, I don't think the name is great.

Can you place more words in the comment to explain what this function assumes from its input parameters and what guarantees it provides on its output?

Based on that understanding we can work on a better name.


pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):

// TestParseDatumStringAs tests that datums are roundtrippable between
// printing with FmtExport and ParseDatumStringAs.
func TestParseDatumStringAs(t *testing.T) {

So this test is about ParseDatumStringAs. Where are the equivalent tests for the new function?

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)


pkg/sql/sem/tree/parse_string.go, line 50 at r1 (raw file):

Previously, knz (kena) wrote…

Can you place more words in the comment to explain what this function assumes from its input parameters and what guarantees it provides on its output?

Based on that understanding we can work on a better name.

Ack, i removed the original comment for some reason.

// parseStringAs parses s as type t for simple types. Arrays and collated
// strings are not handled. nil, nil is returned if t is not a supported type.

It differs from ParseStringAs in that it calls the bytes parsing method when types.Bytes is requested while ParseStringAs doesn't.


pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):

Previously, knz (kena) wrote…

So this test is about ParseDatumStringAs. Where are the equivalent tests for the new function?

What do you mean? ParseDatumStringAs and it's test already existed, I just moved it from tree/parse_string_test.go into here and added a case for arrays. There is a randomized test for it above in this same file.

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)


pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

What do you mean? ParseDatumStringAs and it's test already existed, I just moved it from tree/parse_string_test.go into here and added a case for arrays. There is a randomized test for it above in this same file.

*its

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rohany)


pkg/sql/sem/tree/parse_string.go, line 50 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Ack, i removed the original comment for some reason.

// parseStringAs parses s as type t for simple types. Arrays and collated
// strings are not handled. nil, nil is returned if t is not a supported type.

It differs from ParseStringAs in that it calls the bytes parsing method when types.Bytes is requested while ParseStringAs doesn't.

ParseAndRequireString? I think we already have XAndRequireY in other places?


pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

*its

thanks for pointing it out.

For curiosity where is ParseStringMustBe() tested?

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)


pkg/sql/sem/tree/parse_string.go, line 50 at r1 (raw file):

Previously, knz (kena) wrote…

ParseAndRequireString? I think we already have XAndRequireY in other places?

That makes sense to me.


pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):

Previously, knz (kena) wrote…

thanks for pointing it out.

For curiosity where is ParseStringMustBe() tested?

It was unexported before, and is used in ParseStringAs. ParseStringAs was used in a few places and tested there. However, I don't think it had any unit tests. The answer to the above question (about why using ParseStringAs rather than ParseDatumStringAs) might answer why that is the case.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)


pkg/sql/sqlbase/roundtrip_format_test.go, line 91 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

It was unexported before, and is used in ParseStringAs. ParseStringAs was used in a few places and tested there. However, I don't think it had any unit tests. The answer to the above question (about why using ParseStringAs rather than ParseDatumStringAs) might answer why that is the case.

Ok that makes sense. I'll be ok wrapping up this review once we have more insight about why this function wasn't used. Please approach your favorite TLs to get the information you need.

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @knz)


pkg/sql/sem/tree/parse_string.go, line 22 at r1 (raw file):

Previously, knz (kena) wrote…

This is a good question. I think this question should get an answer as part of the review process. Ideally MattJ would answer but he's on leave. Maybe ask @dt who is most likely to remember/know about this.

Hm, I think it might be a mistake? I see csv uses ParseDatumStringAs so I can't imagine the others actually intended to use something different. I'll try switching all the calls in importccl to be ParseDatumStringAs and see if anything fails.

@dt
Copy link
Member

dt commented Feb 4, 2020

I tried changing them and some of our mysql import tests failed with bytea encoded value ends with incomplete escape sequence. It looks like mysql dumps bytes in some relatively raw string that does not escape in a way our ParseBytes function, which ParseDatumStringAs calls, expected. It might be that we need to be using something (in vitess probably) do parse those, but in any case I don't expect to always be able to use our roundtrip functions against so foreign data source like mysql.

@knz
Copy link
Contributor

knz commented Feb 4, 2020 via email

@dt
Copy link
Member

dt commented Feb 4, 2020

We don't exactly use the same parsing code for mysql and pg -- there's a lot of mysql-specific lexing and some parsing -- but in some places, yes, we do indeed reuse our (pg) parsing functions to get to actual cockroach datums if they do the job, rather than re-implementing parallel versions of them. I think it makes sense to do so long as they work as is, and switch to a bespoke one only when we need different behavior.

@rohany
Copy link
Contributor Author

rohany commented Feb 4, 2020

This is RFAL (now that tests have passed)

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 13 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)


pkg/sql/sem/tree/parse_string.go, line 21 at r2 (raw file):

// unchanged. Otherwise s is parsed with the given type's Parse func. This function is
// used when the input string might be unescaped raw bytes, so we don't want to run
// a bytes parsing routine on that input.

Maybe you can refer to the other function in this comment and also explain in this comment the difference between the two.


pkg/sql/sem/tree/parse_string.go, line 29 at r2 (raw file):

		d = NewDBytes(DBytes(s))
	case types.CollatedStringFamily:
		d, err = NewDCollatedString(s, t.Locale(), &evalCtx.CollationEnv)

Collated strings can also contain escape sequences. Is this not supported?


pkg/sql/sqlbase/roundtrip_format.go, line 26 at r2 (raw file):

		return parseArray(evalCtx, t, s)
	case types.CollatedStringFamily:
		return tree.NewDCollatedString(s, t.Locale(), &evalCtx.CollationEnv)

huh are you sure about this one. That doesn't seem right. See my previous comment.

Also where are the tests for this.

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, RFAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)


pkg/sql/sem/tree/parse_string.go, line 21 at r2 (raw file):

Previously, knz (kena) wrote…

Maybe you can refer to the other function in this comment and also explain in this comment the difference between the two.

Done


pkg/sql/sem/tree/parse_string.go, line 29 at r2 (raw file):

Previously, knz (kena) wrote…

Collated strings can also contain escape sequences. Is this not supported?

Good eye -- this was existing code, but it was incorrect (especially for roundtripping). Updated and added to the randomized tester.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, but this PR is missing a couple of release notes. The bug fix should include a description of the symptoms a user would encounter prior to your fix.

Reviewed 6 of 6 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @knz, and @rohany)

@rohany rohany changed the title sqlbase: add round-trippable export and parsing for arrays sqlbase: add round-trippable export and parsing for arrays and collated strings Feb 5, 2020
@rohany
Copy link
Contributor Author

rohany commented Feb 5, 2020

Thanks for the careful review! Updated.

…ed strings

Work for cockroachdb#44322.
Fixes cockroachdb#33429.
Fixes cockroachdb#44414.
Closes cockroachdb#44669.

This PR adds support for ARRAY's and collated strings to be dumped with the
`tree.FmtExport` flag, which dumps datums in a string format that can be
round-tripped with `sqlbase.ParseDatumStringAs`.

This PR additionally performs some code cleanup and renaming
around various parsing functions within the `tree` package.
In particular, this PR removes the function `tree.ParseStringAs` which
was confusingly used sometimes instead of `sqlbase.ParseDatumStringAs`
due to it's handling of raw bytes. Instead, we introduce a new function
`sqlbase.ParseDatumStringAsWithRawBytes` to be more explicit about when
to use one parsing function over the other.

Release note (bug fix): This PR fixes bugs around `cockroach dump` and
`IMPORT`/`EXPORT` where columns of arrays or collated strings were not able
to be roundtripped from cockroach to the dump and back to cockroach.
@rohany
Copy link
Contributor Author

rohany commented Feb 5, 2020

bors r=knz

craig bot pushed a commit that referenced this pull request Feb 5, 2020
44464: sqlbase: add round-trippable export and parsing for arrays and collated strings r=knz a=rohany

Work for #44322.
Fixes #33429.
Fixes #44414.
Closes #44669.

This PR adds support for ARRAY's and collated strings to be dumped with the
`tree.FmtExport` flag, which dumps datums in a string format that can be
round-tripped with `sqlbase.ParseDatumStringAs`.

This PR additionally performs some code cleanup and renaming
around various parsing functions within the `tree` package.
In particular, this PR removes the function `tree.ParseStringAs` which
was confusingly used sometimes instead of `sqlbase.ParseDatumStringAs`
due to it's handling of raw bytes. Instead, we introduce a new function
`sqlbase.ParseDatumStringAsWithRawBytes` to be more explicit about when
to use one parsing function over the other.

Release note (bug fix): This PR fixes bugs around `cockroach dump` and
`IMPORT`/`EXPORT` where columns of arrays or collated strings would
not be able to be roundtripped from cockroach to the dump and back to cockroach.

Co-authored-by: Rohan Yadav <[email protected]>
@craig
Copy link
Contributor

craig bot commented Feb 5, 2020

Build succeeded

@craig craig bot merged commit dbfd995 into cockroachdb:master Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tree.FmtExport does not wrap strings with quotes importccl: EXPORT/IMPORT does not roundtrip arrays properly
5 participants