Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tree.FmtExport does not wrap strings with quotes #44414

Closed
rohany opened this issue Jan 27, 2020 · 6 comments · Fixed by #44464
Closed

tree.FmtExport does not wrap strings with quotes #44414

rohany opened this issue Jan 27, 2020 · 6 comments · Fixed by #44464
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@rohany
Copy link
Contributor

rohany commented Jan 27, 2020

Found during investigation in #44322.

tree.DNull and "NULL" serialize to the same string because tree.FmtExport does not wrap strings with quotes.

ARRAY['hel, lo'] serializes to "ARRAY[hel,lo]", causing a future parse error.

@rohany rohany added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 27, 2020
@rohany
Copy link
Contributor Author

rohany commented Jan 27, 2020

Once this is fixed, the array parser pkg/sql/sem/tree/parse_array.go will have to be updated to handle single quotes on the values.

It seems like the following should be done:

  • When formatting an array, all strings within the array should be quoted.
  • When the array parser reads a NULL, it outputs tree.DNull, because NULLs won't be quoted.
  • If the parser reads "NULL" or 'NULL' it knows that this is a string, not NULL.

@knz
Copy link
Contributor

knz commented Jan 28, 2020

I think it's more complicated than that.
We are following pg's array serialization rules for the pgwire text format, because that is what is needed for several ORMs.

We used to always quote strings-in-arrays, but then things would break. It turns out that some clients expect strings that don't contain special characters to not get a quote on serialization.

Rohan I fear the problem you were set to solve initially is not completely trivial. You're trying to find shortcuts around a fundamental design mistake in our SQL code: we're trying to share parsing routines between at least 3 different "channels" of data:

  1. the pgwire text protocol
  2. the pgwire dump/restore format
  3. possibly other things, perhaps CDC (I'm not sure)

I am not 100% sure that (1) and (2) even in pg use the same routines. This needs to be double checked. FYI when crdb's own pgwire code and dump/restore were implemented, we did not carry out this analysis and things happened to "just work" (until they didn't. We were in luck perhaps?).

I'm also pretty sure that (3) should not be using the same code.

In an ideal world, we'd have a library of pairs of builtin functions in SQL like in pg, for byte and text serialization, then we'd use these routines the way they are used in pg. But I'm not sure how much work there is between where we are now and there.

@knz
Copy link
Contributor

knz commented Jan 28, 2020

NB: my comment about what we were originally doing with array refers to the {...} format when converting an array to string (for pgwire and dump).

I don't actually know what part of the code would get you ARRAY[hel,lo]. That sounds like something that wasn't there. On my machine I see this:

[email protected]:15490/movr> explain(verbose) select array['hel','lo'];
   tree  |     field     |    description    |  columns  | ordering
---------+---------------+-------------------+-----------+-----------
         | distributed   | false             |           |
         | vectorized    | false             |           |
  values |               |                   | ("array") |
         | size          | 1 column, 1 row   |           |
         | row 0, expr 0 | ARRAY['hel','lo'] |           |

See the expression at the bottom, the ARRAY[] is properly pretty-printed so I'm expecting this should work already.

@rohany
Copy link
Contributor Author

rohany commented Jan 28, 2020

Ok, let me try to investigate this more.

I don't actually know what part of the code would get you ARRAY[hel,lo]. That sounds like something that wasn't there. On my machine I see this:

If you format an array with FmtExport that will happen (i tried it in pkg/sql/sem/tree/parse_array_test.go). I'm not sure if the explain verbose uses that flag.

@rohany
Copy link
Contributor Author

rohany commented Jan 28, 2020

Rohan I fear the problem you were set to solve initially is not completely trivial. You're trying to find shortcuts around a fundamental design mistake in our SQL code: we're trying to share parsing routines between at least 3 different "channels" of data:

I believe I'm looking at a simpler problem though -- Just in the case of FmtExport. We only need to ensure that our own output is round trippable with ourselves, not postgres. As context, this work is being done to ensure that we have a format that we can always round trip with ourselves for 2DC and other things.

@knz
Copy link
Contributor

knz commented Jan 28, 2020

all right carry on 👍

craig bot pushed a commit that referenced this issue Feb 5, 2020
44464: sqlbase: add round-trippable export and parsing for arrays and collated strings r=knz a=rohany

Work for #44322.
Fixes #33429.
Fixes #44414.
Closes #44669.

This PR adds support for ARRAY's and collated strings to be dumped with the
`tree.FmtExport` flag, which dumps datums in a string format that can be
round-tripped with `sqlbase.ParseDatumStringAs`.

This PR additionally performs some code cleanup and renaming
around various parsing functions within the `tree` package.
In particular, this PR removes the function `tree.ParseStringAs` which
was confusingly used sometimes instead of `sqlbase.ParseDatumStringAs`
due to it's handling of raw bytes. Instead, we introduce a new function
`sqlbase.ParseDatumStringAsWithRawBytes` to be more explicit about when
to use one parsing function over the other.

Release note (bug fix): This PR fixes bugs around `cockroach dump` and
`IMPORT`/`EXPORT` where columns of arrays or collated strings would
not be able to be roundtripped from cockroach to the dump and back to cockroach.

Co-authored-by: Rohan Yadav <[email protected]>
@craig craig bot closed this as completed in dbfd995 Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants