Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: pg_dump text array format not compatible with COPY #56593

Closed
nbir opened this issue Nov 12, 2020 · 5 comments
Closed

sql: pg_dump text array format not compatible with COPY #56593

nbir opened this issue Nov 12, 2020 · 5 comments
Assignees
Labels
O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Comments

@nbir
Copy link

nbir commented Nov 12, 2020

The Problem

When importing a PostgreSQL dump file using the COPY statement (instead of INSERT), text array columns, as represented by PostgreSQL, are not compatible with CockroachDB.

Using pg_dump, a column of datatype text[], would be exported as {foobar,baz}.

CockroachDB IMPORT expects it to be '{"foobar","baz"}' (quoted strings). However, this format is incompatible when restoring into PostgreSQL using psql.

The only formats compatible with restoring into PostgreSQL are {foobar,baz} and {"foobar","baz"}.

Expected Behavior

To maintain compatibility of input formats between PostgreSQL and CockroachDB, we would expect CockroachDB IMPORT to work with PostgreSQL dump as is. CockroachDB IMPORT should accept text[] fields in the {foobar,baz} format when using the COPY statement.

Known Workaround

We can get around this issue by using pg_dump with the --inserts flag. This generates a separate INSERT command for each row instead of a single COPY statement. This approach has two drawbacks:

  1. Restores are much slower.
  2. Restores not atomic for an entire table if a row fails to insert.

Another workaround is to hand-manipulate (as shown below) or programmatically manipulate the PostgreSQL dump file to modify text array columns to match the format expected by CockroachDB.

To Reproduce

Files: files.zip

Key Steps

Requires CockroachDB running locally (using this guide).

  1. Using the CockroachDB SQL client create a test_text_array database.
root@localhost:26257/defaultdb> create database test_text_array;
CREATE DATABASE


root@localhost:26257/defaultdb> use test_text_array;
SET
  1. Using the attached pg_dump.sql file, run the following command in the CockroachDB SQL client. This step will fail because of the incompatible format.
root@localhost:26257/test_text_array> import pgdump 'nodelocal://1/pg_dump.sql';
ERROR: nodelocal://1/pg_dump.sql: row 2: parse "text_array_column" as STRING[]: at or near "{": syntax error
SQLSTATE: 42601
DETAIL: source SQL:
SET ROW ({foobar})
         ^
HINT: try \h SET SESSION

Entire Process

Requires PostgreSQL and CockroachDB running locally (using this guide).

Creating a dump from PostgreSQL

  1. Create a table in PostgreSQL with a text array column (datatype text[]) and insert some data by running the attached pg_init.sql file.
$ psql postgres -f pg_init.sql
  1. Create a dump of the database using pg_dump by running the following command.
$ pg_dump test_text_array > ~/node1/extern/test_text_array.sql
  1. Using the CockroachDB SQL client create a test_text_array database.
root@localhost:26257/defaultdb> create database test_text_array;
CREATE DATABASE

root@localhost:26257/defaultdb> use test_text_array;
SET
  1. Import the PostgreSQL dump into CockroachDB. This step will fail because of the incompatible format.
root@localhost:26257/test_text_array> import pgdump 'nodelocal://1/test_text_array.sql';
ERROR: nodelocal://1/test_text_array.sql: row 2: parse "text_array_column" as STRING[]: at or near "{": syntax error
SQLSTATE: 42601
DETAIL: source SQL:
SET ROW ({foobar})
         ^
HINT: try \h SET SESSION

Hand-manipulate dump file to be compatible with CockroachDB

Manually add quotes to text array fields. e.g. change {foobar,baz} in the PostgreSQL dump file to '{"foobar","baz"}'.

  1. Import the hand-manipulated dump into CockroachDB by using the attached pg_dump_modified_for_crdb.sql file. This import should work without any errors.
root@localhost:26257/test_text_array> import pgdump 'nodelocal://1/pg_dump_modified_for_crdb.sql';
        job_id       |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+--------
  606419473928945665 | succeeded |                  1 |    4 |             0 |   105
(1 row)

Restore hand-manipulated dump file into PostgreSQL

Delete the existing test_text_array table in PostgreSQL and import the hand-manipulated dump file that is compatible with CockroachDB, into PostgreSQL.

  1. Using the attached pg_dump_modified_for_crdb.sql file, run the following command. This fails because the quoted string format for text arrays that is compatible with CockroachDB is not compatible with PostgreSQL.
$ psql test_text_array < ~/node1/extern/pg_dump_modified_for_crdb.sql

CREATE TABLE
ERROR:  malformed array literal: "'{"foobar"}'"
DETAIL:  Array value must start with "{" or dimension information.
CONTEXT:  COPY test_text_array, line 2, column text_array_column: "'{"foobar"}'"

Environment:

  • CockroachDB version v20.2.0
  • OS: macOS 10.15.7
@blathers-crl
Copy link

blathers-crl bot commented Nov 12, 2020

Hello, I am Blathers. I am here to help you get the issue triaged.

It looks like you have not filled out the issue in the format of any of our templates. To best assist you, we advise you to use one of these templates.

I have CC'd a few people who may be able to assist you:

  • @solongordon (found keywords: pg_)
  • @rafiss (found keywords: pg_)
  • @cockroachdb/bulk-io (found keywords: IMPORT,export,Restore)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Nov 12, 2020
@jordanlewis jordanlewis changed the title PostgreSQL pg_dump text array format not compatible with CockroachDB IMPORT when using COPY statement sql: pg_dump text array format not compatible with CockroachDB IMPORT when using COPY statement Nov 12, 2020
@jordanlewis jordanlewis changed the title sql: pg_dump text array format not compatible with CockroachDB IMPORT when using COPY statement sql: pg_dump text array format not compatible with COPY Nov 12, 2020
@jordanlewis
Copy link
Member

Thanks for filing this, @nbir.

@rafiss
Copy link
Collaborator

rafiss commented Nov 12, 2020

Noting stuff here since I can't fully test everything yet:

Seems like the issue is in roundtrip_format.go

func ParseDatumStringAs(t *types.T, s string, evalCtx *tree.EvalContext) (tree.Datum, error) {
	switch t.Family() {
	// We use a different parser for array types because ParseAndRequireString only parses
	// the internal postgres string representation of arrays.
	case types.ArrayFamily, types.CollatedStringFamily:
		return parseAsTyp(evalCtx, t, s)
	default:
		res, _, err := tree.ParseAndRequireString(t, s, evalCtx)
		return res, err
	}
}

If we were to use the default case here, it actually works fine. From what I can see, this was added in #44464 in order to make cockroach dump/import round-trip correctly. But I don't think we really need to use ParseDatumStringAs for COPY.

It seems to work if I change copy.go to use ParseAndRequireString instead of ParseDatumStringAsWithRawBytes. Going to test that a bit more.

@rafiss
Copy link
Collaborator

rafiss commented Nov 13, 2020

This also is covered in #29043

craig bot pushed a commit that referenced this issue Jan 5, 2021
58244: importccl: Fixed COPY FROM STDIN compatibility for IMPORT pgdump array data r=miretskiy a=mokaixu

Previously, when users imported pgdumps that contained
populated tables with array (ex. text[]) column types,
a parsing error would occur when data was populated with
COPY FROM STDIN.

We currently support INSERTing array data using the ARRAY
function or casting it from a string when the
data is enclosed in single quotes (ex. '{3,4}').

Now, we can IMPORT pgdump a file that COPY FROM STDIN
array data (Note that the PGdumped array will
be formatted with curly braces without quotes ({3,4}).

Release note: bug fix
Parsing errors are no longer thrown when importing a pgdump file
with array data.

Fixes: #29043 , #56593

Co-authored-by: Monica Xu <[email protected]>
@rafiss
Copy link
Collaborator

rafiss commented Feb 5, 2021

closed by #58244

@rafiss rafiss closed this as completed Feb 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-community Originated from the community X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants