Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMPORT's CSV parser error messages could be more helpful #25532

Closed
rmloveland opened this issue May 15, 2018 · 1 comment
Closed

IMPORT's CSV parser error messages could be more helpful #25532

rmloveland opened this issue May 15, 2018 · 1 comment
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@rmloveland
Copy link
Collaborator

This is a feature request.

  1. Does an issue already exist addressing this request? If yes, please add a 👍 reaction to the existing issue. If not, move on to step 2.

I did not find anything with the search string 'csv is:open' that would address this request (apologies if I missed it).

  1. Please describe the feature you are requesting, as well as your proposed use case for this feature.

I would like the CSV parser used by IMPORT to produce more informative error messages.

Given CSV files called states and counties, which look like (respectively):

# id, state_name
0,Alabama
1,Alaska
3,Arizona
4,Arkansas
5,California
6,Colorado
7,Connecticut
...

# id, county_name, state_id (FK)
0,ABBEVILLE,46
1,ACADIA,20
2,ACCOMACK,56
3,ADA,14
4,ADAIR,17
5,ADAIR,19
6,ADAIR,28
...

And the following IMPORT statements:

IMPORT TABLE states (
  id INT PRIMARY KEY,
  name VARCHAR
) CSV DATA ('https://logicgrimoire.files.wordpress.com/2018/05/states3.xls') WITH comment = "#";

IMPORT TABLE counties (
  id INT PRIMARY KEY,
  name VARCHAR,
  state INT REFERENCES states (id)
) CSV DATA ('https://logicgrimoire.files.wordpress.com/2018/05/counties2.xls') WITH comment = "#";

I was able to get the following error messages from the CSV parser:

  1. ERROR: could not parse " " as type int: strconv.ParseInt: parsing " ": invalid syntax
  2. ERROR: https://logicgrimoire.files.wordpress.com/2018/05/states2.xls: row 58: expected 2 fields, got 3

In case 1, the offending line was the middle of

60,AMELIA,56
61,AMERICAN SAMOA, 
62,AMHERST,56

This seems like a very low-level error to expose to an end user. I was able to work out what was happening, but I think some users would have trouble unless they had previous programming experience. Ideally we could inform the user of a logical mismatch between the layout of the table they want to define and the file format they are passing in to the statement. In particular if this were a large file (multi-GB) I would have had no idea where the error was or how to proceed, as this data is valid according to other CSV parsers but logically incorrect from the POV of IMPORT.

In case 2, the offending line was the middle of

57, Washington
58, "Washington, D.C."
59, West Virginia

However, this is not the actual line. I originally had a line that looked exactly like that but apparently had a non-printing whitespace (or something) that was causing the CSV parse to fail. (FWIW CRDB's is not the only CSV parser that reported an error. I was also able to replicate the CSV parse error using a Perl library.)

In any case, I think the error message for case 2 is much better (it told me where to look!) but in this case it might be too high-level in a way, since it doesn't tell me where exactly (character offset or so) the CSV parse failed for that line.

  1. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

My workaround was to import my CSV into an Excel-compatible spreadsheet program and then re-export to CSV, after which things were fine in case 2 (but still not case 1 for reasons listed above).

The workaround was OK for my very small test data set, but might have been a blocker if I had several gigabytes of data to import. Also the first error message in particular would have probably blocked me completely with a larger/more complex data set given that there was no line number for me to investigate. (I could have written some custom tools to validate/fix my CSV to match the table layout, but at that point ... it would depend on how badly I wanted to use this DB.)

@rmloveland
Copy link
Collaborator Author

(PS: just realized the SQL pasted above errors due to foreign keys not being supported for CSV import - please ignore that, this issue is only about CSV. I was able to import using updated SQL.)

@maddyblue maddyblue added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label May 15, 2018
craig bot pushed a commit that referenced this issue May 24, 2018
26032: importccl: append file/row notes to IMPORT error messages r=mjibson a=mjibson

Fixes #25532

Release note (sql change): improve IMPORT error messages

26043: sql: Add test showing manual column type change r=bobvawter a=bobvawter

This test verifies that it's possible to effect a non-trivial column type
change using the current state of the system and verifies the planned strategy
for ALTER COLUMN TYPE.

Release note: None

Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Bob Vawter <[email protected]>
@craig craig bot closed this as completed in #26032 May 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

2 participants