IMPORT's CSV parser error messages could be more helpful #25532
Labels
A-disaster-recovery
C-enhancement
This is a feature request.
I did not find anything with the search string 'csv is:open' that would address this request (apologies if I missed it).
I would like the CSV parser used by IMPORT to produce more informative error messages.
Given CSV files called states and counties, which look like (respectively):
And the following IMPORT statements:
I was able to get the following error messages from the CSV parser:
1. ERROR: could not parse " " as type int: strconv.ParseInt: parsing " ": invalid syntax
2. ERROR: https://logicgrimoire.files.wordpress.com/2018/05/states2.xls: row 58: expected 2 fields, got 3
In case 1, the offending line was the middle of
This seems like a very low-level error to expose to an end user. I was able to work out what was happening, but I think some users would have trouble unless they had previous programming experience. Ideally we could inform the user of a logical mismatch between the layout of the table they want to define and the file format they are passing in to the statement. In particular, if this were a large file (multi-GB), I would have had no idea where the error was or how to proceed, as this data is valid according to other CSV parsers but logically incorrect from the POV of IMPORT.
In case 2, the offending line was the middle of
However, this is not the actual line. I originally had a line that looked exactly like that but apparently had a non-printing whitespace (or something) that was causing the CSV parse to fail. (FWIW CRDB's is not the only CSV parser that reported an error. I was also able to replicate the CSV parse error using a Perl library.)
In any case, I think the error message for case 2 is much better (it told me where to look!) but in this case it might be too high-level in a way, since it doesn't tell me where exactly (character offset or so) the CSV parse failed for that line.
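To illustrate the kind of detail that would have helped in case 2, here is a minimal Python sketch that reports the line number and character offset of non-printing or unusual whitespace characters. The function name `find_suspicious_chars` and the sample input are hypothetical and made up for this example; this is not how IMPORT works internally, just a sketch of the information a more specific message could carry.

```python
import unicodedata


def find_suspicious_chars(text):
    """Return (line_number, column_offset, description) for each character
    that is a control/format character or a space other than plain ' '.

    Line numbers and column offsets are 1-based. Note that tabs are control
    characters and will be reported too.
    """
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate CRLF line endings
        for col, ch in enumerate(line, start=1):
            category = unicodedata.category(ch)
            # Cc = control, Cf = format (e.g. zero-width space),
            # Zs = space separator (e.g. no-break space)
            if category in ("Cc", "Cf") or (category == "Zs" and ch != " "):
                description = unicodedata.name(ch, f"U+{ord(ch):04X}")
                findings.append((line_no, col, description))
    return findings


if __name__ == "__main__":
    # Hypothetical data: a no-break space hides after "Alaska" on line 2.
    sample = "Alabama,1\nAlaska\u00a0,2\n"
    for line_no, col, description in find_suspicious_chars(sample):
        print(f"line {line_no}, offset {col}: {description}")
```

An error along the lines of "row 58, offset N: unexpected NO-BREAK SPACE" would have pointed straight at the problem without a round trip through a spreadsheet program.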
My workaround was to import my CSV into an Excel-compatible spreadsheet program and then re-export to CSV, after which things were fine in case 2 (but still not case 1 for reasons listed above).
The workaround was OK for my very small test data set, but might have been a blocker if I had several gigabytes of data to import. Also the first error message in particular would have probably blocked me completely with a larger/more complex data set given that there was no line number for me to investigate. (I could have written some custom tools to validate/fix my CSV to match the table layout, but at that point ... it would depend on how badly I wanted to use this DB.)
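For reference, the sort of custom validation tool I have in mind could be sketched roughly as follows, using Python's standard `csv` module. The function `validate_csv`, the column layout, and the sample rows are all assumptions invented for illustration; the point is only that each problem is reported with a line number, a column name, and the offending value.

```python
import csv
import io


def validate_csv(text, columns):
    """Check each CSV record against an expected table layout.

    `columns` is a list of (name, converter) pairs, e.g. ("id", int).
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    reader = csv.reader(io.StringIO(text, newline=""))
    for record in reader:
        line_no = reader.line_num  # last source line consumed for this record
        if len(record) != len(columns):
            problems.append(
                f"line {line_no}: expected {len(columns)} fields, "
                f"got {len(record)}: {record!r}"
            )
            continue
        for (name, convert), value in zip(columns, record):
            try:
                convert(value)
            except ValueError:
                problems.append(
                    f"line {line_no}, column {name!r}: "
                    f"could not parse {value!r} as {convert.__name__}"
                )
    return problems


if __name__ == "__main__":
    # Hypothetical table layout and data, not the files from this report.
    layout = [("id", int), ("name", str)]
    data = "1,Alabama\n ,Alaska\n3,Arizona,extra\n"
    for problem in validate_csv(data, layout):
        print(problem)
```

Note that Python's `int` is more lenient about surrounding whitespace than Go's `strconv.ParseInt`, so a real pre-check would need to match the target parser's rules.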