Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CustomSeparated is confused without quotes #42352

Closed
stas-sl opened this issue Oct 16, 2022 · 1 comment · Fixed by #43332
Closed

CustomSeparated is confused without quotes #42352

stas-sl opened this issue Oct 16, 2022 · 1 comment · Fixed by #43332
Assignees
Labels
potential bug To be reviewed by developers and confirmed/rejected.

Comments

@stas-sl
Copy link

stas-sl commented Oct 16, 2022

Tried on CH version 22.10.1.1232

This is pretty similar to --input-format CSV, but using '|' as row delimiter instead of '\n'. Doesn't work.

echo -ne "a,b|c,d|" | clickhouse-local --structure "a String, b String"  --input-format CustomSeparated --format_custom_escaping_rule=CSV --format_custom_field_delimiter=',' --format_custom_row_after_delimiter='|' -q 'select * from table'

Code: 27. DB::Exception: Cannot parse input: expected '|' before: ',d|':
Row 1:
Column 0,   name: a, type: String, parsed text: "a"
Column 1,   name: b, type: String, parsed text: "b|c"
ERROR: There is no delimiter after last field: expected "|", got ",d|"

: While executing CustomSeparatedRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)

This is like --input-format TSV, but using ',' instead of '\t' for delimiting columns. Doesn't work.

echo -ne "a,b\nc,d\n" | clickhouse-local --structure "a String, b String"  --input-format CustomSeparated --format_custom_escaping_rule=Escaped --format_custom_field_delimiter=',' --format_custom_row_after_delimiter=$'\n' -q 'select * from table'

Code: 27. DB::Exception: Cannot parse input: expected ',' before: '\nc,d\n':
Row 1:
Column 0,   name: a, type: String, parsed text: "a,b"
ERROR: There is no delimiter between fields: expected ",", got "<LINE FEED>c,d<LINE FEED>"

: While executing CustomSeparatedRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)

If we fully emulate --input-format TSV using CustomSeparated, then it works without quotes. The only difference from the 2nd query is that we use original TSV - '\t' separator for fields, instead of custom one.

echo -ne "a\tb\nc\td\n" | clickhouse-local --structure "a String, b String"  --input-format CustomSeparated --format_custom_escaping_rule=Escaped --format_custom_field_delimiter=$'\t' --format_custom_row_after_delimiter=$'\n' -q 'select * from table'
a       b
c       d

Emulating CSV using CustomSeparated also works, no quotes needed. The only difference from the 1st query is that we are using original CSV - '\n' line separator.

echo -ne "a,b\nc,d\n" | clickhouse-local --structure "a String, b String"  --input-format CustomSeparated --format_custom_escaping_rule=CSV --format_custom_field_delimiter=',' --format_custom_row_after_delimiter=$'\n' -q 'select * from table'
a       b
c       d

Just --input-format CSV also works

echo -ne "a,b\nc,d\n" | clickhouse-local --structure "a String, b String"  --input-format CSV -q 'select * from table'

If we really want to use custom separators that are different from standard CSV or TSV, we should use --format_custom_escaping_rule=CSV and wrap strings in double quotes

echo -ne '"a","b"|"c","d"|' | clickhouse-local --structure "a String, b String"  --input-format CustomSeparated --format_custom_escaping_rule=CSV --format_custom_field_delimiter=',' --format_custom_row_after_delimiter='|' -q 'select * from table'
a       b
c       d

Or use single quotes with --format_custom_escaping_rule=Quoted

echo -ne "'a','b'|'c','d'|" | clickhouse-local --structure "a String, b String"  --input-format CustomSeparated --format_custom_escaping_rule=Quoted --format_custom_field_delimiter=',' --format_custom_row_after_delimiter='|' -q 'select * from table'

Maybe I'm missing something, but I would expect first 2 commands to work without adding quotes, as there are just simple strings without spaces or fancy characters.

@stas-sl stas-sl added the potential bug To be reviewed by developers and confirmed/rejected. label Oct 16, 2022
@tavplubix
Copy link
Member

tavplubix commented Oct 17, 2022

The bug was probably introduced in #31434:

echo -ne "a,b|c,d|" | clickhouse_21.10.1.8009 local --structure "a String, b String"  --input-format CustomSeparated --format_custom_escaping_rule=CSV --format_custom_field_delimiter=',' --format_custom_row_after_delimiter='|' -q 'select * from table'
a	b
c	d

echo -ne "a,b|c,d|" | clickhouse_21.12.1.9017 local --structure "a String, b String"  --input-format CustomSeparated --format_custom_escaping_rule=CSV --format_custom_field_delimiter=',' --format_custom_row_after_delimiter='|' -q 'select * from table' 
Code: 27. DB::Exception: Cannot parse input: expected '|' before: ',d|': 
Row 1:
Column 0,   name: a, type: String, parsed text: "a"
Column 1,   name: b, type: String, parsed text: "b|c"
ERROR: There is no delimiter after last field: expected "|", got ",d|"

: While executing CustomSeparatedRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)

Workaround: it works with Template format (https://clickhouse.com/docs/en/sql-reference/formats/#format-template):

echo -ne "\${a:CSV},\${b:CSV}|" > format.template
echo -ne "a,b|c,d|" | clickhouse-local --structure "a String, b String"  --input-format Template --format_template_row ./format.template --format_template_rows_between_delimiter '' -q 'select * from table'
a	b
c	d

The second case does not work (even with Template), because Escaped serialization require String fields to end with \t or \n, custom separators are not supported for TSV/Escaped(I will improve error message for Template format)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
potential bug To be reviewed by developers and confirmed/rejected.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants