Is your feature request related to a problem? Please describe.
CSV is not a well-defined standard. There are many different options for parsing values, escaping characters, and configuring delimiters. Because of this complexity, we should develop a fuzz testing framework to verify that our code behaves the same as Spark on the CPU. We should concentrate on the default settings:
format: UTF-8
delimiter: ,
quote: "
escape: \
lineSeparator: (not set so it is \r|\n|\r\n)
charToEscapeQuoteEscaping: not set
comment: \u0000 (aka not set)
ignoreLeadingWhiteSpace: false
ignoreTrailingWhiteSpace: false
emptyValue: (empty string)
unescapedQuoteHandling: STOP_AT_DELIMITER
And a schema is also provided.
It would be great to expand this further in the future, but for now the default settings are the most important. The next thing to look at testing would be changing the delimiter.
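To make the idea concrete, here is a minimal sketch of what the generator and comparison loop could look like. It is not an existing framework or a proposed design: the names `random_field`, `random_csv`, and `find_mismatches` are hypothetical, the plain-character alphabet is arbitrary, and the two parser callables stand in for "Spark on the CPU" and "our code". It assumes the default settings listed above, with the delimiter left as a parameter so it can be changed later.

```python
import random

# Special characters under the default settings listed in this issue.
DELIMITER = ","
QUOTE = '"'
ESCAPE = "\\"
LINE_SEPARATORS = ["\r", "\n", "\r\n"]
# Arbitrary filler characters (an assumption, chosen only for illustration).
PLAIN = list("abc 01")


def random_field(rng, delimiter=DELIMITER, max_len=6):
    """Build one raw field that may contain delimiters, quotes, and escapes."""
    alphabet = PLAIN + [delimiter, QUOTE, ESCAPE]
    length = rng.randint(0, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))


def random_csv(seed, rows=5, cols=3, delimiter=DELIMITER):
    """Generate CSV-like text from a seed so any failure can be reproduced.

    Fields are deliberately not quoted or escaped correctly, so the text
    exercises the ambiguous corners of the format.
    """
    rng = random.Random(seed)
    parts = []
    for _ in range(rows):
        fields = [random_field(rng, delimiter) for _ in range(cols)]
        parts.append(delimiter.join(fields))
        parts.append(rng.choice(LINE_SEPARATORS))
    return "".join(parts)


def find_mismatches(parse_reference, parse_candidate, seeds, **kwargs):
    """Return the seeds whose generated input the two parsers disagree on."""
    mismatches = []
    for seed in seeds:
        text = random_csv(seed, **kwargs)
        if parse_reference(text) != parse_candidate(text):
            mismatches.append(seed)
    return mismatches
```

Calling `find_mismatches(cpu_parse, our_parse, range(1000))` with two hypothetical parser functions would return the failing seeds, and passing `delimiter="|"` would cover the changed-delimiter case. Keying everything on the seed means a mismatch can be replayed without storing the generated file.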