[BUG] CSV parsing of malformed lines is empty string not null #2068

revans2 · 2021-04-01T18:30:46Z

Describe the bug
In CSV it is possible to have a malformed line that ends early, so that there is no data for the trailing entries on that line.

A,B,C
number,

In these cases Spark will insert a null for each missing entry no matter what, but cudf always treats a missing entry like an empty string and then applies the rules for null values. So if the null value is an empty string, which is the default, then everything looks fine. If not, then cudf produces different results.
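
To make the difference concrete, here is a rough sketch in plain Python that models the two behaviors described above. It is not the actual Spark or cudf code: `parse_spark_like`, `parse_cudf_like`, and the `null_value` parameter are hypothetical names used only for illustration, and the model ignores column types, quoting options, and everything else a real reader handles.

```python
import csv
import io


def parse_spark_like(text, null_value=""):
    """Model of the Spark behavior described above (hypothetical helper).

    Fields that are present are compared against null_value; fields that
    are missing at the end of a short line become None unconditionally.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    out = []
    for row in body:
        fields = [None if f == null_value else f for f in row]
        # Missing trailing entries are always null.
        fields += [None] * (len(header) - len(fields))
        out.append(fields)
    return out


def parse_cudf_like(text, null_value=""):
    """Model of the cudf behavior described above (hypothetical helper).

    Missing trailing entries are first filled with empty strings, and only
    then is every field compared against null_value.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    out = []
    for row in body:
        # Missing trailing entries are treated as empty strings.
        padded = row + [""] * (len(header) - len(row))
        out.append([None if f == null_value else f for f in padded])
    return out


DATA = "A,B,C\nnumber,\n"

# Default null value (empty string): both models agree.
print(parse_spark_like(DATA))          # [['number', None, None]]
print(parse_cudf_like(DATA))           # [['number', None, None]]

# Null value set to the string "null": the missing column C differs.
print(parse_spark_like(DATA, "null"))  # [['number', '', None]]
print(parse_cudf_like(DATA, "null"))   # [['number', '', '']]
```

In this model the two readers only diverge on column C, the entry that is absent from the line. Column B is present but empty, so both treat it the same way.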

Steps/Code to reproduce bug
We have an integration test for this: test_basic_read in the CSV tests, for trucks-null.csv, where nullValue is set to null.


revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify labels Apr 1, 2021

revans2 mentioned this issue Apr 1, 2021

[BUG] Fix CSV Parsing #2063

Open

38 tasks

sameerz removed the ? - Needs Triage Need team to review and classify label Apr 6, 2021
