[BUG] fall back to CPU if columnNameofCorruptRecord is in the CSV schema #2065

revans2 · 2021-04-01T18:12:48Z

Describe the bug
Spark has an option for dealing with bad data during parsing. It is kind of convoluted: there is a config (spark.sql.columnNameOfCorruptRecord) that sets the name of a column for holding corrupt data. If Spark sees that column name in the schema of the CSV data being read, it places anything it considers corrupt into that string column. I don't see a lot of value in having our code support this, but we should fall back to the CPU if we see it.
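To make the proposed check concrete, here is a minimal sketch of the fallback decision in plain Python. It is not the plugin's actual code: the function name should_fall_back_to_cpu and its parameters are hypothetical, and the exact-match comparison is an assumption (Spark's own name resolution may also depend on case-sensitivity settings). The only Spark-specific fact used is the default corrupt-record column name, _corrupt_record.

```python
from typing import Iterable

# Spark's default value for the corrupt-record column name.
DEFAULT_CORRUPT_RECORD_COLUMN = "_corrupt_record"


def should_fall_back_to_cpu(
    schema_field_names: Iterable[str],
    corrupt_record_column: str = DEFAULT_CORRUPT_RECORD_COLUMN,
) -> bool:
    """Hypothetical helper: return True when the CSV read schema contains
    the configured corrupt-record column, meaning the read should be left
    on the CPU.

    Assumption: names are compared exactly; a real implementation would
    likely need to honor the session's case-sensitivity setting.
    """
    return any(name == corrupt_record_column for name in schema_field_names)


if __name__ == "__main__":
    # The schema includes the default corrupt-record column: fall back.
    print(should_fall_back_to_cpu(["id", "name", "_corrupt_record"]))
    # An ordinary schema: no fallback needed.
    print(should_fall_back_to_cpu(["id", "name"]))
```

The check only inspects the schema, so it can run at planning time, before any data is read.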

revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify labels Apr 1, 2021

revans2 mentioned this issue Apr 1, 2021

[BUG] Fix CSV Parsing #2063

Open

38 tasks

sameerz removed the ? - Needs Triage Need team to review and classify label Apr 6, 2021

revans2 mentioned this issue Apr 8, 2022

[TASK] Big Reliability Epic #1870

Closed

14 tasks

revans2 added P0 Must have for release reliability Features to improve reliability or bugs that severely impact the reliability of the plugin labels Apr 12, 2022
