
[BUG] fall back to CPU if columnNameOfCorruptRecord is in the CSV schema #2065

Open
revans2 opened this issue Apr 1, 2021 · 0 comments
Labels
bug (Something isn't working) · P0 (Must have for release) · reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)

Comments

@revans2
Collaborator

revans2 commented Apr 1, 2021

Describe the bug
Spark has an option for handling malformed input while parsing. It is somewhat convoluted: a config (`spark.sql.columnNameOfCorruptRecord`, which the per-read `columnNameOfCorruptRecord` option can override) sets the name of a column that receives corrupt data. If Spark sees a column with that name in the schema of the CSV data being read, it places any record it considers corrupt into that string column. I don't see much value in having our code support this, but we should fall back to the CPU whenever we see that column in the schema.
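A minimal sketch of the check this issue asks for, written in Python rather than the plugin's Scala. The `Field` type is a hypothetical stand-in for a Spark `StructType` field, and `corrupt_record_fallback_reason` is an illustrative name, not the plugin's API. It assumes the per-read `columnNameOfCorruptRecord` option overrides the session default (`_corrupt_record`), that option keys match case-insensitively as Spark's reader options do, and that field names are compared exactly.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Default value of spark.sql.columnNameOfCorruptRecord.
DEFAULT_CORRUPT_RECORD_COLUMN = "_corrupt_record"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one field of a Spark StructType."""
    name: str
    data_type: str


def corrupt_record_fallback_reason(
    schema: Sequence[Field],
    read_options: Mapping[str, str],
    session_default: str = DEFAULT_CORRUPT_RECORD_COLUMN,
) -> Optional[str]:
    """Return a reason to fall back to the CPU, or None if no fallback is needed."""
    # Assumption: option keys are case-insensitive, so normalize them first.
    options = {key.lower(): value for key, value in read_options.items()}
    # The per-read option, if present, overrides the session-wide default.
    column = options.get("columnnameofcorruptrecord", session_default)
    # Assumption: an exact name match against the read schema triggers fallback.
    if any(field.name == column for field in schema):
        return f"GPU CSV parsing does not support the corrupt record column {column!r}"
    return None


if __name__ == "__main__":
    schema = [Field("id", "int"), Field("_corrupt_record", "string")]
    print(corrupt_record_fallback_reason(schema, {}))
    print(corrupt_record_fallback_reason([Field("id", "int")], {}))
```

Returning a reason string rather than a bare boolean mirrors the idea that a fallback should be explainable to the user; whether the real plugin reports it this way is an assumption.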

revans2 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels Apr 1, 2021
revans2 mentioned this issue Apr 1, 2021
sameerz removed the ? - Needs Triage (Need team to review and classify) label Apr 6, 2021
revans2 mentioned this issue Apr 8, 2022
revans2 added the P0 (Must have for release) and reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) labels Apr 12, 2022