[BUG] CUDF does not strip trailing white space after a quoted string value #2069

Open
revans2 opened this issue Apr 1, 2021 · 1 comment
Labels
bug Something isn't working cudf_dependency An issue or PR with this label depends on a new feature in cudf reliability Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

Collaborator

revans2 commented Apr 1, 2021

Describe the bug
When Spark reads a CSV line like

"A"   ,"B"

it sees that there is white space after A, and because A is quoted it strips off both the quotes and the trailing white space, producing the value A. CUDF sees the white space after the closing quote and assumes that all of the data should be part of the string, so it keeps the quotes in the value and produces "A" instead.
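The difference can be sketched in plain Python. This is only an illustration of the two behaviors described above, not Spark's or CUDF's actual parsing code: the helper names `split_line`, `spark_like`, and `keep_all` are hypothetical, escape handling is ignored, and the exact string CUDF returns (quotes plus the trailing spaces) is an assumption based on the description.

```python
def split_line(line, sep=","):
    """Split a line on separators that fall outside double quotes.

    Simplified: no escape characters, no embedded newlines.
    """
    fields, current, in_quotes = [], [], False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def spark_like(raw):
    """Models the Spark behavior described above: white space after the
    closing quote is dropped, then the quotes are removed."""
    stripped = raw.rstrip()
    if len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"':
        return stripped[1:-1]
    return raw


def keep_all(raw):
    """Models the reported CUDF behavior (an assumption): quotes are only
    removed when the field ends exactly at the closing quote, so trailing
    white space leaves the whole raw field, quotes included, as the value."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    return raw


line = '"A"   ,"B"'
expected = [spark_like(f) for f in split_line(line)]
observed = [keep_all(f) for f in split_line(line)]
print(expected)
print(observed)
```

With this model the first field differs between the two lists while the second field, which has no trailing white space, comes out the same in both.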

Steps/Code to reproduce bug
We have an integration test for this in the CSV tests: test_basic_read with the file ints_with_whitespace.csv, but using the schema for strings instead of bytes.

revans2 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels Apr 1, 2021
revans2 mentioned this issue Apr 1, 2021 (38 tasks)
sameerz removed the ? - Needs Triage (Need team to review and classify) label Apr 6, 2021
revans2 added the cudf_dependency (An issue or PR with this label depends on a new feature in cudf) label Aug 16, 2023
Collaborator Author

revans2 commented Aug 16, 2023

I filed rapidsai/cudf#13892 for this in CUDF.

revans2 added the reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) label Aug 16, 2023