[BUG] GpuGetJsonObject does not expand escaped characters #9033
revans2 added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels on Aug 14, 2023
I found the code that handles escapes in the JSON parser that Spark uses. It is not easy to link to, because GitHub has a size limit and this file is crazy large.
Hi @nvdbaranec, I assume you aren't working on this issue, so I'm assigning it to Chong. Please let me know if that's not correct.
You are correct.
Will be fixed by PR: NVIDIA/spark-rapids-jni#1868
Describe the bug
If I have a string with escaped characters in it, the CPU version of GetJsonObject will interpret those escaped characters, but the GPU version will not.
For example, take the file test.tsv.
If I run the following commands I get different results on the CPU and the GPU.
But don't let the '\n' and '\t' fool you. That is just show() cleaning things up for us and re-escaping the results.
I don't think it would be too hard to post-process the returned data, but it might be better to do it in the get_json_object kernel itself. That way we can tell whether the data is in quotes or not. We might also need it to properly match the key, if the key is escaped in some way.
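To make the post-processing idea concrete, here is a minimal sketch in Python. It is not the plugin's or the kernel's code. `unescape_json_string` and `postprocess` are hypothetical names, and the `was_quoted` flag stands in for the "is the data in quotes" information that only the kernel would know. It covers only the standard JSON escapes and does not try to match any extra leniency in the parser Spark uses.

```python
import string

# Standard JSON single-character escapes (RFC 8259, section 7).
_SIMPLE_ESCAPES = {
    '"': '"', "\\": "\\", "/": "/",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}


def unescape_json_string(raw: str) -> str:
    """Expand JSON escapes in the text found between two double quotes.

    Hypothetical helper for illustration only. Surrogate pairs written as
    two \\uXXXX escapes are not combined in this sketch.
    """
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("dangling backslash at end of string")
        esc = raw[i + 1]
        if esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "u":
            digits = raw[i + 2:i + 6]
            if len(digits) != 4 or any(c not in string.hexdigits for c in digits):
                raise ValueError(f"bad \\u escape at offset {i}")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            raise ValueError(f"unknown escape \\{esc} at offset {i}")
    return "".join(out)


def postprocess(raw_value: str, was_quoted: bool) -> str:
    """Unescape only values that were JSON strings.

    Nested objects and arrays are returned as raw JSON text, so their
    escapes are left alone (assumption: this mirrors the quoted/unquoted
    distinction described above).
    """
    return unescape_json_string(raw_value) if was_quoted else raw_value
```

The `was_quoted` parameter is the reason doing this inside the kernel is attractive: once the value has been copied out, a post-processing pass can no longer tell a string value from a nested object that happens to contain backslashes.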