Improve compatibility when reading timestamps from JSON and CSV sources #4938
Signed-off-by: Andy Grove <[email protected]>
I think the code is okay, but it is really complicated and makes a lot of assumptions that only a very specific set of formats is allowed. I keep thinking that there might be a simpler way to make it more data-driven, with lookup tables instead of transpiling everything. But then I see that we convert the format to both regular expressions and the cuDF format, and I just don't know whether a lookup table would actually be smaller or less error-prone.
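For illustration, the data-driven alternative raised above could look roughly like the sketch below. It is not the code in this PR: `TokenRule`, `FormatLookup`, and `translate` are hypothetical names, the token table is deliberately tiny, and the strftime-style specifiers (`%Y`, `%m`, ...) are assumed to be the kind of format string cuDF accepts. Quoted literals such as `'T'` and fractional seconds are not handled; any unsupported letter makes the translation fail rather than guess.

```scala
import java.util.regex.Pattern

// Hypothetical rule: one Spark pattern token and its two translations.
final case class TokenRule(token: String, regex: String, strf: String)

object FormatLookup {
  // Assumed, intentionally small table; a real one would need many more tokens.
  private val Rules: Seq[TokenRule] = Seq(
    TokenRule("yyyy", """\d{4}""", "%Y"),
    TokenRule("MM", """\d{2}""", "%m"),
    TokenRule("dd", """\d{2}""", "%d"),
    TokenRule("HH", """\d{2}""", "%H"),
    TokenRule("mm", """\d{2}""", "%M"),
    TokenRule("ss", """\d{2}""", "%S")
  )

  /** Returns (regex, strftime-style format), or None if the pattern uses an unknown letter. */
  def translate(pattern: String): Option[(String, String)] = {
    val regex = new StringBuilder
    val strf = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      Rules.find(rule => pattern.startsWith(rule.token, i)) match {
        case Some(rule) =>
          regex ++= rule.regex
          strf ++= rule.strf
          i += rule.token.length
        case None =>
          val c = pattern.charAt(i)
          // Letters that are not in the table are unsupported format tokens.
          if (c.isLetter) return None
          // Anything else is treated as a literal separator.
          regex ++= Pattern.quote(c.toString)
          strf += c
          i += 1
      }
    }
    Some((regex.toString, strf.toString))
  }
}
```

Whether this ends up smaller or less error-prone than transpiling is exactly the open question in the comment above; the sketch only shows that both outputs can be driven from one table.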
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuTextBasedPartitionReader.scala
// fix timestamps that have milliseconds but no microseconds
// example ".296" => ".296000"
val placeholder = "@@@"
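The intent of the snippet above (padding a short fractional-seconds part to six digits) can be sketched on plain JVM strings as follows. This is only an illustration under stated assumptions: it uses a Scala regex rather than cuDF column operations and does not need the placeholder, `FractionPadding` and `padToMicros` are hypothetical names, and it assumes the only `.` in the timestamp is the one before the fractional seconds.

```scala
object FractionPadding {
  // A '.' followed by 1 to 5 digits, with no further digit after them.
  // Six-digit (or longer) fractions never match, so they are left alone.
  private val ShortFraction = """\.(\d{1,5})(?!\d)""".r

  /** Right-pads fractional seconds with zeros to six digits, e.g. ".296" => ".296000". */
  def padToMicros(ts: String): String =
    ShortFraction.replaceAllIn(ts, m => "." + m.group(1).padTo(6, '0'))
}
```

For example, `FractionPadding.padToMicros("2022-03-01 12:34:56.296")` yields `"2022-03-01 12:34:56.296000"`, while a value that already has six fractional digits, or none at all, is returned unchanged.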
Can you add a comment about why @@@ is an okay sequence to use here and will never interfere with a real timestamp?
@revans2 could you re-approve this one, please? I had to upmerge since your last approval.
Closes #4863 and closes #123
Improves timestamp support in JSON and CSV to match Spark, by reading from cuDF as strings and then converting to timestamps in the plugin.
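As a rough CPU-side illustration of the "read as strings, then convert" approach described above, the sketch below parses a string value into microseconds since the epoch and yields `None` for values that do not match. It is not the plugin's implementation, which operates on cuDF columns: `StringToTimestamp` and `toMicros` are hypothetical names, and the fixed `yyyy-MM-dd HH:mm:ss` format and UTC time zone are assumptions made for the example.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit
import scala.util.Try

object StringToTimestamp {
  // Assumed format for this example; the real reader takes it from the data source options.
  private val Fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  /** Converts a raw string field to microseconds since the epoch (UTC), or None if it does not parse. */
  def toMicros(raw: String): Option[Long] =
    Try(LocalDateTime.parse(raw.trim, Fmt)).toOption.map { ldt =>
      ChronoUnit.MICROS.between(Instant.EPOCH, ldt.toInstant(ZoneOffset.UTC))
    }
}
```

Keeping the raw column as strings means validation and conversion rules stay in one place, so values that do not match the format can be mapped to missing values instead of failing the whole read.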
There is one follow-on issue: