Describe the bug
Some of our Spark integration tests for ORC have been failing with timestamp data mismatches. The test writes randomly generated data (with a deterministic RNG seed) using Spark on the CPU, then reads the data back with Spark on the CPU and again with Spark using the RAPIDS Accelerator plugin. Intermittently, the two reads disagree on timestamp data.
Steps/Code to reproduce bug
This has been hard to reproduce outside of the Spark integration tests, and even outside of the Jenkins pipelines for those tests. Our current belief is that it is triggered by using the GMT timezone on a CentOS 7 machine. We have never seen it fail on an Ubuntu 18.04 machine with the GMT timezone. @nvdbaranec has had some success trying to isolate the issue.
Expected behavior
ORC timestamps are loaded via cudf the same way they are loaded by Spark for the GMT timezone.
…6959)
Fixes #6947
When a TZif file has no transitions (e.g. GMT), `build_timezone_transition_table` performs an out-of-bounds read, which leads to undefined behavior and intermittent failures.
This PR makes two changes to behavior:
1. When there are no transitions, the ancient rule is initialized from the first time offset (instead of the first transition rule, which does not exist in this case).
2. When there are no transitions and the time offset is zero, an empty table is returned (avoid using a no-op table in CUDA).
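The two changes can be illustrated with a simplified sketch. This is not the cudf implementation: the table layout (a list of `(time, utc_offset)` rows with the ancient rule in row 0), the `ANCIENT_TIME` sentinel, and the function and parameter names are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical sentinel marking "before any recorded transition".
ANCIENT_TIME = -(2**63)


def build_transition_table(
    transition_times: Sequence[int],
    transition_type_idx: Sequence[int],
    utc_offsets: Sequence[int],
) -> List[Tuple[int, int]]:
    """Return (time, utc_offset) rows; row 0 holds the ancient rule."""
    if not utc_offsets:
        raise ValueError("need at least one time offset")

    if not transition_times:
        # Change 1: with no transitions there is no first transition rule to
        # read, so take the ancient rule from the first time offset instead.
        ancient_offset = utc_offsets[0]
        # Change 2: a zero offset with no transitions would be a no-op table,
        # so return an empty table instead.
        if ancient_offset == 0:
            return []
        return [(ANCIENT_TIME, ancient_offset)]

    # With transitions present, the ancient rule comes from the first
    # transition's rule, and indexing element 0 is safe.
    table = [(ANCIENT_TIME, utc_offsets[transition_type_idx[0]])]
    table.extend(
        (t, utc_offsets[i]) for t, i in zip(transition_times, transition_type_idx)
    )
    return table


# GMT-like input: no transitions, single zero offset.
print(build_transition_table([], [], [0]))
```

Guarding on the empty transition list before indexing is what removes the out-of-bounds read in this sketch. Returning an empty table lets a caller skip the timezone adjustment entirely.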
Authors:
- vuule
- Vukasin Milovanovic
Approvers:
- GALI PREM SAGAR
- null
- Ram (Ramakrishna Prabhu)
- David
URL: #6959