Export TZ in tests when default TZ is used #10419
Conversation
Signed-off-by: Robert (Bobby) Evans <[email protected]>
```diff
@@ -224,7 +224,7 @@ else
 fi
 
 # time zone will be tested; use export TZ=time_zone_name before run this script
-TZ=${TZ:-UTC}
+export TZ=${TZ:-UTC}
```
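The `export` matters because a plain shell assignment stays local to the shell and never reaches child processes such as the Python test runner. A minimal, hypothetical demonstration of the difference (using `sh` and `printenv` directly, not the project's script):

```python
import subprocess

# Start children from a minimal environment so an inherited TZ can't
# mask the effect (the PATH value assumes a typical Linux box).
clean_env = {'PATH': '/usr/bin:/bin'}

def child_sees_tz(shell_snippet):
    """Run a shell snippet, then ask a child process what TZ it sees."""
    cmd = shell_snippet + '; printenv TZ || echo unset'
    return subprocess.run(['sh', '-c', cmd], env=clean_env,
                          capture_output=True, text=True).stdout.strip()

print(child_sees_tz('TZ=UTC'))         # plain assignment -> "unset"
print(child_sees_tz('export TZ=UTC'))  # exported         -> "UTC"
```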
Are we possibly just sweeping under the rug that TZ is not properly translated / propagated to spark properties? https://github.com/NVIDIA/spark-rapids/pull/10419/files#diff-146d61aa008bf60d3c970d1d76208d85b87070466a6d6021fa63eb7478037a27R234-R237
The issue here is a disconnect between Spark and Python. It is not that we are setting the timezone incorrectly for Spark, nor is it technically a problem with the tests detecting what we set it to.
spark-rapids/integration_tests/src/main/python/conftest.py, lines 84 to 91 in 28bb2b4:

```python
def get_test_tz():
    return os.environ.get('TZ', 'UTC')

def is_utc():
    return get_test_tz() == "UTC"

def is_not_utc():
    return not is_utc()
```
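These helpers gate the timezone-sensitive tests, and the `'UTC'` fallback is exactly why a missing `export` goes unnoticed. A small standalone sketch of the failure mode (a reimplementation of the helpers for illustration, not the project's code):

```python
import os

def get_test_tz():
    # same fallback as conftest.py: default to UTC when TZ is absent
    return os.environ.get('TZ', 'UTC')

def is_utc():
    return get_test_tz() == "UTC"

# Simulate a shell that assigned TZ without exporting it: the Python
# process then simply has no TZ in its environment.
os.environ.pop('TZ', None)
print(get_test_tz())  # -> UTC (the silent default)
print(is_utc())       # -> True, so non-UTC skips never fire,
                      #    even if the machine really runs in CST
```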
The problem is that if we don't `export TZ`, Python does not pick it up and ends up with a different time zone than the one we told Spark to use. The following patch shows this:
```diff
diff --git a/integration_tests/src/main/python/conftest.py b/integration_tests/src/main/python/conftest.py
index 8251f4b91..85234035c 100644
--- a/integration_tests/src/main/python/conftest.py
+++ b/integration_tests/src/main/python/conftest.py
@@ -82,7 +82,11 @@ def is_dataproc_runtime():
     return runtime_env() == "dataproc"
 
 def get_test_tz():
-    return os.environ.get('TZ', 'UTC')
+    tz = os.environ.get('TZ', 'UTC')
+    print("TIME ZONE IS " + tz + " !!!!!!")
+    import datetime
+    print(datetime.datetime.now(datetime.timezone.utc).astimezone().tzinfo)
+    return tz
 
 def is_utc():
     return get_test_tz() == "UTC"
```
Without this fix the results show a disconnect, and fastparquet tries to normalize the timestamps differently than Spark does:

```
TIME ZONE IS UTC !!!!!!
CST
```

With this fix the results are in sync:

```
TIME ZONE IS UTC !!!!!!
UTC
```
If there is no `export`, `TZ` ends up not being set for the Python process, which then falls back to the defaults: UTC for the test detection, and the machine's real time zone for the rest of Python. Most of the time PySpark handles converting timestamps between time zones as data moves back and forth between the JVM and Python. But fastparquet does not do any of that, so when the tests say the zone is UTC while Python says it is not, we get the wrong answer, or we fail to skip a test that we should have skipped.
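Python's own notion of local time comes from the C library, which reads `TZ` from the process environment; on POSIX systems `time.tzset()` forces a re-read. A sketch of what the exported variable actually changes inside the Python process (the zone name is an arbitrary example, not one used by the tests):

```python
import os
import time

# With TZ present in this process's environment, the C library's
# local-time rules follow it; tzset() re-reads TZ (POSIX only, so
# this sketch assumes a Linux/macOS test machine).
os.environ['TZ'] = 'Asia/Shanghai'
time.tzset()

print(time.strftime('%Z'))  # the local zone abbreviation, e.g. CST
print(time.timezone)        # seconds west of UTC: -28800 for UTC+8
```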
Thanks for the explanation, this makes sense
LGTM
This fixes #10411. Looks like we just needed to export `TZ` so that it properly took effect everywhere.