Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export TZ in tests when default TZ is used #10419

Merged
merged 1 commit into from
Feb 15, 2024

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Feb 13, 2024

This fixes #10411

Looks like we just needed to export the TZ so it properly took effect everywhere.

@revans2
Copy link
Collaborator Author

revans2 commented Feb 13, 2024

build

@@ -224,7 +224,7 @@ else
fi

# time zone will be tested; use export TZ=time_zone_name before run this script
TZ=${TZ:-UTC}
export TZ=${TZ:-UTC}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we possibly just sweeping under the rug that TZ is not properly translated / propagated to spark properties? https://github.com/NVIDIA/spark-rapids/pull/10419/files#diff-146d61aa008bf60d3c970d1d76208d85b87070466a6d6021fa63eb7478037a27R234-R237

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is a disconnect between Spark and python. It is not that we are setting the timezone incorrectly for Spark. Nor is it technically a problem with the tests detecting what we set it to.

def get_test_tz():
return os.environ.get('TZ', 'UTC')
def is_utc():
return get_test_tz() == "UTC"
def is_not_utc():
return not is_utc()

The problem is that if we don't export TZ python does not pick up on it and gets the wrong time zone compared to what we told Spark to use. The following patch shows this.

diff --git a/integration_tests/src/main/python/conftest.py b/integration_tests/src/main/python/conftest.py
index 8251f4b91..85234035c 100644
--- a/integration_tests/src/main/python/conftest.py
+++ b/integration_tests/src/main/python/conftest.py
@@ -82,7 +82,11 @@ def is_dataproc_runtime():
     return runtime_env() == "dataproc"
 
 def get_test_tz():
-    return os.environ.get('TZ', 'UTC')
+    tz = os.environ.get('TZ', 'UTC')
+    print("TIME ZONE IS " + tz + " !!!!!!")
+    import datetime
+    print(datetime.datetime.now(datetime.timezone.utc).astimezone().tzinfo)
+    return tz
 
 def is_utc():
     return get_test_tz() == "UTC"

Without this fix the results show a disconnect and fastparquet is trying to normalize the timestamps differently than Spark is.

TIME ZONE IS UTC !!!!!!
CST

With this fix the results are now in sync

TIME ZONE IS UTC !!!!!!
UTC

If there is no export TZ ends up not being set for the python process which then falls back to the defaults (UTC for the test detection and whatever the real timezone is for the rest of python). Most of the time pyspark handles converting timestamps between time zones as it goes back and forth between the JVM and python. But fastparquet does not have any of that so when the tests say that it is UTC, but python says it is not we end up with a disconnect and we get the wrong answer/don't skip the test when we should.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, this makes sense

Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@revans2 revans2 merged commit 13b3535 into NVIDIA:branch-24.04 Feb 15, 2024
40 of 41 checks passed
@sameerz sameerz added the test Only impacts tests label Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Some fast parquet tests fail if the time zone is not UTC
4 participants