Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1684: [C++] Find tzdb without TZDIR when in conda-environments #1882

Closed
wants to merge 20 commits into from

Conversation

h-vetinari
Copy link
Contributor

@h-vetinari h-vetinari commented Apr 10, 2024

What changes were proposed in this pull request?

Find tzdb without having to set TZDIR when in a conda-environment (where tzdata has a uniform location of $CONDA_PREFIX/share/zoneinfo across all platforms).

Why are the changes needed?

This is due to issues in arrow (see apache/arrow#36026) that cannot really be fixed there, as it assumes that orc >=2.0 knows how to find the tzdb. Having to set TZDIR in all user environments is an intrusive change that should be avoided, and since the cost here is checking a single environment variable, it's hopefully not too onerous for consideration.

How was this patch tested?

CI here

Was this patch authored or co-authored using generative AI tooling?

No

CC @wgtmac

See also: #1587

@h-vetinari
Copy link
Contributor Author

If desired, I can test this patch on conda-forge infrastructure (build a patched orc that's published in a separate channel, then run conda-forge/arrow-cpp-feedstock#1086 against that).

@wgtmac wgtmac changed the title [C++] find tzdb without TZDIR when in conda-environments ORC-1684: [C++] Find tzdb without TZDIR when in conda-environments Apr 10, 2024
@wgtmac
Copy link
Member

wgtmac commented Apr 10, 2024

Thanks for the fix! Do you have a JIRA account? I can assign this to you.

c++/src/Timezone.cc Outdated Show resolved Hide resolved
c++/src/Timezone.cc Outdated Show resolved Hide resolved
Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review!

Thanks for the fix! Do you have a JIRA account? I can assign this to you.

https://issues.apache.org/jira/secure/ViewProfile.jspa?name=h-vetinari

c++/src/Timezone.cc Outdated Show resolved Hide resolved
c++/src/Timezone.cc Outdated Show resolved Hide resolved
@h-vetinari
Copy link
Contributor Author

Would it be possible/acceptable to create a symlink in CI? I believe we could write a test that exercises the branch for CONDA_PREFIX here similar to

TEST(TestTimezone, testMissingTZDB) {

but it would need a working tzdb at the end of $CONDA_PREFIX/share/zoneinfo.

@h-vetinari
Copy link
Contributor Author

Ah, just found

set(TZDATA_DIR ${SOURCE_DIR}/share/zoneinfo)

So I guess it should be there?

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Apr 10, 2024

Interestingly, I cannot inspect any CI logs here. Unfolding any section just gives a red Error:.

Nevermind, it worked now after 5min...

@wgtmac
Copy link
Member

wgtmac commented Apr 10, 2024

Ah, just found

set(TZDATA_DIR ${SOURCE_DIR}/share/zoneinfo)

So I guess it should be there?

That is only used in WIN32 to make test pass on Windows...

@h-vetinari
Copy link
Contributor Author

That is only used in WIN32 to make test pass on Windows...

Well, then we could still test that the CONDA_PREFIX mechanics work on windows, no? See my last commit, which tries to do this.

@h-vetinari
Copy link
Contributor Author

OK, so the test I added passed now:

[ RUN      ] TestTimezone.testTzdbFromCondaEnv
[       OK ] TestTimezone.testTzdbFromCondaEnv (0 ms)

but restoring TZDIR still fails:

 unknown file: error: C++ exception with description "Time zone file ������������������������������������������������������������������������������/UTC does not exist. 

Since this is copied from testMissingTZDB, I'm surprised the same error doesn't happen there, because that test is saving tzDirBackup in exactly the same way - i.e. just a pointer to an environment variable which then gets deleted. It seems to me that the restoration of that environment variable cannot succeed (without a deepcopy before deleting TZDIR), and so far that test may have only passed due to checking if (tzDirBackup != nullptr), but - potentially - never actually restored the environment variable...? 🤔

@h-vetinari
Copy link
Contributor Author

Tests are passing on windows now! 🥳

@h-vetinari
Copy link
Contributor Author

This is all green now and should be ready for closer review. :)

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Apr 11, 2024

I have a last question - I see that there's some sort of caching for the tzdb going on (or at least, there's a test for it):

TEST(TestTimezone, testZoneCache) {
const Timezone* la1 = &getTimezoneByName("America/Los_Angeles");

Can we delete that cache somehow to rule out that this is what's causing testTzdbFromCondaEnv to pass? Or is it enough to choose some timezones that haven't been queried yet to bypass the cache?

@wgtmac
Copy link
Member

wgtmac commented Apr 11, 2024

I have a last question - I see that there's some sort of caching for the tzdb going on (or at least, there's a test for it):

TEST(TestTimezone, testZoneCache) {
const Timezone* la1 = &getTimezoneByName("America/Los_Angeles");

Can we delete that cache somehow to rule out that this is what's causing testTzdbFromCondaEnv to pass?

We can either use a different timezone to avoid caching or add a function to invalidate the cache.

@h-vetinari h-vetinari force-pushed the tzdb branch 3 times, most recently from a431294 to 8752fa2 Compare April 11, 2024 03:41
@h-vetinari
Copy link
Contributor Author

Sorry for the force-pushes; I had to figure out the right dates/times (which is a bit harder because we apparently need to specify them in UTC rather than local time; in the latter it would always be around 2/3am).

c++/test/TestTimezone.cc Outdated Show resolved Hide resolved
@h-vetinari
Copy link
Contributor Author

OK, I think I'm done now from my side. This tests all the things I could think of as relevant.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @h-vetinari for all your effort!

@wgtmac wgtmac closed this in e89ca33 Apr 12, 2024
wgtmac pushed a commit that referenced this pull request Apr 12, 2024
### What changes were proposed in this pull request?

Find tzdb without having to set `TZDIR` when in a conda-environment (where `tzdata` [has](https://conda-metadata-app.streamlit.app/?q=conda-forge%2Fnoarch%2Ftzdata-2024a-h0c530f3_0.conda) a uniform location of `$CONDA_PREFIX/share/zoneinfo` across all platforms).

### Why are the changes needed?

This is due to issues in arrow (see apache/arrow#36026) that cannot really be fixed there, as it assumes that orc >=2.0 knows how to find the tzdb. Having to set `TZDIR` in all user environments is an intrusive change that should be avoided, and since the cost here is checking a single environment variable, it's hopefully not too onerous for consideration.

### How was this patch tested?

CI here

### Was this patch authored or co-authored using generative AI tooling?

No

CC wgtmac

See also: #1587

Closes #1882 from h-vetinari/tzdb.

Authored-by: H. Vetinari <[email protected]>
Signed-off-by: Gang Wu <[email protected]>
(cherry picked from commit e89ca33)
Signed-off-by: Gang Wu <[email protected]>
@h-vetinari h-vetinari deleted the tzdb branch April 12, 2024 05:40
@h-vetinari
Copy link
Contributor Author

Thanks a lot for the review and help here @wgtmac! 🙏

Would you consider including this in the 2.0.1 release? I'm probably going to backport this in conda-forge if not, but it would be nicer if it could be included in the release here directly; seeing that it's just about checking a single environment variable, I think the risk here is very low.

@wgtmac
Copy link
Member

wgtmac commented Apr 12, 2024

Yes, I have already ported it to branch-2.0 and will be released in 2.0.1.

@h-vetinari
Copy link
Contributor Author

Amazing, thanks a lot! 🤩

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants