GH-36026: [C++][ORC] Catch all ORC exceptions to avoid crash #40697
Do we need this? (Is `catch (const std::exception& e)` enough?)
Usually it is enough. But just in case?
Is this useful if we're correctly catching exceptions? I'm afraid this could get out of sync with ORC's own timezone loading code.
I'd rather not do this, and let ORC fix the issue by making the timezone file optional (what is it used for exactly?).
@wgtmac, what error message do you get without the above, now that you are catching the exceptions?
One reason to keep this is to provide an informative error message: most Windows users will see it the first time, so I think it is important to give some guidance in the message rather than just "file not found".
(To avoid getting out of sync with ORC, we could add an ORC version check and only do this for ORC < 2?)
After catching all exceptions, the error message here without `check_timezone_database_availability()` is `Invalid: Can't open /usr/share/zoneinfo/XXX`.
That sounds reasonable to me.
ORC has two timestamp types: timestamp (namely timestamp_without_timezone) and timestamp_instant (namely timestamp_with_local_timezone).
… `getLocalTimezone()` on startup.
If we convert, we should convert to UTC instead of converting to the local timezone. That's how Arrow timestamps work (but we still need a timezone DB for the conversion anyway):
`arrow/format/Schema.fbs`, lines 283 to 288 in 5181791
I agree that's a better idea. However, this was a design choice of the ORC timestamp type, which originates from Hive. So it is always better to use the timestamp_instant type rather than the old timestamp type in ORC.
So if a future ORC version advertises e.g. "2.1.0.1", this will fail? That sounds inflexible when we're only concerned with the major number.
Makes sense! It may also be `2.1.0-SNAPSHOT` or some custom patched version. Let me change it to only care about the major and minor versions instead.
This isn't required anymore, is it?
Good catch! Removed it.
Please use `EnvVarGuard` so that the value doesn't leak if this test fails.
I didn't notice that. Thanks!