Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add allowOverflow flag to Timestamp::toTimezone #9836

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions velox/docs/functions/spark/datetime.rst
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,12 @@ These functions support TIMESTAMP and DATE input types.
SELECT from_unixtime(3600, 'yyyy'); -- '1970'
SELECT from_unixtime(9223372036854775807, "yyyy-MM-dd HH:mm:ss"); -- '1969-12-31 23:59:59'

If we run the following query in the `Asia/Shanghai` time zone: ::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to include all 3 queries above, but in this time zone?

The same queries evaluated with `Asia/Shanghai` session time zone return timestamps shifted by 8 hours.

SELECT ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. See 15e85ed


SELECT from_unixtime(100, 'yyyy-MM-dd HH:mm:ss'); -- '1970-01-01 08:01:40'
SELECT from_unixtime(3600, 'yyyy'); -- '1970'
SELECT from_unixtime(9223372036854775807, "yyyy-MM-dd HH:mm:ss"); -- '1970-01-01 07:59:59'

.. spark:function:: from_utc_timestamp(timestamp, string) -> timestamp

Returns the timestamp value from UTC timezone to the given timezone. ::
Expand Down
2 changes: 1 addition & 1 deletion velox/functions/lib/DateTimeFormatter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1033,7 +1033,7 @@ int32_t DateTimeFormatter::format(
bool allowOverflow) const {
Timestamp t = timestamp;
if (timezone != nullptr) {
t.toTimezone(*timezone);
t.toTimezone(*timezone, allowOverflow);
}
const auto timePoint = t.toTimePoint(allowOverflow);
const auto daysTimePoint = date::floor<date::days>(timePoint);
Expand Down
9 changes: 9 additions & 0 deletions velox/functions/sparksql/tests/DateTimeFunctionsTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -898,6 +898,15 @@ TEST_F(DateTimeFunctionsTest, fromUnixtime) {

// 8 hours ahead UTC.
setQueryTimeZone("Asia/Shanghai");
NEUpanning marked this conversation as resolved.
Show resolved Hide resolved
// In debug mode, Timestamp constructor will throw exception if range check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a deeper problem here that Timestamp cannot represent values representable in Spark? If not, then, perhaps there is some issue with the implementation where it attempts to create an "invalid" Timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presto's Timestamp is stored in one 64-bit signed integer for milliseconds, but Spark's Timestamp is stored in 64-bit signed integer for seconds, so this test attempts to create an "invalid" Timestamp. Maybe we should remove the range check for Timestamp since it's also used by Spark now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to follow-up on that. CC: @rui-mo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will work on it when i have time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NEUpanning Do we need to create an issue to track this follow-up?

Copy link
Contributor Author

@NEUpanning NEUpanning May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo Yes, i opened a issue : #9904

// fails.
#ifdef NDEBUG
// Integer overflow in the internal conversion from seconds to milliseconds.
EXPECT_EQ(
fromUnixTime(std::numeric_limits<int64_t>::max(), "yyyy-MM-dd HH:mm:ss"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example in the documentation shows a different result: https://facebookincubator.github.io/velox/functions/spark/datetime.html#from_unixtime

SELECT from_unixtime(9223372036854775807, "yyyy-MM-dd HH:mm:ss");  -- '1969-12-31 23:59:59'

Which one is the "right" one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are right. The example in the documentation doesn't set query time zone configuration, but this test sets time zone to "Asia/Shanghai" , so the results are different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add another example in document for query with time zone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added example in document : 59968ef

"1970-01-01 07:59:59");
NEUpanning marked this conversation as resolved.
Show resolved Hide resolved
#endif

EXPECT_EQ(fromUnixTime(0, "yyyy-MM-dd HH:mm:ss"), "1970-01-01 08:00:00");
EXPECT_EQ(fromUnixTime(120, "yyyy-MM-dd HH:mm"), "1970-01-01 08:02");
EXPECT_EQ(fromUnixTime(-59, "yyyy-MM-dd HH:mm:ss"), "1970-01-01 07:59:01");
Expand Down
4 changes: 2 additions & 2 deletions velox/type/Timestamp.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -118,8 +118,8 @@ Timestamp::toTimePoint(bool allowOverflow) const {
return tp;
}

void Timestamp::toTimezone(const date::time_zone& zone) {
auto tp = toTimePoint();
void Timestamp::toTimezone(const date::time_zone& zone, bool allowOverflow) {
auto tp = toTimePoint(allowOverflow);
auto epoch = zone.to_local(tp).time_since_epoch();
// NOTE: Round down to get the seconds of the current time point.
seconds_ = std::chrono::floor<std::chrono::seconds>(epoch).count();
Expand Down
16 changes: 9 additions & 7 deletions velox/type/Timestamp.h
Original file line number Diff line number Diff line change
Expand Up @@ -187,7 +187,7 @@ struct Timestamp {
/// Due to the limit of std::chrono, throws if timestamp is outside of
/// [-32767-01-01, 32767-12-31] range.
/// If allowOverflow is true, integer overflow is allowed in converting
/// timestmap to milliseconds.
/// timestamp to milliseconds.
std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds>
toTimePoint(bool allowOverflow = false) const;

Expand Down Expand Up @@ -311,12 +311,14 @@ struct Timestamp {
// Same as above, but accepts PrestoDB time zone ID.
void toGMT(int16_t tzID);

// Assuming the timestamp represents a GMT time, converts it to the time at
// the same moment at zone.
// Example: Timestamp ts{0, 0};
// ts.Timezone("America/Los_Angeles");
// ts.toString() returns December 31, 1969 16:00:00
void toTimezone(const date::time_zone& zone);
/// Assuming the timestamp represents a GMT time, converts it to the time at
/// the same moment at zone.
/// @param allowOverflow If true, integer overflow is allowed when converting
/// timestamp to TimePoint. Otherwise, user exception is thrown for overflow.
/// Example: Timestamp ts{0, 0};
/// ts.Timezone("America/Los_Angeles");
/// ts.toString() returns December 31, 1969 16:00:00
void toTimezone(const date::time_zone& zone, bool allowOverflow = false);
NEUpanning marked this conversation as resolved.
Show resolved Hide resolved

// Same as above, but accepts PrestoDB time zone ID.
void toTimezone(int16_t tzID);
Expand Down