Support round operation on datetime64[ns] datatypes #9820

Merged
merged 31 commits into from
Dec 10, 2021
Changes from 27 commits
f13178b
added series.dt.floor
mayankanand007 Dec 2, 2021
693a6e9
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 2, 2021
789ace3
added datetimeindex.round
mayankanand007 Dec 2, 2021
7bf420a
Merge branch 'branch-22.02' of https://github.com/mayankanand007/cudf…
mayankanand007 Dec 2, 2021
96d22ba
move round impl. to IndexedFrame
mayankanand007 Dec 2, 2021
fadb90b
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 2, 2021
98ba3b4
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 3, 2021
00fea68
fixed typo in test
mayankanand007 Dec 3, 2021
b2c3475
fixed typo in rst file
mayankanand007 Dec 3, 2021
2ed624e
added doxygen docstrings
mayankanand007 Dec 3, 2021
e4f19d9
apply suggestions related to typos in test names
mayankanand007 Dec 3, 2021
91697ec
fixed style
mayankanand007 Dec 4, 2021
43c5d32
fixed style issue
mayankanand007 Dec 4, 2021
d2f9e64
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 6, 2021
d663784
addressing PR reviews
mayankanand007 Dec 6, 2021
bafd411
Merge branch 'branch-22.02' of https://github.com/mayankanand007/cudf…
mayankanand007 Dec 6, 2021
3c30fbd
updated function docstring with formatting
mayankanand007 Dec 6, 2021
69b447e
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 6, 2021
35b7e6b
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 7, 2021
7c6c135
changed field to freq
mayankanand007 Dec 7, 2021
4424080
Merge branch 'branch-22.02' into branch-22.02
mayankanand007 Dec 7, 2021
183cc5f
Apply suggestions from code review
mayankanand007 Dec 7, 2021
b771e80
updated docstring example in DatetimeIndex.round
mayankanand007 Dec 7, 2021
431b24f
Merge branch 'branch-22.02' of https://github.com/mayankanand007/cudf…
mayankanand007 Dec 7, 2021
b99f471
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 8, 2021
e85d565
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 9, 2021
09f71e2
changing dtype to auto for timestamps in test
mayankanand007 Dec 9, 2021
435b8f5
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 9, 2021
f2b68b2
Merge branch 'rapidsai:branch-22.02' into branch-22.02
mayankanand007 Dec 10, 2021
4d0fc98
style fixes
mayankanand007 Dec 10, 2021
8c49f73
changed rounding_kind to rounding_function
mayankanand007 Dec 10, 2021
91 changes: 91 additions & 0 deletions cpp/include/cudf/datetime.hpp
@@ -469,5 +469,96 @@ std::unique_ptr<column> floor_nanosecond(
column_view const& column,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Round to the nearest day
*
* @param column cudf::column_view of the input datetime values
* @param mr Device memory resource used to allocate device memory of the returned column.
*
* @throw cudf::logic_error if input column datatype is not TIMESTAMP
* @return cudf::column of the same datetime resolution as the input column
*/
std::unique_ptr<cudf::column> round_day(
cudf::column_view const& column,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Round to the nearest hour
*
* @param column cudf::column_view of the input datetime values
* @param mr Device memory resource used to allocate device memory of the returned column.
*
* @throw cudf::logic_error if input column datatype is not TIMESTAMP
* @return cudf::column of the same datetime resolution as the input column
*/
std::unique_ptr<cudf::column> round_hour(
cudf::column_view const& column,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Round to the nearest minute
*
* @param column cudf::column_view of the input datetime values
* @param mr Device memory resource used to allocate device memory of the returned column.
*
* @throw cudf::logic_error if input column datatype is not TIMESTAMP
* @return cudf::column of the same datetime resolution as the input column
*/
std::unique_ptr<cudf::column> round_minute(
cudf::column_view const& column,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Round to the nearest second
*
* @param column cudf::column_view of the input datetime values
* @param mr Device memory resource used to allocate device memory of the returned column.
*
* @throw cudf::logic_error if input column datatype is not TIMESTAMP
* @return cudf::column of the same datetime resolution as the input column
*/
std::unique_ptr<cudf::column> round_second(
cudf::column_view const& column,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Round to the nearest millisecond
*
* @param column cudf::column_view of the input datetime values
* @param mr Device memory resource used to allocate device memory of the returned column.
*
* @throw cudf::logic_error if input column datatype is not TIMESTAMP
* @return cudf::column of the same datetime resolution as the input column
*/
std::unique_ptr<column> round_millisecond(
column_view const& column,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Round to the nearest microsecond
*
* @param column cudf::column_view of the input datetime values
* @param mr Device memory resource used to allocate device memory of the returned column.
*
* @throw cudf::logic_error if input column datatype is not TIMESTAMP
* @return cudf::column of the same datetime resolution as the input column
*/
std::unique_ptr<column> round_microsecond(
column_view const& column,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Round to the nearest nanosecond
*
* @param column cudf::column_view of the input datetime values
* @param mr Device memory resource used to allocate device memory of the returned column.
*
* @throw cudf::logic_error if input column datatype is not TIMESTAMP
* @return cudf::column of the same datetime resolution as the input column
*/
std::unique_ptr<column> round_nanosecond(
column_view const& column,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

} // namespace datetime
} // namespace cudf
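The seven declarations above share one contract: each value is rounded to the nearest multiple of the named unit, and the result keeps the input column's resolution. The following is a minimal Python sketch of that contract on raw integer ticks, not cudf code. It assumes the ties-to-even rule of C++17 `std::chrono::round`, which the implementation in this PR reaches through `cuda::std::chrono::round`. The names `NS_PER_UNIT` and `round_ticks` are hypothetical and exist only for illustration.

```python
# Hypothetical illustration of "round to nearest unit, keep the input resolution".
# Nothing here is part of cudf; the unit keys reuse the frequency aliases from this PR.
NS_PER_UNIT = {
    "D": 86_400_000_000_000,
    "H": 3_600_000_000_000,
    "T": 60_000_000_000,
    "S": 1_000_000_000,
    "L": 1_000_000,
    "U": 1_000,
    "N": 1,
}


def round_ticks(value, value_unit, target_unit):
    """Round an integer tick count (in value_unit since the epoch) to the
    nearest multiple of target_unit, ties to even. The result is still
    expressed in value_unit."""
    src = NS_PER_UNIT[value_unit]
    dst = NS_PER_UNIT[target_unit]
    if dst <= src:
        # Target is as fine as, or finer than, the storage unit: nothing to round.
        return value
    step = dst // src  # ticks of value_unit per one target unit
    q, r = divmod(value, step)  # floor division, so 0 <= r < step even for negatives
    if 2 * r > step or (2 * r == step and q % 2 == 1):
        q += 1
    return q * step
```

For instance, `round_ticks(1499, "L", "S")` stays in milliseconds and yields `1000`, while the exact ties `1500` and `2500` both land on `2000` because ties go to the even multiple.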
82 changes: 80 additions & 2 deletions cpp/src/datetime/datetime_ops.cu
@@ -54,7 +54,11 @@ enum class datetime_component {
NANOSECOND
};

enum class rounding_kind { CEIL, FLOOR };
enum class rounding_kind {
CEIL, ///< Rounds up to the next integer multiple of the provided frequency
FLOOR, ///< Rounds down to the previous integer multiple of the provided frequency
ROUND ///< Rounds to the nearest integer multiple of the provided frequency
};

template <datetime_component Component>
struct extract_component_operator {
@@ -100,6 +104,7 @@ struct RoundFunctor {
switch (round_kind) {
case rounding_kind::CEIL: return cuda::std::chrono::ceil<DurationType>(dt);
case rounding_kind::FLOOR: return cuda::std::chrono::floor<DurationType>(dt);
case rounding_kind::ROUND: return cuda::std::chrono::round<DurationType>(dt);
default: cudf_assert(false && "Unsupported rounding kind.");
}
__builtin_unreachable();
@@ -224,7 +229,7 @@ struct is_leap_year_op {
}
};

// Specific function for applying ceil/floor date ops
// Specific function for applying ceil/floor/round date ops
struct dispatch_round {
template <typename Timestamp>
std::enable_if_t<cudf::is_timestamp<Timestamp>(), std::unique_ptr<cudf::column>> operator()(
@@ -672,6 +677,79 @@ std::unique_ptr<column> floor_nanosecond(column_view const& column,
mr);
}

std::unique_ptr<column> round_day(column_view const& column, rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
return detail::round_general(detail::rounding_kind::ROUND,
detail::datetime_component::DAY,
column,
rmm::cuda_stream_default,
mr);
}

std::unique_ptr<column> round_hour(column_view const& column, rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
return detail::round_general(detail::rounding_kind::ROUND,
detail::datetime_component::HOUR,
column,
rmm::cuda_stream_default,
mr);
}

std::unique_ptr<column> round_minute(column_view const& column, rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
return detail::round_general(detail::rounding_kind::ROUND,
detail::datetime_component::MINUTE,
column,
rmm::cuda_stream_default,
mr);
}

std::unique_ptr<column> round_second(column_view const& column, rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
return detail::round_general(detail::rounding_kind::ROUND,
detail::datetime_component::SECOND,
column,
rmm::cuda_stream_default,
mr);
}

std::unique_ptr<column> round_millisecond(column_view const& column,
rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
return detail::round_general(detail::rounding_kind::ROUND,
detail::datetime_component::MILLISECOND,
column,
rmm::cuda_stream_default,
mr);
}

std::unique_ptr<column> round_microsecond(column_view const& column,
rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
return detail::round_general(detail::rounding_kind::ROUND,
detail::datetime_component::MICROSECOND,
column,
rmm::cuda_stream_default,
mr);
}

std::unique_ptr<column> round_nanosecond(column_view const& column,
rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
return detail::round_general(detail::rounding_kind::ROUND,
detail::datetime_component::NANOSECOND,
column,
rmm::cuda_stream_default,
mr);
}

std::unique_ptr<column> extract_year(column_view const& column, rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
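The `rounding_kind` enum and the `switch` in `RoundFunctor` amount to a three-way dispatch over one integer operation. The sketch below restates that dispatch in Python on integer ticks so that the difference between the three kinds is visible for values before the epoch as well. `RoundingKind` and `apply_rounding` are hypothetical names, and the ties-to-even branch is an assumption taken from the C++17 `std::chrono::round` specification rather than from the cudf sources.

```python
from enum import Enum


class RoundingKind(Enum):
    # Mirrors the three enumerators added in this PR (hypothetical Python stand-in).
    CEIL = "ceil"
    FLOOR = "floor"
    ROUND = "round"


def apply_rounding(kind, ticks, step):
    """Snap an integer tick count to a multiple of step according to kind."""
    q, r = divmod(ticks, step)  # floor division: q * step <= ticks, 0 <= r < step
    if kind is RoundingKind.FLOOR:
        return q * step
    if kind is RoundingKind.CEIL:
        return (q + 1) * step if r else q * step
    if kind is RoundingKind.ROUND:
        # Nearest multiple; an exact tie goes to the even multiple.
        if 2 * r > step or (2 * r == step and q % 2 == 1):
            q += 1
        return q * step
    raise ValueError(f"Unsupported rounding kind: {kind!r}")
```

With `step = 10`, the value `-1` floors to `-10`, while both `CEIL` and `ROUND` give `0`. Flooring the quotient (rather than truncating toward zero) is what keeps pre-epoch values consistent.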
72 changes: 72 additions & 0 deletions cpp/tests/datetime/datetime_ops_test.cpp
@@ -914,4 +914,76 @@ TYPED_TEST(TypedDatetimeOpsTest, TestFloorDatetime)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*floor_millisecond(input), expected_nanosecond);
}

TYPED_TEST(TypedDatetimeOpsTest, TestRoundDatetime)
{
using T = TypeParam;
using namespace cudf::test;
using namespace cudf::datetime;
using namespace cuda::std::chrono;

auto start = milliseconds(-2500000000000); // Sat, 11 Oct 1890 19:33:20 GMT
auto stop = milliseconds(2500000000000); // Mon, 22 Mar 2049 04:26:40 GMT

auto input = generate_timestamps<T>(this->size(), time_point_ms(start), time_point_ms(stop));

auto host_val = to_host<T>(input);
auto timestamps = host_val.first;

std::vector<T> rounded_day(timestamps.size());
std::transform(timestamps.begin(), timestamps.end(), rounded_day.begin(), [](auto i) {
return time_point_cast<typename T::duration>(round<days>(i));
});
auto expected_day = fixed_width_column_wrapper<T, typename T::duration::rep>(rounded_day.begin(),
rounded_day.end());
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*round_day(input), expected_day);

std::vector<T> rounded_hour(timestamps.size());
std::transform(timestamps.begin(), timestamps.end(), rounded_hour.begin(), [](auto i) {
return time_point_cast<typename T::duration>(round<hours>(i));
});
auto expected_hour = fixed_width_column_wrapper<T, typename T::duration::rep>(
rounded_hour.begin(), rounded_hour.end());
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*round_hour(input), expected_hour);

std::vector<T> rounded_minute(timestamps.size());
std::transform(timestamps.begin(), timestamps.end(), rounded_minute.begin(), [](auto i) {
return time_point_cast<typename T::duration>(round<minutes>(i));
});
auto expected_minute = fixed_width_column_wrapper<T, typename T::duration::rep>(
rounded_minute.begin(), rounded_minute.end());
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*round_minute(input), expected_minute);

std::vector<T> rounded_second(timestamps.size());
std::transform(timestamps.begin(), timestamps.end(), rounded_second.begin(), [](auto i) {
return time_point_cast<typename T::duration>(round<seconds>(i));
});
auto expected_second = fixed_width_column_wrapper<T, typename T::duration::rep>(
rounded_second.begin(), rounded_second.end());
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*round_second(input), expected_second);

std::vector<T> rounded_millisecond(timestamps.size());
std::transform(timestamps.begin(), timestamps.end(), rounded_millisecond.begin(), [](auto i) {
return time_point_cast<typename T::duration>(round<milliseconds>(i));
});
auto expected_millisecond = fixed_width_column_wrapper<T, typename T::duration::rep>(
rounded_millisecond.begin(), rounded_millisecond.end());
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*round_millisecond(input), expected_millisecond);

std::vector<T> rounded_microsecond(timestamps.size());
std::transform(timestamps.begin(), timestamps.end(), rounded_microsecond.begin(), [](auto i) {
return time_point_cast<typename T::duration>(round<microseconds>(i));
});
auto expected_microsecond = fixed_width_column_wrapper<T, typename T::duration::rep>(
rounded_microsecond.begin(), rounded_microsecond.end());
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*round_microsecond(input), expected_microsecond);

std::vector<T> rounded_nanosecond(timestamps.size());
std::transform(timestamps.begin(), timestamps.end(), rounded_nanosecond.begin(), [](auto i) {
return time_point_cast<typename T::duration>(round<nanoseconds>(i));
});
auto expected_nanosecond = fixed_width_column_wrapper<T, typename T::duration::rep>(
rounded_nanosecond.begin(), rounded_nanosecond.end());
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*round_nanosecond(input), expected_nanosecond);
}

CUDF_TEST_PROGRAM_MAIN()
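The test above follows a common pattern: compute the expected values with an independent host-side implementation, then compare them with the library result over a range that straddles the epoch. A rough Python analogue of that pattern is sketched here. It checks the two date comments on the range bounds and compares an integer-only rounding routine against `fractions.Fraction`, whose `round()` also resolves ties to even. All function names are hypothetical, and the sample spacing is an arbitrary choice rather than what `generate_timestamps` does.

```python
from datetime import datetime, timedelta, timezone
from fractions import Fraction

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms):
    """Convert integer milliseconds since the Unix epoch to an aware datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def round_half_even(ticks, step):
    """Integer-only rounding to the nearest multiple of step, ties to even."""
    q, r = divmod(ticks, step)
    if 2 * r > step or (2 * r == step and q % 2 == 1):
        q += 1
    return q * step


def reference_round(ticks, step):
    """Independent reference: round() on a Fraction also sends ties to even."""
    return round(Fraction(ticks, step)) * step


START_MS = -2_500_000_000_000  # same bounds as the C++ test
STOP_MS = 2_500_000_000_000
MS_PER_DAY = 86_400_000

# Evenly spaced samples across the range, plus exact half-day ties on both
# sides of the epoch, where a truncating implementation would go wrong.
samples = list(range(START_MS, STOP_MS + 1, (STOP_MS - START_MS) // 100))
samples += [-3 * MS_PER_DAY // 2, -MS_PER_DAY // 2, MS_PER_DAY // 2, 3 * MS_PER_DAY // 2]

mismatches = [
    t for t in samples
    if round_half_even(t, MS_PER_DAY) != reference_round(t, MS_PER_DAY)
]
```

An empty `mismatches` list plays the role of `CUDF_TEST_EXPECT_COLUMNS_EQUAL` in this sketch.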
1 change: 1 addition & 0 deletions docs/cudf/source/api_docs/series.rst
@@ -306,6 +306,7 @@ Datetime methods
isocalendar
ceil
floor
round


Timedelta properties
13 changes: 13 additions & 0 deletions python/cudf/cudf/_lib/cpp/datetime.pxd
@@ -39,6 +39,19 @@ cdef extern from "cudf/datetime.hpp" namespace "cudf::datetime" nogil:
    cdef unique_ptr[column] floor_nanosecond(
        const column_view& column
    ) except +
    cdef unique_ptr[column] round_day(const column_view& column) except +
    cdef unique_ptr[column] round_hour(const column_view& column) except +
    cdef unique_ptr[column] round_minute(const column_view& column) except +
    cdef unique_ptr[column] round_second(const column_view& column) except +
    cdef unique_ptr[column] round_millisecond(
        const column_view& column
    ) except +
    cdef unique_ptr[column] round_microsecond(
        const column_view& column
    ) except +
    cdef unique_ptr[column] round_nanosecond(
        const column_view& column
    ) except +
    cdef unique_ptr[column] add_calendrical_months(
        const column_view& timestamps,
        const column_view& months
27 changes: 27 additions & 0 deletions python/cudf/cudf/_lib/datetime.pyx
@@ -116,6 +116,33 @@ def floor_datetime(Column col, object field):
    return result


def round_datetime(Column col, object field):
    cdef unique_ptr[column] c_result
    cdef column_view col_view = col.view()

    with nogil:
        # https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.resolution_string.html
        if field == "D":
            c_result = move(libcudf_datetime.round_day(col_view))
        elif field == "H":
            c_result = move(libcudf_datetime.round_hour(col_view))
        elif field == "T" or field == "min":
            c_result = move(libcudf_datetime.round_minute(col_view))
        elif field == "S":
            c_result = move(libcudf_datetime.round_second(col_view))
        elif field == "L" or field == "ms":
            c_result = move(libcudf_datetime.round_millisecond(col_view))
        elif field == "U" or field == "us":
            c_result = move(libcudf_datetime.round_microsecond(col_view))
        elif field == "N":
            c_result = move(libcudf_datetime.round_nanosecond(col_view))
        else:
            raise ValueError(f"Invalid resolution: '{field}'")

    result = Column.from_unique_ptr(move(c_result))
    return result


def is_leap_year(Column col):
    """Returns a boolean indicator whether the year of the date is a leap year
    """
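The `if`/`elif` chain in `round_datetime` maps a pandas-style resolution alias to one libcudf entry point and rejects everything else. The same mapping can be written as a lookup table, sketched below in plain Python. This is an illustrative alternative, not what the PR does: `_ALIAS_TO_UNIT` and `resolve_unit` are hypothetical names, and the returned strings merely stand in for the `round_<unit>` functions.

```python
# Hypothetical table form of the alias dispatch in round_datetime.
# The aliases and the error message are copied from the diff above.
_ALIAS_TO_UNIT = {
    "D": "day",
    "H": "hour",
    "T": "minute",
    "min": "minute",
    "S": "second",
    "L": "millisecond",
    "ms": "millisecond",
    "U": "microsecond",
    "us": "microsecond",
    "N": "nanosecond",
}


def resolve_unit(field):
    """Return the unit name for a resolution alias, or raise ValueError."""
    try:
        return _ALIAS_TO_UNIT[field]
    except KeyError:
        raise ValueError(f"Invalid resolution: '{field}'") from None
```

A table keeps the accepted aliases in one place, which makes it harder for the ceil, floor and round paths to drift apart. Whether that is worth the extra indirection in a Cython `nogil` context is a separate question that this sketch does not address.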
3 changes: 3 additions & 0 deletions python/cudf/cudf/core/column/datetime.py
@@ -228,6 +228,9 @@ def ceil(self, freq: str) -> ColumnBase:
    def floor(self, freq: str) -> ColumnBase:
        return libcudf.datetime.floor_datetime(self, freq)

    def round(self, freq: str) -> ColumnBase:
        return libcudf.datetime.round_datetime(self, freq)

    def normalize_binop_value(self, other: DatetimeLikeScalar) -> ScalarLike:
        if isinstance(other, cudf.Scalar):
            return other