Add isocalendar API support #9169

Merged Sep 30, 2021 (64 commits)

Changes from 21 commits

Commits
c8fe494
created python isocalendar method, added changes in python for new da…
marlenezw Sep 2, 2021
34b3efb
fixing style issues.
marlenezw Sep 2, 2021
48eefad
removing astype and adding more tests.
marlenezw Sep 7, 2021
2d63285
fixing style issues.
marlenezw Sep 7, 2021
e520eb6
Update python/cudf/cudf/core/index.py
marlenezw Sep 17, 2021
40440dd
Update python/cudf/cudf/core/index.py
marlenezw Sep 17, 2021
df15c72
Update python/cudf/cudf/core/index.py
marlenezw Sep 17, 2021
5a8d4b8
Update python/cudf/cudf/core/index.py
marlenezw Sep 17, 2021
3fc8513
Update python/cudf/cudf/core/series.py
marlenezw Sep 17, 2021
1f4e088
Update python/cudf/cudf/core/series.py
marlenezw Sep 17, 2021
0f07f84
Update python/cudf/cudf/core/series.py
marlenezw Sep 17, 2021
cde53f2
Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into …
marlenezw Sep 20, 2021
d533c6d
Merge branch 'isocalendar' of https://github.com/marlenezw/cudf into …
marlenezw Sep 20, 2021
4c16cb8
using as_column instead of to_cudf.
marlenezw Sep 20, 2021
f1b7153
moving logic for iso to one place.
marlenezw Sep 20, 2021
576fe1d
fixing style issues.
marlenezw Sep 20, 2021
edce399
Merge branch 'branch-21.10' of https://github.com/rapidsai/cudf into …
marlenezw Sep 20, 2021
f73570d
switching to branch-21.10
marlenezw Sep 20, 2021
30da937
Merge branch 'branch-21.10' of https://github.com/rapidsai/cudf into …
marlenezw Sep 21, 2021
769e246
fixing errors and removing extra branching
marlenezw Sep 21, 2021
de3b738
maintaining index values.
marlenezw Sep 22, 2021
c1c3429
Update python/cudf/cudf/core/index.py
marlenezw Sep 22, 2021
1934ab2
Update python/cudf/cudf/core/series.py
marlenezw Sep 22, 2021
83f0d98
Update python/cudf/cudf/core/tools/datetimes.py
marlenezw Sep 22, 2021
f2ff86a
catching error if not an index or series.
marlenezw Sep 22, 2021
ddf4bed
Merge branch 'isocalendar' of https://github.com/marlenezw/cudf into …
marlenezw Sep 22, 2021
bc1e41e
fixing style issues.
marlenezw Sep 22, 2021
b0c320b
adding format params in test_datetime_strftime.
marlenezw Sep 22, 2021
35cac3e
adding cython code for a,A,b,B.
marlenezw Sep 23, 2021
513b015
Merge branch 'branch-21.10' of https://github.com/rapidsai/cudf into …
marlenezw Sep 23, 2021
995a3d4
Update python/cudf/cudf/_lib/string_casting.pyx
marlenezw Sep 23, 2021
87e94d4
fixing mypy issue,making to_iso internal,removing dead code.
marlenezw Sep 23, 2021
f0784d1
Merge branch 'isocalendar' of https://github.com/marlenezw/cudf into …
marlenezw Sep 23, 2021
121b45b
linting.
marlenezw Sep 23, 2021
1320cef
Update python/cudf/cudf/core/index.py
marlenezw Sep 23, 2021
3b611c9
adding new tests.
marlenezw Sep 23, 2021
c3f6196
Merge branch 'isocalendar' of https://github.com/marlenezw/cudf into …
marlenezw Sep 23, 2021
a85231b
doc string changes and more params for tests.
marlenezw Sep 23, 2021
c6d507e
Apply suggestions from code review
galipremsagar Sep 23, 2021
90c54f0
isort
galipremsagar Sep 23, 2021
3796396
Merge branch 'isocalendar' of https://github.com/marlenezw/cudf into …
galipremsagar Sep 23, 2021
f66b4d9
fix docs
galipremsagar Sep 23, 2021
9f01f2c
Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into …
marlenezw Sep 27, 2021
388cfec
fixing cython issues.
marlenezw Sep 27, 2021
cf8163b
Merge branch 'isocalendar' of https://github.com/marlenezw/cudf into …
marlenezw Sep 27, 2021
2278e70
Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into …
marlenezw Sep 27, 2021
8a5d2eb
fixing merge branch switching issues.
marlenezw Sep 27, 2021
5a7b4ae
Merge branch 'isocalendar' of https://github.com/marlenezw/cudf into …
marlenezw Sep 27, 2021
3ac1658
fixing branch switching errors.
marlenezw Sep 27, 2021
6413169
fixing more merge conflicts.
marlenezw Sep 27, 2021
9311326
more merge conflict changes.
marlenezw Sep 27, 2021
086723d
changes to style for branch-21.12
marlenezw Sep 27, 2021
cfeea6a
more style changes.
marlenezw Sep 27, 2021
26d22b5
more style changes.
marlenezw Sep 27, 2021
20ce42a
more style changes.
marlenezw Sep 27, 2021
31bbe49
running pre-commit
marlenezw Sep 27, 2021
978ef32
Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into …
marlenezw Sep 28, 2021
d3ffad7
string format to cython.
marlenezw Sep 28, 2021
f235c6b
Update .pre-commit-config.yaml
marlenezw Sep 28, 2021
19821df
Update __init__.py
marlenezw Sep 28, 2021
8b9bebf
Update __init__.py
marlenezw Sep 28, 2021
411a790
Update __init__.py
marlenezw Sep 28, 2021
67b8a98
Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into …
marlenezw Sep 29, 2021
0d25c18
Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into …
marlenezw Sep 29, 2021
34 changes: 34 additions & 0 deletions python/cudf/cudf/core/index.py
@@ -1616,6 +1616,40 @@ def quarter(self):
         res = extract_quarter(self._values)
         return Int8Index(res, dtype="int8")

+    def isocalendar(self):
+        """
+        Returns a DataFrame with the year, week, and day
+        calculated according to the ISO 8601 standard.
+        Returns
+        -------
+        DataFrame
+            with columns year, week and day
+        Examples
+        --------
+
+        >>> gIndex = cudf.DatetimeIndex(["2020-05-31 08:00:00",
+        ...                              "1999-12-31 18:40:00"])
+        >>> gIndex.isocalendar()
+                             year  week  day
+        2020-05-31 08:00:00  2020    22    7
+        1999-12-31 18:40:00  1999    52    5
+        """
+        indexSeries = cudf.core.tools.datetimes.to_iso_calendar(self)
+
+        @property
+        def day(self):
+            return indexSeries["day"]
+
+        @property
+        def week(self):
+            return indexSeries["week"]
+
+        @property
+        def year(self):
+            return indexSeries["year"]
+
+        return indexSeries
+
     def to_pandas(self):
         nanos = self._values.astype("datetime64[ns]")
         return pd.DatetimeIndex(nanos.to_pandas(), name=self.name)
71 changes: 57 additions & 14 deletions python/cudf/cudf/core/series.py
@@ -5503,6 +5503,61 @@ def quarter(self):
             {None: res}, index=self.series._index, name=self.series.name,
         )

+    def isocalendar(self):
+        """
+        Returns a DataFrame with the year, week, and day
+        calculated according to the ISO 8601 standard.
+        Returns
+        -------
+        DataFrame
+            with columns year, week and day
+        Examples
+        --------
+        >>> ser = cudf.Series(pd.date_range(start="2021-07-25",
+        ...     end="2021-07-30"))
+        >>> ser.dt.isocalendar()
+           year  week  day
+        0  2021    29    7
+        1  2021    30    1
+        2  2021    30    2
+        3  2021    30    3
+        4  2021    30    4
+        5  2021    30    5
+        >>> ser.dt.isocalendar().week
+        0    29
+        1    30
+        2    30
+        3    30
+        4    30
+        5    30
+        Name: week, dtype: object
+
+        >>> serIndex = cudf.to_datetime(pd.Series(["2010-01-01", pd.NaT]))
+        >>> serIndex.dt.isocalendar()
+           year  week   day
+        0  2009    53     5
+        1  <NA>  <NA>  <NA>
+        >>> serIndex.dt.isocalendar().year
+        0    2009
+        1    <NA>
+        Name: year, dtype: object
+        """
+        isoSeries = cudf.core.tools.datetimes.to_iso_calendar(self)
+
+        @property
+        def day(self):
+            return isoSeries["day"]
+
+        @property
+        def week(self):
+            return isoSeries["week"]
+
+        @property
+        def year(self):
+            return isoSeries["year"]
+
+        return isoSeries
+
     @property
     def is_month_start(self):
         """
@@ -5806,9 +5861,8 @@ def strftime(self, date_format, *args, **kwargs):
         Notes
         -----

-        The following date format identifiers are not yet supported: ``%a``,
-        ``%A``, ``%w``, ``%b``, ``%B``, ``%U``, ``%W``, ``%c``, ``%x``,
-        ``%X``, ``%G``, ``%u``, ``%V``
+        The following date format identifiers are not yet
+        supported: ``%c``, ``%x``,``%X``

         Examples
         --------
@@ -5847,19 +5901,9 @@ def strftime(self, date_format, *args, **kwargs):
         # once https://github.com/rapidsai/cudf/issues/5991
         # is implemented
         not_implemented_formats = {
-            "%a",
-            "%A",
-            "%w",
-            "%b",
-            "%B",
-            "%U",
-            "%W",
             "%c",
             "%x",
             "%X",
-            "%G",
-            "%u",
-            "%V",
         }
         for d_format in not_implemented_formats:
             if d_format in date_format:
@@ -5869,7 +5913,6 @@ def strftime(self, date_format, *args, **kwargs):
                     f"https://github.com/rapidsai/cudf/issues/5991 "
                     f"for tracking purposes."
                 )
-
         str_col = self.series._column.as_string_column(
             dtype="str", format=date_format
         )
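
After this change only `%c`, `%x` and `%X` are still rejected by `strftime`. A minimal standalone sketch of that guard follows. The function name `check_strftime_format` is hypothetical and the error message is abbreviated; it illustrates the loop in the hunk above and is not the cudf code itself.

```python
# Directives still rejected after this PR, per the diff above.
NOT_IMPLEMENTED_FORMATS = {"%c", "%x", "%X"}


def check_strftime_format(date_format: str) -> None:
    """Raise NotImplementedError if an unsupported directive is present.

    Hypothetical helper for illustration only.
    """
    for d_format in sorted(NOT_IMPLEMENTED_FORMATS):
        # Plain substring test, mirroring the loop in the diff.
        if d_format in date_format:
            raise NotImplementedError(
                f"{d_format} date-time format is not supported yet"
            )


# The ISO directives newly allowed by this PR pass the check.
check_strftime_format("%G-W%V-%u")

# The locale-dependent directives are still rejected.
try:
    check_strftime_format("%Y %c")
except NotImplementedError as err:
    print(err)
```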
28 changes: 26 additions & 2 deletions python/cudf/cudf/core/tools/datetimes.py
@@ -4,6 +4,7 @@
 from typing import Sequence, Union

 import numpy as np
+import pandas as pd
 from pandas.core.tools.datetimes import _unit_map

 import cudf
@@ -221,8 +222,8 @@ def to_datetime(
                 format=format,
             )
             return as_index(col, name=arg.name)
-        elif isinstance(arg, cudf.Series):
-            col = arg._column
+        elif isinstance(arg, (cudf.Series, pd.Series)):
+            col = column.as_column(arg)
             col = _process_col(
                 col=col,
                 unit=unit,
@@ -652,3 +653,26 @@ def _isin_datetimelike(

     res = lhs._obtain_isin_result(rhs)
     return res
+
+
+def to_iso_calendar(self):
+    formats = ["%G", "%V", "%u"]
+    if isinstance(
+        self,
+        (
+            cudf.core.series.DatetimeProperties,
+            pd.core.indexes.accessors.DatetimeProperties,
+        ),
+    ):
+        iso_params = [self.strftime(fmt) for fmt in formats]
+        index = None
+    elif isinstance(self, (cudf.Index, pd.Index)):
+        iso_params = [
+            self._values.as_string_column(self._values.dtype, fmt)
+            for fmt in formats
+        ]
+        index = self._values
+    data = dict(zip(["year", "week", "day"], iso_params))
+    isoSeries = cudf.DataFrame(data, index=index, dtype=np.int32)
+
+    return isoSeries
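
The `to_iso_calendar` helper builds its three columns from the strftime directives `%G` (ISO 8601 year), `%V` (ISO 8601 week number) and `%u` (weekday, Monday = 1), then casts the strings to integers. A minimal CPU-only sketch of the same idea, using the Python standard library instead of cudf, is shown here. The name `iso_fields_via_strftime` is hypothetical, and support for `%G`/`%V`/`%u` in `datetime.strftime` is assumed to be provided by the platform's C library.

```python
from datetime import datetime


def iso_fields_via_strftime(ts: datetime) -> tuple:
    """Return (ISO year, ISO week, ISO weekday) by formatting, then parsing.

    Hypothetical stand-in for the strftime-then-cast approach in the diff;
    it does not use cudf.
    """
    # %G = ISO 8601 year, %V = ISO 8601 week, %u = weekday (Mon=1 .. Sun=7)
    return tuple(int(ts.strftime(fmt)) for fmt in ("%G", "%V", "%u"))


# Dates taken from the docstring examples in this PR.
samples = [
    datetime(2020, 5, 31, 8, 0, 0),
    datetime(1999, 12, 31, 18, 40, 0),
    datetime(2010, 1, 1),
]

for ts in samples:
    # The formatted fields should agree with the built-in ISO calendar.
    assert iso_fields_via_strftime(ts) == tuple(ts.isocalendar())
    print(ts, iso_fields_via_strftime(ts))
```

2010-01-01 falls in ISO week 53 of 2009, which is why the `year` column can differ from the calendar year of the timestamp.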
70 changes: 52 additions & 18 deletions python/cudf/cudf/tests/test_datetime.py
@@ -1143,24 +1143,7 @@ def test_datetime_strftime(data, dtype, date_format):
     assert_eq(expected, actual)


-@pytest.mark.parametrize(
-    "date_format",
-    [
-        "%a",
-        "%A",
-        "%w",
-        "%b",
-        "%B",
-        "%U",
-        "%W",
-        "%c",
-        "%x",
-        "%X",
-        "%G",
-        "%u",
-        "%V",
-    ],
-)
+@pytest.mark.parametrize("date_format", ["%c", "%x", "%X"])
 def test_datetime_strftime_not_implemented_formats(date_format):
     gsr = cudf.Series([1, 2, 3], dtype="datetime64[ms]")

@@ -1334,6 +1317,57 @@ def test_quarter():
     assert_eq(expect2.values, got2.values, check_dtype=False)


+def test_isocalendar():
Can you parameterize the test with Series & Index inputs? There seems to be a lot of duplication of code in this test.

Also, would add a case for Series input along with a custom index.

+    data = [
+        "2020-05-31 08:00:00",
+        "1999-12-31 18:40:00",
+        "2000-12-31 04:00:00",
+        "1900-02-28 07:00:00",
+        "1800-03-14 07:30:00",
+        "2100-03-14 07:30:00",
+        "1970-01-01 00:00:00",
+        "1969-12-31 12:59:00",
+    ]
+    dtype = "datetime64[s]"
+
+    # Series
+    ps = pd.Series(data, dtype=dtype)
+    gs = cudf.from_pandas(ps)
+
+    expect = ps.dt.isocalendar()
+    got = gs.dt.isocalendar()
+
+    assert_eq(expect, got, check_dtype=False)
+
+    # Series day
+    expectday = ps.dt.isocalendar().day
+    gotday = gs.dt.isocalendar().day
+
+    assert_eq(expectday, gotday, check_dtype=False)
+
+    # Series week
+    expectweek = ps.dt.isocalendar().week
+    gotweek = gs.dt.isocalendar().week
+
+    assert_eq(expectweek, gotweek, check_dtype=False)
+
+    # Series year
+    expectyear = ps.dt.isocalendar().year
+    gotyear = gs.dt.isocalendar().year
+
+    assert_eq(expectyear, gotyear, check_dtype=False)
+
+    # DatetimeIndex
+    pIndex = pd.DatetimeIndex(data)
+    gIndex = cudf.from_pandas(pIndex)
+
+    expect2 = pIndex.isocalendar()
+    got2 = gIndex.isocalendar()
+
+    # assert isinstance(got2, cudf.Int8Index)
+    assert_eq(expect2.values, got2.values, check_dtype=False)
+
+
 @pytest.mark.parametrize("dtype", DATETIME_TYPES)
 def test_days_in_months(dtype):
     nrows = 1000