Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add fold support to Timestamp constructor #31563

Merged
merged 137 commits into from
Feb 26, 2020
Merged
Show file tree
Hide file tree
Changes from 92 commits
Commits
Show all changes
137 commits
Select commit Hold shift + click to select a range
84bfff2
TST: add basic test for construction with fold
AlexKirko Jan 30, 2020
ba7fcd5
ENH: add basic fold support
AlexKirko Jan 30, 2020
0b6f894
ENH: add fold to ts properties
AlexKirko Jan 30, 2020
5c58b3a
baseline fold in conversion
AlexKirko Jan 30, 2020
546789a
add fold placeholder to conversion
AlexKirko Jan 30, 2020
fc69bbb
add fold to convert_to_tsobject
AlexKirko Jan 30, 2020
57d42b3
TST: add test to infer fold from value
AlexKirko Jan 30, 2020
f2ad196
ENH: infer fold from value for dateutil
AlexKirko Jan 30, 2020
935a3ec
ENH: infer fold for pytz (loss in performance)
AlexKirko Jan 30, 2020
d35af8f
PERF: remove unnecessary ifs
AlexKirko Jan 30, 2020
e6d4aaa
check that we are not at the left edge of deltas
AlexKirko Jan 30, 2020
fd98b27
ENH: adjust Timestamp.value for fold (if not from datetime or string)
AlexKirko Jan 31, 2020
af86f79
CLN: remove extra spaces
AlexKirko Jan 31, 2020
a6965e9
TST: add test for adjusting value for fold
AlexKirko Jan 31, 2020
4caf9bb
DOC: add to comment for local timezone
AlexKirko Jan 31, 2020
6843ed2
TST: add string case to fold inferring test
AlexKirko Jan 31, 2020
ebbf21f
ENH: infer fold for timestamp from string
AlexKirko Jan 31, 2020
4f43638
TST: add fold value adjustment from string
AlexKirko Jan 31, 2020
6238f9b
ENH: adjust value for fold from string
AlexKirko Jan 31, 2020
3584791
basic from datetime fold support with bugs
AlexKirko Jan 31, 2020
237341b
complete adjust value for fold from datetime
AlexKirko Jan 31, 2020
12b8b4e
use input fold as default
AlexKirko Jan 31, 2020
92e990e
TST: adjust datetime for included fold
AlexKirko Jan 31, 2020
c2189d3
TST: infer fold from fold included in datetime
AlexKirko Jan 31, 2020
9c9c2dd
FIX: fix bool condition for inferring fold for str
AlexKirko Jan 31, 2020
f0bbbcb
remove unnecessary value shift for inferring fold from str
AlexKirko Jan 31, 2020
ca46078
remove unnecessary GH24329 fix
AlexKirko Jan 31, 2020
411b036
pass fold through Timstamp.replace
AlexKirko Feb 1, 2020
464dadf
TST:fix ambiguous compatibility test in scalar test_timezones
AlexKirko Feb 1, 2020
2b9f2f6
TST: remove test for ambiguous error near dst boundary
AlexKirko Feb 1, 2020
ef87010
DOC: add fold arg description to functions and methods
AlexKirko Feb 1, 2020
ecfde58
REFACTOR: remove code duplication in localize_tso
AlexKirko Feb 1, 2020
6970ed1
DOC: add fold description to ints_to_pydatetime
AlexKirko Feb 1, 2020
353e554
REFACTOR: make obj.fold assignment process more transparent
AlexKirko Feb 1, 2020
bce8f0d
DOC: clarify comments for fold adjustment and inferring
AlexKirko Feb 1, 2020
558c237
REFACTOR: compact datetime_to_tsobject and refactor checks
AlexKirko Feb 1, 2020
7b88ffd
DOC: clarify comments for datetime_to_tsobject
AlexKirko Feb 1, 2020
9621c0a
add pos shift for adjusting value for fold in localize_tso
AlexKirko Feb 1, 2020
6bf58b5
CLN: remove unnecessary comment
AlexKirko Feb 1, 2020
73364af
try to speed up datetime processing
AlexKirko Feb 2, 2020
a8ad96c
Revert "try to speed up datetime processing" - does not work
AlexKirko Feb 2, 2020
104c97d
CLN: remove unnecessary fold assignment in timestamps
AlexKirko Feb 2, 2020
1f9e810
DOC: add whatsnew entry
AlexKirko Feb 2, 2020
55f0b8a
DOC: fix whatsnew
AlexKirko Feb 2, 2020
5e1be83
Merge branch 'master' into add-fold-to-timestamp
AlexKirko Feb 2, 2020
bda4934
Merge branch 'master' into add-fold-to-timestamp
AlexKirko Feb 3, 2020
1065085
CLN: finish merging
AlexKirko Feb 3, 2020
f9c6956
cut the logic to benchmark function signatures
AlexKirko Feb 3, 2020
62d5d6b
Revert "cut the logic to benchmark function signatures" - return to o…
AlexKirko Feb 3, 2020
6294ee9
PERF: change fold to fold or 0
AlexKirko Feb 3, 2020
f13e3d7
CLN: clean up TODOs
AlexKirko Feb 3, 2020
eade807
CLN: trim comments, add issues to whatsnew
AlexKirko Feb 4, 2020
7269f9a
fold: set Tiemstamp default to 0, reformat description
AlexKirko Feb 4, 2020
d650086
rollback Timestamp fold default to None
AlexKirko Feb 4, 2020
693cb6c
REFACT: move value adjustment for fold to function
AlexKirko Feb 4, 2020
2fe9ce7
REFACT: move inferring fold to a function
AlexKirko Feb 4, 2020
3e2c76c
CLN: remove unnecessary whitespace
AlexKirko Feb 4, 2020
2f4fdda
DOC: trim fold arg description in tslib.pyx
AlexKirko Feb 5, 2020
9f7a16e
REFACT: tighten code in datetime_to_tsobject
AlexKirko Feb 5, 2020
a6d37ea
REFACT: set TSObject default fold to 0
AlexKirko Feb 5, 2020
a017953
CLN: lint multiline if in datetime_to_tsobject
AlexKirko Feb 5, 2020
1d716e7
TST: reparametrize tests
AlexKirko Feb 6, 2020
b47efe0
raise if ts_input.fold and fold do not match
AlexKirko Feb 6, 2020
cf7c091
restart tests
AlexKirko Feb 6, 2020
a49a7e9
Merge branch 'master' into add-fold-to-timestamp
AlexKirko Feb 6, 2020
c39c490
Merge branch 'master' into add-fold-to-timestamp to resolve conflicts
AlexKirko Feb 7, 2020
75e1633
add ambiguous timezone to whatsnew example
AlexKirko Feb 7, 2020
21883be
combine is_utc and is_tzlocal
AlexKirko Feb 7, 2020
b21cb47
remove inner parenthesis
AlexKirko Feb 7, 2020
3ca1fc3
combine if statements in Timestamp constructor
AlexKirko Feb 7, 2020
edde445
CLN: replace "typ == or typ ==" with "typ in [...]"
AlexKirko Feb 10, 2020
d5925af
REFACT: move fold assignment to cdef
AlexKirko Feb 10, 2020
b128cde
DOC: add more note and examples for Timestamp.fold
AlexKirko Feb 10, 2020
a673b65
add valid fold value check to Timestamp constructor
AlexKirko Feb 10, 2020
276fad7
REFACT: move fold default from _TSObject cinit to explicit
AlexKirko Feb 10, 2020
5540de1
DOC: fix typo in error message
AlexKirko Feb 11, 2020
30eef01
TST: fix the rest of typo
AlexKirko Feb 11, 2020
68b05fc
change raise bahvior to allow overriding naive datetime
AlexKirko Feb 11, 2020
d67ec4f
DOC: update localize_tso docstring
AlexKirko Feb 12, 2020
3262085
change override default to raise
AlexKirko Feb 14, 2020
4790b76
finish transition to new behavior
AlexKirko Feb 14, 2020
353bd87
DOC: revert localize_tso docstring
AlexKirko Feb 14, 2020
94e9e65
CLN: remove unnecessary change
AlexKirko Feb 14, 2020
a69833a
restore pytz ambiguous test
AlexKirko Feb 14, 2020
e58ecb9
CLN: lint the tests
AlexKirko Feb 14, 2020
e1ffa8d
set fold in Timestamp.replace after pytz resets it
AlexKirko Feb 14, 2020
c57ec65
fix datetime call in docs
AlexKirko Feb 14, 2020
f3f8690
fix doc error
AlexKirko Feb 14, 2020
476c4a4
for naive datetime, set default to 0
AlexKirko Feb 14, 2020
afaeb88
TST: expand tests
AlexKirko Feb 14, 2020
4a33d36
REFACT: bundle fold in ts_input
AlexKirko Feb 14, 2020
82ed93c
pass fold to func_create candidates in tslib.pyx
AlexKirko Feb 14, 2020
d9aea09
remove dateutil adjustment from convert_datetime_to_tsobject
AlexKirko Feb 14, 2020
d68efb6
move relevant parts of adjust function where called and drop
AlexKirko Feb 14, 2020
9328071
CLN: revert localize_tso docstring notes
AlexKirko Feb 14, 2020
4ecbaf1
REFACT: combine raise conditions in timestamps.pyx
AlexKirko Feb 15, 2020
ee90ac7
CLN: tighten error messages
AlexKirko Feb 15, 2020
25291e4
CLN: black the tests
AlexKirko Feb 15, 2020
08cc256
DOC: expand doc in timeseries.rst and make it a note
AlexKirko Feb 15, 2020
2910720
TST: fix pytz fold conflict test
AlexKirko Feb 15, 2020
f6c11da
DOC: improve formatting in timeseries.rst addition
AlexKirko Feb 15, 2020
6f16ea5
DOC: update whatsnew
AlexKirko Feb 15, 2020
5024452
CLN: fix linting mistakes
AlexKirko Feb 15, 2020
3e49b7a
add versionadded where appropriate
AlexKirko Feb 15, 2020
bcf0905
DOC: add references to PEP 495
AlexKirko Feb 15, 2020
9b614ae
add local timezone support
AlexKirko Feb 15, 2020
467a11e
Merge branch 'master' into add-fold-to-timestamp
AlexKirko Feb 15, 2020
2145b05
Revert "add local timezone support"
AlexKirko Feb 15, 2020
8f82aa1
make fold keyword-only
AlexKirko Feb 17, 2020
d39e811
CLN: prune pytz-specific logic in Timestamp.replace
AlexKirko Feb 17, 2020
3091689
switch from getattr to hasattr in the constructor
AlexKirko Feb 17, 2020
4cfac36
Merge branch 'master' into add-fold-to-timestamp
AlexKirko Feb 17, 2020
e58fe0c
ENH: add fold support
AlexKirko Feb 17, 2020
d166a67
REFACT: initalize _TSObject.fold to False explicitly
AlexKirko Feb 17, 2020
5729eb8
CLN: fix linting
AlexKirko Feb 17, 2020
c9863e1
CLN: remove unnecessary TODOs
AlexKirko Feb 17, 2020
1d72e2d
add comment to __cinit__
AlexKirko Feb 17, 2020
97883ce
try reverting the changes to fix test error
AlexKirko Feb 17, 2020
aa5232b
Revert "try reverting the changes to fix test error"
AlexKirko Feb 17, 2020
0ebbe02
REFACT: no longer set delta by pointer
AlexKirko Feb 17, 2020
c840fd6
fix linting
AlexKirko Feb 17, 2020
a793bb5
DOC: tweak docstrings in tzconversion
AlexKirko Feb 17, 2020
920d52a
Merge branch 'master' into add-fold-to-timestamp
AlexKirko Feb 18, 2020
d732139
statically type trans and deltas
AlexKirko Feb 19, 2020
752acbc
REFACT: use PyDateTime_Check when deciding if to raise
AlexKirko Feb 19, 2020
a1f69cf
DOC: add refs to docs, add blank lines to docstrings
AlexKirko Feb 19, 2020
97dc342
REFACT: merge replace with error raising in timestamp.pyx
AlexKirko Feb 19, 2020
7ac14df
REFACT: move +/- delta logic out of _tzlocal_get_offset_components
AlexKirko Feb 19, 2020
397b2c8
CLN: fix indentation
AlexKirko Feb 19, 2020
757bd41
DOC: anonimize references
AlexKirko Feb 19, 2020
46a279b
DOC: give descriptive names to refs
AlexKirko Feb 19, 2020
81560bb
DOC: make fold a subsection in timeseries.rst
AlexKirko Feb 25, 2020
4256642
TST: add test for invalid fold raise
AlexKirko Feb 25, 2020
a24594a
Merge branch 'master' into add-fold-to-timestamp
AlexKirko Feb 25, 2020
0168aa6
DOC: rephrase text in timeseries.rst to improve readability
AlexKirko Feb 25, 2020
3a605c3
Merge branch 'master' into add-fold-to-timestamp
AlexKirko Feb 26, 2020
cd02318
add fold support to merged code
AlexKirko Feb 26, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions doc/source/user_guide/timeseries.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2220,6 +2220,22 @@ you can use the ``tz_convert`` method.

rng_pytz.tz_convert('US/Eastern')

.. versionadded:: 1.1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a note::

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


For ambiguous times, pandas supports explicitly specifying the fold argument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you mention in this description that we still recommend using tz_localize for more flexibility when handling ambiguous times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Due to daylight saving time, one wall clock time can occur twice when shifting
from summer to winter time; fold describes whether the datetime-like corresponds
to the first (0) or the second time (1) the wall clock hits the ambiguous time.
Fold is supported only for constructing from naive datetime or :class:`Timestamp`
or for constructing from components (see below).

.. ipython:: python

pd.Timestamp(datetime.datetime(2019, 10, 27, 1, 30, 0, 0),
tz='dateutil/Europe/London', fold=0)
pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
tz='dateutil/Europe/London', fold=1)

.. note::

When using ``pytz`` time zones, :class:`DatetimeIndex` will construct a different
Expand Down
20 changes: 20 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,26 @@ For example:
ser["2014"]
ser.loc["May 2015"]

.. _whatsnew_110.timestamp_fold_support:

Fold argument support in Timestamp constructor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Timestamp: now supports the fold argument according to PEP 495 similar to parent `pydatetime` class. It supports both accepting fold as an initialization argument and inferring fold from other constructor arguments (:issue:`25057`, :issue:`31338`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can point to the doc-reference that you added in timeseries.rst (leave the example here as well)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


For example:

.. ipython:: python

ts = pd.Timestamp("2019-10-27 01:30:00+00:00")
ts.fold

.. ipython:: python

ts = pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
tz="dateutil/Europe/London", fold=1)
ts

.. _whatsnew_110.enhancements.other:

Other enhancements
Expand Down
30 changes: 18 additions & 12 deletions pandas/_libs/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -49,30 +49,31 @@ from pandas._libs.tslibs.tzconversion cimport (

cdef inline object create_datetime_from_ts(
int64_t value, npy_datetimestruct dts,
object tz, object freq):
object tz, object freq, bint fold):
""" convenience routine to construct a datetime.datetime from its parts """
return datetime(dts.year, dts.month, dts.day, dts.hour,
jreback marked this conversation as resolved.
Show resolved Hide resolved
jreback marked this conversation as resolved.
Show resolved Hide resolved
dts.min, dts.sec, dts.us, tz)
dts.min, dts.sec, dts.us, tz, fold=fold)


cdef inline object create_date_from_ts(
int64_t value, npy_datetimestruct dts,
object tz, object freq):
object tz, object freq, bint fold):
jreback marked this conversation as resolved.
Show resolved Hide resolved
""" convenience routine to construct a datetime.date from its parts """
# GH 25057 add fold argument to match other func_create signatures
return date(dts.year, dts.month, dts.day)


cdef inline object create_time_from_ts(
int64_t value, npy_datetimestruct dts,
object tz, object freq):
object tz, object freq, bint fold):
jreback marked this conversation as resolved.
Show resolved Hide resolved
""" convenience routine to construct a datetime.time from its parts """
return time(dts.hour, dts.min, dts.sec, dts.us, tz)
return time(dts.hour, dts.min, dts.sec, dts.us, tz, fold=fold)


@cython.wraparound(False)
@cython.boundscheck(False)
def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
str box="datetime"):
bint fold=0, str box="datetime"):
"""
Convert an i8 repr to an ndarray of datetimes, date, time or Timestamp

Expand All @@ -83,6 +84,11 @@ def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
convert to this timezone
freq : str/Offset, default None
freq to convert
fold : bint, default is 0
jreback marked this conversation as resolved.
Show resolved Hide resolved
Due to daylight saving time, one wall clock time can occur twice
when shifting from summer to winter time; fold describes whether the
datetime-like corresponds to the first (0) or the second time (1)
the wall clock hits the ambiguous time
box : {'datetime', 'timestamp', 'date', 'time'}, default 'datetime'
If datetime, convert to datetime.datetime
If date, convert to datetime.date
Expand All @@ -104,7 +110,7 @@ def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
str typ
int64_t value, delta, local_value
ndarray[object] result = np.empty(n, dtype=object)
object (*func_create)(int64_t, npy_datetimestruct, object, object)
object (*func_create)(int64_t, npy_datetimestruct, object, object, bint)
jreback marked this conversation as resolved.
Show resolved Hide resolved

if box == "date":
assert (tz is None), "tz should be None when converting to date"
Expand All @@ -129,7 +135,7 @@ def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
result[i] = <object>NaT
else:
dt64_to_dtstruct(value, &dts)
result[i] = func_create(value, dts, tz, freq)
result[i] = func_create(value, dts, tz, freq, fold)
elif is_tzlocal(tz):
for i in range(n):
value = arr[i]
Expand All @@ -141,7 +147,7 @@ def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
# using the i8 representation.
local_value = tz_convert_utc_to_tzlocal(value, tz)
dt64_to_dtstruct(local_value, &dts)
result[i] = func_create(value, dts, tz, freq)
result[i] = func_create(value, dts, tz, freq, fold)
else:
trans, deltas, typ = get_dst_info(tz)

Expand All @@ -155,7 +161,7 @@ def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
else:
# Adjust datetime64 timestamp, recompute datetimestruct
dt64_to_dtstruct(value + delta, &dts)
result[i] = func_create(value, dts, tz, freq)
result[i] = func_create(value, dts, tz, freq, fold)

elif typ == 'dateutil':
# no zone-name change for dateutil tzs - dst etc
Expand All @@ -168,7 +174,7 @@ def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
# Adjust datetime64 timestamp, recompute datetimestruct
pos = trans.searchsorted(value, side='right') - 1
dt64_to_dtstruct(value + deltas[pos], &dts)
result[i] = func_create(value, dts, tz, freq)
result[i] = func_create(value, dts, tz, freq, fold)
else:
# pytz
for i in range(n):
Expand All @@ -182,7 +188,7 @@ def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
new_tz = tz._tzinfos[tz._transition_info[pos]]

dt64_to_dtstruct(value + deltas[pos], &dts)
result[i] = func_create(value, dts, new_tz, freq)
result[i] = func_create(value, dts, new_tz, freq, fold)
jreback marked this conversation as resolved.
Show resolved Hide resolved

return result

Expand Down
1 change: 1 addition & 0 deletions pandas/_libs/tslibs/conversion.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ cdef class _TSObject:
npy_datetimestruct dts # npy_datetimestruct
int64_t value # numpy dt64
object tzinfo
bint fold


cdef convert_to_tsobject(object ts, object tz, object unit,
Expand Down
132 changes: 123 additions & 9 deletions pandas/_libs/tslibs/conversion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,7 @@ cdef class _TSObject:
# npy_datetimestruct dts # npy_datetimestruct
# int64_t value # numpy dt64
# object tzinfo
# bint fold

@property
def value(self):
Expand Down Expand Up @@ -323,6 +324,7 @@ cdef _TSObject convert_datetime_to_tsobject(datetime ts, object tz,
cdef:
_TSObject obj = _TSObject()

obj.fold = ts.fold
if tz is not None:
tz = maybe_get_tz(tz)

Expand Down Expand Up @@ -355,6 +357,24 @@ cdef _TSObject convert_datetime_to_tsobject(datetime ts, object tz,
obj.value += nanos
obj.dts.ps = nanos * 1000

if tz is not None:
mroeschke marked this conversation as resolved.
Show resolved Hide resolved
AlexKirko marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment before this section, saying this is where we infer / set fold, and point to the standard library documentation for the same, and any other helpful comments related to this discussion (e.g. imagine a future reader wondering what is going on here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is now dead, but there is a similar section when we infer from offset. I've linked to PEP 495 there as the standard library docs don't provide much more information about what fold does than we do in the docstring itself.

if is_utc(tz) or is_tzlocal(tz):
# TODO: think on how we can infer fold for local Timezone
# and adjust value for fold
pass
else:
trans, deltas, typ = get_dst_info(tz)

if typ in ['pytz', 'dateutil']:
pos = trans.searchsorted(obj.value, side='right') - 1
# pytz assumes fold == 1, dateutil fold == 0
# adjust only if necessary
if (typ == 'pytz' and obj.fold == 0 or
typ == 'dateutil' and obj.fold == 1):
pos = _adjust_tsobject_for_fold(obj, trans, deltas, pos,
obj.fold)
obj.fold = _infer_tsobject_fold(obj, trans, deltas, pos)
mroeschke marked this conversation as resolved.
Show resolved Hide resolved

check_dts_bounds(&obj.dts)
check_overflows(obj)
return obj
Expand All @@ -381,6 +401,7 @@ cdef _TSObject create_tsobject_tz_using_offset(npy_datetimestruct dts,
_TSObject obj = _TSObject()
int64_t value # numpy dt64
datetime dt
bint fold

value = dtstruct_to_dt64(&dts)
obj.dts = dts
Expand All @@ -390,10 +411,22 @@ cdef _TSObject create_tsobject_tz_using_offset(npy_datetimestruct dts,
check_overflows(obj)
return obj

# Infer fold from offset-adjusted obj.value
if is_utc(tz) or is_tzlocal(tz):
# TODO: think on how we can infer fold for local Timezone
# and adjust value for fold
pass
else:
trans, deltas, typ = get_dst_info(tz)

if typ in ['pytz', 'dateutil']:
pos = trans.searchsorted(obj.value, side='right') - 1
fold = _infer_tsobject_fold(obj, trans, deltas, pos)

# Keep the converter same as PyDateTime's
dt = datetime(obj.dts.year, obj.dts.month, obj.dts.day,
obj.dts.hour, obj.dts.min, obj.dts.sec,
obj.dts.us, obj.tzinfo)
obj.dts.us, obj.tzinfo, fold=fold)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obj.tzinfo is a pytz.FixedOffset. fold is always 0, so you can remove this and the "infer tsobject fold" logic above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pganssle
The problem is that then you call convert_datetime_to_tsobject, and if you haven't set or inferred dt.fold, then you end up losing the ability to infer from unambiguous string.

This is what happens if we cut this argument and logic:

================================== FAILURES ===================================
_ test_timestamp_constructor_infer_fold_from_value[2019-10-27 01:30:00+00:00-1-Europe/London] _

tz = 'Europe/London', ts_input = '2019-10-27 01:30:00+00:00', fold_out = 1

    @pytest.mark.parametrize("tz", ["dateutil/Europe/London", "Europe/London"])
    @pytest.mark.parametrize(
        "ts_input,fold_out",
        [
            (1572136200000000000, 0),
            (1572139800000000000, 1),
            ("2019-10-27 01:30:00+01:00", 0),
            ("2019-10-27 01:30:00+00:00", 1),
            (datetime(2019, 10, 27, 1, 30, 0, 0, fold=0), 0),
            (datetime(2019, 10, 27, 1, 30, 0, 0, fold=1), 1),
        ],
    )
    def test_timestamp_constructor_infer_fold_from_value(tz, ts_input, fold_out):
        # Test for #25057
        # Check that we infer fold correctly based on timestamps since utc
        # or strings
        ts = pd.Timestamp(ts_input, tz=tz)
        result = ts.fold
        expected = fold_out
>       assert result == expected
E       assert 0 == 1

Copy link
Contributor

@pganssle pganssle Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, pytz does not support fold, so if you remove it from the matrix, you should be OK here.

I think you may be misunderstanding what fold is supposed to be. fold solves a specific problem: there is not enough information contained in a datetime.datetime object without fold to distinguish between ambiguous datetimes. This is because tzinfo is a set of functions that takes the naive portion of the datetime and maps it to information about the offset. This doesn't work when two datetimes are distinguished only by their offset, so an additional bit is added to the naive portion of the datetime to allow time zone providers to uniquely map "naive portion" to offsets.

pytz does not have this problem, because pytz worked around this problem by inventing its own semantics for this. It ends up storing the relevant information in the time zone itself.

You only need to set the value for fold if your time zone provider needs it, and you would never expect pytz to set it. You can throw away all tests that involve pytz using or being affected by fold.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we then default fold=None and if it is passed and we have a pytz zone raise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pganssle
Thank you. This is what I was having trouble with, because I could see that Timestamps based on pytz produced different offsets for different sides of fold.
Now that I understand your explanation, I understand how this is done:

        elif typ in ['pytz', 'dateutil']:
            pos = trans.searchsorted(obj.value, side='right') - 1
            if typ == 'pytz':
                tz = tz._tzinfos[tz._transition_info[pos]]
            dt64_to_dtstruct(obj.value + deltas[pos], &obj.dts)

Basically, we search through tz._tzinfos and manually pick the timezone that looks like what we need. This thing is completely separate from the fold mechanism.

I agree that adjusting or inferring fold for pytz is meaningless and we should just raise if fold is passed with a pytz-like timezone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we then default fold=None and if it is passed and we have a pytz zone raise?

I think I'm fine with raising in this situation, but I will say that this is a further deviation from datetime.datetime's behavior, which may have somewhat unpredictable consequences.

One possible downside is that if pytz ever changes to support fold, you'll have to change the behavior of fold in pandas, and pytz users who want a fold-supporting pytz in pandas will need to pull in the latest pandas.

It's unlikely to happen and I think pandas is already OK with an uncomfortably tight coupling with its dependencies like this, but it is just one potential pitfall.

That said, the permissions ratchet usually goes one way, so maybe it's best to start by raising and then remove it if people come up with ways that it fails in real code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should still be removed, though, right? You're constructing a datetime that always has a pytz.FixedOffset for its tzinfo, so I expect that fold will always be ignored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pganssle
No, it gets set: datetime seems to be very permissive about setting fold. This allows us to pass the fold we've inferred to convert_datetime_to_tsobject through dt, which uses the fold attribute from its datetime input to build _TSObject.

obj = convert_datetime_to_tsobject(
dt, tz, nanos=obj.dts.ps // 1000)
return obj
Expand Down Expand Up @@ -528,7 +561,7 @@ cdef inline void localize_tso(_TSObject obj, tzinfo tz):

Notes
-----
Sets obj.tzinfo inplace, alters obj.dts inplace.
Sets obj.tzinfo inplace, alters obj.dts inplace, alters obj.value inplace.
AlexKirko marked this conversation as resolved.
Show resolved Hide resolved
"""
cdef:
ndarray[int64_t] trans
Expand All @@ -546,6 +579,8 @@ cdef inline void localize_tso(_TSObject obj, tzinfo tz):
elif is_tzlocal(tz):
local_val = _tz_convert_tzlocal_utc(obj.value, tz, to_utc=False)
dt64_to_dtstruct(local_val, &obj.dts)
# TODO: think on how we can infer fold for local Timezone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives you the fold information you need, but unfortunately it appears to be an intermediate state.

I find _tz_convert_tzlocal_utc to be a somewhat strange function, but you can probably refactor it slightly to get what you want by refactoring it as (apologies that I don't know the right syntax offhand for how to do the "return by mutating a reference" idiom that's common in C):

cdef _tzlocal_get_offset_components(int64_t val, tzinfo tz, int64_t* delta, bint* fold):
    ... # Contents of the original function except the return part
    *delta = delta
    if fold != NULL:  # Don't know how to do this in Cython either
        *fold = dt.fold

cdef _int64_t _tzconvert_tzlocal_utc(int64_t val, tzinfo tz, bint to_utc=True):
    cdef int64_t delta
    _tzlocal_get_offset_components(val, &delta, NULL)
    if to_utc:
        return val + delta
    else:
        return val - delta

cdef _int64_t _tzconvert_tzlocal_fromutc(int64_t val, tzinfo tz, bint* fold):
    cdef int64_t delta
    _tzlocal_get_offset_components(val, &delta, &fold)
    return val - delta

Then you just move all these things where you need to know the fold over to using the _tzconvert_tzlocal_fromutc function and you pass &obj.fold as the extra parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pganssle
Thanks. Unfortunately, my fever is getting worse, so I'll have to sleep before I attempt tackling the local timezone case.
I should be okay then. If I disappear for a day or two, I apologize, but it just means I'm too ill to code well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel better!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, feeling much better now. Let's see if we can get this to merging.

# and adjust value for fold
else:
# Adjust datetime64 timestamp, recompute datetimestruct
trans, deltas, typ = get_dst_info(tz)
Expand All @@ -554,15 +589,13 @@ cdef inline void localize_tso(_TSObject obj, tzinfo tz):
# static/fixed tzinfo; in this case we know len(deltas) == 1
# This can come back with `typ` of either "fixed" or None
dt64_to_dtstruct(obj.value + deltas[0], &obj.dts)
elif typ == 'pytz':
# i.e. treat_tz_as_pytz(tz)
pos = trans.searchsorted(obj.value, side='right') - 1
tz = tz._tzinfos[tz._transition_info[pos]]
dt64_to_dtstruct(obj.value + deltas[pos], &obj.dts)
elif typ == 'dateutil':
# i.e. treat_tz_as_dateutil(tz)
elif typ in ['pytz', 'dateutil']:
AlexKirko marked this conversation as resolved.
Show resolved Hide resolved
pos = trans.searchsorted(obj.value, side='right') - 1
if typ == 'pytz':
tz = tz._tzinfos[tz._transition_info[pos]]
dt64_to_dtstruct(obj.value + deltas[pos], &obj.dts)

obj.fold = _infer_tsobject_fold(obj, trans, deltas, pos)
mroeschke marked this conversation as resolved.
Show resolved Hide resolved
else:
# Note: as of 2018-07-17 all tzinfo objects that are _not_
# either pytz or dateutil have is_fixed_offset(tz) == True,
Expand All @@ -572,6 +605,87 @@ cdef inline void localize_tso(_TSObject obj, tzinfo tz):
obj.tzinfo = tz


cdef inline int32_t _adjust_tsobject_for_fold(_TSObject obj, object trans,
object deltas, int32_t pos,
bint fold):
"""
Adjust _TSObject value for fold is possible. Return updated last offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in a few other places, _TSObject.value shouldn't need to be adjusted here, as it is the unambiguous source of truth. You should be using it to determine what the value of fold is, not the other way round.

transition position in the trans list.

Parameters
----------
obj : _TSObject
trans : object
List of offset transition points in nanoseconds since epoch.
deltas : object
List of offsets corresponding to transition points in trans.
pos : int32_t
Position of the last transition point before taking fold into account.
fold : bint
Due to daylight saving time, one wall clock time can occur twice
when shifting from summer to winter time; fold describes whether the
datetime-like corresponds to the first (0) or the second time (1)
the wall clock hits the ambiguous time

Returns
-------
int32_t
Position of the last transition point after taking fold into account.

Notes
-----
Alters obj.value inplace.
"""
if fold == 0:
if pos > 0:
fold_delta = deltas[pos - 1] - deltas[pos]
if obj.value - fold_delta < trans[pos]:
obj.value -= fold_delta
pos -= 1
elif fold == 1:
if pos < len(deltas):
fold_delta = deltas[pos] - deltas[pos + 1]
if obj.value + fold_delta > trans[pos + 1]:
obj.value += fold_delta
pos += 1

return pos


cdef inline bint _infer_tsobject_fold(_TSObject obj, object trans,
object deltas, int32_t pos):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can trans or deltas be typed as e.g. ndarray[int64_t]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we type these in the localizaiton functions i think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, thanks. Typed trans and deltas.

"""
Infer _TSObject fold property from value by assuming 0 and then setting
jreback marked this conversation as resolved.
Show resolved Hide resolved
to 1 if necessary.

Parameters
----------
obj : _TSObject
trans : object
List of offset transition points in nanoseconds since epoch.
deltas : object
List of offsets corresponding to transition points in trans.
pos : int32_t
Position of the last transition point before taking fold into account.

Returns
-------
bint
Due to daylight saving time, one wall clock time can occur twice
when shifting from summer to winter time; fold describes whether the
datetime-like corresponds to the first (0) or the second time (1)
the wall clock hits the ambiguous time
"""
cdef:
bint fold = 0

if pos > 0:
fold_delta = deltas[pos - 1] - deltas[pos]
if obj.value - fold_delta < trans[pos]:
fold = 1

return fold

cdef inline datetime _localize_pydatetime(datetime dt, tzinfo tz):
"""
Take a datetime/Timestamp in UTC and localizes to timezone tz.
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/timestamps.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,4 @@ from pandas._libs.tslibs.np_datetime cimport npy_datetimestruct

cdef object create_timestamp_from_ts(int64_t value,
npy_datetimestruct dts,
object tz, object freq)
object tz, object freq, bint fold)
Loading