ENH: add fold support to Timestamp constructor #31563
Default behaviour near DST boundary is now for fold=0
@jreback
question and some minor comments. @mroeschke and @pganssle ok here?
doc/source/whatsnew/v1.1.0.rst
Outdated
tz="dateutil/Europe/London", fold=1)
ts

For more, see :ref:`Timezone section <timeseries.timezone>` in the user guide on working with timezones.
reference the sub-section here
Done.
doc/source/user_guide/timeseries.rst
Outdated
@@ -2220,6 +2220,32 @@ you can use the ``tz_convert`` method.

rng_pytz.tz_convert('US/Eastern')

.. note::
let's make this a sub-section (e.g. Fold), add a reference (so you can link to it in whatsnew)
should be the same level as working with time zones
Done.
"timezones."
)

if hasattr(ts_input, 'fold'):
is this needed? (e.g. we are passing fold now to all of the construction functions)
It's still needed when we pass a naive datetime, fold, and a timezone.
also pls merge master as some cosmetic changes were done in cython. ping on green.
# Allow fold only for unambiguous input
if fold is not None:
    if fold not in [0, 1]:
        raise ValueError(
Could you add a test that hits this exception?
Done.
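For readers following the thread, a test of the kind requested above might look roughly like this sketch. `check_fold` is a hypothetical stand-in for the validation shown in the diff fragment, not the actual pandas code, and the error message wording is only illustrative.

```python
def check_fold(fold):
    """Hypothetical stand-in for the constructor's fold validation."""
    if fold is not None and fold not in [0, 1]:
        # Message wording is an assumption made for this sketch.
        raise ValueError("fold must be None, 0, or 1")
    return fold


def raises_value_error(value):
    """Return True if check_fold rejects the given value."""
    try:
        check_fold(value)
    except ValueError:
        return True
    return False


# Valid values pass through unchanged.
accepted = [check_fold(v) for v in (None, 0, 1)]

# Values outside {None, 0, 1} should hit the exception.
rejected = [raises_value_error(v) for v in (2, -1, 0.5)]
```

In a real test suite the same check would more likely be written with `pytest.raises` against the constructor itself.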
Sorry I've been out of the loop on this PR.
Looks pretty good from the surface. Just one comment.
Linux 32 bit azure pipeline fails to build for everyone: #32229
UPDATE: outdated, see below
Docs changed (tested ref locally, and everything seems to work), test added for invalid fold values, master merged.
Resolve conflict in timestamps.pyx by accepting the master version.
@jreback Changes you requested made.
thanks @AlexKirko very nice! and thanks to @pganssle and @mroeschke for the lively discussion!
@jreback @pganssle @mroeschke
Sorry I couldn't give this one last once-over, but thanks so much for putting in the hard work to get this right, @AlexKirko. And now there's PEP 495 support in
Great work @AlexKirko! Really happy to have this in.
PERF Note: Current implementation slows down the `Timestamp` constructor. I've tested this thoroughly and tracked it down to ba7fcd5, where I changed the function signatures at the very beginning of working on the PR. The performance overhead appeared before any of the logic was implemented.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Reasoning for changes
We currently don't support fold from PEP 495, and this causes us a lot of grief, including, but not limited to, broken Timestamp representations at the edge of DST shift (see #31338). The values can also break.
Support of the fold attribute helps us deal with that. Now, if the wall clock time occurs twice due to a DST shift, we will know exactly at which point in time we are.
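To illustrate what fold buys us, here is a minimal standard-library sketch, not pandas code. `ToyLondon2019` is a hypothetical tzinfo written only for this example; it models just the 27 October 2019 BST to GMT shift and treats everything before it as summer time, so it is no substitute for a real timezone database.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyLondon2019(tzinfo):
    """Toy PEP 495-aware tzinfo; only models the 27 Oct 2019 shift."""

    # The wall-clock hour 01:00-02:00 on this date occurs twice.
    FOLD_START = datetime(2019, 10, 27, 1, 0)
    FOLD_END = datetime(2019, 10, 27, 2, 0)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.FOLD_START:
            return timedelta(hours=1)  # still on summer time
        if wall < self.FOLD_END:
            # Ambiguous hour: fold=0 is the first pass (BST),
            # fold=1 is the second pass (GMT).
            return timedelta(hours=1) if dt.fold == 0 else timedelta(0)
        return timedelta(0)  # back on winter time

    def dst(self, dt):
        return self.utcoffset(dt)

    def tzname(self, dt):
        return "BST" if self.utcoffset(dt) else "GMT"


tz = ToyLondon2019()
first = datetime(2019, 10, 27, 1, 30, tzinfo=tz, fold=0)
second = datetime(2019, 10, 27, 1, 30, tzinfo=tz, fold=1)

# Same wall-clock reading, two different instants in UTC.
first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)
```

The two values share the wall time 01:30 but map to instants one hour apart, which is the ambiguity that fold resolves.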
This also removes inconsistencies between using `pytz` and `dateutil`: now both the values and representations match for both timezone packages near DST summer/winter shifts.
Scope of PR
Implementing fold into Timestamp can easily get out of hand. Parent `pydatetime` has it easy, as the object doesn't need to sync its underlying epoch time, fold, and the representation, like we do.
This PR is already large, so I propose we limit its scope to minimal consistent fold support:
`datetime` constructor in `create_timestamp_from_ts` so that it gets stored in the object (can't assign it directly as it's read-only).
Things I suggest we leave to discussion and other PRs:
`fold=1` for a datetime that is nowhere near a DST shift. We can raise an Error or a Warning in the future. For now, we check whether `fold` can be 1, and if not, we leave it as default `fold=0`.
`fold == 0` for ambiguous time like Timestamp("2019-10-27 01:30:00", tz='Europe/London'). The error was dropped to mirror the `fold=0` default in pydatetime and to allow the fold attribute to be `bint` in Cython functions.
Note: `pydatetime` doesn't infer fold from value and doesn't raise errors when you assign `fold=1` nowhere near a fold. Example:
Performance problems
This implementation behaves as I expect it to, but it slows down scalar constructor benchmarks by 30-70%, even for tz-naive benchmarks.
Update: since the benchmark slowdown appeared before any of the functionality, this has nothing to do with the logic introduced and all to do with adding `bint fold` to function signatures, apparently. I'm afraid I don't know Cython well enough to research further.
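Coming back to the earlier note that `pydatetime` neither infers nor validates fold against the wall time: this can be checked with the standard library alone. The snippet below is a small sketch using naive datetimes; the date is an arbitrary choice for illustration.

```python
from datetime import datetime

# Noon on 1 June is not a repeated wall-clock time,
# yet the constructor accepts fold=1 and simply stores it.
dt = datetime(2019, 6, 1, 12, 0, fold=1)
stored_fold = dt.fold

# For naive datetimes, fold does not take part in comparisons.
same_wall_time = dt == datetime(2019, 6, 1, 12, 0, fold=0)

# Only the value range is checked: anything other than 0 or 1 is rejected.
try:
    datetime(2019, 6, 1, 12, 0, fold=2)
    rejects_bad_fold = False
except ValueError:
    rejects_bad_fold = True
```

So the only validation `datetime` performs is on the range of the value, not on whether a fold actually exists at that time.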