C++ refactoring: handle datetime and timedelta #1149
@jpivarski - Unrelated to this PR: one thing I'm not sure about is the reducers' default axis. Should it be
Unfortunately, it needs to be
>>> np.sum([[1, 2], [3, 4]])
10
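The NumPy behaviour shown in that one-liner can be checked side by side with an explicit axis. This is a minimal sketch that assumes only NumPy itself; the variable names are illustrative and not taken from the PR.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])

# With no axis argument, NumPy reduces over every dimension at once
# (the same as axis=None) and returns a scalar.
total = np.sum(data)

# An explicit axis reduces only that dimension and keeps the others.
per_row = np.sum(data, axis=-1)
per_column = np.sum(data, axis=0)

print(total)
print(per_row)
print(per_column)
```

A reducer that defaulted to `axis=-1` would return the per-row result where NumPy users expect the scalar.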
This PR looks fine: removing NumpyArray wrappers from intermediate arrays is conceptually simpler.
This doesn't add any tests, though. The PR I took over last Friday had some un-activated tests, and I thought you were planning on activating them (by uncommenting them, I think, or else by removing pytest.mark.skip) and making them work. When I took over the PR, I did not do that. I don't want them to slip through the cracks!
When I get back in half an hour, I can point out which tests those are. This PR can be merged, if it's not going to cover them.
Yes, three tests have been enabled and the other three need more work.
array0 = ak._v2.contents.NumpyArray(
    ["2019-09-02T09:30:00", "2019-09-13T09:30:00", "2019-09-21T20:00:00"]
)
while Awkward 2.0 treats them as an unsupported dtype:
array = ak._v2.highlevel.Array(
    [
        [
            np.datetime64("2020-03-27T10:41:11"),
            np.datetime64("2020-01-27T10:41:11"),
            np.datetime64("2020-05"),
            np.datetime64("2020-01-27T10:41:11"),
            np.datetime64("2020-04-27T10:41:11"),
        ],
        [
            np.datetime64("2020-04-27"),
            np.datetime64("2020-02-27T10:41:11"),
            np.datetime64("2020-01-27T10:41:11"),
            np.datetime64("2020-06-27T10:41:11"),
        ],
        [
            np.datetime64("2020-02-27T10:41:11"),
            np.datetime64("2020-03-27T10:41:11"),
            np.datetime64("2020-01-27T10:41:11"),
        ],
    ]
)
>>> array.layout
<ListOffsetArray len='3'>
<offsets><Index dtype='int64' len='4'>[ 0 5 9 12]</Index></offsets>
<content><UnionArray len='12'>
<tags><Index dtype='int8' len='12'>[0 0 1 0 0 2 0 0 0 0 0 0]</Index></tags>
<index><Index dtype='int64' len='12'>[0 1 0 2 3 0 4 5 6 7 8 9]</Index></index>
<content index='0'>
<NumpyArray dtype='datetime64[s]' len='10'>
['2020-03-27T10:41:11' '2020-01-27T10:41:11'
'2020-01-27T10:41:11' '2020-04-27T10:41:11'
'2020-02-27T10:41:11' '2020-01-27T10:41:11'
'2020-06-27T10:41:11' '2020-02-27T10:41:11'
'2020-03-27T10:41:11' '2020-01-27T10:41:11']
</NumpyArray>
</content>
<content index='1'>
<NumpyArray dtype='datetime64[M]' len='1'>['2020-05']</NumpyArray>
</content>
<content index='2'>
<NumpyArray dtype='datetime64[D]' len='1'>['2020-04-27']</NumpyArray>
</content>
</UnionArray></content>
</ListOffsetArray>
>>>
The connection to NumPy is more direct now: we no longer have to do the
In this case, NumPy interprets the data as strings:
>>> np.array(["2019-09-02T09:30:00", "2019-09-13T09:30:00", "2019-09-21T20:00:00"])
array(['2019-09-02T09:30:00', '2019-09-13T09:30:00',
       '2019-09-21T20:00:00'], dtype='<U19')
>>> np.asarray(["2019-09-02T09:30:00", "2019-09-13T09:30:00", "2019-09-21T20:00:00"])
array(['2019-09-02T09:30:00', '2019-09-13T09:30:00',
       '2019-09-21T20:00:00'], dtype='<U19')
And it should, because making the output type depend on the formatting of the strings would make it less predictable. (Suppose someone didn't know about ISO time formatting and they just wanted strings that look like this.) Even Pandas won't interpret strings as dates unless a specific option is passed.
So this old test was wrong: it has to be given a NumPy array with datetime dtype. Even Python datetime objects don't work:
>>> from datetime import datetime
>>> np.array([datetime.now(), datetime.now(), datetime.now()])
array([datetime.datetime(2021, 11, 15, 9, 0, 13, 584111),
       datetime.datetime(2021, 11, 15, 9, 0, 13, 584115),
       datetime.datetime(2021, 11, 15, 9, 0, 13, 584116)], dtype=object)
We won't (ever) support
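To make the contrast concrete: NumPy only produces a datetime dtype when it is asked for one. The sketch below assumes plain NumPy and illustrative variable names. It shows how a test input could be built with an explicit dtype, and it is not necessarily how the re-enabled tests construct theirs.

```python
import numpy as np
from datetime import datetime

iso_strings = ["2019-09-02T09:30:00", "2019-09-13T09:30:00", "2019-09-21T20:00:00"]

# Without a dtype, ISO-formatted text stays text.
as_text = np.array(iso_strings)

# Requesting a datetime dtype makes NumPy parse the strings.
as_dates = np.array(iso_strings, dtype="datetime64[s]")

# Python datetime objects become an object array by default...
py_datetimes = [datetime(2021, 11, 15, 9, 0, 13), datetime(2021, 11, 15, 9, 0, 14)]
as_objects = np.array(py_datetimes)

# ...and are converted only when the dtype is given explicitly.
converted = np.array(py_datetimes, dtype="datetime64[us]")

print(as_text.dtype, as_dates.dtype, as_objects.dtype, converted.dtype)
```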
This is the first test of
--- a/src/awkward/_v2/contents/unionarray.py
+++ b/src/awkward/_v2/contents/unionarray.py
@@ -921,7 +921,7 @@ class UnionArray(Content):
     def _completely_flatten(self, nplike, options):
         out = []
         for content in self._contents:
-            out.extend(content[: self._length]._completely_flatten(nplike, options))
+            out.extend(content[: len(self._tags)]._completely_flatten(nplike, options))
         return out
 
     def _recursively_apply(
Then you can
>>> array.layout.completely_flatten()
array(['2020-03-27T10:41:11', '2020-01-27T10:41:11',
       '2020-01-27T10:41:11', '2020-04-27T10:41:11',
       '2020-02-27T10:41:11', '2020-01-27T10:41:11',
       '2020-06-27T10:41:11', '2020-02-27T10:41:11',
       '2020-03-27T10:41:11', '2020-01-27T10:41:11',
       '2020-05-01T00:00:00', '2020-04-27T00:00:00'],
      dtype='datetime64[s]')
and work with the single NumPy array that results.
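The month-unit and day-unit values in that output appear as full timestamps because NumPy promotes mixed datetime units to the finest one. The sketch below reproduces the effect with plain NumPy. It stands in for the flattened UnionArray contents and does not call any Awkward API, and the three small arrays are illustrative.

```python
import numpy as np

# Three pieces with different units, like the three UnionArray contents.
seconds = np.array(["2020-03-27T10:41:11", "2020-01-27T10:41:11"], dtype="datetime64[s]")
months = np.array(["2020-05"], dtype="datetime64[M]")
days = np.array(["2020-04-27"], dtype="datetime64[D]")

# Concatenation promotes everything to the finest unit (seconds here).
flat = np.concatenate([seconds, months, days])

# Ordinary NumPy reducers then apply to the single flat array.
earliest = flat.min()
latest = flat.max()

print(flat.dtype)
print(earliest, latest)
```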
I don't think
@jpivarski - thanks! I've enabled all tests and started on
If it's easy enough to separate into another PR, then let's do it. (The only reason I sometimes don't is because there's so much to do and waiting for a prerequisite to finish testing and merge before starting the next would slow down development. This may be the end of the day for you; if you're planning to start working on
Is this one ready to merge as-is?
I've scanned it again and it looks good to me!
Yes, I think so. I'll open a new PR tomorrow with
- test reducers on datetime and timedelta at axis None
- sort and argsort datetime and timedelta types
- unique and is_unique for datetime and timedelta types
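For reference, the NumPy operations that those follow-up items correspond to already accept datetime and timedelta data. The sketch below uses NumPy only, as a possible baseline for comparison; it does not use the Awkward functions themselves, and the uniqueness check by length is an illustrative stand-in rather than the library's `is_unique`.

```python
import numpy as np

stamps = np.array(
    [
        "2020-03-27T10:41:11",
        "2020-01-27T10:41:11",
        "2020-04-27T10:41:11",
        "2020-01-27T10:41:11",
    ],
    dtype="datetime64[s]",
)

# Sorting and argsorting work on datetime64 directly.
ordered = np.sort(stamps)
order = np.argsort(stamps, kind="stable")

# Distinct values, and a simple uniqueness check based on their count.
distinct = np.unique(stamps)
all_distinct = len(distinct) == len(stamps)

# Differences of datetimes are timedelta64, which reducers also accept.
gaps = np.diff(ordered)
longest_gap = gaps.max()

print(order, all_distinct, longest_gap)
```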