-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement repeat() for Series, Index, and MultiIndex. #1328
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1328 +/- ##
==========================================
+ Coverage 95.23% 95.25% +0.01%
==========================================
Files 34 34
Lines 7373 7401 +28
==========================================
+ Hits 7022 7050 +28
Misses 351 351
Continue to review full report at Codecov.
|
|
||
sdf = self._internal.sdf.select(self._internal.scol) | ||
internal = _InternalFrame( | ||
sdf=sdf, index_map=[(sdf.columns[0], self._internal.index_names[0])] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just curious about this, why just don't use self._internal.index_map
here rather create index_map
manually? 🤔 (Maybe do they work not same way ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because self._internal.scol
could have a different name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ueshin Ah.. I think found the case. Thanks!! :)
>>> self = ks.Series(['a', 'b', 'c'], index=[1, 2, 3])
>>> sdf = self._internal.sdf.select(self._internal.scol)
>>> self._internal.index_map
[('__index_level_0__', None)]
>>> [(sdf.columns[0], self._internal.index_names[0])]
[('0', None)]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that's a Series
case.
The example for Index
is like:
>>> import databricks.koalas as ks
>>> kidx = ks.Index([1,2,3])
>>> kidx1 = (kidx + 1)
>>> kidx1._internal.index_scols, kidx1._internal.scol
([Column<b'__index_level_0__'>], Column<b'(__index_level_0__ + 1)'>)
>>> kidx1._internal.sdf.select(kidx1._internal.index_scols + [kidx1._internal.scol])
DataFrame[__index_level_0__: bigint, (__index_level_0__ + 1): bigint]
>>> kidx1._internal.sdf.select(kidx1._internal.index_scols + [kidx1._internal.scol]).show()
+-----------------+-----------------------+
|__index_level_0__|(__index_level_0__ + 1)|
+-----------------+-----------------------+
| 1| 2|
| 2| 3|
| 3| 4|
+-----------------+-----------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But actually you raised a good point.
Maybe we should revisit the infrastructure of Index
and we might be able to consolidate as you mentioned.
Let me think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops... sorry I forgot extracting Index
from Series
when i tested.
Thanks for the example and I totally agree that we consolidate them! 😺
(Anyway, sometimes printed table of sdf
confuse us when they're used in markdown code block at github ^^;;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the example you replied is super good. Thanks again!!
LGTM, except for just one short question. 👍 |
elif repeats < 0: | ||
raise ValueError("negative dimensions are not allowed") | ||
|
||
sdf = self._internal.sdf.select(self._internal.scol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, maybe we can integrate Index & MultiIndex if we fix some lines below?
(1) self._internal.scol
-> self._internal.index_scols
(2) and use self._internal.index_map
when create index_map
like this
if not isinstance(repeats, int):
raise ValueError("`repeats` argument must be integer, but got {}".format(type(repeats)))
elif repeats < 0:
raise ValueError("negative dimensions are not allowed")
sdf = self._internal.sdf.select(self._internal.index_scols) # fixed here (1)
internal = _InternalFrame(
sdf=sdf, index_map=self._internal.index_map # and here (2)
)
kdf = DataFrame(internal) # type: DataFrame
if repeats == 0:
return DataFrame(kdf._internal.with_filter(F.lit(False))).index
else:
return ks.concat([kdf] * repeats).index
or maybe i missed something 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scol
in Index
could be different from _internal.index_scols[0]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ueshin Ah, yeah i just checked Implementation of them. seems we don't need to use index_scols[0]
for Index
.
Thanks for the replies, LGTM. |
As discussed at #1328 (comment), we can consolidate the `Index.repeat()` and `MultiIndex.repeat()` by updating its anchor with a new scol. Also resolves #1190 and other functions which were mistakenly using `index_scols` in `Index`, whose usage is valid now.
As discussed at databricks/koalas#1328 (comment), we can consolidate the `Index.repeat()` and `MultiIndex.repeat()` by updating its anchor with a new scol. Also resolves #1190 and other functions which were mistakenly using `index_scols` in `Index`, whose usage is valid now.
Adding
repeat()
forSeries
,Index
, andMultiIndex
.