Implement repeat() for Series, Index, and MultiIndex. #1328

ueshin · 2020-03-05T22:55:11Z

Adding repeat() for Series, Index, and MultiIndex.

codecov-io · 2020-03-05T23:17:06Z

Codecov Report

Merging #1328 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1328      +/-   ##
==========================================
+ Coverage   95.23%   95.25%   +0.01%     
==========================================
  Files          34       34              
  Lines        7373     7401      +28     
==========================================
+ Hits         7022     7050      +28     
  Misses        351      351

Impacted Files	Coverage Δ
databricks/koalas/missing/indexes.py	`100% <ø> (ø)`	⬆️
databricks/koalas/missing/series.py	`100% <ø> (ø)`	⬆️
databricks/koalas/series.py	`96.73% <100%> (+0.03%)`	⬆️
databricks/koalas/indexes.py	`96.5% <100%> (+0.16%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a849249...269c775. Read the comment docs.

itholic · 2020-03-06T01:31:22Z

databricks/koalas/indexes.py

+
+        sdf = self._internal.sdf.select(self._internal.scol)
+        internal = _InternalFrame(
+            sdf=sdf, index_map=[(sdf.columns[0], self._internal.index_names[0])]


I'm just curious about this: why not use self._internal.index_map here rather than creating index_map manually? 🤔 (Maybe they don't work the same way?)

Because self._internal.scol could have a different name.

@ueshin Ah, I think I found the case. Thanks!! :)

>>> self = ks.Series(['a', 'b', 'c'], index=[1, 2, 3])
>>> sdf = self._internal.sdf.select(self._internal.scol)
>>> self._internal.index_map
[('__index_level_0__', None)]
>>> [(sdf.columns[0], self._internal.index_names[0])]
[('0', None)]

Actually that's a Series case.
The example for Index is like:

>>> import databricks.koalas as ks
>>> kidx = ks.Index([1,2,3])
>>> kidx1 = (kidx + 1)
>>> kidx1._internal.index_scols, kidx1._internal.scol
([Column<b'__index_level_0__'>], Column<b'(__index_level_0__ + 1)'>)
>>> kidx1._internal.sdf.select(kidx1._internal.index_scols + [kidx1._internal.scol])
DataFrame[__index_level_0__: bigint, (__index_level_0__ + 1): bigint]
>>> kidx1._internal.sdf.select(kidx1._internal.index_scols + [kidx1._internal.scol]).show()
+-----------------+-----------------------+
|__index_level_0__|(__index_level_0__ + 1)|
+-----------------+-----------------------+
|                1|                      2|
|                2|                      3|
|                3|                      4|
+-----------------+-----------------------+

But actually you raised a good point.
Maybe we should revisit the infrastructure of Index and we might be able to consolidate as you mentioned.
Let me think.
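To make the naming mismatch discussed above concrete, here is a toy model in plain Python. It is only a sketch: `Frame`, `IndexMap`, `select_derived`, and `missing_columns` are hypothetical stand-ins invented for illustration, not part of Koalas or its `_InternalFrame`. It assumes that selecting a derived column yields a frame whose only column is named after the expression, as the `show()` output above suggests.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins: a "frame" is a dict of column name -> values,
# and an index map is a list of (column name, index name) pairs.
Frame = Dict[str, List[int]]
IndexMap = List[Tuple[str, Optional[str]]]


def select_derived(frame: Frame, source: str, offset: int) -> Frame:
    """Select only `source + offset`, naming the column after the expression."""
    new_name = "({} + {})".format(source, offset)
    return {new_name: [v + offset for v in frame[source]]}


def missing_columns(frame: Frame, index_map: IndexMap) -> List[str]:
    """Return the index-map columns that the frame does not contain."""
    return [col for col, _ in index_map if col not in frame]


base: Frame = {"__index_level_0__": [1, 2, 3]}
old_index_map: IndexMap = [("__index_level_0__", None)]

# Analogous to selecting only the derived column of `kidx + 1`.
selected = select_derived(base, "__index_level_0__", 1)

# Reusing the old map would reference a column that was not selected.
print(missing_columns(selected, old_index_map))  # ['__index_level_0__']

# Rebuilding the map from the selected frame's first column avoids that.
new_index_map: IndexMap = [(list(selected)[0], old_index_map[0][1])]
print(missing_columns(selected, new_index_map))  # []
```

In this toy model the rebuilt map plays the role of `[(sdf.columns[0], self._internal.index_names[0])]` in the diff: it follows whatever name the selected column ended up with.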

Oops... sorry, I forgot to extract the Index from the Series when I tested.

Thanks for the example, and I totally agree that we should consolidate them! 😺
(Anyway, printed sdf tables sometimes confuse us when they're pasted into a markdown code block on GitHub ^^;;)

Ah, the example in your reply is super good. Thanks again!!

itholic · 2020-03-06T01:34:10Z

LGTM, except for just one short question. 👍

itholic · 2020-03-06T01:38:23Z

databricks/koalas/indexes.py

+        elif repeats < 0:
+            raise ValueError("negative dimensions are not allowed")
+
+        sdf = self._internal.sdf.select(self._internal.scol)


Ah, maybe we can integrate Index & MultiIndex if we change a few of the lines below?

(1) self._internal.scol -> self._internal.index_scols
(2) and use self._internal.index_map when creating index_map

Like this:

if not isinstance(repeats, int):
    raise ValueError("`repeats` argument must be integer, but got {}".format(type(repeats)))
elif repeats < 0:
    raise ValueError("negative dimensions are not allowed")

sdf = self._internal.sdf.select(self._internal.index_scols)  # fixed here (1)
internal = _InternalFrame(
    sdf=sdf,
    index_map=self._internal.index_map  # and here (2)
)
kdf = DataFrame(internal)  # type: DataFrame
if repeats == 0:
    return DataFrame(kdf._internal.with_filter(F.lit(False))).index
else:
    return ks.concat([kdf] * repeats).index

Or maybe I missed something 😅

scol in Index could be different from _internal.index_scols[0].

@ueshin Ah, yeah, I just checked their implementation. It seems we don't need to use index_scols[0] for Index.
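The validation and concatenation strategy in the suggested snippet can be modelled in plain Python. This is a minimal sketch, not the Koalas implementation: `repeat_by_concat` is a hypothetical helper, and an ordinary list stands in for the Spark-backed frame.

```python
from typing import List, TypeVar

T = TypeVar("T")


def repeat_by_concat(values: List[T], repeats: int) -> List[T]:
    """Toy model of the concat-based strategy (hypothetical, not Koalas API)."""
    if not isinstance(repeats, int):
        raise ValueError(
            "`repeats` argument must be integer, but got {}".format(type(repeats))
        )
    elif repeats < 0:
        raise ValueError("negative dimensions are not allowed")

    if repeats == 0:
        # Stands in for filtering with a constant-false predicate: no rows remain.
        return []

    # Stands in for ks.concat([kdf] * repeats): whole copies appended in turn.
    result: List[T] = []
    for _ in range(repeats):
        result.extend(values)
    return result


print(repeat_by_concat(["a", "b", "c"], 2))  # ['a', 'b', 'c', 'a', 'b', 'c']
```

Note that the copies are appended block-wise, so the order differs from pandas' element-wise `repeat` (a, a, b, b, c, c), although the contents are the same as a multiset.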

itholic · 2020-03-06T02:06:07Z

Thanks for the replies, LGTM.

As discussed at #1328 (comment), we can consolidate the `Index.repeat()` and `MultiIndex.repeat()` by updating its anchor with a new scol. Also resolves #1190 and other functions which were mistakenly using `index_scols` in `Index`, whose usage is valid now.

Implement repeat() for Series, Index, and MultiIndex.

269c775

itholic reviewed Mar 6, 2020

HyukjinKwon approved these changes Mar 6, 2020

HyukjinKwon closed this in 699d6ca Mar 6, 2020

ueshin deleted the repeat branch March 6, 2020 02:40

ueshin mentioned this pull request Mar 11, 2020

Fix Index._with_new_scol to update its anchor. #1334

Merged
