Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement repeat() for Series, Index, and MultiIndex. #1328

Closed
wants to merge 1 commit into from

Conversation

ueshin
Copy link
Collaborator

@ueshin ueshin commented Mar 5, 2020

@codecov-io
Copy link

codecov-io commented Mar 5, 2020

Codecov Report

Merging #1328 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1328      +/-   ##
==========================================
+ Coverage   95.23%   95.25%   +0.01%     
==========================================
  Files          34       34              
  Lines        7373     7401      +28     
==========================================
+ Hits         7022     7050      +28     
  Misses        351      351
Impacted Files Coverage Δ
databricks/koalas/missing/indexes.py 100% <ø> (ø) ⬆️
databricks/koalas/missing/series.py 100% <ø> (ø) ⬆️
databricks/koalas/series.py 96.73% <100%> (+0.03%) ⬆️
databricks/koalas/indexes.py 96.5% <100%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a849249...269c775. Read the comment docs.


sdf = self._internal.sdf.select(self._internal.scol)
internal = _InternalFrame(
sdf=sdf, index_map=[(sdf.columns[0], self._internal.index_names[0])]
Copy link
Contributor

@itholic itholic Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just curious about this, why just don't use self._internal.index_map here rather create index_map manually? 🤔 (Maybe do they work not same way ?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because self._internal.scol could have a different name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ueshin Ah.. I think found the case. Thanks!! :)

>>> self = ks.Series(['a', 'b', 'c'], index=[1, 2, 3])
>>> sdf = self._internal.sdf.select(self._internal.scol)
>>> self._internal.index_map
[('__index_level_0__', None)]
>>> [(sdf.columns[0], self._internal.index_names[0])]
[('0', None)]

Copy link
Collaborator Author

@ueshin ueshin Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that's a Series case.
The example for Index is like:

>>> import databricks.koalas as ks
>>> kidx = ks.Index([1,2,3])
>>> kidx1 = (kidx + 1)
>>> kidx1._internal.index_scols, kidx1._internal.scol
([Column<b'__index_level_0__'>], Column<b'(__index_level_0__ + 1)'>)
>>> kidx1._internal.sdf.select(kidx1._internal.index_scols + [kidx1._internal.scol])
DataFrame[__index_level_0__: bigint, (__index_level_0__ + 1): bigint]
>>> kidx1._internal.sdf.select(kidx1._internal.index_scols + [kidx1._internal.scol]).show()
+-----------------+-----------------------+
|__index_level_0__|(__index_level_0__ + 1)|
+-----------------+-----------------------+
|                1|                      2|
|                2|                      3|
|                3|                      4|
+-----------------+-----------------------+

Copy link
Collaborator Author

@ueshin ueshin Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But actually you raised a good point.
Maybe we should revisit the infrastructure of Index and we might be able to consolidate as you mentioned.
Let me think.

Copy link
Contributor

@itholic itholic Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops... sorry I forgot extracting Index from Series when i tested.

Thanks for the example and I totally agree that we consolidate them! 😺
(Anyway, sometimes printed table of sdf confuse us when they're used in markdown code block at github ^^;;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the example you replied is super good. Thanks again!!

@itholic
Copy link
Contributor

itholic commented Mar 6, 2020

LGTM, except for just one short question. 👍

elif repeats < 0:
raise ValueError("negative dimensions are not allowed")

sdf = self._internal.sdf.select(self._internal.scol)
Copy link
Contributor

@itholic itholic Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, maybe we can integrate Index & MultiIndex if we fix some lines below?

(1) self._internal.scol -> self._internal.index_scols
(2) and use self._internal.index_map when create index_map

like this

        if not isinstance(repeats, int):
            raise ValueError("`repeats` argument must be integer, but got {}".format(type(repeats)))
        elif repeats < 0:
            raise ValueError("negative dimensions are not allowed")

        sdf = self._internal.sdf.select(self._internal.index_scols)  # fixed here (1)
        internal = _InternalFrame(
            sdf=sdf, index_map=self._internal.index_map  # and here (2)
        )
        kdf = DataFrame(internal)  # type: DataFrame
        if repeats == 0:
            return DataFrame(kdf._internal.with_filter(F.lit(False))).index
        else:
            return ks.concat([kdf] * repeats).index

or maybe i missed something 😅

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scol in Index could be different from _internal.index_scols[0].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ueshin Ah, yeah i just checked Implementation of them. seems we don't need to use index_scols[0] for Index.

@itholic
Copy link
Contributor

itholic commented Mar 6, 2020

Thanks for the replies, LGTM.

@ueshin ueshin deleted the repeat branch March 6, 2020 02:40
HyukjinKwon pushed a commit that referenced this pull request Mar 12, 2020
As discussed at #1328 (comment), we can consolidate the `Index.repeat()` and `MultiIndex.repeat()` by updating its anchor with a new scol.
Also resolves #1190 and other functions which were mistakenly using `index_scols` in `Index`, whose usage is valid now.
rising-star92 added a commit to rising-star92/databricks-koalas that referenced this pull request Jan 27, 2023
As discussed at databricks/koalas#1328 (comment), we can consolidate the `Index.repeat()` and `MultiIndex.repeat()` by updating its anchor with a new scol.
Also resolves #1190 and other functions which were mistakenly using `index_scols` in `Index`, whose usage is valid now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants