Inline Series.unique to Pandas' #249

Merged on May 8, 2019 (4 commits)

@HyukjinKwon HyukjinKwon commented May 7, 2019

This PR brings Koalas' Series.unique in line with pandas'. After this PR:
Pandas

>>> import pandas as pd
>>> pd.DataFrame({"Person": ['a', 'b', 'b', 'c']})
  Person
0      a
1      b
2      b
3      c
>>> df = pd.DataFrame({"Person": ['a', 'b', 'b', 'c']})
>>> df.unique()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'unique'
>>> df.Person.unique()
array(['a', 'b', 'c'], dtype=object)
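
The session above fails on `df.unique()` but succeeds on `df.Person.unique()` because column access by attribute goes through `__getattr__`, which raises `AttributeError` for any name that is not a column. The following is a minimal pure-Python sketch of that lookup pattern. `TinyFrame` and `TinyColumn` are hypothetical stand-ins written for illustration; they are not pandas or Koalas classes and do not reproduce either library's implementation.

```python
class TinyColumn:
    """Hypothetical stand-in for a Series: holds values and offers unique()."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def unique(self):
        # dict.fromkeys keeps the first occurrence of each value, in order.
        return list(dict.fromkeys(self.values))


class TinyFrame:
    """Hypothetical stand-in for a DataFrame with attribute-style column access."""

    def __init__(self, data):
        self._columns = {name: TinyColumn(name, vals) for name, vals in data.items()}

    def __getattr__(self, name):
        # __getattr__ runs only after normal attribute lookup has failed.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


df = TinyFrame({"Person": ["a", "b", "b", "c"]})
print(df.Person.unique())  # ['a', 'b', 'c']

try:
    df.unique()
except AttributeError as exc:
    print(exc)  # 'TinyFrame' object has no attribute 'unique'
```

Since `unique` is defined only on the column object, the frame-level call falls through to `__getattr__` and fails, which matches the shape of the traceback shown above.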

Koalas

>>> import databricks.koalas as ks
>>> ks.DataFrame({"Person": ['a', 'b', 'b', 'c']})
  Person
0      a
1      b
2      b
3      c
>>> df = ks.DataFrame({"Person": ['a', 'b', 'b', 'c']})
>>> df.unique()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hyukjin.kwon/workspace/forked/koalas/databricks/koalas/frame.py", line 1528, in __getattr__
    return Series(self._sdf.__getattr__(key), self, self._metadata.index_info)
  File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1296, in __getattr__
AttributeError: 'DataFrame' object has no attribute 'unique'
>>> df.Person.unique()
0    c
1    b
2    a
Name: Person, dtype: object
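
Note that the pandas output lists the unique values in order of appearance (a, b, c), while the Koalas output above lists them as c, b, a. A test that compares the two results may therefore need to ignore order. Below is a small pure-Python sketch of such a comparison; it does not need pandas or Koalas installed. The helper names `unique_in_order` and `same_unique_values` are hypothetical, and the Koalas-side list is simply copied from the output above.

```python
def unique_in_order(values):
    """Model of pandas' Series.unique: first occurrences, in order of appearance."""
    return list(dict.fromkeys(values))


def same_unique_values(left, right):
    """Compare two collections of unique values while ignoring row order."""
    return sorted(left) == sorted(right)


people = ["a", "b", "b", "c"]
pandas_like = unique_in_order(people)   # ['a', 'b', 'c']
koalas_like = ["c", "b", "a"]           # order copied from the Koalas output above

print(pandas_like == koalas_like)                     # False
print(same_unique_values(pandas_like, koalas_like))   # True
```

Sorting both sides is one simple option when the values are mutually comparable; comparing as sets would also work for hashable values.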

Resolves #233

@HyukjinKwon HyukjinKwon requested review from ueshin and rxin May 7, 2019 06:36

codecov-io commented May 7, 2019

Codecov Report

Merging #249 into master will increase coverage by 0.26%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #249      +/-   ##
==========================================
+ Coverage   92.17%   92.44%   +0.26%     
==========================================
  Files          35       35              
  Lines        3158     3203      +45     
==========================================
+ Hits         2911     2961      +50     
+ Misses        247      242       -5

Impacted Files                              Coverage Δ
databricks/koalas/frame.py                  92.87% <ø> (+0.92%) ⬆️
databricks/koalas/series.py                 91.69% <100%> (+0.41%) ⬆️
databricks/koalas/tests/test_dataframe.py   100% <0%> (ø) ⬆️
databricks/koalas/tests/test_series.py      100% <0%> (ø) ⬆️
databricks/koalas/tests/test_utils.py       100% <0%> (ø) ⬆️
databricks/koalas/utils.py                  100% <0%> (ø) ⬆️
databricks/koalas/namespace.py              90.34% <0%> (+0.41%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a3e5160...19cb1a1.

databricks/koalas/series.py (outdated)
-------
Returns the unique values as a Series.

See Examples section.

nit: i don't think you need this since examples is literally one line below. i'm going to merge this and remove this line.

@rxin rxin merged commit b5e08b5 into databricks:master May 8, 2019
@HyukjinKwon HyukjinKwon deleted the unique-series branch November 6, 2019 02:22
Linked issue that this pull request may close: unique function is broken