-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Series.pop #866
Implement Series.pop #866
Conversation
Codecov Report
@@ Coverage Diff @@
## master #866 +/- ##
==========================================
+ Coverage 95.19% 95.21% +0.01%
==========================================
Files 34 34
Lines 6810 6835 +25
==========================================
+ Hits 6483 6508 +25
Misses 327 327
Continue to review full report at Codecov.
|
The returned value seems different from pandas'. >>> s
lama speed 45.0
weight 200.0
length 1.2
cow speed 30.0
weight 250.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
dtype: float64
>>> s.pop('lama')
speed 45.0
weight 200.0
length 1.2
dtype: float64
>>> s
cow speed 30.0
weight 250.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
dtype: float64 |
Btw, shall we remove |
@ueshin Thanks for reviewing!! fixed them. :) |
@ueshin Ah I got it. Then i think maybe our most important purpose is mimic pandas as same as possible, not more, not less, right ? 😃 |
@itholic Yup, basically that's right. We rarely expand if there is a special reason, though. haha. it depends. 😸 |
@ueshin Got it. Thanks for replying :) |
Shall we support this? >>> s
a lama speed 45.0
weight 200.0
length 1.2
cow speed 30.0
weight 250.0
length 1.5
b falcon speed 320.0
weight 1.0
length 0.3
dtype: float64
>>> s.pop(('a', 'lama'))
speed 45.0
weight 200.0
length 1.2
dtype: float64
>>> s
a cow speed 30.0
weight 250.0
length 1.5
b falcon speed 320.0
weight 1.0
length 0.3
dtype: float64 |
@ueshin sure, i'm going to add. Thanks for comment! :) |
@ueshin , added it 😄 |
So far, I found the following issues: For the given >>> s
lama speed 45.0
weight 200.0
length 1.2
cow speed 30.0
weight 250.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
Name: 0, dtype: float64
>>> (s + 1).pop('lama')
Traceback (most recent call last):
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ueshin/workspace/databricks-koalas/master/databricks/koalas/series.py", line 3286, in pop
index_map=self._internal.index_map[len(item):])
File "/Users/ueshin/workspace/databricks-koalas/master/databricks/koalas/internal.py", line 693, in copy
column_index=column_index, column_index_names=column_index_names)
File "/Users/ueshin/workspace/databricks-koalas/master/databricks/koalas/internal.py", line 421, in __init__
self._data_columns = sdf.select(scol).columns
File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 1320, in select
jdf = self._jdf.select(self._jcols(*cols))
File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6/lib/python3.6/site-packages/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'Resolved attribute(s) 0#33 missing from __index_level_1__#32,(0 + 1)#51 in operator !Project [(0#33 + cast(1 as double)) AS (0 + 1)#54].;;\n!Project [(0#33 + cast(1 as double)) AS (0 + 1)#54]\n+- Project [__index_level_1__#32, (0 + 1)#51]\n +- Filter (__index_level_0__#31 = lama)\n +- Project [__index_level_1__#32, (0#33 + cast(1 as double)) AS (0 + 1)#51, __index_level_0__#31]\n +- LogicalRDD [__index_level_0__#31, __index_level_1__#32, 0#33], false\n'
>>> s.pop(('lama', 'speed'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ueshin/workspace/databricks-koalas/master/databricks/koalas/series.py", line 3470, in __repr__
pser = self.head(max_display_count + 1)._to_internal_pandas()
File "/Users/ueshin/workspace/databricks-koalas/master/databricks/koalas/series.py", line 1634, in head
return _col(self.to_dataframe().head(n))
File "/Users/ueshin/workspace/databricks-koalas/master/databricks/koalas/series.py", line 1072, in to_frame
index_map=renamed._internal.index_map,
File "/Users/ueshin/workspace/databricks-koalas/master/databricks/koalas/internal.py", line 574, in index_map
assert len(self._index_map) > 0
AssertionError
>>> s.to_pandas().pop('horse')
Traceback (most recent call last):
File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 1319, in get_value
return libindex.get_value_at(s, k)
File "pandas/_libs/index.pyx", line 43, in pandas._libs.index.get_value_at
File "pandas/_libs/index.pyx", line 48, in pandas._libs.index.get_value_at
File "pandas/_libs/util.pxd", line 113, in pandas._libs.util.get_value_at
File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6/lib/python3.6/site-packages/pandas/core/generic.py", line 809, in pop
result = self[item]
File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6/lib/python3.6/site-packages/pandas/core/series.py", line 868, in __getitem__
result = self.index.get_value(self, key)
File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 1327, in get_value
raise e1
File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 1311, in get_value
return self._engine.get_value(s, k)
File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 667, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
KeyError: 'horse' whereas Koalas returns empty Series: >>> s.pop('horse')
Series([], Name: 0, dtype: float64) As for 3), the current behavior is reasonable, otherwise we need to count the row number. cc @HyukjinKwon |
@ueshin Thanks for reviewing!! i'm going to take a look at them. |
@ueshin Fixed (1), (2) and not yet for (3). |
hmm.. i don't understand why travis failed only in python3.6 |
@itholic can you rebase this please? |
@HyukjinKwon Sure, i just rebased them :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good otherwise
Softagram Impact Report for pull/866 (head commit: 513bd8a)⭐ Change Overview
📄 Full report
Impact Report explained. Give feedback on this report to [email protected] |
Resolves #789
Also support for MultiIndex
And beyond pandas, we can support for MultiIndex with specifying level.
(But i'm not sure this is necessary.)
In pandas, above example raises TypeError like below: