-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Series.where #922
Conversation
Codecov Report
@@ Coverage Diff @@
## master #922 +/- ##
==========================================
+ Coverage 94.52% 94.53% +<.01%
==========================================
Files 34 34
Lines 6465 6476 +11
==========================================
+ Hits 6111 6122 +11
Misses 354 354
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add more tests in test_series
to check various patterns? e.g.,
>>> s1 = pd.Series([0, 1, 2, 3, 4])
>>> s2 = pd.Series([100, 200, 300, 400, 500])
>>> s1.where(s2 > 100)
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
and negative cases?
@@ -3409,6 +3409,80 @@ def replace(self, to_replace=None, value=None, regex=False) -> 'Series': | |||
|
|||
return self._with_new_scol(current) | |||
|
|||
def where(self, cond, other=np.nan): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itholic seems like pandas shares the same implementation internally. After this PR is merged, can you move this into _Frame
class and implement DataFrame.where
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, i'm going to work right after this PR is merged
Seems fine to me otherwise. |
Softagram Impact Report for pull/922 (head commit: b620849)⭐ Change Overview
📄 Full report
Impact Report explained. Give feedback on this report to [email protected] |
# | 4| 4| true| 500| | ||
# +-----------------+---+----------------+-----------------+ | ||
data_col_name = self._internal.column_name_for(self._internal.column_index[0]) | ||
index_column = self._internal.index_columns[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itholic, I think this doesn't support multi-level index cases. Can you fix this please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index_columns
can be multiple and we cannot just use the first one only.
set_option("compute.ops_on_diff_frames", True) | ||
|
||
@classmethod | ||
def tearDownClass(cls): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itholic disable this. compute.ops_on_diff_frames
is disabled by default because it costs a lot. We should move the test cases into OpsOnDiffFramesEnabledTest
@@ -742,6 +753,23 @@ def test_duplicates(self): | |||
self.assert_eq(pser.drop_duplicates().sort_values(), | |||
kser.drop_duplicates().sort_values()) | |||
|
|||
def test_where(self): | |||
pser1 = pd.Series([0, 1, 2, 3, 4], name=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test when compute.ops_on_diff_frames
is off? I think we can still use a scalar values for other
such as int
.
Like pandas Series.where (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html)
implemented function
where
for series.