Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: inconsistent replace #36444

Merged
merged 19 commits into from
Sep 26, 2020

Conversation

QuentinN42
Copy link
Contributor

@QuentinN42 QuentinN42 commented Sep 18, 2020

  • tests added
  • tests passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • release note in doc/source/whatsnew/v1.1.3.rst

Must fix some replace in IntBlock if called with float integer (such as 1.0).

closes #35376

Added return is_integer(element) or is_float(element) to the IntBlock._can_hold_element method because an Block of ints can be replaced from int.
Read pandas-dev#35376 for more info
As @jbrockmendel said in pandas-dev#35376, you can replace an int by a float in an IntBlock only if the element is an integer.
@QuentinN42 QuentinN42 marked this pull request as draft September 18, 2020 06:52
move an import 2 lines above
@simonjayhawkins simonjayhawkins added Bug Regression Functionality that used to work in a prior pandas version replace replace method labels Sep 18, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.3 milestone Sep 18, 2020
@simonjayhawkins
Copy link
Member

Thanks @QuentinN42 for the PR. needs tests see #35376 (comment) and a release note in doc/source/whatsnew/v1.1.3.rst

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls always add a test that reproduces the issue first to make sure it fails.

pandas/core/internals/blocks.py Show resolved Hide resolved
Added the pandas-dev#35376 error as a test.
@QuentinN42
Copy link
Contributor Author

pls always add a test that reproduces the issue first to make sure it fails.

Test added.

Added regression description in `doc/source/whatsnew/v1.1.3.rst`
moved 4 lines in once
@@ -974,6 +974,8 @@ def test_replace_for_new_dtypes(self, datetime_frame):
}
),
),
# GH 35376
(DataFrame([[1, 1.0], [2, 2.0]]), 1.0, 5, DataFrame([[5, 5.0], [2, 2.0]]),),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the .replace(1, 5) case and maybe also .replace(1.0, 5.0) and .replace(1, 5.0)

Thx @simonjayhawkins for the typo

Co-authored-by: Simon Hawkins <[email protected]>
@dsaxton
Copy link
Member

dsaxton commented Sep 19, 2020

@QuentinN42 Can you merge master to fix merge conflict? Also I think this doesn't need to be a draft.

# Conflicts:
#	doc/source/whatsnew/v1.1.3.rst
added .replace(1, 5), .replace(1.0, 5.0) and .replace(1, 5.0)
@pep8speaks
Copy link

pep8speaks commented Sep 21, 2020

Hello @QuentinN42! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-25 11:06:55 UTC

Moved all singles lines in 6 lines
@QuentinN42
Copy link
Contributor Author

I've rolled back on typo from f64817b to pass the PEP8 test.
But it may not pass the CI / Checks (PR)

Formatting is just non sense
@QuentinN42
Copy link
Contributor Author

How can we see wich tests have failed and what is the given output ?

@simonjayhawkins
Copy link
Member

How can we see wich tests have failed and what is the given output ?

may be fixed by #36511

@QuentinN42 QuentinN42 marked this pull request as ready for review September 21, 2020 13:48
@QuentinN42
Copy link
Contributor Author

How can we see wich tests have failed and what is the given output ?

may be fixed by #36511

#36511 has been merged.

@simonjayhawkins
Copy link
Member

#36511 has been merged.

i've restarted the tests since (azure normally picks up the changes in master) but it looks like there are still failures.

can you merge upstream/master into this branch

@jreback jreback changed the title Issue35376 inconsistant replace BUG: inconsistent replace Sep 22, 2020
@@ -2066,7 +2067,7 @@ def _can_hold_element(self, element: Any) -> bool:
and not issubclass(tipo.type, (np.datetime64, np.timedelta64))
and self.dtype.itemsize >= tipo.itemsize
)
return is_integer(element)
return is_integer(element) or (is_float(element) and element.is_integer())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel are you sure you think the patch should be here; this is hit by a lot of paths where this doesn't make any sense. we almost always want to take some sort of action based on return value here if it fails. e.g. why is this not more specific to replace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also is this simply wrong:

[ins] In [3]: is_integer(1.0)
Out[3]: False

especially since

[ins] In [4]: 1.0.is_integer()
Out[4]: True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh its doing an equality check under the hood, we are not

n [60]: 1.0.is_integer()                                                                                                                                  
Out[60]: True

In [61]: 1.1.is_integer()                                                                                                                                  
Out[61]: False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe could replace this entirely with element.is_integer() and call it a day

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though my concern is still that the purpose of this check is can we hold an integer, not are we a float and can be treated as an integer. though practically it may not make a difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't realy understand ...
Need I replace return is_integer(element) into return element.is_integer().
If I do that, need I try except the AttributeError and return None ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or False ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer-term we're going to have to look more carefully about can_hold_element and what we want it to mean. In the short run, for this check to be consistent with the tipo check above it should exclude floats

(so my previous suggestion may have been unhelpful, sorry)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is fine @QuentinN42 if you can add a comment here would be good

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls ping on green

@@ -2066,7 +2067,7 @@ def _can_hold_element(self, element: Any) -> bool:
and not issubclass(tipo.type, (np.datetime64, np.timedelta64))
and self.dtype.itemsize >= tipo.itemsize
)
return is_integer(element)
return is_integer(element) or (is_float(element) and element.is_integer())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is fine @QuentinN42 if you can add a comment here would be good

marking GH 36444 and GH 35376 close to (is_float(element) and element.is_integer())
@QuentinN42
Copy link
Contributor Author

I have a feeling that there are reproducibility errors with black.
On my personals repos, I've removed it from pre-commit for this errors.

On server :

black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted /home/runner/work/pandas/pandas/pandas/tests/frame/methods/test_replace.py
All done! ✨ 🍰 ✨
1 file reformatted, 1133 files left unchanged.

flake8...................................................................Passed
flake8-pyx...............................................................Passed
flake8-pxd...............................................................Passed
isort....................................................................Passed
pyupgrade................................................................Passed

On local :

black....................................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

pandas/core/generic.py:1:1: F407 future feature annotations is not defined
pandas/core/algorithms.py:5:1: F407 future feature annotations is not defined
pandas/core/construction.py:7:1: F407 future feature annotations is not defined
pandas/core/frame.py:11:1: F407 future feature annotations is not defined

flake8-pyx...............................................................Passed
flake8-pxd...............................................................Passed
isort....................................................................Passed

The error come from this line (pandas/tests/frame/methods/test_replace.py):

(DataFrame([[1, 1.0], [2, 2.0]]), 1, 5, DataFrame([[5, 5.0], [2, 2.0]]),),

that black want to split into :

(
    DataFrame([[1, 1.0], [2, 2.0]]),
    1,
    5,
    DataFrame([[5, 5.0], [2, 2.0]]),
),

@simonjayhawkins
Copy link
Member

@QuentinN42 the version of black used on ci was changed yesterday. see #36493

updating black locally and merging upstream/master will probably fix.

@QuentinN42
Copy link
Contributor Author

The error come from pandas/tests/test_sorting.py::TestSafeSort::test_codes_out_of_bound.
Not me 😢 (I hope).

@simonjayhawkins
Copy link
Member

The error come from pandas/tests/test_sorting.py::TestSafeSort::test_codes_out_of_bound.
Not me 😢 (I hope).

seen b4 #36382 (comment)

restarted ci

…istant-replace

# Conflicts:
#	doc/source/whatsnew/v1.1.3.rst
@simonjayhawkins
Copy link
Member

@jreback green #36444 (review)

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @QuentinN42 lgtm

@@ -2066,7 +2067,8 @@ def _can_hold_element(self, element: Any) -> bool:
and not issubclass(tipo.type, (np.datetime64, np.timedelta64))
and self.dtype.itemsize >= tipo.itemsize
)
return is_integer(element)
# Will have to be modified in the future see GH 36444 and GH 35376
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not what I meant.
say something like

we have not inferred an integer; check if we have an equal float

@jreback
Copy link
Contributor

jreback commented Sep 24, 2020

the current comment is not useful - needs to be updated

@QuentinN42
Copy link
Contributor Author

I've changed into :

We have not inferred an integer from the dtype
check if we have a builtin int or a float equal to an int

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Sep 25, 2020

@jreback ok to merge on green?

@jreback jreback merged commit 04e07d0 into pandas-dev:master Sep 26, 2020
@jreback
Copy link
Contributor

jreback commented Sep 26, 2020

thanks @QuentinN42

@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.1.x

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Sep 26, 2020
simonjayhawkins pushed a commit that referenced this pull request Sep 26, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Regression Functionality that used to work in a prior pandas version replace replace method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: inconsistent replace
6 participants