Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: fix bug in DatetimeIndex slicing with irregular or unsorted indices #37023

Merged

Conversation

CloseChoice
Copy link
Member

@CloseChoice CloseChoice commented Oct 10, 2020

closes #36953
closes #35509

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks
Copy link

pep8speaks commented Oct 10, 2020

Hello @CloseChoice! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-26 12:55:55 UTC

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CloseChoice for the PR!

Some comments.

Can you add the OP from #35509? As per #36953 (comment) the two issues are caused by the same underlying bug but different behaviors so we'd like to have tests for both

I'm also not sure about the current location for this. Haven't looked in detail but somewhere like pandas/tests/frame/test_timeseries.py might be more appropriate

pandas/core/generic.py Outdated Show resolved Hide resolved
pandas/tests/generic/test_generic.py Outdated Show resolved Hide resolved

def test_slice_incomplete_datetimes(self):
# GH36953 and GH35509
increment = pd.Series(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is the OP but is it possible to explicitly construct it? without using the cumsum

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pandas/tests/generic/test_generic.py Outdated Show resolved Hide resolved
pandas/tests/generic/test_generic.py Outdated Show resolved Hide resolved
@CloseChoice
Copy link
Member Author

Thanks @CloseChoice for the PR!

Some comments.

Can you add the OP from #35509? As per #36953 (comment) the two issues are caused by the same underlying bug but different behaviors so we'd like to have tests for both

I'm also not sure about the current location for this. Haven't looked in detail but somewhere like pandas/tests/frame/test_timeseries.py might be more appropriate

Hi @arw2019 thanks for the quick review. I added another test corresponding to #35509 and moved both tests to your suggested destination pandas/tests/frame/test_timeseries.py.

@simonjayhawkins simonjayhawkins added Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version Datetime Datetime data dtype labels Oct 10, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.4 milestone Oct 10, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CloseChoice can you add a release note to doc/source/whatsnew/v1.1.4.rst

@@ -63,3 +63,22 @@ def test_datetime_assignment_with_NaT_and_diff_time_units(self):
{0: [1, None], "new": [1e9, None]}, dtype="datetime64[ns]"
)
tm.assert_frame_equal(result, expected)

def test_slice_irregular_datetime_index_with_nan(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move these to pandas/tests/indexing/test_partial.py

always use
result=
expected=
tm.assert_.....

@jreback
Copy link
Contributor

jreback commented Oct 10, 2020

also this needs a whatsnew note for 1.1.4 in regressions (mention both issues)

index = pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03", None])
df = pd.DataFrame(range(len(index)), index=index)
expected = pd.DataFrame(range(len(index[:3])), index=index[:3])
tm.assert_frame_equal(df["2012-01-01":"2012-01-04"], expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you define result = df["2012-01-01":"2012-01-04"] above?

{"col1": ["a", "c"], "col2": [1, 3]},
index=pd.to_datetime(["2020-08-01", "2020-08-05"]),
)
tm.assert_frame_equal(df.loc["2020-08"], expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you define result = df["2020-08"] above?

@@ -3739,7 +3739,6 @@ def _slice(self: FrameOrSeries, slobj: slice, axis=0) -> FrameOrSeries:

Slicing with this method is *always* positional.
"""
assert isinstance(slobj, slice), type(slobj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the patch; something is going down the wrong path and ending up here. this routine is only for slices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will look into that. I just wanted to note that in version 1.0.9 it went down just the same path except that the exception wasn't there yet. In this sense we are not talking about a regression but about fixing a bug which seems to have been there since a while ago.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CloseChoice can you check if the suggestion from here #35509 (comment) passes the tests you added?

@jorisvandenbossche jorisvandenbossche changed the title remove assert in core/generic.py and add test for datetimeindex slicing REGR: fix bug in DatetimeIndex slicing with irregular or unsorted indices Oct 12, 2020
@simonjayhawkins
Copy link
Member

@CloseChoice can you fix the pre-commit errors. (I'm on Windows and pre-commit fails to initialize and these checks can no longer be performed locally from code-checks.sh #37240) cc @MarcoGorelli

@simonjayhawkins simonjayhawkins mentioned this pull request Oct 26, 2020
@MarcoGorelli
Copy link
Member

MarcoGorelli commented Oct 26, 2020

@CloseChoice can you fix the pre-commit errors. (I'm on Windows and pre-commit fails to initialize and these checks can no longer be performed locally from code-checks.sh #37240) cc @MarcoGorelli

@CloseChoice you just need to replace pd.DataFrame with DataFrame

@simonjayhawkins what errors do you get? Part of the motivation for moving these checks over to pre-commit was that they'd be cross-platform

EDIT

@simonjayhawkins the pre-commit hooks all pass, it's the code-checks.sh that fail 🤣

@simonjayhawkins
Copy link
Member

@simonjayhawkins what errors do you get?
i'm using a different machine, so have not looked in detail yet. (and can't remember what did last time)

version information

pre-commit version: 2.7.1
sys.version:
    3.8.6 | packaged by conda-forge | (default, Oct  7 2020, 18:22:52) [MSC v.1916 64 bit (AMD64)]
sys.executable: C:\Users\simon\anaconda3\envs\pandas-dev\python.exe
os.name: nt
sys.platform: win32

error information

An unexpected error has occurred: CalledProcessError: command: ('C:\\Users\\simon\\anaconda3\\envs\\pandas-dev\\python.exe', '-mvirtualenv', 'C:\\Users\\simon\\.cache\\pre-commit\\repoyew02fci\\py_env-python3', '-p', 'C:\\Users\\simon\\anaconda3\\envs\\pandas-dev\\python.exe')
return code: 1
expected return code: 0
stdout:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\simon\\anaconda3\\envs\\pandas-dev\\Lib\\venv\\scripts\\nt\\python.exe'
    
stderr: (none)
Traceback (most recent call last):
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\site-packages\pre_commit\error_handler.py", line 63, in error_handler
    yield
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\site-packages\pre_commit\main.py", line 390, in main
    return run(args.config, store, args)
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\site-packages\pre_commit\commands\run.py", line 388, in run
    install_hook_envs(hooks, store)
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\site-packages\pre_commit\repository.py", line 206, in install_hook_envs
    _hook_install(hook)
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\site-packages\pre_commit\repository.py", line 82, in _hook_install
    lang.install_environment(
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\site-packages\pre_commit\languages\python.py", line 213, in install_environment
    cmd_output_b(*venv_cmd, cwd='/')
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\site-packages\pre_commit\util.py", line 157, in cmd_output_b
    raise CalledProcessError(returncode, cmd, retcode, stdout_b, stderr_b)
pre_commit.util.CalledProcessError: command: ('C:\\Users\\simon\\anaconda3\\envs\\pandas-dev\\python.exe', '-mvirtualenv', 'C:\\Users\\simon\\.cache\\pre-commit\\repoyew02fci\\py_env-python3', '-p', 'C:\\Users\\simon\\anaconda3\\envs\\pandas-dev\\python.exe')
return code: 1
expected return code: 0
stdout:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\simon\\anaconda3\\envs\\pandas-dev\\Lib\\venv\\scripts\\nt\\python.exe'
    
stderr: (none)

Part of the motivation for moving these checks over to pre-commit was that they'd be cross-platform

I understand that, but didn't we discuss that pre-commit was optional.

@simonjayhawkins the pre-commit hooks all pass, it's the code-checks.sh that fail 🤣

my bad. missed that.

@MarcoGorelli
Copy link
Member

@simonjayhawkins thanks for the traceback - are you using conda? I could reproduce that on a windows conda environment. Looks like it's a bug in virtualenv pypa/virtualenv#1986

If you downgrade virtualenv to 20.0.33 then it should work (did for me) - if it works for you I'll note it in the docs

@simonjayhawkins
Copy link
Member

If you downgrade virtualenv to 20.0.33 then it should work (did for me) - if it works for you I'll note it in the docs

thanks works for me too. (20.0.35 was installed). just started running on all-files. will report back if fails.

@simonjayhawkins
Copy link
Member

back to the issue

so pls merge master and let's see what the CI says

failures on Windows py38_np18 seen on other PRs.

@jreback jreback merged commit 425ce1b into pandas-dev:master Oct 26, 2020
@jreback
Copy link
Contributor

jreback commented Oct 26, 2020

thanks @CloseChoice

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Oct 26, 2020

will report back if fails.

although it installed fine for me, some hooks still failed, likely because currently we have

./scripts/validate_unwanted_patterns --validation-type="private_function_across_module"

instead of

python scripts/validate_unwanted_patterns --validation-type="private_function_across_module"

, will fix

@simonjayhawkins
Copy link
Member

will report back if fails.

although it installed fine for me, some hooks still failed, likely because currently we have

works for me in anaconda prompt and git bash

simonjayhawkins pushed a commit that referenced this pull request Oct 26, 2020
…ular or unsorted indices (#37418)

Co-authored-by: Tobias Pitters <[email protected]>
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version
Projects
None yet
8 participants