Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Fix regression in read_table with delim_whitespace=True #36560

Merged
merged 11 commits into from
Sep 26, 2020

Conversation

phofl
Copy link
Member

@phofl phofl commented Sep 22, 2020

We could write a private function which is called from read_csv and read_table with the appropriate default_set alternatively.

Edit: Black_pandas formats files not touched by myself. Was there an update or change of guidelines?

@phofl phofl added IO Data IO issues that don't fit into a more specific label Regression Functionality that used to work in a prior pandas version labels Sep 22, 2020
@phofl
Copy link
Member Author

phofl commented Sep 22, 2020

Test failure seems unrelated

@dsaxton
Copy link
Member

dsaxton commented Sep 22, 2020

@phofl Looks like a whatsnew conflict if you can merge master

� Conflicts:
�	doc/source/whatsnew/v1.1.3.rst
@phofl
Copy link
Member Author

phofl commented Sep 22, 2020

@dsaxton done

@dsaxton dsaxton added this to the 1.1.3 milestone Sep 22, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phofl
Copy link
Member Author

phofl commented Sep 23, 2020

@simonjayhawkins

Yes, it is correct, that this does not raise an error. But this is not a regression. This combination does not raise an error on 1.0.5. That would be a regular bugfix.

Should I add this here or make a separate PR, which won't be backported to 1.1.3?

Co-authored-by: Daniel Saxton <[email protected]>
@tacaswell
Copy link
Contributor

Following up on the comments it #36381 , I agree that the behavior still does not match the docstring. I would change the docstring to be "pass the default", not "pass nothing" which is both easier (no code changes!) and clearer (I can think of a defend-able position that '' and None are both "nothing").

@phofl
Copy link
Member Author

phofl commented Sep 23, 2020

That is probably a better solution. I would add this here if this is ok @simonjayhawkins ?

@simonjayhawkins
Copy link
Member

@simonjayhawkins

Yes, it is correct, that this does not raise an error. But this is not a regression. This combination does not raise an error on 1.0.5. That would be a regular bugfix.

Should I add this here or make a separate PR, which won't be backported to 1.1.3?

fair point. can either fix here or raise an issue for this. Not raising also applies to read_csv, so if done here should probably include read_csv and would probably need a second release note.

I think checking if sep has been passed can be done fairly easily with lib.no_default. If done in a separate PR could also be backported as a bugfix.

@simonjayhawkins
Copy link
Member

🤔 I've just re-read the docs If this option is set to True, nothing should be passed in for the delimiter parameter. so not raising is a bug. but otoh if it is not raising, and we start raising we would normally put something in the api breaking changes section of the release notes. since this is targeted for a patch release, we should probably not include anything considered an api breaking change.

so ignore my previous comments except about adding a comment to the code for the next reader.

@phofl
Copy link
Member Author

phofl commented Sep 23, 2020

Ok, will do that. I will open a issue about fixing this for 1.2?

@phofl
Copy link
Member Author

phofl commented Sep 23, 2020

@simonjayhawkins Done

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phofl generally lgtm pending green

pandas/io/parsers.py Outdated Show resolved Hide resolved
@phofl
Copy link
Member Author

phofl commented Sep 24, 2020

@simonjayhawkins Failure seems unrelated?

@simonjayhawkins
Copy link
Member

restarted travis ci

@phofl
Copy link
Member Author

phofl commented Sep 24, 2020

That seems to be an issue with the Pipeline. It fails consistently through the PRs

@simonjayhawkins
Copy link
Member

That seems to be an issue with the Pipeline. It fails consistently through the PRs

Thanks. It's was failing intermittently on master and constantly on 1.1.x. I see it's now constantly failing on master too.

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Sep 25, 2020

😕 merged master and still arm failures. restarted.

@phofl
Copy link
Member Author

phofl commented Sep 25, 2020

@simonjayhawkins do you have any idea what the underlying reason might be?

@simonjayhawkins
Copy link
Member

it's timing out. happening on several PRs. no idea why.

@jreback
Copy link
Contributor

jreback commented Sep 26, 2020

this looks fine, just resolved conflicts.

cc @gfyoung ok here?

@simonjayhawkins simonjayhawkins merged commit ad63b97 into pandas-dev:master Sep 26, 2020
@simonjayhawkins
Copy link
Member

Thanks @phofl

@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.1.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Data IO issues that don't fit into a more specific label Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_table raises ValueError when delim_whitespace is set to True
6 participants