Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated read_html to add option #39925

Closed
wants to merge 3 commits into from
Closed

Conversation

Derekt2
Copy link

@Derekt2 Derekt2 commented Feb 20, 2021

Adds optional boolean parameter "remove_whitespace" to skip the remove_whitespace functionality. Defaults to true to support backwards compatibility. See #24766

Adds optional boolean parameter "remove_whitespace" to skip the remove_whitespace functionality. Defaults to true to support backwards compatibility. See pandas-dev#24766
@pep8speaks
Copy link

pep8speaks commented Feb 20, 2021

Hello @Derekt2! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 175:1: W293 blank line contains whitespace
Line 197:1: W293 blank line contains whitespace
Line 911:89: E501 line too long (92 > 88 characters)
Line 918:89: E501 line too long (90 > 88 characters)
Line 1061:1: W293 blank line contains whitespace

Comment last updated at 2021-02-26 18:43:17 UTC

@@ -73,7 +73,7 @@ def _importers():
_RE_WHITESPACE = re.compile(r"[\r\n]+|\s{2,}")


def _remove_whitespace(s: str, regex=_RE_WHITESPACE) -> str:
def _remove_whitespace(s: str, regex=_RE_WHITESPACE, remove_whitespace=True) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems a bit odd to define a boolean parameter input to a function named remove_whitespace if it really wants to remove_whitespace or not.
Better to do a conditional check before the function is called, and either call it or don't.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent point. updated the pull to reflect this suggestion

@jreback jreback added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Feb 21, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs tests

pandas/io/html.py Show resolved Hide resolved
pandas/io/html.py Show resolved Hide resolved
pandas/io/html.py Show resolved Hide resolved
pandas/io/html.py Show resolved Hide resolved
@Derekt2
Copy link
Author

Derekt2 commented Feb 26, 2021

Added the .. versionadded:: 1.3.0 to the doc strings

@jreback
Copy link
Contributor

jreback commented Feb 27, 2021

@Derekt2 this needs some tests.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Mar 30, 2021
@simonjayhawkins
Copy link
Member

@Derekt2 this needs some tests.

Thanks @Derekt2 for the PR. closing as stale. ping if you want to continue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO HTML read_html, to_html, Styler.apply, Styler.applymap Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_html ignores paragraphs in table cells
5 participants