Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: Type hints & assert statements #42044

Closed
wants to merge 9 commits into from

Conversation

rgkimball
Copy link

This PR addresses two concerns of best practices:

  • use of assert statements outside of test contexts; addressed by replacing with explicit value checks and exception raising
  • improved type hinting; several instances where this was previously missing or incorrect, notably DataFrame.merge for which the correct use of strings in the indicator argument did not match the boolean hint

Thanks for your time, happy to discuss or revise any of these as you see fit if they aren't consistent with the contribution guidelines.

@pep8speaks
Copy link

pep8speaks commented Jun 16, 2021

Hello @rgkimball! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-06-25 02:35:14 UTC

@mzeitlin11
Copy link
Member

Thanks for the pr @rgkimball! I remember there being an issue where use of asserts was discussed at length, but can't find it now. I think the basic philosophy behind most pandas asserts is that asserts should only be used to catch library bugs, not user errors. So any places where you've removed the assert and replaced with an explicit error, there should be a user-facing test case that can hit it.

Also, this pr is pretty complicated since it touches a lot of stuff - review would be a lot faster if you could break it up (for example, into typing addition for merge, typing for ..., change of asserts...)

Comment on lines 125 to 126
MergeTypes = Literal['inner', 'outer', 'left', 'right', 'cross']
ConcatTypes = Literal['inner', 'outer']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

master currently supports python3.7 so Literal needs to be inside a TYPE_CHECKING block.

This will be changed shortly #41989, so maybe could wait for that for these changes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment; either works for me - will all other TYPE_CHECKING blocks be removed at that point? If not, it may be more consistent to just add it here as well for compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #41989 we can start the cleanup of the typing workarounds for Python3.7 (which we use since we don't have typing_extensions as a required dependency) so we hopefully can remove the TYPE_CHECKING that were specifically added to use Literal. (and others than were not available in the Python typing module until Python 3.8)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - agreed then it makes sense to just wait until then for most of these additions. Based on Jeff's comment about waiting until 1.3, would ~2-3 weeks be a good estimate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. we've only just released the rc, so a couple of weeks at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#41989 now merged.

@rgkimball
Copy link
Author

I think the basic philosophy behind most pandas asserts is that asserts should only be used to catch library bugs, not user errors. So any places where you've removed the assert and replaced with an explicit error, there should be a user-facing test case that can hit it.

Ah, understood! The distinction makes sense, in that case there are probably a few instances here where it should revert to the assert for protected library functions - I'll add test cases for the others.

Also, this pr is pretty complicated since it touches a lot of stuff - review would be a lot faster if you could break it up (for example, into typing addition for merge, typing for ..., change of asserts...)

No problem! I wasn't sure if it would be easier by feature or by nature of the change - would you prefer to see those as separate commits on this PR or separate PR's altogether? The latter seems like it could clutter the queue a bit but happy to accommodate what's easiest for you & the team.

@simonjayhawkins
Copy link
Member

Thanks for the pr @rgkimball! I remember there being an issue where use of asserts was discussed at length, but can't find it now.

It feels like it gets discussed regularly in different PRs. we did have #32785 to try and keep those discussions in one place.

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Jun 16, 2021
@rgkimball rgkimball changed the title Type hints & assert statements TYP: Type hints & assert statements Jun 16, 2021
pandas/core/apply.py Outdated Show resolved Hide resolved
@@ -9148,7 +9148,7 @@ def merge(
sort: bool = False,
suffixes: Suffixes = ("_x", "_y"),
copy: bool = True,
indicator: bool = False,
indicator: bool | str = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure about this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per _MergeOperation you can optionally supply a string as the column name, otherwise the indicator is given a default name:

if isinstance(self.indicator, str):
self.indicator_name = self.indicator
elif isinstance(self.indicator, bool):
self.indicator_name = "_merge" if self.indicator else None

Maybe I missed this, but is there a "valid column name" definition that would be more specific than str?

@rgkimball
Copy link
Author

Note failing tests are due to importing Literal in <3.8

@simonjayhawkins
Copy link
Member

@rgkimball can you merge master (and maybe break this PR up, e.g. maybe keep type annotation additions separate from code changes)

@simonjayhawkins simonjayhawkins added the Error Reporting Incorrect or improved errors from pandas label Jul 7, 2021
@@ -5691,7 +5691,9 @@ def _validate_indexer(self, form: str_t, key, kind: str_t):
if key is not None and not is_integer(key):
raise self._invalid_indexer(form, key)

def _maybe_cast_slice_bound(self, label, side: str_t, kind=no_default):
def _maybe_cast_slice_bound(
self, label, side: str_t, kind: Literal["loc", "getitem"] = no_default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think needs to be amended to include no_default?

@jbrockmendel
Copy link
Member

i like the annotations added, dont think its worth futzing with the assertions for fully internal code

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 26, 2021
@mroeschke
Copy link
Member

Thanks for the PR, but it appears to have gone stale. Let us know if you're still interested in working on this and we can reopen. Closing.

@mroeschke mroeschke closed this Sep 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Stale Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants