String dtype: use 'str' string alias and representation for NaN-variant of the dtype #59388
Merged: jorisvandenbossche merged 16 commits into pandas-dev:main from jorisvandenbossche:string-dtype-alias on Aug 8, 2024.
Commits (16, all by jorisvandenbossche):
6e8d76a  String dtype: use 'str' string alias and representation for NaN-varia…
7c00808  clean-up
1ea64cf  fix typing and interchange tests
a12b345  fix bunch of tests with future.infer_string
476aa44  skip test for pyarrow + fix typing
757ea8a  Merge remote-tracking branch 'upstream/main' into string-dtype-alias
220960f  fix more tests + remove some xfails
d75f89d  fix repr tests
61ec243  Merge remote-tracking branch 'upstream/main' into string-dtype-alias
7399ae5  Merge remote-tracking branch 'upstream/main' into string-dtype-alias
f2ff5db  use str alias more in tests
c445a36  use str alias more in tests instead of object or str
d816476  string storage option not yet honored for NaN-variant
39662d2  fix select_dtypes test + more test changes
53c8f75  Merge remote-tracking branch 'upstream/main' into string-dtype-alias
8ddb6cc  fixup
@@ -2,7 +2,6 @@

 from typing import (
     TYPE_CHECKING,
-    ClassVar,
     Literal,
     cast,
 )
@@ -110,9 +109,12 @@ class StringDtype(StorageExtensionDtype):
     string[pyarrow]
     """

-    # error: Cannot override instance variable (previously declared on
-    # base class "StorageExtensionDtype") with class variable
-    name: ClassVar[str] = "string"  # type: ignore[misc]
+    @property
+    def name(self) -> str:
+        if self._na_value is libmissing.NA:
+            return "string"
+        else:
+            return "str"

     #: StringDtype().na_value uses pandas.NA except the implementation that
     # follows NumPy semantics, which uses nan.
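A minimal standalone sketch of why `name` moves from a class variable to a property: a class variable has one value for all instances, while the alias now depends on each instance's missing-value sentinel. `ToyStringDtype` and the `NA` object below are hypothetical stand-ins and do not use pandas itself.

```python
class _NAType:
    """Hypothetical stand-in for the pandas NA singleton (libmissing.NA)."""

    def __repr__(self) -> str:
        return "<NA>"


NA = _NAType()


class ToyStringDtype:
    """Toy model of the dtype; only the name logic from the hunk is mirrored."""

    def __init__(self, na_value=NA) -> None:
        self._na_value = na_value

    @property
    def name(self) -> str:
        # Identity check against the sentinel, as in the diff.
        if self._na_value is NA:
            return "string"
        else:
            return "str"


na_variant = ToyStringDtype()
nan_variant = ToyStringDtype(na_value=float("nan"))
print(na_variant.name)   # string
print(nan_variant.name)  # str
```

The check uses `is` rather than `==`, which matters because NaN never compares equal to itself.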
@@ -129,7 +131,7 @@ def __init__(
     ) -> None:
         # infer defaults
         if storage is None:
-            if using_string_dtype() and na_value is not libmissing.NA:
+            if na_value is not libmissing.NA:
                 storage = "pyarrow"
             else:
                 storage = get_option("mode.string_storage")
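The storage default after this hunk can be sketched as a small pure function. This is an illustration only: `infer_storage` is a hypothetical helper, `NA` is a stand-in sentinel, and the `string_storage_option` parameter takes the place of `get_option("mode.string_storage")`.

```python
NA = object()  # hypothetical stand-in for libmissing.NA


def infer_storage(storage, na_value, string_storage_option="python"):
    """Mirror the 'infer defaults' branch of __init__ shown in the hunk."""
    if storage is None:
        if na_value is not NA:
            # The NaN variant defaults to pyarrow regardless of the option.
            storage = "pyarrow"
        else:
            # The NA variant follows the configured option.
            storage = string_storage_option
    return storage


print(infer_storage(None, float("nan")))      # pyarrow
print(infer_storage(None, NA))                # python
print(infer_storage("python", float("nan")))  # python
```

With the `using_string_dtype()` condition removed, the NaN variant no longer consults the future flag when picking a default, which matches commit d816476 ("string storage option not yet honored for NaN-variant").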
@@ -159,11 +161,19 @@ def __init__(
         self.storage = storage
         self._na_value = na_value

+    def __repr__(self) -> str:
+        if self._na_value is libmissing.NA:
+            return f"{self.name}[{self.storage}]"
+        else:
+            # TODO add more informative repr
+            return self.name
+
     def __eq__(self, other: object) -> bool:
         # we need to override the base class __eq__ because na_value (NA or NaN)
         # cannot be checked with normal `==`
         if isinstance(other, str):
-            if other == self.name:
+            # TODO should dtype == "string" work for the NaN variant?
+            if other == "string" or other == self.name:  # noqa: PLR1714
                 return True
             try:
                 other = self.construct_from_string(other)

Comment on lines +186 to +187 (the two added lines in `__eq__`): This …
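The effect of the new `__repr__` and the relaxed string comparison can be seen in a toy model. This sketch is not the pandas implementation: `ToyStringDtype` and `NA` are hypothetical, and the `construct_from_string` fallback from the real `__eq__` is deliberately left out, so storage-qualified strings such as "string[python]" are not handled here.

```python
NA = object()  # hypothetical stand-in for libmissing.NA


class ToyStringDtype:
    def __init__(self, storage="python", na_value=NA) -> None:
        self.storage = storage
        self._na_value = na_value

    @property
    def name(self) -> str:
        return "string" if self._na_value is NA else "str"

    def __repr__(self) -> str:
        # NA variant shows its storage; NaN variant shows only the alias.
        if self._na_value is NA:
            return f"{self.name}[{self.storage}]"
        return self.name

    def __eq__(self, other: object) -> bool:
        if isinstance(other, str):
            # Both variants match "string"; each also matches its own name.
            return other == "string" or other == self.name
        if isinstance(other, ToyStringDtype):
            # Compare the sentinel by identity, since nan == nan is False.
            same_na = (self._na_value is NA) == (other._na_value is NA)
            return self.storage == other.storage and same_na
        return NotImplemented


na_variant = ToyStringDtype("python")
nan_variant = ToyStringDtype("pyarrow", float("nan"))
print(repr(na_variant))         # string[python]
print(repr(nan_variant))        # str
print(nan_variant == "string")  # True
print(na_variant == "str")      # False
```

The asymmetry is the point raised by the TODO in the diff: the NaN variant compares equal to both "str" and "string", while the NA variant compares equal only to "string".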
@@ -220,6 +230,8 @@ def construct_from_string(cls, string) -> Self:
             )
         if string == "string":
             return cls()
+        elif string == "str" and using_string_dtype():
+            return cls(na_value=np.nan)
         elif string == "string[python]":
             return cls(storage="python")
         elif string == "string[pyarrow]":
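A rough sketch of the alias dispatch in this hunk, written as a standalone function. `parse_alias` is a hypothetical helper, the `infer_string` argument stands in for `using_string_dtype()`, and the returned tuple and the `TypeError` for unrecognised aliases are assumptions made for illustration.

```python
from typing import Optional, Tuple


def parse_alias(alias: str, infer_string: bool) -> Tuple[Optional[str], str]:
    """Return (storage, missing-value kind) for a string dtype alias.

    A storage of None means "use the default"; the kind is "NA" or "NaN".
    """
    if alias == "string":
        return (None, "NA")
    elif alias == "str" and infer_string:
        # "str" maps to the NaN variant only when the future flag is on.
        return (None, "NaN")
    elif alias == "string[python]":
        return ("python", "NA")
    elif alias == "string[pyarrow]":
        return ("pyarrow", "NA")
    raise TypeError(f"Cannot construct a string dtype from '{alias}'")


print(parse_alias("str", infer_string=True))      # (None, 'NaN')
print(parse_alias("string", infer_string=False))  # (None, 'NA')
```

In this sketch, "str" falls through to the error when the flag is off, so it is not treated as the new string dtype alias.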
To discuss in #59342