ENH: Integer NA Extension Array #21160
Merged
23 commits
4586245
ENH: add integer-na support via an ExtensionArray
jreback 4faa4c6
update for review comments
jreback 712b52d
update docs of IntegerDtype
jreback 74f392a
review comments
jreback 3889feb
make data & mask private attributes
jreback e5b8641
add dtype to to_integer_array
jreback d073e57
remove uneeded code & copies
jreback 2f08181
handle numpy scalars & more tests
jreback e6533dd
clean up / test astype
jreback 35a8738
fix up dtype comparison tests
jreback 68efb02
fixup quotes in interval index error messages
jreback c9e8f7d
some optimization on dtype checking
jreback ec2c632
don't force repr on invalid dtype
jreback 953de12
remove uneeded try/catch; review comments
jreback e74d10b
only allow safe casting
jreback 23afee1
review comments
jreback 86362f6
xfail reduce ops
jreback 1bdeb18
better type checking for extension types
jreback 8885835
use a better testing idiom
jreback 3dbb378
Merge branch 'master' into intna
jreback 1606786
interval index compat
jreback e9e0937
Merge branch 'master' into intna
jreback 4f04f90
Merge branch 'master' into intna
jreback
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ v0.24.0 (Month XX, 2018) | |
New features | ||
~~~~~~~~~~~~ | ||
|
||
|
||
- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`) | ||
|
||
.. _whatsnew_0240.enhancements.extension_array_operators: | ||
|
@@ -31,6 +32,62 @@ See the :ref:`ExtensionArray Operator Support | |
<extending.extension.operator>` documentation section for details on both | ||
ways of adding operator support. | ||
|
||
.. _whatsnew_0240.enhancements.intna: | ||
|
||
Optional Integer NA Support | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Pandas has gained the ability to hold integer dtypes with missing values. This long-requested feature is enabled through the use of :ref:`extension types <extending.extension-types>`. | ||
Here is an example of its usage. | ||
|
||
We can construct a ``Series`` with the specified dtype. The dtype string ``Int64`` is a pandas ``ExtensionDtype``. A list or array that uses the traditional missing value | ||
marker ``np.nan`` is still converted to the integer dtype. The display of the ``Series`` also uses ``NaN`` to indicate missing values in string output. (:issue:`20700`, :issue:`20747`) | ||
|
||
.. ipython:: python | ||
|
||
s = pd.Series([1, 2, np.nan], dtype='Int64') | ||
s | ||
|
||
|
||
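A minimal sketch of the construction described above, assuming a pandas version that ships the nullable ``Int64`` dtype. The second series ``f`` is a hypothetical name added only for contrast; how the missing value is displayed may differ between pandas versions, so the sketch checks it with ``isna()`` rather than relying on the repr.

```python
import numpy as np
import pandas as pd

# np.nan marks the missing slot; the nullable "Int64" dtype keeps the
# remaining values as integers instead of upcasting the column to float64.
s = pd.Series([1, 2, np.nan], dtype="Int64")

# The same data without the dtype argument falls back to float64.
f = pd.Series([1, 2, np.nan])

print(s.dtype)  # Int64
print(f.dtype)  # float64

# Missing entries are detected the same way as for any other dtype.
missing = s.isna().tolist()
print(missing)  # [False, False, True]
```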
Operations on these dtypes will propagate ``NaN``, as other pandas operations do. | ||
|
||
.. ipython:: python | ||
|
||
# arithmetic | ||
s + 1 | ||
|
||
# comparison | ||
s == 1 | ||
|
||
# indexing | ||
s.iloc[1:3] | ||
|
||
# operate with other dtypes | ||
s + s.iloc[1:3].astype('Int8') | ||
|
||
# coerce when needed | ||
s + 0.01 | ||
|
||
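The propagation rules above can be checked programmatically. This is a sketch under the assumption of a pandas version with nullable integer support; the names ``plus_one``, ``tail``, ``mixed`` and ``shifted`` are illustrative. The exact float dtype returned by ``s + 0.01`` is version dependent, so the sketch only checks that the result is some float dtype.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype

s = pd.Series([1, 2, np.nan], dtype="Int64")

# Arithmetic: the missing entry stays missing, the dtype stays Int64.
plus_one = s + 1

# Mixed widths: the Int8 slice only covers labels 1 and 2, so after index
# alignment label 0 has no partner and becomes missing as well.
tail = s.iloc[1:3].astype("Int8")
mixed = s + tail

# Adding a float coerces the result to a floating-point dtype.
shifted = s + 0.01

print(plus_one.isna().tolist())  # [False, False, True]
print(mixed.isna().tolist())     # [True, False, True]
print(is_float_dtype(shifted.dtype))  # True
```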
These dtypes can operate as part of a ``DataFrame``. | ||
|
||
.. ipython:: python | ||
|
||
df = pd.DataFrame({'A': s, 'B': [1, 1, 3], 'C': list('aab')}) | ||
df | ||
df.dtypes | ||
|
||
|
||
These dtypes can be merged, reshaped, and cast. | ||
|
||
.. ipython:: python | ||
|
||
pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes | ||
df['A'].astype(float) | ||
|
||
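A short sketch of the ``DataFrame`` round trip shown above, again assuming a pandas version with the nullable ``Int64`` dtype. The names ``combined`` and ``as_float`` are illustrative. It checks that the extension dtype survives ``pd.concat`` and that casting to ``float`` turns the missing entry into ``NaN``.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan], dtype="Int64")
df = pd.DataFrame({"A": s, "B": [1, 1, 3], "C": list("aab")})

# Splitting the frame and concatenating it again keeps column A as Int64.
combined = pd.concat([df[["A"]], df[["B", "C"]]], axis=1)

# Casting to float yields a plain float64 column with NaN for the gap.
as_float = df["A"].astype(float)

print(combined.dtypes["A"])  # Int64
print(as_float.dtype)        # float64
```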
.. warning:: | ||
|
||
The Integer NA support currently uses the capitalized dtype version, e.g. ``Int8`` as compared to the traditional ``int8``. This may be changed at a future date. | ||
captilized -> capitalized? |
||
|
||
.. _whatsnew_0240.enhancements.read_html: | ||
|
||
``read_html`` Enhancements | ||
|
@@ -256,6 +313,7 @@ Previous Behavior: | |
ExtensionType Changes | ||
^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
- ``ExtensionArray`` has gained the abstract method ``.dropna()`` (:issue:`21185`) | ||
- ``ExtensionDtype`` has gained the ability to instantiate from string dtypes, e.g. ``decimal`` would instantiate a registered ``DecimalDtype``; furthermore | ||
the ``ExtensionDtype`` has gained the method ``construct_array_type`` (:issue:`21185`) | ||
- The ``ExtensionArray`` constructor ``_from_sequence`` now takes the keyword arg ``copy=False`` (:issue:`21185`) | ||
|
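The three interface changes listed above can be exercised together. The sketch below uses the ``Int64`` dtype rather than the ``decimal`` example from the text, and assumes that ``pandas.api.types.pandas_dtype`` is the entry point that resolves a registered dtype string; ``_from_sequence`` is part of the extension interface and is normally called by pandas itself rather than by user code.

```python
import pandas as pd
from pandas.api.types import pandas_dtype

# A registered extension dtype can be instantiated from its string name.
dtype = pandas_dtype("Int64")

# The dtype knows which array class stores its values.
array_type = dtype.construct_array_type()

# The array constructor accepts the dtype and the copy keyword.
arr = array_type._from_sequence([1, 2, None], dtype=dtype, copy=False)

# dropna() returns a new array without the missing entries.
dense = arr.dropna()

print(array_type.__name__)  # IntegerArray
print(len(arr), len(dense))  # 3 2
```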
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,10 @@ | ||
from .base import (ExtensionArray, # noqa | ||
ExtensionOpsMixin, | ||
ExtensionScalarOpsMixin) | ||
from .categorical import Categorical # noqa | ||
from .datetimes import DatetimeArrayMixin # noqa | ||
from .interval import IntervalArray # noqa | ||
from .period import PeriodArrayMixin # noqa | ||
from .timedeltas import TimedeltaArrayMixin # noqa | ||
from .integer import ( # noqa | ||
IntegerArray, to_integer_array) | ||
What was the goal of exposing |
"of of " -> "of a"