From feb53467921282706a47b54001f9df7435133d41 Mon Sep 17 00:00:00 2001 From: MeeseeksMachine <39504233+meeseeksmachine@users.noreply.github.com> Date: Fri, 21 Jan 2022 10:35:15 -0800 Subject: [PATCH] Backport PR #45528: DOC: tidy 1.4.0 release notes (#45533) Co-authored-by: Simon Hawkins --- doc/source/whatsnew/v1.4.0.rst | 211 +++++++++++++++++++-------------- 1 file changed, 121 insertions(+), 90 deletions(-) diff --git a/doc/source/whatsnew/v1.4.0.rst b/doc/source/whatsnew/v1.4.0.rst index 234fada5e2ba3..4e5369072e116 100644 --- a/doc/source/whatsnew/v1.4.0.rst +++ b/doc/source/whatsnew/v1.4.0.rst @@ -1,6 +1,6 @@ .. _whatsnew_140: -What's new in 1.4.0 (January ??, 2022) +What's new in 1.4.0 (January 22, 2022) -------------------------------------- These are the changes in pandas 1.4.0. See :ref:`release` for a full changelog @@ -20,7 +20,8 @@ Enhancements Improved warning messages ^^^^^^^^^^^^^^^^^^^^^^^^^ -Previously, warning messages may have pointed to lines within the pandas library. Running the script ``setting_with_copy_warning.py`` +Previously, warning messages may have pointed to lines within the pandas +library. Running the script ``setting_with_copy_warning.py`` .. code-block:: python @@ -34,7 +35,10 @@ with pandas 1.3 resulted in:: .../site-packages/pandas/core/indexing.py:1951: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. -This made it difficult to determine where the warning was being generated from. Now pandas will inspect the call stack, reporting the first line outside of the pandas library that gave rise to the warning. The output of the above script is now:: +This made it difficult to determine where the warning was being generated from. +Now pandas will inspect the call stack, reporting the first line outside of the +pandas library that gave rise to the warning. 
The output of the above script is +now:: setting_with_copy_warning.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. @@ -47,8 +51,9 @@ This made it difficult to determine where the warning was being generated from. Index can hold arbitrary ExtensionArrays ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Until now, passing a custom :class:`ExtensionArray` to ``pd.Index`` would cast the -array to ``object`` dtype. Now :class:`Index` can directly hold arbitrary ExtensionArrays (:issue:`43930`). +Until now, passing a custom :class:`ExtensionArray` to ``pd.Index`` would cast +the array to ``object`` dtype. Now :class:`Index` can directly hold arbitrary +ExtensionArrays (:issue:`43930`). *Previous behavior*: @@ -89,38 +94,43 @@ Styler - The new method :meth:`.Styler.hide` deprecates :meth:`.Styler.hide_index` and :meth:`.Styler.hide_columns` (:issue:`43758`) - The keyword arguments ``level`` and ``names`` have been added to :meth:`.Styler.hide` (and implicitly to the deprecated methods :meth:`.Styler.hide_index` and :meth:`.Styler.hide_columns`) for additional control of visibility of MultiIndexes and of index names (:issue:`25475`, :issue:`43404`, :issue:`43346`) - The :meth:`.Styler.export` and :meth:`.Styler.use` have been updated to address all of the added functionality from v1.2.0 and v1.3.0 (:issue:`40675`) - - Global options under the category ``pd.options.styler`` have been extended to configure default ``Styler`` properties which address formatting, encoding, and HTML and LaTeX rendering. Note that formerly ``Styler`` relied on ``display.html.use_mathjax``, which has now been replaced by ``styler.html.mathjax``. (:issue:`41395`) + - Global options under the category ``pd.options.styler`` have been extended to configure default ``Styler`` properties which address formatting, encoding, and HTML and LaTeX rendering. 
Note that formerly ``Styler`` relied on ``display.html.use_mathjax``, which has now been replaced by ``styler.html.mathjax`` (:issue:`41395`) - Validation of certain keyword arguments, e.g. ``caption`` (:issue:`43368`) - Various bug fixes as recorded below Additionally there are specific enhancements to the HTML specific rendering: - - :meth:`.Styler.bar` introduces additional arguments to control alignment and display (:issue:`26070`, :issue:`36419`), and it also validates the input arguments ``width`` and ``height`` (:issue:`42511`). - - :meth:`.Styler.to_html` introduces keyword arguments ``sparse_index``, ``sparse_columns``, ``bold_headers``, ``caption``, ``max_rows`` and ``max_columns`` (:issue:`41946`, :issue:`43149`, :issue:`42972`). + - :meth:`.Styler.bar` introduces additional arguments to control alignment and display (:issue:`26070`, :issue:`36419`), and it also validates the input arguments ``width`` and ``height`` (:issue:`42511`) + - :meth:`.Styler.to_html` introduces keyword arguments ``sparse_index``, ``sparse_columns``, ``bold_headers``, ``caption``, ``max_rows`` and ``max_columns`` (:issue:`41946`, :issue:`43149`, :issue:`42972`) - :meth:`.Styler.to_html` omits CSSStyle rules for hidden table elements as a performance enhancement (:issue:`43619`) - Custom CSS classes can now be directly specified without string replacement (:issue:`43686`) - Ability to render hyperlinks automatically via a new ``hyperlinks`` formatting keyword argument (:issue:`45058`) There are also some LaTeX specific enhancements: - - :meth:`.Styler.to_latex` introduces keyword argument ``environment``, which also allows a specific "longtable" entry through a separate jinja2 template (:issue:`41866`). 
+ - :meth:`.Styler.to_latex` introduces keyword argument ``environment``, which also allows a specific "longtable" entry through a separate jinja2 template (:issue:`41866`) - Naive sparsification is now possible for LaTeX without the necessity of including the multirow package (:issue:`43369`) - *cline* support has been added for MultiIndex row sparsification through a keyword argument (:issue:`45138`) .. _whatsnew_140.enhancements.pyarrow_csv_engine: -Multithreaded CSV reading with a new CSV Engine based on pyarrow -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Multi-threaded CSV reading with a new CSV Engine based on pyarrow +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -:func:`pandas.read_csv` now accepts ``engine="pyarrow"`` (requires at least ``pyarrow`` 1.0.1) as an argument, allowing for faster csv parsing on multicore machines -with pyarrow installed. See the :doc:`I/O docs ` for more info. (:issue:`23697`, :issue:`43706`) +:func:`pandas.read_csv` now accepts ``engine="pyarrow"`` (requires at least +``pyarrow`` 1.0.1) as an argument, allowing for faster csv parsing on multicore +machines with pyarrow installed. See the :doc:`I/O docs ` for +more info. (:issue:`23697`, :issue:`43706`) .. _whatsnew_140.enhancements.window_rank: Rank function for rolling and expanding windows ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Added ``rank`` function to :class:`Rolling` and :class:`Expanding`. The new function supports the ``method``, ``ascending``, and ``pct`` flags of :meth:`DataFrame.rank`. The ``method`` argument supports ``min``, ``max``, and ``average`` ranking methods. +Added ``rank`` function to :class:`Rolling` and :class:`Expanding`. The new +function supports the ``method``, ``ascending``, and ``pct`` flags of +:meth:`DataFrame.rank`. The ``method`` argument supports ``min``, ``max``, and +``average`` ranking methods. Example: .. 
ipython:: python @@ -135,10 +145,12 @@ Example: Groupby positional indexing ^^^^^^^^^^^^^^^^^^^^^^^^^^^ -It is now possible to specify positional ranges relative to the ends of each group. +It is now possible to specify positional ranges relative to the ends of each +group. -Negative arguments for :meth:`.GroupBy.head` and :meth:`.GroupBy.tail` now work correctly and result in ranges relative to the end and start of each group, respectively. -Previously, negative arguments returned empty frames. +Negative arguments for :meth:`.GroupBy.head` and :meth:`.GroupBy.tail` now work +correctly and result in ranges relative to the end and start of each group, +respectively. Previously, negative arguments returned empty frames. .. ipython:: python @@ -167,10 +179,11 @@ Previously, negative arguments returned empty frames. DataFrame.from_dict and DataFrame.to_dict have new ``'tight'`` option ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -A new ``'tight'`` dictionary format that preserves :class:`MultiIndex` entries and names -is now available with the :meth:`DataFrame.from_dict` and :meth:`DataFrame.to_dict` methods -and can be used with the standard ``json`` library to produce a tight -representation of :class:`DataFrame` objects (:issue:`4889`). +A new ``'tight'`` dictionary format that preserves :class:`MultiIndex` entries +and names is now available with the :meth:`DataFrame.from_dict` and +:meth:`DataFrame.to_dict` methods and can be used with the standard ``json`` +library to produce a tight representation of :class:`DataFrame` objects +(:issue:`4889`). .. ipython:: python @@ -188,21 +201,21 @@ representation of :class:`DataFrame` objects (:issue:`4889`). Other enhancements ^^^^^^^^^^^^^^^^^^ -- :meth:`concat` will preserve the ``attrs`` when it is the same for all objects and discard the ``attrs`` when they are different. 
(:issue:`41828`) +- :meth:`concat` will preserve the ``attrs`` when it is the same for all objects and discard the ``attrs`` when they are different (:issue:`41828`) - :class:`DataFrameGroupBy` operations with ``as_index=False`` now correctly retain ``ExtensionDtype`` dtypes for columns being grouped on (:issue:`41373`) - Add support for assigning values to ``by`` argument in :meth:`DataFrame.plot.hist` and :meth:`DataFrame.plot.box` (:issue:`15079`) - :meth:`Series.sample`, :meth:`DataFrame.sample`, and :meth:`.GroupBy.sample` now accept a ``np.random.Generator`` as input to ``random_state``. A generator will be more performant, especially with ``replace=False`` (:issue:`38100`) -- :meth:`Series.ewm`, :meth:`DataFrame.ewm`, now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`Window Overview ` for performance and functional benefits (:issue:`42273`) +- :meth:`Series.ewm` and :meth:`DataFrame.ewm` now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. 
See :ref:`Window Overview ` for performance and functional benefits (:issue:`42273`) - :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` now support the argument ``skipna`` (:issue:`34047`) - :meth:`read_table` now supports the argument ``storage_options`` (:issue:`39167`) -- :meth:`DataFrame.to_stata` and :meth:`StataWriter` now accept the keyword only argument ``value_labels`` to save labels for non-categorical columns +- :meth:`DataFrame.to_stata` and :meth:`StataWriter` now accept the keyword only argument ``value_labels`` to save labels for non-categorical columns (:issue:`38454`) - Methods that relied on hashmap based algos such as :meth:`DataFrameGroupBy.value_counts`, :meth:`DataFrameGroupBy.count` and :func:`factorize` ignored imaginary component for complex numbers (:issue:`17927`) - Add :meth:`Series.str.removeprefix` and :meth:`Series.str.removesuffix` introduced in Python 3.9 to remove pre-/suffixes from string-type :class:`Series` (:issue:`36944`) - Attempting to write into a file in missing parent directory with :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_html`, :meth:`DataFrame.to_excel`, :meth:`DataFrame.to_feather`, :meth:`DataFrame.to_parquet`, :meth:`DataFrame.to_stata`, :meth:`DataFrame.to_json`, :meth:`DataFrame.to_pickle`, and :meth:`DataFrame.to_xml` now explicitly mentions missing parent directory, the same is true for :class:`Series` counterparts (:issue:`24306`) - Indexing with ``.loc`` and ``.iloc`` now supports ``Ellipsis`` (:issue:`37750`) - :meth:`IntegerArray.all` , :meth:`IntegerArray.any`, :meth:`FloatingArray.any`, and :meth:`FloatingArray.all` use Kleene logic (:issue:`41967`) - Added support for nullable boolean and integer types in :meth:`DataFrame.to_stata`, :class:`~pandas.io.stata.StataWriter`, :class:`~pandas.io.stata.StataWriter117`, and :class:`~pandas.io.stata.StataWriterUTF8` (:issue:`40855`) -- :meth:`DataFrame.__pos__`, :meth:`DataFrame.__neg__` now retain ``ExtensionDtype`` dtypes (:issue:`43883`) +- 
:meth:`DataFrame.__pos__` and :meth:`DataFrame.__neg__` now retain ``ExtensionDtype`` dtypes (:issue:`43883`) - The error raised when an optional dependency can't be imported now includes the original exception, for easier investigation (:issue:`43882`) - Added :meth:`.ExponentialMovingWindow.sum` (:issue:`13297`) - :meth:`Series.str.split` now supports a ``regex`` argument that explicitly specifies whether the pattern is a regular expression. Default is ``None`` (:issue:`43563`, :issue:`32835`, :issue:`25549`) @@ -211,19 +224,19 @@ Other enhancements - :func:`read_csv` now accepts a ``callable`` function in ``on_bad_lines`` when ``engine="python"`` for custom handling of bad lines (:issue:`5686`) - :class:`ExcelWriter` argument ``if_sheet_exists="overlay"`` option added (:issue:`40231`) - :meth:`read_excel` now accepts a ``decimal`` argument that allow the user to specify the decimal point when parsing string columns to numeric (:issue:`14403`) -- :meth:`.GroupBy.mean`, :meth:`.GroupBy.std`, :meth:`.GroupBy.var`, :meth:`.GroupBy.sum` now supports `Numba `_ execution with the ``engine`` keyword (:issue:`43731`, :issue:`44862`, :issue:`44939`) -- :meth:`Timestamp.isoformat`, now handles the ``timespec`` argument from the base :class:``datetime`` class (:issue:`26131`) +- :meth:`.GroupBy.mean`, :meth:`.GroupBy.std`, :meth:`.GroupBy.var`, and :meth:`.GroupBy.sum` now support `Numba `_ execution with the ``engine`` keyword (:issue:`43731`, :issue:`44862`, :issue:`44939`) +- :meth:`Timestamp.isoformat` now handles the ``timespec`` argument from the base ``datetime`` class (:issue:`26131`) - :meth:`NaT.to_numpy` ``dtype`` argument is now respected, so ``np.timedelta64`` can be returned (:issue:`44460`) - New option ``display.max_dir_items`` customizes the number of columns added to :meth:`Dataframe.__dir__` and suggested for tab completion (:issue:`37996`) -- Added "Juneteenth National Independence Day" to ``USFederalHolidayCalendar``. See also `Other API changes`_. 
-- :meth:`.Rolling.var`, :meth:`.Expanding.var`, :meth:`.Rolling.std`, :meth:`.Expanding.std` now support `Numba `_ execution with the ``engine`` keyword (:issue:`44461`) +- Added "Juneteenth National Independence Day" to ``USFederalHolidayCalendar`` (:issue:`44574`) +- :meth:`.Rolling.var`, :meth:`.Expanding.var`, :meth:`.Rolling.std`, and :meth:`.Expanding.std` now support `Numba `_ execution with the ``engine`` keyword (:issue:`44461`) - :meth:`Series.info` has been added, for compatibility with :meth:`DataFrame.info` (:issue:`5167`) -- Implemented :meth:`IntervalArray.min`, :meth:`IntervalArray.max`, as a result of which ``min`` and ``max`` now work for :class:`IntervalIndex`, :class:`Series` and :class:`DataFrame` with ``IntervalDtype`` (:issue:`44746`) +- Implemented :meth:`IntervalArray.min` and :meth:`IntervalArray.max`, as a result of which ``min`` and ``max`` now work for :class:`IntervalIndex`, :class:`Series` and :class:`DataFrame` with ``IntervalDtype`` (:issue:`44746`) - :meth:`UInt64Index.map` now retains ``dtype`` where possible (:issue:`44609`) - :meth:`read_json` can now parse unsigned long long integers (:issue:`26068`) - :meth:`DataFrame.take` now raises a ``TypeError`` when passed a scalar for the indexer (:issue:`42875`) - :meth:`is_list_like` now identifies duck-arrays as list-like unless ``.ndim == 0`` (:issue:`35131`) -- :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`). 
+- :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`) - Add support for `Zstandard `_ compression to :meth:`DataFrame.to_pickle`/:meth:`read_pickle` and friends (:issue:`43925`) - :meth:`DataFrame.to_sql` now returns an ``int`` of the number of written rows (:issue:`23998`) @@ -241,15 +254,17 @@ These are bug fixes that might have notable behavior changes. Inconsistent date string parsing ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The ``dayfirst`` option of :func:`to_datetime` isn't strict, and this can lead to surprising behaviour: +The ``dayfirst`` option of :func:`to_datetime` isn't strict, and this can lead +to surprising behavior: .. ipython:: python :okwarning: pd.to_datetime(["31-12-2021"], dayfirst=False) -Now, a warning will be raised if a date string cannot be parsed accordance to the given ``dayfirst`` value when -the value is a delimited date string (e.g. ``31-12-2012``). +Now, a warning will be raised if a date string cannot be parsed in accordance +with the given ``dayfirst`` value when the value is a delimited date string (e.g. +``31-12-2012``). .. _whatsnew_140.notable_bug_fixes.concat_with_empty_or_all_na: @@ -257,8 +272,9 @@ Ignoring dtypes in concat with empty or all-NA columns ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When using :func:`concat` to concatenate two or more :class:`DataFrame` objects, -if one of the DataFrames was empty or had all-NA values, its dtype was *sometimes* -ignored when finding the concatenated dtype. These are now consistently *not* ignored (:issue:`43507`). +if one of the DataFrames was empty or had all-NA values, its dtype was +*sometimes* ignored when finding the concatenated dtype. These are now +consistently *not* ignored (:issue:`43507`). .. ipython:: python @@ -266,7 +282,9 @@ ignored when finding the concatenated dtype. 
These are now consistently *not* i df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2)) res = pd.concat([df1, df2]) -Previously, the float-dtype in ``df2`` would be ignored so the result dtype would be ``datetime64[ns]``. As a result, the ``np.nan`` would be cast to ``NaT``. +Previously, the float-dtype in ``df2`` would be ignored so the result dtype +would be ``datetime64[ns]``. As a result, the ``np.nan`` would be cast to +``NaT``. *Previous behavior*: @@ -278,7 +296,8 @@ Previously, the float-dtype in ``df2`` would be ignored so the result dtype woul 0 2013-01-01 1 NaT -Now the float-dtype is respected. Since the common dtype for these DataFrames is object, the ``np.nan`` is retained. +Now the float-dtype is respected. Since the common dtype for these DataFrames is +object, the ``np.nan`` is retained. *New behavior*: @@ -291,7 +310,10 @@ Now the float-dtype is respected. Since the common dtype for these DataFrames is Null-values are no longer coerced to NaN-value in value_counts and mode ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -:meth:`Series.value_counts` and :meth:`Series.mode` no longer coerce ``None``, ``NaT`` and other null-values to a NaN-value for ``np.object``-dtype. This behavior is now consistent with ``unique``, ``isin`` and others (:issue:`42688`). +:meth:`Series.value_counts` and :meth:`Series.mode` no longer coerce ``None``, +``NaT`` and other null-values to a NaN-value for ``np.object``-dtype. This +behavior is now consistent with ``unique``, ``isin`` and others +(:issue:`42688`). .. ipython:: python @@ -323,8 +345,9 @@ Now null-values are no longer mangled. mangle_dupe_cols in read_csv no longer renaming unique columns conflicting with target names ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -:func:`read_csv` no longer renaming unique cols, which conflict with the target names of duplicated columns. -Already existing columns are jumped, e.g. 
the next available index is used for the target column name (:issue:`14704`). +:func:`read_csv` no longer renames unique columns that conflict with the target +names of duplicated columns. Existing column names are skipped, i.e. the next +available index is used for the target column name (:issue:`14704`). .. ipython:: python @@ -333,7 +356,8 @@ Already existing columns are jumped, e.g. the next available index is used for t data = "a,a,a.1\n1,2,3" res = pd.read_csv(io.StringIO(data)) -Previously, the second column was called ``a.1``, while the third col was also renamed to ``a.1.1``. +Previously, the second column was called ``a.1``, while the third column was +also renamed to ``a.1.1``. *Previous behavior*: @@ -344,8 +368,9 @@ Previously, the second column was called ``a.1``, while the third col was also r a a.1 a.1.1 0 1 2 3 -Now the renaming checks if ``a.1`` already exists when changing the name of the second column and jumps this index. The -second column is instead renamed to ``a.2``. +Now the renaming checks if ``a.1`` already exists when changing the name of the +second column and skips this index. The second column is instead renamed to +``a.2``. *New behavior*: @@ -358,9 +383,10 @@ second column is instead renamed to ``a.2``. unstack and pivot_table no longer raises ValueError for result that would exceed int32 limit ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Previously :meth:`DataFrame.pivot_table` and :meth:`DataFrame.unstack` would raise a ``ValueError`` if the operation -could produce a result with more than ``2**31 - 1`` elements. This operation now raises a :class:`errors.PerformanceWarning` -instead (:issue:`26314`). +Previously :meth:`DataFrame.pivot_table` and :meth:`DataFrame.unstack` would +raise a ``ValueError`` if the operation could produce a result with more than +``2**31 - 1`` elements. This operation now raises a +:class:`errors.PerformanceWarning` instead (:issue:`26314`). 
*Previous behavior*: @@ -386,14 +412,13 @@ groupby.apply consistent transform detection :meth:`.GroupBy.apply` is designed to be flexible, allowing users to perform aggregations, transformations, filters, and use it with user-defined functions -that might not fall into any of these categories. As part of this, apply -will attempt to detect when an operation is a transform, and in such a -case, the result will have the same index as the input. In order to -determine if the operation is a transform, pandas compares the -input's index to the result's and determines if it has been mutated. -Previously in pandas 1.3, different code paths used different definitions -of "mutated": some would use Python's ``is`` whereas others would test -only up to equality. +that might not fall into any of these categories. As part of this, apply will +attempt to detect when an operation is a transform, and in such a case, the +result will have the same index as the input. In order to determine if the +operation is a transform, pandas compares the input's index to the result's and +determines if it has been mutated. Previously in pandas 1.3, different code +paths used different definitions of "mutated": some would use Python's ``is`` +whereas others would test only up to equality. This inconsistency has been removed, pandas now tests up to equality. @@ -423,10 +448,10 @@ This inconsistency has been removed, pandas now tests up to equality. 1 3 5 2 4 6 -In the examples above, the first uses a code path where pandas uses -``is`` and determines that ``func`` is not a transform whereas the second -tests up to equality and determines that ``func`` is a transform. In the -first case, the result's index is not the same as the input's. +In the examples above, the first uses a code path where pandas uses ``is`` and +determines that ``func`` is not a transform whereas the second tests up to +equality and determines that ``func`` is a transform. 
In the first case, the +result's index is not the same as the input's. *New behavior*: @@ -435,8 +460,8 @@ first case, the result's index is not the same as the input's. df.groupby(['a']).apply(func) df.set_index(['a', 'b']).groupby(['a']).apply(func) -Now in both cases it is determined that ``func`` is a transform. In each case, the -result has the same index as the input. +Now in both cases it is determined that ``func`` is a transform. In each case, +the result has the same index as the input. .. _whatsnew_140.api_breaking: @@ -475,9 +500,12 @@ If installed, we now require: | mypy (dev) | 0.930 | | X | +-----------------+-----------------+----------+---------+ -For `optional libraries `_ the general recommendation is to use the latest version. -The following table lists the lowest version per library that is currently being tested throughout the development of pandas. -Optional libraries below the lowest tested version may still work, but are not considered supported. +For `optional libraries +`_ the general +recommendation is to use the latest version. The following table lists the +lowest version per library that is currently being tested throughout the +development of pandas. Optional libraries below the lowest tested version may +still work, but are not considered supported. +-----------------+-----------------+---------+ | Package | Minimum Version | Changed | @@ -559,11 +587,13 @@ Deprecations Deprecated Int64Index, UInt64Index & Float64Index ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -:class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index` have been deprecated -in favor of the base :class:`Index` class and will be removed in Pandas 2.0 (:issue:`43028`). +:class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index` have been +deprecated in favor of the base :class:`Index` class and will be removed in +Pandas 2.0 (:issue:`43028`). 
-For constructing a numeric index, you can use the base :class:`Index` class instead -specifying the data type (which will also work on older pandas releases): +For constructing a numeric index, you can use the base :class:`Index` class +instead, specifying the data type (which will also work on older pandas +releases): .. code-block:: python @@ -582,9 +612,10 @@ checks with checking the ``dtype``: # with idx.dtype == "int64" -Currently, in order to maintain backward compatibility, calls to -:class:`Index` will continue to return :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index` -when given numeric data, but in the future, an :class:`Index` will be returned. +Currently, in order to maintain backward compatibility, calls to :class:`Index` +will continue to return :class:`Int64Index`, :class:`UInt64Index` and +:class:`Float64Index` when given numeric data, but in the future, an +:class:`Index` will be returned. *Current behavior*: @@ -610,8 +641,8 @@ when given numeric data, but in the future, an :class:`Index` will be returned. Deprecated Frame.append and Series.append ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -:meth:`DataFrame.append` and :meth:`Series.append` have been deprecated and will be removed in Pandas 2.0. -Use :func:`pandas.concat` instead (:issue:`35407`). +:meth:`DataFrame.append` and :meth:`Series.append` have been deprecated and will +be removed in Pandas 2.0. Use :func:`pandas.concat` instead (:issue:`35407`). 
*Deprecated syntax* @@ -661,11 +692,11 @@ Other Deprecations - Deprecated the 'kind' argument in :meth:`Index.get_slice_bound`, :meth:`Index.slice_indexer`, :meth:`Index.slice_locs`; in a future version passing 'kind' will raise (:issue:`42857`) - Deprecated dropping of nuisance columns in :class:`Rolling`, :class:`Expanding`, and :class:`EWM` aggregations (:issue:`42738`) - Deprecated :meth:`Index.reindex` with a non-unique index (:issue:`42568`) -- Deprecated :meth:`.Styler.render` in favour of :meth:`.Styler.to_html` (:issue:`42140`) -- Deprecated :meth:`.Styler.hide_index` and :meth:`.Styler.hide_columns` in favour of :meth:`.Styler.hide` (:issue:`43758`) +- Deprecated :meth:`.Styler.render` in favor of :meth:`.Styler.to_html` (:issue:`42140`) +- Deprecated :meth:`.Styler.hide_index` and :meth:`.Styler.hide_columns` in favor of :meth:`.Styler.hide` (:issue:`43758`) - Deprecated passing in a string column label into ``times`` in :meth:`DataFrame.ewm` (:issue:`43265`) - Deprecated the 'include_start' and 'include_end' arguments in :meth:`DataFrame.between_time`; in a future version passing 'include_start' or 'include_end' will raise (:issue:`40245`) -- Deprecated the ``squeeze`` argument to :meth:`read_csv`, :meth:`read_table`, and :meth:`read_excel`. Users should squeeze the DataFrame afterwards with ``.squeeze("columns")`` instead. (:issue:`43242`) +- Deprecated the ``squeeze`` argument to :meth:`read_csv`, :meth:`read_table`, and :meth:`read_excel`. 
Users should squeeze the DataFrame afterwards with ``.squeeze("columns")`` instead (:issue:`43242`) - Deprecated the ``index`` argument to :class:`SparseArray` construction (:issue:`23089`) - Deprecated the ``closed`` argument in :meth:`date_range` and :meth:`bdate_range` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`) - Deprecated :meth:`.Rolling.validate`, :meth:`.Expanding.validate`, and :meth:`.ExponentialMovingWindow.validate` (:issue:`43665`) @@ -793,12 +824,12 @@ Datetimelike Timedelta ^^^^^^^^^ -- Bug in division of all-``NaT`` :class:`TimeDeltaIndex`, :class:`Series` or :class:`DataFrame` column with object-dtype arraylike of numbers failing to infer the result as timedelta64-dtype (:issue:`39750`) +- Bug in division of all-``NaT`` :class:`TimeDeltaIndex`, :class:`Series` or :class:`DataFrame` column with object-dtype array-like of numbers failing to infer the result as timedelta64-dtype (:issue:`39750`) - Bug in floor division of ``timedelta64[ns]`` data with a scalar returning garbage values (:issue:`44466`) - Bug in :class:`Timedelta` now properly taking into account any nanoseconds contribution of any kwarg (:issue:`43764`, :issue:`45227`) -Timezones -^^^^^^^^^ +Time Zones +^^^^^^^^^^ - Bug in :func:`to_datetime` with ``infer_datetime_format=True`` failing to parse zero UTC offset (``Z``) correctly (:issue:`41047`) - Bug in :meth:`Series.dt.tz_convert` resetting index in a :class:`Series` with :class:`CategoricalIndex` (:issue:`43080`) - Bug in ``Timestamp`` and ``DatetimeIndex`` incorrectly raising a ``TypeError`` when subtracting two timezone-aware objects with mismatched timezones (:issue:`31793`) @@ -817,7 +848,7 @@ Numeric Conversion ^^^^^^^^^^ -- Bug in :class:`UInt64Index` constructor when passing a list containing both positive integers small enough to cast to int64 and integers too large too hold in int64 (:issue:`42201`) +- Bug in :class:`UInt64Index` constructor when passing a list 
containing both positive integers small enough to cast to int64 and integers too large to hold in int64 (:issue:`42201`) - Bug in :class:`Series` constructor returning 0 for missing values with dtype ``int64`` and ``False`` for dtype ``bool`` (:issue:`43017`, :issue:`43018`) - Bug in constructing a :class:`DataFrame` from a :class:`PandasArray` containing :class:`Series` objects behaving differently than an equivalent ``np.ndarray`` (:issue:`43986`) - Bug in :class:`IntegerDtype` not allowing coercion from string dtype (:issue:`25472`) @@ -838,7 +869,7 @@ Interval Indexing ^^^^^^^^ -- Bug in :meth:`Series.rename` when index in Series is MultiIndex and level in rename is provided. (:issue:`43659`) +- Bug in :meth:`Series.rename` when index in Series is MultiIndex and level in rename is provided (:issue:`43659`) - Bug in :meth:`DataFrame.truncate` and :meth:`Series.truncate` when the object's Index has a length greater than one but only one unique value (:issue:`42365`) - Bug in :meth:`Series.loc` and :meth:`DataFrame.loc` with a :class:`MultiIndex` when indexing with a tuple in which one of the levels is also a tuple (:issue:`27591`) - Bug in :meth:`Series.loc` when with a :class:`MultiIndex` whose first level contains only ``np.nan`` values (:issue:`42055`) @@ -849,7 +880,7 @@ Indexing - Bug in :meth:`Index.get_indexer_non_unique` when index contains multiple ``np.nan`` (:issue:`35392`) - Bug in :meth:`DataFrame.query` did not handle the degree sign in a backticked column name, such as \`Temp(°C)\`, used in an expression to query a dataframe (:issue:`42826`) - Bug in :meth:`DataFrame.drop` where the error message did not show missing labels with commas when raising ``KeyError`` (:issue:`42881`) -- Bug in :meth:`DataFrame.query` where method calls in query strings led to errors when the ``numexpr`` package was installed. 
(:issue:`22435`) +- Bug in :meth:`DataFrame.query` where method calls in query strings led to errors when the ``numexpr`` package was installed (:issue:`22435`) - Bug in :meth:`DataFrame.nlargest` and :meth:`Series.nlargest` where sorted result did not count indexes containing ``np.nan`` (:issue:`28984`) - Bug in indexing on a non-unique object-dtype :class:`Index` with an NA scalar (e.g. ``np.nan``) (:issue:`43711`) - Bug in :meth:`DataFrame.__setitem__` incorrectly writing into an existing column's array rather than setting a new array when the new dtype and the old dtype match (:issue:`43406`) @@ -881,7 +912,7 @@ Missing - Bug in :meth:`DataFrame.fillna` not replacing missing values when using a dict-like ``value`` and duplicate column names (:issue:`43476`) - Bug in constructing a :class:`DataFrame` with a dictionary ``np.datetime64`` as a value and ``dtype='timedelta64[ns]'``, or vice-versa, incorrectly casting instead of raising (:issue:`44428`) - Bug in :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` with ``inplace=True`` not writing to the underlying array(s) in-place (:issue:`44749`) -- Bug in :meth:`Index.fillna` incorrectly returning an un-filled :class:`Index` when NA values are present and ``downcast`` argument is specified. This now raises ``NotImplementedError`` instead; do not pass ``downcast`` argument (:issue:`44873`) +- Bug in :meth:`Index.fillna` incorrectly returning an unfilled :class:`Index` when NA values are present and ``downcast`` argument is specified. 
This now raises ``NotImplementedError`` instead; do not pass ``downcast`` argument (:issue:`44873`) - Bug in :meth:`DataFrame.dropna` changing :class:`Index` even if no entries were dropped (:issue:`41965`) - Bug in :meth:`Series.fillna` with an object-dtype incorrectly ignoring ``downcast="infer"`` (:issue:`44241`) @@ -900,7 +931,7 @@ I/O - Bug in :func:`json_normalize` where ``errors=ignore`` could fail to ignore missing values of ``meta`` when ``record_path`` has a length greater than one (:issue:`41876`) - Bug in :func:`read_csv` with multi-header input and arguments referencing column names as tuples (:issue:`42446`) - Bug in :func:`read_fwf`, where difference in lengths of ``colspecs`` and ``names`` was not raising ``ValueError`` (:issue:`40830`) -- Bug in :func:`Series.to_json` and :func:`DataFrame.to_json` where some attributes were skipped when serialising plain Python objects to JSON (:issue:`42768`, :issue:`33043`) +- Bug in :func:`Series.to_json` and :func:`DataFrame.to_json` where some attributes were skipped when serializing plain Python objects to JSON (:issue:`42768`, :issue:`33043`) - Column headers are dropped when constructing a :class:`DataFrame` from a sqlalchemy's ``Row`` object (:issue:`40682`) - Bug in unpickling a :class:`Index` with object dtype incorrectly inferring numeric dtypes (:issue:`43188`) - Bug in :func:`read_csv` where reading multi-header input with unequal lengths incorrectly raising uncontrolled ``IndexError`` (:issue:`43102`) @@ -914,7 +945,7 @@ I/O - Bug in :func:`read_csv` used second row to guess implicit index if ``header`` was set to ``None`` for ``engine="python"`` (:issue:`22144`) - Bug in :func:`read_csv` not recognizing bad lines when ``names`` were given for ``engine="c"`` (:issue:`22144`) - Bug in :func:`read_csv` with :code:`float_precision="round_trip"` which did not skip initial/trailing whitespace (:issue:`43713`) -- Bug when Python is built without lzma module: a warning was raised at the pandas import time, 
even if the lzma capability isn't used. (:issue:`43495`) +- Bug when Python is built without the lzma module: a warning was raised at the pandas import time, even if the lzma capability isn't used (:issue:`43495`) - Bug in :func:`read_csv` not applying dtype for ``index_col`` (:issue:`9435`) - Bug in dumping/loading a :class:`DataFrame` with ``yaml.dump(frame)`` (:issue:`42748`) - Bug in :func:`read_csv` raising ``ValueError`` when names was longer than header but equal to data rows for ``engine="python"`` (:issue:`38453`) @@ -926,7 +957,7 @@ I/O - Bug in :func:`read_csv` not replacing ``NaN`` values with ``np.nan`` before attempting date conversion (:issue:`26203`) - Bug in :func:`read_csv` raising ``AttributeError`` when attempting to read a .csv file and infer index column dtype from an nullable integer type (:issue:`44079`) - Bug in :func:`to_csv` always coercing datetime columns with different formats to the same format (:issue:`21734`) -- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` with ``compression`` set to ``'zip'`` no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more smartly. (:issue:`39465`) +- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` with ``compression`` set to ``'zip'`` no longer create a zip file containing a file ending with ".zip". 
Instead, they try to infer the inner file name more smartly (:issue:`39465`) - Bug in :func:`read_csv` where reading a mixed column of booleans and missing values to a float type results in the missing values becoming 1.0 rather than NaN (:issue:`42808`, :issue:`34120`) - Bug in :func:`to_xml` raising error for ``pd.NA`` with extension array dtype (:issue:`43903`) - Bug in :func:`read_csv` when passing simultaneously a parser in ``date_parser`` and ``parse_dates=False``, the parsing was still called (:issue:`44366`) @@ -935,7 +966,7 @@ I/O - Bug in :func:`read_csv` when passing a ``tempfile.SpooledTemporaryFile`` opened in binary mode (:issue:`44748`) - Bug in :func:`read_json` raising ``ValueError`` when attempting to parse json strings containing "://" (:issue:`36271`) - Bug in :func:`read_csv` when ``engine="c"`` and ``encoding_errors=None`` which caused a segfault (:issue:`45180`) -- Bug in :func:`read_csv` an invalid value of ``usecols`` leading to an un-closed file handle (:issue:`45384`) +- Bug in :func:`read_csv` where an invalid value of ``usecols`` led to an unclosed file handle (:issue:`45384`) Period ^^^^^^ @@ -947,7 +978,7 @@ Period Plotting ^^^^^^^^ -- When given non-numeric data, :meth:`DataFrame.boxplot` now raises a ``ValueError`` rather than a cryptic ``KeyError`` or ``ZeroDivisionError``, in line with other plotting functions like :meth:`DataFrame.hist`.
(:issue:`43480`) +- When given non-numeric data, :meth:`DataFrame.boxplot` now raises a ``ValueError`` rather than a cryptic ``KeyError`` or ``ZeroDivisionError``, in line with other plotting functions like :meth:`DataFrame.hist` (:issue:`43480`) Groupby/resample/rolling ^^^^^^^^^^^^^^^^^^^^^^^^ @@ -981,10 +1012,10 @@ Reshaping ^^^^^^^^^ - Improved error message when creating a :class:`DataFrame` column from a multi-dimensional :class:`numpy.ndarray` (:issue:`42463`) - :func:`concat` creating :class:`MultiIndex` with duplicate level entries when concatenating a :class:`DataFrame` with duplicates in :class:`Index` and multiple keys (:issue:`42651`) -- Bug in :meth:`pandas.cut` on :class:`Series` with duplicate indices (:issue:`42185`) and non-exact :meth:`pandas.CategoricalIndex` (:issue:`42425`) +- Bug in :meth:`pandas.cut` on :class:`Series` with duplicate indices and non-exact :meth:`pandas.CategoricalIndex` (:issue:`42185`, :issue:`42425`) - Bug in :meth:`DataFrame.append` failing to retain dtypes when appended columns do not match (:issue:`43392`) - Bug in :func:`concat` of ``bool`` and ``boolean`` dtypes resulting in ``object`` dtype instead of ``boolean`` dtype (:issue:`42800`) -- Bug in :func:`crosstab` when inputs are are categorical Series, there are categories that are not present in one or both of the Series, and ``margins=True``. Previously the margin value for missing categories was ``NaN``. It is now correctly reported as 0 (:issue:`43505`) +- Bug in :func:`crosstab` when inputs are categorical Series, there are categories that are not present in one or both of the Series, and ``margins=True``. Previously the margin value for missing categories was ``NaN``. 
It is now correctly reported as 0 (:issue:`43505`) - Bug in :func:`concat` would fail when the ``objs`` argument all had the same index and the ``keys`` argument contained duplicates (:issue:`43595`) - Bug in :func:`concat` which ignored the ``sort`` parameter (:issue:`43375`) - Fixed bug in :func:`merge` with :class:`MultiIndex` as column index for the ``on`` argument returning an error when assigning a column internally (:issue:`43734`)