Skip to content

Commit

Permalink
Merge branch 'main' of https://github.com/pandas-dev/pandas into depr…
Browse files Browse the repository at this point in the history
…_array_manager
  • Loading branch information
rhshadrach committed Sep 14, 2023
2 parents 99e1bfd + f00efd0 commit 9992b37
Show file tree
Hide file tree
Showing 53 changed files with 694 additions and 196 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/code-checks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down Expand Up @@ -109,7 +109,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down Expand Up @@ -143,7 +143,7 @@ jobs:
run: docker image prune -f

- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand All @@ -164,7 +164,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/codeql.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ jobs:
- python

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/comment-commands.yml
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/docbuild-and-upload.yml
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/package-checks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down Expand Up @@ -62,7 +62,7 @@ jobs:
cancel-in-progress: true
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down
6 changes: 3 additions & 3 deletions .github/workflows/unit-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down Expand Up @@ -194,7 +194,7 @@ jobs:

steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down Expand Up @@ -330,7 +330,7 @@ jobs:
PYTEST_TARGET: pandas

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/wheels.yml
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ jobs:
sdist_file: ${{ steps.save-path.outputs.sdist_name }}
steps:
- name: Checkout pandas
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down Expand Up @@ -103,7 +103,7 @@ jobs:
IS_SCHEDULE_DISPATCH: ${{ github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
steps:
- name: Checkout pandas
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0

Expand Down
1 change: 1 addition & 0 deletions ci/deps/actions-310.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ dependencies:
- pymysql>=1.0.2
- pyreadstat>=1.1.5
- pytables>=3.7.0
- python-calamine>=0.1.6
- pyxlsb>=1.0.9
- s3fs>=2022.05.0
- scipy>=1.8.1
Expand Down
1 change: 1 addition & 0 deletions ci/deps/actions-311-downstream_compat.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ dependencies:
- pymysql>=1.0.2
- pyreadstat>=1.1.5
- pytables>=3.7.0
- python-calamine>=0.1.6
- pyxlsb>=1.0.9
- s3fs>=2022.05.0
- scipy>=1.8.1
Expand Down
1 change: 1 addition & 0 deletions ci/deps/actions-311.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ dependencies:
- pymysql>=1.0.2
- pyreadstat>=1.1.5
# - pytables>=3.7.0, 3.8.0 is first version that supports 3.11
- python-calamine>=0.1.6
- pyxlsb>=1.0.9
- s3fs>=2022.05.0
- scipy>=1.8.1
Expand Down
1 change: 1 addition & 0 deletions ci/deps/actions-39-minimum_versions.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ dependencies:
- pymysql=1.0.2
- pyreadstat=1.1.5
- pytables=3.7.0
- python-calamine=0.1.6
- pyxlsb=1.0.9
- s3fs=2022.05.0
- scipy=1.8.1
Expand Down
1 change: 1 addition & 0 deletions ci/deps/actions-39.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ dependencies:
- pymysql>=1.0.2
- pyreadstat>=1.1.5
- pytables>=3.7.0
- python-calamine>=0.1.6
- pyxlsb>=1.0.9
- s3fs>=2022.05.0
- scipy>=1.8.1
Expand Down
1 change: 1 addition & 0 deletions ci/deps/circle-310-arm64.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ dependencies:
- pymysql>=1.0.2
# - pyreadstat>=1.1.5 not available on ARM
- pytables>=3.7.0
- python-calamine>=0.1.6
- pyxlsb>=1.0.9
- s3fs>=2022.05.0
- scipy>=1.8.1
Expand Down
1 change: 1 addition & 0 deletions doc/source/getting_started/install.rst
Original file line number Diff line number Diff line change
Expand Up @@ -281,6 +281,7 @@ xlrd 2.0.1 excel Reading Excel
xlsxwriter 3.0.3 excel Writing Excel
openpyxl 3.0.10 excel Reading / writing for xlsx files
pyxlsb 1.0.9 excel Reading for xlsb files
python-calamine 0.1.6 excel Reading for xls/xlsx/xlsb/ods files
========================= ================== =============== =============================================================

HTML
Expand Down
23 changes: 21 additions & 2 deletions doc/source/user_guide/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3453,7 +3453,8 @@ Excel files
The :func:`~pandas.read_excel` method can read Excel 2007+ (``.xlsx``) files
using the ``openpyxl`` Python module. Excel 2003 (``.xls``) files
can be read using ``xlrd``. Binary Excel (``.xlsb``)
files can be read using ``pyxlsb``.
files can be read using ``pyxlsb``. All formats can be read
using :ref:`calamine<io.calamine>` engine.
The :meth:`~DataFrame.to_excel` instance method is used for
saving a ``DataFrame`` to Excel. Generally the semantics are
similar to working with :ref:`csv<io.read_csv_table>` data.
Expand Down Expand Up @@ -3494,6 +3495,9 @@ using internally.

* For the engine odf, pandas is using :func:`odf.opendocument.load` to read in (``.ods``) files.

* For the engine calamine, pandas is using :func:`python_calamine.load_workbook`
to read in (``.xlsx``), (``.xlsm``), (``.xls``), (``.xlsb``), (``.ods``) files.

.. code-block:: python
# Returns a DataFrame
Expand Down Expand Up @@ -3935,7 +3939,8 @@ The :func:`~pandas.read_excel` method can also read binary Excel files
using the ``pyxlsb`` module. The semantics and features for reading
binary Excel files mostly match what can be done for `Excel files`_ using
``engine='pyxlsb'``. ``pyxlsb`` does not recognize datetime types
in files and will return floats instead.
in files and will return floats instead (you can use :ref:`calamine<io.calamine>`
if you need recognize datetime types).

.. code-block:: python
Expand All @@ -3947,6 +3952,20 @@ in files and will return floats instead.
Currently pandas only supports *reading* binary Excel files. Writing
is not implemented.

.. _io.calamine:

Calamine (Excel and ODS files)
------------------------------

The :func:`~pandas.read_excel` method can read Excel file (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``)
and OpenDocument spreadsheets (``.ods``) using the ``python-calamine`` module.
This module is a binding for Rust library `calamine <https://crates.io/crates/calamine>`__
and is faster than other engines in most cases. The optional dependency 'python-calamine' needs to be installed.

.. code-block:: python
# Returns a DataFrame
pd.read_excel("path_to_file.xlsb", engine="calamine")
.. _io.clipboard:

Expand Down
43 changes: 30 additions & 13 deletions doc/source/whatsnew/v0.10.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -180,19 +180,36 @@ labeled the aggregated group with the end of the interval: the next day).
DataFrame constructor with no columns specified. The v0.9.0 behavior (names
``X0``, ``X1``, ...) can be reproduced by specifying ``prefix='X'``:

.. ipython:: python
:okexcept:
import io
data = """
a,b,c
1,Yes,2
3,No,4
"""
print(data)
pd.read_csv(io.StringIO(data), header=None)
pd.read_csv(io.StringIO(data), header=None, prefix="X")
.. code-block:: ipython
In [6]: import io
In [7]: data = """
...: a,b,c
...: 1,Yes,2
...: 3,No,4
...: """
...:
In [8]: print(data)
a,b,c
1,Yes,2
3,No,4
In [9]: pd.read_csv(io.StringIO(data), header=None)
Out[9]:
0 1 2
0 a b c
1 1 Yes 2
2 3 No 4
In [10]: pd.read_csv(io.StringIO(data), header=None, prefix="X")
Out[10]:
X0 X1 X2
0 a b c
1 1 Yes 2
2 3 No 4
- Values like ``'Yes'`` and ``'No'`` are not interpreted as boolean by default,
though this can be controlled by new ``true_values`` and ``false_values``
Expand Down
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ Bug fixes
~~~~~~~~~
- Fixed bug for :class:`ArrowDtype` raising ``NotImplementedError`` for fixed-size list (:issue:`55000`)
- Fixed bug in :meth:`DataFrame.stack` with ``future_stack=True`` and columns a non-:class:`MultiIndex` consisting of tuples (:issue:`54948`)
- Fixed bug in :meth:`Series.dt.tz` with :class:`ArrowDtype` where a string was returned instead of a ``tzinfo`` object (:issue:`55003`)
- Fixed bug in :meth:`Series.pct_change` and :meth:`DataFrame.pct_change` showing unnecessary ``FutureWarning`` (:issue:`54981`)

.. ---------------------------------------------------------------------------
Expand Down
32 changes: 27 additions & 5 deletions doc/source/whatsnew/v2.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,10 +14,27 @@ including other versions of pandas.
Enhancements
~~~~~~~~~~~~

.. _whatsnew_220.enhancements.enhancement1:
.. _whatsnew_220.enhancements.calamine:

enhancement1
^^^^^^^^^^^^
Calamine engine for :func:`read_excel`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``calamine`` engine was added to :func:`read_excel`.
It uses ``python-calamine``, which provides Python bindings for the Rust library `calamine <https://crates.io/crates/calamine>`__.
This engine supports Excel files (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``) and OpenDocument spreadsheets (``.ods``) (:issue:`50395`).

There are two advantages of this engine:

1. Calamine is often faster than other engines, some benchmarks show results up to 5x faster than 'openpyxl', 20x - 'odf', 4x - 'pyxlsb', and 1.5x - 'xlrd'.
But, 'openpyxl' and 'pyxlsb' are faster in reading a few rows from large files because of lazy iteration over rows.
2. Calamine supports the recognition of datetime in ``.xlsb`` files, unlike 'pyxlsb' which is the only other engine in pandas that can read ``.xlsb`` files.

.. code-block:: python
pd.read_excel("path_to_file.xlsb", engine="calamine")
For more, see :ref:`io.calamine` in the user guide on IO tools.

.. _whatsnew_220.enhancements.enhancement2:

Expand All @@ -28,7 +45,7 @@ enhancement2

Other enhancements
^^^^^^^^^^^^^^^^^^
-
- DataFrame.apply now allows the usage of numba (via ``engine="numba"``) to JIT compile the passed function, allowing for potential speedups (:issue:`54666`)
-

.. ---------------------------------------------------------------------------
Expand Down Expand Up @@ -159,9 +176,12 @@ Deprecations

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement in :func:`concat` with ``axis=1`` and objects with unaligned indexes (:issue:`55084`)
- Performance improvement in :func:`to_dict` on converting DataFrame to dictionary (:issue:`50990`)
- Performance improvement in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` when indexed by a :class:`MultiIndex` (:issue:`54835`)
- Performance improvement in :meth:`Index.difference` (:issue:`55108`)
- Performance improvement when indexing with more than 4 keys (:issue:`54550`)
-

.. ---------------------------------------------------------------------------
.. _whatsnew_220.bug_fixes:
Expand All @@ -170,6 +190,7 @@ Bug fixes
~~~~~~~~~
- Bug in :class:`AbstractHolidayCalendar` where timezone data was not propagated when computing holiday observances (:issue:`54580`)
- Bug in :class:`pandas.core.window.Rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with ``closed='left'`` and ``closed='neither'`` (:issue:`20712`)
- Bug in :meth:`DataFrame.apply` where passing ``raw=True`` ignored ``args`` passed to the applied function (:issue:`55009`)

Categorical
^^^^^^^^^^^
Expand Down Expand Up @@ -230,6 +251,7 @@ I/O
^^^
- Bug in :func:`read_csv` where ``on_bad_lines="warn"`` would write to ``stderr`` instead of raise a Python warning. This now yields a :class:`.errors.ParserWarning` (:issue:`54296`)
- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when file contains NaNs/Infs (:issue:`54564`)
- Bug in :func:`to_excel`, with ``OdsWriter`` (``ods`` files) writing boolean/string value (:issue:`54994`)

Period
^^^^^^
Expand All @@ -248,8 +270,8 @@ Groupby/resample/rolling

Reshaping
^^^^^^^^^
- Bug in :func:`concat` ignoring ``sort`` parameter when passed :class:`DatetimeIndex` indexes (:issue:`54769`)
- Bug in :func:`merge` returning columns in incorrect order when left and/or right is empty (:issue:`51929`)
-

Sparse
^^^^^^
Expand Down
3 changes: 2 additions & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ dependencies:
- pymysql>=1.0.2
- pyreadstat>=1.1.5
- pytables>=3.7.0
- python-calamine>=0.1.6
- pyxlsb>=1.0.9
- s3fs>=2022.05.0
- scipy>=1.8.1
Expand Down Expand Up @@ -105,7 +106,7 @@ dependencies:
- ipykernel

# web
- jinja2 # in optional dependencies, but documented here as needed
# - jinja2 # already listed in optional dependencies, but documented here for reference
- markdown
- feedparser
- pyyaml
Expand Down
Loading

0 comments on commit 9992b37

Please sign in to comment.