
Merge branch 'main' into numpy-docstring-timestamp
tuhinsharma121 authored May 4, 2024
2 parents a542817 + 02708ed commit c5c24bd
Showing 30 changed files with 294 additions and 265 deletions.
8 changes: 0 additions & 8 deletions ci/code_checks.sh
@@ -196,15 +196,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.Series.dt.tz_convert PR01,PR02" \
-i "pandas.Series.dt.tz_localize PR01,PR02" \
-i "pandas.Series.dt.unit GL08" \
-i "pandas.Series.dtype SA01" \
-i "pandas.Series.eq PR07,SA01" \
-i "pandas.Series.floordiv PR07" \
-i "pandas.Series.ge PR07,SA01" \
-i "pandas.Series.gt PR07,SA01" \
-i "pandas.Series.hasnans SA01" \
-i "pandas.Series.is_monotonic_decreasing SA01" \
-i "pandas.Series.is_monotonic_increasing SA01" \
-i "pandas.Series.is_unique SA01" \
-i "pandas.Series.kurt RT03,SA01" \
-i "pandas.Series.kurtosis RT03,SA01" \
-i "pandas.Series.le PR07,SA01" \
@@ -236,7 +231,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.Series.rsub PR07" \
-i "pandas.Series.rtruediv PR07" \
-i "pandas.Series.sem PR01,RT03,SA01" \
-i "pandas.Series.shape SA01" \
-i "pandas.Series.skew RT03,SA01" \
-i "pandas.Series.sparse PR01,SA01" \
-i "pandas.Series.sparse.density SA01" \
@@ -317,8 +311,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.Timestamp.ctime SA01" \
-i "pandas.Timestamp.date SA01" \
-i "pandas.Timestamp.day GL08" \
-i "pandas.Timestamp.day_of_week SA01" \
-i "pandas.Timestamp.dayofweek SA01" \
-i "pandas.Timestamp.dst SA01" \
-i "pandas.Timestamp.floor SA01" \
-i "pandas.Timestamp.fold GL08" \
7 changes: 3 additions & 4 deletions doc/source/user_guide/basics.rst
@@ -160,11 +160,10 @@ Here is a sample (using 100 column x 100,000 row ``DataFrames``):
.. csv-table::
:header: "Operation", "0.11.0 (ms)", "Prior Version (ms)", "Ratio to Prior"
:widths: 25, 25, 25, 25
- :delim: ;

``df1 > df2``; 13.32; 125.35; 0.1063
``df1 * df2``; 21.71; 36.63; 0.5928
``df1 + df2``; 22.04; 36.50; 0.6039
``df1 > df2``, 13.32, 125.35, 0.1063
``df1 * df2``, 21.71, 36.63, 0.5928
``df1 + df2``, 22.04, 36.50, 0.6039

You are highly encouraged to install both libraries. See the section
:ref:`Recommended Dependencies <install.recommended_dependencies>` for more installation info.
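The table above is easier to read alongside the operations it measures. Below is a minimal sketch of those same elementwise expressions, using random data of the benchmarked shape (hypothetical variable names); when ``numexpr`` is installed, pandas routes large elementwise operations like these through it automatically, with no API change:

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Two frames of the benchmarked shape: 100,000 rows x 100 columns.
   df1 = pd.DataFrame(np.random.randn(100_000, 100))
   df2 = pd.DataFrame(np.random.randn(100_000, 100))

   mask = df1 > df2   # boolean DataFrame
   prod = df1 * df2   # elementwise product
   total = df1 + df2  # elementwise sum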
15 changes: 2 additions & 13 deletions doc/source/user_guide/gotchas.rst
@@ -315,19 +315,8 @@ Why not make NumPy like R?

Many people have suggested that NumPy should simply emulate the ``NA`` support
present in the more domain-specific statistical programming language `R
- <https://www.r-project.org/>`__. Part of the reason is the NumPy type hierarchy:
-
- .. csv-table::
-    :header: "Typeclass","Dtypes"
-    :widths: 30,70
-    :delim: |
-
-    ``numpy.floating`` | ``float16, float32, float64, float128``
-    ``numpy.integer`` | ``int8, int16, int32, int64``
-    ``numpy.unsignedinteger`` | ``uint8, uint16, uint32, uint64``
-    ``numpy.object_`` | ``object_``
-    ``numpy.bool_`` | ``bool_``
-    ``numpy.character`` | ``bytes_, str_``
+ <https://www.r-project.org/>`__. Part of the reason is the
+ `NumPy type hierarchy <https://numpy.org/doc/stable/user/basics.types.html>`__.
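A small sketch of what that hierarchy means in practice, and why it complicates ``NA`` support (illustrative only):

.. code-block:: python

   import numpy as np

   # Concrete dtypes roll up into the abstract typeclasses described
   # on the linked page.
   assert np.issubdtype(np.float64, np.floating)
   assert np.issubdtype(np.int32, np.integer)

   # NaN is a floating-point concept, so an integer array cannot hold
   # a missing value without silently upcasting to float64.
   arr = np.array([1, 2, 3])
   print(np.append(arr, np.nan).dtype)  # float64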

The R language, by contrast, only has a handful of built-in data types:
``integer``, ``numeric`` (floating-point), ``character``, and
77 changes: 37 additions & 40 deletions doc/source/user_guide/groupby.rst
@@ -506,29 +506,28 @@ listed below, those with a ``*`` do *not* have an efficient, GroupBy-specific, implementation.
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
- :delim: ;

- :meth:`~.DataFrameGroupBy.any`;Compute whether any of the values in the groups are truthy
- :meth:`~.DataFrameGroupBy.all`;Compute whether all of the values in the groups are truthy
- :meth:`~.DataFrameGroupBy.count`;Compute the number of non-NA values in the groups
- :meth:`~.DataFrameGroupBy.cov` * ;Compute the covariance of the groups
- :meth:`~.DataFrameGroupBy.first`;Compute the first occurring value in each group
- :meth:`~.DataFrameGroupBy.idxmax`;Compute the index of the maximum value in each group
- :meth:`~.DataFrameGroupBy.idxmin`;Compute the index of the minimum value in each group
- :meth:`~.DataFrameGroupBy.last`;Compute the last occurring value in each group
- :meth:`~.DataFrameGroupBy.max`;Compute the maximum value in each group
- :meth:`~.DataFrameGroupBy.mean`;Compute the mean of each group
- :meth:`~.DataFrameGroupBy.median`;Compute the median of each group
- :meth:`~.DataFrameGroupBy.min`;Compute the minimum value in each group
- :meth:`~.DataFrameGroupBy.nunique`;Compute the number of unique values in each group
- :meth:`~.DataFrameGroupBy.prod`;Compute the product of the values in each group
- :meth:`~.DataFrameGroupBy.quantile`;Compute a given quantile of the values in each group
- :meth:`~.DataFrameGroupBy.sem`;Compute the standard error of the mean of the values in each group
- :meth:`~.DataFrameGroupBy.size`;Compute the number of values in each group
- :meth:`~.DataFrameGroupBy.skew` *;Compute the skew of the values in each group
- :meth:`~.DataFrameGroupBy.std`;Compute the standard deviation of the values in each group
- :meth:`~.DataFrameGroupBy.sum`;Compute the sum of the values in each group
- :meth:`~.DataFrameGroupBy.var`;Compute the variance of the values in each group

+ :meth:`~.DataFrameGroupBy.any`,Compute whether any of the values in the groups are truthy
+ :meth:`~.DataFrameGroupBy.all`,Compute whether all of the values in the groups are truthy
+ :meth:`~.DataFrameGroupBy.count`,Compute the number of non-NA values in the groups
+ :meth:`~.DataFrameGroupBy.cov` * ,Compute the covariance of the groups
+ :meth:`~.DataFrameGroupBy.first`,Compute the first occurring value in each group
+ :meth:`~.DataFrameGroupBy.idxmax`,Compute the index of the maximum value in each group
+ :meth:`~.DataFrameGroupBy.idxmin`,Compute the index of the minimum value in each group
+ :meth:`~.DataFrameGroupBy.last`,Compute the last occurring value in each group
+ :meth:`~.DataFrameGroupBy.max`,Compute the maximum value in each group
+ :meth:`~.DataFrameGroupBy.mean`,Compute the mean of each group
+ :meth:`~.DataFrameGroupBy.median`,Compute the median of each group
+ :meth:`~.DataFrameGroupBy.min`,Compute the minimum value in each group
+ :meth:`~.DataFrameGroupBy.nunique`,Compute the number of unique values in each group
+ :meth:`~.DataFrameGroupBy.prod`,Compute the product of the values in each group
+ :meth:`~.DataFrameGroupBy.quantile`,Compute a given quantile of the values in each group
+ :meth:`~.DataFrameGroupBy.sem`,Compute the standard error of the mean of the values in each group
+ :meth:`~.DataFrameGroupBy.size`,Compute the number of values in each group
+ :meth:`~.DataFrameGroupBy.skew` * ,Compute the skew of the values in each group
+ :meth:`~.DataFrameGroupBy.std`,Compute the standard deviation of the values in each group
+ :meth:`~.DataFrameGroupBy.sum`,Compute the sum of the values in each group
+ :meth:`~.DataFrameGroupBy.var`,Compute the variance of the values in each group
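
For orientation, a minimal sketch of a few of the reductions in the table above (hypothetical data, not part of the change):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
   gb = df.groupby("key")

   gb["val"].sum()      # a -> 3, b -> 3
   gb["val"].max()      # a -> 2, b -> 3
   gb["val"].nunique()  # distinct values per group: a -> 2, b -> 1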

Some examples:

@@ -832,19 +831,18 @@ The following methods on GroupBy act as transformations.
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
- :delim: ;

- :meth:`~.DataFrameGroupBy.bfill`;Back fill NA values within each group
- :meth:`~.DataFrameGroupBy.cumcount`;Compute the cumulative count within each group
- :meth:`~.DataFrameGroupBy.cummax`;Compute the cumulative max within each group
- :meth:`~.DataFrameGroupBy.cummin`;Compute the cumulative min within each group
- :meth:`~.DataFrameGroupBy.cumprod`;Compute the cumulative product within each group
- :meth:`~.DataFrameGroupBy.cumsum`;Compute the cumulative sum within each group
- :meth:`~.DataFrameGroupBy.diff`;Compute the difference between adjacent values within each group
- :meth:`~.DataFrameGroupBy.ffill`;Forward fill NA values within each group
- :meth:`~.DataFrameGroupBy.pct_change`;Compute the percent change between adjacent values within each group
- :meth:`~.DataFrameGroupBy.rank`;Compute the rank of each value within each group
- :meth:`~.DataFrameGroupBy.shift`;Shift values up or down within each group

+ :meth:`~.DataFrameGroupBy.bfill`,Back fill NA values within each group
+ :meth:`~.DataFrameGroupBy.cumcount`,Compute the cumulative count within each group
+ :meth:`~.DataFrameGroupBy.cummax`,Compute the cumulative max within each group
+ :meth:`~.DataFrameGroupBy.cummin`,Compute the cumulative min within each group
+ :meth:`~.DataFrameGroupBy.cumprod`,Compute the cumulative product within each group
+ :meth:`~.DataFrameGroupBy.cumsum`,Compute the cumulative sum within each group
+ :meth:`~.DataFrameGroupBy.diff`,Compute the difference between adjacent values within each group
+ :meth:`~.DataFrameGroupBy.ffill`,Forward fill NA values within each group
+ :meth:`~.DataFrameGroupBy.pct_change`,Compute the percent change between adjacent values within each group
+ :meth:`~.DataFrameGroupBy.rank`,Compute the rank of each value within each group
+ :meth:`~.DataFrameGroupBy.shift`,Shift values up or down within each group
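
A minimal sketch of two of these transformations; unlike the reductions above, the result keeps the shape and index of the input (hypothetical data):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

   df.groupby("key")["val"].cumsum()  # running total per group: 1, 3, 3
   df.groupby("key")["val"].rank()    # rank within group: 1.0, 2.0, 1.0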

In addition, passing any built-in aggregation method as a string to
:meth:`~.DataFrameGroupBy.transform` (see the next section) will broadcast the result
@@ -1092,11 +1090,10 @@ efficient, GroupBy-specific, implementation.
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
- :delim: ;

- :meth:`~.DataFrameGroupBy.head`;Select the top row(s) of each group
- :meth:`~.DataFrameGroupBy.nth`;Select the nth row(s) of each group
- :meth:`~.DataFrameGroupBy.tail`;Select the bottom row(s) of each group
+ :meth:`~.DataFrameGroupBy.head`,Select the top row(s) of each group
+ :meth:`~.DataFrameGroupBy.nth`,Select the nth row(s) of each group
+ :meth:`~.DataFrameGroupBy.tail`,Select the bottom row(s) of each group
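
A quick sketch of these selection methods (hypothetical data); they return rows with the original index rather than one aggregated row per group:

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

   df.groupby("key").head(1)  # first row of each group
   df.groupby("key").tail(1)  # last row of each group
   df.groupby("key").nth(1)   # second row of each group, where one exists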

Users can also use transformations along with Boolean indexing to construct complex
filtrations within groups. For example, suppose we are given groups of products and
18 changes: 9 additions & 9 deletions doc/source/user_guide/indexing.rst
@@ -94,13 +94,14 @@ well). Any of the axes accessors may be the null slice ``:``. Axes left out of
the specification are assumed to be ``:``, e.g. ``p.loc['a']`` is equivalent to
``p.loc['a', :]``.

- .. csv-table::
-    :header: "Object Type", "Indexers"
-    :widths: 30, 50
-    :delim: ;
-
-    Series; ``s.loc[indexer]``
-    DataFrame; ``df.loc[row_indexer,column_indexer]``
+ .. ipython:: python
+
+    ser = pd.Series(range(5), index=list("abcde"))
+    ser.loc[["a", "c", "e"]]
+    df = pd.DataFrame(np.arange(25).reshape(5, 5), index=list("abcde"), columns=list("abcde"))
+    df.loc[["a", "c", "e"], ["b", "d"]]
.. _indexing.basics:

@@ -116,10 +117,9 @@ indexing pandas objects with ``[]``:
.. csv-table::
:header: "Object Type", "Selection", "Return Value Type"
:widths: 30, 30, 60
- :delim: ;

- Series; ``series[label]``; scalar value
- DataFrame; ``frame[colname]``; ``Series`` corresponding to colname
+ Series, ``series[label]``, scalar value
+ DataFrame, ``frame[colname]``, ``Series`` corresponding to colname
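
A minimal sketch of the two cases in the table (hypothetical data):

.. code-block:: python

   import pandas as pd

   ser = pd.Series([10, 20, 30], index=["a", "b", "c"])
   ser["b"]  # scalar value: 20

   df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
   df["x"]   # the column "x", returned as a Series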

Here we construct a simple time series data set to use for illustrating the
indexing functionality:
67 changes: 32 additions & 35 deletions doc/source/user_guide/io.rst
@@ -16,26 +16,25 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
.. csv-table::
:header: "Format Type", "Data Description", "Reader", "Writer"
:widths: 30, 100, 60, 60
- :delim: ;

- text;`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__;:ref:`read_csv<io.read_csv_table>`;:ref:`to_csv<io.store_in_csv>`
- text;Fixed-Width Text File;:ref:`read_fwf<io.fwf_reader>`
- text;`JSON <https://www.json.org/>`__;:ref:`read_json<io.json_reader>`;:ref:`to_json<io.json_writer>`
- text;`HTML <https://en.wikipedia.org/wiki/HTML>`__;:ref:`read_html<io.read_html>`;:ref:`to_html<io.html>`
- text;`LaTeX <https://en.wikipedia.org/wiki/LaTeX>`__;;:ref:`Styler.to_latex<io.latex>`
- text;`XML <https://www.w3.org/standards/xml/core>`__;:ref:`read_xml<io.read_xml>`;:ref:`to_xml<io.xml>`
- text; Local clipboard;:ref:`read_clipboard<io.clipboard>`;:ref:`to_clipboard<io.clipboard>`
- binary;`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__;:ref:`read_excel<io.excel_reader>`;:ref:`to_excel<io.excel_writer>`
- binary;`OpenDocument <http://opendocumentformat.org>`__;:ref:`read_excel<io.ods>`;
- binary;`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__;:ref:`read_hdf<io.hdf5>`;:ref:`to_hdf<io.hdf5>`
- binary;`Feather Format <https://github.com/wesm/feather>`__;:ref:`read_feather<io.feather>`;:ref:`to_feather<io.feather>`
- binary;`Parquet Format <https://parquet.apache.org/>`__;:ref:`read_parquet<io.parquet>`;:ref:`to_parquet<io.parquet>`
- binary;`ORC Format <https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;:ref:`to_orc<io.orc>`
- binary;`Stata <https://en.wikipedia.org/wiki/Stata>`__;:ref:`read_stata<io.stata_reader>`;:ref:`to_stata<io.stata_writer>`
- binary;`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__;:ref:`read_sas<io.sas_reader>`;
- binary;`SPSS <https://en.wikipedia.org/wiki/SPSS>`__;:ref:`read_spss<io.spss_reader>`;
- binary;`Python Pickle Format <https://docs.python.org/3/library/pickle.html>`__;:ref:`read_pickle<io.pickle>`;:ref:`to_pickle<io.pickle>`
- SQL;`SQL <https://en.wikipedia.org/wiki/SQL>`__;:ref:`read_sql<io.sql>`;:ref:`to_sql<io.sql>`

+ text,`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__, :ref:`read_csv<io.read_csv_table>`, :ref:`to_csv<io.store_in_csv>`
+ text,Fixed-Width Text File, :ref:`read_fwf<io.fwf_reader>` , NA
+ text,`JSON <https://www.json.org/>`__, :ref:`read_json<io.json_reader>`, :ref:`to_json<io.json_writer>`
+ text,`HTML <https://en.wikipedia.org/wiki/HTML>`__, :ref:`read_html<io.read_html>`, :ref:`to_html<io.html>`
+ text,`LaTeX <https://en.wikipedia.org/wiki/LaTeX>`__, NA , :ref:`Styler.to_latex<io.latex>`
+ text,`XML <https://www.w3.org/standards/xml/core>`__, :ref:`read_xml<io.read_xml>`, :ref:`to_xml<io.xml>`
+ text, Local clipboard, :ref:`read_clipboard<io.clipboard>`, :ref:`to_clipboard<io.clipboard>`
+ binary,`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__ , :ref:`read_excel<io.excel_reader>`, :ref:`to_excel<io.excel_writer>`
+ binary,`OpenDocument <http://opendocumentformat.org>`__, :ref:`read_excel<io.ods>`, NA
+ binary,`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__, :ref:`read_hdf<io.hdf5>`, :ref:`to_hdf<io.hdf5>`
+ binary,`Feather Format <https://github.com/wesm/feather>`__, :ref:`read_feather<io.feather>`, :ref:`to_feather<io.feather>`
+ binary,`Parquet Format <https://parquet.apache.org/>`__, :ref:`read_parquet<io.parquet>`, :ref:`to_parquet<io.parquet>`
+ binary,`ORC Format <https://orc.apache.org/>`__, :ref:`read_orc<io.orc>`, :ref:`to_orc<io.orc>`
+ binary,`Stata <https://en.wikipedia.org/wiki/Stata>`__, :ref:`read_stata<io.stata_reader>`, :ref:`to_stata<io.stata_writer>`
+ binary,`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__, :ref:`read_sas<io.sas_reader>` , NA
+ binary,`SPSS <https://en.wikipedia.org/wiki/SPSS>`__, :ref:`read_spss<io.spss_reader>` , NA
+ binary,`Python Pickle Format <https://docs.python.org/3/library/pickle.html>`__, :ref:`read_pickle<io.pickle>`, :ref:`to_pickle<io.pickle>`
+ SQL,`SQL <https://en.wikipedia.org/wiki/SQL>`__, :ref:`read_sql<io.sql>`, :ref:`to_sql<io.sql>`

:ref:`Here <io.perf>` is an informal performance comparison for some of these IO methods.
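
As a sketch of the reader/writer pairing the table encodes, here is the CSV pair round-tripping a small frame (hypothetical file name and data):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

   df.to_csv("example.csv", index=False)   # writer: DataFrame method
   roundtrip = pd.read_csv("example.csv")  # reader: top-level function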

@@ -1837,14 +1836,13 @@ with optional parameters:

.. csv-table::
:widths: 20, 150
- :delim: ;

- ``split``; dict like {index -> [index], columns -> [columns], data -> [values]}
- ``records``; list like [{column -> value}, ... , {column -> value}]
- ``index``; dict like {index -> {column -> value}}
- ``columns``; dict like {column -> {index -> value}}
- ``values``; just the values array
- ``table``; adhering to the JSON `Table Schema`_
+ ``split``, dict like {index -> [index]; columns -> [columns]; data -> [values]}
+ ``records``, list like [{column -> value}; ... ]
+ ``index``, dict like {index -> {column -> value}}
+ ``columns``, dict like {column -> {index -> value}}
+ ``values``, just the values array
+ ``table``, adhering to the JSON `Table Schema`_

* ``date_format`` : string, type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601.
* ``double_precision`` : The number of decimal places to use when encoding floating point values, default 10.
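
A quick sketch of how the ``orient`` values above shape the emitted JSON (hypothetical frame; output shown in comments):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])

   df.to_json(orient="split")    # {"columns":["a"],"index":["x","y"],"data":[[1],[2]]}
   df.to_json(orient="records")  # [{"a":1},{"a":2}]
   df.to_json(orient="index")    # {"x":{"a":1},"y":{"a":2}}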
@@ -2025,14 +2023,13 @@ is ``None``. To explicitly force ``Series`` parsing, pass ``typ=series``

.. csv-table::
:widths: 20, 150
- :delim: ;

- ``split``; dict like {index -> [index], columns -> [columns], data -> [values]}
- ``records``; list like [{column -> value}, ... , {column -> value}]
- ``index``; dict like {index -> {column -> value}}
- ``columns``; dict like {column -> {index -> value}}
- ``values``; just the values array
- ``table``; adhering to the JSON `Table Schema`_

+ ``split``, dict like {index -> [index]; columns -> [columns]; data -> [values]}
+ ``records``, list like [{column -> value} ...]
+ ``index``, dict like {index -> {column -> value}}
+ ``columns``, dict like {column -> {index -> value}}
+ ``values``, just the values array
+ ``table``, adhering to the JSON `Table Schema`_


* ``dtype`` : if True, infer dtypes, if a dict of column to dtype, then use those, if ``False``, then don't infer dtypes at all, default is True, apply only to the data.
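
And the reverse direction, a sketch of ``read_json`` consuming the same ``orient`` plus the ``typ`` and ``dtype`` options described above (the JSON string is wrapped in ``StringIO`` so it is treated as a buffer rather than a file path):

.. code-block:: python

   from io import StringIO

   import pandas as pd

   df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
   payload = df.to_json(orient="split")

   pd.read_json(StringIO(payload), orient="split")               # round-trips the frame
   pd.read_json(StringIO(df["a"].to_json()), typ="series")       # force Series parsing
   pd.read_json(StringIO(payload), orient="split", dtype=False)  # skip dtype inference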
