Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Whatsnew 1.2.0 cleanup #38002

Merged
merged 8 commits into from
Nov 25, 2020
Merged

Conversation

rhshadrach
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Cleanup of spelling, grammar, and links.


cc @jorisvandenbossche, @dsaxton, @phofl:

Using :meth:`.DataFrameGroupBy.quantile` (note the .) displays the text DataFrameGroupBy.quantile and sphinx finds the link to the proper page. Similar remarks apply to Rolling and others. From the sphinx docs:

Also, if the name is prefixed with a dot, and no exact match is found, the target is taken as a suffix and all object names with that suffix are searched.

I don't believe this was known about (at least to me!) and is different from what was aligned on in #37145, so I wanted to get thoughts here on which is preferred. While I've added many cross-links using this, DataFrameGroupBy.quantile is the sole example where I have replaced a preexisting link.


Other notable changes that are worth a mention to make sure they are correct:

  • Numpy is always referred to as NumPy in plain-text (no backticks)
  • When a method exists on only Series and DataFrame, replaced unlinked reference with a reference to each. For example, :meth:`read_csv` becomes :meth:`DataFrame.to_csv` and :meth:`Series.to_csv`
  • dtypes and argument values are always double-backticked
  • Moved the following note from Other to Missing

Bug in :meth:Series.nunique with dropna=True was returning incorrect results when both NA and None missing values were present (:issue:37566)

@rhshadrach rhshadrach added this to the 1.2 milestone Nov 22, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot , this is a very useful clean-up

Added a few comments, but some of them are subjective, so to be discussed

- Added :meth:`~DataFrame.set_flags` for setting table-wide flags on a ``Series`` or ``DataFrame`` (:issue:`28394`)
- Added ``day_of_week`` (compatibility alias ``dayofweek``) property to :class:`Timestamp`, :class:`DatetimeIndex`, :class:`Period`, :class:`PeriodIndex` (:issue:`9605`)
- Added ``day_of_year`` (compatibility alias ``dayofyear``) property to :class:`Timestamp`, :class:`DatetimeIndex`, :class:`Period`, :class:`PeriodIndex` (:issue:`9605`)
- Added :meth:`~DataFrame.set_flags` for setting table-wide flags on a :class:`Series` or :class:`DataFrame` (:issue:`28394`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't find it needed to make each occurence of Series and DataFrame into a link: 1) those pages are not that useful and 2) those would get linked many times.
In this specific sentence, the relevant link is the one to set_flags. By having this the only link in the sentence, it stands out more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. To restate as a general rule: unless Series/DataFrame are being directly changed, do not link.

- Performance improvements when creating DataFrame or Series with dtype ``str`` or :class:`StringDtype` from array with many string elements (:issue:`36304`, :issue:`36317`, :issue:`36325`, :issue:`36432`, :issue:`37371`)
- Performance improvement in :meth:`GroupBy.agg` with the ``numba`` engine (:issue:`35759`)
- Performance improvements when creating :meth:`pd.Series.map` from a huge dictionary (:issue:`34717`)
- Performance improvements when creating :class:`DataFrame` or :class:`Series` with dtype ``str`` or :class:`StringDtype` from array with many string elements (:issue:`36304`, :issue:`36317`, :issue:`36325`, :issue:`36432`, :issue:`37371`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for example in this sentence, I would leave DataFrame/Series as they were (IMO it makes the sentence more difficult to read if there is a lot of different formatting).

- Fixed Bug where :class:`DataFrame` column set to scalar extension type via a dict instantiation was considered an object type rather than the extension type (:issue:`35965`)
- Fixed bug where ``astype()`` with equal dtype and ``copy=False`` would return a new object (:issue:`28488`)
- Fixed bug when applying a NumPy ufunc with multiple outputs to an :class:`.IntegerArray` returning None (:issue:`36913`)
- Fixed an inconsistency in :meth:`PeriodArray.__init__`` signature to those of :class:`.DatetimeArray` and :class:`.TimedeltaArray` (:issue:`37289`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't create docstring pages for the __init__ (we generally document this in the class docstring), so how it was before (but correcting the link) is probably better for this case then

- Bug in ``accessor.DirNamesMixin``, where ``dir(obj)`` wouldn't show attributes defined on the instance (:issue:`37173`).
- Bug in :meth:`Series.nunique` with ``dropna=True`` was returning incorrect results when both ``NA`` and ``None`` missing values were present (:issue:`37566`)
- Fixed metadata propagation in the :class:`Series.dt`, :class:`Series.str` accessors, :class:`DataFrame.duplicated`, :class:`DataFrame.stack`, :class:`DataFrame.unstack`, :class:`DataFrame.pivot`, :class:`DataFrame.append`, :class:`DataFrame.diff`, :class:`DataFrame.applymap` and :class:`DataFrame.update` methods (:issue:`28283`, :issue:`37381`)
- Fixed metadata propagation when selecting columns with :meth:`DataFrame.__getitem__` (:issue:`28283`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, this __getitem__ cannot be linked to ..

- Fixed metadata propagation when selecting columns with :meth:`DataFrame.__getitem__` (:issue:`28283`)
- Bug in :meth:`Index.union` behaving differently depending on whether operand is an :class:`Index` or other list-like (:issue:`36384`)
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError`` rather than a bare ``Exception`` (:issue:`35744`)
- Bug in ``accessor.DirNamesMixin``, where ``dir(obj)`` wouldn't show attributes defined on the instance (:issue:`37173`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidenote (not related to your changes), but we should not mention "accessor.DirNamesMixin", as this is something private / no user has ever heard about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - I changed some other notes that referred to private functions but missed this one. Will change.

@jorisvandenbossche
Copy link
Member

Using :meth:`.DataFrameGroupBy.quantile` (note the .) displays the text DataFrameGroupBy.quantile and sphinx finds the link to the proper page ..
I don't believe this was known about (at least to me!) and is different from what was aligned on in #37145, so I wanted to get thoughts here on which is preferred

Ah, that looks nice. Do you know, does sphinx search in the possible link targets, or does it search in the pandas namespace? (just wondering, because it could also find pandas.core.groupby.generic.DataFrameGroupBy which is the actual location of the object definition, but it is documented as pandas.core.groupby.DataFrameGroupBy)

@rhshadrach
Copy link
Member Author

Ah, that looks nice. Do you know, does sphinx search in the possible link targets, or does it search in the pandas namespace?

If I'm reading the docs correctly, it's the same way references are usually resolved but in reverse order. I would have to assume that is searching possible link targets, but am not certain on this:

Normally, names in these roles are searched first without any further qualification, then with the current module name prepended, then with the current module and class name (if any) prepended. If you prefix the name with a dot, this order is reversed. For example, in the documentation of Python’s codecs module, :py:func:open always refers to the built-in function, while :py:func:.open refers to codecs.open().

Also relevant when using the "." prefix:

Since this can get ambiguous, if there is more than one possible match, you will get a warning from Sphinx.

but I'm guessing warnings are ignored in the CI docs build (since I see warnings on my machine and build passes).

@jorisvandenbossche
Copy link
Member

Since this can get ambiguous, if there is more than one possible match, you will get a warning from Sphinx.

Normally we build the docs with the option to turn warnings into errors on CI (now, there are a bunch of warnings, but those are from the ipython directive, and not done through sphinx' warning machinery, so not causing errors). In any case, on CI here there are indeed no warnings.

@jorisvandenbossche
Copy link
Member

So given that it should give warnings in case of a possible duplicate, I am fine with using this . syntax (at least for the classes like groupby related which are quite nested)

- :meth:`DataFrame.to_parquet` now returns a ``bytes`` object when no ``path`` argument is passed (:issue:`37105`)
- :class:`Rolling` now supports the ``closed`` argument for fixed windows (:issue:`34315`)
- :class:`DatetimeIndex` and :class:`Series` with ``datetime64`` or ``datetime64tz`` dtypes now support ``std`` (:issue:`37436`)
- :class:`Window` now supports all Scipy window types in ``win_type`` with flexible keyword argument support (:issue:`34556`)
- :meth:`testing.assert_index_equal` now has a ``check_order`` parameter that allows indexes to be checked in an order-insensitive manner (:issue:`37478`)
- :func:`read_csv` supports memory-mapping for compressed files (:issue:`37621`)
- Improve error reporting for :meth:`DataFrame.merge()` when invalid merge column definitions were given (:issue:`16228`)
- Improve numerical stability for :meth:`Rolling.skew()`, :meth:`Rolling.kurt()`, :meth:`Expanding.skew()` and :meth:`Expanding.kurt()` through implementation of Kahan summation (:issue:`6929`)
- Improve numerical stability for :meth:`.Rolling.skew()`, :meth:`.Rolling.kurt()`, :meth:`Expanding.skew()` and :meth:`Expanding.kurt()` through implementation of Kahan summation (:issue:`6929`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Improve numerical stability for :meth:`.Rolling.skew()`, :meth:`.Rolling.kurt()`, :meth:`Expanding.skew()` and :meth:`Expanding.kurt()` through implementation of Kahan summation (:issue:`6929`)
- Improve numerical stability for :meth:`.Rolling.skew`, :meth:`.Rolling.kurt`, :meth:`Expanding.skew` and :meth:`Expanding.kurt` through implementation of Kahan summation (:issue:`6929`)

(I would expect it not to work with the parentheses (not sure though), but they are certainly not needed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works with them, but to your point their existence in the resulting html is solely determined by the corresponding sphinx setting

- Bug in :meth:`DataFrame.groupby.rolling` returning wrong values with partial centered window (:issue:`36040`).
- Bug in :meth:`.DataFrameGroupBy.ffill` and :meth:`.DataFrameGroupBy.bfill` where a ``NaN`` group would return filled values instead of ``NaN`` when ``dropna=True`` (:issue:`34725`)
- Bug in :meth:`.RollingGroupby.count` where a ``ValueError`` was raised when specifying the ``closed`` parameter (:issue:`35869`)
- Bug in :meth:`DataFrameGroupBy.rolling` returning wrong values with partial centered window (:issue:`36040`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Bug in :meth:`DataFrameGroupBy.rolling` returning wrong values with partial centered window (:issue:`36040`)
- Bug in :meth:`.DataFrameGroupBy.rolling` returning wrong values with partial centered window (:issue:`36040`)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, makes sense to be consistent here. But just want to note that since the page doesn't exist, the output is the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But just want to note that since the page doesn't exist, the output is the same.

Ah, yes, but so then generally speaking, we shouldn't be using this with a :meth:`..` link IMO. But for this case, we should probably add the rolling method on groupby objects to our API docs.

- Using :meth:`Rolling.var()` instead of :meth:`Rolling.std()` avoids numerical issues for :meth:`Rolling.corr()` when :meth:`Rolling.var()` is still within floating point precision while :meth:`Rolling.std()` is not (:issue:`31286`)
- Bug in :meth:`df.groupby(..).quantile() <pandas.core.groupby.DataFrameGroupBy.quantile>` and :meth:`df.resample(..).quantile() <pandas.core.resample.Resampler.quantile>` raised ``TypeError`` when values were of type ``Timedelta`` (:issue:`29485`)
- Bug in :meth:`Rolling.median` and :meth:`Rolling.quantile` returned wrong values for :class:`BaseIndexer` subclasses with non-monotonic starting or ending points for windows (:issue:`37153`)
- Using :meth:`.Rolling.var()` instead of :meth:`.Rolling.std()` avoids numerical issues for :meth:`.Rolling.corr()` when :meth:`.Rolling.var()` is still within floating point precision while :meth:`.Rolling.std()` is not (:issue:`31286`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here some parentheses

@rhshadrach
Copy link
Member Author

@jorisvandenbossche I'm having 2nd thoughts on my last two commits. Adding the . unnecessarily and removing any non-necessary () is making the diff here significantly larger than it needs to be with no change in the doc output. Is the consistency in the reST worth it?

@simonjayhawkins simonjayhawkins mentioned this pull request Nov 24, 2020
@jreback
Copy link
Contributor

jreback commented Nov 25, 2020

going to merge this on green.

@jreback jreback merged commit 3967d83 into pandas-dev:master Nov 25, 2020
@jreback
Copy link
Contributor

jreback commented Nov 25, 2020

thanks @rhshadrach
not sure what's up with the windows failure.

@rhshadrach rhshadrach deleted the whatsnew_1.2 branch December 6, 2020 14:04
@rhshadrach rhshadrach mentioned this pull request Jun 3, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants