Link API docs to user guide and other examples #5816

rabernat · 2021-09-24T15:34:31Z

Noting down a comment by @danjonesocean on Twitter: https://twitter.com/DanJonesOcean/status/1441392596362874882

In general, having more examples on each xarray page (like the one below) would be good. Then they would come up quickly in function searches:

http://xarray.pydata.org/en/stable/generated/xarray.Dataset.merge.html#xarray.Dataset.merge

Our API docs are generated from the function docstrings, and these are usually the first thing users hit when they search for functions. However, these docstrings uniformly lack examples, often leaving users stuck.

I see two ways to mitigate this:

Add examples directly to the docstrings (suggested by @jklymak)
Cross reference other examples from the user guide or other tutorials

jklymak · 2021-09-24T15:48:08Z

I think doing both is always appreciated: short "how to" with a common pattern or two and then a "see also".
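
As a rough illustration of that layout, here is a minimal sketch of a numpydoc-style docstring with a short "Examples" section followed by a "See Also" entry. The function `clip_values` is hypothetical and not part of xarray; it only stands in for an API function. The example is run through `doctest`, which is one way to keep docstring examples from going stale.

```python
import doctest


def clip_values(values, lower, upper):
    """Limit each value to the closed interval [lower, upper].

    Parameters
    ----------
    values : list of float
        Input values.
    lower, upper : float
        Bounds of the interval.

    Returns
    -------
    list of float
        The clipped values.

    See Also
    --------
    numpy.clip : Array-based equivalent.

    Examples
    --------
    Clip a short list to the interval [0, 1]:

    >>> clip_values([-0.5, 0.25, 2.0], lower=0.0, upper=1.0)
    [0.0, 0.25, 1.0]
    """
    return [min(max(value, lower), upper) for value in values]


# Parse the docstring and run its example as a test.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    clip_values.__doc__, {"clip_values": clip_values}, "clip_values", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

The example stays short (one common pattern), and the "See Also" entry points readers to related material.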

raybellwaves · 2021-09-24T18:46:45Z

See https://github.com/pydata/xarray/blob/main/xarray/core/dataset.py#L2022

for how to link the docstring to other parts of the docs.

keewis · 2021-10-10T16:38:18Z

I agree, this would be really helpful. There's lots to do, though: a small script reports about 170 functions / methods without example sections (and 6 numpy wrappers), and all others could use reviews:

analysis script

import itertools

import numpy as np
import xarray as xr


def public_api(obj, base_name):
    api = dir(obj)
    public_api = tuple(name for name in api if not name.startswith("_"))
    public_api_objects = {name: getattr(obj, name) for name in public_api}

    # uppercase indicates classes
    public_functions = {
        f"{base_name}.{name}": obj
        for name, obj in public_api_objects.items()
        if not name[0].isupper() and callable(obj)
    }

    return public_functions


def is_dict_method(fqn):
    *_, name = fqn.split(".")

    return hasattr(dict, name)


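def has_section(docstring, name):
    # a numpydoc section is a title line followed by a line of dashes
    # of the same length
    lines = docstring.split("\n")
    marker = "-" * len(name)
    # `following` instead of `next` to avoid shadowing the builtin
    for current, following in itertools.zip_longest(lines, lines[1:]):
        if following is None or following.strip() != marker:
            continue

        if current.strip() == name:
            return True

    return False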


def is_numpy_wrapper(fqn, docstring):
    *_, name = fqn.split(".")

    segment = f"Refer to `numpy.{name}` for full documentation."
    shortened_segment = f"Refer to `numpy.{name[:4]}` for full documentation."
    if segment in docstring or shortened_segment in docstring:
        return True

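    numpy_func = getattr(np, name, None)
    if numpy_func is None:
        return False

    # guard against numpy objects without a docstring:
    # `None in docstring` would raise a TypeError
    numpy_docstring = numpy_func.__doc__
    return numpy_docstring is not None and numpy_docstring in docstring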


def format_names(names):
    return "\n".join(f"  - {name}" for name in names)


namespaces = {
    "xarray": xr,
    "xarray.DataArray": xr.DataArray,
    "xarray.Dataset": xr.Dataset,
}

funcs = dict(
    itertools.chain.from_iterable(
        public_api(namespace, base_name=name).items()
        for name, namespace in namespaces.items()
    )
)

docstrings = {name: func.__doc__ for name, func in funcs.items()}

without_docstring = tuple(
    name
    for name, docstring in docstrings.items()
    if docstring is None or not docstring.strip()
)

filtered_docstrings = tuple(
    (name, docstring)
    for name, docstring in docstrings.items()
    if (
        docstring is not None
        and not has_section(docstring, name="Examples")
        and not is_dict_method(name)
    )
)
without_examples_xarray = tuple(
    name
    for name, docstring in filtered_docstrings
    if not is_numpy_wrapper(name, docstring)
)
without_examples_numpy = tuple(
    name for name, docstring in filtered_docstrings if is_numpy_wrapper(name, docstring)
)


print(
    f"functions without examples ({len(without_examples_xarray)}):",
    format_names(without_examples_xarray),
    sep="\n",
)
print()
print(
    f"numpy wrappers without examples ({len(without_examples_numpy)}):",
    format_names(without_examples_numpy),
    sep="\n",
)
print()
print(
    f"functions without docstring ({len(without_docstring)}):",
    format_names(without_docstring),
    sep="\n",
)
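
To show what the section check in the script matches, here is a small sketch that applies the same `has_section` logic to two made-up docstrings. The docstrings (and the `head` / `tail` names in them) are hypothetical and only serve as input.

```python
import itertools


def has_section(docstring, name):
    # Same logic as in the script: a section is a title line followed by
    # a line of dashes of the same length.
    lines = docstring.split("\n")
    marker = "-" * len(name)
    for current, following in itertools.zip_longest(lines, lines[1:]):
        if following is None or following.strip() != marker:
            continue
        if current.strip() == name:
            return True
    return False


# Hypothetical docstrings, used only for illustration.
with_examples = """Return the first `n` values.

Examples
--------
>>> head([1, 2, 3], n=2)
[1, 2]
"""

without_examples = """Return the first `n` values.

See Also
--------
tail
"""

print(has_section(with_examples, "Examples"))     # True
print(has_section(without_examples, "Examples"))  # False
```

The second docstring has an underline of the right length, but under a different title, so it is not counted as having an "Examples" section.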

analysis report in current main (with coarse filtering of advanced / deprecated API)

functions without examples (170):
  - xarray.decode_cf
  - xarray.get_options
  - xarray.infer_freq
  - xarray.load_dataarray
  - xarray.load_dataset
  - xarray.open_dataarray
  - xarray.open_dataset
  - xarray.open_mfdataset
  - xarray.open_zarr
  - xarray.polyval
  - xarray.unify_chunks
  - xarray.DataArray.all
  - xarray.DataArray.any
  - xarray.DataArray.as_numpy
  - xarray.DataArray.assign_attrs
  - xarray.DataArray.astype
  - xarray.DataArray.bfill
  - xarray.DataArray.broadcast_equals
  - xarray.DataArray.chunk
  - xarray.DataArray.close
  - xarray.DataArray.combine_first
  - xarray.DataArray.compute
  - xarray.DataArray.conj
  - xarray.DataArray.count
  - xarray.DataArray.cumprod
  - xarray.DataArray.cumsum
  - xarray.DataArray.curvefit
  - xarray.DataArray.drop
  - xarray.DataArray.drop_duplicates
  - xarray.DataArray.drop_isel
  - xarray.DataArray.drop_sel
  - xarray.DataArray.drop_vars
  - xarray.DataArray.dropna
  - xarray.DataArray.equals
  - xarray.DataArray.expand_dims
  - xarray.DataArray.ffill
  - xarray.DataArray.fillna
  - xarray.DataArray.from_cdms2
  - xarray.DataArray.from_dict
  - xarray.DataArray.from_iris
  - xarray.DataArray.from_series
  - xarray.DataArray.get_axis_num
  - xarray.DataArray.get_index
  - xarray.DataArray.groupby_bins
  - xarray.DataArray.head
  - xarray.DataArray.identical
  - xarray.DataArray.interp_like
  - xarray.DataArray.load
  - xarray.DataArray.max
  - xarray.DataArray.mean
  - xarray.DataArray.median
  - xarray.DataArray.min
  - xarray.DataArray.persist
  - xarray.DataArray.plot
  - xarray.DataArray.polyfit
  - xarray.DataArray.prod
  - xarray.DataArray.reduce
  - xarray.DataArray.reindex_like
  - xarray.DataArray.rename
  - xarray.DataArray.reorder_levels
  - xarray.DataArray.reset_coords
  - xarray.DataArray.reset_index
  - xarray.DataArray.rolling_exp
  - xarray.DataArray.searchsorted
  - xarray.DataArray.set_close
  - xarray.DataArray.squeeze
  - xarray.DataArray.std
  - xarray.DataArray.str
  - xarray.DataArray.sum
  - xarray.DataArray.tail
  - xarray.DataArray.thin
  - xarray.DataArray.to_cdms2
  - xarray.DataArray.to_dataframe
  - xarray.DataArray.to_dataset
  - xarray.DataArray.to_dict
  - xarray.DataArray.to_index
  - xarray.DataArray.to_iris
  - xarray.DataArray.to_masked_array
  - xarray.DataArray.to_netcdf
  - xarray.DataArray.to_numpy
  - xarray.DataArray.to_pandas
  - xarray.DataArray.to_series
  - xarray.DataArray.transpose
  - xarray.DataArray.unify_chunks
  - xarray.DataArray.var
  - xarray.DataArray.weighted
  - xarray.Dataset.all
  - xarray.Dataset.any
  - xarray.Dataset.apply
  - xarray.Dataset.argmax
  - xarray.Dataset.argmin
  - xarray.Dataset.as_numpy
  - xarray.Dataset.assign_attrs
  - xarray.Dataset.astype
  - xarray.Dataset.bfill
  - xarray.Dataset.broadcast_equals
  - xarray.Dataset.broadcast_like
  - xarray.Dataset.chunk
  - xarray.Dataset.close
  - xarray.Dataset.combine_first
  - xarray.Dataset.compute
  - xarray.Dataset.conj
  - xarray.Dataset.count
  - xarray.Dataset.cumprod
  - xarray.Dataset.cumsum
  - xarray.Dataset.curvefit
  - xarray.Dataset.differentiate
  - xarray.Dataset.drop
  - xarray.Dataset.drop_dims
  - xarray.Dataset.drop_vars
  - xarray.Dataset.dropna
  - xarray.Dataset.dump_to_store
  - xarray.Dataset.equals
  - xarray.Dataset.expand_dims
  - xarray.Dataset.ffill
  - xarray.Dataset.from_dataframe
  - xarray.Dataset.from_dict
  - xarray.Dataset.get_index
  - xarray.Dataset.groupby_bins
  - xarray.Dataset.head
  - xarray.Dataset.identical
  - xarray.Dataset.info
  - xarray.Dataset.interp_like
  - xarray.Dataset.isel
  - xarray.Dataset.load
  - xarray.Dataset.load_store
  - xarray.Dataset.max
  - xarray.Dataset.mean
  - xarray.Dataset.median
  - xarray.Dataset.merge
  - xarray.Dataset.min
  - xarray.Dataset.persist
  - xarray.Dataset.plot
  - xarray.Dataset.polyfit
  - xarray.Dataset.prod
  - xarray.Dataset.rank
  - xarray.Dataset.reduce
  - xarray.Dataset.reindex_like
  - xarray.Dataset.rename
  - xarray.Dataset.rename_dims
  - xarray.Dataset.rename_vars
  - xarray.Dataset.reorder_levels
  - xarray.Dataset.reset_coords
  - xarray.Dataset.reset_index
  - xarray.Dataset.rolling_exp
  - xarray.Dataset.sel
  - xarray.Dataset.set_close
  - xarray.Dataset.set_coords
  - xarray.Dataset.squeeze
  - xarray.Dataset.stack
  - xarray.Dataset.std
  - xarray.Dataset.sum
  - xarray.Dataset.tail
  - xarray.Dataset.thin
  - xarray.Dataset.to_array
  - xarray.Dataset.to_dask_dataframe
  - xarray.Dataset.to_dataframe
  - xarray.Dataset.to_dict
  - xarray.Dataset.to_netcdf
  - xarray.Dataset.to_pandas
  - xarray.Dataset.to_zarr
  - xarray.Dataset.transpose
  - xarray.Dataset.unify_chunks
  - xarray.Dataset.unstack
  - xarray.Dataset.var
  - xarray.Dataset.weighted

numpy wrappers without examples (6):
  - xarray.DataArray.argsort
  - xarray.DataArray.clip
  - xarray.DataArray.conjugate
  - xarray.Dataset.argsort
  - xarray.Dataset.clip
  - xarray.Dataset.conjugate

functions without docstring (1):
  - xarray.DataArray.dt

For some of those it might be difficult to write examples (e.g. the I/O methods / functions), and modifying the docstring of the numpy wrappers would require some refactoring.

Most of the remaining ones are pretty easy to figure out and write examples for, however. If we remove the entry barrier as much as possible maybe these can be good first PRs with high impact?

I'm not really sure how to do that, though... I'd try to group them by difficulty and create sub-issues with a checklist for each group (so they're easy to find), and maybe also explain what a good example looks like or link to an appropriate explanation. The latter might be a good thing to add to the contributor's guide, too.
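
A minimal sketch of how such checklists could be generated from the report. Grouping by difficulty needs human judgment, so this groups by namespace instead, purely as a stand-in assumption. The helper name `checklist_by_namespace` is hypothetical; the function names come from the report above.

```python
from collections import defaultdict


def checklist_by_namespace(names):
    """Format fully qualified names as one task list per namespace."""
    groups = defaultdict(list)
    for fqn in names:
        # everything before the last dot is the namespace
        namespace, _, _ = fqn.rpartition(".")
        groups[namespace].append(fqn)

    sections = []
    for namespace, members in groups.items():
        items = "\n".join(f"- [ ] {member}" for member in sorted(members))
        sections.append(f"{namespace}\n{items}")
    return "\n\n".join(sections)


names = [
    "xarray.polyval",
    "xarray.Dataset.merge",
    "xarray.DataArray.head",
    "xarray.Dataset.isel",
]
checklist = checklist_by_namespace(names)
print(checklist)
```

Each `- [ ]` line renders as a checkbox in a GitHub issue, so every group could be pasted into its own sub-issue.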

rabernat added the topic-documentation label Sep 24, 2021

keewis added contrib-good-first-issue contrib-help-wanted labels Oct 10, 2021

dcherian mentioned this issue Nov 8, 2021

Generate reductions for DataArray, Dataset, GroupBy and Resample #5950

Merged

keewis mentioned this issue Jul 16, 2022

improve docstrings with examples and links #6793

Open

42 tasks
