from upstream (#7)
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
jcoffi authored Feb 15, 2023
2 parents f1ebfd9 + 2c58dc7 commit 0894778
Showing 78 changed files with 1,282 additions and 659 deletions.
2 changes: 2 additions & 0 deletions README.rst
@@ -35,6 +35,8 @@ Or more about `Ray Core`_ and its key abstractions:
- `Actors`_: Stateful worker processes created in the cluster.
- `Objects`_: Immutable values accessible across the cluster.

Monitor and debug Ray applications and clusters using the `Ray dashboard <https://docs.ray.io/en/latest/ray-core/ray-dashboard.html>`__.

Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
`ecosystem of community integrations`_.

6 changes: 2 additions & 4 deletions doc/source/_toc.yml
@@ -141,6 +141,7 @@ parts:
- file: data/random-access
- file: data/faq
- file: data/api/api
- file: data/glossary
- file: data/integrations

- file: train/train
@@ -361,9 +362,6 @@ parts:

- file: ray-observability/monitoring-debugging/monitoring-debugging
title: "Monitoring and Debugging"
sections:
- file: ray-observability/index
title: Tools

- file: ray-references/api
title: References
@@ -380,4 +378,4 @@ parts:
- file: ray-contribute/fake-autoscaler
- file: ray-core/examples/testing-tips
- file: ray-core/configure
- file: ray-contribute/whitepaper
- file: ray-contribute/whitepaper
5 changes: 5 additions & 0 deletions doc/source/data/getting-started.rst
@@ -144,3 +144,8 @@ or remote filesystems.


To learn more about saving datasets, read :ref:`Saving datasets <saving_datasets>`.

Next Steps
----------

* To monitor and debug your application, use the :ref:`Ray dashboard <ray-dashboard>`.
137 changes: 137 additions & 0 deletions doc/source/data/glossary.rst
@@ -0,0 +1,137 @@
.. _datasets_glossary:

=====================
Ray Datasets Glossary
=====================

.. glossary::

Batch format
The way batches of data are represented.

Set ``batch_format`` in methods like
:meth:`Dataset.iter_batches() <ray.data.Dataset.iter_batches>` and
:meth:`Dataset.map_batches() <ray.data.Dataset.map_batches>` to specify the
batch type.

.. doctest::

    >>> import ray
    >>> dataset = ray.data.range_table(10)
    >>> next(iter(dataset.iter_batches(batch_format="numpy", batch_size=5)))
    {'value': array([0, 1, 2, 3, 4])}
    >>> next(iter(dataset.iter_batches(batch_format="pandas", batch_size=5)))
       value
    0      0
    1      1
    2      2
    3      3
    4      4

To learn more about batch formats, read
:ref:`UDF Input Batch Formats <transform_datasets_batch_formats>`.

Block
A processing unit of data. A :class:`~ray.data.Dataset` consists of a
collection of blocks.

Under the hood, :term:`Datasets <Datasets (library)>` partition :term:`records <Record>`
into a set of distributed data blocks. This allows Datasets to perform operations
in parallel.

Unlike a batch, which is a user-facing object, a block is an internal abstraction.

Block format
The way :term:`blocks <Block>` are represented.

Blocks are represented as
`Arrow tables <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html>`_,
`pandas DataFrames <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_,
or Python lists. To determine the block format, call
:meth:`Dataset.dataset_format() <ray.data.Dataset.dataset_format>`.

Datasets (library)
A library for distributed data processing.

Datasets isn’t intended as a replacement for more general data processing systems.
Its utility is as the last-mile bridge from ETL pipeline outputs to distributed
ML applications and libraries in Ray.

To learn more about Ray Datasets, read :ref:`Key Concepts <dataset_concept>`.

Dataset (object)
A class that represents a distributed collection of data.

:class:`~ray.data.Dataset` exposes methods to read, transform, and consume data at scale.

To learn more about Datasets and the operations they support, read the :ref:`Datasets API Reference <data-api>`.

Datasource
A :class:`~ray.data.Datasource` specifies how to read from and write to
a variety of external storage systems and data formats.

Examples of Datasources include :class:`~ray.data.datasource.ParquetDatasource`,
:class:`~ray.data.datasource.ImageDatasource`,
:class:`~ray.data.datasource.TFRecordDatasource`,
:class:`~ray.data.datasource.CSVDatasource`, and
:class:`~ray.data.datasource.MongoDatasource`.

To learn more about Datasources, read :ref:`Creating a Custom Datasource <custom_datasources>`.

Record
A single data item.

If your dataset is :term:`tabular <Tabular Dataset>`, then records are :class:`TableRows <ray.data.row.TableRow>`.
If your dataset is :term:`simple <Simple Dataset>`, then records are arbitrary Python objects.
If your dataset is :term:`tensor <Tensor Dataset>`, then records are `NumPy ndarrays <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>`_.

Schema
The data type of a dataset.

If your dataset is :term:`tabular <Tabular Dataset>`, then the schema describes
the column names and data types. If your dataset is :term:`simple <Simple Dataset>`,
then the schema describes the Python object type. If your dataset is
:term:`tensor <Tensor Dataset>`, then the schema describes the per-element
tensor shape and data type.

To determine a dataset's schema, call
:meth:`Dataset.schema() <ray.data.Dataset.schema>`.

Simple Dataset
A Dataset that represents a collection of arbitrary Python objects.

.. doctest::

    >>> import ray
    >>> ray.data.from_items(["spam", "ham", "eggs"])
    Dataset(num_blocks=3, num_rows=3, schema=<class 'str'>)

Tensor Dataset
A Dataset that represents a collection of ndarrays.

:term:`Tabular datasets <Tabular Dataset>` that contain tensor columns aren’t tensor datasets.

.. doctest::

    >>> import numpy as np
    >>> import ray
    >>> ray.data.from_numpy(np.zeros((100, 32, 32, 3)))
    Dataset(num_blocks=1, num_rows=100, schema={__value__: ArrowTensorType(shape=(32, 32, 3), dtype=double)})

Tabular Dataset
A Dataset that represents columnar data.

.. doctest::

    >>> import ray
    >>> ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")
    Dataset(num_blocks=1, num_rows=150, schema={sepal length (cm): double, sepal width (cm): double, petal length (cm): double, petal width (cm): double, target: int64})

User-defined function (UDF)
A callable that transforms batches or :term:`records <Record>` of data. UDFs let you apply arbitrary transformations to datasets.

Call :meth:`Dataset.map_batches() <ray.data.Dataset.map_batches>`,
:meth:`Dataset.map() <ray.data.Dataset.map>`, or
:meth:`Dataset.flat_map() <ray.data.Dataset.flat_map>` to apply UDFs.

To learn more about UDFs, read :ref:`Writing User-Defined Functions <transform_datasets_writing_udfs>`.
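
The Block and UDF entries above can be illustrated with a plain-Python analogy. This is a sketch only and does not use or reproduce Ray's implementation: the helper names ``partition``, ``map_batches``, and ``double_values`` are hypothetical, threads stand in for distributed workers, and each block is handed to the UDF as a single batch, which is a simplification (the glossary distinguishes user-facing batches from internal blocks).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-ins for the glossary terms; not Ray types.
Record = Dict[str, int]
Block = List[Record]


def partition(records: List[Record], num_blocks: int) -> List[Block]:
    """Split records into at most num_blocks contiguous blocks of near-equal size."""
    size, extra = divmod(len(records), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            blocks.append(records[start:end])
        start = end
    return blocks


def map_batches(blocks: List[Block], udf: Callable[[Block], Block]) -> List[Block]:
    """Apply a batch-level UDF to every block in parallel, preserving block order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(udf, blocks))


def double_values(batch: Block) -> Block:
    """Example UDF: receives a whole batch and returns a transformed batch."""
    return [{"value": row["value"] * 2} for row in batch]


records = [{"value": i} for i in range(10)]
blocks = partition(records, num_blocks=3)
result = map_batches(blocks, double_values)
flattened = [row["value"] for block in result for row in block]
```

Because each block is transformed independently, the work can run in parallel, which is the property the Block entry describes. In Ray itself, the corresponding call is :meth:`Dataset.map_batches() <ray.data.Dataset.map_batches>`.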
1 change: 1 addition & 0 deletions doc/source/ray-air/getting-started.rst
@@ -205,3 +205,4 @@ Next Steps
- :ref:`air-examples-ref`
- :ref:`API reference <air-api-ref>`
- :ref:`Technical whitepaper <whitepaper>`
- To monitor and debug your application, use the :ref:`Ray dashboard <ray-dashboard>`.
14 changes: 5 additions & 9 deletions doc/source/ray-core/ray-dashboard.rst
@@ -2,16 +2,12 @@

Ray Dashboard
=============
Ray's built-in dashboard provides metrics, charts, and other features that help
Ray users to understand Ray clusters and libraries.
Ray provides a web-based dashboard for monitoring and debugging Ray applications.
The dashboard provides a visual representation of the system state, allowing users to track the performance
of their applications and troubleshoot issues.

The dashboard lets you:

- View cluster metrics including time-series visualizations.
- See errors and exceptions at a glance.
- View logs across many machines.
- See all your ray jobs and the logs for those jobs.
- See your ray actors and their logs
.. image:: https://raw.githubusercontent.com/ray-project/Images/master/docs/new-dashboard/Dashboard-overview.png
    :align: center

Getting Started
---------------
2 changes: 2 additions & 0 deletions doc/source/ray-core/walkthrough.rst
@@ -60,6 +60,8 @@ As seen above, Ray stores task and actor call results in its :ref:`distributed o
Next Steps
----------

.. tip:: To monitor and debug your application, use the :ref:`Ray dashboard <ray-dashboard>`.

Ray's key primitives are simple, but can be composed together to express almost any kind of distributed computation.
Learn more about Ray's :ref:`key concepts <core-key-concepts>` with the following user guides:

18 changes: 0 additions & 18 deletions doc/source/ray-observability/index.rst

This file was deleted.

@@ -1,3 +1,5 @@
.. _observability:

Monitoring and Debugging
========================

@@ -8,9 +10,18 @@ See :ref:`Getting Help <ray-troubleshoot-getting-help>` if your problem is not s
.. toctree::
    :maxdepth: 0

    ../overview
    ../../ray-core/ray-dashboard
    ../state/state-api
    ../ray-debugging
    ../ray-logging
    ../ray-metrics
    profiling
    ../ray-tracing
    troubleshoot-failures
    troubleshoot-hangs
    troubleshoot-performance
    gotchas
    profiling
    getting-help
    ../../ray-contribute/debugging.rst
    ../../ray-contribute/profiling.rst
14 changes: 7 additions & 7 deletions doc/source/ray-observability/overview.rst
@@ -5,6 +5,13 @@ This section covers a list of available monitoring and debugging tools and featu

This documentation only covers the high-level description of available tools and features. For more details, see :ref:`Ray Observability <observability>`.

Dashboard (Web UI)
------------------
Ray provides a web-based dashboard to help users monitor the cluster. When a new cluster is started, the dashboard is available
at the default address ``localhost:8265`` (Ray can automatically increment the port if port 8265 is already occupied).

See :ref:`Ray Dashboard <ray-dashboard>` for more details.
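
The port fallback described above can be pictured with a small standard-library sketch. This is an illustration under assumptions, not Ray's actual port-selection code: the function ``first_free_port`` and its ``max_tries`` parameter are hypothetical, and the only detail taken from the text is the default port 8265 and the idea of incrementing when a port is occupied.

```python
import socket


def first_free_port(start: int = 8265, host: str = "127.0.0.1", max_tries: int = 100) -> int:
    """Return the first port at or above `start` that can be bound on `host`."""
    stop = min(start + max_tries, 65536)  # Never probe past the valid port range.
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # Port is occupied; try the next one.
            return port
    raise RuntimeError(f"no free port in range {start}-{stop - 1}")


# Usage: occupy a port chosen by the OS, then ask for a free port starting there.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as blocker:
    blocker.bind(("127.0.0.1", 0))
    blocker.listen()
    taken = blocker.getsockname()[1]
    found = first_free_port(start=taken)
```

If the first port is busy, the dashboard may be reachable on a later port, so check the address Ray reports at startup instead of assuming 8265.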

Application Logging
-------------------
By default, all stdout and stderr of tasks and actors are streamed to the Ray driver (the entrypoint script that calls ``ray.init``).
@@ -79,13 +86,6 @@ The following command will list all the actors from the cluster.
See :ref:`Ray State API <state-api-overview-ref>` for more details.

Dashboard (Web UI)
------------------
Ray supports the web-based dashboard to help users monitor the cluster. When a new cluster is started, the dashboard is available
through the default address `localhost:8265` (port can be automatically incremented if port 8265 is already occupied).

See :ref:`Ray Dashboard <ray-dashboard>` for more details.

Debugger
--------
Ray has a built-in debugger that allows you to debug your distributed applications.