Add a developing with spectral-cube docs page #798

Merged
merged 8 commits into from
Jan 27, 2022
6 changes: 6 additions & 0 deletions docs/dask.rst
@@ -1,3 +1,5 @@
.. _doc_dask:

Integration with dask
=====================

@@ -25,6 +27,10 @@ To read in a FITS cube using the dask-enabled classes, you can do::
Most of the properties and methods that normally work with :class:`~spectral_cube.SpectralCube`
should continue to work with :class:`~spectral_cube.DaskSpectralCube`.

For an interactive demonstration, see the `Guide to Dask Optimization <https://github.com/radio-astro-tools/tutorials/pull/21>`_.

..
    TODO: UPDATE THE LINK TO THE TUTORIAL once merged

Schedulers and parallel computations
------------------------------------
93 changes: 93 additions & 0 deletions docs/developing_with_spectralcube.rst
@@ -0,0 +1,93 @@
.. _doc_developersnotes:

Notes for development using spectral-cube
=========================================
.. currentmodule:: spectral_cube

Review comment (Contributor): whooaaaa didn't know you could do this.

Review comment (Contributor Author): Me either! I copy-pasted from docs Chris wrote.

spectral-cube is flexible and can be used within other packages for
development beyond the core package's capabilities. Two significant strengths
are the use of memory-mapping and the integration with `dask <https://dask.org/>`_
(:ref:`doc_dask`) to efficiently handle larger-than-memory data.

This page provides suggestions for software development using spectral-cube in other
packages.

The following two sections give information on standard usage of :class:`SpectralCube`.
The third discusses usage with dask integration in :class:`DaskSpectralCube`.

Handling large data cubes
-------------------------

spectral-cube is specifically designed for handling larger-than-memory data
and minimizes creating copies of the data. :class:`SpectralCube` uses memory-mapping
and provides options for executing operations with only subsets of the data
(for example, the ``how`` keyword in :meth:`SpectralCube.moment`).
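
The idea behind slice-wise evaluation can be illustrated with plain NumPy. The
sketch below is a conceptual stand-in, not spectral-cube's implementation: the
helper ``moment0_by_slice`` and the temporary ``cube.npy`` file are hypothetical
names introduced for illustration. It integrates a memory-mapped array along the
spectral axis while holding only one channel in memory at a time.

```python
import os
import tempfile

import numpy as np


def moment0_by_slice(data, channel_width=1.0):
    """Integrate a (n_spectral, ny, nx) array along axis 0, one channel at a time.

    Hypothetical helper: only a single 2D channel is loaded per iteration,
    and NaN values are treated as masked (they contribute zero).
    """
    total = np.zeros(data.shape[1:], dtype=np.float64)
    for channel in data:  # each iteration reads one 2D plane
        total += np.nan_to_num(channel, nan=0.0)
    return total * channel_width


# Write a small stand-in "cube" to disk and reopen it memory-mapped.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "cube.npy")
    np.save(path, np.ones((5, 3, 4), dtype=np.float32))
    cube_data = np.load(path, mmap_mode="r")  # data stay on disk until indexed
    moment0 = moment0_by_slice(cube_data, channel_width=2.0)
    del cube_data  # release the memory map before the directory is removed
```

With spectral-cube itself, the equivalent choice would be made through the
``how`` keyword rather than by writing the loop by hand.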

Masking operations can be performed "lazily", where the computation is completed
only when a view of the underlying boolean mask array is returned.
See :ref:`doc_masking` for details on these implementations.

Further strategies for handling large data are given in :ref:`doc_handling_large_datasets`.


Parallelizing operations
------------------------

Several operations implemented in :class:`SpectralCube` can be parallelized
using the `joblib <https://joblib.readthedocs.io/en/latest/>`_ package. Built-in methods
in :class:`SpectralCube` that accept the ``parallel`` keyword use joblib when it is enabled.

New methods can take advantage of these features by creating custom functions
to pass to :meth:`SpectralCube.apply_function_parallel_spatial` and
:meth:`SpectralCube.apply_function_parallel_spectral`. These methods accept
functions that take a data and mask array input, with optional `**kwargs`,
and that return an output array of the same shape as the input.
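
As a minimal sketch of such a custom function, the example below removes a
constant baseline from each spectrum. The names ``subtract_median`` and
``baseline_subtracted`` are hypothetical. The ``mask`` argument is given a
default so the function works whether or not a mask is passed, which is an
assumption worth checking against the method's docstring in your installed
version.

```python
import numpy as np


def subtract_median(spectrum, mask=None, **kwargs):
    """Remove a constant baseline from one spectrum; output shape matches input."""
    spectrum = np.asarray(spectrum, dtype=float)
    good = np.isfinite(spectrum)
    if mask is not None:
        # Assumption: True in the mask marks values to include.
        good &= np.asarray(mask, dtype=bool)
    if not good.any():
        return spectrum  # nothing usable, return the spectrum unchanged
    return spectrum - np.median(spectrum[good])


def baseline_subtracted(cube, num_cores=2):
    """Hypothetical helper: apply subtract_median to every spectrum of a SpectralCube."""
    return cube.apply_function_parallel_spectral(subtract_median,
                                                 num_cores=num_cores)


# The function can be checked on a plain array before running it on a cube.
example = subtract_median(np.array([1.0, 2.0, 3.0, 10.0, np.nan]))
```

Testing the function on a single NumPy spectrum first makes it easier to
separate errors in the function from errors in the parallel execution.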


Unifying large-data handling and parallelization with dask
----------------------------------------------------------

spectral-cube's dask integration unifies many of the above features and adds further
options that leverage the dask ecosystem. The :ref:`doc_dask` page provides an overview
of general usage and recommended practices, including:

* Using different dask schedulers (synchronous, threads, and distributed)
* Triggering dask executions and saving intermediate results to disk
* Efficiently rechunking large data for parallel operations
* Loading cubes in CASA image format

For an interactive demonstration of these features, see
the `Guide to Dask Optimization <https://github.com/radio-astro-tools/tutorials/pull/21>`_.

..
    TODO: UPDATE THE LINK TO THE TUTORIAL once merged

For further development, we highlight the ability to apply custom functions using dask.
A :class:`DaskSpectralCube` loads the data as a `dask Array <https://docs.dask.org/en/stable/array.html>`_.
Similar to the non-dask :class:`SpectralCube`, custom functions can be used with
:meth:`DaskSpectralCube.apply_function_parallel_spectral` and
:meth:`DaskSpectralCube.apply_function_parallel_spatial`. In effect, these methods
wrap `dask.array.map_blocks <https://docs.dask.org/en/stable/generated/dask.array.map_blocks.html#dask.array.map_blocks>`_
and accept its common keyword arguments.
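
A minimal sketch of a chunk-wise custom function follows. It assumes the
``accepts_chunks`` keyword of
:meth:`DaskSpectralCube.apply_function_parallel_spectral`, under which the
function receives blocks of shape ``(n_spectral, ny, nx)`` instead of single
spectra. ``subtract_median_chunk`` and ``baseline_subtracted_dask`` are
hypothetical names used only for illustration.

```python
import numpy as np


def subtract_median_chunk(chunk, **kwargs):
    """Subtract each pixel's spectral median from a (n_spectral, ny, nx) chunk."""
    # keepdims=True keeps the median broadcastable against the chunk,
    # so the output has the same shape as the input.
    return chunk - np.nanmedian(chunk, axis=0, keepdims=True)


def baseline_subtracted_dask(dask_cube):
    """Hypothetical helper: run the chunk-wise function on a DaskSpectralCube."""
    # Assumption: accepts_chunks=True tells spectral-cube that the function
    # can handle whole chunks rather than one spectrum at a time.
    return dask_cube.apply_function_parallel_spectral(subtract_median_chunk,
                                                      accepts_chunks=True)


# Check the function on a plain NumPy block before handing it to dask.
chunk = np.arange(24, dtype=float).reshape(4, 2, 3)
result = subtract_median_chunk(chunk)
```

Operating on whole chunks avoids a Python-level call per spectrum, which is
usually where most of the overhead of a custom function sits.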

.. note::
    The dask array can be accessed with ``DaskSpectralCube._data``, but we discourage
    this because the built-in methods include checks, such as applying the mask to the
    data.

If you have a use case needing one of dask array's other `operation tools <https://docs.dask.org/en/stable/array-best-practices.html#build-your-own-operations>`_,
please raise an `issue on GitHub <https://github.com/radio-astro-tools/spectral-cube/issues>`_
so we can add this support!

The :ref:`doc_dask` page gives a basic example of using a custom function. A more
advanced example is shown in the `parallel fitting with dask tutorial <https://github.com/radio-astro-tools/tutorials/pull/12>`_.
This tutorial demonstrates fitting a spectral model to every spectrum in a cube, applied
in parallel over chunks of the data. This fitting example is a guide for using
:meth:`DaskSpectralCube.apply_function_parallel_spectral` with:

* A change in array shape and dimensions in the output (`drop_axis` and `chunks` in `dask.array.map_blocks <https://docs.dask.org/en/stable/generated/dask.array.map_blocks.html#dask.array.map_blocks>`_)
* Using dask's `block_info` dictionary in a custom function to track the location of a chunk in the cube
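
The two points above can be sketched with ``dask.array.map_blocks`` directly.
This is an illustrative example under stated assumptions, not the tutorial's
code: ``peak_offset_from_guess``, ``peak_offsets_with_dask``, and the
``guesses`` map are hypothetical. The function collapses the spectral axis
(hence ``drop_axis=0``) and uses ``block_info`` to find where its chunk sits in
the full cube, so it can read the matching part of a per-pixel guess map.

```python
import numpy as np


def peak_offset_from_guess(chunk, guesses, block_info=None):
    """Brightest channel per pixel minus a per-pixel guess.

    chunk has shape (n_spectral, ny, nx); the output has shape (ny, nx).
    Assumes finite data (NaNs would need handling before argmax).
    """
    ny, nx = chunk.shape[1:]
    if block_info is None:
        y0, x0 = 0, 0  # called outside dask: treat chunk as the full array
    else:
        # Key 0 describes the first input array; "array-location" holds
        # one (start, stop) pair per axis of that array.
        _, (y0, _), (x0, _) = block_info[0]["array-location"]
    peak = np.argmax(chunk, axis=0)
    return peak - guesses[y0:y0 + ny, x0:x0 + nx]


def peak_offsets_with_dask(data, guesses):
    """Hypothetical helper: apply the function over spatial chunks with dask."""
    import dask.array as da  # imported here so the function above needs only NumPy

    # Keep the full spectral axis in each chunk; split only spatially.
    arr = da.from_array(data, chunks=(data.shape[0], 2, 2))
    offsets = da.map_blocks(peak_offset_from_guess, arr, guesses=guesses,
                            drop_axis=0, dtype=np.int64,
                            meta=np.empty((0, 0), dtype=np.int64))
    return offsets.compute()


# Emulate what dask would pass for the chunk covering y=2..4, x=2..4.
block = np.zeros((4, 2, 2))
block[3] = 1.0  # peak in channel 3 for every pixel
guess_map = np.ones((4, 4), dtype=np.int64)
guess_map[2:, 2:] = 3
info = {0: {"array-location": [(0, 4), (2, 4), (2, 4)]}}
offsets = peak_offset_from_guess(block, guess_map, block_info=info)
```

Passing a hand-built ``block_info`` dictionary, as in the last lines, is a
simple way to test location-dependent logic without starting a dask computation.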

..
    TODO: UPDATE THE LINK TO THE TUTORIAL once merged

1 change: 1 addition & 0 deletions docs/index.rst
@@ -119,4 +119,5 @@ Advanced
dask.rst
yt_example.rst
big_data.rst
developing_with_spectralcube.rst
api.rst
2 changes: 2 additions & 0 deletions docs/masking.rst
@@ -1,3 +1,5 @@
.. _doc_masking:

Masking
=======
