From 7bd4bb0b1aa0ce6064f7c9988bdefc0240b7f9e5 Mon Sep 17 00:00:00 2001
From: e-koch
Date: Wed, 26 Jan 2022 17:32:03 -0500
Subject: [PATCH 1/8] Add references for some doc pages and add/update a few links

---
 docs/dask.rst    | 4 ++++
 docs/index.rst   | 1 +
 docs/masking.rst | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/docs/dask.rst b/docs/dask.rst
index 17190ee01..b1c3544d5 100644
--- a/docs/dask.rst
+++ b/docs/dask.rst
@@ -1,3 +1,5 @@
+.. _doc_dask:
+
 Integration with dask
 =====================

@@ -25,6 +27,8 @@ To read in a FITS cube using the dask-enabled classes, you can do::
 Most of the properties and methods that normally work with :class:`~spectral_cube.SpectralCube`
 should continue to work with :class:`~spectral_cube.DaskSpectralCube`.

+For an interactive demonstration, see the `Guide to Dask Optimization `_.
+.. TODO: UPDATE THE LINK TO THE TUTORIAL once merged

 Schedulers and parallel computations
 ------------------------------------
diff --git a/docs/index.rst b/docs/index.rst
index 00f1278f7..e99cfe3bb 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -119,4 +119,5 @@ Advanced
    dask.rst
    yt_example.rst
    big_data.rst
+   developing_with_spectralcube.rst
    api.rst
diff --git a/docs/masking.rst b/docs/masking.rst
index b879d72c1..3cb6d9f81 100644
--- a/docs/masking.rst
+++ b/docs/masking.rst
@@ -1,3 +1,5 @@
+.. _doc_masking:
+
 Masking
 =======


From 18756b8674f5c93af2580295d83845cf9564f0bc Mon Sep 17 00:00:00 2001
From: e-koch
Date: Wed, 26 Jan 2022 17:32:41 -0500
Subject: [PATCH 2/8] First pass through a recommendations for developers page

---
 docs/developing_with_spectralcube.rst | 89 +++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 docs/developing_with_spectralcube.rst

diff --git a/docs/developing_with_spectralcube.rst b/docs/developing_with_spectralcube.rst
new file mode 100644
index 000000000..b14f4777f
--- /dev/null
+++ b/docs/developing_with_spectralcube.rst
@@ -0,0 +1,89 @@
+.. _doc_developersnotes:
+
+Notes for development using spectral-cube
+=========================================
+.. currentmodule:: spectral_cube
+
+spectral-cube is flexible and can used within other packages for
+development beyond the core package's capabilities. Two significant strengths
+are the use of memory-mapping and the integration with `dask `_
+(:ref:`doc_dask`) to efficiently larger than memory data.
+
+This page provides suggestions for developing using spectral-cube in other
+packages.
+
+The following two sections give information on standard usage of :class:`SpectralCube`.
+The third discusses usage with dask integration.
+
+Handling large data cubes
+-------------------------
+
+spectral-cube is specifically designed for handling larger-than-memory data
+and minimizes creating copies of the data. :class:`SpectralCube` uses memory-mapping
+and provides options for executing operations with only subsets of the data
+(for example, the `how` keyword in `~SpectralCube.moment`).
+
+Masking operations can be performed "lazily", where the computation is completed
+only when a view of the underlying boolean mask array is returned (:ref:`doc_masking`).
+
+Further strategies for handling large data is given in :ref:`_doc_handling_large_datasets`.
+
+
+Parallelizing operations
+------------------------
+Several operations implemented in :class:`SpectralCube` can be parallelized
+using the `joblib `_ package. Builtin methods
+in :class:`SpectralCube` with the `parallel` keyword will enable using joblib.
+
+
+New methods can take advantage of these features by using creating custom functions
+to pass to :meth:`SpectralCube.apply_function_parallel_spatial` and
+:meth:`SpectralCube.apply_function_parallel_spectral`. These methods expect
+functions with a data and mask array input, with optional `**kwargs` that can be
+passed, and expect an output array of the same shape as the input.
+
+
+Unifying large-data handling and parallelization with dask
+----------------------------------------------------------
+
+spectral-cube's dask integration unifies many of the above features and further
+options leveraging the dask ecosystem. The :ref:`doc_dask` page provides an overview
+of general usage and recommended practices, including:
+
+  * Using different dask schedulers (synchronous, threads, and distributed)
+  * Triggering dask executions and saving intermediate results to disk
+  * Efficiently rechunking large data for parallel operations
+  * Loading cubes in CASA image format
+
+For an interactive demonstration of these features, see
+the `Guide to Dask Optimization `_.
+.. TODO: UPDATE THE LINK TO THE TUTORIAL once merged
+
+For further development, we highlight the ability to apply custom functions using dask.
+A :class:`DaskSpectralCube` loads the data as a `dask Array `_.
+Similar to the non-dask :class:`SpectralCube`, custom function can be used with
+:meth:`DaskSpectralCube.apply_function_parallel_spectral` and
+:meth:`DaskSpectralCube.apply_function_parallel_spatial`. Effectively these are
+wrappers on `dask.array.map_blocks `_
+and accept common kwargs.
+
+.. note::
+    The dask array can be accessed with `DaskSpectralCube._data` but we discourage
+    this as the builtin functions include checks, such as applying the mask to the
+    data.
+
+    If you have a use case needing on of dask array's other `operation tools `_
+    please raise an `issue on GitHub `_
+    so we can add this support!
+
+The :ref:`doc_dask` page gives a basic example of using a custom function. A more
+advanced example is shown in the `parallel fitting with dask tutorial `_.
+This tutorial demonstrates fitting a spectral model to every spectrum in a cube, applied
+in parallel over chunks of the data. This fitting example is a guide for using
+:meth:`DaskSpectralCube.apply_function_parallel_spectral` with:
+
+  * A change in array shape and dimensions in the output (`drop_axis` and `chunks` in `dask.array.map_blocks `_)
+  * Using dask's `block_info` dictionary in a custom function to track the location of a chunk in the cube
+
+.. TODO: UPDATE THE LINK TO THE TUTORIAL once merged
+

From f4f6c0bc0eec9aa61af6640ec64a1b080a613323 Mon Sep 17 00:00:00 2001
From: e-koch
Date: Wed, 26 Jan 2022 17:38:59 -0500
Subject: [PATCH 3/8] Catch some typos

---
 docs/developing_with_spectralcube.rst | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/developing_with_spectralcube.rst b/docs/developing_with_spectralcube.rst
index b14f4777f..3c9b6e739 100644
--- a/docs/developing_with_spectralcube.rst
+++ b/docs/developing_with_spectralcube.rst
@@ -6,14 +6,14 @@ Notes for development using spectral-cube

 spectral-cube is flexible and can used within other packages for
 development beyond the core package's capabilities. Two significant strengths
-are the use of memory-mapping and the integration with `dask `_
-(:ref:`doc_dask`) to efficiently larger than memory data.
+are the use of memory-mapping and the integration with `dask `_
+(:ref:`doc_dask`) to efficiently handle larger than memory data.

 This page provides suggestions for developing using spectral-cube in other
 packages.

 The following two sections give information on standard usage of :class:`SpectralCube`.
-The third discusses usage with dask integration.
+The third discusses usage with dask integration in :class:`DaskSpectralCube`.

 Handling large data cubes
 -------------------------
@@ -24,7 +24,8 @@ and provides options for executing operations with only subsets of the data
 (for example, the `how` keyword in `~SpectralCube.moment`).

 Masking operations can be performed "lazily", where the computation is completed
-only when a view of the underlying boolean mask array is returned (:ref:`doc_masking`).
+only when a view of the underlying boolean mask array is returned.
+See :ref:`doc_masking` for details on these implementations.

 Further strategies for handling large data is given in :ref:`_doc_handling_large_datasets`.

@@ -40,7 +41,7 @@ New methods can take advantage of these features by using creating custom functi
 to pass to :meth:`SpectralCube.apply_function_parallel_spatial` and
 :meth:`SpectralCube.apply_function_parallel_spectral`. These methods expect
 functions with a data and mask array input, with optional `**kwargs` that can be
-passed, and expect an output array of the same shape as the input.
+passed and expect an output array of the same shape as the input.


 Unifying large-data handling and parallelization with dask
@@ -61,7 +62,7 @@ the `Guide to Dask Optimization `_.

 For further development, we highlight the ability to apply custom functions using dask.
 A :class:`DaskSpectralCube` loads the data as a `dask Array `_.
-Similar to the non-dask :class:`SpectralCube`, custom function can be used with
+Similar to the non-dask :class:`SpectralCube`, custom functions can be used with
 :meth:`DaskSpectralCube.apply_function_parallel_spectral` and
 :meth:`DaskSpectralCube.apply_function_parallel_spatial`. Effectively these are
 wrappers on `dask.array.map_blocks `_

From 608f02dd498507f145a7feeff990267f30e88fb3 Mon Sep 17 00:00:00 2001
From: e-koch
Date: Wed, 26 Jan 2022 17:59:04 -0500
Subject: [PATCH 4/8] Fix ref

---
 docs/developing_with_spectralcube.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/developing_with_spectralcube.rst b/docs/developing_with_spectralcube.rst
index 3c9b6e739..603bee8e7 100644
--- a/docs/developing_with_spectralcube.rst
+++ b/docs/developing_with_spectralcube.rst
@@ -27,7 +27,7 @@ Masking operations can be performed "lazily", where the computation is completed
 only when a view of the underlying boolean mask array is returned.
 See :ref:`doc_masking` for details on these implementations.

-Further strategies for handling large data is given in :ref:`_doc_handling_large_datasets`.
+Further strategies for handling large data is given in :ref:`doc_handling_large_datasets`.


 Parallelizing operations

From 173252d4f3c04b8e06f1da57f67daf42f54a85ee Mon Sep 17 00:00:00 2001
From: e-koch
Date: Wed, 26 Jan 2022 18:01:59 -0500
Subject: [PATCH 5/8] Switch to method

---
 docs/developing_with_spectralcube.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/developing_with_spectralcube.rst b/docs/developing_with_spectralcube.rst
index 603bee8e7..ddd8d0d71 100644
--- a/docs/developing_with_spectralcube.rst
+++ b/docs/developing_with_spectralcube.rst
@@ -21,7 +21,7 @@ Handling large data cubes
 spectral-cube is specifically designed for handling larger-than-memory data
 and minimizes creating copies of the data. :class:`SpectralCube` uses memory-mapping
 and provides options for executing operations with only subsets of the data
-(for example, the `how` keyword in `~SpectralCube.moment`).
+(for example, the `how` keyword in :meth:`SpectralCube.moment`).

 Masking operations can be performed "lazily", where the computation is completed
 only when a view of the underlying boolean mask array is returned.

From 64536cd182bece1bd0d6c8613611703055e688df Mon Sep 17 00:00:00 2001
From: e-koch
Date: Wed, 26 Jan 2022 18:03:42 -0500
Subject: [PATCH 6/8] Fix comments

---
 docs/dask.rst                         | 3 ++-
 docs/developing_with_spectralcube.rst | 6 ++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/dask.rst b/docs/dask.rst
index b1c3544d5..abdb6abc7 100644
--- a/docs/dask.rst
+++ b/docs/dask.rst
@@ -28,7 +28,8 @@ Most of the properties and methods that normally work with :class:`~spectral_cub
 should continue to work with :class:`~spectral_cube.DaskSpectralCube`.

 For an interactive demonstration, see the `Guide to Dask Optimization `_.
-.. TODO: UPDATE THE LINK TO THE TUTORIAL once merged
+..
+   TODO: UPDATE THE LINK TO THE TUTORIAL once merged

 Schedulers and parallel computations
 ------------------------------------
diff --git a/docs/developing_with_spectralcube.rst b/docs/developing_with_spectralcube.rst
index ddd8d0d71..ae70463f6 100644
--- a/docs/developing_with_spectralcube.rst
+++ b/docs/developing_with_spectralcube.rst
@@ -58,7 +58,8 @@ of general usage and recommended practices, including:

 For an interactive demonstration of these features, see
 the `Guide to Dask Optimization `_.
-.. TODO: UPDATE THE LINK TO THE TUTORIAL once merged
+..
+   TODO: UPDATE THE LINK TO THE TUTORIAL once merged

 For further development, we highlight the ability to apply custom functions using dask.
 A :class:`DaskSpectralCube` loads the data as a `dask Array `_.
@@ -86,5 +87,6 @@ in parallel over chunks of the data. This fitting example is a guide for using
   * A change in array shape and dimensions in the output (`drop_axis` and `chunks` in `dask.array.map_blocks `_)
   * Using dask's `block_info` dictionary in a custom function to track the location of a chunk in the cube

-.. TODO: UPDATE THE LINK TO THE TUTORIAL once merged
+..
+   TODO: UPDATE THE LINK TO THE TUTORIAL once merged


From 055f9e09d8e2c23a710ac64adefa40a5a00529b5 Mon Sep 17 00:00:00 2001
From: e-koch
Date: Wed, 26 Jan 2022 18:09:30 -0500
Subject: [PATCH 7/8] Add line before comment

---
 docs/dask.rst                         | 1 +
 docs/developing_with_spectralcube.rst | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/dask.rst b/docs/dask.rst
index abdb6abc7..91a261135 100644
--- a/docs/dask.rst
+++ b/docs/dask.rst
@@ -28,6 +28,7 @@ Most of the properties and methods that normally work with :class:`~spectral_cub
 should continue to work with :class:`~spectral_cube.DaskSpectralCube`.

 For an interactive demonstration, see the `Guide to Dask Optimization `_.
+
 ..
    TODO: UPDATE THE LINK TO THE TUTORIAL once merged

diff --git a/docs/developing_with_spectralcube.rst b/docs/developing_with_spectralcube.rst
index ae70463f6..eebeb212e 100644
--- a/docs/developing_with_spectralcube.rst
+++ b/docs/developing_with_spectralcube.rst
@@ -58,6 +58,7 @@ of general usage and recommended practices, including:

 For an interactive demonstration of these features, see
 the `Guide to Dask Optimization `_.
+
 ..
    TODO: UPDATE THE LINK TO THE TUTORIAL once merged


From 29937e4a4c93c7fd4be47ad6fe626ab71ac8204a Mon Sep 17 00:00:00 2001
From: Eric Koch
Date: Thu, 27 Jan 2022 08:33:46 -0500
Subject: [PATCH 8/8] Apply suggestions from code review

Co-authored-by: Adam Ginsburg
---
 docs/developing_with_spectralcube.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/developing_with_spectralcube.rst b/docs/developing_with_spectralcube.rst
index eebeb212e..037fa1f37 100644
--- a/docs/developing_with_spectralcube.rst
+++ b/docs/developing_with_spectralcube.rst
@@ -9,7 +9,7 @@ development beyond the core package's capabilities. Two significant strengths
 are the use of memory-mapping and the integration with `dask `_
 (:ref:`doc_dask`) to efficiently handle larger than memory data.

-This page provides suggestions for developing using spectral-cube in other
+This page provides suggestions for software development using spectral-cube in other
 packages.

 The following two sections give information on standard usage of :class:`SpectralCube`.
@@ -37,11 +37,11 @@ using the `joblib `_ package. Builtin methods
 in :class:`SpectralCube` with the `parallel` keyword will enable using joblib.


-New methods can take advantage of these features by using creating custom functions
+New methods can take advantage of these features by creating custom functions
 to pass to :meth:`SpectralCube.apply_function_parallel_spatial` and
-:meth:`SpectralCube.apply_function_parallel_spectral`. These methods expect
-functions with a data and mask array input, with optional `**kwargs` that can be
-passed and expect an output array of the same shape as the input.
+:meth:`SpectralCube.apply_function_parallel_spectral`. These methods accept
+functions that take a data and mask array input, with optional `**kwargs`,
+and that return an output array of the same shape as the input.


 Unifying large-data handling and parallelization with dask