Feature/dask poc #192

Merged
merged 48 commits into from
Aug 26, 2022
48 commits
4cbe166
quick first attempt to use dask
toloudis Apr 14, 2022
b24e385
make scale.nearest work with dask, hopefully
toloudis Apr 15, 2022
bcb9e72
fiddling with passing the right args to to_zarr... now compute runs b…
toloudis Apr 15, 2022
c144548
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 15, 2022
c721cfb
remove commented line
toloudis Apr 15, 2022
2546373
try to fix the resizing and dask usage
toloudis Apr 15, 2022
cada2e7
Merge branch 'feature/dask-poc' of https://github.com/toloudis/ome-za…
toloudis Apr 15, 2022
dda7767
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 15, 2022
366062e
more consistent chunk size
toloudis Apr 19, 2022
d09e8ff
WIP - try to use da.compute(*delayed) - fast but buggy
will-moore Apr 22, 2022
d9cc16c
remove group.create_dataset() which creates float dtype arrays
will-moore Apr 25, 2022
336218e
Use ngff-writer.dask_utils.resize(). Remove write_multiscale()
will-moore Apr 26, 2022
443c565
Add dask_utils from ngff-writer
will-moore May 5, 2022
3edad53
Remove hard-coded unnecessary chunks_opt
will-moore May 5, 2022
dc89c44
Merge pull request #1 from will-moore/dask-poc
toloudis May 11, 2022
4209f89
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 11, 2022
0d5a795
will moore's changes plus minor tweaks
toloudis May 11, 2022
a18c33c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 11, 2022
09cb6bc
Merge branch 'feature/dask-poc' of https://github.com/toloudis/ome-za…
toloudis May 11, 2022
f93b179
fix mypy errors
toloudis May 11, 2022
02bbb2f
try to fix more mypy temporarily
toloudis May 11, 2022
6de1732
Add tox and pre-commit info to README
will-moore May 12, 2022
cb1a4c7
test commit
will-moore May 12, 2022
00f8b41
fix docstrings in dask_utils
will-moore May 12, 2022
489d5bd
Use Any for args and kwargs
will-moore May 13, 2022
cb39a08
flake8 fixes
will-moore May 13, 2022
f17861f
Add back write_multiscale() - without dask support
will-moore May 13, 2022
8f44127
test fix
will-moore May 13, 2022
f358779
Port PR #199 to this branch
will-moore May 16, 2022
ccd299f
Add test_write_image_dask()
will-moore May 16, 2022
611a0a6
Merge pull request #2 from will-moore/dask-poc
toloudis May 16, 2022
3fde9e5
Use scaler.nearest(pixels) for data astronaut()
will-moore May 18, 2022
167404a
Merge pull request #3 from will-moore/dask-poc
toloudis May 18, 2022
61a7440
Test that storage_options:chunks are respected
will-moore May 19, 2022
8347659
Merge pull request #4 from will-moore/dask-poc
will-moore May 19, 2022
56c807c
Merge remote-tracking branch 'upstream/master' into feature/dask-poc
toloudis Jul 3, 2022
e0a07a1
minor cleanup
toloudis Jul 3, 2022
3eb6616
add dask support to a couple other public write functions and make sc…
toloudis Jul 3, 2022
b3aa401
make tox tests pass by ensuring that multiscales metadata always has …
toloudis Jul 3, 2022
9d18ea5
fix mypy
toloudis Jul 4, 2022
62f641d
fix mypy again
toloudis Jul 4, 2022
79f31f4
add writer tests for both dask and ndarrays
toloudis Jul 10, 2022
a913166
Merge remote-tracking branch 'upstream/master' into feature/dask-poc
toloudis Jul 10, 2022
1b5ce35
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 10, 2022
b7a0eb3
fix test
toloudis Jul 10, 2022
19831b6
fix test
toloudis Jul 10, 2022
ed938d1
typings
toloudis Jul 28, 2022
607b85c
Merge branch 'ome:master' into feature/dask-poc
toloudis Jul 28, 2022
13 changes: 13 additions & 0 deletions README.rst
@@ -18,6 +18,19 @@ It can be built locally with:
pip install sphinx
sphinx-build -b html docs/source/ docs/build/html

Tests
-----

Tests can be run locally via `tox` with:

$ pip install tox
$ tox

To enable pre-commit code validation:

$ pip install pre-commit
$ pre-commit install

Release process
---------------

75 changes: 75 additions & 0 deletions ome_zarr/dask_utils.py
@@ -0,0 +1,75 @@
from typing import Any, Tuple

import numpy as np
import skimage.transform
from dask import array as da

# This module contributed by Andreas Eisenbarth @aeisenbarth
# See https://github.com/toloudis/ome-zarr-py/pull/1


def resize(
    image: da.Array, output_shape: Tuple[int, ...], *args: Any, **kwargs: Any
) -> da.Array:
    r"""
    Wrapped copy of "skimage.transform.resize"
    Resize image to match a certain size.
    :type image: :class:`dask.array`
    :param image: The dask array to resize
    :type output_shape: tuple
    :param output_shape: The shape of the resized array
    :type \*args: list
    :param \*args: Arguments of skimage.transform.resize
    :type \*\*kwargs: dict
    :param \*\*kwargs: Keyword arguments of skimage.transform.resize
    :return: Resized image.
    """
    factors = np.array(output_shape) / np.array(image.shape).astype(float)
    # Rechunk the input blocks so that each block scales to an output
    # block whose size is a whole number of pixels.
    better_chunksize = tuple(
        np.maximum(1, np.round(np.array(image.chunksize) * factors) / factors).astype(
            int
        )
    )
    image_prepared = image.rechunk(better_chunksize)
    block_output_shape = tuple(
        np.floor(np.array(better_chunksize) * factors).astype(int)
    )

    # Resize each block independently with map_blocks (no overlap between blocks)
    def resize_block(image_block: da.Array, block_info: dict) -> da.Array:
        return skimage.transform.resize(
            image_block, block_output_shape, *args, **kwargs
        ).astype(image_block.dtype)

    output_slices = tuple(slice(0, d) for d in output_shape)
    output = da.map_blocks(
        resize_block, image_prepared, dtype=image.dtype, chunks=block_output_shape
    )[output_slices]
    return output.rechunk(image.chunksize).astype(image.dtype)
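
To make the rechunking arithmetic in `resize()` concrete, here is a minimal pure-Python sketch of the same calculation for one case. The helper name `plan_chunks` is hypothetical and not part of `ome_zarr`; it only mirrors the expressions for `factors`, `better_chunksize` and `block_output_shape`, using the built-in `round`, which, like `np.round`, rounds halves to even.

```python
import math
from typing import Tuple


def plan_chunks(
    shape: Tuple[int, ...],
    chunksize: Tuple[int, ...],
    output_shape: Tuple[int, ...],
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Hypothetical helper: reproduce the chunk arithmetic used by resize()."""
    factors = [out / float(size) for out, size in zip(output_shape, shape)]
    # Input chunk size adjusted so that chunk * factor is close to a whole number.
    better_chunksize = tuple(
        int(max(1, round(chunk * f) / f)) for chunk, f in zip(chunksize, factors)
    )
    # Shape that each rechunked input block is resized to.
    block_output_shape = tuple(
        int(math.floor(chunk * f)) for chunk, f in zip(better_chunksize, factors)
    )
    return better_chunksize, block_output_shape


# A 1024 x 768 image with 65 x 65 chunks, downscaled by 4 in each dimension.
better, block_out = plan_chunks((1024, 768), (65, 65), (256, 192))
print(better, block_out)  # (64, 64) (16, 16)
```

In this case the image splits into 16 x 12 blocks of 64 pixels, each resized to 16 pixels, which gives exactly 256 x 192, so the final `output_slices` crop removes nothing.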


def downscale_nearest(image: da.Array, factors: Tuple[int, ...]) -> da.Array:
    """
    Primitive downscaling by integer factors using stepped slicing.
    :type image: :class:`dask.array`
    :param image: The dask array to resize
    :type factors: tuple
    :param factors: Sequence of integer factors, one for each dimension.
    :return: Resized image.
    """
    if len(factors) != image.ndim:
        raise ValueError(
            f"Dimension mismatch: {image.ndim} image dimensions, "
            f"{len(factors)} scale factors"
        )
    if not (
        all(isinstance(f, int) and 0 < f <= d for f, d in zip(factors, image.shape))
    ):
        raise ValueError(
            "All scale factors must be positive integers no greater than "
            f"the dimension length: ({tuple(factors)}) <= ({tuple(image.shape)})"
        )
    slices = tuple(slice(None, None, factor) for factor in factors)
    return image[slices]
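
As a rough illustration of what stepped slicing does to the array shape, the sketch below applies the same validation and computes the resulting shape without needing dask. The helper `stepped_shape` is hypothetical (not part of `ome_zarr`), and the `(t, c, z, y, x)` axis order in the example is only an assumption for illustration.

```python
from typing import Tuple


def stepped_shape(shape: Tuple[int, ...], factors: Tuple[int, ...]) -> Tuple[int, ...]:
    """Hypothetical helper: shape produced by image[::f0, ::f1, ...]."""
    if len(factors) != len(shape):
        raise ValueError("Dimension mismatch")
    if not all(isinstance(f, int) and 0 < f <= d for f, d in zip(factors, shape)):
        raise ValueError("Scale factors must be positive integers <= dimension length")
    # slice(None, None, f) keeps indices 0, f, 2f, ..., i.e. ceil(d / f) elements.
    return tuple(len(range(0, d, f)) for d, f in zip(shape, factors))


# The same slicing rule on a plain list: every third element is kept.
print(list(range(10))[::3])  # [0, 3, 6, 9]
# A 5D (t, c, z, y, x) shape downscaled by 2 in y and x only.
print(stepped_shape((1, 3, 10, 512, 512), (1, 1, 1, 2, 2)))  # (1, 3, 10, 256, 256)
print(stepped_shape((5, 7), (2, 3)))  # (3, 3)
```

Because no interpolation is involved, this kind of downscaling is cheap but keeps only every n-th sample along each axis.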
2 changes: 1 addition & 1 deletion ome_zarr/reader.py
@@ -500,7 +500,7 @@ def get_pyramid_lazy(self, node: Node) -> None:
         well_node = Node(well_zarr, node)
         well_spec: Optional[Well] = well_node.first(Well)
         if well_spec is None:
-            raise Exception("could not find first well")
+            raise Exception("Could not find first well")
         self.numpy_type = well_spec.numpy_type

         LOGGER.debug(f"img_pyramid_shapes: {well_spec.img_pyramid_shapes}")
45 changes: 42 additions & 3 deletions ome_zarr/scale.py
@@ -7,8 +7,9 @@
 import os
 from collections.abc import MutableMapping
 from dataclasses import dataclass
-from typing import Callable, Iterator, List
+from typing import Any, Callable, Iterator, List, Tuple, Union

+import dask.array as da
 import numpy as np
 import zarr
 from scipy.ndimage import zoom
@@ -19,10 +20,14 @@
     resize,
 )

+from .dask_utils import resize as dask_resize
 from .io import parse_url

 LOGGER = logging.getLogger("ome_zarr.scale")

+ListOfArrayLike = Union[List[da.Array], List[np.ndarray]]
+ArrayLike = Union[da.Array, np.ndarray]
Comment on lines +28 to +29

Perhaps in the future these could be some kind of general Array (once that is more fleshed out).
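
One possible shape for such a general array type is a structural `typing.Protocol`. The sketch below is purely illustrative: `ArrayProtocol` and `FakeArray` are hypothetical names, and the choice of `shape` and `dtype` as the required attributes is an assumption, not something this PR or any library defines.

```python
from dataclasses import dataclass
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrayProtocol(Protocol):
    """Hypothetical structural type; not part of ome_zarr, numpy or dask."""

    shape: Tuple[int, ...]
    dtype: Any


@dataclass
class FakeArray:
    """Stand-in for an array object, used only to exercise the protocol."""

    shape: Tuple[int, ...]
    dtype: Any


def describe(image: ArrayProtocol) -> str:
    # Works for any object exposing .shape and .dtype, whatever its library.
    return f"{len(image.shape)}D array of {image.dtype}"


print(isinstance(FakeArray((4, 4), "uint8"), ArrayProtocol))  # True
print(describe(FakeArray((1, 3, 256, 256), "uint16")))  # 4D array of uint16
```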



 @dataclass
 class Scaler:
@@ -125,6 +130,30 @@ def __create_group(
             series.append({"path": path})
         return grp

+    def resize_image(self, image: ArrayLike) -> ArrayLike:
Member

This method differs from all the others on the Scaler class in that it returns a single array, not a pyramid of arrays. It is only used when the image passed to writer.write_image(image, group, scaler) is a dask array; for non-dask data, mip = scaler.nearest(image) is used to create a pyramid. If you pass in a scaler instance that doesn't implement scaler.resize_image(), then write_image(dask_image, group, scaler) will fail. This is unexpected and not documented anywhere.

When a scaler instance is passed to write_image(), none of its other methods (gaussian(), laplacian(), local_mean(), zoom()) are used. If you wish to use one of these methods for downsampling (non-dask only), you need to do so before passing the data to write_multiscale(). So maybe these methods should live on a different Scaler class from the one used for write_image(i, g, scaler), and the same scaling method (returning a single image, not a pyramid) should be used for both dask and non-dask data?

The write_image() docs for the scaler parameter state "If this is None, no downsampling will be performed.", but that isn't true for dask_image data: this will fail if scaler is None.

Member

I guess that none of these issues is a blocker. Improving the Scaler documentation and usage, using scaler.resize_image() instead of scaler.nearest() for dask AND non-dask data, possibly removing other methods from the class, etc. could come in a follow-up PR (even a follow-up release). cc @joshmoore?

Probably the most important fix is handling write_image(dask_data, group, scaler=None), which I think would currently fail.

Contributor Author

I would agree that dask versions of gaussian, laplacian, etc could be done in later PRs.
I can try to do one more pass this afternoon to make sure scaler=None does not break with dask array.
As I'm sure is true with many others, I am splitting my time among many different projects, so thanks for living with these intermittent pieces of work on this PR.

Contributor Author

Had a quick look and I am already handling scaler=None in _write_dask_image: (1) max_layer is set to 0 if scaler is None, and (2) scaler.resize_image is not called if scaler is None.
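
The two points in this reply can be sketched roughly as follows. The body of `_write_dask_image` is not shown in this section, so this is a hypothetical illustration of the described control flow rather than the actual implementation; `build_levels` and `HalvingScaler` are made-up names, and a 1D list stands in for image data.

```python
from typing import Any, List, Optional


class HalvingScaler:
    """Hypothetical stand-in for a scaler; only the attributes used below."""

    max_layer = 2

    def resize_image(self, image: List[int]) -> List[int]:
        # Toy "resize": keep every second element of a 1D list.
        return image[::2]


def build_levels(image: List[int], scaler: Optional[Any]) -> List[List[int]]:
    """Hypothetical sketch of the scaler=None handling described above."""
    # (1) With no scaler there are no extra layers, only the full-resolution image.
    max_layer = 0 if scaler is None else scaler.max_layer
    levels = [image]
    for _ in range(max_layer):
        # (2) resize_image is only reached when a scaler was supplied.
        image = scaler.resize_image(image)
        levels.append(image)
    return levels


print(build_levels(list(range(8)), None))  # [[0, 1, 2, 3, 4, 5, 6, 7]]
print(len(build_levels(list(range(8)), HalvingScaler())))  # 3
```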

"""
+        Resize a numpy array OR a dask array to a smaller array (not pyramid)
+        """
+        if isinstance(image, da.Array):
+
+            def _resize(image: ArrayLike, out_shape: Tuple, **kwargs: Any) -> ArrayLike:
+                return dask_resize(image, out_shape, **kwargs)
+
+        else:
+            _resize = resize
+
+        # only down-sample in X and Y dimensions for now...
+        new_shape = list(image.shape)
+        new_shape[-1] = np.ceil(float(image.shape[-1]) / self.downscale)
+        new_shape[-2] = np.ceil(float(image.shape[-2]) / self.downscale)
+        out_shape = tuple(new_shape)
+
+        dtype = image.dtype
+        image = _resize(
+            image.astype(float), out_shape, order=1, mode="reflect", anti_aliasing=False
+        )
+        return image.astype(dtype)
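
Note that resize_image rounds the new Y and X sizes up, whereas __nearest uses floor division (sizeY // self.downscale). A small sketch of the difference, using hypothetical helper names that only mirror the two expressions in this diff:

```python
import math


def resize_image_size(size: int, downscale: int) -> int:
    """Size along Y or X as computed in resize_image (rounds up)."""
    return math.ceil(float(size) / downscale)


def nearest_size(size: int, downscale: int) -> int:
    """Size along Y or X as computed in __nearest (floor division)."""
    return size // downscale


for size in (512, 513):
    print(size, resize_image_size(size, 2), nearest_size(size, 2))
# 512 256 256
# 513 257 256
```

The two code paths agree when the size is divisible by the downscale factor and can differ by one pixel otherwise.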

     def nearest(self, base: np.ndarray) -> List[np.ndarray]:
         """
         Downsample using :func:`skimage.transform.resize`.
@@ -133,9 +162,19 @@ def nearest(self, base: np.ndarray) -> List[np.ndarray]:
         """
         return self._by_plane(base, self.__nearest)

-    def __nearest(self, plane: np.ndarray, sizeY: int, sizeX: int) -> np.ndarray:
+    def __nearest(self, plane: ArrayLike, sizeY: int, sizeX: int) -> np.ndarray:
         """Apply the 2-dimensional transformation."""
-        return resize(
+        if isinstance(plane, da.Array):
+
+            def _resize(
+                image: ArrayLike, output_shape: Tuple, **kwargs: Any
+            ) -> ArrayLike:
+                return dask_resize(image, output_shape, **kwargs)
+
+        else:
+            _resize = resize
+
+        return _resize(
             plane,
             output_shape=(sizeY // self.downscale, sizeX // self.downscale),
             order=0,