Some thoughts to get started #1
Thanks for this! I wanted to include this issue in the thread as well:
I think we should standardize this for zarr. In napari, the last two dimensions of the array are treated as (y, x). I've also seen conflicting things here with dimension ordering when parsing an OME-TIFF.
This makes a lot of sense. In some experimenting I've done, I re-initialized the dask array from the recently saved downsampled version, so in the next iteration dask doesn't need to recompute the previous downsampling, just the downsample from the current layer. A rough sketch of this is below.
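A minimal sketch of what I mean, with made-up paths and a hypothetical `write_pyramid` helper, assuming a 2D (y, x) base layer:

```python
import numpy as np
import dask.array as da

# Re-reading each level from zarr resets the dask graph, so computing
# level n + 1 is a single 2x reduction over stored data instead of a
# replay of every prior reduction from the base image.
def write_pyramid(base, path, levels=4):
    base.to_zarr(f"{path}/0", overwrite=True)
    for level in range(1, levels):
        # re-open the level just written; its graph is now a cheap read
        prev = da.from_zarr(f"{path}/{level - 1}")
        down = da.coarsen(np.mean, prev, {0: 2, 1: 2}, trim_excess=True)
        down.to_zarr(f"{path}/{level}", overwrite=True)
```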
How is this different from simply calling …?
In this case I would definitely suggest the final dimensions of the arrays be standardised to (y, x).
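A sketch of what that standardization could look like, assuming the reader reports an axes string like tifffile's `series.axes`; `standardize_axes` is a hypothetical helper, not an agreed-on API:

```python
import numpy as np

def standardize_axes(arr, axes, target="TCZYX"):
    """Reorder `arr` to follow `target`, adding length-1 axes for any
    dimension the source lacks, so y and x always come last."""
    for a in target:
        if a not in axes:
            arr = arr[np.newaxis]   # prepend a singleton axis
            axes = a + axes
    return arr.transpose([axes.index(a) for a in target])

# e.g. a (C, Y, X) plane becomes (T=1, C, Z=1, Y, X)
plane = standardize_axes(np.zeros((3, 512, 512)), axes="CYX")
```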
I've played a bit recently with creating tiled images from OME-TIFF. In early experimentation, the biggest bottleneck seems to be converting the data to the zarr format; right now it is painfully slow.
Yes, I looked a lot into the logic of TiffFile for reading, and essentially it creates an empty numpy array and then reads each tile into that array with the various workers. My thought, and what I'd tried a little already, was to stream those tiles directly to the zarr store.
If you look in the TiffFile source you can see that logic. Sorry, I just have clues; I wish I could show you a working version of what I wrote, but I didn't get terribly far and have a bunch of other balls in the air right now. I may find time early next week to revisit and make some decent running code.
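For reference, a rough sketch of that tile-streaming idea. It leans on newer tifffile releases (>= 2020.9), which can expose a tiled TIFF as a read-only zarr store, so this is an assumption about tooling rather than what was tried above; `"image.ome.tif"` is a placeholder path:

```python
import dask.array as da
import tifffile
import zarr

store = tifffile.imread("image.ome.tif", aszarr=True)
tiles = zarr.open(store, mode="r")        # assuming a single-level series

# Chunks follow the TIFF tile grid, so each worker decodes one tile and
# writes it straight to the destination store; no full-plane numpy
# array is ever allocated.
da.from_zarr(tiles).to_zarr("base_level.zarr", overwrite=True)
store.close()
```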
No worries! Thanks for the pointer. I'll have a look into this as well.
I've been stewing a little about how best to convert images to zarr. It seems the primary focus will be whole slide images (WSIs) and pyramidal formats for efficient serving of large planes of high-resolution data, with an obvious focus on microscopy, but we should be open to other spatial data like astronomy or GIS as a low priority. A second focus will be imaging mass spectrometry (IMS) data, where the broadband spectral nature adds significantly more considerations for how the data is stored and served, i.e. serving image data vs. serving spectral data. We may in fact want to package something like `ims2zarr` eventually. My reading of the temperature in the IMS community tells me people don't want to use imzML too much longer, but its ubiquity at the vendor and analysis-software levels propagates it.

My 1000 ft view for design is a Python class for writing the `zarr` stores, back-ended with different image readers that recognize different image formats, or simply pass in a `numpy` array (with caveats listed below); see the dispatch sketch after the reader list below.

**Important image metadata**
- Dimension ordering: is it standardized in the `zarr` store, or does the format's original ordering persist?
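A minimal sketch of recording that decision per store; the attribute names here are placeholders, not an established spec:

```python
import zarr

root = zarr.open_group("image.zarr", mode="a")
root.attrs["axes"] = "TCZYX"            # the standardized order
root.attrs["original_axes"] = "XYCZT"   # as reported by the source reader
```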
**Readers**

- `tifffile`: handles most TIFFs, can extract TIFF metadata, and reads pyramid levels in already-pyramidalized images. It gives access to tiles from tiled TIFFs with some "hacks" I've made, which lets us avoid loading entire image planes into memory, some of which may exceed memory on even high-performance machines.
- `czifile`: reads data from Zeiss microscopes. It can access the native high-resolution tiles; again, it has to be hacked a little.
- `openslide` and its Python bindings: reads many RGB-format whole slide images. This has some overlap with `tifffile`, but I'm not sure of the extent. I'd default to `tifffile` as it's not a dead project like `openslide` appears to be.
- `bioformats-python` and `javabridge`: the combination of these Python packages gives access to everything `bioformats` can read, which is substantial.
- `ITK` (and `SimpleITK`): gives access to medical image data like MRI, CT, etc. These images have a lot of different considerations.
- Plain `numpy` array: some people may wish to process their data in Python, so having this option is important. Here it may be more difficult to define important metadata.
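To make the class idea concrete, a bare-bones dispatch sketch; every name here is hypothetical, and the real readers listed above would each slot in behind the same (array, metadata) interface:

```python
import numpy as np

class ZarrWriter:
    readers = {}

    @classmethod
    def register(cls, *extensions):
        def wrap(reader):
            for ext in extensions:
                cls.readers[ext] = reader
            return reader
        return wrap

    def read(self, path):
        ext = path.rsplit(".", 1)[-1].lower()
        if ext not in self.readers:
            raise ValueError(f"no reader registered for .{ext}")
        return self.readers[ext](path)  # -> (array, metadata dict)

    def from_array(self, arr, metadata=None):
        # plain-numpy escape hatch: metadata must be supplied by hand
        return np.asarray(arr), metadata or {}

@ZarrWriter.register("tif", "tiff")
def read_tiff(path):
    import tifffile
    return tifffile.imread(path), {}  # metadata extraction elided
```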
**Writing pyramids**

My current thinking on this is to use the above readers to initialize a `zarr` store with the image's highest resolution, or base layer, and then use `dask` to produce the lower-resolution layers with `da.coarsen`. Each layer should optionally be Gaussian or Laplacian smoothed prior to downsampling so the lower-resolution pyramid layers look less pixelated; `dask_image` has such filtering already. Each layer should have its own metadata with the things listed above. For reading and initializing the store with the base layer, I would like to emphasize a limited memory footprint, so tiled images where we can randomly access smaller parts of the data set are a plus.

An important question may be high-resolution 3D data like cryo-EM or any reconstructed serial-section microscopy, but that seems more long-term at the moment, and there are already some specific formats for that which we may think of converting to `zarr`.
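For the smoothing plus coarsening step, a minimal sketch assuming a 2D (y, x) layer; `downsample` is my naming, and the sigma choice is a guess, not a tuned value:

```python
import numpy as np
import dask.array as da
from dask_image import ndfilters

def downsample(layer, factor=2, smooth=True):
    if smooth:
        # Gaussian pre-filter so the coarsened layer aliases less and
        # looks less pixelated at low resolution
        layer = ndfilters.gaussian_filter(layer, sigma=factor / 2.0)
    return da.coarsen(np.mean, layer, {0: factor, 1: factor}, trim_excess=True)
```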
Ok, where to start...
I see two things that are priorities: getting the pyramid writing highly optimized, and getting readers for many different types of data.
Maybe we can discuss a little more here and bring in more major topics, then divvy up smaller actionable items as separate issues, add milestones to the repo, etc.