AICSImageIO 5.0 and Roadmap #372
-
Just a couple of additional random high-level thoughts. This is also an opportunity to make sure we have thought about functionality like mosaic, pyramid, multiscene, and RGB, while keeping the simplest API for the 80% code path. As we develop the spec of the new API, a goal is to keep it simple and sensible, with easy-to-use defaults for most scalar-valued bio images, and to minimize changes so that existing code requires as few modifications as possible. With the changes we discuss here, AICSImageIO would be constructed entirely of:
-
If we switch to a plugin model, we do this in part to reduce the maintenance burden of a single central repo that has ALL the file format readers. I wonder how we would "hand back" the currently implemented readers to their rightful maintainers (who we assume to be the domain experts on those particular file formats). We expect aicsimageio to maintain at least a couple of readers and writers for open file formats like TIFF and Zarr, but it's easy to see a future where third parties implement even more optimized versions for the same formats. Another key problem we have identified with a discoverable plugin system is assigning precedence when more than one plugin can read the same file format.
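To make the "discoverable plugin system" idea concrete, here is a minimal sketch of how installed reader plugins could be found through Python packaging entry points. The group name `aicsimageio.readers` and the function name are assumptions made for illustration; nothing in this thread fixes either one.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group name; the thread does not settle on one.
READER_GROUP = "aicsimageio.readers"


def discover_reader_plugins(group: str = READER_GROUP) -> list:
    """Return the entry points that installed packages advertise under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    return list(selected)


for ep in discover_reader_plugins():
    # ep.load() would import the plugin; here we only list what is advertised.
    print(ep.name, "->", ep.value)
```

Discovery like this only answers "which plugins are installed"; it says nothing about which one should win when several claim the same format, which is the precedence problem raised above.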
-
Prep for mini-hackathon:
-
reader precedence:
-
Homework for us both: https://github.com/danielballan/pims2-prototype Read
-
Updates from Hackathon
Done
Planned Soon
Planned Later
-
Oh -- as far as plugins go, is there a necessity to standardize permissible error types a (functional) plugin is permitted to / expected to use?
-
@evamaxfield @toloudis @BrianWhitneyAI Dan, Brian, & I briefly talked about how to determine which plug-in is best to accept when there are multiple plug-ins that support the same file format (like TIFF reader vs bioformats). This is the precedence we drafted:
See also this issue in bioio that describes how some plug-ins may be filtered out. My initial thoughts are:
-
I tend to agree that the "exhaustive search" (which lets every installed plugin try to open the file and read some bits to tell what type it is) is potentially unnecessary. I wonder what workflows truly depend on this, where they are batching over many files that have no file extensions and whose types are unknown. Also note that the extension ".ome.tiff" is distinct from ".tiff" and takes higher precedence because it is more specific.
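A rough sketch of the "more specific extension wins" rule, with no exhaustive search: each plugin declares the suffixes it handles, and the plugin with the longest matching suffix is chosen. The `PluginInfo` record, the plugin names, and the tie-break on install time (one option floated elsewhere in this thread) are all assumptions for illustration, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PluginInfo:
    """Hypothetical record describing one installed reader plugin."""
    name: str
    suffixes: Tuple[str, ...]  # e.g. (".ome.tiff", ".tiff")
    installed: datetime        # when the plugin was installed or last updated


def best_suffix_len(path: str, plugin: PluginInfo) -> int:
    """Length of the longest declared suffix that matches `path` (0 if none)."""
    lowered = path.lower()
    return max(
        (len(s) for s in plugin.suffixes if lowered.endswith(s.lower())),
        default=0,
    )


def choose_plugin(path: str, plugins: List[PluginInfo]) -> Optional[PluginInfo]:
    """Pick the plugin with the longest matching suffix; newest install breaks ties."""
    scored = [(best_suffix_len(path, p), p.installed, p) for p in plugins]
    scored = [item for item in scored if item[0] > 0]
    if not scored:
        return None  # no plugin claims this extension
    return max(scored, key=lambda item: (item[0], item[1]))[2]


# Made-up plugins, purely to exercise the rule.
plugins = [
    PluginInfo("plain-tiff", (".tiff", ".tif"), datetime(2023, 1, 1)),
    PluginInfo("ome-tiff", (".ome.tiff", ".ome.tif"), datetime(2022, 6, 1)),
    PluginInfo("catch-all", (".tiff", ".czi"), datetime(2023, 5, 1)),
]

print(choose_plugin("cells.ome.tiff", plugins).name)  # ome-tiff
print(choose_plugin("cells.tiff", plugins).name)      # catch-all
```

In the second call two plugins match ".tiff" equally well, so the more recently installed one is returned; a file with an unclaimed extension yields `None` rather than triggering a search.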
-
I think this is in line with our thoughts from long ago. I can think of a few potential use cases for exhaustive search (usually involving "no-file-extension blob storage"), but I think for a pre-release v5 it can be ignored and added later, in a release candidate or something.
I personally don't think the update / created datetime difference matters. Updated datetime hopefully means most recently upgraded which works for me.
Ahhh yes. I forget how the current implementation works, but I think priority should be: "length of suffix" match first, then datetime installed / updated, etc. Last little idea, which I would hold off on until release candidate season, is something like
-
Hello everybody! We are starting a discussion here as a means to figure out the general interest / a potential plan for an AICSImageIO 5.0.
Over the past couple of months we have seen a couple of new readers added, we have experienced some licensing issues, and we have heard some general gripes and wishes for this library.
Reasons for a 5.0
TL;DR:
We also want to minimize API changes to minimize the upgrade burden for existing code
Full Details
Make it easier for members of the community to implement their own readers
While we originally thought that our base reader specification was minimal in what a contributor would need to provide, there have been issues. First is the many complications of writing and maintaining a reader implementation (more on this later), but this specific target is meant to make it easier to contribute a reader that follows our spec. To that end we look to the plugin model (examples of inspiration come from fsspec and napari hookspec). Some goals and ideas for this are:
Reduce the maintenance on us / a single library
As we have added more readers to the library, we have become worried (as rightfully pointed out by members of the community) that it may become a problem if someone contributes a reader to the library but fails to maintain it and fix bugs.
Allow variability in which array-types and reading modes are supported
One of the things we noticed from new reader implementers was confusion as to "what all was needed." Some even asked the question: "Why can't I just support numpy.ndarray as the output type?" This is a totally valid question. Additionally, many users have expressed confusion about "what happens when they call get_image_data vs get_image_dask_data", and as we grow our supported dtypes, do we simply keep adding methods? This somewhat relates to how we already allow reader implementations to determine if they want to support mosaic file stitching or not.
In 5.0, Reader authors can choose to support output to numpy, dask array, or cupy arrays. If a reader is used in a mode that is not implemented, then exceptions should be raised. (Maybe a reader should be able to report which modes it supports, to avoid trial and error.) This should be extensible to new array types which would be unimplemented in readers by default.
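One way this could look, as a sketch only: a base class that declares no supported combinations by default, subclasses that opt in, and a single `get_image_data` entry point that raises when asked for something the reader does not implement. The `supported` attribute, the `array_type` parameter name, the `"immediate"` mode name, and the `_read` helper are all hypothetical; only `get_image_data` comes from this proposal.

```python
from typing import Any, FrozenSet, Tuple


class Reader:
    """Hypothetical base class; only the get_image_data name is from the proposal."""

    # (array type, mode) pairs this reader implements. Empty by default, so a
    # newly introduced array type starts out unimplemented in every reader.
    supported: FrozenSet[Tuple[str, str]] = frozenset()

    def get_image_data(self, array_type: str = "numpy", mode: str = "immediate") -> Any:
        if (array_type, mode) not in self.supported:
            raise NotImplementedError(
                f"{type(self).__name__} does not support array_type={array_type!r}, "
                f"mode={mode!r}; supported: {sorted(self.supported)}"
            )
        return self._read(array_type, mode)

    def _read(self, array_type: str, mode: str) -> Any:
        raise NotImplementedError


class NumpyOnlyReader(Reader):
    """A reader author who chooses to support in-memory numpy output only."""

    supported = frozenset({("numpy", "immediate")})

    def _read(self, array_type: str, mode: str) -> Any:
        # Stand-in for real pixel data (a nested list instead of an ndarray).
        return [[0, 0], [0, 0]]


reader = NumpyOnlyReader()
print(sorted(reader.supported))   # callers can inspect this before reading
print(reader.get_image_data())    # the one supported combination works

try:
    reader.get_image_data(array_type="dask", mode="delayed")
except NotImplementedError as err:
    print("unsupported:", err)
```

Exposing the supported combinations as data lets a caller check before reading instead of discovering the limits by trial and error, and it avoids adding a new `get_image_*_data` method for every array type.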
dtype and mode parameters to specify the underlying memory model: img = AICSImage("path", dtype="numpy", mode="delayed"). (Note: there is probably a better name than dtype, to avoid confusion)
Reader.get_image_data will remain the key entry point for getting pixel data out. The base class and utility code in aicsimageio will provide important basic supporting functionality around chunking, transpositions, etc.
Minor API and user interaction improvements:
AICSImageIO really tries to determine a lot of behavior for the user without them knowing. Some of those things are chunk_dims and mosaic handling and, crucially, even which Reader will be selected as default. But these have caused problems and confusion because the user has no information before attempting to load the file. A specific example of this behavior has occurred when the file is a large mosaic image that may take minutes to completely stitch together before returning the AICSImage object. The user should be able to determine how they want to load the object before they attempt to do so, with the defaults generally still being "good practice."
mode and dtype they want their data stored in. I.e. Reader.get_read_modes_available should return a list of the various modes available for that reader for that file. And Reader.get_metadata(file) should pop open the file, parse the metadata, and return some information that will be useful for users in determining which mode they want to open the file in or to simply access the metadata without all the pixel data loading.
Feedback
If you have any thoughts, ideas, concerns, or anything else you wish to share about these ideas please let us know! We are hoping to get a lot of feedback for a potential 5.0 API throughout the design and implementation process.
(in-place edits by @toloudis)