Multiscale metadata #102
Regarding my description of separate …
The …
The user can also directly modify the units and coefficients of the …
For intermediate coordinate spaces like …
@d-v-b mentioned that specifying the …
In general I would say there is not necessarily any single …
Another possibly relevant example: …
Even for digital downsampling, each scale level typically has a different translation applied to it, and that translation depends on the type of resampling procedure applied during downsampling. This translation should be specified explicitly in metadata. And if the translation is specified explicitly, we really should specify scale as well. Scale and translation completely specify the downsampling of a grid; failing to specify them requires baking assumptions about downsampling routines into the format, and this should be avoided. I understand that 99% of the time we might be doing 2x windowed averaging, and most data viewers automatically assume the offsets generated by this procedure, but this spec should not make such assumptions when being explicit is so easy. If someone wants to be weird and generate a multiscale collection by first downsampling by 2, then by 3, then by 4, while always resampling on a grid starting at 0, this should be consistent with the spec (and data viewers should be able to handle this).
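To make that concrete, here is a small sketch (my own illustration; the function name and numbers are made up, and the formula assumes plain non-overlapping windowed averaging) of the per-axis scale and translation that a downsampled level should carry:

```python
from fractions import Fraction

def windowed_average_transform(base_scale, base_translation, factors):
    """Per-axis scale/translation of a level produced by windowed averaging.

    If sample i of the base grid sits at base_translation + i * base_scale,
    averaging non-overlapping windows of `factor` samples puts sample j of
    the new grid at the window centre: new_translation + j * factor * base_scale.
    """
    new_scale, new_translation = [], []
    for s, t, f in zip(base_scale, base_translation, factors):
        s, t, f = Fraction(s), Fraction(t), Fraction(f)
        new_scale.append(f * s)                      # coarser sample spacing
        new_translation.append(t + (f - 1) * s / 2)  # shift to the window centre
    return new_scale, new_translation

# 2x windowed averaging of a 4 nm grid starting at 0:
# the spacing becomes 8 nm and the first sample sits at 2 nm, not 0.
print(windowed_average_transform([4, 4, 4], [0, 0, 0], [2, 2, 2]))
```

Chaining the same formula for factors 2, then 3, then 4 gives a different translation at every level, which is exactly why the metadata should state it explicitly.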
First, you are describing multiple transformations from dataset -> world, not multiple world coordinate spaces. I'm not sure what conditions would result in multiple world coordinate spaces. I've never worked with data with ambiguous axis semantics; maybe there are some examples? Second, I don't think ome-ngff metadata should support multiple alignments of the same dataset. That seems way too complex. Instead, I would stipulate that there are just 2 coordinate spaces in scope for the format: dataset and world. This makes everything (in particular, transformations) much simpler, and it's very close to how most people think about their data when it comes off an instrument. This of course is just my opinion, so I would be curious to hear from other people.
@d-v-b I 100% agree that we should be very explicit about the downsampling factors and the translation due to the downsampling method. To me it seems most natural to represent that by specifying the downsample factors and offsets relative to the base resolution, i.e. in terms of base-resolution voxels rather than some physical coordinate space, but I can see advantages to both approaches. I think of the …

I would agree that in the simple case where you have a single image and you are just adding in the nominal voxel size from the microscope, so your transformation is really just specifying your units, maybe it is reasonable to always work in the same physical coordinate space and so you only need a single …
Each channel of each of these volumes may be represented by a bunch of 2-d images, each image with its own nominal coordinate space from the microscope parameters. In some cases you may wish to view the 2-d images directly. Additionally, we may produce individual alignments of each channel of each volume, and then align one volume to another. Then there may also be a "reference" coordinate system for the organism, and we may wish to align the data to this reference coordinate system. In total we may have: …
Depending on the stage of processing, I can imagine that all of these coordinate spaces may be relevant for visualization and/or processing tasks. It seems like the "spaces" proposal by @bogovicj (#94) would address all of this, though. In general it just seems more natural to me to attach coordinate transformations to a named "view" rather than to the array itself, since we can have arbitrarily many views but could attach just a single coordinate transformation to the array itself, and then we also have the potential ambiguity of whether we want to refer to the "raw" array or the transformed array. I suppose the intention may be that a view would be layered on top of a multiscale, rather than underneath it, though?
Thanks for branching this out into a separate issue, @d-v-b. I didn't have time to read everything you and @jbms wrote carefully, but I'll leave a few comments. I think this issue is bringing up three related but distinct points.

1. Meaning of coordinateTransformation and spaces
The point that I wanted to make (and that @jbms also made again here) is that there can be multiple "world" coordinate spaces (e.g. different registrations, different alignments, etc.) and that …
Indeed, it's not complete yet, but it is on the immediate roadmap and discussed in #94. I think the only actionable thing to do here is to work on #94, #101 and follow-ups to extend the definition of spaces and transformations and make sure that they are clear.

2. Where do we define array-specific metadata (axes and transformations)?

I think this summarises the potential changes to metadata very well:
There are advantages / disadvantages to both solutions. I don't have a very strong opinion on this a priori, except for the sunk cost that all tools currently supporting ome.zarr are built with "consolidated" metadata at the group level in mind, and that this would be a rather large architectural change.

3. How do we specify downscaling?

Raised by @jbms:
This point was quite extensively discussed already (sorry, I can't find the exact Issue/PR for it right now) and there was strong support for using transformations for each of the scale levels instead of downsampling factors. So I would be very much in favor of not opening this discussion again.

In summary:
@d-v-b, does your 👍 on @constantinpape's summary mean that we can see this issue primarily as added discussion for #94 & #101? If so, do we need to keep it open or find it again when the time comes? A few additions from my side "inline":
The ability to move arrays is something that has come up a few times in various contexts. At some point we should probably talk through what type of requirement this is (MUST, SHOULD, etc). It will impact various other parts of the spec like naming conventions.
I don't have any concrete numbers but I know that the xarray community is quite convinced of the savings of zarr-level consolidated metadata.
Reading through the discussion above, I do wonder whether we shouldn't consider multiscales, moving forward, as just a short-hand for the more complete model that's being discussed. It might be that when the transforms are in, we will be faced with deciding whether or not that short-hand has a place. Options I can imagine:
Probably the question is if there are any MUST fix issues here. If not, perhaps we can note design guidelines/lessons that we can apply as the spec evolves.
In general, from my side 👍 for avoiding duplication, or at least finding strategies for dealing with it. As for the group metadata, I'd add to CP's point:
that there will inevitably be refactorings over the next milestones which we can use to re-evaluate these layouts, but that if there are no complete blockers, focusing on adding user value to bring in more applications is probably better bang for our buck, which is maybe just another way of saying:
Yes, as long as one (or both) of those issues takes up the question of multiscale metadata. But I'm not sure the discussion of consolidated metadata fits in either of those issues (and sorry for putting so much stuff in this issue)...
Regardless of what this spec says, anyone with a standard zarr library can open an array directly, copy it, etc. So the spec should probably treat that access pattern as a given and design around it, which in my mind entails putting array metadata with arrays. That being said, it's possible to imagine wrapping array access in a higher-level API that doesn't foreground direct array access (e.g., xarray, which serializes a dataarray to a zarr group + collection of zarr arrays). But for xarray, this is necessary because the xarray data model involves multiple arrays (data and coordinates).
Agreed, it can be a big performance win. If this is an attractive angle, the right way to approach this is to consider formally supporting metadata consolidation as a transformation of unconsolidated metadata, instead of baking consolidation into the semantics of the metadata. This might mean defining an "ome-ngff data model", which could be serialized in multiple ways (maybe just two ways: consolidated and unconsolidated metadata), as opposed to a concrete specification of how exactly the metadata in a zarr container should look. Maybe this should be yet another issue...
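For reference, zarr-python (2.x API) already treats consolidation roughly this way: the consolidated .zmetadata key is derived from, and redundant with, the unconsolidated metadata. A minimal sketch (the paths and attribute contents are illustrative):

```python
import zarr

store = zarr.DirectoryStore("image.zarr")
root = zarr.group(store=store, overwrite=True)
root.create_dataset("scale-level0", shape=(64, 64), chunks=(32, 32), dtype="u2")
root.attrs["multiscales"] = [{"datasets": [{"path": "scale-level0"}]}]

zarr.consolidate_metadata(store)        # writes .zmetadata next to the originals
group = zarr.open_consolidated(store)   # one metadata fetch instead of many
print(group["scale-level0"].shape)
```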
I was kind of hoping that the multiscale spec would have little or no interaction with the specification of spaces and transforms. This was (I thought) the conclusion of zarr-developers/zarr-specs#50. I don't think the underlying semantics of a multiscale collection of images is at all complicated -- it's a list of images, each with spatial metadata, with a convention for ordering (increasing grid spacing), and even this could be relaxed to a SHOULD. Spatial metadata for each image composes with this "mere list of images" idea. And I'm happy to discuss this further in the spaces / transformations issues, even if just to say "multiscales should compose with this". All that being said, I don't think there are any MUST fix issues.
@d-v-b wrote:
I think there are potentially two forms of multiscale array: Type 1 (discrete coordinate space): The transforms between scales are strictly translation and scale-only, and furthermore these translation and scale factors are all rational numbers (with small denominators), and often powers of two. You can do useful discrete/integer indexing with this type of multiscale volume. This is by far the most common form of multiscale array in my experience, and in particular is what you normally get by digitally downsampling a single base scale. Type 2 (continuous coordinate space): The transforms between scales are arbitrary, may involve affine transforms or even displacement fields. Integer indexing is not useful for this type of volume --- you will almost surely do everything via continuous coordinates and interpolation. This is what you might get from imaging at multiple optical zoom levels. This is strictly a generalization of type 1. (This type of multiscale volume is similar to the internal representation used by Neuroglancer.) The current proposal, in that it allows arbitrary transformations between scales, seems to be geared towards type 2. In my mind, it seems very natural that OME-zarr concern itself primarily with continuous coordinate space stuff, and there could be a separate zarr-multiscale standard for handling type 1. I would say though that since type 1 is by far more common, it may make sense to focus on standardizing type 1 multiscale arrays first. I think this distinction also relates to the previous discussion of whether to represent scales in terms of downsampling factors or in terms of "absolute" physical units. For type 1 I think there is a clear case to represent the scales via rational number downsampling factors (since that preserves the ability to do discrete indexing), while for type 2 it is less clear. |
@jbms I'm not sure I follow the logic here. In all cases arrays are defined over a finite set of coordinates (the array indices). When these coordinates are mapped into world coordinates via an affine transform (or any other bijective transform), the cardinality of the coordinates is unchanged. A visualization tool may introduce a continuous coordinate space by rendering data at coordinates between true coordinate values via interpolation, but this is purely a concern of that tool.
@d-v-b One example I have in mind is applying a neural network model that takes input patches at multiple scale levels that are supposed to be aligned to each other in a certain way, e.g. a common center position and certain relative scales, e.g. 1x1x1, 2x2x2, 4x4x4; each successive scale may have the same voxel dimensions but cover a larger physical area.

If we have a type 1 (discrete coordinate space) multiscale array, we can just check that, for each scale level required by the model, there is a scale in the multiscale array with exactly the desired downsample factors. We can then just read from these arrays without any interpolation and feed the data into the model.

If we have a type 2 (continuous) multiscale array, then it seems it would be much more difficult to apply the neural network model. We have to somehow decide which of the scale levels we want to read from (and that is not necessarily obvious at all if they are not simple scale-and-translation-only), and then we have to interpolate to get the resolution expected by the model if it does not exactly match. Furthermore, even if it is just a scale-and-translation-only transform, to decide whether the resolution exactly matches what is expected by the model, we need to use floating-point arithmetic, which is subject to rounding and loss of precision.
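A sketch of that difference in practice (illustrative only; the helper and the numbers are made up): with type 1 metadata the model's required levels can be matched by exact comparison, whereas with only float physical scales some tolerance is unavoidable.

```python
from fractions import Fraction

def find_exact_level(levels, wanted):
    """Index of the level whose per-axis downsample factors match exactly."""
    for i, factors in enumerate(levels):
        if tuple(factors) == tuple(wanted):
            return i
    raise KeyError(f"no level with factors {wanted}")

# Type 1: factors are exact rationals relative to the base level.
levels = [(Fraction(1),) * 3, (Fraction(2),) * 3, (Fraction(4),) * 3]
print(find_exact_level(levels, (Fraction(2),) * 3))      # -> 1, unambiguous

# Type 2-ish: only float physical scales are stored, so a tolerance is needed.
physical = [0.1 * f for f in (1, 2, 4)]                  # base voxel size 0.1 um
wanted_um = 0.2
print([abs(p - wanted_um) < 1e-9 for p in physical])     # epsilon comparison
```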
(Following on discussion from #85)
As schematized by @constantinpape here, the current (0.4) version of the spec results in a hierarchy like this:

I have a few concerns with this arrangement, specifically the relationship between image-group/.zattrs:multiscales and the absence of spatial metadata in image-group/scale-level0/.zattrs.
Array metadata

There is no spatial metadata (axes and coordinateTransformations) stored in scale-level0/.zattrs. This is undesirable: first, from a semantic-purity standpoint, the spatial metadata for scale-level0 is a property of scale-level0, and as a general principle metadata should be located as close as possible to the thing it describes. From a practical standpoint, clients opening the array directly will have no access to spatial metadata via the array's own .zattrs. Instead, clients must first parse image-group/.zattrs:multiscales to figure out the spatial embedding of an array. Not only is this indirect and inefficient, it's brittle: copying scale-level0 to a different group won't preserve the spatial metadata, which will inevitably lead to confusion and errors, at least as long as arrays are what clients actually access to get data.

The multiscales.coordinateTransformations attribute contributes to the brittleness. I am seriously skeptical about this attribute, and I would appreciate it if someone could explain why it is necessary (as opposed to per-array coordinateTransformations, which seems much simpler and more robust). In #85 @jbms suggested (and @constantinpape agreed with) a model with 3 different coordinate spaces; my understanding was that the role of coordinateTransformations was to map dataset coordinates to world coordinates. The existence of an additional coordinate space implies that coordinateTransformations are incomplete unless they are explicitly associated with a target coordinate space; this is currently not represented in the spec, but maybe it's on the roadmap? In any case, I think it's much simpler (and consistent with the actual semantics of data acquisition) to stipulate that there are only 2 coordinate spaces in scope: dataset coordinates (i.e., array indices) and world coordinates (physical units). multiscales.axes has physical (world) units, so I must be missing something here.
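For concreteness, a sketch of what array-local spatial metadata could look like; this is hypothetical (not part of 0.4) and simply reuses the axes / coordinateTransformations structures the spec already defines at the group level, but places them in scale-level0/.zattrs:

```python
# Hypothetical contents of image-group/scale-level0/.zattrs: the array becomes
# self-describing, so copying it elsewhere preserves its spatial interpretation.
scale_level0_zattrs = {
    "axes": [
        {"name": "z", "type": "space", "unit": "micrometer"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ],
    "coordinateTransformations": [
        {"type": "scale", "scale": [1.0, 0.5, 0.5]},
        {"type": "translation", "translation": [0.0, 0.25, 0.25]},
    ],
}
```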
Metadata duplication

In the original zarr multiscales issue, the final proposal was extremely simple: multiscales was just a list of references to arrays, with no array metadata, and some metadata about itself (e.g., version). Clients interested in displaying a multiscale image ultimately need to know the scaling and offset of each array; to make IO a bit more efficient for these clients, several voices supported duplicating array metadata (specifically spatial metadata) and putting it in multiscales. The logic was that clients would only need to perform one IO operation (fetching image-group/.zattrs) to get all the information needed about the multiscale collection, but the cost is duplicated metadata. I wonder: how awful is it for IO if we don't duplicate any metadata, and clients must first query image-group/.zattrs, parse multiscales.datasets (which is just a list of paths), then access the metadata of each array listed in multiscales.datasets? For a typical multiscale collection with 5-8 scale levels, this means 5-8 additional fetches of JSON metadata. How bad is this for latency? If the fetches are launched concurrently, I suspect the impact would be minimal, and it will certainly be dwarfed by the time required to ultimately load chunk data. I think we should seriously consider this.
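A rough sketch of that read path (the base URL and fetch_json helper are invented for illustration; the multiscales structure follows 0.4): one fetch for the group attributes, then the per-array fetches issued concurrently, so the extra latency is roughly one additional round trip rather than 5-8.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

BASE = "https://example.com/image-group"  # hypothetical HTTP-hosted zarr group

def fetch_json(path):
    with urlopen(f"{BASE}/{path}") as response:
        return json.load(response)

group_attrs = fetch_json(".zattrs")                                   # 1 round trip
paths = [d["path"] for d in group_attrs["multiscales"][0]["datasets"]]
with ThreadPoolExecutor() as pool:                                    # ~1 more
    array_attrs = list(pool.map(lambda p: fetch_json(f"{p}/.zattrs"), paths))
```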
Suggestions

1. Spatial metadata (axes and coordinateTransformations) for arrays should reside primarily in array metadata, e.g. scale-level0/.zattrs. When an array's spatial metadata needs to be duplicated, e.g. in image-group/.zattrs:multiscales.datasets, it should be understood that such duplication is only for convenience / performance.
2. Alternatively, consider returning to the original, minimal form of multiscales, i.e. making multiscales.datasets just a list of references to other arrays with no array-specific metadata. Clients have to do more work to compose the multiscale, but the metadata story is much cleaner (see the sketch below).

cc @bogovicj
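As referenced in suggestion 2, a sketch of a pared-down multiscales entry (hypothetical; the version string and names are made up):

```python
# Group-level metadata that only lists the member arrays; spatial metadata
# would live on the arrays themselves (see the earlier sketch).
multiscales = [
    {
        "version": "hypothetical",
        "name": "example",
        "datasets": [
            {"path": "scale-level0"},
            {"path": "scale-level1"},
            {"path": "scale-level2"},
        ],
    }
]
```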