Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds test for reading data with unordered position labeles #124

Merged
merged 6 commits into from
Aug 4, 2023

Conversation

ieivanov
Copy link
Collaborator

@henrypinkard I am finding that ndtiff sorts the axis labels (now using SortedSets), but the data is returned in the acquisition order, causing the test here to fail.

I think this is something you alluded to in micro-manager/pycro-manager#575. This issue will also come up if acquiring, for example, channels in the ['RFP', 'GFP'] order - the axes will read ['GFP', 'RFP'], but the channel with index 0 will correspond to RFP.

I think it will be costly to sort the data, so my suggestion would be to also remove the sorting of the axes - i.e. use set rather than SortedSet. This will largely preserve the current behavior and will even speed up the code a little bit. I also think it's right not to sort the data - if a user acquired channels or z slices out of alphabetical / numeric order it would be on them to reorder the data rather than expect the data reader to do it.

Let me know what you think. @carlkesselman is welcome to chime in as well.

@ieivanov
Copy link
Collaborator Author

P. S. I mean data returned by dataset.as_array()

@carlkesselman
Copy link
Contributor

carlkesselman commented Jul 27, 2023 via email

@henrypinkard
Copy link
Member

henrypinkard commented Jul 28, 2023

This issue will also come up if acquiring, for example, channels in the ['RFP', 'GFP'] order - the axes will read ['GFP', 'RFP'], but the channel with index 0 will correspond to RFP.

Do you mean the index 0 when calling dataset.as_array()?

I think it will be costly to sort the data, so my suggestion would be to also remove the sorting of the axes - i.e. use set rather than SortedSet.

What situations are you thinking about when you say sorting will be costly?

I also think it's right not to sort the data - if a user acquired channels or z slices out of alphabetical / numeric order it would be on them to reorder the data rather than expect the data reader to do it.

I think this is reasonable for String valued axes, but probably not numeric. A concrete example of this is an explore acquisition. You take a z stack going from index 0 to index 9. Then you decide you want to see whats above the sample and image indices -8 to -1. When you call as_array(), you'd expect it to be spatially ordered, and having to do that yourself if kinda confusing

Thoughts?

@carlkesselman
Copy link
Contributor

carlkesselman commented Jul 28, 2023 via email

@ieivanov
Copy link
Collaborator Author

Do you mean the index 0 when calling dataset.as_array()?

Correct

I think this is reasonable for String valued axes, but probably not numeric. A concrete example of this is an explore acquisition. You take a z stack going from index 0 to index 9. Then you decide you want to see whats above the sample and image indices -8 to -1. When you call as_array(), you'd expect it to be spatially ordered, and having to do that yourself if kinda confusing

In 80623e7 I added a test for reading a dataset as you suggested. Here is the code I used to acquire the data:

import numpy as np
from pycromanager import Acquisition, Dataset

def fun(image, metadata):
    if not hasattr(fun, "idx"):
        fun.idx = 0

    image[0, 0] = np.uint16(fun.idx)

    fun.idx += 1

    return image, metadata

events1 = [{'axes': {'z': z_idx}, 'z': z_idx} for z_idx in range(10)]
events2 = [{'axes': {'z': z_idx}, 'z': z_idx} for z_idx in range(-10, 0)]

with Acquisition(directory=r'Q:\Ivan\testing', name='unordered_z', image_process_fn=fun, show_display=False) as acq:
    acq.acquire(events1)
    acq.acquire(events2)

Currently Dataset sorts the axes, but not the data (even when using integer-values axes) when calling dataset.as_array(), causing test_unordered_z_axis to fail. By sorting the data we would be introducing a new feature - even if that was the original intention all along. Further, as axis sorting, data sorting, and the dataset.axes property are all undocumented - something I've also planned to work on, the current behavior is all around unexpected, as Carl also points out.

In order to sort the array that dataset.as_array() we'll need to get a sorting key (something like sorting_key = np.argsort(axis['z'])) and then apply it to the data: dataset.as_array()[sorting_key, ...]. That, I think will add extra computational cost.

My suggestion is to remove the axis sorting, and keep the data fetching as it. We'll then pass the cost of array sorting to the user, if that's something they want to do. In the case of the example above you'd do dataset.as_array()[np.argsort(dataset.axes['z']), ...]

@ieivanov
Copy link
Collaborator Author

P.S. It looks like there is an even bigger bug here - in the unordered_z_1 dataset data.as_array()[10] returns all zeros - seems like the negative z axis index is causing problems.

@henrypinkard
Copy link
Member

At least part of the problem is that this line

https://github.com/micro-manager/NDTiffStorage/blob/80623e77d67ea9e4be751cee6b53724752cbe91f/python/ndtiff/nd_tiff_current.py#L613

Should be changed to:

axes = {key: axes_to_stack[key][block_id[i]] for i, key in enumerate(axes_to_stack.keys())}

(BTW I tried to make this change myself, but couldn't. I could pull the changes locally as you suggested before, but not push. I think there is a box that you can check when making PRs that says something like "allow maintainers to make edits" that would make this easier)

How this works is that the axes use the SortedSets to determine the order, and then when as_array is called it makes the stacked array in this order (lazily, not pulling everything into RAM at once). However, because of this bug it wasn't actually using the order defined in dataset.axes.

I don't think there will be any appreciable performance penalty for creating the stacked arrays in different orderings, because the data isn't immediately read into memory. The only sorting-related cost should be sorting of the keys, but I don't think this is a huge concern because when loading the data from disk its a one time cost, and Carl's changes seem to have made this fast enough for in-progress acquisitions to be repeatedly run in real time. Maybe there are situations on enourmous datasets where this could become a problem, but I would guess this is atypical to how most people use this, so if we do find that it exists I'd advocate for making an option to turn off sorting as needed.

There may be differences in speed based on ordering for actually pulling the pixel data into memory, but this is hardware-dependent (e.g. if sequential images read are in nearby places on the physical disk) and I think beyond the scope of how this class should be set up.

Separately, there's the issue of whether String axes should be sorted or kept in acquisition order. I do think the dataset.axes ordering should match the dataset.as_array() ordering, since this was the original intent. So its a question of whether to change the sorting behavior for string-valued axes so that they remain in acquisition ordering or go to alphabetical automatically. This is a new enough feature that I think its fine to pick and document a convention now (and the past behavior was buggy and confusing anyway).

Great that you are working through all these details and making tests. I think this will be very beneficial to the libraries usability.

@ieivanov
Copy link
Collaborator Author

ieivanov commented Aug 4, 2023

axes = {key: axes_to_stack[key][block_id[i]] for i, key in enumerate(axes_to_stack.keys())}

That did the trick!

On the issue of sorting vs not sorting - let's take the path of least resistance and continue sorting the axes and now the data. I agree with you that it's critical that the dataset.axes ordering should match the dataset.as_array() ordering and we've achieved that here.

I think this PR is ready to merge. I saw that you updated the documentation (thanks a lot!) I'll read through it and I'll double check that this behavior is fully explained.

@henrypinkard henrypinkard merged commit 41d212a into micro-manager:main Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants