-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds test for reading data with unordered position labeles #124
Adds test for reading data with unordered position labeles #124
Conversation
P. S. I mean data returned by |
Do you mean the index 0 when calling
What situations are you thinking about when you say sorting will be costly?
I think this is reasonable for String valued axes, but probably not numeric. A concrete example of this is an explore acquisition. You take a z stack going from index 0 to index 9. Then you decide you want to see whats above the sample and image indices -8 to -1. When you call Thoughts? |
This is actually the interesting case… you have sequences of ascending values, and you end up sorting. Using the SortedSet minimizes the overhead in this case, but this will still be slightly more expensive than just doing the list append. However, based on my experience with the current implementation, the performance is tolerable, as you are just doing an insert into the sorted structure, not resorting every time. which was the prior behavior and really didn’t work.
So in my mind, this is a question of semantics and not performance. You do have an underlying assumption in your example, which is that with numeric indexes, they are numbered in either spatial or temporal order.
Perhaps the documentation should state this explicitly?
Carl
…----------------------------------------------------------
Dr. Carl Kesselman
William H. Keck Professor of Engineering
Professor, Epstein Department of Industrial and Systems Engineering
Fellow, Information Sciences Institute
Viterbi School of Engineering
Professor, Department of Population and Public Health Sciences, Keck School of Medicine
Professor, Biomedical Sciences, Ostrow School of Dentistry
University of Southern California
4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292-6695
Phone: +1 (310) 448-9338
Email: ***@***.***
Web: http://www.isi.edu/~carl
On Jul 28, 2023 at 8:05 AM -0700, Henry Pinkard ***@***.***>, wrote:
This issue will also come up if acquiring, for example, channels in the ['RFP', 'GFP'] order - the axes will read ['GFP', 'RFP'], but the channel with index 0 will correspond to RFP.
Do you mean the index 0 when calling dataset.as_array()?
I think it will be costly to sort the data, so my suggestion would be to also remove the sorting of the axes - i.e. use set rather than SortedSet.
What situations are you thinking about when you say sorting will be costly?
I also think it's right not to sort the data - if a user acquired channels or z slices out of alphabetical / numeric order it would be on them to reorder the data rather than expect the data reader to do it.
I think this is reasonable for String valued axes, but probably not numeric. A concrete example of this is an explore acquisition. You take a z stack going from index 0 to index 9. Then you decide you want to see whats above the sample and image indices -8 to -1. When you call as_array(), you'd expect it to be spatially ordered, and having to do that yourself if kinda confusing
—
Reply to this email directly, view it on GitHub<https://urldefense.us/v2/url?u=https-3A__github.com_micro-2Dmanager_NDTiffStorage_pull_124-23issuecomment-2D1655849589&d=DwMCaQ&c=qzHnJIRvjI6L-clJH8JwLQvf_Iq43fzikf6aoxZgMb8&r=sGCma2ufaUVT-N141kRIZQ&m=ycOwCS40Ufyrb7e2r_54YQDx6UoeBA5_Sqr0YkuZCSYeqQJsCyfzgCyek9jsahn2&s=25CvORkTF27zBdSjvpHmvs255g8iHfh0nXBzeVBY9cU&e=>, or unsubscribe<https://urldefense.us/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AA3OGXXEBGVGMCWZQC4L6L3XSPIJTANCNFSM6AAAAAA2ZJRI5I&d=DwMCaQ&c=qzHnJIRvjI6L-clJH8JwLQvf_Iq43fzikf6aoxZgMb8&r=sGCma2ufaUVT-N141kRIZQ&m=ycOwCS40Ufyrb7e2r_54YQDx6UoeBA5_Sqr0YkuZCSYeqQJsCyfzgCyek9jsahn2&s=3vh4vkpC6i0FNZAtj-lF4X8WgnTNzgxKCybx06mk3mo&e=>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Correct
In 80623e7 I added a test for reading a dataset as you suggested. Here is the code I used to acquire the data: import numpy as np
from pycromanager import Acquisition, Dataset
def fun(image, metadata):
if not hasattr(fun, "idx"):
fun.idx = 0
image[0, 0] = np.uint16(fun.idx)
fun.idx += 1
return image, metadata
events1 = [{'axes': {'z': z_idx}, 'z': z_idx} for z_idx in range(10)]
events2 = [{'axes': {'z': z_idx}, 'z': z_idx} for z_idx in range(-10, 0)]
with Acquisition(directory=r'Q:\Ivan\testing', name='unordered_z', image_process_fn=fun, show_display=False) as acq:
acq.acquire(events1)
acq.acquire(events2) Currently In order to sort the array that My suggestion is to remove the axis sorting, and keep the data fetching as it. We'll then pass the cost of array sorting to the user, if that's something they want to do. In the case of the example above you'd do |
P.S. It looks like there is an even bigger bug here - in the |
At least part of the problem is that this line Should be changed to: axes = {key: axes_to_stack[key][block_id[i]] for i, key in enumerate(axes_to_stack.keys())} (BTW I tried to make this change myself, but couldn't. I could pull the changes locally as you suggested before, but not push. I think there is a box that you can check when making PRs that says something like "allow maintainers to make edits" that would make this easier) How this works is that the I don't think there will be any appreciable performance penalty for creating the stacked arrays in different orderings, because the data isn't immediately read into memory. The only sorting-related cost should be sorting of the keys, but I don't think this is a huge concern because when loading the data from disk its a one time cost, and Carl's changes seem to have made this fast enough for in-progress acquisitions to be repeatedly run in real time. Maybe there are situations on enourmous datasets where this could become a problem, but I would guess this is atypical to how most people use this, so if we do find that it exists I'd advocate for making an option to turn off sorting as needed. There may be differences in speed based on ordering for actually pulling the pixel data into memory, but this is hardware-dependent (e.g. if sequential images read are in nearby places on the physical disk) and I think beyond the scope of how this class should be set up. Separately, there's the issue of whether String axes should be sorted or kept in acquisition order. I do think the Great that you are working through all these details and making tests. I think this will be very beneficial to the libraries usability. |
Co-Authored-By: Henry Pinkard <[email protected]>
That did the trick! On the issue of sorting vs not sorting - let's take the path of least resistance and continue sorting the axes and now the data. I agree with you that it's critical that I think this PR is ready to merge. I saw that you updated the documentation (thanks a lot!) I'll read through it and I'll double check that this behavior is fully explained. |
@henrypinkard I am finding that ndtiff sorts the axis labels (now using
SortedSet
s), but the data is returned in the acquisition order, causing the test here to fail.I think this is something you alluded to in micro-manager/pycro-manager#575. This issue will also come up if acquiring, for example, channels in the
['RFP', 'GFP']
order - the axes will read['GFP', 'RFP']
, but the channel with index 0 will correspond to RFP.I think it will be costly to sort the data, so my suggestion would be to also remove the sorting of the axes - i.e. use
set
rather thanSortedSet
. This will largely preserve the current behavior and will even speed up the code a little bit. I also think it's right not to sort the data - if a user acquired channels or z slices out of alphabetical / numeric order it would be on them to reorder the data rather than expect the data reader to do it.Let me know what you think. @carlkesselman is welcome to chime in as well.