-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle data exported via MetaXpress #201
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #201 +/- ##
==========================================
+ Coverage 92.75% 93.16% +0.41%
==========================================
Files 64 65 +1
Lines 3920 4128 +208
Branches 246 260 +14
==========================================
+ Hits 3636 3846 +210
+ Misses 269 265 -4
- Partials 15 17 +2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fstur for the pull request!
Is there a chance we can reduce the number of files in the test data to the absolute minimum required, e.g. by reducing to less time points and z steps?
Also, we might want to have just empty (0kb) files for those .TIF
files that aren't required to load metadata.
Regarding the folder organization, I'd suggest keeping the tests in tests/hcs/imagexpress
as that's corresponding to the Python package we are testing. How about just a new file test_plate_acquisitions_exported.py
or some such?
For the test data, I suggest naming the folder ImageXpress_exported
and remove the ZMB
. Also, MetaXpress
is the acquisition software and ImageXpress
the device, right? I suggest we stick with just ImageXpress for the naming, or are there good reasons to make the separation, (e.g. alternative software that is used on the ImageXpress systems)?
I added a few (sometimes nitpicky) comments.
|
||
Option B (as exported by MetaXpress): | ||
|
||
test_Plate_3420 --> name + {acquisition id} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that actually:
test_Plate_3420 --> name + {acquisition id} | |
test_Plate_3420 --> {name}_Plate_{acquisition id} |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I adjusted it
|
||
OPTION B (as exported by MetaXpress): | ||
|
||
test_Plate_3433 --> name + {acquisition id} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_Plate_3433 --> name + {acquisition id} | |
test_Plate_3433 --> {name}_Plate_{acquisition id} |
(see above)
Would that be correct?
@@ -320,20 +373,35 @@ def _parse_files(self) -> pd.DataFrame: | |||
|
|||
def _get_root_re(self) -> re.Pattern: | |||
return re.compile( | |||
r".*(?:[\/\\](?P<date>\d{4}-\d{2}-\d{2}))?[\/\\](?P<acq_id>\d+)(?:[\/\\]TimePoint_(?P<t>\d+))?(?:[\/\\]ZStep_(?P<z>\d+))" | |||
r".*(?:[\/\\](?P<date>\d{4}-\d{2}-\d{2}))?[\/\\](?:(?P<plate_name>.*)_Plate_)?(?P<acq_id>\d+)(?:[\/\\]TimePoint_(?P<t>\d+))?(?:[\/\\]ZStep_(?P<z>(?!0)\d+))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer for readability:
r".*(?:[\/\\](?P<date>\d{4}-\d{2}-\d{2}))?[\/\\](?:(?P<plate_name>.*)_Plate_)?(?P<acq_id>\d+)(?:[\/\\]TimePoint_(?P<t>\d+))?(?:[\/\\]ZStep_(?P<z>(?!0)\d+))" | |
r".*(?:[\/\\](?P<date>\d{4}-\d{2}-\d{2}))?[\/\\](?:(?P<plate_name>.*)_Plate_)?(?P<acq_id>\d+)(?:[\/\\]TimePoint_(?P<t>\d+))?(?:[\/\\]ZStep_(?P<z>[1-9]\d+))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to (?P[1-9]\d*)
) | ||
|
||
def _get_filename_re(self) -> re.Pattern: | ||
return re.compile( | ||
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})(?P<ext>.tif)" | ||
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?:tif|TIF))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using (?i:tif)
instead of the lookahead?
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?:tif|TIF))" | |
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?i:tif))" |
# check if files were exported via MetaXpress | ||
if files.plate_name.iloc[0] is not None: | ||
# single planes and MIPs are duplicated to the entire stack | ||
# -> need to check in metadata, if the channel contains a stack or not | ||
for c in np.sort(files.channel.unique()): | ||
file = files[(files.channel == c) & (files.z == "2")].iloc[0] | ||
metadata = load_metaseries_tiff_metadata(file.path) | ||
# projections have no "Z Step" attribute | ||
if "Z Step" in metadata.keys(): | ||
# single-planes have metadata["Z Step"] == 1, for each duplicate | ||
if metadata["Z Step"] == 2: | ||
channel_with_stack = c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we guarantee that channel_with_stack
will be set to something after this loop?
Maybe we can make it a method returning the first channel containing a stack, instead of iterating and keeping the last. (And then fail if no channel actually contains stacks - unless we cannot ever reach that condition..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't support MetaXpress export in MixedAcquisition
, we can probably simplify this, as for complete StackAcquisition
s we will always have all channels containing stacks, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are any ZStep_X folders, there must exist at least one channel that contains a full stack. So if the data was not altered, channel_with_stack
should always be set to something.
I can also move it to a new function, e.g. _get_channel_with_stack()
.
I also added a break
statement, so the first channel containing a stack is returned, and not the last.
My thoughts for doing it like this and not simplified for only full-stacks is the following: An "exported mixed acquisition" will to a user look very similar to an "exported stack acquisition". Like this, an "exported mixed acquisition" can still be processed by just using the 'StackAcquisition' class. The resulting ome-zarr will then just have some channels where the MIP / single plane is duplicated to the whole stack. I adjusted the docstrings and the MixedAcquisition error message slightly to make this hopefully a bit clearer.
In short: I would still support the conversion of "exported mixed acquisitions", just not with the MixedAcquisition
class, but with the StackAcquisition
class.
) | ||
|
||
def _get_filename_re(self) -> re.Pattern: | ||
return re.compile( | ||
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})(?P<ext>.tif)" | ||
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?:tif|TIF))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?:tif|TIF))" | |
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?i:tif))" |
(see above)
def test_mixed_acquisition(acquisition_dir: Path): | ||
with pytest.raises(ValueError): | ||
MixedAcquisition( | ||
acquisition_dir, | ||
alignment=TileAlignmentOptions.GRID, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe MixedAcquisition
with exported MetaXpress data should raise NotImplementedError
? Or we can keep ValueError
but test for a meaningful error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it for now as a ValueError
and checked for the error message.
@pytest.fixture | ||
def dummy_plate(): | ||
ImageXpressPlateAcquisition.__abstractmethods__ = set() | ||
|
||
@dataclass | ||
class Plate(ImageXpressPlateAcquisition): | ||
pass | ||
|
||
return Plate() | ||
|
||
|
||
def test_raise_not_implemented_error(dummy_plate): | ||
with pytest.raises(NotImplementedError): | ||
dummy_plate._get_root_re() | ||
|
||
with pytest.raises(NotImplementedError): | ||
dummy_plate._get_filename_re() | ||
|
||
with pytest.raises(NotImplementedError): | ||
dummy_plate._get_z_spacing() | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant now, isn't it? I suggest we remove from here.
Hey @imagejan |
When exporting the data via the MetaXpress software, the data is structured slightly differently, than when simply copying the data folder. The main differences are:
-Projections are stored in the ZStep_0 folder
-The 'acquisition ID' folder also contains the plate-name
-The md_id field is missing
-The extension is .TIF instead of .tif
Changes to acommodate this: I adjusted the regex pattern to match all different cases, and added a line to switch all
z=="0"
toz=None
.In addition, if the plate is a mixed acquisition, the MetaXpress export will fill the incomplete stacks with either the single plane, or the MIP (data is duplicated), making all channels full stacks. I decided to process those just with the
StackAcquisition
class, and not support theMixedAcquisition
class.I had to add an additional check in
_compute_z_spacing
to check that the channel from which z is computed is not such a 'filled stack'.I also added the
imagexpress_ZMB_MetaXpress/test_plate_acquisitions.py
tests and additional test-data.If it's ok with you, it would be great if we could incorporate these changes to also handle MetaXpress exported files.