Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle data exported via MetaXpress #201

Merged
merged 13 commits into from
Dec 13, 2024
Merged

Handle data exported via MetaXpress #201

merged 13 commits into from
Dec 13, 2024

Conversation

fstur
Copy link
Collaborator

@fstur fstur commented Dec 10, 2024

When exporting the data via the MetaXpress software, the data is structured slightly differently, than when simply copying the data folder. The main differences are:
-Projections are stored in the ZStep_0 folder
-The 'acquisition ID' folder also contains the plate-name
-The md_id field is missing
-The extension is .TIF instead of .tif

Changes to acommodate this: I adjusted the regex pattern to match all different cases, and added a line to switch all z=="0" to z=None.

In addition, if the plate is a mixed acquisition, the MetaXpress export will fill the incomplete stacks with either the single plane, or the MIP (data is duplicated), making all channels full stacks. I decided to process those just with the StackAcquisition class, and not support the MixedAcquisition class.

I had to add an additional check in _compute_z_spacing to check that the channel from which z is computed is not such a 'filled stack'.

I also added the imagexpress_ZMB_MetaXpress/test_plate_acquisitions.py tests and additional test-data.

If it's ok with you, it would be great if we could incorporate these changes to also handle MetaXpress exported files.

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 98.50000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 93.16%. Comparing base (1870eee) to head (87ae54a).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
src/faim_ipa/hcs/imagexpress/acquisition.py 80.00% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #201      +/-   ##
==========================================
+ Coverage   92.75%   93.16%   +0.41%     
==========================================
  Files          64       65       +1     
  Lines        3920     4128     +208     
  Branches      246      260      +14     
==========================================
+ Hits         3636     3846     +210     
+ Misses        269      265       -4     
- Partials       15       17       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@imagejan imagejan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fstur for the pull request!

Is there a chance we can reduce the number of files in the test data to the absolute minimum required, e.g. by reducing to less time points and z steps?
Also, we might want to have just empty (0kb) files for those .TIF files that aren't required to load metadata.

Regarding the folder organization, I'd suggest keeping the tests in tests/hcs/imagexpress as that's corresponding to the Python package we are testing. How about just a new file test_plate_acquisitions_exported.py or some such?

For the test data, I suggest naming the folder ImageXpress_exported and remove the ZMB. Also, MetaXpress is the acquisition software and ImageXpress the device, right? I suggest we stick with just ImageXpress for the naming, or are there good reasons to make the separation, (e.g. alternative software that is used on the ImageXpress systems)?

I added a few (sometimes nitpicky) comments.


Option B (as exported by MetaXpress):

test_Plate_3420 --> name + {acquisition id}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that actually:

Suggested change
test_Plate_3420 --> name + {acquisition id}
test_Plate_3420 --> {name}_Plate_{acquisition id}

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I adjusted it


OPTION B (as exported by MetaXpress):

test_Plate_3433 --> name + {acquisition id}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test_Plate_3433 --> name + {acquisition id}
test_Plate_3433 --> {name}_Plate_{acquisition id}

(see above)
Would that be correct?

@@ -320,20 +373,35 @@ def _parse_files(self) -> pd.DataFrame:

def _get_root_re(self) -> re.Pattern:
return re.compile(
r".*(?:[\/\\](?P<date>\d{4}-\d{2}-\d{2}))?[\/\\](?P<acq_id>\d+)(?:[\/\\]TimePoint_(?P<t>\d+))?(?:[\/\\]ZStep_(?P<z>\d+))"
r".*(?:[\/\\](?P<date>\d{4}-\d{2}-\d{2}))?[\/\\](?:(?P<plate_name>.*)_Plate_)?(?P<acq_id>\d+)(?:[\/\\]TimePoint_(?P<t>\d+))?(?:[\/\\]ZStep_(?P<z>(?!0)\d+))"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer for readability:

Suggested change
r".*(?:[\/\\](?P<date>\d{4}-\d{2}-\d{2}))?[\/\\](?:(?P<plate_name>.*)_Plate_)?(?P<acq_id>\d+)(?:[\/\\]TimePoint_(?P<t>\d+))?(?:[\/\\]ZStep_(?P<z>(?!0)\d+))"
r".*(?:[\/\\](?P<date>\d{4}-\d{2}-\d{2}))?[\/\\](?:(?P<plate_name>.*)_Plate_)?(?P<acq_id>\d+)(?:[\/\\]TimePoint_(?P<t>\d+))?(?:[\/\\]ZStep_(?P<z>[1-9]\d+))"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to (?P[1-9]\d*)

)

def _get_filename_re(self) -> re.Pattern:
return re.compile(
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})(?P<ext>.tif)"
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?:tif|TIF))"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using (?i:tif) instead of the lookahead?

Suggested change
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?:tif|TIF))"
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?i:tif))"

Comment on lines 389 to 400
# check if files were exported via MetaXpress
if files.plate_name.iloc[0] is not None:
# single planes and MIPs are duplicated to the entire stack
# -> need to check in metadata, if the channel contains a stack or not
for c in np.sort(files.channel.unique()):
file = files[(files.channel == c) & (files.z == "2")].iloc[0]
metadata = load_metaseries_tiff_metadata(file.path)
# projections have no "Z Step" attribute
if "Z Step" in metadata.keys():
# single-planes have metadata["Z Step"] == 1, for each duplicate
if metadata["Z Step"] == 2:
channel_with_stack = c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we guarantee that channel_with_stack will be set to something after this loop?

Maybe we can make it a method returning the first channel containing a stack, instead of iterating and keeping the last. (And then fail if no channel actually contains stacks - unless we cannot ever reach that condition..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't support MetaXpress export in MixedAcquisition, we can probably simplify this, as for complete StackAcquisitions we will always have all channels containing stacks, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are any ZStep_X folders, there must exist at least one channel that contains a full stack. So if the data was not altered, channel_with_stack should always be set to something.

I can also move it to a new function, e.g. _get_channel_with_stack().

I also added a break statement, so the first channel containing a stack is returned, and not the last.

My thoughts for doing it like this and not simplified for only full-stacks is the following: An "exported mixed acquisition" will to a user look very similar to an "exported stack acquisition". Like this, an "exported mixed acquisition" can still be processed by just using the 'StackAcquisition' class. The resulting ome-zarr will then just have some channels where the MIP / single plane is duplicated to the whole stack. I adjusted the docstrings and the MixedAcquisition error message slightly to make this hopefully a bit clearer.
In short: I would still support the conversion of "exported mixed acquisitions", just not with the MixedAcquisition class, but with the StackAcquisition class.

)

def _get_filename_re(self) -> re.Pattern:
return re.compile(
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})(?P<ext>.tif)"
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?:tif|TIF))"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?:tif|TIF))"
r"(?P<name>.*)_(?P<well>[A-Z]+\d{2})_?(?P<field>s\d+)?_?(?P<channel>w[1-9]{1})?(?!_thumb)(?P<md_id>[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})?(?P<ext>\.(?i:tif))"

(see above)

Comment on lines 216 to 221
def test_mixed_acquisition(acquisition_dir: Path):
with pytest.raises(ValueError):
MixedAcquisition(
acquisition_dir,
alignment=TileAlignmentOptions.GRID,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe MixedAcquisition with exported MetaXpress data should raise NotImplementedError? Or we can keep ValueError but test for a meaningful error message?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it for now as a ValueError and checked for the error message.

Comment on lines 224 to 245
@pytest.fixture
def dummy_plate():
ImageXpressPlateAcquisition.__abstractmethods__ = set()

@dataclass
class Plate(ImageXpressPlateAcquisition):
pass

return Plate()


def test_raise_not_implemented_error(dummy_plate):
with pytest.raises(NotImplementedError):
dummy_plate._get_root_re()

with pytest.raises(NotImplementedError):
dummy_plate._get_filename_re()

with pytest.raises(NotImplementedError):
dummy_plate._get_z_spacing()


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant now, isn't it? I suggest we remove from here.

@fstur
Copy link
Collaborator Author

fstur commented Dec 13, 2024

Hey @imagejan
Thanks for the review!
I implemented most of your comments. I also reduced the test-data a lot. But I don't think we can make some .TIF files empty, as I believe the metadata of all of them are read.
I adjusted the folder organization and switched all "MetaXpress" to "ImageXpress" as you suggested.

@imagejan imagejan merged commit 53b2608 into main Dec 13, 2024
12 checks passed
@imagejan imagejan deleted the zmb3 branch December 13, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants