Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand scope of write_elem wrapper #500

Closed
tcompa opened this issue Aug 29, 2023 · 6 comments · Fixed by #499
Closed

Expand scope of write_elem wrapper #500

tcompa opened this issue Aug 29, 2023 · 6 comments · Fixed by #499
Labels
Tables AnnData and ROI/feature tables

Comments

@tcompa
Copy link
Collaborator

tcompa commented Aug 29, 2023

On the write_elem wrapper: can we take this opportunity to make this a broader wrapper that handles more of the table writing? write_elem seems like a low level wrapper function. Would be great if tasks can just call a write_table or something like that with some parameters. Currently, many task replicate similar handling of the table metadata, updating the list of tables, checking for existing tables etc.

This would be a good opportunity to generalize this and have a useful wrapper that sets the correct metadata for those tables, but that someone building a new task then doesn’t have to think about.

Originally posted by @jluethi in #485 (comment)

@tcompa
Copy link
Collaborator Author

tcompa commented Aug 29, 2023

Here are the typical ways we create tables. Let's see if/what we can abstract out of them.

Create-ome-zarr task

            # Create tables zarr group for ROI tables
            group_tables = group_image.create_group("tables/")  # noqa: F841
            well_id = row + column

            # Prepare AnnData tables for FOV/well ROIs
            FOV_ROIs_table = prepare_FOV_ROI_table(site_metadata.loc[well_id])
            well_ROIs_table = prepare_well_ROI_table(
                site_metadata.loc[well_id]
            )

            # Write AnnData tables in the tables zarr group
            write_elem(group_tables, "FOV_ROI_table", FOV_ROIs_table)
            write_elem(group_tables, "well_ROI_table", well_ROIs_table)
            group_tables.attrs["tables"] = ["FOV_ROI_table", "well_ROI_table"]

Copy-ome-zarr task

                    # Copy the tables in ROI_table_names
                    for ROI_table_name in ROI_table_names:

                        logger.info(
                            f"I will now read {ROI_table_name} from "
                            f"{zarrurl_old=}, convert it to 2D, and "
                            "write it back to the new zarr file."
                        )
                        ROI_table = ad.read_zarr(
                            f"{zarrurl_old}/{well_path}/{image_path}/"
                            f"tables/{ROI_table_name}"
                        )
                        # Convert 3D FOVs to 2D
                        if project_to_2D:
                            new_ROI_table = convert_ROIs_from_3D_to_2D(
                                ROI_table, pxl_size_z
                            )
                        # Write new table
                        write_elem(
                            new_tables_group, ROI_table_name, new_ROI_table
                        )

Cellpose task

    if output_ROI_table:
        # Concatenate all ROI dataframes
        df_well = pd.concat(bbox_dataframe_list, axis=0, ignore_index=True)
        df_well.index = df_well.index.astype(str)
        # Extract labels and drop them from df_well
        labels = pd.DataFrame(df_well["label"].astype(str))
        df_well.drop(labels=["label"], axis=1, inplace=True)
        # Convert all to float (warning: some would be int, in principle)
        bbox_dtype = np.float32
        df_well = df_well.astype(bbox_dtype)
        # Convert to anndata
        bbox_table = ad.AnnData(df_well, dtype=bbox_dtype)
        bbox_table.obs = labels
        # Write to zarr group
        group_tables = zarr.group(f"{in_path}/{component}/tables/")
        write_elem(group_tables, output_ROI_table, bbox_table)
        logger.info(
            "Bounding box ROI table written to "
            f"{in_path}/{component}/tables/{output_ROI_table}"
        )

Napari-workflows task

    # Output handling: "dataframe" type (for each output, concatenate ROI
    # dataframes, clean up, and store in a AnnData table on-disk)
    for (name, out_params) in dataframe_outputs:
        table_name = out_params.table_name
        # Concatenate all FOV dataframes
        list_dfs = output_dataframe_lists[name]
        if len(list_dfs) == 0:
            measurement_table = ad.AnnData()
        else:
            df_well = pd.concat(list_dfs, axis=0, ignore_index=True)
            # Extract labels and drop them from df_well
            labels = pd.DataFrame(df_well["label"].astype(str))
            df_well.drop(labels=["label"], axis=1, inplace=True)
            # Convert all to float (warning: some would be int, in principle)
            measurement_dtype = np.float32
            df_well = df_well.astype(measurement_dtype)
            df_well.index = df_well.index.map(str)
            # Convert to anndata
            measurement_table = ad.AnnData(df_well, dtype=measurement_dtype)
            measurement_table.obs = labels
        # Write to zarr group
        group_tables = zarr.group(f"{in_path}/{component}/tables/")
        write_elem(group_tables, table_name, measurement_table)
        # Update OME-NGFF metadata
        current_tables = group_tables.attrs.asdict().get("tables") or []
        if table_name in current_tables:
            # FIXME: move this check to an earlier stage of the task
            raise ValueError(
                f"{in_path}/{component}/tables/ already includes "
                f"{table_name=} in {current_tables=}"
            )
        new_tables = current_tables + [table_name]
        group_tables.attrs["tables"] = new_tables
        # FIXME: also include table metadata, see issue #333

@tcompa
Copy link
Collaborator Author

tcompa commented Aug 29, 2023

Something that clearly can be moved into write_table is the creation of the tables group (if needed), and the check on whether the required subgroup already exists (to be handled via overwrite).
EDIT: also the tables metadata are always handled in the same way.

This means that the first version of this function could look like

def write_table(image_group, table_name, table, overwrite=False):
    # Create tables group, if needed
    ...
    # Check whether subgroup already exists, and proceed accordingly
    ...
   # If it's all OK, proceed and write the table
   ...
   # Update the `tables` metadata of the image group
   ...

Let's see if we can include more general features in this function.

tcompa added a commit that referenced this issue Aug 29, 2023
@tcompa
Copy link
Collaborator Author

tcompa commented Aug 29, 2023

The first prototype is available through f4f127e, comments on the scope are welcome.

@jluethi
Copy link
Collaborator

jluethi commented Aug 29, 2023

I like the approach to write_table a lot, especially that handling all the metadata attributes now go into a wrapper, not the task function. Will make it much easier to be consistent on those.
I don't fully understand yet how all of them are set and whether we handle e.g. well_ROI_table correctly (see comments). But great that we'll have one place to handle that.

@tcompa
Copy link
Collaborator Author

tcompa commented Aug 29, 2023

I improved the prototype function (including your suggestion) with 9e5b5e8 and 45b4d50, see current status of PR #499. Let me know if something is still unclear or can be improved.

More changes may come from actually integrating this function in the tasks.

@jluethi
Copy link
Collaborator

jluethi commented Aug 29, 2023

Looks good to me! We can then work more on this wrapper when we formalize ROIs more (e.g. what metadata should our default ROI tables have?). It's already a great first step towards it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Tables AnnData and ROI/feature tables
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants