Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colocalization module #17

Merged
merged 9 commits into from
Feb 6, 2024
Merged

Colocalization module #17

merged 9 commits into from
Feb 6, 2024

Conversation

tdrose
Copy link
Member

@tdrose tdrose commented Dec 19, 2023

Colocalization is a common metric for analysis of imaging MS/spatial metabolomics data.
In a previous publication from Theo (ColocML), they evaluated metrics/processing steps for computing colocalization. I implemented the best-performing metric.

I think it is a useful addition to the package for users to perform this kind of analysis/processing which cannot be done by other downstream analysis packages such as scanpy or squidpy.

@tdrose tdrose added the enhancement New feature or request label Dec 19, 2023
@tdrose tdrose requested a review from aeisenbarth December 19, 2023 16:15
@tdrose tdrose self-assigned this Dec 19, 2023
@tdrose tdrose linked an issue Jan 11, 2024 that may be closed by this pull request
2 tasks
@tdrose
Copy link
Member Author

tdrose commented Jan 11, 2024

Once merged, closes #16 .

# It has the same dimensions as `adata.X`
colocalization.colocML_preprocessing(adata, layer="colocml_preprocessing")

# Compute the pairwise colocalization metrix between all ion images
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matrix (or metrics?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

from metaspace_converter.constants import COLOCALIZATION


def colocML_preprocessing(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename to coloc_ml_preprocessing, closer to Python naming conventions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"""

# Select data
if layer == None or layer not in adata.layers.keys():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second condition is quite permissive. If an AnnData has both a default X and some layers, a user providing an unknown layer name would unexpectedly get the default layer, but since no error is raised, assume they got the requested layer.

When doing data analysis, is this level of permissiveness useful (because some datasets may have the layer, others not)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Now raising an error if layer not available.

coloc = _pairwise_cosine_similarity(data)

# Save colocalization
adata.uns[COLOCALIZATION] = coloc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also fit well as adata.varp[COLOCALIZATION] = coloc since it is pairwise on features.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Good idea. Will move it to anndata.varp

Comment on lines 86 to 90
# Layer exists
assert COLOCML_LAYER in adata.layers.keys()

# Layer sizes match
assert adata.X.shape == adata.layers[COLOCML_LAYER].shape
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move these above # Test median filtering because the other assertions would have failed anyways. That way, we first have the check that preprocessing was done, and then that it was done correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

assert observed == expected_preprocessing

# Quantile thresholding
adata.X[0].reshape(-1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't modify the receiver. You can remove the line adata.X[0].reshape(-1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Probably just an artifact from my internal testing. Removed it.

Comment on lines 41 to 43
dict(num_ions=5, height=10, width=10),
dict(num_ions=0, height=2, width=3),
dict(num_ions=4, height=4, width=4),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the third and first case seem to cover the same classes of equivalent inputs, that means I would not expect the third one fails without the first one failing as well.

But the second one covers two different edge cases (no ions, rectangular image). I would separate them like this:

        dict(num_ions=5, height=10, width=10),
        dict(num_ions=0, height=10, width=10),
        dict(num_ions=5, height=3, width=4),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

Comment on lines 69 to 75
expected_preprocessing = np.median(actual[0][:3, :3])
observed = adata.layers[COLOCML_LAYER][adata.uns[METASPACE_KEY]["image_size"][X] + 1, 0]
assert observed == expected_preprocessing

expected_preprocessing = np.median(actual[1][:3, :3])
observed = adata.layers[COLOCML_LAYER][adata.uns[METASPACE_KEY]["image_size"][X] + 1, 1]
assert observed == expected_preprocessing
Copy link
Collaborator

@aeisenbarth aeisenbarth Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions are not obvious to me as a reader. I would extract the inline expression to a variable width and explain the indexing. For example:

        # Check that preprocessing was done correctly.
        # Probe a single pixel of the resulting array and compute the expected median filter.
        x = 1
        y = 1
        width = adata.uns[METASPACE_KEY]["image_size"][X]

        expected_pixel_value = np.median(actual[0][y-1:y+2, x-1:x+2])
        actual_pixel_value = adata.layers[COLOCML_LAYER][y * width + x, 0]
        assert actual_pixel_value == expected_pixel_value

One can then even check all ions in one go with

        expected_pixel_features = np.median(actual[:, y - 1:y + 2, x - 1:x + 2], axis=[1, 2])
        actual_pixel_features = adata.layers[COLOCML_LAYER][y * width + x, :]
        np.testing.assert_allclose(actual_pixel_features, expected_pixel_features)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.

Comment on lines 77 to 84
# Quantile thresholding
adata.X[0].reshape(-1)

colocML_preprocessing(
adata, median_filter_size=(3, 3), quantile_threshold=0.5, layer=COLOCML_LAYER
)

assert all(np.sum(adata.layers[COLOCML_LAYER] == 0, axis=0) <= 0.5 * adata.X.shape[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is often preferrable to have a single test case in a test function. For a second case, you could parametrize the function with @pytest.parametrize("quantile_threshold", [0, 0.5]) or, if the call or assertions are too different, write a separate test function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the assertion a bit clearer by extracting the parameter to a variable and computing the assertion values in separate lines.

        quantile_threshold = 0.5
        colocML_preprocessing(
            adata, median_filter_size=(3, 3), quantile_threshold=quantile_threshold, layer=COLOCML_LAYER
        )
        quantile_value = quantile_threshold * adata.X.shape[0]
        actual_thresholded = np.sum(adata.layers[COLOCML_LAYER] == 0, axis=0)
        assert all(actual_thresholded <= quantile_value)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added quantile_threshold to parameterization.

@@ -16,13 +17,15 @@ def _check_pixel_coordinates(adata: AnnData) -> bool:
return np.all(np.equal(pixel_list, required_pixels))


def anndata_to_image_array(adata: AnnData) -> np.ndarray:
def anndata_to_image_array(adata: AnnData, layer: Optional[str]=None) -> np.ndarray:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added layer argument to the anndata_to_image_array function. With this also the processed images from layers can be extracted.

@tdrose tdrose requested a review from aeisenbarth February 5, 2024 12:30
Copy link
Collaborator

@aeisenbarth aeisenbarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides one typo ready to merge!

ion image and all pixels below the quantile threshold will be set to 0.

Returns:
None. The processed data is saved in ``layer``. If layer is net to None, ``adata.X`` will
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

net set

@tdrose tdrose merged commit 9c7e08d into master Feb 6, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ColocML processing and colocalization computation
2 participants