-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Colocalization module #17
Conversation
Once merged, closes #16 . |
docs/examples.rst
Outdated
# It has the same dimensions as `adata.X` | ||
colocalization.colocML_preprocessing(adata, layer="colocml_preprocessing") | ||
|
||
# Compute the pairwise colocalization metrix between all ion images |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matrix (or metrics?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
from metaspace_converter.constants import COLOCALIZATION | ||
|
||
|
||
def colocML_preprocessing( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename to coloc_ml_preprocessing
, closer to Python naming conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
""" | ||
|
||
# Select data | ||
if layer == None or layer not in adata.layers.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the second condition is quite permissive. If an AnnData has both a default X
and some layers
, a user providing an unknown layer name would unexpectedly get the default layer, but since no error is raised, assume they got the requested layer.
When doing data analysis, is this level of permissiveness useful (because some datasets may have the layer, others not)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Now raising an error if layer not available.
coloc = _pairwise_cosine_similarity(data) | ||
|
||
# Save colocalization | ||
adata.uns[COLOCALIZATION] = coloc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also fit well as adata.varp[COLOCALIZATION] = coloc
since it is pairwise on features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Good idea. Will move it to anndata.varp
# Layer exists | ||
assert COLOCML_LAYER in adata.layers.keys() | ||
|
||
# Layer sizes match | ||
assert adata.X.shape == adata.layers[COLOCML_LAYER].shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move these above # Test median filtering
because the other assertions would have failed anyways. That way, we first have the check that preprocessing was done, and then that it was done correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
assert observed == expected_preprocessing | ||
|
||
# Quantile thresholding | ||
adata.X[0].reshape(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't modify the receiver. You can remove the line adata.X[0].reshape(-1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Probably just an artifact from my internal testing. Removed it.
dict(num_ions=5, height=10, width=10), | ||
dict(num_ions=0, height=2, width=3), | ||
dict(num_ions=4, height=4, width=4), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, the third and first case seem to cover the same classes of equivalent inputs, that means I would not expect the third one fails without the first one failing as well.
But the second one covers two different edge cases (no ions, rectangular image). I would separate them like this:
dict(num_ions=5, height=10, width=10),
dict(num_ions=0, height=10, width=10),
dict(num_ions=5, height=3, width=4),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Done.
expected_preprocessing = np.median(actual[0][:3, :3]) | ||
observed = adata.layers[COLOCML_LAYER][adata.uns[METASPACE_KEY]["image_size"][X] + 1, 0] | ||
assert observed == expected_preprocessing | ||
|
||
expected_preprocessing = np.median(actual[1][:3, :3]) | ||
observed = adata.layers[COLOCML_LAYER][adata.uns[METASPACE_KEY]["image_size"][X] + 1, 1] | ||
assert observed == expected_preprocessing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These assertions are not obvious to me as a reader. I would extract the inline expression to a variable width
and explain the indexing. For example:
# Check that preprocessing was done correctly.
# Probe a single pixel of the resulting array and compute the expected median filter.
x = 1
y = 1
width = adata.uns[METASPACE_KEY]["image_size"][X]
expected_pixel_value = np.median(actual[0][y-1:y+2, x-1:x+2])
actual_pixel_value = adata.layers[COLOCML_LAYER][y * width + x, 0]
assert actual_pixel_value == expected_pixel_value
One can then even check all ions in one go with
expected_pixel_features = np.median(actual[:, y - 1:y + 2, x - 1:x + 2], axis=[1, 2])
actual_pixel_features = adata.layers[COLOCML_LAYER][y * width + x, :]
np.testing.assert_allclose(actual_pixel_features, expected_pixel_features)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done.
# Quantile thresholding | ||
adata.X[0].reshape(-1) | ||
|
||
colocML_preprocessing( | ||
adata, median_filter_size=(3, 3), quantile_threshold=0.5, layer=COLOCML_LAYER | ||
) | ||
|
||
assert all(np.sum(adata.layers[COLOCML_LAYER] == 0, axis=0) <= 0.5 * adata.X.shape[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is often preferrable to have a single test case in a test function. For a second case, you could parametrize the function with @pytest.parametrize("quantile_threshold", [0, 0.5])
or, if the call or assertions are too different, write a separate test function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make the assertion a bit clearer by extracting the parameter to a variable and computing the assertion values in separate lines.
quantile_threshold = 0.5
colocML_preprocessing(
adata, median_filter_size=(3, 3), quantile_threshold=quantile_threshold, layer=COLOCML_LAYER
)
quantile_value = quantile_threshold * adata.X.shape[0]
actual_thresholded = np.sum(adata.layers[COLOCML_LAYER] == 0, axis=0)
assert all(actual_thresholded <= quantile_value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added quantile_threshold to parameterization.
@@ -16,13 +17,15 @@ def _check_pixel_coordinates(adata: AnnData) -> bool: | |||
return np.all(np.equal(pixel_list, required_pixels)) | |||
|
|||
|
|||
def anndata_to_image_array(adata: AnnData) -> np.ndarray: | |||
def anndata_to_image_array(adata: AnnData, layer: Optional[str]=None) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added layer
argument to the anndata_to_image_array
function. With this also the processed images from layers can be extracted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides one typo ready to merge!
ion image and all pixels below the quantile threshold will be set to 0. | ||
|
||
Returns: | ||
None. The processed data is saved in ``layer``. If layer is net to None, ``adata.X`` will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
net set
Colocalization is a common metric for analysis of imaging MS/spatial metabolomics data.
In a previous publication from Theo (ColocML), they evaluated metrics/processing steps for computing colocalization. I implemented the best-performing metric.
I think it is a useful addition to the package for users to perform this kind of analysis/processing which cannot be done by other downstream analysis packages such as scanpy or squidpy.