-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update validation for obs['is_primary_data'] to enforce False if uns['spatial']['is_single'] is False #865
Conversation
Co-authored-by: Nayib Gloria <[email protected]>
…f uns['spatial']['is_single'] is False
# Conflicts: # cellxgene_schema_cli/tests/fixtures/examples_validate.py # cellxgene_schema_cli/tests/test_schema_compliance.py # cellxgene_schema_cli/tests/test_validate.py
@@ -1000,13 +1000,14 @@ def _validate_obsm(self): | |||
self.errors.append( | |||
f"Suffix for embedding key in 'adata.obsm' {key} does not match the regex pattern {regex_pattern}." | |||
) | |||
else: | |||
elif key.lower() != "spatial": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diff is from #827 merged into main
if not re.match(regex_pattern, key): | ||
self.errors.append( | ||
f"Embedding key in 'adata.obsm' {key} does not match the regex pattern {regex_pattern}." | ||
) | ||
self.warnings.append( | ||
f"Embedding key in 'adata.obsm' {key} does not start with X_ and thus will not be available in Explorer" | ||
f"Embedding key in 'adata.obsm' {key} is not 'spatial' nor does it start with 'X_'. Thus, it will " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diff is from #827 merged into main
self.errors.append( | ||
"'spatial' embedding is forbidden in 'adata.obsm' if " "adata.uns['spatial']['is_single'] is not set." | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diff is from #827 merged into main
@@ -142,7 +142,7 @@ | |||
"HANCESTRO:0575", | |||
"HsapDv:0000003", | |||
"donor_1", | |||
"nucleus", | |||
"na", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diff is from #827 merged into main
83f09a7
to
4d4fb2b
Compare
1888316
to
b933f55
Compare
# Conflicts: # cellxgene_schema_cli/cellxgene_schema/validate.py # cellxgene_schema_cli/tests/test_schema_compliance.py # cellxgene_schema_cli/tests/test_validate.py
@@ -1993,30 +2010,18 @@ def test_obsm_values_no_X_embedding__non_spatial_dataset(self, validator_with_ad | |||
"WARNING: Validation of raw layer was not performed due to current errors, try again after fixing current errors.", | |||
] | |||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split this test up into two for simplicity in the set-up
validator.validate_adata() | ||
assert validator.errors == [] | ||
|
||
def test_obsm_values_spatial_embedding_present__is_single_false(self, validator_with_visium_assay): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved this test into test_validate.py
as test__validate_spatial_is_single_false_ok
and renamed now that we have the fixture adata_spatial_is_single_false
with spatial embedding present
""" | ||
Validate is_primary_data for spatial datasets. | ||
""" | ||
is_single = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use _is_single()
here once this is rebased on main
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes good call
return | ||
if is_single is False and obs["is_primary_data"].any(): | ||
self.errors.append( | ||
"When uns['spatial']['is_single'] is False, " "obs['is_primary_data'] must be False for all rows." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a break in the string here (is False, " "obs['is_primary_data']
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I decided not to explicitly add a test_schema_compliance test case for allowing is_primary_data to be True or False if uns["spatial"]["is_single"] is None, because we implicitly cover this in test_validate.py
by running test__validate_with_h5ad_valid_and_labels
, which uses a fixture that has is_primary_data values True and False and no spatial in uns. I could be convinced to add the test case for clarity, but leaning towards no. LMK what you think
validator.validate_adata() | ||
assert validator.errors == [] | ||
assert validator.is_spatial is True | ||
|
||
def test_obsm_values_no_X_embedding__slide_seq_v2_dataset(self, validator_with_slide_seq_v2_assay): | ||
validator = validator_with_slide_seq_v2_assay | ||
validator.adata.uns["default_embedding"] = "spatial" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this test is missing validator.validate_adata()
and the corresponding assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sad, it must've gotten lost along the way during a merge resolution
obs = getattr_anndata(self.adata, "obs") | ||
if obs is None or "is_primary_data" not in obs: | ||
return | ||
if self._is_single() is False and obs["is_primary_data"].any(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are "secondary errors" typically handled? For example, should this error be reported for a non-spatial dataset that has (incorrectly) specified uns.spatial.is_single = False
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we accumulate all errors by appending to "self.errors", something like that would be handled wherever the uns check is being done. In this case, we should handle that in check_spatial_uns. I'll double check if we already are and add it in if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait I see what you mean--yes, we do report secondary errors like this. That way, if the problem was that they did mean for it to be spatial but accidentally used the wrong assay, we still tell them in 1 run all the other issues associated with it. Generally, we try to report every possible error in one go even if they'd all be fixed with 1 upstream correction. With some exceptions--we don't validate the X matrix, for instance, if there are issues with dependent columns before then, because its an expensive operation to run if we know there will be failures. But we do print a warning letting them know that this was the case--WARNING: "Validation of raw layer was not performed due to current errors, try again after fixing current errors."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think its fine that we exit early w/o checking spatial values in check_spatial_uns
if assay is not 'EFO:0010961' (Visium Spatial Gene Expression) or 'EFO:0030062' (Slide-seqV2), since curators have not requested otherwise and the errors would end up being the same for each one (i.e. this key should not exist if its not visium
X times)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks Nayib!
Looks good, thanks Nayib! I have added a couple of comments. |
validator.adata.obsm["spatial"] = validator.adata.obsm["X_umap"] | ||
def test_obsm_values_no_X_embedding__visium_dataset(self, validator_with_visium_assay): | ||
validator = validator_with_visium_assay | ||
validator.adata.uns["default_embedding"] = "spatial" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with the complete, required shape for adata
here but I see there are a couple of differences between the previous version and validator_with_visium_assay
(e.g. validator.adata.obsm["harmony"]
is no longer defined, validator.adata.uns["default_embedding"]
is now spatial
). Are these updates expected/OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, its expected
for non-spatial datasets, not having an X-prefix embedding like X_umap
should result in an error (see test_obsm_values_no_X_embedding__non_spatial_dataset
).
However, for spatial datasets, we do not require an X-prefix embedding, just 'spatial' embedding. This test case was originally modeled after test_obsm_values_no_X_embedding__non_spatial_dataset
, which used 'harmony' as its non-X-prefix embedding to trigger the error. But I realized spatial is already a non-X-prefix embedding so we didn't need to add another one to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Nayib!
@brian-mott ready for your review! |
Looks good, thanks Nayib! |
Reason for Change
Changes