-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(schema 5.1.0): throw error if obsm is not type np.ndarray #859
Conversation
"WARNING: All embeddings have to be of 'numpy.ndarray' type, 'adata.obsm['harmony']' is <class 'pandas.core.frame.DataFrame'>').", | ||
] | ||
assert validator.errors == [ | ||
"WARNING: All embeddings have to be of 'numpy.ndarray' type, 'adata.obsm['harmony']' is <class 'pandas.core.frame.DataFrame'>')." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be consistent and append errors with ERROR (blocks validation) rather than WARNING (doesn't)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good for ndarray check, but I think we're missing validation for the rest of this rule:
The value for each str key MUST be a numpy.ndarray of shape (n_obs, m), where n_obs is the number of rows in X and m >= 1.
We throw a warning if a non-spatial, non-X-prefix embedding value has a row count != n_obs--that should become an error as well. And I don't think we check for column count >= 1 at all; we warn on column count < 2 (and error for spatial / X-prefix embedding keys)
# Conflicts: # cellxgene_schema_cli/tests/test_schema_compliance.py
f" 'adata.obsm['{key}']' has shape of '{value.shape}'." | ||
if len(value.shape) < 2: | ||
self.errors.append( | ||
f"All embeddings must at least two dimensions. 'adata.obsm['{key}']' has a shape length of '{len(value.shape)}'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: nvm previous statment was incorrect
|
||
if unknown_key and value.shape[1] < 1: | ||
self.errors.append( | ||
f"All other embeddings must have at least one column. 'adata.obsm['{key}']' has columns='{value.shape[1]}'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in this context, I don't think the curator reading would know what "other embeddings" mean. Maybe "any embeddings not specified in the schema reference"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 nit about wording on an error statement but otherwise looks good. Ready for product review after that
@brian-mott ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks Trent and Nayib!
Reason for Change
Changes
_validate_obsm
to make it clear why an error occured.Testing