cellxgene-schema CLI must update validation for obs['is_primary_data'] #834

Closed
brianraymor opened this issue Mar 27, 2024 · 4 comments
Labels
5.1 Next minor CELLxGENE schema version after 5.0 curation software dp Data Platform Team work

Comments

@brianraymor
Contributor

Design

See is_primary_data.

The new requirement is:

This MUST be False if uns['spatial']['is_single'] is False.

is_primary_data

Key: is_primary_data
Annotator: Curator MUST annotate.
Value: bool. This MUST be False if uns['spatial']['is_single'] is False. This MUST be True if this is the canonical instance of this cellular observation and False if not. This is commonly False for meta-analyses reusing data or for secondary views of data.
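A minimal sketch of how the new requirement could be checked, for illustration only. The function name `check_is_primary_data` and the error wording are hypothetical and are not taken from the cellxgene-schema code base; the sketch assumes `uns` behaves like a plain dict and that the column can be iterated as a sequence of booleans.

```python
from typing import Any, Iterable, List, Mapping


def check_is_primary_data(
    is_primary_data: Iterable[Any], uns: Mapping[str, Any]
) -> List[str]:
    """Hypothetical helper: return error messages, or [] if the rule holds.

    Rule from the issue: obs['is_primary_data'] MUST be False
    if uns['spatial']['is_single'] is False.
    """
    spatial = uns.get("spatial")
    if spatial is None:
        # No spatial metadata, so the new rule does not apply.
        return []

    is_single = spatial.get("is_single")
    if is_single is None or is_single:
        # is_single is missing or True; the new rule does not apply here.
        return []

    # is_single is False: every observation must have is_primary_data False.
    offending = [i for i, value in enumerate(is_primary_data) if value]
    if not offending:
        return []
    return [
        "obs['is_primary_data'] MUST be False when "
        "uns['spatial']['is_single'] is False; "
        f"found True at {len(offending)} row(s)."
    ]
```

Calling it with `[False, True]` and `{"spatial": {"is_single": False}}` returns one error message; the same column with `is_single` set to True returns an empty list.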

@nayib-jose-gloria
Contributor

@brianraymor @brian-mott for validation of obs columns that depend on spatial metadata like this: do we need to account for datasets having both spatial and non-spatial assay rows? Would that scenario ever happen?

If so, should "This MUST be False if uns['spatial']['is_single'] is False." apply only to rows with Visium Spatial Gene Expression or Slideseqv2 assays? Or to all rows, as long as uns['spatial']['is_single'] exists and is False?

@brianraymor
Contributor Author

brianraymor commented May 1, 2024

> do we need to account for datasets having both spatial and non-spatial assay rows? Would that scenario ever happen?

No. The use of uns for spatial implicitly indicates that we're "allowing" only Visium Spatial Gene Expression or Slideseqv2 in the dataset (not a mixture of assays across observations). I think an earlier draft of assay_ontology_term_id may have had a hard requirement that all observations share the same term id when the assay is Visium Spatial Gene Expression or Slideseqv2. We could clarify. CC: @jahilton for thoughts

@jahilton
Collaborator

jahilton commented May 1, 2024

Yes, I can see the gap in the current documentation and think we can close it by enforcement in assay_ontology_term_id. If any observation is assay:Visium then all must be assay:Visium. If any observation is assay:Slide-seqV2 then all must be assay:Slide-seqV2.
This blocks the possibility of a Visium-Slide-seq integration, but I think that would complicate the X table. Better to deal with that when we expand to more spatial assays (when that type of integration is more likely).
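The all-or-nothing rule described above could be sketched roughly as follows. This is an illustration, not the CLI's implementation: the function name `check_spatial_assay_homogeneity` is hypothetical, and the EFO term ids in `SPATIAL_ASSAYS` are an assumption that should be verified against the schema before use.

```python
from typing import Dict, Iterable, List

# Assumed term ids for the two spatial assays; verify against the schema.
SPATIAL_ASSAYS: Dict[str, str] = {
    "EFO:0010961": "Visium Spatial Gene Expression",
    "EFO:0030062": "Slide-seqV2",
}


def check_spatial_assay_homogeneity(assay_term_ids: Iterable[str]) -> List[str]:
    """Hypothetical helper: if any observation uses a spatial assay,
    all observations must use that same assay."""
    observed = set(assay_term_ids)
    errors = []
    for term_id, label in SPATIAL_ASSAYS.items():
        # A spatial assay mixed with any other term id violates the rule.
        if term_id in observed and len(observed) > 1:
            errors.append(
                f"If any observation is {label} ({term_id}), "
                "all observations must be."
            )
    return errors
```

A dataset whose observations all carry the same spatial term id yields no errors, while a Visium plus Slide-seqV2 mixture yields one error per spatial assay, which matches the point that such an integration is blocked for now.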

@brianraymor
Contributor Author

@nayib-jose-gloria - tracking in #871 - will create a matching CLI issue when the schema is updated.
