Patch/seq and processing #354

Merged
anngvu merged 11 commits into main from patch/seq-and-processing
Oct 13, 2023

anngvu
Collaborator

@anngvu anngvu commented Oct 12, 2023

Additions mostly focused on processed data, though also fixed a broken ref.

So @allaway some data we're handling right now for KS upload is closer to "level 2" = ProcessedAlignedReads.
The "default" for most assays is the level 1 template, which you're using, and things like genomic reference are not in there. But I suppose we do still want to collect level 1 attributes like library prep somewhere as well.

Our version:

"@id" : "bts:ProcessedAlignedReadsTemplate",
"schema:isPartOf" : {
"@id" : "http://schema.biothings.io/"
},
"sms:required" : "sms:false",
"sms:requiresComponent" : "",
"rdfs:label" : "ProcessedAlignedReadsTemplate",
"rdfs:comment" : "Template for describing aligned reads (e.g. BAM/CRAM files) from a sequencing assay. The QC meta are extracted from samtools stats when available and are the same metrics preferred by GDC. \n",
"@type" : "rdfs:Class",
"sms:requiresDependency" : [ {
"@id" : "bts:Component"
}, {
"@id" : "bts:Filename"
}, {
"@id" : "bts:fileFormat"
}, {
"@id" : "bts:resourceType"
}, {
"@id" : "bts:dataType"
}, {
"@id" : "bts:dataSubtype"
}, {
"@id" : "bts:assay"
}, {
"@id" : "bts:individualID"
}, {
"@id" : "bts:species"
}, {
"@id" : "bts:sex"
}, {
"@id" : "bts:age"
}, {
"@id" : "bts:ageUnit"
}, {
"@id" : "bts:diagnosis"
}, {
"@id" : "bts:nf1Genotype"
}, {
"@id" : "bts:nf2Genotype"
}, {
"@id" : "bts:modelSystemName"
}, {
"@id" : "bts:genomicReference"
}, {
"@id" : "bts:genomicReferenceLink"
}, {
"@id" : "bts:averageInsertSize"
}, {
"@id" : "bts:averageReadLength"
}, {
"@id" : "bts:averageBaseQuality"
}, {
"@id" : "bts:pairsOnDifferentChr"
}, {
"@id" : "bts:readsDuplicatedPercent"
}, {
"@id" : "bts:readsMappedPercent"
}, {
"@id" : "bts:meanCoverage"
}, {
"@id" : "bts:proportionCoverage10x"
}, {
"@id" : "bts:proportionCoverage30x"
}, {
"@id" : "bts:readDepth"
}, {
"@id" : "bts:totalReads"
}, {
"@id" : "bts:workflow"
}, {
"@id" : "bts:workflowLink"
}, {
"@id" : "bts:auxiliaryAsset"

HTAN's closest version (DNA-seq level 2):

{
                    "@id": "bts:Component"
                },
                {
                    "@id": "bts:Filename"
                },
                {
                    "@id": "bts:FileFormat"
                },
                {
                    "@id": "bts:HTANParentDataFileID"
                },
                {
                    "@id": "bts:HTANDataFileID"
                },
                {
                    "@id": "bts:AlignmentWorkflowUrl"
                },
                {
                    "@id": "bts:AlignmentWorkflowType"
                },
                {
                    "@id": "bts:GenomicReference"
                },
                {
                    "@id": "bts:GenomicReferenceURL"
                },
                {
                    "@id": "bts:IndexFileName"
                },
                {
                    "@id": "bts:AverageBaseQuality"
                },
                {
                    "@id": "bts:AverageInsertSize"
                },
                {
                    "@id": "bts:AverageReadLength"
                },
                {
                    "@id": "bts:MeanCoverage"
                },
                {
                    "@id": "bts:PairsOnDiffCHR"
                },
                {
                    "@id": "bts:TotalReads"
                },
                {
                    "@id": "bts:ProportionReadsMapped"
                },
                {
                    "@id": "bts:MapQ30"
                },
                {
                    "@id": "bts:TotalUniquelyMapped"
                },
                {
                    "@id": "bts:TotalUnmappedreads"
                },
                {
                    "@id": "bts:ProportionReadsDuplicated"
                },
                {
                    "@id": "bts:ShortReads"
                },
                {
                    "@id": "bts:ProportionCoverage10x"
                },
                {
                    "@id": "bts:ProportionCoverage30X"
                },
                {
                    "@id": "bts:ProportionTargetsNoCoverage"
                },
                {
                    "@id": "bts:ProportionBaseMismatch"
                },
                {
                    "@id": "bts:ProportionMitochondrialReads"
                },
                {
                    "@id": "bts:Contamination"
                },
                {
                    "@id": "bts:ContaminationError"
                }
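
The two lists above differ mostly in naming (camelCase vs. PascalCase) and in a handful of attributes. A minimal sketch of one way to compare them, assuming it is acceptable to match names case-insensitively after dropping the `bts:` prefix; the helper names `normalize` and `compare` are hypothetical, and only a subset of each list is included for brevity:

```python
def normalize(attr_id):
    """Drop the 'bts:' prefix and lowercase, so 'bts:fileFormat' matches 'bts:FileFormat'."""
    return attr_id.split(":", 1)[-1].lower()


def compare(a, b):
    """Return (shared, only_in_a, only_in_b) as sorted lists of normalized names."""
    na = {normalize(x) for x in a}
    nb = {normalize(x) for x in b}
    return sorted(na & nb), sorted(na - nb), sorted(nb - na)


# Illustrative subsets copied from the two templates quoted above.
ours = [
    "bts:fileFormat",
    "bts:genomicReference",
    "bts:genomicReferenceLink",
    "bts:averageInsertSize",
    "bts:meanCoverage",
    "bts:readsMappedPercent",
    "bts:workflowLink",
]
htan = [
    "bts:FileFormat",
    "bts:GenomicReference",
    "bts:GenomicReferenceURL",
    "bts:AverageInsertSize",
    "bts:MeanCoverage",
    "bts:ProportionReadsMapped",
    "bts:AlignmentWorkflowUrl",
]

shared, only_ours, only_htan = compare(ours, htan)
print("shared:", shared)
print("only ours:", only_ours)
print("only HTAN:", only_htan)
```

Exact-name matching like this will not pair up renamed equivalents such as `genomicReferenceLink` and `GenomicReferenceURL`; those still need a manual mapping.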

@anngvu anngvu requested a review from allaway October 12, 2023 23:43
@github-actions

github-actions bot commented Oct 12, 2023

Test Suite Report

Template Generation

| template | result | link |
|---|---|---|
| ClinicalAssayTemplate | 😄 | template link |
| EpigeneticsAssayTemplate | 😄 | template link |
| FlowCytometryTemplate | 😄 | template link |
| GenomicsAssayTemplate | 😄 | template link |
| GenomicsAssayTemplateExtended | 😄 | template link |
| ImagingAssayTemplate | 😄 | template link |
| LightScatteringAssayTemplate | 😄 | template link |
| MRIAssayTemplate | 😄 | template link |
| PharmacokineticsAssayTemplate | 😄 | template link |
| PlateBasedReporterAssayTemplate | 😄 | template link |
| ProcessedAlignedReadsTemplate | 😄 | template link |
| ProcessedExpressionTemplate | 😄 | template link |
| ProcessedVariantCallsTemplate | 😄 | template link |
| ProteomicsAssayTemplate | 😄 | template link |
| ProtocolTemplate | 😄 | template link |
| RNASeqTemplate | 😄 | template link |
| ScRNASeqTemplate | 😄 | template link |
| UpdateMilestoneReport | 😄 | template link |
| WESTemplate | 😄 | template link |
| WGSTemplate | 😄 | template link |

Manifest Validation

| manifest | result | expectation |
|---|---|---|
| GenomicsAssayTemplate_0.csv | 😄 | Lists can be blank if attr not required using ‘list like’ rule |
| GenomicsAssayTemplate_1.csv | 😄 | Mixing blanks and regular list values works |
| GenomicsAssayTemplate_2.csv | 😄 | Conditional validation for attributes is currently not supported |
| ScRNASeqTemplate_0.csv | 😄 | Single list val works by using ‘list like’ rule |
| ScRNASeqTemplate_1.csv | 😄 | Fail because of missing data in required field libraryStrand |

@allaway
Contributor

allaway commented Oct 13, 2023

> So @allaway some data we're handling right now for KS upload is closer to "level 2" = ProcessedAlignedReads.
> The "default" for most assays is the level 1 template, which you're using, and things like genomic reference are not in there. But I suppose we do still want to collect level 1 attributes like library prep somewhere as well.

Hmm, yeah, it's kind of an issue with the model of data organization that DCA imposes. I could create custom manifests using the fileFormat info; e.g. delete all the rows that are not fq in a level 1 manifest, all the rows that are not bam/cram in a level 2 manifest. This will add more complexity w/r/t the synapse_storage_manifest.csv which I'd have to keep removing. I could also reorganize the data into top-level folders that reflect the data level. This would add more complexity when it comes time to index the next data deposit in the bucket, as we'd probably get duplicate filehandles created by the indexing workflow for the files that have already been indexed.
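
The first option (per-level manifests filtered on file format) could be sketched roughly as below. This is only an illustration under assumptions: the manifest is taken to be a CSV with `Filename` and `fileFormat` columns (both attribute names appear in the template above), and the format values `fastq`/`fq`/`bam`/`cram` plus the helper names are hypothetical.

```python
import csv
import io

# Assumed mapping of data level to the file formats kept in that level's manifest.
LEVEL_FORMATS = {
    1: {"fastq", "fq"},
    2: {"bam", "cram"},
}


def read_manifest(text):
    """Parse manifest CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_manifest(rows, level):
    """Keep only rows whose fileFormat belongs to the given data level."""
    keep = LEVEL_FORMATS[level]
    return [r for r in rows if r.get("fileFormat", "").strip().lower() in keep]


# Made-up rows for illustration only.
example = """Filename,fileFormat
sample1_R1.fastq.gz,fastq
sample1.bam,bam
sample2.cram,cram
"""

rows = read_manifest(example)
level1 = filter_manifest(rows, 1)
level2 = filter_manifest(rows, 2)
print([r["Filename"] for r in level1])
print([r["Filename"] for r in level2])
```

This does nothing about the `synapse_storage_manifest.csv` cleanup or the duplicate-filehandle concern; it only covers the row filtering.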

🤔

Anyway, the PR looks good to me! I caught a few OLS3 links and replaced them with the actual PURLs, since OLS3 is being deprecated at the end of the month.
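
For anyone doing a similar link cleanup in bulk, here is a rough sketch. It assumes the lookup-service links carry the term's IRI in an `iri` query parameter, which is an assumption about the link shape and is not confirmed in this thread. The function name and the example URL (with placeholder hosts and a placeholder term ID) are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def iri_from_lookup_url(url: str) -> Optional[str]:
    """Return the decoded value of the 'iri' query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)  # parse_qs percent-decodes values
    values = query.get("iri")
    return values[0] if values else None


# Placeholder URL for illustration; not a real term or service.
example = (
    "https://lookup.example.org/terms"
    "?iri=http%3A%2F%2Fpurl.example.org%2Fobo%2FEXAMPLE_0000001"
)
print(iri_from_lookup_url(example))
print(iri_from_lookup_url("https://lookup.example.org/terms"))
```

Links that do not follow this shape return `None` and would need to be fixed by hand.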


@allaway allaway left a comment

Approved because LGTM, but if you could commit my suggested changes first that would be great! :)

modules/Data/Data.yaml (3 review threads, outdated and resolved)
@anngvu anngvu merged commit dc639b3 into main Oct 13, 2023
@anngvu anngvu deleted the patch/seq-and-processing branch October 13, 2023 15:36