PureCN interval files with mappability information were generated with the following commands. The interval files used in DepMap data generation can also be found in gs://ccleparams/references/PureCN_intervals.
WES AGILENT
Rscript $PURECN/IntervalFile.R --in-file agilent_hg38_lifted_chrXY.no_header.bed --fasta ~/Data/VCFs/Liftover/hg38.fa --out-file agilent_hg38_intervals.txt --genome hg38 --mappability GCA_000001405.15_GRCh38_no_alt_analysis_set_100.bw
WES ICE
Rscript $PURECN/IntervalFile.R --in-file ice_hg38_lifted_chrXY.no_header.bed --fasta ~/Data/VCFs/Liftover/hg38.fa --out-file ice_hg38_intervals.txt --genome hg38 --mappability GCA_000001405.15_GRCh38_no_alt_analysis_set_100.bw
WGS
Here we uniformly sample 2% of the genome to use for absolute copy number inference.
Rscript make_wgs_intervals.R
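make_wgs_intervals.R itself is not reproduced here. As a rough illustration only, the Python sketch below shows one way to select a uniform 2% of the genome. It tiles each chromosome with evenly spaced, fixed-width windows so that the windows cover about 2% of the chromosome's length. Several details are assumptions and are not taken from the actual script: the 1 kb window width, the evenly spaced (rather than random) placement, and the function names `sample_genome` and `to_bed`. In practice, chromosome sizes would come from the FASTA index or a chrom.sizes file.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def sample_genome(chrom_sizes: Dict[str, int], fraction: float = 0.02,
                  width: int = 1000) -> List[Interval]:
    """Return evenly spaced windows covering roughly `fraction` of each chromosome.

    Hypothetical sketch: the real make_wgs_intervals.R may sample differently.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Distance between window starts: a width-bp window every `step` bp
    # covers width / step = fraction of the sequence.
    step = round(width / fraction)
    intervals: List[Interval] = []
    for chrom, size in chrom_sizes.items():
        # Only keep windows that fit entirely inside the chromosome.
        for start in range(0, size - width + 1, step):
            intervals.append((chrom, start, start + width))
    return intervals


def to_bed(intervals: List[Interval]) -> str:
    """Format intervals as tab-separated BED lines."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in intervals)
```

For example, with `{"chr1": 1_000_000, "chr2": 500_000}` and the defaults, the sketch places one 1 kb window every 50 kb. The result is 30 windows covering 30 kb, which is 2% of the 1.5 Mb total. Writing `to_bed(...)` to a file would produce a BED that could then be passed to IntervalFile.R.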
We have also created new intervals with the recommended PureCN workflow, including mappability information for both the WGS and WES intervals:
Rscript $PURECN/IntervalFile.R --in-file wgs_hg38_intervals.bed --fasta ~/Data/VCFs/Liftover/hg38.fa --out-file wgs_hg38_2_percent_intervals.txt --genome hg38 --mappability GCA_000001405.15_GRCh38_no_alt_analysis_set_100.bw
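The three IntervalFile.R invocations above differ only in their input BED and output file. A hypothetical Python helper that assembles them from the same flags could look like the sketch below. The names `interval_file_cmd`, `run_all` and `PANELS` are illustrative only, and no flags beyond those in the commands above are used. Because `subprocess.run` with an argument list does not go through a shell, the `~` in the FASTA path has to be expanded explicitly.

```python
import os
import shlex
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple

# Reference files as they appear in the commands above; adjust to your setup.
FASTA = "~/Data/VCFs/Liftover/hg38.fa"
MAPPABILITY = "GCA_000001405.15_GRCh38_no_alt_analysis_set_100.bw"

# Hypothetical lookup: panel name -> (input BED, output interval file).
PANELS: Dict[str, Tuple[str, str]] = {
    "agilent": ("agilent_hg38_lifted_chrXY.no_header.bed", "agilent_hg38_intervals.txt"),
    "ice": ("ice_hg38_lifted_chrXY.no_header.bed", "ice_hg38_intervals.txt"),
    "wgs": ("wgs_hg38_intervals.bed", "wgs_hg38_2_percent_intervals.txt"),
}


def interval_file_cmd(purecn_dir: str, in_file: str, out_file: str,
                      fasta: str = FASTA, mappability: str = MAPPABILITY,
                      genome: str = "hg38") -> List[str]:
    """Build the IntervalFile.R argument list used in the commands above."""
    return [
        "Rscript", str(Path(purecn_dir) / "IntervalFile.R"),
        "--in-file", in_file,
        # No shell is involved, so expand "~" here.
        "--fasta", os.path.expanduser(fasta),
        "--out-file", out_file,
        "--genome", genome,
        "--mappability", mappability,
    ]


def run_all(purecn_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Print every panel's command and, unless dry_run, execute it."""
    cmds = []
    for bed, out in PANELS.values():
        cmd = interval_file_cmd(purecn_dir, bed, out)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failing run
        cmds.append(cmd)
    return cmds
```

Calling `run_all(os.environ["PURECN"])` only prints the commands. Passing `dry_run=False` executes them, which requires Rscript and PureCN to be installed.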
Workflow run for each data type:
AGILENT: PureCN-AGILENT
ICE: PureCN-AGILENT
WGS: PureCN
About 10% of PureCN calls need to be manually curated. The PureCN_Curation notebook should be used to select the solutions that require curation, download the solution PDFs, and then update the Terra workspace to reflect manual changes. Detailed curation guidelines can be found here.
Once manual curation is complete, the PureCN output files need to be updated to reflect the newly selected solutions. To do this, run the PureCN_update_solution workflow on the curated samples.
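Whole-genome doubling (WGD) is called using the formula -2*loh_frac + 3 < ploidy. In other words, a sample is called WGD when its ploidy exceeds 3 - 2*loh_frac. The call_wgd.R script performs this step and is part of both the PureCN and PureCN_update_solution workflows.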
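The WGD rule is simple enough to check by hand. Below is a minimal Python sketch of the same inequality. It assumes that `loh_frac` is a fraction between 0 and 1. The function name `is_wgd` and the range check are hypothetical and are not taken from call_wgd.R.

```python
def is_wgd(ploidy: float, loh_frac: float) -> bool:
    """Apply the WGD rule -2*loh_frac + 3 < ploidy (strict inequality)."""
    # Assumption: loh_frac is a fraction, so values outside [0, 1] are rejected.
    if not 0.0 <= loh_frac <= 1.0:
        raise ValueError("loh_frac must be between 0 and 1")
    threshold = 3 - 2 * loh_frac  # ploidy must exceed this to be called WGD
    return ploidy > threshold
```

A sample with no LOH needs a ploidy above 3 to be called WGD, and the threshold drops as LOH increases. For example, with `loh_frac = 0.25` the threshold is 2.5, so `is_wgd(2.6, 0.25)` is `True` and `is_wgd(2.4, 0.25)` is `False`. Because the comparison is strict, a ploidy exactly equal to the threshold is not called WGD.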