Skip to content

GC and mappability corrected fragment coverage for paired end whole genome sequencing

License

Notifications You must be signed in to change notification settings

mskilab-org/fragCounter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Build Status codecov.io

fragCounter

The goal of fragCounter is to correct Whole genome or targeted sequencing data for GC and mappability bias.
The GC bias curve is determined by loess regression of read count by GC and mappability scores.
Segmentation is done by circular binary segmentation (CBS) algorithm after getting tumor/normal ratios of corrected read counts.

Installation (R package)

  1. Install devtools from CRAN (if you don't have it already)
install.packages('devtools')                                                                                                                                                                                                                                                                                                
  1. Install fragCounter
devtools::install_github('mskilab/fragCounter')                                                                                                                                                                                                                                                                              

Example

(after installing R package) Add fragCounter directory to PATH and test the executable

$ export PATH=${PATH}:$(Rscript -e 'cat(paste0(installed.packages()["fragCounter", "LibPath"], "/fragCounter/extdata/"))')
$ frag -h ## to see the help message
$ ./frag -b inst/extdata/chr21.bam -d inst/extdata/gcMAP21/ -w 200  

Rprofile Loading                                                                                                                                                                                                                                                                                                            
Rprofile Finished Loading                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                            
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•—   β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ•—   β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—                                                                                                                                                                                                                               
β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•β•β• β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ•‘β•šβ•β•β–ˆβ–ˆβ•”β•β•β•β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—                                                                                                                                                                                                                              
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β–ˆβ–ˆβ•— β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•                                                                                                                                                                                                                              
β–ˆβ–ˆβ•”β•β•β•  β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•”β•β•β•  β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—                                                                                                                                                                                                                              
β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘ β•šβ–ˆβ–ˆβ–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘                                                                                                                                                                                                                              
β•šβ•β•     β•šβ•β•  β•šβ•β•β•šβ•β•  β•šβ•β• β•šβ•β•β•β•β•β•  β•šβ•β•β•β•β•β• β•šβ•β•β•β•β•β•  β•šβ•β•β•β•β•β• β•šβ•β•  β•šβ•β•β•β•   β•šβ•β•   β•šβ•β•β•β•β•β•β•β•šβ•β•  β•šβ•β•                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                            
Calling samtools view -f 0x02 -F 0x10 inst/extdata/chr21.bam -q 1 | cut -f "3,4,9" 
Starting fragment count on inst/extdata/chr21.bam and min mapQ 1 and   insert size limit 10000  
Finished computing coverage, and making GRanges  
Finished acquiring coverage 
Loaded GC and mappability  
length cov is 314827, length gc is 3663, length map is 3663   
Synced coverage, GC, and mappability   
Modified gc / mappability correction 
Converting to data.table 
Converting to GRanges                                                                                                                                                                                                                                                                                                       
Made GRanges

Usage (frag executable)

$ ./frag -h

Rprofile Loading                                                                                                                                                                                                                                                                                                            
Rprofile Finished Loading                                                                                                                                                                                                                                                                                                   
Usage: ./frag [options]                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                            
Options:                                                                                                                                                                                                                                                                                                                    
        -b BAM, --bam=BAM                                                                                                                                                                                                                                                                                                   
                Path to .bam or .cram file (if .cram file provided then -r
                argument must be supplied)                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                            
        -c COV, --cov=COV                                                                                                                                                                                                                                                                                                   
                Path to existing coverage rds or bedgraph                                                                                                                                                                                                                                                                   

        -r FASTA, --reference=FASTA
        
                Path to reference .fasta file - required only if cram file is provided.
                                                                                                                                                                                                                                                                                                                            
        -m MIDPOINT, --midpoint=MIDPOINT                                                                                                                                                                                                                                                                                    
                If TRUE only count midpoint if FALSE then count bin footprint of every fragment interval                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                            
        -w WINDOW, --window=WINDOW                                                                                                                                                                                                                                                                                          
                Window / bin size                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                            
        -d GCMAPDIR, --gcmapdir=GCMAPDIR                                                                                                                                                                                                                                                                                    
                Mappability / GC content dir                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                            
        -q MINMAPQ, --minmapq=MINMAPQ                                                                                                                                                                                                                                                                                       
                Minimal map quality                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                            
        -p PAIRED, --paired=PAIRED                                                                                                                                                                                                                                                                                          
                Is paired                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                            
        -e EXOME, --exome=EXOME                                                                                                                                                                                                                                                                                             
                Use exons as bins instead of fixed window                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                            
        -u USE.SKEL, --use.skel=USE.SKEL                                                                                                                                                                                                                                                                                    
                Use user defined regions instead of default exome skeleton                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                                                            
        -s SKELETON, --skeleton=SKELETON                                                                                                                                                                                                                                                                                    
                Path to skeleton file                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                            
        -o OUTDIR, --outdir=OUTDIR                                                                                                                                                                                                                                                                                          
                Directory to dump output into                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                                            
        -l LIBDIR, --libdir=LIBDIR                                                                                                                                                                                                                                                                                          
                Directory containing this R file                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                            
        -h, --help                                                                                                                                                                                                                                                                                                          
                Show this help message and exit
                

GC and Mappability files for 1 kb WGS (hg19)

These can be used as inputs to fragCounter to correct GC and Mappability biases. Make sure to save them in same directory and supplying the path to the directory. Retain the name of the file.

GC file: https://mskilab.s3.amazonaws.com/hg19/GCMAP/gcmap.19/gc1000.rds

Mappability file: https://mskilab.s3.amazonaws.com/hg19/GCMAP/gcmap.19/map1000.rds

GC and Mappability files for hg38 WGS (200bp)

GC file: https://mskilab.s3.amazonaws.com/hg38/GCMAP/gcmap.38/gc200.rds

Mappability file: https://mskilab.s3.amazonaws.com/hg38/GCMAP/gcmap.38/map200.rds

GC and Mappability files for hg38 WGS (1000bp)

GC file: https://mskilab.s3.amazonaws.com/hg38/GCMAP/gcmap.38/gc1000.rds

Mappability file: https://mskilab.s3.amazonaws.com/hg38/GCMAP/gcmap.38/map1000.rds

About

GC and mappability corrected fragment coverage for paired end whole genome sequencing

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages