-
Notifications
You must be signed in to change notification settings - Fork 17
Finding OK seq strand switched from the RFD track
In this section we will use a very simple script to find OK-seq strand switched for visualization in IGV. Please note that this script is was written in R using the rtracklayer package, so please cite the package when using it.
WARNING: this is a very simplified approach for getting approximately good results, for more accurate results use the original method proposed in the paper by Petryk et al. 2018.
These are the arguments for the script:
usage: OKseq_switches.R [-h] [-i INPUT] [-b BINSIZE] [-l LEFTMIN]
[-r RIGHTMIN] [-d DIFF] [-s SLIDESIZE]
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input OK-seq bigWig file.
-b BINSIZE, --binsize BINSIZE
Bin size [default 50000]
-l LEFTMIN, --leftmin LEFTMIN
Mean value of bin on left side [default -0.15]
-r RIGHTMIN, --rightmin RIGHTMIN
Mean value of bin on right side [default 0.15]
-d DIFF, --diff DIFF Minimum difference between left and right side means
[default 0.4]
-s SLIDESIZE, --slidesize SLIDESIZE
Minimum difference between left and right side means
[default 50000]
The input bigWig is specified with "-i ".
With "--binsize" you can set what is the size of one bin. The positions of bins is summarized in the figure below.
In terms of scoring, the "--leftmin" parameter sets the mean maximum value for the bin on the left (meaning the score has to be below this), the "--rightmin" parameter sets the minimum value of the bin on the right. The "--diff" parameter sets what is the minimum score difference between the left- and right bins. A region is accepted if:
1) left-bin score <= leftmin
2) right-bin score >= rightmin
3) (right-bin score) - (left-bin score) > diff
The "--slidesize" parameter sets the sliding window steps, by default every 50k bases are examined.
An RFD track in bigWig format created by BAMscale. In this example we will find OK-seq strand switches obtained form here
We added an R script to the main page. This script takes as input one argument, which is the RFD track in bigWig format created by BAMscale.
You will need R, and the rtracklayer package. See page for installation.
If you have Bioconductor already installed, then simply open an R session and run:
BiocManager::install("rtracklayer")
Additionally the argparse package is needed to parse the command line arguments of the script. This can be installed in an R session:
install.packages("argparse")
In this example we will reanalyze OK-seq data from K562
After alignment and RDF track creation with BAMscale, the OK-seq strand switches can be found by running:
Rscript R/OKseq_switches.R -i ERR2760956_K562_OK-seq_BR1.rfd.bw
This command will create a BED file named ERR2760956_K562_OK-seq_BR1.rfd.OKseq_switches.bed
Here is a visualization of the output bigwig file and replication timing segments:
In this example we will reanalyze OK-seq data from the example in the OK-seq manual or BAMscale.
The OK-seq strand switches for this example can be found by running can be found by running:
Rscript R/OKseq_switches.R -i SRR7440353.sorted.dedup.rfd.bw
This command will create a BED file named SRR7440353.sorted.dedup.rfd.OKseq_switches.bed
Here is a visualization of the output bigwig file and replication timing segments:
Since the approach of the script is simple, you can see in the example that not all regions are identified correctly (left side). This can be fixed by increasing the bin size and adjusting the bin score thresholds.
- Main page
- Home
- Installation
- Benchmarking
- Brief examples
- Detailed Manuals
- Visualization scripts