Input data
The sample info file includes details about the samples that are processed by abemus.
The format is simply 5 columns, tab-delimited, and there is no column header.
column 1: The patient ID.
column 2: The tumor sample ID.*
column 3: The full path to the tumor sample BAM file.*
column 4: The matched-tumor control sample ID.
column 5: The full path to the matched-tumor control sample BAM file.
* This field must be unique.
- What if you have tumors without matched-control samples?
  abemus also calls somatic SNVs in tumors without matched-control samples. Keep these tumors in the sample info file and fill the corresponding control columns with NA.
- What if you have controls without matched-tumor samples?
  abemus uses control samples to build a global sequencing-error (GSE) distribution. Keep control samples without a matched tumor in the sample info file and fill the corresponding tumor columns with NA.
Here is an example of a valid sample info file:
PT01 TUMOR_A /my_project/data/TUMOR_A.bam CTRL_A /my_project/data/CTRL_A.bam
PT01 TUMOR_B /my_project/data/TUMOR_B.bam CTRL_B /my_project/data/CTRL_B.bam
PT02 TUMOR_C /my_project/data/TUMOR_C.bam CTRL_C /my_project/data/CTRL_C.bam
PT03 NA NA CTRL_D /my_project/data/CTRL_D.bam
PT04 TUMOR_E /my_project/data/TUMOR_E.bam NA NA
Control samples CTRL_A, CTRL_B, CTRL_C and CTRL_D will be used to build the GSE distributions.
Tumor samples TUMOR_A, TUMOR_B, TUMOR_C and TUMOR_E will be investigated to check for somatic SNVs.
Calls in TUMOR_A, TUMOR_B and TUMOR_C will be refined by exploiting tumor-control matched information.
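The rules above can be checked before running the pipeline. The following is a minimal base R sketch, not part of abemus: the helper names `read_sample_info()` and `classify_samples()` and the column labels are hypothetical and chosen only for illustration. It reads a 5-column, tab-delimited file without a header, checks that columns 2 and 3 are unique, and groups the samples by the role described above.

```r
# Read a 5-column, tab-delimited sample info file without a header.
# Column labels are illustrative only; abemus does not define them.
read_sample_info <- function(path) {
  info <- read.table(path, sep = "\t", header = FALSE,
                     colClasses = "character", na.strings = "NA",
                     col.names = c("patient", "tumor_id", "tumor_bam",
                                   "control_id", "control_bam"))
  # Columns 2 and 3 must be unique (NA entries are ignored).
  for (col in c("tumor_id", "tumor_bam")) {
    values <- info[[col]][!is.na(info[[col]])]
    if (anyDuplicated(values) > 0) {
      stop("Duplicated values in column: ", col)
    }
  }
  info
}

# Group samples by the role they play according to the rules above.
classify_samples <- function(info) {
  has_tumor <- !is.na(info$tumor_id)
  has_control <- !is.na(info$control_id)
  list(
    controls_for_gse = info$control_id[has_control],
    tumors_to_call   = info$tumor_id[has_tumor],
    matched_tumors   = info$tumor_id[has_tumor & has_control]
  )
}

# Usage: write the example rows to a temporary file and classify them.
example_rows <- c(
  "PT01\tTUMOR_A\t/my_project/data/TUMOR_A.bam\tCTRL_A\t/my_project/data/CTRL_A.bam",
  "PT01\tTUMOR_B\t/my_project/data/TUMOR_B.bam\tCTRL_B\t/my_project/data/CTRL_B.bam",
  "PT02\tTUMOR_C\t/my_project/data/TUMOR_C.bam\tCTRL_C\t/my_project/data/CTRL_C.bam",
  "PT03\tNA\tNA\tCTRL_D\t/my_project/data/CTRL_D.bam",
  "PT04\tTUMOR_E\t/my_project/data/TUMOR_E.bam\tNA\tNA"
)
sample_info_file <- tempfile(fileext = ".tsv")
writeLines(example_rows, sample_info_file)
groups <- classify_samples(read_sample_info(sample_info_file))
print(groups)
```

Reading every column as character keeps numeric-looking patient IDs intact, and `na.strings = "NA"` turns the NA placeholders into real missing values so the grouping can rely on `is.na()`.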
abemus looks for SNVs in genomic regions of interest. These genomic regions must be provided in the tab-delimited BED format and sorted (for example with bedtools sortBed). There is no column header.
The 3 required BED fields are:
column 1: Chromosome name ("chr" annotation must be consistent with the one in BAM file).
column 2: Starting position of the genomic region.
column 3: Ending position of the genomic region.
The start position is 0-based and the end position is not included in the region. For example, the first 100 bases of a chromosome are defined as start_base=0, end_base=100, and span the bases numbered 0-99.
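The coordinate convention described above can be illustrated with a short base R sketch. The helper names are hypothetical and not part of abemus. The sortedness check is deliberately weaker than a full sortBed comparison: it only verifies that each chromosome forms one contiguous block and that start positions do not decrease within a chromosome.

```r
# Length of half-open BED intervals [start, end): the end base is excluded.
bed_region_length <- function(start, end) {
  stopifnot(all(start >= 0), all(end > start))
  end - start
}

# 1-based, inclusive coordinates of the bases covered by one BED interval.
bed_to_one_based <- function(start, end) {
  c(first = start + 1, last = end)
}

# Weak sortedness check: chromosomes appear as contiguous blocks and
# start positions are non-decreasing within each chromosome.
is_bed_grouped_and_sorted <- function(chrom, start) {
  contiguous <- !anyDuplicated(rle(chrom)$values)
  starts_sorted <- all(tapply(start, chrom, function(s) !is.unsorted(s)))
  contiguous && starts_sorted
}

# Usage: the first 100 bases of a chromosome.
bed_region_length(0, 100)   # 100
bed_to_one_based(0, 100)    # first = 1, last = 100
is_bed_grouped_and_sorted(c("chr1", "chr1", "chr3"), c(10, 50, 5))
```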
pacbam bed=regions.bed vcf=snps.vcf fasta=hg19.fasta strandbias mode=5 out=PaCBAM_outdir
Pileup data from the PaCBAM tool are split by chromosome in order to speed up the computational workflow. This task can be achieved with the abemus built-in function split_pacbam_bychrom().
Usage (?split_pacbam_bychrom):
split_pacbam_bychrom(targetbed = "/my_project/info/regions.bed",
pacbamfolder = "/my_project/data/PaCBAM_outdir",
pacbamfolder_bychrom = "/my_project/data/PaCBAM_outdir_bychrom")
targetbed is the tab-delimited BED file with the targeted genomic regions.
pacbamfolder is the folder in which the original .pileup and .pabs output data from PaCBAM are saved.
Output data will be written to the indicated pacbamfolder_bychrom, which will contain a subfolder for each sample (both tumors and controls) with .pileup and .pabs data split by chromosome:
pacbamfolder_bychrom/
    SAMPLE_id/
        pileup/
            chr1.pileup
            chr3.pileup
            ...
        snvs/
            chr1.pabs
            chr3.pabs
            ...
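To inspect the result, the per-chromosome files of one sample can be listed with base R. This is a sketch that assumes the folder layout shown above. The helper `list_bychrom_files()` is hypothetical and not part of abemus, and the usage example builds an empty mock layout in a temporary directory instead of running PaCBAM.

```r
# List the per-chromosome files of one sample, assuming the layout
# <pacbamfolder_bychrom>/<sample_id>/{pileup,snvs}/ described above.
list_bychrom_files <- function(pacbamfolder_bychrom, sample_id) {
  sample_dir <- file.path(pacbamfolder_bychrom, sample_id)
  list(
    pileup = list.files(file.path(sample_dir, "pileup"), pattern = "\\.pileup$"),
    snvs   = list.files(file.path(sample_dir, "snvs"), pattern = "\\.pabs$")
  )
}

# Usage on a mock layout with empty placeholder files.
root <- file.path(tempfile(), "PaCBAM_outdir_bychrom")
for (sub in c("pileup", "snvs")) {
  dir.create(file.path(root, "TUMOR_A", sub), recursive = TRUE)
}
file.create(file.path(root, "TUMOR_A", "pileup", c("chr1.pileup", "chr3.pileup")))
file.create(file.path(root, "TUMOR_A", "snvs", c("chr1.pabs", "chr3.pabs")))
found <- list_bychrom_files(root, "TUMOR_A")
print(found)
```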
The pacbamfolder_bychrom output directory is not created by the split_pacbam_bychrom() function; make sure to create it in advance. Only data present in the pacbamfolder_bychrom folder will be considered in the downstream analysis.
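One way to create the output directory in advance is sketched below in base R. The helper `ensure_dir()` is hypothetical, and the example uses a temporary path in place of a real project folder. The call to split_pacbam_bychrom() is left as a comment because it requires abemus and real PaCBAM output.

```r
# Create a directory (and any missing parents) if it does not exist yet.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Usage: prepare the output folder before splitting by chromosome.
outdir <- file.path(tempdir(), "PaCBAM_outdir_bychrom")
ensure_dir(outdir)

# With abemus loaded and real data in place, the call would then be:
# split_pacbam_bychrom(targetbed = "/my_project/info/regions.bed",
#                      pacbamfolder = "/my_project/data/PaCBAM_outdir",
#                      pacbamfolder_bychrom = outdir)
```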
Department of Cellular, Computational and Integrative Biology (CIBIO) at University of Trento, Italy.