Skip to content

Methods and Parameter

Fritz Sedlazeck edited this page Nov 1, 2018 · 7 revisions

Detailed description of the methods and the parameter available

SURVIVOR incorporates multiple options to simulate/evaluate or simply analyze Structural variations. To use a specific module use:

 SURVIVOR ID 

If you have further questions please contact me: [email protected]

1. Simulate SV

SURVIVOR requires a reference file and a parameter file (e.g. par) to simulate SVs. A template par file can be generated by SURVIVOR itself. Each SV is simulated according to the size constrains and the number of events that the user specifies in the parameter file.

SURVIVOR incorporates two modi of operation which affects the coordinates that are reported per SVs. First, for simulating the reads that incorporate SVs. (Parameter 3 =0). Second, for the scenario that the reference should be altered and the SV should be detected using e.g. real reads (Parameter 3 =1).

In the end, SURVIVOR reports the so altered reference (*.fasta) and the file that summarizes the events that have been simulated. The file an be read as follows: A standart case is a start and stop coordinate plus the type of SV that was simulated. E.g a deletion ranging from 44432754 to 44432861 on chr 22.

22 44432754 22 44432861 DEL

Sniffles simulates translocation events always as balanced transloactions exchanging two regions. Thus, each TRA event has two entries:

22 31778779 21 40771179 TRA 22 31778881 21 40771281 TRA

A SV caller should identify both translocations from 31778779 on chr 22 to 40771179 on chr 21 as well as 31778881 on 22 to 40771281 on chr 21.

SURVIVOR is further able to simulate more nested events. In this case each event is reported separately:

21 35669707 21 35669793 INV 21 35669697 21 35669707 DEL 21 35669793 21 35669803 DEL

This represents an invdel : inversion flanked by two deletions on chr21.

2. Simulate reads

SURVIVOR incorporates a screening as well as simulation module.

2.1 Error profile generation

The screening module uses a streamed sam file and extracts the underlying read length distribution as well as the rations of match, mismatch, insertion and deletion per read position. The screening can be run like this:

samtools view your.bam | ~/workspace/SURVIVOR/Debug/SURVIVOR 2 scan 1000 your_error_profile_bwa.txt 

where 1000 is the minimum alignment length to consider. Note that the process can be speed up by subsampling alignments either like this:

 samtools view -s 0.1 your.bam | ~/workspace/SURVIVOR/Debug/SURVIVOR scanreads 1000 your_error_profile_bwa.txt

or by using head:

 samtools view -s 0.1 your.bam | head -n 10000 |  ~/workspace/SURVIVOR/Debug/SURVIVOR scanreads 1000 your_error_profile_bwa.txt

2.2 Read simulation

If you want to skip the previous part: I put two error profiles for Oxford NAnopore (based on NA12878) and Pacbio (HG002 GiaB) into the folder.

The read simulation takes the error profiles and a reference genome of your interest. You only need to specify the coverage that you want to simulate and every other parameter is taking form the previously obtained error profile.

3. Evaluate SV calls

After you obtained your call set based on the simulation (see option 1) you can use this to evaluate the performance of your SV method. You will need your vcf file, the before generated list of simulated SV, the maximum distance that you want to allow and a prefix for the output. SURVIVOR compares the type as well as the locations of the SVs simulated and called and reports correctly and additional found SVs in separate vcf files:

*_right.vcf *addition.vcf

Furthermore, SURVIVOR prints on the command line the number of SV per category:

DEL/DUP/INV/TRA/INS for found, missing and additional as well as the sensitivity as well as the false discovery rate.

The reads are reported with the read name indicating the lest most start position of the read and its strand.

4: Merge or consensus calling from multiple SV VCF files

SURVIVOR can be used to compare and obtain a consensus call set from these vcf files. Note in general SURVIVOR can incorporate any technology or other callers as long as the SVs are represented as a vcf file.

First, we collect all files that we want to merge in a file.

ls *vcf > sample_files

Next, we use SURVIVOR to obtain a call set.

./SURVIVOR merge sample_files 1000 2 1 1 0 30 sample_merged.vcf

This will merge all the vcf files specified in sample_files together using a maximum allowed distance of 1kb. Furthermore we ask SURVIVOR only to report calls supported by 2 callers and they have to agree on the type (1) and on the strand (1) of the SV. Note you can change this behavior by altering the numbers from 1 to e.g. 0. In addition, we told SURVIVOR to only compare SV that are at least 30bp long and print the output in sample_merged.vcf.

For simply merging different SV set one can set the minimum required support by caller to 1 instead of 2 and retrieve a union set across all samples.

SURVIVOR reports overlapping tags with SUPP= and SUPP_VEC= which represents the number of samples carrying the SV and which sample (0/1 per sample in the same order as the genotype columns). NOTE ./. or 0/0 is not counted as supporting a variant.

5: Filter SV calls

This module helps to filter down raw SVs calls based on multiple features.

~/workspace/SURVIVOR/Debug/SURVIVOR filter my_SV.vcf bad_regions.bed 50 -1 0.01 10 my_SV.filterd.vcf 

This module filters an existing VCF file based on multiple options. It can filter out SVs overlapping with certain regions ("bad_region.bed"), requiring certain size ranges (min 50bp, max not defined -1), certain allele frequency (here 0.01 -> 1%AF) and that the event is at least supported by n (here 10) reads. The output will be written in my_SV.filtered.vcf (as an example).

6: Converts a vcf SV calls file to BEDPE file

This module takes any vcf file and converts the content to a BEDPE file giving a minimum and maximum size of a SV.

~/workspace/SURVIVOR/Debug/SURVIVOR vcftobed my_SV.vcf 0 -1 output.bedpe 

You can disable the filtering by setting it to -1;

7: Convert a bed file to VCF file

This module helps to convert a bed file (consisting of chr\tstart\tstop) into a vcf file assuming all entries of the bed file are of a certain SV type.

 ~/workspace/SURVIVOR/Debug/SURVIVOR bedtovcf input.bed DEL out_del.vcf 

The resulting vcf file can be used by SURVIVOR or by any other method.

8: Summarizes a coverage file to bed file

This option is used to cluster positions with bad mapping quality to regions for filtering out SVs that could be false positives. The coverage file is assumed to be from samtools for example:

 samtools depth your_MQ0.bam > your_MQ0.cov
 ~/workspace/SURVIVOR/Debug/SURVIVOR stats your_MQ0.cov 1000 2 > my_MQ0_cov.bed 

Here we cluster together any positions that have at least 2 reads with MQ0 and are not further then 1kbp apart.