Tools to create:
- a .png image file showing all variants (obtained from the vardict-java variant caller) along a genome/assembly (to be provided) with their proportions (y-axis) and with CDS descriptions (obtained from the vadr annotator). The coverage depth distribution can be displayed at the top of the figure (if the -o cov_depth_f option is provided).
Python/R scripts and Galaxy wrapper to use them.
It uses the results of:
- vadr >= 1.4.1 for annotation (of reference/assembly, tested with vadr 1.6.4 too)
- vardict-java 1.8.3 for variant calling (of BAM alignment using reference/assembly and reads)
- vvv2_display.py: main script running each step of the analysis. This script can be run independently, once the vvv2 conda environment is installed and activated. Type ./vvv2_display.py then Enter to get help on how to use it.
- PYTHON_SCRIPTS/convert_tbl2json.py: Convert vadr annotation output .tbl file to json
- PYTHON_SCRIPTS/convert_vcffile_to_readablefile.py: Convert vardict-java variant calling vcf file to human readable txt file
- PYTHON_SCRIPTS/correct_multicontig_vardict_vcf.py: Correct vadr annotation output .tbl file for contigs positions when the assembly provided is composed of more than one contig.
- R_SCRIPTS/visualize_snp_v4.R: Create a .png file showing on the same figure:
  - coverage depth distribution along the genome/assembly (if -o cov_depth_d option provided)
  - variant proportions along the genome/assembly and CDS positions.
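To illustrate the kind of conversion that convert_vcffile_to_readablefile.py is described as doing, here is a minimal Python sketch. It is not the script's actual code. It assumes the allele frequency is stored under the AF key of the VCF INFO column; the function names, the contig name, and the DP value in the example line are hypothetical.

```python
def vcf_line_to_row(line):
    """Turn one VCF data line into (chrom, position, ref, alt, freq)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    info = fields[7]
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags carry no '=').
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    # Assumption: the variant frequency is available as AF in INFO.
    freq = float(info_map["AF"])
    return chrom, int(pos), ref, alt, freq


def vcf_to_readable(lines):
    """Return tab-separated text rows for every non-header VCF line."""
    rows = ["chrom\tposition\tref\talt\tfreq"]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, ref, alt, freq = vcf_line_to_row(line)
        rows.append(f"{chrom}\t{pos}\t{ref}\t{alt}\t{freq}")
    return rows


# Made-up example line (hypothetical contig name and DP value).
example = "contig1\t6388\t.\tA\tG\t100\tPASS\tDP=70;AF=0.1429\n"
for row in vcf_to_readable(["##fileformat=VCFv4.2\n", example]):
    print(row)
```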
Use conda environment:
conda create -n vvv2_display -y
conda activate vvv2_display
mamba/conda install -c bioconda vvv2_display
Prefer mamba if you are installing into a completely new conda environment (it is faster). Do not mix mamba and conda.
Description:
vvv2_display.py -h
Typical usage:
vvv2_display.py -p res_vadr_pass.tsv -f res_vadr_fail.tsv -s res_vadr_seqstat.txt -n res_vardict_all.vcf -r res_vvv2_display.png -o cov_depth_f.txt -y -w 10
where:
- res_vadr_pass.tsv is the 'pass' file of vadr annotation program run on the genome/assembly
- res_vadr_fail.tsv is the 'fail' file of vadr annotation program
- res_vadr_seqstat.txt is the 'seqstat' file of vadr annotation program
- res_vardict_all.vcf is the result of vardict-java variant caller
- res_vvv2_display.png is the name of the main output file (will be created)
- cov_depth_f.txt is the coverage depth by position, provided by samtools depth run on the bam alignment file
- -y displays coverage depth in linear scale (default: log10 scale)
- -w 10 sets the variant significance threshold at 10% (default: 7%): the graphics display all variants, while the tsv summary keeps only significant ones (representation higher than this threshold)
All other options are for Galaxy wrapper compatibility (they are intermediate temporary files that must appear as parameters for the Galaxy wrapper but are not used in a usual command line call).
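The effect of the -w threshold on the tsv summary can be sketched in Python as follows. This is only an illustration of the rule stated above (keep variants whose frequency is higher than the threshold), not the code used by vvv2_display; the function name and the (position, freq) tuple layout are hypothetical, and the values are taken from the example summary in this README.

```python
def significant_variants(variants, threshold_percent=7.0):
    """Keep (position, freq) pairs whose freq is strictly above the threshold.

    threshold_percent mirrors the -w option (default 7%).
    """
    cutoff = threshold_percent / 100.0
    return [(pos, freq) for pos, freq in variants if freq > cutoff]


# A few (position, freq) pairs from the example tsv summary.
variants = [(6388, 0.1429), (6622, 0.0833), (7014, 0.8824), (19937, 0.0714)]

print(significant_variants(variants))        # default threshold (7%)
print(significant_variants(variants, 10.0))  # equivalent of -w 10
```

With the default threshold all four variants are kept; with a 10% threshold only positions 6388 and 7014 remain.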
The example below was obtained from Turkey Coronavirus sequencing data, using the first draft assembly as reference.
- png file:
Dotted vertical lines are contig boundaries.
- tsv summary file:
| indice | position | ref | alt | freq | gene | prot | lseq | rseq | isHomo* |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 6388 | A | G | 0.1429 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3 putative papain-like protease | GTATTGTAGAAATTGTGATG | GTATGGTCATCAAAATACAT | no |
| 2 | 6622 | A | G | 0.0833 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3 putative papain-like protease | GAAGAAAGCTGTTTTTCTTA | GGAAGCATTGAAATGTGAAC | no |
| 3 | 6838 | A | G | 0.1429 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3 putative papain-like protease | AGTTTGTGACATTTTGTCTA | TATAATTTCTGTAGATACTG | no |
| 4 | 7014 | R | A | 0.8824 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3 putative papain-like protease | TACCGTCATATGGTATAGAC | CTGATAAATTAACACCTCGT | no |
| 5 | 7833 | G | A | 0.0909 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP4 | ATTGTTTTAATGGTGATAAT | ATGCACCTGGAGCTTTACCA | no |
| 6 | 8110 | T | A | 0.0833 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP4 | AGAACTTATGTTTAATATGG | TAGTACATTCTTTACTGGTG | no |
| 7 | 9328 | A | G | 0.1034 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP5 putative 3C-like proteinase | TGCATTACACACTGGAACGG | CCTACATGGTGAGTTCTATG | no |
| 8 | 13404 | A | C | 0.1429 | intergene | intergene | GTTAGTGGGAACATCCAATA | TTTAGTTGATCTTAGAACGT | no |
| 9 | 15255 | A | T | 0.0882 | 1ab | similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase | CTGTGGTAATCATAAACCAA | GTTGTCAATACCGTTAGTAT | no |
| 10 | 15319 | C | T | 0.0769 | 1ab | similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase | TACAGGGCTAATTGTGCTGG | AGCGAAAATGTTGATGATTT | no |
| 11 | 15326 | A | G | 0.08 | 1ab | similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase | CTAATTGTGCTGGCAGCGAA | ATGTTGATGATTTTAATCAA | yes |
| 12 | 19937 | G | A | 0.0714 | 1ab | similar to ORF1ab polyprotein,similar to NSP16:GBSEP:putative 2-O-ribose methyltransferase | TAACAGAGACAAGTTGGCAC | AAAATTTATATGACATTGCA | no |
| 13 | 21092 | T | C | 0.0811 | S | similar to spike protein | TTACGTGGTGATAACACTGG | GTTTCTTATGATTATCAGTG | no |
| 14 | 25794 | TT | AA | 0.0838 | 5b | 5b protein | AGGATTAGATTGTGTTTACT | CTTAACAAAGCAGGACAAGC | no |
*NB: a homopolymer region is flagged 'yes' if there is a succession of at least 3 identical nucleotides.
This may look like a restrictive measure, but Ion Torrent and Nanopore sequencing are error-prone in such regions, so make sure you verify these variants.
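The "at least 3 identical nucleotides" rule can be sketched in Python as below. This is a generic run-length check only: how vvv2_display chooses the sequence window around a variant is not described here, so the window used in the example (end of lseq, ref base, start of rseq) and the function names are assumptions, and the sketch does not claim to reproduce the isHomo column.

```python
from itertools import groupby


def longest_run(seq):
    """Length of the longest stretch of identical consecutive nucleotides."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


def has_homopolymer(seq, min_run=3):
    """True if seq contains a run of at least min_run identical nucleotides."""
    return longest_run(seq) >= min_run


# Hypothetical window: last 5 bases of lseq + ref + first 3 bases of rseq
# for the variant at position 15326 in the example summary.
window = "GCGAA" + "A" + "ATG"
print(longest_run(window))
print(has_homopolymer(window))
print(has_homopolymer("GATGAGTA"))
```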
vvv2_display.xml: allows Galaxy integration of vvv2_display.py. vvv2_display can be used in Galaxy pipelines.
If you use vvv2_display and publish results, please cite:
- Lai, Zhongwu, Aleksandra Markovets, Miika Ahdesmaki, Brad Chapman, Oliver Hofmann, Robert McEwen, Justin Johnson, Brian Dougherty, J. Carl Barrett, and Jonathan R. Dry. “VarDict: A Novel and Versatile Variant Caller for next-Generation Sequencing in Cancer Research.” Nucleic Acids Research 44, no. 11 (June 20, 2016): e108–e108. https://doi.org/10.1093/nar/gkw227.
- Schäffer, Alejandro A., Eneida L. Hatcher, Linda Yankie, Lara Shonkwiler, J. Rodney Brister, Ilene Karsch-Mizrachi, and Eric P. Nawrocki. “VADR: Validation and Annotation of Virus Sequence Submissions to GenBank.” BMC Bioinformatics 21, no. 1 (December 2020): 211. https://doi.org/10.1186/s12859-020-3537-3.
- Flageul, Alexandre, Pierrick Lucas, Edouard Hirchaud, Fabrice Touzain, Yannick Blanchard, Nicolas Eterradossi, Paul Brown, and Béatrice Grasland. “Viral Variant Visualizer (VVV): A Novel Bioinformatic Tool for Rapid and Simple Visualization of Viral Genetic Diversity.” Virus Research 291 (January 2021): 198201. https://doi.org/10.1016/j.virusres.2020.198201.