A repository with working scripts for IJMS (MDPI) 2020 paper

lab7arriam/IJMS_2020

The Distribution of Several Genomic Virulence Determinants Does Not Corroborate the Established Serotyping Classification of Bacillus thuringiensis.

Reference

If you use the code or data from this repository, please cite: Shikov, A.E.; Malovichko, Y.V.; Lobov, A.A.; Belousova, M.E.; Nizhnikov, A.A.; Antonets, K.S. The Distribution of Several Genomic Virulence Determinants Does Not Corroborate the Established Serotyping Classification of Bacillus thuringiensis. Int. J. Mol. Sci. 2021, 22, 2244. https://doi.org/10.3390/ijms22052244

Contents

This repository contains scripts used for data preparation for the manuscript. Please consult the Methods section in the paper for extra details.

Scripts' description

  • agregate_cry_data.py - summarizes the spectra of 3-D Cry proteins found in the assemblies, based on CryProcessor output (Table S15 in the article);
  • agregate_flagelin_data.py, compare_flagellin_sets.py – aggregating the distribution of lengths and abundances of the flagellin sequences clustered via Roary (Table S10 in the article);
  • annotate_bt_assemblies.py – summarizing metadata of the assemblies inspected (Table S4 in the article);
  • bt_pangenome_tree.R – visualizing trees and heatmaps, performing PCA (Figures 3-5 in the article);
  • calculate_mash_dist.py – constructing a heatmap with paired mash distance scores for all analyzed genomes (Table S12 and Figure S6a in the article);
  • calculate_mean_support.py – calculating mean supporting values for phylogenetic trees in Newick format (for Table S8 in the article);
  • check_lengths.py, CheckLengths.py - calculate the length of each sequence in a FASTA file and print it in "sequence ID - sequence length" notation;
  • check_lengths_massive.py - runs check_lengths.py over a directory that contains only multi-FASTA files;
  • cluster_pivot_PCR.py - adds initial cluster names and cluster names for the obtained amplicons (not referred to in the final version of the manuscript);
  • compHmmToAnnot.py - parses a table containing names of Roary-deduced orthologs, then compares them to the entries stored in the HMM output folder and extracts sequences matching the original cluster sequence names from the HMM outputs folder to a separate directory;
  • compare_genomes_full.py - constructing a heatmap with paired genome identity values using minimap2 for all analyzed genomes (Table S12 and Figure S6b in the article);
  • download_hags.py - downloads the hag gene sequences from Xu and Côté, 2006 (DOI: 10.1128/AEM.00328-06);
  • downloading.sh – downloading Bt assemblies from the NCBI Assembly database;
  • extract_proteins_for_trees.py – extracting protein sequences from the Roary-generated pangenome for a specific gene cluster;
  • extract_proteins.py - extracts sequences from the FASTA files based on a file containing a list of identifiers;
  • extract_proteins_blast.py - extracts query identifiers from the previously filtered BLAST outputs and uses them to fetch protein sequences from the files;
  • ExtractByClusterName.py - extracts nucleotide sequences by the accessions stored in the Roary cluster table and fetches sequences from the cluster fasta files;
  • ExtractByLength.py - finds the longest/shortest sequence in the sequence file;
  • fetch_nucleotide.py - assigns sequences from the HMMer or BLAST output to the Roary cluster representatives and
  • get_mean_identity.py – evaluating mean paired sequence identity in the fasta file (for Table S8 in the article);
  • parse_tree_topology.py – assessing the lengths of subtrees containing representatives of Bt serovars (Table S14 in the article);
  • roary_stat.py, summ_roary_stat.py – excluding assemblies from the Roary-generated pangenome based on the abundance of common gene clusters;
  • summarize_proteins_from_dige.py – gathering the gene presence/absence results based on diamond blastp results (Table S6 in the article).
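
The length-reporting step described for check_lengths.py, and the longest/shortest lookup described for ExtractByLength.py, can be sketched as follows. This is a minimal illustration using only the standard library, not the code from the repository; the function name `fasta_lengths` and the example sequences are hypothetical.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence ID (first word of the header line) to its length."""
    lengths: Dict[str, int] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            lengths[current_id] = 0
        elif current_id is not None:
            # sequence may be wrapped over several lines, so accumulate
            lengths[current_id] += len(line)
    return lengths


# Hypothetical multi-FASTA content, for illustration only
example = """>seqA hypothetical flagellin
MAQVINTNSL
SLLTQNNL
>seqB
MKK
""".splitlines()

lengths = fasta_lengths(example)
for seq_id, length in lengths.items():
    print(f"{seq_id} - {length}")

# Longest and shortest sequence IDs, in the spirit of ExtractByLength.py
longest = max(lengths, key=lengths.get)
shortest = min(lengths, key=lengths.get)
```

To process a real file, pass an open file handle instead of the example list, since any iterable of lines is accepted.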
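
Calculating a mean support value for a Newick tree, as described for calculate_mean_support.py, might look like the sketch below. It assumes that support values are written as plain numeric labels directly after a closing parenthesis (the common convention) and that the tree has no quoted labels or scientific notation. The regular expression and the function names are assumptions made for illustration, not the repository's implementation.

```python
import re
from statistics import mean
from typing import List

# A numeric label right after ")" is treated as an internal-node support value.
# Branch lengths follow ":" and are therefore not matched.
SUPPORT_RE = re.compile(r"\)(\d+(?:\.\d+)?)")


def support_values(newick: str) -> List[float]:
    """Return all internal-node support values found in a Newick string."""
    return [float(value) for value in SUPPORT_RE.findall(newick)]


def mean_support(newick: str) -> float:
    """Average the support values; fail loudly if the tree has none."""
    values = support_values(newick)
    if not values:
        raise ValueError("no support values found in tree")
    return mean(values)


# Hypothetical four-leaf tree with two supported internal nodes
tree = "((A:0.1,B:0.2)95:0.3,(C:0.1,D:0.1)80:0.2);"
print(mean_support(tree))
```

A dedicated tree parser would be more robust for trees with quoted or non-numeric internal labels; the regular expression keeps the sketch free of external dependencies.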
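
Extracting sequences by a list of identifiers, as described for extract_proteins.py, can be illustrated with the following sketch. It assumes that an identifier is the first whitespace-delimited word of a FASTA header; the helper names `read_fasta` and `extract_by_ids` and the example records are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


def extract_by_ids(lines: Iterable[str], wanted: Set[str]) -> List[Tuple[str, str]]:
    """Keep records whose ID (first word of the header) is in `wanted`."""
    return [
        (header, seq)
        for header, seq in read_fasta(lines)
        if (header.split() or [""])[0] in wanted
    ]


# Hypothetical records and identifier list, for illustration only
fasta = [">p1 toxin", "MKT", "AAL", ">p2", "MGG", ">p3 other", "MLL"]
ids = {"p1", "p3"}

selected = extract_by_ids(fasta, ids)
for header, seq in selected:
    print(f">{header}\n{seq}")
```

In practice the identifier set would be read from the list file, one identifier per line, and the FASTA lines from an open file handle.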
