The Distribution of Several Genomic Virulence Determinants Does Not Corroborate the Established Serotyping Classification of Bacillus thuringiensis.

A repository with working scripts for the IJMS (MDPI) 2020 paper.

Reference

If you use the code/data from this repository, please cite: Shikov, A.E.; Malovichko, Y.V.; Lobov, A.A.; Belousova, M.E.; Nizhnikov, A.A.; Antonets, K.S. The Distribution of Several Genomic Virulence Determinants Does Not Corroborate the Established Serotyping Classification of Bacillus thuringiensis. Int. J. Mol. Sci. 2021, 22, 2244. https://doi.org/10.3390/ijms22052244

Scripts' description

agregate_cry_data.py - summarizes the spectra of 3-D Cry proteins found in the assemblies, based on CryProcessor results (Table S15 in the article);
agregate_flagelin_data.py, compare_flagellin_sets.py - aggregate the distribution of lengths and abundances of the flagellin sequences clustered via Roary (Table S10 in the article);
annotate_bt_assemblies.py - summarizes the metadata of the inspected assemblies (Table S4 in the article);
bt_pangenome_tree.R - visualizes trees and heatmaps and performs PCA (Figures 3-5 in the article);
calculate_mash_dist.py - constructs a heatmap of pairwise Mash distance scores for all analyzed genomes (Table S12 and Figure S6a in the article);
calculate_mean_support.py - calculates mean support values for phylogenetic trees in Newick format (for Table S8 in the article);
check_lengths.py, CheckLengths.py - calculate the length of each sequence in a FASTA file and print it in "sequence ID - sequence length" notation;
check_lengths_massive.py - runs check_lengths.py over a directory that contains only multi-FASTA files;
cluster_pivot_PCR.py - adds initial cluster names and cluster names for the obtained amplicons (not referred to in the final version of the manuscript);
compHmmToAnnot.py - parses a table containing the names of Roary-deduced orthologs, compares them to the entries stored in the HMM output folder, and extracts the sequences matching the original cluster sequence names from that folder to a separate directory;
compare_genomes_full.py - constructs a heatmap of pairwise genome identity values computed with minimap2 for all analyzed genomes (Table S12 and Figure S6b in the article);
download_hags.py - downloads the hag gene sequences from Xu and Côté, 2006 (DOI: 10.1128/AEM.00328-06);
downloading.sh - downloads Bt assemblies from the NCBI Assembly database;
extract_proteins_for_trees.py - extracts protein sequences from the Roary-generated pangenome for a specific gene cluster;
extract_proteins.py - extracts sequences from FASTA files based on a file containing a list of identifiers;
extract_proteins_blast.py - extracts query identifiers from previously filtered BLAST outputs and uses them to fetch protein sequences from the files;
ExtractByClusterName.py - extracts nucleotide sequences by the accessions stored in the Roary cluster table and fetches sequences from the cluster FASTA files;
ExtractByLength.py - finds the longest/shortest sequence in the sequence file;
fetch_nucleotide.py - assigns sequences from the HMMer or BLAST output to the Roary cluster representatives and
get_mean_identity.py - evaluates the mean pairwise sequence identity in a FASTA file (for Table S8 in the article);
parse_tree_topology.py - assesses the lengths of subtrees containing representatives of Bt serovars (Table S14 in the article);
roary_stat.py, summ_roary_stat.py - exclude assemblies from the Roary-generated pangenome based on the abundance of common gene clusters;
summarize_proteins_from_dige.py - gathers the gene presence/absence results based on DIAMOND blastp results (Table S6 in the article).
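
As an illustration of the length check described for check_lengths.py above, here is a minimal Python sketch. It is not the repository's implementation: the function names fasta_lengths and format_lengths, the rule that the sequence ID is the header text up to the first whitespace, and the exact "ID - length" separator are all assumptions made for this example.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence ID to the total length of its sequence lines.

    Hypothetical helper: the ID is assumed to be the header text
    between '>' and the first whitespace character.
    """
    lengths: Dict[str, int] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            current_id = header[0] if header else ""
            lengths[current_id] = 0
        elif current_id is not None:
            # Sequences may be wrapped over several lines, so accumulate.
            lengths[current_id] += len(line)
    return lengths


def format_lengths(lengths: Dict[str, int]) -> List[str]:
    """Render records as 'ID - length' (the separator is an assumption)."""
    return [f"{seq_id} - {length}" for seq_id, length in lengths.items()]


if __name__ == "__main__":
    # Toy records, not data from the article.
    toy = [">seqA toy record", "MKV", "LL", ">seqB", "MK"]
    for row in format_lengths(fasta_lengths(toy)):
        print(row)
```

To run the same check on a real file, an open file handle can be passed to fasta_lengths, since it only needs an iterable of lines.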
