This repository contains the code for taking viral metagenome contigs from human gut microbiomes, analyzing them for completeness, and then clustering the complete genomes. The steps and their scripts are:
- Extract pre-labeled metagenome contigs by country of origin - get_country_contigs.sh
- Extract phages from pre-labeled metagenome contigs - get_phages.sh
- Sort phages by length - sort_contigs.py
- Analyze phage contigs for completeness and lifestyle using CheckV - run_checkv.sh
- Extract only the complete phage genomes - make_complete_fasta.sh
- Remove host contamination from genomes using CheckV outputs - cut_complete_genomes.py
- Create cluster network for complete genomes using Prodigal for ORF calling and vConTACT2 for clustering based on proteins - run_prodigal_vcontact2.sh
- Edit network node table to include info about our genomes' time periods and lifestyles - annotate_network.py
- Lifestyle Frequencies By Time Period
- Network Colored by Time Period
  - grey = database
  - red = industrial
  - blue = pre-industrial
  - green = paleo
- Network Colored by Lifestyle
  - grey = database
  - purple = lytic
  - orange = temperate
- Edge Lengths By Time Period
- Edge Counts By Time Period
- Average Neighbor Composition By Time Period
- Outlier Frequency By Time Period
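
As an illustration of the kind of summary behind "Edge Counts By Time Period", here is a minimal sketch that tallies network edges by the time-period labels of their two endpoints. It is not the repository's own script: the function name `count_edges_by_period`, the toy genome IDs, and the choice to count unordered label pairs are all assumptions made for this example, and nodes without a label are assumed to be database references.

```python
from collections import Counter


def count_edges_by_period(edges, period_of, default="database"):
    """Count edges per unordered pair of time-period labels.

    edges:     iterable of (source, target) node-name pairs
    period_of: dict mapping a genome name to its time period
    default:   label for nodes missing from period_of (assumed to be
               reference genomes from the clustering database)
    """
    counts = Counter()
    for source, target in edges:
        a = period_of.get(source, default)
        b = period_of.get(target, default)
        # Sort the two labels so (a, b) and (b, a) share one key.
        counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical toy network; real edges would come from the clustering output.
edges = [("g1", "g2"), ("g1", "ref1"), ("g2", "g3"), ("g3", "ref1")]
period_of = {"g1": "industrial", "g2": "industrial", "g3": "paleo"}

edge_counts = count_edges_by_period(edges, period_of)
for pair, n in sorted(edge_counts.items()):
    print(pair, n)
```

Counting unordered pairs treats the network as undirected; if per-genome edge counts are wanted instead, the same loop could increment a counter keyed by each endpoint's label.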