Here lies the code for taking viral metagenome contigs from human gut microbiomes, analyzing them for completeness, and then clustering the complete genomes

michaeliter/viral_genome_clustering

viral_genome_clustering

Pipeline:

  1. Extract pre-labeled metagenome contigs by country of origin - get_country_contigs.sh
  2. Extract phages from pre-labeled metagenome contigs - get_phages.sh
  3. Sort phages by length - sort_contigs.py
  4. Analyze phage contigs for completeness and lifestyle using CheckV - run_checkv.sh
  5. Extract only the complete phage genomes - make_complete_fasta.sh
  6. Remove host contamination from genomes using CheckV outputs - cut_complete_genomes.py
  7. Create cluster network for complete genomes using Prodigal for ORF calling and vConTACT2 for clustering based on proteins - run_prodigal_vcontact2.sh
  8. Edit network node table to include info about our genomes' time periods and lifestyles - annotate_network.py
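Steps 3 and 5 above can be sketched in a few lines of Python. This is a minimal illustration, not the code in sort_contigs.py or make_complete_fasta.sh. It assumes that contigs are sorted longest first, that the first token of each FASTA header is the contig ID, and that completeness is read from a tab-separated CheckV-style summary with `contig_id` and `checkv_quality` columns. All function names and sample records are hypothetical.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sort_by_length(records):
    """Return records sorted longest first (assumed order for step 3)."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)


def complete_ids(quality_tsv):
    """Collect contig IDs labeled 'Complete' in a CheckV-style summary (step 5)."""
    reader = csv.DictReader(quality_tsv, delimiter="\t")
    return {row["contig_id"] for row in reader
            if row["checkv_quality"] == "Complete"}


def keep_complete(records, ids):
    """Keep records whose ID (first header token, an assumption) is in ids."""
    return [(h, s) for h, s in records if h and h.split()[0] in ids]


# Hypothetical in-memory inputs standing in for real files.
demo_fasta = io.StringIO(">c1 desc\nACGT\nAC\n>c2\nACGTACGTAC\n>c3\nACG\n")
demo_tsv = io.StringIO(
    "contig_id\tcheckv_quality\n"
    "c1\tComplete\n"
    "c2\tHigh-quality\n"
    "c3\tComplete\n"
)

sorted_records = sort_by_length(read_fasta(demo_fasta))
kept = keep_complete(sorted_records, complete_ids(demo_tsv))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

Parsing the FASTA by hand keeps the sketch dependency-free; a real run would more likely use an established parser and read from files on disk.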

Analyses:

  1. Lifestyle Frequencies by Time Period
     [Figure: lifestyle_freqs]
  2. Network Colored by Time Period
    • grey = database
    • red = industrial
    • blue = pre-industrial
    • green = paleo
     [Figure: time_w_db]
  3. Network Colored by Lifestyle
    • grey = database
    • purple = lytic
    • orange = temperate
     [Figure: life_w_db]
  4. Edge Lengths by Time Period
     [Figure]
  5. Edge Counts by Time Period
     [Figure]
  6. Average Neighbor Composition by Time Period
     [Figures: three panels]
  7. Outlier Frequency by Time Period
     [Figure]
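
As a rough illustration of the "Lifestyle Frequencies by Time Period" analysis, the sketch below tallies the share of each lifestyle within each time period. It assumes an annotated node table whose rows carry `time_period` and `lifestyle` fields; those field names, the function name, and the sample rows are hypothetical and are not taken from annotate_network.py.

```python
from collections import Counter, defaultdict


def lifestyle_frequencies(nodes):
    """Return {time_period: {lifestyle: fraction}} for an iterable of node dicts."""
    counts = defaultdict(Counter)
    for node in nodes:
        counts[node["time_period"]][node["lifestyle"]] += 1

    freqs = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        # Normalize the counts so fractions within each period sum to 1.
        freqs[period] = {life: n / total for life, n in counter.items()}
    return freqs


# Hypothetical rows standing in for the annotated network node table.
demo_nodes = [
    {"time_period": "industrial", "lifestyle": "lytic"},
    {"time_period": "industrial", "lifestyle": "lytic"},
    {"time_period": "industrial", "lifestyle": "temperate"},
    {"time_period": "industrial", "lifestyle": "temperate"},
    {"time_period": "paleo", "lifestyle": "temperate"},
]

for period, shares in lifestyle_frequencies(demo_nodes).items():
    print(period, shares)
```

Normalizing within each period, rather than across the whole table, keeps periods with different genome counts comparable.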
