The COVID-19 pandemic has profoundly impacted the world, with India being one of the hardest-hit countries. With approximately 40 million reported cases and around 570,000 severe cases, which often led to acute respiratory distress syndrome (ARDS), it became crucial to understand the genetic evolution of the virus and its impact on disease severity. This project aims to analyze the mutations in different COVID-19 strains and study how these mutations influence the severity of the infections. Our analysis includes SNP identification, mutation impact assessment, identification of the most frequently mutated genes, and the creation of phylogenetic trees categorized by country and strain.
The first step in our analysis involved the identification of Single Nucleotide Polymorphisms (SNPs) in the COVID-19 viral genome. This process included:
- Collecting COVID-19 genome sequences from global databases.
- Aligning these sequences to a reference genome.
- Identifying SNPs using bioinformatics tools.
Following SNP identification, we catalogued the mutations present in different strains. This step involved:
- Annotating the SNPs to determine the specific mutations (e.g., amino acid changes).
- Categorizing mutations based on their location within the viral genome.
To understand the impact of mutations, we identified which genes within the viral genome were most frequently mutated. This involved:
- Counting the number of mutations in each gene.
- Analyzing the functional impact of these mutations on the viral proteins and potential disease severity.
To visualize the evolutionary relationships between different strains, we constructed phylogenetic trees. This was done in two parts:
- Country-wise Phylogenetic Trees: These trees show the evolutionary relationships of strains within specific countries, highlighting the geographic spread and mutation patterns.
- Strain-wise Phylogenetic Trees: These trees display the relationships between different strains globally, providing insights into how specific mutations may correlate with disease severity.
- Bioinformatics Tools: Tools such as BlastN and ClustalW were used for genome alignment and SNP calling.
- Phylogenetic Analysis: MEGA were utilized for constructing phylogenetic trees.
- SNP Identification: We identified numerous SNPs across different COVID-19 strains.
- Mutation Analysis: Specific mutations were found to be prevalent in certain genes, some of which were correlated with increased severity.
- Gene Mutation Frequency: Genes such as the Spike (S) protein showed a high frequency of mutations.
- Phylogenetic Trees: The constructed trees provided a clear visualization of the evolutionary pathways and mutation patterns of the virus.
Our comprehensive analysis of COVID-19 mutations offers valuable insights into how genetic changes in the virus may influence disease severity. By understanding these mutations and their impacts, we can better inform public health strategies .
- Further Functional Studies: Investigating the biological impact of key mutations on viral behavior and host interactions.
- Enhanced Phylogenetic Analysis: Expanding the dataset to include more sequences for robust phylogenetic analysis.
- Clinical Correlation: Correlating mutation data with clinical outcomes to better understand the implications of specific genetic changes.