Portfolio for genomics projects that I have completed (Summary of each project in README)

sjcshin/genomics_projects

Genomics Projects

Most of these projects were completed as class projects under the supervision of Professor Wei-Yi Cheng for his course, Introduction to Genomics and Information Science, at Columbia University. All of the code is either entirely my own or was revised after receiving instructor feedback.

All data files were provided by the course instructor and are available on request.

The repository is constantly updated as I complete more class/personal projects.

Outline of Projects

  • Biomarker Identification - Mirdametinib [code]

    • Summary: Identified different biomarkers that have led to the discovery of Mirdametinib using approaches illustrated in the Barretina et al. paper. Used machine learning models to shortlist the most important features.
    • Tools: sklearn RandomForestRegressor, MAF files, CCLE mutation data
  • NBL Analysis 1 - Identifying Risk-Associated Mutations [code]

    • Summary: Studied the genomic data of neuroblastoma (NBL) samples to identify predictive mutations and gene expression features
    • Tools: Fisher's exact test, MAF files
  • COVID Spike Protein Alignment [code]

    • Summary: Performed a multiple sequence alignment of different COVID-19 spike protein sequences, created a phylogenetic tree of the different variants, and assigned variant identities
    • Tools: Biopython, Ipytree, MAFFT, FASTA files
  • Prostate Cancer Risk Screening Using Germline Variant Data [code]

    • Summary: Identified pathogenic germline variants from the ClinVar clinical variant database that could lead to prostate cancer, then screened the 1000 Genomes Project dataset to check whether the subjects carry those variants associated with a high risk of prostate cancer
    • Tools: scikit-allel, VCF, Zarr files
  • Calculating Mutation Load Using TCGA Breast Cancer Dataset [code]

    • Summary: Investigated how damage to the mismatch repair pathway could affect the number of mutations, also known as mutation load, in breast cancer.
    • Tools: MAF files, Mann-Whitney U test
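
To illustrate the kind of analysis the last project describes, here is a minimal standard-library sketch of comparing mutation load between samples with and without a mismatch repair (MMR) gene mutation using a Mann-Whitney U test. It is not the code from this repository. The gene set `MMR_GENES`, the toy records, and the helper names are all assumptions made for illustration, and the p-value uses a normal approximation that is only rough for samples this small.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

# Core mismatch repair genes. The gene set used in the project is an assumption.
MMR_GENES = {"MLH1", "MSH2", "MSH6", "PMS2"}


def mutation_load(records):
    """Count mutations per sample from (sample_barcode, gene_symbol) pairs."""
    return Counter(sample for sample, _gene in records)


def split_by_mmr_status(records):
    """Return (loads of MMR-mutated samples, loads of all other samples)."""
    load = mutation_load(records)
    mmr_samples = {sample for sample, gene in records if gene in MMR_GENES}
    mutated = [n for s, n in load.items() if s in mmr_samples]
    intact = [n for s, n in load.items() if s not in mmr_samples]
    return mutated, intact


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate p-value). No continuity correction.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    ranks = {}      # value -> mean rank shared by all tied observations
    tie_term = 0    # sum of t^3 - t over groups of t tied values
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Toy MAF-like records: (Tumor_Sample_Barcode, Hugo_Symbol). Made-up data.
TOY_RECORDS = [
    ("S1", "MLH1"), ("S1", "TP53"), ("S1", "PIK3CA"), ("S1", "GATA3"), ("S1", "KMT2C"),
    ("S2", "MSH6"), ("S2", "TP53"), ("S2", "CDH1"), ("S2", "MAP3K1"),
    ("S3", "TP53"), ("S3", "PIK3CA"),
    ("S4", "GATA3"),
    ("S5", "CDH1"), ("S5", "PIK3CA"), ("S5", "TP53"),
]

mutated, intact = split_by_mmr_status(TOY_RECORDS)
u_stat, p_value = mann_whitney_u(mutated, intact)
print(f"MMR-mutated loads: {sorted(mutated)}, other loads: {sorted(intact)}")
print(f"U = {u_stat}, approximate two-sided p = {p_value:.3f}")
```

Samples with no recorded mutations never appear in a MAF file, so a real analysis would need a separate sample list to count them as zero. For real data, a tested library implementation such as `scipy.stats.mannwhitneyu` is a better choice than this hand-rolled version.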
