Genomics Projects

Most of these projects were completed as class projects under the supervision of Professor Wei-Yi Cheng for his course Introduction to Genomics and Information Science at Columbia University. All of the code is either entirely my own or was revised after instructor feedback.

All data files were provided by the course instructor and are available on request.

This repository is updated as I complete more class and personal projects.

Outline of Projects

Biomarker Identification - Mirdametinib [code]
- Summary: Identified biomarkers associated with response to Mirdametinib, following the approach illustrated in Barretina et al. Used machine learning models to shortlist the most important features.
- Tools: scikit-learn RandomForestRegressor, MAF files, CCLE mutation data
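A minimal sketch of the feature-ranking idea, not the notebook's code: it assumes a binary cell-line × gene mutation matrix and one numeric drug-response value per cell line (the exact response metric is not stated above). The function names and the synthetic data are hypothetical, and impurity-based importance is only one of several ways to rank features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_features(X: pd.DataFrame, y: pd.Series, top_n: int = 10, seed: int = 0) -> pd.Series:
    """Fit a random forest and return the top_n features by impurity-based importance."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False).head(top_n)


def make_toy_data(n_lines: int = 200, n_genes: int = 50, seed: int = 0):
    """Synthetic stand-in for a cell-line x mutation matrix (NOT real CCLE data)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame(
        rng.integers(0, 2, size=(n_lines, n_genes)),
        columns=[f"GENE_{i}" for i in range(n_genes)],
    )
    # The response depends only on GENE_0, so GENE_0 should rank first.
    y = pd.Series(-2.0 * X["GENE_0"] + rng.normal(0, 0.3, n_lines))
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(rank_features(X, y, top_n=5))
```

With real data, X would hold mutation calls derived from the MAF/CCLE files and y the drug response for the same cell lines, aligned on a shared cell-line index.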
NBL Analysis 1 - Identifying Risk-Associated Mutations [code]
- Summary: Studied genomic data from neuroblastoma (NBL) samples to identify mutations and gene expression features predictive of risk.
- Tools: Fisher's exact test, MAF files
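A minimal sketch of a per-gene Fisher's exact test, assuming a MAF-style table (standard `Hugo_Symbol` and `Tumor_Sample_Barcode` columns) and a hypothetical boolean Series marking high-risk samples. The function name and the risk labels are illustrative assumptions, not the notebook's code.

```python
import pandas as pd
from scipy.stats import fisher_exact


def mutation_risk_association(maf: pd.DataFrame, high_risk: pd.Series) -> pd.DataFrame:
    """Fisher's exact test of mutation status vs. risk group, one gene at a time.

    maf: MAF-style table with 'Hugo_Symbol' and 'Tumor_Sample_Barcode' columns.
    high_risk: boolean Series indexed by sample barcode (True = high risk).
    """
    hr = high_risk.astype(bool).to_numpy()
    rows = []
    for gene, group in maf.groupby("Hugo_Symbol"):
        mutated = high_risk.index.isin(group["Tumor_Sample_Barcode"])
        # 2x2 table: rows = mutated / not mutated, columns = high / low risk.
        table = [
            [int((mutated & hr).sum()), int((mutated & ~hr).sum())],
            [int((~mutated & hr).sum()), int((~mutated & ~hr).sum())],
        ]
        odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
        rows.append({"gene": gene, "odds_ratio": odds_ratio, "p_value": p_value})
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)
```

The p-values are unadjusted; when many genes are tested, a multiple-testing correction such as Benjamini-Hochberg is usually applied before calling a gene risk-associated.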
COVID Spike Protein Alignment [code]
- Summary: Performed a multiple sequence alignment of SARS-CoV-2 (COVID-19) spike protein sequences, built a phylogenetic tree of the variants, and assigned a variant identity to each sequence.
- Tools: Biopython, ipytree, MAFFT, FASTA files
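A minimal sketch of the tree-building step with Biopython, assuming the sequences have already been aligned (for example by MAFFT) and could be loaded with `AlignIO.read(path, "fasta")`. The toy peptides below are made up, not real spike sequences, and neighbor joining on identity distances is one choice among several (UPGMA or substitution-model distances are alternatives).

```python
from Bio import Phylo
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def build_nj_tree(alignment: MultipleSeqAlignment):
    """Build a neighbor-joining tree from pairwise identity distances."""
    # "identity" distance = fraction of aligned positions that differ.
    dm = DistanceCalculator("identity").get_distance(alignment)
    return DistanceTreeConstructor().nj(dm)


def toy_alignment() -> MultipleSeqAlignment:
    """Made-up, pre-aligned peptide fragments (NOT real spike sequences)."""
    seqs = {
        "reference": "MKTLLVAGSQCVNL",
        "variant_A": "MKTLLVAGSQCVNF",
        "variant_B": "MKTLIVAGSRCVNF",
        "variant_C": "MKT-LVAGSRCVDF",
    }
    return MultipleSeqAlignment([SeqRecord(Seq(s), id=name) for name, s in seqs.items()])


if __name__ == "__main__":
    tree = build_nj_tree(toy_alignment())
    Phylo.draw_ascii(tree)
```

Leaves are named after the record IDs, so query sequences that cluster next to a labeled reference can be given that reference's variant identity.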
Prostate Cancer Risk Screening Using Germline Variant Data [code]
- Summary: Identified pathogenic germline variants associated with prostate cancer in the ClinVar clinical variant database, then screened the 1000 Genomes Project dataset to check whether any subjects carry those high-risk variants.
- Tools: scikit-allel, VCF, Zarr files
Calculating Mutation Load Using TCGA Breast Cancer Dataset [code]
- Summary: Investigated how damage to the DNA mismatch repair (MMR) pathway affects the number of somatic mutations per tumor, also known as the mutation load, in breast cancer.
- Tools: MAF files, Mann-Whitney U test
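A minimal sketch of the comparison, assuming a MAF-style table and treating a tumor as MMR-deficient if it carries any mutation in a core MMR gene (MLH1, MSH2, MSH6, PMS2). That gene list and grouping rule are assumptions for illustration; the notebook's exact definition is not stated above. Samples with zero mutations never appear in a MAF, so they are absent from this count.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Core MMR genes; an assumed list, the notebook's gene set may differ.
MMR_GENES = {"MLH1", "MSH2", "MSH6", "PMS2"}


def mutation_load_by_mmr(maf: pd.DataFrame):
    """Count mutations per sample and compare MMR-mutant vs. other tumors.

    Returns (per-sample load, Mann-Whitney U statistic, two-sided p-value).
    """
    load = maf.groupby("Tumor_Sample_Barcode").size()
    mmr_samples = set(maf.loc[maf["Hugo_Symbol"].isin(MMR_GENES), "Tumor_Sample_Barcode"])
    is_mmr = load.index.isin(mmr_samples)
    # Nonparametric test: mutation loads are typically heavily right-skewed.
    result = mannwhitneyu(load[is_mmr], load[~is_mmr], alternative="two-sided")
    return load, result.statistic, result.pvalue
```

A rank-based test is a reasonable default here because a few hypermutated tumors can dominate a comparison of means.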
