GitHub - csalg/bioinformatics_final_project: Final Project for SJTU Bioinformatics (graduate course)

This is my final project for the SJTU Bioinformatics class. We were asked to analyse microarray gene expression data. Most traditional machine learning methods fail on this kind of data set because n << p (far fewer samples than genes). For example, the sample covariance that PCA relies on is a poor estimate of the true covariance of the data, most neural network architectures overfit, etc. In these cases a sparsity assumption is usually made, and we show how sparse PLS-DA (sPLSDA) and deep learning can be used to build robust classifiers.
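To make the sparsity idea concrete, here is a minimal sketch of a single sparse PLS-DA-style component in plain NumPy. It is not the code used in the project: the function names, the `keep` parameter and the synthetic data are all assumptions made for illustration. It soft-thresholds the leading left singular vector of the feature/class cross-covariance so that only a few features get nonzero weight.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink every entry of v towards zero by lam; small entries become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_plsda_component(X, labels, keep):
    """Return one sparse loading vector and the sample scores along it.

    Hypothetical helper for illustration: `keep` is the maximum number of
    features allowed a nonzero weight (it must be smaller than X.shape[1]).
    """
    Xc = X - X.mean(axis=0)                      # centre each feature
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # dummy-coded classes
    Yc = Y - Y.mean(axis=0)
    M = Xc.T @ Yc                                # p x k cross-covariance (unscaled)
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    u = U[:, 0]                                  # dense loading vector
    lam = np.sort(np.abs(u))[-(keep + 1)]        # (keep+1)-th largest magnitude
    w = soft_threshold(u, lam)                   # at most `keep` nonzero entries
    w = w / np.linalg.norm(w)
    return w, Xc @ w


# Synthetic n << p data (an assumption, not the leukemia data set):
# 30 samples, 1000 features, 3 classes, only the first 10 features informative.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
X = rng.normal(size=(30, 1000))
X[:, :10] += 3.0 * labels[:, None]

keep = 10
w, scores = sparse_plsda_component(X, labels, keep)
print("nonzero loadings:", np.count_nonzero(w))
print("mean score per class:", [scores[labels == c].mean() for c in range(3)])
```

A full sPLS-DA fit would extract several components with deflation and tune `keep` by cross-validation; this sketch only shows how the sparsity constraint reduces p features to a handful.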

I will not go into all the details here, but we built a classifier that reached 100% accuracy in classifying all 5 major types of leukemia; see the slides or the report for an in-depth explanation.

Here is a picture of the leukemia dataset, visualised after transformation into the sPLSDA feature space:

img
.DS_Store
README.md
report.pdf
report.zip
slides.pdf
slides.zip
