COMPSCI 589 is an open source applied machine learning course designed for senior undergraduate students and junior (masters-level) graduate students. The course materials have been developed by Prof. Benjamin M. Marlin at the College of Information and Computer Sciences, University of Massachusetts Amherst since fall 2014.
The course slides were created in LaTeX using the Beamer package. Pre-compiled PDF slides are available in the slides directory. Pre-compiled PDF handouts (without animations) are available in the handouts directory. The majority of the lectures also have accompanying Jupyter notebook demos. The demos are located in the demos/code directory.
The LaTeX source for the slides is available in the src directory. The title slide for each lecture can be customized with your course number, your name, and your affiliation by editing the src/config.tex file and recompiling the slides. To recompile the slides, you will need pdflatex installed with the Beamer package. Slides and handouts can be recompiled individually or all at once using the supplied compile_all_slides.sh bash script.
The demos require Python 2.7, Jupyter notebook, and a current version of scikit-learn. Some demos use additional packages including Theano and wxPython.
The course introduces core machine learning models and algorithms for classification, regression, clustering, and dimensionality reduction. On the theory side, the course focuses on understanding models and the relationships between them. On the applied side, it focuses on effectively using machine learning methods to solve real-world problems, with an emphasis on model selection, regularization, design of experiments, and presentation and interpretation of results. The course also explores the use of machine learning methods across different computing contexts, including desktop and cloud computing. The primary toolkits are Python, scikit-learn, and Apache Spark.
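For a flavor of the scikit-learn workflow the course emphasizes, here is a minimal, self-contained sketch (illustrative only, not one of the course demos; it assumes scikit-learn >= 0.18 for the model_selection module):

```python
# Fit a regularized classifier and estimate its accuracy by cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

data = load_iris()
X, y = data.data, data.target

# C is the inverse regularization strength; smaller C = stronger regularization.
clf = LogisticRegression(C=1.0)
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validation accuracy: %.3f" % scores.mean())
```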
The readings are taken from An Introduction to Statistical Learning [ISL] and The Elements of Statistical Learning, Second Edition [ESL], both of which are freely available.
-
Lecture 1: Course Overview - Supervised and Unsupervised Learning
Materials: Slides | Handouts | latex
Reading: ISL Section 1 (p.1-9), Section 2.1.4 (p.27-29)
-
Lecture 2: KNN and Decision Trees
Materials: Slides | Handouts | latex
Reading: ESL Section 2.3.2 (p.14-16), ISL Section 8 (p.303, 311-314), ESL Section 2.5 (p.22-23)
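For reference, a minimal scikit-learn sketch of the two classifiers covered in this lecture (illustrative only, not one of the course demos; the dataset and hyperparameters are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y = data.data, data.target

# k-nearest neighbors with k=5 and a depth-limited decision tree.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print("KNN training accuracy:  %.3f" % knn.score(X, y))
print("Tree training accuracy: %.3f" % tree.score(X, y))
```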
-
Lecture 3: Naïve Bayes, LDA, and Logistic Regression
Materials: Slides | Handouts | latex
Reading: ESL Section 4 (p.101-102, 106-110, 119-120, 127-132)
-
Lecture 4: Overfitting, Regularization, and Cross-Validation
Materials: Slides | Handouts | latex
Reading: ISL Section 2.2.3 (p.37), Section 5 (p.176-183, 184-186)
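A minimal sketch of cross-validated model selection in scikit-learn (illustrative only; it assumes scikit-learn >= 0.18 for GridSearchCV in the model_selection module, and the parameter grid is an arbitrary example):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

data = load_iris()
X, y = data.data, data.target

# Smaller C means stronger regularization; pick C by 5-fold cross-validation.
grid = GridSearchCV(LogisticRegression(), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print("best C: %s" % grid.best_params_["C"])
print("best cross-validation accuracy: %.3f" % grid.best_score_)
```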
-
Lecture 5: Support Vector Machines, Basis Expansion, and Kernels
Materials: Slides | Handouts | latex
Reading: ISL Section 9.5 (p.356-359)
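A minimal kernel SVM sketch in scikit-learn (illustrative only; the RBF kernel parameters are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

data = load_iris()
X, y = data.data, data.target

# The RBF kernel corresponds to an implicit basis expansion;
# gamma controls the kernel width and C the margin penalty.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
print("training accuracy: %.3f" % clf.score(X, y))
```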
-
Lecture 6: Neural Networks and Deep Learning
Materials: Slides | Handouts | latex
Reading: ESL Section 11.3 (p.392-395, 397-409)
-
Lecture 7: Ensembles and Classification
Materials: Slides | Handouts | latex
Reading: ISL Section 8.2 (p.316-324)
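A minimal ensemble sketch in scikit-learn (illustrative only; a random forest averages many randomized decision trees):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
X, y = data.data, data.target

# 100 trees, each grown on a bootstrap sample with random feature subsets.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy: %.3f" % forest.score(X, y))
```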
-
Lecture 8: Linear Regression, Ridge and the Lasso
Materials: Slides | Handouts | latex
Reading: ISL Section 3.1 (p.61-63), Section 3.2 (p.71-75), Section 6.2 (p.214-224), Section 3.3.2 (p.86-92)
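A minimal ridge/lasso sketch in scikit-learn (illustrative only; the diabetes dataset and alpha values are arbitrary choices):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

data = load_diabetes()
X, y = data.data, data.target

# alpha is the regularization strength; the lasso additionally drives some
# coefficients exactly to zero (sparse solutions).
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("nonzero lasso coefficients: %d of %d" % ((lasso.coef_ != 0).sum(), X.shape[1]))
```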
-
Lecture 9: KNN, Regression Trees, and Feature Selection
Materials: Slides | Handouts | latex
Reading: ISL Section 3.5 (p.104-109), Section 8.1.1 (p.304-311), Section 6.1 (p.205-210)
-
Lecture 10: Support Vector and Neural Network Regression
Materials: Slides | Handouts | latex
Reading: ESL Section 11.3 (p.392-401), ESL Section 12.3.6 (p.434-438)
-
Lecture 11: KOLS and Gaussian Process Regression
-
Lecture 12: Introduction to Data Parallel Computing
-
Lecture 13: Introduction to Apache Spark
Materials: Slides | Handouts | latex
Reading: Resilient Distributed Datasets
Reading: Spark Programming Guide
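A minimal PySpark RDD sketch (illustrative only; it assumes a local Spark installation with the pyspark package importable from Python):

```python
from pyspark import SparkContext

sc = SparkContext("local[2]", "rdd-demo")

# Distribute a small collection, then run a filter/map/reduce pipeline on it.
nums = sc.parallelize(range(1, 11))
total = (nums.filter(lambda x: x % 2 == 0)
             .map(lambda x: x * x)
             .reduce(lambda a, b: a + b))
print("sum of squared even numbers: %d" % total)
sc.stop()
```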
-
Lecture 14: Data Parallel Programming Abstractions in Spark
Materials: Slides | Handouts | latex
Reading: Spark Exercises from AMP Camp 4
Video: AMP Camp 3 Spark Tutorial
-
Lecture 15: Hierarchical Clustering
Materials: Slides | Handouts | latex
Reading: ISL Section 10.3.2 (p.390-401)
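A minimal agglomerative clustering sketch in scikit-learn (illustrative only; the linkage choice and number of clusters are arbitrary):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris

X = load_iris().data

# Bottom-up clustering that repeatedly merges the closest groups under average linkage.
agg = AgglomerativeClustering(n_clusters=3, linkage="average").fit(X)
print("labels of the first ten points: %s" % agg.labels_[:10])
```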
-
Lecture 16: K-Means Clustering
Materials: Slides | Handouts | latex
Reading: ISL Section 10.3.1 (p.386-390), ESL Section 6.8 (p.214-216), Section 8.5 (p.272-276)
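A minimal k-means sketch in scikit-learn (illustrative only; k=3 matches the three iris species):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Lloyd's algorithm with 10 random restarts; labels_ holds the cluster assignments.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes: %s" % np.bincount(km.labels_))
```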
-
Lecture 17: Mixture Models
Materials: Slides | Handouts | latex
Reading: ISL Section 10.3.1 (p.386-390), ESL Section 6.8 (p.214-216), Section 8.5 (p.272-276)
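A minimal Gaussian mixture sketch (illustrative only; assumes scikit-learn >= 0.18 for the GaussianMixture class, which is fit by expectation-maximization):

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X = load_iris().data

# Three full-covariance Gaussian components fit by EM.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)
print("mixture weights: %s" % gmm.weights_)
```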
-
Lecture 18: Linear Dimensionality Reduction and SVD
Materials: Slides | Handouts | latex
Reading: ESL Section 14.5.1 (p.534-536)
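A minimal SVD sketch using NumPy (illustrative only; the data matrix is random and the target rank is arbitrary):

```python
import numpy as np

# Random 50 x 5 data matrix and its thin SVD.
X = np.random.RandomState(0).randn(50, 5)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keeping only the top-k singular values gives the best rank-k approximation
# in the least-squares sense.
k = 2
X_k = U[:, :k].dot(np.diag(s[:k])).dot(Vt[:k, :])
print("rank-%d reconstruction error: %.3f" % (k, np.linalg.norm(X - X_k)))
```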
-
Lecture 19: Principal Component Analysis
Materials: Slides | Handouts | latex
Reading: ISL Section 10.3 (p.374-385)
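A minimal PCA sketch in scikit-learn (illustrative only): project the iris features onto their top two principal components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# PCA centers the data and keeps the two directions of maximum variance.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)
print("explained variance ratios: %s" % pca.explained_variance_ratio_)
print("projected data shape: %s" % (Z.shape,))
```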
-
Lecture 20: Sparse Coding, Non-negative Matrix Factorization, and Independent Component Analysis
Materials: Slides | Handouts | latex
Reading: ESL Section 14.6 (p.553-557), Section 14.7 (p.557-570)
Reading: Sparse Coding
-
Lecture 21: Kernel PCA and Spectral Clustering
Materials: Slides | Handouts | latex
Reading: ESL Section 14.5.3 (p.544-547), Section 14.5.4 (p.547-550)
-
Lecture 22: Multidimensional Scaling and Isomap
Materials: Slides | Handouts | latex
Reading: ESL Sections 14.8-14.9 (p.570-576)
The following Jupyter notebook demos are included in the demos/code directory:
- Lecture01: Introduction to Python
- Lecture02: KNN and Decision Trees
- Lecture03: Naive Bayes, LDA and Logistic Regression
- Lecture04: Model Complexity and Overfitting
- Lecture05: SVMs, Basis Expansions and Kernels
- Lecture06: Neural Network Classification (uses Theano)
- Lecture11: Gaussian Processes (uses wxPython)
- Lecture15: Hierarchical Clustering
- Lecture16: KMeans Clustering
- Lecture17: Mixture Models
- Lecture18-20: Linear Dimensionality Reduction
Copyright 2016 Benjamin M. Marlin. These materials are provided under the GNU GENERAL PUBLIC LICENSE Version 3 (GPL 3). As permitted by GPL 3 Section 7(b), all attributions present in this work must be preserved in all copies and derived works.
The development of these materials is supported by the National Science Foundation through award # IIS-1350522.