Team
- Daniel Demoray, producer
- Ruben Naeff, instructor
- Antoine Grant, expert in residence
- egroup & Slack channel, Google Drive folder, everyone
Please do not hesitate to contact any of us!
Logistics
- June 25 - September 10, 2015
- Tuesdays and Thursdays 6.30-9.30
- GA West, 10 East 21st St, room 4A (4th floor)
- Office hours: TBD
Please fill out an exit ticket after each class.
Please see Cloning the repo for how to download the latest course notes to your laptop.
- Calendar
- Final Project
- Guest speakers
- All datasets
- Further Reading to continue your studies
- Extraneous to make your life more pleasant
I. Data Exploration (Analytics)
- 01: INTRODUCTION TO DATA SCIENCE
- Slides
- Setting Up Your Environment
- Data Science at the Command Line including exercises
-
- Slides
- SQL Exercises
- Python Exercises
- Data Exploration in Python including exercises
- 04: VISUALIZATIONS AND MORE DATA GATHERING
- Slides
- Web scraping optional demo
- Twitter API optional demo
- How To Present Your Insights
- Visualizations including exercises
- Anscombe's Quartet illustrating the need for visualizations
- Assignment #1: Data Exploration
II. Supervised Learning
- 05: INTRODUCTION TO MACHINE LEARNING
- Slides
- kNN Classification Iris dataset
- kNN implementation optional exercise
Regression models
-
- Presentations Assignment #1
- Slides
- Introduction to numpy optional
- Linear Algebra recap optional
- Linear Regression statsmodels, patsy, seaborn; salaries, house prices
- 3D plot in Python example as reference
- 07: POLYNOMIAL REGRESSION & REGULARIZATION
- Slides
- Regularization polynomials, make_pipeline, Ridge, Lasso
- 08: REGRESSION & TEXT PROCESSING
- One slide
- Text Processing Amazon movie reviews, CountVectorizer (demo)
- Guest Speaker: Amy Roberts, CEO & Founder of Healthy Bytes
- Assignment #2: Linear Regression Salary Prediction
III. Supervised Learning: Classification
-
- Slides
- Logistic Regression Iris dataset, precision/recall, decision boundaries; exercises
- Insult Classification exercise
- Area Under the ROC Curve optional deep dive
- Non-linear decision boundaries optional
-
- Slides Exit tickets review
- Review Assignment #2 and leaderboard
- Final project announcement guidelines, deadlines, sample projects
- Guest Speaker: Rohit Acharya, Chief Data Scientist at First Access
-
- Slides
- 20 Newsgroups CountVectorizer, TfidfVectorizer, MultinomialNB (demo)
- Naive Bayes implementation exercise
- Statistics & Probability recap optional basic recap
- Bayesian coin flips optional deep dive
- 12: ENSEMBLE LEARNING & RANDOM FORESTS
- Slides
- Random Forests in sklearn DecisionTree, RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
- Drawing trees in sklearn optional Graphviz, pydot
- Plotting decision boundaries optional ExtraTreesClassifier, AdaBoost
- Random Forests implementation with notebook demonstration optional deep dive
-
- Slides
- Recognizing hand-written digits demo
- Plotting hyperplanes and support vectors demo
- Plotting different SVM kernels optional demo of non-linear kernels
- Separating mushrooms exercise
- Guest Speaker: Sandy Griffith, Biostatistician and Technical Lead at Flatiron Health
- Flask demonstration
-
- Slides Classification Review
- Comparison of all classifiers exercise
- Guest Speaker: Bob Filbin, Chief Data Scientist at Crisis Text Line
- Assignment #3: Classification Kaggle competition, data exploration and demo solution
IV. Unsupervised Learning
-
- Slides
- Clustering irises sklearn, simple demo
- Clustering text sklearn, 20-newsgroups
- Clustering tags in Stack Overflow Jaccard distance, exercise
- KMeans implementation: notebook and code exercise
-
- Presentations data explorations final project
- Slides
- PCA demo demo of the math
- SVD demo demo of the math
- Clustering House Legislatures demo PCA, polarizing politics
- Facial Recognition demo PCA, SVM and exercise
- Latent Semantic Analysis demo SVD, text clustering
-
- Slides
- Recommending Beers demo of several recommendation methods
- Who To Follow exercise item-based collaborative filtering
- Guest Speaker: George Kailas, CEO at Instadat
- Extracurricular: predicting student responses
V. Various
- 19: GUEST LECTURE - CONJUGATE PRIORS
- Guest Lecturer: Robert Doherty, Lead Data Science at Outbrain
- Streaming Data Algorithms: Part 1 slides
- Bayesian A/B Headline Testing: Part 2 slides
- The Beta Distribution
- Instant Headline Testing
- Amazon Resellers LAB incl exercises
-
- Slides
- Videos: Neurons and the brain, Digit recognition and Autonomous driving by Andrew Ng
- Python implementation and demonstrating notebook optional deep dive
- Restricted Boltzmann Machines optional demo unsupervised neural nets in sklearn
- Code Reviews work-in-progress
-
- Slides
- Multiprocessing in Python local parallel computing
- Scaling, MapReduce, Spark, AWS, EC2
- Ethics by Guest Speaker: Monica Bulger, Researcher at Data & Society
- 22: VARIOUS
- 23: FINAL PROJECT PRESENTATIONS