Reduce number of datasets and focus dataset type #12

lakikowolfe · 2020-08-10T15:48:21Z

The TA used a wide variety of datasets. In the last class, he switched between many different datasets to illustrate his points, and these datasets vary widely in the type of data they capture.

We want to reduce the number of datasets used, ideally working with only one or two datasets with a biological focus throughout the course.

lakikowolfe · 2020-08-10T15:48:48Z

Data audit

How the data was used
dtypes
missing data?
Include dummy datasets TA made
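
The audit criteria above (dtypes, missing data) could be checked mechanically for each candidate dataset. Below is a minimal sketch using only the Python standard library. The function names `infer_kind` and `audit`, the set of missing-value markers, and the example rows are all illustrative assumptions, not part of the course material. In a pandas notebook the rough equivalents would be `df.dtypes` and `df.isna().sum()`.

```python
# Assumed markers for a missing value; adjust to match the actual files.
MISSING = {"", "NA", "NaN", None}


def infer_kind(value):
    """Classify a single raw value as 'numeric' or 'categorical'."""
    try:
        float(value)
        return "numeric"
    except (TypeError, ValueError):
        return "categorical"


def audit(rows):
    """Summarize each column of a list of dict rows.

    Returns {column: {"kind": ..., "missing": ...}}, where kind is
    'numeric', 'categorical', or 'unknown' (no non-missing values).
    """
    report = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in MISSING]
        if not present:
            kind = "unknown"
        elif all(infer_kind(v) == "numeric" for v in present):
            # Numeric only if every non-missing value parses as a number.
            kind = "numeric"
        else:
            kind = "categorical"
        report[col] = {"kind": kind, "missing": len(values) - len(present)}
    return report


# Made-up rows for illustration only (not taken from any course dataset).
example_rows = [
    {"measurement": "5.1", "label": "a"},
    {"measurement": "", "label": "b"},
    {"measurement": "6.3", "label": "c"},
]

for column, summary in audit(example_rows).items():
    print(column, summary)
```

Running the same function over every dataset listed in the audit would give a comparable per-column summary, which may make it easier to decide which datasets can be dropped or replaced.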

Class 1

Commute Time Dataset
- Feature engineering and EDA
- No missing data
- Generic dataset with both categorical and numeric data

Class 2

Commute Time Dataset
- Viz of single variables and relationships, linear regression, mean squared error, random forests

Class 3

Dummy dataset of 0 and 1 as an example of categorical data
Dummy dataset of two random clouds of points to illustrate decision boundaries
Tennis dataset
- All categorical variables; target variable is yes/no (played tennis)
Iris dataset
- All numeric variables except for target variable (categorical: species)
Dummy dataset for random forest

Class 4

Dummy data to show the curse of dimensionality
Iris dataset to show the benefits of PCA
- Pair plot
- PCA
Dummy data to superimpose the first component line over a series of random points
Dummy data and custom code to illustrate eigenvectors
Centered faces dataset: "Eigenfaces"
Dummy dataset of clusters to show k-means
Arrests data
- Four numeric variables
NCI60 for PCA and hierarchical clustering

lakikowolfe · 2020-08-10T16:25:09Z

The Tennis dataset from Class 3 can be replaced by Ted's OHSU cvd dataset.
