Merge pull request #106 from kk-Syuer/main
Main
iacopomasi authored May 9, 2024
2 parents 09e9276 + e9c123f commit 1faedd5
Showing 1 changed file with 10 additions and 10 deletions.
@@ -223,7 +223,7 @@
"# BIAS-Variance Trade-off\n",
"- The **bias error** is produced by weak assumptions in the learning algorithm\n",
" - **High bias** can cause an algorithm to **miss the relevant relations between features and target outputs** \n",
" - Problem know as `underfitting`. Solution: increase the complexity/expressiveness of your ML algorithm!"
" - Problem known as `underfitting`. Solution: increase the complexity/expressiveness of your ML algorithm!"
]
},
{
@@ -239,7 +239,7 @@
"\n",
"- The **variance** is an error produced by an **oversensitivity to small fluctuations in the training set**\n",
" - High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs \n",
" - Problem know as `overfitting`. Solution: decrease the model complexity or add strong regularization."
" - Problem known as `overfitting`. Solution: decrease the model complexity or add strong regularization."
]
},
{
@@ -681,7 +681,7 @@
"Pro:\n",
"- Reduces waste of data\n",
"- Unbiased with respect to the split choice\n",
"- Provides an estimate of standard deviation of your prediction (variance)\n",
"- Provides an estimate of the standard deviation of your prediction (variance)\n",
"\n",
"Con:\n",
"- **Computationally expensive!** _(though with multi-core you can run in parallel)_\n",
@@ -753,7 +753,7 @@
"\n",
"- In terms of accuracy, **LOO often results in high variance as an estimator for the test error**.\n",
"\n",
"- Since of the samples are used to build each model, models constructed from folds are virtually identical to each other and to the model built from the entire training set.\n",
"- Since the samples are used to build each model, models constructed from folds are virtually identical to each other and to the model built from the entire training set.\n",
"\n",
"_As a general rule, most authors, and empirical evidence, suggest that 5- or 10- fold cross validation should be preferred to LOO._"
]
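The leave-one-out (LOO) remark in the hunk above can be checked with a few lines of Python. This is a minimal sketch that is not part of the notebook: the dataset size `n` and the index sets are made up for illustration. It shows that each LOO model trains on all but one sample, and that any two LOO training sets differ by a single sample, which is why the fitted models end up nearly identical.

```python
# Hypothetical dataset size, chosen only for illustration.
n = 100

# In leave-one-out, fold i holds out sample i and trains on the rest.
train_sets = [set(range(n)) - {i} for i in range(n)]

# Every training set contains n - 1 samples.
sizes = {len(s) for s in train_sets}

# Any two training sets share n - 2 samples.
overlap = len(train_sets[0] & train_sets[1])

print(f"models trained: {len(train_sets)}")
print(f"training-set sizes: {sizes}")
print(f"shared samples between two folds: {overlap}")
```

With 5- or 10-fold cross-validation the training sets overlap less, so the per-fold models are less correlated with each other.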
@@ -845,7 +845,7 @@
}
},
"source": [
"**Exam look-alike question**: how many model (decision trees here) you need to train to make the choice?"
"**Exam look-alike question**: how many models (decision trees here) you need to train to make the choice?"
]
},
{
@@ -1004,7 +1004,7 @@
"source": [
"# Hyper-parameter tuning\n",
"\n",
"We are working in the medical sector and we are using decision tree for their interpretability power, but we have to decide the **depth of the tree.**\n",
"We are working in the medical sector and we are using decision trees for their interpretability power, but we have to decide the **depth of the tree.**\n",
"\n",
"<br/><br/>\n",
"<center><img src=\"figs/hyperparams.png\" width=\"70%\" /></center>"
@@ -1051,7 +1051,7 @@
}
},
"source": [
"# How many model do we train with k=10 fold cross-validation and grid search over depth $\\in [1,2,3]$ and min impurity decrease in $\\{0.01,0.1\\}$?"
"# How many models do we train with k=10 fold cross-validation and grid search over depth $\\in [1,2,3]$ and min impurity decrease in $\\{0.01,0.1\\}$?"
]
},
{
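The counting question in the hunk above can be worked through in code. This is a sketch under one stated assumption: every hyper-parameter combination is trained once per fold. The variable names are illustrative and do not come from the notebook. Whether one extra fit on the full training set is counted depends on the convention used, so both numbers are computed.

```python
from itertools import product

k_folds = 10
depths = [1, 2, 3]
min_impurity_decreases = [0.01, 0.1]

# Cartesian product of the two hyper-parameter lists: 3 * 2 combinations.
grid = list(product(depths, min_impurity_decreases))

# One model per (combination, fold) pair.
n_cv_fits = k_folds * len(grid)

# Assumption: the selected configuration is retrained once on all training data.
n_fits_with_refit = n_cv_fits + 1

print(f"combinations: {len(grid)}")
print(f"fits during cross-validation: {n_cv_fits}")
print(f"fits including a final refit: {n_fits_with_refit}")
```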
@@ -1195,7 +1195,7 @@
"# Loading the Digits dataset\n",
"digits = datasets.load_digits()\n",
"\n",
"# To apply an classifier on this data, we need to flatten the image, to\n",
"# To apply a classifier on this data, we need to flatten the image, to\n",
"# turn the data in a (samples, feature) matrix:\n",
"n_samples = len(digits.images)\n",
"X = digits.images.reshape((n_samples, -1))\n",
@@ -2394,11 +2394,11 @@
"\n",
"1. Compute the ROC with a table and/or draw it approximatively the ROC curve (TPR vs FPR)\n",
"2. Calculate the Area Under the Curve (AUC).\n",
"3. How woud you set the score to make $AUC=100\\%?$\n",
"3. How would you set the score to make $AUC=100\\%?$\n",
"| **labels** \t| -1 \t| 1 \t| -1 \t| 1 \t| -1 \t| 1 |\n",
"|--------|----|-----|------|-----|------|-----|\n",
"| **score** \t| ? \t| ? \t| ? \t| ? \t| ? \t| ? | \n",
"4. How woud you set the score to make $AUC=0\\%?$\n",
"4. How would you set the score to make $AUC=0\\%?$\n",
"| **labels** \t| -1 \t| 1 \t| -1 \t| 1 \t| -1 \t| 1 | \n",
"|--------|----|-----|------|-----|------|-----|\n",
"| **score** \t| ? \t| ? \t| ? \t| ? \t| ? \t| ? |\n",