Merge pull request #106 from kk-Syuer/main

Main
iacopomasi · May 9, 2024 · 1faedd5
2 parents 09e9276 + e9c123f
commit 1faedd5
Showing 1 changed file with 10 additions and 10 deletions.
diff --git a/AA2324/course/10_model_selection_crossvalid/10_model_selection_crossvalid.ipynb b/AA2324/course/10_model_selection_crossvalid/10_model_selection_crossvalid.ipynb
@@ -223,7 +223,7 @@
     "# BIAS-Variance Trade-off\n",
     "- The **bias error** is produced by weak assumptions in the learning algorithm\n",
     "    - **High bias** can cause an algorithm to **miss the relevant relations between features and target outputs** \n",
-    "    - Problem know as `underfitting`. Solution: increase the complexity/expressiveness of your ML algorithm!"
+    "    - Problem known as `underfitting`. Solution: increase the complexity/expressiveness of your ML algorithm!"
    ]
   },
   {
@@ -239,7 +239,7 @@
     "\n",
     "- The **variance** is an error produced by an **oversensitivity to small fluctuations in the training set**\n",
     "    - High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs \n",
-    "    - Problem know as `overfitting`. Solution: decrease the model complexity or add strong regularization."
+    "    - Problem known as `overfitting`. Solution: decrease the model complexity or add strong regularization."
    ]
   },
   {
@@ -681,7 +681,7 @@
     "Pro:\n",
     "- Reduces waste of data\n",
     "- Unbiased with respect to the split choice\n",
-    "- Provides an estimate of standard deviation of your prediction (variance)\n",
+    "- Provides an estimate of the standard deviation of your prediction (variance)\n",
     "\n",
     "Con:\n",
     "- **Computationally expensive!** _(though with multi-core you can run in parallel)_\n",
@@ -753,7 +753,7 @@
     "\n",
     "- In terms of accuracy, **LOO often results in high variance as an estimator for the test error**.\n",
     "\n",
-    "- Since  of the  samples are used to build each model, models constructed from folds are virtually identical to each other and to the model built from the entire training set.\n",
+    "- Since $n-1$ of the $n$ samples are used to build each model, models constructed from folds are virtually identical to each other and to the model built from the entire training set.\n",
     "\n",
     "_As a general rule, most authors, and empirical evidence, suggest that 5- or 10- fold cross validation should be preferred to LOO._"
    ]
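Review note on the LOO hunk above: a minimal sketch of why leave-one-out models end up nearly identical, assuming a toy array of 10 samples (the data and variable names are illustrative, not taken from the notebook). Each LOO model is trained on all samples but one, so any two training sets differ in only two samples, whereas 5-fold training sets differ by whole folds.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Toy data (assumption): 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)

# Leave-one-out: one split per sample, each trained on n - 1 samples.
loo_train_sets = [set(train) for train, _ in LeaveOneOut().split(X)]
loo_train_sizes = [len(s) for s in loo_train_sets]

# 5-fold: five splits, each trained on 4/5 of the samples.
kf_train_sets = [set(train) for train, _ in KFold(n_splits=5).split(X)]
kf_train_sizes = [len(s) for s in kf_train_sets]

# Samples shared by the first two training sets of each scheme.
loo_overlap = len(loo_train_sets[0] & loo_train_sets[1])
kf_overlap = len(kf_train_sets[0] & kf_train_sets[1])

print("LOO:", len(loo_train_sizes), "models, train size", loo_train_sizes[0],
      "shared samples", loo_overlap)
print("5-fold:", len(kf_train_sizes), "models, train size", kf_train_sizes[0],
      "shared samples", kf_overlap)
```

With larger `n` the LOO overlap approaches the full training set, which is one common explanation for the high variance of the LOO estimate of the test error.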
@@ -845,7 +845,7 @@
     }
    },
    "source": [
-    "**Exam look-alike question**: how many model (decision trees here) you need to train to make the choice?"
+    "**Exam look-alike question**: how many models (decision trees here) do you need to train to make the choice?"
    ]
   },
   {
@@ -1004,7 +1004,7 @@
    "source": [
     "# Hyper-parameter tuning\n",
     "\n",
-    "We are working in the medical sector and we are using decision tree for their interpretability power, but we have to decide the **depth of the tree.**\n",
+    "We are working in the medical sector and we are using decision trees for their interpretability power, but we have to decide the **depth of the tree.**\n",
     "\n",
     "<br/><br/>\n",
     "<center><img src=\"figs/hyperparams.png\" width=\"70%\" /></center>"
@@ -1051,7 +1051,7 @@
     }
    },
    "source": [
-    "# How many model do we train with k=10 fold cross-validation and grid search over depth $\\in [1,2,3]$ and min impurity decrease in $\\{0.01,0.1\\}$?"
+    "# How many models do we train with k=10 fold cross-validation and grid search over depth $\\in [1,2,3]$ and min impurity decrease in $\\{0.01,0.1\\}$?"
    ]
   },
   {
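Review note on the grid-search question above: a small sketch for checking the count, assuming scikit-learn's `GridSearchCV` with a `DecisionTreeClassifier` on the Digits dataset (the estimator settings and `random_state` are assumptions for illustration, not the notebook's cell). The number of cross-validation fits is the number of grid points times the number of folds; with `refit=True` one extra model is trained on the whole training set.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.tree import DecisionTreeClassifier

# Grid taken from the question: 3 depths x 2 impurity thresholds.
param_grid = {"max_depth": [1, 2, 3], "min_impurity_decrease": [0.01, 0.1]}
n_folds = 10

n_candidates = len(ParameterGrid(param_grid))  # grid points
n_cv_fits = n_candidates * n_folds             # models trained during CV

# Cross-check against what GridSearchCV actually runs.
X, y = load_digits(return_X_y=True)
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=n_folds,
    refit=True,  # one more fit on the full data with the best parameters
)
search.fit(X, y)
n_fits_done = len(search.cv_results_["params"]) * search.n_splits_
total_fits = n_cv_fits + 1  # CV fits plus the final refit

print(n_candidates, n_cv_fits, n_fits_done, total_fits)
```

Whether the final refit counts toward the answer depends on how the question is read, so both numbers are kept separate here.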
@@ -1195,7 +1195,7 @@
     "# Loading the Digits dataset\n",
     "digits = datasets.load_digits()\n",
     "\n",
-    "# To apply an classifier on this data, we need to flatten the image, to\n",
+    "# To apply a classifier on this data, we need to flatten the image, to\n",
     "# turn the data in a (samples, feature) matrix:\n",
     "n_samples = len(digits.images)\n",
     "X = digits.images.reshape((n_samples, -1))\n",
@@ -2394,11 +2394,11 @@
     "\n",
     "1. Compute the ROC with a  table and/or draw it approximatively  the ROC curve (TPR vs FPR)\n",
     "2. Calculate the Area Under the Curve (AUC).\n",
-    "3. How woud you set the score to make $AUC=100\\%?$\n",
+    "3. How would you set the score to make $AUC=100\\%?$\n",
     "| **labels** \t| -1 \t| 1   \t| -1   \t| 1   \t| -1   \t| 1 |\n",
     "|--------|----|-----|------|-----|------|-----|\n",
     "| **score**  \t| ? \t| ? \t| ? \t| ? \t| ? \t| ? | \n",
-    "4. How woud you set the score to make $AUC=0\\%?$\n",
+    "4. How would you set the score to make $AUC=0\\%?$\n",
     "| **labels** \t| -1 \t| 1   \t| -1   \t| 1   \t| -1   \t| 1 |  \n",
     "|--------|----|-----|------|-----|------|-----|\n",
     "| **score**  \t| ? \t| ? \t| ? \t| ? \t| ? \t| ? |\n",