Finished Part III

core-methods-in-edm · Nov 22, 2019 · 69324e9
1 parent 4cfda5d
commit 69324e9

Showing 4 changed files with 28 additions and 11 deletions.
diff --git a/Assignment7.Rmd b/Assignment7.Rmd
@@ -3,9 +3,9 @@ title: "Assignment 7 - Response"
 author: "Timothy Lee"
 date: "11/21/2019"
 output:
+  pdf_document: default
   html_notebook: default
-  html_document:
-  pdf_document:
+  html_document: default
 ---
 
 In the following assignment you will be looking at data from one level of an online geography tutoring system used by 5th grade students. The game involves a pre-test of geography knowledge (pre.test), a series of assignments for which you have the average score (av.assignment.score),  the number of messages sent by each student to other students about the assignments (messages), the number of forum posts students posted asking questions about the assignment (forum.posts), a post test at the end of the level (post.test) and whether or not the system allowed the students to go on to the next level (level.up).  
@@ -258,6 +258,11 @@ cohen.kappa(matrix1)
 cohen.kappa(matrix2)
 ```
 
+Conclusions that can be drawn:
+
+1. The first threshold has higher kappa, accuracy, and recall.
+2. The second threshold has higher precision.
+3. The first threshold is likely the better one to use. It shows more agreement between model and data (kappa) and has higher accuracy and recall. Although its precision is lower, the precision gained at the second threshold (driven mostly by a lower false positive rate) does not justify the loss in accuracy and recall (driven mostly by the decreased true positive rate, i.e. the increased false negative rate, which is larger than the decrease in FPR). Overestimating the number of students who might not level up would also be a strain on resources.
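
The comparison above rests on four quantities read off a 2x2 confusion matrix. As a minimal sketch (not part of the commit), the base R snippet below shows how each could be computed. The counts, the `confusion` matrix, and the `classification_metrics` helper are hypothetical and are not the assignment's data; the assignment itself obtains kappa from `cohen.kappa()`, whereas this sketch computes an unweighted kappa by hand so that it needs no packages.

```r
# Hypothetical 2x2 confusion matrix: rows = predicted, columns = observed.
# These counts are made up for illustration; they are NOT the assignment's data.
confusion <- matrix(c(50, 10,
                      20, 20),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(predicted = c("yes", "no"),
                                    observed  = c("yes", "no")))

# Hypothetical helper: accuracy, precision, recall and unweighted Cohen's kappa
classification_metrics <- function(m) {
  tp <- m["yes", "yes"]  # predicted yes, observed yes
  fp <- m["yes", "no"]   # predicted yes, observed no
  fn <- m["no", "yes"]   # predicted no, observed yes
  tn <- m["no", "no"]    # predicted no, observed no
  n  <- sum(m)

  accuracy  <- (tp + tn) / n
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)

  # Kappa: observed agreement corrected for the agreement expected by chance
  p_chance <- sum(rowSums(m) * colSums(m)) / n^2
  kappa    <- (accuracy - p_chance) / (1 - p_chance)

  c(accuracy = accuracy, precision = precision, recall = recall, kappa = kappa)
}

metrics <- classification_metrics(confusion)
print(round(metrics, 3))
```

Running the helper on the confusion matrix for each threshold gives the figures side by side, which makes the precision versus recall tradeoff in point 3 easier to check.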
 
 ### To Submit Your Assignment
 

diff --git a/Assignment7.html b/Assignment7.html
@@ -428,8 +428,8 @@ <h4>Classification tree</h4>
 ## 
 ##        CP nsplit rel error xerror     xstd
 ## 1 0.54250      0    1.0000 1.0000 0.038730
-## 2 0.01125      1    0.4575 0.4675 0.030825
-## 3 0.01000      3    0.4350 0.4825 0.031200</code></pre>
+## 2 0.01125      1    0.4575 0.4575 0.030569
+## 3 0.01000      3    0.4350 0.4800 0.031138</code></pre>
 <pre class="r"><code>#Pruning since gain to CP at 3rd split is not that much more, with no decrease in xerror
 #ctree_geog_its &lt;- prune.rpart(ctree_geog_its, cp = 0.01125)
 #Not doing this after all, since it would undermine the ROC exercise. With few splits there are few unique probabilities assigned to the cases, and hence few cutoff probabilities
@@ -664,6 +664,12 @@ <h4>Thresholds</h4>
 ## weighted kappa    0.44     0.49  0.55
 ## 
 ##  Number of subjects = 1000</code></pre>
+<p>Conclusions that can be drawn:</p>
+<ol style="list-style-type: decimal">
+<li>The first threshold has higher kappa, accuracy, and recall.</li>
+<li>The second threshold has higher precision.</li>
+<li>The first threshold is likely the better one to use. It shows more agreement between model and data (kappa) and has higher accuracy and recall. Although its precision is lower, the precision gained at the second threshold (driven mostly by a lower false positive rate) does not justify the loss in accuracy and recall (driven mostly by the decreased true positive rate, i.e. the increased false negative rate, which is larger than the decrease in FPR). Overestimating the number of students who might not level up would also be a strain on resources.</li>
+</ol>
 </div>
 <div id="to-submit-your-assignment" class="section level3">
 <h3>To Submit Your Assignment</h3>