Finished Part III
timothyLeeXQ committed Nov 22, 2019
1 parent 4cfda5d commit 69324e9
Showing 4 changed files with 28 additions and 11 deletions.
9 changes: 7 additions & 2 deletions Assignment7.Rmd
@@ -3,9 +3,9 @@ title: "Assignment 7 - Response"
author: "Timothy Lee"
date: "11/21/2019"
output:
pdf_document: default
html_notebook: default
html_document:
pdf_document:
html_document: default
---

In the following assignment you will be looking at data from one level of an online geography tutoring system used by 5th grade students. The game involves a pre-test of geography knowledge (pre.test), a series of assignments for which you have the average score (av.assignment.score), the number of messages sent by each student to other students about the assignments (messages), the number of forum posts students posted asking questions about the assignment (forum.posts), a post test at the end of the level (post.test) and whether or not the system allowed the students to go on to the next level (level.up).
@@ -258,6 +258,11 @@ cohen.kappa(matrix1)
cohen.kappa(matrix2)
```

Conclusions that can be drawn:

1. The first threshold has a higher kappa, accuracy, and recall.
2. The second threshold has a higher precision.
3. The first threshold is likely the better one to use. It shows more agreement between model and data (kappa) and has higher accuracy and recall. Although its precision is lower, the second threshold's gain in precision (driven mostly by a lower false positive rate) is not enough to justify its loss in accuracy and recall (driven mostly by a lower true positive rate, i.e. a higher false negative rate, a change larger than the decrease in FPR). Overestimating the number of students who will not level up would also strain resources.
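
The comparison above can be reproduced from any pair of 2x2 confusion matrices. What follows is only a minimal sketch: the helper name `threshold_metrics` and both sets of counts are hypothetical and are not the assignment's results. It assumes predictions are in rows and observations in columns, each ordered (no, yes), and computes an unweighted Cohen's kappa by hand rather than through `cohen.kappa()`.

```r
# Hypothetical helper (not part of the assignment code): summarise a 2x2
# confusion matrix with predictions in rows and observations in columns,
# both ordered (no, yes).
threshold_metrics <- function(cm) {
  tn <- cm[1, 1]; fp <- cm[2, 1]
  fn <- cm[1, 2]; tp <- cm[2, 2]
  n  <- sum(cm)
  po <- (tp + tn) / n                          # observed agreement (accuracy)
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  c(accuracy  = po,
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn),
    kappa     = (po - pe) / (1 - pe))          # unweighted Cohen's kappa
}

# Made-up counts for illustration only; matrix() fills column by column,
# so the order is TN, FP, FN, TP.
labels <- list(predicted = c("no", "yes"), observed = c("no", "yes"))
cm_first  <- matrix(c(500, 100,  50, 350), nrow = 2, dimnames = labels)
cm_second <- matrix(c(570,  30, 160, 240), nrow = 2, dimnames = labels)

m_first  <- threshold_metrics(cm_first)
m_second <- threshold_metrics(cm_second)
print(round(rbind(first = m_first, second = m_second), 3))
```

With these made-up counts the first matrix comes out ahead on accuracy, recall, and kappa, while the second has higher precision, which is the same pattern of tradeoffs described in the conclusions.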

### To Submit Your Assignment

10 changes: 8 additions & 2 deletions Assignment7.html
@@ -428,8 +428,8 @@ <h4>Classification tree</h4>
##
## CP nsplit rel error xerror xstd
## 1 0.54250 0 1.0000 1.0000 0.038730
## 2 0.01125 1 0.4575 0.4675 0.030825
## 3 0.01000 3 0.4350 0.4825 0.031200</code></pre>
## 2 0.01125 1 0.4575 0.4575 0.030569
## 3 0.01000 3 0.4350 0.4800 0.031138</code></pre>
<pre class="r"><code>#Pruning since gain to CP at 3rd split is not that much more, with no decrease in xerror
#ctree_geog_its &lt;- prune.rpart(ctree_geog_its, cp = 0.01125)
#Not doing this after all since it would undermine the ROC exercise. With few splits, few unique probabilities are assigned to the cases, and hence there are few cutoff probabilities
@@ -664,6 +664,12 @@ <h4>Thresholds</h4>
## weighted kappa 0.44 0.49 0.55
##
## Number of subjects = 1000</code></pre>
<p>Conclusions that can be drawn:</p>
<ol style="list-style-type: decimal">
<li>The first threshold has a higher kappa, accuracy, and recall.</li>
<li>The second threshold has a higher precision.</li>
<li>The first threshold is likely the better one to use. It shows more agreement between model and data (kappa) and has higher accuracy and recall. Although its precision is lower, the second threshold's gain in precision (driven mostly by a lower false positive rate) is not enough to justify its loss in accuracy and recall (driven mostly by a lower true positive rate, i.e. a higher false negative rate, a change larger than the decrease in FPR). Overestimating the number of students who will not level up would also strain resources.</li>
</ol>
</div>
<div id="to-submit-your-assignment" class="section level3">
<h3>To Submit Your Assignment</h3>