# Date: Friday, August 10, 2012
################################################
# Fisher's Exact Test and Related Procedures ###
################################################
# Goal: Detect a true association between two binary
# "variables".
### Fisher's exact test for randomized trials ###
# Example:
# 48 bank managers participated in a randomized experiment without
# knowing this: They had to judge whether a fictitious person should
# be promoted or not, based on his/her file.
# The 48 files had been distributed at random and differed only in
# one aspect: 24 files belonged to Mr. Miller, 24 to Mrs. Miller.
#
# Possible working hypotheses:
# * "Men tend to be judged better than women"
# * "Women tend to be judged better than men"
# * "The decision is dependent on the candidate's
# gender, but we don't know how."
#
# (Before obtaining/analyzing the data, one should know
# one's working hypothesis. In case of doubt, the
# two-sided working hypothesis of some dependency is
# recommended.)
#
# Null hypothesis: The 48 managers judged objectively; the
# candidate's gender was irrelevant.
#
# Result:
#              || promoted | not promoted ||
# ---------------------------------------------
# Mr. Miller   ||    21    |      3       ||  24
# Mrs. Miller  ||    14    |     10       ||  24
# ---------------------------------------------
#              ||    35    |     13       ||  48
#
# Under the null hypothesis, given the margins (35 promoted,
# 13 not promoted, 24 files per candidate), the table has the form
#
#              || promoted | not promoted ||
# ---------------------------------------------
# Mr. Miller   ||    T     |     24-T     ||  24
# Mrs. Miller  ||   35-T   |     T-11     ||  24
# ---------------------------------------------
#              ||    35    |     13       ||  48
# with T = number of Mr. Miller's files that were promoted,
# T ~ hyper(35, 13, 24)   (in R: dhyper(x, m = 35, n = 13, k = 24)).
x <- 11:24   # possible values of T: every cell must be >= 0, so 11 <= T <= 24
dx <- dhyper(x,35,13,24)
barplot(dx,names.arg=x)
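# A quick sanity check of this null distribution (a sketch in base R;
# 'supp' and 'probs' are illustrative names). The support follows from
# requiring every cell to be >= 0: T >= 24 - 13 = 11 and T <= 24.
# Since both rows have 24 files, T and 35 - T have the same
# distribution, so it is symmetric about its mean 24 * 35 / 48 = 17.5.
supp  <- max(0, 24 - 13):min(24, 35)             # 11, 12, ..., 24
probs <- dhyper(supp, 35, 13, 24)
sum(probs)                                       # total probability: 1
sum(supp * probs)                                # mean of T: 17.5
all.equal(probs, dhyper(35 - supp, 35, 13, 24))  # symmetry: TRUE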
# P-values
# (probabilities are computed assuming the null hypothesis):
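# * pv.rs = 1 - phyper(T-1,35,13,24)
# Right-sided p-value: Is T "suspiciously large"?
# (The -1 is needed because phyper(q, ...) returns P(T <= q);
# the right tail P(T >= t) therefore equals 1 - P(T <= t-1).)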
# * pv.ls = phyper(T,35,13,24)
# Left-sided p-value: Is T "suspiciously small"?
# * pv.ts = 2 * min(pv.ls, pv.rs)
# Two-sided p-value: Is T "suspiciously extreme"?
# Which p-value is appropriate depends on
# our working hypothesis.
# Let's be cautious and use the two-sided one:
pv.ls <- phyper(21, 35, 13, 24)       # P(T <= 21); observed T = 21
pv.ls
pv.rs <- 1 - phyper(20, 35, 13, 24)   # P(T >= 21) = 1 - P(T <= 20)
pv.rs
pv.ts <- 2 * min(pv.ls, pv.rs)        # two-sided p-value
pv.ts
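# The right-sided p-value can be obtained in two further ways, which
# makes the "-1" concrete (a sketch; 'rs.sum' and 'rs.upper' are
# illustrative names): summing the point probabilities of all tables
# with T >= 21, or calling phyper() with lower.tail = FALSE, which
# returns P(T > q) directly, so q = 20 gives P(T >= 21).
rs.sum   <- sum(dhyper(21:24, 35, 13, 24))
rs.upper <- phyper(20, 35, 13, 24, lower.tail = FALSE)
c(rs.sum, rs.upper, 1 - phyper(20, 35, 13, 24))   # all about 0.0245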
# Now the same thing with a built-in procedure:
help.search("Fisher's exact test")
? fisher.test
# There are two modes of using fisher.test():
# * We have a data frame 'ds' with two binary factors
# "F1" and "F2"; then run
# fisher.test(ds$F1,ds$F2)
# * We have a two-by-two table "x"; then run
# fisher.test(x)
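# Both modes run the same test. A sketch (the data frame 'ds' and its
# factor labels are illustrative) that expands the bank-manager table
# into one row per file and compares the two calls:
ds <- data.frame(
  F1 = factor(rep(c("Mr", "Mrs"), times = c(24, 24)),
              levels = c("Mr", "Mrs")),
  F2 = factor(rep(c("promoted", "not promoted", "promoted", "not promoted"),
                  times = c(21, 3, 14, 10)),
              levels = c("promoted", "not promoted"))
)
tab <- table(ds$F1, ds$F2)          # rows: Mr/Mrs, columns: decision
tab
fisher.test(ds$F1, ds$F2)$p.value   # data-frame mode
fisher.test(tab)$p.value            # table mode, same p-value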
# To use fisher.test here, we have to generate
# a contingency table:
x <- matrix(c(21,14,3,10),nrow=2,ncol=2)
x
fisher.test(x=x)
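# Note that the p-value provided here is the
# two-sided p-value we computed "by hand". They agree here because
# the null distribution of T is symmetric about 17.5 (both rows
# have 24 files); in general, fisher.test() uses a different
# two-sided rule (see ?fisher.test), so the two can differ.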
# To get the one-sided p-values, type
fisher.test(x=x,alternative="less") # for pv.ls
fisher.test(x=x,alternative="greater") # for pv.rs
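# According to its help page, fisher.test() computes the two-sided
# p-value by summing the probabilities of all tables that are no more
# likely than the observed one. A base-R sketch of that rule
# ('support', 'probs', 'p.obs', 'pv.rule' are illustrative names; the
# tiny relative tolerance guards against rounding, and its exact value
# is an assumption modeled on what fisher.test does internally):
support <- 11:24
probs   <- dhyper(support, 35, 13, 24)
p.obs   <- dhyper(21, 35, 13, 24)                  # observed table
pv.rule <- sum(probs[probs <= p.obs * (1 + 1e-7)])
pv.rule                                            # about 0.049
fisher.test(matrix(c(21, 14, 3, 10), nrow = 2))$p.value
# In this symmetric example the rule also equals 2 * P(T >= 21).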
# The output also contains an estimate and a
# confidence interval for an "odds ratio".
# This will be explained next...
### Odds ratios ###
# Goals:
# * Compare two different procedures (e.g. medical
# treatments) with respect to success/failure.
# * Compare two (sub-)populations with respect to
# a certain property of individuals.
#
# In the first example, let p1 and p2 be the success
# probabilities with procedure 1 and 2, respectively.
# Then the odds of success are
# pj/(1-pj) with procedure j,
# and the odds ratio of success with procedure 1 versus
# procedure 2 equals
# OR = (p1/(1-p1)) / (p2/(1-p2))
# = p1*(1-p2) / (p2*(1-p1)) .
#
# Note that
# p1 < p2, p1 = p2 or p1 > p2 if, and only if,
# OR < 1, OR = 1 or OR > 1, respectively.
#
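# A minimal sketch of this definition ('odds.ratio' is a hypothetical
# helper, not a base R function; the probabilities are chosen only for
# illustration). It also checks that both formulas for OR agree.
odds.ratio <- function(p1, p2) (p1 / (1 - p1)) / (p2 / (1 - p2))
odds.ratio(0.8, 0.5)   # 4:    p1 > p2, so OR > 1
odds.ratio(0.5, 0.8)   # 0.25: p1 < p2, so OR < 1
odds.ratio(0.3, 0.3)   # 1:    p1 == p2
all.equal(odds.ratio(0.8, 0.5), 0.8 * (1 - 0.5) / (0.5 * (1 - 0.8)))  # TRUE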
# A randomized study to compare the two procedures:
# A group of volunteers is divided randomly into two
# groups, where group j undergoes procedure j.
# The results may be summarized as a two-by-two table:
#
#          ||  success  |  failure  ||
# -------------------------------------------------
# Group 1  ||    S1     |    F1     || n1 = S1+F1
# Group 2  ||    S2     |    F2     || n2 = S2+F2
# -------------------------------------------------
#          || S = S1+S2 | F = F1+F2 || n = n1+n2
#
# Performing fisher.test(x) with this two-by-two table
# (x <- matrix(c(S1,S2,F1,F2),nrow=2,ncol=2))
# yields a p-value for the null hypothesis that the
# two procedures have the same success probabilities
# (p1 == p2 <==> OR == 1),
# a confidence interval for the odds ratio OR and
# an estimate of OR.
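# With counts in place of probabilities, the plug-in ("sample") odds
# ratio is (S1/F1) / (S2/F2) = S1*F2 / (S2*F1). fisher.test() reports
# a different estimator, the conditional maximum likelihood estimate,
# so its value is generally close to, but not identical with, the
# sample odds ratio. A sketch ('sample.or' is a hypothetical helper)
# applied to the bank-manager table, laid out as matrix(c(S1,S2,F1,F2)):
sample.or <- function(x) (x[1, 1] * x[2, 2]) / (x[2, 1] * x[1, 2])
miller <- matrix(c(21, 14, 3, 10), nrow = 2, ncol = 2)
sample.or(miller)              # (21 * 10) / (14 * 3) = 5
fisher.test(miller)$estimate   # conditional MLE of OR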
# In the second example, let pj be the proportion of
# individuals having the property within
# (sub-)population j. Thus, if we draw an individual
# from (sub-)population j at random, the odds of picking
# an individual with the property are pj/(1-pj).
# Again the odds ratio is defined as
# OR = (p1/(1-p1)) / (p2/(1-p2))
#    = p1*(1-p2) / (p2*(1-p1)) .
# Data example:
# Gender and Left-Handedness (see website of this course).
#
#         | left-handed | right-handed |
# ---------------------------------------
# male    |     113     |     934      |
# female  |      92     |    1070      |
# ---------------------------------------
x <- matrix(c(113,92,934,1070),ncol=2,nrow=2)
x
fisher.test(x)
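# The ingredients of this odds ratio can be estimated directly from the
# table (a sketch; 'hand', 'p.male', 'p.female' and 'or.hat' are
# illustrative names): the proportion of left-handers within each
# gender and the resulting sample odds ratio, to be compared with the
# estimate and the confidence interval printed by fisher.test(x).
hand <- matrix(c(113, 92, 934, 1070), nrow = 2, ncol = 2)
p.male   <- hand[1, 1] / sum(hand[1, ])   # 113 / 1047
p.female <- hand[2, 1] / sum(hand[2, ])   # 92 / 1162
or.hat   <- (p.male / (1 - p.male)) / (p.female / (1 - p.female))
c(p.male = p.male, p.female = p.female, or.hat = or.hat)  # or.hat about 1.41
fisher.test(hand)$conf.int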
# Conclusion: With 95% confidence we may conclude
# that left-handedness is more frequent among men
# than among women (in the given population).