forked from dlsun/probability
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcontinuous.Rmd
360 lines (303 loc) · 16.9 KB
/
continuous.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
# (PART) Continuous Probability {-}
# Continuous Random Variables {#continuous}
## Motivating Example {-}
In a Poisson process, _the number of arrivals_ in any interval is
a random variable that follows a Poisson distribution.
But what about _the time of the first arrival_? This is also
a random variable, but of a different kind.
The time of the first arrival does not have to be an integer. It
can equal $1.2$ seconds, $2.173$ seconds, $\sqrt{2}$ seconds, or
even $\pi$ seconds. In fact, any real number from $0$ to $\infty$ is a
possible value of this random variable.
To be concrete, suppose radioactive particles hit a Geiger counter
according to a Poisson process with a rate of $\lambda = 0.8$ particles per second.
We will represent the time of the first arrival---that is, the time that the first particle
hits the Geiger counter---by $T$. We can calculate probabilities involving $T$.
For example, we can calculate the probability that the first particle
arrives after 2 seconds by rewriting this event in terms of the
number of arrivals on an interval:
\[ P(T > 2) = P(\text{0 particles between 0 and 2 seconds}) \]
We know that the number of particles between 0 and 2 seconds
follows a $\text{Poisson}(\mu=0.8 \cdot 2)$ distribution, so we
just need to evaluate the p.m.f. at $x=0$:
\begin{equation}
P(T > 2) = e^{-0.8 \cdot 2} \frac{(0.8 \cdot 2)^0}{0!} = e^{-1.6} \approx .202.
(\#eq:t-gt-2)
\end{equation}
We can also calculate the probability that the first arrival happens between
$2$ and $3$ seconds, as:
\[ P(2 < T < 3) = P(T > 2) - P(T > 3). \]
We calculate $P(T > 3)$ in much the same way as we calculated $P(T > 2)$ above:
\begin{equation}
P(T > 3) = e^{-0.8 \cdot 3} \frac{(0.8 \cdot 3)^0}{0!} = e^{-2.4} \approx .091.
(\#eq:t-gt-3)
\end{equation}
So we see that
\[ P(2 < T < 3) = e^{-1.6} - e^{-2.4} \approx .111. \]
Although we can calculate specific probabilities involving $T$, how do we
describe its distribution? In particular:
- What is its c.d.f.?
- Does it have a p.m.f.?
We will answer these questions in this lesson and more.
## Theory {-}
First, we calculate the c.d.f. of $T$, which is straightforward from its
definition \@ref(eq:cdf).
```{example geiger-cdf, name="The CDF of the First Arrival"}
Remember that the c.d.f. is defined as $F(t) = P(T \leq t)$ as a function of $t$.
First, we use the complement rule:
\[ F(t) = P(T \leq t) = 1 - P(T > t). \]
Now, we calculate $P(T > t)$, in much the same way that we calculated $P(T > 2)$
in \@ref(eq:t-gt-2) and $P(T > 3)$ in \@ref(eq:t-gt-3):
\[ P(T > t) = e^{- 0.8 \cdot t} \frac{(0.8 \cdot t)^0}{0!} = e^{-0.8 t}. \]
This formula works for $t \geq 0$. Since times are positive, we know that
\[ P(T > t) = 1 \]
for $t < 0$.
Putting everything together, the c.d.f. of $T$ is
\[ F(t) = 1 - P(T > t) = \begin{cases} 1 - e^{-0.8 t} & t \geq 0 \\ 0 & t < 0 \end{cases}. \]
This function is graphed below.
```{r t-cdf-graph, echo=FALSE, fig.show = "hold", fig.align = "center",fig.cap="CDF of the First Arrival"}
par(mgp=c(1, 1, 0))
t <- seq(0, 5.5, by=0.01)
plot(t, 1 - exp(-0.8 * t), type="l", lwd=3, col="red",
xaxt="n", yaxt="n", bty="n",
xlab="t", ylab="Cumulative Distribution Function F(t)",
xlim=c(-0.5, 5.5), ylim=c(-0.05, 1.05))
lines(c(-0.5, 0), c(0, 0), lwd=3, col="red")
axis(1, pos=0)
axis(2, pos=0)
```
Notice how different this c.d.f. is, compared with the ones we graphed in
Lesson \@ref(cdf). In Figure \@ref(fig:cdf-graph), the c.d.f. was a
step function, with a jump at each possible value of the random variable.
By contrast, the c.d.f. above is continuous. This is the main distinction between the
kinds of random variables we were studying before, which are called
**discrete random variables**, and the kinds of
random variables we study in this lesson, which are called
**continuous random variables**.
```{definition, name="Continuous Random Variable"}
A random variable is called **continuous** if its c.d.f. is a
continuous function.
```
With discrete random variables, we can visualize the shape of the
distribution by graphing its p.m.f. Recall that the p.m.f. specified the
probability that the random variable is _equal to_ $x$. Continuous random
variables do not have a p.m.f. because the probability of any exact outcome is zero.
```{example, name="Probability of a Single Outcome"}
Continuing with Example \@ref(exm:geiger-cdf), what is the probability that the
first particle hits the Geiger counter at _exactly_ 2 seconds---that is,
$P(T = 2.00000...)$?
First, consider the probability that the first arrival is in the
interval $(2 - \epsilon, 2 + \epsilon)$, where $\epsilon$ is a small
positive number. No matter how small an $\epsilon$ we choose, this
probability will always be greater than the probability that the
first arrival happens at _exactly_ 2 seconds, since this interval includes 2 seconds,
as well as some other outcomes. (For example, if $\epsilon = 0.05$, then this interval would
also include the possibility that the first arrival happens at 1.98 seconds or
2.01 seconds.)
The number of arrivals on the interval $(2 - \epsilon, 2 + \epsilon)$ is a
Poisson random variable with parameter $\mu = 0.8 \cdot 2\epsilon$. Therefore,
the probability that there are no arrivals on the interval is:
\begin{align*}
P(\text{at least 1 arrival on interval}) &= 1 - P(\text{no arrivals on interval}) \\
&= 1 - e^{-0.8 \cdot 2\epsilon} \frac{(0.8 \cdot 2 \epsilon)^0}{0!} \\
&= 1 - e^{-1.6 \epsilon}.
\end{align*}
But $\epsilon$ can be arbitrarily small. As $\epsilon \to 0$, this probability approaches 0.
Therefore, the probability that the first arrival happens at _exactly_ 2 seconds is zero.
```
Because the probability of every outcome in a continuous random variable is zero,
continuous random variables cannot be described by their p.m.f. Instead, they are
described by a similar function called the **probability density function**.
```{definition pdf, name="Probability Density Function"}
The **probability density function** (or p.d.f.) of a continuous random variable
is defined to be the derivative of the c.d.f.
\[ f(x) \overset{\text{def}}{=} F'(x). \]
The values of $f(x)$ do not represent the probability that the random variable
is equal to $x$ (because for a continuous random variable, that probability is
always equal to 0).
```
The following video presents an alternative perspective on the p.d.f. It gives
more insight into what a "probability density" is. It also reinforces
the message that $P(X = x) = 0$ for any continuous random variable $X$.
<iframe width="560" height="315" src="https://www.youtube.com/embed/6Osba2KLTuk" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
```{example geiger-pdf, name="The PDF of the First Arrival"}
Continuing with Example \@ref(exm:geiger-cdf), the c.d.f. was calculated to be
\[ F(t) = \begin{cases} 1 - e^{-0.8 t} & t \geq 0 \\ 0 & t < 0 \end{cases}. \]
Taking the derivative, we have
\begin{equation}
f(t) = F'(t) = \begin{cases} 0.8 e^{-0.8 t} & t \geq 0 \\ 0 & t < 0 \end{cases}.
(\#eq:geiger-pdf)
\end{equation}
Figure \@ref(fig:t-pdf-graph) below shows the p.d.f., along with the c.d.f. Since the p.d.f. is the
derivative of the c.d.f., it is the slope of the c.d.f. at a given value of $t$. The
steeper the c.d.f., the higher the p.d.f.
```{r t-pdf-graph, echo=FALSE, fig.show = "hold", fig.align = "center",fig.cap="PDF from the CDF", fig.asp=1.2}
par(mfrow=c(2, 1), mgp=c(1, 1, 0))
t <- seq(0, 5.5, by=0.01)
plot(t, 0.8 * exp(-0.8 * t), type="l", lwd=3, col="blue",
xaxt="n", yaxt="n", bty="n",
xlab="t", ylab="Probability Density Function f(t)",
xlim=c(-0.5, 5.5), ylim=c(-0.05, 0.81))
axis(1, pos=0)
axis(2, pos=0)
lines(c(-0.5, 0), c(0, 0), lwd=3, col="blue")
lines(c(1.5, 1.5), c(0, 0.8 * exp(-0.8 * 1.5)), lty=2)
lines(c(0, 1.5), rep(0.8 * exp(-0.8 * 1.5), 2), lty=2)
points(1.5, 0.8 * exp(-0.8 * 1.5), pch=19, col="blue")
plot(t, 1 - exp(-0.8 * t), type="l", lwd=3, col="red",
xaxt="n", yaxt="n", bty="n",
xlab="t", ylab="Cumulative Distribution Function F(t)",
xlim=c(-0.5, 5.5), ylim=c(-0.05, 1.05))
lines(c(-0.5, 0), c(0, 0), lwd=3, col="red")
abline(-1.5 * 0.8 * exp(-0.8 * 1.5) + 1 - exp(-0.8 * 1.5), 0.8 * exp(-0.8 * 1.5), col="blue")
lines(c(1.5, 1.5), c(0, 1 - exp(-0.8 * 1.5)), lty=2)
axis(1, pos=0)
axis(2, pos=0)
```
This p.d.f. tells us that the first arrival is more likely to happen sooner,
rather than later. However, the values of the p.d.f. are not easy to interpret.
They do not represent probabilities, but rather probability _densities_.
If the values of the p.d.f. do not represent probabilities, how do we calculate
probabilities using the p.d.f.? It turns out that _areas under the p.d.f._
represent probabilities.
```{theorem calculating-probs-pdf, name="Calculating Probabilities Using the PDF"}
Let $X$ be a continuous random variable with p.d.f. $f(x)$. Then:
\[ P(a < X \leq b) = \int_a^b f(x)\,dx. \]
```{proof}
Let $F(x)$ be the c.d.f. of $X$. Then, we know that
\[ P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a). \]
Since $f = F'$, the [Fundamental Theorem of Calculus](https://en.wikipedia.org/wiki/Fundamental_theorem_of_calculus#Second_part) says that
\[ F(b) - F(a) = \int_a^b f(x)\,dx. \]
```
Theorem \@ref(thm:calculating-probs-pdf) says that probabilities correspond to
_areas_ under the p.d.f. This is another way to see that the probability of any single outcome
must be 0. The probability that $X = x$ would be the integral from $x$ to $x$; since
there is no area under the curve at $x$, this probability must be 0.
Let's recalculate the probabilities from the Motivating Example,
now using the p.d.f. \@ref(eq:geiger-pdf).
```{example, name="Calculating Probabilities Using the PDF"}
We know that the p.d.f. of the first arrival, $T$, is
\[ f(t) = \begin{cases} 0.8 e^{-0.8 t} & t \geq 0 \\ 0 & t < 0 \end{cases}. \]
To calculate probabilities using Theorem \@ref(thm:calculating-probs-pdf), we
integrate the p.d.f.
For example, the probability that the first arrival happens between 2 and 3 seconds
is:
\[ P(2 < T < 3) = \int_2^3 f(t)\,dt = \int_2^3 0.8 e^{-0.8 t}\,dt = .111, \]
and the probability that the first arrival happens after 2 seconds is
\[ P(T > 2) = \int_2^\infty f(t)\,dt = \int_2^\infty 0.8 e^{-0.8 t}\,dt = .202.\]
The areas that these probabilities represent are shown on the graphs below.
```{r, echo=FALSE, fig.show = "hold", fig.align = "center",fig.cap="PDF of the First Arrival", fig.asp=1.2}
par(mfrow=c(2, 1), mgp=c(1, 1, 0))
t <- seq(0, 5.5, by=0.01)
plot(t, 0.8 * exp(-0.8 * t), type="l", lwd=3, col="blue",
main="P(2 < T < 3)",
xaxt="n", yaxt="n", bty="n",
xlab="t", ylab="Probability Density Function f(t)",
xlim=c(-0.5, 5.5), ylim=c(-0.05, 0.81))
axis(1, pos=0)
axis(2, pos=0)
range <- seq(2, 3, by=0.01)
polygon(c(2, range, 3), c(0, 0.8 * exp(-0.8 * range), 0),
border=NA, col=rgb(1, 0, 0, 0.5))
lines(c(-0.5, 0), c(0, 0), lwd=3, col="blue")
plot(t, 0.8 * exp(-0.8 * t), type="l", lwd=3, col="blue",
main="P(T > 2)",
xaxt="n", yaxt="n", bty="n",
xlab="t", ylab="Probability Density Function f(t)", xlim=c(-0.5, 5.5), ylim=c(-0.05, 0.81))
axis(1, pos=0)
axis(2, pos=0)
range <- seq(2, 5.5, by=0.01)
polygon(c(2, range, 5.5), c(0, 0.8 * exp(-0.8 * range), 0),
border=NA, col=rgb(1, 0, 0, 0.5))
lines(c(-0.5, 0), c(0, 0), lwd=3, col="blue")
```
What about the probability that the first arrival happens before 3 seconds?
Technically, we should integrate the p.d.f. from $-\infty$ to $3$. However, $f(t) = 0$
for $-\infty < t < 0$, so we effectively integrate the p.d.f. from $0$ to $3$:
\[ P(T < 3) = \int_{-\infty}^3 f(t)\,dt = \int_0^3 0.8 e^{-0.8 t}\,dt = .909. \]
So far, we saw that we take the derivative of the c.d.f. to get the p.d.f.
We can also go in reverse: if we have the p.d.f., we can take its integral to get the c.d.f.
\begin{equation}
F(x) = \int_{-\infty}^x f(t)\,dt.
(\#eq:pdf-to-cdf)
\end{equation}
```{example, name="(Re)deriving the CDF of the First Arrival"}
Suppose we only knew the p.d.f. of the first arrival $T$:
\[ f(t) = \begin{cases} 0.8 e^{-0.8 t} & t \geq 0 \\ 0 & t < 0 \end{cases}. \]
To calculate the c.d.f. at $x$, we integrate the p.d.f. up to $x$. Since the p.d.f.
is 0 when $t < 0$, the integral effectively starts from $t=0$.
\[ F(x) = \int_{-\infty}^x f(t)\,dt = \int_0^x 0.8 e^{-0.8 t}\,dt = 1 - e^{-0.8 x}. \]
(This integral can be calculated using the $u$-substitution $u=-0.8 t$, or [using
Wolfram Alpha](https://www.wolframalpha.com/input/?i=integrate+0.8*exp%28-0.8*t%29+from+t%3D0+to+x).)
```
Figure \@ref(fig:pdf-to-cdf) below illustrates how areas under the p.d.f. translate to
values of the c.d.f.
```{r pdf-to-cdf, echo=FALSE, fig.show = "hold", fig.align = "center",fig.cap="CDF from the PDF", fig.asp=1.2}
par(mfrow=c(2, 1), mgp=c(1, 1, 0))
t <- seq(0, 5.5, by=0.01)
plot(t, 0.8 * exp(-0.8 * t), type="l", lwd=3, col="blue",
xaxt="n", yaxt="n", bty="n",
xlab="t", ylab="Probability Density Function f(t)",
xlim=c(-0.5, 5.5), ylim=c(-0.05, 0.81))
axis(1, pos=0)
axis(2, pos=0)
range <- seq(0, 1.5, by=0.01)
polygon(c(0, range, 1.5), c(0, 0.8 * exp(-0.8 * range), 0),
border=NA, col=rgb(1, 0, 0, 0.5))
lines(c(-0.5, 0), c(0, 0), lwd=3, col="blue")
plot(t, 1 - exp(-0.8 * t), type="l", lwd=3, col="red",
xaxt="n", yaxt="n", bty="n",
xlab="t", ylab="Cumulative Distribution Function F(t)",
xlim=c(-0.5, 5.5), ylim=c(-0.05, 1.05))
lines(c(-0.5, 0), c(0, 0), lwd=3, col="red")
lines(c(1.5, 1.5), c(0, 1 - exp(-0.8 * 1.5)), lty=2)
lines(c(0, 1.5), rep(1 - exp(-0.8 * 1.5), 2), lty=2)
points(1.5, 1 - exp(-0.8 * 1.5), col="red", pch=19)
axis(1, pos=0)
axis(2, pos=0)
```
## Optional Video {-}
This video (not my own) does an excellent job explaining the mechanics of
calculations involving p.d.f.s and c.d.f.s. If you felt that you understood the material
above well enough to do the Essential Practice below, you probably do not need to watch this video.
Do not get too caught up in the calculus; remember that you can always use
[Wolfram Alpha](http://www.wolframalpha.com) to compute any integrals or derivatives.
<iframe width="560" height="315" src="https://www.youtube.com/embed/EPm7FdajBvc" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Essential Practice {-}
1. Packets arrive at a certain node on the university’s intranet at 10 packets per minute, on average.
Assume packet arrivals meet the assumptions of a Poisson process.
a. What is the p.d.f. of $T$, the time of the first arrival?
b. What is the probability that the first arrival happens between 10 seconds and 30 seconds?
(Note that the rate in the problem is given in _minutes_.)
2. Two Cal Poly students, Ferris and Cameron, are frequently late to class. Cal Poly
classes start at 10 minutes past the hour mark.
Let $X$ be the time (in minutes) that Ferris arrives at class after the hour mark. The p.d.f.
of $X$ is
\[ f_X(x) = \begin{cases} \frac{1}{60} & 0 \leq x < 60 \\ 0 & \text{otherwise} \end{cases}. \]
Let $Y$ be the time (in minutes) that Cameron arrives at class after the hour mark. The p.d.f.
of $Y$ is
\[ f_Y(y) = \begin{cases} \frac{60 - y}{1800} & 0 \leq y < 60 \\ 0 & \text{otherwise} \end{cases}. \]
a. Sketch a graph of the two p.d.f.s. Without doing any calculations, who is more likely to arrive
on time (i.e., within the first 10 minutes of the hour)?
b. Determine the c.d.f.s of $X$ and $Y$.
c. Calculate the probability that Ferris arrives on time. You should be able to calculate
this probability in three ways: (1) finding the area under the p.d.f. using geometry, (2)
finding the area under the p.d.f. using calculus, and (3) using the c.d.f.
d. Calculate the probability that Cameron arrives on time. You should also be able to calculate
this probability in three ways.
3. The distance (in hundreds of miles) driven by a trucker in one day is a continuous
random variable $X$ whose cumulative distribution function (c.d.f.) is given by:
\[ F(x) = \begin{cases} 0 & x < 0 \\ x^3 / 216 & 0 \leq x \leq 6 \\ 1 & x > 6 \end{cases}. \]
a. Determine the p.d.f. Sketch a graph.
b. Calculate the probability that the trucker travels more than 500 miles in a day. (You should
be able to calculate this in at least 2 ways: directly from the c.d.f. and using the p.d.f.
that you calculated in the previous part.)
c. Calculate the probability that the trucker travels exactly 200 miles in a day.
4. Let $T$ be how late that a professor lets her class out (in minutes).
(If $T$ is negative, then she finishes the class early.) The p.d.f. of $T$ is
\[ f(t) = \begin{cases} c(2 + t) & -2 < t < 0 \\ c(2 - t) & 0 \leq t < 2 \\ 0 & \text{otherwise} \end{cases}, \]
where $c$ is a constant.
a. Determine the value of $c$ necessary to make this a proper p.d.f. (_Hint:_ What does the
total probability have to be?) Sketch the p.d.f.
b. Calculate the probability that the professor finishes between 1 minute early and
30 seconds late, i.e., $P(-1 < T < 0.5)$.