\documentclass{beamer}
\usepackage{amsmath}
\setkeys{Gin}{width=0.9\textwidth,height=0.7\textheight}
\title{Information Theory}
\subtitle{Part 1: What is Information?}
\author{Isaac Carruthers}
\begin{document}
\frame{\titlepage}
\frame{\frametitle{What is Information Theory?
\footnote{ Developed by {\em Claude E. Shannon} in the late 1940s}}
How do we quantify Information?
\begin{itemize}
\item If we have a noisy channel, how quickly can we communicate?
\item If we are uncertain about something, but become less uncertain, how do we quantify
how much we've learned?
\item If we have an incomplete description of a system, how much information do we need to
collect before we can fully specify it?
\end{itemize}
\vspace{12pt}
Conversely, how do we quantify {\em uncertainty}?\\
(i.e. a lack of information)
}
\frame{\frametitle{Uncertainty of Two Systems}
Consider the uncertainty ($\sigma$) of two systems with identical setup, but not necessarily
identical results:
\vspace{12pt}
\begin{itemize}
\item Let $X_1, X_2$ be two independent, fair coin-flips
\item Clearly we should have $\sigma_1=\sigma_2$
\item Intuitively, we would also like the uncertainty of the combined system to have the
property $\sigma_{1,2}=2\sigma_1=2\sigma_2$
\end{itemize}
\vspace{12pt}
Another way to look at this: if we learn the outcome of flip 1, our uncertainty in $X_1$
should decrease by $\sigma_1$ and go to zero. Under any reasonable measure, the uncertainty
of the combined system should decrease by the same amount as that of the individual system.
}
\frame{\frametitle{Uncertainty of Two Systems}
Intuitively, our uncertainty about a system should be related to the number of states we think
it could be in
\vspace{12pt}
In the previous example, each individual system could be in 1 of 2 states, and the combined
system could be in 1 of 4
\vspace{12pt}
When we combined the systems, we multiplied the number of states they could be in, but we
added their uncertainties
\vspace{12pt}
The only functions that have this property in the general case are logarithms:
\[\log_b(xy) = \log_b(x) + \log_b(y)\]
}
\frame{\frametitle{More About Systems with Two Possibilities}
We are already familiar with a system like this for measuring information: bits!
\vspace{12pt}
If we have a system of $n$ coin-flips, there are $2^n$ possible outcomes, and we need to
send exactly $\log_2(2^n)=n$ bits to fully specify the outcome!
}
\frame{\frametitle{What if Some Possibilities are More Likely than Others?}
What if some possibilities are more likely than others?
\vspace{12pt}
In the simplest case, what if our coin always comes up heads?
\vspace{12pt}
Hopefully we all agree that there is no uncertainty here, so $\sigma$ should be $0$
\vspace{12pt}
What if there is a 75\% chance of getting heads?
}
\frame{\frametitle{What if Some Possibilities are More Likely than Others?}
There is a simple and natural generalization to the variable-probability case
\vspace{12pt}
We can rephrase our current definition as follows:
\[\sigma=\log_b(n)=-\log_b\left(\frac1n\right)=-\left<\log_b(p_s)\right>_s=
-\sum_s p_s\log_b(p_s)\]
\vspace{12pt}
where replacing $\frac1n$ with $p_s$ inside the average is only valid when all $n$ states
are equally likely ($p_s=\frac1n$); the final expression is the general definition of entropy
}
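\begin{frame}[fragile]
\frametitle{A Quick Numerical Check}
A rough sketch of the formula above in R; the helper \texttt{entropy\_bits} is just a name
chosen for this example, not a built-in function:
<<entropy-check,echo=TRUE>>=
# Illustrative helper: Shannon entropy (in bits) of a probability vector
entropy_bits <- function(p) {
    p <- p[p > 0]  # states with zero probability contribute nothing
    -sum(p * log2(p))
}
entropy_bits(c(0.5, 0.5))    # fair coin: 1 bit
entropy_bits(c(0.75, 0.25))  # 75%-biased coin: about 0.81 bits
entropy_bits(rep(0.25, 4))   # two independent fair coins: 2 bits
@
\end{frame}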
\frame{\frametitle{What if Some Possibilities are More Likely than Others?}
\centering
<<basic,fig=TRUE,echo=FALSE>>=
# Entropy (in bits) of a coin with P(heads) = p; exclude p = 0 and p = 1,
# where 0 * log2(0) would evaluate to NaN
p <- (1:999)/1000
s <- -p * log2(p) - (1-p) * log2(1-p)
plot(p, s, ylab=expression(sigma), type='l', bty='n', las=2, lwd=3)
# Dashed lines mark the maximum: 1 bit at p = 0.5
abline(h=1, lty=2)
abline(v=0.5, lty=2)
# Dotted lines mark the 75%-biased coin: about 0.81 bits
abline(v=0.75, lty=3)
abline(h=-(0.75*log2(0.75) + 0.25*log2(0.25)), lty=3)
@
}
\frame{\frametitle{So What is Information?}
In this framework, we can define information as the {\em change in our uncertainty} about
the state of the system.
\vspace{12pt}
Suppose we are waiting to hear about the outcome of a coin-flip, carried out by a friend in
another city. We start out with uncertainty $\log_2(2)=1$ bit, and after we
hear the result our uncertainty is 0, so we gained 1 bit of information.
\vspace{12pt}
If we are waiting to hear about a 75\%-biased coin, we will only gain about $0.81$ bits of information...
But our friend still has to send us a whole bit, so how does that work?
}
\frame{\frametitle{What About the Fractional Bits?}
So suppose instead of just one biased coin, we have $k$ of them.
\vspace{12pt}
Each one has some entropy less than 1 bit, so in theory we should be able to transmit their
outcomes using fewer than $k$ bits.
\vspace{12pt}
Suppose we have $k=2$ coins that are 90\% biased (each with entropy about $0.47$ bits); can we
transmit their outcomes in fewer than 2 bits?
}
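\begin{frame}[fragile]
\frametitle{Checking the Per-Coin Entropy}
A quick check of the $0.47$-bit figure; this is just a sketch, and the variable names
below are chosen for this example:
<<coin-entropy,echo=TRUE>>=
p <- 0.9                               # probability of heads
h1 <- -(p*log2(p) + (1-p)*log2(1-p))   # entropy of one biased coin, in bits
h1                                     # about 0.47 bits
2 * h1                                 # two independent coins: about 0.94 bits total
@
\end{frame}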
\frame{\frametitle{Two Biased Coins}
Two coins with $p=0.9$ have four possible outcomes: $HH$ with $p=0.81$, $HT$ and $TH$ with
$p=0.09$ each, and $TT$ with $p=0.01$.
\vspace{12pt}
Suppose we use the following coding: on $HH$ we will send 0, on $HT$ we send 10, on $TH$ 110,
and on $TT$ 111.
\vspace{12pt}
On average we will send just $1\cdot0.81+2\cdot0.09+3\cdot0.09+3\cdot0.01=1.29$ bits!
\vspace{12pt}
As we deal with bigger and bigger systems, we can get closer and closer to the
theoretical limit implied by the system's entropy.
}
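\begin{frame}[fragile]
\frametitle{Checking the Expected Code Length}
A sketch comparing this code against the entropy lower bound (the vectors
\texttt{p\_out} and \texttt{len} are names chosen for this example):
<<code-length,echo=TRUE>>=
p_out <- c(HH = 0.81, HT = 0.09, TH = 0.09, TT = 0.01)  # outcome probabilities
len   <- c(HH = 1,    HT = 2,    TH = 3,    TT = 3)     # lengths of 0, 10, 110, 111
sum(p_out * len)           # expected code length: 1.29 bits per pair of flips
-sum(p_out * log2(p_out))  # joint entropy: about 0.94 bits, the theoretical limit
@
\end{frame}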
\frame{\frametitle{Summary}
\begin{itemize}
\item We measure information by how much our uncertainty decreases
\item We measure our uncertainty with entropy
\item Entropy puts a lower bound on how much information we need to fully specify our
system
\item We can develop ways to encode the state of our system that approach this lower bound
\end{itemize}
}
\end{document}