<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Linear Discriminant Analysis</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Loading Bootstrap -->
<link href="bootstrap/css/bootstrap.css" rel="stylesheet">
<!-- Loading Flat UI -->
<link href="css/flat-ui.css" rel="stylesheet">
<link href="css/site.css" rel="stylesheet">
<link rel="shortcut icon" href="images/favicon.ico">
<!-- HTML5 shim, for IE6-8 support of HTML5 elements. All other JS at the end of file. -->
<!--[if lt IE 9]>
<script src="js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="container">
<div id="slidy-swoop" class="carousel slide" data-ride="carousel" data-interval="false">
<div class="carousel-inner">
<!-- page 1 -->
<div id="one" class="item active">
<h3>Linear Discriminant Analysis</h3>
<div class="row">
<div class="col-md-12">
<!-- <p>This is a formula: </p>
<blockquote><img src="http://latex.codecogs.com/gif.latex?1+sin(x)" /></blockquote> -->
<p></p>
<p>Analysis – We have at least two sets of data, and we want to classify (i.e. analyze) incoming data according to the stimulus that produced it.</p>
<ul>
<li>Did the neuron firing rate we recently recorded result from the monkey touching a hot stove or an ice cube?</li>
<li>Did our new data come from the red distribution or the blue distribution (see the diagram below)?</li>
</ul>
<p>Discriminant – In order to discriminate (i.e. categorize) incoming data, we need to decide on our cutoffs between each category.</p>
<ul>
<li>To make this process easier, we assume that the data in every category are distributed as multivariate Gaussian distributions with the same variance.</li>
<li>Thus, as with the three data sets shown below, any one data set is just a shifted version of any other.</li>
</ul>
<img src="LDA1.png" class="pull-left" alt="">
</div>
</div> <!-- /row -->
<div class="row">
</div> <!-- /row -->
</div>
<!-- / page 1-->
<!-- page 2a -->
<div id="twoa" class="item">
<h3>Quantifying LDA</h3>
<div class="row">
<div class="col-md-12">
<!-- <img src="images/icons/svg/gift-box.svg" class="pull-left" alt=""> -->
<p> Linear – If our distributions are actually all shifted versions of the same Gaussian shape, then dividing them up with lines is an effective means of classifying.</p>
<ul>
<li>Draw a strict line in the proverbial sand – The line is the boundary. Every point that lands on the left side of the line is in category red. Every point on the right is category blue.</li>
<li>As with MAP estimation, LDA takes priors into account. In the top picture, if category red is historically much more likely than category blue, we push the line farther over into blue territory, assuming we are more likely to see red in the future.</li>
</ul>
<img src="LDA2.png" class="pull-left" alt="">
</div>
</div>
<div class="row">
</div> <!-- /row -->
</div>
<!-- / page 2a-->
<!-- questions 1-->
<div id="questions1" class="item">
<h3>Practice Questions</h3>
<div class="row">
<!-- <div class="col-md-2"></div> -->
<div class="col-md-4">
<div class="panel panel-default">
<div class="panel-body">
<h6 class="demo-panel-title">What does LDA stand for?</h6>
<div class="form-group">
<input type="text" value="" placeholder="" class="form-control">
</div>
<button data-answer="linear discriminant analysis" class="btn btn-block btn-primary">
<span class="glyphicon glyphicon-check"></span> Check
</button>
</div>
</div>
</div> <!-- /col -->
<div class="col-md-4">
<div class="panel panel-default">
<div class="panel-body">
<h6 class="demo-panel-title"> In LDA, we assume every data set’s distribution to have the same _____ but possibly different ______.</h6>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-1" value="variance, means" data-toggle="radio">
variance, means
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-2" value="other" data-toggle="radio">
means, variance
</label>
<button data-answer="variance, means" class="btn btn-block btn-primary">
<span class="glyphicon glyphicon-check"></span> Check
</button>
</div>
</div>
</div>
<div class="col-md-4">
<div class="panel panel-default">
<div class="panel-body">
<h6 class="demo-panel-title"> In LDA, what distribution do we assume every data set follows?</h6>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-1" value="Answer 1" data-toggle="radio">
Poisson
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-2" value="Gaussian" data-toggle="radio">
Gaussian
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-3" value="Answer 3" data-toggle="radio">
Bimodal
</label>
<button data-answer="Gaussian" class="btn btn-block btn-primary">
<span class="glyphicon glyphicon-check"></span> Check
</button>
</div>
</div>
</div>
</div>
<div class="row">
</div>
</div>
<!-- /questions 1-->
<!-- page 2 -->
<div id="two" class="item">
<h3>Quantifying LDA</h3>
<div class="row">
<div class="col-md-12">
<!-- <img src="images/icons/svg/gift-box.svg" class="pull-left" alt=""> -->
<p>We know we need to draw a line between our two categories’ distributions. Where should the line go?</p>
<p>Intuitively, since this line will be our <u>decision boundary</u> between the two categories, it should correspond to the points at which the two categories are equally likely. If our data falls on this line, then we don’t know how to classify it. It could just as easily be category ‘hot stove’ as category ‘ice cube.’</p>
<p>To find all the points where the likelihood of each category is equal, we first need a function that takes a category and returns the likelihood of that category. For example, if we call this function f, then f(‘ice cube’) might return 0.57, telling us that the probability the monkey touched an ice cube is 0.57 (given the distributions we already have for ‘hot stove’ and ‘ice cube’). </p>
<p>WARNING: Math ahead. It’s worth the effort.</p>
<p> </p>
</div>
</div>
<div class="row">
</div> <!-- /row -->
</div> <!-- / page 2-->
<!-- page 3 -->
<div id="three" class="item">
<h3>Quantifying LDA</h3>
<div class="row">
<div class="col-md-12">
<!-- <img src="images/icons/svg/gift-box.svg" class="pull-left" alt=""> -->
<p>Since we assumed that our categories are distributed as multivariate Gaussians (the pictures above), the likelihood function is essentially just that distribution’s probability density function, with the category as the parameter rather than the data value:</p>
<blockquote><img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;f(c)&space;=&space;\frac{\pi_c}{(2\pi)^{k/2}&space;det(\Sigma)^{1/2}}e^{-0.5(x&space;-&space;\mu_c)^T\Sigma^{-1}(x&space;-&space;\mu_c)}&space;\end{align*}"></blockquote>
<p>where <img src="http://latex.codecogs.com/gif.latex?\pi%20_{c}"> is the prior probability of the category, k is the number of dimensions of the distribution (the ‘multi’ in ‘multivariate’ may mean 2, 3, 4…), x is the data vector we want to classify (a new firing-rate measurement, for example), and <img src="http://latex.codecogs.com/gif.latex?\mu%20_{c}"> is the category’s mean.</p>
<p>This expression looks messy and has lots of variables to keep track of. Let’s try to simplify it for our application.</p>
<p>Our goal is to find the category C with the largest likelihood. In other words, we want the value of C for which f(C) is maximized (i.e., argmax(f(C))). As a result, we can remove from f any terms that are constant for all values of C, as these terms won’t affect which category maximizes the function.</p>
<p>Thus, we’ve shaved down our function to </p>
<p><blockquote><img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;f(c)&space;=&space;\pi_c&space;e^{-0.5(x&space;-&space;\mu_c)^T\Sigma^{-1}(x&space;-&space;\mu_c)}&space;\end{align*}"></blockquote></p>
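<p>To make this concrete, here is a small illustrative sketch (Python with NumPy, using made-up numbers for the means, shared covariance, and priors, none of which come from the slides) of how this simplified likelihood could be computed:</p>
<pre>
import numpy as np

def likelihood(x, mu_c, sigma_inv, prior_c):
    # Simplified (unnormalized) likelihood: pi_c * exp(-0.5 (x - mu_c)^T Sigma^-1 (x - mu_c))
    d = x - mu_c
    return prior_c * np.exp(-0.5 * d @ sigma_inv @ d)

# Hypothetical 2-D example with one shared covariance matrix
sigma_inv = np.linalg.inv(np.array([[1.0, 0.2],
                                    [0.2, 1.0]]))
mu_hot_stove = np.array([2.0, 1.0])
mu_ice_cube  = np.array([-1.0, 0.5])
x = np.array([0.5, 0.8])            # new data point to classify

print(likelihood(x, mu_hot_stove, sigma_inv, prior_c=0.5))
print(likelihood(x, mu_ice_cube,  sigma_inv, prior_c=0.5))
</pre>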
</div>
</div>
<div class="row">
</div> <!-- /row -->
</div> <!-- / page 3-->
<!-- page 4 -->
<div id="four" class="item">
<h3>Quantifying LDA</h3>
<div class="row">
<div class="col-md-12">
<!-- <img src="images/icons/svg/gift-box.svg" class="pull-left" alt=""> -->
<p>Can we do even better? Of course we can, or we wouldn’t be asking.</p>
<p>Remember once again that we only need to find the value of C which maximizes the output of f. If we think very hard, we’ll eventually realize that taking the log of f won’t change the maximizing C value, because log is a monotonically increasing function. Furthermore, it will make our lives much easier, because we won’t have to compute the value of e raised to a power anymore.</p>
<p><blockquote><img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;f(c)&space;=&space;\log(\pi_c)&space;-&space;0.5(x&space;-&space;\mu_c)^T\Sigma^{-1}(x&space;-&space;\mu_c)&space;\end{align*}"></blockquote></p>
<p>If we think even harder about linear algebra (which we won’t make you do for now), we can simplify this expression to</p>
<p><blockquote><img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;f(c)&space;=&space;\log(\pi_c)&space;+&space;\mu_c^T\Sigma^{-1}x&space;-&space;0.5\mu_c^T\Sigma^{-1}\mu_c&space;\end{align*}"></blockquote></p>
<p>Great! We have a fairly simple function that takes a category and returns the likelihood of that category. Since we may have many categories to consider, let’s generalize its naming scheme. For a data set x and category C, we’ll rename <img src="http://latex.codecogs.com/gif.latex?f"> to <img src="http://latex.codecogs.com/gif.latex?\delta%20_{c}%20%28x%29">.</p>
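<p>As an illustrative sketch (again Python with NumPy, with invented example numbers), the discriminant score for each category could be computed and the most likely category picked like this:</p>
<pre>
import numpy as np

def delta(x, mu_c, sigma_inv, prior_c):
    # delta_c(x) = log(pi_c) + mu_c^T Sigma^-1 x - 0.5 mu_c^T Sigma^-1 mu_c
    return (np.log(prior_c)
            + mu_c @ sigma_inv @ x
            - 0.5 * mu_c @ sigma_inv @ mu_c)

sigma_inv = np.linalg.inv(np.array([[1.0, 0.2],
                                    [0.2, 1.0]]))
categories = {
    'hot stove': (np.array([2.0, 1.0]), 0.5),   # (mean, prior)
    'ice cube':  (np.array([-1.0, 0.5]), 0.5),
}
x = np.array([0.5, 0.8])

# Classify x by picking the category with the largest discriminant score
scores = {c: delta(x, mu, sigma_inv, pi) for c, (mu, pi) in categories.items()}
print(max(scores, key=scores.get))
</pre>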
</div>
</div>
<div class="row">
</div> <!-- /row -->
</div> <!-- / page 4-->
<!-- page 5 -->
<div id="five" class="item">
<h3>Quantifying LDA</h3>
<div class="row">
<div class="col-md-12">
<!-- <img src="images/icons/svg/gift-box.svg" class="pull-left" alt=""> -->
<p>What should we do with our function? Remember that we want to draw a line between two categories where each is equally likely. </p>
<p>If we name our categories ‘hot stove’ and ‘ice cube’, then we have two functions for any given data set x: <img src="http://latex.codecogs.com/gif.latex?\delta_{\text{hot%20stove}}%28x%29"> and <img src="http://latex.codecogs.com/gif.latex?\delta_{\text{ice%20cube}}%28x%29">. Now we want the set of points x for which these two categories’ likelihoods are equal.</p>
<p><blockquote><img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\delta_{\text{hot&space;stove}}(x)&space;=&space;\delta_{\text{ice&space;cube}}(x)\\&space;\end{align*}"></blockquote></p>
<p><blockquote><img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\log(\pi_{\text{hs}})&space;+&space;\mu_{\text{hs}}^T\Sigma^{-1}x&space;-&space;0.5\mu_{\text{hs}}^T\Sigma^{-1}\mu_{\text{hs}}&space;=&space;\log(\pi_{\text{ic}})&space;+&space;\mu_{\text{ic}}^T\Sigma^{-1}x&space;-&space;0.5\mu_{\text{ic}}^T\Sigma^{-1}\mu_{\text{ic}}&space;\end{align*}"></blockquote></p>
<p>Therefore, moving all the terms to the left side, our final boundary equation is</p>
<p><blockquote><img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;(\mu_{hs}&space;-&space;\mu_{ic})^T\Sigma^{-1}x&space;-&space;\frac{1}{2}(\mu_{hs}&space;+&space;\mu_{ic})^T\Sigma^{-1}(\mu_{hs}&space;-&space;\mu_{ic})&space;+&space;\log\frac{\pi_{hs}}{\pi_{ic}}&space;=&space;0&space;\end{align*}"></blockquote></p>
<p>What is this equation telling us? If we provide it values for our categories’ means (<img src="http://latex.codecogs.com/gif.latex?\mu_{\text{hot&space;stove}},&space;\mu_{\text{ice&space;cube}}">), the shared covariance (which enters through its inverse, <img src="http://latex.codecogs.com/gif.latex?\Sigma^{-1}">), and prior probabilities (<img src="http://latex.codecogs.com/gif.latex?\pi_{\text{hot&space;stove}},&space;\pi_{\text{ice&space;cube}}">), then it will tell us the set of points x for which the likelihoods of our two categories are equal. We have our line!</p>
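<p>As one last illustrative sketch (Python with NumPy, with invented numbers for the parameters), the boundary can be written as w&middot;x + b = 0 and its sign used to classify a point:</p>
<pre>
import numpy as np

sigma_inv = np.linalg.inv(np.array([[1.0, 0.2],
                                    [0.2, 1.0]]))
mu_hs, pi_hs = np.array([2.0, 1.0]), 0.5
mu_ic, pi_ic = np.array([-1.0, 0.5]), 0.5

# Boundary equation rearranged as w . x + b = 0
w = sigma_inv @ (mu_hs - mu_ic)
b = -0.5 * (mu_hs + mu_ic) @ sigma_inv @ (mu_hs - mu_ic) + np.log(pi_hs / pi_ic)

x = np.array([0.5, 0.8])
print('hot stove' if w @ x + b > 0 else 'ice cube')
</pre>
<p>Points where w&middot;x + b is exactly zero lie on the decision boundary itself; everything on one side scores higher for ‘hot stove’, everything on the other side for ‘ice cube’.</p>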
</div>
</div>
<div class="row">
</div> <!-- /row -->
</div> <!-- / page 5-->
<!-- page 6 -->
<div id="six" class="item">
<h3>Practice Questions</h3>
<div class="row">
<div class="col-md-12">
<p>
Consider the following graph:</p>
<blockquote><img src="LDA1.png" height="300" width="300"></blockquote>
</div>
</div>
<div class="row">
<!-- <div class="col-md-2"></div> -->
<div class="col-md-6">
<div class="panel panel-default">
<div class="panel-body">
<h6 class="demo-panel-title">If we learn that both the mean x and mean y values of the blue category decrease, which way does the decision boundary line move?</h6>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-1" value="Answer 1" data-toggle="radio">
Line moves down and left
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-2" value="Gaussian" data-toggle="radio">
Up and right
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-3" value="Does not change" data-toggle="radio">
Does not change
</label>
<button data-answer="Does not change" class="btn btn-block btn-primary">
<span class="glyphicon glyphicon-check"></span> Check
</button>
</div>
</div>
</div>
<div class="col-md-6">
<div class="panel panel-default">
<div class="panel-body">
<h6 class="demo-panel-title">If we learn that the blue and red distributions have different variances, can we still perform LDA analysis?</h6>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-1" value="Answer 1" data-toggle="radio">
Yes
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-2" value="No, LDA analysis assumes that both distributions have the same variance." data-toggle="radio">
No
</label>
<button data-answer="No, LDA analysis assumes that both distributions have the same variance." class="btn btn-block btn-primary">
<span class="glyphicon glyphicon-check"></span> Check
</button>
</div>
</div>
</div>
</div>
<div class="row">
</div>
</div>
<!-- /page6 -->
<!-- page 7 -->
<div id="seven" class="item">
<h3>Practice Questions</h3>
<div class="row">
<div class="col-md-12">
<p>
Consider the following graph in which the distribution of red values exhibits positive correlation:</p>
<blockquote><img src="LDA4.gif" height="300" width="300"></blockquote>
</div>
</div>
<div class="row">
<!-- <div class="col-md-2"></div> -->
<div class="col-md-6">
<div class="panel panel-default">
<div class="panel-body">
<h6 class="demo-panel-title">If both the red and blue distributions in our example exhibited positive correlation, would the slope of our decision boundary be positive or negative?</h6>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-1" value="Positive" data-toggle="radio">
Positive
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-2" value="Negative" data-toggle="radio">
Negative
</label>
<button data-answer="Positive" class="btn btn-block btn-primary">
<span class="glyphicon glyphicon-check"></span> Check
</button>
</div>
</div>
</div>
<div class="col-md-6">
<div class="panel panel-default">
<div class="panel-body">
<h6 class="demo-panel-title"> We can generalize Linear Discriminant Analysis to work with more than two categories <a href="(http://en.wikipedia.org/wiki/Linear_discriminant_analysis#Multiclass_LDA)">(example)</a>. Notice that we need one boundary line to divide two categories. How many boundary lines do we need to divide n categories?</h6>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-1" value="Answer 1" data-toggle="radio">
n
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-1" value="Answer 1" data-toggle="radio">
n+1
</label>
<label class="radio">
<span class="icons"><span class="first-icon fui-radio-unchecked"></span><span class="second-icon fui-radio-checked"></span></span><input type="radio" name="q2" id="q2-2" value="n-1" data-toggle="radio">
n-1
</label>
<button data-answer="n-1" class="btn btn-block btn-primary">
<span class="glyphicon glyphicon-check"></span> Check
</button>
</div>
</div>
</div>
</div>
<div class="row">
</div>
</div>
<!-- /page7 -->
</div>
<div>
<div class="pull-left">
<a href="index.html">
<span class="glyphicon glyphicon-arrow-left"></span>
Back to Intro
</a>
</div>
<!-- Steps -->
<div class="pull-right btn-group">
<a href="#slidy-swoop" class="btn btn-lg btn-inverse" data-slide="prev">
<span class="glyphicon glyphicon-chevron-left"></span>
Previous
</a>
<a href="#slidy-swoop" class="btn btn-lg btn-inverse" data-slide="next">
Next
<span class="glyphicon glyphicon-chevron-right"></span>
</a>
</div>
</div>
</div> <!-- /slidy-swoop -->
</div> <!-- /container -->
<!-- Load JS here for greater good =============================-->
<script src="js/jquery-1.8.3.min.js"></script>
<script src="js/jquery-ui-1.10.3.custom.min.js"></script>
<script src="js/jquery.ui.touch-punch.min.js"></script>
<script src="js/bootstrap.min.js"></script>
<script src="js/bootstrap-select.js"></script>
<script src="js/bootstrap-switch.js"></script>
<script src="js/flatui-checkbox.js"></script>
<script src="js/flatui-radio.js"></script>
<script src="js/jquery.tagsinput.js"></script>
<script src="js/jquery.placeholder.js"></script>
<script src="js/jquery.stacktable.js"></script>
<script src="http://vjs.zencdn.net/4.1/video.js"></script>
<script src="js/application.js"></script>
<script src="check.js"></script>
</body>
</html>