---
title: "CCS Assignment 2021-2022"
author: "Christopher Lucas and Simon Valentin"
course: Computational Cognitive Science (2021–2022)
output:
pdf_document:
template: assignment-ccs.template
html_document:
df_print: paged
bibliography: ccs_assignment_2021.bib
urlcolor: blue
editor_options:
markdown:
wrap: sentence
geometry: margin=1.5in
---
```{r echo=FALSE, message=FALSE}
# We'll want to create some plots with ggplot2
library(ggplot2)
library(dplyr)
library(rstan)
rstan_options(auto_write = TRUE)
```
### Submission Instructions
You should answer the questions in this Rmd file, and 'Knit' it into a PDF file, which you will submit.
Upload your answer file on [Blackboard Learn](https://www.learn.ed.ac.uk/).
Please make sure you have submitted the right file by downloading and reviewing the file you have submitted.
We cannot make concessions for students who turn in incomplete or incorrect files by accident.
### Good Scholarly Practice
Please remember University policies regarding all assessed work for credit.
Details and advice about this can be found [here](http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct).
You are expected to complete this assignment *individually* and must take reasonable measures to protect your assessed work from unauthorised access.
For example, if you put any such work on a public repository then you must set access permissions appropriately (generally permitting access only to yourself, or your group in the case of group practicals).
Do not disclose your solutions when asking questions in a forum visible to other students (e.g., non-private questions on Piazza).
When in doubt, contact course staff by email or via private questions.
### General Guidance
The assignment is out of 100.
Marks are given alongside each question.
Especially challenging questions are marked 'Stretch'.
We recommend answering the easier questions first and leaving the stretch questions until the end if you get stuck on them.
### Objective and materials
This assignment is designed to introduce you to simple models of causal learning, and assess your ability to (1) understand and implement computational models, and (2) evaluate and compare them using experimental data.
We will look at three models: RW, $\Delta$P, and Causal Support.
Our data are from real-world experiments where people made judgments about whether objects were 'blickets', having seen events involving those objects.
The data files you will need are `org-kva-adult-E1-OR.csv` and `org-kva-adult-E1-AND.csv`, contained in the `data` sub-directory in the same archive that contains this Rmd file.
You will need to install RStan to complete this assignment.
You can find installation instructions [here](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started).
## Background: The experimental setup
"Blicket machine" experiments use a fictional device called a "blicket detector" or "blicket machine" that reacts to "blickets" by activating (lighting up and playing a sound).
That is, blickets are objects that can cause the blicket machine to activate.
By showing people different events involving the machine and prospective blickets, experimenters can learn about how people reason about cause and effect.
Please have a look at [@gopnik_theory_2004]([link](https://oce-ovid-com.ezproxy.is.ed.ac.uk/article/00006832-200401000-00001/HTML)) for a description of blicket detector experiments and data like those we will be modelling.
### Example: Backward blocking
Here is an example of a blicket detector experiment, similar to experiments that have demonstrated the *backward blocking* phenomenon.
The experimenter shows participants two objects, **A** and **B**, and says that one, both, or neither might be blickets.
They ask participants to judge whether **A** and **B** are blickets (1 for yes, 0 for no).
The experimenter then places **A** and **B** on the machine together.
The machine activates, lighting up and playing a sound.
The experimenter asks for another judgment about how likely it is that **A** and **B** are blickets.
The experimenter then places **A** on the machine by itself.
The machine activates again.
The experimenter asks for a final set of judgments about how likely it is that **A** and **B** are blickets.
------------------------------------------------------------------------
When people see the final event, they often revise their beliefs about not just **A**, but about **B** as well.
This phenomenon is called "backward blocking", and has been studied extensively with blicket detector experiments, as well as in earlier research that predates blicket detector experiments.
We can summarise the events participants see in the experiment above as "AB+ A+", where events are separated by spaces, letters denote the objects that are placed on the machine, and "+" and "-" indicate whether the machine activated.
### Experiments
We will be considering results from previous experiments that are slightly more complicated than the above example. These experiments were not designed to distinguish between the models you will be comparing, but may nonetheless be useful in understanding differences between the models.
The table below summarizes them, using the same notation as above.
| Condition Name | Abbrev | Event Set 1 | Event Set 2 |
|:---------------|--------|:---------------------|:--------------------|
| Conjunctive | CON | A- B- C- AB- AC+ BC- | D- D- D- E- DF+ DF+ |
| Disjunctive | DIS | A+ B- C+ AB+ AC+ BC+ | D- D- D- E- DF+ DF+ |
The experiment worked as follows: Participants were randomly allocated to either the CON or DIS condition, were then shown Event Set 1 for the respective condition, and subsequently made judgments about objects A, B, and C.
Then participants were presented with Event Set 2 from the condition they were allocated to, after which they made judgments but this time about objects D, E, and F.
### The data
There are two kinds of data we need to consider.
The first is the data that participants see -- the models will need to be provided those events in order to make predictions.
#### Event representation
Let us represent the sequence of objects placed on the machine as an $E \times O$ matrix, where $E$ is the number of events and $O$ the number of objects.
Thus, each row describes the objects present in a single event: a $1$ entry indicates that the corresponding object is present, and a $0$ entry indicates that it is absent (with the columns ordered alphabetically).
For the backwards-blocking example, this would be:
```{r}
bb_objects <- matrix(c(1,1,1,0),nrow=2,ncol=2)
print(bb_objects)
```
So the first row (1,1) means that in the first event, both objects are present on the machine.
We can use a vector of length $E$ containing ones and zeros to represent whether the machine activates (1) or doesn't (0).
For the backward blocking example above, the vector would thus be:
```{r}
bb_activations <- c(1,1)
```
This means that the machine activates for the first event, and also for the second event.
For the experiment we are looking at, there are more than two events and two objects to keep track of.
Fortunately, the objects that are placed on the machine are the same in both conditions (CON and DIS), so we can use one matrix for both conditions:
```{r}
all_objects <- matrix(c(1,0,0,1,1,0,0,0,0,0,0,0,
0,1,0,1,0,1,0,0,0,0,0,0,
0,0,1,0,1,1,0,0,0,0,0,0,
0,0,0,0,0,0,1,1,1,0,1,1,
0,0,0,0,0,0,0,0,0,1,0,0,
0,0,0,0,0,0,0,0,0,0,1,1),
ncol=6,nrow=12)
print(all_objects)
```
Make sure you understand how the rows of the matrix above correspond to `A B C AB AC BC D D D E DF DF`.
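If it helps, here is one way to check this (an optional aside, not part of the assignment): label the rows with the event descriptions and the columns with the object names, then print the matrix.
```{r}
# Optional sanity check: row i should match the i-th event in
# "A B C AB AC BC D D D E DF DF"; columns are objects A-F.
labelled <- all_objects
rownames(labelled) <- make.unique(c("A","B","C","AB","AC","BC",
                                    "D","D","D","E","DF","DF"))
colnames(labelled) <- c("A","B","C","D","E","F")
print(labelled)
```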
The difference in the conditions is in the activation of the machine:
```{r}
con_activations <- c(0,0,0,0,1,0,0,0,0,0,1,1)
dis_activations <- c(1,0,1,1,1,1,0,0,0,0,1,1)
```
Now that we have the evidence that people are being given in the experiment, we can also run this evidence through different computational models and see what predictions they make about human judgments.
This allows us to assess how different models expect people will learn and generalize from that evidence, and since we have experimental data from actual people, we can compare model predictions to human judgments.
We will go through an example of a computational model in the next section.
Before that, let's look at some data from the experimental participants.
There are many columns in our CSVs, but the relevant ones for us are the binary judgments about whether objects A-F are blickets in both conditions. Note that the original experiment had people make two sets of judgments with different sets of objects, but here we will simply skip the second set.
```{r}
get_judgments <- function(path_to_csv) {
human_data <- read.csv(path_to_csv)
select(human_data,matches("isBlick.1"))
}
human_dis_data <- get_judgments("org-kva-adult-E1-OR.csv")
human_con_data <- get_judgments("org-kva-adult-E1-AND.csv")
```
The columns are `isBlickA1` through `isBlickF1`.
We aren't using the second judgments here, so we're dropping `isBlickA2` and so on.
Here are mean judgments over all participants, separate for each condition:
```{r}
colMeans(human_dis_data)
colMeans(human_con_data)
```
## Part 1: Understanding and implementing models
Now that we have our data, let's write some functions that will allow us to make predictions about people's judgments using different computational models.
We want our functions to take the events that were observed (the objects and activations), as well as any model parameters that might be relevant, and output the probability of each object being a blicket.
To illustrate, we'll start with a simple model that doesn't have any parameters.
### Model: $\Delta$P
Our first model is called $\Delta$P; most of the code for this model is implemented for you in the block below.
Briefly, $\Delta$P estimates the causal efficacy of a prospective cause by comparing the probability of the effect in the presence of the cause to the probability of the effect in the absence of the cause: $P(e=1|c=1)-P(e=1|c=0)$.
For details about this model, please look at page 339 of [@griffiths_structure_2005] [(link)](https://www.sciencedirect.com/science/article/pii/S0010028505000459).
Let's see what the $\Delta P$ values are for our two experimental conditions, but we first need to complete the code below.
**Question 1 (3 Marks)**: Complete the function `delta_P_values` in the code block below.
You need only edit the line
`delta_p <- 0.5` -- no new lines of code are necessary.
```{r}
delta_P_values <- function(objs,activations,theta) {
n_objects <- dim(objs)[2]
n_events <- dim(objs)[1]
delta_ps = rep(0,n_objects)
for(i in 1:n_objects) {
present <- objs[,i]
active_with <- as.double(sum(activations[as.logical(present)]))
p_with <- active_with/sum(present)
active_without <- sum(activations)-active_with
p_without <- active_without/(n_events-sum(present))
# YOUR CODE BEGINS HERE
    delta_p <- p_with - p_without # P(e=1|c=1) - P(e=1|c=0)
# YOUR CODE ENDS HERE
delta_ps[i] = delta_p
if(is.nan(delta_p)) {
delta_ps[i] = 0.5
}
}
delta_ps
}
```
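As a quick sanity check (an aside, not part of the marked answer), we can run the completed function on the backward-blocking events from earlier. Object **A** is never absent, so $P(e=1|c=0)$ is undefined ($0/0$) and the function falls back to $0.5$; object **B**'s $\Delta P$ is $1 - 1 = 0$, since the machine activates both with and without it.
```{r}
# Expected output: 0.5 for A (the NaN fallback) and 0.0 for B.
delta_P_values(bb_objects, bb_activations, theta = c())
```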
To be clear, $\Delta P$ values are not probabilities, but differences in probabilities that can be negative. Hence, we can't use these values as direct predictions of human judgments.
**Question 2 (3 Marks)**: Briefly explain when $\Delta P$ might be positive and when it might be negative, and how this is likely to relate to blicket judgments given the way the experiment was set up.
**Answer 2**: $\Delta P$ is positive when the effect is more probable with the cause present than with it absent, i.e., $P(e=1 \mid c=1) > P(e=1 \mid c=0)$, and negative when the effect is more probable with the cause absent.
In this experiment, an object's $\Delta P$ will be positive when the machine activates proportionally more often on events that include the object than on events that do not. Given the way the experiment was set up, a positive $\Delta P$ should push judgments towards 'blicket', and a negative $\Delta P$ towards 'not a blicket'.
In order to arrive at judgment probability predictions, we need to transform the $\Delta P$ values appropriately.
The function below includes a very naive version of this -- how might you improve it, based on what you found in Question 2?
**Question 3 (4 Marks)**: Define a function for converting $\Delta P$ values to predictions about human judgments.
Specifically, assuming that all people are the same, it should take a $\Delta P$ value for an object as an input, and output the probability that someone will judge that object to be a blicket.
If it has any free parameters, pick sensible values for them in the body of the function and write a comment justifying your choice(s).
```{r}
dps_to_predictions <- function(dp_vec) {
  library(scales)
  # YOUR CODE BEGINS HERE
  # Map the delta-P values linearly from [-1,1] onto [0,1],
  # preserving their ordering.
  dp_vec <- rescale(dp_vec, to = c(0,1), from = c(-1,1))
  # YOUR CODE ENDS HERE
  dp_vec
}
```
**Question 4 (3 Marks)**: Explain why you made the changes you did.
**Answer 4**: Since $\Delta P$ ranges from $-1$ to $1$, I rescaled the values linearly onto $[0, 1]$. This yields valid probabilities while preserving the ordering of, and relative differences between, the objects.
Let's look at our predictions, starting by defining helper functions to make it easier to see them:
```{r}
pp <- function(nums,cn='predictions') {
names <- c("A","B","C","D","E","F")[1:length(nums)]
ppdf <- data.frame(object_id=names)
ppdf[cn] <- nums
print(ppdf,digits=2)
}
plot_model_preds <- function(preds,model_name) {
obj_names = c("A","B","C","D","E","F")[1:length(preds)]
df <- data.frame(obj_names=obj_names,preds=preds)
p<-ggplot(data=df, aes(x=obj_names, y=preds)) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5)) +
geom_bar(stat="identity") + ylim(0,1) +
labs(y=paste(model_name,"probabilities",sep=" "),x="Object")
p
}
```
Let's see $\Delta P$'s predictions for objects in the DIS condition:
```{r, echo=FALSE}
dp_dis <- delta_P_values(all_objects,dis_activations,theta=c())
dp_preds_dis <- dps_to_predictions(dp_dis)
plot_model_preds(dp_preds_dis,"DeltaP") + ggtitle("Disjunctive condition")
pp(dp_dis,'deltaP')
```
and in the CON condition:
```{r, echo=FALSE}
dp_con <- delta_P_values(all_objects,con_activations,theta=c())
dp_preds_con <- dps_to_predictions(dp_con)
pp(dp_con,'deltaP')
plot_model_preds(dps_to_predictions(dp_con),"DeltaP") + ggtitle("Conjunctive condition")
```
In the next section, you will implement some alternative models, before comparing the different models to human judgments.
### Model 2: Rescorla-Wagner
Originally published in 1972 as a model of associative learning, the Rescorla-Wagner model has also been offered as an account of human causal learning.
We'll be following the notation for the RW model from [@danks2003equilibria], and use the word `cue` interchangeably with `object` in this section, following the history of RW as a model of associative reasoning.
It has three kinds of parameters (the number of parameters depends on the number of cue objects being studied):
- $\alpha_i$ is the salience of the $i^{th}$ cue object. We'll assume all alphas are the same ($\alpha$).
- $\beta_1$ is the salience or "strength" of the effect occurring, while $\beta_2$ is the salience of the effect not occurring. Let $\beta_1=\beta_2=\beta$, for simplicity.
- $\lambda$ is the maximum possible associative strength between any cause and the effect.
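Concretely, in this notation, each cue $i$ that is present on trial $t$ is updated by

$$V_i \leftarrow V_i + \alpha \beta \left( \lambda_t - \sum_{j \in \mathrm{present}(t)} V_j \right),$$

where $\lambda_t = \lambda$ if the machine activates on trial $t$ and $\lambda_t = 0$ otherwise; cues that are absent on a trial are not updated.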
**Question 5 (8 Marks)**: Implement a function that computes associations for the Rescorla-Wagner (RW) model, by editing the code block below.
Let the first three elements of theta be $\alpha$, $\beta$, and $\lambda$, in that order, and assume that all associations start at zero.
Further, assume that the objects and activations were presented in the order in which they are stored.
```{r}
rw_associations <- function(objs,activations,theta) {
alpha <- theta[1]
beta <- theta[2]
lambda <- theta[3]
n_events <- dim(objs)[1]
n_objects <- dim(objs)[2]
associations <- rep(0.0,n_objects)
# YOUR CODE BEGINS HERE
for(i in 1:n_events) {
  # Compound prediction: the summed associative strength of all cues
  # present on this trial.
  compound <- sum(associations[objs[i,] == 1])
  # All present cues share one update, computed from the pre-trial
  # associations; the outcome is lambda when the machine activates, 0 otherwise.
  update <- alpha * beta * (lambda * activations[i] - compound)
  for(j in 1:n_objects) {
    if (objs[i,j] == 1) {
      associations[j] <- associations[j] + update
    }
  }
}
# YOUR CODE ENDS HERE
associations
}
rw_associations(all_objects,con_activations,c(.5,.5,1.0))
```
**Question 6 (4 Marks)**: Implement a function that generates predictions for the Rescorla-Wagner (RW) model, by editing the code block below.
Your function should produce *judgment probabilities* so that its output can be plugged directly into your `human_judgment_scores` function.
Justify briefly how your predictions can be interpreted as probabilities.
**Answer 6**:
With $\lambda = 1$ and associations starting at zero, the RW associations here stay (approximately) within $[-1, 1]$, with higher values indicating a stronger learned association between an object and the machine activating. I therefore rescale the associations linearly onto $[0, 1]$, so that each value can be read as the probability that a person judges the corresponding object to be a blicket.
```{r}
rw_preds <- function(associations) {
## YOUR CODE BEGINS HERE
  predictions <- rescale(associations, to = c(0,1), from = c(-1,1)) # Linear map from [-1,1] onto [0,1] (uses the scales package).
## YOUR CODE ENDS HERE
predictions
}
```
```{r}
rw_preds(rw_associations(all_objects,dis_activations,c(.5,.5,1.0)))
```
#### Visualizing predictions of the RW model
Let's visualize the RW model's predictions on the DIS data.
```{r, fig.width=3, fig.height=2}
rw_preds_dis = rw_preds(rw_associations(all_objects,dis_activations,c(.5,.5,1.0)))
rw_preds_con = rw_preds(rw_associations(all_objects,con_activations,c(.5,.5,1.0)))
plot_model_preds(rw_preds_dis,"RW") + ggtitle("Disjunctive condition")
plot_model_preds(rw_preds_con,"RW") + ggtitle("Conjunctive condition")
```
**Question 7 (4 Marks)**: Briefly (1-2 sentences) discuss a situation where the RW model makes predictions that differ systematically from human judgments. You may be able to find examples in the readings associated with this assignment.
**Answer 7**: The RW model fails to capture backward blocking: when two objects repeatedly activate the machine together and one of them is later shown to activate it alone, the model cannot revise the other object's association, because associations are only updated for cues present on a trial. Humans, in contrast, lower their judgment that the absent object is a blicket.
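As a quick illustration (an aside, not part of the marked answer), running the RW model on the backward-blocking events "AB+ A+" shows the problem: **B**'s association is set on the first trial and never revised by the second, A-only, trial.
```{r}
# B's association (second value) is untouched by the A+ trial,
# because B is absent from it; only A's association grows.
rw_associations(bb_objects, bb_activations, c(.5, .5, 1.0))
```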
### Model 3: Griffiths and Tenenbaum's strength and structure
The last model we will look at is "causal support" [@griffiths_structure_2005].
First, make sure you understand the description of the model in the paper, and particularly the Noisy-OR parameterisation.
The form of the model used here has two continuous parameters:
- The causal power of the $i^{th}$ cue object. We'll assume all objects have the same causal power ($w$s, in the paper, here referred to as `causalPower`).
- The prior probability of any object being a blicket.
We will use this model as an opportunity to use Stan [@stan_paper].
This model has already been implemented for you in `causal_support.stan`, and the code to process the Stan output is also provided below; however, you will need to answer questions about the code.
Stan is a probabilistic programming language equipped with an efficient Markov chain Monte Carlo sampler (among other things) for performing statistical inference.
If you are interested you can read more about Stan in the [user guide](https://mc-stan.org/docs/2_28/stan-users-guide/index.html).
Stan lets us write down our generative model and has functionality for efficiently sampling from the posterior over model parameters. Here, we are interested in the posterior probabilities of each object (A-F) being blickets, based on the events and activations observed in each condition.
Sampling discrete latent variables is not supported in Stan (for an explanation and worked examples, see [link1](https://statmodeling.stat.columbia.edu/2020/01/29/rao-blackwellization-and-discrete-parameters-in-stan/), [link2](https://mc-stan.org/docs/2_27/stan-users-guide/latent-discrete-chapter.html)), so we instead marginalise them out, and compute posteriors over the discrete latent variables afterwards.
In order to do this, we first compute all possible combinations of possible blickets, e.g., [0,0,0,0,0,0] for no blickets; [1,0,0,0,0,0] for `A` being the only blicket.
```{r}
combination_lists <- rep(list(0:1), dim(all_objects)[2])
blicket_combinations <- data.matrix(expand.grid(combination_lists))
```
In the Stan code chunks below, we then pass the matrix of combinations as a parameter, and we can compute the posterior over whether objects are blickets with a few lines of R code based on the Stan output.
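In other words (a restatement of what the loop in `cs_posterior` below computes), the marginal posterior for each object sums the posterior mass of every combination in which that object is a blicket:

$$P(b_k = 1 \mid D) = \sum_{\mathbf{z}\,:\,z_k = 1} P(\mathbf{z} \mid D),$$

where $\mathbf{z}$ ranges over the $2^6 = 64$ blicket combinations.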
**Question 8 (5 Marks) | Stretch**:
In line 51 of `causal_support.stan`, there is an additional background cause that is always present, no matter which objects are present and which objects are blickets:
```pAct = (1.0-(1.0-backgroundStrength)*pow(1.0-causalPower, liveBlickets[i]));```.
Why does the code include this additional background cause, and what would happen if we were to exclude this factor?
**Answer 8**: The background cause allows the machine to activate (with some small probability) even when none of the hypothesised blickets are present. Without it, any activation occurring when a given blicket combination implies no live blickets would have probability exactly zero, ruling that combination out categorically; including the background cause keeps every event's probability non-zero, so the model can still learn appropriately from events it would otherwise be unable to explain.
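For reference, the Stan expression above is the Noisy-OR likelihood from @griffiths_structure_2005, restated here with $w_0$ for `backgroundStrength`, $w$ for `causalPower`, and $n_i$ for the number of live blickets present at event $i$:

$$P(e_i = 1) = 1 - (1 - w_0)(1 - w)^{n_i}.$$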
**Question 9 (2 Marks)**: Provide at least one reason why using Stan for the Causal Support model might be helpful over just writing the generative model in R like we did for the other models? Your answer can be brief (1-2 sentences).
**Answer 9**: Speed: a model like this is computationally intensive, and implementing the sampler in pure R would be slow and resource-heavy. Stan compiles the model to C++ and provides an efficient, well-tested MCMC sampler, while still returning its output to R.
**Question 10 (6 Marks)**:
Briefly state two conceptual differences between $\Delta P$ and causal support (1-3 sentences per difference), and one implication of each difference (1-2 sentences each).
**Answer 10**:
First, $\Delta P$ is a purely covariational measure, whereas causal support is a Bayesian measure of the evidence for a causal structure; it makes the assumptions that are implicit in $\Delta P$ explicit. One implication is that $\Delta P$, being a linear measure of covariation, is less consistent with human judgments than causal support.
Second, causal support uses the Noisy-OR parameterisation, whereas $\Delta P$ does not. An explicit parameterisation of how causes combine to produce the effect allows causal support to make better-calibrated judgments, so it is generally more consistent with human intuition than covariational models such as $\Delta P$.
## Posterior output from causal support model
Let's run our model.
```{r run_stan_func, eval=TRUE}
# This function may take a while to run.
cs_posterior <- function(objs, activations, blicket_combinations) {
input_data <- list(nEvents=dim(objs)[1], nObjects=dim(objs)[2],
activations = activations, allObjects=objs,
nCombinations = 2**dim(objs)[2],
allCombinations = blicket_combinations)
post_samples <- stan("causal_support.stan",
refresh = 0, # suppresses intermediate output
data = input_data,
iter=3000)
nps <- colMeans(extract(post_samples, permuted = TRUE)$np)
cs <- c()
for (b in 1:dim(objs)[2]){
blicket_comb <- blicket_combinations[,b]
np_select <- nps[as.logical(blicket_comb), drop=F]
cs <- c(cs, sum(np_select)/(sum(nps)))
}
cs
}
```
```{r}
cs_preds <- function(posterior) {
posterior
}
```
```{r run_stan_dis}
# Disjunctive case:
cs_preds_dis <- cs_preds(cs_posterior(all_objects,
dis_activations, blicket_combinations))
plot_model_preds(cs_preds_dis,"CS") + ggtitle("Disjunctive condition")
# Conjunctive case:
cs_preds_con <- cs_preds(cs_posterior(all_objects,
con_activations, blicket_combinations))
plot_model_preds(cs_preds_con,"CS") + ggtitle("Conjunctive condition")
```
**Question 11 (5 Marks)**: Notice that we are not converting model outputs to judgment probabilities. What data model are we implicitly using here? (Note that `data` refers to people's judgments here.) State at least one alternative data model to the one being used.
**Answer 11**:
Instead of converting the outputs to judgment probabilities, we use the posterior probabilities directly; implicitly, this is a log-likelihood data model. An alternative data model would be a Z-score model.
## Part 2: Comparing models
Let's see how well our different models ($\Delta P$, RW, and Causal Support) predict human judgments.
### Comparing models using probabilities
The function below assumes that the columns in the data structure are judgments for objects A-F, in order, corresponding to the prediction vector.
```{r}
human_judgment_probs <- function(dataset,prediction_vector) {
joint_probability <- 1.0
for(i in 1:length(prediction_vector)) {
pColumn <- prod(dbinom(dataset[,i],1,prediction_vector[i]))
joint_probability <- joint_probability*pColumn
print(pColumn)
}
joint_probability
}
```
Here are the probabilities for the $\Delta P$ model:
```{r}
human_judgment_probs(human_dis_data,dps_to_predictions(dp_dis))
human_judgment_probs(human_con_data,dps_to_predictions(dp_con))
```
**Question 12 (5 Marks)** Is there anything undesirable or unexpected about these values and how they're being computed here?
How might you edit the function `human_judgment_scores` below so that model scores are still monotone functions of the total probability, while fixing some computational issues you might encounter with `human_judgment_probs`? Briefly explain what problem(s) your solution solves.
**Answer 12**:
The joint probabilities are extremely small, because we multiply together one probability per participant per object; with larger datasets the product would underflow to zero in floating point, making models impossible to compare. A monotone fix is to sum log probabilities rather than multiply raw probabilities, which preserves the ranking of models while avoiding underflow. The scoring function below builds on this idea and computes a cross-entropy-style score from the mean human judgments.
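For illustration (a sketch in the spirit of the answer above, not the marked code below), one such fix is to work in log space:
```{r}
# Sketch: a log-space analogue of human_judgment_probs. Summing log
# probabilities is monotone in the joint probability but avoids underflow.
# (Predictions of exactly 0 or 1 can still give -Inf if they disagree
# with any observed judgment.)
human_judgment_log_probs <- function(dataset, prediction_vector) {
  total <- 0.0
  for (i in 1:length(prediction_vector)) {
    total <- total + sum(dbinom(dataset[,i], 1, prediction_vector[i], log = TRUE))
  }
  total
}
```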
```{r}
human_judgment_scores <- function(dataset,prediction_vector) {
  joint_score <- 0.0
  # YOUR CODE STARTS HERE
  n_objects <- length(prediction_vector)
  human_judgment_freq <- rep(0.0, n_objects)
  cross_entropy <- rep(0.0, n_objects)
  # Empirical judgment probabilities: the fraction of participants
  # who judged each object to be a blicket.
  for(i in 1:n_objects){
    human_judgment_freq[i] <- sum(dataset[,i]) / nrow(dataset)
  }
  # Cross-entropy terms between the empirical frequencies and the
  # model's predicted judgment probabilities.
  for(i in 1:n_objects){
    cross_entropy[i] <- human_judgment_freq[i] * log2(prediction_vector[i])
  }
  joint_score <- -sum(cross_entropy)
  joint_score
  # YOUR CODE ENDS HERE
}
```
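For reference, the score computed above is

$$\text{score} = -\sum_i p_i \log_2 q_i,$$

where $p_i$ is the fraction of participants who judged object $i$ to be a blicket and $q_i$ is the model's predicted judgment probability. This is the positive-judgment term of a cross entropy (the $(1-p_i)\log_2(1-q_i)$ term is omitted).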
```{r}
human_judgment_scores(human_dis_data, dp_preds_dis)
human_judgment_scores(human_con_data, dp_preds_con)
```
Now that we have a way to score our different model predictions, we can make our first model comparison. Above, we always looked at the CON and DIS conditions separately.
We're now going to combine our predictions after we pass them through the `human_judgment_scores` function.
**Question 13 (4 Marks)**
In the code chunk below, combine the scores from both conditions appropriately, such that you get an aggregate scalar score for both the DIS and CON conditions together.
```{r comb_scores}
combine_scores <- function(data, predictions) {
scores <- mapply(human_judgment_scores, data, predictions)
# YOUR CODE STARTS HERE
  mean(scores) # Average the per-condition scores into one scalar.
# YOUR CODE ENDS HERE
}
(delta_p_score <- combine_scores(list(human_dis_data,
human_con_data),
list(dp_preds_dis,
dp_preds_con)))
(rw_score <- combine_scores(list(human_dis_data,
human_con_data),
list(rw_preds_dis,
rw_preds_con)))
(cs_score <- combine_scores(list(human_dis_data,
human_con_data),
list(cs_preds_dis,
cs_preds_con)))
```
**Question 14 (5 Marks)**: Suppose a colleague of yours argues that the Stan code above involved fitting parameters to data. What would you reply to your colleague? State whether or not the Stan code above fitted any parameters for the Causal Support model and make sure to justify your answer.
**Answer 14**:
I would reply that the Stan code above does not fit any parameters; it only computes a posterior over which objects are blickets, given fixed parameter values. The model's three main parameters, `backgroundStrength`, `causalPower`, and `pBlicket`, are predefined constants that do not change once the model has been called.
**Question 15 (5 Marks)**: Report the rank-ordering of models from the one with the best predictions to the one with the worst predictive performance. Briefly justify how you compared the different models.
**Answer 15**:
From best to worst predictive performance, the models rank: Causal Support, $\Delta P$, then Rescorla-Wagner. I compared the models using the cross-entropy scores between each model's predicted judgment probabilities and the real human judgments: a higher score means a greater mismatch with the data, so the model with the lowest score has the best predictive performance.
## Part 3: Fitting and interpreting parameters
In this section, we're going to illustrate how we may optimise model parameters on data we have observed.
We already implemented all of our models and we have a way to score our models with the `human_judgment_scores` function.
Now we need to decide which parameters we are going to fit.
We are going to fit our parameters on both conditions simultaneously, and we are going to use the Nelder-Mead Simplex optimiser.
**Question 16A (5 Marks)**:
Complete the code chunk below to optimise parameters of the RW model. Make sure you understand how the `optim` function works. You need to optimise at least one parameter, but make sure to justify your choices for parameters to optimise and the fixed values for those you do not optimise. Report the optimal parameters you found as well as your overall score when you use the optimised parameters, and note anything you might do differently or additionally if you were serious about trying to find a global optimum.
```{r}
rw_eval <- function(theta) {
# YOUR CODE STARTS HERE
data <- list(human_dis_data, human_con_data)
  predictions <- list(
    rw_preds(rw_associations(all_objects,dis_activations,c(theta[1], theta[2], 1))),
    rw_preds(rw_associations(all_objects,con_activations,c(theta[1], theta[2], 1)))
  ) # Optimise alpha and beta; lambda is fixed at 1 (see Answer 16A).
combine_scores(data,predictions)
}
(res <- optim(c(0.5, 0.5), rw_eval, method = "Nelder-Mead")) # Minimise the combined score over (alpha, beta).
# YOUR CODE ENDS HERE
opt_params <- res$par
```
**Answer 16A**:
For this question I decided to optimise the $\alpha$ and $\beta$ parameters of the RW model, as they weight the salience of the cue objects and of the outcome. I fixed $\lambda = 1$, treating the maximum associative strength as a fixed ceiling rather than a free parameter. The optimal values found for $\alpha$ and $\beta$ with the Nelder-Mead optimiser were 0.7370908 and 0.7056503 respectively, giving an overall score of 1.113281. If I were serious about finding a global optimum, I would restart the optimiser from several different initial values, or use a global optimisation method, since Nelder-Mead can converge to a local optimum.
**Question 16B (3 Marks)**:
Report the new rank ordering of the three models, now with fitted parameters for the RW model.
Has the ordering changed, or is it still the same as above without fitting any of the RW model parameters?
```{r}
# YOUR CODE STARTS HERE
rw_preds_dis_fit = rw_preds(rw_associations(all_objects,dis_activations,c(opt_params[1],opt_params[2],1.0)))
rw_preds_con_fit = rw_preds(rw_associations(all_objects,con_activations,c(opt_params[1],opt_params[2],1.0)))
# YOUR CODE ENDS HERE
(delta_p_score <- combine_scores(list(human_dis_data,
human_con_data),
list(dp_preds_dis,
dp_preds_con)))
(rw_score_fit <- combine_scores(list(human_dis_data,
human_con_data),
list(rw_preds_dis_fit,
rw_preds_con_fit)))
(cs_score <- combine_scores(list(human_dis_data,
human_con_data),
list(cs_preds_dis,
cs_preds_con)))
```
**Answer 16B**:
The ranking of the three models has now changed. From best to worst, it is now: Causal Support, Rescorla-Wagner, then $\Delta P$.
We have now fitted our parameters on all of the data simultaneously. This might lead to problems: we may simply be overfitting to our data, and from the fitted scores alone we have no way of knowing whether the RW model with fitted parameters would generalise well to unseen data.
One option is to penalise our fitted scores. In that case, we could use an information criterion (like BIC) to penalise our model scores.
**Question 17 (4 Marks)**:
Explain how you would compute the complexity penalty for your fitted RW model. You can explain it in words, maths or code (R or pseudocode), as long as you make sure that your explanation would allow a colleague to implement and apply a solution in code. If you declare any variables, make sure to define them.
Another option for assessing generalisation performance would be to compute scores on a held-out test set. Here, we need to decide how we partition the data into training and test sets, such that we can make claims about our model's ability to generalise to unseen data.
**Answer 17**:
To compute a complexity penalty for the fitted RW model I would use BIC. Both BIC and our score are quantities where lower is better, so adding the BIC penalty to the score penalises more complex models; the code below shows one way a colleague could implement and apply this.
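For reference, the standard definition is

$$\mathrm{BIC} = k \ln N - 2 \ln \hat{L},$$

where $\hat{L}$ is the (ideally maximised) likelihood of the data under the model, $k$ is the number of free parameters, and $N$ is the number of observations; this is what the `BIC` function below computes from the binomial likelihood of the judgments.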
```{r}
BIC <- function(dataset, predictions, nParameters){
  N <- nrow(dataset)   # number of observations (participants)
  k <- nParameters     # number of free parameters
  # Binomial log-likelihood of the binary judgments under the model's
  # predicted judgment probabilities, summed over objects and participants.
  LL <- 0.0
  for(i in 1:length(predictions)){
    LL <- LL + sum(dbinom(dataset[,i], 1, predictions[i], log = TRUE))
  }
  bic <- -2*LL + log(N) * k
  bic
}
combine_scores_complex <- function(data, predictions, nParameters){
  score <- combine_scores(data, predictions)
  bics <- rep(0.0, length(predictions))
  for (i in 1:length(predictions)){
    bics[i] <- BIC(data[[i]], predictions[[i]], nParameters)
  }
  penalised_score <- score + mean(bics)
  penalised_score
}
(delta_p_score <- combine_scores_complex(list(human_dis_data,
                                              human_con_data),
                                         list(dp_preds_dis,
                                              dp_preds_con), 3))
(rw_score_fit <- combine_scores_complex(list(human_dis_data,
                                             human_con_data),
                                        list(rw_preds_dis_fit,
                                             rw_preds_con_fit), 3))
(cs_score <- combine_scores_complex(list(human_dis_data,
                                         human_con_data),
                                    list(cs_preds_dis,
                                         cs_preds_con), 2))
```
**Question 18 (4 Marks)**
How would you partition the data into training and test sets, i.e., what should we generalise to: Unseen participants; unseen objects; unseen experimental conditions (or something else)? Describe your approach and make sure to justify your answer and state possible limitations.
**Answer 18**:
For this experiment it makes the most sense to generalise to unseen participants: hold out a subset of participants as a test set, fit parameters on the remaining participants, and then score how similar the models' predictions are to the held-out participants' judgments. A limitation is that human judgment is noisy and variable, so small datasets can yield very different results depending on how the split is drawn.
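A minimal sketch of such a split (assuming the rows of the data frames are independent participants; the 80/20 proportion is an arbitrary choice):
```{r, eval=FALSE}
set.seed(1)  # for a reproducible split
n <- nrow(human_dis_data)
test_idx <- sample(n, size = floor(0.2 * n))  # hold out 20% of participants
train_dis <- human_dis_data[-test_idx, ]
test_dis  <- human_dis_data[test_idx, ]
# Fit parameters using train_dis (and an analogous CON split), then report
# human_judgment_scores on the held-out participants only.
```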
## Part 4: Extensions
**Question 19 (10 Marks) | Stretch** Can you think of any extensions or new approaches for modelling human causal reasoning that go beyond the models we have considered in this course? Describe your proposed approach, briefly explain how your proposal would apply to the blicket experiment and what benefits it would have. It is fine to take inspiration from the literature as long as you reference it appropriately. However, please don't base your proposal on any of the course (co-)organisers' publications. *Do not write more than a page for this question.*
**Answer 19**:
An extension of this experiment would be to use the models to make judgments about unseen objects based on new events. I would take binary probabilistic models, train them on the training data that we have, then run them on unseen objects, events, and activations and have them judge which objects are blickets. I would then compare these judgments to a large dataset of human judgments, so that we have a good estimate of the average human judgment.
Other models I would consider alongside those used previously are logistic regression, maximum-likelihood estimators, and naive Bayes. I would score these using the methods above, including the complexity penalty. The benefit is that we could directly compare how close each model comes to real human judgment, and by using unseen objects we could see how well the models perform at causal reasoning in comparison to humans.
**Question 20 (8 Marks) | Stretch**: We have focused on comparing a fixed set of well-known models. What are some limitations of this approach, or ways it could lead an incautious scientist or audience to incorrect conclusions? For full credit, your answer will go beyond the scope of what has been discussed in lecture and reveal insight and/or deeper familiarity with the scientific literature.
**Answer 20**:
A limitation of using well-known models is that readers may assume the models were implemented correctly, simply because of the volume of resources available about them. It can also lead to overconfidence on the part of the researchers: they may take for granted that the models behave correctly and fail to question the results of the experiment.