# Three proper ROC fits {#rsm-3-fits}
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(kableExtra)
library(seqinr)
library(RJafroc)
library(ggplot2)
library(gridExtra)
library(grid)
library(binom)
library(foreach)
library(doRNG)
library(doParallel)
```
## How much finished 99% {#rsm-3-fits-how-much-finished}
## TBA Introduction {#rsm-3-fits-intro}
A proper ROC curve is one whose slope decreases monotonically as the operating point moves up the curve, a consequence of which is that a proper ROC does not display a chance line crossing followed by a sharp upward turn, i.e., a "hook", usually near the (1,1) upper right corner.
There are three currently available methods for fitting proper curves to ROC datasets:
* The PROPROC (proper ROC) model described in Chapter \@ref(proper-roc-models).
* The CBM (contaminated binormal model) also described in Chapter \@ref(proper-roc-models).
* The RSM (radiological search model) described in Chapter \@ref(rsm-fitting).
This chapter compares these methods by fitting them to 14 datasets described in Chapter \@ref(datasets). Comparing the RSM to the binormal model would be inappropriate as the latter does not predict proper ROCs.
The motivation for this work was a serendipitous finding [@chakraborty2011estimating] that PROPROC fitted ROC AUCs and RSM fitted ROC AUCs were identical for some datasets. This led to extending that work to include CBM-fitting and more datasets.
## Application to datasets {#rsm-3-fits-applications}
Both RSM and CBM are implemented in the R package `RJafroc` [@R-RJafroc]. `PROPROC` is implemented in the Windows software OR DBM-MRMC 2.5 ^[Sept. 04, 2014; the version used in this chapter is no longer distributed.] which was available [here, last accessed 1/4/21](https://perception.lab.uiowa.edu/software). The pre-analyzed PROPROC results-file locations are shown in Appendix \@ref(rsm-3-fits-one-dataset-proproc).
The RSM, PROPROC and CBM algorithms were applied to datasets described in Chapter \@ref(datasets). These are named as follows:
```{r, echo = TRUE}
datasetNames <-
c("TONY", "VD", "FR",
"FED", "JT", "MAG",
"OPT", "PEN", "NICO",
"RUS", "DOB1", "DOB2",
"DOB3", "FZR")
```
The `datasetNames` array contains abbreviations for the contributors: "Dr. Tony Svahn", "Dr. Van Dyke", "Dr. Franken", "Dr. Federica Zanca", "Dr. John Thompson", "Dr. Magnus Bath", "Dr. Lucy Warren", "Dr. Monica Penedo", "Dr. Nico Karssemeijer", "Dr. Mark Ruschin", "Dr. James Dobbins-1", "Dr. James Dobbins-2", "Dr. James Dobbins-3", "Dr. Federica Zanca real ROC". These are included with the `RJafroc` package and the corresponding objects are named `datasetXX`, where XX is an integer ranging from 1 to 14.
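As a quick orientation, the following minimal sketch (not evaluated here; it mirrors helper code used elsewhere in this chapter) shows how the dataset object corresponding to an abbreviation can be retrieved via its position in `datasetNames`:
```{r, echo=TRUE, eval=FALSE}
# retrieve the RJafroc dataset object corresponding to an abbreviation;
# dataset01, ..., dataset14 are exported by the RJafroc package
f <- which(datasetNames == "VD")          # f = 2 for the Van Dyke dataset
theData <- get(sprintf("dataset%02d", f)) # i.e., the object `dataset02`
```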
In the following we focus, for now, on just two ROC datasets: the Dr. Van Dyke (VD) and the Dr. E. Franken (FR) datasets. Fits are shown for treatment 1 and reader 2 for the Van Dyke dataset and for treatment 2 and reader 3 for the Franken dataset. Plots for all treatment-reader combinations for these two datasets are in Appendix \@ref(rsm-3-fits-all-plots-van-dyke) for the Van Dyke dataset and Appendix \@ref(rsm-3-fits-representative-plots-franken) for the Franken dataset.
```{r rsm-3-fits-code, echo=FALSE}
source(here::here("R/compare-3-fits/Compare3ProperRocFits.R"))
```
```{r rsm-3-fits-code-f2, echo=TRUE}
# VD dataset
ret <- Compare3ProperRocFits(
datasetNames,
which(datasetNames == "VD"))
resultsVD <- ret$allResults
plotsVD <- ret$allPlots
# FR dataset
ret <- Compare3ProperRocFits(
datasetNames,
which(datasetNames == "FR"))
resultsFR <- ret$allResults
plotsFR <- ret$allPlots
```
* The supporting code is in function `Compare3ProperRocFits()` located at `R/compare-3-fits/`.
* The results file locations are shown in Section \@ref(rsm-3-fits-pre-analyzed-results). Since the maximum likelihood (ML) fitting algorithm is time-consuming, these locations contain pre-analyzed results.
* The fitted parameters are contained in `resultsVD` and `resultsFR`; the composite plots (i.e., 3 overlaid plots corresponding to the three proper ROC fitting algorithms) for each treatment and reader are contained in `plotsVD` and `plotsFR`.
## Composite plots {#rsm-3-fits-composite-plots}
* The Van Dyke dataset yields $I \times J = 2 \times 5 = 10$ composite plots.
* The Franken dataset yields $I \times J = 2 \times 4 = 8$ composite plots.
The following code shows how to display the composite plot for the Van Dyke dataset (labeled `D2` in the plot title as it is the second dataset in `datasetNames`) for treatment 1 and reader 2, i.e., i = 1 and j = 2.
```{r rsm-3-fits-plots-vd-12, fig.cap="Composite plots for Van Dyke dataset for treatment = 1, reader 2.", fig.show='hold', echo=TRUE}
# i = 1 and j = 2
plotsVD[[1,2]]
```
It contains 3 fitted curves:
* The RSM fitted curve is in black -- this is the only fit that includes a dashed line extending to (1,1).
* The PROPROC fitted curve is in red.
* The CBM fitted curve is in blue.
Three operating points from the binned data are shown as well as exact 95% confidence intervals for the lowest and uppermost operating points.
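For reference, exact (Clopper-Pearson) intervals of the kind shown on the plots can be computed with the `binom` package loaded in the setup chunk; the counts below are illustrative placeholders, not values from the Van Dyke dataset:
```{r, echo=TRUE, eval=FALSE}
# exact (Clopper-Pearson) 95% CI for an operating point;
# x = number of positive decisions, n = number of cases (illustrative values)
binom::binom.confint(x = 15, n = 69, conf.level = 0.95, methods = "exact")
```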
The next example shows composite plots for the Franken dataset (labeled `D3`) for treatment = 2 and reader = 3.
```{r rsm-3-fits-plots-fr-23, fig.cap="Composite plots for Franken dataset for treatment = 2, reader 3.", fig.show='hold', echo=TRUE}
# i = 2 and j = 3
plotsFR[[2,3]]
```
<!-- Note that the RSM end-point is almost at the upper right corner, implying lower lesion-localization performance. -->
## Accessing RSM parameters VD dataset {#rsm-3-fits-rsm-parameters}
The parameters corresponding to the RSM plots for the Van Dyke dataset are accessed as follows:
* `resultsVD[[i,j]]$retRsm$mu` is the RSM $\mu$ parameter for the Van Dyke dataset for treatment i and reader j;
* `resultsVD[[i,j]]$retRsm$lambda` is the RSM $\lambda$ parameter;
* `resultsVD[[i,j]]$retRsm$nu` is the RSM $\nu$ parameter;
* `resultsVD[[i,j]]$retRsm$zeta1` is the RSM $\zeta_1$ parameter.
For the Franken dataset one replaces `resultsVD[[i,j]]` with `resultsFR[[i,j]]`.
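As a concrete illustration of the access pattern above, the following sketch collects the four RSM parameters for one treatment-reader combination into a named vector:
```{r, echo=TRUE, eval=FALSE}
# RSM parameters for treatment i = 1, reader j = 2, Van Dyke dataset
i <- 1; j <- 2
rsmPars <- c(
  mu     = resultsVD[[i, j]]$retRsm$mu,
  lambda = resultsVD[[i, j]]$retRsm$lambda,
  nu     = resultsVD[[i, j]]$retRsm$nu,
  zeta1  = as.numeric(resultsVD[[i, j]]$retRsm$zetas[1]))
rsmPars
```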
### RSM parameters Van Dyke dataset i = 1, j = 2
The following displays RSM parameters for the Van Dyke dataset, treatment 1 and reader 2:
```{r, echo=FALSE}
cat("RSM parameters, Van Dyke Dataset, i=1, j=2:",
"\nmu = ", resultsVD[[1,2]]$retRsm$mu,
"\nlambda = ", resultsVD[[1,2]]$retRsm$lambda,
"\nnu = ", resultsVD[[1,2]]$retRsm$nu,
"\nzeta_1 = ", as.numeric(resultsVD[[1,2]]$retRsm$zetas[1]),
"\nAUC = ", resultsVD[[1,2]]$retRsm$AUC,
"\nsigma_AUC = ", as.numeric(resultsVD[[1,2]]$retRsm$StdAUC),
"\nNLLini = ", resultsVD[[1,2]]$retRsm$NLLIni,
"\nNLLfin = ", resultsVD[[1,2]]$retRsm$NLLFin)
```
<!-- From the previous chapter the RSM parameters can be used to calculate lesion-localization and lesion-classification performances, namely $L_L$ and $L_C$ respectively. The following function calculates these values: -->
<!-- ```{r echo=TRUE} -->
<!-- LesionLocLesionCls <- function(mu, lambda, nu, lesDistr) { -->
<!-- temp <- 0 -->
<!-- for (L in 1:length(lesDistr)) { -->
<!-- temp <- temp + lesDistr[L] * (1 - nu)^L -->
<!-- } -->
<!-- L_L <- exp(-lambda) * (1 - temp) -->
<!-- L_C <- pnorm(mu/sqrt(2)) -->
<!-- return(list( -->
<!-- L_L = L_L, -->
<!-- L_C = L_C -->
<!-- )) -->
<!-- } -->
<!-- ``` -->
<!-- The function is used as follows: -->
<!-- ```{r echo=TRUE} -->
<!-- mu <- resultsVD[[1,2]]$retRsm$mu -->
<!-- lambda <- resultsVD[[1,2]]$retRsm$lambda -->
<!-- nu <- resultsVD[[1,2]]$retRsm$nu -->
<!-- f <- which(datasetNames == "VD") -->
<!-- fileName <- datasetNames[f] -->
<!-- theData <- get(sprintf("dataset%02d", f)) # the datasets already exist as R objects -->
<!-- lesDistr <- UtilLesDistr(theData) # RSM ROC fitting needs to know lesDistr -->
<!-- ret <- LesionLocLesionCls(mu, lambda, nu, lesDistr) -->
<!-- L_L <- ret$L_L -->
<!-- L_C <- ret$L_C -->
<!-- cat(sprintf("VD data i=1 j=2: L_L = %7.3f, L_C = %7.3f", L_L, L_C), "\n") -->
<!-- ``` -->
### RSM parameters Franken dataset i = 2, j = 3
Displayed next are RSM parameters for the Franken dataset, treatment 2 and reader 3:
```{r, echo=FALSE}
cat("RSM parameters, Franken dataset, i=2, j=3:",
"\nmu = ", resultsFR[[2,3]]$retRsm$mu,
"\nlambda = ", resultsFR[[2,3]]$retRsm$lambda,
"\nnu = ", resultsFR[[2,3]]$retRsm$nu,
"\nzeta_1 = ", as.numeric(resultsFR[[2,3]]$retRsm$zetas[1]),
"\nAUC = ", resultsFR[[2,3]]$retRsm$AUC,
"\nsigma_AUC = ", as.numeric(resultsFR[[2,3]]$retRsm$StdAUC),
"\nNLLini = ", resultsFR[[2,3]]$retRsm$NLLIni,
"\nNLLfin = ", resultsFR[[2,3]]$retRsm$NLLFin)
```
<!-- Shown next are the lesion-localization and lesion-classification performances for this dataset. -->
<!-- ```{r echo=FALSE} -->
<!-- mu <- resultsFR[[2,3]]$retRsm$mu -->
<!-- lambda <- resultsFR[[2,3]]$retRsm$lambda -->
<!-- nu <- resultsFR[[2,3]]$retRsm$nu -->
<!-- f <- which(datasetNames == "FR") -->
<!-- fileName <- datasetNames[f] -->
<!-- theData <- get(sprintf("dataset%02d", f)) # the datasets already exist as R objects -->
<!-- lesDistr <- UtilLesDistr(theData) # RSM ROC fitting needs to know lesDistr -->
<!-- ret <- LesionLocLesionCls(mu, lambda, nu, lesDistr) -->
<!-- L_L <- ret$L_L -->
<!-- L_C <- ret$L_C -->
<!-- cat(sprintf("FR data i=2, j=3: L_L = %7.3f, L_C = %7.3f", L_L, L_C), "\n") -->
<!-- ``` -->
<!-- While the lesion-classification performances are similar for the two examples, the lesion-localization performances are different, with the Van Dyke dataset showing a greater value. This is evident from the location of the end-points in the two plots shown in Fig. \@ref(fig:rsm-3-fits-plots-vd-12) and Fig. \@ref(fig:rsm-3-fits-plots-fr-23) (an end-point closer to the top-left corner implies greater lesion-localization performance). -->
## Accessing CBM parameters VD dataset {#rsm-3-fits-cbm-parameters}
The parameters of the CBM plots are accessed as follows:
* `resultsVD[[i,j]]$retCbm$mu` is the CBM $\mu$ parameter for treatment i and reader j;
* `resultsVD[[i,j]]$retCbm$alpha` is the CBM $\alpha$ parameter;
* `as.numeric(resultsVD[[i,j]]$retCbm$zetas[1])` is the CBM $\zeta_1$ parameter, the threshold corresponding to the highest non-trivial operating point;
* `resultsVD[[i,j]]$retCbm$AUC` is the CBM AUC;
* `as.numeric(resultsVD[[i,j]]$retCbm$StdAUC)` is the standard deviation of the CBM AUC;
* `resultsVD[[i,j]]$retCbm$NLLIni` is the initial value of negative log-likelihood;
* `resultsVD[[i,j]]$retCbm$NLLFin` is the final value of negative log-likelihood.
As before, for the Franken dataset one replaces `resultsVD[[i,j]]` with `resultsFR[[i,j]]`.
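As an illustration of how these accessors can be combined, the following sketch (it assumes, as stated earlier, that `resultsVD` is indexed as treatment $\times$ reader with $2 \times 5$ entries) tabulates the CBM AUCs for the Van Dyke dataset:
```{r, echo=TRUE, eval=FALSE}
# tabulate CBM AUCs for all treatment-reader combinations, Van Dyke dataset
aucCbmVD <- matrix(NA, nrow = 2, ncol = 5,
                   dimnames = list(paste0("trt", 1:2), paste0("rdr", 1:5)))
for (i in 1:2) for (j in 1:5) aucCbmVD[i, j] <- resultsVD[[i, j]]$retCbm$AUC
aucCbmVD
```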
The next example displays CBM parameters and AUC etc. for the Van Dyke dataset, treatment 1 and reader 2:
```{r, echo=FALSE}
i <- 1; j <- 2
cat("CBM parameters, Van Dyke Dataset, i=1, j=2:",
"\nmu = ", resultsVD[[i,j]]$retCbm$mu,
"\nalpha = ", resultsVD[[i,j]]$retCbm$alpha,
"\nzeta_1 = ", as.numeric(resultsVD[[i,j]]$retCbm$zetas[1]),
"\nAUC = ", resultsVD[[i,j]]$retCbm$AUC,
"\nsigma_AUC = ", as.numeric(resultsVD[[i,j]]$retCbm$StdAUC),
"\nNLLini = ", resultsVD[[i,j]]$retCbm$NLLIni,
"\nNLLfin = ", resultsVD[[i,j]]$retCbm$NLLFin)
```
The next example displays CBM parameters for the Franken dataset, treatment 2 and reader 3:
```{r, echo=FALSE}
cat("CBM parameters, Franken dataset, i=2, j=3:",
"\nmu = ", resultsFR[[i,j]]$retCbm$mu,
"\nalpha = ", resultsFR[[i,j]]$retCbm$alpha,
"\nzeta_1 = ", as.numeric(resultsFR[[i,j]]$retCbm$zetas[1]),
"\nAUC = ", resultsFR[[i,j]]$retCbm$AUC,
"\nsigma_AUC = ", as.numeric(resultsFR[[i,j]]$retCbm$StdAUC),
"\nNLLini = ", resultsFR[[i,j]]$retCbm$NLLIni,
"\nNLLfin = ", resultsFR[[i,j]]$retCbm$NLLFin)
```
The first three values are the fitted values for the CBM parameters $\mu$, $\alpha$ and $\zeta_1$. The next value is the AUC under the fitted CBM curve followed by its standard error. The last two values are the initial and final values of negative log-likelihood.
## PROPROC parameters {#rsm-3-fits-proproc-parameters}
For the VD dataset the displayed `PROPROC` parameters are accessed as follows:
* `resultsVD[[i,j]]$c1` is the PROPROC $c$ parameter for treatment i and reader j;
* `resultsVD[[i,j]]$da` is the PROPROC $d_a$ parameter;
* `resultsVD[[i,j]]$aucProp` is the PROPROC AUC.
Other statistics, such as the standard error of AUC, are not provided by the PROPROC software.
The next example displays PROPROC parameters for the Van Dyke dataset, treatment 1 and reader 2:
```{r, echo=FALSE}
cat("PROPROC parameters, Van Dyke Dataset, i=1, j=2:",
"\nc = ", resultsVD[[1,2]]$c1,
"\nd_a = ", resultsVD[[1,2]]$da,
"\nAUC = ", resultsVD[[1,2]]$aucProp)
```
The values are identical to those listed for treatment 1 and reader 2 in Fig. \@ref(fig:rsm-3-fits-proproc-output-van-dyke).
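If desired, the listed AUC can be cross-checked against the fitted parameters. The sketch below assumes that the installed `RJafroc` version exports a helper `UtilAucPROPROC(c1, da)` returning the analytical PROPROC AUC; if it is absent or named differently in your version, consult the package documentation:
```{r, echo=TRUE, eval=FALSE}
# cross-check the PROPROC AUC from the fitted c and d_a parameters
# (assumes RJafroc exports UtilAucPROPROC(); verify in the installed version)
c1 <- resultsVD[[1, 2]]$c1
da <- resultsVD[[1, 2]]$da
RJafroc::UtilAucPROPROC(c1, da) # should agree with resultsVD[[1,2]]$aucProp
```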
The next example displays PROPROC parameters for the Franken dataset, treatment 2 and reader 3:
```{r, echo=FALSE}
cat("PROPROC parameters, Franken dataset, i=2, j=3:",
"\nc = ", resultsVD[[2,3]]$c1,
"\nd_a = ", resultsVD[[2,3]]$da,
"\nAUC = ", resultsVD[[2,3]]$aucProp)
```
The next section provides an overview of the most salient findings from analyzing the datasets.
## Overview of findings {#rsm-3-fits-overview}
With 14 datasets the total number of individual modality-reader combinations is 236, to *each* of which the three fitting algorithms were applied. It is easy to be overwhelmed by the numbers, so this section summarizes an important conclusion:
> The three fitting algorithms are consistent with a single algorithm-independent AUC.
If the AUCs of the three methods are identical, the following relations hold, with each of the slopes $\text{m}_{PR}$ and $\text{m}_{CR}$ equal to unity:
\begin{equation}
\left.
\begin{aligned}
\text{AUC}_{\text{PRO}} =& \text{m}_{PR} \text{AUC}_{\text{RSM}} \\
\text{AUC}_{\text{CBM}} =& \text{m}_{CR} \text{AUC}_{\text{RSM}}
\end{aligned}
\right \}
(\#eq:rsm-3-fits-slopes-equation1)
\end{equation}
The abbreviations are as follows ($\text{AUC}_\text{PRO}$ = PROPROC AUC, $\text{AUC}_\text{CBM}$ = CBM AUC, $\text{AUC}_\text{RSM}$ = RSM AUC):
* PR = PROPROC vs. RSM slope;
* CR = CBM vs. RSM slope.
For each dataset the plot of PROPROC AUC vs. RSM AUC should be linear with zero intercept and slope $\text{m}_{PR}$, and likewise for the plots of CBM AUC vs. RSM AUC. The reason for the *zero intercept* is that if the AUCs are identical one cannot have an offset (i.e., intercept) term.
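In R a zero-intercept straight-line fit is obtained by dropping the intercept from the model formula. The following minimal sketch uses made-up AUC vectors (placeholders, not fitted values; the actual fits are performed inside `slopesConvVsRsm()`):
```{r, echo=TRUE, eval=FALSE}
# zero-intercept linear fit: slope of PROPROC AUC vs. RSM AUC
aucRsm <- c(0.85, 0.88, 0.92, 0.81, 0.90) # illustrative placeholder values
aucPro <- c(0.86, 0.88, 0.93, 0.82, 0.91) # illustrative placeholder values
fit <- lm(aucPro ~ 0 + aucRsm) # "0 +" (equivalently "- 1") removes the intercept
coef(fit)                      # the constrained slope m_PR
summary(fit)$r.squared         # fraction of variance explained by the fit
```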
### Slopes {#rsm-3-fits-slopes}
* Denote PROPROC AUC for dataset $f$, where $f=1,2,...,14$, treatment $i$ and reader $j$ by $\text{AUC}^{\text{PRO}}_{fij}$. The corresponding RSM and CBM values are denoted by $\text{AUC}^{\text{RSM}}_{fij}$ and $\text{AUC}^{\text{CBM}}_{fij}$, respectively.
* For a given dataset the slope of the PROPROC AUC values vs. the RSM values is denoted $\text{m}_{\text{PR},f}$ and the grand average over all datasets is denoted $\text{m}_{\text{PR}\bullet}$.
* Likewise, the slope of the CBM AUC values vs. the RSM values is denoted $\text{m}_{\text{CR},f}$ and the grand average is denoted $\text{m}_{\text{CR}\bullet}$.
A bootstrap analysis was conducted to determine, for each dataset, the slope of the constrained line fit (over all treatments and readers) and the corresponding confidence intervals. The code for calculating the slopes is in `R/compare-3-fits/slopesConvVsRsm.R` and that for the bootstrap confidence intervals is in `R/compare-3-fits/slopesAucsConvVsRsmCI.R`.
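The following is a minimal sketch of a bootstrap confidence interval for such a slope; it assumes resampling of the treatment-reader AUC pairs with 200 replicates (the placeholder vectors and the resampling scheme are illustrative; the actual scheme is implemented in `R/compare-3-fits/slopesAucsConvVsRsmCI.R` and may differ):
```{r, echo=TRUE, eval=FALSE}
# illustrative bootstrap CI for a zero-intercept slope
aucRsm <- c(0.85, 0.88, 0.92, 0.81, 0.90) # placeholder AUC values
aucPro <- c(0.86, 0.88, 0.93, 0.82, 0.91) # placeholder AUC values
set.seed(1)
nBoot <- 200
slopes <- numeric(nBoot)
for (b in 1:nBoot) {
  idx <- sample(seq_along(aucRsm), replace = TRUE) # resample AUC pairs
  slopes[b] <- coef(lm(aucPro[idx] ~ 0 + aucRsm[idx]))
}
quantile(slopes, probs = c(0.025, 0.975)) # 95% bootstrap CI for the slope
```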
```{r rsm-3-fits-confidence-intervals, echo=TRUE}
source(here::here("R/compare-3-fits/loadDataFile.R"))
source(here::here("R/compare-3-fits/slopesConvVsRsm.R"))
source(here::here("R/compare-3-fits/slopesAucsConvVsRsmCI.R"))
ret <- slopesConvVsRsm(datasetNames)
slopeCI <- slopesAucsConvVsRsmCI(datasetNames)
```
The call to function `slopesConvVsRsm()` returns `ret`, which contains four `lists`, each with one entry per dataset: two lists of plots and two of slopes. For example:
* PRO vs. RSM: `ret$p1[[2]]` is the slope plot for $\text{AUC}^{\text{PRO}}_2$ vs. $\text{AUC}^\text{RSM}_2$ for all treatments and readers in the Van Dyke dataset (f = 2).
* CBM vs. RSM: `ret$p2[[2]]` is the slope plot for $\text{AUC}^{\text{CBM}}_2$ vs. $\text{AUC}^\text{RSM}_2$ for all treatments and readers in the Van Dyke dataset (f = 2).
* PRO vs. RSM: `ret$m_pro_rsm` has two length 14 columns: the slopes $\text{m}_{\text{PR},f}$ for the constrained linear fits of the PROPROC vs. RSM AUC values for each dataset and the corresponding $R^2$ values, where $R^2$ is the fraction of variance explained by the fit. The first column is `ret$m_pro_rsm[[1]]` and the second column is `ret$m_pro_rsm[[2]]`.
* CBM vs. RSM: `ret$m_cbm_rsm` has two length 14 columns: the slopes $\text{m}_{\text{CR},f}$ for the constrained linear fits of the CBM vs. RSM AUC values and the corresponding $R^2$ values.
As an example, for the Van Dyke dataset, `ret$p1[[2]]`, shown on the left in Fig. \@ref(fig:rsm-3-fits-plots-2), is the plot of $\text{AUC}^{\text{PRO}}_2$ vs. $\text{AUC}^\text{RSM}_2$. Shown on the right is `ret$p2[[2]]`, the plot of $\text{AUC}^{\text{CBM}}_2$ vs. $\text{AUC}^\text{RSM}_2$. Each plot has the zero-intercept linear fit superposed on the $I \times J = 10$ points, where each point represents a distinct modality-reader combination.
```{r rsm-3-fits-plots-2, fig.cap="Van Dyke dataset: Left plot is PROPROC-AUC vs. RSM-AUC with the superposed zero-intercept linear fit. The number of data points is `nPts` = 10. Right plot is CBM-AUC vs. RSM-AUC.", fig.show='hold', echo=FALSE}
grid.arrange(ret$p1[[2]], ret$p2[[2]], ncol = 2)
```
The next figure shows the corresponding plots for the Franken dataset, in which there are $2\times 4 = 8$ points in each plot. The left plot results from `ret$p1[[3]]` (since f = 3 for this dataset) and the right plot from `ret$p2[[3]]`.
```{r rsm-3-fits-plots-3, fig.cap="Similar to previous plot, for Franken dataset.", fig.show='hold', echo=FALSE}
grid.arrange(ret$p1[[3]], ret$p2[[3]], ncol = 2)
```
### Confidence intervals {#rsm-3-fits-confidence-intervals}
The call to `slopesAucsConvVsRsmCI` returns `slopeCI`, containing the results of the bootstrap analysis (the bullet symbols $\bullet$ denote grand averages over 14 datasets):
* `slopeCI$cislopeProRsm` 95-percent confidence interval for $\text{m}_{\text{PR} \bullet}$
* `slopeCI$cislopeCbmRsm` 95-percent confidence interval for $\text{m}_{\text{CR} \bullet}$
* `slopeCI$histSlopeProRsm` histogram of 200 bootstrap values of $\text{m}_{\text{PR} \bullet}$
* `slopeCI$histSlopeCbmRsm` histogram of 200 bootstrap values of $\text{m}_{\text{CR} \bullet}$
* `slopeCI$ciAvgAucRsm` confidence interval from 200 bootstrap values of $\text{AUC}^{\text{RSM}}_\bullet$
* `slopeCI$ciAvgAucPro` confidence interval from 200 bootstrap values of $\text{AUC}^{\text{PRO}}_\bullet$
* `slopeCI$ciAvgAucCbm` confidence interval from 200 bootstrap values of $\text{AUC}^{\text{CBM}}_\bullet$
As an example the following code displays in the first column the bootstrap CI for $\text{m}_{\text{PR} \bullet}$ (the slope, averaged over 14 datasets, of the PROPROC vs. RSM AUC values) and in the second column the CI for $\text{m}_{\text{CR} \bullet}$ (the averaged slope of the CBM vs. RSM AUC values):
```{r, echo=TRUE}
df <- as.data.frame(slopeCI$cislopeProRsm)
df <- cbind(df, slopeCI$cislopeCbmRsm)
colnames(df) <- c("m-PR", "m-CR")
print(df)
```
The CI for $\text{m}_{\text{PR} \bullet}$ is slightly above unity, while that for $\text{m}_{\text{CR} \bullet}$ is slightly below. Both CIs are very close to unity, consistent with $\text{m}_{\text{PR}} \approx \text{m}_{\text{CR}} \approx 1$.
Shown next are the histograms of 200 bootstrap values of $\text{m}_{\text{PR} \bullet}$ (left plot) and $\text{m}_{\text{CR} \bullet}$ (right plot). These histograms were used to compute the previously cited confidence intervals.
```{r rsm-3-fits-histo-slopes, fig.cap="Histograms of slope PROPROC AUC vs. RSM AUC (left) and slope CBM AUC vs. RSM AUC (right).", fig.show='hold', echo=FALSE}
p1 <- slopeCI$histSlopeProRsm
p2 <- slopeCI$histSlopeCbmRsm
grid.arrange(p1, p2, ncol = 2, widths = c(1, 1),
  heights = unit(c(4), c("in")))
```
### Summary of slopes and confidence intervals {#rsm-3-fits-slopes-confidence-intervals-summary}
Table \@ref(tab:rsm-3-fits-slopes-table1) summarizes the slopes and $R^2$ values for all 14 datasets.
```{r rsm-3-fits-confidence-intervals3, cache=FALSE, echo=FALSE}
x <- cbind(ret$m_pro_rsm, ret$m_cbm_rsm)
x <- rbind(x, apply(x,2, mean))
x <- round(x, digits = 4)
x <- rbind(x[1:14, ], rep("\n", 4), x[15:nrow(x), ])
z1 <- format(slopeCI$cislopeProRsm, digits = 4)
z2 <- format(slopeCI$cislopeCbmRsm, digits = 3)
x <- rbind(x,
c(paste0("(", z1[1], ", ", z1[2], ")"), # for seed = 1
NA,
paste0("(", z2[1], ", ", z2[2], ")"), # for seed = 1
NA))
row.names(x) <- c(datasetNames, "\n", "AVG", "CI")
colnames(x) <- c("$\\text{m}_{PR}$", "$R^2_{PR}$", "$\\text{m}_{CR}$", "$R^2_{CR}$")
```
```{r rsm-3-fits-slopes-table1, echo=FALSE}
kbl(x, caption = "Summary of slopes and correlations for the two zero-intercept fits: PROPROC AUC vs. RSM AUC and CBM AUC vs. RSM AUC. The average of each slope equals unity to within 0.6 percent.", booktabs = TRUE, escape = FALSE) %>% kable_styling(latex_options = c("basic", "scale_down", "HOLD_position"), row_label_position = "c")
```
The second column, labeled $\text{m}_{PR}$, lists the slopes of the straight line zero-intercept fits to PROPROC vs. RSM AUC values, for each of the 14 datasets, as labeled in the first column. The third column, labeled $R^2_{PR}$, lists the square of the correlation coefficient for each straight line zero-intercept fit.
The fourth and fifth columns list the corresponding values for the CBM AUC vs. RSM AUC fits.
The second last row lists the grand averages (AVG) and the last row lists the 95 percent confidence intervals.
## Reason for equal AUCs / Summary {#rsm-3-fits-discussion-summary}
> We find that all three proper ROC methods yield almost the same AUC. The reason is that the proper ROC is a consequence of using a decision variable that is equivalent (in the arbitrary monotonic increasing transformations sense, see below) to a likelihood ratio. Proper ROC fitting is discussed in Chapter \@ref(proper-roc-models). [@barrett2013foundations] show that an observer who uses the likelihood ratio, or any monotone increasing transformation of it, as the decision variable, has optimal performance, i.e., maximum ROC-AUC. An observer using the likelihood ratio as the decision variable is an ideal observer (Section 13.2.6 ibid). However, different ideal observers must yield the same AUC as otherwise one observer would be "more ideal" than another. Different models of fitting proper ROCs represent different approaches to modeling the decision variable and the likelihood ratio, but while the curves can have different shapes (the slope of the ROC curve at a given point equals the likelihood ratio calculated at that point) their AUCs must agree. This explains the empirical observation of this chapter that RSM, PROPROC and CBM all yield the same AUCs, as summarized in Table \@ref(tab:rsm-3-fits-slopes-table1).
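For reference, the relation invoked above -- that the slope of a proper ROC curve at an operating point equals the likelihood ratio at the corresponding threshold -- can be written as follows, where $\zeta$ is the reporting threshold and $p_{\text{D}}$ and $p_{\text{N}}$ (notation introduced here for illustration) denote the probability density functions of the decision variable for diseased and non-diseased cases:
\begin{equation*}
\frac{d\,\text{TPF}(\zeta)}{d\,\text{FPF}(\zeta)} = \frac{p_{\text{D}}(\zeta)}{p_{\text{N}}(\zeta)}
\end{equation*}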
## Appendices {#rsm-3-fits-appendices}
### Location of pre-analyzed PROPROC files {#rsm-3-fits-one-dataset-proproc}
For each dataset PROPROC parameters were obtained by running the Windows software with PROPROC selected as the curve-fitting method. The results are saved to files that end with `proprocnormareapooled.csv` ^[In accordance with R-package policies white-spaces in the original `PROPROC` output file names have been removed.] contained in "R/compare-3-fits/MRMCRuns/C/", where `C` denotes the name of the dataset (for example, for the Van Dyke dataset, `C` = "VD"). Examples are shown in the next two screen-shots.
```{r rsm-3-fits-mrmc-runs, out.width="300pt", fig.align = "center", echo=FALSE, fig.cap="Screen shot (1 of 2) of `R/compare-3-fits/MRMCRuns` showing the folders containing the results of PROPROC analysis on 14 datasets."}
knitr::include_graphics("images/compare-3-fits/MRMCRuns.png")
```
```{r rsm-3-fits-mrmc-runs-vd, out.width="300pt", fig.align = "center", echo=FALSE, fig.cap="Screen shot (2 of 2) of `R/compare-3-fits/MRMCRuns/VD` showing files containing the results of PROPROC analysis for the Van Dyke dataset."}
knitr::include_graphics("images/compare-3-fits/MRMCRuns-VD.png")
```
The contents of `R/compare-3-fits/MRMCRuns/VD/VDproprocnormareapooled.csv` are shown in Fig. \@ref(fig:rsm-3-fits-proproc-output-van-dyke). ^[The `VD.lrc` file in this directory is the Van Dyke data formatted for input to OR DBM-MRMC 2.5.] The PROPROC parameters $c$ and $d_a$ are in the last two columns. The column names are: `T` = treatment; `R` = reader; `return-code` = undocumented value; `area` = PROPROC AUC; `numCAT` = number of ROC bins; `adjPMean` = undocumented value; `c` and `d_a` = the PROPROC parameters defined in [@metz1999proper].
```{r rsm-3-fits-proproc-output-van-dyke, echo=FALSE,out.width="50%",out.height="20%",fig.cap="PROPROC output for the Van Dyke ROC data set. The first column is the treatment, the second is the reader, the fourth is the AUC and the last two columns are the c and $d_a$ parameters.",fig.show='hold',fig.align='center'}
knitr::include_graphics("images/compare-3-fits/vanDyke.png")
```
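A minimal sketch of reading such a file directly is shown below (the path and column handling are assumptions; `read.csv` may modify column names containing special characters, e.g., `return-code` becomes `return.code`):
```{r, echo=TRUE, eval=FALSE}
# read the pre-analyzed PROPROC results for the Van Dyke dataset
proprocVD <- read.csv(here::here(
  "R/compare-3-fits/MRMCRuns/VD/VDproprocnormareapooled.csv"))
head(proprocVD)      # columns include T, R, area, numCAT, c, d_a
c1 <- proprocVD$c    # PROPROC c parameter, one value per treatment-reader
da <- proprocVD$d_a  # PROPROC d_a parameter
```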
### Location of pre-analyzed results {#rsm-3-fits-pre-analyzed-results}
The following screen shot shows the pre-analyzed files created by the function `Compare3ProperRocFits()` used earlier in this chapter. Each file is named `allResultsC`, where `C` is the abbreviated name of the dataset (uppercase `C` denotes one or more uppercase characters; for example, `C` = `VD` denotes the Van Dyke dataset).
```{r rsm-3-fits-all-results-rsm6, out.width="300pt", fig.align = "center", echo=FALSE, fig.cap="Screen shot of `R/compare-3-fits/RSM6` showing the results files created by `Compare3ProperRocFits()`."}
knitr::include_graphics("images/compare-3-fits/RSM6.png")
```
### Plots for the Van Dyke dataset {#rsm-3-fits-all-plots-van-dyke}
The following plots are arranged in pairs, with the left plot corresponding to treatment 1 and the right to treatment 2.
```{r rsm-3-fits-plots-vd-1-1, fig.cap="Composite plots in both treatments for the Van Dyke dataset, reader 1.", fig.show='hold', echo=FALSE}
p1 <- plotsVD[[1,1]]
p2 <- plotsVD[[2,1]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```
```{r rsm-3-fits-plots-vd-1-2, fig.cap="Composite plots in both treatments for the Van Dyke dataset, reader 2. For treatment 2 the RSM and PROPROC fits are indistinguishable.", fig.show='hold', echo=FALSE}
p1 <- plotsVD[[1,2]]
p2 <- plotsVD[[2,2]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```
The RSM parameter values for the treatment 2 plot are: $\mu$ = `r resultsVD[[2,2]]$retRsm$mu`, $\lambda$ = `r resultsVD[[2,2]]$retRsm$lambda`, $\nu$ = `r resultsVD[[2,2]]$retRsm$nu`, $\zeta_1$ = `r resultsVD[[2,2]]$retRsm$zetas[1]`. The corresponding CBM values are $\mu$ = `r resultsVD[[2,2]]$retCbm$mu`, $\alpha$ = `r resultsVD[[2,2]]$retCbm$alpha`, $\zeta_1$ = `r resultsVD[[2,2]]$retCbm$zetas[1]`. The RSM and CBM $\mu$ parameters are close, and likewise the RSM $\nu$ and CBM $\alpha$ parameters are close -- this is because they have similar physical meanings, which is investigated later in this chapter (TBA). [The CBM does not have a parameter analogous to the RSM $\lambda$ parameter.]
```{r rsm-3-fits-plots-vd-1-3, fig.cap="Composite plots in both treatments for the Van Dyke dataset, reader 3.", fig.show='hold', echo=FALSE}
p1 <- plotsVD[[1,3]]
p2 <- plotsVD[[2,3]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```
The RSM parameters for the treatment 1 plot are: $\mu$ = `r resultsVD[[1,3]]$retRsm$mu`, $\lambda$ = `r resultsVD[[1,3]]$retRsm$lambda`, $\nu$ = `r resultsVD[[1,3]]$retRsm$nu`, $\zeta_1$ = `r resultsVD[[1,3]]$retRsm$zetas[1]`. The corresponding CBM values are $\mu$ = `r resultsVD[[1,3]]$retCbm$mu`, $\alpha$ = `r resultsVD[[1,3]]$retCbm$alpha`, $\zeta_1$ = `r resultsVD[[1,3]]$retCbm$zetas[1]`.
```{r rsm-3-fits-plots-vd-1-4, fig.cap="Composite plots in both treatments for the Van Dyke dataset, reader 4. For treatment 2 the 3 plots are indistinguishable and each one has AUC = 1. The degeneracy is due to all operating points being on the axes of the unit square.", fig.show='hold', echo=FALSE}
p1 <- plotsVD[[1,4]]
p2 <- plotsVD[[2,4]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```
```{r rsm-3-fits-plots-vd-1-5, fig.cap="Composite plots in both treatments for the Van Dyke dataset, reader 5.", fig.show='hold', echo=FALSE}
p1 <- plotsVD[[1,5]]
p2 <- plotsVD[[2,5]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```
### Plots for the Franken dataset {#rsm-3-fits-representative-plots-franken}
The following plots are arranged in pairs, with the left plot corresponding to treatment 1 and the right to treatment 2. These plots apply to the Franken dataset.
```{r rsm-3-fits-plots-fr-1-1, fig.cap="Composite plots in both treatments for the Franken dataset, reader 1.", fig.show='hold', echo=FALSE}
p1 <- plotsFR[[1,1]]
p2 <- plotsFR[[2,1]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```
```{r rsm-3-fits-plots-fr-1-2, fig.cap="Composite plots in both treatments for the Franken dataset, reader 2.", fig.show='hold', echo=FALSE}
p1 <- plotsFR[[1,2]]
p2 <- plotsFR[[2,2]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```
```{r rsm-3-fits-plots-fr-1-3, fig.cap="Composite plots in both treatments for the Franken dataset, reader 3.", fig.show='hold', echo=FALSE}
p1 <- plotsFR[[1,3]]
p2 <- plotsFR[[2,3]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```
```{r rsm-3-fits-plots-fr-1-4, fig.cap="Composite plots in both treatments for the Franken dataset, reader 4.", fig.show='hold', echo=FALSE}
p1 <- plotsFR[[1,4]]
p2 <- plotsFR[[2,4]]
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
#grid::grid.newpage()
#grid::grid.draw(cbind(gA, gB))
# see R/learn/_grid.arrange.Rmd
grid.arrange(gA,gB, ncol=2, widths = c(1,1),
heights=unit(c(4), c("in")))
```