# Binormal model {#binormal-model}
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(kableExtra)
library(ggplot2)
library(RJafroc)
library(mvtnorm)
library(grid)
library(gridExtra)
```
```{r, echo=FALSE}
shadedPlotsRoc <- function (a, b, fpf)
{
  mu <- a/b
  sigma <- 1/b
  tpf <- pnorm((mu + qnorm(fpf))/sigma)
  zeta <- seq(-mu - 2, mu + 2, 0.1)
  fpfArr <- sort(c(0, pnorm(zeta), 1, fpf))
  tpfArr <- sort(c(rev(c(1, pnorm((mu - zeta)/sigma), 0)), tpf))
  x <- data.frame(FPF = fpfArr, TPF = tpfArr)
  # shade the area under the curve to the left of fpf in blue
  p <- ggplot(x, aes(x = FPF, y = TPF)) +
    geom_line() +
    geom_ribbon(data = subset(x, FPF >= 0 & FPF <= fpf),
                aes(ymin = 0, ymax = TPF), fill = "blue", alpha = 0.5)
  # shade the area between tpf and the curve to the right of fpf in green
  p <- p +
    geom_ribbon(data = subset(x, FPF >= fpf & FPF <= 1),
                aes(ymin = tpf, ymax = TPF), fill = "green", alpha = 0.5)
  return(p)
}
```
```{r, echo=FALSE}
OpPtStr <- function(x,y) {
y <- paste0("(", sprintf("%.3f", x), ", ", sprintf("%.3f", y), ")")
return(y)
}
simplePrint <- function(x) {
sprintf("%.3f", x)
}
```
## How much finished 97% {#binormal-model-how-much-finished}
## Introduction {#binormal-model-introduction}
The equal variance binormal model was described in Chapter \@ref(binary-task). The ratings method of acquiring ROC data and calculation of operating points was discussed in Chapter \@ref(ratings-paradigm). It was shown there that for a clinical dataset the unequal-variance binormal model visually fitted the data better than the equal-variance binormal model.
This chapter deals with the unequal-variance binormal model, often abbreviated to **binormal model**. It is applicable to univariate datasets in which there is *one rating per case*, as in a single observer interpreting cases, one at a time, in a single modality. By convention the qualifier "univariate" is often omitted. In Chapter \@ref(bivariate-models) a bivariate model will be described where each case yields two ratings, as in a single observer interpreting cases in two modalities, or the similar problem of two observers interpreting the same cases in a single modality, but this is not the focus of this chapter.
## Binormal model {#binormal-model-definition}
The binormal model is defined by (capital letters indicate random variables, lower-case letters indicate realized values, and $t$ denotes the truth state):
\begin{equation}
\left.
\begin{aligned}
Z_{k_tt} \sim &N\left ( \mu_t,\sigma_{t}^{2} \right )\\
t&=1,2
\end{aligned}
\right \}
(\#eq:binormal-model-z-samples-1)
\end{equation}
where
\begin{equation}
\left.
\begin{aligned}
\mu_1=&0\\
\mu_2=&\mu\\
\sigma_{1}^{2}=&1\\
\sigma_{2}^{2}=&\sigma^{2}
\end{aligned}
\right \}
(\#eq:binormal-model-z-samples-2)
\end{equation}
Eqn. \@ref(eq:binormal-model-z-samples-1) states that the z-samples for non-diseased cases ($t = 1$) are distributed as a $N(0,1)$ distribution, i.e., the unit normal distribution, while the z-samples for diseased cases ($t = 2$) are distributed as a $N(\mu,\sigma^2)$ distribution, i.e., a normal distribution with mean $\mu$ and variance $\sigma^2$. In the unequal-variance binormal model the variance $\sigma^2$ of the z-samples for diseased cases is allowed to differ from unity. Most ROC datasets are consistent with $\sigma > 1$.^[A more complicated version of this model would allow the mean of the non-diseased distribution to be non-zero and its variance different from unity. The resulting 4-parameter model is no more general than the 2-parameter model. The reason is that one is free to transform the decision variable, and the associated thresholds, by an arbitrary monotone increasing transformation, which does not change the ordering of the ratings and hence does not change the ROC curve. If the mean of the non-diseased distribution were non-zero, subtracting this value from all z-samples would shift the effective mean of the non-diseased distribution to zero (the shifted z-values are monotonically related to the original values) and the mean of the shifted diseased distribution becomes $\mu_2-\mu_1$. Next, one divides all the z-samples by $\sigma_1$ (division by a positive number is also a monotone transformation), so that the scaled non-diseased distribution has unit variance while the scaled diseased distribution has mean $\frac{\mu_2-\mu_1}{\sigma_1}$ and variance $\left(\frac{\sigma_2}{\sigma_1}\right)^2$. Therefore, starting with 4 parameters, simple shifting and scaling operations reduce the model to 2 parameters, as in Eqn. \@ref(eq:binormal-model-z-samples-1).]
### Binned data
In an R-rating ROC study the observed ratings $r$ take on integer values 1 through $R$, it being understood that higher ratings correspond to greater confidence in the presence of disease. Define $R-1$ ordered cutoffs $\zeta_i$, where $i=1,2,...,R-1$ and $\zeta_1 < \zeta_2 < ... < \zeta_{R-1}$. Also define two dummy cutoffs $\zeta_0 = -\infty$ and $\zeta_R = +\infty$. The **binning rule** for a case with realized z-sample $z$ is (Chapter \@ref(ratings-paradigm), Eqn. \@ref(eq:ratings-paradigm-binning-rule)):
\begin{equation}
\left.
\begin{aligned}
\text{if} \left (\zeta_{r-1} \le z < \zeta_r \right )&\Rightarrow \text{rating} = r\\
&r = 1, 2, ..., R
\end{aligned}
\right \}
(\#eq:binormal-model-binning-rule)
\end{equation}
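The binning rule can be sketched in R with `findInterval` (the helper name `binZ` and the threshold values are illustrative choices, not part of the model):

```{r}
# binning rule sketch: rating r satisfies zeta_{r-1} <= z < zeta_r
zetas <- c(-2, -0.5, 1, 2.5)          # ordered cutoffs for a 5-rating study
binZ <- function(z, zetas) findInterval(z, zetas) + 1L
binZ(c(-2.5, -1, 0, 2, 3), zetas)     # ratings 1, 2, 3, 4, 5
```

`findInterval` implements exactly the left-closed intervals of the binning rule: a z-sample equal to a cutoff receives the higher of the two adjacent ratings.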
```{r, fig.show='hold', echo = FALSE}
mu <- 1.5;sigma <- 1.5
z1 <- seq(-3, 4, by = 0.01)
z2 <- seq(-3, 6, by = 0.01)
Pdf1 <- dnorm(z1)
Pdf2 <- dnorm(z2, mu, sd = sigma)
df <- data.frame(z = c(z1, z2), pdfs = c(Pdf1, Pdf2),
truth =
c(rep('non-diseased',
length(Pdf1)),
rep('diseased', length(Pdf2))),
stringsAsFactors = FALSE)
cut_point <- data.frame(z = c(-2.0, -0.5, 1, 2.5))
rocPdfs <- ggplot(df, aes(x = z, y = pdfs, color = truth)) +
geom_line(linewidth = 1) +
scale_colour_manual(values=c("darkgrey","black")) +
theme(
legend.title = element_blank(),
legend.position = c(0.85, 0.90),
legend.text = element_text(face = "bold"),
axis.title.x = element_text(hjust = 0.8, size = 14,face="bold"),
axis.title.y = element_text(size = 14,face="bold")) +
geom_vline(data = cut_point, aes(xintercept = z),
linetype = "dashed", linewidth = 0.25) +
annotation_custom(
grob =
grid::textGrob(bquote(italic("O")),
gp = gpar(fontsize = 12)),
xmin = -3.2, xmax = -3.2, # adjust the position of "O"
ymin = -0.0, ymax = -0.01) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0),limits=c(NA,0.4))
for (i in 1 : length(cut_point$z)){
rocPdfs <- rocPdfs +
annotation_custom(
grob =
grid::textGrob(bquote(zeta[.(i)]),gp = gpar(fontsize = 12)),
xmin = cut_point$z[i], xmax = cut_point$z[i],
ymin = -0.025, ymax = -0.045)
}
gt <- ggplot_gtable(ggplot_build(rocPdfs))
gt$layout$clip[gt$layout$name == "panel"] <- "off"
```
```{r binormal-model-pdfs, fig.cap="The pdfs of the two binormal model distributions for $\\mu = 1.5$ and $\\sigma = 1.5$. Four thresholds $\\zeta_1, \\zeta_2, \\zeta_3, \\zeta_4$ are shown corresponding to a five-rating ROC study. The rating assigned to a case is determined by its z-sample according to the binning rule.", fig.show='hold', echo=FALSE}
grid.draw(gt)
```
The above figure, generated with $\mu = 1.5$, $\sigma = 1.5$, $\zeta_1 = -2$, $\zeta_2 = -0.5$, $\zeta_3 = 1$ and $\zeta_4 = 2.5$, illustrates how realized z-samples are converted to ratings, i.e., application of the binning rule \@ref(eq:binormal-model-binning-rule). For example, a case with z-sample equal to -2.5 would be rated "1", one with z-sample equal to -1 would be rated "2", and cases with z-samples greater than 2.5 would be rated "5".
### Sensitivity and specificity
Let $Z_t$ denote the random z-sample for truth state $t$ ($t$ = 1 for non-diseased and $t$ = 2 for diseased cases). Since the distribution of z-samples from disease-free cases is $N(0,1)$, the expression for specificity in Chapter \@ref(binary-task-model) applies:
\begin{equation}
\text{Sp}\left ( \zeta \right )=P\left ( Z_1 < \zeta \right )=\Phi\left ( \zeta \right )
(\#eq:binormal-model-specificity)
\end{equation}
To obtain an expression for sensitivity, consider that for truth state $t = 2$, the random variable $\frac{Z_2-\mu}{\sigma}$ is distributed as $N(0,1)$:
\begin{equation*}
\frac{Z_2-\mu}{\sigma}\sim N\left ( 0,1 \right )
\end{equation*}
Sensitivity, abbreviated to $\text{Se}$, is defined by $\text{Se} \equiv P\left ( Z_2 > \zeta \right )$. It follows, because $\sigma$ is positive, that:
\begin{equation*}
\text{Se}\left ( \zeta | \mu, \sigma \right ) = P\left ( \frac{Z_2-\mu}{\sigma} > \frac{\zeta-\mu}{\sigma} \right )
\end{equation*}
The right-hand-side can be rewritten as follows:
\begin{equation}
\left.
\begin{aligned}
\text{Se}\left ( \zeta | \mu, \sigma \right )&= 1 - P\left ( \frac{Z_2-\mu}{\sigma} \leq \frac{\zeta-\mu}{\sigma} \right )\\
&=1-\Phi\left ( \frac{\zeta-\mu}{\sigma}\right )=\Phi\left ( \frac{\mu-\zeta}{\sigma}\right )
\end{aligned}
\right \}
(\#eq:binormal-model-sensitivity2)
\end{equation}
Summarizing, the formulae for the specificity and sensitivity for the binormal model are:
\begin{equation}
\left.
\begin{aligned}
\text{Sp}\left ( \zeta \right ) &= \Phi\left ( \zeta \right )\\
\text{Se}\left ( \zeta | \mu, \sigma \right ) &= \Phi\left ( \frac{\mu-\zeta}{\sigma}\right )
\end{aligned}
\right \}
(\#eq:binormal-model-se-sp)
\end{equation}
The coordinates of the operating point defined by $\zeta$ are given by:
\begin{equation}
\left.
\begin{aligned}
\text{FPF}\left ( \zeta \right ) &= 1 - \text{Sp}\left ( \zeta \right ) \\
&= 1 - \Phi\left ( \zeta \right ) \\
&= \Phi\left ( -\zeta \right )
\end{aligned}
\right \}
(\#eq:binormal-model-fpf)
\end{equation}
\begin{equation}
\text{TPF}\left ( \zeta | \mu, \sigma \right ) = \Phi\left ( \frac{\mu-\zeta}{\sigma} \right )
(\#eq:binormal-model-tpf)
\end{equation}
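A minimal numerical sketch of these formulae (the function names are mine, not from the chapter):

```{r}
# operating point (FPF, TPF) corresponding to threshold zeta
fpfFun <- function(zeta) pnorm(-zeta)
tpfFun <- function(zeta, mu, sigma) pnorm((mu - zeta) / sigma)
# e.g., mu = 1.5, sigma = 1.5 (the values used in the pdf figure), zeta = 1
c(FPF = fpfFun(1), TPF = tpfFun(1, mu = 1.5, sigma = 1.5))
```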
An equation for a curve is usually expressed as $y=f(x)$. An expression of this form for the ROC curve, i.e., the y-coordinate (TPF) expressed as a function of the x-coordinate (FPF), follows upon inversion of the expression for FPF, Eqn. \@ref(eq:binormal-model-fpf):
\begin{equation}
\zeta = -\Phi^{-1}\left ( \text{FPF} \right )
(\#eq:binormal-model-zeta)
\end{equation}
Substitution of Eqn. \@ref(eq:binormal-model-zeta) in Eqn. \@ref(eq:binormal-model-tpf) yields:
\begin{equation}
\text{TPF} = \Phi\left ( \frac{\mu + \Phi^{-1}\left (\text{FPF} \right )}{\sigma} \right )
(\#eq:binormal-model-roc-curve1)
\end{equation}
This equation will be put into conventional notation next.
### Binormal model in conventional notation
The $(\mu,\sigma)$ notation makes sense when extending the binormal model to newer models described later, see Chapter \@ref(proper-roc-models). However, it was not the way the binormal model was originally parameterized. Instead the following notation is widely used in the literature:
\begin{equation}
\left.
\begin{aligned}
a&=\frac{\mu}{\sigma}\\
b&=\frac{1}{\sigma}
\end{aligned}
\right \}
(\#eq:binormal-model-ab-parameters)
\end{equation}
>The reason for the $(a,b)$ instead of the $(\mu,\sigma)$ notation is historical. [@dorfman1969maximum] assumed that the diseased distribution had unit variance, and the non-diseased distribution had standard deviation $b$ and their separation was $a$, see Plot A in Fig. \@ref(fig:binormal-model-ab2-mu-sigma).
By dividing the z-samples by $b$, the variance of the distribution labeled "Noise" becomes unity and its mean stays at zero, while the distribution labeled "Signal" acquires standard deviation $1/b$ and mean $a/b$, see plot B. Accordingly the inverses of Eqn. \@ref(eq:binormal-model-ab-parameters) are:
\begin{equation}
\left.
\begin{aligned}
\mu&=\frac{a}{b}\\
\sigma&=\frac{1}{b}
\end{aligned}
\right \}
(\#eq:binormal-model-ab-parameters-inv)
\end{equation}
Eqns. \@ref(eq:binormal-model-ab-parameters) and \@ref(eq:binormal-model-ab-parameters-inv) allow conversion from one notation to another.
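The conversions can be sketched as R helpers (the function names are mine):

```{r}
ab2MuSigma <- function(a, b) c(mu = a / b, sigma = 1 / b)
muSigma2ab <- function(mu, sigma) c(a = mu / sigma, b = 1 / sigma)
# round trip for mu = 2, sigma = 1.8, the values used in the next figure
ab <- muSigma2ab(2, 1.8)
ab                                    # a ~ 1.111, b ~ 0.556
ab2MuSigma(ab[["a"]], ab[["b"]])      # recovers mu = 2, sigma = 1.8
```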
```{r, echo=FALSE, fig.show='hold'}
mu <- 2
sigma <- 1.8
a <- mu/sigma
b <- 1/sigma
z <- seq(-3, 6, by = 0.01)
y1 <- dnorm(z, 0, sd = b)
y2 <- dnorm(z, a)
y <- data.frame(
z = c(z, z),
pdfs = c(y1, y2),
curves =
c(rep("y1", length(y1)),
rep("y2", length(y2))))
p1 <- ggplot() + ggtitle("A") +
geom_line(data = y,
mapping =
aes(x = z,
y = pdfs,
linetype = curves),
linewidth = 0.25) +
scale_linetype_manual(
values = c("dashed", "solid")) + theme_bw() +
theme(axis.title.y =
element_text(size = 18,face="bold"),
axis.title.x =
element_text(size = 18,face="bold")) +
theme(panel.grid.major =
element_blank(), panel.grid.minor = element_blank(),
panel.border =
element_rect(color = "black"), legend.position="none") +
geom_segment(aes(x = a,
y = 0.73,
xend = 0,
yend = 0.73),
arrow = grid::arrow(length = unit(0.3, "cm"), ends = "both")) +
annotate(
"text",
x = a/2,
y = 0.76,
label = "a",
size = 5) +
annotate(
"text",
x = 1.6,
y = 0.45,
label = "Noise",
size = 5) +
geom_segment(aes(
x = b,
y = 0.39,
xend = -b,
yend = 0.39),
arrow = grid::arrow(length = unit(0.3, "cm"), ends = "both")) +
annotate(
"text",
x = 0,
y = 0.42,
label = "b",
size = 5) +
geom_segment(
aes(x = 1*1.12+a,
y = 0.17,
xend = -1*1.2 +a,
yend = 0.17),
arrow = grid::arrow(length = unit(0.3, "cm"), ends = "both")) +
annotate("text",
x = a,
y = 0.19,
label = "1",
size = 5) +
annotate("text",
x = 4,
y = 0.1,
label = "Signal",
size = 5)
z <- seq(-3*sigma, 6*sigma, by = 0.01)
y1 <- dnorm(z, 0, 1)
y2 <- dnorm(z, mu, sigma)
y <- data.frame(z = c(z, z),
pdfs = c(y1, y2),
curves =
c(rep("y1", length(y1)),
rep("y2", length(y2))))
p2 <- ggplot() + ggtitle("B") + geom_line(
data = y,
mapping = aes(x = z,
y = pdfs, linetype = curves), linewidth = 0.25) +
scale_linetype_manual(
values = c("dashed", "solid")) +
theme_bw() +
theme(axis.title.y = element_text(size = 18,face="bold"),
axis.title.x = element_text(size = 18,face="bold")) +
theme(panel.grid.major =
element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(color = "black"),
legend.position="none") +
geom_segment(
aes(x = 0,
y = 0.41,
xend = mu,
yend = 0.41),
arrow = grid::arrow(length = unit(0.3, "cm"), ends = "both")) +
annotate("text",
x = mu/2,
y = 0.44,
label = "mu == a/b",
size = 5, fontface = "bold", parse = TRUE) +
annotate("text",
x = 2.8,
y = 0.28,
label = "Noise",
size = 5,
fontface = "bold", parse = TRUE) +
geom_segment(
aes(x = 1,
y = 0.22,
xend = -1,
yend = 0.22),
arrow = grid::arrow(length = unit(0.3, "cm"), ends = "both")) +
annotate("text",
x = 0,
y = 0.24,
label = "1",
size = 5, fontface = "bold", parse=TRUE) +
geom_segment(
aes(x = -sigma*1.2+mu,
y = 0.1,
xend = sigma*1.2 + mu,
yend = 0.1),
arrow = grid::arrow(length = unit(0.3, "cm"), ends = "both")) +
annotate("text",
x = mu,
y = 0.12,
label = "sigma == 1/b",
size = 5,
fontface = "bold", parse = TRUE) +
annotate("text",
x = 6.4,
y = 0.1,
label = "Signal",
size = 5,
fontface = "bold", parse = TRUE)
```
```{r binormal-model-ab2-mu-sigma, fig.cap="Plot A shows the definitions of the (a,b) parameters of the binormal model. In plot B the x-axis has been rescaled so that the noise distribution has unit variance; this illustrates the difference between the (a,b) and the ($\\mu,\\sigma$) parameters. In this figure $\\mu = 2$ and $\\sigma = 1.8$ which correspond to $a = 1.11$ and $b = 0.556$.", fig.show='hold', echo=FALSE}
grid.arrange(p1,p2,ncol=2)
```
## ROC curve {#binormal-model-roc-curve}
Using the $(a,b)$ notation, Eqn. \@ref(eq:binormal-model-roc-curve1) for the ROC curve reduces to:
\begin{equation}
\text{TPF}\left ( \text{FPF} \right ) = \Phi\left ( a+ b \Phi^{-1}\left (\text{FPF} \right ) \right )
(\#eq:binormal-model-roc-curve-tpf-fpf)
\end{equation}
Since $\Phi^{-1}(\text{FPF})$ is an increasing function of its argument $\text{FPF}$, and $b > 0$, the argument of the $\Phi$ function is an increasing function of $\text{FPF}$; since $\Phi$ is itself monotonically increasing, $\text{TPF}$ is a monotonically increasing function of $\text{FPF}$. If $\text{FPF} = 0$, then $\Phi^{-1}(0) = -\infty$ and $\text{TPF} = 0$; if $\text{FPF} = 1$, then $\Phi^{-1}(1) = +\infty$ and $\text{TPF} = 1$. Therefore, regardless of the value of $a$, as long as $b > 0$ the ROC curve starts at (0,0) and increases monotonically to (1,1).
From Eqn. \@ref(eq:binormal-model-fpf) and Eqn. \@ref(eq:binormal-model-tpf), the expressions for $\text{FPF}$ and $\text{TPF}$ in terms of model parameters $(a,b)$ are:
\begin{equation}
\left.
\begin{aligned}
\text{FPF}\left ( \zeta \right ) &= \Phi\left ( -\zeta \right )\\
\text{TPF}\left (\zeta | a,b \right ) &= \Phi\left ( a - b \zeta \right )
\end{aligned}
\right \}
(\#eq:binormal-model-op-point-ab)
\end{equation}
Solve for $\zeta$ from the equation for FPF:
\begin{equation}
\zeta = - \Phi^{-1}\left ( \text{FPF} \right )
(\#eq:binormal-model-op-point-ab1)
\end{equation}
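As a sketch (parameter values assumed, corresponding to $\mu = 2$, $\sigma = 1.8$), the $y=f(x)$ form and the parametric $(a,b)$ form can be checked against each other numerically:

```{r}
a <- 2 / 1.8; b <- 1 / 1.8            # a = mu/sigma, b = 1/sigma
fpf <- seq(0.05, 0.95, 0.05)
zeta <- -qnorm(fpf)                   # thresholds recovered from FPF
tpf1 <- pnorm(a + b * qnorm(fpf))     # y = f(x) form of the ROC curve
tpf2 <- pnorm(a - b * zeta)           # parametric (a, b) form
max(abs(tpf1 - tpf2))                 # agreement to machine precision
```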
## Density functions {#binormal-model-pdfs}
According to Eqn. \@ref(eq:binormal-model-z-samples-1) the probability that a non-diseased case z-sample is smaller than $\zeta$, i.e., the cumulative distribution function (CDF) for non-diseased cases, is:
\begin{equation*}
P\left ( Z \le \zeta \mid Z\sim N\left ( 0,1 \right ) \right ) = 1-FPF\left ( \zeta \right ) = \Phi \left ( \zeta \right )
\end{equation*}
Likewise, the CDF for diseased case z-samples is:
\begin{equation*}
P\left ( Z \le \zeta \mid Z\sim N\left ( \mu,\sigma^2 \right ) \right ) = 1-\text{TPF}\left ( \zeta \right ) = \Phi \left ( \frac{\zeta - \mu}{\sigma} \right )
\end{equation*}
Since the *pdf* is the derivative of the corresponding CDF, it follows that (the subscripts N and D denote non-diseased and diseased cases, respectively):
\begin{equation}
\left.
\begin{aligned}
pdf_N\left ( \zeta \right ) &= \frac{\partial \Phi\left ( \zeta \right )}{\partial \zeta} \\
&= \phi\left ( \zeta \right ) \\
&\equiv \frac{1}{\sqrt{2 \pi}}\exp\left ( -\frac{\zeta^2}{2} \right )
\end{aligned}
\right \}
(\#eq:binormal-model-pdf-n)
\end{equation}
\begin{equation}
\left.
\begin{aligned}
pdf_D\left ( \zeta \right ) &= \frac{\partial \Phi\left ( \frac{\zeta - \mu}{\sigma} \right )}{\partial \zeta} \\ &= \frac{1}{\sigma} \phi\left ( \frac{\zeta - \mu}{\sigma} \right ) \\
&\equiv \frac{1}{\sqrt{2 \pi}\sigma}\exp\left ( -\frac{\left (\zeta-\mu \right )^2}{2\sigma^2} \right )
\end{aligned}
\right \}
(\#eq:binormal-model-pdf-d-mu-sigma)
\end{equation}
The second equation can be written in $(a,b)$ notation as:
\begin{equation}
\left.
\begin{aligned}
pdf_D\left ( \zeta \right ) &= b\phi\left ( b\zeta-a \right ) \\
&= \frac{b}{\sqrt{2 \pi}}\exp\left ( -\frac{\left (b\zeta - a \right )^2}{2} \right )
\end{aligned}
\right \}
(\#eq:binormal-model-pdf-d-a-b)
\end{equation}
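A quick numerical check (a sketch, parameter values assumed) that the $(\mu,\sigma)$ and $(a,b)$ forms of $pdf_D$ agree, and that the density integrates to unity:

```{r}
mu <- 2; sigma <- 1.8
a <- mu / sigma; b <- 1 / sigma
zeta <- seq(-4, 8, 0.01)
pdfD1 <- (1 / sigma) * dnorm((zeta - mu) / sigma)   # (mu, sigma) form
pdfD2 <- b * dnorm(b * zeta - a)                    # (a, b) form
max(abs(pdfD1 - pdfD2))                             # ~ 0
totD <- integrate(function(z) b * dnorm(b * z - a), -Inf, Inf)$value
totD                                                # ~ 1
```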
## Invariance property of pdfs {#binormal-model-invariance-property}
The binormal model is not as restrictive as might appear at first sight. Any monotone increasing transformation $Y=f(Z)$ applied to the observed z-samples, and the associated thresholds, will yield the same observed data, e.g., Table \@ref(tab:ratings-paradigm-example-table). This is because such a transformation leaves the ordering of the ratings unaltered and hence results in the same operating points. While the distributions for $Y$ will not be binormal (i.e., two independent normal distributions), one can safely "pretend" that one is still dealing with an underlying binormal model. An alternative way of stating this is that any pair of distributions is allowed as long as they are reducible to binormal form by a monotone increasing transformation of $Y$, e.g., $Z=f^{-1}(Y)$. [If $f$ is a monotone increasing function of its argument, so is $f^{-1}$.] For this reason the term "pair of latent underlying normal distributions" is sometimes used to describe the binormal model. The robustness of the binormal model has been investigated [@hanley1988robustness; @dorfman1997proper]. The referenced paper by Dorfman et al. has an excellent discussion of the robustness of the binormal model.
The robustness of the binormal model, i.e., the flexibility allowed by the infinite choices of monotonic increasing functions, application of each of which leaves the ordering of the data unaltered, is widely misunderstood. The non-Gaussian appearance of histograms of ratings in ROC studies can lead one to incorrect conclusions that the binormal model is inapplicable to these datasets. To quote a reviewer of one of my recent papers:
> I have had multiple encounters with statisticians who do not understand this difference.... They show me histograms of data, and tell me that the data is obviously not normal, therefore the binormal model should not be used.
The reviewer is correct. The misconception is illustrated next.
```{r, echo=FALSE}
TrapezoidalArea <- function( noise, signal )
{
area = 0
for( ns in 1 : length( signal ) ) {
a = noise[ noise < signal[ ns ] ]
b = noise[ noise == signal[ ns ] ]
area = area + length( a )
area = area + length( b ) * 0.5
}
area = area / length( noise ) / length( signal )
return( area )
}
Y <- function(z,mu1,mu2,sigma1,sigma2,f) {
y <- (1-f)*pnorm((z-mu1)/sigma1)*100+f*pnorm((z-mu2)/sigma2)*100
return( y )
}
```
```{r, fig.show='hold', out.width = '33%', echo=FALSE}
# shows that monotone transformations have no effect on
# AUC even though the pdfs look non-gaussian
# common misconception about ROC analysis
fArray <- c(0.1,0.5,0.9)
seedArray <- c(10,11,12)
for (row in 1:3) {
f <- fArray[row]
seed <- seedArray[row]
set.seed(seed)
# numbers of cases simulated
K1 <- 900
K2 <- 1000
mu1 <- 30
sigma1 <- 7
mu2 <- 55
sigma2 <- 7
# Simulate true gaussian ratings using above parameter values
z1 <- rnorm(K1,mean = mu1,sd = sigma1)
z1[z1>100] <- 100;z1[z1<0] <- 0 # constrain to 0 to 100
z2 <- rnorm(K2,mean = mu2,sd = sigma2)
z2[z2>100] <- 100;z2[z2<0] <- 0 # constrain to 0 to 100
# calculate AUC for true Gaussian ratings
AUC1 <- TrapezoidalArea(z1, z2)
Gaussians <- c(z1, z2)
# display histograms of true Gaussian ratings, A1, A2 or A3
x <- data.frame(x=Gaussians)
x <-
ggplot(data = x, mapping = aes(x = x)) +
geom_histogram(binwidth = 5, color = "black", fill="grey") +
xlab(label = "Original Rating") +
ggtitle(label = paste0("A", row, ": ", "Gaussians"))
print(x)
z <- seq(0.0, 100, 0.1)
# transform the true Gaussians to the latent (non-Gaussian-looking) ratings
transformation <-
data.frame(
x = z,
z = Y(z,mu1,mu2,sigma1,sigma2,f))
# display transformation functions, B1, B2 or B3
x <-
ggplot(mapping = aes(x = x, y = z)) +
geom_line(data = transformation, linewidth = 1) +
xlab(label = "Original Rating") +
ylab(label = "Transformed Rating") +
ggtitle(label = paste0("B", row, ": ","Monotone Transformation"))
print(x)
y <- Y(c(z1, z2),mu1,mu2,sigma1,sigma2,f)
y1 <- y[1:K1];y2 <- y[(K1+1):(K1+K2)]
# calculate AUC for transformed ratings
AUC2 <- TrapezoidalArea( y1, y2)
# display histograms of latent Gaussian ratings, C1, C2 or C3
x <- data.frame(x=y)
x <- ggplot(data = x, mapping = aes(x = x)) +
geom_histogram(binwidth = 5, color = "black", fill="grey") +
xlab(label = "Transformed Rating") +
ggtitle(label = paste0("C", row, ": ", "Latent Gaussians"))
print(x)
# print AUCs, note they are identical (for each row)
options(digits = 9)
# cat("row =", row, ", seed =", seed, ", f =", f,
# "\nAUC of actual Gaussians =", AUC1,
# "\nAUC of latent Gaussians =", AUC2, "\n")
}
```
**This figure illustrates the invariance of ROC analysis to arbitrary monotone transformations of the ratings.**
* Each plot is labeled by a column letter (A, B or C) and a row number (1, 2 or 3). So, for example, plot C2 refers to the second row and third column. Each of the latent Gaussian histograms C1, C2 and C3 appears to be non-Gaussian. However, applying the inverses of the monotone transformations shown in B1, B2 and B3 recovers the binormal model histograms A1, A2 and A3.
* Plot A1 shows the histogram of simulated ratings from a binormal model. Two peaks, one at 30 and the other at 55 are evident (by design, all ratings in this figure are in the range 0 to 100). Plot B1 shows the monotone transformation. Plot C1 shows the histogram of the transformed rating. The choice of $f$ leads to a transformed rating histogram that is peaked near the high end of the rating scale. For A1 and C1 the corresponding AUCs are identical.
* Plot A2 is for a different seed value, plot B2 is the transformation and now the transformed histogram is almost flat, plot C2. For plots A2 and C2 the corresponding AUCs are identical.
* Plot A3 is for a different seed value, B3 is the transformation, and the transformed histogram C3 is peaked near the low end of the transformed rating scale. For plots A3 and C3 the corresponding AUCs are identical.
**Visual examination of the shape of the histograms of ratings, or standard tests for normality, yield little, if any, insight into whether the underlying binormal model assumptions are being violated.**
## Az and d-prime measures {#binormal-model-full-auc}
The (full) area under the ROC, denoted $A_z$, is derived in [@thompson1989statistical]:
\begin{equation}
\left.
\begin{aligned}
A_z=&\Phi\left ( \frac{a}{\sqrt{1+b^2}} \right )\\
=&\Phi\left ( \frac{\mu}{\sqrt{1+\sigma^2}} \right )
\end{aligned}
\right\}
(\#eq:binormal-model-ab-2az)
\end{equation}
The binormal fitted AUC increases as $a$ increases or as $b$ decreases. Equivalently, it increases as $\mu$ increases or as $\sigma$ decreases.
The reason for the name $A_z$ is that historically (prior to maximum likelihood estimation) this quantity was estimated by converting the probabilities FPF and TPF to *z-deviates* (see TBA), which of course assumes normal distributions. The z-subscript is meant to emphasize that this is a binormal model derived estimate.
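The z-deviate idea works because, from Eqn. \@ref(eq:binormal-model-roc-curve-tpf-fpf), $\Phi^{-1}(\text{TPF}) = a + b \, \Phi^{-1}(\text{FPF})$: on z-deviate axes the binormal ROC is a straight line with intercept $a$ and slope $b$. A sketch (parameter values and operating points chosen arbitrarily; since the points lie exactly on the curve, the straight-line fit recovers the parameters exactly):

```{r}
a <- 2 / 1.8; b <- 1 / 1.8                 # assumed parameter values
fpf <- c(0.1, 0.25, 0.5, 0.75, 0.9)        # hypothetical operating points
tpf <- pnorm(a + b * qnorm(fpf))
fit <- lm(qnorm(tpf) ~ qnorm(fpf))         # straight-line fit on z-deviate axes
coef(fit)                                  # intercept ~ a, slope ~ b
```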
The $d'$ parameter is the separation of two unit-variance normal distributions yielding the same AUC as that predicted by the $(a,b)$ binormal model. It is given by:
\begin{equation}
d'=\sqrt{2}\Phi^{-1}\left ( A_z \right )
(\#eq:binormal-model-ab-2dprime)
\end{equation}
```{r echo=FALSE}
A_z <- pnorm(a/sqrt(1+b^2))
```
The $d'$ index can be regarded as a perceptual signal-to-noise ratio.
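A sketch (parameter values assumed) verifying the two equivalent forms of $A_z$ and the $d'$ round trip:

```{r}
mu <- 2; sigma <- 1.8
a <- mu / sigma; b <- 1 / sigma
Az1 <- pnorm(a / sqrt(1 + b^2))            # (a, b) form
Az2 <- pnorm(mu / sqrt(1 + sigma^2))       # (mu, sigma) form
dPrime <- sqrt(2) * qnorm(Az1)
c(Az = Az1, dPrime = dPrime)
```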
## Fitting the binormal model {#binormal-model-fitting}
[@dorfman1969maximum] were the first to fit ratings data to the binormal model. The details of the procedure are in Appendix \@ref(binormal-model-curve-fitting). While historically very important in showing how statistically valid quantitative analysis is possible using ROC ratings data, the fitting procedure suffers from what are termed "degeneracy issues" and "fitting artifacts", discussed in Appendix \@ref(binormal-model-degeneracy). Degeneracy occurs when the fitting procedure yields unreasonable parameter values. Fitting artifacts occur when the fitted curve predicts worse than chance-level performance in some region of the ROC curve. Because of these issues usage of this method is now discouraged; it has largely been supplanted by other methods, such as the CBM fitting method, the proper ROC fitting method implemented in PROPROC, and the RSM (radiological search model) based fitting method. These are discussed in later chapters.
## Partial AUC measures {#binormal-model-partial-auc}
Two partial AUC measures have been defined. The idea is to have an AUC-like measure that emphasizes a region of the ROC curve argued to be clinically more relevant, instead of $A_z$, which characterizes the whole curve. In the following, two definitions are considered: one emphasizing the high-specificity region of the ROC curve and one emphasizing the high-sensitivity region.
Shorthand: denote $A \equiv A_z$, $x \equiv \text{FPF}$ and $y \equiv \text{TPF}$. The two partial AUC measures correspond to a partial integral along the x-axis starting from the origin (high specificity) and the other to a partial integral along the y-axis ending at (1,1) corresponding to high sensitivity. These are denoted by X and Y superscripts.
### Measure emphasizing high specificity {#binormal-model-meaning-partial-auc-definitions}
The partial area under the ROC, $A_c^{X}$, is defined as that extending from $x = 0$ to $x = c$, where $0 \le c \le 1$ (in our notation $c$ always means a cutoff on the x-axis of the ROC):
\begin{equation}
\left.
\begin{aligned}
A_c^{X} &= \int_{x=0}^{x=c} y \, dx
\\&= \int_{x=0}^{x=c} \Phi\left ( a + b \; \Phi^{-1} \left ( x \right ) \right ) \, dx
\end{aligned}
\right \}
(\#eq:binormal-model-partial-area-a1)
\end{equation}
The second form follows from Eqn. \@ref(eq:binormal-model-roc-curve-tpf-fpf).
[@thompson1989statistical] derive a formula for the partial-area in terms of the binormal model parameters $a$ and $b$:
\begin{equation}
A_c^{X} = \int_{z_2=-\infty}^{\Phi^{-1}\left ( c \right )} \int_{z_1=-\infty}^{\frac{a}{\sqrt{1+b^2}}} \phi\left ( z_1,z_2;\rho \right ) dz_1dz_2
(\#eq:binormal-model-partial-area-final)
\end{equation}
On the right hand side the integrand $\phi\left ( z_1,z_2;\rho \right )$ is the standard bivariate normal density function with correlation coefficient $\rho$. It is defined by:
\begin{equation}
\left.
\begin{aligned}
\phi\left (z_1,z_2;\rho \right ) &= \frac{1}{2 \pi \sqrt{1-\rho^2}} \exp\left ( -\frac{z_1^2 -2\rho z_1 z_2 +z_2^2}{2\left ( 1-\rho^2 \right )} \right ) \\
\rho &= - \frac{b}{\sqrt{1+b^2}}
\end{aligned}
\right \}
(\#eq:binormal-model-bivariate-density)
\end{equation}
As demonstrated later the integrals occurring on the right hand side of Eqn. \@ref(eq:binormal-model-partial-area-final) can be evaluated numerically.
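Although a closed-form result is given next, the one-dimensional form of Eqn. \@ref(eq:binormal-model-partial-area-a1) can also be evaluated directly with the base `R` function `integrate`. The following sketch, using illustrative parameter values $a = 1.8$, $b = 1$ and $c = 0.3$, serves as an independent numerical check:

```{r}
# Direct 1-D quadrature of the partial-area integral (a sanity check)
a <- 1.8; b <- 1; c_val <- 0.3 # illustrative parameter values
A_x_direct <- integrate(
  function(x) pnorm(a + b * qnorm(x)),
  lower = 0, upper = c_val)$value
A_x_direct # about 0.216
```

The same value is obtained from the bivariate-normal form, as shown in the numerical examples below.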
As an area measure the partial AUC $A_c^{X}$ has a simple *geometric* meaning. A *physical* meaning is as follows:
> An ROC curve^[This curve is not binormal as the truncation destroys the normality of the two distributions] can be defined over the truncated dataset where all z-samples **smaller** than $-\Phi^{-1}(c)$ are ignored. The maximum area of this curve is that defined by the rectangle with corners at $(0,0)$ and $(c,\text{TPF}\left ( c \right ))$: $c$ is the abscissa at the upper limit of the integration interval along the x-axis and $\text{TPF}\left ( c \right )$ is the corresponding ordinate: see Eqn. \@ref(eq:binormal-model-roc-curve-tpf-fpf). Dividing $A_c^{X}$ by $\text{TPF}\left ( c \right ) \times c$ yields a normalized partial area measure, denoted $A_c^{XN}$, where $0 \le A_c^{XN} \le 1$. **This is the classification accuracy between diseased and non-diseased cases measured over the truncated dataset.** If $a \ge 0$ it is constrained to (0.5, 1).
\begin{equation}
A_c^{XN} = \frac{A_c^{X}}{\text{TPF}\left ( c \right ) \times c}
(\#eq:binormal-model-normalized-partial-auc-specificity)
\end{equation}
### Measure emphasizing high sensitivity {#binormal-model-metz-partial-auc}
Since the integral in Eqn. \@ref(eq:binormal-model-partial-area-a1) is from $x = 0$ to $x = c$ this partial AUC measure emphasizes the *high specificity* region of the ROC curve (since $x = 0$ corresponds to unit, i.e. highest, specificity).
An alternative partial AUC measure has been defined [@jiang1996receiver] that emphasizes the *high sensitivity* region of the ROC as follows:
\begin{equation}
A_c^{Y} = \int_{y=\text{TPF}(c)}^{y=1} \left (1-x \right ) \, dy
(\#eq:binormal-model-partial-area-ac-metz)
\end{equation}
$A_c^{Y}$ is the (un-normalized) area below the ROC extending from $y = \text{TPF}(c)$ to $y = 1$. The superscript Y denotes that the integral is over part of the y-axis. The maximum value of this integral is the area of the rectangle defined by the corner points $(c,\text{TPF}(c))$ and $(1,1)$. Therefore the normalized area is defined by (our normalization differs from that in the cited reference):
\begin{equation}
A_c^{YN} = \frac{A_c^{Y}}{\left (1 - \text{TPF}(c) \right ) \times \left (1-c \right )}
(\#eq:binormal-model-normalized-partial-auc-sensitivity)
\end{equation}
A *physical* meaning is as follows:
> An ROC curve can be defined over the truncated dataset where all z-samples **greater** than $-\Phi^{-1}(c)$ are ignored. **$A_c^{YN}$ is the classification accuracy between diseased and non-diseased cases measured over the truncated dataset.** By definition the normalized area ranges between 0 and 1.
### Numerical examples {#binormal-model-metz-partial-auc-example}
Fig. \@ref(fig:binormal-model-partial-areas) shows the two un-normalized areas.
```{r binormal-model-partial-areas, fig.cap = "Un-normalized partial AUC measures: the blue shaded area is $A_c^{X}$, the partial area below the ROC extending from $x = 0$ to $x = c$; the green shaded area is $A_c^{Y}$, the partial area below the ROC extending from $y = TPF(c)$ to $y = 1$. Parameters are $a = 1.8$, $b = 1$ and $c = 0.3$.", fig.show='hold', echo=FALSE}
print(shadedPlotsRoc(a = 1.8, b = 1, fpf = 0.3))
```
The following code illustrates calculation of the partial-area measure using the function `pmvnorm` in the `R` package `mvtnorm` [@R-mvtnorm]. The parameter values were: $a = 1.8$, $b = 1$ and $c = 0.3$ (see lines 1-3 below).
```{r, attr.source = ".numberLines"}
a <- 1.8
b <- 1
fpf_c <- 0.3 # cannot use c as variable name
tpf_c <- pnorm(a + b * qnorm(fpf_c))
A_z <- pnorm(a/sqrt(1+b^2))
rho <- -b/sqrt(1+b^2)
Lower1 <- -Inf
Upper1 <- qnorm(fpf_c)
Lower2 <- -Inf
Upper2 <- a/sqrt(1+b^2)
sigma <- rbind(c(1, rho), c(rho, 1))
A_x <- as.numeric(pmvnorm(
c(Lower1, Lower2),
c(Upper1, Upper2),
sigma = sigma))
# divide by area of rectangle
A_xn <- A_x/fpf_c/tpf_c
```
The function `pmvnorm` is called at line 12. The un-normalized partial-area measure $A_c^{X}$ = `r simplePrint(A_x)`. The corresponding full AUC measure is $A_z$ = `r simplePrint(A_z)`. The normalized measure is $A_c^{XN}$ = `r simplePrint(A_xn)`. This is the classification accuracy between non-diseased and diseased cases in the truncated dataset defined by ignoring cases with z-samples smaller than $-\Phi^{-1}(c)$ = `r simplePrint(-qnorm(fpf_c))`. This measure emphasizes specificity.
$A_c^{Y}$ can be calculated using geometry. One subtracts $A_c^{X}$ from $A_z$ to get the area under the ROC to the right of $\text{FPF}=c$. Next one subtracts from this quantity the area of the rectangle with base $(1 - c)$ and height $\text{TPF}(c)$. This yields the area of the green shaded region, $A_c^{Y}$. To normalize it one divides by the area of the rectangle defined by the corner points $(c,\text{TPF}(c))$ and $(1,1)$.
```{r}
# implement geometrical logic
A_y <- (A_z - A_x)-(1-fpf_c)*(tpf_c)
A_yn <- A_y/(1-tpf_c)/(1-fpf_c)
```
The un-normalized partial-area measure $A_c^{Y}$ = `r simplePrint(A_y)`. The normalized measure is $A_c^{YN}$ = `r simplePrint(A_yn)`. This is the classification accuracy between non-diseased and diseased cases in the truncated dataset defined by ignoring cases with z-samples greater than $-\Phi^{-1}(c)$ = `r simplePrint(-qnorm(fpf_c))`. This measure emphasizes sensitivity.
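The claimed physical meaning of the normalized measures, namely classification accuracy over the truncated dataset, can be checked by simulation. The following sketch assumes the same parameters as above ($a = 1.8$, $b = 1$, $c = 0.3$), samples non-diseased z-values from $N(0,1)$ and diseased z-values from $N(a/b, 1/b^2)$, truncates each set at $\zeta = -\Phi^{-1}(c)$, and computes the empirical (Wilcoxon) AUC over the truncated samples:

```{r}
set.seed(1)
a <- 1.8; b <- 1; fpf_c <- 0.3
zeta <- -qnorm(fpf_c) # truncation threshold
z1 <- rnorm(100000)           # non-diseased z-samples
z2 <- rnorm(100000, a/b, 1/b) # diseased z-samples
# empirical AUC via the Wilcoxon rank-sum statistic
empAuc <- function (nd, d) {
  r <- rank(c(d, nd))
  (sum(r[seq_along(d)]) - length(d) * (length(d) + 1)/2) /
    (length(d) * length(nd))
}
auc_x <- empAuc(z1[z1 > zeta], z2[z2 > zeta])   # ignore samples below zeta
auc_y <- empAuc(z1[z1 <= zeta], z2[z2 <= zeta]) # ignore samples above zeta
c(auc_x, auc_y) # close to A_xn (about 0.80) and A_yn (about 0.75)
```

Both values agree, to within sampling error, with the parametric results above.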
The variation with $a$ of the two normalized AUC measures is shown next. The function `normalizedAreas` encapsulates the above calculations and is called for different values of $a$.
```{r, echo=FALSE}
normalizedAreas <- function (a, b, fpf_c)
{
zeta <- -qnorm(fpf_c)
A_z <- pnorm(a/sqrt(1+b^2))
tpf_c <- pnorm(a - b * zeta)
rho <- -b/sqrt(1+b^2)
Lower1 <- -Inf
Upper1 <- qnorm(fpf_c)
Lower2 <- -Inf
Upper2 <- a/sqrt(1+b^2)
sigma <- rbind(c(1, rho), c(rho, 1))
A_x <- as.numeric(pmvnorm(
c(Lower1, Lower2),
c(Upper1, Upper2),
sigma = sigma))
A_xn <- A_x/fpf_c/tpf_c # normalized X-area
A_y <- (A_z - A_x)-(1-fpf_c)*(tpf_c) # geometry
A_yn <- A_y/(1-tpf_c)/(1-fpf_c) # normalized Y-area
return(list(
A_xn = A_xn,
A_yn = A_yn
)
)
}
```
```{r, echo=TRUE}
a_arr <- seq(0, 8)
A_xn_arr <- array(dim = length(a_arr))
A_yn_arr <- array(dim = length(a_arr))
for (i in seq_along(a_arr)) {
x <- normalizedAreas(a_arr[i], 1, 0.1) # c = 0.1
A_xn_arr[i] <- x$A_xn
A_yn_arr[i] <- x$A_yn
}
```
```{r summary-table-partial-normalized-areas, echo=FALSE}
# df <- data.frame(a = a_arr, A_XN = A_xn_arr, A_YN = A_yn_arr)
df <- data.frame(a_arr, A_xn_arr, A_yn_arr)
df$A_xn_arr <- round(df$A_xn_arr, digits = 4)
df$A_yn_arr <- round(df$A_yn_arr, digits = 4)
colnames(df) <- c("$a$","$A^{XN}_c$", "$A^{YN}_c$")
knitr::kable(df, caption = "Summary of normalized $A_c^{XN}$ and $A_c^{YN}$ partial AUCs for different values of the $a$ parameter, where $b = 1$ and $c = 0.1$.", escape = FALSE)
```
Table \@ref(tab:summary-table-partial-normalized-areas) shows $A_c^{XN}$ and $A_c^{YN}$ partial AUCs for different values of the $a$ parameter, for $b = 1$ and $c = 0.1$. It demonstrates that the normalized areas are constrained between 0.5 and 1 (as long as $a$ is non-negative). For numerical reasons (basically a zero-divided-by-zero condition) it is difficult to show that $A_c^{YN}$ approaches 1 in the limit of a very large $a$ parameter (since the green shaded area shrinks to zero).
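The zero-divided-by-zero condition can be demonstrated directly: for large $a$ the factor $1 - \text{TPF}(c)$ in the denominator of Eqn. \@ref(eq:binormal-model-normalized-partial-auc-sensitivity) underflows to zero in double precision. A sketch, with an arbitrarily chosen $a = 20$:

```{r}
a <- 20; b <- 1; fpf_c <- 0.1
tpf_c <- pnorm(a + b * qnorm(fpf_c))
1 - tpf_c # exactly zero in double precision, so A_yn evaluates to 0/0
```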
## Comments on partial AUC measures {#binormal-model-partial-auc-comments}
There are several issues with the adoption of either partial AUC measure.
1. Since a partial area measure corresponds to classification accuracy measured over a **truncated** dataset a fundamental correspondence between $A_z$ and classification accuracy measured over the **entire** dataset is lost. A basic statistical principle of the desirability of an estimate valid for the entire population is being violated.
2. The choice of the truncation cutoff is arbitrary and subject to bias on the part of the investigator. This is similar to the type of bias that is inherent in a single point (sensitivity-specificity) based approach to analysis: this was the very reason for adoption of a measure such as $A_z$ that averages over the whole curve, as argued so eloquently in [@metz1978rocmethodology].
3. Then there is the issue of possible loss of statistical power. If $A_z$ is estimated from the whole dataset and either Eqn. \@ref(eq:binormal-model-normalized-partial-auc-specificity) or Eqn. \@ref(eq:binormal-model-normalized-partial-auc-sensitivity) is used to estimate partial AUC, then one expects no loss in statistical power, as these equations represent noiseless mathematical transformations using the $(a,b)$ parameters estimated over the entire dataset. However, if an empirical partial AUC measure is used there will surely be loss of statistical power resulting from ignoring some of the data. Due to degeneracy issues usage of the empirical partial AUC is often unavoidable. This is because performing significance testing requires that the dataset be re-sampled many times and the parametric fit may not work every time.
The second point is illustrated by the study reported in [@jiang1996receiver]. The ROC curves of a developmental-stage CAD system and that of radiologists cross each other: at high specificity the radiologists were better but the reverse was true at high sensitivity. By choosing the latter region the authors demonstrated statistically significant superiority of CAD over radiologists. Analysis using $A_z$ failed to reach statistical significance.
Two very large clinical studies [@fenton2007influence; @fenton2011effectiveness] using 222,135 and 684,956 women, respectively, showed that a commercial CAD can actually have a detrimental effect on patient outcome [@philpotts2009can]. A more recent study has confirmed the negative view of the efficacy of CAD [@lehman2015diagnostic] and there has even been a call for ending Medicare reimbursement for CAD interpretations [@fenton2015time]. I have not followed the field since ca. 2016 and it is likely that newer versions of CAD now being used in the clinic are better than those evaluated in the cited studies. But the point is that even using a ca. 1996 developmental-stage CAD the authors were able to claim, using a partial AUC measure, that CAD outperformed radiologists, a result clearly not borne out by later large clinical studies, while the $A_z$ measure did not allow this conclusion.
## Discussion{#binormal-model-discussion}
The binormal model is historically very important and the contribution by Dorfman and Alf [@dorfman1969maximum] was seminal. Prior to their work, there was no statistically valid way of estimating AUC from observed ratings counts. Their work and a key paper [@RN1487] accelerated research using ROC methods. The number of publications using their algorithm, and the more modern versions developed by Metz and colleagues, is probably well in excess of 500. Because of its key role, I have endeavored to take out some of the mystery about how the binormal model parameters are estimated. In particular, a common misunderstanding that the binormal model assumptions are violated by real datasets, when in fact it is quite robust to apparent deviations from normality, is addressed (details are in Section \@ref(binormal-model-invariance-property)).
A good understanding of this chapter should enable the reader to better understand alternative ROC models, discussed later.
To this day the binormal model is widely used to fit ROC datasets. In spite of its limitations, the binormal model has been very useful in bringing a level of quantification to this field that did not exist prior to 1969.
## Appendix: Fitting an ROC curve {#binormal-model-curve-fitting}
One aim of this chapter is to demystify statistical curve fitting. With the passing of Profs. Donald Dorfman, Charles Metz and Richard Swensson, parametric modeling has been much neglected. Researchers have instead focused on non-parametric analysis using the empirical AUC defined in Chapter \@ref(empirical-auc). A claimed advantage of non-parametric analysis (overstated in my opinion, see Section \@ref(binormal-model-invariance-property)) is the absence of distributional assumptions. However, non-parametric analysis yields no insight into what is limiting performance. Binormal model based curve fitting described in this chapter will allow the reader to appreciate a later chapter (see the RSM fitting chapter in `RJafrocFrocBook`) that describes a more complex fitting method yielding important insights into the factors limiting human observer (or artificial intelligence algorithm) performance.
### JAVA fitted ROC curve
This section, described in the physical book, has been abbreviated to a [relevant website](http://www.rad.jhmi.edu/jeng/javarad/roc/JROCFITi.html).
### Simplistic straight line fit to the ROC curve
To be described next is a method for fitting data such as in Table \@ref(tab:ratings-paradigm-example-table) to the binormal model, i.e., determining the parameters $(a,b)$ and the thresholds $\zeta_r , \quad r = 1, 2, ..., R-1$, to best fit, in some to-be-defined sense, the observed cell counts. The most common method uses an algorithm called maximum likelihood. But before getting to that, I describe the least-square method, which is conceptually simpler, but not really applicable, as will be explained shortly.
#### Least-squares estimation
By applying the function $\Phi^{-1}$ to both sides of Eqn. \@ref(eq:binormal-model-roc-curve1), one gets (the "inverse" function cancels the "forward" function on the right hand side):
\begin{equation*}
\Phi^{-1}\left ( \text{TPF} \right ) = a + b \; \Phi^{-1}\left ( \text{FPF} \right )
\end{equation*}
This suggests that a plot of $y = \Phi^{-1}\left ( \text{TPF} \right )$ vs. $x=\Phi^{-1}\left ( \text{FPF} \right )$ is expected to follow a straight line with slope $b$ and intercept $a$. Fitting a straight line to such data is generally performed by the method of least-squares, a capability present in most software packages and spreadsheets. Alternatively, one can simply visually draw the best straight line that fits the points, memorably referred to in [@RN300] as "chi-by-eye". This was the way parameters of the binormal model were estimated prior to Dorfman and Alf's work [@dorfman1969maximum]. The least-squares method is a quantitative way of accomplishing the same aim. If $\left ( x_i,y_i \right )$ are the data points, one constructs $S$, the sum of the squared deviations of the observed ordinates from the predicted values (since $R$ is the number of ratings bins, the summation runs over the $R-1$ operating points):
\begin{equation*}
S = \sum_{i=1}^{R-1}\left ( y_i - \left ( a + bx_i \right ) \right )^2
\end{equation*}
The idea is to minimize $S$ with respect to the parameters $(a,b)$. One approach is to differentiate $S$ with respect to $a$ and $b$ and equate each resulting derivative to zero. This yields two equations in two unknowns, which are solved for $a$ and $b$. If the reader has never done this before, one should go through these steps at least once, but it would be smarter in future to use software that does all this. In `R` the least-squares fitting function is `lm()`, which in its simplest form, `lm(y~x)`, fits a linear model using the method of least-squares (in case you are wondering, `lm` stands for linear model, a whole branch of statistics in itself; in this example one is using its simplest capability).
```{r, fig.cap = "The straight line fit method of estimating parameters of the fitting model.", fig.show='hold'}
# ML estimates of a and b (from Eng JAVA program)
# a <- 1.3204; b <- 0.6075
# # these are not used in program; just here for comparison
FPF <- c(0.017, 0.050, 0.183, 0.5)
# this is from Table 6.11, last two rows
TPF <- c(0.440, 0.680, 0.780, 0.900)
# ...do...
PhiInvFPF <- qnorm(FPF)
# apply the PHI_INV function
PhiInvTPF <- qnorm(TPF)
# ... do ...
fit <- lm(PhiInvTPF~PhiInvFPF)
print(fit)
```
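For readers who wish to carry out the differentiation steps described above, the two normal equations have a well-known closed-form solution. The following sketch, using the same four operating points, reproduces the `lm()` coefficients:

```{r}
# Closed-form solution of the two least-squares normal equations
x <- qnorm(c(0.017, 0.050, 0.183, 0.5))   # Phi-inverse of FPF
y <- qnorm(c(0.440, 0.680, 0.780, 0.900)) # Phi-inverse of TPF
b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a_hat <- mean(y) - b_hat * mean(x)
c(a = a_hat, b = b_hat) # approximately a = 1.3288, b = 0.6307
```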
```{r binormal-model-line-fit, fig.cap = "The straight line fit method of estimating parameters of the fitting model.", fig.show='hold', echo=FALSE}
pointsData <- data.frame(PhiInvFPF = PhiInvFPF,
PhiInvTPF = PhiInvTPF)
pointsPlot <- ggplot(data = pointsData,
mapping =
aes(x = PhiInvFPF,
y = PhiInvTPF)) +
geom_point(size = 2) +
theme(
axis.title.y = element_text(size = 18,face="bold"),
axis.title.x = element_text(size = 18,face="bold")) +
geom_abline(
slope = fit$coefficients[2],
intercept = fit$coefficients[1], linewidth = 0.75)
p1 <- pointsPlot
print(p1)
```
Fig. \@ref(fig:binormal-model-line-fit) shows operating points from Table \@ref(tab:ratings-paradigm-example-table), transformed by the $\Phi^{-1}$ function; the slope of the line is the least-squares estimate of the $b$ parameter and the intercept is the corresponding $a$ parameter of the binormal model.
The output of `print(fit)` contains the least-squares estimates, $a$ = 1.3288 and $b$ = 0.6307. The corresponding maximum likelihood estimates, as yielded by the Eng web code (see Appendix), appear in the commented lines at the top of the code chunk above: $a$ = 1.3204 and $b$ = 0.6075. The estimates appear to be close, particularly the estimate of $a$, but there are a few things wrong with the least-squares approach.

1. The method of least squares assumes that the data points are independent. Because of the manner in which they are constructed, namely by cumulating counts, this assumption is not valid for ROC operating points. Cumulating the 4 and 5 responses constrains the resulting operating point to be above and to the right of the point obtained by cumulating the 5 responses only, so the data points are definitely not independent. Similarly, cumulating the 3, 4 and 5 responses constrains the resulting operating point to be above and to the right of the point obtained by cumulating the 4 and 5 responses, and so on.

2. The linear least-squares method assumes there is no error in measuring $x$; the only source of error that is accounted for is in the y-coordinate. In fact, both coordinates of an ROC operating point are subject to sampling error.

3. Disregard of error in the x-direction is further implicit in the estimates of the thresholds, which, according to Eqn. (6.2.19), are given by:
\begin{equation*}
\zeta_r = - \Phi^{-1}\left ( \text{FPF}_r \right )
\end{equation*}
These are "rigid" estimates that assume no error in the FPF values. As was shown in Chapter \@ref(binary-task), 95% confidence intervals apply to these estimates.
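For the four operating points used in the straight-line fit above, these rigid threshold estimates are (a sketch):

```{r}
FPF <- c(0.017, 0.050, 0.183, 0.5) # observed FPF values from above
zeta <- -qnorm(FPF) # rigid threshold estimates
zeta # approximately 2.120, 1.645, 0.904, 0.000
```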
A historical note: prior to computers and easy access to statistical functions, the analyst had to use a special plotting paper, termed "double probability paper", that converted probabilities into x and y distances using the $\Phi^{-1}$ function.
### Maximum likelihood estimation (MLE)
The approach taken by Dorfman and Alf was to maximize the likelihood function instead of minimizing $S$. The likelihood function is the probability of the observed data given a set of parameter values, i.e.,
\begin{equation*}
\text{L} \equiv P\left ( \text{data} \mid \text{parameters} \right )
\end{equation*}