-
Notifications
You must be signed in to change notification settings - Fork 0
/
08-rsm-fitting.Rmd
141 lines (81 loc) · 9.91 KB
/
08-rsm-fitting.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
# RSM fitting {#rsm-fitting}
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(kableExtra)
library(seqinr)
library(RJafroc)
library(ggplot2)
library(gridExtra)
library(binom)
library(here)
```
## How much finished 95% {#rsm-fitting-how-much-finished}
## Introduction {#rsm-fitting-intro}
The Radiological Search Model (RSM) is detailed in my [FROC paradigm book](https://dpc10ster.github.io/RJafrocFrocBook/). Early efforts at estimating RSM parameters revealed that an FROC curve based estimation method patterned after [@edwards2002maximum] worked only for designer level CAD data and not for human observer data. Subsequent effort focused on ROC curve-based fitting, and this proved successful at fitting human observer datasets. A preliminary account of this work is in [@chakraborty2011estimating].
> It may be surprising that the ROC curve based fitting works, which implies that one does not even need FROC data to estimate RSM parameters. I have previously stated that the ROC paradigm ignores search so how can one estimate search-model parameters from ROC data? The reason is that the *shape* of the RSM-predicted ROC curve and the *location* of the end-point depend on the RSM parameters.
## ROC Likelihood function {#rsm-fitting-roc-likelihood}
In Chapter `TempComment \@ref(rsm-predictions)` expressions were derived for the coordinates $(x,y)$ of the ROC curve predicted by the RSM, see Eqn. `TempComment \@ref(eq:rsm-predictions-fpf)` and Eqn. `TempComment \@ref(eq:rsm-predictions-tpf2)`, where $x\equiv x(\zeta,\lambda)$ and $y \equiv y(\zeta , \mu, \lambda, \nu, \overrightarrow{f_L})$.
Let $(F_r,T_r)$ denote the number of false positives and true positives in the ROC rating bin $r$ defined by thresholds $[\zeta_r, \zeta_{r+1})$ where $r = 0, 1, ..., R_{FROC}$. $(F_0,T_0)$ represent the numbers of non-diseased and diseased cases respectively with no marks, $(F_1,T_1)$ represent the corresponding numbers with highest rating equal to one, etc.
The probability $P_{1r}$ of a count in non-diseased ROC bin $r$ is:
\begin{equation}
P_{1r} = x\left ( \zeta_r \right ) - x\left ( \zeta_{r+1} \right )\\
(\#eq:rsm-fitting-roc-p1r)
\end{equation}
The probability $P_{2r}$ of a count in diseased ROC bin $r$ is:
\begin{equation}
P_{2r} = y\left ( \zeta_r \right ) - y\left ( \zeta_{r+1} \right )\\
(\#eq:rsm-fitting-roc-p2r)
\end{equation}
Ignoring combinatorial factors that do not depend on parameters the likelihood function is:
$$\left ( P_{1r} \right )^{F_r} \left ( P_{2r} \right )^{T_r}$$
The log-likelihood function is:
\begin{equation}
LL_{ROC} \left ( \mu, \lambda, \nu, \overrightarrow{f_L} \right )= \sum_{r=0}^{R_{FROC}} \left [F_r log \left (P_{1r} \right ) + T_r log \left (P_{2r} \right ) \right ] \\
(\#eq:rsm-fitting-roc-ll2)
\end{equation}
The total number of parameters to be estimated, including thresholds, is $3+R_{FROC}$. Maximizing the likelihood function yields parameter estimates.
The Broyden–Fletcher–Goldfarb–Shanno (BFGS) [@shanno1970optimal; @shanno1970conditioning; @goldfarb1970family; @fletcher1970new; @fletcher2013practical; @broyden1970convergence] minimization algorithm, as implemented as function `mle2()` in R-package [@R-bbmle] was used to minimize the negative of the likelihood function. Since the BFGS-algorithm varies each parameter in an unrestricted range $(-\infty, \infty)$, which would cause problems (e.g., RSM parameters cannot be negative and thresholds are subject to an ordering constraint) appropriate transformations ("forward" and "inverse") were used so that, irrespective of values chosen by the BFGS-algorithm, the values supplied to the log-likelihood function were always valid.
## Implementation {#rsm-fitting-fit-rsm}
Function `FitRsmROC()` fits an RSM-predicted ROC curve to a **binned single-modality single-reader ROC** dataset. It is called by `ret <- FitRsmRoc(binnedRocData, lesDistr$Freq, trt = 1, rdr = 1)`, where `binnedRocData` is a binned multi-treatment multi-reader ROC dataset, `lesDistr` is the lesion distribution vector $\overrightarrow{f_L}$ for the dataset and `trt` and `rdr` are the desired treatment and reader to extract from the dataset, each of which defaults to one.
The return value `ret` is a `list` with the following elements:
>
* `ret$mu` The RSM $\mu$ parameter.
* `ret$lambda` The RSM $\lambda$ parameter.
* `ret$nu` The RSM $\nu$ parameter.
* `ret$zetas` The RSM $\zeta$ parameters.
* `ret$AUC` The RSM fitted ROC-AUC.
* `ret$StdAUC` The standard deviation of AUC.
* `ret$NLLIni` The initial value of negative log-likelihood.
* `ret$NLLFin` The final value of negative log-likelihood.
* `ret$ChisqrFitStats` The chisquare goodness of fit (if it can be calculated).
* `ret$covMat` The covariance matrix of the parameters (if it can be calculated).
* `ret$fittedPlot` A `ggplot` object containing the fitted plots along with the empirical operating points and error bars.
## `FitRsmROC` usage example {#rsm-fitting-fitrsmroc-usage-example}
* The following example uses the *first* treatment in `dataset04`; this is a 5 treatment 4 radiologist FROC dataset [@zanca2009evaluation] consisting of 200 cases acquired on a 5-point integer scale, i.e., it is already binned. If not one needs to bin the dataset using `DfBinDataset()`. The number of parameters to be estimated increases with the number of bins since for each additional bin one needs to estimate an additional cutoff parameter.
```{r}
rocData <- DfFroc2Roc(dataset04)
lesDistr <- UtilLesDistr(dataset04)
ret <- FitRsmRoc(rocData, lesDistr = lesDistr$Freq)
```
The lesion distribution vector is `r lesDistr$Freq`. This means that fraction `r lesDistr$Freq[1]` of diseased cases contain one lesion, fraction `r lesDistr$Freq[2]` contain two lesions and fraction `r lesDistr$Freq[3]` contain three lesions.
The fitted parameter values are as follows (not shown are cutoffs excepting $\zeta_1$, the chi-square statistic and the covariance matrix):
>
* $\mu$ = `r round(ret$mu, 3)`
* $\lambda$ = `r round(ret$lambda,3)`
* $\nu$ = `r round(ret$nu,3)`
* $\zeta_1$ = `r round(ret$zetas[1],3)`
* $\text{AUC}$ = `r round(ret$AUC,4)`
* $\sigma (\text{AUC})$ = `r round(ret$StdAUC,3)`
* $\text{NLLIni}$ = `r round(ret$NLLIni,2)`
* $\text{NLLFin}$ = `r round(ret$NLLFin,2)`
<!-- The relatively large separation parameter $\mu$ implies good lesion-classification performance. The large $\lambda$ parameter implies poor lesion-localization performance. On the average the observer generates `r round(ret$lambda,2)` latent NL marks per image. However, because of the relatively large value of $\zeta_1$, i.e., `r round(ret$zetas[1],2)`, only fraction `r round(pnorm(-ret$zetas[1]),3)` of these are actually marked, resulting in `r round(ret$lambda*pnorm(-ret$zetas[1]),2)` actual marks per image. Lesion-localization performance depends on the numbers of latent marks, i.e., $\lambda$ and $\nu$, not the actual numbers of marks. -->
The fitting program decreased the negative log-likelihood (NLL) from `r round(ret$NLLIni,2)` to `r round(ret$NLLFin,2)`. A decrease in negative log-likelihood is expected since one is maximizing the log-likelihood. Because the RSM contains 3 parameters, which is one more than conventional ROC models, the chi-square goodness of fit statistic usually cannot be calculated except for large datasets - the criterion of at least 5 counts in each bin for true positives and false positives is usually hard to meet. Using the method described in Section \@ref(binormal-model-curve-fitting-validation) the degrees of freedom is $\text{df} = R_{FROC} - 3$^[The expansive form of the relevant equation is $R_{FROC} + R_{FROC} - (3 + R_{FROC}) = R_{FROC} - 3$].
Shown next is the fitted plot and exact 95% confidence intervals for the lowest and highest operating points.
```{r rsm-fitting-example-fit, fig.cap= "RSM-fitted ROC curve for treatment one and reader one of `dataset04`.", fig.show='hold', echo = FALSE}
print(ret$fittedPlot)
```
The fitted ROC curve in Fig. \@ref(fig:rsm-fitting-example-fit) is proper. The area under the proper ROC (calculated by numerical integration of the RSM-predicted curve) is `r round(ret$AUC,3)` which will be shown in Chapter \@ref(rsm-3-fits) to be close to that yielded by CBM and PROPROC and higher than the binormal model fitted value.
## Discussion / Summary {#rsm-fitting-discussion-summary}
Over the years, there have been several attempts at fitting FROC data. Prior to the RSM-based ROC curve approach described in this chapter all methods aimed at fitting FROC curves in the mistaken belief that this approach was using all the data. The earliest was [@chakraborty1989maximum]. This was followed by [@swensson1996unified], subsequently shown to be equivalent to my earlier work as far as predicting the FROC curve was concerned [@chakraborty2008operating]. In the meantime CAD developers, who relied heavily on the FROC curve to evaluate their algorithms, developed an empirical approach that was subsequently put on a formal basis in [@edwards2002maximum]. While this method works with CAD data it fails with radiologist data -- the reason is that it assumes $\zeta_1 = -\infty$ (i.e., the algorithm marks every suspcioius region) which is not true for radiologists as discussed in my 2017 print book.
This chapter describes an approach to fitting ROC curves using the RSM. On the face of it fitting the ROC curve seems to be ignoring much of the data (since the ROC rating on a case is the rating of the highest-rated mark on that case). If the case has several marks only the highest rated one is used. In fact the highest rated mark contains information about the other marks on the case, namely that they were all rated lower. For example the highest of a number of samples from a uniform distribution is a sufficient statistic, i.e., it contains all the information contained in the observed samples. Neglect of the marks rated lower is not as bad as might seem at first.
The next chapter describes application of the RSM and other available proper ROC fitting methods to a number of datasets.