epinowcast · seabbs · Sep 18, 2024 · Sep 13, 2024 · Sep 13, 2024 · Sep 16, 2024
diff --git a/vignettes/.gitignore b/vignettes/.gitignore
@@ -0,0 +1 @@
+/.quarto/
diff --git a/vignettes/library.bib b/vignettes/library.bib
@@ -28,3 +28,25 @@ @ARTICLE{Park2024
   year    =  2024,
   doi     = "10.1101/2024.01.12.24301247"
 }
+
+@article{leung1997censoring,
+  title={Censoring issues in survival analysis},
+  author={Leung, Kwan-Moon and Elashoff, Robert M and Afifi, Abdelmonem A},
+  journal={Annual review of public health},
+  volume={18},
+  number={1},
+  pages={83--104},
+  year={1997},
+  publisher={Annual Reviews 4139 El Camino Way, PO Box 10139, Palo Alto, CA 94303-0139, USA}
+}
+
+@article{cori2013new,
+  title={A new framework and software to estimate time-varying reproduction numbers during epidemics},
+  author={Cori, Anne and Ferguson, Neil M and Fraser, Christophe and Cauchemez, Simon},
+  journal={American journal of epidemiology},
+  volume={178},
+  number={9},
+  pages={1505--1512},
+  year={2013},
+  publisher={Oxford University Press}
+}
diff --git a/vignettes/why-it-works.Rmd b/vignettes/why-it-works.Rmd
@@ -0,0 +1,191 @@
+---
+title: "Interval censoring and truncation in epidemiological delay modelling"
+author: "Samuel Brand"
+format: html
+bibliography: library.bib
+---
+
+# Introduction {#introduction}
+
+Time to event analysis, also known as survival analysis, concerns estimating the distribution of delay times between events. A distinctive feature of the field are the methodological techniques used to deal with the missing data problems common in data sets of delay times [@leung1997censoring]. In `primarycensoreddist` we focus on two particular missing data problems:
+
+-   **Interval censoring**. The primary (start) time and/or the secondary (end) time of the delay is unobserved but known to be within an interval.
+-   **Truncation**. Delay data items are only contained in the data set conditional on aspect of the data point, for example if the secondary event is the observation time which add the data item to the data set.
+
+In statistical epidemiology, these missing data problems occur frequently in both data analysis and theoretical modelling.
+
+In data analysis, events in epidemiology are commonly reported as occurring in a particular day or week (*interval censoring*). Also, in an emergency situation data analysis can be happening on datasets that are streaming in real time, which introduces biases around which data items have been observed (*truncation*).
+
+In theoretical epidemiological modelling, it is often appropriate to model the evolution of an infectious disease as occurring in discrete time, for example in the [`EpiNow2`](https://epiforecasts.io/EpiNow2/) modelling package. This means appropriately discretising continuous distributions, such as the generation interval distribution. In `primarycensoreddist` we treat the discretisation of intrinsically continuous distributions as an *interval censoring* problem.
+
+# Statistical model used in `primarycensoreddist`
+
+As described in [Introduction](#introduction), `primarycensoreddist` focuses on a subset of methods from time to event analysis that address data missingness problems commonly found in epidemiological datasets. We present the statistical problem as a double interval censoring problem, where both the primary event time and the secondary event times are interval censored. We can recover single interval censoring problems by reducing one of the intervals to a point.
+
+## Survival function of secondary event time after primary event window
+
+A key assumption we make is that the censoring window for events is known and independent of the event time. This is known as non-informative censoring. It is useful to consider the time from the end (right) point of the primary censoring interval to the secondary time as a random variable,
+
+$$
+S_+ = T - C_P.
+$$
+
+Where $T$ is the delay distribution of interest and $C_P$ is interval between the end (right) point of the primary censoring window and the primary event time; note that by definition $C_P$ is not observed.
+
+With non-informative censoring, it is possible to derive the upper distribution function of $S_+$, or _survival function_ of $S_+$, from the distribution of $T$ and the distribution of $C_P$.
+
+$$
+Q_{S_+}(t) = \mathbb{E}_{C_P} \Big[Q_T(t + C_P)\Big] = \int_0^{W_P} Q_T(t + p) c(p) dp.
+$$
+
+Using integration by parts gives:
+$$
+Q_{S_+}(t) = Q_T(t + W_P) + \int_0^{W_P} f_T(t+p) C(p) dp.
+$$
+
+Where $c$ is the density function of delay from the primary event to the end (right) point of the primary censoring window with distribution function $C$, $Q_T$ is the survival function of the actual delay distribution of interest and $W_P$ is the length of the primary censoring window.
+
+## Probability mass of secondary event time within a secondary censoring window
+
+The probability of the secondary event time falling within a secondary censoring window of length $W_S$ that begins at time $n$ _after_ the primary censoring window is then given by:
+
+$$
+Pr(S_+ \in [n, n + W_S)) = Q_{S_+}(n) - Q_{S_+}(n + W_S).
+$$
+
+Which can in general be calculated by numerical quadrature.
+
+Note that the secondary event time can also occur within the primary censoring window. This happens with probability,
+$$
+Q_{S_+}(-W_P) - Q_{S_+}(0) = 1 - Q_T(W_P) - \int_0^{W_P} f_T(p) C(p) dp = Pr(T< C).
+$$
+
+Using a difference operator defined as:
+
+$$
+\Delta_{W}f(t) = f(t + W) - f(t).
+$$
+
+The common case where every primary and secondary event window is a single time unit, $W_P = W_S = 1$, simplifies the above expressions to:
+
+$$
+\begin{aligned}
+f_n = Pr(S_+ \in [n, n + 1)) &= Q_{S_+}(n) - Q_{S_+}(n + 1), \\
+& = - \Delta_1 Q_{S_+}(n),\qquad n = -1, 0,\dots.
+\end{aligned}
+$$
+Which is a discrete probability mass function of the secondary event time relative to the primary event time in units of the event window.
+
+## Analytic solutions for exponentially tilted primary event times
+
+In epidemiological analysis, it is common for primary events to be arriving at exponentially increasing or decreasing rates, for example, incidence of new infections in a growing epidemic. In this case, the distribution of the primary event time within its censoring window is biased by the exponential growth or decay. If we assume a reference uniform distribution within a primary censoring window $[k, k + W_P)$ then the distribution of the primary event time within the censoring window is the exponential tilted uniform distribution:
+
+$$
+f_P(t) \propto \exp(r t) \mathbb{1}_{[k, k + W_P]}(t).
+$$
+
+In this case, the distribution function for $C_P$, that is the length of time left in the primary censor window _after_ the primary event time, is given by:
+
+$$
+C(p) = {  1 - \exp(-r p) \over 1 - \exp(-r W_P)}, \qquad p \in [0, W_P].
+$$
+
+We consider two situations: 1) uniform primary event time ($r=0$) and 2) non-uniform primary event time ($r\neq0$).
+
+### Uniform primary event time ($r=0$)
+
+In this case, the primary event time is uniformly distributed within the primary censoring window. The survival function of $S_+$ is then:
+
+$$
+Q_{S_+}(t) = Q_T(t + W_P) + { 1 \over W_P} \int_0^{W_P} f_T(t+p) p~ dp.
+$$
+This is analytically solvable whenever the upper distribution function of $T$ is known and the mean of $T$ is analytically solvable from its integral definition.
+
+In each case it is easier to change the integration variable:
+
+$$
+\begin{aligned}
+Q_{S_+}(t) &= Q_T(t + W_P) + { 1 \over W_P} \int_t^{t+W_P} f_T(z) (z-t)~ dz \\
+&= Q_T(t + W_P) + { 1 \over W_P} \Big[  \int_t^{t+W_P} f_T(z) z~ dz - t \Delta_{W_P}F_T(t) \Big].
+\end{aligned}
+$$
+
+
+#### Gamma distribution
+
+The Gamma distribution has the density function:
+
+$$
+f_T(z;k, \theta) = {1 \over \Gamma(k) \theta^k} z^{k-1} \exp(-z/\theta).
+$$
+
+This gives a solution to the survival function of $S_+$ in terms of analytically available functions:
+
+$$
+\begin{aligned}
+Q_{S_+}(t; k, \theta) &= Q_T(t + W_P; k, \theta) + { 1 \over W_P} \left( {1 \over \Gamma(k) \theta^k} \int_t^{t+W_P} z^{k-1}  \exp(-z/\theta) (z-t)~ dz \right), \\
+&= Q_T(t + W_P; k, \theta) + { 1 \over W_P} \big[ k \theta \Delta_{W_P}F_T(t; k+1, \theta) - t \Delta_{W_P}F_T(t; k, \theta) \big].
+\end{aligned}
+$$
+
+Where we have used the standard Gamma integration trick of rewriting
+$$
+{z z^{k-1} \over \Gamma(k) \theta^k} = {k \theta z^k \over\Gamma(k+1) \theta^{k+1} }.
+$$
+
+In the special case of the equal event window then the discrete delay distribution is:
+
+$$
+\begin{aligned}
+f_n &= (n+2) F_T(n+2; k, \theta) + n F_T(n; k, \theta) - 2  (n+1) F_T(n+1; k, \theta) \\
+ &+  k \theta \Big( 2 F_T(n+1; k+1, \theta) - F_T(n; k+1, \theta) - F_T(n+2; k+1,\theta)  \Big).
+\end{aligned}
+$$
+
+Which was also found by Cori _et al_ [@cori2013new].
+
+#### Log-Normal distribution
+
+The Log-Normal distribution has the density function:
+
+$$
+f_T(z;\mu, \sigma) = {1 \over z \sigma \sqrt{2\pi}} \exp\left( - {(\log(z) - \mu)^2 \over 2 \sigma^2} \right).
+$$
+And distribution function:
+
+$$
+F_T(z;\mu, \sigma) = \Phi\left( {\log(z) - \mu \over \sigma} \right).
+$$
+Where $\Phi$ is the standard normal distribution function.
+
+This gives a solution to the survival function of $S_+$ in terms of analytically available functions:
+
+$$
+\begin{aligned}
+Q_{S+}(t ;\mu, \sigma) &= Q_T(t + W_P;\mu, \sigma) + { 1 \over W_P} \int_t^{t+W_P} {1 \over z \sigma \sqrt{2\pi}} \exp\left( - {(\log(z) - \mu)^2 \over 2 \sigma^2} \right) (z-t)~ dz, \\
+&= Q_T(t + W_P;\mu, \sigma) + { 1 \over W_P} \Big[ e^{\mu + \frac{1}{2} \sigma^2} \Delta_{W_P}F_T(t; \mu + \sigma^2, \sigma) - t\Delta_{W_P}F_T(t; \mu, \sigma) \Big]
+\end{aligned}
+$$
+
+Where we have used the Log-Normal integration trick of making a further substitution $y = (\ln z - \mu) / \sigma$ and using the shift property of the standard normal distribution function. This gives:
+
+$$
+\begin{aligned}
+\int_t^{t+W_P} z~ f_T(z; \mu, \sigma) dz &= {1 \over \sigma \sqrt{2\pi}} \int_{(\ln t - \mu)/\sigma}^{(\ln(t+W_P) - \mu)/\sigma} e^{\sigma y + \mu} e^{-y^2/2} dy\\
+&= e^{\mu + \frac{1}{2} \sigma^2} \int_{(\ln t - \mu)/\sigma}^{(\ln(t+W_P) - \mu)/\sigma} e^{-(y- \sigma)^2/2} dy \\
+&= e^{\mu + \frac{1}{2} \sigma^2} \Big[\Phi\Big({\ln(t+W_P) - \mu \over \sigma} - \sigma\Big) - \Phi\Big({\ln(t) - \mu \over \sigma} - \sigma\Big) \Big]\\
+&= e^{\mu + \frac{1}{2} \sigma^2} \Delta_{W_P}F_T(t; \mu + \sigma^2, \sigma).
+\end{aligned}
+$$
+
+In the special case of the equal event window then the discrete delay distribution is:
+
+$$
+\begin{aligned}
+f_n &= (n+2) F_T(n+2; \mu, \sigma) + n F_T(n; \mu, \sigma) - 2  (n+1) F_T(n+1; \mu, \sigma) \\
+ &+  e^{\mu + \frac{1}{2} \sigma^2} \Big( 2 F_T(n+1; \mu + \sigma^2, \sigma) - F_T(n; \mu + \sigma^2, \sigma) - F_T(n+2; \mu + \sigma^2, \sigma)  \Big).
+\end{aligned}
+$$
+
+
+## References