technical review JDR

USEPA · Sep 4, 2024 · dbd7648
1 parent 97a9d52
commit dbd7648
Showing 1 changed file with 8 additions and 7 deletions.
diff --git a/vignettes/Introduction_Appendices.Rmd b/vignettes/Introduction_Appendices.Rmd
@@ -61,7 +61,7 @@ The <font face="CMTT10"> tcpl </font> package is a flexible analysis pipeline is
 
 </center>
 
-The original <font face="CMTT10"> tcplFit() </font> functions performed basic concentration response curve fitting. Processing with <font face="CMTT10"> tcpl v3 </font> and beyond depends on the stand-alone <font face="CMTT10"> tcplFit2 </font> package to allow a wider variety of concentration-response models when using invitrodb in the 4.0 schema and beyond. Using <font face="CMTT10"> tcpl_v3 </font> with the schema from invitrodb versions 2.0-3.5 will still default to <font face="CMTT10"> tcplFit() </font> modeling with constant, Hill, and gain-loss. The main improvement provided by updating to using <font face="CMTT10"> tcplFit2</font> is inclusion of concentration-response models like those contained in the program [BMDExpress](https://github.com/auerbachs/BMDExpress-2). These models include polynomial, exponential, and power functions in addition to the original Hill, gain-loss, and constant models. Similar to the program BMDExpress, <font face="CMTT10"> tcplFit2 </font> curve-fitting uses a defined Benchmark Response (BMR) level to estimate a benchmark dose (BMD), which is the concentration where the curve-fit intersects with this BMR threshold. One final addition was to let the hit call value be a continuous number ranging from 0 to 1 (in contrast to binary hit call values from <font face="CMTT10"> tcplFit() </font>). While developed primarily for ToxCast, the <font face="CMTT10"> tcpl </font> package is written to be generally applicable to the chemical-screening community.
+The original <font face="CMTT10"> tcplFit() </font> functions performed basic concentration response curve fitting. Processing with <font face="CMTT10"> tcpl v3 </font> and beyond depends on the stand-alone <font face="CMTT10"> tcplFit2 </font> package to allow a wider variety of concentration-response models when using invitrodb in the 4.0 schema and beyond. Using <font face="CMTT10"> tcpl_v3 </font> with the schema from invitrodb versions 2.0-3.5 will still default to <font face="CMTT10"> tcplFit() </font> modeling with constant, Hill, and gain-loss. The main improvement provided by updating to using <font face="CMTT10"> tcplFit2</font> is inclusion of concentration-response models like those contained in the program [BMDExpress2](https://github.com/auerbachs/BMDExpress-2). These models include polynomial, exponential, and power functions in addition to the original Hill, gain-loss, and constant models. Similar to the program BMDExpress, <font face="CMTT10"> tcplFit2 </font> curve-fitting uses a defined Benchmark Response (BMR) level to estimate a benchmark dose (BMD), which is the concentration where the curve-fit intersects with this BMR threshold. One final addition was to let the hit call value be a continuous number ranging from 0 to 1 (in contrast to binary hit call values from <font face="CMTT10"> tcplFit() </font>). While developed primarily for ToxCast, the <font face="CMTT10"> tcpl </font> package is written to be generally applicable to the chemical-screening community.
 
 The <font face="CMTT10"> tcpl </font> package includes processing functionality for two screening paradigms: (1) single-concentration (SC) and (2) multiple-concentration (MC) screening. SC screening consists of testing chemicals at one to three concentrations, often for the purpose of identifying potentially active chemicals to test in the multiple-concentration format. MC screening consists of testing chemicals across a concentration range, such that the modeled activity can give an estimate of potency, efficacy, etc. 
 
@@ -754,14 +754,14 @@ tcplRegister(what = "acsn", flds = list(acid = 1, acsn = "TCPL-MC-Demo"))
 ```
 
 ## Assay Component Endpoint
-[Assay component endpoint](aeid), or “endpoint” for short, represents the normalized component data. **To register an endpoint and create an $\mathit{aeid}$, an $\mathit{acid}$ must be provided to map the endpoint to the correct component.** In past <font face="CMTT10"> tcpl </font> versions, each component could have up to two endpoints therefore endpoint names would express directionality (*_up/_down*). <font face="CMTT10"> tcpl v3+ </font> allows bidirectional fitting to capture both the gain and loss of signal. Therefore with <font face="CMTT10"> tcpl v3+ </font>, the endpoint name will usually be the same as the component name.
+[Assay component endpoint](#aeid), or “endpoint” for short, represents the normalized component data. **To register an endpoint and create an $\mathit{aeid}$, an $\mathit{acid}$ must be provided to map the endpoint to the correct component.** In past <font face="CMTT10"> tcpl </font> versions, each component could have up to two endpoints therefore endpoint names would express directionality (*_up/_down*). <font face="CMTT10"> tcpl v3+ </font> allows bidirectional fitting to capture both the gain and loss of signal. Therefore with <font face="CMTT10"> tcpl v3+ </font>, the endpoint name will usually be the same as the component name.
 ```{r eval = FALSE, message = FALSE}
 tcplLoadAeid(fld = "asid", val = 1, add.fld = c("aid", "anm", "acid", "acnm"))
 tcplRegister(what = "aeid", flds = list(acid = 1, aenm = "TOX21_ERa_BLA_Agonist_ratio", normalized_data_type = "percent_activity", export_ready = 1, burst_assay = 0))
 ```
 Registering an assay endpoint also requires the $\mathit{normalized\_data\_type}$ field. The normalized_data_type is used when plotting and currently, the package supports the following values: percent_activity, log2_fold_induction, log10_fold_induction, and fold_induction. Any other values will be treated as "percent_activity."
 
-Other required fields to register an assay endpoint do not have to be explicitly defined and will default if not provided. These fields represent Boolean values (1 or 0, 1 being <font face="CMTT10"> TRUE </font>). The $\mathit{export\_ready}$ field indicates (1) the data is done and ready for export or (0) still in progress. The $\mathit{burst\_assay}$ field is specific to multiple-concentration processing and indicates (1) the assay endpoint is included in the burst distribution calculation or (0) not.
+Other required fields to register an assay endpoint do not have to be explicitly defined and will default to 0 if not provided. These fields represent Boolean values (1 or 0, 1 being <font face="CMTT10"> TRUE </font>). The $\mathit{export\_ready}$ field indicates (1) the data is done and ready for export or (0) still in progress. The $\mathit{burst\_assay}$ field is specific to multiple-concentration processing and indicates (1) the assay endpoint is included in the burst distribution calculation or (0) not.
 
 ## Naming Revision
 There are circumstances where assay, assay component, and assay endpoint names change. The $\mathit{aid}$, $\mathit{acid}$, and $\mathit{aeid}$ are considered more stable in the database, and these auto-incremented keys should not change. To revise naming for assay elements, the correct id must be specified in the <font face="CMTT10"> **tcplUpdate** </font> statement to prevent overwriting data.
@@ -1344,7 +1344,7 @@ mc3 <- tcplPrepOtpt(mc3)
 
 For demonstration purposes, the <font face="CMTT10"> mc_vignette </font> R data object is provided in the package since the vignette is not directly connected to such a database.  The <font face="CMTT10"> mc_vignette </font> object contains a subset of data from levels 3 through 5 from invitrodb  v4.2.  The following code loads the example mc3 data object, then plots the concentration-response series for an example spid with the summary estimates indicated.
 
-```{r fig.align='center',message=FALSE, class.source="scroll-100",message=FALSE,fig.dim=c(8,10), eval=FALSE}
+```{r fig.align='center',message=FALSE,message=FALSE,fig.dim=c(8,10),eval = FALSE}
 # Load the example data from the `tcpl` package.
 data(mc_vignette, package = 'tcpl')
 # Allocate the level 3 example data to `mc3`.
@@ -1564,7 +1564,7 @@ htmlTable(output,
 
 ```
 
-Most models in <font face="CMTT10">tcplfit2</font> assume the background response is zero and the absolute response (or initial response) is increasing. In other words, these models fit a monotonic curve in either direction. The polynomial 2 (poly2) model is an exception with two parameterization options. By default, the biphasic parameterization will be used in <font face="CMTT10"> tcpl </font>. A biphasic poly2 model fits responses that are increasing first and then decreasing, and vice versa (assuming the background response is zero). If biphasic responses are not reasonable, data can be fit using the monotonic-only parameterization in a standalone application of <font face="CMTT10"> tcplfit2_core </font> with the parameter <font face="CMTT10"> biphasic=FALSE</font> assigned. 
+Most models in <font face="CMTT10">tcplfit2</font> assume the background response is zero and the absolute response (or initial response) is increasing. In other words, these models fit a monotonic curve in either direction. The polynomial 2 (poly2) model is an exception with two parameterization options. The biphasic parameterization is what is used in <font face="CMTT10"> tcpl </font>. A biphasic poly2 model fits responses that are increasing first and then decreasing, and vice versa (assuming the background response is zero). If biphasic responses are not reasonable, data can be fit using the monotonic-only parameterization in a standalone application of <font face="CMTT10"> tcplfit2_core </font> with the parameter <font face="CMTT10"> biphasic=FALSE</font> assigned. 
 
 Upon completion of model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within <font face="CMTT10"> tcplFit2::tcplfit2_core </font> function.  Similarly, if the Hessian matrix was successfully inverted then 1 indicates a successful covariance calculation (cov); otherwise 0 is returned.  Finally, in cases where 'nofit' was set to TRUE (within <font face="CMTT10"> tcplFit2::tcplfit2_core </font>) or the model fit failed the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA.  A complete list of model output parameters is provided below:
 
@@ -1890,6 +1890,7 @@ mc4_ss <- mc4_example %>% dplyr::filter(spid == "01504209") # Level 4 - model fi
 mc5_ss <- mc5_example %>% dplyr::filter(spid == "01504209") # Level 5 - best fit & est.
 # Next, we need to obtain the smooth curve estimate for the best model found
 # in the Level 5 analyses of the `tcpl` pipeline.
+# See Level 4 example above for how estDR is calculated.
 estDR <- estDR %>%
   dplyr::mutate(., best_modl = ifelse(variable == mc5_ss[, modl],
                                      yes = "best model", no = NA))
@@ -2036,7 +2037,7 @@ Additional information on derivations on potency estimates is found in [Data Int
 In addition to the continuous $hitc$ and the $fitc$, cautionary flags on curve-fitting can provide context to interpret potential false positives (or negatives) in ToxCast data, enabling the user to decide the stringency with which to filter these targeted in vitro screening data. These flags are programmatically generated and indicate characteristics of a curve that need extra attention or potential anomalies in the curve or data. See the [Data Interpretation>Flags](#flags) section for more details.
 
 ## - Level 7
-For invitrodb v4.2 onward, a new mc7 table contains pre-generated AED values using several potency metrics from invitrodb and a subset of models from the High-throughput Toxicokinetics R package <font face="CMTT10"> httk </font>. AEDs are generated in a separate .R script using the [httk R package](https://cran.r-project.org/web/packages/httk/index.html) because of the resource-intensive nature of running the Monte Carlo simulations to get estimates of plasma concentration for the median (50th %-ile) and most sensitive (95th %-ile) toxicokinetic individuals for both the 3-compartment steady state (3compartmentss) model and the physiologically-based toxicokinetic (pbtk) model for the large number of chemicals included in invitrodb v4.2 (generation of the table as configured in the current code took 24h using 40 cores). See the [Administered Equivalent Dose](#aed) section.
+For invitrodb v4.2 onward, a new mc7 table contains pre-generated AED values using several potency metrics from invitrodb and a subset of models from the High-throughput Toxicokinetics R package <font face="CMTT10"> httk</font>. AEDs are generated in a separate script using the [httk R package](https://cran.r-project.org/web/packages/httk/index.html).  This is done separately due to the resource-intensive nature of running the Monte Carlo simulations to get estimates of plasma concentration for the median (50th %-ile) and most sensitive (95th %-ile) toxicokinetic individuals.  Moreover, this is applied to both the 3-compartment steady state (3compartments) model and the physiologically-based toxicokinetic (pbtk) model for all chemicals included in invitrodb v4.2 (generation of the table as configured in the current code took 24h using 40 cores). See the [Administered Equivalent Dose](#aed) section.
 
 ## Compiled Processing Examples
 
@@ -2336,7 +2337,7 @@ Options_Applied <- c("Version 2.3.1",
 "Quantitative structure property relationships is loaded via load_sipes2017(), load_pradeep2020(), and load_dawson2021() to be able to make AED estimates for as many chemicals as possible.", 
   "ac50, acc, bmd",
   "Hitc >= 0.9 </br>
-Number of mc6 flags is >= 4 </br>
+Number of mc6 flags is < 4 </br>
 Fit category is not 36. This removes borderline responses resulting in ac50 below the concentration range screened, which is not considered to be quantitatively informative.
 .")