#24 Data retrieval vignette update #72

Merged
merged 2 commits into from
Jun 22, 2023
26 changes: 26 additions & 0 deletions vignettes/Data_retrieval.Rmd
@@ -130,6 +130,32 @@ head(mc2_fmtd)

When loading data, the user must indicate the applicable fields and ids for the data level of interest. When loading level 0 (SC0 and MC0), MC1, and MC2 data, the assay component id ($\mathit{acid}$) is always used. As described in Table 1 of the tcpl Data Processing vignette, the SC1 and MC3 processing levels perform data normalization, where assay component ids ($\mathit{acid}$) are converted to assay endpoint ids ($\mathit{aeid}$). Thus, the SC1 and MC3 data tables contain both $\mathit{acid}$ and $\mathit{aeid}$ ids, and data can be loaded using either id as long as it is properly specified. When loading SC2, MC4, and MC5 data, always use the assay endpoint id ($\mathit{aeid}$). The id(s) to select are based on the primary key within each table containing data. Examples of loading data are detailed in later sections.
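
The level-to-id rule described above can be summarized in a small lookup. The sketch below is illustrative only: `id_field_for_level()` is a hypothetical helper, not a tcpl function, and it simply encodes the rule as stated in the preceding paragraph.

```{r id_field_ex, eval = FALSE}
# Hypothetical helper (not part of tcpl): return the id field(s) that can be
# used when loading a given data type ("sc" or "mc") and processing level.
id_field_for_level <- function(type, lvl) {
  type <- match.arg(type, c("sc", "mc"))
  key <- paste0(type, lvl)
  if (key %in% c("sc0", "mc0", "mc1", "mc2")) {
    "acid"                # pre-normalization levels are keyed by acid
  } else if (key %in% c("sc1", "mc3")) {
    c("acid", "aeid")     # normalization levels contain both ids
  } else if (key %in% c("sc2", "mc4", "mc5")) {
    "aeid"                # post-normalization levels are keyed by aeid
  } else {
    stop("Level not covered by this sketch: ", key)
  }
}

id_field_for_level("mc", 2)
id_field_for_level("sc", 1)
id_field_for_level("mc", 5)
```

The returned field name is the kind of value one would pass as the `fld` argument when loading data, paired with the matching id values in `val`.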

### Assay Annotations

Assay sources, assays, assay components, and assay endpoints are registered via tcpl scripting into a collection of tables. The database structure organizes the annotations as attributes of the assay conductors, the assays (i.e., experiments), the assay components (i.e., raw readouts), or the assay endpoints (i.e., normalized component data), enabling aggregation and differentiation of the data generated through ToxCast and Tox21. The annotations capture four types of information:

i. Identification information,
ii. Design information, such as the technology, format, and objective aspects that break down the assay's innovations,
iii. Target information, such as the target of technological measurement and the biologically intended target, and
iv. Analysis information about how the data were processed and analyzed.

```{r annotation_query_ex, eval = FALSE}
# Load libraries; tcpl provides tcplLoadAsid() and tcplLoadAeid()
# (the tcpl calls assume the package's database settings are already configured)
library(RMySQL)
library(tcpl)
# Open a direct database connection (replace the placeholder credentials)
con <- dbConnect(drv = RMySQL::MySQL(), username = "user", password = "pass",
                 dbname = "InvitroDB", host = "host")
# Use the assay source table to identify which ids are needed in subsequent queries
tcplLoadAsid()
source <- tcplLoadAeid(fld = "asid", val = 1, add.fld = c("aid", "anm", "acid", "acnm"))
# Query the annotation tables using RMySQL, then subset by id or name
assay <- dbGetQuery(con, "SELECT * FROM invitrodb.assay WHERE aid = 1;")
component <- dbGetQuery(con, "SELECT * FROM invitrodb.assay_component;")
component <- subset(component, acid %in% source$acid)
endpoint <- dbGetQuery(con, "SELECT * FROM invitrodb.assay_component_endpoint;")
endpoint <- endpoint[grepl("ATG", endpoint$assay_component_endpoint_name), ]
# Close the connection when finished
dbDisconnect(con)
```
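
To make the hierarchy of annotation tables more concrete, the following sketch joins toy stand-ins for the assay, assay component, and assay endpoint tables on their shared ids. All rows and names are made up for illustration, and the column names other than `aid`, `acid`, and `assay_component_endpoint_name` are assumptions about the table layout rather than something confirmed here.

```{r annotation_hierarchy_ex, eval = FALSE}
# Toy stand-ins for the annotation tables; every value below is made up.
assay <- data.frame(
  aid = c(1, 2),
  asid = c(1, 1),
  assay_name = c("ASSAY_A", "ASSAY_B")
)
component <- data.frame(
  acid = c(10, 11, 12),
  aid = c(1, 1, 2),
  assay_component_name = c("ASSAY_A_comp1", "ASSAY_A_comp2", "ASSAY_B_comp1")
)
endpoint <- data.frame(
  aeid = c(100, 101, 102, 103),
  acid = c(10, 10, 11, 12),
  assay_component_endpoint_name = c("ASSAY_A_comp1_up", "ASSAY_A_comp1_dn",
                                    "ASSAY_A_comp2_up", "ASSAY_B_comp1_up")
)

# Each assay has one or more components (linked by aid), and each component
# has one or more endpoints (linked by acid).
annotations <- merge(merge(assay, component, by = "aid"), endpoint, by = "acid")

# All endpoints that belong to the assay with aid = 1
subset(annotations, aid == 1)
```

Joining in R with `merge()` keeps the example independent of a database connection; the same relationships could equally be expressed as SQL joins in a single query.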

### Chemical Information

The <font face="CMTT10">tcplLoadChem</font> function returns chemical information for user-specified parameters, e.g., the chemical name (chnm) and chemical id (chid). The <font face="CMTT10">tcplLoadClib</font> function provides more information about the ToxCast chemical library used for sample generation.