Merge pull request #179 from gjgetzinger/master
Also closes #196. Thank you, @gjgetzinger, for your contribution! I think two separate topics were combined by accident. Can you contact me at webchem at ropensci dot org so we can discuss why this could have happened? Thanks.
stitam authored Jan 21, 2020
2 parents c7777e7 + 5e35215 commit e1a6fa1
Showing 6 changed files with 140 additions and 17 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -16,6 +16,7 @@ Authors@R: c(person("Eduard", "Szöcs", role = c("aut", "cre"),
     person("Andreas", "Scharmüller", role = "ctb"),
     person("Eric R", "Scott", role = "ctb"),
     person("Jan", "Stanstrup", role = "ctb"),
+    person("Gordon", "Getzinger", role = "ctb"),
     person("Tamás", "Stirling", role = "ctb"))
 Maintainer: Tamás Stirling <[email protected]>
 LazyLoad: yes
@@ -36,4 +37,4 @@ Imports:
 Suggests:
     testthat,
     rcdk
-RoxygenNote: 7.0.2
+RoxygenNote: 6.1.1
1 change: 1 addition & 0 deletions NAMESPACE
@@ -71,6 +71,7 @@ export(ppdb)
 export(ppdb_parse)
 export(ppdb_query)
 export(smiles)
+export(srs_query)
 export(wd_ident)
 import(RCurl)
 import(dplyr)
2 changes: 2 additions & 0 deletions NEWS
@@ -5,6 +5,7 @@ NEW FEATURES

 * Retrieve data from ChEBI (https://www.ebi.ac.uk/chebi/) webservice with chebi_lite_entity() and chebi_comp_entity(). ChEBI comprises a rich database on chemicals of biological interest [contributed by @andreasLD].
 * Retrieve retention indices from NIST (https://webbook.nist.gov) with nist_ri() [PR #154, contributed by @Aariq]
+* Get record details from US EPA Substance Registry Services (https://cdxnodengn.epa.gov/cdx-srs-rest/) with srs_query() [PR #179]
 * "first" argument in cts_convert() and cir_query() and "interactive" argument in pc_synonyms() deprecated. Use "choices" instead to return either a list of all results, only the first result, or an interactive menu to choose a result to return. [contributed by @Aariq]

 MINOR IMPROVEMENTS
@@ -15,6 +16,7 @@ BUG FIXES

 * cs_prop() failed with duplicated return values [issue #148, reported and fixed by @stanstrup]
 * pp_query() failed when compound present, but no properties [issue #151, reported and fixed by @stanstrup]
+* ci_query() failed when missing table [issue #196, reported and fixed by @gjgetzinger]
 * get_csid() failed because of a major change in the ChemSpider API [issue #149, PR #165, contributed by @stitam]
 * multiple functions failed because of a major change in the ChemSpider API [issue #149, contributed by @stitam]
 * cir_query() mistook NA for sodium [issue #158, reported and fixed by @Aariq]
72 changes: 56 additions & 16 deletions R/chemid.R
@@ -149,22 +149,62 @@ ci_query <- function(query, type = c('name', 'rn', 'inchikey'),
       source_url <- gsub('^(.*)\\?.*', '\\1', qurl)
     }

-    name <- xml_text(xml_find_all(ttt, "//h3[contains(., 'Name of Substance')]/following-sibling::div[1]//li"))
-    synonyms <- xml_text(xml_find_all(ttt, "//h3[contains(., 'Synonyms')]/following-sibling::div[1]//li"))
-    cas <- xml_text(xml_find_all(ttt, "//h3[contains(., 'CAS Registry')]/following-sibling::ul[1]//li"))
-    inchi <- gsub('\\n|\\t', '',
-      xml_text(xml_find_all(ttt, "//h3[contains(., 'InChI')]/following-sibling::text()[1]"))[1]
-    )
-    inchikey <- gsub('\\n|\\t|\\r', '',
-      xml_text(xml_find_all(ttt, "//h3[contains(., 'InChIKey')]/following-sibling::text()[1]"))
-    )
-    smiles <- gsub('\\n|\\t|\\r', '',
-      xml_text(xml_find_all(ttt, "//h3[contains(., 'Smiles')]/following-sibling::text()[1]"))
-    )
-    toxicity <- html_table(xml_find_all(ttt, "//h2[contains(., 'Toxicity')]/following-sibling::div//table"))[[1]]
-    physprop <- html_table(xml_find_all(ttt, "//h2[contains(., 'Physical Prop')]/following-sibling::div//table"))[[1]]
-    physprop[ , 'Value'] <- as.numeric(physprop[ , 'Value'])
-    #= same as physprop
+    if(is.na(xml_find_first(ttt, "//h3[contains(., 'Name of Substance')]/following-sibling::div[1]//li"))){
+      name <- NA
+    }else{
+      name <- xml_text(xml_find_all(ttt, "//h3[contains(., 'Name of Substance')]/following-sibling::div[1]//li"))
+    }
+
+    if(is.na(xml_find_first(ttt, "//h3[contains(., 'Synonyms')]/following-sibling::div[1]//li"))){
+      synonyms <- NA
+    }else{
+      synonyms <- xml_text(xml_find_all(ttt, "//h3[contains(., 'Synonyms')]/following-sibling::div[1]//li"))
+    }
+
+    if(is.na(xml_find_first(ttt, "//h3[contains(., 'CAS Registry')]/following-sibling::ul[1]//li"))){
+      cas <- NA
+    } else {
+      cas <- xml_text(xml_find_all(ttt, "//h3[contains(., 'CAS Registry')]/following-sibling::ul[1]//li"))
+    }
+
+    if(is.na(xml_find_first(ttt, "//h3[contains(., 'InChI')]/following-sibling::text()[1]"))){
+      inchi <- NA
+    } else {
+      inchi <- gsub('\\n|\\t', '',
+        xml_text(xml_find_all(ttt, "//h3[contains(., 'InChI')]/following-sibling::text()[1]"))[1]
+      )
+    }
+
+    if(is.na(xml_find_first(ttt, "//h3[contains(., 'InChIKey')]/following-sibling::text()[1]"))){
+      inchikey <- NA
+    } else {
+      inchikey <- gsub('\\n|\\t|\\r', '',
+        xml_text(xml_find_all(ttt, "//h3[contains(., 'InChIKey')]/following-sibling::text()[1]"))
+      )
+    }
+
+    if(is.na(xml_find_first(ttt, "//h3[contains(., 'Smiles')]/following-sibling::text()[1]"))){
+      smiles <- NA
+    } else {
+      smiles <- gsub('\\n|\\t|\\r', '',
+        xml_text(xml_find_all(ttt, "//h3[contains(., 'Smiles')]/following-sibling::text()[1]"))
+      )
+    }
+
+    if(is.na(xml_find_first(ttt, "//h2[contains(., 'Toxicity')]/following-sibling::div//table"))){
+      toxicity <- NA
+    } else {
+      toxicity <- html_table(xml_find_all(ttt, "//h2[contains(., 'Toxicity')]/following-sibling::div//table"))[[1]]
+    }
+
+    if(is.na(xml_find_first(ttt, "//h2[contains(., 'Physical Prop')]/following-sibling::div//table"))){
+      physprop <- NA
+    } else {
+      physprop <- html_table(xml_find_all(ttt, "//h2[contains(., 'Physical Prop')]/following-sibling::div//table"))[[1]]
+      physprop[ , 'Value'] <- as.numeric(physprop[ , 'Value'])
+      #= same as physprop
+    }
+

     out <- list(name = name, synonyms = synonyms, cas = cas, inchi = inchi,
                 inchikey = inchikey, smiles = smiles, toxicity = toxicity,
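Review note on the R/chemid.R hunk: the eight guarded lookups repeat one pattern, which is to test whether the XPath matches and then either extract the text or fall back to NA. A minimal sketch of how that pattern could be factored out, assuming a hypothetical helper `xml_text_or_na()` that is not part of webchem or this commit, and a toy HTML fragment in place of a real ChemIDplus page:

```r
library(xml2)

# Hypothetical helper (not part of webchem): return the text of every node
# matching `xpath`, or NA when nothing matches.
xml_text_or_na <- function(doc, xpath) {
  nodes <- xml_find_all(doc, xpath)
  if (length(nodes) == 0) {
    return(NA_character_)
  }
  xml_text(nodes)
}

# Toy document standing in for a parsed result page (assumption).
doc <- read_html(
  "<html><body><h3>Synonyms</h3><div><ul><li>a</li><li>b</li></ul></div></body></html>"
)

# Section present: returns the list items as a character vector.
xml_text_or_na(doc, "//h3[contains(., 'Synonyms')]/following-sibling::div[1]//li")

# Section absent: returns NA instead of an empty vector.
xml_text_or_na(doc, "//h3[contains(., 'CAS Registry')]/following-sibling::ul[1]//li")
```

The table lookups (`toxicity`, `physprop`) would need a separate guard, since they index the result of `html_table()` with `[[1]]`.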
49 changes: 49 additions & 0 deletions R/srs.R
@@ -0,0 +1,49 @@
+#' Get record details from U.S. EPA Substance Registry Services (SRS)
+#'
+#' Get record details from SRS, see \url{https://cdxnodengn.epa.gov/cdx-srs-rest/}
+#'
+#'@param query character; query ID.
+#'@param from character; type of query ID, e.g. \code{'itn'}, \code{'cas'},
+#'  \code{'epaid'}, \code{'tsn'}, \code{'name'}.
+#'
+#'@return a list of lists (for each supplied query): a list of 22. subsKey,
+#'  internalTrackingNumber, systematicName, epaIdentificationNumber,
+#'  currentCasNumber, currentTaxonomicSerialNumber, epaName, substanceType,
+#'  categoryClass, kingdomCode, iupacName, pubChemId, molecularWeight,
+#'  molecularFormula, inchiNotation, smilesNotation, classifications,
+#'  characteristics, synonyms, casNumbers, taxonomicSerialNumbers, relationships
+#'@author Gordon Getzinger, \email{gjg3@@duke.edu}
+#'@export
+#'
+#' @examples
+#' \donttest{
+#' # might fail if API is not available
+#' srs_query(query = '50-00-0', from = 'cas')
+#'
+#' ### multiple inputs
+#' casrn <- c('50-00-0', '67-64-1')
+#' srs_query(query = casrn, from = 'cas')
+#' }
+srs_query <-
+  function(query,
+           from = c("itn", "cas", "epaid", "tsn", "name")) {
+    entity_url <- "https://cdxnodengn.epa.gov/cdx-srs-rest/"
+
+    rst <- lapply(query, function(x) {
+      entity_query <- paste0(entity_url, "/substance/", from, "/", x)
+      response <- httr::GET(entity_query)
+
+      if (response$status_code == 200) {
+        text_content <- httr::content(response, "text")
+        if (text_content == "[]") {
+          return(NA)
+        } else {
+          jsonlite::fromJSON(text_content)
+        }
+      } else {
+        stop(httr::http_status(response)$message)
+      }
+    })
+    names(rst) <- query
+    return(rst)
+  }
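Review note on `srs_query()`: `from` is never reduced to a single value, so a call that omits `from` pastes all five defaults into the URL and yields five request strings per query instead of one. In addition, `entity_url` ends with a slash and `"/substance/"` starts with one, so the request path contains `//`. A minimal sketch of one way to handle both, assuming a hypothetical helper `srs_build_url()` that is not part of webchem or this commit:

```r
# Hypothetical helper (not part of webchem): build one SRS request URL per query.
srs_build_url <- function(query,
                          from = c("itn", "cas", "epaid", "tsn", "name"),
                          base_url = "https://cdxnodengn.epa.gov/cdx-srs-rest") {
  # match.arg() falls back to the first choice when `from` is not supplied
  # and stops with an error for values outside the allowed set.
  from <- match.arg(from)
  # base_url carries no trailing slash, so the path has single separators.
  paste0(base_url, "/substance/", from, "/", query)
}

# One URL per query ID.
srs_build_url(c("50-00-0", "67-64-1"), from = "cas")

# `from` omitted: falls back to "itn" rather than expanding to five URLs.
srs_build_url("50-00-0")
```

Calling `match.arg()` at the top of the outer function matters here: inside the anonymous function passed to `lapply()` it would look up the formals of that inner function, which has no `from` argument.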
30 changes: 30 additions & 0 deletions man/srs_query.Rd

Some generated files are not rendered by default.