Update function URLs to fix a number of tests (SRS, ChemSpider) #424

Merged
merged 8 commits into from
Dec 23, 2024
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -50,8 +50,8 @@ Suggests:
rmarkdown,
plot.matrix,
usethis,
-vcr
-RoxygenNote: 7.2.3
+vcr (>= 0.6.0)
+RoxygenNote: 7.3.2
VignetteBuilder: knitr
Config/testthat/edition: 3
Config/testthat/parallel: true
2 changes: 2 additions & 0 deletions NEWS.md
@@ -3,6 +3,8 @@
## BUG FIXES

* `pc_prop()` returned `NA` without much further explanation if any of the queries were not positive integers. The updated function attempts to coerce queries to positive integers, only progresses valid queries, and prints informative messages along the way if verbose messages are enabled.
+* `srs_query()` broke because the URL was no longer working. We have updated the URL.
+* `is.inchikey(type = "chemspider")` broke because the URL was no longer working. We have updated the URL but the function now requires an API key like all other ChemSpider functions.

# webchem 1.3.0

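The `pc_prop()` entry in NEWS.md describes coercing queries to positive integers and only progressing the valid ones. A minimal base R sketch of that idea is shown here; `coerce_cid()` is a hypothetical helper written for illustration and is not the package's actual implementation.

```r
# Hypothetical helper (not part of webchem): coerce queries to positive
# integers and return NA for anything that cannot be coerced.
coerce_cid <- function(query, verbose = FALSE) {
  # suppressWarnings() hides the "NAs introduced by coercion" warning
  num <- suppressWarnings(as.numeric(query))
  valid <- !is.na(num) & num > 0 & num == floor(num)
  if (verbose && any(!valid)) {
    message("Skipping invalid queries: ", paste(query[!valid], collapse = ", "))
  }
  out <- rep(NA_integer_, length(query))
  out[valid] <- as.integer(num[valid])
  out
}

cids <- coerce_cid(c("2244", "-1", "abc", "3.5", 176))
```

Only the first and last elements survive as integers here; the negative, non-numeric and fractional inputs become `NA`, so a caller could filter on `!is.na(cids)` before sending any request.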
20 changes: 20 additions & 0 deletions R/jagst.R
@@ -0,0 +1,20 @@
#' Organic plant protection products in the river Jagst / Germany in 2013
#'
#' This dataset comprises environmental monitoring data of organic plant protection products
#' in the year 2013 in the river Jagst, Germany.
#' The data is publicly available and can be retrieved from the
#' LUBW Landesanstalt für Umwelt, Messungen und Naturschutz Baden-Württemberg.
#' It has been preprocessed and comprises measurements of 34 substances.
#' Substances without detects have been removed.
#' on 13 sampling occasions.
#' Values are given in ug/L.
#'
#' @format A data frame with 442 rows and 4 variables:
#' \describe{
#' \item{date}{sampling data}
#' \item{substance}{substance names}
#' \item{value}{concentration in ug/L}
#' \item{qual}{qualifier, indicating values < LOQ}
#' }
#' @source \url{https://udo.lubw.baden-wuerttemberg.de/public/pages/home/index.xhtml}
"jagst"
15 changes: 15 additions & 0 deletions R/lc50.R
@@ -0,0 +1,15 @@
#' Acute toxicity data from U.S. EPA ECOTOX
#'
#' This dataset comprises acute ecotoxicity data of 124 insecticides.
#' The data is publicly available and can be retrieved from the EPA ECOTOX database
#' (\url{https://cfpub.epa.gov/ecotox/})
#' It comprises acute toxicity data (D. magna, 48h, Laboratory, 48h) and has been
#' preprocessed (remove non-insecticides, aggregate multiple value, keep only numeric data etc).
#'
#' @format A data frame with 124 rows and 2 variables:
#' \describe{
#' \item{cas}{CAS registry number}
#' \item{value}{LC50value}
#' }
#' @source \url{https://cfpub.epa.gov/ecotox/}
"lc50"
20 changes: 10 additions & 10 deletions R/pubchem.R
@@ -25,19 +25,19 @@
#' \code{<xref>}, \code{"sourceid/<source id>"} or \code{"sourceall"}.}
#' \item{\code{assay}: \code{"aid"}, \code{<assay target>}.}
#' }
-#' @details <structure search> is assembled as "{\code{substructure} |
-#' \code{superstructure} | \code{similarity} | \code{identity}} / {\code{smiles}
-#' | \code{inchi} | \code{sdf} | \code{cid}}", e.g.
+#' @details <structure search> is assembled as "(\code{substructure} |
+#' \code{superstructure} | \code{similarity} | \code{identity}) / (\code{smiles}
+#' | \code{inchi} | \code{sdf} | \code{cid})", e.g.
#' \code{from = "substructure/smiles"}.
-#' @details \code{<xref>} is assembled as "\code{xref}/\{\code{RegistryID} |
+#' @details \code{<xref>} is assembled as "\code{xref}/(\code{RegistryID} |
#' \code{RN} | \code{PubMedID} | \code{MMDBID} | \code{ProteinGI},
#' \code{NucleotideGI} | \code{TaxonomyID} | \code{MIMID} | \code{GeneID} |
-#' \code{ProbeID} | \code{PatentID}\}", e.g. \code{from = "xref/RN"} will query
+#' \code{ProbeID} | \code{PatentID})", e.g. \code{from = "xref/RN"} will query
#' by CAS RN.
#' @details <fast search> is either \code{fastformula} or it is assembled as
-#' "{\code{fastidentity} | \code{fastsimilarity_2d} | \code{fastsimilarity_3d} |
-#' \code{fastsubstructure} | \code{fastsuperstructure}}/{\code{smiles} |
-#' \code{smarts} | \code{inchi} | \code{sdf} | \code{cid}}", e.g.
+#' "(\code{fastidentity} | \code{fastsimilarity_2d} | \code{fastsimilarity_3d} |
+#' \code{fastsubstructure} | \code{fastsuperstructure})/(\code{smiles} |
+#' \code{smarts} | \code{inchi} | \code{sdf} | \code{cid})", e.g.
#' \code{from = "fastidentity/smiles"}.
#' @details \code{<source id>} is any valid PubChem Data Source ID. When
#' \code{from = "sourceid/<source id>"}, the query is the ID of the substance in
@@ -46,8 +46,8 @@
#' depositor names. Depositor names are not case sensitive.
#' @details Depositor names and Data Source IDs can be found at
#' \url{https://pubchem.ncbi.nlm.nih.gov/sources/}.
-#' @details \code{<assay target>} is assembled as "\code{target}/\{\code{gi} |
-#' \code{proteinname} | \code{geneid} | \code{genesymbol} | \code{accession}\}",
+#' @details \code{<assay target>} is assembled as "\code{target}/(\code{gi} |
+#' \code{proteinname} | \code{geneid} | \code{genesymbol} | \code{accession})",
#' e.g. \code{from = "target/geneid"} will query by GeneID.
#' @references Wang, Y., J. Xiao, T. O. Suzek, et al. 2009 PubChem: A Public
#' Information System for
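The `@details` text in R/pubchem.R describes `<structure search>` as one search type joined to one input type with a slash. As a rough illustration of that grammar, the base R sketch below assembles and validates such a string. `structure_from()` is a hypothetical helper and is not part of webchem; the two vectors of alternatives are copied from the documentation text in the diff.

```r
# Alternatives copied from the @details text for <structure search>.
search_types <- c("substructure", "superstructure", "similarity", "identity")
input_types  <- c("smiles", "inchi", "sdf", "cid")

# Hypothetical helper (not part of webchem): build a `from` value and
# fail early if either part is not one of the documented alternatives.
structure_from <- function(search, input) {
  search <- match.arg(search, search_types)
  input  <- match.arg(input, input_types)
  paste(search, input, sep = "/")
}

from <- structure_from("substructure", "smiles")

# Every combination the grammar allows (4 search types x 4 input types).
all_from <- as.vector(outer(search_types, input_types, paste, sep = "/"))
```

The same pattern would apply to the `xref/...`, `target/...` and fast search forms, each with its own list of alternatives.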
6 changes: 3 additions & 3 deletions R/srs.R
@@ -32,7 +32,7 @@ srs_query <-
if (!ping_service("srs")) stop(webchem_message("service_down"))
names(query) <- query
from <- match.arg(from)
-entity_url <- "https://cdxnodengn.epa.gov/cdx-srs-rest/"
+entity_url <- "https://cdxapps.epa.gov/oms-substance-registry-services/rest-api"
if (from == "cas"){
query <- as.cas(query, verbose = verbose)
}
@@ -55,12 +55,12 @@ srs_query <-
}
if (verbose) message(httr::message_for_status(response))
if (response$status_code == 200) {
-text_content <- httr::content(response, "text")
+text_content <- httr::content(response, "text", encoding = "utf-8")
if (text_content == "[]") {
if (verbose) webchem_message("not_available")
return(NA)
} else {
-jsonlite::fromJSON(text_content)
+tibble::as_tibble(jsonlite::fromJSON(text_content))
}
} else {
return(NA)
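The R/srs.R change keeps the existing branch on an empty JSON array (`"[]"`) and converts any other payload to a table. The sketch below reproduces that branch offline with a made-up payload. `parse_srs_text()` is a hypothetical helper, the field names in `sample_text` are illustrative and not the SRS schema, and base `as.data.frame()` stands in for the `tibble::as_tibble()` call used in the PR so that only jsonlite is required.

```r
library(jsonlite)

# Hypothetical helper mirroring the status-200 branch of srs_query():
# an empty JSON array means "no result", anything else becomes a table.
parse_srs_text <- function(text_content) {
  if (identical(text_content, "[]")) {
    return(NA)
  }
  # The PR uses tibble::as_tibble(); as.data.frame() is a stand-in here.
  as.data.frame(jsonlite::fromJSON(text_content))
}

# Made-up payload for illustration only.
sample_text <- '[{"name": "example substance", "id": 1}]'
parsed <- parse_srs_text(sample_text)
empty  <- parse_srs_text("[]")
```

Returning a table type consistently means downstream code can rely on `nrow()` and column access regardless of how many records come back.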
64 changes: 44 additions & 20 deletions R/utils.R
@@ -11,6 +11,9 @@
#' @param x character; input InChIKey
#' @param type character; How should be checked? Either, by format (see above)
#' ('format') or by ChemSpider ('chemspider').
+#' @param apikey character; your API key. If NULL (default),
+#' \code{cs_check_key()} will look for it in .Renviron or .Rprofile. Only
+#' used when `type = "chemspider"`.
#' @param verbose logical; print messages during processing to console?
#' @return a logical
#'
@@ -31,24 +34,32 @@
#' is.inchikey('BQJCRHHNABKAKU/KBQPJGBKSA/N')
#' is.inchikey('BQJCRHHNABKAKU-KBQPJGBKXA-N')
#' is.inchikey('BQJCRHHNABKAKU-KBQPJGBKSB-N')
-is.inchikey = function(x, type = c('format', 'chemspider'),
-verbose = getOption("verbose")) {
+is.inchikey = function(
+x,
+type = c('format', 'chemspider'),
+apikey = NULL,
+verbose = getOption("verbose")
+) {
# x <- 'BQJCRHHNABKAKU-KBQPJGBKSA-N'
if (length(x) > 1) {
stop('Cannot handle multiple input strings.')
}

type <- match.arg(type)
-out <- switch(type,
-format = is.inchikey_format(x, verbose = verbose),
-chemspider = is.inchikey_cs(x, verbose = verbose))
+out <- switch(
+type,
+format = is.inchikey_format(x, verbose = verbose),
+chemspider = is.inchikey_cs(x, apikey = apikey, verbose = verbose)
+)
return(out)
}
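For readers unfamiliar with what `type = 'format'` checks, the base R sketch below shows one way a format-only test could be written. It is a stand-in under stated assumptions and not webchem's `is.inchikey_format()`: `looks_like_inchikey()` is a hypothetical name, and the pattern assumes the standard InChIKey layout of 14 letters, a hyphen, 8 letters followed by a flag (`S` or `N`) and the version letter `A`, a hyphen, and one final letter.

```r
# Hypothetical stand-in for a format-only check (not webchem's code).
# Assumed layout: 14 letters - 8 letters + flag (S/N) + version (A) - 1 letter
looks_like_inchikey <- function(x) {
  grepl("^[A-Z]{14}-[A-Z]{8}[SN]A-[A-Z]$", x)
}

ok          <- looks_like_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N")
bad_sep     <- looks_like_inchikey("BQJCRHHNABKAKU/KBQPJGBKSA/N")
bad_flag    <- looks_like_inchikey("BQJCRHHNABKAKU-KBQPJGBKXA-N")
bad_version <- looks_like_inchikey("BQJCRHHNABKAKU-KBQPJGBKSB-N")
```

A check like this needs no network access or API key, which is why the ChemSpider route is the only one affected by the new `apikey` argument.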


#' Check if input is a valid inchikey using ChemSpider API
#'
#' @param x character; input string
+#' @param apikey character; your API key. If NULL (default),
+#' \code{cs_check_key()} will look for it in .Renviron or .Rprofile.
#' @param verbose logical; print messages during processing to console?
#' @return a logical
#'
@@ -65,9 +76,15 @@ is.inchikey = function(x, type = c('format', 'chemspider'),
#' is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKXA-N')
#' is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKSB-N')
#' }
-is.inchikey_cs <- function(x, verbose = getOption("verbose")){
-
-if (!ping_service("cs_web")) stop(webchem_message("service_down"))
+is.inchikey_cs <- function(
+x,
+apikey = NULL,
+verbose = getOption("verbose")
+){
+if (is.null(apikey)) {
+apikey <- cs_check_key()
+}
+if (!ping_service("cs")) stop(webchem_message("service_down"))

if (length(x) > 1) {
stop('Cannot handle multiple input strings.')
@@ -76,13 +93,20 @@ is.inchikey_cs <- function(x, verbose = getOption("verbose")){
if (verbose) webchem_message("na")
return(NA)
}
-baseurl <- 'http://www.chemspider.com/InChI.asmx/IsValidInChIKey?'
-qurl <- paste0(baseurl, 'inchi_key=', x)
-webchem_sleep(type = 'scrape')
+qurl <- 'https://api.rsc.org/compounds/v1/tools/validate/inchikey'
+headers <- c(
+"Accept" = "application/json",
+"Content-Type" = "application/json",
+"apikey" = apikey
+)
+body <- list("inchikey" = x) |> jsonlite::toJSON(auto_unbox = TRUE)
+webchem_sleep(type = 'API')
if (verbose) webchem_message("query", x, appendLF = FALSE)
-res <- try(httr::RETRY("GET",
-qurl,
-httr::user_agent(webchem_url()),
+res <- try(httr::RETRY("POST",
+url = qurl,
+httr::add_headers(.headers = headers),
+body = body,
+encode = "json",
terminate_on = 404,
quiet = TRUE), silent = TRUE)
if (inherits(res, "try-error")) {
@@ -91,13 +115,13 @@ is.inchikey_cs <- function(x, verbose = getOption("verbose")){
}
if (verbose) message(httr::message_for_status(res))
if (res$status_code == 200){
-h <- xml2::read_xml(res)
-out <- as.logical(xml_text(h))
-return(out)
-}
-else {
-return(NA)
+out <- as.logical(httr::content(res))
+} else if (res$status_code == 400) {
+out <- FALSE
+} else {
+out <- NA
}
+return(out)
}


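The rewritten `is.inchikey_cs()` sends a JSON body in a POST request and then maps the HTTP status to a logical result. The sketch below isolates those two steps so they can be tried without a network connection or an API key. `interpret_validation()` is a hypothetical helper, and the shape of the parsed response (`list(valid = TRUE)`) is an assumption for illustration, not something the diff confirms.

```r
library(jsonlite)

# Request body as built in the PR: a single-field JSON object.
x <- "BQJCRHHNABKAKU-KBQPJGBKSA-N"
body <- jsonlite::toJSON(list(inchikey = x), auto_unbox = TRUE)

# Hypothetical helper: same status handling as the new code path.
# 200 -> use the parsed answer, 400 -> invalid key, anything else -> NA.
interpret_validation <- function(status_code, parsed = NULL) {
  if (status_code == 200) {
    as.logical(parsed)
  } else if (status_code == 400) {
    FALSE
  } else {
    NA
  }
}

valid   <- interpret_validation(200, list(valid = TRUE))  # assumed response shape
invalid <- interpret_validation(400)
unknown <- interpret_validation(500)
```

Treating a 400 response as `FALSE` rather than `NA` distinguishes "the service rejected this key as malformed" from "the service could not be reached or failed", which is the distinction the three branches in the diff appear to draw.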
43 changes: 1 addition & 42 deletions R/webchem-package.R
@@ -4,48 +4,7 @@
#' of web APIs for chemical information.
#'
#' @docType package
-#' @name webchem
-#' @importFrom methods is
-#' @importFrom utils globalVariables
-if (getRversion() >= "2.15.1")
-globalVariables(c("."))
+"_PACKAGE"



-#' Organic plant protection products in the river Jagst / Germany in 2013
-#'
-#' This dataset comprises environmental monitoring data of organic plant protection products
-#' in the year 2013 in the river Jagst, Germany.
-#' The data is publicly available and can be retrieved from the
-#' LUBW Landesanstalt für Umwelt, Messungen und Naturschutz Baden-Württemberg.
-#' It has been preprocessed and comprises measurements of 34 substances.
-#' Substances without detects have been removed.
-#' on 13 sampling occasions.
-#' Values are given in ug/L.
-#'
-#' @format A data frame with 442 rows and 4 variables:
-#' \describe{
-#' \item{date}{sampling data}
-#' \item{substance}{substance names}
-#' \item{value}{concentration in ug/L}
-#' \item{qual}{qualifier, indicating values < LOQ}
-#' }
-#' @source \url{https://udo.lubw.baden-wuerttemberg.de/public/pages/home/index.xhtml}
-"jagst"
-
-
-#' Acute toxicity data from U.S. EPA ECOTOX
-#'
-#' This dataset comprises acute ecotoxicity data of 124 insecticides.
-#' The data is publicly available and can be retrieved from the EPA ECOTOX database
-#' (\url{https://cfpub.epa.gov/ecotox/})
-#' It comprises acute toxicity data (D. magna, 48h, Laboratory, 48h) and has been
-#' preprocessed (remove non-insecticides, aggregate multiple value, keep only numeric data etc).
-#'
-#' @format A data frame with 124 rows and 2 variables:
-#' \describe{
-#' \item{cas}{CAS registry number}
-#' \item{value}{LC50value}
-#' }
-#' @source \url{https://cfpub.epa.gov/ecotox/}
-"lc50"
1 change: 1 addition & 0 deletions R/zzz.R
@@ -0,0 +1 @@
if (getRversion() >= "2.15.1") utils::globalVariables(c("."))
20 changes: 10 additions & 10 deletions man/get_cid.Rd
5 changes: 5 additions & 0 deletions man/is.inchikey.Rd
5 changes: 4 additions & 1 deletion man/is.inchikey_cs.Rd
2 changes: 1 addition & 1 deletion man/jagst.Rd
2 changes: 1 addition & 1 deletion man/lc50.Rd