Improve DESCRIPTION #47

Merged (14 commits), Jul 10, 2015
30 changes: 18 additions & 12 deletions DESCRIPTION
@@ -1,4 +1,5 @@
Package: RSocrata
Type: Package
Title: Download 'Socrata' Data Sets as R Data Frames
Description: Provides easier interaction with
Socrata open data portals http://dev.socrata.com.
@@ -8,18 +9,23 @@ Description: Provides easier interaction with
returns an R data frame.
Converts dates to 'POSIX' format.
Manages throttling by 'Socrata'.
Version: 1.6.1-2
Date: 2015-6-5
URL: https://github.com/Chicago/RSocrata
BugReports: https://github.com/Chicago/RSocrata/issues
Imports:
httr (>= 0.3),
jsonlite (>= 0.9.14),
mime (>= 0.2),
Version: 1.6.2
Date: 2015-6-8
Authors@R: c(
Contributor

We'll actually change this back to the old Author before pushing to CRAN. The CRAN guys are really picky that my email address in Authors@R match that of the Maintainer's email address. However, I like to separate the emails for general contact information from contacting me for package maintenance (CRAN will send e-mail blasts and I also use the email address in other projects I maintain). I prefer the Authors@R, but was getting cross with the CRAN submission process so used the old style.
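
The relationship between `Authors@R` and the Maintainer field mentioned in this comment can be sketched in base R. This is a minimal illustration, assuming placeholder names and a hypothetical address (`maintainer@example.org`), none of which come from the package. It uses only `utils::person()` and shows that the entry with the `"cre"` role is the one that supplies the maintainer's name and email.

```r
# Placeholder people; names and address are illustrative, not the package's real metadata.
authors <- c(
  person("Ada", "Author", role = "aut"),
  person("Max", "Maintainer", role = "cre", email = "maintainer@example.org")
)

# Find the entry carrying the "cre" (creator/maintainer) role.
is_cre <- vapply(seq_along(authors),
                 function(i) "cre" %in% authors[i]$role,
                 logical(1))
maintainer <- authors[is_cre]

# Build a "Name <email>" string in the shape used by the Maintainer field.
maintainer_field <- sprintf("%s %s <%s>",
                            maintainer$given, maintainer$family, maintainer$email)
print(maintainer_field)
```

Keeping a separate contact address for general questions would therefore mean putting it somewhere other than the `"cre"` entry, since that entry's email is the one tied to maintenance.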

person("Hugh", "Devlin, Ph. D.", role = c("aut")),
Member

Shouldn't we keep the URL and BugReports?

Contributor

It's a specious remove. The lines are "added" (i.e., kept) on lines 29 and 30 in the new file.

person("Tom", "Schenk", role = c("cre"), email = "[email protected]"),
person("John", "Malc", email = "[email protected]", role = c("ctb"), comment = "@dmpe")
)
Maintainer: Tom Schenk <[email protected]>
Depends:
curl (>= 0.5)
R (>= 3.0.0)
Imports:
httr (>= 1.0.0),
jsonlite (>= 0.9.16),
Contributor Author

to check

mime (>= 0.3)
Suggests:
RUnit
Author: Hugh Devlin, Ph. D. and Tom Schenk, Jr.
Maintainer: Tom Schenk Jr <[email protected]>
RUnit,
roxygen2 (>= 4.1.0)
License: MIT + file LICENSE
URL: https://github.com/Chicago/RSocrata
BugReports: https://github.com/Chicago/RSocrata/issues
16 changes: 11 additions & 5 deletions NAMESPACE
@@ -1,8 +1,14 @@
# Generated by roxygen2 (4.1.1): do not edit by hand

export(fieldName)
export(ls.socrata)
export(posixify)
export(read.socrata)
export(ls.socrata)
importFrom("httr", "parse_url", "build_url", "http_status", "stop_for_status", "GET", "content")
importFrom("mime", "guess_type")
importFrom("jsonlite", "fromJSON")
import("curl")
importFrom(httr,GET)
importFrom(httr,build_url)
importFrom(httr,content)
importFrom(httr,http_status)
importFrom(httr,parse_url)
importFrom(httr,stop_for_status)
importFrom(jsonlite,fromJSON)
importFrom(mime,guess_type)
111 changes: 61 additions & 50 deletions R/RSocrata.R
@@ -10,7 +10,7 @@
#' Time-stamped message
#'
#' Issue a time-stamped, origin-stamped log message.
#' @param s a string
#' @param s - a string
#' @return None (invisible NULL) as per cat
#' @author Hugh J. Devlin \email{Hugh.Devlin@@cityofchicago.org}
logMsg <- function(s) {
@@ -23,7 +23,7 @@ logMsg <- function(s) {
#' supported by Socrata. It will provide an exception if the syntax
#' does not align to Socrata unique identifiers. It only checks for
#' the validity of the syntax, but does not check if it actually exists.
#' @param fourByFour a string; character vector of length one
#' @param fourByFour - a string; character vector of length one
#' @return TRUE if is valid Socrata unique identifier, FALSE otherwise
#' @author Tom Schenk Jr \email{tom.schenk@@cityofchicago.org}
isFourByFour <- function(fourByFour) {
@@ -43,14 +43,15 @@ isFourByFour <- function(fourByFour) {
#' URL. Will accept queries with optional API token as a separate
#' argument or will also accept API token in the URL query. Will
#' resolve conflicting API token by deferring to original URL.
#' @param url a string; character vector of length one
#' @param app_token a string; SODA API token used to query the data
#' @param url - a string; character vector of length one
#' @param app_token - a string; SODA API token used to query the data
#' portal \url{http://dev.socrata.com/consumers/getting-started.html}
#' @return a valid Url
#' @return a - valid Url
#' @importFrom httr parse_url build_url
#' @author Tom Schenk Jr \email{tom.schenk@@cityofchicago.org}
validateUrl <- function(url, app_token) {
url <- as.character(url)
parsedUrl <- httr::parse_url(url)
parsedUrl <- parse_url(url)
if(is.null(parsedUrl$scheme) | is.null(parsedUrl$hostname) | is.null(parsedUrl$path))
stop(url, " does not appear to be a valid URL.")
if(!is.null(app_token)) { # Handles the addition of API token and resolves invalid uses
@@ -67,14 +68,14 @@ validateUrl <- function(url, app_token) {
})
}
if(substr(parsedUrl$path, 1, 9) == 'resource/') {
return(httr::build_url(parsedUrl)) # resource url already
return(build_url(parsedUrl)) # resource url already
}
fourByFour <- basename(parsedUrl$path)
if(!isFourByFour(fourByFour))
stop(fourByFour, " is not a valid Socrata dataset unique identifier.")
else {
parsedUrl$path <- paste('resource/', fourByFour, '.csv', sep="")
httr::build_url(parsedUrl)
build_url(parsedUrl)
}
}
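
The last branch of `validateUrl` turns a human-friendly path into a resource path. The following is a minimal base-R sketch of that one step only, assuming a hypothetical path. The regular expression is a simplified stand-in, since the body of `isFourByFour` is not shown in this diff.

```r
# Hypothetical human-friendly path; the trailing segment is the dataset identifier.
path <- "Government/Example-Dataset/4334-bgaj"

# Simplified stand-in for isFourByFour() (an assumption, not the package's code):
# four alphanumerics, a dash, four alphanumerics.
looks_like_four_by_four <- function(id) {
  grepl("^[[:alnum:]]{4}-[[:alnum:]]{4}$", id)
}

four_by_four <- basename(path)
if (!looks_like_four_by_four(four_by_four))
  stop(four_by_four, " is not a valid Socrata dataset unique identifier.")

# Same construction as in validateUrl: resource/<id>.csv
resource_path <- paste("resource/", four_by_four, ".csv", sep = "")
print(resource_path)
```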

@@ -84,19 +85,19 @@ validateUrl <- function(url, app_token) {
#' as it might appear in the first row of data,
#' to field name as it might appear in the HTTP header;
#' that is, lower case, periods replaced with underscores#'
#' @param humanName a Socrata human-readable column name
#' @param humanName - a Socrata human-readable column name
#' @return Socrata field name
#' @export
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' @examples
#' #fieldName("Number.of.Stations") # number_of_stations
#' fieldName("Number.of.Stations") # number_of_stations
fieldName <- function(humanName) {
tolower(gsub('\\.', '_', as.character(humanName)))
}

#' Convert Socrata calendar_date string to POSIX
#'
#' @param x character vector in one of two Socrata calendar_date formats
#' @param x - character vector in one of two Socrata calendar_date formats
#' @return a POSIX date
#' @export
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
@@ -110,35 +111,39 @@ posixify <- function(x) {
strptime(x, format="%m/%d/%Y %I:%M:%S %p") # long date-time format
}

# Wrap httr GET in some diagnostics
#
# In case of failure, report error details from Socrata
#
# @param url Socrata Open Data Application Program Interface (SODA) query
# @return httr response object
# @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' Wrap httr GET in some diagnostics
#'
#' In case of failure, report error details from Socrata
#'
#' @param url - Socrata Open Data Application Program Interface (SODA) query
#' @return httr response object
#' @importFrom httr http_status GET content stop_for_status
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' @noRd
getResponse <- function(url) {
response <- httr::GET(url)
status <- httr::http_status(response)
response <- GET(url)
status <- http_status(response)
if(response$status_code != 200) {
msg <- paste("Error in httr GET:", response$status_code, response$headers$statusmessage, url)
if(!is.null(response$headers$`content-length`) && (response$headers$`content-length` > 0)) {
details <- httr::content(response)
details <- content(response)
msg <- paste(msg, details$code[1], details$message[1])
}
logMsg(msg)
}
httr::stop_for_status(response)
stop_for_status(response)
response
}

# Content parsers
#
# Return a data frame for csv
#
# @author Hugh J. Devlin \email{Hugh.Devlin@@cityofchicago.org}
# @param an httr response object
# @return data frame, possibly empty
#' Content parsers
#'
#' Return a data frame for csv
#'
#' @author Hugh J. Devlin \email{Hugh.Devlin@@cityofchicago.org}
#' @importFrom httr content
#' @param response - an httr response object
#' @return data frame, possibly empty
#' @noRd
getContentAsDataFrame <- function(response) { UseMethod('response') }
getContentAsDataFrame <- function(response) {
mimeType <- response$header$'content-type'
@@ -147,49 +152,53 @@ getContentAsDataFrame <- function(response) {
if(sep != -1) mimeType <- substr(mimeType, 0, sep[1] - 1)
switch(mimeType,
'text/csv' =
httr::content(response), # automatic parsing
content(response), # automatic parsing
'application/json' =
if(httr::content(response, as='text') == "[ ]") # empty json?
if(content(response, as='text') == "[ ]") # empty json?
data.frame() # empty data frame
else
data.frame(t(sapply(httr::content(response), unlist)), stringsAsFactors=FALSE)
data.frame(t(sapply(content(response), unlist)), stringsAsFactors=FALSE)
) # end switch
}

# Get the SoDA 2 data types
#
# Get the Socrata Open Data Application Program Interface data types from the http response header
# @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
# @param responseHeaders headers attribute from an httr response object
# @return a named vector mapping field names to data types
#' Get the SoDA 2 data types
#'
#' Get the Socrata Open Data Application Program Interface data types from the http response header
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' @param response - headers attribute from an httr response object
#' @return a named vector mapping field names to data types
#' @importFrom jsonlite fromJSON
#' @noRd
getSodaTypes <- function(response) { UseMethod('response') }
getSodaTypes <- function(response) {
result <- jsonlite::fromJSON(response$headers[['x-soda2-types']])
names(result) <- jsonlite::fromJSON(response$headers[['x-soda2-fields']])
result <- fromJSON(response$headers[['x-soda2-types']])
names(result) <- fromJSON(response$headers[['x-soda2-fields']])
result
}

#' Get a full Socrata data set as an R data frame
#'
#' Manages throttling and POSIX date-time conversions
#'
#' @param url A Socrata resource URL,
#' @param url - A Socrata resource URL,
#' or a Socrata "human-friendly" URL,
#' or Socrata Open Data Application Program Interface (SODA) query
#' requesting a comma-separated download format (.csv suffix),
#' May include SoQL parameters,
#' but is assumed to not include a SODA offset parameter
#' @param app_token a string; SODA API token used to query the data
#' @param app_token - a string; SODA API token used to query the data
#' portal \url{http://dev.socrata.com/consumers/getting-started.html}
#' @return an R data frame with POSIX dates
#' @export
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' @examples
#' df <- read.socrata("http://soda.demo.socrata.com/resource/4334-bgaj.csv")
#' @importFrom httr parse_url build_url
#' @importFrom mime guess_type
#' @export
read.socrata <- function(url, app_token = NULL) {
validUrl <- validateUrl(url, app_token) # check url syntax, allow human-readable Socrata url
parsedUrl <- httr::parse_url(validUrl)
mimeType <- mime::guess_type(parsedUrl$path)
parsedUrl <- parse_url(validUrl)
mimeType <- guess_type(parsedUrl$path)
if(!(mimeType %in% c('text/csv','application/json')))
stop("Error in read.socrata: ", mimeType, " not a supported data format.")
response <- getResponse(validUrl)
@@ -211,23 +220,25 @@ read.socrata <- function(url, app_token = NULL) {

#' List datasets available from a Socrata domain
#'
#' @param url A Socrata URL. This simply points to the site root.
#' @param url - A Socrata URL. This simply points to the site root.
#' @return an R data frame containing a listing of datasets along with
#' various metadata.
#' @export
#' @author Peter Schmiedeskamp \email{pschmied@@uw.edu}
#' @examples
#' df <- ls.socrata("http://soda.demo.socrata.com")
#' @importFrom jsonlite fromJSON
#' @importFrom httr parse_url
#' @export
ls.socrata <- function(url) {
url <- as.character(url)
parsedUrl <- httr::parse_url(url)
parsedUrl <- parse_url(url)
if(is.null(parsedUrl$scheme) | is.null(parsedUrl$hostname))
stop(url, " does not appear to be a valid URL.")
parsedUrl$path <- "data.json"
df <- jsonlite::fromJSON(httr::build_url(parsedUrl))
df <- fromJSON(build_url(parsedUrl))
df <- as.data.frame(df$dataset)
df$issued <- as.POSIXct(df$issued)
df$modified <- as.POSIXct(df$modified)
df$theme <- as.character(df$theme)
df
}
}
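
The post-processing at the end of `ls.socrata` can be tried without a network call. This is a minimal sketch on a hypothetical stand-in for the `dataset` element of a `data.json` listing; the column values are invented, and `tz = "UTC"` is an assumption added here for reproducibility (the function above calls `as.POSIXct` without it).

```r
# Hypothetical stand-in for the `dataset` element of a data.json listing.
listing <- data.frame(
  title    = c("Example A", "Example B"),
  issued   = c("2015-06-05", "2015-06-08"),
  modified = c("2015-07-01", "2015-07-10"),
  stringsAsFactors = FALSE
)

# Convert the date strings to POSIXct, as ls.socrata does for issued/modified.
listing$issued   <- as.POSIXct(listing$issued, tz = "UTC")
listing$modified <- as.POSIXct(listing$modified, tz = "UTC")

str(listing)
```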
2 changes: 1 addition & 1 deletion README.md
@@ -72,7 +72,7 @@ If you would like to contribute to this project, please see the [contributing do
1.4 Add json file format for Socrata downloads. Switch to RJSONIO from rjson.
Member

We should correct this too. We went from RJSONIO to rjson


1.5 Several changes:
* Swapped ```jsonlite``` to ```RJSONIO```
* Swapped ```jsonlite``` from ```RJSONIO```
* Added handling for long and short dates
* Added unit test for reading private datasets

1 change: 1 addition & 0 deletions RSocrata.Rproj
@@ -13,6 +13,7 @@ RnwWeave: Sweave
LaTeX: pdfLaTeX

BuildType: Package
PackageUseDevtools: Yes
Contributor Author

To investigate.

PackageInstallArgs: --no-multiarch --with-keep.source
PackageCheckArgs: --as-cran
PackageRoxygenize: rd,collate,namespace