
Submission: rnassqs - access the USDA-NASS 'Quick Stats' data through their API #298

Closed
14 of 25 tasks
potterzot opened this issue May 4, 2019 · 42 comments

@potterzot

potterzot commented May 4, 2019

Submitting Author: Nicholas A. Potter (@potterzot)
Repository: https://github.com/potterzot/rnassqs
Version submitted: 0.4.0
Editor: @lmullen
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: rnassqs
Type: Package
Title: Access the NASS 'Quick Stats' API
Version: 0.4.0.9000
Date: 2019-04-29
Authors@R: c(
  person('Nicholas', 'Potter', email='[email protected]', role = c('aut', 'cre')),
  person('Joseph', 'Stachelek', email='', role = c('ctb')),
  person('Julia', 'Piaskowski', email='', role = c('ctb'))) 
Maintainer: Nicholas Potter <[email protected]>
Description: Interface to access data via the United States Department of 
  Agricultre's National Agricultural Statistical Service (NASS) 'Quick Stats' 
  web API <https://quickstats.nass.usda.gov/api>. Convenience functions 
  facilitate building queries based on available parameters and valid parameter 
  values.
URL: https://github.com/potterzot/rnassqs
BugReports: http://www.github.com/potterzot/rnassqs/issues
License: MIT + file LICENSE
LazyData: TRUE
Language: en-US
Imports:
  httr,
  jsonlite,
Suggests:
  testthat,
  here,
  knitr,
  rmarkdown
RoxygenNote: 6.1.1
Encoding: UTF-8
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • database access
    • data munging
    • data deposition
    • reproducibility
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Data retrieval, because 'rnassqs' provides access to the NASS 'Quick Stats' API to fetch data.

  • Who is the target audience and what are scientific applications of this package?

Target audience is those who want to automate or reproducibly fetch data from 'Quick Stats', including agronomists, economists, and others working with agricultural data. Scientific applications include analysis of agricultural data by administrative region (e.g. county, state, watershed), economic analysis of policies that affect agriculture, and sociological/demographic analysis of agricultural producers over time.

None that I have been able to find.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

#297, responded to by @noamross

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.2662520
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@lmullen
Member

lmullen commented May 8, 2019

@potterzot I will be the editor for this peer review.

Here are my editorial checks.

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

In running R CMD check I get the following:

❯ checking top-level files ... NOTE
  Non-standard files/directories found at top level:
    ‘paper.bib’ ‘paper.md’
  • Please either add those files to .Rbuildignore or put them in inst/.

  • In paper.md there are a few typos, including these. You might look over the paper one more time as you correct these.

  • "be one of the most difficult stages of research make reproducible" should be "to make"

  • "explicitly constructing html GET requests" should be "HTTP GET requests." (HTTP is the protocol that often, but not usually in the case of an API like this one, returns HTML.)

  • Could you clarify, please, what is tested when the user has not provided an API key? For tests that you are skipping if no API key is available, is it possible to stub or mock the API? The testthat package includes some mocking functions, as well as testthat::expect_equal_to_reference() which can be used for that purpose.

  • After running spelling::spell_check_package() I note a number of misspellings. Please run that on the package yourself and correct the actual misspellings.

  • After running goodpractice::gp() there is this change needed to the DESCRIPTION:

✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite
    often. A build date will be added to the package when you perform `R CMD build` on
    it.

I am in the process of looking for peer reviewers.


Reviewers:
Due date:

@potterzot
Author

@lmullen thank you for your comments. I've made a commit to address them.

Regarding this note:

Could you clarify, please, what is tested when the user has not provided an API key? For tests that you are skipping if no API key is available, is it possible to stub or mock the API? The testthat package includes some mocking functions, as well as testthat::expect_equal_to_reference() which can be used for that purpose.

Tests in tests/testthat/test-oncran.R beginning on line 55 include mock API calls. The tests build the request, specifying that the function return the GET request URL rather than actually making the request, and that URL is then compared to the expected URL. There are three API paths to test:

There are additional mock API tests that follow those, but those are for convenience functions for making specific requests, e.g. nassqs_area and nassqs_yield, which wrap nassqs_GET.

Tests in tests/testthat/test-local.R make actual API calls using an API key, and are not possible on CRAN.

Is there a better way of organizing tests that makes it clear where the API mock tests are done and where the actual API call tests are done?
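In outline, the mock tests do something like the following. (This is a stand-in sketch: build_query_url() is a hypothetical name for illustration only; the real code path is nassqs_GET() returning the request URL instead of making the request.)

```r
# Sketch of the mock-URL testing idea: assemble the GET request URL without
# touching the network, then compare it against the expected string.
build_query_url <- function(params,
                            base_url = "https://quickstats.nass.usda.gov/api/api_GET") {
  # Collapse name=value pairs into a query string.
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  paste0(base_url, "?", query)
}

params <- list(commodity_desc = "CORN", year = 2012, state_alpha = "WA")
build_query_url(params)
```

A test then simply asserts that the assembled URL matches the known-correct URL, with no API key or network connection required.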

@lmullen
Member

lmullen commented May 9, 2019

@potterzot That sounds fine to me. I just wanted to make sure the reviewers and I understood.

@lmullen
Member

lmullen commented Jun 18, 2019

@potterzot Apologies for the delay in getting this review going. One person has agreed to review but a string of others have been unavailable at the start of the summer. Still looking for that second reviewer and then the review will begin.

@lmullen
Member

lmullen commented Jun 19, 2019

Thanks to our reviewers for agreeing to take on this package.

Reviewer: @adamhsparks
Reviewer: @nealrichardson
Due date: 2019-07-11

You can find the guide for reviewers here. Please let me know if you have any questions.

@adamhsparks
Member

I know I'm behind. Sorry, I've had a rather busy time lately. I'm starting the review today and will see how I go.

@adamhsparks
Member

adamhsparks commented Jul 13, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be auto-generated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 7

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

This package offers great functionality and defines very nicely why it's necessary. I'm happy to see this sort of package being written.

Following are my comments.

  • Spell check the package, in particular the DESCRIPTION file's Line 12, Agricultre

  • It's a bit of opsec, but I think the README and vignette should be more explicit about keeping the API keys secret and not embedding them in scripts. Along with this, highlighting why it's useful to use the .Renviron file for this purpose would be nice (you don't have to worry about adding it to .gitignore for one). For new users just how to modify the .Renviron is likely to be confusing at best and frustrating at worst. Perhaps it would be nice to include an example that shows how to use usethis::edit_r_environ() for this purpose to streamline the illustrations and examples.

  • The DESCRIPTION file's description text might need " ' " around "API". I had to do this for the nasapower package to have it accepted on CRAN.

  • I always encourage including a proper CITATION file in /inst, see https://github.com/ropensci/nasapower/blob/master/inst/CITATION for an example

  • The paper.md file has Git conflicts in it that need to be resolved

  • There is no statement about contributions or a code-of-conduct present

  • I feel the functionality for the end-user is a bit overly complicated. This sort of package, to me, should just return data in a data.frame, list, or some other R object that I can easily work with. In some cases I get a server response; in others I can ask for raw JSON or other file formats. I don't really see a good reason for this in R. The functionality could still be offered, but I think it should be hidden a few layers down for more advanced users who might want it. Rather, it should be a simple query for the data I want, returned in a format that I can readily use in R without extra steps: just fetch the data and format it for me in one single step. I think that's possible with this package, but the documentation is a bit convoluted and not clear to me.
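To illustrate the .Renviron suggestion above (usethis is only a convenience here; NASSQS_TOKEN is the variable name the package already reads):

```r
# usethis::edit_r_environ() opens ~/.Renviron for editing; add a line like
#   NASSQS_TOKEN=your-secret-key
# there, restart R, and the key is available in every session without ever
# appearing in a script or a git repository.
key <- Sys.getenv("NASSQS_TOKEN")
nchar(key) > 0  # TRUE once the key is set in .Renviron
```

This keeps the secret out of version control entirely, so there is nothing to remember to add to .gitignore.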

Comments on Vignette

When I try using nassqs_token() I receive an error message.

> library(rnassqs)
> nassqs_token()
Error in nassqs_token() : could not find function "nassqs_token"

This example works as expected,

nassqs_fields()
[1] "agg_level_desc"        "asd_code"             
 [3] "asd_desc"              "begin_code"           
 [5] "class_desc"            "commodity_desc"       
 [7] "congr_district_code"   "country_code"         
 [9] "country_name"          "county_ansi"          
[11] "county_code"           "county_name"          
[13] "CV"                    "domaincat_desc"       
[15] "domain_desc"           "end_code"             
[17] "freq_desc"             "group_desc"           
[19] "load_time"             "location_desc"        
[21] "prodn_practice_desc"   "reference_period_desc"
[23] "region_desc"           "sector_desc"          
[25] "short_desc"            "state_alpha"          
[27] "state_ansi"            "state_name"           
[29] "state_fips_code"       "statisticcat_desc"    
[31] "source_desc"           "unit_desc"            
[33] "util_practice_desc"    "Value"                
[35] "watershed_code"        "watershed_desc"       
[37] "week_ending"           "year"                 
[39] "zip_5"                

however,

?nassqs_fields()

returns a help file that says,

Deprecated: Return list of NASS QS parameters.
Description
Deprecated. Use nassqs_params() instead.

Suggest updating vignette to match most recent functionality.


Another error occurs with this example from the vignette.

> rnassqs::nassqs_field_values(field = 'unit_desc')
Error: 'nassqs_field_values' is not an exported object from 'namespace:rnassqs'

The "All together" section script is not functional.

fields <- nassq_fields()
Error in nassq_fields() : could not find function "nassq_fields"

Comments on functions

  • Function names are not consistent with GET being all caps but parse and check not. In the vignette text nassqs_parse() is referred to as PARSE in all caps. Consistency in function naming will help the end user.

  • It is entirely up to the package authors how to organise the functions, but I find the current structure confusing at best. Typically when I see a file with just the package name, it is just there to provide the help file for the package with author information, references and other basic info. The nassqs.R file has several functions in it for the package along with release_questions(), which I find to be extremely odd. This function is not something that should be in the package and exposed to end-users.

My suggestion is remove release_questions() entirely and split out the functions into their own files with the function name being the file-name. This helps make it easier to keep the functions organised and updated. Splitting the functions out is entirely up to the authors if they wish to implement this structure, however I feel that removing release_questions() is necessary.

  • Why is base_URL given as a parameter that the user can modify? The documentation even says it "probably" should not be changed. I would just hard-code it and not give the user any possibility of changing it. I can't see any good reason for doing this. If the URL changes, then the package should be updated to reflect the changes.

Comments on documentation

  • When packages are mentioned in the documentation, wrap them in \pkg{httr} for proper formatting to indicate that you are referring to a package. Likewise, R can be written as \R for special formatting.

  • The text \code{jsonlite::fromJSON} should be written as \code{\link[jsonlite]{fromJSON}} so that it links to the help file for this function.

  • Titles for function help files should be written in title case.

  • All exported functions should have examples in the documentation, nassqs_check() does not have any examples.

  • The example for nassqs_param_values() is commented out, making it difficult to follow. Examples should not be commented out and should be clear and easy to follow.

  # See all values available for the statisticcat_desc field. Values may not
  # be available in the context of other parameters you set, for example
  # a given state may not have any 'YIElD' in blueberries if they don't grow
  # blueberries in that state.
  # Requires an API key:
  #nassqs_param_values("statisticcat_desc", key = "my api key")

Should appear as

  # See all values available for the statisticcat_desc field. Values may not
  # be available in the context of other parameters you set, for example
  # a given state may not have any 'YIElD' in blueberries if they don't grow
  # blueberries in that state.
  # Requires an API key:
  
  nassqs_param_values("statisticcat_desc", key = "my api key")
  • I find the documentation for nassqs_GET() confusing. I don't understand the first example.
> params = list(commodity_name="CORN", 
+               year=2012, 
+               agg_level_desc = "STATE",
+               state_alpha = "WA",
+               statisticcat_desc = "YIELD")
> nassqs_GET(params)
Response [https://quickstats.nass.usda.gov/api/api_GET?key=XXXXXXXXXXXXXXXXXXXXXXX&commodity_name=CORN&year=2012&agg_level_desc=STATE&state_alpha=WA&statisticcat_desc=YIELD&format=JSON]
  Date: 2019-07-13 04:48
  Status: 200
  Content-Type: application/json
  Size: 148 kB

What do I do with this response value? How is this response yields for corn in 2012 in Washington?

Does the end-user even need to interface with this function or should it be hidden and used by the other functions in the package that return data in data.frames or other R objects?

  • nassqs_params() lacks examples

  • The example for nassqs_parse() could be easier to follow.

# Set parameters and make the request
params = list(commodity_name="CORN", 
              year=2012, 
              agg_level_desc = "STATE",
              state_alpha = "WA",
              statisticcat_desc = "YIELD")
req <- nassqs_GET(params)
nassqs_parse(req, as = "data.frame")

would be more clear as

# Set parameters and make the request
params <- list(
  commodity_name = "CORN",
  year = 2012,
  agg_level_desc = "STATE",
  state_alpha = "WA",
  statisticcat_desc = "YIELD"
)
req <- nassqs_GET(params)
corn <- nassqs_parse(req, as = "data.frame")
head(corn)
  • The as argument of nassqs_parse() is unclear. The documentation says it indicates the data type returned, but the possible values are only listed in the usage section, which suggests a list is possible, though that doesn't appear to be documented. The @return section says a data frame or raw text of the content from the request.

  • Any functions that query an external server for data and may fail or take an extended period of time to run should have the examples wrapped in a \donttest{} for CRAN but still allowing for local testing. I see most examples are wrapped but not all that run an external query.

Comments on code style

  • In most cases the code is clearly written; in some cases the style is inconsistent, with missing spaces around =, e.g. line 73 of nassqs.R. This also applies in the examples in the documentation. Also, single and double quotes are used interchangeably. I find it easier to follow if only one style is used in all cases, since there are situations where only single quotes may be used, and so on.

  • The operator used to assign in the examples also switches between = and <-. Only one should be used consistently.

  • Wrap code at 80 characters for ease of reading and for those of us that don't have editor windows that expand beyond 80 chars wide.

@nealrichardson

nealrichardson commented Jul 15, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Looks like I get to be the infamous "Reviewer 2" this time :)

Since Adam dutifully got his review in on time and I'm delayed a couple of days, I'll say at the start that I agree generally with all of his points (with one minor exception, noted below), and in the interest of brevity, I've tried to avoid reiterating comments he made.

I greatly appreciate packages like this and the effort that goes into making them. This is the kind of package I wish existed when I was doing my dissertation--would have made at least the data retrieval and management part of research a lot cleaner. So, thank you for your contribution.

Style and code-hygiene comments aside, my main suggestion to you is to think about how you can orient the package towards the R user and their needs. The purpose of this package should be to encapsulate your hard-earned knowledge of how this API works and enable data scientists to access that data as naturally as possible. However, the package currently seems to expose an R interface centered around the needs of the API and not the R user. For example, the functions that seem to make up the public interface for the package are mostly stuck in "helpers.R", as if they're ancillary when really they should be front and center.

Here's a more concrete example: reading the vignette discussion about the comparison operators, it looks like the way to get data on corn production in Virginia since 2012 is

nassqs(list(commodity_desc="CORN", year__GE=2012, state_alpha="VA"))

But when I think about the R code that I want to type to get that data, it looks more like

nassqs(commodity == "corn", state == "va", year >= 2012)

That is, not case-sensitive, handles the different ways commodities and states (for example) could be referenced (integer code, abbreviation, long name), and more naturally handles the comparison operators (can translate year >= 2012 to year__GE=2012). You could go even farther and have a dplyr syntax where you can filter() rows and select() columns, so that the R user is essentially describing the data.frame they want to get, and your package figures out how to turn that into API requests, handle pagination, etc., and return the shape of data that an R user expects.

I'm not saying you need to do exactly this/all of this now (though some things, like case insensitivity, would be easy enough), but more to suggest ideas for the future and to show what I mean in terms of thinking of the R interface and not just what the API demands. When I wrap APIs (and in general write packages), I like to start by thinking about the R code I want (users) to type, and that usually means doing more work in the package to mask the awkwardness of the underlying HTTP API.
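To sketch what I mean about translating the comparison operators (purely illustrative: translate_filters() and its operator table are made up here, not part of rnassqs):

```r
# Hypothetical sketch: capture filter expressions unevaluated and translate
# R comparison operators into the Quick Stats double-underscore suffixes
# (e.g. year >= 2012 becomes year__GE = "2012").
translate_filters <- function(...) {
  ops <- c("==" = "", ">=" = "__GE", "<=" = "__LE", ">" = "__GT", "<" = "__LT")
  exprs <- as.list(substitute(list(...)))[-1]  # the unevaluated expressions
  out <- list()
  for (e in exprs) {
    op    <- as.character(e[[1]])           # e.g. ">="
    field <- as.character(e[[2]])           # e.g. "year"
    value <- toupper(as.character(e[[3]]))  # the API's values are upper case
    out[[paste0(field, ops[[op]])]] <- value
  }
  out
}

translate_filters(commodity_desc == "corn", state_alpha == "va", year >= 2012)
# a list with elements named commodity_desc, state_alpha, and year__GE
```

The real thing would need to handle more operators and validate field names, but the point is that this translation layer can live in the package rather than in every user's script.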

Another way to get more R-user-centric would be to improve the documentation, which I found to be thin, often tautological (@param api_path the API path), and not helpful for an inexperienced user. Documentation is a bit of work but it's worth doing, especially for how it can help focus you on what the intended user needs to know in order to get value from the package--and how you can make what details they need to know as minimal as possible.

A good way to see what the docs will look like to your users is to use pkgdown to build the package website. You don't need to publish this (though it's nice if you do), but I've found that even doing it locally makes it really visible to me where my documentation is not the most helpful or beautiful.

Observations from using the package

I got a token and did the example on the readme (corn yields in Virginia and Pennsylvania since 2000).

> library(rnassqs)
> Sys.setenv(NASSQS_TOKEN="REDACTED")
> params <- list("commodity_desc"="CORN",
+                   "year__GE"=2000,
+                   "state_alpha"=c("VA", "PA"),
+                   "statisticcat_desc"="YIELD")
> df <- nassqs(params)

That all worked as expected. That said,

  • It was pretty slow. Slow enough that I wondered if the process was hung. The resulting data.frame is 4752 x 39, so not huge. I'm not sure what was slow, but it might be worth considering how to show progress. I believe httr has some tools for showing progress if it's slow because of the network, for example. But maybe network isn't the constraint. Granted, I made one request, so I'm not sure how widespread the issue is, but it was distracting.
  • It was not at all obvious to me that I would get 39 columns back. It would help if the documentation perhaps described the result, beyond saying "returns a data frame". And is it possible to request fewer columns? Some are clearly duplicate information (state_name, state_alpha).
  • It's odd to me that year is numeric but everything else in the data.frame is character, even when there are very obviously numeric columns in there. If I'm going to do anything useful with this crop yield data I just downloaded, the first thing I'm going to have to do is df$Value <- as.numeric(df$Value) so it seems worthwhile to return properly formatted data.
  • It was surprising that the one column name that the package code changes gets a rather unfriendly name ("CV (%)"), while the other columns follow a standard, well-behaved format.
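For the numeric-columns point above, one low-cost fix would be a single pass with utils::type.convert() before returning the data frame (a sketch on made-up data, not the package's actual return value):

```r
# Made-up miniature of the API result: everything arrives as character.
df <- data.frame(
  state_alpha = c("VA", "PA"),
  year        = c("2012", "2012"),
  Value       = c("123.4", "98.7"),
  stringsAsFactors = FALSE
)

# type.convert() makes numeric-looking columns numeric while leaving
# genuine text columns (state_alpha) as character.
df[] <- lapply(df, utils::type.convert, as.is = TRUE)
```

After this, df$Value is numeric and ready for analysis without the user having to coerce columns themselves.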

Specific code notes

  • General style
    • be consistent with single vs. double quotes
    • don't need to quote/backtick list names (unless there's a space, which isn't the case anywhere here) e.g. prefer list(param=param) over list("param"=param)
    • My one point of disagreement with Adam: it's ok to have nassqs_GET() capitalized: HTTP request methods are supposed to be capitalized (they're case sensitive, per the RFC). FWIW in httr, the request methods are capitalized while other acronyms (JSON etc.) are not.
    • #' @importFrom httr function_name rather than httr::function_name
    • Check out markdown formatting in roxygen2: will clean up your \code{\link{}}s
    • if (something) { result } inline is bad form: either drop {} or make multiline. See https://style.tidyverse.org/syntax.html#indenting
  • data.R:
    • can do nassqs_fields <- nassqs_params; no need to wrap it as function() nassqs_params()
    • Give documentation for what those params mean? Or link to API docs?
  • nassqs.R
    • Agree that release_questions() doesn't belong in the package
    • nassqs_parse
      • L127 docs not correct since it doesn't necessarily return a data frame
      • "as" list is different from nassqs
      • as "raw" is misleading; it's not raw in the R sense, it's "character". or as="text" if you want to be like httr
      • L153: when is not a response object? Also better to use inherits() rather than class ==. Also can return() early rather than if/else
      • L159 warning: can't be tested because if condition is met, L156 would have errored. Suggests better test coverage needed
      • L177 read.csv
      • L179: no else case?
    • nassqs_auth
      • IMO this function should just be function (key) Sys.setenv(NASSQS_TOKEN = key). What's the value of the interactive mode? It's reasonable to expect user to set auth at the beginning of the session.
      • Not sure the dual purpose (set and get token) is worth it. In places where you want to read the token, just Sys.getenv()
      • L214 don't need to end lines with ;
    • nassqs (and mostly relevant for helpers.R functions too)
      • Is this the function people should be calling? If so, the file would be more readable if you put this function at the top of the file
      • A nicer interface would accept the "params" as ... rather than require you to wrap them in list() yourself. I don't think you need to pass ... to nassqs_GET (if any of its args are important to expose, add them as named arguments)
      • Related, you can use @inheritParams to document the function args in only one place and pull them in in the other functions that have the same
    • nassqs_GET
      • L35 docs: make that an href
      • I'd remove key from the function signature and just Sys.getenv(). L75-78 is unnecessary complexity
      • I'd remove base_url from the signature (since it "should probably never be changed"). If you want it configurable for some reason, allow it to be set by an option and in the function get it like base_url <- getOption("rnasssqs.base_url", "https://quickstats.nass.usda.gov/api/")
      • I'd remove url_only from the signature. It's confusing to have a function that can return very different things based on an argument. If that's functionality you need, I'd factor out the URL query assembly, ending in build_url(), to its own function, and have rnassqs_GET() call that and then httr::GET() the result. If you only have this argument for unit testing, I have an alternative proposal (see below).
      • It's odd that the function allows "XML" format but rnassqs_parse errors if it gets XML.
      • Consider removing "format" from the signature altogether since it can be set in params. Also, why would I care what content-type I'm querying in if my end result is going to be a data.frame?
    • nassqs_check: what do I do if my request is too large? Give some recommendation for how to fix it. If it means that you need to make paginated requests, how would I do that? An even friendlier solution would be for you to handle the pagination for the user so they don't need to know/care about this API constraint.
  • helpers.R
    • expand_list() is probably the only function there that fits with what I'd expect helpers to be (i.e. internal functions). I'd give it @keywords internal so it doesn't list in the help index, and possibly just document it with comments and not formal documentation. (And if you're going to document it, explain what it actually does, which is not obvious.)
  • Vignette
    • .secret file is used in tests and supporting code, and it's discussed in the vignette, but it's not actually supported in the package (i.e. the nassqs_auth() doesn't look for a .secret file). And IIUC the vignette recommendation is incomplete: you'd still have to set or pass it in even if you read it in with readLines(). IMO I'd just drop that discussion and rely on the environment variable, or say that if you wanted to store it in some file (called whatever), you could read it and set it in your session like Sys.setenv(NASSQS_TOKEN = readLines(file)).
    • Don't rnassqs:: your own functions, not necessary
    • Looks like you should Suggest assertthat since your vignette uses it.
  • Tests
    • oncran
      • You may find the httptest package useful for these tests. Rather than using the url_only argument (which I suspect you only have for tests), you could use expect_GET(nassqs(...), url)
      • httptest would also let you supply mock responses so you could test the full request/response/parse flow naturally. What you have with the .rds testdata files works, of course, but httptest could make those tests more readable and maintainable.
      • FYI there is testthat::skip_if(), and you could of course define your own skip_if_interactive() (but if you take my suggestion to drop the interactive behavior of the auth function, then you can just delete these tests)
      • withr::with_envvar can help manipulate environment variables within tests (withr is depended on by testthat, so it's a "free" dependency, in case that's a concern)
      • L133-151: maybe you want one of these to assert that you can get the "text" content, but mostly you're just asserting that httr::content() works.
    • local
      • Again, I'd drop the .secret file (and thus the here dependency)
      • L37: is.numeric(as.numeric(...)) is always true so this test doesn't do anything; also for future reference, testthat::expect_true may help in some places
      • L56: httptest would let you test handling of error responses without a network connection
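Taken together, the httptest and withr suggestions above might look something like the following test sketch (the request URL and fixture behavior here are illustrative assumptions, not the package's actual tests):

```r
library(testthat)
library(httptest)

test_that("nassqs() makes the expected request", {
  # expect_GET() asserts on the request that would be sent,
  # without any network connection (illustrative URL)
  expect_GET(
    nassqs(list(commodity_desc = "CORN", year = 2012)),
    "https://quickstats.nass.usda.gov/api/api_GET/"
  )
})

with_mock_api({
  test_that("responses are parsed from recorded fixtures", {
    # Responses are read from mock files on disk instead of the live API
    d <- nassqs(list(commodity_desc = "CORN", year = 2012))
    expect_s3_class(d, "data.frame")
  })
})

test_that("the API key is read from the environment", {
  # withr::with_envvar() sets the variable for the duration of the block
  # and restores the previous value afterwards
  withr::with_envvar(c(NASSQS_TOKEN = "fake-key"), {
    expect_identical(Sys.getenv("NASSQS_TOKEN"), "fake-key")
  })
})
```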

Miscellaneous

  • Should add stats and utils to Imports in DESCRIPTION
  • R CMD check shows this NOTE--maybe because the file is huge? Consider (re)moving the link, obfuscating it so that CRAN doesn't try to download it, or something.
Found the following (possibly) invalid URLs:
  URL: ftp://ftp.nass.usda.gov/quickstats
    From: README.md
    Status: Error
    Message: libcurl error code 6:
      	Could not resolve host: ftp.nass.usda.gov
  • Would be good to add rnassqs.Rcheck and rnassqs_*.tar.gz to .Rbuildignore and .gitignore so that artifacts from R CMD build and check locally don't get checked in
  • .Rbuildignore includes a misspelling (paoer.pdf)
  • Re: spelling, check out the spelling package.

@lmullen
Copy link
Member

lmullen commented Jul 16, 2019

@adamhsparks and @nealrichardson: Thanks to both of you for these thorough and detailed reviews.

@potterzot Please take a look at these comments from the reviewers. There is lots of good advice in here. It does seem like some of it will require some pretty fundamental reassessment of the package's user-facing interface. Do you think you can make revisions within two weeks, our typical deadline? i.e., by July 29?

@potterzot
Copy link
Author

@lmullen, @adamhsparks, @nealrichardson first let me say a big thanks, these are some really good suggestions and you clearly put in a lot of time to give some great feedback. While some of the changes are substantial, I hope that the underlying framework will make them relatively easy to implement. I think I can probably submit a revision by July 29th. Is it possible to ask for an extension if that becomes necessary?

@lmullen
Copy link
Member

lmullen commented Jul 16, 2019

@potterzot Sure. If you can plan on a July 29 deadline, that would be best, but an extension would be fine if it becomes necessary.

@adamhsparks
Copy link
Member

@potterzot, I hope you find my review useful and not being negative. This is an extremely valuable package, my interest is in helping you improve it. Feel free to ask for help or guidance along the way. I'm happy to contribute.

@potterzot
Copy link
Author

Hi @adamhsparks, thank you for your time and willingness to help. I've finished making the straightforward changes you suggested, and am thinking about the larger issues, which seem to boil down to two related issues:

  1. How much should we expose to the user
  2. How can we simplify the interface to make it more usable

The main function of the package is nassqs(), which fetches and returns parsed data. But I exposed nassqs_GET() and nassqs_parse() because I wanted to make it possible for advanced users to deal with any edge cases that might come up. Perhaps that's not really a concern here, and it would make sense to hide both of those functions. Working with output from nassqs_GET() requires some knowledge of the httr package and how requests work, so it would only be useful to someone who wants to see the raw results and knows what to do with them. nassqs_parse() on the other hand doesn't do much more than what jsonlite::fromJSON does, so I think it could be hidden without really removing any control from the user. My goal was to make it easy to use, but also to allow a user to dig into the deeper mechanism if necessary. This may be born out of personal frustration from when I've not understood what a hidden function is doing and had to root around in the source code to figure it out.

I propose hiding nassqs_parse() and expanding the documentation for nassqs_GET() so that it clearly states that the main function is nassqs() and that in general nassqs_GET() should not be needed. What do you think about that and about the larger issue of usability?

@adamhsparks
Copy link
Member

adamhsparks commented Jul 23, 2019

I think that sounds reasonable. The documentation can always be structured with the meat in the main vignette and then more advanced usage in another or farther down the page of a single vignette under an "Advanced Usage" header or some such.

@potterzot
Copy link
Author

Question: Since both reviewers recommend removing release_questions(), where is it recommended that it go? These are helpful (for me) pre-release questions and the function is suggested by the R Packages book (here), but it doesn't suggest where to put it.

Response to Reviewers

The reviewers raise some excellent points to consider about usability and organization. The package feels greatly improved by virtue of their comments and suggestions. Thank you again for your time and your invaluable suggestions. rnassqs is much cleaner and much improved as a result.

Below I detail some more general thoughts that were raised by the reviewers and my response, and then detail specific response to each reviewer separately.

Data size and usability

One reviewer suggestion was to improve the interface to make it more user friendly, i.e. that the package seemed to be built around the needs of the API rather than the needs of the user. To some extent this is a function of the inflexibility of the API itself. A data request to the Quick Stats API returns JSON that, when parsed, results in a data.frame with 39 columns. Unfortunately there is no way to limit the number of columns returned. The idea of being able to select a la dplyr is great, but since all 39 columns must be returned, it seems best to leave it to the user to select after the call is made.

Parameter passing

In response to suggestions about parameters and user experience I've made two changes. The first allows specifying parameters either as a named list passed to nassqs, as was previously the case, or as separate arguments to nassqs, as was suggested by @nealrichardson. In addition, I've added links to the parameter documentation in the API, and now nassqs_params() returns a list of parameters, while nassqs_params("agg_level_desc") returns a description of the "agg_level_desc" parameter. I've also updated the vignette to show both methods.
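As a quick illustration of the parameter helpers (a sketch based on the behavior described above; no API key is needed for these calls):

```r
library(rnassqs)

# List all parameters accepted by the Quick Stats API
nassqs_params()

# Describe a single parameter
nassqs_params("agg_level_desc")
```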

Pagination and handling requests larger than 50,000 records

The question of pagination came up repeatedly. It's an interesting one in the context of this API, which does not directly support paginating results. Typically I end up subsetting by year or geography to make the query small enough. rnassqs could try to subset by year or geography automatically, perhaps with a series of rules that first subset by year and, if only one year is requested or the request is still too large, then subset by a smaller geography. However, there are potential issues here. For example, a state-level request will not necessarily result in the same data as collecting all counties for that state, for two reasons:

  1. Some data are collected at the state level only
  2. County-level data may be suppressed where the state level data is not

Automatically subsetting by year would be doable though. I have added an issue for a future release to do so. In the meantime I have also added information in the error message to suggest how to subset the query. I have also included information on iteration to subset queries in the vignette.
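In the meantime, iterating over years manually might look like this (a sketch with hypothetical parameter values; requires a valid NASSQS_TOKEN):

```r
library(rnassqs)

base_params <- list(commodity_desc = "CORN",
                    agg_level_desc = "COUNTY",
                    state_alpha = "VA")

# Query one year at a time to stay under the 50,000-record limit,
# then bind the yearly results into a single data.frame
d <- do.call(rbind, lapply(2010:2015, function(yr) {
  nassqs(c(base_params, list(year = yr)))
}))
```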

Specific changes

  • Added reviewers to the DESCRIPTION with role = 'rev'
  • Fixed spelling errors (sorry)
  • Removed release_questions() (but see question at the top of this response)
  • Added a progress bar as suggested by @nealrichardson

Response to Reviewer 1

@adamhsparks provided some excellent feedback on issues of usability and potential unnecessary complexity. In particular, this comment was helpful.

I feel a bit like the functionality for the end-user is a bit overly complicated. This sort of package to me should just return data in a data.frame or list or some other R object that I can easily work with. In some cases I get a server response, in others I can ask for raw JSON or other file formats. I don't really see a good reason for this in R. I guess the functionality could be offered, but I think it needs to be hidden under a few layers for more advanced users that might want to use it. Rather it should just be a simple query for the data that I want and returning it in a format that I can readily use in R without extra steps. Just fetch the data and format it for me in one single step. I think it's possible with this package, but the documentation is a bit convoluted and not clear to me.
There seem to be two major and related issues.

While it's true that the package contains nassqs, which will simply query and return a data.frame object without the user having to specify anything, I have reorganized and rewritten the documentation and vignette to emphasize nassqs rather than the low-level functions nassqs_GET and nassqs_parse. I have added documentation and changed the vignette to focus on ease of use, rather than on building a query using the core functions.

The second issue concerned the organization of functions. I have reorganized functions into files by collective functionality, in an effort to meet the guidelines suggested in R Packages, which states

While you’re free to arrange functions into files as you wish, the two extremes are bad: don’t put all functions into one file and don’t put each function into its own separate file.

Now functions dealing with the request and parsing of the request are in request.R. Authorization functions are in auth.R. Helpers are in helpers.R. Functions dealing with parameters and parameter values are in params.R. Functions that make queries easier are in wrappers.R. I think this strikes a good balance, but am certainly open to suggestions to make this clearer if needed.

Specific Items

General package

  • Clarified keeping the API key in the .Renviron file in the README and vignette
  • Added a CITATION file, though it has placeholders until/if the article is accepted by JOSS
  • Resolved GIT merge conflicts in paper.md
  • contributing and code of conduct text added in CONTRIBUTING.md and CONDUCT.md, as well as in the README

Vignette

  • Fixed references to nassqs_fields() in the vignette, which now refer to nassqs_param()
  • Fixed error with nassqs_field_values() in the vignette
  • Fixed inconsistencies with 'PARSE' in the vignette

Code

  • I did not change the case of nassqs_GET, based on style guidelines from the httr package: Best practices for API packages
  • removed base_url as a parameter in nassqs_GET

Code Style

  • Fixed inconsistencies with "=" and "<-"
  • All code lines are 80 characters or less except where required (e.g. documentation linking to long urls)

Documentation

  • Fixed incorrect reference to jsonlite::fromJSON
  • Fixed link to utils::read.table
  • Added "\code{\link[]{}}" for linking to function documentation (and later removed in favor of markdown syntax).
  • I did not change function titles to title case based on the examples from the 'R Packages' [documentation chapter](http://r-pkgs.had.co.nz/man.html)
  • Added an example to nassqs_params()
  • Fixed example for nassqs_param_values()
  • Fixed inconsistencies between use of single and double quotes, though note that I use single quotes to refer to keys in a list or object, and double quotes to refer to strings.
  • wrapped all examples that make an API call in '\donttest'
  • Clarified documentation and examples in 'nassqs_GET' to specify that it is a low-level function
  • Clarified documentation in 'nassqs_parse' to better explain what it does and why

Response to Reviewer 2

@nealrichardson brought up several excellent points about the API, and especially about testing, ease of use, and focusing on the needs of the user. I feel it is easier to define a list of parameters and submit that as a single argument to nassqs, especially for example when iterating over a collection of queries. However, I recognize both needs, and have made it possible to call nassqs in either of two ways:

# First method, a named list of parameters
params <- list(agg_level_desc = "STATE",
               state_alpha = c("VA", "WA"))
nassqs(params)

# Second method, separate arguments
nassqs(agg_level_desc = "STATE", state_alpha = c("VA", "WA"))

# Or without capitalizing
nassqs(agg_level_desc = "state", state_alpha = c("va", "wa"))

I have expanded the vignette to demonstrate both methods and to emphasize the iteration and pagination of data available by iterating over a list of parameter lists.

Many of @nealrichardson's suggestions involve simplifying the interface, and I think the new function calls are much improved in this regard. These suggestions were a real gem. Authorization is simpler, functions have fewer and simpler arguments, and overall ease of use is improved. His suggestion of allowing year >= 2012 instead of (or in addition to) year__GE = 2012 is also a good one. I have not implemented it here because I suspect LIKE and NOT LIKE would be slightly more difficult. I have created an issue to implement this in a future release.

Another concern was that all data is in character format rather than numeric for columns that are numeric. The reason is that the Quick Stats data lists suppressed or unavailable information in a variety of character-based ways. As a result the Value field may contain "(D)", "(Z)", or "(S)" rather than numbers. Converting to numeric makes these values NA, which loses the specific information about why the data is missing. It is true that it is easy enough to convert to numerical format, but in my opinion keeping this information about why data is missing is important.
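For users who do want numeric values, one approach (a sketch; d stands in for a data.frame returned by nassqs()) is to add a numeric column alongside the original character one, so the suppression codes are not lost:

```r
# d stands in for a data.frame returned by nassqs(); Value is character
# and may contain suppression codes such as "(D)", "(Z)", or "(S)"
d <- data.frame(Value = c("1,234", "(D)", "56"), stringsAsFactors = FALSE)

# Keep Value as-is and add a numeric column; commas are thousands separators
d$value_num <- suppressWarnings(as.numeric(gsub(",", "", d$Value)))

# Rows that are NA in value_num but non-empty in Value were suppressed
suppressed <- is.na(d$value_num) & nzchar(d$Value)
```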

The httptest package is a huge help and I wish I had known about it when I was asking on twitter about API testing months ago. I've reorganized and updated the tests to use mock API calls, and also with tests in test files that correspond to the function file names in the R directory. For example, test-requests.R contains tests for functions in R/request.R.

Specific Items

General

  • Added 'assertthat' to Suggests in DESCRIPTION
  • Added stats and utils to Imports in DESCRIPTION
  • Added rnassqs.Rcheck and rnassqs_*.tar.gz to .Rbuildignore and .gitignore
  • Added case insensitivity
  • Converted documentation to markdown format
  • Improved documentation generally, especially for parameter definitions
  • Added a package web site with pkgdown

Code

  • Moved nassqs to top of file since it is the main function
  • Fixed nassqs_fields <- nassqs_params
  • In nassqs_GET, removed url_only and format as function parameters
  • In nassqs_parse, used inherits rather than class() ==.
  • In nassqs_parse RE: "L153: when is not a response object?", at times the API is not working, so in that case this returns the error message directly.
  • In nassqs_parse, removed unreachable warning
  • In nassqs_parse, added else case and changed read.table to read.csv
  • In nassqs and nassqs_parse, change 'raw' to 'text' as an option for the as parameter
  • In nassqs_check, RE: "what do I do if my request is too large? Give some recommendation for how to fix it. If it means that you need to make paginated requests, how would I do that? An even friendlier solution would be for you to handle the pagination for the user so they don't need to know/care about this API constraint." See the "Pagination and handling requests larger than 50,000 records" section above.
  • In expand_list, added @keywords internal and expanded with a description of what it does and why.
  • In nassqs_auth(), now nassqs_GET checks for the environmental variable NASSQS_TOKEN, and nassqs_auth simply sets that token.
  • In nassqs_parse(), RE: "It was surprising that the one column name that the package code changes gets a rather unfriendly name ("CV (%)"), while the other columns follow a standard, well-behaved format." This name change occurs because of the CSV response, so it is changed there to match the column name in other response types (i.e. JSON and XML).

Code Style

  • Fixed inline if statements: if() { } by removing braces
  • Regarding this comment: '@importFrom httr function_name rather than httr::function_name', R Packages generally recommends using the "::" version since it makes clear what package the function is coming from, with the exception that it makes things very slightly slower. However, the vast majority of the time in a GET request is due to the API service, so in this case using "::" makes sense to me
  • Removed quotes from list names

Vignette

  • Removed rnassqs:: from functions
  • Removed .secret and references to it.

Tests

  • Removed use of .secret for testing
  • Reorganized tests to match the file names of the functions they are testing
  • Made use of httptest::with_mock_api
  • Fixed is.numeric(as.numeric(...))

@adamhsparks
Copy link
Member

This looks much improved as I've glanced over it. Thanks for thoughtfully responding to our reviews and comments, and for explaining the reasons why some of my suggestions weren't followed; I have no objections to any of them. Some of my suggestions have been based on CRAN's, umm, erratic(?) enforcement of rules from time-to-time, so ignoring some of them is probably fine, as I'm not sure that I use title case in all my documentation everywhere but think I was pulled up on it once before.

I think that the organisation of the functions is much more clear now and agree with Hadley on not all in one and not one only per .R file.

Regarding the question on release_questions(), devtools::release() actually asks most of those and more when you use it, I'd suggest using that rather than including questions for yourself in the package NAMESPACE.

For expand_list() I'd use a @noRd tag since it does not need to be exposed to the end-user, that I can tell? Documentation is good, I do that for my internal functions so I know what they do, but it shouldn't clutter the user's experience having it documented unless it's used somewhere that I'm missing where an end-user actually calls it?

There's no need for the CITATION file to be incomplete as you've suggested. It should have two entries after acceptance to JOSS. One should just be for the package, the current version number and year it was released that will automatically update with new releases, which you can set up now. The second is the JOSS paper citation that will never change. The example I provided shows this.

I'm curious, how is it different than usdarnass, which you have mentioned in the README now? This isn't detailed in the original submission.

@potterzot
Copy link
Author

@adamhsparks thank you. I've updated the CITATION file and also added @noRd to expand_list().

Regarding usdarnass, I added the reference to the README after I found out about it, which was after I had submitted for rOpenSci review. I'm fairly sure rnassqs was developed first, since my first git commit was in June 2015, while theirs was November 2018, and rnassqs was published on CRAN on May 03 while usdarnass was published on CRAN on June 21. I think they were actually developed unaware of each other. If you have any thoughts or suggestions about a course of action I'd be all ears. It seems we could basically continue to develop in parallel or we could merge packages. I haven't reached out to the authors other than to make a suggestion on an issue to let them know they could allow for multiple options, as I describe below.

The differences are small as far as I can tell:

  • rnassqs makes all of the query parameters available, while usdarnass only allows a subset.
  • rnassqs allows multiple options like state_alpha = c("VA", "WA"), but after I commented on an issue on usdarnass to say it was possible to do that, usdarnass does that as well.
  • Before this review, rnassqs took a list of parameters like nassqs(params), while usdarnass takes parameters like nass_data(state_alpha = "VA", agg_level_desc = "state"). Now as per suggestion from @nealrichardson rnassqs works either way.

@nealrichardson
Copy link

I'll just briefly comment that this all sounds good in principle and I look forward to re-reviewing in detail, though I won't be able to get to that until early next week.

@adamhsparks
Copy link
Member

Thanks, @potterzot. As I said, it was a quick glance. Echoing what @nealrichardson said, I need to fully re-review everything. Those were just the few things I found quickly so I commented.

@adamhsparks
Copy link
Member

adamhsparks commented Aug 2, 2019 via email

@nealrichardson
Copy link

Nice work. This looks much improved. I found a few stylistic issues again, and I had some suggestions for how to improve the testing, but rather than write them here, I've made a pull request with them for you to review/merge. The other reason I implemented these suggestions myself was that I got a test failure locally because one of the tests required auth but did not have the appropriate skip_if_no_auth() check, so I was already in the code to debug that. There was also an R CMD check issue I encountered because the .Rbuildignore still wasn't correctly excluding previously built tarballs. All fixed by that PR.

Test coverage could be better, though my PR bumps it up to 91%. Happy to advise on covering the conditions that are currently missed if you want, though I won't withhold approval based on not reaching 100% line coverage.

One last followup point:

In nassqs_parse RE: "L153: when is not a response object?", at times the API is not working, so in that case this returns the error message directly.

httr::GET() will either return a response object (potentially with an error status, which you will already have handled before getting here because you pass through the nassqs_check() function) or GET() will itself error (like if your internet is down). I don't think it's possible for it to return anything different.

@adamhsparks
Copy link
Member

adamhsparks commented Aug 8, 2019

I have a few last minor points (nitpicks?) that if changed will improve the package. Overall it's greatly improved and I like how it works. Congrats!

1.* @lmullen already noted this much earlier in the process. Please remove the "Date" field from the DESCRIPTION file. CRAN will automatically assign this and it's prone to ending up being out of date if you rely upon updating it manually.

  2. You don't need a paste() in a stop(), e.g.,
    stop(paste0("Your query parameters include 'format' as ", format, " but it should be one of 'json', 'xml', or 'csv'.")).
    It should be written as:
    stop("Your query parameters include 'format' as ", format, " but it should be one of 'json', 'xml', or 'csv'.")
    There are several instances of this that I noted.

3.* "inst/examples/example_parameters.R" has an incomplete final line. Add a line return at the end of the file to fix this.

  4. The documentation for nassqs_GET() is inconsistent in how it references functions in the description. Some of the other R functions discussed in that paragraph use the function() convention, while nassqs_GET() is referred to only as nassqs_GET, minus the (). As a user I find it clearer if the () is used in documentation to indicate that a function, not a parameter, is being discussed. Note I didn't check all documentation, I just noticed this here.

5.* The documentation example for nassqs_param_values() has "YIElD" not "YIELD" in the comment section, is this correct?

6.* nassqs_parse() documentation Description field is missing a "'" prior to (Z).

7.* I'm not sure that here needs to be listed in the DESCRIPTION Suggests field. It's only used in data-raw as far as I can tell? If so, that folder is not included in the R package so shouldn't need to be specified here.

8.* In the data-raw/get_test_data.R file, it might be good to set the version to 2 for maximum compatibility in the near term with versions of R from 1.4.0 to current. If it's NULL it will default to version 3.

  9. The README code could be formatted a bit more nicely too using proper RMarkdown chunks, e.g.,

```{r eval=FALSE}
# Via devtools
library(devtools)
install_github('potterzot/rnassqs')

# Via CRAN
install.packages("rnassqs")
```
  10. The README may not need to be a .Rmd? I can't see that you have any executed R code so you could simplify and just use a .md file.

  11. Consider using codemetar::write_codemeta() to create and update a .json metadata file for the package?

Once these are addressed (at your discretion for many of them) I'm happy to recommend accepting. I've added a "*" after the number and before the comment for the items that I think must be fixed. Those without are at your discretion.

@potterzot
Copy link
Author

@adamhsparks Thank you for the incredible detail in this! Much appreciated. I've made all of the changes you suggest except for this one, which I'm unclear on:

8.* In the data-raw/get_test_data.R file, it might be good to set the version to 2 for maximum compatibility in the near term with versions of R from 1.4.0 to current. If it's NULL it will default to version 3.

What do you mean by setting the version? Do you mean setting the R version in DESCRIPTION?

@potterzot
Copy link
Author

@nealrichardson I've reviewed and merged your PR, thanks! There were two tests for error handling that were failing:

  • "Too-large request error is handled"
  • "Other server error is handled"

Because they were within the with_mock_api() block they were returning a GET object instead of the error. I moved these to the authorization block and they work.

Regarding .Rbuildignore excluding tarballs, I had included that at some point long ago, but removed it for a reason that I don't remember. Thank you for adding it.

@potterzot
Copy link
Author

@nealrichardson PS if I have your okay I've also added you as a contributor in DESCRIPTION.

@nealrichardson
Copy link

Where did you see the failure? The PR merge commit passed on Travis. They don't require auth because they use the mock responses I added here: https://github.com/potterzot/rnassqs/pull/15/files#diff-7c5a672790a8227968bfd57c3a71faa0 Did you possibly make other changes that altered the querystring in the request? That would change the request URL and thus change the mock file path it was looking for. If so, you can rename those mock files to match and they'll be fine.

Sure, happy to be listed at contributor.

@potterzot
Copy link
Author

@nealrichardson Hmm, I checked out your commit again and am having no trouble. I must have changed something that started giving me those errors. I returned those files to their original state in a new commit. Also removed the response check based on your note about nassqs_GET() always returning a response object.

@adamhsparks
Copy link
Member

Hi @potterzot,
The RDS serialization version of the file can be 1, 2, or (the default) 3. The default changed to version 3 with R 3.6, so many users may not be able to read a version 3 RDS file yet if they haven't upgraded to R >= 3.6.

See ?saveRDS for more on version
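Concretely, the fix is to pass version = 2 when writing the fixtures in data-raw/get_test_data.R (the object and path here are stand-ins):

```r
# Version 2 RDS files remain readable by R versions before 3.6;
# version 3 (the default since R 3.6) is not
x <- list(status = 200L, data = "example fixture")
f <- tempfile(fileext = ".rds")
saveRDS(x, f, version = 2)

# Round-trip check: the fixture reads back unchanged
identical(readRDS(f), x)
```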

@potterzot
Copy link
Author

Ah I see, that script was generally outdated anyway, so thank you for pointing that out! I was also unaware of the breaking change and do a lot of my data storage in RDS so thank you.

@adamhsparks
Copy link
Member

@lmullen, I've updated my initial review with the suggestion to accept, ticked the rest of the boxes and updated time spent reviewing.

@lmullen
Copy link
Member

lmullen commented Aug 15, 2019

@adamhsparks Great, thanks so much.

@nealrichardson Is there anything else outstanding from your perspective?

@nealrichardson
Copy link

All good, just checked the boxes.

@lmullen
Copy link
Member

lmullen commented Aug 15, 2019

Approved! Thanks @potterzot for submitting and making all the requested changes. And thanks, @adamhsparks and @nealrichardson for especially thorough reviews. Much appreciated.

@potterzot here are some to-dos to complete the onboarding process.

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do. Once you do I'll give you admin access.

  • Add the rOpenSci footer to the bottom of your README
    " [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"

  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).

  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Since you are also publishing to JOSS, we need you to do the following.

  • Activate Zenodo watching the repo
  • Tag and create a release so as to create a Zenodo version and DOI
  • Submit to JOSS using the Zenodo DOI. http://joss.theoj.org/papers/new When the paper shows up at JOSS, then add the following comment to the submission thread. This submission has been accepted to rOpenSci. The review thread can be found at <URL TO THIS REVIEW>.

You can also release a new version to CRAN.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've started putting together a gitbook with our best practices and tips; this chapter starts the 3rd section, which covers guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.

@potterzot
Author

Thank you! This is very exciting, and thank you all for your fantastic help and efforts. @adamhsparks, I realize I didn't specifically ask your permission to include you as a reviewer. @lmullen, I've switched the repository over to ropensci and made changes to the readme.

@lmullen
Member

lmullen commented Aug 16, 2019

Ok, you should be an admin on the repository again, @potterzot.

@potterzot
Author

@stefaniebutland I would be happy to do a blog post, though realistically not until October. I think I could do a longer article discussing why I started developing the package, how it's been helpful in my research, and what I've learned in the process, if that seems like a good fit. Happy to do a shorter one as well.

@stefaniebutland
Member

@potterzot Sounds good! Please submit a draft when you're ready and we can select a publication date at that time.

why I started developing the package and how it's been helpful in my research

If you include a "cool" example (that's not shown elsewhere) this is especially valuable as a way for readers to see how they might use the package.

what I've learned in the process

Always good to share this. Try to choose a couple of key points.

Thanks!

@stefaniebutland
Member

Hi @potterzot. I'm checking in to let you know I have a blog post slot open for Tues Oct 29 if you still want to do a long-form post. A shorter tech note is quite appropriate and could be published any time, after my review.

@potterzot
Author

potterzot commented Nov 8, 2019

@stefaniebutland I couldn't make the Oct 29 deadline but have a working draft now; is there a future date that would work? The draft is basically done, but I can change the template date and file names to match the anticipated date.

Also, I'm not sure where to put images. The template links I can change, but I don't see the corresponding img/blog-images directory in the roweb2 repository.

@potterzot
Author

@stefaniebutland never mind on the second part; I figured out the images. It helps if I read the instructions in full!

@stefaniebutland
Member

For now, please date 2019-11-26.
That might change based on submission status of other posts that already have dates assigned.

I admit there are a LOT of instructions ;-)
