-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Submission: rnassqs - access the USDA-NASS 'Quick Stats' data through their API #298
Comments
@potterzot I will be the editor for this peer review. Here are my editorial checks. Editor checks:
Editor commentsIn running R CMD check I get the following:
I am in the process of looking for peer reviewers. Reviewers: |
@lmullen thank you for your comments. I've made a commit to address your comments. Regarding this note:
Tests in tests/testthat/test-oncran.R beginning on line 55 include mock API calls. The tests make the request, specifying that the function return the GET request URL rather than actually make the request, and that request is compared to the correct URL. There are three API paths to test:
There are additional mock API tests that follow those, but those are for convenience functions for making specific requests, e.g. Tests in Is there a better way of organizing tests that makes it clear where the API mock tests are done and where the actual API call tests are done? |
@potterzot That sounds fine to me. I just wanted to make sure the reviewers and I understood. |
@potterzot Apologies for the delay in getting this review going. One person has agreed to review but a string of others have been unavailable at the start of the summer. Still looking for that second reviewer and then the review will begin. |
Thanks to our reviewers for agreeing to take on this package. Reviewer: @adamhsparks You can find the guide for reviewers here. Please let me know if you have any questions. |
I know I'm behind. Sorry, I've had a rather busy time lately. I'm starting the review today and will see how I go. |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 7
Review CommentsThis package offers great functionality and defines very nicely why it's necessary. I'm happy to see this sort of package being written. Following are my comments.
Comments on VignetteWhen I try using the > library(rnassqs)
> nassqs_token()
Error in nassqs_token() : could not find function "nassqs_token" This example works as expected, nassqs_fields()
[1] "agg_level_desc" "asd_code"
[3] "asd_desc" "begin_code"
[5] "class_desc" "commodity_desc"
[7] "congr_district_code" "country_code"
[9] "country_name" "county_ansi"
[11] "county_code" "county_name"
[13] "CV" "domaincat_desc"
[15] "domain_desc" "end_code"
[17] "freq_desc" "group_desc"
[19] "load_time" "location_desc"
[21] "prodn_practice_desc" "reference_period_desc"
[23] "region_desc" "sector_desc"
[25] "short_desc" "state_alpha"
[27] "state_ansi" "state_name"
[29] "state_fips_code" "statisticcat_desc"
[31] "source_desc" "unit_desc"
[33] "util_practice_desc" "Value"
[35] "watershed_code" "watershed_desc"
[37] "week_ending" "year"
[39] "zip_5" however, ?nassqs_fields() returns a help file that says,
Suggest updating vignette to match most recent functionality. Another error occurs with this example from the vignette. > rnassqs::nassqs_field_values(field = 'unit_desc')
Error: 'nassqs_field_values' is not an exported object from 'namespace:rnassqs' The "All together" section script is not functional. fields <- nassq_fields()
Error in nassq_fields() : could not find function "nassq_fields" Comments on functions
My suggestion is remove
Comments on documentation
# See all values available for the statisticcat_desc field. Values may not
# be available in the context of other parameters you set, for example
# a given state may not have any 'YIElD' in blueberries if they don't grow
# blueberries in that state.
# Requires an API key:
#nassqs_param_values("statisticcat_desc", key = "my api key") Should appear as # See all values available for the statisticcat_desc field. Values may not
# be available in the context of other parameters you set, for example
# a given state may not have any 'YIElD' in blueberries if they don't grow
# blueberries in that state.
# Requires an API key:
nassqs_param_values("statisticcat_desc", key = "my api key")
> params = list(commodity_name="CORN",
+ year=2012,
+ agg_level_desc = "STATE",
+ state_alpha = "WA",
+ statisticcat_desc = "YIELD")
> nassqs_GET(params)
Response [https://quickstats.nass.usda.gov/api/api_GET?key=XXXXXXXXXXXXXXXXXXXXXXX&commodity_name=CORN&year=2012&agg_level_desc=STATE&state_alpha=WA&statisticcat_desc=YIELD&format=JSON]
Date: 2019-07-13 04:48
Status: 200
Content-Type: application/json
Size: 148 kB What do I do with this response value? How is this response yields for corn in 2012 in Washington? Does the end-user even need to interface with this function or should it be hidden and used by the other functions in the package that return data in
# Set parameters and make the request
params = list(commodity_name="CORN",
year=2012,
agg_level_desc = "STATE",
state_alpha = "WA",
statisticcat_desc = "YIELD")
req <- nassqs_GET(params)
nassqs_parse(req, as = "data.frame") would be more clear as # Set parameters and make the request
params <- list(
commodity_name = "CORN",
year = 2012,
agg_level_desc = "STATE",
state_alpha = "WA",
statisticcat_desc = "YIELD"
)
req <- nassqs_GET(params)
corn <- nassqs_parse(req, as = "data.frame")
head(corn)
Comments on code style
|
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 5
Review CommentsLooks like I get to be the infamous "Reviewer 2" this time :) Since Adam dutifully got his review in on time and I'm delayed a couple of days, I'll say at the start that I agree generally with all of his points (with one minor exception, noted below), and in the interest of brevity, I've tried to avoid reiterating comments he made. I greatly appreciate packages like this and the effort that goes into making them. This is the kind of package I wish existed when I was doing my dissertation--would have made at least the data retrieval and management part of research a lot cleaner. So, thank you for your contribution. Style and code-hygiene comments aside, my main suggestion to you is to think about how you can orient the package towards the R user and their needs. The purpose of this package should be to encapsulate your hard-earned knowledge of how this API works and enable data scientists to access that data as naturally as possible. However, the package currently seems to expose an R interface centered around the needs of the API and not the R user. For example, the functions that seem to make up the public interface for the package are mostly stuck in "helpers.R", as if they're ancillary when really they should be front and center. Here's a more concrete example: reading the vignette discussion about the comparison operators, it looks like the way to get data on corn production in Virginia since 2012 is nassqs(list(commodity_desc="CORN", year__GE=2012, state_alpha="VA"))) But when I think about the R code that I want to type to get that data, it looks more like nassqs(commodity == "corn", state == "va", year >= 2012) That is, not case-sensitive, handles the different ways commodities and states (for example) could be referenced (integer code, abbreviation, long name), and more naturally handles the comparison operators (can translate I'm not saying you need to do exactly this/all of this now (though some things, like case insensitvity, would be easy enough), but more to suggest ideas for the future and to show what I mean in terms of thinking of the R interface and not just what the API demands. When I wrap APIs (and in general write packages), I like to start by thinking about the R code I want (users) to type, and that usually means doing more work in the package to mask the awkwardness of the underlying HTTP API. Another way to get more R-user-centric would be to improve the documentation, which I found to be thin, often tautological ( A good way to see what the docs will look like to your users is to use Observations from using the packageI got a token and did the example on the readme (corn yields in Virginia and Pennsylvania for since 2000).
That all worked as expected. That said,
Specific code notes
Miscellaneous
|
@adamhsparks and @nealrichardson: Thanks to both of you for these thorough and detailed reviews. @potterzot Please take a look at these comments from the reviewers. There is lots of good advice in here. It does seem like some of it will require some pretty fundamental reassessment of the package's user-facing interface. Do you think you can make revisions within two weeks, our typical deadline? i.e., by July 29? |
@lmullen, @adamhsparks, @nealrichardson first let me say a big thanks, these are some really good suggestions and you clearly put in a lot of time to give some great feedback. While some of the changes are substantial, I hope that the underlying framework will make them relatively easy to implement. I think I can probably submit a revision by July 29th. Is it possible to ask for an extension if that becomes necessary? |
@potterzot Sure. If you can plan on a July 29 deadline, that would be best, but an extension would be fine if it becomes necessary. |
@potterzot, I hope you find my review useful and not being negative. This is an extremely valuable package, my interest is in helping you improve it. Feel free to ask for help or guidance along the way. I'm happy to contribute. |
Hi @adamhsparks, thank you for your time and willingness to help. I've finished making the straight-foward changes you suggested, and am thinking about the larger issues, which seem to boil down to two related issues:
The main function of the package is I propose hiding |
I think that sounds reasonable. The documentation can always be structured with the meat in the main vignette and then more advanced usage in another or farther down the page of a single vignette under an "Advanced Usage" header or some such. |
Question: Since both reviewers recommend removing Response to ReviewersThe reviewers raise some excellent points to consider about usability and organization. The package feels greatly improved by virtue of their comments and suggestions. Thank you again for your time and your invaluable suggestions. Below I detail some more general thoughts that were raised by the reviewers and my response, and then detail specific response to each reviewer separately. Data size and usabilityOne reviewer suggestion was to improve the interface to make it more user friendly, i.e. that the package seemed to be built around the needs of the API rather than the needs of the user. To some extent this is a function of the inflexibility of the API itself. A data request to the Quick Stats API returns JSON that, when parsed to a data.frame, results in a data.frame that has 39 columns. Unfortunately there is no way to limit the number of columns returned. The idea of being able to Parameter passingIn response to suggestions about parameters and user experience I've made two changes. The first allows for specifying either a list of parameters in Pagination and handling requests larger than 50,000 recordsThe question of pagination came up repeatedly. It's an interesting one in the context of this API. There is not a direct way to paginate results that the API supports. Typically I end up subsetting by year or geography to make the query small enough.
Automatically subsetting by year would be doable though. I have added an issue for a future release to do so. In the meantime I have also added information in the error message to suggest how to subset the query. I have also included information on iteration to subset queries in the vignette. Specific changes
Response to Reviewer 1@adamhsparks provided some excellent feedback on issues of usability and potential unnecessary complexity. In particular, this comment was helpful.
While it's true that the package contains The second issue concerned the organization of functions. I have reorganized functions into files by collective functionality, in an effort to meet the guidelines suggested in R Packages, which states
Now functions dealing with the request and parsing of the request are in Specific ItemsGeneral package
Vignette
Code
Code Style
Documentation
Response to Reviewer 2@nealrichardson brought up several excellent points about the API and especially about testing and ease of use and focusing on the needs of the user. I feel it is easier to define a list of parameters and submit that as a single argument to
I have expanded the vignette to demonstrate both methods and to emphasize the iteration and pagination of data available by iterating over a list of parameter lists. Many of @nealrichardson's suggestions involve simplifying the interface, and I think the new function calls are much improved in this regard. These suggests were a real gem. Authorization is simpler, functions have fewer and simpler arguments, and overall ease of use is improved. His suggestion of allowing Another concern was that all data is in character format rather than numeric for columns that are numeric. The reason is that the Quick Stats data lists suppressed or unavailable information in a variety of character-based ways. As a result the The Specific ItemsGeneral
Code
Code Style
Vignette
Tests
|
This looks much improved as I've glanced over it. Thanks for thoughtfully responding to our reviews and comments. Thanks for explaining any reasons why my suggestions weren't followed, I have no objections to any of them. Some of my suggestions have been based on CRAN's, umm, erratic(?) enforcement of rules from time-to-time, so ignoring some of my suggestions are probably fine as I'm not sure that I use title case in all my documentation everywhere but think I was pulled up on it once before. I think that the organisation of the functions is much more clear now and agree with Hadley on not all in one and not one only per .R file. Regarding the question on For There's no need for the CITATION file to be incomplete as you've suggested. It should have two entries after acceptance to JOSS. One should just be for the package, the current version number and year it was released that will automatically update with new releases, which you can set up now. The second is the JOSS paper citation that will never change. The example I provided shows this. I'm curious, how is it different than |
@adamhsparks thank you. I've updated the CITATION file and also added @nord to Regarding The differences are small as far as I can tell:
|
I'll just briefly comment that this all sounds good in principle and I look forward to re-reviewing in detail, though I won't be able to get to that until early next week. |
Thanks, @potterzot. As I said, it was a quick glance. Echoing what @nealrichardson said, I need to fully re-review everything. Those were just the few things I found quickly so I commented. |
Will do!
…--
Dr Adam H. Sparks
Associate Professor of Field Crops Pathology | Centre for Crop Health
Institute for Life Sciences and the Environment (ILSE) | Research and Innovation Division
University of Southern Queensland | Toowoomba, Queensland | 4350 | Australia
Phone: +61746311948
Mobile: +61415489422
Mobile: +61415489422
On 2 Aug 2019, 5:38 AM +1000, Lincoln Mullen ***@***.***>, wrote:
Thanks for the detailed changes and response, @potterzot.
@adamhsparks and @nealrichardson Thanks for looking over the changes. Could you please complete your re-review and either request additional changes or vote to approve the package by August 8?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Nice work. This looks much improved. I found a few sylistic issues again, and I had some suggestions for how to improve the testing, but rather than write them here, I've made a pull request with them for you to review/merge. The other reason I implemented these suggestions myself was that I got a test failure locally because one of the tests required auth but did not have the appropriate Test coverage could be better, though my PR bumps it up to 91%. Happy to advise on covering the conditions that currently are missed that if you want, though I won't withhold approval based on not reaching 100% line coverate. One last followup point:
|
I have a few last minor points (nitpicks?) that if changed will improve the package. Overall it's greatly improved and I like how it works. Congrats! 1.* @lmullen already noted this much earlier in the process. Please remove the "Date" field from the DESCRIPTION file. CRAN will automatically assign this and it's prone to ending up being out of date if you rely upon updating it manually.
3.* "inst/examples/example_parameters.R" has an incomplete final line. Add a line return to the file at the end of the file to fix this.
5.* The documentation example for 6.* 7.* I'm not sure that here needs to be listed in the DESCRIPTION Suggests field. It's only used in data-raw as far as I can tell? If so, that folder is not included in the R package so shouldn't need to be specified here. 8.* In the data-raw/get_test_data.R file, it might be good to set the
Once these are addressed (at your discretion for many of them) I'm happy to recommend accepting. I've added a "*" after the number and before the comment for the items that I think must be fixed. Those without are at your discretion. |
@adamhsparks Thank you for the incredible detail in this! Much appreciated. I've made all of the changes you suggest except for this one, which I'm unclear on:
What do you mean by setting the version? Do you mean setting the |
@nealrichardson I've reviewed and merged your PR, thanks! There were two tests for error handling that were failing:
Because they were within the Regarding |
@nealrichardson PS if I have your okay I've also added you as a contributor in DESCRIPTION. |
Where did you see the failure? The PR merge commit passed on Travis. They don't require auth because they use the mock responses I added here: https://github.com/potterzot/rnassqs/pull/15/files#diff-7c5a672790a8227968bfd57c3a71faa0 Did you possibly make other changes that altered the querystring in the request? That would change the request URL and thus change the mock file path it was looking for. If so, you can rename those mock files to match and they'll be fine. Sure, happy to be listed at contributor. |
@nealrichardson Hmm, I checked out your commit again and am having no trouble. I must have changed something that started giving me those errors. I returned those files to their original state in a new commit. Also removed the response check based on your note about |
Hi @potterzot, See |
Ah I see, that script was generally outdated anyway, so thank you for pointing that out! I was also unaware of the breaking change and do a lot of my data storage in RDS so thank you. |
@lmullen, I've updated my initial review with the suggestion to accept, ticked the rest of the boxes and updated time spent reviewing. |
@adamhsparks Great, thanks so much. @nealrichardson Is there anything else outstanding from your perspective? |
All good, just checked the boxes. |
Approved! Thanks @potterzot for submitting and making all the requested changes. And thanks, @adamhsparks and @nealrichardson for especially thorough reviews. Much appreciated. @potterzot here are some to-dos to complete the onboarding process.
Since you are also publishing to JOSS, we need you to do the following.
You can also release a new version to CRAN. Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions. We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here. |
Thank you! This is very exciting. Thank you all for your fantastic help and efforts. @adamhsparks, I am realizing I didn't specifically ask your permission to include you as a reviewer. @lmullen, I've switched the repository over to ropensci and made changes to the readme. |
Ok, you should be an admin on the repository again, @potterzot. |
@stefaniebutland I would be happy to do a blog post, probably realistically not possible until October. I think I could do a longer article discussing why I started developing the package and how it's been helpful in my research and what I've learned in the process, if that seems like a good fit. Happy to do a shorter one as well. |
@potterzot Sounds good! Please submit a draft when you're ready and we can select a publication date at that time.
If you include a "cool" example (that's not shown elsewhere) this is especially valuable as a way for readers to see how they might use the package.
Always good to share this. Try to choose a couple of key points. Thanks! |
Hi @potterzot. I'm checking in to let you know I have a blog post slot open for Tues Oct 29 if you still wanted to do a long form post. A shorter tech note is quite appropriate and could be published any time, after my review. |
@stefaniebutland I couldn't make the Oct 29 deadline but have a working draft now, do you have a good future date that would work? The draft is basically done but I can change the template date and file names to match the anticipated date. Also, I'm not sure where to put images. The template links I can change, but I don't see the corresponding |
@stefaniebutland nevermind on the second part, I figured out the images. It helps if I read the instructions in full! |
For now, please date 2019-11-26. I admit there are a LOT of instructions ;-) |
Submitting Author: Nicholas A. Potter (@potterzot)
Repository: https://github.com/potterzot/rnassqs
Version submitted: 0.4.0
Editor: @lmullen
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Data retrieval because 'rnassqs' allows access the the NASS 'Quick Stats' API to fetch data.
Target audience is those who want to automate or reproducibly fetch data from 'Quick Stats', including agronomists, economists, and others working with agricultural data. Scientific applications include analysis of agricultural data by administrative region (e.g. county, state, watershed), economic analysis of policies that affect agriculture, and sociological/demographic analysis of agricultural producers over time.
None that I have been able to find.
#297, responded to by @noamross
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
JOSS Options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.MEE Options
Code of conduct
The text was updated successfully, but these errors were encountered: