[REVIEW]: MassMine: Your Access To Data #50
Comments
@n3mo thanks for this submission! Before we can proceed would you mind extracting the references into a |
@arfon thanks for the quick response. I've extracted the references as requested, as well as migrated both files to a directory named "paper". Let me know if there's anything else I can do. |
/ cc @openjournals/joss-reviewers - would anyone be willing to review this submission? If you would like to review this submission then please comment on this thread so that others know you're doing a review (so as not to duplicate effort). Something as simple as Reviewer instructions
Any questions, please ask for help by commenting on this issue! 🚀 |
Hello, I am the second author on MassMine and I have a question: May we suggest or invite an outside reviewer, or do you have an internal list that you prefer for JOSS? |
To make the process transparent, do post here when you request a reviewer's help, and mention the reviewer by GitHub handle. |
I just posted a message on Twitter and I provided the direct link to this page. Here is a link to the tweet: https://twitter.com/aaronbeveridge/status/777959812784984064 Thank you @labarba! |
@whedon list editors |
Current JOSS editors:
|
OK, the editor is @mgymrek |
@mgymrek 👋 I'm happy to help edit this one with you. |
@mbfhunzaker @ptwobrussell @dnmilne is https://github.com/n3mo/massmine your cup of tea? Are you willing to sign up as a reviewer and review this submission? |
@julianmcauley has agreed to review. Julian can you confirm that here? |
Yes, happy to review |
@whedon commands |
Here are some things you can ask me to do:
🚧 Important 🚧 This is all quite new. Please make sure you check the top of the issue after running a @whedon command (you might also need to refresh the page to see the issue update). |
@whedon assign @julianmcauley as reviewer |
OK, the reviewer is @julianmcauley |
👋 @julianmcauley. Thanks for agreeing to review this submission. Please take a look at the reviewer guidelines here: http://joss.theoj.org/about#reviewer_guidelines and update the checklist at the top of the issue as you progress through your review. |
@julianmcauley have you had a chance to look at this submission? Let us know if you have questions about the review process. |
Comments on the Checklist:
- Installation: Does installation proceed as outlined in the documentation?
- Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
- Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g. API method documentation)?
- Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
- Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
- References: Do all archival references that should have a DOI list one (e.g. papers, datasets, software)?

General comments: Positively, I think this is a well put-together project that contains features that people would be interested in. Negatively, the focus so far is on websites that already have strong API support, so really this code is adding an additional layer on top of an easy-to-use API. To somebody like me (who has never before read Scheme code), it would be easier to follow the documentation from these websites' APIs directly rather than following this code. But maybe I'm in the minority. Certainly this wouldn't be an issue once more "hard to crawl" websites are added into the mix, if that's the plan. |
Julian, thanks for the thoughtful feedback on the project. It may be helpful if I clarify an important detail. Although the code base is managed on GitHub, we envision the website located at www.massmine.org as the entry point for our typical end-user. Additional comments with respect to this observation are detailed inline below.
The www.massmine.org website is intended to be the definitive source for installation files and documentation. The software is distributed as a downloadable binary specifically to free the user from managing dependencies and compiling the software themselves. Build instructions, as well as a link to the GitHub repository, are provided for advanced users, but we expect these users to be the exception rather than the rule.
I believe you have revealed a weakness in the design of our online documentation at www.massmine.org. There are in fact examples throughout the documentation. These resources are available in a sidebar (see screenshot below) that is revealed by clicking on the menu icon in the top left of each web page. However, this sidebar is hidden by default to accommodate smaller screens. It seems that this has had the undesirable side effect of causing it to go unnoticed. Here are a few examples of what I'm referring to: The documentation provides a broad overview of how to use MassMine, as well as a separate detailed example analysis of Twitter. Both of these resources contain copious code snippets that give new users copy-and-paste access to their first data set. Also, MassMine has further built-in help and examples for users, as documented on the broad overview page.
Once again, the website's navigation sidebar has hidden the full documentation. The sidebar does contain complete documentation for all of MassMine's options, with a separate page for each data source. Further, example usage code snippets are provided with each documented function. We certainly don't intend our typical user to ever have to look at the raw source code.
Automated tests are deliberately missing from the application itself. As explained above, our expected end-user will use the pre-built software tool, and we've intentionally shielded them from having to participate in the code-test-compile process. As it stands, the software is written to be agnostic about the data returned by the various APIs. As such, if Twitter, for example, changes the data returned by a given API endpoint, MassMine should continue to work. That is, under the hood it makes no assumptions about the data it receives. This behavior has already made it robust to several upstream changes. For example, Twitter recently increased the number of trends returned from its REST API from 10 to 50. This change required no update to the MassMine code base. It is possible that more extensive (and rare) changes, such as adjustments to the underlying URIs of the APIs, will lead to errors. We are typically aware of such impending changes, which are often publicized in advance, and work to facilitate a fix prior to any problems.
This was indeed missing--thanks! I've added language to the readme on GitHub. Links to the GitHub repo are available at the top of the www.massmine.org website. In the future, we could consider adding information for contributors to the documentation website as well. We also plan to enable disqus comments throughout the documentation to provide support to users unaccustomed to GitHub.
Hopefully, my comments above address most of your concerns. It seems the biggest problem was the perceived lack of documentation. Full documentation does indeed exist, albeit hidden from view by default. We plan to make it visible by default to avoid the extra step of clicking on the menu button, which seems easy to miss. Regarding your point about already-existing strong support for the current data sources: the purpose of this NEH grant project was to fill a technology gap for a specific class of users. Indeed, there are both (1) pre-made applications that analyze networked data sources, and (2) many packages for most major programming languages that provide low-level access to popular web APIs. Existing options in category #1 are proprietary and expensive, and/or provide only pre-defined analyses rather than raw data. Further, the functionality they provide is typically geared toward brand management and advertising applications, rendering it ineffective for open-ended research questions. Existing options in category #2 require programming experience to use in any substantive way. For many researchers, the learning curve required makes data acquisition difficult. MassMine makes these data sources available to non-programmers interested in performing research on such data. Additionally, MassMine does many things behind the scenes for the user, such as quietly managing rate limits imposed by the APIs, handling authentication credentials, and (responsibly) reconnecting dropped connections to prevent data loss due to common network hiccups. Also, we believe that offering a toolchain-agnostic application with a common user interface across many different APIs is of great benefit to users. If you have further questions or concerns after examining the documentation, please don't hesitate to let me know. Best, |
Thanks @julianmcauley! And thanks @n3mo for the quick response. Regarding documentation: @julianmcauley does this clarification address your concerns? I agree it'd be helpful to move the documentation to a place more obviously visible from the home page. The documentation itself seems pretty thorough from glancing through it. For tests: although the tool is indeed built for the end user, that does not exclude the possibility of adding automated tests for developers. Indeed, one could argue that most software is intended for end-users to just run, but it should still be tested even if users don't run those tests. This is also a listed requirement for JOSS, so I would love to see some tests added unless there is a good reason not to. |
Yes, that answers my concerns about documentation. I didn't see the sidebar, but rather I clicked the "docs" tab, and assumed that was it. It seems the docs tab links to only the "getting started" page of the usage instructions but doesn't contain a link to the remaining pages. This seems like an easy fix. But the usage documentation seems sufficiently thorough. |
@mgymrek, thanks for the quick response. I agree that testing is useful for developers, especially those eager to contribute to unfamiliar code bases. One source of complexity for this project is the particulars of connecting to multiple external APIs. These APIs require users and developers alike to set up login credentials before using their services. This prevents MassMine from shipping with truly automatic testing, as we don't have spare OAuth credentials for each service to ship with the test suite. To date, this has kept us from distributing tests with MassMine. Therefore, we would prefer not to add testing, but if JOSS feels that this is a strong requirement, I'm sure a suitable compromise could be reached. Please advise. |
I see how it is tricky to make automated tests given the issue of credentials. Do any of these APIs have test credentials that can be used for testing purposes? I am also looping in @arfon to this conversation to see what he thinks. |
To my knowledge they do not offer test credentials. |
Thanks for flagging this @gymreklab. Testing external APIs is always a little tricky but there are some language-specific tools (such as Webmock, VCR in Ruby-land) that achieve this. I'm not sure if there are similar tools for Scheme. As an alternative, what about having some fixture data with example requests/responses from some of these external services and making sure that the MassMine package can process these responses as expected? This would then at least help someone who is trying to understand what the software is actually doing to view sample inputs and outputs. What do you think @n3mo? |
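A minimal sketch of the fixture idea suggested above, written in Python purely for illustration (MassMine itself is written in Chicken Scheme). Everything here is an assumption made for the example: the fixture contents, the URL, and the names `fake_fetch` and `run_task` are hypothetical and do not come from the MassMine code base. The point is only that a recorded response can stand in for the live service so the processing step can be checked without credentials.

```python
import io
import json

# Hypothetical fixture: a recorded response body of the kind a trends
# endpoint might return. The contents are invented for this example.
FIXTURE_TRENDS = json.dumps(
    [{"trends": [{"name": "#example", "tweet_volume": 1200}]}]
)


def fake_fetch(url):
    """Stand-in for the network call; always returns the recorded fixture."""
    return FIXTURE_TRENDS


def run_task(url, fetch, out):
    """Hypothetical task: fetch a response and write it to `out` as one line."""
    body = fetch(url)
    out.write(body + "\n")
    return body


def test_trends_fixture():
    out = io.StringIO()
    run_task("https://api.example.invalid/trends", fake_fetch, out)
    written = out.getvalue()
    # The output is exactly the recorded body plus a trailing newline.
    assert written == FIXTURE_TRENDS + "\n"
    # The output is still valid JSON with the expected structure.
    parsed = json.loads(written)
    assert parsed[0]["trends"][0]["name"] == "#example"


test_trends_fixture()
```

Passing the fetch function in as an argument is one way to swap the network call for a fixture; whether that is practical depends on how the real code reaches its HTTP layer.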
Thanks for your thoughts @arfon. I agree that using fakes/mocks for external services is a reasonable compromise. I've begun adding testing to the software and will notify everyone once the update is available for review. |
Tests are now available! They are included in a separate directory in the repo, but can be run directly with massmine. The installation instructions have been updated with details. Running the tests is simple once all build dependencies are installed: `./massmine.scm --test ./tests/run.scm` |
Great! @julianmcauley would you be able to take a look at the added tests? |
Yes, certainly tests have been added, though to tell the truth I don't quite follow what functionality they really test. For Twitter, for instance, the tests are: `(test-begin "Twitter Module") (test-assert "Twitter task descriptions" (list? twitter-task-descriptions)) (test-end "Twitter Module")` Doesn't this just test that the methods exist but not actually test them for functionality? If so then I'm not sure the tests are all that valuable, though certainly they meet the basic requirement of "having tests". Apologies if I misunderstood. |
@n3mo I have also taken a look at the tests and am not totally sure what is being tested. Could you provide a brief description here? |
@julianmcauley and @mgymrek, thanks again for the continued feedback. The various tests serve a mixture of goals. In the simplest case, they ensure that the methods exist and that they exist in the proper format. Tests for the Twitter and Tumblr modules amount essentially to this for reasons that I'll return to. The remaining modules (Google, Web URL, & Wikipedia) provide full tests of the various tasks (i.e., data requests) that MassMine provides. For these modules the tests target, in addition to the simple existence checks and helper procedures, the top-level functions that are called when massmine is run by the user, and thus fully assess the underlying functionality. That is, they make actual data requests and ensure successful retrieval. For Twitter and Tumblr we are back to the previous conversation in this thread. Without API credentials for developers (which are not provided by these services), we cannot have fully automated tests. We have previously discussed utilizing mocks where possible. This is complicated by two problems: First, the remaining methods make calls to functions provided by other imported packages that are not part of this code base, making it difficult and/or impossible to inject mock data directly for return. Second, were we able to set up a simulated host behind OAuth to make calls to during our tests (which would require a substantial amount of tangential work for this project, given that a framework such as Webmock or VCR does not already exist for Chicken Scheme), the reward would be minimal. The reason is that, as it stands, the code targeting the Twitter and Tumblr API endpoints simply catches the returned JSON data as a string, dumping the value either to stdout or to a file. Thus, in the end this substantial effort would amount to producing mock data that our functions are essentially agnostic to. So long as the responses are strings, everything will work, which casts doubt on the value of such a large undertaking. |
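To make the "agnostic to the data" argument concrete, here is a small sketch, again in Python for illustration only and not taken from MassMine. The handler `dump_response` and the payload builder `make_payload` are hypothetical names; the payloads are invented. It assumes, as described above, a handler that writes the response string through without inspecting it.

```python
import io
import json


def dump_response(raw, out):
    """Hypothetical pass-through handler: write the raw response unchanged."""
    out.write(raw)
    out.write("\n")


def make_payload(n):
    """Build a fake trends payload with n entries (invented data)."""
    return json.dumps({"trends": [{"name": f"#topic{i}"} for i in range(n)]})


# The handler never looks inside the payload, so a response with 10 trends
# and one with 50 trends are handled identically.
for n in (10, 50):
    payload = make_payload(n)
    out = io.StringIO()
    dump_response(payload, out)
    assert out.getvalue() == payload + "\n"
    assert len(json.loads(out.getvalue())["trends"]) == n
```

Under that assumption, a mock server would mostly confirm that strings come out as they went in, which is the trade-off being weighed in this comment.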
@n3mo - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission. |
I've archived the software with Zenodo. The DOI is: 10.5281/zenodo.193078 |
@whedon set 10.5281/zenodo.193078 as archive |
OK. 10.5281/zenodo.193078 is the archive. |
Many thanks for reviewing this one @julianmcauley and @mgymrek for editing this paper. @n3mo - your paper is now accepted into JOSS and your DOI is http://dx.doi.org/10.21105/joss.00050 ⚡️ 🚀 💥 |
Thanks @arfon, @julianmcauley, and @mgymrek for your valuable feedback throughout this process. |
Submitting author: @n3mo (Nicholas Van Horn)
Repository: https://github.com/n3mo/massmine
Version: v1.0.1
Editor: @mgymrek
Reviewer: @julianmcauley
Archive: 10.5281/zenodo.193078
Reviewer questions
Conflict of interest
General checks
Functionality
Documentation
Software paper
Paper PDF: 10.21105.joss.00050.pdf
paper.md
file include a list of authors with their affiliations?