
Meta-issue for mdhaber review of JOSS submission #1

Closed · 24 of 25 tasks · mdhaber opened this issue Dec 16, 2022 · 9 comments
mdhaber commented Dec 16, 2022

This is a detailed list of notes corresponding with openjournals/joss-reviews#4913. The checklist below may be modified as the review progresses. I'll create a separate issue for any items that require substantial discussion.

  • Repository: Is the source code for this software available at https://github.com/MrShoenel/metrics-as-scores?
    • Yes, but paper title in readme.md should match title of JOSS submission. Update: I don't remember exactly where I was looking before, but the readme doesn't seem to refer to this paper anymore.
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
    • Yes, GPLv3.
  • Contribution and authorship: Has the submitting author (@MrShoenel) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
    • Yes, @MrShoenel is the only contributor. But in that case, how did the other authors contribute to the project?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
    • See "Details" for specific considerations. Question for the editor: most of the commits took place over a period of two months, but at least one commit toward the beginning (266d010) suggests that there was substantial work before this. Does this meet the standard that "As a rule of thumb, JOSS’ minimum allowable contribution should represent not less than three months of work for an individual"?
  • Age of software (is this a well-established software project) / length of commit history.
    • Most of the commits occurred in August and September.
  • Number of commits.
    • 160
  • Number of authors
    • 1
  • Total lines of code (LOC). Submissions under 1000 LOC will usually be flagged, those under 300 LOC will be desk rejected.
    • Nearly 2000 Python lines in src plus web app. Much of the Python code wraps existing code, though. Are there key algorithmic parts I should look at?
  • Whether the software has already been cited in academic papers.
    • I don't see any
  • Whether the software is sufficiently useful that it is likely to be cited by your peer group.
    • TBD
  • In addition, JOSS requires that software should be feature-complete (i.e., no half-baked solutions)
    • This doesn't look half-baked
  • packaged appropriately according to common community standards for the programming language being used (e.g., Python, R),
    • yes, it's PIP-installable
  • designed for maintainable extension (not one-off modifications of existing tools).
    • yes
  • “Minor utility” packages, including “thin” API clients, and single-function packages are not acceptable.
    • This is not a single-function package
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
    • I don't think there is any original data. The Qualitas.class corpus itself is not claimed as part of this paper. Is this accurate? Update: there are three datasets mentioned in the Readme, and one is mentioned in the paper. All are available and documented by accompanying PDFs.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original data or research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
    • pip install metrics-as-scores seems to have completed successfully. I don't see any instructions for testing the installation or running the software locally, though.
    • I did not attempt the "Stand-alone Usage / Development Setup". Development installation worked.
  • Functionality: Have the functional claims of the software been confirmed?
    • Maybe. I would like the authors to list the functional claims of the software concisely before I judge this. Since I am not finding instructions for interacting with the software locally, I am relying on https://metrics-as-scores.ml/webapp. There, I see lots of probability density functions overlaid on the same graph. IIUC, each of them was generated by fitting ~120 distributions to some data and keeping only the best fit. But I have many questions.
      • What does the data represent? I'm still not sure what the Qualitas.class corpus data is. (I'd suggest showing less by default. This is a lot to be confronted with.)
      • There are a few fitting metrics that are listed - which is considered when selecting the best fit? (Looks like KS 2-sample. Why? This will be stochastic, and there are deterministic statistics available.)
      • Are there any claims about the statistical interpretation of the results, or are statistical methods being used for convenience? As an obvious example of what I'm looking for: I don't think the software claims that the observed data were drawn from the fitted distribution, but if it did, I would say that this has not been confirmed because it would be an abuse of goodness-of-fit tests to make such a claim.
      • The paper mentions lots of functionality that does not seem to be demonstrated by the web interface - ANOVA, TukeyHSD, etc.
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
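The fitting procedure as I understand it (fit many candidate distributions by MLE, keep the best by goodness of fit) can be sketched with SciPy. This is my reading, not the package's actual code; the candidate list and the sample here are invented:

```python
# Sketch of "fit many distributions, keep the best" as I understand it.
# The candidates and sample are invented; the real tool reportedly tries
# ~120 distributions and ranks them by a goodness-of-fit statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.gamma(shape=2.0, scale=3.0, size=500)  # toy "metric" values

candidates = [stats.norm, stats.gamma, stats.lognorm, stats.expon]
fits = {}
for dist in candidates:
    params = dist.fit(sample)  # maximum-likelihood fit
    ks = stats.kstest(sample, dist.name, args=params)
    fits[dist.name] = (ks.statistic, params)

# Keep the candidate with the smallest KS distance to the sample.
best = min(fits, key=lambda name: fits[name][0])
print(best, round(fits[best][0], 3))
```

Note that a deterministic statistic like this one-sample KS distance would avoid the stochasticity concern I raised above about the two-sample test.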

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
    • It is stated, but IMO the language used is too abstract to be easily interpreted by a general audience or even a computational statistics developer (myself). I would suggest adding a very concrete example to demonstrate what is meant by the key terms "raw data", "metrics", "scores", "distance", "context". I think that in a domain-independent statistics context, I would call "raw data" -> "sample(s)", "metric" -> "statistic" (because I do not think it satisfies the mathematical definition of a "metric"), and "scores" -> something related to a CDF fitted either parametrically or nonparametrically. I'm confident that I understand "distance" and "context" correctly, though.
    • Is there a way to distinguish between "metric" as a mathematical function and "metric" as the numerical value that function assumes given particular "raw data"? Similar question for "score".
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
    • Maybe? The example website seems to show how the software works on real-world data, but I don't understand what that data is. The "Use your own data" section might satisfy this criterion, but IMO it should include a much simpler example with a minimal number of "raw data" samples, and it should show each of the claimed features individually (e.g. empirical distribution, KDE, MLE, ANOVA, TukeyHSD). See Example Usage #5.
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
    • I don't see any rendered API documentation, and AFAICT, none of the Python code has docstrings. See API Documentation #2.
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
    • There are some tests, but pytest is not listed as a dependency, and there are no instructions for running them. I am not sure whether the tests are adequate because there is no documentation for public functions. See Automated Tests #4.
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support.

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
    • There is a summary, but I don't think it's clear to non-specialists. A simple, concrete example would help. I'll link to a separate issue about this.
    • This has been improved.
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
    • I think that once the introduction is more accessible, the statement of need will satisfy this criterion with minor adjustments.
    • This has been improved.
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
    • Yes, mostly. I can make an issue with copyediting suggestions when the paper is closer to its final form.
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

mdhaber commented Dec 16, 2022

To check off remaining boxes, I would appreciate more information from the authors. Let's take this a few items at a time to keep things organized.

@MrShoenel can you answer the following questions:

  1. How did the four authors of the paper contribute to the project, assuming @MrShoenel is the GH handle for only one of the authors?
  2. Is there any original data that is claimed to be associated with this paper? "Qualitas.class" is the only data set I see, and is not claimed as a contribution of this paper.
  3. Do you consider the paper's claims to include the results, e.g. those displayed at https://metrics-as-scores.ml/webapp? Or is the paper primarily the methods/code for producing results given data? (I think the latter.)
  4. Are there any performance claims ("fast", "memory-efficient", "accurate") that I am missing? That's fine if there are none; I just wanted to make sure I didn't miss any.

MrShoenel (Owner) commented:

Thanks for the detailed feedback :) I think we can answer some parts directly, others will require some effort. I believe that we will begin next week and have it finished the week after.

MrShoenel (Owner) commented:

Hey, I will answer some of your questions:

  1. I am the sole author of the software. The other authors are my supervisors who helped with the ideation of it.
  2. There are two datasets that are currently associated with this paper (more to come next week). The two known datasets that can be installed using the CLI are https://doi.org/10.5281/zenodo.7633950 and https://doi.org/10.5281/zenodo.7633989.
  3. Both are the case. The paper is primarily concerned with the results derived from using the Qualitas.class corpus (which are displayed at https://metrics-as-scores.ml/). However, it also contributes to the methodology and significance of the approach. I have shared the paper privately with you for the time being.
  4. There are no such claims. However, the purpose of pre-generating densities is to enable a real-time experience in the web application. I have tried to make that clear within the documentation, and I am going to add something about that to the readme, too!

MrShoenel (Owner) commented:

I want to answer some things that were unclear about the following items:

  • Data sharing: We actually do claim original results. While we used the Qualitas.class corpus of software metrics for the related publication, we claim original results about software metrics and how they are context-sensitive. We took the Qualitas.class corpus, extracted all metric values, generated parametric fits and empirical densities, and conducted ANOVA, two-sample KS, and Tukey HSD tests. The result is a separate dataset, which can now be downloaded using the new text-based user interface. It also contains the results of the statistical tests and a template that can be rendered to give the exact original results as claimed in the paper (the rendered report can be seen if you follow the link).
  • Reproducibility: It is straightforward to reproduce all intermediate and aggregated results. For example, you can download the referenced dataset and use the new text-based user interface to generate a dataset from scratch using the contained CSV that can be used by Metrics As Scores, including all statistical tests and the report.
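For orientation, the three kinds of tests mentioned above can be run with SciPy along the following lines. This is a generic sketch with invented groups, not the actual Metrics As Scores pipeline:

```python
# Generic sketch of the statistical tests mentioned above (ANOVA,
# two-sample KS, Tukey HSD) -- invented toy groups, not the actual
# Metrics As Scores pipeline.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# The same toy "metric" observed in three contexts/groups.
g1 = rng.normal(5.0, 1.0, 100)
g2 = rng.normal(5.5, 1.0, 100)
g3 = rng.normal(7.0, 1.0, 100)

anova = stats.f_oneway(g1, g2, g3)   # do any group means differ?
ks = stats.ks_2samp(g1, g3)          # do two samples differ in distribution?
tukey = stats.tukey_hsd(g1, g2, g3)  # which pairs of means differ?

print(anova.pvalue < 0.05, ks.pvalue < 0.05)  # → True True
```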


MrShoenel commented Feb 20, 2023

I have made changes to the paper and altered the introduction and statement of need.
I hope it is more accessible now since I use the terms 'feature' and 'group' and make sure that 'metrics' always refers to software metrics.

I will leave the closing of the issues to you, please do so once all criteria are satisfied :)


mdhaber commented Feb 20, 2023

  1. Great.

Re: 2-4, in one of your replies:

I have shared the paper privately with you for the time being.

This suggests that your responses may have been partially about the paper sent by email rather than this JOSS paper. I'll try rephrasing my questions to make sure it is clear which paper we're discussing, and I'd appreciate it if you'd rephrase your responses accordingly.

  1. Is there any original data that is claimed as part of this JOSS paper that is not published otherwise? If so, please let me know when all of it is mentioned in the JOSS paper so that I can confirm that it is accessible.
  2. Are there results (e.g. conclusions at https://metrics-as-scores.ml/webapp) claimed as part of this JOSS paper, or is this JOSS paper primarily about the code behind the results? (If the results are claimed as part of this JOSS paper, what is the relationship between this JOSS paper and the paper you shared by email? In other words, can you contrast the contributions of the two? Note that if "results" are claimed in this JOSS paper, they all need to be "entirely reproducible by reviewers" - how would I do that?)
  3. Does this JOSS paper make any performance claims ("fast", "memory-efficient", "accurate") about the software?

Some new questions so I can check off other boxes:

  1. What work toward this JOSS paper precedes your first commit to the repository? (i.e. how long did it take to write the first commit, or were there other draft versions of the software that weren't committed)
  2. Can you point to the part of the paper that describes "how this software compares to other commonly-used packages?"

MrShoenel (Owner) commented:

  1. Yes, after adapting Metrics As Scores so it can work with arbitrary datasets properly, I have produced 3 datasets, one of which is now referenced in the JOSS paper and claimed as original data there (this one: Metrics and Domains From the Qualitas.class corpus, https://doi.org/10.5281/zenodo.7633949). The other two datasets, while original, are not mentioned or referenced in the paper.
  2. This JOSS paper is primarily about the code behind the results. There are no results claimed as part of this JOSS paper. In the section "Application", however, I do refer briefly to some results that were claimed in the QRS paper (which I shared with you privately). The relationship between the QRS and the JOSS paper is this:
    • The QRS paper makes use of the web application to derive some original results from the Qualitas.class corpus. Those results are only claimed there, and there is a brief reference to them at the end of the JOSS paper.
    • The JOSS paper introduces the application and analysis suite called Metrics As Scores, but other than the software package, it does not claim any original results.
    • As part of improving the application (making it dataset-agnostic), the dataset that was derived from the Qualitas.class corpus was externalized and published separately. It is referenced in the JOSS paper. There are no new results claimed in the JOSS paper, just the dataset as its own publication.
  3. [see the previous response; it was about the JOSS paper]
  4. There were no preceding draft versions. While the first commit starts with an empty repository, a few hours were invested prior to it for obtaining the Qualitas.class corpus and transforming it into a dataset that could be used with the to-be-implemented application. I do not have precise estimates, but I suppose it was approximately one person-day.
  5. I am not aware of any other packages that could be compared to this one. However, the first two paragraphs in the section "MAS -- The Tool- and Analysis Suite" describe the usage of common analyses that can be used to partially achieve what Metrics As Scores can achieve. In the section "Statement Of Need" it is pointed out that the software enables a novel way of exploring differences among groups.


mdhaber commented Feb 20, 2023

Thanks, I think those questions are answered, and I checked off several boxes. I've created two new bug reports and replied to some of the existing issues.

MrShoenel (Owner) commented:

Re: Statement of need: I did try to simplify some of the terms. 'Raw data' does not appear any longer, and the word 'metric' is now always used as 'software metric' to make it clear. 'Score', however, needs to remain as-is because of the already published QRS paper (we cannot change the name in retrospect). The first paragraph makes clear what the term means and how it is used. Similarly, I have attempted to make it more explicit what the words 'context' and 'distance' mean.

mdhaber closed this as not planned on May 1, 2023.