Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: KmeansR (R) #7

Open
10 of 23 tasks
jamesh4 opened this issue Mar 16, 2020 · 3 comments
Open
10 of 23 tasks

Submission: KmeansR (R) #7

jamesh4 opened this issue Mar 16, 2020 · 3 comments
Assignees

Comments

@jamesh4
Copy link

jamesh4 commented Mar 16, 2020


name: KmeansR
about: K-means package for R
title: K-means implementation from scratch
labels: 1/editor-checks, New Submission!
assignees: Eithar Elbasheer (@EitharAlfatih), Simardeep Kaur (@SimardeepKaur)


Submitting Author: Rob Blumberg (@RobBlumberg ), Sreejith Munthikodu (@sreejithmunthikodu ), Saurav Chowdhury (@saurav193 ), James Huang (@jamesh4 )
Package Name: KmeansR
One-Line Description of Package: K-means implementation from scratch
Repository Link: https://github.com/UBC-MDS/KmeansR/tree/master
Version submitted: https://github.com/UBC-MDS/KmeansR/tree/1.0.0
Editor: TBD
Reviewer 1: Eithar Elbasheer (@EitharAlfatih)
Reviewer 2: Simardeep Kaur (@SimardeepKaur)
Archive: TBD
Version accepted: TBD



Description

  • Include a brief paragraph describing what your package does:
    This package includes R functions that implement k-means clustering from scratch. This will work on any dataset with valid numerical features, and includes fit, predict, and cluster_summary functions, as well as as elbow and silhouette methods for hyperparameter “k” optimization. A high level overview of each function is given below. See each function’s documentation for more details.

Scope

  • Please indicate which category or categories this package falls under:
    • Data exploration
    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
* The data exploration category was added after consultation with @kvarada since the other categories don't accurately describe this package.

  • Explain how the and why the package falls under these categories (briefly, 1-2 sentences):

This package implements the k-means algorithm, a data mining and clustering technique used to uncover relationships in unlabelled data.

  • Who is the target audience and what are scientific applications of this package?

This package was created to provide a deeper understanding of the k-means clustering algorithm. Thus, this package is targeted to anyone interested in diving into the implementation of this clustering technique.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ?

There is a built-in R function called KMeans. This package is not meant to add to the existing ecosystem; it is rather intended to deepen fundamental understanding of these algorithms.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

We spoke to @kvarada and she approved the addition of the data exploration category.

Technical checks

For details about the rOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has an OSI approved license
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the rOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." rOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

@EitharAlfatih
Copy link

EitharAlfatih commented Mar 21, 2020

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of the package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2 hours

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Overall a very well-built package!

  • I cloned the repository and was able to successfully run the tests.
  • The README file was able to clearly explain the purpose of the package and provided a good introductory to the implemented functions.
  • I ran devtools::check() and it successfully ran after I updated some dependencies.

Suggestions:

  • A list of dependencies may be helpful if added to README.md file.

  • In the README.md file, in the usage section, there exists a typo in the cluster_summary function. I believe it was meant to be clustersummary.

  • In your predict.R file the documentation mentions an extra distance_metric argument which is not part of the current implementation. Consider updating the documentation for this function.

  • The second and the third tests in test-predict.R -which are testing points with the same coordinates being assigned to the same centroid - are the same. Consider removing one of them.

  • Tests can be added to test-elbow.R for checking the input's format. The coverage will increase after testing the 4 if branches.

  • In clustersummary.R the example in the documentation has a typo, it calls cluster_summary instead of clustersummary.

Other than that, the package is well implemented, the documentation is well written and the tests covered the major functionality of the package.

Thanks again. I truly found it interesting and really helpful to go through this work.
Let me know if you have any questions.

@SimardeepKaur
Copy link

SimardeepKaur commented Mar 21, 2020

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • [] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

2 hours

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • Congratulations guys, for building a complete algorithm from scratch. This is a very well documented package which is recreating K means clustering. This shows your in depth knowledge and understanding of the algorithm.

Running usage example

  • The installations via “devtools::install_github("UBC-MDS/Kmeans")” worked well.I really appreciate the simplicity of the installation instructions.
  • I would like to mention that while running the usage example, the documentation did not mention of loading any libraries, which I got to know only when I ran the functions and encountered errors. I would suggest to add instructions for loading the required libraries like tidyverse etc.
  • The Usage example worked really well once I loaded all te libraries. I would have documented the example with more comments to make it more comprehensive.

Test coverage

  • I checked the package coverage using “package_coverage()” which gave the following results:
    Kmeans Coverage: 89.74%
    R/predict.R: 80.00%
    R/elbow.R: 84.38%
    R/clustersummary.R: 84.62%
    R/fit.R: 95.56%
    R/silhouette.R: 96.97%

These results are quite impressive given how complex these functions are.

  • I ran “devtools::test()” in the root repo of the project in R console. It worked perfectly.
  • I ran check() in the root repo which also worked perfectly.

Documentaion:

Readme of the package looks perfect. It was quite helpful in giving an overview of what is happening in each function.
Although I noticed a small typo. You should consider changing the fucntion “cluster_summary” to “clustersummary” because that is the actual function name.

Implementation

  • I went through all different functions that you have included. I must say that I was impressed by directory structure of your functions. I found doc strings for all the functions quite helpful.
  • I also checked by the functions by passing different arguments. I could see that defense programming was in place everywhere.

Overall:

This package looks great and is really inspiring for someone whi wants to have a deep understanding of a particlar algorithm to go ahead and try building it from scratch. I really enjoyed going through all the functions and docuemenation. Keep up the good work!

@RobBlumberg
Copy link

Hi @EitharAlfatih and @SimardeepKaur , we're happy to be able to tell you that we've implemented almost all of your feedback in our latest package release 1.1.0.

  • We've added tidyverse as our only dependency in the package README.
  • We fixed the clustersummary typos that were in the README usage section and the function's documentation.
  • We removed the redundant tests for the predict function, and also removed the description of the non-existent distance_metric argument in the predict function documentation.
  • We added four new tests for the elbow function to check for correct inputs. We also added several other tests for other functions to increase the overall code coverage.
  • We added loading of required libraries in the usage examples.

Thank you very much for the feedback, it has definitely improved the quality of our package!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants