Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JOSS] Improvements to documentation and some missing information #187

Closed
Sbozzolo opened this issue Sep 20, 2021 · 9 comments
Closed

[JOSS] Improvements to documentation and some missing information #187

Sbozzolo opened this issue Sep 20, 2021 · 9 comments
Assignees
Labels
enhancement Modification of existing feature/general improvement

Comments

@Sbozzolo
Copy link

Hello, I am reviewing GRChombo for publication in the Journal of Open-Source Software (JOSS). As part of the process, I am evaluating the quality of the documentation available. My overall impression is that the documentation in GRChombo contains good material, but it needs some polishing. Please, find below my comments. Please, keep in mind that the spirit of the comments are to help you improve GRChombo.

  • Capabilities of the code are unclear

    The "about" page briefly mentioned what the code can do, but it is definitely not comprehensive enough. Suppose I have a research project in mind. There's not enough information for me to evaluate if I can use GRChombo for it.

    For example:

    • What physics modules are implemented in the public code?
    • Can I compute surface/volume ADM masses, and momenta?
    • Can I locate apparent horizons?
    • Can I compute the quasi-local properties of such horizons?
    • Can I change the order of the integration or finite-difference scheme?
    • Is checkpointing available?
    • Can I change parameters while a simulation is running, or at checkpoint?
    • What boundary conditions are available?
    • What initial data are available?
    • Can I perform Cauchy Characteristic Extraction of gravitational waves?
    • Are interpolation operators available?
    • ...

    These are just some examples. I got the answer to some of these questions, but it would be important to have a summary of what the code can do.

  • Documentation has "work in progress" pages.

    Having incomplete pages is not a sign of good documentation and I think it is a red flag for new users. Importantly, the "structure of the code" page is incomplete. This page is important because it is the main way you can help new developers wrap their heads around GRChombo. If I were a developer, I would have to track how the code works by myself, which takes a lot of time an energy.

  • The doxygen documentation has several broken links and does not seem useful

    It is unclear to me what purpose is the doxygen documentation serving. For instance, it seems that examples are intermixed with the main library and that there's no clear organization of the content. The documentation also says "We currently do not guarantee that this doxygen is kept completely up to date." As a user, I don't know what I have to expect from this documentation. Also, there are broken links, for example here.

  • Postprocessing is not well documented.

    The documentation links to this page for post-processing with yt. But there's no explanation on how to use these scripts. Similarly for visit. The included scripts have no comments. E.g., what is the difference between PlotSlice.py and PlotSliceRemix.py? How to use and modify these scripts?

  • The README could give a better first-impression to new users

    What catches my eye when I open the README is "IMPORTANT: Updating examples for diagnostic variables". I have no idea what this means, but I think it is off-putting to new users. The README doesn't have any image and the information are really bare bone. JOSS requires that "A high-level overview of this documentation should be included in a README file (or equivalent)". The README should be updated to give an overview (that includes capabilities, dependencies, installation instructions) more extensively and clearly. Adding images will also improve the README.

  • The relationship between "public GRChombo" and "private" is not clarified.

    The "about" of the GitHub repo says "Main public version of the GRChombo code", which implies the existence of a private version. Indeed, several published results depend on unpublished code. The development model of GRChombo should be clarified. Is GRChombo openly developed, or is it that the collaboration develops private modules, use it for a while, and than releases them to the public? If yes, is any promise made on timelines/disclosures/...?

  • Copyright

    Who exactly owns the copyright? The header files say "GRChombo collaboration". What does this mean? Is there a contributor-license agreement that one needs to sign?

  • Versions

    What versioning scheme does GRChombo follow? Is there any guarantee of backwards compatibility?

    JOSS requires the existence of a version, how will the wiki (e.g., installation instruction) change when you start introducing versions? Maybe you can start by tagging a beta release that will become official upon acceptance to publication in JOSS.

  • Great pages

    I appreciated pages like "Coding conventions", "Updating the code", "Useful resources".

  • How to obtain help and report bugs is missing

    I couldn't find information about how to report bugs and how to obtain help. Are GitHub issues the preferred way? What kind of information is needed in the bug report? Where should I ask my questions?

    If the page "contact us" was intended to serve this purpose, it should be expanded to clarify this.

  • About contributing

    Do contributors have to write tests for their new features? How are PR merged (e.g., rebase vs merger commit)? What are the conventions for branch names?

  • Configuration files for famous machines

    The Einstein Toolkit ships with simfactory. This tool is used, among the other things, to compile the code. Simfactory contains a database of several supercomputers, so that the code is compiled and executed in the most optimized way (e.g., compilation flags, module to load, environment variables to set, number of MPI processes to use per node, ...). Users of the Einstein Toolkit can submit their configuration files to simfactory, so that everyone in the community benefits from that. I think this is really useful, so I would suggest you do something similar and encourage users to submit their configuration files to compile and run GRChombo. Since you already have some, you could have a "contrib" folder (to make clear that some are "official", other are contributed).

  • The organization of the wiki could be improved

    Is the documentation designed to be read as a book (from start to finish), or searched through as a traditional wiki? The way the wiki is currently setup seems to suggest the former. However, I think that this is a little bit intimidating and dispersive. For example, the table of content places at the same level: tutorials, usage pages, developer information, tips&tricks, other info. On the other hand, the "Getting started" page is much more friendly to new users than the wiki toc. So, my recommendation is to decrease the amount of information in the "home" page, keeping only a few links that are relevant to different audiences. You can also encourage users/developers to use the "search" feature to find what they look for.

    It would also be nice to have a "How do I?/FAQ" page. If video tutorials (e.g., for conferences/workshops) are available, they should be linked too.

  • Required citations is missing or not easy to find

    What references am I supposed to cite when I use GRChombo?

These are some of the comments that came to my mind going through the documentation, but I haven't gone through each single page carefully.

@KAClough
Copy link
Member

Thanks @Sbozzolo for this really useful feedback. We will update the wiki for these suggestions over the next couple of weeks.

@KAClough KAClough added the enhancement Modification of existing feature/general improvement label Sep 20, 2021
@KAClough
Copy link
Member

@rashti-alireza if you have any other comments on the wiki please do feel free to add them here too.

I've made a note that we could add some points on the compiler warnings and the use of HDF5 that you already raised.

@Sbozzolo
Copy link
Author

I forgot to add a point I was meaning to add. I think that it would be beneficial to have a page where you document how the documentation should look like. This for both users and developers. For users, it can provide a way to understand how the documentation is structured. For developers, this would be a "guiding principle" to maintain the documentation clean and updated.

Personally, I like this mental framework to organize my documentation. You can follow any framework you want, but I would recommend writing it down.

@Sbozzolo
Copy link
Author

.bash_rc should probably be .bashrc in https://github.com/GRChombo/GRChombo/wiki/Compiling-GRChombo

@Sbozzolo
Copy link
Author

Sbozzolo commented Sep 22, 2021

  • What are the recommendation for number of MPI processes vs OpenMP threads?
  • There should be some information about how much time/resources Examples require.

@KAClough
Copy link
Member

Moving @Sbozzolo comment from the review here as it is related to updating the code for JOSS paper:

When running tests, it might be nice to have a summary. E.g., something that prints something like "8 tests run: tests passed: [LIST], tests failed [LIST]". In Compiling Chombo you recommend using grep to obtain something like this. You don't do it in the "Compiling GRChombo" page. Maybe you can put together a simple make target to do this nicely?

@mirenradia
Copy link
Member

Hi @Sbozzolo, thanks for your detailed feedback. I think I broadly agree with most of what you suggest but I have a few comments below.

  • Capabilities of the code are unclear
    The "about" page briefly mentioned what the code can do, but it is definitely not comprehensive enough. Suppose I have a research project in mind. There's not enough information for me to evaluate if I can use GRChombo for it.
    For example:

    • What physics modules are implemented in the public code?
    • Can I compute surface/volume ADM masses, and momenta?
    • Can I locate apparent horizons?
    • Can I compute the quasi-local properties of such horizons?
    • Can I change the order of the integration or finite-difference scheme?
    • Is checkpointing available?
    • Can I change parameters while a simulation is running, or at checkpoint?
    • What boundary conditions are available?
    • What initial data are available?
    • Can I perform Cauchy Characteristic Extraction of gravitational waves?
    • Are interpolation operators available?
    • ...

    These are just some examples. I got the answer to some of these questions, but it would be important to have a summary of what the code can do.

I agree that a slightly more detailed overview of what is currently available is a good idea. However, I think it might be difficult to maintain an up-to-date list of every single feature that is available given the small size of our collaboration (relative to the ETK collaboration). I note that JOSS specifies the "core functionality" be understood.

  • Documentation has "work in progress" pages.
    Having incomplete pages is not a sign of good documentation and I think it is a red flag for new users. Importantly, the "structure of the code" page is incomplete. This page is important because it is the main way you can help new developers wrap their heads around GRChombo. If I were a developer, I would have to track how the code works by myself, which takes a lot of time an energy.

This is a good point and we should action this, particularly for the "structure of the code" page.

  • The doxygen documentation has several broken links and does not seem useful
    It is unclear to me what purpose is the doxygen documentation serving. For instance, it seems that examples are intermixed with the main library and that there's no clear organization of the content. The documentation also says "We currently do not guarantee that this doxygen is kept completely up to date." As a user, I don't know what I have to expect from this documentation. Also, there are broken links, for example here.

The doxygen documentation hosted at that URL is several years out of date and should be ignored. It is best to generate the documentation locally using the current version of the main branch. I will update the wiki to reflect this. With respect to the organisation of the content, as per the disclaimer, this documentation is not intended to be comprehensive but merely a way to look up classes and their methods.

  • Postprocessing is not well documented.
    The documentation links to this page for post-processing with yt. But there's no explanation on how to use these scripts. Similarly for visit. The included scripts have no comments. E.g., what is the difference between PlotSlice.py and PlotSliceRemix.py? How to use and modify these scripts?

I agree that the documentation and comments for the postprocessing tools is not up to the same standard of this code and could be improved. However, these tools are not part of this repository and should therefore not be considered an integral part of the main code. We link to them merely as examples and they are provided "as is". For your specific example of the difference between PlotSlice.py and PlotSliceRemix.py, please refer to the comments at the top of the latter where you will also find usage instructions.

  • The README could give a better first-impression to new users
    What catches my eye when I open the README is "IMPORTANT: Updating examples for diagnostic variables". I have no idea what this means, but I think it is off-putting to new users. The README doesn't have any image and the information are really bare bone. JOSS requires that "A high-level overview of this documentation should be included in a README file (or equivalent)". The README should be updated to give an overview (that includes capabilities, dependencies, installation instructions) more extensively and clearly. Adding images will also improve the README.

This is a good point and we will update the README accordingly. I'm not sure if it makes sense to provide detailed installation instructions here but we can certainly link to the relevant wiki pages.

  • The relationship between "public GRChombo" and "private" is not clarified.
    The "about" of the GitHub repo says "Main public version of the GRChombo code", which implies the existence of a private version. Indeed, several published results depend on unpublished code. The development model of GRChombo should be clarified. Is GRChombo openly developed, or is it that the collaboration develops private modules, use it for a while, and than releases them to the public? If yes, is any promise made on timelines/disclosures/...?

GRChombo is licensed under a BSD-3 license which allows users to modify and use the code for their own private use with no requirement to release changes publicly. As per @KAClough's comment on openjournals/joss-reviews#3703, we do however endeavour to add new features that may be useful for others. Unfortunately, given the small size of our collaboration, we can't make any promises on timescales.

  • Copyright
    Who exactly owns the copyright? The header files say "GRChombo collaboration". What does this mean? Is there a contributor-license agreement that one needs to sign?

As with many open source projects, copyright ownership is rather complicated. Technically, it is owned collectively by the owners of copyright work corresponding to the individual contributors to the code in the GRChombo collaboration (which are most likely either the contributor themselves or the institution to which they belong). Of course in practice, it is the license which is more important for redistribution.

There is no such agreement one needs to sign and, as far as I'm aware, such a formality is not that commonplace in open-source software (though not unheard of e.g. the Free Software Foundation).

  • Versions
    What versioning scheme does GRChombo follow? Is there any guarantee of backwards compatibility?
    JOSS requires the existence of a version, how will the wiki (e.g., installation instruction) change when you start introducing versions? Maybe you can start by tagging a beta release that will become official upon acceptance to publication in JOSS.

GRChombo currently does not follow a versioning scheme and we continually make updates to the main branch. As far as I can see from the review criteria, a versioning scheme is not required. As per the requirements for JOSS at the end of the review we will tag a version of the code that will be archived.

  • Great pages
    I appreciated pages like "Coding conventions", "Updating the code", "Useful resources".

Thank you!

  • How to obtain help and report bugs is missing
    I couldn't find information about how to report bugs and how to obtain help. Are GitHub issues the preferred way? What kind of information is needed in the bug report? Where should I ask my questions?
    If the page "contact us" was intended to serve this purpose, it should be expanded to clarify this.

As per the contact us wiki page, bugs may be reported using GitHub issues. If new users require help, they are free to email the provided email address [email protected]. Alternatively, they can open a new discussion in the discussions tab of this repository. I will clarify this on the contact us wiki page.

  • About contributing
    Do contributors have to write tests for their new features? How are PR merged (e.g., rebase vs merger commit)? What are the conventions for branch names?

The requirement to write a test for a new feature will be up to the PR reviewer and depend on the size of the new feature added. PRs are generally merged using merge commits but may be squashed at the discretion of the PR reviewer if the PR is relatively small. The conventions of branch names can be found here.

  • Configuration files for famous machines
    The Einstein Toolkit ships with simfactory. This tool is used, among the other things, to compile the code. Simfactory contains a database of several supercomputers, so that the code is compiled and executed in the most optimized way (e.g., compilation flags, module to load, environment variables to set, number of MPI processes to use per node, ...). Users of the Einstein Toolkit can submit their configuration files to simfactory, so that everyone in the community benefits from that. I think this is really useful, so I would suggest you do something similar and encourage users to submit their configuration files to compile and run GRChombo. Since you already have some, you could have a "contrib" folder (to make clear that some are "official", other are contributed).

These are provided for systems we have run GRChombo on here.

  • The organization of the wiki could be improved
    Is the documentation designed to be read as a book (from start to finish), or searched through as a traditional wiki? The way the wiki is currently setup seems to suggest the former. However, I think that this is a little bit intimidating and dispersive. For example, the table of content places at the same level: tutorials, usage pages, developer information, tips&tricks, other info. On the other hand, the "Getting started" page is much more friendly to new users than the wiki toc. So, my recommendation is to decrease the amount of information in the "home" page, keeping only a few links that are relevant to different audiences. You can also encourage users/developers to use the "search" feature to find what they look for.
    It would also be nice to have a "How do I?/FAQ" page. If video tutorials (e.g., for conferences/workshops) are available, they should be linked too.

Thanks for the suggestion. We can look at restructuring the homepage.

  • Required citations is missing or not easy to find
    What references am I supposed to cite when I use GRChombo?

This will be the JOSS paper but I guess we can't add it until it is published.

  • What are the recommendation for number of MPI processes vs OpenMP threads?

There aren't really hard-and-fast answers to this question as it depends on the specific architecture of the system and the problem at hand. In general, I tend to use 2-4 OpenMP threads per MPI rank.

  • There should be some information about how much time/resources Examples require.

For the BinaryBH example with the default parameter file there is some information here.

@Sbozzolo
Copy link
Author

Sbozzolo commented Sep 30, 2021

Thanks @mirenradia. Let me point out that my questions were not intended to be answered directly, but they were prompts for you to add and improve the documentation. Users should be able to find this information in the documentation.

I agree that a slightly more detailed overview of what is currently available is a good idea. However, I think it might be difficult to maintain an up-to-date list of every single feature that is available given the small size of our collaboration (relative to the ETK collaboration). I note that JOSS specifies the "core functionality" be understood.

I argue that some of the questions asked are definitely "core functionality" (e.g., checkpointing). And, personally, I think that being a small team makes things easier. Once you have the initial list of features available, it is not going to change dramatically and/or over fast timescales, so you can more easily keep it updated. I believe that having a clear list of features is a big service to new users, and I highly encourage this, regardless of what are the standards in JOSS. For example, the Einstein Toolkit does not have that, so, in the past, I had to write codes that already existed because I was unaware of their existence.

@KAClough
Copy link
Member

I think these have all now been addressed or moved to specific issues so I am closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Modification of existing feature/general improvement
Projects
None yet
Development

No branches or pull requests

5 participants