style: whether to preserve previously used curation scripts or not? #190
Comments
Based on the reasoning in #188, I think @JonathanRob summarized it well:
Together with this, it would be nice to have a guide on how to go back in time and walk through an example curation.
Yes, I like @mihai-sysbio's suggestion to provide a short guide on how to take advantage of the git framework. I guess adding such a guide to the README in the …
@mihai-sysbio @JonathanRob I don't have strong opinions on the removal of old scripts, but I am concerned that this solution might be biased toward curators instead of users.
To add to this, I would also recommend removing a lot of the content in the …
Agree, the outdated content under …
The move to a …
To respond to your points, @Hao-Chalmers:
Deleting the code/data will also prevent running and curating it further.
I'm not sure what you mean by this, but if there is code or data that is not used or maintained and is not expected to be used or developed in the future, then it should not remain in the repository. A difficulty for new users is having to familiarize themselves with all the code/data in a repository, to see if what they need is already present or if it needs to be developed. For example, I have written my own custom functions in RAVEN only to find out afterward that they already existed and I just didn't see them at first because there are so many (this is not a critique of RAVEN, but an implicit challenge that comes with larger repositories). Keeping a lot of old and unused scripts and data makes this even more tedious and problematic.
I do not delete any code or files unless I'm quite certain that they are old and no longer necessary. Of course I make a lot of mistakes, but it's very easy to simply restore a wrongly deleted file if we later realize it was incorrectly removed.
For a git repository, I generally try to keep the size as small as reasonably possible. So if something is not necessary, then why keep it?
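The "restore a wrongly deleted file" step mentioned above uses only standard git commands. A minimal sketch, run in a throwaway repository so it is self-contained; the script name `oldCurationScript.m` is a hypothetical example, not a file from this repo:

```shell
#!/bin/sh
# Demonstrate restoring a file after it was deleted in a commit.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
echo "% one-time curation" > oldCurationScript.m
git add oldCurationScript.m
git -c user.email=demo@example.com -c user.name=demo commit -q -m "add curation script"
git rm -q oldCurationScript.m
git -c user.email=demo@example.com -c user.name=demo commit -q -m "remove obsolete script"
# Find the commit that deleted the file, then restore it from that commit's parent.
del=$(git log --diff-filter=D --format=%H -- oldCurationScript.m | head -n1)
git checkout -q "${del}^" -- oldCurationScript.m
ls oldCurationScript.m
```

So a removal is never final: as long as the deletion was committed, the file can be brought back with one `git checkout` against the parent of the deleting commit.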
Whether it is necessary to keep the previous data and scripts that were used in modifying the GEM is an important question, and it should be openly discussed and considered with opinions from multiple parties: administrators, community contributors, and especially the users, whose thoughts are currently unavailable. Not long ago, modeling work suffered from the lack of previous code and data, which made reuse and further development a painful process. Git-based GEM repos have greatly mitigated this, even though the size of a GEM repo grows with time. In balancing "easy to trace and full transparency" against "reduce repo size", the former has the higher priority in my view.
Once code and data are added to a repository, only actions that alter history (e.g. …) can truly remove them. Now, by deleting some of these files, they will disappear from future releases. As @JonathanRob described above, there are many advantages to keeping a clean and functional code base, free of dead code that is no longer directly runnable (say, because the file tree or file names have changed). At the same time, if code or data is not present in a current release, one has to dig a little deeper to find it. I believe the ultimate purpose of this issue is to identify all code that can no longer be run and will not be fixed to run in the future, together with its associated data, as well as other data that is no longer relevant.
I think we are all on the same page: any commit that makes changes to the model should be supported by code and data present in that commit. After many releases, I believe the time has come to ask ourselves how long we will keep around code and data that was meant to be used only once, while pretending it can still run.
Thank you for your input @Hao-Chalmers and @mihai-sysbio. After some further consideration, I have changed my view on the matter. In particular, I find it extremely useful to be able to use GitHub's search function to query the repository for any scripts, functions, log files, etc. that contain a given term (such as the name of a reaction that has been deleted). If the old data files and scripts are deleted, their contents will not appear in the search results, requiring a bit more digging into previous versions as @mihai-sysbio explained. I am therefore leaning toward @Hao-Chalmers's earlier suggestion to create a …

Once a file is moved to the …

To summarize, here is my revised suggestion:
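It is worth noting that GitHub's web search only covers the current state of the default branch, but the same lookup works locally over the full history with git's "pickaxe" search. A sketch in a throwaway repository; the reaction ID `HMR_1234` is a made-up example:

```shell
#!/bin/sh
# Demonstrate finding a term in history after every file containing it was deleted.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
echo "curating reaction HMR_1234" > log.txt
git add log.txt
git -c user.email=demo@example.com -c user.name=demo commit -q -m "mention reaction"
git rm -q log.txt
git -c user.email=demo@example.com -c user.name=demo commit -q -m "delete log"
# Pickaxe (-S) lists commits that added or removed the string, so the term
# stays findable even though no file in the current tree contains it.
git log --all -S"HMR_1234" --oneline
```

This partially offsets the searchability loss from deleting old scripts, at the cost of requiring a local clone rather than the web UI.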
A way to build further on the above would be to do cleanups routinely, say prior to each major release. I could even imagine a different folder per cleanup, e.g. …
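The per-cleanup-folder idea can be sketched with plain `git mv`, again in a throwaway repository; the folder name `deprecated/`, the version `v1.3.0`, and the script path are all hypothetical, not the repo's actual convention:

```shell
#!/bin/sh
# Demonstrate archiving a one-time script into a per-release folder
# instead of deleting it outright.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
mkdir -p code/curation
echo "% one-time fix" > code/curation/fixGrRules.m
git add code
git -c user.email=demo@example.com -c user.name=demo commit -q -m "add one-time script"
# Archive rather than delete: the file stays searchable on the default branch,
# and git log --follow still connects it to its pre-move history.
mkdir -p deprecated/v1.3.0
git mv code/curation/fixGrRules.m deprecated/v1.3.0/
git -c user.email=demo@example.com -c user.name=demo commit -q -m "archive script before v1.3.0"
ls deprecated/v1.3.0
```

This keeps the active `code` folder clean while preserving both the content and its searchability, which seems to be the compromise the thread is converging on.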
For the one-time curation scripts, one can actually directly keep them under |
Description of the issue:
Whether to preserve previously used curation scripts in the `code` folder or not?

Expected feature/value/output:
…

I hereby confirm that I have:
- … the `master` branch of the repository