MDAnalysis 1.0 paper initial notes
We are slowly moving towards MDAnalysis 1.0. This release will freeze the API (until MDAnalysis 2.0) and guarantee backwards compatibility throughout 1.x. We want to write a paper that describes the library and also gives more recent contributors a chance to get proper academic credit.
This wiki page should act as a starting point for gathering ideas and hashing out the broad structure. Once we get started in earnest, we will create a separate repository.
- domain description
- challenges and requirements
- history and other similar packages
Philosophy: what lessons did we learn between the initial release and 1.0, and how did they shape the design?
- object oriented
- pythonic
- interactive
- interoperable
Understanding the code base
- core data structures
- library structure
- core, topology system, "lib"
- coordinates and topology readers/writers (+ maybe briefly selection writers); note on random access in trajectories
- trajectory formats, topology formats (table)
- special topics
- MemoryReader
- download from PDB with fetch_mmtf()
- analysis: overview
This should answer the reader's question "What can MDAnalysis do for me?"
(Ordering of topics? First the "developer"-oriented ones on AtomGroup etc., or rather the "user"-oriented ones with analysis first?)
Ready-made building blocks with a common API
- mention anything already published (ENCORE, water analysis, ...)
- highlights? – add anything here that you'd like to write about
Dealing with MD data in a unified way is the core task in MDAnalysis; therefore we support many commonly used formats (and some uncommonly used ones, too):
- topology formats
- trajectory formats
- special topics
- MemoryReader
- ChainReader
- download from PDB with fetch_mmtf()
- possibly some benchmarks on reading/writing?
- special topics
- selection writers (brief)
- auxiliaries
- on-the-fly transformations
- hierarchy of containers (Atom, AtomGroup, Residue, ResidueGroup, Segment, SegmentGroup) + fragments; indexing, slicing, groupby
- selection language
- set operations with groups
- updating AtomGroup
- highlighting some AtomGroup or Residue methods
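For the paper, the "set operations with groups" item could be illustrated with a small example. The sketch below is a toy stand-in written in plain Python, not MDAnalysis code: the class name `IndexGroup` and the example indices are hypothetical and exist only for illustration. It assumes the semantics we want to describe, namely that concatenation keeps order and duplicates while union, intersection, and difference return sorted, duplicate-free groups.

```python
class IndexGroup:
    """Toy ordered group of atom indices (hypothetical; not the MDAnalysis implementation)."""

    def __init__(self, indices):
        self.indices = tuple(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, item):
        # A slice returns a new group; an integer returns a single index.
        if isinstance(item, slice):
            return IndexGroup(self.indices[item])
        return self.indices[item]

    def __add__(self, other):
        # Concatenation: keeps order and duplicates.
        return IndexGroup(self.indices + other.indices)

    def __or__(self, other):
        # Union: sorted, duplicates removed.
        return IndexGroup(sorted(set(self.indices) | set(other.indices)))

    def __and__(self, other):
        # Intersection: indices present in both groups.
        return IndexGroup(sorted(set(self.indices) & set(other.indices)))

    def __sub__(self, other):
        # Difference: indices in self that are not in other.
        return IndexGroup(sorted(set(self.indices) - set(other.indices)))

    def __repr__(self):
        return f"IndexGroup({list(self.indices)})"


# Two overlapping example groups (made-up indices).
backbone = IndexGroup([0, 1, 2, 3])
near_ligand = IndexGroup([2, 3, 7])

concatenated = backbone + near_ligand   # order and duplicates preserved
union = backbone | near_ligand          # sorted and unique
intersection = backbone & near_ligand
difference = backbone - near_ligand
```

In the actual paper the same idea would be shown with real AtomGroup objects and the selection language; the toy class only makes the difference between concatenation and union explicit.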
What do we use under the hood?
- distance calculations
- choose optimal algorithms (e.g., PR #2035 and much of @aushsuhane's work; see also his notebooks and his GSoC summary)
- RMSD (just mention QCPROT)
- PBC treatment (hopefully consistent...)
- user interface
- algorithms (and perhaps benchmarks to show why we chose what we chose)
- development (CI, testing, PR/review/merge)
- broad community (mailing lists, issue tracker), conferences, workshops
- code of conduct
Highlights from the literature
- something that was accomplished with the help of MDAnalysis, e.g., scientific question answered
- other packages/tools that use MDAnalysis
Summary, future work, and plans
We need to decide on how we want to deal with authorship. A few potential models:
- recent core developers (core developers with recent contributions; example: SciPy papers such as the SciPy 1.0 major paper draft)
- all core developers (past and present)
- anyone who has contributed (example: Astropy 2.0 paper, arXiv:1801.02634; see also Khmer 2.0 and C. Titus Brown's rationale in "Pubwication of software papers, and authorship on them")
@orbeckst feels that authorship should require at a minimum
- code contributions (commits to develop/master)
- participation in paper writing
- approving the paper and committing to be accountable for the work (e.g., if you're the author of an analysis module, commit to fixing any issues that might come up)
The widely used ICMJE guidelines on authorship suggest that authorship be based on
- Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; AND
- Drafting the work or revising it critically for important intellectual content; AND
- Final approval of the version to be published; AND
- Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
@orbeckst feels that for software, almost all contributions are important and that it is not always straightforward to measure the impact of code contributions, certainly not by lines of code or number of PRs. See also "Pubwication of software papers, and authorship on them".
The journal prescribes the length of the article.
Some suggestions
- PLoS Comp Biol (software paper, requires presubmission inquiry; no length limitations; authorship policy requires "Substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data" + more), Open Access
- J Chem Theory Comput (no length limitations as far as I can see), immediate Gold Open Access available ($4,000)
- J Comp Chem (Software News and Updates, no length limitations(?)) Gold Open Access available
- Biophys J (Computational Tools, max 5 pages; see author guidelines (PDF)), Open Access available
- SoftwareX (max 6 pages), Open Access