
MDAnalysis 1.0 paper initial notes

Oliver Beckstein edited this page Jul 20, 2018 · 19 revisions

We are slowly moving towards MDAnalysis 1.0. This release will freeze the API (until MDAnalysis 2.0) and guarantee backwards compatibility throughout 1.x. We want to write a paper that describes the library and also gives more recent contributors a chance to get proper academic credit.

This wiki page should act as a starting point for gathering ideas and hashing out the broad structure. Once we get started in earnest, we will create a separate repository.

Topics

Introduction

  • domain description
    • challenges and requirements
  • history and other similar packages

Design and Structure of the library

Design

Philosophy: what lessons did we learn between the initial release and 1.0, and how did they affect the design?

  • object oriented
  • pythonic
  • interactive
  • interoperable

Organization

Understanding the code base

  • core data structures
  • library structure
    • core, topology system, "lib"
    • coordinates and topology readers/writers (+ maybe briefly selection writers); note on random access in trajectories
      • trajectory formats, topology formats (table)
      • special topics
        • MemoryReader
        • download from PDB with fetch_mmtf()
    • analysis: overview

Capabilities

This should answer the reader's question "What can MDAnalysis do for me?"

(Ordering of topics? Should the "developer"-oriented topics such as AtomGroup come first, or the "user"-oriented ones, starting with analysis?)

Analysis

Ready-made building blocks with a common API

  • mention anything already published (ENCORE, water analysis, ...)
  • highlights? – add anything here that you'd like to write about

Working with MD data: trajectories, topologies, etc

Dealing with MD data in a unified way is the core task of MDAnalysis. We therefore support many commonly used formats (and some uncommonly used ones, too):

  • topology formats
  • trajectory formats
    • special topics
      • MemoryReader
      • ChainReader
      • download from PDB with fetch_mmtf()
    • possibly some benchmarks on reading/writing?
  • selection writers (brief)
  • auxiliaries
  • on-the-fly transformations

Atom selection and working with AtomGroups

  • hierarchy of containers (Atom, AtomGroup, Residue, ResidueGroup, Segment, SegmentGroup) + fragments; indexing, slicing, groupby
  • selection language
  • updating AtomGroup
  • highlighting some AtomGroup or Residue methods

Enabling algorithms

What do we use under the hood?

  • distance calculations
  • RMSD (just mention QCPROT)
  • PBC treatment (hopefully consistent...)
    • user interface
    • algorithms (and perhaps benchmarks to show why we chose what we chose)
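As a point of reference for the PBC item above, the minimum image convention for an orthorhombic box can be sketched in plain Python. This is an illustrative sketch only and not MDAnalysis code: the function name `minimum_image_distance` is hypothetical, the box is assumed to be orthorhombic with positive edge lengths, and triclinic boxes are not handled.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    a, b: sequences of three floats (Cartesian coordinates)
    box:  sequence of three positive edge lengths (Lx, Ly, Lz)

    Hypothetical helper for illustration; not part of MDAnalysis.
    """
    squared = 0.0
    for ai, bi, length in zip(a, b, box):
        d = bi - ai
        # Shift the separation by whole box lengths so that it
        # falls within [-length/2, length/2] (nearest periodic image).
        d -= length * round(d / length)
        squared += d * d
    return math.sqrt(squared)


# Two points 8 units apart along x in a box of edge 10:
# the nearest periodic image is only 2 units away.
d_pbc = minimum_image_distance((1.0, 0.0, 0.0), (9.0, 0.0, 0.0), (10.0, 10.0, 10.0))
```

A paper section on PBC treatment would presumably compare this simple per-pair approach with the algorithms actually chosen for the library, which is where the benchmarks mentioned above would fit.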

Development process and Community

  • development (CI, testing, PR/review/merge)
  • broad community (mailing lists, issue tracker), conferences, workshops
  • code of conduct

Authorship model

We need to decide how to handle authorship. Possible models:

  • select few (core developers with recent contributions)
  • all core developers, past and present
  • anyone who has contributed

@orbeckst feels that authorship should require, at a minimum:

  1. code contributions (commits to develop/master)
  2. participation in paper writing

Journal

The journal prescribes the length of the article.

Some suggestions
