Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate class docstrings and __init__ docstrings in API #2282

Closed
lilyminium opened this issue Jun 24, 2019 · 2 comments
Closed

Separate class docstrings and __init__ docstrings in API #2282

lilyminium opened this issue Jun 24, 2019 · 2 comments

Comments

@lilyminium
Copy link
Member

There are a few open issues on documentation, but it seems to mostly focus on tutorials rather than API reference structure -- do let me know if I've missed somewhere this would be better located.

Is your feature request related to a problem? Please describe.
There's a lot of documentation in the code itself that isn't necessarily where it belongs, mostly in Class.__init__ vs Class docstrings. You've noted before that you have different kinds of users, who can be mostly grouped into MD-applications people and developer people. Applications people will be looking for examples in the tutorials, while developers will be looking at the API to understand how the code itself works. Having examples and notes in __init__(), therefore, is unexpected for both and clutters up the code. On the flip side, the class docstring is not where I would first go to look for parameters to pass to __init__.

Describe the solution you'd like

  1. Move long Examples etc. to tutorial .rst pages. Keep method docstrings limited to parameters that the methods take, and what the method returns. Keep class docstrings limited to description, short notes/examples, references, and attributes.

  2. Have self-documenting code with clear names and explicit arguments instead of kwargs (although I see that PR More user-friendly docs (Addressing Issue #1175) #1190 is already addressing that). Splitting up big methods into multiple smaller ones would also help interpretability. For example, Universe.__init__ could be turned into several defined class methods: Universe.from_files, Universe.from_streams, etc.

Additional context
Here's an example in the RMSF.__init__ docstring:

r"""Parameters
        ----------
        atomgroup : AtomGroup
            Atoms for which RMSF is calculated
        start : int (optional)
            starting frame, default None becomes 0.
        stop : int (optional)
            Frame index to stop analysis. Default: None becomes
            n_frames. Iteration stops *before* this frame number,
            which means that the trajectory would be read until the end.
        step : int (optional)
            step between frames, default None becomes 1.
        verbose : bool (optional)
             Show detailed progress of the calculation if set to ``True``; the
             default is ``False``.

        Raises
        ------
        ValueError
             raised if negative values are calculated, which indicates that a
             numerical overflow or underflow occured


        Notes
        -----
        The root mean square fluctuation of an atom :math:`i` is computed as the
        time average

        .. math::

          \rho_i = \sqrt{\left\langle (\mathbf{x}_i - \langle\mathbf{x}_i\rangle)^2 \right\rangle}

        No mass weighting is performed.

        This method implements an algorithm for computing sums of squares while
        avoiding overflows and underflows [Welford1962]_.


        Examples
        --------

        In this example we calculate the residue RMSF fluctuations by analyzing
        the :math:`\text{C}_\alpha` atoms. First we need to fit the trajectory
        to the average structure as a reference. That requires calculating the
        average structure first. Because we need to analyze and manipulate the
        same trajectory multiple times, we are going to load it into memory
        using the :mod:`~MDAnalysis.coordinates.MemoryReader`. (If your
        trajectory does not fit into memory, you will need to :ref:`write out
        intermediate trajectories <writing-trajectories>` to disk or
        :ref:`generate an in-memory universe
        <creating-in-memory-trajectory-label>` that only contains, say, the
        protein)::

           import MDAnalysis as mda
           from MDAnalysis.analysis import align

           from MDAnalysis.tests.datafiles import TPR, XTC

           u = mda.Universe(TPR, XTC, in_memory=True)
           protein = u.select_atoms("protein")

           # 1) need a step to center and make whole: this trajectory
           #    contains the protein being split across periodic boundaries
           #
           # TODO

           # 2) fit to the initial frame to get a better average structure
           #    (the trajectory is changed in memory)
           prealigner = align.AlignTraj(u, select="protein and name CA", in_memory=True).run()

           # 3) reference = average structure
           reference_coordinates = u.trajectory.timeseries(asel=protein).mean(axis=1)
           # make a reference structure (need to reshape into a 1-frame "trajectory")
           reference = mda.Merge(protein).load_new(
                       reference_coordinates[:, None, :], order="afc")

        We created a new universe ``reference`` that contains a single frame
        with the averaged coordinates of the protein.  Now we need to fit the
        whole trajectory to the reference by minimizing the RMSD. We use
        :class:`MDAnalysis.analysis.align.AlignTraj`::

           aligner = align.AlignTraj(u, reference, select="protein and name CA", in_memory=True).run()

        The trajectory is now fitted to the reference (the RMSD is stored as
        `aligner.rmsd` for further inspection). Now we can calculate the RMSF::

           from MDAnalysis.analysis.rms import RMSF

           calphas = protein.select_atoms("name CA")
           rmsfer = RMSF(calphas, verbose=True).run()

        and plot::

           import matplotlib.pyplot as plt

           plt.plot(calphas.resnums, rmsfer.rmsf)



        References
        ----------
        .. [Welford1962] B. P. Welford (1962). "Note on a Method for
           Calculating Corrected Sums of Squares and Products." Technometrics
           4(3):419-420.


        .. versionadded:: 0.11.0
        .. versionchanged:: 0.16.0
           refactored to fit with AnalysisBase API
        .. deprecated:: 0.16.0
           the keyword argument `quiet` is deprecated in favor of `verbose`.
        .. versionchanged:: 0.17.0
           removed unused keyword `weights`

        """

The method itself is two lines. In my opinion, everything after the Raises section (except the version notes) belongs in the class docstring, or even better, in a separate tutorial page somewhere. Plotting with matplotlib doesn't have anything to do with RMSF. Actually figuring out how to initialise the class (because the arguments are hidden away in kwargs) becomes quite a chore.

The Universe docstring is almost the exact opposite format (and these two systems are inconsistent). The __init__ method has no docstring. This is the class documentation:

"""The MDAnalysis Universe contains all the information describing the system.

    The system always requires a *topology* file --- in the simplest case just
    a list of atoms. This can be a CHARMM/NAMD PSF file or a simple coordinate
    file with atom informations such as XYZ, PDB, Gromacs GRO, or CHARMM
    CRD. See :ref:`Supported topology formats` for what kind of topologies can
    be read.

    A trajectory provides coordinates; the coordinates have to be ordered in
    the same way as the list of atoms in the topology. A trajectory can be a
    single frame such as a PDB, CRD, or GRO file, or it can be a MD trajectory
    (in CHARMM/NAMD/LAMMPS DCD, Gromacs XTC/TRR, or generic XYZ format).  See
    :ref:`Supported coordinate formats` for what can be read as a
    "trajectory".

    As a special case, when the topology is a file that contains atom
    information *and* coordinates (such as XYZ, PDB, GRO or CRD, see
    :ref:`Supported coordinate formats`) then the coordinates are immediately
    loaded from the "topology" file unless a trajectory is supplied.

    Examples for setting up a universe::

       u = Universe(topology, trajectory)          # read system from file(s)
       u = Universe(pdbfile)                       # read atoms and coordinates from PDB or GRO
       u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories
       u = Universe(topology, traj1, traj2, ...)   # read from multiple trajectories

    Load new data into a universe (replaces old trajectory and does *not* append)::

       u.load_new(trajectory)                      # read from a new trajectory file

    Select atoms, with syntax similar to CHARMM (see
    :class:`~Universe.select_atoms` for details)::

       u.select_atoms(...)

    Parameters
    ----------
    topology : str, Topology object or stream
        A CHARMM/XPLOR PSF topology file, PDB file or Gromacs GRO file; used to
        define the list of atoms. If the file includes bond information,
        partial charges, atom masses, ... then these data will be available to
        MDAnalysis. A "structure" file (PSF, PDB or GRO, in the sense of a
        topology) is always required. Alternatively, an existing
        :class:`MDAnalysis.core.topology.Topology` instance may also be given.
    topology_format
        Provide the file format of the topology file; ``None`` guesses it from
        the file extension [``None``] Can also pass a subclass of
        :class:`MDAnalysis.topology.base.TopologyReaderBase` to define a custom
        reader to be used on the topology file.
    format
        Provide the file format of the coordinate or trajectory file; ``None``
        guesses it from the file extension. Note that this keyword has no
        effect if a list of file names is supplied because the "chained" reader
        has to guess the file format for each individual list member.
        [``None``] Can also pass a subclass of
        :class:`MDAnalysis.coordinates.base.ProtoReader` to define a custom
        reader to be used on the trajectory file.
    all_coordinates : bool
        If set to ``True`` specifies that if more than one filename is passed
        they are all to be used, if possible, as coordinate files (employing a
        :class:`MDAnalysis.coordinates.chain.ChainReader`). [``False``] The
        default behavior is to take the first file as a topology and the
        remaining as coordinates. The first argument will always always be used
        to infer a topology regardless of *all_coordinates*. This parameter is
        ignored if only one argument is passed.
    guess_bonds : bool, optional
        Once Universe has been loaded, attempt to guess the connectivity
        between atoms.  This will populate the .bonds .angles and .dihedrals
        attributes of the Universe.
    vdwradii : dict, optional
        For use with *guess_bonds*. Supply a dict giving a vdwradii for each
        atom type which are used in guessing bonds.
    is_anchor : bool, optional
        When unpickling instances of
        :class:`MDAnalysis.core.groups.AtomGroup` existing Universes are
        searched for one where to anchor those atoms. Set to ``False`` to
        prevent this Universe from being considered. [``True``]
    anchor_name : str, optional
        Setting to other than ``None`` will cause
        :class:`MDAnalysis.core.groups.AtomGroup` instances pickled from the
        Universe to only unpickle if a compatible Universe with matching
        *anchor_name* is found. Even if *anchor_name* is set *is_anchor* will
        still be honored when unpickling.
    transformations: function or list, optional
        Provide a list of transformations that you wish to apply to the 
        trajectory upon reading. Transformations can be found in 
        :mod:`MDAnalysis.transformations`, or can be user-created.
    in_memory
        After reading in the trajectory, transfer it to an in-memory
        representations, which allow for manipulation of coordinates.
    in_memory_step
        Only read every nth frame into in-memory representation.
    continuous : bool, optional
        The `continuous` option is used by the
        :mod:`ChainReader<MDAnalysis.coordinates.chain>`, which contains the
        functionality to treat independent trajectory files as a single virtual
        trajectory.

    Attributes
    ----------
    trajectory
        currently loaded trajectory reader;
    dimensions
        current system dimensions (simulation unit cell, if set in the
        trajectory)
    atoms, residues, segments
        master Groups for each topology level
    bonds, angles, dihedrals
        master ConnectivityGroups for each connectivity type

    """

The Parameters section is much clearer with the __init__ method. Another issue here is that the parameters specified don't line up with the examples; they go straight from topology to topology_format, but the examples show that you can pass in coordinate files too:


       u = Universe(topology, trajectory)          # read system from file(s)
       u = Universe(pdbfile)                       # read atoms and coordinates from PDB or GRO
       u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories
       u = Universe(topology, traj1, traj2, ...)   # read from multiple trajectories
@richardjgowers
Copy link
Member

Hi @lilyminium

These are very valid criticisms, and good suggestions. If you want to open a few PRs for these changes these would be very welcome!

I would recommend however:

  • branch from develop not master, master is the state of the last release, whereas develop is the current cutting edge (which becomes master when I make releases).
  • try and only include one major change per pull request, this will make reviewing easier & faster. I would rather review 5x 200 line changes than 1x 1,000 line change :p

@lilyminium
Copy link
Member Author

On the flip side, the class docstring is not where I would first go to look for parameters to pass to init.

Narrator: it was.

I now know of help(cls) and the logic behind putting those parameters in the class docstring, so closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants