You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There are a few open issues on documentation, but it seems to mostly focus on tutorials rather than API reference structure -- do let me know if I've missed somewhere this would be better located.
Is your feature request related to a problem? Please describe.
There's a lot of documentation in the code itself that isn't necessarily where it belongs, mostly in Class.__init__ vs Class docstrings. You've noted before that you have different kinds of users, who can be mostly grouped into MD-applications people and developer people. Applications people will be looking for examples in the tutorials, while developers will be looking at the API to understand how the code itself works. Having examples and notes in __init__(), therefore, is unexpected for both and clutters up the code. On the flip side, the class docstring is not where I would first go to look for parameters to pass to __init__.
Describe the solution you'd like
Move long Examples etc. to tutorial .rst pages. Keep method docstrings limited to parameters that the methods take, and what the method returns. Keep class docstrings limited to description, short notes/examples, references, and attributes.
Have self-documenting code with clear names and explicit arguments instead of kwargs (although I see that PR More user-friendly docs (Addressing Issue #1175) #1190 is already addressing that). Splitting up big methods into multiple smaller ones would also help interpretability. For example, Universe.__init__ could be turned into several defined class methods: Universe.from_files, Universe.from_streams, etc.
Additional context
Here's an example in the RMSF.__init__ docstring:
r"""Parameters
----------
atomgroup : AtomGroup
Atoms for which RMSF is calculated
start : int (optional)
starting frame, default None becomes 0.
stop : int (optional)
Frame index to stop analysis. Default: None becomes
n_frames. Iteration stops *before* this frame number,
which means that the trajectory would be read until the end.
step : int (optional)
step between frames, default None becomes 1.
verbose : bool (optional)
Show detailed progress of the calculation if set to ``True``; the
default is ``False``.
Raises
------
ValueError
raised if negative values are calculated, which indicates that a
numerical overflow or underflow occured
Notes
-----
The root mean square fluctuation of an atom :math:`i` is computed as the
time average
.. math::
\rho_i = \sqrt{\left\langle (\mathbf{x}_i - \langle\mathbf{x}_i\rangle)^2 \right\rangle}
No mass weighting is performed.
This method implements an algorithm for computing sums of squares while
avoiding overflows and underflows [Welford1962]_.
Examples
--------
In this example we calculate the residue RMSF fluctuations by analyzing
the :math:`\text{C}_\alpha` atoms. First we need to fit the trajectory
to the average structure as a reference. That requires calculating the
average structure first. Because we need to analyze and manipulate the
same trajectory multiple times, we are going to load it into memory
using the :mod:`~MDAnalysis.coordinates.MemoryReader`. (If your
trajectory does not fit into memory, you will need to :ref:`write out
intermediate trajectories <writing-trajectories>` to disk or
:ref:`generate an in-memory universe
<creating-in-memory-trajectory-label>` that only contains, say, the
protein)::
import MDAnalysis as mda
from MDAnalysis.analysis import align
from MDAnalysis.tests.datafiles import TPR, XTC
u = mda.Universe(TPR, XTC, in_memory=True)
protein = u.select_atoms("protein")
# 1) need a step to center and make whole: this trajectory
# contains the protein being split across periodic boundaries
#
# TODO
# 2) fit to the initial frame to get a better average structure
# (the trajectory is changed in memory)
prealigner = align.AlignTraj(u, select="protein and name CA", in_memory=True).run()
# 3) reference = average structure
reference_coordinates = u.trajectory.timeseries(asel=protein).mean(axis=1)
# make a reference structure (need to reshape into a 1-frame "trajectory")
reference = mda.Merge(protein).load_new(
reference_coordinates[:, None, :], order="afc")
We created a new universe ``reference`` that contains a single frame
with the averaged coordinates of the protein. Now we need to fit the
whole trajectory to the reference by minimizing the RMSD. We use
:class:`MDAnalysis.analysis.align.AlignTraj`::
aligner = align.AlignTraj(u, reference, select="protein and name CA", in_memory=True).run()
The trajectory is now fitted to the reference (the RMSD is stored as
`aligner.rmsd` for further inspection). Now we can calculate the RMSF::
from MDAnalysis.analysis.rms import RMSF
calphas = protein.select_atoms("name CA")
rmsfer = RMSF(calphas, verbose=True).run()
and plot::
import matplotlib.pyplot as plt
plt.plot(calphas.resnums, rmsfer.rmsf)
References
----------
.. [Welford1962] B. P. Welford (1962). "Note on a Method for
Calculating Corrected Sums of Squares and Products." Technometrics
4(3):419-420.
.. versionadded:: 0.11.0
.. versionchanged:: 0.16.0
refactored to fit with AnalysisBase API
.. deprecated:: 0.16.0
the keyword argument `quiet` is deprecated in favor of `verbose`.
.. versionchanged:: 0.17.0
removed unused keyword `weights`
"""
The method itself is two lines. In my opinion, everything after the Raises section (except the version notes) belongs in the class docstring, or even better, in a separate tutorial page somewhere. Plotting with matplotlib doesn't have anything to do with RMSF. Actually figuring out how to initialise the class (because the arguments are hidden away in kwargs) becomes quite a chore.
The Universe docstring is almost the exact opposite format (and these two systems are inconsistent). The __init__ method has no docstring. This is the class documentation:
"""The MDAnalysis Universe contains all the information describing the system.
The system always requires a *topology* file --- in the simplest case just
a list of atoms. This can be a CHARMM/NAMD PSF file or a simple coordinate
file with atom informations such as XYZ, PDB, Gromacs GRO, or CHARMM
CRD. See :ref:`Supported topology formats` for what kind of topologies can
be read.
A trajectory provides coordinates; the coordinates have to be ordered in
the same way as the list of atoms in the topology. A trajectory can be a
single frame such as a PDB, CRD, or GRO file, or it can be a MD trajectory
(in CHARMM/NAMD/LAMMPS DCD, Gromacs XTC/TRR, or generic XYZ format). See
:ref:`Supported coordinate formats` for what can be read as a
"trajectory".
As a special case, when the topology is a file that contains atom
information *and* coordinates (such as XYZ, PDB, GRO or CRD, see
:ref:`Supported coordinate formats`) then the coordinates are immediately
loaded from the "topology" file unless a trajectory is supplied.
Examples for setting up a universe::
u = Universe(topology, trajectory) # read system from file(s)
u = Universe(pdbfile) # read atoms and coordinates from PDB or GRO
u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories
u = Universe(topology, traj1, traj2, ...) # read from multiple trajectories
Load new data into a universe (replaces old trajectory and does *not* append)::
u.load_new(trajectory) # read from a new trajectory file
Select atoms, with syntax similar to CHARMM (see
:class:`~Universe.select_atoms` for details)::
u.select_atoms(...)
Parameters
----------
topology : str, Topology object or stream
A CHARMM/XPLOR PSF topology file, PDB file or Gromacs GRO file; used to
define the list of atoms. If the file includes bond information,
partial charges, atom masses, ... then these data will be available to
MDAnalysis. A "structure" file (PSF, PDB or GRO, in the sense of a
topology) is always required. Alternatively, an existing
:class:`MDAnalysis.core.topology.Topology` instance may also be given.
topology_format
Provide the file format of the topology file; ``None`` guesses it from
the file extension [``None``] Can also pass a subclass of
:class:`MDAnalysis.topology.base.TopologyReaderBase` to define a custom
reader to be used on the topology file.
format
Provide the file format of the coordinate or trajectory file; ``None``
guesses it from the file extension. Note that this keyword has no
effect if a list of file names is supplied because the "chained" reader
has to guess the file format for each individual list member.
[``None``] Can also pass a subclass of
:class:`MDAnalysis.coordinates.base.ProtoReader` to define a custom
reader to be used on the trajectory file.
all_coordinates : bool
If set to ``True`` specifies that if more than one filename is passed
they are all to be used, if possible, as coordinate files (employing a
:class:`MDAnalysis.coordinates.chain.ChainReader`). [``False``] The
default behavior is to take the first file as a topology and the
remaining as coordinates. The first argument will always always be used
to infer a topology regardless of *all_coordinates*. This parameter is
ignored if only one argument is passed.
guess_bonds : bool, optional
Once Universe has been loaded, attempt to guess the connectivity
between atoms. This will populate the .bonds .angles and .dihedrals
attributes of the Universe.
vdwradii : dict, optional
For use with *guess_bonds*. Supply a dict giving a vdwradii for each
atom type which are used in guessing bonds.
is_anchor : bool, optional
When unpickling instances of
:class:`MDAnalysis.core.groups.AtomGroup` existing Universes are
searched for one where to anchor those atoms. Set to ``False`` to
prevent this Universe from being considered. [``True``]
anchor_name : str, optional
Setting to other than ``None`` will cause
:class:`MDAnalysis.core.groups.AtomGroup` instances pickled from the
Universe to only unpickle if a compatible Universe with matching
*anchor_name* is found. Even if *anchor_name* is set *is_anchor* will
still be honored when unpickling.
transformations: function or list, optional
Provide a list of transformations that you wish to apply to the
trajectory upon reading. Transformations can be found in
:mod:`MDAnalysis.transformations`, or can be user-created.
in_memory
After reading in the trajectory, transfer it to an in-memory
representations, which allow for manipulation of coordinates.
in_memory_step
Only read every nth frame into in-memory representation.
continuous : bool, optional
The `continuous` option is used by the
:mod:`ChainReader<MDAnalysis.coordinates.chain>`, which contains the
functionality to treat independent trajectory files as a single virtual
trajectory.
Attributes
----------
trajectory
currently loaded trajectory reader;
dimensions
current system dimensions (simulation unit cell, if set in the
trajectory)
atoms, residues, segments
master Groups for each topology level
bonds, angles, dihedrals
master ConnectivityGroups for each connectivity type
"""
The Parameters section is much clearer with the __init__ method. Another issue here is that the parameters specified don't line up with the examples; they go straight from topology to topology_format, but the examples show that you can pass in coordinate files too:
u = Universe(topology, trajectory) # read system from file(s)
u = Universe(pdbfile) # read atoms and coordinates from PDB or GRO
u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories
u = Universe(topology, traj1, traj2, ...) # read from multiple trajectories
The text was updated successfully, but these errors were encountered:
These are very valid criticisms, and good suggestions. If you want to open a few PRs for these changes these would be very welcome!
I would recommend however:
branch from develop not master, master is the state of the last release, whereas develop is the current cutting edge (which becomes master when I make releases).
try and only include one major change per pull request, this will make reviewing easier & faster. I would rather review 5x 200 line changes than 1x 1,000 line change :p
There are a few open issues on documentation, but it seems to mostly focus on tutorials rather than API reference structure -- do let me know if I've missed somewhere this would be better located.
Is your feature request related to a problem? Please describe.
There's a lot of documentation in the code itself that isn't necessarily where it belongs, mostly in
Class.__init__
vsClass
docstrings. You've noted before that you have different kinds of users, who can be mostly grouped into MD-applications people and developer people. Applications people will be looking for examples in the tutorials, while developers will be looking at the API to understand how the code itself works. Having examples and notes in__init__()
, therefore, is unexpected for both and clutters up the code. On the flip side, the class docstring is not where I would first go to look for parameters to pass to__init__
.Describe the solution you'd like
Move long
Examples
etc. to tutorial.rst
pages. Keep method docstrings limited to parameters that the methods take, and what the method returns. Keep class docstrings limited to description, short notes/examples, references, and attributes.Have self-documenting code with clear names and explicit arguments instead of
kwargs
(although I see that PR More user-friendly docs (Addressing Issue #1175) #1190 is already addressing that). Splitting up big methods into multiple smaller ones would also help interpretability. For example,Universe.__init__
could be turned into several defined class methods:Universe.from_files
,Universe.from_streams
, etc.Additional context
Here's an example in the
RMSF.__init__
docstring:The method itself is two lines. In my opinion, everything after the
Raises
section (except the version notes) belongs in the class docstring, or even better, in a separate tutorial page somewhere. Plotting with matplotlib doesn't have anything to do with RMSF. Actually figuring out how to initialise the class (because the arguments are hidden away inkwargs
) becomes quite a chore.The
Universe
docstring is almost the exact opposite format (and these two systems are inconsistent). The__init__
method has no docstring. This is the class documentation:The
Parameters
section is much clearer with the__init__
method. Another issue here is that the parameters specified don't line up with the examples; they go straight fromtopology
totopology_format
, but the examples show that you can pass in coordinate files too:The text was updated successfully, but these errors were encountered: