Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix InterRDF_s when given density=True #2812

Merged
merged 2 commits into from
Jul 12, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ The rules for this file:

------------------------------------------------------------------------------
orbeckst marked this conversation as resolved.
Show resolved Hide resolved
??/??/?? tylerjereddy, richardjgowers, IAlibay, hmacdope, orbeckst, cbouy,
lilyminium, daveminh, jbarnoud, yuxuanzhuang
lilyminium, daveminh, jbarnoud, yuxuanzhuang, VOD555

* 2.0.0

Expand All @@ -24,11 +24,14 @@ Fixes
* TOPParser no longer guesses elements when missing atomic number records
(Issues #2449, #2651)
* Testsuite does not any more matplotlib.use('agg') (#2191)
* In ChainReader, read_frame does not trigger change of iterating position.
* In ChainReader, read_frame does not trigger change of iterating position.
(Issue #2723, PR #2815)
* TRZReader now checks `n_atoms` on reading. (Issue #2817, PR #2820)
* TRZWriter now writes `n_atoms` to the file. (Issue #2817, PR #2820)

* rdf.InterRDF_s density keyword documented and now gives correct results for
density=True; the keyword was available since 0.19.0 but with incorrect
semantics and not documented and did not produce correct results (Issue
#2811, PR #2812)
Comment on lines +31 to +34
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the meaning of the density keyword so that it makes more sense. Given that we never documented it, this should be ok and I will classify it as a fix so that it can go into 1.0.1 PR #2798


Enhancements
* Added the RDKitParser which creates a `core.topology.Topology` object from
Expand All @@ -44,7 +47,7 @@ Enhancements
Changes
* deprecated NumPy type aliases have been replaced with their actual types
(see upstream NumPy PR 14882), and our pytest configuration will raise an
error for any violation when testing with development NumPy builds
error for any violation when testing with development NumPy builds
(`1.20.0`)
* Changes development status from Beta to Mature (Issue #2773)
* Removes deprecated MDAnalysis.analysis.hole, please use
Expand Down
158 changes: 125 additions & 33 deletions package/MDAnalysis/analysis/rdf.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,8 @@
`pair distribution functions`_ (`radial distribution functions`_ or "RDF").
The RDF :math:`g_{ab}(r)` between types of particles :math:`a` and :math:`b` is

.. _equation-gab:

.. math::

g_{ab}(r) = (N_{a} N_{b})^{-1} \sum_{i=1}^{N_a} \sum_{j=1}^{N_b}
Expand All @@ -46,6 +48,8 @@

and the average number of :math:`b` particles within radius :math:`r`

.. _equation-countab:

.. math::

N_{ab}(r) = \rho G_{ab}(r)
Expand All @@ -55,6 +59,15 @@
the first solvation shell :math:`N(r_1)` where :math:`r_1` is the position of
the first minimum in :math:`g(r)`.

In :class:`InterRDF_s`, we provide an option `density`. When `density` is
``False``, it will return the RDF :math:`g_{ab}(r)`; when `density` is
``True``, it will return the density of particle :math:`b` in a shell at
distance :math:`r` around a :math:`a` particle, which is
Comment on lines +62 to +65
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that density=False now means "calculate g(r)" and density=True means "calculate density".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we still need another PR to change the default density, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this PR switched the behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above #2812 (comment)


.. _equation-nab:

.. math::
n_{ab}(r) = \rho g_{ab}(r)

.. _`pair distribution functions`:
https://en.wikipedia.org/wiki/Pair_distribution_function
Expand All @@ -68,7 +81,7 @@
:class:`InterRDF` is a tool to calculate average radial distribution functions
between two groups of atoms. Suppose we have two AtomGroups ``A`` and
``B``. ``A`` contains atom ``A1``, ``A2``, and ``B`` contains ``B1``,
``B2``. Give ``A`` and ``B`` to class:`InterRDF`, the output will be the
``B2``. Given ``A`` and ``B`` to :class:`InterRDF`, the output will be the
average of RDFs between ``A1`` and ``B1``, ``A1`` and ``B2``, ``A2`` and
``B1``, ``A2`` and ``B2``. A typical application is to calculate the RDF of
solvent with itself or with another solute.
Expand All @@ -77,28 +90,82 @@
:members:
:inherited-members:

.. attribute:: bins

:class:`numpy.ndarray` of the centers of the `nbins` histogram
bins.

.. attribute:: edges

:class:`numpy.ndarray` of the `nbins + 1` edges of the histogram
bins.

.. attribute:: rdf

:class:`numpy.ndarray` of the :ref:`radial distribution
function<equation-gab>` values for the :attr:`bins`.

.. attribute:: count

:class:`numpy.ndarray` representing the radial histogram, i.e.,
the raw counts, for all :attr:`bins`.


Site-specific radial distribution function
------------------------------------------

:class:`InterRDF_s` calculates site-specific radial distribution
functions. Instead of two groups of atoms it takes as input a list of pairs of
AtomGroup, ``[[A, B], [C, D], ...]``. Give the same ``A`` and ``B`` to
:class:`InterRDF_s`, the output will be a list of RDFs between ``A1`` and
``B1``, ``A1`` and ``B2``, ``A2`` and ``B1``, ``A2`` and ``B2`` (and similarly
for ``C`` and ``D``). These site-specific radial distribution functions are
typically calculated if one is interested in the solvation shells of a ligand
in a binding site or the solvation of specific residues in a protein. A common
use case is to choose ``A`` and ``C`` to be AtomGroups that only contain a
single atom and ``W`` all solvent molecules: ``InterRDF_s(u, [[A, W], [B,
W]])`` will then produce the RDF of solvent around atom ``A[0]`` and around
atom ``B[0]``.

AtomGroup, ``[[A, B], [C, D], ...]``. Given the same ``A`` and ``B`` to
:class:`InterRDF_s`, the output will be a list of individual RDFs between
``A1`` and ``B1``, ``A1`` and ``B2``, ``A2`` and ``B1``, ``A2`` and ``B2`` (and
similarly for ``C`` and ``D``). These site-specific radial distribution
functions are typically calculated if one is interested in the solvation shells
of a ligand in a binding site or the solvation of specific residues in a
protein.

.. autoclass:: InterRDF_s
:members:
:inherited-members:

.. attribute:: bins

:class:`numpy.ndarray` of the centers of the `nbins` histogram
bins; all individual site-specific RDFs have the same bins.

.. attribute:: edges

:class:`numpy.ndarray` of the `nbins + 1` edges of the histogram
bins; all individual site-specific RDFs have the same bins.

.. attribute:: rdf

:class:`list` of the site-specific :ref:`radial distribution
functions<equation-gab>` or :ref:`density
functions<equation-nab>` for the :attr:`bins`. The list contains
``len(ags)`` entries. Each entry for the ``i``-th pair ``[A, B]
= ags[i]`` in `ags` is a :class:`numpy.ndarray` with shape
``(len(A), len(B))``, i.e., a stack of RDFs. For example,
``rdf[i][0, 2]`` is the RDF between atoms ``A[0]`` and ``B[2]``.

.. attribute:: count

:class:`list` of the site-specific radial histograms, i.e., the
raw counts, for all :attr:`bins`. The data have the same
structure as :attr:`rdf` except that the arrays contain the raw
counts.

.. attribute:: cdf

:class:`list` of the site-specific :ref:`cumulative
counts<equation-countab>`, for all :attr:`bins`. The data have the same
structure as :attr:`rdf` except that the arrays contain the cumulative
counts.

This attribute only exists after :meth:`get_cdf` has been run.




.. Not Implemented yet:
.. - Structure factor?
Expand All @@ -113,9 +180,17 @@


class InterRDF(AnalysisBase):
"""Intermolecular pair distribution function
r"""Intermolecular pair distribution function

The :ref:`radial distribution function<equation-gab>` is calculated by
histogramming distances between all particles in `g1` and `g2` while taking
periodic boundary conditions into account via the minimum image
convention.

The `exclusion_block` keyword may be used to exclude a set of distances
from the calculations.

InterRDF(g1, g2, nbins=75, range=(0.0, 15.0))
Results are available in the attributes :attr:`rdf` and :attr:`count`.

Arguments
---------
Expand Down Expand Up @@ -230,7 +305,7 @@ def _conclude(self):


class InterRDF_s(AnalysisBase):
"""Site-specific intermolecular pair distribution function
r"""Site-specific intermolecular pair distribution function

Arguments
---------
Expand All @@ -243,6 +318,18 @@ class InterRDF_s(AnalysisBase):
Number of bins in the histogram [75]
range : tuple or list (optional)
The size of the RDF [0.0, 15.0]
density : bool (optional)
``False``: calculate :math:`g_{ab}(r)`; ``True``: calculate
the true :ref:`single particle density<equation-nab>`
:math:`n_{ab}(r)`. The default is ``False``.

.. versionadded:: 1.0.1

This keyword was available since 0.19.0 but was not
documented. Furthermore, it had the opposite
meaning. Since 1.0.1 it is officially supported as
documented.


Example
-------
Expand All @@ -265,19 +352,20 @@ class InterRDF_s(AnalysisBase):

Results are available through the :attr:`bins` and :attr:`rdf` attributes::

plt.plot(rdf.bins, rdf.rdf[0][0][0])
plt.plot(rdf.bins, rdf.rdf[0][0, 0])

(Which plots the rdf between the first atom in ``s1`` and the first atom in
``s2``)

To generate the *cumulative distribution function* (cdf), use the
To generate the *cumulative distribution function* (cdf) in the sense of
"particles within radius :math:`r`", i.e., :math:`N_{ab}(r)`, use the
:meth:`~InterRDF_s.get_cdf` method ::

cdf = rdf.get_cdf()

Results are available through the :attr:'cdf' attribute::
Results are available through the :attr:`cdf` attribute::

plt.plot(rdf.bins, rdf.cdf[0][0][0])
plt.plot(rdf.bins, rdf.cdf[0][0, 0])

(Which plots the cdf between the first atom in ``s1`` and the first atom in
``s2``)
Expand All @@ -291,7 +379,7 @@ class InterRDF_s(AnalysisBase):

"""
def __init__(self, u, ags,
nbins=75, range=(0.0, 15.0), density=True, **kwargs):
nbins=75, range=(0.0, 15.0), density=False, **kwargs):
super(InterRDF_s, self).__init__(u.universe.trajectory, **kwargs)

# List of pairs of AtomGroups
Expand Down Expand Up @@ -325,8 +413,8 @@ def _single_frame(self):
box=self.u.dimensions)

for j, (idx1, idx2) in enumerate(pairs):
self.count[i][idx1, idx2, :] += np.histogram(dist[j],
**self.rdf_settings)[0]
self.count[i][idx1, idx2, :] += np.histogram(dist[j],
**self.rdf_settings)[0]

self.volume += self._ts.volume

Expand All @@ -342,31 +430,35 @@ def _conclude(self):

for i, (ag1, ag2) in enumerate(self.ags):
# Number of each selection
nA = len(ag1)
nB = len(ag2)
N = nA * nB
indices.append([ag1.indices, ag2.indices])

# Average number density
box_vol = self.volume / self.n_frames
density = N / box_vol
orbeckst marked this conversation as resolved.
Show resolved Hide resolved
density = 1 / box_vol

if self._density:
rdf.append(self.count[i] / (density * vol * self.n_frames))
else:
rdf.append(self.count[i] / (vol * self.n_frames))
else:
rdf.append(self.count[i] / (density * vol * self.n_frames))

self.rdf = rdf
self.indices = indices

def get_cdf(self):
"""Calculate the cumulative distribution functions (CDF) for all sites.
Note that this is the actual count within a given radius, i.e.,
:math:`N(r)`.
r"""Calculate the cumulative counts for all sites.

This is the :ref:`cumulative count<equation-countab>` within a given
radius, i.e., :math:`N_{ab}(r)`.
Comment on lines +450 to +451
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we calculating the counts and calling it CDF? This is confusing...

I made it clear in the text but still: why did we do this instead of having separate methods that just do the thing that they are named after?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should have seperate functions to give CDF and cummulative N.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, new issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VOD555 can you raise issues for MDAnalysis and PMDA, please?


The result is returned and also stored in the attribute
:attr:`cdf`.


Returns
-------
cdf : list
list of arrays with the same structure as :attr:`rdf`
cdf : list
list of arrays with the same structure as :attr:`rdf`

"""
# Calculate cumulative distribution function
# Empty list to restore CDF
Expand Down
24 changes: 18 additions & 6 deletions testsuite/MDAnalysisTests/analysis/test_rdf_s.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
from numpy.testing import assert_almost_equal

import MDAnalysis as mda
from MDAnalysis.analysis.rdf import InterRDF_s
from MDAnalysis.analysis.rdf import InterRDF_s, InterRDF

from MDAnalysisTests.datafiles import GRO_MEMPROT, XTC_MEMPROT

Expand All @@ -38,16 +38,19 @@ def u():
@pytest.fixture(scope='module')
def sels(u):
s1 = u.select_atoms('name ZND and resid 289')
s2 = u.select_atoms('(name OD1 or name OD2) and resid 51 and sphzone 5.0 (resid 289)')
s2 = u.select_atoms(
'(name OD1 or name OD2) and resid 51 and sphzone 5.0 (resid 289)')
s3 = u.select_atoms('name ZND and (resid 291 or resid 292)')
s4 = u.select_atoms('(name OD1 or name OD2) and sphzone 5.0 (resid 291)')
ags = [[s1, s2], [s3, s4]]
return ags


@pytest.fixture(scope='module')
def rdf(u, sels):
return InterRDF_s(u, sels).run()


def test_nbins(u, sels):
rdf = InterRDF_s(u, sels, nbins=412).run()

Expand Down Expand Up @@ -90,15 +93,24 @@ def test_double_run(rdf):
assert len(rdf.count[0][0][1][rdf.count[0][0][1] == 5]) == 1
assert len(rdf.count[1][1][0][rdf.count[1][1][0] == 3]) == 1


def test_cdf(rdf):
rdf.get_cdf()
assert rdf.cdf[0][0][0][-1] == rdf.count[0][0][0].sum()/rdf.n_frames


@pytest.mark.parametrize("density, value", [
(True, 13275.775440503656),
(False, 0.021915460340071267)])

(None, 26551.55088100731), # default, like False (no kwarg, see below)
orbeckst marked this conversation as resolved.
Show resolved Hide resolved
(False, 26551.55088100731),
(True, 0.021915460340071267)])
def test_density(u, sels, density, value):
rdf = InterRDF_s(u, sels, density=density).run()
kwargs = {'density': density} if density is not None else {}
rdf = InterRDF_s(u, sels, **kwargs).run()
assert_almost_equal(max(rdf.rdf[0][0][0]), value)
if not density:
s1 = u.select_atoms('name ZND and resid 289')
s2 = u.select_atoms(
'name OD1 and resid 51 and sphzone 5.0 (resid 289)')
rdf_ref = InterRDF(s1, s2).run()
assert_almost_equal(rdf_ref.rdf, rdf.rdf[0][0][0])