Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide consistent PRNG interface for all MDA tools that use random numbers #1933

Open
zemanj opened this issue Jun 12, 2018 · 1 comment
Open

Comments

@zemanj
Copy link
Member

zemanj commented Jun 12, 2018

Expected behaviour

All MDA tools employing random numbers should provide a consistent interface allowing to assign a seed for the employed pseudo-random number generator (PRNG) or even a PRNG state. If possible, numpy's random module should be used by default. This has already been proposed by @kain88-de in PR #1932.

A good example for how this could be done nicely already exists in the initializer of the class MDAnalysis.analysis.encore.clustering.ClusteringMethod.KMeans:

def __init__(self, ..., random_state=None, ...)
...

Here, the kwarg random_state accepts either an int serving as a seed or a numpy.random.RandomState object. By default, the global numpy PRNG state is used.

Actual behaviour

The way random numbers are handled in MDA varies widely, and implementation range from good examples as the above one to rather "ugly" ones such as in the C-level part of the MDAnalysis.analysis.encore.dimensionality_reduction.stochasticproxembed extension:

double CStochasticProximityEmbedding(...) {
    ...
    srand(time(NULL)+getpid()*getpid());
    for (int i=0; i<nelem*dim; i++) {
        d_coords[i] = (double) rand() / (double) RAND_MAX;
    }
    ...
}

Here, the random number quality is poor, and reproducing previous results is virtually impossible.

Affected parts of the MDA code:

At the Python level:

  • analysis/encore/bootstrap.py
  • analysis/encore/clustering/ClusteringMethod.py
  • analysis/hole.py (no possibility to pass PRNG state to external executable)
  • lib/transformations.py

At the C level:

  • analysis/encore/clustering/src/ap.c
  • analysis/encore/dimensionality_reduction/src/spe.c
  • lib/src/transformations/transformations.c (will be tricky)

To the best of my knowledge, providing access to numpy's global PRNG at the C-level is not easily possible. Unless we find a good way of exposing numpy.random.mtrand to the C-level, we'll have to port (at least parts of) the MDA C-level code to Cython.

Possible caveats:

  • Performance
  • Thread safety

Current version of MDAnalysis:

0.18.1-dev

@zemanj zemanj changed the title Provide consistent PRNG interface for all MDA tools using random numbers Provide consistent PRNG interface for all MDA tools that use random numbers Jun 12, 2018
@orbeckst
Copy link
Member

orbeckst commented Jul 5, 2018

Related to MDAnalysis/mdaencore#37 - I think... at least if one starts working on analysis.encore's random number handling then they should take the discussion in MDAnalysis/mdaencore#37 into account.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants