Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDKIT tests sometimes fails #2958

Closed
IAlibay opened this issue Sep 29, 2020 · 9 comments
Closed

RDKIT tests sometimes fails #2958

IAlibay opened this issue Sep 29, 2020 · 9 comments

Comments

@IAlibay
Copy link
Member

IAlibay commented Sep 29, 2020

Expected behavior

All tests pass

Actual behavior

In some cases (I believe @lilyminium first mentioned this, and I've seen this occur a few times on appveyor), the TestRDKitConverter.test_assign_coordinates can fail: https://ci.appveyor.com/project/orbeckst/mdanalysis/builds/35434315/job/4fl3rfbyde60rpqp

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") current develop
  • Which version of Python (python -V)? In the captured case: 3.6
  • Which operating system? In the captured case: Win10 64

Notes

This test failure is not deterministic, and I'm not sure what causes it to be triggered, maybe just hardware?

@IAlibay IAlibay changed the title RDKIT tests sometimes fail due to machine precision RDKIT tests sometimes fails Sep 29, 2020
@IAlibay
Copy link
Member Author

IAlibay commented Sep 30, 2020

It's done it again but this time on travis: https://travis-ci.com/github/MDAnalysis/mdanalysis/jobs/393115290

Again python 3.6...

@IAlibay
Copy link
Member Author

IAlibay commented Sep 30, 2020

Another error, different test failing, not python 3.6: https://travis-ci.com/github/MDAnalysis/mdanalysis/jobs/393257863

Also tagging @cbouy who might have a better idea as to why we are getting random failures.

@cbouy
Copy link
Member

cbouy commented Oct 1, 2020

For test_assign_coordinates, I was gonna say it looks like the precision is at fault but now I just don't understand the shape mismatch (shapes (42, 3), (1877, 3) mismatch)... 42 is a rather specific number that I sometimes use in tests so maybe it has something to do with the converter cache somehow ? 42 also appears in the travis fail

@IAlibay IAlibay mentioned this issue Oct 13, 2020
2 tasks
@IAlibay
Copy link
Member Author

IAlibay commented Oct 14, 2020

See here for another occasional RDKIT test failure: #2983 (comment)

@IAlibay
Copy link
Member Author

IAlibay commented Oct 19, 2020

Here's another failure from #2997

_______________ TestRDKitConverter.test_monomer_info[resid 1-0] ________________
self = <MDAnalysisTests.coordinates.test_rdkit.TestRDKitConverter object at 0x7f35d867ad68>
pdb = <Universe with 1877 atoms>, sel_str = 'resid 1', atom_index = 0
    @pytest.mark.parametrize("sel_str, atom_index", [
        ("resid 1", 0),
        ("resname LYS and name NZ", 1),
        ("resid 34 and altloc B", 2),
    ])
    def test_monomer_info(self, pdb, sel_str, atom_index):
        sel = pdb.select_atoms(sel_str)
>       umol = sel.convert_to("RDKIT")
testsuite/MDAnalysisTests/coordinates/test_rdkit.py:196: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
package/MDAnalysis/core/groups.py:3152: in convert_to
    return converter().convert(self.atoms)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <RDKitConverter>, obj = <AtomGroup with 14 atoms>, cache = True
NoImplicit = True, max_iter = 200
    def convert(self, obj, cache=True, NoImplicit=True, max_iter=200):
        """Write selection at current trajectory frame to
        :class:`~rdkit.Chem.rdchem.Mol`.
    
        Parameters
        -----------
        obj : :class:`~MDAnalysis.core.groups.AtomGroup` or :class:`~MDAnalysis.core.universe.Universe`
    
        cache : bool
            Use a cached copy of the molecule's topology when available. To be
            used, the cached molecule and the new one have to be made from the
            same AtomGroup object (same id) and with the same arguments passed
            to the converter (with the exception of this `cache` argument)
        NoImplicit : bool
            Prevent adding hydrogens to the molecule
        max_iter : int
            Maximum number of iterations to standardize conjugated systems.
            See :func:`_rebuild_conjugated_bonds`
        """
        # parameters passed to atomgroup_to_mol and used by the cache
        kwargs = dict(NoImplicit=NoImplicit, max_iter=max_iter)
    
        try:
            from rdkit import Chem
        except ImportError:
            raise ImportError("RDKit is required for the RDKitConverter but "
                              "it's not installed. Try installing it with \n"
                              "conda install -c conda-forge rdkit")
        try:
            # make sure to use atoms (Issue 46)
            ag = obj.atoms
        except AttributeError:
            raise TypeError("No `atoms` attribute in object of type {}, "
                            "please use a valid AtomGroup or Universe".format(
                                type(obj))) from None
    
        if cache:
            # key used to search the cache
            key = f"<{id(ag):#x}>" + ",".join(f"{key}={value}"
                                            for key, value in kwargs.items())
            try:
                mol = self._cache[key]
            except KeyError:
                # only keep the current molecule in cache
                self._cache.clear()
                # create the topology
                self._cache[key] = mol = self.atomgroup_to_mol(ag, **kwargs)
            # continue on copy of the cached molecule
            mol = copy.deepcopy(mol)
        else:
            self._cache.clear()
            mol = self.atomgroup_to_mol(ag, **kwargs)
    
        # add a conformer for the current Timestep
        if hasattr(ag, "positions"):
            if np.isnan(ag.positions).any():
                warnings.warn("NaN detected in coordinates, the output "
                              "molecule will not have 3D coordinates assigned")
            else:
                # assign coordinates
                conf = Chem.Conformer(mol.GetNumAtoms())
                for atom in mol.GetAtoms():
                    idx = atom.GetIntProp("_MDAnalysis_index")
>                   xyz = ag.positions[idx].astype(float)
E                   IndexError: index 14 is out of bounds for axis 0 with size 14

@lilyminium
Copy link
Member

Still intermittently occurs on GH actions:

https://github.com/MDAnalysis/mdanalysis/pull/2927/checks?check_run_id=1514424973#step:10:297

(shapes (30, 3), (1877, 3) mismatch)

@lilyminium
Copy link
Member

Still happening, still weird

__________________ TestRDKitConverter.test_assign_coordinates __________________
[gw0] linux -- Python 3.6.13 /usr/share/miniconda/envs/test/bin/python

self = <MDAnalysisTests.coordinates.test_rdkit.TestRDKitConverter object at 0x7f58e45c9d30>
pdb = <Universe with 1877 atoms>

    def test_assign_coordinates(self, pdb):
        mol = pdb.atoms.convert_to("RDKIT")
        positions = mol.GetConformer().GetPositions()
        indices = sorted(mol.GetAtoms(),
                         key=lambda a: a.GetIntProp("_MDAnalysis_index"))
        indices = [a.GetIdx() for a in indices]
>       assert_almost_equal(positions[indices], pdb.atoms.positions)
E       AssertionError: 
E       Arrays are not almost equal to 7 decimals
E       
E       (shapes (30, 3), (1877, 3) mismatch)
E        x: array([[ 4.0099999e-01,  4.0138000e+01,  1.7790001e+01],
E              [-5.4000002e-01,  3.9113998e+01,  1.8240999e+01],
E              [-2.8000001e-02,  3.8396999e+01,  1.9490999e+01],...
E        y: array([[ 4.0100e-01,  4.0138e+01,  1.7790e+01],
E              [-5.4000e-01,  3.9114e+01,  1.8241e+01],
E              [-2.8000e-02,  3.8397e+01,  1.9491e+01],...

https://github.com/MDAnalysis/mdanalysis/runs/2111126476

@orbeckst
Copy link
Member

orbeckst commented Apr 8, 2021

If we're collecting strange/intermittent RDKit test failures then here's the one from #2950 (comment)

_____________ TestRDKitConverter.test_mol_from_selection[ILE-38-1] _____________
[gw0] linux -- Python 3.8.8 /usr/share/miniconda/envs/test/bin/python

self = <MDAnalysisTests.coordinates.test_rdkit.TestRDKitConverter object at 0x7f20c185ec70>
peptide = <AtomGroup with 155 atoms>, resname = 'ILE', n_atoms = 38
n_fragments = 1

    @pytest.mark.parametrize("resname, n_atoms, n_fragments", [
        ("PRO", 14, 1),
        ("ILE", 38, 1),
        ("ALA", 20, 2),
        ("GLY", 21, 3),
    ])
    def test_mol_from_selection(self, peptide, resname, n_atoms, n_fragments):
        mol = peptide.select_atoms("resname %s" % resname).convert_to("RDKIT")
>       assert n_atoms == mol.GetNumAtoms()
E       assert 38 == 1
E        +  where 1 = <bound method GetNumAtoms of <rdkit.Chem.rdchem.Mol object at 0x7f20b9a6a280>>()
E        +    where <bound method GetNumAtoms of <rdkit.Chem.rdchem.Mol object at 0x7f20b9a6a280>> = <rdkit.Chem.rdchem.Mol object at 0x7f20b9a6a280>.GetNumAtoms

/home/runner/work/mdanalysis/mdanalysis/testsuite/MDAnalysisTests/coordinates/test_rdkit.py:186: AssertionError

@cbouy
Copy link
Member

cbouy commented Apr 19, 2021

I believe I finally understand why this happens thanks to #3235, and the RDKitConverter cache system is likely our culprit here. It should be fixed with #2942

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants