Issue #260 Partial python 3 compatibility #658

jbarnoud · 2016-01-25T00:28:23Z

This pull request brings a bunch of small fixes to improve MDAnalysis compatibility with Python 3. MDAnalysis can now be imported on python 3, and some tests pass.

This PR, among other things deals, with:

some implicit relative imports; in python 3, imports must be either absolute or explicitly relative;
all calls to the print function; some were left as it in the docstrings, though;
map returning a generator on python 3
cStringIO being moved in the io module
cPickle being renamed pickle
calls to iteritems method of dict now pass through six (see Apply Six fixes for Python 3 #632)
generator do not have a next method anymore, instead one should use the next built-in;
other things, see the list of commits...

A few changes may have side effects.

First of all, the DCD and LAMMPS coordinate parsers are made optional on python 3 but not on python 2. In practice, MDAnalysis will fail to load the parsers but will continue to run. The TRZ coordinate parser is also made optional. Indeed, there is a small incompatibility with python 3 and I was not sure about how to fix it.

Python 3 changes the conditions for an object to be hashable (and therefore usable as a dict key or as part of a set). In python 2, a user-defined object is hashable unless the __hash__ method is overloaded to return None. It is assumed that two objects that are equal (a == b) will have the same hash. This assumption can break when a user overload the __eq__ method. This is why, in python 3, a user-defined object is not hashable by default is the __eq__ method get overloaded. MDAnalysis, relies on instances of TopologyObject to be hashable so they can be parts of set. Yet the __eq__ method is overloaded, so TopologyObject is not hashable in python 3. I wrote a __hash__ method for TopologyObject and I wrote it so as two instances that are equal have the same hash. This differs from the behavior of these objects in python 2. Before this PR, in python 2, if two instances involve the same atoms but in reverse order, then they are equal but they have a different hash. With this PR, these instances would be equal, and with the same hash (so only one of them will be kept in a set). I am not sure that it is the expected behavior, but is seemed to be what was expected from the documentation.

The following tests do pass:

analysis/test_distances.py
analysis/test_persistencelength.py
analysis/test_rdf.py
coordinates/test_dlpoly.py
coordinates/test_dms.py
coordinates/test_mol2.py
coordinates/test_netcdf.py
coordinates/test_pdbqt.py
coordinates/test_pqr.py
test_altloc.py
test_deprecated.py
test_imports.py
test_lammpsdata.py
test_log.py
test_meta.py
test_nuclinfo.py
test_persistence.py
test_qcprot.py
test_tprparser.py
test_units.py
test_util.py
topology/test_gro.py

The main reason for tests not to pass is that a lot of them rely on DCD test files. Since the DCD parser is not working on python 3, these tests do not pass.

An other reason for tests not to pass is that util.anyopen open text files as bytes rather than string. This is especially puzzling as I had to explicitly specify that TPR files had to be open as bytes. I am still investigating that point.

Besides fixing all the small things, here is what remains to be done:

fixing the anyopen problem
make nosetests skip all the tests that require a DCD file
add a warning at import so users know how experimental MDAnalysis on python 3 is
make TravisCI run the tests on python 3

I tried to understand each line I changed, but I may have overlooked things. So please, have a look.

Why is there a print here anyway?

The DCD parser is not compatible with python 3, this commit makes it optional on that version of python. As the LAMPPS coordinate parser uses the DCD parser, it is made optional as well. The TRZ parser requires some small modifications to be compatible. It is made optional until is can be tested.

See MDAnalysis#632

The unicode representation for the Angstrom symbol in the dictionary used for unit conversion was wrong. While the escaping sequence was correct, it was used as a series of bytes rather than like a unicode string. Also, the same key was created twice as two escaping sequences were used (utf-16 and utf-8) and they were synonymous. The tests were passing because they were testing against the same series of bytes. This commit use a proper unicode string as key and remove the redundant definition. It also adds a test with the actual unicode symbol rather than an escaping sequence.

On python 2, user defined object are hashable by default unless the __hash__ method is overloaded to return None. On python 3, user objects are not hashable if the __eq__ method is overloaded. This commit defines a __hash__ method that returns a hash based on the sorted indices of the atoms involved in the topology object. This should reproduce the behavior advertised in the topology object documentation. Yet, it is different from the previous behavior. In the previous behavior, two different objects with the same atoms were equal for __eq__ but different regarding the hash.

* The test for format guessing ignore DCD, DATA, and LAMMPS on python 3 * fix some call to the `map` function that do not return a list on python 3 * other minor fixes

In python 3, numpy integers are not instances of int anymore.

* replace seek(0L) by seek(0) * use six for StringIO

* use six for cPickle * use the `next` builtin rather than the `next` method

An odd failure about StringIO remains in test_streamio.py on Python 3.

Read the TPR file as binary (mode='rb').

In Python 3, map returns a generator rather than a list. Make coordinates/test_dl_poly.py pass on python 3

jbarnoud · 2016-01-25T00:33:38Z

Note also that tests cannot run with the --memleak option as it is not python 3 compatible.

richardjgowers · 2016-01-25T08:59:12Z

@jbarnoud this looks awesome. You can spin out the things that aren't easy into more issues, (eg I don't mind fixing TRZ once I know what doesn't work)

You can add in Py3 as an "allowed to fail" row in the matrix. I don't see any harm in having this on the develop build for now. It will let us see what needs doing to work towards passing
https://docs.travis-ci.com/user/customizing-the-build/#Rows-that-are-Allowed-to-Fail

richardjgowers · 2016-01-25T09:01:59Z

package/MDAnalysis/core/topologyobjects.py

+
+    def __hash__(self):
+        """Makes the object hashable"""
+        return hash(self._cmp_key())


I don't think a hash needs to be a string, so you could just do hash(self.indices) or maybe hash((self.__class__.__name__, self.indices)). I think this hash will happen a lot, so I'd rather it was fast (not constructing strings each time)

You're right. I changed the _cmp_key method accordingly.

Tests that relies on DCD, LAMMPS, or TRZ test files ends with an error because the reader is not supported on python 3 yet. This creates a lot of noise in the nose report that hides the other failures. Hereby we make these tests being skipped if the parser is not loaded. Note that this reduce dramatically the test coverage on python 3!

The python 3 tests are allowed to fail so they will not mark pull requests as broken.

@richardjgowers

Following the comment of @richardjgowers on MDAnalysis#658

Note however than some tests are skipped because they rely on files in unsupported formats.

richardjgowers · 2016-01-25T14:32:51Z

I wouldn't bother skipping all the tests that currently don't work, I'd rather keep them as failing so we know what to fix. Once you're ready/driven insane we can merge this and split it up into many issues

jbarnoud · 2016-01-25T14:50:17Z

@richardjgowers A lot of tests fail because the DCD reader is not compatible. It makes the report impossible to read because of the noise. Of course failing tests should be reported as failing, but in that particular instance it is the same error that get reported over and over.

I would like to fix the issue with anyopen as it would make a lot of tests pass. But it appears that I have proper work to do so I'll work on it latter. I'll open individual issues for the big problem that remain. Fel free to have a look at the changes in that PR.

richardjgowers · 2016-01-26T10:16:08Z

@jbarnoud ready to review & merge?

jbarnoud · 2016-01-26T10:18:03Z

@richardjgowers Yes. I'll do an other PR for what remains to be done.

richardjgowers · 2016-01-26T10:26:19Z

testsuite/MDAnalysisTests/test_util.py

+    if six.PY2:
+        # DCD, LAMMPS, and TRZ are not supported on Python 3 yet
+        formats += [
+            ('DATA', mda.topology.LAMMPSParser.DATAParser,


Can we get a separate issue for each of these? (I've done DCD)

kain88-de · 2016-01-26T10:33:38Z

testsuite/MDAnalysisTests/test_atomgroup.py



 class TestAtom(TestCase):
    """Tests of Atom."""

+    @dec.skipif(parser_not_found('DCD'),


isn't it ok to let all this stuff fail. We tell travis that a failed python3 test is ok for the moment.

Letting these test fail creates a lot of noise that make nose's report painful to read. Also all the errors generated by these tests are unrelated to what the test tries to asses. The correct semantics would be known failures, but they still display the traceback in the report which left the problem unresolved.

I think the amount of skipped tests is enough to show that there is a problem with the coverage. Also, as soon as the DCD parser works, these tests will not be skipped any more. And so even before we remove the explicit skip decorator.

Issue #260 Partial python 3 compatibility

jbarnoud added 15 commits January 25, 2016 00:24

Fix implicit relative import in the PDB parser

bd6663d

Fix print for py3 in LAMMPS topology parser

756ca76

Why is there a print here anyway?

Fix implicit relative import in test plugins

36a38a5

Replace iteritems with six.iteritems

7b1fded

See MDAnalysis#632

Replace print statement by print function

23b8cdd

Make tests from test_util.py pass on python 3

9f191b6

* The test for format guessing ignore DCD, DATA, and LAMMPS on python 3 * fix some call to the `map` function that do not return a list on python 3 * other minor fixes

Make test_altloc.py pass on Python 3

6a5c60f

In python 3, numpy integers are not instances of int anymore.

Male test_log.py pass on Python 3

fa990ec

* replace seek(0L) by seek(0) * use six for StringIO

Make test_persistance.py pass on Python 3

6fed196

* use six for cPickle * use the `next` builtin rather than the `next` method

Use six for StringIO and cPickle everywhere

269c629

An odd failure about StringIO remains in test_streamio.py on Python 3.

Make test_tprparser.py pass on Python 3

b1640c3

Read the TPR file as binary (mode='rb').

Fix some call to map that expected a list output

64ad467

In Python 3, map returns a generator rather than a list. Make coordinates/test_dl_poly.py pass on python 3

jbarnoud added help wanted Python3 Work in progress labels Jan 25, 2016

richardjgowers reviewed Jan 25, 2016
View reviewed changes

jbarnoud added 4 commits January 25, 2016 10:38

Avoid import * in test_modelling.py

20d0ab0

Avoid import * in test_topology

d7d9562

Avoid import * in test_atomgroup.py

c739bbe

jbarnoud force-pushed the issue-260-python-3 branch 5 times, most recently from bb51c39 to 0ad9078 Compare January 25, 2016 12:07

Make Travis run the tests on python 3

18beb92

The python 3 tests are allowed to fail so they will not mark pull requests as broken.

jbarnoud force-pushed the issue-260-python-3 branch from 0ad9078 to 18beb92 Compare January 25, 2016 12:12

jbarnoud added 4 commits January 25, 2016 13:52

Simplify the __hash__ function of TopologyObject

c4281f3

Following the comment of @richardjgowers on MDAnalysis#658

Add a warning when MDAnalysis is imported on py 3

3acef5b

Make test_topology.py pass on python 3

b51ba1c

Note however than some tests are skipped because they rely on files in unsupported formats.

Skip a test that relies on the LAMMPS parser

b0f0364

Random fixes to improve python 3 compatibility

187272c

jbarnoud force-pushed the issue-260-python-3 branch from 9db0c76 to 187272c Compare January 25, 2016 14:42

jbarnoud added needs review and removed Work in progress labels Jan 25, 2016

richardjgowers self-assigned this Jan 26, 2016

richardjgowers added this to the 0.14 milestone Jan 26, 2016

richardjgowers reviewed Jan 26, 2016
View reviewed changes

jbarnoud mentioned this pull request Jan 26, 2016

Memleak nose plugin does not work on python 3 #662

Closed

kain88-de reviewed Jan 26, 2016
View reviewed changes

richardjgowers added a commit that referenced this pull request Jan 26, 2016

Merge pull request #658 from jbarnoud/issue-260-python-3

ebc2f59

Issue #260 Partial python 3 compatibility

richardjgowers merged commit ebc2f59 into MDAnalysis:develop Jan 26, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue #260 Partial python 3 compatibility #658

Issue #260 Partial python 3 compatibility #658

jbarnoud commented Jan 25, 2016

jbarnoud commented Jan 25, 2016

richardjgowers commented Jan 25, 2016

richardjgowers Jan 25, 2016

jbarnoud Jan 25, 2016

richardjgowers commented Jan 25, 2016

jbarnoud commented Jan 25, 2016

richardjgowers commented Jan 26, 2016

jbarnoud commented Jan 26, 2016

richardjgowers Jan 26, 2016

kain88-de Jan 26, 2016

jbarnoud Jan 26, 2016

Issue #260 Partial python 3 compatibility #658

Issue #260 Partial python 3 compatibility #658

Conversation

jbarnoud commented Jan 25, 2016

jbarnoud commented Jan 25, 2016

richardjgowers commented Jan 25, 2016

richardjgowers Jan 25, 2016

Choose a reason for hiding this comment

jbarnoud Jan 25, 2016

Choose a reason for hiding this comment

richardjgowers commented Jan 25, 2016

jbarnoud commented Jan 25, 2016

richardjgowers commented Jan 26, 2016

jbarnoud commented Jan 26, 2016

richardjgowers Jan 26, 2016

Choose a reason for hiding this comment

kain88-de Jan 26, 2016

Choose a reason for hiding this comment

jbarnoud Jan 26, 2016

Choose a reason for hiding this comment