Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #260 Partial python 3 compatibility #658

Merged
merged 25 commits into from
Jan 26, 2016

Conversation

jbarnoud
Copy link
Contributor

This pull request brings a bunch of small fixes to improve MDAnalysis compatibility with Python 3. MDAnalysis can now be imported on python 3, and some tests pass.

This PR, among other things deals, with:

  • some implicit relative imports; in python 3, imports must be either absolute or explicitly relative;
  • all calls to the print function; some were left as it in the docstrings, though;
  • map returning a generator on python 3
  • cStringIO being moved in the io module
  • cPickle being renamed pickle
  • calls to iteritems method of dict now pass through six (see Apply Six fixes for Python 3 #632)
  • generator do not have a next method anymore, instead one should use the next built-in;
  • other things, see the list of commits...

A few changes may have side effects.

First of all, the DCD and LAMMPS coordinate parsers are made optional on python 3 but not on python 2. In practice, MDAnalysis will fail to load the parsers but will continue to run. The TRZ coordinate parser is also made optional. Indeed, there is a small incompatibility with python 3 and I was not sure about how to fix it.

Python 3 changes the conditions for an object to be hashable (and therefore usable as a dict key or as part of a set). In python 2, a user-defined object is hashable unless the __hash__ method is overloaded to return None. It is assumed that two objects that are equal (a == b) will have the same hash. This assumption can break when a user overload the __eq__ method. This is why, in python 3, a user-defined object is not hashable by default is the __eq__ method get overloaded. MDAnalysis, relies on instances of TopologyObject to be hashable so they can be parts of set. Yet the __eq__ method is overloaded, so TopologyObject is not hashable in python 3. I wrote a __hash__ method for TopologyObject and I wrote it so as two instances that are equal have the same hash. This differs from the behavior of these objects in python 2. Before this PR, in python 2, if two instances involve the same atoms but in reverse order, then they are equal but they have a different hash. With this PR, these instances would be equal, and with the same hash (so only one of them will be kept in a set). I am not sure that it is the expected behavior, but is seemed to be what was expected from the documentation.

The following tests do pass:

  • analysis/test_distances.py
  • analysis/test_persistencelength.py
  • analysis/test_rdf.py
  • coordinates/test_dlpoly.py
  • coordinates/test_dms.py
  • coordinates/test_mol2.py
  • coordinates/test_netcdf.py
  • coordinates/test_pdbqt.py
  • coordinates/test_pqr.py
  • test_altloc.py
  • test_deprecated.py
  • test_imports.py
  • test_lammpsdata.py
  • test_log.py
  • test_meta.py
  • test_nuclinfo.py
  • test_persistence.py
  • test_qcprot.py
  • test_tprparser.py
  • test_units.py
  • test_util.py
  • topology/test_gro.py

The main reason for tests not to pass is that a lot of them rely on DCD test files. Since the DCD parser is not working on python 3, these tests do not pass.

An other reason for tests not to pass is that util.anyopen open text files as bytes rather than string. This is especially puzzling as I had to explicitly specify that TPR files had to be open as bytes. I am still investigating that point.

Besides fixing all the small things, here is what remains to be done:

  • fixing the anyopen problem
  • make nosetests skip all the tests that require a DCD file
  • add a warning at import so users know how experimental MDAnalysis on python 3 is
  • make TravisCI run the tests on python 3

I tried to understand each line I changed, but I may have overlooked things. So please, have a look.

Why is there a print here anyway?
The DCD parser is not compatible with python 3, this commit makes it
optional on that version of python. As the LAMPPS coordinate parser uses
the DCD parser, it is made optional as well.

The TRZ parser requires some small modifications to be compatible. It is
made optional until is can be tested.
The unicode representation for the Angstrom symbol in the dictionary
used for unit conversion was wrong. While the escaping sequence was
correct, it was used as a series of bytes rather than like a unicode
string. Also, the same key was created twice as two escaping sequences
were used (utf-16 and utf-8) and they were synonymous.

The tests were passing because they were testing against the same series
of bytes.

This commit use a proper unicode string as key and remove the redundant
definition. It also adds a test with the actual unicode symbol rather
than an escaping sequence.
On python 2, user defined object are hashable by default unless the
__hash__ method is overloaded to return None. On python 3, user objects
are not hashable if the __eq__ method is overloaded.

This commit defines a __hash__ method that returns a hash based on the
sorted indices of the atoms involved in the topology object. This should
reproduce the behavior advertised in the topology object documentation.
Yet, it is different from the previous behavior. In the previous behavior,
two different objects with the same atoms were equal for  __eq__ but
different regarding the hash.
* The test for format guessing ignore DCD, DATA, and LAMMPS on python 3
* fix some call to the `map` function that do not return a list on
  python 3
* other minor fixes
In python 3, numpy integers are not instances of int anymore.
* replace seek(0L) by seek(0)
* use six for StringIO
* use six for cPickle
* use the `next` builtin rather than the `next` method
An odd failure about StringIO remains in test_streamio.py on Python 3.
Read the TPR file as binary (mode='rb').
In Python 3, map returns a generator rather than a list.

Make coordinates/test_dl_poly.py pass on python 3
@jbarnoud
Copy link
Contributor Author

Note also that tests cannot run with the --memleak option as it is not python 3 compatible.

@richardjgowers
Copy link
Member

@jbarnoud this looks awesome. You can spin out the things that aren't easy into more issues, (eg I don't mind fixing TRZ once I know what doesn't work)

You can add in Py3 as an "allowed to fail" row in the matrix. I don't see any harm in having this on the develop build for now. It will let us see what needs doing to work towards passing
https://docs.travis-ci.com/user/customizing-the-build/#Rows-that-are-Allowed-to-Fail


def __hash__(self):
"""Makes the object hashable"""
return hash(self._cmp_key())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a hash needs to be a string, so you could just do hash(self.indices) or maybe hash((self.__class__.__name__, self.indices)). I think this hash will happen a lot, so I'd rather it was fast (not constructing strings each time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I changed the _cmp_key method accordingly.

Tests that relies on DCD, LAMMPS, or TRZ test files ends with an error
because the reader is not supported on python 3 yet. This creates a lot
of noise in the nose report that hides the other failures. Hereby we
make these tests being skipped if the parser is not loaded.

Note that this reduce dramatically the test coverage on python 3!
@jbarnoud jbarnoud force-pushed the issue-260-python-3 branch 5 times, most recently from bb51c39 to 0ad9078 Compare January 25, 2016 12:07
The python 3 tests are allowed to fail so they will not mark pull
requests as broken.
@richardjgowers
Copy link
Member

I wouldn't bother skipping all the tests that currently don't work, I'd rather keep them as failing so we know what to fix. Once you're ready/driven insane we can merge this and split it up into many issues

@jbarnoud
Copy link
Contributor Author

@richardjgowers A lot of tests fail because the DCD reader is not compatible. It makes the report impossible to read because of the noise. Of course failing tests should be reported as failing, but in that particular instance it is the same error that get reported over and over.

I would like to fix the issue with anyopen as it would make a lot of tests pass. But it appears that I have proper work to do so I'll work on it latter. I'll open individual issues for the big problem that remain. Fel free to have a look at the changes in that PR.

@richardjgowers
Copy link
Member

@jbarnoud ready to review & merge?

@richardjgowers richardjgowers added this to the 0.14 milestone Jan 26, 2016
@jbarnoud
Copy link
Contributor Author

@richardjgowers Yes. I'll do an other PR for what remains to be done.

if six.PY2:
# DCD, LAMMPS, and TRZ are not supported on Python 3 yet
formats += [
('DATA', mda.topology.LAMMPSParser.DATAParser,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a separate issue for each of these? (I've done DCD)



class TestAtom(TestCase):
"""Tests of Atom."""

@dec.skipif(parser_not_found('DCD'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it ok to let all this stuff fail. We tell travis that a failed python3 test is ok for the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Letting these test fail creates a lot of noise that make nose's report painful to read. Also all the errors generated by these tests are unrelated to what the test tries to asses. The correct semantics would be known failures, but they still display the traceback in the report which left the problem unresolved.

I think the amount of skipped tests is enough to show that there is a problem with the coverage. Also, as soon as the DCD parser works, these tests will not be skipped any more. And so even before we remove the explicit skip decorator.

richardjgowers added a commit that referenced this pull request Jan 26, 2016
Issue #260 Partial python 3 compatibility
@richardjgowers richardjgowers merged commit ebc2f59 into MDAnalysis:develop Jan 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants