Commit

Merge branch 'master' into master
jhcepas authored Mar 28, 2020
2 parents 9a535b5 + b524047 commit 337647f
Showing 41 changed files with 536 additions and 403 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -22,3 +22,4 @@ test_tmp/
# Coverage files
.coverage
htmlcov/
ete
22 changes: 13 additions & 9 deletions .travis.yml
@@ -1,37 +1,41 @@
language: python
sudo: false

services:
- xvfb

python:
- "2.7"
- "3.4"
#- "3.4"
- "3.5"
- "3.6"

cache:
directories:
- test_tmp
- test_tmp

install:
- ./run_tests.sh -s --setup-only -v "$TRAVIS_PYTHON_VERSION"

before_script:
# Ensure tags are available on the cloned repository
- git fetch --tags --depth=50

# Start xvfb
- "export DISPLAY=:99.0"
- "sh -e /etc/init.d/xvfb start"

# - "export DISPLAY=:99.0"
# - "sh -e /etc/init.d/xvfb start"
# - sleep 3

script:
- ./run_tests.sh --test-only -sv "$TRAVIS_PYTHON_VERSION"
- if [[ $TRAVIS_PYTHON_VERSION == 2.7 ]]; then ./run_tests.sh --qt4 -sv "$TRAVIS_PYTHON_VERSION" ; fi
# - if [[ $TRAVIS_PYTHON_VERSION == 2.7 ]]; then ./run_tests.sh --qt4 -sv "$TRAVIS_PYTHON_VERSION" ; fi
- if [[ $TRAVIS_PYTHON_VERSION == 2.7 ]]; then ./run_tests.sh -sv "$TRAVIS_PYTHON_VERSION" ; fi
#- coverage run -m ete3.test.test_api
#- coverage run -a -m ete3.test.test_ete_evol # too heavy for travis
#- coverage run -a -m ete3.test.test_ete_build # too heavy for travis

after_success:
- coveralls

# branches:
# only:
# - master
32 changes: 21 additions & 11 deletions README.rst
@@ -4,10 +4,16 @@
.. image:: https://badges.gitter.im/Join%20Chat.svg
:alt: Join the chat at https://gitter.im/jhcepas/ete
:target: https://gitter.im/jhcepas/ete?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

..
.. image:: https://coveralls.io/repos/jhcepas/ete/badge.png
.. image:: http://img.shields.io/badge/stackoverflow-etetoolkit-blue.svg
:target: https://stackoverflow.com/questions/tagged/etetoolkit+or+ete3

.. image:: http://img.shields.io/badge/biostars-etetoolkit-purple.svg
:target: https://www.biostars.org/t/etetoolkit,ete3,ete,ete2/


Overview
-----------

@@ -46,21 +52,25 @@ Getting Support
**Whenerver possible, please avoid sending support related emails directly to
the developers. Keep communication public:**

- For any type of question on how to use ETE in a bioinformatics context, the
BioStars community (http://biostars.org) provides an excellent help desk. ETE
developers contribute there with answers, but you will also get feedback from
other users. It is recommended to tag your questions with the "etetoolkit"
label.
- For any type of question on how to use ETE in the bioinformatics context, use BioStars (http://biostars.org) or even StackOverflow forums.

- For technical problems or more ETE specific questions, you can also use the
official ETE mailing list at https://groups.google.com/d/forum/etetoolkit. To
avoid spam, messages from new users are moderated. Expect some delay until
your first message and account is validated.
Please use the **"etetoolkit"** tag for your questions:

.. image:: http://img.shields.io/badge/stackoverflow-etetoolkit-blue.svg
:target: https://stackoverflow.com/questions/tagged/etetoolkit+or+ete3

.. image:: http://img.shields.io/badge/biostars-etetoolkit-purple.svg
:target: https://www.biostars.org/t/etetoolkit,ete3,ete,ete2/

- Bug reports, feature requests and general discussion should be posted into github:
https://github.com/etetoolkit/ete/issues

- For any other inquire please contact *huerta /at/ embl.de*
- For more technical problems, you can also use the
official ETE mailing list at https://groups.google.com/d/forum/etetoolkit. To
avoid spam, messages from new users are moderated. Expect some delay until
your first message and account is validated.

- For any other inquire please contact *jhcepas /at/ gmail.com*


Contributing and BUG reporting
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
3.1.1
3.1.2
226 changes: 110 additions & 116 deletions ete3/coretype/tree.py
@@ -1551,13 +1551,8 @@ def sort_descendants(self, attr="name"):
.. versionadded: 2.1
Sort the branches of a given tree by node names. After the
tree is sorted, nodes are labeled in ascending order. This
can be used to ensure that nodes in a tree with the same node
names are always labeled in the same way. Note that if
duplicated names are present, extra criteria should be added
to sort nodes.
Unique id is stored as a node._nid attribute
tree is sorted. Note that if duplicated names are present,
extra criteria should be added to sort nodes.
"""

@@ -2346,7 +2341,108 @@ def _resolve(node):
for n in target:
_resolve(n)

def cophenetic_matrix(self):
"""
.. versionadded: 3.1.1
Generate a cophenetic distance matrix of the treee to standard output
The `cophenetic matrix <https://en.wikipedia.org/wiki/Cophenetic>` is a matrix representation of the
distance between each node.
if we have a tree like
----A
_____________|y
| |
| ----B
________|z
| ----C
| |
|____________|x -----D
| |
|______|w
|
|
-----E
Where w,x,y,z are internal nodes.
d(A,B) = d(y,A) + d(y,B)
and
d(A, E) = d(z,A) + d(z, E) = {d(z,y) + d(y,A)} + {d(z,x) + d(x,w) + d(w,E)}
We use an idea inspired by the ete3 team: https://gist.github.com/jhcepas/279f9009f46bf675e3a890c19191158b :
For each node find its path to the root.
e.g.
A -> A, y, z
E -> E, w, x,z
and make these orderless sets. Then we XOR the two sets to only find the elements
that are in one or other sets but not both. In this case A, E, y, x, w.
The distance between the two nodes is the sum of the distances from each of those nodes
to the parent
One more optimization: since the distances are symmetric, and distance to itself is zero
we user itertools.combinations rather than itertools.permutations. This cuts our computes from theta(n^2)
1/2n^2 - n (= O(n^2), which is still not great, but in reality speeds things up for large trees).
For this tree, we will return the two dimensional array:
A B C D E
A 0 d(A-y) + d(B-y) d(A-z) + d(C-z) d(A-z) + d(D-z) d(A-z) + d(E-z)
B d(B-y) + d(A-y) 0 d(B-z) + d(C-z) d(B-z) + d(D-z) d(B-z) + d(E-z)
C d(C-z) + d(A-z) d(C-z) + d(B-z) 0 d(C-x) + d(D-x) d(C-x) + d(E-x)
D d(D-z) + d(A-z) d(D-z) + d(B-z) d(D-x) + d(C-x) 0 d(D-w) + d(E-w)
E d(E-z) + d(A-z) d(E-z) + d(B-z) d(E-x) + d(C-x) d(E-w) + d(D-w) 0
We will also return the one dimensional array with the leaves in the order in which they appear in the matrix
(i.e. the column and/or row headers).
:param filename: the optional file to write to. If not provided, output will be to standard output
:return: two-dimensional array and a one dimensional array
"""

leaves = self.get_leaves()
paths = {x: set() for x in leaves}

# get the paths going up the tree
# we get all the nodes up to the last one and store them in a set

for n in leaves:
if n.is_root():
continue
movingnode = n
while not movingnode.is_root():
paths[n].add(movingnode)
movingnode = movingnode.up

# now we want to get all pairs of nodes using itertools combinations. We need AB AC etc but don't need BA CA

leaf_distances = {x.name: {} for x in leaves}

for (leaf1, leaf2) in itertools.combinations(leaves, 2):
# figure out the unique nodes in the path
uniquenodes = paths[leaf1] ^ paths[leaf2]
distance = sum(x.dist for x in uniquenodes)
leaf_distances[leaf1.name][leaf2.name] = leaf_distances[leaf2.name][leaf1.name] = distance

allleaves = sorted(leaf_distances.keys()) # the leaves in order that we will return

output = [] # the two dimensional array that we will return

for i, n in enumerate(allleaves):
output.append([])
for m in allleaves:
if m == n:
output[i].append(0) # distance to ourself = 0
else:
output[i].append(leaf_distances[n][m])
return output, allleaves

def add_face(self, face, column, position="branch-right"):
"""
.. versionadded: 2.1
@@ -2474,117 +2570,15 @@ def phonehome(self):
from .. import _ph
_ph.call()

def cophenetic_matrix(self):
"""
.. versionadded: 3.1.1
Generate a cophenetic distance matrix of the treee to standard output
The `cophenetic matrix <https://en.wikipedia.org/wiki/Cophenetic>` is a matrix representation of the
distance between each node.
if we have a tree like
----A
_____________|y
| |
| ----B
________|z
| ----C
| |
|____________|x -----D
| |
|______|w
|
|
-----E
Where w,x,y,z are internal nodes.
d(A,B) = d(y,A) + d(y,B)
and
d(A, E) = d(z,A) + d(z, E) = {d(z,y) + d(y,A)} + {d(z,x) + d(x,w) + d(w,E)}
We use an idea inspired by the ete3 team: https://gist.github.com/jhcepas/279f9009f46bf675e3a890c19191158b :
For each node find its path to the root.
e.g.
A -> A, y, z
E -> E, w, x,z
and make these orderless sets. Then we XOR the two sets to only find the elements
that are in one or other sets but not both. In this case A, E, y, x, w.
The distance between the two nodes is the sum of the distances from each of those nodes
to the parent
One more optimization: since the distances are symmetric, and distance to itself is zero
we user itertools.combinations rather than itertools.permutations. This cuts our computes from theta(n^2)
1/2n^2 - n (= O(n^2), which is still not great, but in reality speeds things up for large trees).
For this tree, we will return the two dimensional array:
A B C D E
A 0 d(A-y) + d(B-y) d(A-z) + d(C-z) d(A-z) + d(D-z) d(A-z) + d(E-z)
B d(B-y) + d(A-y) 0 d(B-z) + d(C-z) d(B-z) + d(D-z) d(B-z) + d(E-z)
C d(C-z) + d(A-z) d(C-z) + d(B-z) 0 d(C-x) + d(D-x) d(C-x) + d(E-x)
D d(D-z) + d(A-z) d(D-z) + d(B-z) d(D-x) + d(C-x) 0 d(D-w) + d(E-w)
E d(E-z) + d(A-z) d(E-z) + d(B-z) d(E-x) + d(C-x) d(E-w) + d(D-w) 0
We will also return the one dimensional array with the leaves in the order in which they appear in the matrix
(i.e. the column and/or row headers).
:param filename: the optional file to write to. If not provided, output will be to standard output
:return: two-dimensional array and a one dimensional array
"""

leaves = self.get_leaves()
paths = {x: set() for x in leaves}

# get the paths going up the tree
# we get all the nodes up to the last one and store them in a set

for n in leaves:
if n.is_root():
continue
movingnode = n
while not movingnode.is_root():
paths[n].add(movingnode)
movingnode = movingnode.up

# now we want to get all pairs of nodes using itertools combinations. We need AB AC etc but don't need BA CA

leaf_distances = {x.name: {} for x in leaves}

for (leaf1, leaf2) in itertools.combinations(leaves, 2):
# figure out the unique nodes in the path
uniquenodes = paths[leaf1] ^ paths[leaf2]
distance = sum(x.dist for x in uniquenodes)
leaf_distances[leaf1.name][leaf2.name] = leaf_distances[leaf2.name][leaf1.name] = distance

allleaves = sorted(leaf_distances.keys()) # the leaves in order that we will return

output = [] # the two dimensional array that we will return

for i, n in enumerate(allleaves):
output.append([])
for m in allleaves:
if m == n:
output[i].append(0) # distance to ourself = 0
else:
output[i].append(leaf_distances[n][m])
return output, allleaves


def _translate_nodes(root, *nodes):
name2node = dict([ [n, None] for n in nodes if type(n) is str])
for n in root.traverse():
if n.name in name2node:
if name2node[n.name] is not None:
raise TreeError("Ambiguous node name: "+str(n.name))
else:
name2node[n.name] = n
if name2node:
for n in root.traverse():
if n.name in name2node:
if name2node[n.name] is not None:
raise TreeError("Ambiguous node name: "+str(n.name))
else:
name2node[n.name] = n

if None in list(name2node.values()):
notfound = [key for key, value in six.iteritems(name2node) if value is None]
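
Note on `cophenetic_matrix`, which this commit moves within `ete3/coretype/tree.py`: its docstring describes the path-set approach (collect each leaf's path to the root as a set, XOR the two sets, sum the branch lengths of what remains). The following is a minimal standalone sketch of that idea, not the ete3 implementation itself. The `Node` class is a hypothetical stand-in that only mimics the `name`, `dist`, `up`, `is_root()` and `get_leaves()` members used in the diff, and the branch lengths are made-up values for the five-leaf example tree from the docstring.

```python
import itertools


class Node:
    """Hypothetical minimal tree node; NOT the ete3 TreeNode class."""

    def __init__(self, name, dist=0):
        self.name = name
        self.dist = dist      # length of the branch leading up to the parent
        self.up = None        # parent node (None for the root)
        self.children = []

    def add_child(self, child):
        child.up = self
        self.children.append(child)
        return child

    def is_root(self):
        return self.up is None

    def get_leaves(self):
        if not self.children:
            return [self]
        leaves = []
        for child in self.children:
            leaves.extend(child.get_leaves())
        return leaves


def cophenetic_matrix(root):
    """Return (matrix, leaf_names) using the path-set XOR idea."""
    leaves = root.get_leaves()

    # For every leaf, collect the nodes on its path up to (excluding) the root.
    paths = {}
    for leaf in leaves:
        path = set()
        node = leaf
        while not node.is_root():
            path.add(node)
            node = node.up
        paths[leaf] = path

    # Nodes present in exactly one of the two paths lie between the two
    # leaves; summing their branch lengths gives the leaf-to-leaf distance.
    dist = {leaf.name: {} for leaf in leaves}
    for a, b in itertools.combinations(leaves, 2):
        d = sum(n.dist for n in paths[a] ^ paths[b])
        dist[a.name][b.name] = dist[b.name][a.name] = d

    names = sorted(dist)
    matrix = [[0 if r == c else dist[r][c] for c in names] for r in names]
    return matrix, names


# Example tree from the docstring: z is the root, y = (A, B), x = (C, w), w = (D, E).
# Branch lengths below are illustrative assumptions.
z = Node("z")
y = z.add_child(Node("y", 1))
x = z.add_child(Node("x", 2))
y.add_child(Node("A", 1))
y.add_child(Node("B", 2))
x.add_child(Node("C", 1))
w = x.add_child(Node("w", 1))
w.add_child(Node("D", 1))
w.add_child(Node("E", 3))

matrix, names = cophenetic_matrix(z)
print(names)
for row in matrix:
    print(row)
```

With these assumed lengths, d(A, B) is the sum of the A and B branches, while d(A, E) sums the branches of A, y, x, w and E, matching the expansion given in the docstring. Because only unordered pairs are visited, the loop runs n(n-1)/2 times for n leaves.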