Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree.mrca has side effect of changing rooting state #152

Closed
mmore500 opened this issue Mar 30, 2023 · 6 comments · Fixed by #153
Closed

Tree.mrca has side effect of changing rooting state #152

mmore500 opened this issue Mar 30, 2023 · 6 comments · Fixed by #153

Comments

@mmore500
Copy link
Collaborator

Here is a minimum working example:

>>> import dendropy as dp
>>> tree = dp.Tree.get(data="((10,(((5,(11)),7,9))),((((1)))));", schema="newick")
>>> leaf_taxa = [leaf.taxon for leaf in tree.leaf_node_iter()]
>>> print("before tree.is_rooted", tree.is_rooted)
before tree.is_rooted None
>>> tree.mrca(taxa=leaf_taxa[:2])
<Node object at 0x7fd2df9bfee0: 'None' (None)>
>>> print("after tree.is_rooted", tree.is_rooted)
after tree.is_rooted False

Works on multiple versions of Python, including Python 3.9.16 (main, Dec 7 2022, 01:12:08) [GCC 11.3.0] on linux with dendropy 4.5.2.

In a more copy/pasteable form:

import dendropy as dp

tree = dp.Tree.get(
    data="((10,(((5,(11)),7,9))),((((1)))));",
    schema="newick",
)
leaf_taxa = [
    leaf.taxon for leaf in tree.leaf_node_iter()
]

print("before tree.is_rooted", tree.is_rooted)
tree.mrca(taxa=leaf_taxa[:2])
print("after tree.is_rooted", tree.is_rooted)

This side effect is can have very unexpected and perplexing consequences.
Consider, for example

>>> import dendropy as dp
>>> tree = dp.Tree.get(data="((10,(((5,(11)),7,9))),((((1)))));", schema="newick")
>>> leaf_taxa = [leaf.taxon for leaf in tree.leaf_node_iter()]
>>> print("before", tree.as_string(schema='newick'))
before ((10,(((5,(11)),7,9))),((((1)))));

>>> tree.mrca(taxa=leaf_taxa[:2])
<Node object at 0x7f70837fbee0: 'None' (None)>
>>> print("after", tree.as_string(schema='newick'))
after [&U] (10,(((5,(11)),7,9)),((((1)))));

(Note the addition of the [&U] prefix after calling mrca.)

as far as I can tell what is happening is Tree.mrca

mmore500 added a commit to mmore500/DendroPy that referenced this issue Mar 30, 2023
Sets `collapse_unrooted_basal_bifurcation=False` in call to
`Tree.encode_bipartitions` inside `Tree.mrca`. Closes jeetsukumaran#152.
mmore500 added a commit to mmore500/DendroPy that referenced this issue Mar 30, 2023
Sets `collapse_unrooted_basal_bifurcation=False` in call to
`Tree.encode_bipartitions` inside `Tree.mrca`. Closes jeetsukumaran#152.
@jeetsukumaran
Copy link
Owner

Thanks for bringing this to our attention and this fix!

I appreciate the issue and the fix makes sense. I'll look into this in a bit, and please don't hesitate to send me a reminder if I dawdle.

One question I have is whether or not the '[&U]' is cosmetic or is the tree.is_rooted property set. If the latter, is it only changed if it is None, or is it otherwise unconditionally set?

Maybe not related: does it make sense to ask for an MRCA of an unrooted tree? Should we throw an error?

@mmore500
Copy link
Collaborator Author

mmore500 commented Mar 30, 2023

One question I have is whether or not the '[&U]' is cosmetic or is the tree.is_rooted property set. If the latter, is it only changed if it is None, or is it otherwise unconditionally set?

I did an additional experiment, and it looks like the rootedness is only changed to True from the None state. Here, I set the rootedness explicitly to False and it remained False after the mrca operation.

>>> import dendropy as dp
>>> tree = dp.Tree.get(
...     data="((10,(((5,(11)),7,9))),((((1)))));",
...     schema="newick",
... )
>>> leaf_taxa = [
...     leaf.taxon for leaf in tree.leaf_node_iter()
... ]
>>> print("first tree.is_rooted", tree.is_rooted)
first tree.is_rooted None
>>> tree.is_rooted = False
>>> print("second tree.is_rooted", tree.is_rooted)
second tree.is_rooted False
>>> tree.mrca(taxa=leaf_taxa[:2])
<Node object at 0x7f7e81d0cd60: 'None' (None)>
>>> print("after tree.is_rooted", tree.is_rooted)
after tree.is_rooted False

@mmore500
Copy link
Collaborator Author

Maybe not related: does it make sense to ask for an MRCA of an unrooted tree? Should we throw an error?

Not sure 😅 I was kind of wondering that too

It looks like the existing behavior is to use seed_node implicitly as the root, even when is_rooted is False.

Calculating the MRCA between leaves 10 and 11 on this tree gives the root node R as the result.
image

>>> import dendropy as dp
>>> tree = dp.Tree.get(
...     data="(10,(((5,(11)),7,9)),((((1)))))R;",
...     schema="newick",
... )
>>> leaf_taxa = [
...     leaf.taxon for leaf in tree.leaf_node_iter()
...     if leaf.taxon.label in ("10", "11")
... ]
>>> print(leaf_taxa)
[<Taxon 0x7efda9708ca0 '10'>, <Taxon 0x7efda9708f10 '11'>]
>>> print("first tree.is_rooted", tree.is_rooted)
first tree.is_rooted None
>>> tree.is_rooted = False
>>> print("second tree.is_rooted", tree.is_rooted)
second tree.is_rooted False
>>> print(tree.mrca(taxa=leaf_taxa[:2]))
<Node object at 0x7efda9708d90: 'R' (None)>

@mmore500
Copy link
Collaborator Author

mmore500 commented Mar 30, 2023

There could plausibly be some kind of argument to calculate the MRCA between two nodes in an unrooted tree by taking the shortest path between them (would be well-defined/unique) and selecting the node closest to the middle of this path. How the MRCA between more than two nodes might be interpreted in an unrooted tree isn't immediately obvious to me Plus, for either of these cases, I doubt there's a substantial intentional use case for MRCA in an unrooted tree anyways.

In the scenario I ran into, I was basically relying on the ambiguity of the None rootedness state to prevent introduction of the [&R] or [&U] markers into newick string output. This can certainly be accomplished in other ways.

Warning or throwing an error for the False and None could be a reasonable choice, especially from a strict/literal interpretation of MRCA.

@mmore500
Copy link
Collaborator Author

I did a quick sanity check to see if BioPython had made any kind of decision here, and it looks like they're also just using the implicit root of an unrooted tree without a warning or error.

>>> from io import StringIO
>>> from Bio import Phylo
>>> tree = Phylo.read(StringIO("(10,(((5,(11)),7,9)),((((1)))))R;"), "newick")
>>> tree.rooted
False
>>> leaf1 = tree.get_terminals()[0]
>>> leaf1
Clade(name='10')
>>> leaf2 = tree.get_terminals()[1]
>>> leaf2
Clade(name='5')
>>> mrca = tree.common_ancestor(leaf1, leaf2)
>>> mrca
Clade(name='R')

@jeetsukumaran
Copy link
Owner

Ok, let's go ahead and merge this. Maybe a warning would be good thing to add though

@jeetsukumaran jeetsukumaran reopened this Apr 1, 2023
mmore500 added a commit to mmore500/DendroPy that referenced this issue Oct 25, 2023
Sets `collapse_unrooted_basal_bifurcation=False` in call to
`Tree.encode_bipartitions` inside `Tree.mrca`. Closes jeetsukumaran#152.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants