Tree Manipulation and Restructuring

The Tree class provides both low-level and high-level methods for manipulating tree structure.

Low-level methods are associated with Node objects, and allow to restructure the relationships between nodes at a fine level: add_child, new_child, remove_child, etc.

In most cases, however, you will be using high-level methods to restructure Tree objects.

In all cases, if any part of the Tree object’s structural relations change, and you are interested in calculating any metrics or statistics on the tree or comparing the tree to another tree, you need to call update_bipartitions on the object to update the internal splits hash representation. This is not done for you automatically because there is a computational cost associated with the operation, and the splits hashes are not always needed. Furthermore, even when needed, if there are a number of structural changes to be made to a Tree object before calculations/comparisions, it makes sense to postpone the splits rehashing until there all the tree manipulations are completed. Most methods that affect the tree structure that require the splits hashes to updated take a update_bipartitions argument. By specifying True for this, the Tree object will recalculate the splits hashes after the changes have been made.

Rooting, Derooting and Rerooting

The Rooting of Tree(s) Read from External Sources

Newick and NEXUS formats have a convention where the rooting of the tree is specified by a special comment token preceding the tree statement: “[&R]” to indicate a rooted tree:

[&R] ((A,B),(C,D));

and : “[&U]” to indicate an unrooted tree:

[&U] ((A,B),(C,D));

These rooting comment tokens are respected when tree data is read. If no such comment token is given, then the tree is assumed to be unrooted by default.

You can control the behavior of trees read by using the “rooting keyword argument when using the “get” or “read” methods of the Tree or TreeList classes. This takes one of four string values which determines how the rooting states of the tree(s) will be set:

“default-unrooted” [default]
All trees are interpreted as unrooted unless a “[&R]” comment token explicitly specifies them as rooted.
“default-rooted”
All trees are interpreted as rooted unless a “[&U]” comment token explicitly specifies them as unrooted.
“force-unrooted”
All trees are unconditionally interpreted as unrooted.
“force-rooted”
All trees are unconditionally interpreted as rooted.

The behavior of this can be be summarized by the following:

Keyword Argument [&U] in Tree Statement [&R] in Tree Statement No Rooting Given in Tree Statement
rooting=None (or unspecified) unrooted rooted None
rooting="default-unrooted" unrooted rooted unrooted
rooting="default-rooted" unrooted rooted rooted
rooting="force-unrooted" unrooted unrooted unrooted
rooting="force-rooted" rooted rooted rooted

As an example:

import dendropy

# force tree to be read as rooted
mle.rooted = dendropy.Tree.get(
        path='pythonidae.mle.nex',
        schema='nexus',
        rooting='force-rooted')

# force tree to be read as unrooted
mle.rooted = dendropy.Tree.get(
        path='pythonidae.mle.nex',
        schema='nexus',
        rooting='force-unrooted')

Setting the Rooting State

All Tree objects have a boolean property, is_rooted that DendroPy uses to track whether or not the tree should be treated as rooted. The property is_unrooted is also defined, and these two properties are synchronized. Thus setting is_rooted to True will result in is_unrooted being set to False and vice versa.

The state of a Tree object’s rootedness flag does not modify any internal structural relationship between nodes. It simply determines how its splits hashes are calculated, which in turn affects a broad range of comparison and metric operations. Thus you need to update the splits hashes after modifying the is_rooted property by calling the update_bipartitions before carrying out any calculations on or with the Tree object. Note that calling update_bipartitions on an unrooted tree will force the basal split to be a trifurcation. So if the original tree was bifurcating, the end result will be a tree with a trifurcation at the root. This can be prevented by passing in the keyword argument suppress_unifurcations=False to update_bipartitions.

For example, the following:

import dendropy

tree_str = "[&R] (A, (B, (C, (D, E))));"

tree = dendropy.Tree.get_from_string(
        tree_str,
        "newick")

print("Original:")
print(tree.as_ascii_plot())

tree.is_rooted = False
print("After `is_rooted=False`:")
print(tree.as_ascii_plot())

tree.update_bipartitions()
print("After `update_bipartitions()`:")
print(tree.as_ascii_plot())

tree2 = dendropy.Tree.get_from_string(
        tree_str,
        "newick")
tree2.is_rooted = False
tree2.update_bipartitions(suppress_unifurcations=False)
print("After `update_bipartitions(suppress_unifurcations=False)`:")
print(tree2.as_ascii_plot())

will result in:

Original:
/---------------------------------------------------- A
+
|            /--------------------------------------- B
\------------+
             |            /-------------------------- C
             \------------+
                          |            /------------- D
                          \------------+
                                       \------------- E


After `is_rooted=False`:
/---------------------------------------------------- A
+
|            /--------------------------------------- B
\------------+
             |            /-------------------------- C
             \------------+
                          |            /------------- D
                          \------------+
                                       \------------- E


After `update_bipartitions()`:
/---------------------------------------------------- A
|
+---------------------------------------------------- B
|
|                /----------------------------------- C
\----------------+
                 |                 /----------------- D
                 \-----------------+
                                   \----------------- E


After `update_bipartitions(suppress_unifurcations=False)`:
/---------------------------------------------------- A
+
|            /--------------------------------------- B
\------------+
             |            /-------------------------- C
             \------------+
                          |            /------------- D
                          \------------+
                                       \------------- E

Derooting

To deroot a rooted Tree, you can also call the deroot method, which collapses the root to a trifurcation if it is bifurcation and sets the is_rooted to False. The deroot method has the same structural and semantic affect of is_rooted to False and then calling update_bipartitions. You would use the former if you are not going to be doing any tree comparisons or calculating tree metrics, and thus do not want to calculate the splits hashes.

Rerooting

To reroot a Tree along an existing edge, you can use the reroot_at_edge method. This method takes an Edge object as as its first argument. This rerooting is a structural change that will require the splits hashes to be updated before performing any tree comparisons or calculating tree metrics. If needed, you can do this yourself by calling update_bipartitions later, or you can pass in True as the second argument to the reroot_at_edge method call, which instructs DendroPy to automatically update the splits for you.

As an example, the following reroots the tree along an internal edge (note that we do not recalculate the splits hashes, as we are not carrying out any calculations or comparisons with the Tree):

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree_str = "[&R] (A, (B, (C, (D, E))));"

tree = dendropy.Tree.get(
        data=tree_str,
        schema="newick")

print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
mrca = tree.mrca(taxon_labels=["D", "E"])
tree.reroot_at_edge(mrca.edge, update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())

and results in:

Before:
[&R] (A,(B,(C,(D,E))));

/---------------------------------------------------- A
+
|            /--------------------------------------- B
\------------+
             |            /-------------------------- C
             \------------+
                          |            /------------- D
                          \------------+
                                       \------------- E


After:
[&R] ((D,E),(C,(B,A)));

                                   /----------------- D
/----------------------------------+
|                                  \----------------- E
+
|                /----------------------------------- C
\----------------+
                 |                 /----------------- B
                 \-----------------+
                                   \----------------- A

Another example, this time rerooting along an edge subtending a tip instead of an internal edge:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree_str = "[&R] (A, (B, (C, (D, E))));"

tree = dendropy.Tree.get(
        data=tree_str,
        schema="newick")

print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
node_D = tree.find_node_with_taxon_label("D")
tree.reroot_at_edge(node_D.edge, update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())

which results in:

Before:
[&R] (A,(B,(C,(D,E))));

/---------------------------------------------------- A
+
|            /--------------------------------------- B
\------------+
            |             /-------------------------- C
             \------------+
                          |            /------------- D
                          \------------+
                                       \------------- E


After:
[&R] (D,(E,(C,(B,A))));

/---------------------------------------------------- D
+
|            /--------------------------------------- E
\------------+
             |            /-------------------------- C
             \------------+
                          |            /------------- B
                          \------------+
                                       \------------- A

To reroot a Tree at a node instead, you can use the reroot_at_node method:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree_str = "[&R] (A, (B, (C, (D, E))));"

tree = dendropy.Tree.get(
        data=tree_str,
        schema="newick")

print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
mrca = tree.mrca(taxon_labels=["D", "E"])
tree.reroot_at_node(mrca, update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())

which results in:

Before:
[&R] (A,(B,(C,(D,E))));

/---------------------------------------------------- A
+
|            /--------------------------------------- B
\------------+
             |            /-------------------------- C
             \------------+
                          |            /------------- D
                          \------------+
                                       \------------- E


After:
[&R] (D,E,(C,(B,A)));

/---------------------------------------------------- D
|
+---------------------------------------------------- E
|
|                /----------------------------------- C
\----------------+
                 |                 /----------------- B
                 \-----------------+
                                   \----------------- A

You can also reroot the tree such that a particular node is moved to the outgroup position using the to_outgroup_position, which takes a Node as the first argument. Again, you can update the splits hashes in situ by passing True to the second argument, and again, here we do not because we are not carrying out any calculations. For example:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree_str = "[&R] (A, (B, (C, (D, E))));"

tree = dendropy.Tree.get(
    data=tree_str,
    schema="newick")

print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
outgroup_node = tree.find_node_with_taxon_label("C")
tree.to_outgroup_position(outgroup_node, update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())

which will result in:

Before:
[&R] (A,(B,(C,(D,E))));

/---------------------------------------------------- A
+
|            /--------------------------------------- B
\------------+
             |            /-------------------------- C
             \------------+
                          |            /------------- D
                          \------------+
                                       \------------- E


After:
[&R] (C,(D,E),(B,A));

/---------------------------------------------------- C
|
|                         /-------------------------- D
+-------------------------+
|                         \-------------------------- E
|
|                         /-------------------------- B
\-------------------------+
                          \-------------------------- A

If you have a tree with edge lengths specified, you can reroot it at the midpoint, using the reroot_at_midpoint method:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree_str = "[&R] (A:0.55, (B:0.82, (C:0.74, (D:0.42, E:0.64):0.24):0.15):0.20):0.3;"

tree = dendropy.Tree.get(
        data=tree_str,
        schema="newick")

print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot(plot_metric='length'))
tree.reroot_at_midpoint(update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot(plot_metric='length'))

which results in:

Before:
[&R] (A:0.55,(B:0.82,(C:0.74,(D:0.42,E:0.64):0.24):0.15):0.2):0.3;

          /------------------- A
          +
          |      /---------------------------- B
          \------+
                 |    /-------------------------- C
                 \----+
                      |        /-------------- D
                      \--------+
                               \---------------------- E


After:
[&R] ((C:0.74,(D:0.42,E:0.64):0.24):0.045,(B:0.82,A:0.75):0.105):0.3;

               /------------------------------- C
             /-+
             | |         /------------------ D
             | \---------+
             +           \---------------------------- E
             |
             |   /------------------------------------ B
             \---+
                 \-------------------------------- A

Pruning Subtrees and Tips

To remove a set of tips from a Tree, you can use either the prune_taxa or the prune_taxa_with_labels methods. The first takes a container of TaxonNamespace objects as an argument, while the second takes container of strings. In both cases, nodes associated with the specified taxa (as given by the TaxonNamespace objects directly in the first case, or TaxonNamespace objects with labels given in the list of string in the second case) will be removed from the tree. For example:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree_str = "[&R] ((A, (B, (C, (D, E)))),(F, (G, H)));"

tree = dendropy.Tree.get(
        data=tree_str,
        schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
tree.prune_taxa_with_labels(["A", "C", "G"])
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())

which results in:

Before:
[&R] ((A,(B,(C,(D,E)))),(F,(G,H)));

          /------------------------------------------- A
/---------+
|         |          /-------------------------------- B
|         \----------+
|                    |          /--------------------- C
|                    \----------+
+                               |          /---------- D
|                               \----------+
|                                          \---------- E
|
|                               /--------------------- F
\-------------------------------+
                                |          /---------- G
                                \----------+
                                           \---------- H


After:
[&R] ((B,(D,E)),(F,H));

                  /----------------------------------- B
/-----------------+
|                 |                 /----------------- D
|                 \-----------------+
+                                   \----------------- E
|
|                                   /----------------- F
\-----------------------------------+
                                    \----------------- H

Alternatively, the tree can be pruned based on a set of taxa that you want to keep. This can be affected through the use of the counterpart “retain” methods, retain_taxa and retain_taxa_with_labels. For example:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree_str = "[&R] ((A, (B, (C, (D, E)))),(F, (G, H)));"

tree = dendropy.Tree.get(
        data=tree_str,
        schema="newick")

print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
tree.retain_taxa_with_labels(["A", "C", "G"])
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())

which results in:

Before:
[&R] ((A,(B,(C,(D,E)))),(F,(G,H)));

          /------------------------------------------- A
/---------+
|         |          /-------------------------------- B
|         \----------+
|                    |          /--------------------- C
|                    \----------+
+                               |          /---------- D
|                               \----------+
|                                          \---------- E
|
|                               /--------------------- F
\-------------------------------+
                                |          /---------- G
                                \----------+
                                           \---------- H


After:
[&R] ((A,C),G);

                           /-------------------------- A
/--------------------------+
+                          \-------------------------- C
|
\----------------------------------------------------- G

Again, it should be noted that, as these operations modify the structure of the tree, you need to call update_bipartitions to update the internal splits hashes, before carrying out any calculations, comparisons, or metrics.

Extracting Trees and Subtrees from an Existing Tree

If you just need a tree structure, i.e. the nodes and edges, and minimal other attributes, such as taxon associations, node and edge labels, and edge lengths, the following methods provide fast and powerful ways to give you copies of the current tree or parts of the current tree:

All these methods create a “thin” or “light” copy of the tree, optionally pruning out nodes based on criteria.

The basic extract_tree method returns a full clone of the structure of the source tree:

import dendropy
from dendropy.calculate import treecompare

tree0 = dendropy.Tree.get(
        path="pythonidae.mle.nex",
        schema="nexus")
for idx, nd in enumerate(tree0):
    nd.label = "hello, world{}".format(idx)
    nd.edge.label = "world, hello{}".format(idx)
    nd.annotations["color"] = "blue"
    nd.edge.annotations["taste"] = "sweet"
tree1 = tree0.extract_tree()

assert tree0.taxon_namespace is tree1.taxon_namespace
assert treecompare.weighted_robinson_foulds_distance(
        tree0, tree1) == 0.0

for nd in tree1:
    original_node = nd.extraction_source
    print("{} on extracted tree corresponds to {} on original tree".format(
        nd, original_node))
    ## basic attributes copied
    assert nd.label == original_node.label
    assert nd.edge.label == original_node.edge.label
    assert nd.edge.length == original_node.edge.length
    ## but not annotations
    assert len(nd.annotations) == 0 and len(original_node.annotations) > 0
    assert len(nd.edge.annotations) == 0 and len(original_node.edge.annotations) > 0


As can be seen, the nodes on the extracted tree have an attribute set on them, “extraction_source” that point to the corresponding node on the original tree. Note how while labels, taxon associations and edge lengths are copied, the metadata annotations are not.

The node_filter_fn argument provides a powerful and flexible way to selective pull out parts of the tree that might interest you:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree0 = dendropy.Tree.get(
        path="pythonidae.mle.nex",
        schema="nexus")

node_filter_fn = lambda nd: nd.taxon is None or nd.taxon.label.startswith("Morelia")
tree1 = tree0.extract_tree(node_filter_fn=node_filter_fn)
print(tree1.as_string("newick"))

If you do not need the annotations, then this approach can be dramatically more performant than cloning and then pruning the tree. For example, the following:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy
import timeit

tree0 = dendropy.Tree.get(
        path="pythonidae.mle.nex",
        schema="nexus")
taxa_to_prune = set(taxon for taxon in tree0.taxon_namespace
        if taxon.label.startswith("Morelia"))

def f1():
    tree1 = dendropy.Tree(tree0)
    tree1.prune_taxa(taxa_to_prune)

def f2():
    tree1 = tree0.extract_tree(
            node_filter_fn=lambda taxon: taxon not in taxa_to_prune)

t1 = timeit.Timer(f1)
print(min(t1.repeat(10,10))/10)

t2 = timeit.Timer(f2)
print(min(t2.repeat(10,10))/10)

results in:

0.00191879272461
0.000579190254211

Some convenience wrappers around the extract_tree method allow you to more easily pull out or drop taxa of interest, either by passing in the Taxon instances directly or their labels:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy
from dendropy.calculate import treecompare

tree0 = dendropy.Tree.get(
        path="pythonidae.mle.nex",
        schema="nexus")

morelia_taxa = set(taxon for taxon in tree0.taxon_namespace
        if taxon.label.startswith("Morelia"))
morelia_labels = set([t.label for t in morelia_taxa])
non_morelia_taxa = set(taxon for taxon in tree0.taxon_namespace
        if not taxon.label.startswith("Morelia"))
non_morelia_labels = set([t.label for t in non_morelia_taxa])

tree1 = tree0.extract_tree_with_taxa(taxa=morelia_taxa)
tree2 = tree0.extract_tree_with_taxa_labels(labels=morelia_labels)
tree3 = tree0.extract_tree_without_taxa(taxa=non_morelia_taxa)
tree4 = tree0.extract_tree_without_taxa_labels(labels=non_morelia_labels)

print tree1.as_string("newick")
print tree2.as_string("newick")
print tree3.as_string("newick")
print tree4.as_string("newick")

assert treecompare.weighted_robinson_foulds_distance(tree1, tree2) == 0.0
assert treecompare.weighted_robinson_foulds_distance(tree2, tree3) == 0.0
assert treecompare.weighted_robinson_foulds_distance(tree3, tree4) == 0.0

You can also use the extract_tree method to created a “casted” copy, i.e. a (light) copy of the tree using your own custom classes instead of DendroPy’s native Tree and Node classes:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy


# Our custom node class. We do not actually need to derive from dendropy.Node.
# Doing so, though, ensures that all the required methods exist instead of
# having to write them ourselves.
class PhyloNode(dendropy.Node):
    pass

# Our custom tree class. Note that the definition of node factory
# (class-)method here ensures that nodes created are of the type we we want.
# Many DendroPy methods, include 'Tree.extract_tree()' check and preferentially
# use the return value of tree's class' ``node_factory()`` method when building
# trees. Note also that we do not actually need to derive from dendropy.Tree.
# Doing so, though, ensures that all the required methods exist instead of
# having to write them ourselves.
class PhyloTree(dendropy.Tree):
    @classmethod
    def node_factory(cls, *args, **kwargs):
        return PhyloNode(*args, **kwargs)

# The original tree using dendropy.Tree for the tree and dendropy.Node for the
# nodes.
tree0 = dendropy.Tree.get(
        data="(a,(b,(c,d)));",
        schema="newick",
        )
print(type(tree0)) # dendropy.Tree
for nd in tree0:
    print(type(nd)) # dendropy.Node

# Default extraction: use dendropy.Tree for tree
# and dendropy.Node for node
tree1 = tree0.extract_tree()
print(type(tree1)) # dendropy.Tree
for nd in tree1:
    print(type(nd)) # dendropy.Node

# Node factory defaults to ``node_factory`` method
# of instance returned by ``tree_factory`` if
# ``tree_factory is specified.
tree2 = tree0.extract_tree(tree_factory=PhyloTree)
print(type(tree2)) # PhyloTree
for nd in tree2:
    print(type(nd)) # PhyloNode

# equivalent to above
tree3 = tree0.extract_tree(tree_factory=PhyloTree,
        node_factory=PhyloNode)
print(type(tree3)) # PhyloTree
for nd in tree3:
    print(type(nd)) # PhyloNode

# Use dendropy.Tree for tree but
# PhyloNode for node
tree4 = tree0.extract_tree(node_factory=PhyloNode)
print(type(tree4)) # dendropy.Tree
for nd in tree4:
    print(type(nd)) # PhyloNode

Rotating

You can ladderize trees (sort the child nodes in order of the number of their children) by calling the ladderize method. This method takes one argument, ascending. If ascending=True, which is the default, then the nodes are sorted in ascending order (i.e., nodes with fewer children sort before nodes with more children). If ascending=False, then the nodes are sorted in descending order (i.e., nodes with more children sorting before nodes with fewer children). For example:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree_str = "[&R] ((A, (B, (C, (D, E)))),(F, (G, H)));"

tree = dendropy.Tree.get(
        data=tree_str,
        schema="newick")

print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
tree.ladderize(ascending=True)
print("Ladderize, ascending=True:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
tree.ladderize(ascending=False)
print("Ladderize, ascending=False:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())

results in:

Before:
[&R] ((A,(B,(C,(D,E)))),(F,(G,H)));

          /------------------------------------------- A
/---------+
|         |          /-------------------------------- B
|         \----------+
|                    |          /--------------------- C
|                    \----------+
+                               |          /---------- D
|                               \----------+
|                                          \---------- E
|
|                               /--------------------- F
\-------------------------------+
                                |          /---------- G
                                \----------+
                                           \---------- H


Ladderize, ascending=True:
[&R] ((F,(G,H)),(A,(B,(C,(D,E)))));

                                /--------------------- F
/-------------------------------+
|                               |          /---------- G
|                               \----------+
+                                          \---------- H
|
|         /------------------------------------------- A
\---------+
          |          /-------------------------------- B
          \----------+
                     |          /--------------------- C
                     \----------+
                                |          /---------- D
                                \----------+
                                           \---------- E


Ladderize, ascending=False:
[&R] (((((D,E),C),B),A),((G,H),F));

                                           /---------- D
                                /----------+
                     /----------+          \---------- E
                     |          |
          /----------+          \--------------------- C
          |          |
/---------+          \-------------------------------- B
|         |
|         \------------------------------------------- A
+
|                                          /---------- G
|                               /----------+
\-------------------------------+          \---------- H
                                |
                                \--------------------- F

Tree rotation operations do not actually change the tree structure, at least in so far as splits are concerned, so it is not neccessary to update the splits hashes.