Trees

The Tree Class

Trees in DendroPy are represented by objects of the class Tree. Trees consist of a collection of nodes, represented by objects of the class Node, connected to each other in parent-child or ancestor-descendent relationships by objects of the class Edge. The first or initial node of a Tree is the head of the data structure, and is represented by the seed_node attribute of a Tree object. If the tree is rooted, then this is the root node. If the tree is unrooted, however, then this is an artificial node that only serves as the initial node when iterating over a tree in preorder or the final node when iterating over the tree in postorder, but does not have any informational significance by itself: it is an algorithmic artifact.

The seed_node attribute of a Tree object, like every other node on the tree, is an object of the Node class. Every Node object maintains a list of its immediate child Node objects as well as a reference to its parent Node object. You can iterate over the child of a particular Node object using the child_node_iter method, get a shallow-copy list of child Node objects using the child_nodes method, or access the parent Node object directly through the parent_node attribute of the Node. By definition, the seed_node has no parent node, leaf nodes have no child nodes, and term:internal nodes have both parent nodes and child nodes.

Every Node object has an attribute, edge, which is an Edge object representing the edge that is incident to or subtends the node represented by that Node object. Each Edge, in turn, has an attribute, head_node, which is the Node object representing the node that the edge subtends.

The Tree, Node, and Edge classes all have “annotations” as an attribute, which is a AnnotationSet object, i.e. a collection of Annotation instances tracking metadata. More information on working with metadata can be found in the “Working with Metadata Annotations” section.

Reading and Writing Tree Instances

The Tree class supports the “get” factory class method for simultaneously instantiating and populating a Tree instance, taking a data source as the first argument and a schema specification string (“nexus”, “newick”, “nexml”, “fasta”, or “phylip”, etc.) as the second:

import dendropy
tree = dendropy.Tree.get(
    path='pythonidae.mcmc.nex',
    schema='nexus')

A Tree object can be written to an external resource using the “write” method:

import dendropy
tree = dendropy.Tree.get(
    path="trees1.nex",
    schema="nexus",
    tree_offset=2,
    )
tree.write(
    path="trees1.newick",
    schema="newick",
    )

It can also be represented as a string using the “as_string” method:

import dendropy
tree = dendropy.Tree.get(
    path="trees1.nex",
    schema="nexus",
    )
print(tree.as_string(schema="newick",)

More information on reading operations is available in the Reading and Writing Phylogenetic Data section.

Cloning/Copying a Tree

You can make a “taxon namespace-scoped” copy of a Tree instance, i.e., where all Node and associated Edge instances of a Tree are cloned, but references to Taxon objects are preserved, you can call dendropy.datamodel.treemodel.Tree.clone with a “depth” argument value of 1 or by copy construction:

import dendropy

# original list
s1 = "(A,(B,C));"
tree1 = dendropy.Tree.get(
        data=s1,
        schema="newick")

# taxon namespace-scoped deep copy by calling Tree.clone(1)
# I.e. Everything cloned, but with Taxon and TaxonNamespace references shared
tree2 = tree1.clone(depth=1)

# taxon namespace-scoped deep copy by copy-construction
# I.e. Everything cloned, but with Taxon and TaxonNamespace references shared
tree3 = dendropy.Tree(tree1)

# *different* tree instances, with different nodes and edges
for nd1, nd2, nd3 in zip(tree1, tree2, tree3):
    assert nd1 is not nd2
    assert nd1 is not nd3
    assert nd2 is not nd3

# Note: TaxonNamespace is still shared
# I.e. Everything cloned, but with Taxon and TaxonNamespace references shared
assert tree2.taxon_namespace is tree1.taxon_namespace
assert tree3.taxon_namespace is tree1.taxon_namespace

For a true and complete deep-copy, where even the Taxon and TaxonNamespace references are copied, call copy.deepcopy:

import copy
import dendropy

# original list
s1 = "(A,(B,C));"
tree1 = dendropy.Tree.get(
        data=s1,
        schema="newick")

# Full deep copy by calling copy.deepcopy()
# I.e. Everything cloned including Taxon and TaxonNamespace instances
tree2 = copy.deepcopy(tree1)

# *different* tree instances
for nd1, nd2 in zip(tree1, tree2):
    assert nd1 is not nd2

# Note: TaxonNamespace is also different
assert tree2.taxon_namespace is not tree1.taxon_namespace
for tx1 in tree1.taxon_namespace:
    assert tx1 not in tree2.taxon_namespace
for tx2 in tree2.taxon_namespace:
    assert tx2 not in tree1.taxon_namespace

Alternatively, many times you want a “light” or “thin” copy, where just the tree structure (node and edge relationships) and basic information are retained (e.g., edge lengths, taxon associations, and node and edge labels), but not, e.g. the rich annotations. Then the extract_tree method is what you are looking for:

import dendropy
from dendropy.calculate import treecompare

tree0 = dendropy.Tree.get(
        path="pythonidae.mle.nex",
        schema="nexus")
for idx, nd in enumerate(tree0):
    nd.label = "hello, world{}".format(idx)
    nd.edge.label = "world, hello{}".format(idx)
    nd.annotations["color"] = "blue"
    nd.edge.annotations["taste"] = "sweet"
tree1 = tree0.extract_tree()

assert tree0.taxon_namespace is tree1.taxon_namespace
assert treecompare.weighted_robinson_foulds_distance(
        tree0, tree1) == 0.0

for nd in tree1:
    original_node = nd.extraction_source
    print("{} on extracted tree corresponds to {} on original tree".format(
        nd, original_node))
    ## basic attributes copied
    assert nd.label == original_node.label
    assert nd.edge.label == original_node.edge.label
    assert nd.edge.length == original_node.edge.length
    ## but not annotations
    assert len(nd.annotations) == 0 and len(original_node.annotations) > 0
    assert len(nd.edge.annotations) == 0 and len(original_node.edge.annotations) > 0


Tree Traversal

Iterating Over Nodes

The following example shows how you might evolve a continuous character on a tree by recursively visting each node, and setting the value of the character to one drawn from a normal distribution centered on the value of the character of the node’s ancestor and standard deviation given by the length of the edge subtending the node:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
#! /usr/bin/env python
# -*- coding: utf-8 -*-

import random
import dendropy

def process_node(node, start=1.0):
    if node.parent_node is None:
        node.value = start
    else:
        node.value = random.gauss(node.parent_node.value, node.edge.length)
    for child in node.child_nodes():
        process_node(child)
    if node.taxon is not None:
        print("%s : %s" % (node.taxon, node.value))

mle = dendropy.Tree.get(
    path='pythonidae.mle.nex',
    schema='nexus')
process_node(mle.seed_node)

While the previous example works, it is probably clearer and more efficient to use one of the pre-defined node iterator methods:

preorder_node_iter
Iterates over nodes in a Tree object in a depth-first search pattern, i.e., “visiting” a node before visiting the children of the node. This is the same traversal order as the previous example. This traversal order is useful if you require ancestral nodes to be processed before descendent nodes, as, for example, when evolving sequences over a tree.
postorder_node_iter
Iterates over nodes in a Tree object in a postorder search pattern, i.e., visiting the children of the node before visiting the node itself. This traversal order is useful if you require descendent nodes to be processed before ancestor nodes, as, for example, when calculating ages of nodes.
level_order_node_iter
Iterates over nodes in a Tree object in a breadth-first search pattern, i.e., every node at a particular level is visited before proceeding to the next level.
leaf_node_iter
Iterates over the leaf or tip nodes of a Tree object.

The previous example would thus be better implemented as follows:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import random
import dendropy

def evolve_char(tree, start=1.0):
    for node in tree.preorder_node_iter():
        if node.parent_node is None:
            node.value = 1.0
        else:
            node.value = random.gauss(node.parent_node.value, node.edge.length)
    return tree

mle = dendropy.Tree.get(
        path='pythonidae.mle.nex',
        schema='nexus')
evolve_char(mle)
for node in mle.leaf_iter():
    print("%s : %s" % (node.taxon, node.value))

The nodes returned by each of these iterators can be filtered if a filter function is passed as a second argument to the iterator. This filter function should take a Node object as an argument, and return True if the node is to be returned or False if it is not. For example, the following iterates over all nodes that have more than two children:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

mle = dendropy.Tree.get(
    path='pythonidae.mle.nex',
    schema='nexus')
multifurcating = lambda x: True if len(x.child_nodes()) > 2 else False
for nd in mle.postorder_node_iter(multifurcating):
    print(nd.description(0))

Iterating Over Edges

The Edge objects associated with each Node can be accessed through the edge attribute of the Node object. So it is possible to iterate over every edge on a tree by iterating over the nodes and referencing the edge attribute of the node when processing the node. But it is clearer and probably more convenient to use one of the Edge iterators:

preorder_edge_iter
Iterates over edges in a Tree object in a depth-first search pattern, i.e., “visiting” an edge before visiting the edges descending from that edge. This is the same traversal order as the previous example. This traversal order is useful if you require ancestral edges to be processed before descendent edges, as, for example, when calculating the sum of edge lengths from the root.
postorder_edge_iter
Iterates over edges in a Tree object in a postorder search pattern, i.e., visiting the descendents of the edge before visiting the edge itself. This traversal order is useful if you require descendent edges to be processed before ancestral edges, as, for example, when calculating the sum of edge lengths from the tip
level_order_edge_iter
Iterates over edges in a Tree object in a breadth-first search pattern, i.e., every edge at a particular level is visited before proceeding to the next level.

The following example sets the edge lengths of a tree to the proportions of the total tree length that they represent:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

mle = dendropy.Tree.get(path='pythonidae.mle.nex', schema='nexus')
mle_len = mle.length()
for edge in mle.postorder_edge_iter():
    if edge.length is None:
        edge.length = 0
    else:
        edge.length = float(edge.length)/mle_len
print(mle.as_string(schema="newick"))

While this one removes the edge lengths entirely:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

mle = dendropy.Tree.get(
    path='pythonidae.mle.nex',
    schema='nexus')
mle_len = mle.length()
for edge in mle.postorder_edge_iter():
    edge.length = None
print(mle.as_string(schema="newick"))

Like the node iterators, the edge iterators also optionally take a filter function as a second argument, except here the filter function should take an Edge object as an argument. The following example shows how you might iterate over all edges with lengths less than some value:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

mle = dendropy.Tree.get(
    path='pythonidae.mle.nex',
    schema='nexus')
short = lambda edge: True if edge.length < 0.01 else False
for edge in mle.postorder_edge_iter(short):
    print(edge.length)

Finding Nodes on Trees

Nodes with Particular Taxa

To retrieve a node associated with a particular taxon, we can use the find_taxon_node method, which takes a filter function as an argument. The filter function should take a Taxon object as an argument and return True if the taxon is to be returned. For example:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree = dendropy.Tree.get(path="pythonidae.mle.nex", schema="nexus")
filter = lambda taxon: True if taxon.label=='Antaresia maculosa' else False
node = tree.find_node_with_taxon(filter)
print(node.description())

Because we might find it easier to refer to Taxon objects by their labels, a convenience method that wraps the retrieval of nodes associated with Taxon objects of particular label is provided:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy

tree = dendropy.Tree.get(path="pythonidae.mle.nex", schema="nexus")
node = tree.find_node_with_taxon_label('Antaresia maculosa')
print(node.description())

Most Recent Common Ancestors

The MRCA (most recent common ancestor) of taxa or nodes can be retrieved by the instance method mrca. This method takes a list of Taxon objects given by the taxa keyword argument, or a list of taxon labels given by the taxon_labels keyword argument, and returns a Node object that corresponds to the MRCA of the specified taxa. For example:

import dendropy

tree = dendropy.Tree.get(path="pythonidae.mle.nex", schema="nexus")
taxon_labels=['Python sebae',
              'Python regius',
              'Python curtus',
              'Python molurus']
mrca = tree.mrca(taxon_labels=taxon_labels)
print(mrca.description())

Note that this method is inefficient when you need to resolve MRCA’s for multiple sets or pairs of taxa. In this context, the PhylogeneticDistanceMatrix offers a more efficient approach, and should be preferred for applications such as calculating the patristic distances between all pairs of taxa. An instance of this class will be returned when you call phylogenetic_distance_matrix:

import dendropy

tree = dendropy.Tree.get(
        path="pythonidae.mle.nex",
        schema="nexus")
pdm = tree.phylogenetic_distance_matrix()
for idx1, taxon1 in enumerate(tree.taxon_namespace):
    for taxon2 in tree.taxon_namespace:
        mrca = pdm.mrca(taxon1, taxon2)
        weighted_patristic_distance = pdm.patristic_distance(taxon1, taxon2)
        unweighted_patristic_distance = pdm.path_edge_count(taxon1, taxon2)
        print("'{}' vs '{}': {} (distance (weighted-edges, unweighted-edges) = {}, {})".format(
            taxon1.label,
            taxon2.label,
            mrca.bipartition.split_as_bitstring(),
            weighted_patristic_distance,
            unweighted_patristic_distance))

Note that the PhylogeneticDistanceMatrix object does not automatically update if the original Tree changes: it is essentially a snapshot of Tree at the point in which it is instantiated. If the original Tree changes, you should create a new instance of the corresponding PhylogeneticDistanceMatrix object.

Viewing and Displaying Trees

Sometimes it is useful to get a visual representation of a Tree.

For quick inspection, the print_plot will write an ASCII text plot to the standard output stream:

>>> t = dendropy.Tree.get_from_string("(A,(B,(C,D)));", "newick")
>>> t.print_plot()
/----------------------------------------------- A
+
|                /------------------------------ B
\----------------+
                 |          /------------------- C
                 \----------+
                            \------------------- D

If you need to store this representation as a string instead, you can use as_ascii_plot:

>>> s = t.as_ascii_plot()
>>> print(s)
/----------------------------------------------- A
+
|                /------------------------------ B
\----------------+
                 |          /------------------- C
                 \----------+
                            \------------------- D

You can also, as mentioned above, using the as_string method to represent a Tree as string in any format:

t = dendropy.Tree.get_from_string("(A,(B,(C,D)));", "newick")
print(t.as_string(schema="nexus"))
print(t.as_string(schema="newick"))

Building a Tree Programmatically

For example:

import dendropy

taxon_namespace = dendropy.TaxonNamespace(["A", "B", "C", "D",])
tree = dendropy.Tree(taxon_namespace=taxon_namespace)


# Create and add a new child node to the seed node,
# assigning it an edge length:
#
#     (seed)
#      /
#     /
#    ch1
#
ch1 = tree.seed_node.new_child()
ch1.edge.length = 1

# Can also assign edge length on construction:
#
#     (seed)
#      / \
#     /   \
#   ch1   ch2
#
ch2 = tree.seed_node.new_child(edge_length=1)

# Can also add an existing node as child
#
#       (seed)
#       /   \
#      /     \
#    ch1     ch2
#   /  \     /  \
#  ch3 ch4  ch5 ch6
ch3 = dendropy.Node(edge_length=1)
ch4 = dendropy.Node(edge_length=2)
ch1.add_child(ch3)
ch1.add_child(ch4)
ch5 = dendropy.Node(edge_length=1)
ch6 = dendropy.Node(edge_length=2)
# Note: this clears/deletes existing child nodes before adding the new ones;
ch2.set_child_nodes([ch5, ch6])

# Assign taxa
ch3.taxon = taxon_namespace.get_taxon("A")
ch4.taxon = taxon_namespace.get_taxon("B")
ch5.taxon = taxon_namespace.get_taxon("C")
ch6.taxon = taxon_namespace.get_taxon("D")

print(tree.as_string("newick"))
print(tree.as_ascii_plot())

produces the following:

((A:1,B:2):1,(C:1,D:2):1);

                                   /---------------------------------- A
/----------------------------------+
|                                  \---------------------------------- B
+
|                                  /---------------------------------- C
\----------------------------------+
                                   \---------------------------------- D