Phylogenetic Character Analyses

Phylogenetic Independent Contrasts (PIC)

Basic Analysis

A phylogenetic independent contrasts analysis (Felsenstein 1985; Garland et al. 2005) can be carried out using the PhylogeneticIndependentConstrasts class. This requires you to have a Tree and a ContinuousCharacterMatrix which reference the same TaxonNamespace. Thus, if your data is in the same file:

import dendropy
dataset = dendropy.DataSet.get(
    path="primates.cc.combined.nex",
    schema="nexus")
tree = dataset.tree_lists[0][0]
chars = dataset.char_matrices[0]

While if you have the tree and characters in a different file:

import dendropy
taxa = dendropy.TaxonNamespace()
tree = dendropy.Tree.get(
    path="primates.cc.tre",
    schema="newick",
    taxon_namespace=taxa)
chars = dendropy.ContinuousCharacterMatrix.get(
    path="primates.cc.nex",
    schema="nexus",
    taxon_namespace=taxa)

In either case, we have a Tree object, tree and a ContinuousCharacterMatrix object, chars, that both reference the same TaxonNamespace.

Once the data is loaded, we create the PhylogeneticIndependentConstrasts object:

>>> from dendropy import continuous
>>> pic = continuous.PhylogeneticIndependentContrasts(tree=tree, char_matrix=chars)

At this point, the data is ready for analysis. Typically, we want to map the contrasts onto a tree. The contrasts_tree method takes a single mandatory argument, the 0-based index of the character (or column) to be analyzed, and returns a Tree object that is a clone of the original input Tree, but with the following attributes added to each Node:

  • pic_state_value
  • pic_state_variance
  • pic_contrast_raw
  • pic_contrast_variance
  • pic_contrast_standardized
  • pic_edge_length_error
  • pic_corrected_edge_length

In addition to the 0-based index first argument, character_index, the contrasts_tree method takes the following optional arguments:

annotate_pic_statistics
If True then the PIC statistics attributes will be annotated (i.e., serialized or persisted when the tree is written out or saved. Defaults to False.
state_values_as_node_labels
If True then the label attribute of each Node object will be set to the value of the character.
corrected_edge_lengths
If True then the Tree returned will have its edge lengths adjusted to the corrected edge lengths as yielded by the PIC analysis.

Results as a Table

So the following retrieves the constrasts tree for the first character (index=0), and prints a table of the various statistics:

>>> ctree1 = pic.contrasts_tree(character_index=0,
...     annotate_pic_statistics=True,
...     state_values_as_node_labels=False,
...     corrected_edge_lengths=False)
>>> for nd in ctree1.postorder_internal_node_iter():
...     row = [nd.pic_state_value,
...             nd.pic_state_variance,
...             nd.pic_contrast_raw,
...             nd.pic_edge_length_error]
...     row_str = [(("%10.8f") % i) for i in row]
...     row_str = "    ".join(row_str)
...     label = nd.label.ljust(6)
...     print "%s %s" % (label, row_str)
HP     3.85263000    0.38500000    0.48342000    0.10500000
HPM    3.20037840    0.34560000    1.48239000    0.21560000
HPMA   2.78082358    0.60190555    1.17222840    0.22190555
Root   1.18372461    0.37574347    4.25050358    0.37574347

Results as a Newick String with State Values as Node Labels

Alternatively, you might want to visualize the results as a tree showing the numeric values of the states. The following produces this for each character in the matrix by first requesting that contrasts_tree replace existing node labels with the state values for that node, and then, when writing out in Newick format, suppressing taxon labels and printing node labels in their place:

import dendropy
from dendropy.model import continuous

taxa = dendropy.TaxonNamespace()
tree = dendropy.Tree.get(
        path="primates.cc.tre",
        schema="newick",
        taxon_namespace=taxa)
chars = dendropy.ContinuousCharacterMatrix.get_from_path(
        "primates.cc.nex",
        "nexus",
        taxon_namespace=taxa)
pic = continuous.PhylogeneticIndependentConstrasts(
        tree=tree,
        char_matrix=chars)
for cidx in range(chars.vector_size):
    ctree1 = pic.contrasts_tree(character_index=cidx,
            annotate_pic_statistics=True,
            state_values_as_node_labels=True,
            corrected_edge_lengths=False)
    print(ctree1.as_string("newick",
                suppress_leaf_taxon_labels=True,
                suppress_leaf_node_labels=False,
                suppress_internal_taxon_labels=True,
                suppress_internal_node_labels=False))










This results in:

[&R] ((((4.09434:0.21,3.61092:0.21)3.85263:0.28,2.37024:0.49)3.2003784:0.13,2.02815:0.62)2.78082357912:0.38,'-1.46968':1.0)1.1837246134:0.0;

[&R] ((((4.74493:0.21,3.3322:0.21)4.038565:0.28,3.3673:0.49)3.7432084:0.13,2.89037:0.62)3.43796714996:0.38,2.30259:1.0)3.01135659943:0.0;

Results as a NEXUS Document with Analysis Statistics as Node Metadata

However, probably the best way to visualize the results would be as a tree marked up with metadata that can be viewed in FigTree (by checking “Node Labels” and selecting the appropriate statistics from the drop-down menu). This is, in fact, even easier to do than the above, as it will result from the default options. The following illustrates this. It collects the metadata-annotated contrast analysis trees produced by contrasts_tree in a TreeList object, and then prints the TreeList as NEXUS-formatted string. The default options to contrasts_tree result in annotated attributes, while the default options to the writing method result in the annotations being written out as comment metadata.

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy
from dendropy.model import continuous

taxa = dendropy.TaxonNamespace()
tree = dendropy.Tree.get(
        path="primates.cc.tre",
        schema="newick",
        taxon_namespace=taxa)
chars = dendropy.ContinuousCharacterMatrix.get(
        path="primates.cc.nex",
        schema="nexus",
        taxon_namespace=taxa)
pic = dendropy.continuous.PhylogeneticIndependentConstrasts(
        tree=tree,
        char_matrix=chars)
pic_trees = dendropy.TreeList(taxon_namespace=taxa)
for cidx in range(chars.vector_size):
    ctree1 = pic.contrasts_tree(character_index=cidx)
    ctree1.label = "PIC %d" % (cidx+1)
    pic_trees.append(ctree1)
print(pic_trees.as_string(schema="nexus"))









Thus, we get:

#NEXUS


BEGIN TAXA;
    DIMENSIONS NTAX=5;
    TAXLABELS
        Homo
        Pongo
        Macaca
        Ateles
        Galago
;
END;

BEGIN TREES;
    TREE PIC_1 = [&R] ((((Homo:0.21[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=0.21,pic_state_value=4.09434,pic_contrast_standardized=None,pic_contrast_raw=None],Pongo:0.21[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=0.21,pic_state_value=3.61092,pic_contrast_standardized=None,pic_contrast_raw=None])HP:0.28[&pic_contrast_variance=0.42,pic_edge_length_error=0.105,pic_state_variance=0.385,pic_corrected_edge_length=0.385,pic_state_value=3.85263,pic_contrast_standardized=0.745933254387,pic_contrast_raw=0.48342],Macaca:0.49[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=0.49,pic_state_value=2.37024,pic_contrast_standardized=None,pic_contrast_raw=None])HPM:0.13[&pic_contrast_variance=0.875,pic_edge_length_error=0.2156,pic_state_variance=0.3456,pic_corrected_edge_length=0.3456,pic_state_value=3.2003784,pic_contrast_standardized=1.58474156959,pic_contrast_raw=1.48239],Ateles:0.62[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=0.62,pic_state_value=2.02815,pic_contrast_standardized=None,pic_contrast_raw=None])HPMA:0.38[&pic_contrast_variance=0.9656,pic_edge_length_error=0.221905550953,pic_state_variance=0.601905550953,pic_corrected_edge_length=0.601905550953,pic_state_value=2.78082357912,pic_contrast_standardized=1.19292629182,pic_contrast_raw=1.1722284],Galago:1.0[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=1.0,pic_state_value=-1.46968,pic_contrast_standardized=None,pic_contrast_raw=None])Root:0.0[&pic_contrast_variance=1.60190555095,pic_edge_length_error=0.37574347039,pic_state_variance=0.37574347039,pic_corrected_edge_length=0.37574347039,pic_state_value=1.1837246134,pic_contrast_standardized=3.35831889583,pic_contrast_raw=4.25050357912];

    TREE PIC_2 = [&R] ((((Homo:0.21[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=0.21,pic_state_value=4.74493,pic_contrast_standardized=None,pic_contrast_raw=None],Pongo:0.21[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=0.21,pic_state_value=3.3322,pic_contrast_standardized=None,pic_contrast_raw=None])HP:0.28[&pic_contrast_variance=0.42,pic_edge_length_error=0.105,pic_state_variance=0.385,pic_corrected_edge_length=0.385,pic_state_value=4.038565,pic_contrast_standardized=2.17988971592,pic_contrast_raw=1.41273],Macaca:0.49[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=0.49,pic_state_value=3.3673,pic_contrast_standardized=None,pic_contrast_raw=None])HPM:0.13[&pic_contrast_variance=0.875,pic_edge_length_error=0.2156,pic_state_variance=0.3456,pic_corrected_edge_length=0.3456,pic_state_value=3.7432084,pic_contrast_standardized=0.717612470209,pic_contrast_raw=0.671265],Ateles:0.62[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=0.62,pic_state_value=2.89037,pic_contrast_standardized=None,pic_contrast_raw=None])HPMA:0.38[&pic_contrast_variance=0.9656,pic_edge_length_error=0.221905550953,pic_state_variance=0.601905550953,pic_corrected_edge_length=0.601905550953,pic_state_value=3.43796714996,pic_contrast_standardized=0.867896862104,pic_contrast_raw=0.8528384],Galago:1.0[&pic_contrast_variance=None,pic_edge_length_error=0.0,pic_state_variance=None,pic_corrected_edge_length=1.0,pic_state_value=2.30259,pic_contrast_standardized=None,pic_contrast_raw=None])Root:0.0[&pic_contrast_variance=1.60190555095,pic_edge_length_error=0.37574347039,pic_state_variance=0.37574347039,pic_corrected_edge_length=0.37574347039,pic_state_value=3.01135659943,pic_contrast_standardized=0.897060422518,pic_contrast_raw=1.13537714996];

END;

Multifurcating Trees and Polytomies

By default, the PhylogeneticIndependentConstrasts class only handles fully-bifurcating trees, and throws an exception if the input tree has polytomies. You can change this behavior by specifying one of the following strings to the “polytomy_strategy” argument of the class constructor:

ignore

Polytomies will handled without complaint:

>>> pic = dendropy.model.continuous.PhylogeneticIndependentContrasts(tree=tree,
...        char_matrix=chars,
...        polytomy_strategy='ignore')

Note that in this case the raw contrast and the raw contrast variance calculated for nodes that have more than two children will be invalid. The reconstructed state values should be still valid, though.

resolve

Polytomies will be arbitrarily resolved with 0-length branches:

>>> pic = dendropy.model.continuous.PhylogeneticIndependentContrasts(tree=tree,
...        char_matrix=chars,
...        polytomy_strategy='resolve')

In this case this validity of the analysis for nodes with (originally) more than two children is dubious, as the resulting contrasts are non-independent.