dendropy.simulate.treesim: Unified Namespace Aggregating Functions and Classes for Tree Simulations

This module provides a convenient interface that aggregates, wraps, and/or implements functions and classes that simulate trees under various models and processes. It simply exposes these functions and classes under the dendropy.simulate.treesim namespace; the actual definitions live under the appropriate model namespace in the dendropy.model sub-package.

dendropy.simulate.treesim.birth_death_tree(birth_rate, death_rate, birth_rate_sd=0.0, death_rate_sd=0.0, **kwargs)

Returns a birth-death tree with birth rate specified by birth_rate, and death rate specified by death_rate, with edge lengths in continuous (real) units.

Tree growth is controlled by one or more of the following arguments, of which at least one must be specified:

  • If num_extant_tips is given as a keyword argument, tree is grown until the number of EXTANT tips equals this number.
  • If num_extinct_tips is given as a keyword argument, tree is grown until the number of EXTINCT tips equals this number.
  • If num_total_tips is given as a keyword argument, tree is grown until the number of EXTANT plus EXTINCT tips equals this number.
  • If max_time is given as a keyword argument, tree is grown for a maximum of max_time.
  • If gsa_ntax is given, then the tree will be simulated up to this number of EXTANT tips (or 0 tips), and a tree will then be randomly selected from the intervals which correspond to times at which the tree had exactly num_extant_tips leaves. This allows for simulations according to the “General Sampling Approach” of Hartmann et al. (2010). If this option is specified, then num_extant_tips MUST be specified and num_extinct_tips and num_total_tips CANNOT be specified.

If more than one of the above is given, then tree growth will terminate when any one of the termination conditions is met.

Parameters:
  • birth_rate (float) – The birth rate.
  • death_rate (float) – The death rate.
  • birth_rate_sd (float) – The standard deviation of the normally-distributed mutation added to the birth rate as it is inherited by daughter nodes; if 0, birth rate does not evolve on the tree.
  • death_rate_sd (float) – The standard deviation of the normally-distributed mutation added to the death rate as it is inherited by daughter nodes; if 0, death rate does not evolve on the tree.
Keyword Arguments:
 
  • num_extant_tips (int) – If specified, branching process is terminated when number of EXTANT tips equals this number.
  • num_extinct_tips (int) – If specified, branching process is terminated when number of EXTINCT tips equals this number.
  • num_total_tips (int) – If specified, branching process is terminated when number of EXTINCT plus EXTANT tips equals this number.
  • max_time (float) – If specified, branching process is terminated when time reaches or exceeds this value.
  • gsa_ntax (int) – The General Sampling Approach threshold for number of taxa. See above for details.
  • tree (Tree instance) – If given, then this tree will be used; otherwise a new one will be created.
  • taxon_namespace (TaxonNamespace instance) – If given, then this will be assigned to the new tree, and, in addition, taxa assigned to tips will be sourced from or otherwise created with reference to this.
  • is_assign_extant_taxa (bool [default: True]) – If False, then taxa will not be assigned to extant tips. If True (default), then taxa will be assigned to extant tips. Taxa will be assigned from the specified taxon_namespace or tree.taxon_namespace. If the number of taxa required exceeds the number of taxa existing in the taxon namespace, new Taxon objects will be created as needed and added to the taxon namespace.
  • is_assign_extinct_taxa (bool [default: True]) – If False, then taxa will not be assigned to extinct tips. If True (default), then taxa will be assigned to extinct tips. Taxa will be assigned from the specified taxon_namespace or tree.taxon_namespace. If the number of taxa required exceeds the number of taxa existing in the taxon namespace, new Taxon objects will be created as needed and added to the taxon namespace. Note that this option only makes sense if extinct tips are retained (specified via the ‘is_retain_extinct_tips’ option), and will otherwise be ignored.
  • is_add_extinct_attr (bool [default: True]) – If True (default), add a boolean attribute indicating whether or not a node is an extinct tip. False will skip this. Name of attribute is set by the ‘extinct_attr_name’ argument, defaulting to ‘is_extinct’. Note that this option only makes sense if extinct tips are retained (specified via the ‘is_retain_extinct_tips’ option), and will otherwise be ignored.
  • extinct_attr_name (str [default: 'is_extinct']) – Name of attribute to add to nodes indicating whether or not tip is extinct. Note that this option only makes sense if extinct tips are retained (specified via ‘is_retain_extinct_tips’ option), and will otherwise be ignored.
  • is_retain_extinct_tips (bool [default: False]) – If True, extinct tips will be retained on tree. Defaults to False: extinct lineages removed from tree.
  • repeat_until_success (bool [default: True]) – Under some conditions, it is possible for all lineages on a tree to go extinct. In this case, if this argument is given as True (the default), then a new branching process is initiated. If False, then a TreeSimTotalExtinctionException is raised.
  • rng (random.Random() or equivalent instance) – A Random() object or equivalent can be passed using the rng keyword; otherwise GLOBAL_RNG is used.

References

Hartmann, Wong, and Stadler “Sampling Trees from Evolutionary Models” Systematic Biology. 2010. 59(4). 465-476

dendropy.simulate.treesim.discrete_birth_death_tree(birth_rate, death_rate, birth_rate_sd=0.0, death_rate_sd=0.0, **kwargs)

Returns a birth-death tree with birth rate specified by birth_rate, and death rate specified by death_rate, with edge lengths in discrete (integer) units.

birth_rate_sd is the standard deviation of the normally-distributed mutation added to the birth rate as it is inherited by daughter nodes; if 0, birth rate does not evolve on the tree.

death_rate_sd is the standard deviation of the normally-distributed mutation added to the death rate as it is inherited by daughter nodes; if 0, death rate does not evolve on the tree.

Tree growth is controlled by one or more of the following arguments, of which at least one must be specified:

  • If ntax is given as a keyword argument, tree is grown until the number of tips == ntax.
  • If taxon_namespace is given as a keyword argument, tree is grown until the number of tips == len(taxon_namespace), and the taxa are assigned randomly to the tips.
  • If ‘max_time’ is given as a keyword argument, tree is grown for max_time number of generations.

If more than one of the above is given, then tree growth will terminate when any of the termination conditions (i.e., number of tips == ntax, or number of tips == len(taxon_namespace), or number of generations == max_time) is met.

Also accepts a Tree object (with valid branch lengths) as an argument passed using the keyword tree: if given, then this tree will be used; otherwise a new one will be created.

If assign_taxa is False, then taxa will not be assigned to the tips; otherwise (default), taxa will be assigned. If taxon_namespace is given (tree.taxon_namespace, if tree is given), and the final number of tips on the tree after the termination condition is reached is less than the number of taxa in taxon_namespace (as will be the case, for example, when ntax < len(taxon_namespace)), then a random subset of taxa in taxon_namespace will be assigned to the tips of tree. If the number of tips is more than the number of taxa in the taxon_namespace, new Taxon objects will be created and added to the taxon_namespace unless the keyword argument create_required_taxa is given as False.

Under some conditions, it is possible for all lineages on a tree to go extinct. In this case, if the keyword argument repeat_until_success is True, then a new branching process is initiated. If False (default), then a TreeSimTotalExtinctionException is raised.

A Random() object or equivalent can be passed using the rng keyword; otherwise GLOBAL_RNG is used.

dendropy.simulate.treesim.contained_coalescent_tree(containing_tree, gene_to_containing_taxon_map, edge_pop_size_attr='pop_size', default_pop_size=1, rng=None)

Returns a gene tree simulated under the coalescent contained within a population or species tree.

containing_tree
The population or species tree. If edge_pop_size_attr is not None, and the population sizes given are non-trivial (i.e., >1), then edge lengths on this tree are in units of generations. Otherwise edge lengths are in population units; i.e. 2N generations for diploid populations of size N, or N generations for haploid populations of size N.
gene_to_containing_taxon_map
A TaxonNamespaceMapping object mapping Taxon objects in the containing_tree TaxonNamespace to corresponding Taxon objects in the resulting gene tree.
edge_pop_size_attr
Name of the edge attribute that specifies population size. By default this is “pop_size”. If an edge does not have this attribute, default_pop_size will be used. The value of this attribute should be the haploid population size or the number of genes; i.e. 2N for a diploid population of N individuals, or N for a haploid population of N individuals. This value determines how branch length units are interpreted in the input tree, containing_tree. If it is a biologically meaningful value, then branch lengths on the containing_tree are read as generations. If not (e.g. 1 or 0), then they are in population units, i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. If this argument is None, then population sizes default to default_pop_size.
default_pop_size
Population size to use if edge_pop_size_attr is None or if an edge does not have the attribute. Defaults to 1.

The returned gene tree will have the following extra attributes:

pop_node_genes
A dictionary with nodes of containing_tree as keys and a list of gene tree nodes that are uncoalesced as values.

Note that this function does very much the same thing as constrained_kingman_tree(), but provides a very different API.

dendropy.simulate.treesim.pure_kingman_tree(taxon_namespace, pop_size=1, rng=None)

Generates a tree under the unconstrained Kingman’s coalescent process.

Parameters:
  • taxon_namespace (TaxonNamespace instance) – A pre-populated TaxonNamespace where the contained Taxon instances represent the genes or individuals sampled from the population.
  • pop_size (numeric) – The size of the population from which the coalescent process is sampled.
Returns:

t (Tree) – A tree sampled from Kingman’s neutral coalescent.

dendropy.simulate.treesim.mean_kingman_tree(taxon_namespace, pop_size=1, rng=None)

Returns a tree with coalescent intervals given by the expected times under Kingman’s neutral coalescent.
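
To make "expected times" concrete: under the standard Kingman coalescent, with time scaled by the haploid population size, the expected waiting time while k lineages remain is pop_size divided by k-choose-2. The sketch below computes these intervals with the standard library only. It illustrates the textbook result under that scaling assumption and is not DendroPy's own implementation; the function name is hypothetical.

```python
def expected_coalescent_intervals(num_genes, pop_size=1.0):
    """Expected waiting times while k = num_genes, ..., 2 lineages remain.

    Assumes the standard Kingman scaling: with k lineages the coalescence
    rate is k * (k - 1) / 2 per unit of (pop_size-scaled) time.
    """
    return [
        2.0 * pop_size / (k * (k - 1))
        for k in range(num_genes, 1, -1)
    ]


intervals = expected_coalescent_intervals(4)
total_height = sum(intervals)  # expected time to the most recent common ancestor
print(intervals)
print(total_height)
```

For n sampled genes the intervals sum to 2 * pop_size * (1 - 1/n), so the expected tree height approaches 2 * pop_size as the sample grows.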

dendropy.simulate.treesim.constrained_kingman_tree(pop_tree, gene_tree_list=None, rng=None, gene_node_label_fn=None, num_genes_attr='num_genes', pop_size_attr='pop_size', decorate_original_tree=False)

Given a population tree, pop_tree, this will return a pair of trees: a gene tree simulated on this population tree based on Kingman’s n-coalescent, and a population tree with the additional attribute ‘gene_nodes’ on each node, which is a list of uncoalesced nodes from the gene tree associated with the given node from the population tree.

pop_tree should be a DendroPy Tree object, or an object of a class derived from it, with the following attribute: num_genes – the number of gene samples from each population in the present. Each edge on the tree should also have the population size attribute named by pop_size_attr.

pop_size_attr is the attribute name of the edges of pop_tree that specify the population size. By default it is pop_size. This should specify the effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.

If pop_size is 1 or 0 or None, then the edge lengths of pop_tree are taken to be in haploid population units; i.e. where 1 unit equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise the edge lengths of pop_tree are taken to be in generations.

If gene_tree_list is given, then the gene tree is added to the tree block, and the tree block’s taxa block will be used to manage the gene tree’s taxa.

gene_node_label_fn is a function that takes two arguments (a string and an integer, respectively, where the string is the containing species taxon label and the integer is the gene index) and returns a label for the corresponding gene node.

If decorate_original_tree is True, then the list of uncoalesced nodes at each node of the population tree is added to the original (input) population tree instead of a copy.

Note that this function does very much the same thing as contained_coalescent_tree(), but provides a very different API.

dendropy.simulate.treesim.star_tree(taxon_namespace, **kwargs)

Builds and returns a star tree from the given taxa block.