dendropy.calculate.statistics: General Statistics

Functions to calculate some general statistics.

class dendropy.calculate.statistics.FishersExactTest(table)

Given a 2x2 table:

a b
c d

represented by a list of lists:

[[a,b],[c,d]]

this calculates the sum of the probability of this table and all others more extreme under the null hypothesis that there is no association between the categories represented by the vertical and horizontal axes.

left_tail_p()

Returns the left-tail p-value: the sum of the probabilities of this table and of all tables more extreme in that direction.

static probability_of_table(table)

Given a 2x2 table:

a b
c d

represented by a list of lists:

[[a,b],[c,d]]

this returns the probability of this table under the null hypothesis of no association between rows and columns, which was shown by Fisher to be a hypergeometric distribution:

p = ( choose(a+b, a) * choose(c+d, c) ) / choose(a+b+c+d, a+c)

right_tail_p()

Returns the right-tail p-value: the sum of the probabilities of this table and of all tables more extreme in that direction.

two_tail_p()

Returns the two-tailed p-value: the sum of the probabilities of this table and of all tables more extreme in either direction.
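The hypergeometric formula given for probability_of_table, and the three tail sums, can be illustrated with a minimal standalone sketch. It does not call DendroPy. The function names are hypothetical, and the tail definitions are the conventional ones (left: top-left cell no larger than observed; right: no smaller; two-tail: every table no more probable than the observed one), which this page does not confirm to be what FishersExactTest implements. It assumes Python 3.8+ for math.comb.

```python
from fractions import Fraction
from math import comb


def table_probability(table):
    """Hypergeometric probability of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return Fraction(comb(a + b, a) * comb(c + d, c),
                    comb(a + b + c + d, a + c))


def tables_with_same_margins(table):
    """Yield every 2x2 table sharing the row and column totals of `table`."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    # With the margins fixed, the top-left cell determines the other three.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        yield [[x, row1 - x], [col1 - x, row2 - (col1 - x)]]


def tail_p_values(table):
    """Return (left, right, two_tail) p-values as exact fractions.

    Assumed conventions, not taken from DendroPy:
      left     -> tables whose top-left cell is <= the observed one
      right    -> tables whose top-left cell is >= the observed one
      two_tail -> tables no more probable than the observed one
    """
    a = table[0][0]
    p_observed = table_probability(table)
    left = right = two_tail = Fraction(0)
    for t in tables_with_same_margins(table):
        p = table_probability(t)
        if t[0][0] <= a:
            left += p
        if t[0][0] >= a:
            right += p
        if p <= p_observed:
            two_tail += p
    return left, right, two_tail


left, right, two_tail = tail_p_values([[3, 1], [1, 3]])
print(float(left), float(right), float(two_tail))
```

Exact fractions are used so that the comparison p <= p_observed in the two-tailed sum is not affected by floating-point rounding.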

dendropy.calculate.statistics.empirical_cdf(values, v)

Returns the proportion of elements in values that are less than or equal to v.
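As an illustration of the definition, a minimal standalone sketch follows. The function name is hypothetical, and raising an error on empty input is an assumption rather than documented DendroPy behavior.

```python
def empirical_cdf_sketch(values, v):
    """Proportion of elements in `values` that are <= v."""
    values = list(values)
    if not values:
        # Assumption: an empty sample has no meaningful proportion.
        raise ValueError("values must not be empty")
    return sum(1 for x in values if x <= v) / len(values)


print(empirical_cdf_sketch([1, 2, 3, 4], 2))
```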

dendropy.calculate.statistics.empirical_hpd(values, conf=0.05)

Assuming a unimodal distribution, returns the (1 - conf) highest posterior density (HPD) interval, i.e. the 0.95 HPD interval with the default conf=0.05, for a set of samples from a posterior distribution. Adapted from emp.hpd in the “TeachingDemos” R package (Copyright Greg Snow; licensed under the Artistic License).
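One common way to estimate an HPD interval from samples is to sort them and take the narrowest window that leaves out a fraction conf of the points. The sketch below shows that idea under stated assumptions: the function name is hypothetical, the rounding rule for the number of excluded samples is a choice made here, and nothing in it is taken from DendroPy's own code.

```python
def shortest_interval(values, conf=0.05):
    """Narrowest interval leaving out about `conf` of the sorted samples."""
    x = sorted(values)
    n = len(x)
    # Number of samples left outside the interval (assumed rounding rule).
    n_excluded = int(round(n * conf))
    if n_excluded == 0:
        raise ValueError("sample too small for the requested conf")
    best = None
    # Slide a fixed-size window over the sorted samples; keep the narrowest.
    for i in range(n_excluded):
        lower, upper = x[i], x[n - n_excluded + i]
        if best is None or upper - lower < best[1] - best[0]:
            best = (lower, upper)
    return best


print(shortest_interval([100, 0, 1, 2, 3, 4, 5, 6, 7, 8], conf=0.2))
```

The unimodality assumption matters here: with a multimodal sample, a single shortest window can span a low-density gap between modes.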

dendropy.calculate.statistics.mean_and_population_variance(values)

Returns the mean and population variance while only passing over the elements in values once.

dendropy.calculate.statistics.mean_and_sample_variance(values)

Returns the mean and sample variance while only passing over the elements in values once.
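A well-known way to obtain the mean and both variances in a single pass is Welford's online algorithm. The sketch below shows that technique; whether DendroPy uses it is not stated here, and the function name and the NaN return for a single-element sample are assumptions.

```python
def mean_and_variances(values):
    """One-pass mean, population variance, and sample variance (Welford)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("values must not be empty")
    population_variance = m2 / n
    # The sample variance is undefined for a single observation.
    sample_variance = m2 / (n - 1) if n > 1 else float("nan")
    return mean, population_variance, sample_variance


# An iterator can only be consumed once, so this exercises the single pass.
print(mean_and_variances(iter([2, 4, 4, 4, 5, 5, 7, 9])))
```

The population variance divides the summed squared deviations by n, while the sample variance divides by n - 1.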

dendropy.calculate.statistics.median(pool)

Returns the median of the sample. From: http://wiki.python.org/moin/SimplePrograms

dendropy.calculate.statistics.mode(values, bin_size=0.1)

Returns the mode of a set of values.

dendropy.calculate.statistics.quantile_5_95(values)

Returns 5% and 95% quantiles.

dendropy.calculate.statistics.rank(value_to_be_ranked, value_providing_rank)

Returns the rank of value_to_be_ranked within the collection of values value_providing_rank. Works even if value_providing_rank is an unordered collection (e.g., a set). A binary search would be faster, but it would require value_providing_rank to be an ordered (sorted) collection.
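The trade-off between a linear scan and a binary search can be sketched as follows. Both function names are hypothetical, and defining rank as the number of elements strictly less than the value is an assumption made for this illustration.

```python
from bisect import bisect_left


def rank_linear(value, values):
    """Count elements strictly less than `value`; works on any iterable."""
    return sum(1 for v in values if v < value)


def rank_sorted(value, sorted_values):
    """Same count via binary search; requires an already sorted sequence."""
    return bisect_left(sorted_values, value)


print(rank_linear(5, {7, 1, 5, 3}))
print(rank_sorted(5, [1, 3, 5, 7]))
```

The linear version is O(n) per query but needs no ordering; the bisect version is O(log n) per query once the values are sorted.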

dendropy.calculate.statistics.summarize(values)

Summarizes a sample of values:

  • range : tuple pair representing minimum and maximum values
  • mean : mean of sample
  • median : median of sample
  • var : (sample) variance
  • sd : (sample) standard deviation
  • hpd95 : tuple pair representing the lower and upper bounds of the 95% HPD interval
  • quant_5_95 : tuple pair representing 5% and 95% quantile

dendropy.calculate.statistics.variance_covariance(data, population_variance=False)

Returns the variance-covariance matrix for data. From: http://www.python-forum.org/pythonforum/viewtopic.php?f=3&t=17441
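For reference, a variance-covariance matrix can be computed as in the following standalone sketch. The function name is hypothetical, and the data layout (each row is one observation, each column one variable) is an assumption; this page does not say which layout variance_covariance expects. The population flag mirrors the population_variance parameter by switching the divisor between n and n - 1.

```python
def covariance_matrix(rows, population=False):
    """Variance-covariance matrix; rows are observations, columns variables."""
    rows = [list(r) for r in rows]
    n = len(rows)
    # Population covariance divides by n, sample covariance by n - 1.
    denom = n if population else n - 1
    if denom <= 0:
        raise ValueError("not enough observations")
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / denom
            for j in range(k)
        ]
        for i in range(k)
    ]


print(covariance_matrix([[1, 2], [3, 6], [5, 10]]))
```

The diagonal entries are the variances of each variable, and the matrix is symmetric because the covariance of i with j equals that of j with i.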