29+ Evidences for Macroevolution
Some Statistics of Incongruent Phylogenetic Trees
Copyright © 1999-2004 by Douglas Theobald, Ph.D.
The table below and the JavaScript calculator following it provide
values for the statistical significance of a match between two incongruent
phylogenetic trees, reported as P-values. These P-values give the
probability that two bifurcating rooted trees would match by chance with a
given number (or fewer) of mismatching branches.
The number of incongruent branches is determined relative to the maximum
agreement subtree (MAST) between two trees. A MAST is the "core" subtree that is
common between two trees. The number of incongruent branches is equal to the
minimum number of branches that must be pruned from one of the real trees to get
the MAST. An example from John Harshman's analysis of crocodile species is given
in the figure below (Harshman
et al. 2003).
Two incongruent crocodile phylogenies. The tree at
left is based upon morphological data; the tree at right on the molecular
sequence of the c-myc proto-oncogene (Harshman
et al. 2003). The common MAST is shown in black. According to
the distance metric described above, the distance between the two trees is
one branch, due to the misplaced Gavialis branch indicated in
magenta. The significance of the match between these two incongruent
phylogenies is P ≤ 0.00077. Additionally, Harshman et al.
performed an independent phylogenetic analysis with mitochondrial genes,
which gave exactly the same tree as the c-myc proto-oncogene data.
The overall significance for these three independent trees is P ≤
7.4 × 10^-8.
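The MAST-based distance described above can be sketched by brute force for small trees. This is a minimal illustration, not the method used for the crocodile analysis: the nested-tuple tree encoding, the function names, and the five hypothetical taxa A through E are all assumptions made for the example. It relies on the fact that a rooted tree is determined by its set of clades, so two trees agree on a subset of taxa when their clades, restricted to that subset, are identical.

```python
from itertools import combinations

def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    left_leaves, left_clades = clades(tree[0])
    right_leaves, right_clades = clades(tree[1])
    leaves = left_leaves | right_leaves
    return leaves, left_clades | right_clades | {leaves}

def restrict(clade_set, keep):
    """Clades of the subtree induced by the taxa in `keep` (single taxa dropped)."""
    return {c & keep for c in clade_set if len(c & keep) >= 2}

def incongruent_branches(tree_a, tree_b):
    """Minimum number of taxa to prune so both trees reduce to the same subtree."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must contain the same taxa")
    n = len(leaves_a)
    for size in range(n, 0, -1):  # try the largest candidate subtrees first
        for subset in combinations(sorted(leaves_a), size):
            keep = frozenset(subset)
            if restrict(clades_a, keep) == restrict(clades_b, keep):
                return n - size
    return n

# Hypothetical trees: C and D trade places between the two topologies.
tree_a = (((("A", "B"), "C"), "D"), "E")
tree_b = (((("A", "B"), "D"), "C"), "E")
print(incongruent_branches(tree_a, tree_b))  # prints 1
```

The search is exponential in the number of taxa, so it is only practical for trees of roughly the size covered by the table.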
In the table below, the rows list values for a comparison of two trees with
increasing numbers of taxa. The columns list the significance for a given number
of differences between the two trees. Incongruency of "1 adjacent" refers to the
case where a branch is misplaced by only one adjacent node (i.e., two branches
next to each other are swapped relative to the other tree). The remaining
columns labelled 1 through 10 refer to the case where x or fewer branches
are misplaced anywhere in the tree. High statistical significance (P <
0.01, or greater than 99% confidence) is indicated by light blue. Statistical
significance (P < 0.05, or greater than 95% confidence) is indicated
by pink. Equivocal values (0.05 < P < 0.50) are indicated by white.
Highly insignificant values (P > 0.50) are indicated by red, and
impossible values are colored black.
Statistical Significance of Two Incongruent Phylogenetic Trees
Maximum P-value for two trees incongruent by the given number of branches
Number of taxa | exactly congruent | 1 adjacent | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
4 | 0.067 | 0.20 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
5 | 0.0095 | 0.038 | 0.28 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
6 | 0.0011 | 0.0052 | 0.050 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
7 | 9.6 × 10^-5 | 5.8 × 10^-4 | 0.0067 | 0.20 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
8 | 7.4 × 10^-6 | 5.2 × 10^-5 | 6.8 × 10^-4 | 0.030 | 0.53 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
9 | 4.9 × 10^-7 | 3.9 × 10^-6 | 6.2 × 10^-5 | 0.0035 | 0.089 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
10 | 2.9 × 10^-8 | 2.6 × 10^-7 | 4.6 × 10^-6 | 3.3 × 10^-4 | 0.012 | 0.22 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
11 | 1.5 × 10^-9 | 1.5 × 10^-8 | 3.0 × 10^-7 | 2.7 × 10^-5 | 0.0012 | 0.032 | 0.49 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
12 | 7.2 × 10^-11 | 8.0 × 10^-10 | 1.8 × 10^-8 | 1.9 × 10^-6 | 1.1 × 10^-4 | 0.0037 | 0.076 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00
13 | 3.1 × 10^-12 | 3.8 × 10^-11 | 9.1 × 10^-10 | 1.2 × 10^-7 | 8.3 × 10^-6 | 3.5 × 10^-4 | 0.0095 | 0.17 | 1.00 | 1.00 | 1.00 | 1.00
14 | 1.2 × 10^-13 | 1.6 × 10^-12 | 4.3 × 10^-11 | 6.6 × 10^-9 | 5.6 × 10^-7 | 2.9 × 10^-5 | 9.9 × 10^-4 | 0.022 | 0.33 | 1.00 | 1.00 | 1.00
15 | 4.6 × 10^-15 | 6.6 × 10^-14 | 1.8 × 10^-12 | 3.3 × 10^-10 | 3.3 × 10^-8 | 2.1 × 10^-6 | 8.7 × 10^-5 | 0.0025 | 0.048 | 0.62 | 1.00 | 1.00
16 | 1.6 × 10^-16 | 2.4 × 10^-15 | 5.6 × 10^-14 | 1.5 × 10^-11 | 1.8 × 10^-9 | 1.3 × 10^-7 | 6.7 × 10^-6 | 2.3 × 10^-4 | 0.0056 | 0.095 | 1.00 | 1.00
17 | 5.2 × 10^-18 | 8.3 × 10^-17 | 2.1 × 10^-15 | 6.4 × 10^-13 | 8.6 × 10^-11 | 7.5 × 10^-9 | 4.5 × 10^-7 | 1.9 × 10^-5 | 5.6 × 10^-4 | 0.012 | 0.18 | 1.00
18 | 1.5 × 10^-19 | 2.7 × 10^-18 | 7.4 × 10^-17 | 2.5 × 10^-14 | 3.8 × 10^-12 | 3.9 × 10^-10 | 2.7 × 10^-8 | 1.4 × 10^-6 | 4.9 × 10^-5 | 0.0013 | 0.024 | 0.32
19 | 4.5 × 10^-21 | 8.1 × 10^-20 | 2.3 × 10^-18 | 8.9 × 10^-16 | 1.6 × 10^-13 | 1.8 × 10^-11 | 1.5 × 10^-9 | 8.6 × 10^-8 | 3.7 × 10^-6 | 1.2 × 10^-4 | 0.0027 | 0.046
20 | 1.2 × 10^-22 | 2.3 × 10^-21 | 7.3 × 10^-20 | 3.0 × 10^-17 | 5.9 × 10^-15 | 7.8 × 10^-13 | 7.3 × 10^-11 | 4.9 × 10^-9 | 2.5 × 10^-7 | 9.2 × 10^-6 | 2.5 × 10^-4 | 0.0054
Mathematical Details
For an exact match between two trees (no incongruence):
$$P = \frac{2^{N-2}\,(N-2)!}{(2N-3)!}$$
or
$$P = \frac{1}{(2N-3)!!}$$
where "!!" is double factorial
notation and N = # of taxa. For an incongruency of "1 adjacent" branch:
$$P = \frac{2^{N-2}\,(N-1)!}{(2N-3)!}$$
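As a quick check on the exact-match and "1 adjacent" formulas, the following sketch evaluates them with exact rational arithmetic. It reads the leading factor in the factorial forms as 2^(N-2), which is the reading that makes the two exact-match expressions agree; the function names are illustrative only.

```python
from fractions import Fraction
from math import factorial

def double_factorial(m):
    """m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2 (1 when m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def p_exact(n):
    """Exact match, double factorial form: 1 / (2N-3)!!."""
    return Fraction(1, double_factorial(2 * n - 3))

def p_exact_factorial_form(n):
    """Exact match, factorial form: 2^(N-2) (N-2)! / (2N-3)!."""
    return Fraction(2 ** (n - 2) * factorial(n - 2), factorial(2 * n - 3))

def p_one_adjacent(n):
    """'1 adjacent' case: 2^(N-2) (N-1)! / (2N-3)!."""
    return Fraction(2 ** (n - 2) * factorial(n - 1), factorial(2 * n - 3))

for n in (4, 5, 6, 7):
    # Both exact-match forms should give the same fraction.
    assert p_exact(n) == p_exact_factorial_form(n)
    print(n, float(p_exact(n)), float(p_one_adjacent(n)))
```

For N = 4 this gives 1/15 and 1/5, which round to the 0.067 and 0.20 listed in the first row of the table.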
For an incongruency of I branches, misplaced anywhere between
two trees:
$$P \le \frac{2^{N-I-2}\,(N-I-2)!\,N!}{(2[N-I]-3)!\,(N-I)!\,I!}$$
or
$$P \le \frac{N!/\big((N-I)!\,I!\big)}{(2[N-I]-3)!!}$$
where N = # of taxa and I = # of incongruent branches.
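The upper bound for I misplaced branches can be sketched the same way. The code below evaluates both forms (reading the leading factor of the long form as 2^(N-I-2)) and confirms that they agree. The function names are assumptions, and the bound is returned uncapped, so the caller decides whether to report values above 1 as 1.00. The checks use table entries for I = 2 and I = 3; the printed column for I = 1 appears to differ slightly from this formula, so it is not used here.

```python
from fractions import Fraction
from math import comb, factorial

def double_factorial(m):
    """m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2 (1 when m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def p_bound(n, i):
    """Binomial form: C(N, I) / (2[N-I]-3)!!  (requires N - I >= 2)."""
    return Fraction(comb(n, i), double_factorial(2 * (n - i) - 3))

def p_bound_factorial_form(n, i):
    """Long form: 2^(N-I-2) (N-I-2)! N! / ((2[N-I]-3)! (N-I)! I!)."""
    m = n - i
    numerator = 2 ** (m - 2) * factorial(m - 2) * factorial(n)
    denominator = factorial(2 * m - 3) * factorial(m) * factorial(i)
    return Fraction(numerator, denominator)

for n, i in [(7, 2), (8, 2), (8, 3), (9, 2)]:
    assert p_bound(n, i) == p_bound_factorial_form(n, i)
    # Cap at 1 only for display, since a probability cannot exceed 1.
    print(n, i, float(min(p_bound(n, i), Fraction(1))))
```

For N = 7 and I = 2 the bound is 21/105 = 0.20, and for N = 8 and I = 3 it is 56/105, about 0.53, in line with the corresponding table cells.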
This last P-value calculation is an upper bound. That is, this
P-value is an overestimate, since the actual P-value is very
likely to be lower (better). P is the ratio of the maximum number of
possible incongruent trees to the total number of possible trees. However, in
the final equation the calculated maximum number of incongruent trees includes
nonunique trees (i.e., some of the incongruent trees have the same topology and
thus are counted more than once). For example, for N = 4 and I =
1, this calculation gives P ≤ 1.3333, while the exact P = 0.73333.
At large N and I, the bound converges on the exact value.
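The N = 4, I = 1 example can be reproduced by enumerating every rooted binary tree on four taxa and counting those that share a three-taxon subtree with a reference tree. This is a sketch under stated assumptions: the taxon labels and function names are hypothetical, and the reference tree is taken to be the ladder-shaped tree (((A,B),C),D); a differently shaped reference may give a different count.

```python
from fractions import Fraction
from itertools import combinations

def insert_leaf(tree, leaf):
    """Yield each rooted binary tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)

def all_rooted_trees(leaves):
    """All (2N-3)!! rooted binary trees on the given leaves, as nested 2-tuples."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [new for old in trees for new in insert_leaf(old, leaf)]
    return trees

def clades(tree):
    """Return (leaf set, set of clades) for a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    left_leaves, left_clades = clades(tree[0])
    right_leaves, right_clades = clades(tree[1])
    leaves = left_leaves | right_leaves
    return leaves, left_clades | right_clades | {leaves}

def shares_subtree(tree_a, tree_b, size):
    """True if the trees induce the same subtree on some `size` taxa."""
    leaves, clades_a = clades(tree_a)
    _, clades_b = clades(tree_b)
    for subset in combinations(sorted(leaves), size):
        keep = frozenset(subset)
        sub_a = {c & keep for c in clades_a if len(c & keep) >= 2}
        sub_b = {c & keep for c in clades_b if len(c & keep) >= 2}
        if sub_a == sub_b:
            return True
    return False

trees = all_rooted_trees(["A", "B", "C", "D"])
reference = ((("A", "B"), "C"), "D")  # assumed ladder-shaped reference tree
matches = sum(shares_subtree(reference, t, 3) for t in trees)
exact_p = Fraction(matches, len(trees))
print(len(trees), matches, float(exact_p))  # 15 trees, 11 matches, P = 11/15
```

The bound counts 4 × 5 = 20 candidate trees against only 15 distinct topologies, which is where the value 1.3333 comes from; counting each topology once gives 11/15, or about 0.7333.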
These equations can be extended easily to the case of discrepancies between
more than two trees, each of the same number of taxa. The probability that
k rooted, binary, N-taxa trees have at most I incongruent
branches is:
$$P \le \frac{N!/\big((N-I)!\,I!\big)}{\big((2[N-I]-3)!!\big)^{k-1}}$$
Equivalently, this is the probability that two or more N-taxa trees
will share the same MAST of size N - I or greater. The JavaScript
calculator above uses this equation to determine its P-values.
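A minimal sketch of the multi-tree bound follows, with the double factorial raised to the power k - 1. It is not the page's calculator, and the function names are illustrative. The example assumes the crocodile trees contain 8 taxa; that count is inferred from the P-values quoted in the figure caption and is not stated in the text.

```python
from math import comb

def double_factorial(m):
    """m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2 (1 when m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def p_bound_k(n, i, k=2):
    """Upper bound that k rooted binary N-taxon trees differ by at most I branches."""
    if k < 2 or n - i < 2:
        raise ValueError("need k >= 2 trees and N - I >= 2 taxa in the MAST")
    return comb(n, i) / double_factorial(2 * (n - i) - 3) ** (k - 1)

# Assumed: 8 taxa, 1 misplaced branch (the Gavialis placement).
print(p_bound_k(8, 1, k=2))  # about 7.7e-4, the two-tree value in the caption
print(p_bound_k(8, 1, k=3))  # about 7.4e-8, the three-tree value in the caption
```

With k = 2 the function reduces to the two-tree bound, so one routine covers both cases.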
I would appreciate hearing from anyone who has any ideas on how to correct
for nonunique trees. I independently derived most of these equations in the
summer of 2002. Later I discovered via personal correspondence that Mike Steel
had also derived these equations and was soon to publish all but the last in an
upcoming book (Bryant
et al. 2002). It appears that the final equation was independently
derived by both me and Mike Steel, and to my knowledge it remains unpublished.
References
Li, W.-H. (1997). Molecular Evolution. Sunderland,
MA, Sinauer Associates. p. 102.
Bryant, D., MacKenzie, A. and Steel, M. (2002).
"The size of a maximum agreement subtree for random binary trees." In:
Bioconsensus II. DIMACS Series in Discrete Mathematics and Theoretical
Computer Science (American Mathematical Society). ed., M.F. Janowitz.
Harshman, J., Huddleston, C. J., Bollback, J. P.,
Parsons, T. J., and Braun, M. J. (2003). "True and false gharials: a nuclear
gene phylogeny of Crocodylia." Syst Biol. 52: 386-402.