TCS: a computer program to estimate gene genealogies
Journal article
Abstract: Phylogenies are extremely useful tools, not only for establishing genealogical relationships among a group of organisms or their parts (e.g. genes), but also for a variety of research once the phylogenies are estimated. In a recent review, Pagel (1999) eloquently outlined a number of uses for phylogenetic information, from the discovery of drug resistance to reconstructing the common ancestor of all life. Phylogenies have been used to predict future trends in infectious disease (Bush et al. 1999) and have even been offered as evidence in a court of law (Vogel 1997). Yet phylogenies are only as useful as they are accurate.

Estimating genealogical relationships among genes at the population level presents a number of difficulties for traditional methods of phylogeny reconstruction. These traditional methods, such as parsimony, neighbour-joining, and maximum-likelihood, make assumptions that are invalid at the population level. For example, these methods assume that ancestral haplotypes are no longer in the population, yet coalescent theory predicts that ancestral haplotypes will be the most frequent sequences sampled in a population level study (Watterson & Guess 1977; Donnelly & Tavaré 1986; Crandall & Templeton 1993). Traditional methods require reasonably large numbers of variable characters to reconstruct relationships accurately (Huelsenbeck & Hillis 1993), and population level studies typically lack such variation. Also, recombination is a real possibility among sequences at the population level, and traditional methods assume that recombination does not occur. The failure to incorporate the possibility of recombination in phylogeny reconstruction can lead to grave errors in the resulting estimated phylogeny. The combination of these effects can lead parsimony methods to infer a cumbersome number of most parsimonious trees at the population level with no resolution among the set (e.g. over one billion trees for a set of human mitochondrial DNA (mtDNA), Excoffier & Smouse 1994). These effects can also lead neighbour-joining and traditional maximum-likelihood methods to be overconfident in the resulting relationships (Bandelt et al. 1995). Therefore, an alternative approach is needed to provide accurate estimates of gene genealogies at the population level, one that takes into account the population level phenomena not addressed by traditional methods.

Multiple groups have looked to network representations for population level genealogical information (Bandelt & Dress 1992; Templeton et al. 1992; Excoffier & Smouse 1994; Fitch 1997). Networks allow one to naturally incorporate the often nonbifurcating genealogical information associated with population level divergences. The method of Templeton et al. (1992) (TCS) has been used extensively with restriction site and nucleotide sequence data to infer population level genealogies when divergences are low (Georgiadis et al. 1994; Routman et al. 1994; Gerber & Templeton 1996; Hedin 1997; Schaal et al. 1998; Vilá et al. 1999; Gómez-Zurita et al. 2000). TCS has been used with traditional methods to estimate relationships among organisms that span a wide range of divergence (Crandall & Fitzpatrick 1996; Benabib et al. 1997). The approach has also been used extensively with a nested analysis procedure to partition population structure from population history (Templeton et al. 1995; Templeton 1998) and to explore the phylogeographic history of a diversity of organisms (e.g. Johnson & Jordon 2000; Turner et al. 2000).

In this note, we announce the availability of a new software package, TCS, to estimate genealogical relationships among sequences using the method of Templeton et al. (1992). The TCS software opens nucleotide sequence files in either NEXUS (Maddison et al. 1997) or PHYLIP (Felsenstein 1991) sequential format.
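To make the sequential input layout concrete, here is a minimal Python sketch of a reader for PHYLIP sequential text. It is not taken from the TCS source code: the function name `read_phylip_sequential` is hypothetical, and the sketch assumes the strict layout in which each name occupies the first 10 columns and each taxon's sequence is completed before the next taxon begins.

```python
def read_phylip_sequential(text):
    """Parse strict sequential PHYLIP text into a {name: sequence} dict.

    Assumptions (not from the TCS documentation): names fill exactly the
    first 10 columns, and blank lines carry no information.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header line: number of sequences, then number of characters per sequence.
    n_taxa, n_chars = (int(tok) for tok in lines[0].split()[:2])
    records = {}
    i = 1
    for _ in range(n_taxa):
        name = lines[i][:10].strip()
        seq = "".join(lines[i][10:].split())
        i += 1
        # Sequential layout: keep reading lines until this taxon is complete.
        while len(seq) < n_chars:
            seq += "".join(lines[i].split())
            i += 1
        if len(seq) != n_chars:
            raise ValueError(f"{name}: expected {n_chars} sites, got {len(seq)}")
        records[name] = seq.upper()
    return records


# Small made-up alignment; seq_C is wrapped over two lines.
example = "\n".join([
    "4 8",
    "seq_A".ljust(10) + "ACGTACGT",
    "seq_B".ljust(10) + "ACGTACGT",
    "seq_C".ljust(10) + "ACGT",
    "ACGA",
    "seq_D".ljust(10) + "ACTTACGA",
])
records = read_phylip_sequential(example)
```

Relaxed PHYLIP variants that separate the name from the data by whitespace instead of fixed columns would need a different name-splitting rule.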
Sequences should not be collapsed into haplotypes before input, because frequency data can be incorporated into the output. The program itself collapses sequences into haplotypes and calculates the frequencies of the haplotypes in the sample. These frequencies are used to estimate haplotype outgroup probabilities, which correlate with haplotype age (Donnelly & Tavaré 1986; Castelloe & Templeton 1994). An absolute distance matrix is then calculated for all pairwise comparisons of haplotypes. The probability of parsimony [as defined in Templeton et al. (1992), equations 6, 7, and 8] is calculated for successive numbers of pairwise differences until it crosses the 0.95 cut-off. The number of mutational differences associated with the probability just before this 95% cut-off is then the maximum number of mutational connections between pairs of sequences justified by the 'parsimony' criterion. These justified connections are then made, resulting in a 95% set of plausible solutions.

The program outputs the sequences, the pairwise absolute distance matrix, probabilities of parsimony for mutational steps just beyond the 95% cut-off, a text listing of connections made and missing intermediates generated, and a graph output file containing the resulting network (Fig. 1). This graph output file can be opened in the freeware VGJ 1.0.3 (http://www.eng.auburn.edu/department/cse/research/graphdrawing/graphdrawing.html; distributed under the terms of the GNU General Public License, Version 2), which is packaged with the TCS software.

The program can handle a reasonable number of sequences. For example, an HTLV data set with 69 haplotypes of length 725 bp took over one hour to run on a Macintosh G3. Memory requirements are low, and the program will run with less than 1 MB RAM.
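The collapsing and distance steps described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the TCS implementation: all function names are hypothetical, the connection limit is passed in as a plain parameter `limit` rather than derived from the probability of parsimony of Templeton et al. (1992), and the final step merely lists haplotype pairs whose distance is within that limit, without building the network or inserting missing intermediates.

```python
from collections import Counter
from itertools import combinations


def collapse_haplotypes(sequences):
    """Group identical sequences; return (haplotypes, counts), most frequent first."""
    counts = Counter(sequences)
    haplotypes = [hap for hap, _ in counts.most_common()]
    return haplotypes, [counts[hap] for hap in haplotypes]


def absolute_distance(a, b):
    """Number of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(haplotypes):
    """Symmetric matrix of absolute pairwise differences between haplotypes."""
    n = len(haplotypes)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = absolute_distance(haplotypes[i], haplotypes[j])
    return matrix


def pairs_within_limit(matrix, limit):
    """List (i, j, distance) for haplotype pairs at most `limit` steps apart.

    `limit` is a hypothetical stand-in for the 95% connection limit.
    """
    n = len(matrix)
    return [(i, j, matrix[i][j])
            for i, j in combinations(range(n), 2)
            if matrix[i][j] <= limit]


# Made-up sample of five aligned sequences (three distinct haplotypes).
sample = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACTTACGA", "ACGTACGT"]
haplotypes, counts = collapse_haplotypes(sample)
matrix = distance_matrix(haplotypes)
candidates = pairs_within_limit(matrix, limit=1)
```

In this toy sample the most frequent haplotype appears three times, which is the kind of frequency information the text says would be lost if sequences were collapsed before input.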
The TCS software package, including executables for Mac and PC, documentation, and Java source code, is distributed freely and is available at our website, along with a host of other programs for population genetic and phylogenetic analyses: http://bioag.byu.edu/zoology/crandalllab/programs.htm.

Fig. 1. TCS Java interface. The maximum number of steps parsimoniously connecting two haplotypes is indicated. Gaps can be treated as a 5th state or as missing data. The graph can be edited and arranged using different algorithms. Double-clicking on a haplotype displays information such as the sequences included in the haplotype and the outgroup weights. The haplotype with the highest outgroup probability is displayed as a square, while other haplotypes are displayed as ovals. The size of the square or oval corresponds to the haplotype frequency.

This work was supported by the Alfred P. Sloan Foundation, a Shannon Award from the National Institutes of Health, and NIH R01-HD34350.
Year of publication: 2000
Authors: Mark Clement, David Posada, Keith A. Crandall
Publisher: Wiley
Source: Molecular Ecology
Keywords: Genomics and Phylogenetic Studies, Genetic diversity and population structure, Evolution and Genetic Dynamics
Other links: Molecular Ecology (PDF)
Molecular Ecology (HTML)
PubMed (HTML)
Open access: bronze
Volume: 9
Issue: 10
Pages: 1657–1659