New approaches to phylogenetic tree search and their application to large numbers of protein alignments

Simon Whelan

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Phylogenetic tree estimation plays a critical role in a wide variety of molecular studies, including molecular systematics, phylogenetics, and comparative genomics. Finding the optimal tree relating a set of sequences using score-based (optimality criterion) methods, such as maximum likelihood and maximum parsimony, may require all possible trees to be considered, which is not feasible even for modest numbers of sequences. In practice, trees are estimated using heuristics that represent a trade-off between topological accuracy and speed. I present a series of novel algorithms suitable for score-based phylogenetic tree reconstruction that demonstrably improve the accuracy of tree estimates while maintaining high computational speeds. The heuristics function by allowing the efficient exploration of large numbers of trees through novel hill-climbing and resampling strategies. These heuristics, and other computational approximations, are implemented for maximum likelihood estimation of trees in the program Leaphy, and the performance of Leaphy is compared with that of other popular phylogenetic programs. Trees are estimated from 4059 different protein alignments using a selection of phylogenetic programs, and the likelihoods of the tree estimates are compared. Trees estimated using Leaphy are found to have likelihoods equal to or better than those of trees estimated using other phylogenetic programs in 4004 (98.6%) families, and Leaphy provides a unique best tree that no other program found in 1102 (27.1%) families. The improvement is particularly marked for larger families (80 to 100 sequences), where Leaphy finds a unique best tree in 81.7% of families.
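
To illustrate why considering all possible trees is not feasible, the short sketch below counts the distinct unrooted, fully bifurcating topologies for a given number of sequences, using the standard double-factorial result (2n-5)!!. It is not taken from the article or from Leaphy; the function name `count_unrooted_trees` and the chosen sequence counts are assumptions made purely for illustration.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating tree topologies for n_taxa labelled tips.

    Uses the standard result (2n-5)!! = 3 * 5 * ... * (2n-5), valid for n >= 3.
    Illustrative helper only; not part of any phylogenetics package.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Sequence counts chosen for illustration; 80 and 100 match the
    # "larger families" range mentioned in the abstract.
    for n in (5, 10, 20, 80, 100):
        print(f"{n:>3} sequences: {count_unrooted_trees(n):.3e} topologies")
```

The count grows faster than exponentially with the number of sequences, which is why score-based methods fall back on heuristics that score only a small subset of candidate trees.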
    Original language: English
    Pages (from-to): 727-740
    Number of pages: 13
    Journal: Systematic Biology
    Volume: 56
    Issue number: 5
    Publication status: Published - Oct 2007

    Keywords

    • Algorithms
    • Evolution
    • Phylogenetic tree inference
    • Tree estimation heuristics
