Estimating the frequency of events that cause multiple-nucleotide changes

Simon Whelan, Nick Goldman

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Existing mathematical models of DNA sequence evolution assume that all substitutions derive from point mutations. There is, however, increasing evidence that larger-scale events, involving two or more consecutive sites, may also be important. We describe a model, denoted SDT, that allows for single-nucleotide, doublet, and triplet mutations. Applied to protein-coding DNA, the SDT model allows doublet and triplet mutations to overlap codon boundaries but still permits data to be analyzed using the simplifying assumption of independence of sites. We have implemented the SDT model for maximum-likelihood phylogenetic inference and have applied it to an alignment of mammalian globin sequences and to 258 other protein-coding sequence alignments from the Pandit database. We find the SDT model's inclusion of doublet and triplet mutations to be overwhelmingly successful in giving statistically significant improvements in fit of model to data, indicating that larger-scale mutation events do occur. Distributions of inferred parameter values over all alignments analyzed suggest that these events are far more prevalent than previously thought. Detailed consideration of our results and the absence of any known mechanism causing three adjacent nucleotides to be substituted simultaneously, however, leads us to suggest that the actual evolutionary events occurring may include still-larger-scale events, such as gene conversion, inversion, or recombination, or a series of rapid compensatory changes.
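    To make the three kinds of event named in the abstract concrete, the sketch below classifies the change between two nucleotide triplets by which positions differ. It is an illustration only, not the authors' implementation: the function names (`classify_change`, `count_change_classes`) are hypothetical, and it assumes that a "doublet" means two adjacent positions changing together. The abstract does not say how changes at positions 1 and 3 only are treated, so they are reported here under a separate, assumed label.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def classify_change(before: str, after: str) -> str:
    """Label the difference between two nucleotide triplets.

    Assumption (not from the source): a 'doublet' is a change at two
    adjacent positions; a change at positions 1 and 3 only is labelled
    'nonadjacent' because the abstract does not describe that case.
    """
    if len(before) != 3 or len(after) != 3:
        raise ValueError("expected two nucleotide triplets")
    # Indices (0-based) of the positions at which the triplets differ.
    changed = [i for i in range(3) if before[i] != after[i]]
    if not changed:
        return "none"
    if len(changed) == 1:
        return "single"
    if len(changed) == 3:
        return "triplet"
    # Exactly two positions differ: adjacent if their indices differ by 1.
    return "doublet" if changed[1] - changed[0] == 1 else "nonadjacent"


def count_change_classes() -> Counter:
    """Count ordered pairs of distinct triplets falling in each class."""
    triplets = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]
    return Counter(
        classify_change(a, b) for a in triplets for b in triplets if a != b
    )
```

    Calling `count_change_classes()` shows how many of the possible triplet-to-triplet changes would need a multi-nucleotide event under these assumptions, which is the part of the state space a point-mutation-only model cannot reach in a single step.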
    Original language: English
    Pages (from-to): 2027-2043
    Number of pages: 16
    Journal: Genetics
    Volume: 167
    Issue number: 4
    Publication status: Published - Aug 2004
