1
Matroud A, Tuffley C, Hendy M. An Asymmetric Alignment Algorithm for Estimating Ancestor-Descendant Edit Distance for Tandem Repeats. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2080-2091. [PMID: 33587704 DOI: 10.1109/tcbb.2021.3059239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Tandem repeats are repetitive structures present in some DNA sequences, consisting of many repeated copies of a single motif. They can serve as important markers for phylogenetic and population genetic studies, due to the high polymorphism in the number of motif copies as well as variations in the motif. The first step in using tandem repeats for phylogenetic studies is to estimate the evolutionary distance between a pair D1 and D2 of tandem repeat sequences with homologous motifs. This problem can be broken into two sub-problems: 1) Construct the most recent common ancestor of the sequences. 2) Calculate the evolutionary distance between each sequence and the hypothesised common ancestor. We present an algorithm that estimates the solution to the second problem. This takes the form of an asymmetric alignment algorithm to estimate the evolutionary distance between two tandem repeat sequences A and D, where D is assumed to have descended from A, under a model that allows block duplication, deletion, and variant substitution. The algorithm is asymmetric in the sense that the two input sequences A and D play different roles in the calculations, reflecting the assumption that D descends from A. Our model assumes static motif boundaries, meaning that motif duplication and deletion events must respect the motif boundaries. The algorithm may also be applied without modification to more complex repetitive structures with two or more motifs, such as nested tandem repeats.
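The asymmetry described in this abstract can be illustrated with a small dynamic program. This is only a minimal sketch under simplifying assumptions, not the authors' algorithm: each symbol stands for one whole motif copy (so motif boundaries are respected by construction), a duplication copies only the immediately preceding descendant copy, and the function name and unit costs are hypothetical.

```python
from math import inf


def ancestor_descendant_distance(ancestor, descendant,
                                 sub_cost=1, del_cost=1, dup_cost=1):
    """Cost of deriving `descendant` from `ancestor` (sequences of motif variants).

    Hypothetical simplified model: one symbol = one whole motif copy.
    """
    def sub(x, y):
        # Variant substitution is free when the two copies are identical.
        return 0 if x == y else sub_cost

    n, m = len(ancestor), len(descendant)
    # dist[i][j]: cheapest way to turn ancestor[:i] into descendant[:j].
    dist = [[inf] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i * del_cost  # every ancestral copy deleted
    # dist[0][j] stays infinite for j > 0: copies cannot appear from nothing.

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, d = ancestor[i - 1], descendant[j - 1]
            best = min(dist[i - 1][j] + del_cost,       # ancestral copy lost
                       dist[i - 1][j - 1] + sub(a, d))  # direct descent
            if j >= 2:
                # d is a duplicate of its left neighbour, possibly mutated.
                left = descendant[j - 2]
                best = min(best, dist[i][j - 1] + dup_cost + sub(left, d))
            dist[i][j] = best
    return dist[n][m]


print(ancestor_descendant_distance("abc", "abbc"))  # one duplication: 1
print(ancestor_descendant_distance("a", "ab"))      # duplication + substitution: 2
print(ancestor_descendant_distance("ab", "a"))      # one deletion: 1
```

The last two calls show the asymmetry: swapping the roles of the two sequences changes the cost, because only the descendant may gain copies by duplication and only the ancestor may lose copies by deletion.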
2
Bérard S, Nicolas F, Buard J, Gascuel O, Rivals E. A Fast and Specific Alignment Method for Minisatellite Maps. Evol Bioinform Online 2017. [DOI: 10.1177/117693430600200025] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022] Open
Abstract
Background: Variable minisatellites are among the most polymorphic markers of eukaryotic and prokaryotic genomes. This variability can affect gene coding regions, as in the prion protein gene, or gene regulation regions, as in the cystatin B gene, and can be associated with or implicated in disease: Creutzfeldt-Jakob disease and myoclonus epilepsy type 1, respectively. When it affects neutrally evolving regions, the length polymorphism (i.e., variation in the number of copies) of minisatellites has proved useful in population genetics. Motivation: In these tandem repeat sequences, different mutational mechanisms cause both the number of copies and the copies themselves to vary. In particular, the interspersion of tandem duplication/contraction events with point mutations makes the succession of variant repeats much more informative than allele length alone. Exploiting this information requires the ability to align minisatellite alleles while accounting for both point mutations and tandem duplications. Results: We propose a minisatellite map alignment program that improves on previous solutions. Our new program is faster, simpler, considers an extended evolutionary model, and is available to the community. We test it on the data set of 609 alleles of the MSY1 (DYF155S1) human minisatellite and confirm its ability to recover known evolutionary signals. Our experiments highlight that the informativeness of minisatellites resides in their length and composition polymorphisms. Exploiting both simultaneously is critical to unravel the implications of variable minisatellites in the control of gene expression and diseases. Availability: Software is available at http://atgc.lirmm.fr/ms_align/
Affiliation(s)
- François Nicolas
  - LIRMM, UMR 5506 CNRS-Université de Montpellier II, Montpellier, France
- Jérôme Buard
  - Institut de Génétique Humaine, UPR-CNRS 1142, Montpellier, France
- Olivier Gascuel
  - LIRMM, UMR 5506 CNRS-Université de Montpellier II, Montpellier, France
- Eric Rivals
  - LIRMM, UMR 5506 CNRS-Université de Montpellier II, Montpellier, France
3
Differing Patterns of Selection and Geospatial Genetic Diversity within Two Leading Plasmodium vivax Candidate Vaccine Antigens. PLoS Negl Trop Dis 2014; 8:e2796. [PMID: 24743266 PMCID: PMC3990511 DOI: 10.1371/journal.pntd.0002796] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 08/12/2013] [Accepted: 03/05/2014] [Indexed: 02/04/2023] Open
4
Buard J, Rivals E, Dunoyer de Segonzac D, Garres C, Caminade P, de Massy B, Boursot P. Diversity of Prdm9 zinc finger array in wild mice unravels new facets of the evolutionary turnover of this coding minisatellite. PLoS One 2014; 9:e85021. [PMID: 24454780 PMCID: PMC3890296 DOI: 10.1371/journal.pone.0085021] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 10/26/2013] [Accepted: 11/20/2013] [Indexed: 12/23/2022] Open
Abstract
In humans and mice, meiotic recombination events cluster into narrow hotspots whose genomic positions are defined by the PRDM9 protein via its DNA binding domain, which consists of an array of zinc fingers (ZnFs). High polymorphism and rapid divergence of the Prdm9 gene ZnF domain appear to involve positive selection at DNA-recognition amino-acid positions, but the nature of the underlying evolutionary pressures remains a puzzle. Here we explore the variability of the Prdm9 ZnF array in wild mice, and uncover a high allelic diversity of both ZnF copy number and identity with the characterization of 113 alleles. We analyze features of the diversity of ZnF identity, which is mostly due to non-synonymous changes at codons -1, 3 and 6 of each ZnF, corresponding to amino acids involved in DNA binding. Using methods adapted to the minisatellite structure of the ZnF array, we infer a phylogenetic tree of these alleles. We find the sister species Mus spicilegus and M. macedonicus as well as the three house mouse (Mus musculus) subspecies to be polyphyletic. However, some sublineages have expanded independently in Mus musculus musculus and M. m. domesticus, the latter further showing phylogeographic substructure. Compared to random genomic regions and non-coding minisatellites, none of these patterns appears exceptional. In silico prediction of DNA binding sites for each allele, overlap of their alignments to the genome and relative coverage of the different families of interspersed repeated elements suggest a large diversity between PRDM9 variants with a potential for highly divergent distributions of recombination events in the genome with little correlation to evolutionary distance. By compiling PRDM9 ZnF protein sequences in Primates, Muridae and Equids, we find different diversity patterns among the three amino acids most critical for the DNA-recognition function, suggesting different diversification timescales.
Affiliation(s)
- Jérôme Buard
  - Institute of Human Genetics, UPR 1142, Centre National de la Recherche Scientifique, Montpellier, France
- Eric Rivals
  - Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, Université Montpellier 2, Centre National de la Recherche Scientifique, Montpellier, France
  - Institut de Biologie Computationnelle, Montpellier, France
- Denis Dunoyer de Segonzac
  - Institute of Human Genetics, UPR 1142, Centre National de la Recherche Scientifique, Montpellier, France
  - Institut des Sciences de l'Evolution Montpellier, Université Montpellier 2, Centre National de la Recherche Scientifique, Institut de Recherche pour le Développement, Montpellier, France
- Charlotte Garres
  - Institute of Human Genetics, UPR 1142, Centre National de la Recherche Scientifique, Montpellier, France
  - Institut des Sciences de l'Evolution Montpellier, Université Montpellier 2, Centre National de la Recherche Scientifique, Institut de Recherche pour le Développement, Montpellier, France
- Pierre Caminade
  - Institut des Sciences de l'Evolution Montpellier, Université Montpellier 2, Centre National de la Recherche Scientifique, Institut de Recherche pour le Développement, Montpellier, France
- Bernard de Massy
  - Institute of Human Genetics, UPR 1142, Centre National de la Recherche Scientifique, Montpellier, France
- Pierre Boursot
  - Institut des Sciences de l'Evolution Montpellier, Université Montpellier 2, Centre National de la Recherche Scientifique, Institut de Recherche pour le Développement, Montpellier, France
5
Pinhas T, Zakov S, Tsur D, Ziv-Ukelson M. Efficient edit distance with duplications and contractions. Algorithms Mol Biol 2013; 8:27. [PMID: 24168705 PMCID: PMC3879238 DOI: 10.1186/1748-7188-8-27] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/13/2012] [Accepted: 09/30/2013] [Indexed: 11/10/2022] Open
Abstract
We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities with respect to previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of $O(|\Sigma| \, MP(n))$ and $O(|\Sigma| n^2)$, respectively, where $|\Sigma|$ is the alphabet size, $n$ is the length of the strings, and $MP(n)$ is the time bound for the computation of min-plus matrix multiplication of two $n \times n$ matrices (currently, $MP(n) = O\left(\frac{n^3 \log^3 \log n}{\log^2 n}\right)$ due to an algorithm by Chan). For integer cost functions, the running time is further improved to $O\left(\frac{|\Sigma| n^3}{\log^2 n}\right)$. In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first $n$ given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval $D$. Choosing a constant $\frac{1}{\log_{|D|} n} < \lambda < 1$, the algorithm preprocesses an $n \times n$ matrix in $O\left(\frac{n^{2+\lambda}}{|D|}\right)$ time and $O\left(\frac{n^{2+\lambda}}{|D| \lambda^2 \log_{|D|}^2 n}\right)$ space. Then, it may multiply the matrix with any given $n$-length vector in $O\left(\frac{n^2}{\lambda^2 \log_{|D|}^2 n}\right)$ time.
Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest $O\left(\frac{n^3}{\log^2 n}\right)$ time algorithm for single-strand RNA folding with discrete cost functions. Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in $O\left(\frac{|\Sigma| \, n \, MP(\tilde{n})}{\tilde{n}}\right)$ time and $O(|\Sigma| n \tilde{n})$ space, where $\tilde{n}$ is the length of the run-length encoding of the strings.
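For readers unfamiliar with the primitive this abstract builds on, the min-plus product of two matrices replaces the usual sum of products with a minimum of sums. The sketch below is the naive cubic-time version, shown only to pin down the definition; it is not the accelerated algorithm of the paper, and the function name is an illustrative choice.

```python
from math import inf


def min_plus_product(a, b):
    """Return c with c[i][j] = min over k of a[i][k] + b[k][j]."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[inf] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            aik = a[i][k]
            for j in range(cols):
                # Keep the cheapest way to go from i to j through k.
                c[i][j] = min(c[i][j], aik + b[k][j])
    return c


a = [[0, 1],
     [2, 0]]
b = [[0, 3],
     [1, 0]]
print(min_plus_product(a, b))  # [[0, 1], [1, 0]]
```

With three nested loops over $n \times n$ inputs this takes $O(n^3)$ time, which is the baseline that the bounds quoted in the abstract improve on.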
6
Potter KM, Hipkins VD, Mahalovich MF, Means RE. Mitochondrial DNA haplotype distribution patterns in Pinus ponderosa (Pinaceae): range-wide evolutionary history and implications for conservation. AMERICAN JOURNAL OF BOTANY 2013; 100:1562-1579. [PMID: 23876453 DOI: 10.3732/ajb.1300039] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 06/02/2023]
Abstract
PREMISE OF THE STUDY: Ponderosa pine (Pinus ponderosa Douglas ex P. Lawson & C. Lawson) exhibits complicated patterns of morphological and genetic variation across its range in western North America. This study aims to clarify P. ponderosa evolutionary history and phylogeography using a highly polymorphic mitochondrial DNA marker, with results offering insights into how geographical and climatological processes drove the modern evolutionary structure of tree species in the region. METHODS: We amplified the mtDNA nad1 second intron minisatellite region for 3,100 trees representing 104 populations, and sequenced all length variants. We estimated population-level haplotypic diversity and determined diversity partitioning among varieties, races and populations. After aligning sequences of minisatellite repeat motifs, we evaluated evolutionary relationships among haplotypes. KEY RESULTS: The geographical structuring of the 10 haplotypes corresponded with division between Pacific and Rocky Mountain varieties. Pacific haplotypes clustered with high bootstrap support, and appear to have descended from Rocky Mountain haplotypes. A greater proportion of diversity was partitioned between Rocky Mountain races than between Pacific races. Areas of highest haplotypic diversity were the southern Sierra Nevada mountain range in California, northwestern California, and southern Nevada. CONCLUSIONS: Pinus ponderosa haplotype distribution patterns suggest a complex phylogeographic history not revealed by other genetic and morphological data, or by the sparse paleoecological record. The results appear consistent with long-term divergence between the Pacific and Rocky Mountain varieties, along with more recent divergences not well-associated with race. Pleistocene refugia may have existed in areas of high haplotypic diversity, as well as the Great Basin, Southwestern United States/northern Mexico, and the High Plains.
Affiliation(s)
- Kevin M Potter
  - Department of Forestry and Environmental Resources, North Carolina State University, Research Triangle Park, North Carolina 27709, USA.
7
Hickey G, Blanchette M. A probabilistic model for sequence alignment with context-sensitive indels. J Comput Biol 2011; 18:1449-64. [PMID: 21951055 DOI: 10.1089/cmb.2011.0157] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Probabilistic approaches for sequence alignment are usually based on pair Hidden Markov Models (HMMs) or Stochastic Context Free Grammars (SCFGs). Recent studies have shown a significant correlation between the content of short indels and their flanking regions, which by definition cannot be modelled by the above two approaches. In this work, we present a context-sensitive indel model based on a pair Tree-Adjoining Grammar (TAG), along with accompanying algorithms for efficient alignment and parameter estimation. The increased precision and statistical power of this model is shown on simulated and real genomic data. As the cost of sequencing plummets, the usefulness of comparative analysis is becoming limited by alignment accuracy rather than data availability. Our results will therefore have an impact on any type of downstream comparative genomics analyses that rely on alignments. Fine-grained studies of small functional regions or disease markers, for example, could be significantly improved by our method. The implementation is available at www.mcb.mcgill.ca/~blanchem/software.html.
Affiliation(s)
- Glenn Hickey
  - Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA.
8
Uricaru R, Mancheron A, Rivals E. Novel definition and algorithm for chaining fragments with proportional overlaps. J Comput Biol 2011; 18:1141-54. [PMID: 21899421 DOI: 10.1089/cmb.2011.0126] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
Chaining fragments is a crucial step in genome alignment. Existing chaining algorithms compute a maximum weighted chain with no overlaps allowed between adjacent fragments. In practice, using local alignments as fragments, instead of Maximal Exact Matches (MEMs), generates frequent overlaps between fragments, due to combinatorial reasons and biological factors, i.e., variable tandem repeat structures that differ in number of copies between genomic sequences. In this article, to lift this limitation, we formulate a novel definition of a chain that allows overlaps proportional to the fragment lengths, and present an efficient algorithm for computing such a maximum weighted chain. We tested our algorithm on a dataset composed of 694 genome pairs and observed significant improvements in terms of coverage, while keeping the running times within reasonable limits. Moreover, experiments with different ratios of allowed overlaps showed the robustness of the chains with respect to these ratios. Our algorithm is implemented in a tool called OverlapChainer (OC), which is available upon request from the authors.
Affiliation(s)
- Raluca Uricaru
  - Department of Computer Science, LIRMM, CNRS, Université de Montpellier 2, Montpellier, France
9
10
Abouelhoda MI, Giegerich R, Behzadi B, Steyaert JM. Alignment of minisatellite maps based on run-length encoding scheme. J Bioinform Comput Biol 2009; 7:287-308. [PMID: 19340916 DOI: 10.1142/s0219720009004060] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/01/2008] [Revised: 09/27/2008] [Accepted: 10/23/2008] [Indexed: 11/18/2022]
Abstract
Subsequent duplication events are responsible for the evolution of minisatellite maps. Alignment of two minisatellite maps should therefore take these duplication events into account, in addition to the well-known edit operations. All algorithms for computing an optimal alignment of two maps, including the one presented here, first deduce the costs of optimal duplication scenarios for all substrings of the given maps. Then, they incorporate the pre-computed costs in the alignment recurrence. However, all previous algorithms addressing this problem depend on the number of distinct map units (the map alphabet) and do not make full use of the repetitiveness of the map units. In this paper, we present an algorithm that remedies these shortcomings: our algorithm is alphabet-independent and is based on the run-length encoding scheme. It is the fastest in theory, and in practice as well, as shown by experimental results. Furthermore, our alignment model is more general than that of the previous algorithms, and better captures the duplication mechanism. Using our algorithm, we derive quantitative evidence that there is a directional bias in the growth of minisatellites of the MSY1 dataset.
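The run-length encoding mentioned in this abstract compresses each maximal run of identical map units into a (unit, count) pair, so that a highly repetitive map becomes much shorter than its raw length. A minimal sketch, assuming a map is given as a string or list of unit symbols (the function name is illustrative and not taken from the paper):

```python
from itertools import groupby


def run_length_encode(units):
    """Collapse each maximal run of equal map units into (unit, run_length)."""
    return [(unit, sum(1 for _ in run)) for unit, run in groupby(units)]


print(run_length_encode("aaabbc"))  # [('a', 3), ('b', 2), ('c', 1)]
```

Here a map of six units is represented by three runs; an algorithm whose cost depends on the number of runs rather than the number of units benefits accordingly.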
11
Bonhomme F, Rivals E, Orth A, Grant GR, Jeffreys AJ, Bois PRJ. Species-wide distribution of highly polymorphic minisatellite markers suggests past and present genetic exchanges among house mouse subspecies. Genome Biol 2007; 8:R80. [PMID: 17501990 PMCID: PMC1929145 DOI: 10.1186/gb-2007-8-5-r80] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 10/12/2006] [Revised: 01/22/2007] [Accepted: 05/14/2007] [Indexed: 11/17/2022] Open
Abstract
Global analysis of four minisatellite loci in the house mouse reveals unexpected long-range gene flow between populations and subspecies. Background: Four hypervariable minisatellite loci were scored on a panel of 116 individuals of various geographical origins representing a large part of the diversity present in house mouse subspecies. Internal structures of alleles were determined by minisatellite variant repeat mapping PCR to produce maps of intermingled patterns of variant repeats along the repeat array. To reconstruct the genealogy of these arrays of variable length, the specifically designed software MS_Align was used to estimate molecular divergences, graphically represented as neighbor-joining trees. Results: Given the high haplotypic diversity detected (mean He = 0.962), these minisatellite trees proved to be highly informative for tracing past and present genetic exchanges. Examples of identical or nearly identical alleles were found across subspecies and in geographically very distant locations, together with poor lineage sorting among subspecies except for the X-chromosome locus MMS30 in Mus musculus musculus. Given the high mutation rate of mouse minisatellite loci, this picture cannot be interpreted only with simple splitting events followed by retention of polymorphism, but implies recurrent gene flow between already differentiated entities.
Conclusion: This strongly suggests that, at least for the chromosomal regions under scrutiny, wild house mouse subspecies constitute a set of interrelated gene pools still connected through long-range gene flow or genetic exchanges occurring in the various contact zones that exist today or have existed in the past. Identifying genomic regions that do not follow this pattern will be a challenging task for pinpointing genes important for speciation.
Affiliation(s)
- François Bonhomme
  - Biologie Intégrative, ISEM CNRS Université de Montpellier 2 UMR 5554, Montpellier 34095, France
- Eric Rivals
  - LIRMM, CNRS Université de Montpellier 2 UMR 5506, rue Ada, Montpellier 34392 Cedex 5, France
- Annie Orth
  - Biologie Intégrative, ISEM CNRS Université de Montpellier 2 UMR 5554, Montpellier 34095, France
- Gemma R Grant
  - Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
- Alec J Jeffreys
  - Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
- Philippe RJ Bois
  - Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
  - The Scripps Research Institute, Department of Cancer Biology, Genome Plasticity Laboratory, Parkside Drive, Jupiter, Florida 33458, USA
12
Didier G, Guziolowski C. Mapping sequences by parts. Algorithms Mol Biol 2007; 2:11. [PMID: 17880695 PMCID: PMC2148040 DOI: 10.1186/1748-7188-2-11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2007] [Accepted: 09/19/2007] [Indexed: 12/04/2022] Open
Abstract
Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O (|s| × |t| × N) using O (|s| × |t| × N) memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with a reasonable accuracy. We test the functionality of the approach over random sequences on which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.
Affiliation(s)
- Gilles Didier
  - Institut de Mathématiques de Luminy, 163 avenue de Luminy, Case 907, 13288 Marseille Cedex 9, France
13
Sammeth M, Stoye J. Comparing tandem repeats with duplications and excisions of variable degree. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2006; 3:395-407. [PMID: 17085848 DOI: 10.1109/tcbb.2006.46] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 05/12/2023]
Abstract
Traditional sequence comparison by alignment employs a mutation model comprised of two events, substitutions and indels (insertions or deletions) of single positions. However, modern genetic analysis knows a variety of more complex mutation events (e.g., duplications, excisions, and rearrangements), especially regarding DNA. With ever more DNA sequence data becoming available, the need to accurately compare sequences which have clearly undergone more complicated types of mutational processes is becoming critical. Herein we introduce a new method for pairwise alignment and comparison of sequences with respect to the special evolution of tandem repeats: substitutions and indels of single positions and, additionally, duplications and excisions of variable degree (i.e., of one or more repeat copies simultaneously) are taken into account. To evaluate our method, we apply it to the spa VNTR (variable number of tandem repeats) cluster of Staphylococcus aureus, a bacterium of high medical importance.
14
Rivals E, Bruyère C, Toffano-Nioche C, Lecharny A. Formation of the Arabidopsis pentatricopeptide repeat family. PLANT PHYSIOLOGY 2006; 141:825-39. [PMID: 16825340 PMCID: PMC1489915 DOI: 10.1104/pp.106.077826] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Indexed: 05/10/2023]
Abstract
In Arabidopsis (Arabidopsis thaliana) the 466 pentatricopeptide repeat (PPR) proteins are putative RNA-binding proteins with essential roles in organelles. Roughly half of the PPR proteins form the plant combinatorial and modular protein (PCMP) subfamily, which is land-plant specific. PCMPs exhibit a large and variable tandem repeat of a standard pattern of three PPR variant motifs. The association or not of this repeat with three non-PPR motifs at their C terminus defines four distinct classes of PCMPs. The highly structured arrangement of these motifs and the similar repartition of these arrangements in the four classes suggest precise relationships between motif organization and substrate specificity. This study is an attempt to reconstruct an evolutionary scenario of the PCMP family. We developed an innovative approach based on comparisons of the proteins at two levels: namely the succession of motifs along the protein and the amino acid sequence of the motifs. It enabled us to infer evolutionary relationships between proteins as well as between the inter- and intraprotein repeats. First, we observed a polarized elongation of the repeat from the C terminus toward the N-terminal region, suggesting local recombinations of motifs. Second, the most N-terminal PPR triple motif proved to evolve under different constraints than the remaining repeat. Altogether, the evidence indicates different evolution for the PPR region and the C-terminal one in PCMPs, which points to distinct functions for these regions. Moreover, local sequence homogeneity observed across PCMP classes may be due to interclass shuffling of motifs, or to deletions/insertions of non-PPR motifs at the C terminus.
Affiliation(s)
- Eric Rivals
  - Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5506, Université de Montpellier II, 34392 Montpellier cedex 5, France
15
Alkan C, Eichler EE, Bailey JA, Sahinalp SC, Tüzün E. The Role of Unequal Crossover in Alpha-Satellite DNA Evolution: A Computational Analysis. J Comput Biol 2004; 11:933-44. [PMID: 15700410 DOI: 10.1089/cmb.2004.11.933] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022] Open
Abstract
Human DNA consists of a large number of tandem repeat sequences. Such sequences are usually called satellites, with the primary example being the centromeric alpha-satellite DNA. The basic repeat unit of the alpha-satellite DNA is a 171 bp monomer. Arbitrary monomer pairs usually have considerable sequence divergence (20-40%). However, with the exception of peripheral alpha-satellite DNA, monomers can be grouped into blocks of k-monomers (4 ≤ k ≤ 20) between which the divergence rate is much smaller (e.g., 5%). Perhaps the simplest and best understood mechanism for tandem repeat array evolution is unequal crossover. Although it is possible that alpha-satellite sequences developed as a result of subsequent unequal crossovers only, no formal computational framework seems to have been developed to verify this possibility. In this paper, we develop such a framework and report on experiments which imply that pericentromeric alpha-satellite segments (which are devoid of higher order structure) are evolutionarily distinct from the higher order repeat segments. It is likely that the higher order repeats developed independently in distinct regions of the genome and were carried into their current locations through an unknown mechanism of transposition.
Affiliation(s)
- Can Alkan
  - Department of EECS, Case Western Reserve University, Cleveland, OH 44106, USA