1
|
Wygoda E, Loewenthal G, Moshe A, Alburquerque M, Mayrose I, Pupko T. Statistical framework to determine indel-length distribution. Bioinformatics 2024; 40:btae043. [PMID: 38269647 PMCID: PMC10868340 DOI: 10.1093/bioinformatics/btae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 01/10/2024] [Accepted: 01/22/2024] [Indexed: 01/26/2024] Open
Abstract
MOTIVATION Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels were shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably for alignment software. Previous studies counted the number of continuous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased due to the assumption of alignment methods that indels lengths follow a geometric distribution. RESULTS We aimed to determine which indel-length distribution best characterizes alignments using statistical rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test if current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them. AVAILABILITY AND IMPLEMENTATION The data underlying this article are available in Github, at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.
Collapse
Affiliation(s)
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Michael Alburquerque
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| |
Collapse
|
2
|
Yao Y, Frith MC. Improved DNA-Versus-Protein Homology Search for Protein Fossils. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1691-1699. [PMID: 35617174 DOI: 10.1109/tcbb.2022.3177855] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Protein fossils, i.e., noncoding DNA descended from coding DNA, arise frequently from transposable elements (TEs), decayed genes, and viral integrations. They can reveal, and mislead about, evolutionary history and relationships. They have been detected by comparing DNA to protein sequences, but current methods are not optimized for this task. We describe a powerful DNA-protein homology search method. We use a 64×21 substitution matrix, which is fitted to sequence data, automatically learning the genetic code. We detect subtly homologous regions by considering alternative possible alignments between them, and calculate significance (probability of occurring by chance between random sequences). Our method detects TE protein fossils much more sensitively than blastx, and faster. Of the ∼ 7 major categories of eukaryotic TE, three were long thought absent in mammals: we find two of them in the human genome, polinton and DIRS/Ngaro. This method increases our power to find ancient fossils, and perhaps to detect non-standard genetic codes. The alternative-alignments and significance paradigm is not specific to DNA-protein comparison, and could benefit homology search generally. This is an extended version of a conference paper (Yao & Frith, 2021).
Collapse
|
3
|
Qi M, Stenson PD, Ball EV, Tainer JA, Bacolla A, Kehrer-Sawatzki H, Cooper DN, Zhao H. Distinct sequence features underlie microdeletions and gross deletions in the human genome. Hum Mutat 2021; 43:328-346. [PMID: 34918412 PMCID: PMC9069542 DOI: 10.1002/humu.24314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2021] [Revised: 11/02/2021] [Accepted: 12/14/2021] [Indexed: 11/18/2022]
Abstract
Microdeletions and gross deletions are important causes (~20%) of human inherited disease and their genomic locations are strongly influenced by the local DNA sequence environment. This notwithstanding, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1 to 28,394,429 bp. We analyzed the DNA sequence within 1 kb of the breakpoint junctions and found that the frequencies of non‐B DNA‐forming repeats, GC‐content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length for deletions of length ≤30 bp. Further, we found that the presence of DR, GQ, and STR repeats is important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp) while significantly (χ2, p < 2E−16) more microhomologies were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. Finally, we propose that a deletion length cut‐off of 25–30 bp may serve as an objective means to functionally distinguish microdeletions from gross deletions.
Collapse
Affiliation(s)
- Mengling Qi
- Department of Medical Research Center, Sun Yat-sen Memorial Hospital; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Guangzhou, China
| | - Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
| | - Edward V Ball
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
| | - John A Tainer
- Departments of Cancer Biology and of Molecular and Cellular Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Albino Bacolla
- Departments of Cancer Biology and of Molecular and Cellular Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | | | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
| | - Huiying Zhao
- Department of Medical Research Center, Sun Yat-sen Memorial Hospital; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Guangzhou, China
| |
Collapse
|
4
|
Loewenthal G, Rapoport D, Avram O, Moshe A, Wygoda E, Itzkovitch A, Israeli O, Azouri D, Cartwright RA, Mayrose I, Pupko T. A probabilistic model for indel evolution: differentiating insertions from deletions. Mol Biol Evol 2021; 38:5769-5781. [PMID: 34469521 PMCID: PMC8662616 DOI: 10.1093/molbev/msab266] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
Collapse
Affiliation(s)
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Dana Rapoport
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Alon Itzkovitch
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Omer Israeli
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Dana Azouri
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.,School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, Arizona, USA.,School of Life Sciences, Arizona State University, Tempe, Arizona, USA
| | - Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| |
Collapse
|
5
|
Ospina-Sarria JJ, Cabra-García J. Parsimony analysis of unaligned sequence data: some clarifications. Cladistics 2018; 34:574-577. [PMID: 34706480 DOI: 10.1111/cla.12229] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/10/2017] [Indexed: 11/29/2022] Open
Abstract
De Laet (2015) claimed that minimization of ad hoc hypotheses of homoplasy does not lead to a preference for trivial optimizations when analysing unaligned sequence data, as claimed by Wheeler (2012; see also Kluge and Grant, 2006). In addition, De Laet asserted that Kluge and Grant's (2006) parsimony rationale is internally inconsistent in terms of Baker's (2003) theoretical framework. We argue that De Laet used extraneous presuppositions to critique Wheeler's position and, as such, his criticism should be considered cautiously in terms of its scope. Finally, we demonstrate that considering Kluge and Grant's parsimony rationale as inconsistent rests on De Laet's misunderstanding of the ideographic character concept and the consequences of relating it to Baker's rationale.
Collapse
Affiliation(s)
- Jhon Jairo Ospina-Sarria
- Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP 05508-090, Brazil
| | - Jimmy Cabra-García
- Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP 05508-090, Brazil.,Departamento de Biología, Universidad del Valle, Cali, AA 25360, Colombia
| |
Collapse
|
6
|
Foster TM, Aranzana MJ. Attention sports fans! The far-reaching contributions of bud sport mutants to horticulture and plant biology. HORTICULTURE RESEARCH 2018; 5:44. [PMID: 30038785 PMCID: PMC6046048 DOI: 10.1038/s41438-018-0062-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/14/2018] [Accepted: 06/06/2018] [Indexed: 05/08/2023]
Abstract
A bud sport is a lateral shoot, inflorescence or single flower/fruit with a visibly different phenotype from the rest of the plant. The new phenotype is often caused by a stable somatic mutation in a single cell that is passed on to its clonal descendants and eventually populates part or all of a meristem. In many cases, a bud sport can be vegetatively propagated, thereby preserving the novel phenotype without sexual reproduction. Bud sports provide new characteristics while retaining the desirable qualities of the parent plant, which is why many bud sports have been developed into popular cultivars. We present an overview of the history of bud sports, the causes and methods of detecting somaclonal variation, and the types of mutant phenotypes that have arisen spontaneously. We focus on examples where the molecular or cytological changes causing the phenotype have been identified. Analysis of these sports has provided valuable insight into developmental processes, gene function and regulation, and in some cases has revealed new information about layer-specific roles of some genes. Examination of the molecular changes causing a phenotype and in some cases reversion back to the original state has contributed to our understanding of the mechanisms that drive genomic evolution.
Collapse
Affiliation(s)
- Toshi M. Foster
- The New Zealand Institute for Plant and Food Research Limited, Private Bag 11600, Palmerston North, 4474 New Zealand
| | - Maria José Aranzana
- IRTA (Institut de Recerca i Tecnologia Agroalimentàries), Barcelona, Spain
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
| |
Collapse
|
7
|
Zhai Y, Alexandre BC. A Poissonian Model of Indel Rate Variation for Phylogenetic Tree Inference. Syst Biol 2018; 66:698-714. [PMID: 28204784 DOI: 10.1093/sysbio/syx033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2015] [Accepted: 01/27/2017] [Indexed: 01/22/2023] Open
Abstract
While indel rate variation has been observed and analyzed in detail, it is not taken into account by current indel-aware phylogenetic reconstruction methods. In this work, we introduce a continuous time stochastic process, the geometric Poisson indel process, that generalizes the Poisson indel process by allowing insertion and deletion rates to vary across sites. We design an efficient algorithm for computing the probability of a given multiple sequence alignment based on our new indel model. We describe a method to construct phylogeny estimates from a fixed alignment using neighbor joining. Using simulation studies, we show that ignoring indel rate variation may have a detrimental effect on the accuracy of the inferred phylogenies, and that our proposed method can sidestep this issue by inferring latent indel rate categories. We also show that our phylogenetic inference method may be more stable to taxa subsampling than methods that either ignore indels or indel rate variation. [evolutionary stochastic process; indel rate variation; Poisson indel process; TKF91.].
Collapse
Affiliation(s)
- Yongliang Zhai
- Department of Statistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
| | - Bouchard-Côté Alexandre
- Department of Statistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
| |
Collapse
|
8
|
Massouh A, Schubert J, Yaneva-Roder L, Ulbricht-Jones ES, Zupok A, Johnson MTJ, Wright SI, Pellizzer T, Sobanski J, Bock R, Greiner S. Spontaneous Chloroplast Mutants Mostly Occur by Replication Slippage and Show a Biased Pattern in the Plastome of Oenothera. THE PLANT CELL 2016; 28:911-29. [PMID: 27053421 PMCID: PMC4863383 DOI: 10.1105/tpc.15.00879] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/14/2015] [Revised: 03/23/2016] [Accepted: 03/31/2016] [Indexed: 05/08/2023]
Abstract
Spontaneous plastome mutants have been used as a research tool since the beginning of genetics. However, technical restrictions have severely limited their contributions to research in physiology and molecular biology. Here, we used full plastome sequencing to systematically characterize a collection of 51 spontaneous chloroplast mutants in Oenothera (evening primrose). Most mutants carry only a single mutation. Unexpectedly, the vast majority of mutations do not represent single nucleotide polymorphisms but are insertions/deletions originating from DNA replication slippage events. Only very few mutations appear to be caused by imprecise double-strand break repair, nucleotide misincorporation during replication, or incorrect nucleotide excision repair following oxidative damage. U-turn inversions were not detected. Replication slippage is induced at repetitive sequences that can be very small and tend to have high A/T content. Interestingly, the mutations are not distributed randomly in the genome. The underrepresentation of mutations caused by faulty double-strand break repair might explain the high structural conservation of seed plant plastomes throughout evolution. In addition to providing a fully characterized mutant collection for future research on plastid genetics, gene expression, and photosynthesis, our work identified the spectrum of spontaneous mutations in plastids and reveals that this spectrum is very different from that in the nucleus.
Collapse
Affiliation(s)
- Amid Massouh
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, D-14476 Potsdam-Golm, Germany
| | - Julia Schubert
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, D-14476 Potsdam-Golm, Germany
| | - Liliya Yaneva-Roder
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, D-14476 Potsdam-Golm, Germany
| | | | - Arkadiusz Zupok
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, D-14476 Potsdam-Golm, Germany
| | - Marc T J Johnson
- Department of Biology, University of Toronto-Mississauga, Mississauga, Ontario L5L 1C6, Canada
| | - Stephen I Wright
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario M5S 3B2, Canada
| | - Tommaso Pellizzer
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, D-14476 Potsdam-Golm, Germany
| | - Johanna Sobanski
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, D-14476 Potsdam-Golm, Germany
| | - Ralph Bock
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, D-14476 Potsdam-Golm, Germany
| | - Stephan Greiner
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, D-14476 Potsdam-Golm, Germany
| |
Collapse
|
9
|
Low Genetic Quality Alters Key Dimensions of the Mutational Spectrum. PLoS Biol 2016; 14:e1002419. [PMID: 27015430 PMCID: PMC4807879 DOI: 10.1371/journal.pbio.1002419] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2015] [Accepted: 02/25/2016] [Indexed: 12/18/2022] Open
Abstract
Mutations affect individual health, population persistence, adaptation, diversification, and genome evolution. There is evidence that the mutation rate varies among genotypes, but the causes of this variation are poorly understood. Here, we link differences in genetic quality with variation in spontaneous mutation in a Drosophila mutation accumulation experiment. We find that chromosomes maintained in low-quality genetic backgrounds experience a higher rate of indel mutation and a lower rate of gene conversion in a manner consistent with condition-based differences in the mechanisms used to repair DNA double strand breaks. These aspects of the mutational spectrum were also associated with body mass, suggesting that the effect of genetic quality on DNA repair was mediated by overall condition, and providing a mechanistic explanation for the differences in mutational fitness decline among these genotypes. The rate and spectrum of substitutions was unaffected by genetic quality, but we find variation in the probability of substitutions and indels with respect to several aspects of local sequence context, particularly GC content, with implications for models of molecular evolution and genome scans for signs of selection. Our finding that the chances of mutation depend on genetic context and overall condition has important implications for how sequences evolve, the risk of extinction, and human health.
Collapse
|
10
|
Cheng J, Liao L, Zhou H, Gu C, Wang L, Han Y. A small indel mutation in an anthocyanin transporter causes variegated colouration of peach flowers. JOURNAL OF EXPERIMENTAL BOTANY 2015; 66:7227-39. [PMID: 26357885 PMCID: PMC4765791 DOI: 10.1093/jxb/erv419] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/19/2023]
Abstract
The ornamental peach cultivar 'Hongbaihuatao (HBH)' can simultaneously bear pink, red, and variegated flowers on a single tree. Anthocyanin content in pink flowers is extremely low, being only 10% that of a red flower. Surprisingly, the expression of anthocyanin structural and potential regulatory genes in white flowers was not significantly lower than that in both pink and red flowers. However, proteomic analysis revealed a GST encoded by a gene-regulator involved in anthocyanin transport (Riant)-which is expressed in the red flower, but almost undetectable in the variegated flower. The Riant gene contains an insertion-deletion (indel) polymorphism in exon 3. In white flowers, the Riant gene is interrupted by a 2-bp insertion in the last exon, which causes a frameshift and a premature stop codon. In contrast, both pink and red flowers that arise from bud sports are heterozygous for the Riant locus, with one functional allele due to the 2-bp deletion or a novel 1-bp insertion. Southern blot analysis indicated that the Riant gene occurs in a single copy in the peach genome and it is not interrupted by a transposon. The function of the Riant gene was confirmed by its ectopic expression in the Arabidopsis tt19 mutant, where it complements the anthocyanin phenotype, but not the proanthocyanidin pigmentation in seed coat. Collectively,these results indicate that a small indel mutation in the Riant gene, which is not the result of a transposon insertion or excision, causes variegated colouration of peach flowers.
Collapse
Affiliation(s)
- Jun Cheng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden of the Chinese Academy of Sciences, Wuhan, 430074, P.R. China Graduate University of Chinese Academy of Sciences, 19A Yuquanlu, Beijing, 100049, P.R. China
| | - Liao Liao
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden of the Chinese Academy of Sciences, Wuhan, 430074, P.R. China
| | - Hui Zhou
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden of the Chinese Academy of Sciences, Wuhan, 430074, P.R. China Graduate University of Chinese Academy of Sciences, 19A Yuquanlu, Beijing, 100049, P.R. China
| | - Chao Gu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden of the Chinese Academy of Sciences, Wuhan, 430074, P.R. China
| | - Lu Wang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden of the Chinese Academy of Sciences, Wuhan, 430074, P.R. China
| | - Yuepeng Han
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden of the Chinese Academy of Sciences, Wuhan, 430074, P.R. China
| |
Collapse
|
11
|
Xu Y, Liu B, Gröndahl-Yli-Hannuksila K, Tan Y, Feng L, Kallonen T, Wang L, Peng D, He Q, Wang L, Zhang S. Whole-genome sequencing reveals the effect of vaccination on the evolution of Bordetella pertussis. Sci Rep 2015; 5:12888. [PMID: 26283022 PMCID: PMC4539551 DOI: 10.1038/srep12888] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2015] [Accepted: 07/10/2015] [Indexed: 12/11/2022] Open
Abstract
Herd immunity can potentially induce a change of circulating viruses. However, it remains largely unknown that how bacterial pathogens adapt to vaccination. In this study, Bordetella pertussis, the causative agent of whooping cough, was selected as an example to explore possible effect of vaccination on the bacterial pathogen. We sequenced and analysed the complete genomes of 40 B. pertussis strains from Finland and China, as well as 11 previously sequenced strains from the Netherlands, where different vaccination strategies have been used over the past 50 years. The results showed that the molecular clock moved at different rates in these countries and in distinct periods, which suggested that evolution of the B. pertussis population was closely associated with the country vaccination coverage. Comparative whole-genome analyses indicated that evolution in this human-restricted pathogen was mainly characterised by ongoing genetic shift and gene loss. Furthermore, 116 SNPs were specifically detected in currently circulating ptxP3-containing strains. The finding might explain the successful emergence of this lineage and its spread worldwide. Collectively, our results suggest that the immune pressure of vaccination is one major driving force for the evolution of B. pertussis, which facilitates further exploration of the pathogenicity of B. pertussis.
Collapse
Affiliation(s)
- Yinghua Xu
- Key Laboratory of the Ministry of Health for Research on Quality and Standardization of Biotech Products, National Institutes of Food and Drug Control, Beijing 100050, P. R. China
| | - Bin Liu
- 1] TEDA School of Biological Sciences and Biotechnology, Nankai University, Tianjin 300457, P.R. China [2] Key Laboratory of Molecular Microbiology and Technology, Ministry of Education, 23 Hongda Street, Tianjin 300457, P. R. China
| | | | - Yajun Tan
- Key Laboratory of the Ministry of Health for Research on Quality and Standardization of Biotech Products, National Institutes of Food and Drug Control, Beijing 100050, P. R. China
| | - Lu Feng
- 1] TEDA School of Biological Sciences and Biotechnology, Nankai University, Tianjin 300457, P.R. China [2] Key Laboratory of Molecular Microbiology and Technology, Ministry of Education, 23 Hongda Street, Tianjin 300457, P. R. China
| | - Teemu Kallonen
- Department of Medical Microbiology and Immunology, Turku University, Turku 20520, Finland
| | - Lichan Wang
- Key Laboratory of the Ministry of Health for Research on Quality and Standardization of Biotech Products, National Institutes of Food and Drug Control, Beijing 100050, P. R. China
| | - Ding Peng
- 1] TEDA School of Biological Sciences and Biotechnology, Nankai University, Tianjin 300457, P.R. China [2] Key Laboratory of Molecular Microbiology and Technology, Ministry of Education, 23 Hongda Street, Tianjin 300457, P. R. China
| | - Qiushui He
- 1] Department of Medical Microbiology and Immunology, Turku University, Turku 20520, Finland [2] Department of Infectious Disease Surveillance and Control, National Institute for Health and Welfare, Turku 20520, Finland [3] Department of Medical Microbiology, Capital Medical University, Beijing 100069, P. R. China
| | - Lei Wang
- 1] TEDA School of Biological Sciences and Biotechnology, Nankai University, Tianjin 300457, P.R. China [2] Key Laboratory of Molecular Microbiology and Technology, Ministry of Education, 23 Hongda Street, Tianjin 300457, P. R. China [3] State Key Laboratory of Medicinal Chemical Biology, Nankai University 300457, Tianjin, P. R. China
| | - Shumin Zhang
- Key Laboratory of the Ministry of Health for Research on Quality and Standardization of Biotech Products, National Institutes of Food and Drug Control, Beijing 100050, P. R. China
| |
Collapse
|
12
|
Boschiero C, Gheyas AA, Ralph HK, Eory L, Paton B, Kuo R, Fulton J, Preisinger R, Kaiser P, Burt DW. Detection and characterization of small insertion and deletion genetic variants in modern layer chicken genomes. BMC Genomics 2015; 16:562. [PMID: 26227840 PMCID: PMC4563830 DOI: 10.1186/s12864-015-1711-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2014] [Accepted: 06/22/2015] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Small insertions and deletions (InDels) constitute the second most abundant class of genetic variants and have been found to be associated with many traits and diseases. The present study reports on the detection and characterisation of about 883 K high quality InDels from the whole-genome analysis of several modern layer chicken lines from diverse breeds. RESULTS To reduce the error rates seen in InDel detection, this study used the consensus set from two InDel-calling packages: SAMtools and Dindel, as well as stringent post-filtering criteria. By analysing sequence data from 163 chickens from 11 commercial and 5 experimental layer lines, this study detected about 883 K high quality consensus InDels with 93% validation rate and an average density of 0.78 InDels/kb over the genome. Certain chromosomes, viz, GGAZ, 16, 22 and 25 showed very low densities of InDels whereas the highest rate was observed on GGA6. In spite of the higher recombination rates on microchromosomes, the InDel density on these chromosomes was generally lower relative to macrochromosomes possibly due to their higher gene density. About 43-87% of the InDels were found to be fixed within each line. The majority of detected InDels (86%) were 1-5 bases and about 63% were non-repetitive in nature while the rest were tandem repeats of various motif types. Functional annotation identified 613 frameshift, 465 non-frameshift and 10 stop-gain/loss InDels. Apart from the frameshift and stopgain/loss InDels that are expected to affect the translation of protein sequences and their biological activity, 33% of the non-frameshift were predicted as evolutionary intolerant with potential impact on protein functions. Moreover, about 2.5% of the InDels coincided with the most-conserved elements previously mapped on the chicken genome and are likely to define functional elements. InDels potentially affecting protein function were found to be enriched for certain gene-classes e.g. those associated with cell proliferation, chromosome and Golgi organization, spermatogenesis, and muscle contraction. CONCLUSIONS The large catalogue of InDels presented in this study along with their associated information such as functional annotation, estimated allele frequency, etc. are expected to serve as a rich resource for application in future research and breeding in the chicken.
Collapse
Affiliation(s)
- Clarissa Boschiero
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK. .,Current Address: Departamento de Zootecnia, University of Sao Paulo/ESALQ, Piracicaba, SP, 13418-900, Brazil.
| | - Almas A Gheyas
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
| | - Hannah K Ralph
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
| | - Lel Eory
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
| | - Bob Paton
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
| | - Richard Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
| | | | | | - Pete Kaiser
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
| | - David W Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
| |
Collapse
|
13
|
Sun C, Mueller RL. Hellbender genome sequences shed light on genomic expansion at the base of crown salamanders. Genome Biol Evol 2015; 6:1818-29. [PMID: 25115007 PMCID: PMC4122941 DOI: 10.1093/gbe/evu143] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Among animals, genome sizes range from 20 Mb to 130 Gb, with 380-fold variation across vertebrates. Most of the largest vertebrate genomes are found in salamanders, an amphibian clade of 660 species. Thus, salamanders are an important system for studying causes and consequences of genomic gigantism. Previously, we showed that plethodontid salamander genomes accumulate higher levels of long terminal repeat (LTR) retrotransposons than do other vertebrates, although the evolutionary origins of such sequences remained unexplored. We also showed that some salamanders in the family Plethodontidae have relatively slow rates of DNA loss through small insertions and deletions. Here, we present new data from Cryptobranchus alleganiensis, the hellbender. Cryptobranchus and Plethodontidae span the basal phylogenetic split within salamanders; thus, analyses incorporating these taxa can shed light on the genome of the ancestral crown salamander lineage, which underwent expansion. We show that high levels of LTR retrotransposons likely characterize all crown salamanders, suggesting that disproportionate expansion of this transposable element (TE) class contributed to genomic expansion. Phylogenetic and age distribution analyses of salamander LTR retrotransposons indicate that salamanders' high TE levels reflect persistence and diversification of ancestral TEs rather than horizontal transfer events. Finally, we show that relatively slow DNA loss rates through small indels likely characterize all crown salamanders, suggesting that a decreased DNA loss rate contributed to genomic expansion at the clade's base. Our identification of shared genomic features across phylogenetically distant salamanders is a first step toward identifying the evolutionary processes underlying accumulation and persistence of high levels of repetitive sequence in salamander genomes.
Collapse
|
14
|
Abstract
Genome-wide association studies (GWAS) are designed to identify the portion of single-nucleotide polymorphisms (SNPs) in genome sequences associated with a complex trait. Strategies based on the gene list enrichment concept are currently applied for the functional analysis of GWAS, according to which a significant overrepresentation of candidate genes associated with a biological pathway is used as a proxy to infer overrepresentation of candidate SNPs in the pathway. Here we show that such inference is not always valid and introduce the program SNP2GO, which implements a new method to properly test for the overrepresentation of candidate SNPs in biological pathways.
Collapse
|
15
|
Huang S, Li J, Xu A, Huang G, You L. Small Insertions Are More Deleterious than Small Deletions in Human Genomes. Hum Mutat 2013; 34:1642-9. [DOI: 10.1002/humu.22435] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2013] [Accepted: 08/22/2013] [Indexed: 11/09/2022]
Affiliation(s)
- Shengfeng Huang
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
| | - Jie Li
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
| | - Anlong Xu
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Beijing University of Chinese Medicine, Chao-yang District; Beijing 100029 People's Republic of China
| | - Guangrui Huang
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
| | - Leiming You
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
| |
Collapse
|
16
|
Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013; 31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.
Collapse
Affiliation(s)
- Erika M Kvikstad
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université Lyon 1, Villeurbanne, France
| | | |
Collapse
|
17
|
Characterization of bud emergence 46 (BEM46) protein: sequence, structural, phylogenetic and subcellular localization analyses. Biochem Biophys Res Commun 2013; 438:526-32. [PMID: 23916612 DOI: 10.1016/j.bbrc.2013.07.103] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2013] [Accepted: 07/25/2013] [Indexed: 02/04/2023]
Abstract
The bud emergence 46 (BEM46) protein from Neurospora crassa belongs to the α/β-hydrolase superfamily. Recently, we have reported that the BEM46 protein is localized in the perinuclear ER and also forms spots close by the plasma membrane. The protein appears to be required for cell type-specific polarity formation in N. crassa. Furthermore, initial studies suggested that the BEM46 amino acid sequence is conserved in eukaryotes and is considered to be one of the widespread conserved "known unknown" eukaryotic genes. This warrants for a comprehensive phylogenetic analysis of this superfamily to unravel origin and molecular evolution of these genes in different eukaryotes. Herein, we observe that all eukaryotes have at least a single copy of a bem46 ortholog. Upon scanning of these proteins in various genomes, we find that there are expansions leading into several paralogs in vertebrates. Usingcomparative genomic analyses, we identified insertion/deletions (indels) in the conserved domain of BEM46 protein, which allow to differentiate fungal classes such as ascomycetes from basidiomycetes. We also find that exonic indels are able to differentiate BEM46 homologs of different eukaryotic lineage. Furthermore, we unravel that BEM46 protein from N. crassa possess a novel endoplasmic-retention signal (PEKK) using GFP-fusion tagging experiments. We propose that three residues namely a serine 188S, a histidine 292H and an aspartic acid 262D are most critical residues, forming a catalytic triad in BEM46 protein from N. crassa. We carried out a comprehensive study on bem46 genes from a molecular evolution perspective with combination of functional analyses. The evolutionary history of BEM46 proteins is characterized by exonic indels in lineage specific manner.
Collapse
|
18
|
Sun C, López Arriaza JR, Mueller RL. Slow DNA loss in the gigantic genomes of salamanders. Genome Biol Evol 2013; 4:1340-8. [PMID: 23175715 PMCID: PMC3542557 DOI: 10.1093/gbe/evs103] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Evolutionary changes in genome size result from the combined effects of mutation, natural
selection, and genetic drift. Insertion and deletion mutations (indels) directly impact
genome size by adding or removing sequences. Most species lose more DNA through small
indels (i.e., ∼1–30 bp) than they gain, which can result in genome reduction
over time. Because this rate of DNA loss varies across species, small indel dynamics have
been suggested to contribute to genome size evolution. Species with extremely large
genomes provide interesting test cases for exploring the link between small indels and
genome size; however, most large genomes remain relatively unexplored. Here, we examine
rates of DNA loss in the tetrapods with the largest genomes—the salamanders. We used
low-coverage genomic shotgun sequence data from four salamander species to examine
patterns of insertion, deletion, and substitution in neutrally evolving non-long terminal
repeat (LTR) retrotransposon sequences. For comparison, we estimated genome-wide DNA loss
rates in non-LTR retrotransposon sequences from five other vertebrate genomes:
Anolis carolinensis, Danio rerio, Gallus
gallus, Homo sapiens, and Xenopus tropicalis.
Our results show that salamanders have significantly lower rates of DNA loss than do other
vertebrates. More specifically, salamanders experience lower numbers of deletions relative
to insertions, and both deletions and insertions are skewed toward smaller sizes. On the
basis of these patterns, we conclude that slow DNA loss contributes to genomic gigantism
in salamanders. We also identify candidate molecular mechanisms underlying these
differences and suggest that natural variation in indel dynamics provides a unique
opportunity to study the basis of genome stability.
Collapse
Affiliation(s)
- Cheng Sun
- Department of Biology, Colorado State University, CO, USA
| | | | | |
Collapse
|
19
|
Zhao H, Yang Y, Lin H, Zhang X, Mort M, Cooper DN, Liu Y, Zhou Y. DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels. Genome Biol 2013; 14:R23. [PMID: 23497682 PMCID: PMC4053752 DOI: 10.1186/gb-2013-14-3-r23] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2012] [Accepted: 03/13/2013] [Indexed: 02/07/2023] Open
Abstract
Micro-indels (insertions or deletions shorter than 21 bps) constitute the second most frequent class of human gene mutation after single nucleotide variants. Despite the relative abundance of non-frameshifting indels, their damaging effect on protein structure and function has gone largely unstudied. We have developed a support vector machine-based method named DDIG-in (Detecting disease-causing genetic variations due to indels) to prioritize non-frameshifting indels by comparing disease-associated mutations with putatively neutral mutations from the 1,000 Genomes Project. The final model gives good discrimination for indels and is robust against annotation errors. A webserver implementing DDIG-in is available at http://sparks-lab.org/ddig.
Collapse
|
20
|
Massouras A, Waszak SM, Albarca-Aguilera M, Hens K, Holcombe W, Ayroles JF, Dermitzakis ET, Stone EA, Jensen JD, Mackay TFC, Deplancke B. Genomic variation and its impact on gene expression in Drosophila melanogaster. PLoS Genet 2012. [PMID: 23189034 PMCID: PMC3499359 DOI: 10.1371/journal.pgen.1003055] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
Understanding the relationship between genetic and phenotypic variation is one of the great outstanding challenges in biology. To meet this challenge, comprehensive genomic variation maps of human as well as of model organism populations are required. Here, we present a nucleotide resolution catalog of single-nucleotide, multi-nucleotide, and structural variants in 39 Drosophila melanogaster Genetic Reference Panel inbred lines. Using an integrative, local assembly-based approach for variant discovery, we identify more than 3.6 million distinct variants, among which were more than 800,000 unique insertions, deletions (indels), and complex variants (1 to 6,000 bp). While the SNP density is higher near other variants, we find that variants themselves are not mutagenic, nor are regions with high variant density particularly mutation-prone. Rather, our data suggest that the elevated SNP density around variants is mainly due to population-level processes. We also provide insights into the regulatory architecture of gene expression variation in adult flies by mapping cis-expression quantitative trait loci (cis-eQTLs) for more than 2,000 genes. Indels comprise around 10% of all cis-eQTLs and show larger effects than SNP cis-eQTLs. In addition, we identified two-fold more gene associations in males as compared to females and found that most cis-eQTLs are sex-specific, revealing a partial decoupling of the genomic architecture between the sexes as well as the importance of genetic factors in mediating sex-biased gene expression. Finally, we performed RNA-seq-based allelic expression imbalance analyses in the offspring of crosses between sequenced lines, which revealed that the majority of strong cis-eQTLs can be validated in heterozygous individuals. One of the principal challenges in current biology is to understand the relationship between genetic and phenotypic variation. The increasing availability of genomic variation maps of human as well as of model organism populations (mouse and Arabidopsis) constitutes an important step towards meeting this challenge. However, despite its excellent track record as a premier model to understand genome function, no genome-wide variation data beyond single-nucleotide variants and microsatellites are currently available for D. melanogaster. Here, we present a comprehensive, nucleotide-resolution catalogue of variants of various types (single-nucleotide, multi-nucleotide, and structural variants) for 39 wild-derived inbred D. melanogaster lines based on high-throughput sequencing. This catalogue confirms that non–SNP variants account for more than half of genomic variation, allowing us to provide new insights into the non-random distribution of variants in the Drosophila genome. We further present genome-wide cis-associations with gene expression based on whole adult fly microarray data, revealing significant associations for about 2,000 genes. Most associations are sex-specific, providing evidence for a decoupling of the genomic, regulatory architecture between males and females.
Collapse
Affiliation(s)
- Andreas Massouras
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Sebastian M. Waszak
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Monica Albarca-Aguilera
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Korneel Hens
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Wiebke Holcombe
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Julien F. Ayroles
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Emmanouil T. Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
| | - Eric A. Stone
- Department of Genetics, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Jeffrey D. Jensen
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Trudy F. C. Mackay
- Department of Genetics, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- * E-mail:
| |
Collapse
|
21
|
Huang S, Yu T, Chen Z, Yuan S, Chen S, Xu A. More single-nucleotide mutations surround small insertions than small deletions in primates. Hum Mutat 2012; 33:1099-106. [PMID: 22461281 DOI: 10.1002/humu.22085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2011] [Accepted: 03/06/2012] [Indexed: 01/26/2023]
Abstract
Early studies have shown that single-nucleotide mutation rates increase close to insertions and deletions, but it is not fully understood how natural selection shapes genome-wide patterns of indels and their nearby single-nucleotide mutations. In this study, we find that, in primates, more single-nucleotide mutations surround small insertions than small deletions. This pattern affects <150 base pair (bp) sequences close to indels and persists under different genomic properties, such as exon/intron/intergenic contexts, repeated/nonrepeated sequences, replication timing, recombination rates, indel density, and guanine-cytosine (GC) content. We propose two different, but not mutually exclusive, hypothetical mechanisms to explain the pattern. One mechanism is that the sequence context preferring insertion formation may also favor nucleotide substitutions. Another mechanism is related to a hypothesis in which indel heterozygosity tends to increase nearby nucleotide substitution rates. It means that if insertions spend more time in heterozygotes, insertions may accumulate more surrounding single-nucleotide changes. In conclusion, we characterize a special genome-wide evolutionary pattern for indels and nearby single-nucleotide changes. This pattern may be driven by natural selection and bias primates' genome evolution and phenotypic variations.
Collapse
Affiliation(s)
- Shengfeng Huang
- Guangdong Key Laboratory of Pharmaceutical Functional Genes, College of Life Sciences, Sun Yat-Sen University, 135 XinGangXi Road,Guangzhou, People's Republic of China
| | | | | | | | | | | |
Collapse
|
22
|
Chachick R, Tanay A. Inferring divergence of context-dependent substitution rates in Drosophila genomes with applications to comparative genomics. Mol Biol Evol 2012; 29:1769-80. [PMID: 22319143 DOI: 10.1093/molbev/mss056] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open
Abstract
Nucleotide substitution is a major evolutionary driving force that can incrementally and stochastically give rise to broad divergence patterns among species. The substitution process at each genomic position is frequently modeled independently of the other positions, although complex interactions between nearby bases are known to significantly affect mutation rates. Here, we study the evolution of 12 fly genomes using new algorithms for accurate inference of parameter-rich substitution models. By comparing models between lineages, we reveal the evolutionary histories of substitution rates at different flanking nucleotide contexts. We demonstrate these driving forces of molecular evolution to be constantly changing, suggesting that neutral drift of mutation rates is an important factor in the evolution of genomes and their sequence composition. This observation is used to develop a scalable approach for parameter-rich comparative genomics. By screening short DNA sequences, we demonstrate how homeoboxes and other transcription factor binding motifs are highly conserved based on our parameter-rich models but not according to standard conservation assays. With the increasing availability of genome sequences, rich substitution models become an attractive and practical approach for evolutionary analysis in general and comparative genomics in particular.
Collapse
Affiliation(s)
- Ran Chachick
- Department of Computer Science and Applied Mathematics, Weizmann Institute, Rehovot, Israel
| | | |
Collapse
|
23
|
Nourmohammad A, Lässig M. Formation of regulatory modules by local sequence duplication. PLoS Comput Biol 2011; 7:e1002167. [PMID: 21998564 PMCID: PMC3188502 DOI: 10.1371/journal.pcbi.1002167] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2011] [Accepted: 06/30/2011] [Indexed: 11/24/2022] Open
Abstract
Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms. Since Jacob and Monod stressed the importance of gene regulation in evolution, our understanding of the mechanisms of regulation has substantially advanced. In higher eukaryotes, genes often have complex regulatory input, which is encoded in cis-regulatory sequence with multiple transcription factor binding sites. However, the modes of genome evolution generating regulatory complexity are much less understood. This study reports a surprising finding: in fly regulatory modules, the majority of transcription factor binding sites show evidence of a local sequence duplication in their evolutionary history, which relates their sequence information to that of neighboring binding sites. Our analysis suggests that local sequence duplications are a pervasive production mode of regulatory information. This mode appears to be specific to higher eukaryotes; we have not found evidence of frequent local duplications in the yeast genome. Our results affect genomic sequence analysis, in particular, computational identification of cis-regulatory elements and alignment of regulatory DNA. At the same time, they address fundamental questions on the evolution of regulation: How much of the regulatory “grammar” observed in higher eukaryotes is due to optimization of function, and how much reflects the underlying sequence evolution modes? What is the result and what is the substrate of natural selection?
Collapse
Affiliation(s)
| | - Michael Lässig
- Institute for Theoretical Physics, University of Cologne, Köln, Germany
- * E-mail:
| |
Collapse
|
24
|
Cooper DN, Bacolla A, Férec C, Vasquez KM, Kehrer-Sawatzki H, Chen JM. On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Hum Mutat 2011; 32:1075-99. [PMID: 21853507 PMCID: PMC3177966 DOI: 10.1002/humu.21557] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2011] [Accepted: 06/17/2011] [Indexed: 12/21/2022]
Abstract
Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher order features of the genomic architecture. The human genome is now recognized to contain "pervasive architectural flaws" in that certain DNA sequences are inherently mutation prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here, we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of noncanonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair and may serve to increase mutation frequencies in generalized fashion (i.e., both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease.
Collapse
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom.
| | | | | | | | | | | |
Collapse
|
25
|
Hickey G, Blanchette M. A probabilistic model for sequence alignment with context-sensitive indels. J Comput Biol 2011; 18:1449-64. [PMID: 21951055 DOI: 10.1089/cmb.2011.0157] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Probabilistic approaches for sequence alignment are usually based on pair Hidden Markov Models (HMMs) or Stochastic Context Free Grammars (SCFGs). Recent studies have shown a significant correlation between the content of short indels and their flanking regions, which by definition cannot be modelled by the above two approaches. In this work, we present a context-sensitive indel model based on a pair Tree-Adjoining Grammar (TAG), along with accompanying algorithms for efficient alignment and parameter estimation. The increased precision and statistical power of this model is shown on simulated and real genomic data. As the cost of sequencing plummets, the usefulness of comparative analysis is becoming limited by alignment accuracy rather than data availability. Our results will therefore have an impact on any type of downstream comparative genomics analyses that rely on alignments. Fine-grained studies of small functional regions or disease markers, for example, could be significantly improved by our method. The implementation is available at www.mcb.mcgill.ca/~blanchem/software.html.
Collapse
Affiliation(s)
- Glenn Hickey
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA.
| | | |
Collapse
|
26
|
Hellsten U, Aspden JL, Rio DC, Rokhsar DS. A segmental genomic duplication generates a functional intron. Nat Commun 2011; 2:454. [PMID: 21878908 PMCID: PMC3265369 DOI: 10.1038/ncomms1461] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2011] [Accepted: 07/28/2011] [Indexed: 11/18/2022] Open
Abstract
An intron is an extended genomic feature whose function requires multiple constrained positions—donor and acceptor splice sites, a branch point, a polypyrimidine tract and suitable splicing enhancers—that may be distributed over hundreds or thousands of nucleotides. New introns are therefore unlikely to emerge by incremental accumulation of functional sub-elements. Here we demonstrate that a functional intron can be created de novo in a single step by a segmental genomic duplication. This experiment recapitulates in vivo the birth of an intron that arose in the ancestral jawed vertebrate lineage nearly half-a-billion years ago. The appearance of a new intron that splits an exon without disrupting the corresponding peptide sequence is a rare event in vertebrate genomes. Hellsten et al. demonstrate that, under certain circumstances, a functional intron can be produced in a single step by segmental genomic duplication.
Collapse
Affiliation(s)
- Uffe Hellsten
- DOE Joint Genome Institute, Walnut Creek, California 94598, USA.
| | | | | | | |
Collapse
|
27
|
Wang S, Wang Y, Xie Y, Xiao G. A novel approach to DNA copy number data segmentation. J Bioinform Comput Biol 2011; 9:131-48. [PMID: 21328710 DOI: 10.1142/s0219720011005343] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2010] [Revised: 11/02/2010] [Accepted: 11/04/2010] [Indexed: 11/18/2022]
Abstract
DNA copy number (DCN) is the number of copies of DNA at a region of a genome. The alterations of DCN are highly associated with the development of different tumors. Recently, microarray technologies are being employed to detect DCN changes at many loci at the same time in tumor samples. The resulting DCN data are often very noisy, and the tumor sample is often contaminated by normal cells. The goal of computational analysis of array-based DCN data is to infer the underlying DCNs from raw DCN data. Previous methods for this task do not model the tumor/normal cell mixture ratio explicitly and they cannot output segments with DCN annotations. We developed a novel model-based method using the minimum description length (MDL) principle for DCN data segmentation. Our new method can output underlying DCN for each chromosomal segment, and at the same time, infer the underlying tumor proportion in the test samples. Empirical results show that our method achieves better accuracies on average as compared to three previous methods, namely Circular Binary Segmentation, Hidden Markov Model and Ultrasome.
Collapse
Affiliation(s)
- Siling Wang
- Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75205, USA.
| | | | | | | |
Collapse
|
28
|
Mehta P, Schwab DJ, Sengupta AM. Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models. JOURNAL OF STATISTICAL PHYSICS 2011; 142:1187-1205. [PMID: 22851788 PMCID: PMC3407691 DOI: 10.1007/s10955-010-0102-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]
Abstract
Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
Collapse
Affiliation(s)
- Pankaj Mehta
- Dept. of Physics, Boston University, Boston, MA, USA
| | - David J. Schwab
- Dept. of Molecular Biology and Lewis-Sigler Institute, Princeton University, Princeton, NJ, USA
| | | |
Collapse
|
29
|
Tolstorukov MY, Volfovsky N, Stephens RM, Park PJ. Impact of chromatin structure on sequence variability in the human genome. Nat Struct Mol Biol 2011; 18:510-5. [PMID: 21399641 DOI: 10.1038/nsmb.2012] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2009] [Accepted: 12/10/2010] [Indexed: 02/02/2023]
Abstract
DNA sequence variations in individual genomes give rise to different phenotypes within the same species. One mechanism in this process is the alteration of chromatin structure due to sequence variation that influences gene regulation. We composed a high-confidence collection of human single-nucleotide polymorphisms and indels based on analysis of publicly available sequencing data and investigated whether the DNA loci associated with stable nucleosome positions are protected against mutations. We addressed how the sequence variation reflects the occupancy profiles of nucleosomes bearing different epigenetic modifications on genome scale. We found that indels are depleted around nucleosome positions of all considered types, whereas single-nucleotide polymorphisms are enriched around the positions of bulk nucleosomes but depleted around the positions of epigenetically modified nucleosomes. These findings indicate an increased level of conservation for the sequences associated with epigenetically modified nucleosomes, highlighting complex organization of the human chromatin.
Collapse
Affiliation(s)
- Michael Y Tolstorukov
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | | | | | | |
Collapse
|
30
|
Callahan B, Neher RA, Bachtrog D, Andolfatto P, Shraiman BI. Correlated evolution of nearby residues in Drosophilid proteins. PLoS Genet 2011; 7:e1001315. [PMID: 21383965 PMCID: PMC3044683 DOI: 10.1371/journal.pgen.1001315] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2010] [Accepted: 01/19/2011] [Indexed: 11/19/2022] Open
Abstract
Here we investigate the correlations between coding sequence substitutions as a function of their separation along the protein sequence. We consider both substitutions between the reference genomes of several Drosophilids as well as polymorphisms in a population sample of Zimbabwean Drosophila melanogaster. We find that amino acid substitutions are “clustered” along the protein sequence, that is, the frequency of additional substitutions is strongly enhanced within ≈10 residues of a first such substitution. No such clustering is observed for synonymous substitutions, supporting a “correlation length” associated with selection on proteins as the causative mechanism. Clustering is stronger between substitutions that arose in the same lineage than it is between substitutions that arose in different lineages. We consider several possible origins of clustering, concluding that epistasis (interactions between amino acids within a protein that affect function) and positional heterogeneity in the strength of purifying selection are primarily responsible. The role of epistasis is directly supported by the tendency of nearby substitutions that arose on the same lineage to preserve the total charge of the residues within the correlation length and by the preferential cosegregation of neighboring derived alleles in our population sample. We interpret the observed length scale of clustering as a statistical reflection of the functional locality (or modularity) of proteins: amino acids that are near each other on the protein backbone are more likely to contribute to, and collaborate toward, a common subfunction. Genes are templates for proteins, yet evolutionary studies of genes and proteins often bear little resemblance. Analyses of gene evolution typically treat each codon independently, quantifying gene evolution by summing over the constituent codons. In contrast, studies of protein evolution generally incorporate protein structure and interactions between amino acids explicitly. We investigate correlations in the evolution of codons as a function of their distance from each other along the protein coding sequence. This approach is motivated by the expectation that codons near each other in sequence often encode amino acids belonging to the same functional unit. Consequently, these amino acids are more likely to interact and/or experience similar selective regimes, introducing correlation between the evolution of the underlying codons. We find codon evolution in Drosophilids to be correlated over a characteristic length scale of ≈10 codons. Specifically, the presence of a non-synonymous substitution substantially increases the probability of further such substitutions nearby, particularly within that lineage. Further analysis suggests both functional interactions between amino acids and correlation in the strength of selection contribute to this effect. These findings are relevant for understanding the relative importance of different modes of selection, and particularly the role of epistasis, in gene and protein evolution.
Collapse
Affiliation(s)
- Benjamin Callahan
- Department of Applied Physics, Stanford University, Stanford, California, United States of America.
| | | | | | | | | |
Collapse
|
31
|
Nonsense-mediated decay enables intron gain in Drosophila. PLoS Genet 2010; 6:e1000819. [PMID: 20107520 PMCID: PMC2809761 DOI: 10.1371/journal.pgen.1000819] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2009] [Accepted: 12/18/2009] [Indexed: 12/03/2022] Open
Abstract
Intron number varies considerably among genomes, but despite their fundamental importance, the mutational mechanisms and evolutionary processes underlying the expansion of intron number remain unknown. Here we show that Drosophila, in contrast to most eukaryotic lineages, is still undergoing a dramatic rate of intron gain. These novel introns carry significantly weaker splice sites that may impede their identification by the spliceosome. Novel introns are more likely to encode a premature termination codon (PTC), indicating that nonsense-mediated decay (NMD) functions as a backup for weak splicing of new introns. Our data suggest that new introns originate when genomic insertions with weak splice sites are hidden from selection by NMD. This mechanism reduces the sequence requirement imposed on novel introns and implies that the capacity of the spliceosome to recognize weak splice sites was a prerequisite for intron gain during eukaryotic evolution. The surprising observation 30 years ago that genes are interrupted by non-coding introns changed our view of gene architecture. Intron number varies dramatically among species; ranging from nine introns/gene in humans to less than one in some simple eukyarotes. Here we ask where new introns come from and how they are maintained in a population. We find that novel introns do not arise from pre-existing introns, although the mechanisms that generate novel introns remain unclear. We also show that novel introns carry only weak signals for their identification and removal, and therefore depend on nonsense-mediated decay (NMD). NMD maintains RNA quality control by degrading transcripts that have not been spliced properly. We propose that NMD shelters novel introns from natural selection. This increases the likelihood that a novel intron will rise in frequency and be maintained within a population, thus increasing the rate of intron gain.
Collapse
|
32
|
Lusk RW, Eisen MB. Evolutionary mirages: selection on binding site composition creates the illusion of conserved grammars in Drosophila enhancers. PLoS Genet 2010; 6:e1000829. [PMID: 20107516 PMCID: PMC2809757 DOI: 10.1371/journal.pgen.1000829] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2009] [Accepted: 12/22/2009] [Indexed: 01/05/2023] Open
Abstract
The clustering of transcription factor binding sites in developmental enhancers and the apparent preferential conservation of clustered sites have been widely interpreted as proof that spatially constrained physical interactions between transcription factors are required for regulatory function. However, we show here that selection on the composition of enhancers alone, and not their internal structure, leads to the accumulation of clustered sites with evolutionary dynamics that suggest they are preferentially conserved. We simulated the evolution of idealized enhancers from Drosophila melanogaster constrained to contain only a minimum number of binding sites for one or more factors. Under this constraint, mutations that destroy an existing binding site are tolerated only if a compensating site has emerged elsewhere in the enhancer. Overlapping sites, such as those frequently observed for the activator Bicoid and repressor Krüppel, had significantly longer evolutionary half-lives than isolated sites for the same factors. This leads to a substantially higher density of overlapping sites than expected by chance and the appearance that such sites are preferentially conserved. Because D. melanogaster (like many other species) has a bias for deletions over insertions, sites tended to become closer together over time, leading to an overall clustering of sites in the absence of any selection for clustered sites. Since this effect is strongest for the oldest sites, clustered sites also incorrectly appear to be preferentially conserved. Following speciation, sites tend to be closer together in all descendent species than in their common ancestors, violating the common assumption that shared features of species' genomes reflect their ancestral state. Finally, we show that selection on binding site composition alone recapitulates the observed number of overlapping and closely neighboring sites in real D. melanogaster enhancers. Thus, this study calls into question the common practice of inferring "cis-regulatory grammars" from the organization and evolutionary dynamics of developmental enhancers.
Collapse
Affiliation(s)
- Richard W. Lusk
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
| | - Michael B. Eisen
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Genomics Division, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- * E-mail:
| |
Collapse
|
33
|
Sjödin P, Bataillon T, Schierup MH. Insertion and deletion processes in recent human history. PLoS One 2010; 5:e8650. [PMID: 20098729 PMCID: PMC2808225 DOI: 10.1371/journal.pone.0008650] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2009] [Accepted: 12/14/2009] [Indexed: 11/25/2022] Open
Abstract
Background Although insertions and deletions (indels) account for a sizable portion of genetic changes within and among species, they have received little attention because they are difficult to type, are alignment dependent and their underlying mutational process is poorly understood. A fundamental question in this respect is whether insertions and deletions are governed by similar or different processes and, if so, what these differences are. Methodology/Principal Findings We use published resequencing data from Seattle SNPs and NIEHS human polymorphism databases to construct a genomewide data set of short polymorphic insertions and deletions in the human genome (n = 6228). We contrast these patterns of polymorphism with insertions and deletions fixed in the same regions since the divergence of human and chimpanzee (n = 10546). The macaque genome is used to resolve all indels into insertions and deletions. We find that the ratio of deletions to insertions is greater within humans than between human and chimpanzee. Deletions segregate at lower frequency in humans, providing evidence for deletions being under stronger purifying selection than insertions. The insertion and deletion rates correlate with several genomic features and we find evidence that both insertions and deletions are associated with point mutations. Finally, we find no evidence for a direct effect of the local recombination rate on the insertion and deletion rate. Conclusions/Significance Our data strongly suggest that deletions are more deleterious than insertions but that insertions and deletions are otherwise generally governed by the same genomic factors.
Collapse
Affiliation(s)
- Per Sjödin
- Bioinformatics Research Center, C. F. Møllers Alle, Arhus, Denmark.
| | | | | |
Collapse
|
34
|
Kvikstad EM, Chiaromonte F, Makova KD. Ride the wavelet: A multiscale analysis of genomic contexts flanking small insertions and deletions. Genome Res 2009; 19:1153-64. [PMID: 19502380 DOI: 10.1101/gr.088922.108] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Recent studies have revealed that insertions and deletions (indels) are more different in their formation than previously assumed. What remains enigmatic is how the local DNA sequence context contributes to these differences. To investigate the relative impact of various molecular mechanisms to indel formation, we analyzed sequence contexts of indels in the non protein- or RNA-coding, nonrepetitive (NCNR) portion of the human genome. We considered small (<or=30-bp) indels occurring in the human lineage since its divergence from chimpanzee and used wavelet techniques to study, simultaneously for multiple scales, the spatial patterns of short sequence motifs associated with indel mutagenesis. In particular, we focused on motifs associated with DNA polymerase activity, topoisomerase cleavage, double-strand breaks (DSBs), and their repair. We came to the following conclusions. First, many motifs are characterized by unique enrichment profiles in the vicinity of indels vs. indel-free portions of the genome, verifying the importance of sequence context in indel mutagenesis. Second, only limited similarity in motif frequency profiles is evident flanking insertions vs. deletions, confirming differences in their mutagenesis. Third, substantial similarity in frequency profiles exists between pairs of individual motifs flanking insertions (and separately deletions), suggesting "cooperation" among motifs, and thus molecular mechanisms, during indel formation. Fourth, the wavelet analyses demonstrate that all these patterns are highly dependent on scale (the size of an interval considered). Finally, our results depict a model of indel mutagenesis comprising both replication and recombination (via repair of paused replication forks and site-specific recombination).
Collapse
Affiliation(s)
- Erika M Kvikstad
- Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania 16802, USA
| | | | | |
Collapse
|
35
|
Brandstrom M, Bagshaw AT, Gemmell NJ, Ellegren H. The Relationship Between Microsatellite Polymorphism and Recombination Hot Spots in the Human Genome. Mol Biol Evol 2008; 25:2579-87. [DOI: 10.1093/molbev/msn201] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
|