1
Bosio M, Drechsel O, Rahman R, Muyas F, Rabionet R, Bezdan D, Domenech Salgado L, Hor H, Schott JJ, Munell F, Colobran R, Macaya A, Estivill X, Ossowski S. eDiVA-Classification and prioritization of pathogenic variants for clinical diagnostics. Hum Mutat 2019; 40:865-878. [PMID: 31026367 PMCID: PMC6767450 DOI: 10.1002/humu.23772] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/23/2019] [Revised: 04/17/2019] [Accepted: 04/24/2019] [Indexed: 01/06/2023]
Abstract
Mendelian diseases have proven to be an efficient model for connecting genotypes to phenotypes and for elucidating the function of genes. Whole‐exome sequencing (WES) accelerated the study of rare Mendelian diseases in families, allowing rare causal mutations in genic regions to be pinpointed directly, without the need for linkage analysis. However, the low diagnostic rates of 20–30% reported for multiple WES disease studies point to the need for improved variant pathogenicity classification and causal variant prioritization methods. Here, we present the exome Disease Variant Analysis (eDiVA; http://ediva.crg.eu), an automated computational framework for identification of causal genetic variants (coding/splicing single‐nucleotide variants and small insertions and deletions) for rare diseases using WES of families or parent–child trios. eDiVA combines next‐generation sequencing data analysis, comprehensive functional annotation, and causal variant prioritization optimized for familial genetic disease studies. eDiVA features a machine learning‐based variant pathogenicity predictor combining various genomic and evolutionary signatures. Clinical information, such as disease phenotype or mode of inheritance, is incorporated to improve the precision of the prioritization algorithm. Benchmarking against state‐of‐the‐art competitors demonstrates that eDiVA consistently performed as well as or better than existing approaches in terms of detection rate and precision. Moreover, we applied eDiVA to several familial disease cases to demonstrate its clinical applicability.
Affiliation(s)
- Mattia Bosio
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Oliver Drechsel
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Francesc Muyas
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Raquel Rabionet
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institut de Recerca Sant Joan de Déu, University of Barcelona, Barcelona, Spain
- Daniela Bezdan
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Laura Domenech Salgado
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Hyun Hor
- Department of Neurology, University Hospital Zurich, Zurich, Switzerland
- Jean-Jacques Schott
- L'Institut du Thorax, INSERM, CNRS, Univ Nantes, Nantes, France; Service de Cardiologie, L'institut du thorax, CHU Nantes, Nantes, France
- Roger Colobran
- Vall d'Hebron Institut de Recerca (VHIR), Barcelona, Spain
- Alfons Macaya
- Vall d'Hebron Institut de Recerca (VHIR), Barcelona, Spain
- Xavier Estivill
- Sidra Medicine, Doha, Qatar; Women's Health Dexeus, Barcelona, Spain
- Stephan Ossowski
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
2
Li J, Jiang L, Wu CI, Lu X, Fang S, Ting CT. Small Segmental Duplications in Drosophila-High Rate of Emergence and Elimination. Genome Biol Evol 2019; 11:486-496. [PMID: 30689862 PMCID: PMC6380325 DOI: 10.1093/gbe/evz011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 01/19/2019] [Indexed: 12/12/2022] [Open Access]
Abstract
Segmental duplications are an important class of mutations. Because a large proportion of segmental duplications may be strongly deleterious, high-frequency or fixed segmental duplications may represent only a tiny fraction of the mutational input. To understand the emergence and elimination of segmental duplications, we survey polymorphic duplications, including tandem and interspersed duplications, in natural populations of Drosophila using haploid embryo genomes. As haploid embryos are not expected to be heterozygous, sites of apparent heterozygosity in the genome (referred to as pseudoheterozygous sites [PHS]) likely represent recent duplications that have acquired new mutations. Among the 29 genomes of Drosophila melanogaster, we identify 2,282 polymorphic PHS duplications (linked PHS regions) in total, or 154 PHS duplications per genome. Most PHS duplications are small (83.4% < 500 bp), Drosophila melanogaster lineage specific, and strain specific (72.6% singletons). The excess of observed singleton PHS duplications deviates significantly from the neutral expectation, suggesting that most PHS duplications are strongly deleterious. In addition, these small segmental duplications are not evenly distributed across genomic regions and are less common in noncoding functional element regions. The underrepresentation in RNA polymerase II binding sites and regions with active histone modifications is correlated with the ages of the duplications. In conclusion, small segmental duplications occur frequently in Drosophila but are rapidly eliminated by natural selection.
Affiliation(s)
- Juan Li
- Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China; Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan
- Lan Jiang
- Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China
- Chung-I Wu
- Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; Department of Ecology and Evolution, University of Chicago; School of Life Science, Sun Yat-Sen University, Guangzhou, China
- Xuemei Lu
- Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China
- Shu Fang
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Chau-Ti Ting
- Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan; Department of Life Science, Center for Biotechnology, Center for Developmental Biology and Regenerative Medicine, National Taiwan University; Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
3
Vijay A, Garg I, Ashraf MZ. Perspective: DNA Copy Number Variations in Cardiovascular Diseases. Epigenet Insights 2018; 11:2516865718818839. [PMID: 30560231 PMCID: PMC6291864 DOI: 10.1177/2516865718818839] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/31/2018] [Accepted: 11/08/2018] [Indexed: 12/27/2022] [Open Access]
Abstract
The human genome contains many variations, often called mutations, which are difficult to detect and have remained a challenge for years. A substantial part of the genome consists of repeats, and when such repeats lie in a coding region they may change the gene expression profile and lead to pathological conditions. Structural variants are alterations that change one or more sequence features of a chromosome, such as changes in copy number, rearrangements, and translocations, and can be balanced or unbalanced. Copy number variants (CNVs) may increase or decrease the number of copies of a given region and have a pivotal role in the onset of many diseases, including cardiovascular disorders. Cardiovascular disorders have a multitude of well-established risk factors and etiologies, but their correlation with CNVs is still being studied. In this article, we discuss the history of CNVs and summarize the diseases associated with them. We also shed light on the techniques introduced so far to detect such variations and on their limitations. Because few studies have determined the frequency of such variants in cardiovascular diseases, clinical studies with larger cohorts are needed. This review is a compilation of articles suggesting the importance of CNVs in a multitude of cardiovascular anomalies. Finally, future perspectives for a better understanding of CNVs and cardiovascular disorders are also discussed.
Affiliation(s)
- Aatira Vijay
- Genomics Division, Defence Institute of Physiology & Allied Sciences, Delhi, India
- Iti Garg
- Genomics Division, Defence Institute of Physiology & Allied Sciences, Delhi, India
- Mohammad Zahid Ashraf
- Genomics Division, Defence Institute of Physiology and Allied Sciences, DRDO, Delhi, India
4
Watson CT, Matsen FA, Jackson KJL, Bashir A, Smith ML, Glanville J, Breden F, Kleinstein SH, Collins AM, Busse CE. Comment on “A Database of Human Immune Receptor Alleles Recovered from Population Sequencing Data”. J Immunol 2017; 198:3371-3373. [DOI: 10.4049/jimmunol.1700306] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 01/02/2023]
5
Fungtammasan A, Tomaszkiewicz M, Campos-Sánchez R, Eckert KA, DeGiorgio M, Makova KD. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats. Mol Biol Evol 2016; 33:2744-58. [PMID: 27413049 PMCID: PMC5026258 DOI: 10.1093/molbev/msw139] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/11/2022] [Open Access]
Abstract
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.
Affiliation(s)
- Arkarachai Fungtammasan
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University; Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
- Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
- Rebeca Campos-Sánchez
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
- Kristin A Eckert
- Center for Medical Genomics, Pennsylvania State University; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine
- Michael DeGiorgio
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Institute for CyberScience, Pennsylvania State University
- Kateryna D Makova
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
6
Isakov O, Perrone M, Shomron N. Exome sequencing analysis: a guide to disease variant detection. Methods Mol Biol 2014; 1038:137-58. [PMID: 23872973 DOI: 10.1007/978-1-62703-514-9_8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022]
Abstract
Whole exome sequencing presents a powerful tool to study rare genetic disorders. The most challenging part of using exome sequencing for the purpose of disease-causing variant detection is analyzing, interpreting, and filtering the large number of detected variants. In this chapter we provide a comprehensive description of the various steps required for such an analysis. We address strategies in selecting samples to sequence, and technical considerations involved in exome sequencing. We then discuss how to identify variants, and methods for first annotating detected variants using characteristics such as allele frequency, location in the genome, and predicted severity, and then classifying and prioritizing the detected variants based on those annotations. Finally, we review possible gene annotations that may help to establish a relationship between genes carrying high-priority variants and the phenotype in question, in order to identify the most likely causative mutations.
Affiliation(s)
- Ofer Isakov
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
7
Durtschi J, Margraf RL, Coonrod EM, Mallempati KC, Voelkerding KV. VarBin, a novel method for classifying true and false positive variants in NGS data. BMC Bioinformatics 2013; 14 Suppl 13:S2. [PMID: 24266885 PMCID: PMC3849648 DOI: 10.1186/1471-2105-14-s13-s2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023] [Open Access]
Abstract
Background: Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction.
Methods: VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident by variant quality scores and biases as well as displayed on the iGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4).
Results: To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4.
Conclusions: These data indicate that VarBin correctly classifies the majority of true variants as Bin 1 and Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2.
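The PLRD quantity described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Phred-scaled genotype likelihoods follow the usual PL convention (PL = -10 * log10(likelihood)), that the "variant" likelihood is the one for the called non-reference genotype, and that separation from the background samples can be summarized as a difference from their median. The function names and all input values are hypothetical, and the bin thresholds are not given in the abstract, so no binning is attempted.

```python
from statistics import median


def plrd(pl_ref: float, pl_var: float, depth: int) -> float:
    """Phred-scaled variant-to-wild-type likelihood ratio divided by read depth.

    Assumes PL = -10 * log10(likelihood), so the Phred-scaled ratio
    10 * log10(L_var / L_ref) equals pl_ref - pl_var.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return (pl_ref - pl_var) / depth


def separation(proband_plrd: float, background_plrds: list[float]) -> float:
    """Hypothetical summary: proband PLRD minus the median background PLRD."""
    return proband_plrd - median(background_plrds)


# Hypothetical inputs, chosen only to show the scale of the values.
wild_type = plrd(pl_ref=0, pl_var=90, depth=30)   # confident reference call
variant = plrd(pl_ref=400, pl_var=0, depth=30)    # confident variant call
gap = separation(variant, [-3.0, -2.9, -3.1])     # background wild-type calls
```

With these made-up inputs the reference call lands at -3 PLRD and the variant call above 10 PLRD, the same ranges the abstract reports for clean sites.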
8
RNA-Seq approach for genetic improvement of meat quality in pig and evolutionary insight into the substrate specificity of animal carbonyl reductases. PLoS One 2012; 7:e42198. [PMID: 22962580 PMCID: PMC3433470 DOI: 10.1371/journal.pone.0042198] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 07/26/2011] [Accepted: 07/05/2012] [Indexed: 11/19/2022] [Open Access]
Abstract
Changes in meat quality traits are strongly associated with alterations in postmortem metabolism which depend on genetic variations, especially nonsynonymous single nucleotide variations (nsSNVs) having critical effects on protein structure and function. To selectively identify metabolism-related nsSNVs, next-generation transcriptome sequencing (RNA-Seq) was carried out using RNAs from porcine liver, which contains a diverse range of metabolic enzymes. The multiplex SNV genotyping analysis showed that various metabolism-related genes had different nsSNV alleles. Moreover, many nsSNVs were significantly associated with multiple meat quality traits. Particularly, ch7:g.22112616A>G SNV was identified to create a single amino acid change (Thr/Ala) at the 145th residue of H1.3-like protein, very close to the putative 147th threonine phosphorylation site, suggesting that the nsSNV may affect multiple meat quality traits by affecting the epigenetic regulation of postmortem metabolism-related gene expression. Besides, one nonsynonymous variation, probably generated by gene duplication, led to a stop signal in porcine testicular carbonyl reductase (PTCR), resulting in a C-terminal (E281-A288) deletion. Molecular docking and energy minimization calculations indicated that the binding affinity of wild-type PTCR to 5α-DHT, a C21-steroid, was superior to that of C-terminal-deleted PTCR or human carbonyl reductase, which was very consistent with experimental data, reported previously. Furthermore, P284 was identified as an important residue mediating the specific interaction between PTCR and 5α-DHT, and phylogenetic analysis showed that P284 is an evolutionarily conserved residue among animal carbonyl reductases, which suggests that the C-terminal tails of these reductases may have evolved under evolutionary pressure to increase the substrate specificity for C21-steroids and facilitate metabolic adaptation. 
Altogether, our RNA-Seq revealed that selective nsSNVs were associated with meat quality traits that could be useful for successful marker-assisted selection in pigs and also represents a useful resource to enhance understanding of protein folding, substrate specificity, and the evolution of enzymes such as carbonyl reductase.
9
Sanz A, Ordovás L, Zaragoza P, Sanz A, de Blas I, Rodellar C. A false single nucleotide polymorphism generated by gene duplication compromises meat traceability. Meat Sci 2012; 91:347-51. [PMID: 22405876 DOI: 10.1016/j.meatsci.2012.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2011] [Revised: 01/17/2012] [Accepted: 02/14/2012] [Indexed: 10/28/2022]
Abstract
Controlling meat traceability using SNPs is an effective method of ensuring food safety. We have analyzed several SNPs to create a panel for bovine genetic identification and traceability studies. One of these was the transition g.329C>T (GenBank accession no. AJ496781) in the cytochrome P450 17A1 gene, which has been included in previously published panels. Using minisequencing reactions, we tested 701 samples belonging to eight Spanish cattle breeds. Surprisingly, an excess of heterozygotes was detected, implying an extreme departure from Hardy-Weinberg equilibrium (P<0.001). By alignment analysis and sequencing, we found that the g.329C>T SNP is a false positive polymorphism, which explains the inflated heterozygosity. We recommend that this ambiguous SNP, as well as other polymorphisms located in this region, should not be used in identification, traceability or disease association studies. Annotation of these false SNPs should improve association studies and avoid misinterpretations.
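The heterozygote-excess check mentioned in this abstract can be sketched with a standard chi-square test of Hardy-Weinberg equilibrium. This is a generic illustration, not the authors' analysis. The abstract does not report genotype counts, so the counts below are hypothetical and only sum to the 701 samples mentioned. The function name and the choice of a one-degree-of-freedom chi-square test are assumptions.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic site.

    Returns (chi2, p_value, f), where f = 1 - observed/expected heterozygotes;
    a negative f indicates an excess of heterozygotes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    f = 1.0 - n_ab / expected[1] if expected[1] > 0 else 0.0
    return chi2, p_value, f


# Hypothetical counts: a paralogous difference mistaken for a SNP makes
# most animals look heterozygous.
chi2, p_value, f = hwe_chi_square(n_aa=50, n_ab=600, n_bb=51)
```

A strongly negative `f` together with a very small `p_value` is the pattern expected when reads or probes from two gene copies are scored as a single locus.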
Affiliation(s)
- Arianne Sanz
- Laboratorio de Genética Bioquímica (LAGENBIO), Facultad de Veterinaria, Universidad de Zaragoza, Zaragoza, Spain.
10
Fadista J, Bendixen C. Genomic position mapping discrepancies of commercial SNP chips. PLoS One 2012; 7:e31025. [PMID: 22363540 PMCID: PMC3281913 DOI: 10.1371/journal.pone.0031025] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 09/05/2011] [Accepted: 12/30/2011] [Indexed: 11/23/2022] [Open Access]
Abstract
The field of genetics has come to rely heavily on commercial genotyping arrays and accompanying annotations for insights into genotype-phenotype associations. However, in order to avoid errors and false leads, it is imperative that the annotation of SNP chromosomal positions is accurate and unambiguous. We report on genomic positional discrepancies of various SNP chips for human, cattle and mouse species, and discuss their causes and consequences.
Affiliation(s)
- João Fadista
- Department of Clinical Sciences Malmö, CRC, Lund University, Malmö, Sweden
- Christian Bendixen
- Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, Tjele, Denmark
11
Uddin M, Sturge M, Peddle L, O'Rielly DD, Rahman P. Genome-wide signatures of 'rearrangement hotspots' within segmental duplications in humans. PLoS One 2011; 6:e28853. [PMID: 22194928 PMCID: PMC3237539 DOI: 10.1371/journal.pone.0028853] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/19/2011] [Accepted: 11/16/2011] [Indexed: 11/19/2022] [Open Access]
Abstract
The primary objective of this study was to create a genome-wide high resolution map (i.e., >100 bp) of ‘rearrangement hotspots’ which can facilitate the identification of regions capable of mediating de novo deletions or duplications in humans. A hierarchical method was employed to fragment segmental duplications (SDs) into multiple smaller SD units. Combining an end space free pairwise alignment algorithm with a ‘seed and extend’ approach, we have exhaustively searched 409 million alignments to detect complex structural rearrangements within the reference-guided assembly of the NA18507 human genome (18× coverage), including the previously identified novel 4.8 Mb sequence from de novo assembly within this genome. We have identified 1,963 rearrangement hotspots within SDs which encompass 166 genes and display an enrichment of duplicated gene nucleotide variants (DNVs). These regions are correlated with increased non-allelic homologous recombination (NAHR) event frequency which presumably represents the origin of copy number variations (CNVs) and pathogenic duplications/deletions. Analysis revealed that 20% of the detected hotspots are clustered within the proximal and distal SD breakpoints flanked by the pathogenic deletions/duplications that have been mapped for 24 NAHR-mediated genomic disorders. FISH Validation of selected complex regions revealed 94% concordance with in silico localization of the highly homologous derivatives. Other results from this study indicate that intra-chromosomal recombination is enhanced in genic compared with agenic duplicated regions, and that gene desert regions comprising SDs may represent reservoirs for creation of novel genes. The generation of genome-wide signatures of ‘rearrangement hotspots’, which likely serve as templates for NAHR, may provide a powerful approach towards understanding the underlying mutational mechanism(s) for development of constitutional and acquired diseases.
Affiliation(s)
- Mohammed Uddin
- Faculty of Medicine, Discipline of Medicine and Genetics, Memorial University, St. John's, Newfoundland, Canada
- Mitch Sturge
- Faculty of Medicine, Discipline of Medicine and Genetics, Memorial University, St. John's, Newfoundland, Canada
- Lynette Peddle
- Faculty of Medicine, Discipline of Medicine and Genetics, Memorial University, St. John's, Newfoundland, Canada
- Darren D. O'Rielly
- Faculty of Medicine, Discipline of Medicine and Genetics, Memorial University, St. John's, Newfoundland, Canada
- Proton Rahman
- Faculty of Medicine, Discipline of Medicine and Genetics, Memorial University, St. John's, Newfoundland, Canada
12
McDonald MJ, Wang WC, Huang HD, Leu JY. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences. PLoS Biol 2011; 9:e1000622. [PMID: 21697975 PMCID: PMC3114760 DOI: 10.1371/journal.pbio.1000622] [Citation(s) in RCA: 97] [Impact Index Per Article: 7.5] [Received: 09/16/2010] [Accepted: 04/22/2011] [Indexed: 12/24/2022] [Open Access]
Abstract
The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.