1
|
Quilez J, Guilmatre A, Garg P, Highnam G, Gymrek M, Erlich Y, Joshi RS, Mittelman D, Sharp AJ. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res 2016; 44:3750-62. [PMID: 27060133 PMCID: PMC4857002 DOI: 10.1093/nar/gkw219] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 11/16/2015] [Accepted: 03/22/2016] [Indexed: 01/23/2023]
Abstract
Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this, we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on neighboring gene expression and DNA methylation. Polymorphic promoter TRs were associated with increased variance in local gene expression and DNA methylation, suggesting functional consequences related to TR variation. We identified >100 TRs associated with expression/methylation levels of adjacent genes. These potential eQTL/mQTL TRs were enriched for overlaps with transcription factor binding and DNaseI hypersensitivity sites, providing a rationale for their effects. Moreover, we showed that most TR variants are poorly tagged by nearby single nucleotide polymorphism (SNP) markers, indicating that many functional TR variants are not effectively assayed by SNP-based approaches. Our study assigns biological significance to TR variations in the human genome, and suggests that a significant fraction of TR variations exert functional effects via alterations of local gene expression or epigenetics. We conclude that targeted studies that focus on genotyping TR variants are required to fully ascertain functional variation in the genome.
Affiliation(s)
- Javier Quilez
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Audrey Guilmatre
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Paras Garg
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Gareth Highnam
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Melissa Gymrek
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; New York Genome Center, New York, NY 10038, USA
- Yaniv Erlich
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA
- Ricky S Joshi
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- David Mittelman
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Andrew J Sharp
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
|
2
|
O’Rawe J, Wu Y, Dörfel M, Rope A, Au P, Parboosingh J, Moon S, Kousi M, Kosma K, Smith C, Tzetis M, Schuette J, Hufnagel R, Prada C, Martinez F, Orellana C, Crain J, Caro-Llopis A, Oltra S, Monfort S, Jiménez-Barrón L, Swensen J, Ellingwood S, Smith R, Fang H, Ospina S, Stegmann S, Den Hollander N, Mittelman D, Highnam G, Robison R, Yang E, Faivre L, Roubertie A, Rivière JB, Monaghan K, Wang K, Davis E, Katsanis N, Kalscheuer V, Wang E, Metcalfe K, Kleefstra T, Innes A, Kitsiou-Tzeli S, Rosello M, Keegan C, Lyon G. TAF1 Variants Are Associated with Dysmorphic Features, Intellectual Disability, and Neurological Manifestations. Am J Hum Genet 2015; 97:922-32. [PMID: 26637982 PMCID: PMC4678794 DOI: 10.1016/j.ajhg.2015.11.005] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 07/27/2015] [Accepted: 11/05/2015] [Indexed: 11/30/2022]
Abstract
We describe an X-linked genetic syndrome associated with mutations in TAF1 and manifesting with global developmental delay, intellectual disability (ID), characteristic facial dysmorphology, generalized hypotonia, and variable neurologic features, all in male individuals. Simultaneous studies using diverse strategies led to the identification of nine families with overlapping clinical presentations and affected by de novo or maternally inherited single-nucleotide changes. Two additional families harboring large duplications involving TAF1 were also found to share phenotypic overlap with the probands harboring single-nucleotide changes, but they also demonstrated a severe neurodegeneration phenotype. Functional analysis with RNA-seq for one of the families suggested that the phenotype is associated with downregulation of a set of genes notably enriched with genes regulated by E-box proteins. In addition, knockdown and mutant studies of this gene in zebrafish have shown a quantifiable, albeit small, effect on a neuronal phenotype. Our results suggest that mutations in TAF1 play a critical role in the development of this X-linked ID syndrome.
|
3
|
Bilgin Sonay T, Carvalho T, Robinson MD, Greminger MP, Krützen M, Comas D, Highnam G, Mittelman D, Sharp A, Marques-Bonet T, Wagner A. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res 2015; 25:1591-1599. [PMID: 26290536 DOI: 10.1101/015784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/16/2015] [Accepted: 08/14/2015] [Indexed: 05/25/2023]
Abstract
Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly. They are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics. It has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation in humans and closely related species have been scarce due to technical difficulties derived from short-read technology. Here we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, in a total of six different species, and studied their impact on gene expression evolution. We found that population diversity patterns can be efficiently captured with short TRs (repeat unit length, 1–5 bp). We examined the potential evolutionary role of TRs in gene expression differences between humans and primates by using 30,275 larger TRs (repeat unit length, 2–50 bp). Genes that contained TRs in the promoters, in their 3′ untranslated region, in introns, and in exons had higher expression divergence than genes without repeats in these regions. Genes with polymorphic small repeats (1–5 bp) also had higher expression divergence than genes with fixed or no TRs in their promoters. Our findings highlight the potential contribution of TRs to human evolution through gene regulation.
Affiliation(s)
- Tugce Bilgin Sonay
  - Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-8057 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tiago Carvalho
  - Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Mark D Robinson
  - The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Maja P Greminger
  - Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
- Michael Krützen
  - Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
- David Comas
  - Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Gareth Highnam
  - Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
- David Mittelman
  - Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
- Andrew Sharp
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Tomàs Marques-Bonet
  - Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain; Centro Nacional de Análisis Genómico (CNAG), PCB, Barcelona, 08028 Catalonia, Spain; Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Andreas Wagner
  - Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-8057 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; The Santa Fe Institute, Santa Fe, New Mexico 87501, USA
|
4
|
Bilgin Sonay T, Carvalho T, Robinson MD, Greminger MP, Krützen M, Comas D, Highnam G, Mittelman D, Sharp A, Marques-Bonet T, Wagner A. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res 2015; 25:1591-9. [PMID: 26290536 PMCID: PMC4617956 DOI: 10.1101/gr.190868.115] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 02/16/2015] [Accepted: 08/14/2015] [Indexed: 12/20/2022]
|
5
|
Balanovsky O, Zhabagin M, Agdzhoyan A, Chukhryaeva M, Zaporozhchenko V, Utevska O, Highnam G, Sabitov Z, Greenspan E, Dibirova K, Skhalyakho R, Kuznetsova M, Koshel S, Yusupov Y, Nymadawa P, Zhumadilov Z, Pocheshkhova E, Haber M, Zalloua PA, Yepiskoposyan L, Dybo A, Tyler-Smith C, Balanovska E. Deep phylogenetic analysis of haplogroup G1 provides estimates of SNP and STR mutation rates on the human Y-chromosome and reveals migrations of Iranic speakers. PLoS One 2015; 10:e0122968. [PMID: 25849548 PMCID: PMC4388827 DOI: 10.1371/journal.pone.0122968] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 09/26/2014] [Accepted: 02/16/2015] [Indexed: 11/18/2022]
Abstract
Y-chromosomal haplogroup G1 is a minor component of the overall gene pool of South-West and Central Asia but reaches up to 80% frequency in some populations scattered within this area. We have genotyped the G1-defining marker M285 in 27 Eurasian populations (n = 5,346), analyzed 367 M285-positive samples using 17 Y-STRs, and sequenced ~11 Mb of the Y-chromosome in 20 of these samples to an average coverage of 67X. This allowed detailed phylogenetic reconstruction. We identified five branches, all with high geographical specificity: G1-L1323 in Kazakhs, the closely related G1-GG1 in Mongols, G1-GG265 in Armenians and its distant brother clade G1-GG162 in Bashkirs, and G1-GG362 in West Indians. The haplotype diversity, which decreased from West Iran to Central Asia, allows us to hypothesize that this rare haplogroup could have been carried by the expansion of Iranic speakers northwards to the Eurasian steppe and via founder effects became a predominant genetic component of some populations, including the Argyn tribe of the Kazakhs. The remarkable agreement between genetic and genealogical trees of Argyns allowed us to calibrate the molecular clock using a historical date (1405 AD) of the most recent common genealogical ancestor. The mutation rate for Y-chromosomal sequence data obtained was 0.78×10⁻⁹ per bp per year, falling within the range of published rates. The mutation rate for Y-chromosomal STRs was 0.0022 per locus per generation, very close to the so-called genealogical rate. The “clan-based” approach to estimating the mutation rate provides a third, middle way between direct father-to-son comparisons and using archeologically known migrations, whose dates are subject to revision and of uncertain relationship to genetic events.
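To give a feel for the scale of the rates quoted in this abstract, the following sketch converts them into more tangible quantities. It is only an illustration: the function names are hypothetical, the sequenced region is taken as exactly 11 Mb although the abstract says ~11 Mb, and nothing here reproduces the authors' actual calibration procedure.

```python
def years_per_mutation(rate_per_bp_per_year: float, region_bp: float) -> float:
    """Expected waiting time (years) for one mutation along a single lineage."""
    return 1.0 / (rate_per_bp_per_year * region_bp)


def rate_from_counts(n_mutations: float, region_bp: float, years: float) -> float:
    """Per-bp, per-year rate implied by a mutation count over a known time span."""
    return n_mutations / (region_bp * years)


# Values quoted in the abstract (region size rounded; an assumption).
SNP_RATE = 0.78e-9   # per bp per year
REGION_BP = 11e6     # ~11 Mb of sequenced Y-chromosome
STR_RATE = 0.0022    # per locus per generation
N_STRS = 17          # Y-STR panel size

# Roughly one new SNP per ~117 years in the sequenced region of one lineage.
wait_years = years_per_mutation(SNP_RATE, REGION_BP)

# Expected STR mutation events per generation across the 17-locus panel.
str_events_per_generation = STR_RATE * N_STRS

print(f"{wait_years:.1f} years per SNP; "
      f"{str_events_per_generation:.4f} STR events per generation")
```

The inverse function, `rate_from_counts`, shows the direction a clock calibration runs in principle: a known time depth plus an observed mutation count yields a rate.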
Affiliation(s)
- Oleg Balanovsky
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Research Centre for Medical Genetics, Russian Academy of Sciences, Moscow, Russia
- Maxat Zhabagin
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Center for Life Sciences, Nazarbayev University, Astana, Republic of Kazakhstan
- Anastasiya Agdzhoyan
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Marina Chukhryaeva
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Research Centre for Medical Genetics, Russian Academy of Sciences, Moscow, Russia
- Olga Utevska
  - Department of Genetics and Cytology, V. N. Karazin National University, Kharkiv, Ukraine
- Gareth Highnam
  - Gene by Gene, Ltd., Houston, Texas, United States of America
- Zhaxylyk Sabitov
  - Center for Life Sciences, Nazarbayev University, Astana, Republic of Kazakhstan
  - Gumilov Eurasian National University, Astana, Republic of Kazakhstan
- Khadizhat Dibirova
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Research Centre for Medical Genetics, Russian Academy of Sciences, Moscow, Russia
- Roza Skhalyakho
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Research Centre for Medical Genetics, Russian Academy of Sciences, Moscow, Russia
- Marina Kuznetsova
  - Research Centre for Medical Genetics, Russian Academy of Sciences, Moscow, Russia
- Sergey Koshel
  - Faculty of Geography, Lomonosov Moscow State University, Moscow, Russia
- Yuldash Yusupov
  - Institute of Humanitarian Research of the Republic of Bashkortostan, Ufa, Russia
- Zhaxybay Zhumadilov
  - Center for Life Sciences, Nazarbayev University, Astana, Republic of Kazakhstan
- Marc Haber
  - The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Levon Yepiskoposyan
  - Institute of Molecular Biology, National Academy of Sciences of the Republic of Armenia, Yerevan, Armenia
- Anna Dybo
  - Institute of Linguistics, Russian Academy of Sciences, Moscow, Russia
- Chris Tyler-Smith
  - The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Elena Balanovska
  - Research Centre for Medical Genetics, Russian Academy of Sciences, Moscow, Russia
|
6
|
Abstract
Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.
Affiliation(s)
- Thomas Willems
  - Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Computational and Systems Biology Program, MIT, Cambridge, Massachusetts 02139, USA
- Melissa Gymrek
  - Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Gareth Highnam
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
- David Mittelman
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA; Gene by Gene, Ltd., Houston, Texas 77008, USA
- Yaniv Erlich
  - Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA;
|
7
|
|
8
|
Huang W, Massouras A, Inoue Y, Peiffer J, Ràmia M, Tarone AM, Turlapati L, Zichner T, Zhu D, Lyman RF, Magwire MM, Blankenburg K, Carbone MA, Chang K, Ellis LL, Fernandez S, Han Y, Highnam G, Hjelmen CE, Jack JR, Javaid M, Jayaseelan J, Kalra D, Lee S, Lewis L, Munidasa M, Ongeri F, Patel S, Perales L, Perez A, Pu L, Rollmann SM, Ruth R, Saada N, Warner C, Williams A, Wu YQ, Yamamoto A, Zhang Y, Zhu Y, Anholt RRH, Korbel JO, Mittelman D, Muzny DM, Gibbs RA, Barbadilla A, Johnston JS, Stone EA, Richards S, Deplancke B, Mackay TFC. Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines. Genome Res 2014; 24:1193-208. [PMID: 24714809 PMCID: PMC4079974 DOI: 10.1101/gr.171546.113] [Citation(s) in RCA: 403] [Impact Index Per Article: 40.3] [Indexed: 12/30/2022]
Abstract
The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available.
Affiliation(s)
- Wen Huang
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Andreas Massouras
  - Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Yutaka Inoue
  - Center for Education in Liberal Arts and Sciences, Osaka University, Osaka-fu, 560-0043 Japan
- Jason Peiffer
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Miquel Ràmia
  - Genomics, Bioinformatics and Evolution Group, Institut de Biotecnologia i de Biomedicina (IBB), Department of Genetics and Microbiology, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Aaron M Tarone
  - Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
- Lavanya Turlapati
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Thomas Zichner
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Dianhui Zhu
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Richard F Lyman
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Michael M Magwire
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Kerstin Blankenburg
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Mary Anna Carbone
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Kyle Chang
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Lisa L Ellis
  - Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
- Sonia Fernandez
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Yi Han
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Gareth Highnam
  - Virginia Tech Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
- Carl E Hjelmen
  - Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
- John R Jack
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Mehwish Javaid
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Joy Jayaseelan
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Divya Kalra
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Sandy Lee
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Lora Lewis
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Mala Munidasa
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Fiona Ongeri
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Shohba Patel
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Lora Perales
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Agapito Perez
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- LingLing Pu
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Stephanie M Rollmann
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Robert Ruth
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Nehad Saada
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Crystal Warner
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Aneisa Williams
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Yuan-Qing Wu
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Akihiko Yamamoto
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Yiqing Zhang
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Yiming Zhu
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Robert R H Anholt
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Jan O Korbel
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- David Mittelman
  - Virginia Tech Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
- Donna M Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Richard A Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Antonio Barbadilla
  - Genomics, Bioinformatics and Evolution Group, Institut de Biotecnologia i de Biomedicina (IBB), Department of Genetics and Microbiology, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- J Spencer Johnston
  - Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
- Eric A Stone
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Stephen Richards
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Bart Deplancke
  - Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Trudy F C Mackay
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
|
9
|
Guilmatre A, Highnam G, Borel C, Mittelman D, Sharp AJ. Rapid multiplexed genotyping of simple tandem repeats using capture and high-throughput sequencing. Hum Mutat 2013; 34:1304-11. [PMID: 23696428 DOI: 10.1002/humu.22359] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 02/09/2013] [Accepted: 05/07/2013] [Indexed: 11/12/2022]
Abstract
Although simple tandem repeats (STRs) comprise ~2% of the human genome and represent an important source of polymorphism, this class of variation remains understudied. We have developed a cost-effective strategy for performing targeted enrichment of STR regions that utilizes capture probes targeting the flanking sequences of STR loci, enabling specific capture of DNA fragments containing STRs for subsequent high-throughput sequencing. Utilizing a capture design targeting 6,243 STR loci <94 bp and multiplexing eight individuals in a single Illumina HiSeq2000 sequencing lane we were able to call genotypes in at least one individual for 67.5% of the targeted STRs. We observed a strong relationship between (G+C) content and genotyping rate. STRs with moderate (G+C) content were recovered with >90% success rate, whereas only 12% of STRs with ≥ 80% (G+C) were genotyped in our assay. Analysis of a parent-offspring trio, complete hydatidiform mole samples, repeat analyses of the same individual, and Sanger sequencing-based validation indicated genotyping error rates between 7.6% and 12.4%. The majority of such errors were a single repeat unit at mono- or dinucleotide repeats. Altogether, our STR capture assay represents a cost-effective method that enables multiplexed genotyping of thousands of STR loci suitable for large-scale population studies.
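The relationship between (G+C) content and genotyping rate described above rests on a simple per-locus quantity. A minimal sketch of how that quantity could be computed and used to flag loci in the hard-to-genotype range follows; the function names and example sequences are hypothetical, and only the 80% threshold comes from the abstract.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


def is_high_gc(seq: str, threshold: float = 0.80) -> bool:
    """True when the locus falls in the >=80% (G+C) class from the abstract."""
    return gc_fraction(seq) >= threshold


# Hypothetical repeat tracts, for illustration only.
for name, tract in [("CGG repeat", "CGGCGGCGGCGG"), ("CA repeat", "CACACACACA")]:
    print(name, f"{gc_fraction(tract):.2f}", is_high_gc(tract))
```

A real analysis would compute this over the captured fragment, including flanking sequence, rather than the repeat tract alone; which window the authors used is not stated in the abstract.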
Affiliation(s)
- Audrey Guilmatre
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
|
10
|
Abstract
A report of the fifth annual Personal Genomes and Medical Genomics meeting, held at Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA, November 14-17, 2012.
|
11
|
Highnam G, Franck C, Martin A, Stephens C, Puthige A, Mittelman D. Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles. Nucleic Acids Res 2012; 41:e32. [PMID: 23090981 PMCID: PMC3592458 DOI: 10.1093/nar/gks981] [Citation(s) in RCA: 103] [Impact Index Per Article: 8.6] [Indexed: 01/10/2023]
Abstract
Repetitive sequences are biologically and clinically important because they can influence traits and disease, but repeats are challenging to analyse using short-read sequencing technology. We present a tool for genotyping microsatellite repeats called RepeatSeq, which uses Bayesian model selection guided by an empirically derived error model that incorporates sequence and read properties. Next, we apply RepeatSeq to high-coverage genomes from the 1000 Genomes Project to evaluate performance and accuracy. The software uses common formats, such as VCF, for compatibility with existing genome analysis pipelines. Source code and binaries are available at http://github.com/adaptivegenome/repeatseq.
Affiliation(s)
- Gareth Highnam
  - Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA
|