1
Uguen K, Michaud JL, Génin E. Short Tandem Repeats in the era of next-generation sequencing: from historical loci to population databases. Eur J Hum Genet 2024:10.1038/s41431-024-01666-z. [PMID: 38982300 DOI: 10.1038/s41431-024-01666-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 06/20/2024] [Accepted: 06/27/2024] [Indexed: 07/11/2024] Open
Abstract
In this study, we explore the landscape of short tandem repeats (STRs) within the human genome through the lens of evolving technologies to detect genomic variations. STRs, which encompass approximately 3% of our genomic DNA, are crucial for understanding human genetic diversity, disease mechanisms, and evolutionary biology. The advent of high-throughput sequencing methods has revolutionized our ability to accurately map and analyze STRs, highlighting their significance in genetic disorders, forensic science, and population genetics. We review the current available methodologies for STR analysis, the challenges in interpreting STR variations across different populations, and the implications of STRs in medical genetics. Our findings underscore the urgent need for comprehensive STR databases that reflect the genetic diversity of global populations, facilitating the interpretation of STR data in clinical diagnostics, genetic research, and forensic applications. This work sets the stage for future studies aimed at harnessing STR variations to elucidate complex genetic traits and diseases, reinforcing the importance of integrating STRs into genetic research and clinical practice.
Affiliation(s)
- Kevin Uguen
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France.
- Service de Génétique Médicale et Biologie de la Reproduction, CHU de Brest, Brest, France.
- CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada.
- Jacques L Michaud
- CHU Sainte-Justine Azrieli Research Centre, Montréal, QC, Canada
- Department of Pediatrics, Université de Montréal, Montréal, QC, Canada
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
2
Lamkin M, Gymrek M. The emerging role of tandem repeats in complex traits. Nat Rev Genet 2024; 25:452-453. [PMID: 38714860 DOI: 10.1038/s41576-024-00736-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Affiliation(s)
- Michael Lamkin
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
3
King DG. Mutation protocols share with sexual reproduction the physiological role of producing genetic variation within 'constraints that deconstrain'. J Physiol 2024; 602:2615-2626. [PMID: 38178567 DOI: 10.1113/jp285478] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/18/2023] [Accepted: 12/14/2023] [Indexed: 01/06/2024] Open
Abstract
Because the universe of possible DNA sequences is inconceivably vast, organisms have evolved mechanisms for exploring DNA sequence space while substantially reducing the hazard that would otherwise accrue to any process of random, accidental mutation. One such mechanism is meiotic recombination. Although sexual reproduction imposes a seemingly paradoxical 50% cost to fitness, sex evidently prevails because this cost is outweighed by the advantage of equipping offspring with genetic variation to accommodate environmental vicissitudes. The potential adaptive utility of additional mechanisms for producing genetic variation has long been obscured by a presumption that the vast majority of mutations are deleterious. Perhaps surprisingly, the probability for adaptive variation can be increased by several mechanisms that generate mutations abundantly. Such mechanisms, here called 'mutation protocols', implement implicit 'constraints that deconstrain'. Like meiotic recombination, they produce genetic variation in forms that minimize potential for harm while providing a reasonably high probability for benefit. One example is replication slippage of simple sequence repeats (SSRs); this process yields abundant, reversible mutations, typically with small quantitative effect on phenotype. This enables SSRs to function as adjustable 'tuning knobs'. There exists a clear pathway for SSRs to be shaped through indirect selection favouring their implicit tuning-knob protocol. Several other molecular mechanisms comprise probable components of additional mutation protocols. Biologists might plausibly regard such mechanisms of mutation not primarily as sources of deleterious genetic mistakes but also as potentially adaptive processes for 'exploring' DNA sequence space.
Affiliation(s)
- David G King
- Department of Anatomy, School of Medicine, Southern Illinois University Carbondale, Carbondale, Illinois, USA
- Department of Zoology, College of Agricultural, Life, and Physical Sciences, Southern Illinois University Carbondale, Carbondale, Illinois, USA
4
Khamse S, Alizadeh S, Khorshid HRK, Delbari A, Tajeddin N, Ohadi M. A Hypermutable Region in the DISP2 Gene Links to Natural Selection and Late-Onset Neurocognitive Disorders in Humans. Mol Neurobiol 2024:10.1007/s12035-024-04155-y. [PMID: 38565786 DOI: 10.1007/s12035-024-04155-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
(CCG) short tandem repeats (STRs) are predominantly enriched in genic regions, mutation hotspots for C to T truncating substitutions, and involved in various neurological and neurodevelopmental disorders. However, intact blocks of this class of STRs are widely overlooked with respect to their link with natural selection. The human neuron-specific gene, DISP2 (dispatched RND transporter family member 2), contains a (CCG) repeat in its 5' untranslated region. Here, we sequenced this STR in a sample of 448 Iranian individuals, consisting of late-onset neurocognitive disorder (NCD) (N = 203) and controls (N = 245). We found that the region spanning the (CCG) repeat was highly mutated, resulting in several flanking (CCG) residues. However, an 8-repeat of the (CCG) repeat was predominantly abundant (frequency = 0.92) across the two groups. While the overall distribution of genotypes was not different between the two groups (p > 0.05), we detected four genotypes in the NCD group only (2% of the NCD genotypes, Mid-p = 0.02), consisting of extreme short alleles, 5- and 6-repeats, that were not detected in the control group. The patients harboring those genotypes received the diagnoses of probable Alzheimer's disease and vascular dementia. We also found six genotypes in the control group only (2.5% of the control genotypes, Mid-p = 0.01) that consisted of the 8-repeat and extreme long alleles, 9- and 10-repeats, of which the 10-repeat was not detected in the NCD group. The (CCG) repeat specifically expanded in primates. In conclusion, we report an indication of natural selection at a novel hypermutable region in the human genome and divergent alleles and genotypes in late-onset NCDs and controls. These findings reinforce the hypothesis that a collection of rare alleles and genotypes in a number of genes may unambiguously contribute to the cognition impairment component of late-onset NCDs.
Affiliation(s)
- S Khamse
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- S Alizadeh
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
- Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
- A Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
- N Tajeddin
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- M Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
5
Alizadeh S, Khamse S, Tajeddin N, Khorram Khorshid HR, Delbari A, Ohadi M. A GCC repeat in RAB26 undergoes natural selection in human and harbors divergent genotypes in late-onset Alzheimer's disease. Gene 2024; 893:147968. [PMID: 37931854 DOI: 10.1016/j.gene.2023.147968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 10/28/2023] [Accepted: 11/03/2023] [Indexed: 11/08/2023]
Abstract
Although mainly located in genic regions and being mutation hotspots, intact blocks of CG-rich trinucleotide short tandem repeats (STRs) are largely overlooked with respect to their link with natural selection. The human RAB26 (member RAS oncogene family) directs synaptic and secretory vesicles into preautophagosomal structures, inhibition of which specifically disrupts axonal transport of degradative organelles and leads to an axonal dystrophy, resembling Alzheimer's disease (AD). Human RAB26 contains a GCC repeat in the top 1st percent in respect of length. Here we sequenced this STR in 441 Iranian individuals, consisting of late-onset neurocognitive disorder (NCD) (N = 216) and controls (N = 225). In both groups, the 12-repeat allele and the 12/12 genotype were predominantly abundant. We found excess of homozygosity for non-12 alleles in the NCD group (Mid-P exact = 0.027). Furthermore, divergent genotypes were detected that were specific to the NCD group (2.8% of genotypes) (Mid-P exact = 0.006) or controls (3.1% of genotypes) (Mid-P exact = 0.004). The patients harboring divergent genotypes received the diagnosis of AD. Based on the predominant abundance of the 12-repeat and 12/12 genotype in both groups, excess of non-12 homozygosity in the NCD group, and divergent genotypes across the NCD and control groups, we propose natural selection at this locus and link with late-onset AD. Our findings strengthen the hypothesis that a collection of rare genotypes unambiguously contribute to the pathogenesis of late-onset NCDs, such as AD.
Affiliation(s)
- S Alizadeh
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- S Khamse
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- N Tajeddin
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
- Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
- A Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
- M Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
6
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024:10.1038/s41587-023-02057-3. [PMID: 38168995 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
Affiliation(s)
- Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Harriet Dashnow
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Tom Mokveld
- Pacific Biosciences of California, Menlo Park, CA, USA
- Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Warren A Cheung
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Chengpeng Bi
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Emily Farrow
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron Wenger
- Pacific Biosciences of California, Menlo Park, CA, USA
- Khi Pin Chua
- Pacific Biosciences of California, Menlo Park, CA, USA
- Verónica Martínez-Cerdeño
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
- Trevor D Bartley
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- David L Nelson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
7
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of the genome, and often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as pertains to disease risk, elucidating their potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for TR genome-wide identification is provided, and some key methodological challenges in TR analyses are delineated.
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
8
Margoliash J, Fuchs S, Li Y, Zhang X, Massarat A, Goren A, Gymrek M. Polymorphic short tandem repeats make widespread contributions to blood and serum traits. Cell Genomics 2023; 3:100458. [PMID: 38116119 PMCID: PMC10726533 DOI: 10.1016/j.xgen.2023.100458] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/21/2022] [Revised: 09/09/2023] [Accepted: 11/07/2023] [Indexed: 12/21/2023]
Abstract
Short tandem repeats (STRs) are genomic regions consisting of repeated sequences of 1-6 bp in succession. Single-nucleotide polymorphism (SNP)-based genome-wide association studies (GWASs) do not fully capture STR effects. To study these effects, we imputed 445,720 STRs into genotype arrays from 408,153 White British UK Biobank participants and tested for association with 44 blood phenotypes. Using two fine-mapping methods, we identify 119 candidate causal STR-trait associations and estimate that STRs account for 5.2%-7.6% of causal variants identifiable from GWASs for these traits. These are among the strongest associations for multiple phenotypes, including a coding CTG repeat associated with apolipoprotein B levels, a promoter CGG repeat with platelet traits, and an intronic poly(A) repeat with mean platelet volume. Our study suggests that STRs make widespread contributions to complex traits, provides stringently selected candidate causal STRs, and demonstrates the need to consider a more complete view of genetic variation in GWASs.
Affiliation(s)
- Jonathan Margoliash
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Shai Fuchs
- Pediatric Endocrine and Diabetes Unit, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, Ramat Gan, Israel
- Yang Li
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Xuan Zhang
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Arya Massarat
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA
- Alon Goren
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
9
Shi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun 2023; 14:2092. [PMID: 37045857 PMCID: PMC10097659 DOI: 10.1038/s41467-023-37690-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/04/2022] [Accepted: 03/27/2023] [Indexed: 04/14/2023] Open
Abstract
Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also implemented population analysis, investigated population differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.
Affiliation(s)
- Yirong Shi
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yiwei Niu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Shuai Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Sijia Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jiajia Wang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Yanyan Li
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Xinyue Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Tingrui Song
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Tao Xu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, 250117, Shandong, China.
- Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
10
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Affiliation(s)
- Max Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Mikhail Maksimov
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Ye Jin
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Melissa Gymrek
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Tugce Bilgin Sonay
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
11
Styk J, Pös Z, Pös O, Radvanszky J, Turnova EH, Buglyó G, Klimova D, Budis J, Repiska V, Nagy B, Szemes T. Microsatellite instability assessment is instrumental for Predictive, Preventive and Personalised Medicine: status quo and outlook. EPMA J 2023; 14:143-165. [PMID: 36866160 PMCID: PMC9971410 DOI: 10.1007/s13167-023-00312-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 01/06/2023] [Indexed: 01/26/2023]
Abstract
A form of genomic alteration called microsatellite instability (MSI) occurs in a class of tandem repeats (TRs) called microsatellites (MSs) or short tandem repeats (STRs) due to the failure of a post-replicative DNA mismatch repair (MMR) system. Traditionally, the strategies for determining MSI events have been low-throughput procedures that typically require assessment of tumours as well as healthy samples. On the other hand, recent large-scale pan-tumour studies have consistently highlighted the potential of massively parallel sequencing (MPS) on the MSI scale. As a result of recent innovations, minimally invasive methods show a high potential to be integrated into the clinical routine and delivery of adapted medical care to all patients. Along with advances in sequencing technologies and their ever-increasing cost-effectiveness, they may bring about a new era of Predictive, Preventive and Personalised Medicine (3PM). In this paper, we offered a comprehensive analysis of high-throughput strategies and computational tools for the calling and assessment of MSI events, including whole-genome, whole-exome and targeted sequencing approaches. We also discussed in detail the detection of MSI status by current MPS blood-based methods and we hypothesised how they may contribute to the shift from conventional medicine to predictive diagnosis, targeted prevention and personalised medical services. Increasing the efficacy of patient stratification based on MSI status is crucial for tailored decision-making. Contextually, this paper highlights drawbacks both at the technical level and those embedded deeper in cellular/molecular processes and future applications in routine clinical testing.
Affiliation(s)
- Jakub Styk
- Institute of Medical Biology, Genetics and Clinical Genetics, Faculty of Medicine, Comenius University, 811 08 Bratislava, Slovakia; Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia
- Zuzana Pös
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia; Institute of Clinical and Translational Research, Biomedical Research Centre, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia
- Ondrej Pös
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia
- Jan Radvanszky
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Institute of Clinical and Translational Research, Biomedical Research Centre, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
- Evelina Hrckova Turnova
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Slovgen Ltd, 841 04 Bratislava, Slovakia
- Gergely Buglyó
- Department of Human Genetics, Faculty of Medicine, University of Debrecen, 4032 Debrecen, Hungary
- Daniela Klimova
- Institute of Medical Biology, Genetics and Clinical Genetics, Faculty of Medicine, Comenius University, 811 08 Bratislava, Slovakia
- Jaroslav Budis
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia; Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Vanda Repiska
- Institute of Medical Biology, Genetics and Clinical Genetics, Faculty of Medicine, Comenius University, 811 08 Bratislava, Slovakia; Medirex Group Academy, NPO, 949 05 Nitra, Slovakia
- Bálint Nagy
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Department of Human Genetics, Faculty of Medicine, University of Debrecen, 4032 Debrecen, Hungary
- Tomas Szemes
- Comenius University Science Park, 841 04 Bratislava, Slovakia; Geneton Ltd, 841 04 Bratislava, Slovakia; Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| |
|
12
|
Gochi L, Kawai Y, Fujimoto A. Comprehensive analysis of microsatellite polymorphisms in human populations. Hum Genet 2023; 142:45-57. [PMID: 36048238 DOI: 10.1007/s00439-022-02484-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 08/24/2022] [Indexed: 01/18/2023]
Abstract
Microsatellites (MS) are tandem repeats of short units, and have been used for population genetics, individual identification, and medical genetics. However, studies of MS on a whole-genome level are limited, and genotyping methods for MS have yet to be established. Here, we analyzed approximately 8.5 million MS regions using a previously developed MS caller for short reads (MIVcall method) for three large publicly available human genome sequencing data sets: the Korean Personal Genome Project, Simons Genome Diversity Project, and Human Genome Diversity Project. Our analysis identified 253,114 polymorphic MS. A comparison among different populations suggests that MS in the coding region evolved by random genetic drift and natural selection. In an analysis of genetic structures, MS clearly revealed population structures as SNPs and detected clusters that were not found by SNPs in African and Oceanian populations. Based on the MS polymorphisms, we selected MS marker candidates for individual identification. Finally, we applied our method to a deep sequenced ancient DNA sample. This study provides a comprehensive picture of MS polymorphisms and application to human population studies.
Affiliation(s)
- Leo Gochi
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0003, Japan
| | - Yosuke Kawai
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
| | - Akihiro Fujimoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0003, Japan.
| |
|
13
|
Global abundance of short tandem repeats is non-random in rodents and primates. BMC Genom Data 2022; 23:77. [PMID: 36329409 PMCID: PMC9635179 DOI: 10.1186/s12863-022-01092-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/29/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022] Open
Abstract
Background: While of predominant abundance across vertebrate genomes and significant biological implications, the relevance of short tandem repeats (STRs) (also known as microsatellites) to speciation remains largely elusive and attributed to random coincidence for the most part. Here we collected data on the whole-genome abundance of mono-, di-, and trinucleotide STRs in nine species, encompassing rodents and primates, including rat, mouse, olive baboon, gelada, macaque, gorilla, chimpanzee, bonobo, and human. The collected data were used to analyze hierarchical clustering of the STR abundances in the selected species. Results: We found massive differential STR abundances between the rodent and primate orders. In addition, while numerous STRs had random abundance across the nine selected species, the global abundance conformed to three consistent <clusters>, as follows: <rat, mouse>, <gelada, macaque, olive baboon>, and <gorilla, chimpanzee, bonobo, human>, which coincided with the phylogenetic distances of the selected species (p < 4E-05). Exceptionally, in the trinucleotide STR compartment, human was significantly distant from all other species. Conclusion: Based on hierarchical clustering, we propose that the global abundance of STRs is non-random in rodents and primates, and probably had a determining impact on the speciation of the two orders. We also propose the STRs and STR lengths, which predominantly conformed to the phylogeny of the selected species, exemplified by (t)10, (ct)6, and (taa)4. Phylogenetic and experimental platforms are warranted to further examine the observed patterns and the biological mechanisms associated with those STRs.
|
14
|
Sánchez-Velásquez JJ, Pinedo-Bernal PN, Reyes-Flores LE, Yzásiga-Barrera C, Zelada-Mázmela E. Genetic diversity and relatedness inferred from microsatellite loci as a tool for broodstock management of fine flounder Paralichthys adspersus. Aquaculture and Fisheries 2022. [DOI: 10.1016/j.aaf.2021.06.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
|
15
|
A (GCC) repeat in SBF1 reveals a novel biological phenomenon in human and links to late onset neurocognitive disorder. Sci Rep 2022; 12:15480. [PMID: 36104480 PMCID: PMC9474449 DOI: 10.1038/s41598-022-19878-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/08/2022] [Accepted: 09/06/2022] [Indexed: 12/22/2022] Open
Abstract
The human SBF1 (SET binding factor 1) gene, alternatively known as MTMR5, is predominantly expressed in the brain, and its epigenetic dysregulation is linked to late-onset neurocognitive disorders (NCDs), such as Alzheimer’s disease. This gene contains a (GCC)-repeat at the interval between +1 and +60 of the transcription start site (SBF1-202 ENST00000380817.8). We sequenced the SBF1 (GCC)-repeat in a sample of 542 Iranian individuals, consisting of late-onset NCDs (N = 260) and controls (N = 282). While multiple alleles were detected at this locus, the 8 and 9 repeats were predominantly abundant, forming >95% of the allele pool across the two groups. Among a number of anomalies, the allele distribution was significantly different in the NCD group versus controls (Fisher’s exact p = 0.006), primarily as a result of enrichment of the 8-repeat in the former. The genotype distribution departed from the Hardy–Weinberg principle in both groups (p < 0.001), and was significantly different between the two groups (Fisher’s exact p = 0.001). We detected significantly low frequency of the 8/9 genotype in both groups, higher frequency of this genotype in the NCD group, and reverse order of 8/8 versus 9/9 genotypes in the NCD group versus controls. Biased heterozygous/heterozygous ratios were also detected for the 6/8 versus 6/9 genotypes (in favor of 6/8) across the human samples studied (Fisher’s exact p = 0.0001). Bioinformatics studies revealed that the number of (GCC)-repeats may change the RNA secondary structure and interaction sites at least across human exon 1. This STR was specifically expanded beyond 2-repeats in primates. In conclusion, we report indication of a novel biological phenomenon, in which there is selection against certain heterozygous genotypes at an STR locus in human. We also report different allele and genotype distribution at this STR locus in late-onset NCD versus controls. In view of the location of this STR in the 5′ untranslated region, RNA/RNA or RNA/DNA heterodimer formation of the involved genotypes and alternative RNA processing and/or translation should be considered.
|
16
|
Cao DL, Zhang XJ, Xie SQ, Fan SJ, Qu XJ. Application of chloroplast genome in the identification of Traditional Chinese Medicine Viola philippica. BMC Genomics 2022; 23:540. [PMID: 35896957 PMCID: PMC9327190 DOI: 10.1186/s12864-022-08727-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/08/2021] [Accepted: 06/29/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Viola philippica Cav. is the only source plant of "Zi Hua Di Ding", which is a Traditional Chinese Medicine (TCM) that is utilized as an antifebrile and detoxicant agent for the treatment of acute pyogenic infections. Historically, many Viola species with violet flowers have been misused in "Zi Hua Di Ding". Viola has been recognized as a taxonomically difficult genus because its species have highly similar morphological characteristics. Here, all common V. philippica adulterants were sampled. A total of 24 complete chloroplast (cp) genomes were analyzed; among these, 5 cp genome sequences were downloaded from GenBank and 19 cp genomes, including 2 "Zi Hua Di Ding" purchased from a local TCM pharmacy, were newly sequenced. RESULTS: The Viola cp genomes ranged from 156,483 bp to 158,940 bp in length. A total of 110 unique genes were annotated, including 76 protein-coding genes, 30 tRNAs, and four rRNAs. Sequence divergence analysis screening identified 16 highly diverged sequences; these could be used as markers for the identification of Viola species. The morphological, maximum likelihood and Bayesian inference trees of whole cp genome sequences and highly diverged sequences were divided into five monophyletic clades. The species in each of the five clades were identical in their positions within the morphological and cp genome tree. The shared morphological characters belonging to each clade were summarized. Interestingly, unique variable sites were found in ndhF, rpl22, and ycf1 of V. philippica, and these sites can be selected to distinguish V. philippica from samples of all other Viola species, including its most closely related species. In addition, important morphological characteristics were proposed to assist the identification of V. philippica. We applied these methods to examine 2 "Zi Hua Di Ding" randomly purchased from the local TCM pharmacy, and this analysis revealed that the morphological and molecular characteristics were valid for the identification of V. philippica. CONCLUSIONS: This study provides invaluable data for the improvement of species identification and germplasm of V. philippica that may facilitate the application of a super-barcode in TCM identification and enable future studies on phylogenetic evolution and safe medical applications.
Affiliation(s)
- Dong-Ling Cao
- Shandong Provincial Key Laboratory of Plant Stress Research, College of Life Sciences, Shandong Normal University, Ji'nan, 250014, China
| | - Xue-Jie Zhang
- Shandong Provincial Key Laboratory of Plant Stress Research, College of Life Sciences, Shandong Normal University, Ji'nan, 250014, China
| | - Shao-Qiu Xie
- Shandong Provincial Key Laboratory of Plant Stress Research, College of Life Sciences, Shandong Normal University, Ji'nan, 250014, China
| | - Shou-Jin Fan
- Shandong Provincial Key Laboratory of Plant Stress Research, College of Life Sciences, Shandong Normal University, Ji'nan, 250014, China.
| | - Xiao-Jian Qu
- Shandong Provincial Key Laboratory of Plant Stress Research, College of Life Sciences, Shandong Normal University, Ji'nan, 250014, China.
| |
|
17
|
Xiao X, Zhang CY, Zhang Z, Hu Z, Li M, Li T. Revisiting tandem repeats in psychiatric disorders from perspectives of genetics, physiology, and brain evolution. Mol Psychiatry 2022; 27:466-475. [PMID: 34650204 DOI: 10.1038/s41380-021-01329-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/22/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 01/28/2023]
Abstract
Genome-wide association studies (GWASs) have revealed substantial genetic components comprised of single nucleotide polymorphisms (SNPs) in the heritable risk of psychiatric disorders. However, genetic risk factors not covered by GWAS also play pivotal roles in these illnesses. Tandem repeats, which are likely functional but frequently overlooked by GWAS, may account for an important proportion of the "missing heritability" of psychiatric disorders. Despite difficulties in characterizing and quantifying tandem repeats in the genome, studies have been carried out in an attempt to describe the impact of tandem repeats on gene regulation and human phenotypes. In this review, we have introduced recent research progress regarding the genomic distribution and regulatory mechanisms of tandem repeats. We have also summarized the current knowledge of the genetic architecture and biological underpinnings of psychiatric disorders brought by studies of tandem repeats. These findings suggest that tandem repeats, in candidate psychiatric risk genes or in different levels of linkage disequilibrium (LD) with psychiatric GWAS SNPs and haplotypes, may modulate biological phenotypes related to psychiatric disorders (e.g., cognitive function and brain physiology) through regulating alternative splicing, promoter activity, enhancer activity and so on. In addition, many tandem repeats undergo tight natural selection in the human lineage, and likely exert crucial roles in human brain evolution. Taken together, the putative roles of tandem repeats in the pathogenesis of psychiatric disorders are strongly implicated, and using examples from the previous literature, we wish to call for further attention to tandem repeats in the post-GWAS era of psychiatric disorders.
Affiliation(s)
- Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
| | - Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China; Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China; Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
| | - Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Tao Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China; Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China
| |
|
18
|
Voicu AA, Krützen M, Bilgin Sonay T. Short Tandem Repeats as a High-Resolution Marker for Capturing Recent Orangutan Population Evolution. Frontiers in Bioinformatics 2021; 1:695784. [PMID: 36303734 PMCID: PMC9581056 DOI: 10.3389/fbinf.2021.695784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/15/2021] [Accepted: 07/26/2021] [Indexed: 11/30/2022] Open
Abstract
The genus Pongo is ideal to study population genetics adaptation, given its remarkable phenotypic divergence and the highly contrasting environmental conditions it’s been exposed to. Studying its genetic variation bears the promise to reveal a motion picture of these great apes’ evolutionary and adaptive history, and also helps us expand our knowledge of the patterns of adaptation and evolution. In this work, we advance the understanding of the genetic variation among wild orangutans through a genome-wide study of short tandem repeats (STRs). Their elevated mutation rate makes STRs ideal markers for the study of recent evolution within a given population. Current technological and algorithmic advances have rendered their sequencing and discovery more accurate, therefore their potential can be finally leveraged in population genetics studies. To study patterns of population variation within the wild orangutan population, we genotyped the short tandem repeats in a population of 21 individuals spanning four Sumatran and Bornean (sub-) species and eight Southeast Asian regions. We studied the impact of sequencing depth on our ability to genotype STRs and found that the STR copy number changes function as a powerful marker, correctly capturing the demographic history of these populations, even the divergences as recent as 10 Kya. Moreover, gene ontology enrichments for genes close to STR variants are aligned with local adaptations in the two islands. Coupled with more advanced STR-compatible population models, and selection tests, genomic studies based on STRs will be able to reduce the gap caused by the missing heritability for species with recent adaptations.
Affiliation(s)
| | - Michael Krützen
- Department of Anthropology, University of Zurich, Zurich, Switzerland
| | - Tugce Bilgin Sonay
- Department of Anthropology, University of Zurich, Zurich, Switzerland
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY, United States
- *Correspondence: Tugce Bilgin Sonay,
| |
|
19
|
Ahmed Z, Renart EG, Zeeshan S. Genomics pipelines to investigate susceptibility in whole genome and exome sequenced data for variant discovery, annotation, prediction and genotyping. PeerJ 2021; 9:e11724. [PMID: 34395068 PMCID: PMC8320519 DOI: 10.7717/peerj.11724] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Accepted: 06/14/2021] [Indexed: 12/12/2022] Open
Abstract
Over the last few decades, genomics has been leading toward an audacious future, and has been changing our views about conducting biomedical research, studying diseases, and understanding diversity in our society across the human species. Whole genome and exome sequencing (WGS/WES) are two of the most popular next-generation sequencing (NGS) methodologies currently used to detect genetic variations of clinical significance. Investigating WGS/WES data for variant discovery and genotyping is based on the nexus of different data analytic applications. Although several bioinformatics applications have been developed, and many of those are freely available and published, timely finding and interpreting genetic variants are still challenging tasks among diagnostic laboratories and clinicians. In this study, we are interested in understanding, evaluating, and reporting the current state of solutions available to process NGS data of variable lengths and types for the identification of variants, alleles, and haplotypes. Within this scope, we consulted high-quality peer-reviewed literature published in the last 10 years. We focused on the standalone and networked bioinformatics applications proposed to efficiently process WGS and WES data, and support downstream analysis for gene-variant discovery, annotation, prediction, and interpretation. We have discussed our findings in this manuscript, which include but are not limited to the set of operations, workflow, data handling, involved tools, technologies and algorithms, and limitations of the assessed applications.
Affiliation(s)
- Zeeshan Ahmed
- Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA; Department of Medicine, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
| | - Eduard Gibert Renart
- Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
| | - Saman Zeeshan
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
| |
|
20
|
Pilav A, Pojskić N, Kalajdžić A, Ahatović A, Džehverović M, Čakar J. Analysis of forensic genetic parameters of 22 autosomal STR markers (PowerPlex® Fusion System) in a population sample from Bosnia and Herzegovina. Ann Hum Biol 2020; 47:273-283. [PMID: 32299246 DOI: 10.1080/03014460.2020.1740319] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
Abstract
Background: Bosnia and Herzegovina is a multinational and multireligious country, located in the western part of the Balkan Peninsula. Migrations through history were a key factor in the genetic identity of the Bosnian-Herzegovinian population. Aim: To analyse genetic polymorphisms of 22 autosomal short tandem repeat (STR) loci in the population of Bosnia and Herzegovina and to compare STR allele frequencies for STR loci with the reference data for European populations. Subjects and methods: The study was conducted among 600 unrelated individuals from all regions of Bosnia and Herzegovina. Genotyping was performed using the PowerPlex® Fusion amplification kit. Allele frequencies and statistical parameters were calculated, as well as the genetic distance among analysed populations through the construction of a neighbor-joining dendrogram. Results: STR loci included in the PowerPlex® Fusion amplification kit showed high discriminatory power indicating their reliability for human identification and paternity testing. The neighbor-joining dendrogram based on the results of genetic distance analysis showed that the Bosnian and Herzegovinian population has the greatest genetic distance from Turkish and Hungarian populations and greatest similarity with Croatian, Slovenian, and Serbian populations. Conclusion: The results of this study strongly support the application of 22 autosomal genetic markers for paternity testing and personal identity testing and are in agreement with most previous human studies in the investigated human populations.
Affiliation(s)
- Amela Pilav
- Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Bosnia and Herzegovina
| | - Naris Pojskić
- Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Bosnia and Herzegovina
| | - Abdurahim Kalajdžić
- Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Bosnia and Herzegovina
| | - Anesa Ahatović
- Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Bosnia and Herzegovina
| | - Mirela Džehverović
- Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Bosnia and Herzegovina
| | - Jasmina Čakar
- Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Bosnia and Herzegovina
| |
|
21
|
Pelassa I, Cibelli M, Villeri V, Lilliu E, Vaglietti S, Olocco F, Ghirardi M, Montarolo PG, Corà D, Fiumara F. Compound Dynamics and Combinatorial Patterns of Amino Acid Repeats Encode a System of Evolutionary and Developmental Markers. Genome Biol Evol 2020; 11:3159-3178. [PMID: 31589292 PMCID: PMC6839033 DOI: 10.1093/gbe/evz216] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 09/27/2019] [Indexed: 01/05/2023] Open
Abstract
Homopolymeric amino acid repeats (AARs) like polyalanine (polyA) and polyglutamine (polyQ) in some developmental proteins (DPs) regulate certain aspects of organismal morphology and behavior, suggesting an evolutionary role for AARs as developmental "tuning knobs." It is still unclear, however, whether these are occasional protein-specific phenomena or hints at the existence of a whole AAR-based regulatory system in DPs. Using novel approaches to trace their functional and evolutionary history, we find quantitative evidence supporting a generalized, combinatorial role of AARs in developmental processes with evolutionary implications. We observe nonrandom AAR distributions and combinations in HOX and other DPs, as well as in their interactomes, defining elements of a proteome-wide combinatorial functional code whereby different AARs and their combinations appear preferentially in proteins involved in the development of specific organs/systems. Such functional associations can be either static or display detectable evolutionary dynamics. These findings suggest that progressive changes in AAR occurrence/combination, by altering embryonic development, may have contributed to taxonomic divergence, leaving detectable traces in the evolutionary history of proteomes. Consistent with this hypothesis, we find that the evolutionary trajectories of the 20 AARs in eukaryotic proteomes are highly interrelated and their individual or compound dynamics can sharply mark taxonomic boundaries, or display clock-like trends, carrying overall a strong phylogenetic signal. These findings provide quantitative evidence and an interpretive framework outlining a combinatorial system of AARs whose compound dynamics mark at the same time DP functions and evolutionary transitions.
Affiliation(s)
- Ilaria Pelassa
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Marica Cibelli
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Veronica Villeri
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Elena Lilliu
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Serena Vaglietti
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Federica Olocco
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Mirella Ghirardi
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy; National Institute of Neuroscience (INN), Torino, Italy
| | - Pier Giorgio Montarolo
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy; National Institute of Neuroscience (INN), Torino, Italy
| | - Davide Corà
- Department of Translational Medicine, Piemonte Orientale University, Novara, Italy; Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Novara, Italy
| | - Ferdinando Fiumara
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy; National Institute of Neuroscience (INN), Torino, Italy
| |
|
22
|
Abstract
Individuals within a species can exhibit vast variation in copy number of repetitive DNA elements. This variation may contribute to complex traits such as lifespan and disease, yet it is only infrequently considered in genotype-phenotype associations. Although the possible importance of copy number variation is widely recognized, accurate copy number quantification remains challenging. Here, we assess the technical reproducibility of several major methods for copy number estimation as they apply to the large repetitive ribosomal DNA array (rDNA). rDNA encodes the ribosomal RNAs and exists as a tandem gene array in all eukaryotes. Repeat units of rDNA are kilobases in size, often with several hundred units comprising the array, making rDNA particularly intractable to common quantification techniques. We evaluate pulsed-field gel electrophoresis, droplet digital PCR, and Nextera-based whole genome sequencing as approaches to copy number estimation, comparing techniques across model organisms and spanning wide ranges of copy numbers. Nextera-based whole genome sequencing, though commonly used in recent literature, produced high error. We explore possible causes for this error and provide recommendations for best practices in rDNA copy number estimation. We present a resource of high-confidence rDNA copy number estimates for a set of S. cerevisiae and C. elegans strains for future use. We furthermore explore the possibility for FISH-based copy number estimation, an alternative that could potentially characterize copy number on a cellular level.
|
23
|
Chaley M, Kutyrkin V. Stochastic models for description of structural-statistical properties in DNA sequences. J Theor Biol 2019; 496:110126. [PMID: 31866393 DOI: 10.1016/j.jtbi.2019.110126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2019] [Revised: 12/02/2019] [Accepted: 12/18/2019] [Indexed: 10/25/2022]
Abstract
New stochastic models based on the notion of a stochastic codon are proposed. These models, represented by special random strings, describe practical structural-statistical properties that are peculiar to coding DNA from both prokaryotic and eukaryotic genomes. In such a case, coding regions are considered as realizations of random strings. The models introduced explain the existence of latent profile periodicity in the coding regions with a period that is not only equal to three but may also be a multiple of three. For sequences with a latent profile period that is a multiple of three, but not equal to three, the proposed models ensure the existence of a special property of 3-regularity in these sequences, which is recognized in practically all coding sequences of the genomes analyzed. The feasibility of the proposed stochastic models was tested in numerical experiments with binary re-encoded paragraphs of literary texts (in English and Italian), used as analogs of DNA coding regions.
Affiliation(s)
- Maria Chaley
- Institute of Mathematical Problems of Biology RAS - Branch of Keldysh Institute of Applied Mathematics RAS, Professor Vitkevich St.,1, 142290 Pushchino, Russia.
| | - Vladimir Kutyrkin
- Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st.,5, 105005 Moscow, Russia.
| |
|
24
|
Kinney N, Kang L, Eckstrand L, Pulenthiran A, Samuel P, Anandakrishnan R, Varghese RT, Michalak P, Garner HR. Abundance of ethnically biased microsatellites in human gene regions. PLoS One 2019; 14:e0225216. [PMID: 31830051 PMCID: PMC6907796 DOI: 10.1371/journal.pone.0225216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/30/2019] [Accepted: 10/29/2019] [Indexed: 12/16/2022] Open
Abstract
Microsatellites, a type of short tandem repeat (STR), have been used for decades as putatively neutral markers to study the genetic structure of diverse human populations. However, recent studies have demonstrated that some microsatellites contribute to gene expression, cis heritability, and phenotype. As a corollary, some microsatellites may contribute to differential gene expression and RNA/protein structure stability in distinct human populations. To test this hypothesis, we investigate genotype frequencies, functional relevance, and adaptive potential of microsatellites in five super-populations (ethnicities) drawn from the 1000 Genomes Project. We discover 3,984 ethnically-biased microsatellite loci (EBML); for each EBML at least one ethnicity has genotype frequencies statistically different from the remaining four. South Asian, East Asian, European, and American EBML show significant overlap; on the contrary, the set of African EBML is mostly unique. We cross-reference the 3,984 EBML with 2,060 previously identified expression STRs (eSTRs); repeats known to affect gene expression (64 total) are over-represented. The most significant pathway enrichments are those associated with the matrisome: a broad collection of genes encoding the extracellular matrix and its associated proteins. At least 14 of the EBML have established links to human disease. Analysis of the 3,984 EBML with respect to known selective sweep regions in the genome shows that allelic variation in some of them is likely associated with adaptive evolution.
Affiliation(s)
- Nick Kinney
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
- Lin Kang
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
- Laurel Eckstrand
- Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States of America
- Arichanah Pulenthiran
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Peter Samuel
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Ramu Anandakrishnan
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Robin T. Varghese
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- P. Michalak
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States of America
- Institute of Evolution, University of Haifa, Haifa, Israel
- Harold R. Garner
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
25
Dolzhenko E, Deshpande V, Schlesinger F, Krusche P, Petrovski R, Chen S, Emig-Agius D, Gross A, Narzisi G, Bowman B, Scheffler K, van Vugt JJFA, French C, Sanchis-Juan A, Ibáñez K, Tucci A, Lajoie BR, Veldink JH, Raymond FL, Taft RJ, Bentley DR, Eberle MA. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics 2019; 35:4754-4756. [PMID: 31134279 DOI: 10.1101/361162] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/10/2019] [Revised: 04/26/2019] [Accepted: 05/23/2019] [Indexed: 05/25/2023]
Abstract
SUMMARY: We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci. AVAILABILITY AND IMPLEMENTATION: ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peter Krusche
- Illumina Cambridge Ltd, Illumina Centre, 19 Granta Park, Great Abington, Cambridge CB21 6DF, UK
- Roman Petrovski
- Illumina Cambridge Ltd, Illumina Centre, 19 Granta Park, Great Abington, Cambridge CB21 6DF, UK
- Sai Chen
- Illumina Inc., San Diego, CA 92122, USA
- Giuseppe Narzisi
- Computational Biology, New York Genome Center, New York, NY 10013, USA
- Joke J F A van Vugt
- UMC Utrecht Brain Center, Utrecht University, 3508 AB Utrecht, The Netherlands
- Courtney French
- Department of Medical Genetics, NHS Blood and Transplant Centre, Cambridge, CB2 0PT, UK
- Alba Sanchis-Juan
- Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Cambridge, CB2 0PT, UK
- NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- Kristina Ibáñez
- Genomics England, Queen Mary University London, London EC1M 6BQ, UK
- Arianna Tucci
- Genomics England, Queen Mary University London, London EC1M 6BQ, UK
- Jan H Veldink
- UMC Utrecht Brain Center, Utrecht University, 3508 AB Utrecht, The Netherlands
- F Lucy Raymond
- Department of Medical Genetics, NHS Blood and Transplant Centre, Cambridge, CB2 0PT, UK
- David R Bentley
- Illumina Cambridge Ltd, Illumina Centre, 19 Granta Park, Great Abington, Cambridge CB21 6DF, UK
26
Mousavi N, Shleizer-Burko S, Yanicky R, Gymrek M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res 2019; 47:e90. [PMID: 31194863 PMCID: PMC6735967 DOI: 10.1093/nar/gkz501] [Citation(s) in RCA: 121] [Impact Index Per Article: 24.2] [Received: 03/06/2019] [Revised: 05/15/2019] [Accepted: 05/28/2019] [Indexed: 12/15/2022] Open
Abstract
Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington's Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and expanded TRs. GangSTR extracts information from paired-end reads into a unified model to estimate maximum likelihood TR lengths. We validate GangSTR on real and simulated data and show that GangSTR outperforms alternative methods in both accuracy and speed. We apply GangSTR to a deeply sequenced trio to profile the landscape of TR expansions in a healthy family and validate novel expansions using orthogonal technologies. Our analysis reveals that healthy individuals harbor dozens of long TR alleles not captured by current genome-wide methods. GangSTR will likely enable discovery of novel disease-associated variants not currently accessible from NGS.
Affiliation(s)
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
- Sharona Shleizer-Burko
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
- Richard Yanicky
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
- Melissa Gymrek
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
27
Hamm MO, Moss BL, Leydon AR, Gala HP, Lanctot A, Ramos R, Klaeser H, Lemmex AC, Zahler ML, Nemhauser JL, Wright RC. Accelerating structure-function mapping using the ViVa webtool to mine natural variation. Plant Direct 2019; 3:e00147. [PMID: 31372596 PMCID: PMC6658840 DOI: 10.1002/pld3.147] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/06/2018] [Revised: 04/20/2019] [Accepted: 04/29/2019] [Indexed: 05/13/2023]
Abstract
Thousands of sequenced genomes are now publicly available capturing a significant amount of natural variation within plant species; yet, much of these data remain inaccessible to researchers without significant bioinformatics experience. Here, we present a webtool called ViVa (Visualizing Variation) which aims to empower any researcher to take advantage of the amazing genetic resource collected in the Arabidopsis thaliana 1001 Genomes Project (http://1001genomes.org). ViVa facilitates data mining on the gene, gene family, or gene network level. To test the utility and accessibility of ViVa, we assembled a team with a range of expertise within biology and bioinformatics to analyze the natural variation within the well-studied nuclear auxin signaling pathway. Our analysis has provided further confirmation of existing knowledge and has also helped generate new hypotheses regarding this well-studied pathway. These results highlight how natural variation could be used to generate and test hypotheses about less-studied gene families and networks, especially when paired with biochemical and genetic characterization. ViVa is also readily extensible to databases of interspecific genetic variation in plants as well as other organisms, such as the 3,000 Rice Genomes Project (http://snp-seek.irri.org/) and human genetic variation (https://www.ncbi.nlm.nih.gov/clinvar/).
Affiliation(s)
- Morgan O. Hamm
- Department of Biology, University of Washington, Seattle, Washington
- Hardik P. Gala
- Department of Biology, University of Washington, Seattle, Washington
- Amy Lanctot
- Department of Biology, University of Washington, Seattle, Washington
- Román Ramos
- Department of Biology, University of Washington, Seattle, Washington
- Hannah Klaeser
- Department of Biology, Whitman College, Walla Walla, Washington
- R. Clay Wright
- Biological Systems Engineering, Virginia Tech, Blacksburg, Virginia
28
Lauschke VM, Zhou Y, Ingelman-Sundberg M. Novel genetic and epigenetic factors of importance for inter-individual differences in drug disposition, response and toxicity. Pharmacol Ther 2019; 197:122-152. [PMID: 30677473 PMCID: PMC6527860 DOI: 10.1016/j.pharmthera.2019.01.002] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Indexed: 12/17/2022]
Abstract
Individuals differ substantially in their response to pharmacological treatment. Personalized medicine aspires to embrace these inter-individual differences and customize therapy by taking a wealth of patient-specific data into account. Pharmacogenomics constitutes a cornerstone of personalized medicine that provides therapeutic guidance based on the genomic profile of a given patient. Pharmacogenomics already has applications in the clinic, particularly in oncology, whereas future development in this area is needed in order to establish pharmacogenomic biomarkers as useful clinical tools. In this review we present an updated overview of current and emerging pharmacogenomic biomarkers in different therapeutic areas and critically discuss their potential to transform clinical care. Furthermore, we discuss opportunities of technological, methodological and institutional advances to improve biomarker discovery. We also summarize recent progress in our understanding of epigenetic effects on drug disposition and response, including a discussion of the few pharmacogenomic biomarkers implemented into routine care. We anticipate, in part due to exciting rapid developments in Next Generation Sequencing technologies, machine learning methods and national biobanks, that the field will make great advances in the upcoming years towards unlocking the full potential of genomic data.
Affiliation(s)
- Volker M Lauschke
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Biomedicum 5B, Karolinska Institutet, SE-171 77 Stockholm, Sweden
- Yitian Zhou
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Biomedicum 5B, Karolinska Institutet, SE-171 77 Stockholm, Sweden
- Magnus Ingelman-Sundberg
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Biomedicum 5B, Karolinska Institutet, SE-171 77 Stockholm, Sweden.
29
Zeeshan S, Xiong R, Liang BT, Ahmed Z. 100 Years of evolving gene-disease complexities and scientific debutants. Brief Bioinform 2019; 21:885-905. [PMID: 30972412 DOI: 10.1093/bib/bbz038] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/09/2019] [Revised: 03/06/2019] [Accepted: 03/08/2019] [Indexed: 12/22/2022] Open
Abstract
It has been over 100 years since the word 'gene' came into use, and the concept has progressively evolved in several scientific directions. Recurring technological advancements have revolutionized the field of genomics, for example in triplet code development, gene number proposition, genetic mapping, data banks, gene-disease maps, catalogs of human genes and genetic disorders, CRISPR/Cas9, big data and next generation sequencing. In this manuscript, we present the progress of genomics from pea plant genetics to the human genome project and highlight the molecular, technical and computational developments. Studying the genome and epigenome has led to the fundamentals of the development and progression of human diseases, which include chromosomal, monogenic, multifactorial and mitochondrial diseases. The World Health Organization has classified and standardized human diseases and maintains that classification, while many academic and commercial online systems share information about genes and link them to associated diseases. To efficiently fathom the wealth of this biological data, there is a crucial need to generate appropriate gene annotation repositories and resources. Our focus has been on how many gene-disease databases are available worldwide and which sources are authentic, regularly updated and recommended for research and clinical purposes. In this manuscript, we have discussed and compared 43 such databases and bioinformatics applications, which enable users to connect, explore and, if possible, download gene-disease data.
Affiliation(s)
- Saman Zeeshan
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Ruoyun Xiong
- Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut Health Center, Farmington Ave, Farmington, CT, USA
- Bruce T Liang
- Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut Health Center, Farmington Ave, Farmington, CT, USA
- Pat and Jim Calhoun Cardiology Center, School of Medicine, University of Connecticut Health Center, Farmington Ave, Farmington, CT, USA
- Zeeshan Ahmed
- Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut Health Center, Farmington Ave, Farmington, CT, USA
30
Press MO, Hall AN, Morton EA, Queitsch C. Substitutions Are Boring: Some Arguments about Parallel Mutations and High Mutation Rates. Trends Genet 2019; 35:253-264. [PMID: 30797597 PMCID: PMC6435258 DOI: 10.1016/j.tig.2019.01.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 09/24/2018] [Revised: 12/20/2018] [Accepted: 01/14/2019] [Indexed: 12/31/2022]
Abstract
Extant genomes are largely shaped by global transposition, copy-number fluctuation, and rearrangement of DNA sequences rather than by substitutions of single nucleotides. Although many of these large-scale mutations have low probabilities and are unlikely to repeat, others are recurrent or predictable in their effects, leading to stereotyped genome architectures and genetic variation in both eukaryotes and prokaryotes. Such recurrent, parallel mutation modes can profoundly shape the paths taken by evolution and undermine common models of evolutionary genetics. Similar patterns are also evident at the smaller scales of individual genes or short sequences. The scale and extent of this 'non-substitution' variation has recently come into focus through the advent of new genomic technologies; however, it is still not widely considered in genotype-phenotype association studies. In this review we identify common features of these disparate mutational phenomena and comment on the importance and interpretation of these mutational patterns.
Affiliation(s)
- Ashley N Hall
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Department of Molecular and Cellular Biology, University of Washington, Seattle, WA 98195, USA
- Elizabeth A Morton
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
31
Wu X, Xu FL, Ding M, Zhang JJ, Yao J, Wang BJ. Characterization and functional analyses of the human HTR1A gene: 5' regulatory region modulates gene expression in vitro. BMC Genet 2018; 19:115. [PMID: 30594152 PMCID: PMC6311061 DOI: 10.1186/s12863-018-0708-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/09/2018] [Accepted: 12/19/2018] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND: The serotonin neurotransmitter (5-HT) and its receptors have important roles in neuropsychiatric disorders such as schizophrenia. The aim of this study was to investigate the functional sequences of the 5' regulatory region of the human HTR1A gene to explore the effects on the expression of the 5-HT1A receptor. METHODS: Fourteen recombinant pGL3-basic vectors containing deletion fragments of the HTR1A gene regulatory region were transfected into HEK-293 and SK-N-SH cells. The relative chemiluminescence intensities of different length fragments were analyzed. The JASPAR software was used for the prediction of transcription factors. RESULTS: In the HEK-293 cells, the relative chemiluminescence intensity of the -1649 bp to -1550 bp (ATG +1) fragment was significantly different. Two inhibitory activity regions were found in the -1409 bp to -1381 bp and -1196 bp to -1124 bp fragments, which might be bound by the GATA or SOX10 transcription factors as predicted by the JASPAR software. In addition, the fragments located from -1124 bp to -1064 bp and from -908 bp to -722 bp up-regulated protein expression. Only the sequence from -1550 bp to -1409 bp demonstrated a difference in luciferase expression in both cell lines. According to the results of the 5'-UTR truncated vectors, there was a repression region at the distal end of the 5'-UTR, and an enhancer region might be present at the proximal end of the transcription start site. CONCLUSIONS: Although the functional sequences of the HTR1A gene regulatory region were confirmed, the regulatory factors and functional components require further investigation.
Affiliation(s)
- Xue Wu
- School of Forensic Medicine, China Medical University, No. 77 Puhe Road, Shenbei New District, Shenyang, 110122, China
- Feng-Ling Xu
- School of Forensic Medicine, China Medical University, No. 77 Puhe Road, Shenbei New District, Shenyang, 110122, China
- Mei Ding
- School of Forensic Medicine, China Medical University, No. 77 Puhe Road, Shenbei New District, Shenyang, 110122, China
- Jing-Jing Zhang
- School of Forensic Medicine, China Medical University, No. 77 Puhe Road, Shenbei New District, Shenyang, 110122, China
- Jun Yao
- School of Forensic Medicine, China Medical University, No. 77 Puhe Road, Shenbei New District, Shenyang, 110122, China.
- Bao-Jie Wang
- School of Forensic Medicine, China Medical University, No. 77 Puhe Road, Shenbei New District, Shenyang, 110122, China.
32
Abstract
Thalassemia is an inherited autosomal recessive disorder with microcytic hypochromic anemia resulting from reduced or absent synthesis of 1 or more of the globin chains of hemoglobin. This study provides insight into the prevalence and molecular characterization of thalassemia in the Hakka population. A total of 14,524 unrelated subjects were included in our study from January 2015 to November 2017. All subjects were tested by hematological analysis, hemoglobin electrophoresis analysis, and molecular diagnosis (gap-polymerase chain reaction and flow-through hybridization technology). Data analysis was used to compare allele frequencies between the Hakka populations. Seven thousand four hundred twenty-two cases of microcytosis were found. The percentage of microcytosis in Meizhou, Ganzhou, and Heyuan was 50.91% (6738/13,236), 51.27% (445/868), and 56.90% (239/420), respectively. A total of 5516 mutant chromosomes were identified, including 3775 α-thalassemia and 1741 β-thalassemia. --/αα was the most common α-thalassemia genotype, followed by -α/αα and -α/αα; together these accounted for 84.92% of α-thalassemia genotypes. Twelve kinds of mutations and 26 genotypes in β-thalassemia were found. IVS-II-654(C→T), CD41-42(-TCTT), -28(A→G), and CD17(A→T) alleles accounted for 92.65% of these mutations. IVS-II-654/N, CD41-42/N, -28/N, and CD17/N genotypes accounted for 91.53% of β-thalassemia genotypes. Twenty-seven fetuses from at-risk pregnancies were subjected to prenatal diagnosis. Five fetuses had Bart's hydrops syndrome and 2 had β-thalassemia major. There were some differences in the molecular characterization of thalassemia among Hakka people in different areas of southern China. Our results enrich the available information on thalassemia in the region and provide valuable references for the prevention and control of thalassemia.
Affiliation(s)
- Pingsen Zhao
- Clinical Core Laboratory
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Academy of Medical Sciences, Meizhou Hospital Affiliated to Sun Yat-sen University
- Guangdong Provincial Engineering and Technology Research Center for Molecular Diagnostics of Cardiovascular Diseases
- Meizhou Municipal Engineering and Technology Research Center for Molecular Diagnostics of Cardiovascular Diseases
- Meizhou Municipal Engineering and Technology Research Center for Molecular Diagnostics of Major Genetic Disorders
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Academy of Medical Sciences, Meizhou Hospital Affiliated to Sun Yat-sen University, Meizhou, P. R. China
- Heming Wu
- Clinical Core Laboratory
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Academy of Medical Sciences, Meizhou Hospital Affiliated to Sun Yat-sen University
- Guangdong Provincial Engineering and Technology Research Center for Molecular Diagnostics of Cardiovascular Diseases
- Meizhou Municipal Engineering and Technology Research Center for Molecular Diagnostics of Cardiovascular Diseases
- Meizhou Municipal Engineering and Technology Research Center for Molecular Diagnostics of Major Genetic Disorders
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Academy of Medical Sciences, Meizhou Hospital Affiliated to Sun Yat-sen University, Meizhou, P. R. China
- Ruiqiang Weng
- Clinical Core Laboratory
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Academy of Medical Sciences, Meizhou Hospital Affiliated to Sun Yat-sen University
- Guangdong Provincial Engineering and Technology Research Center for Molecular Diagnostics of Cardiovascular Diseases
- Meizhou Municipal Engineering and Technology Research Center for Molecular Diagnostics of Cardiovascular Diseases
- Meizhou Municipal Engineering and Technology Research Center for Molecular Diagnostics of Major Genetic Disorders
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Academy of Medical Sciences, Meizhou Hospital Affiliated to Sun Yat-sen University, Meizhou, P. R. China
33
Arabfard M, Kavousi K, Delbari A, Ohadi M. Link between short tandem repeats and translation initiation site selection. Hum Genomics 2018; 12:47. [PMID: 30373661 PMCID: PMC6206671 DOI: 10.1186/s40246-018-0181-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/16/2018] [Accepted: 10/10/2018] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: Despite their vast biological implication, the relevance of short tandem repeats (STRs)/microsatellites to the protein-coding gene translation initiation sites (TISs) remains largely unknown. METHODS: We performed an Ensembl-based comparative genomics study of all annotated orthologous TIS-flanking sequences in human and 46 other species across vertebrates, on the genomic DNA and cDNA platforms (755,956 TISs), aimed at identifying human-specific STRs in this interval. The collected data were used to examine the hypothesis of a link between STRs and TISs. BLAST was used to compare the initial five amino acids (excluding the initial methionine), codons of which were flanked by STRs in human, with the initial five amino acids of all annotated proteins for the orthologous genes in other vertebrates (total of 5,314,979 pair-wise TIS comparisons on the genomic DNA and cDNA platforms) in order to compare the number of events in which human-specific and non-specific STRs occurred with homologous and non-homologous TISs (i.e., ≥ 50% and < 50% similarity of the five amino acids). RESULTS: We detected differential distribution of the human-specific STRs in comparison to the overall distribution of STRs on the genomic DNA and cDNA platforms (Mann-Whitney U test p = 1.4 × 10⁻¹¹ and p < 7.9 × 10⁻¹¹, respectively). We also found excess occurrence of non-homologous TISs with human-specific STRs and excess occurrence of homologous TISs with non-specific STRs on both platforms (p < 0.00001). CONCLUSION: We propose a link between STRs and TIS selection, based on the differential co-occurrence rate of human-specific STRs with non-homologous TISs and non-specific STRs with homologous TISs.
Affiliation(s)
- Masoud Arabfard
- Department of Bioinformatics, Kish International Campus University of Tehran, Kish, Iran
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Kaveh Kavousi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Ahmad Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Mina Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
34
Saini S, Mitra I, Mousavi N, Fotsing SF, Gymrek M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat Commun 2018; 9:4397. [PMID: 30353011 PMCID: PMC6199332 DOI: 10.1038/s41467-018-06694-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 03/09/2018] [Accepted: 09/18/2018] [Indexed: 12/14/2022] Open
Abstract
Short tandem repeats (STRs) are involved in dozens of Mendelian disorders and have been implicated in complex traits. However, genotyping arrays used in genome-wide association studies focus on single nucleotide polymorphisms (SNPs) and do not readily allow identification of STR associations. We leverage next-generation sequencing (NGS) from 479 families to create a SNP + STR reference haplotype panel. Our panel enables imputing STR genotypes into SNP array data when NGS is not available for directly genotyping STRs. Imputed genotypes achieve mean concordance of 97% with observed genotypes in an external dataset compared to 71% expected under a naive model. Performance varies widely across STRs, with near perfect concordance at bi-allelic STRs vs. 70% at highly polymorphic repeats. Imputation increases power over individual SNPs to detect STR associations with gene expression. Imputing STRs into existing SNP datasets will enable the first large-scale STR association studies across a range of complex traits.
Affiliation(s)
- Shubham Saini
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Ileena Mitra
- Bioinformatics and Systems Biology Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Stephanie Feupe Fotsing
- Bioinformatics and Systems Biology Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
- Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
35
Genetic structure and polymorphisms of Gelao ethnicity residing in southwest China revealed by X-chromosomal genetic markers. Sci Rep 2018; 8:14585. [PMID: 30275508 PMCID: PMC6167355 DOI: 10.1038/s41598-018-32945-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/25/2018] [Accepted: 09/19/2018] [Indexed: 01/10/2023] Open
Abstract
X-chromosome short tandem repeat markers (X-STRs), due to their special inheritance models, physical location on a single chromosome and the absence of recombination in male meiosis, play an important role in forensic and population genetics. While the genetic diversity and forensic characteristics of X-STRs are well studied for ethnically/linguistically diverse and demographically large Chinese populations, genetic evidence from Gelao ethnicity is still sparse. Here, we genotyped the first batch of 19 X-STRs in 513 Chinese Gelao individuals (265 females and 248 males), and reported genetic polymorphisms and forensic characteristics based on the single locus and seven linkage groups. DXS10135 with the highest PIC (0.9106) and LG1 (DXS10148-DXS10135-DXS8378) with the largest HD (0.9970) are polymorphic and informative. The CPDs in Gelao males and females are respectively larger than 0.999999999997095 and 0.99999999999999999999918, and the combined MECs are larger than 0.999999975715109. Subsequently, we investigated the population relationships among 14 Chinese populations based on 19 X-STRs and among 23 populations based on 11 overlapped X-STRs. Our results revealed genetic differentiations among Tibeto-Burman, Altaic and other Chinese homogenous populations, and demonstrated that Guizhou Gelao has genetically closer relationships with Han Chinese and the geographically close Guizhou Miao.
36
Press MO, McCoy RC, Hall AN, Akey JM, Queitsch C. Massive variation of short tandem repeats with functional consequences across strains of Arabidopsis thaliana. Genome Res 2018; 28:1169-1178. [PMID: 29970452 PMCID: PMC6071631 DOI: 10.1101/gr.231753.117] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/01/2017] [Accepted: 06/26/2018] [Indexed: 11/24/2022]
Abstract
Short tandem repeat (STR) mutations may comprise more than half of the mutations in eukaryotic coding DNA, yet STR variation is rarely examined as a contributor to complex traits. We assessed this contribution across a collection of 96 strains of Arabidopsis thaliana, genotyping 2046 STR loci each, using highly parallel STR sequencing with molecular inversion probes. We found that 95% of examined STRs are polymorphic, with a median of six alleles per STR across these strains. STR expansions (large copy number increases) are found in most strains, several of which have evident functional effects. These include three of six intronic STR expansions we found to be associated with intron retention. Coding STRs were depleted of variation relative to noncoding STRs, and we detected a total of 56 coding STRs (11%) showing low variation consistent with the action of purifying selection. In contrast, some STRs show hypervariable patterns consistent with diversifying selection. Finally, we detected 133 novel STR-phenotype associations under stringent criteria, most of which could not be detected with SNPs alone, and validated some with follow-up experiments. Our results support the conclusion that STRs constitute a large, unascertained reservoir of functionally relevant genomic variation.
Affiliation(s)
- Maximilian O Press
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Rajiv C McCoy
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Ashley N Hall
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
| | - Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
37
|
Yu C, Baune BT, Wong ML, Licinio J. Investigation of short tandem repeats in major depression using whole-genome sequencing data. J Affect Disord 2018; 232:305-309. [PMID: 29501989 DOI: 10.1016/j.jad.2018.02.046] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/08/2017] [Revised: 01/02/2018] [Accepted: 02/16/2018] [Indexed: 02/06/2023]
Abstract
BACKGROUND Major depressive disorder (MDD) is a leading contributor to global disease burden. Recent studies have shown that genetic factors play significant roles in the susceptibility to this condition; however, the underlying genetic basis currently remains largely unknown. Short tandem repeat (STR) has been proposed as an explanatory factor in the "missing heritability" of complex diseases or traits. METHODS We investigated STR variations from 15 MDD patients and 10 ethnically matched healthy controls based on their deep whole-genome sequencing (WGS) data. The lobSTR software was used to computationally determine STRs. RESULTS The results of the Mexican-American sample showed that STRs are significantly richer in healthy controls than in MDD cases on each of the 23 chromosomes (all false discovery rates, FDR P-values < 0.0062); while for the Australian of European-ancestry sample, there was no statistically significant STRs difference between MDD cases and controls. LIMITATIONS High quality WGS costs limited obtaining larger datasets. CONCLUSIONS This preliminary work is the first study that STR variations are applied to investigate MDD based on WGS data. The results on Mexican-American population may imply that within the same ancestry, targeted sequencing on a specific chromosome or region of genome would be sufficient for examining the relationship between STR and MDD. Further studies should examine larger sequencing datasets on other ethnic groups.
Affiliation(s)
- Chenglong Yu
- Robinson Research Institute, Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; Mind and Brain Theme, South Australian Health and Medical Research Institute, North Terrace, Adelaide, SA 5000, Australia; School of Medicine, Faculty of Medicine, Nursing and Health Sciences, Flinders University, Bedford Park, SA 5042, Australia.
| | - Bernhard T Baune
- Discipline of Psychiatry, Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia
| | - Ma-Li Wong
- Mind and Brain Theme, South Australian Health and Medical Research Institute, North Terrace, Adelaide, SA 5000, Australia; School of Medicine, Faculty of Medicine, Nursing and Health Sciences, Flinders University, Bedford Park, SA 5042, Australia; Department of Psychiatry, College of Medicine, State University of New York, Upstate Medical University, Syracuse, NY 13210, USA
| | - Julio Licinio
- Department of Psychiatry, College of Medicine, State University of New York, Upstate Medical University, Syracuse, NY 13210, USA; Departments of Pharmacology and Medicine, College of Medicine, State University of New York, Upstate Medical University, Syracuse, NY 13210, USA
|
38
|
Nazaripanah N, Adelirad F, Delbari A, Sahaf R, Abbasi-Asl T, Ohadi M. Genome-scale portrait and evolutionary significance of human-specific core promoter tri- and tetranucleotide short tandem repeats. Hum Genomics 2018; 12:17. [PMID: 29622039 PMCID: PMC5887250 DOI: 10.1186/s40246-018-0149-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/19/2017] [Accepted: 03/20/2018] [Indexed: 03/05/2023] Open
Abstract
BACKGROUND While there is an ongoing trend to identify single nucleotide substitutions (SNSs) that are linked to inter/intra-species differences and disease phenotypes, short tandem repeats (STRs)/microsatellites may be of equal (if not more) importance in the above processes. Genes that contain STRs in their promoters have higher expression divergence compared to genes with fixed or no STRs in the gene promoters. In line with the above, recent reports indicate a role of repetitive sequences in the rise of young transcription start sites (TSSs) in human evolution. RESULTS Following a comparative genomics study of all human protein-coding genes annotated in the GeneCards database, here we provide a genome-scale portrait of human-specific short- and medium-size (≥ 3-repeats) tri- and tetranucleotide STRs and STR motifs in the critical core promoter region between -120 and +1 to the TSS and evidence of skewing of this compartment in reference to the STRs that are not human-specific (Levene's test p < 0.001). Twenty-five percent and 26% enrichment of human-specific transcripts was detected in the tri and tetra human-specific compartments (mid-p < 0.00002 and mid-p < 0.002, respectively). CONCLUSION Our findings provide the first evidence of genome-scale skewing of STRs at a specific region of the human genome and a link between a number of these STRs and TSS selection/transcript specificity. The STRs and genes listed here may have a role in the evolution and development of characteristics and phenotypes that are unique to the human species.
Affiliation(s)
- N Nazaripanah
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - F Adelirad
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - A Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - R Sahaf
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - T Abbasi-Asl
- Department of Biostatistics, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - M Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
|
39
|
Zhao P, Wu H, Zhong Z, Lan L, Zeng M, Lin H, Wang H, Zheng Z, Su L, Guo W. Molecular prenatal diagnosis of alpha and beta thalassemia in pregnant Hakka women in southern China. J Clin Lab Anal 2018; 32:e22306. [PMID: 28771834 PMCID: PMC6816879 DOI: 10.1002/jcla.22306] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/01/2017] [Accepted: 07/11/2017] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND To date, there has been no systematic study of DNA-based prenatal diagnosis of thalassemia in pregnant Hakka women in southern China. METHODS A total of 279 pregnant Hakka women with confirmed cases of thalassemia who had been treated at the Meizhou People's Hospital in China's Guangdong Province from January 2014 to December 2016 were here enrolled. Genomic DNA was extracted from peripheral blood of couples and villus, amniotic fluid, or fetal cord blood. DNA-based diagnosis was performed on the tissues of fetuses whose parents had tested positive for α- and β-globin gene mutations were found using polymerase chain reaction (PCR) and flow-through hybridization technique. Follow-up visits were performed 6 months after the fetuses were born. Prenatal diagnosis was performed on 279 fetuses in at-risk pregnancies. RESULTS Here, 211 α-thalassemia fetuses were confirmed, including 41 (19.43%) that tested positive for Bart's hydrops syndrome and 15 (7.11%) for Hb H disease. There were 103 (48.81%) heterozygotes. β-thalassemia was confirmed in 68 fetuses, including 23 (33.82%) with severe thalassemia and 27 (39.71%) heterozygotes. Another 12 cases were confirmed with α+β-thalassemia, including three cases of severe β-thalassemia. DNA-based testing prenatal diagnosis of thalassemia was found to be highly reliable. CONCLUSIONS Our findings provide key information for clinical genetic counseling of prenatal diagnosis for major thalassemia in pregnant Hakka women in southern China.
Affiliation(s)
- Pingsen Zhao
- Clinical Core Laboratory, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Heming Wu
- Clinical Core Laboratory, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Zhixiong Zhong
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Liubing Lan
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Department of Obstetrics, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Mei Zeng
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Department of Obstetrics, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Hualan Lin
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Department of Obstetrics, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Huaxian Wang
- Clinical Core Laboratory, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Zhiyuan Zheng
- Clinical Core Laboratory, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Luxian Su
- Clinical Core Laboratory, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
| | - Wei Guo
- Clinical Core Laboratory, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Center for Precision Medicine, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
- Prenatal Diagnosis Center, Meizhou People's Hospital (Huangtang Hospital), Meizhou Hospital Affiliated to Sun Yat‐sen University, Meizhou, China
|
40
|
Zavodna M, Bagshaw A, Brauning R, Gemmell NJ. The effects of transcription and recombination on mutational dynamics of short tandem repeats. Nucleic Acids Res 2018; 46:1321-1330. [PMID: 29300948 PMCID: PMC5814968 DOI: 10.1093/nar/gkx1253] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/05/2017] [Revised: 11/27/2017] [Accepted: 12/27/2017] [Indexed: 01/07/2023] Open
Abstract
Short tandem repeats (STR) are ubiquitous components of the genomic architecture of most living organisms. Recent work has highlighted the widespread functional significance of such repeats, particularly around gene regulation, but the mutational processes underlying the evolution of these highly abundant and highly variable sequences are not fully understood. Traditional models assume that strand misalignment during replication is the predominant mechanism, but empirical data suggest the involvement of other processes including recombination and transcription. Despite this evidence, the relative influences of these processes have not previously been tested experimentally on a genome-wide scale. Using deep sequencing, we identify mutations at >200 microsatellites, across 700 generations in replicated populations of two otherwise identical sexual and asexual Saccharomyces cerevisiae strains. Using generalized linear models, we investigate correlates of STR mutability including the nature of the mutation, STR composition and contextual factors including recombination, transcription and replication origins. Sexual capability was not a significant predictor of microsatellite mutability, but, intriguingly, we identify transcription as a significant positive predictor. We also find that STR density is substantially increased in regions neighboring, but not within, recombination hotspots.
Affiliation(s)
- Monika Zavodna
- Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
| | - Andrew Bagshaw
- Department of Pathology, University of Otago, Christchurch 8140, New Zealand
| | - Rudiger Brauning
- AgResearch Limited, Invermay Agricultural Centre, Mosgiel, New Zealand
| | - Neil J Gemmell
- Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin 9054, New Zealand
|
41
|
Abstract
Accumulating evidence suggests that many classes of DNA repeats exhibit attributes that distinguish them from other genetic variants, including the fact that they are more liable to mutation; this enables them to mediate genetic plasticity. The expansion of tandem repeats, particularly of short tandem repeats, can cause a range of disorders (including Huntington disease, various ataxias, motor neuron disease, frontotemporal dementia, fragile X syndrome and other neurological disorders), and emerging data suggest that tandem repeat polymorphisms (TRPs) can also regulate gene expression in healthy individuals. TRPs in human genomes may also contribute to the missing heritability of polygenic disorders. A better understanding of tandem repeats and their associated repeatome, as well as their capacity for genetic plasticity via both germline and somatic mutations, is needed to transform our understanding of the role of TRPs in health and disease.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne; Department of Anatomy and Neuroscience, University of Melbourne, Parkville, Victoria, Australia
|
42
|
Skewing of the genetic architecture at the ZMYM3 human-specific 5' UTR short tandem repeat in schizophrenia. Mol Genet Genomics 2018; 293:747-752. [PMID: 29332164 DOI: 10.1007/s00438-018-1415-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/17/2017] [Accepted: 01/02/2018] [Indexed: 02/06/2023]
Abstract
Differential expansion of a number of human short tandem repeats (STRs) at the critical core promoter and 5' untranslated region (UTR) support the hypothesis that at least some of these STRs may provide a selective advantage in human evolution. Following a genome-wide screen of all human protein-coding gene 5' UTRs based on the Ensembl database (http://www.ensembl.org), we previously reported that the longest STR in this interval is a (GA)32, which belongs to the X-linked zinc finger MYM-type containing 3 (ZMYM3) gene. In the present study, we analyzed the evolutionary implication of this region across evolution and examined the allele and genotype distribution of the "exceptionally long" STR by direct sequencing of 486 Iranian unrelated male subjects consisting of 196 cases of schizophrenia (SCZ) and 290 controls. We found that the ZMYM3 transcript containing the STR is human-specific (ENST00000373998.5). A significant allele variance difference was observed between the cases and controls (Levene's test for equality of variances F = 4.00, p < 0.03). In addition, six alleles were observed in the SCZ patients that were not detected in the control group ("disease-only" alleles) (mid p exact < 0.0003). Those alleles were at the extreme short and long ends of the allele distribution curve and composed 4% of the genotypes in the SCZ group. In conclusion, we found skewing of the genetic architecture at the ZMYM3 STR in SCZ. Further, we found a bell-shaped distribution of alleles and selection against alleles at the extreme ends of this STR. The ZMYM3 STR sets a prototype, the evolutionary course of which determines the range of alleles in a particular species. Extreme "disease-only" alleles and genotypes may change our perspective of adaptive evolution and complex disorders. The ZMYM3 gene "exceptionally long" STR should be sequenced in SCZ and other human-specific phenotypes/characteristics.
|
43
|
Xu L, Haasl RJ, Sun J, Zhou Y, Bickhart DM, Li J, Song J, Sonstegard TS, Van Tassell CP, Lewin HA, Liu GE. Systematic Profiling of Short Tandem Repeats in the Cattle Genome. Genome Biol Evol 2018; 9:20-31. [PMID: 28172841 PMCID: PMC5381564 DOI: 10.1093/gbe/evw256] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 10/21/2016] [Indexed: 12/13/2022] Open
Abstract
Short tandem repeats (STRs), or microsatellites, are genetic variants with repetitive 2–6 base pair motifs in many mammalian genomes. Using high-throughput sequencing and experimental validations, we systematically profiled STRs in five Holsteins. We identified a total of 60,106 microsatellites and generated the first high-resolution STR map, representing a substantial pool of polymorphism in dairy cattle. We observed significant STRs overlap with functional genes and quantitative trait loci (QTL). We performed evolutionary and population genetic analyses using over 20,000 common dinucleotide STRs. Besides corroborating the well-established positive correlation between allele size and variance in allele size, these analyses also identified dozens of outlier STRs based on two anomalous relationships that counter expected characteristics of neutral evolution. And one STR locus overlaps with a significant region of a summary statistic designed to detect STR-related selection. Additionally, our results showed that only 57.1% of STRs located within SNP-based linkage disequilibrium (LD) blocks whereas the other 42.9% were out of blocks. Therefore, a substantial number of STRs are not tagged by SNPs in the cattle genome, likely due to STR's distinct mutation mechanism and elevated polymorphism. This study provides the foundation for future STR-based studies of cattle genome evolution and selection.
Affiliation(s)
- Lingyang Xu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China; Department of Animal and Avian Sciences, University of Maryland, College Park, MD
| | - Ryan J Haasl
- Department of Biology, University of Wisconsin - Platteville, WI
| | - Jiajie Sun
- College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Yang Zhou
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD; College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi, China
| | - Derek M Bickhart
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD
| | - Junya Li
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD
| | - Tad S Sonstegard
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD
| | - Curtis P Van Tassell
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD
| | - Harris A Lewin
- Department of Evolution and Ecology, University of California, Davis, CA
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD
|
44
|
A microsatellite repeat in PCA3 long non-coding RNA is associated with prostate cancer risk and aggressiveness. Sci Rep 2017; 7:16862. [PMID: 29203868 PMCID: PMC5715103 DOI: 10.1038/s41598-017-16700-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/15/2017] [Accepted: 11/10/2017] [Indexed: 01/08/2023] Open
Abstract
Short tandem repeats (STRs) are repetitive sequences of a polymorphic stretch of two to six nucleotides. We hypothesized that STRs are associated with prostate cancer development and/or progression. We undertook RNA sequencing analysis of prostate tumors and adjacent non-malignant cells to identify polymorphic STRs that are readily expressed in these cells. Most of the expressed STRs in the clinical samples mapped to intronic and intergenic DNA. Our analysis indicated that three of these STRs (TAAA-ACTG2, TTTTG-TRIB1, and TG-PCA3) are polymorphic and differentially expressed in prostate tumors compared to adjacent non-malignant cells. TG-PCA3 STR expression was repressed by the anti-androgen drug enzalutamide in prostate cancer cells. Genetic analysis of prostate cancer patients and healthy controls (N > 2,000) showed a significant association of the most common 11 repeat allele of TG-PCA3 STR with prostate cancer risk (OR = 1.49; 95% CI 1.11–1.99; P = 0.008). A significant association was also observed with aggressive disease (OR = 2.00; 95% CI 1.06–3.76; P = 0.031) and high mortality rates (HR = 3.0; 95% CI 1.03–8.77; P = 0.045). We propose that TG-PCA3 STR has both diagnostic and prognostic potential for prostate cancer. We provided a proof of concept to be applied to other RNA sequencing datasets to identify disease-associated STRs for future clinical exploratory studies.
|
45
|
Tang H, Kirkness EF, Lippert C, Biggs WH, Fabani M, Guzman E, Ramakrishnan S, Lavrenko V, Kakaradov B, Hou C, Hicks B, Heckerman D, Och FJ, Caskey CT, Venter JC, Telenti A. Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes. Am J Hum Genet 2017; 101:700-715. [PMID: 29100084 PMCID: PMC5673627 DOI: 10.1016/j.ajhg.2017.09.013] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Received: 05/10/2017] [Accepted: 09/15/2017] [Indexed: 12/30/2022] Open
Abstract
Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are entirely guanine or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.
Affiliation(s)
- Haibao Tang
- Human Longevity, Mountain View, CA 94041, USA
| | - Claire Hou
- Human Longevity, San Diego, CA 92121, USA
| | - Barry Hicks
- Human Longevity, Mountain View, CA 94041, USA
| | - Franz J Och
- Human Longevity, Mountain View, CA 94041, USA
|
46
|
Gymrek M, Willems T, Reich D, Erlich Y. Interpreting short tandem repeat variations in humans using mutational constraint. Nat Genet 2017; 49:1495-1501. [PMID: 28892063 PMCID: PMC5679271 DOI: 10.1038/ng.3952] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 05/01/2017] [Accepted: 08/14/2017] [Indexed: 12/19/2022]
Abstract
Identifying regions of the genome that are depleted of mutations can distinguish potentially deleterious variants. Short tandem repeats (STRs), also known as microsatellites, are among the largest contributors of de novo mutations in humans. However, per-locus studies of STR mutations have been limited to highly ascertained panels of several dozen loci. Here we harnessed bioinformatics tools and a novel analytical framework to estimate mutation parameters for each STR in the human genome by correlating STR genotypes with local sequence heterozygosity. We applied our method to obtain robust estimates of the impact of local sequence features on mutation parameters and used these estimates to create a framework for measuring constraint at STRs by comparing observed versus expected mutation rates. Constraint scores identified known pathogenic variants with early-onset effects. Our metric will provide a valuable tool for prioritizing pathogenic STRs in medical genetics studies.
Affiliation(s)
- Melissa Gymrek
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- New York Genome Center, New York, NY, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Thomas Willems
- New York Genome Center, New York, NY, USA
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - David Reich
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
| | - Yaniv Erlich
- New York Genome Center, New York, NY, USA
- Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY, USA
|
47
|
Bagshaw AT. Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 2017; 9:2428-2443. [PMID: 28957459 PMCID: PMC5622345 DOI: 10.1093/gbe/evx164] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Accepted: 08/23/2017] [Indexed: 02/06/2023] Open
Abstract
Microsatellite repeat DNA is best known for its length mutability, which is implicated in several neurological diseases and cancers, and often exploited as a genetic marker. Less well-known is the body of work exploring the widespread and surprisingly diverse functional roles of microsatellites. Recently, emerging evidence includes the finding that normal microsatellite polymorphism contributes substantially to the heritability of human gene expression on a genome-wide scale, calling attention to the task of elucidating the mechanisms involved. At present, these are underexplored, but several themes have emerged. I review evidence demonstrating roles for microsatellites in modulation of transcription factor binding, spacing between promoter elements, enhancers, cytosine methylation, alternative splicing, mRNA stability, selection of transcription start and termination sites, unusual structural conformations, nucleosome positioning and modification, higher order chromatin structure, noncoding RNA, and meiotic recombination hot spots.
48
Prentice MB, Bowman J, Lalor JL, McKay MM, Thomson LA, Watt CM, McAdam AG, Murray DL, Wilson PJ. Signatures of selection in mammalian clock genes with coding trinucleotide repeats: Implications for studying the genomics of high-pace adaptation. Ecol Evol 2017; 7:7254-7276. [PMID: 28944015 PMCID: PMC5606889 DOI: 10.1002/ece3.3223] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/03/2017] [Revised: 05/31/2017] [Accepted: 06/06/2017] [Indexed: 12/14/2022] Open
Abstract
Climate change is predicted to affect the reproductive ecology of wildlife; however, we have yet to understand if and how species can adapt to the rapid pace of change. Clock genes are functional genes likely critical for adaptation to shifting seasonal conditions through shifts in timing cues. Many of these genes contain coding trinucleotide repeats, which offer the potential for higher rates of change than single nucleotide polymorphisms (SNPs) at coding sites, and, thus, may translate to faster rates of adaptation in changing environments. We characterized repeats in 22 clock genes across all annotated mammal species and evaluated the potential for selection on repeat motifs in three clock genes (NR1D1, CLOCK, and PER1) in three congeneric species pairs with different latitudinal range limits: Canada lynx and bobcat (Lynx canadensis and L. rufus), northern and southern flying squirrels (Glaucomys sabrinus and G. volans), and white‐footed and deer mouse (Peromyscus leucopus and P. maniculatus). Signatures of positive selection were found in both the interspecific comparison of Canada lynx and bobcat, and intraspecific analyses in Canada lynx. Northern and southern flying squirrels showed differing frequencies at common CLOCK alleles and a signature of balancing selection. Regional excess homozygosity was found in the deer mouse at PER1 suggesting disruptive selection, and further analyses suggested balancing selection in the white‐footed mouse. These preliminary signatures of selection and the presence of trinucleotide repeats within many clock genes warrant further consideration of the importance of candidate gene motifs for adaptation to climate change.
Affiliation(s)
- Melanie B Prentice
- Department of Environmental and Life Sciences Trent University Peterborough ON Canada
- Jeff Bowman
- Wildlife Research and Monitoring Section Ontario Ministry of Natural Resources and Forestry Peterborough ON Canada
- Michelle M McKay
- Department of Environmental and Life Sciences Trent University Peterborough ON Canada
- Cristen M Watt
- Department of Environmental and Life Sciences Trent University Peterborough ON Canada
- Andrew G McAdam
- Department of Integrative Biology University of Guelph Guelph ON Canada
- Paul J Wilson
- Biology Department Trent University Peterborough ON Canada
49
White SJ, Laros JF, Bakker E, Cambon‐Thomsen A, Eden M, Leonard S, Lochmüller H, Matthijs G, Mattocks C, Patton S, Payne K, Scheffer H, Souche E, Thomassen E, Thompson R, Traeger‐Synodinos J, Vooren S, Janssen B, den Dunnen JT. Critical points for an accurate human genome analysis. Hum Mutat 2017; 38:912-921. [DOI: 10.1002/humu.23238] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/12/2016] [Revised: 04/13/2017] [Accepted: 04/23/2017] [Indexed: 12/16/2022]
Affiliation(s)
- Stefan J. White
- Department of Human Genetics, Leiden University Medical Center The Netherlands
- Jeroen F.J. Laros
- Department of Human Genetics, Leiden University Medical Center The Netherlands
- Clinical Genetics, Leiden University Medical Center The Netherlands
- GenomeScan Leiden The Netherlands
- Egbert Bakker
- Clinical Genetics, Leiden University Medical Center The Netherlands
- Anne Cambon‐Thomsen
- Epidemiology and Public Health Analyses, Inserm and Université Toulouse III Paul Sabatier Toulouse UMR 1027 France
- Martin Eden
- Manchester Centre for Health Economics, University of Manchester Manchester UK
- Samantha Leonard
- Epidemiology and Public Health Analyses, Inserm and Université Toulouse III Paul Sabatier Toulouse UMR 1027 France
- Hanns Lochmüller
- Institute of Genetic Medicine, Newcastle University Newcastle upon Tyne UK
- Simon Patton
- Central Manchester University Hospitals Foundation Trust, EMQN Manchester UK
- Katherine Payne
- Manchester Centre for Health Economics, University of Manchester Manchester UK
- Ellen Thomassen
- Department of Human Genetics, Leiden University Medical Center The Netherlands
- Rachel Thompson
- Institute of Genetic Medicine, Newcastle University Newcastle upon Tyne UK
- Johan T. den Dunnen
- Department of Human Genetics, Leiden University Medical Center The Netherlands
- Clinical Genetics, Leiden University Medical Center The Netherlands
50
Gymrek M. A genomic view of short tandem repeats. Curr Opin Genet Dev 2017; 44:9-16. [PMID: 28213161 DOI: 10.1016/j.gde.2017.01.012] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 10/31/2016] [Accepted: 01/30/2017] [Indexed: 12/31/2022]
Abstract
Short tandem repeats (STRs) are some of the fastest mutating loci in the genome. Tools for accurately profiling STRs from high-throughput sequencing data have enabled genome-wide interrogation of more than a million STRs across hundreds of individuals. These catalogs have revealed that STRs are highly multiallelic and may contribute more de novo mutations than any other variant class. Recent studies have leveraged these catalogs to show that STRs play a widespread role in regulating gene expression and other molecular phenotypes. These analyses suggest that STRs are an underappreciated but rich reservoir of variation that likely make significant contributions to Mendelian diseases, complex traits, and cancer.
Affiliation(s)
- Melissa Gymrek
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA.
|