1
English AC, Dolzhenko E, Ziaei Jam H, McKenzie SK, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat Biotechnol 2024:10.1038/s41587-024-02225-z. [PMID: 38671154 DOI: 10.1038/s41587-024-02225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 03/28/2024] [Indexed: 04/28/2024]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
Affiliation(s)
- Adam C English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Jonghun Park
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
2
English A, Dolzhenko E, Jam HZ, Mckenzie S, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Benchmarking of small and large variants across tandem repeats. bioRxiv 2023:2023.10.29.564632. [PMID: 37961319 PMCID: PMC10634962 DOI: 10.1101/2023.10.29.564632] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 11/15/2023]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.
3
Lundström OS, Adriaan Verbiest M, Xia F, Jam HZ, Zlobec I, Anisimova M, Gymrek M. WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans. J Mol Biol 2023; 435:168260. [PMID: 37678708 DOI: 10.1016/j.jmb.2023.168260] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/27/2023] [Revised: 08/29/2023] [Accepted: 08/29/2023] [Indexed: 09/09/2023]
Abstract
Short tandem repeats (STRs) are consecutive repetitions of one to six nucleotide motifs. They are hypervariable due to the high prevalence of repeat unit insertions or deletions primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently STRs have been poorly studied since they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large genome-wide datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state of the art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project and colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.
Affiliation(s)
- Oxana Sachenkova Lundström
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden; Vildly AB, Kalmar, Sweden; Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland. https://twitter.com/merenlin
- Max Adriaan Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Feifei Xia
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland. https://twitter.com/Feifeix97
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Inti Zlobec
- Institute of Tissue Medicine and Pathology, University of Bern, Switzerland
- Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Department of Medicine, University of California San Diego, La Jolla, CA, USA.
4
Plavskin Y, de Biase MS, Schwarz RF, Siegal ML. The rate of spontaneous mutations in yeast deficient for MutSβ function. G3 (Bethesda) 2023; 13:6931805. [PMID: 36529906 PMCID: PMC9997558 DOI: 10.1093/g3journal/jkac330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Revised: 08/25/2022] [Accepted: 11/30/2022] [Indexed: 12/23/2022]
Abstract
Mutations in simple sequence repeat loci underlie many inherited disorders in humans, and are increasingly recognized as important determinants of natural phenotypic variation. In eukaryotes, mutations in these sequences are primarily repaired by the MutSβ mismatch repair complex. To better understand the role of this complex in mismatch repair and the determinants of simple sequence repeat mutation predisposition, we performed mutation accumulation in yeast strains with abrogated MutSβ function. We demonstrate that mutations in simple sequence repeat loci in the absence of mismatch repair are primarily deletions. We also show that mutations accumulate at drastically different rates in short (<8 bp) and longer repeat loci. These data lend support to a model in which the mismatch repair complex is responsible for repair primarily in longer simple sequence repeats.
Affiliation(s)
- Yevgeniy Plavskin
- Center for Genomics and Systems Biology, New York University, New York 10003, USA; Department of Biology, New York University, New York 10003, USA
- Maria Stella de Biase
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 10115, Germany; Department of Biology, Humboldt-Universität zu Berlin, Berlin 10099, Germany
- Roland F Schwarz
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 10115, Germany; Institute for Computational Cancer Biology, Center for Integrated Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne 50937, Germany; Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin 10623, Germany
- Mark L Siegal
- Center for Genomics and Systems Biology, New York University, New York 10003, USA; Department of Biology, New York University, New York 10003, USA
5
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Affiliation(s)
- Max Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Mikhail Maksimov
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Ye Jin
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Melissa Gymrek
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Tugce Bilgin Sonay
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
6
Akaishi T, Fujiwara K, Ishii T. Variable number tandem repeats of a 9-base insertion in the N-terminal domain of severe acute respiratory syndrome coronavirus 2 spike gene. Front Microbiol 2023; 13:1089399. [PMID: 36687631 PMCID: PMC9846035 DOI: 10.3389/fmicb.2022.1089399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 12/12/2022] [Indexed: 01/06/2023]
Abstract
Introduction: In 2022, the world is still struggling against the pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The pandemic has been facilitated by the intermittent emergence of variant strains, which have been explained and classified mainly by the patterns of point mutations in the spike (S) gene. However, the profiles of insertions/deletions (indels) in SARS-CoV-2 genomes during the pandemic remain largely unevaluated. Methods: In this study, we first screened for genome regions containing polymorphic indel sites by performing multiple sequence alignment; we then performed NCBI BLAST and GISAID database searches to comprehensively investigate the indel profiles at the polymorphic indel hotspot and to elucidate the emergence and spread of the indels over time and geography. Results: A polymorphic indel hotspot was identified in the N-terminal domain of the S gene at approximately nucleotide position 22,200, corresponding to amino acid positions 210-215 of the SARS-CoV-2 S protein. This polymorphic hotspot comprised an adjacent 3-base deletion (5'-ATT-3'; Spike_N211del) and 9-base insertion (5'-AGCCAGAAG-3'; Spike_ins214EPE). Through the NCBI BLAST and GISAID database searches, we identified several types of tandem repeats of the 9-base insertion, creating an 18-base insertion (Spike_ins214EPEEPE, Spike_ins214EPDEPE). The search results suggested that the two-cycle tandem repeats of the 9-base insertion arose in November 2021 in Central Europe, whereas the original one-cycle 9-base insertion (Spike_ins214EPE) dates back to the middle of 2020 and emerged outside Central Europe. The identified 18-base insertions based on the two-cycle tandem repeat of the 9-base insertion were collected between November 2021 and April 2022, suggesting that these mutations could not survive and have already been eliminated.
Discussion: The GISAID database search implied that this polymorphic indel hotspot has one of the highest tolerances for incorporating indels in the SARS-CoV-2 S gene. In summary, the present study identified a variable number tandem repeat of a 9-base insertion in the N-terminal domain of the SARS-CoV-2 S gene, and the repeat could have occurred at a different time from the original 9-base insertion.
Affiliation(s)
- Tetsuya Akaishi
- Department of Education and Support for Regional Medicine, Tohoku University, Sendai, Japan; COVID-19 Testing Center, Tohoku University, Sendai, Japan. Correspondence: Tetsuya Akaishi
- Kei Fujiwara
- Department of Gastroenterology and Metabolism, Nagoya City University, Nagoya, Japan
- Tadashi Ishii
- Department of Education and Support for Regional Medicine, Tohoku University, Sendai, Japan; COVID-19 Testing Center, Tohoku University, Sendai, Japan
7
Global abundance of short tandem repeats is non-random in rodents and primates. BMC Genom Data 2022; 23:77. [PMID: 36329409 PMCID: PMC9635179 DOI: 10.1186/s12863-022-01092-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/29/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022]
Abstract
Background: Although short tandem repeats (STRs; also known as microsatellites) are predominantly abundant across vertebrate genomes and have significant biological implications, their relevance to speciation remains largely elusive and is attributed for the most part to random coincidence. Here we collected data on the whole-genome abundance of mono-, di-, and trinucleotide STRs in nine species, encompassing rodents and primates, including rat, mouse, olive baboon, gelada, macaque, gorilla, chimpanzee, bonobo, and human. The collected data were used to analyze hierarchical clustering of the STR abundances in the selected species. Results: We found massive differential STR abundances between the rodent and primate orders. In addition, while numerous STRs had random abundance across the nine selected species, the global abundance conformed to three consistent clusters, as follows: <rat, mouse>, <gelada, macaque, olive baboon>, and <gorilla, chimpanzee, bonobo, human>, which coincided with the phylogenetic distances of the selected species (p < 4E-05). Exceptionally, in the trinucleotide STR compartment, human was significantly distant from all other species. Conclusion: Based on hierarchical clustering, we propose that the global abundance of STRs is non-random in rodents and primates, and probably had a determining impact on the speciation of the two orders. We also propose STRs and STR lengths that predominantly conformed to the phylogeny of the selected species, exemplified by (t)10, (ct)6, and (taa)4. Phylogenetic and experimental platforms are warranted to further examine the observed patterns and the biological mechanisms associated with those STRs.
8
Mattes RD, Rowe SB, Ohlhorst SD, Brown AW, Hoffman DJ, Liska DJ, Feskens EJM, Dhillon J, Tucker KL, Epstein LH, Neufeld LM, Kelley M, Fukagawa NK, Sunde RA, Zeisel SH, Basile AJ, Borth LE, Jackson E. Valuing the Diversity of Research Methods to Advance Nutrition Science. Adv Nutr 2022; 13:1324-1393. [PMID: 35802522 PMCID: PMC9340992 DOI: 10.1093/advances/nmac043] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/31/2022] [Accepted: 04/08/2022] [Indexed: 12/13/2022]
Abstract
The ASN Board of Directors appointed the Nutrition Research Task Force to develop a report on scientific methods used in nutrition science to advance discovery, interpretation, and application of knowledge in the field. The genesis of this report was growing concern about the tone of discourse among nutrition professionals and the implications of acrimony on the productive study and translation of nutrition science. Too often, honest differences of opinion are cast as conflicts instead of areas of needed collaboration. Recognition of the value (and limitations) of contributions from well-executed nutrition science derived from the various approaches used in the discipline, as well as appreciation of how their layering will yield the strongest evidence base, will provide a basis for greater productivity and impact. Greater collaborative efforts within the field of nutrition science will require an understanding that each method or approach has a place and function that should be valued and used together to create the nutrition evidence base. Precision nutrition was identified as an important emerging nutrition topic by the preponderance of task force members, and this theme was adopted for the report because it lent itself to integration of many approaches in nutrition science. Although the primary audience for this report is nutrition researchers and other nutrition professionals, a secondary aim is to develop a document useful for the various audiences that translate nutrition research, including journalists, clinicians, and policymakers. The intent is to promote accurate, transparent, verifiable evidence-based communication about nutrition science. This will facilitate reasoned interpretation and application of emerging findings and, thereby, improve understanding and trust in nutrition science and appropriate characterization, development, and adoption of recommendations.
Affiliation(s)
- Leonard H Epstein
- University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, NY, USA
- Michael Kelley
- Michael Kelley Nutrition Science Consulting, Wauwatosa, WI, USA
- Naomi K Fukagawa
- USDA Beltsville Human Nutrition Research Center, Beltsville, MD, USA
- Steven H Zeisel
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
9
Boldyreva LV, Andreyeva EN, Pindyurin AV. Position Effect Variegation: Role of the Local Chromatin Context in Gene Expression Regulation. Mol Biol 2022. [DOI: 10.1134/s0026893322030049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
10
Xiao X, Zhang CY, Zhang Z, Hu Z, Li M, Li T. Revisiting tandem repeats in psychiatric disorders from perspectives of genetics, physiology, and brain evolution. Mol Psychiatry 2022; 27:466-475. [PMID: 34650204 DOI: 10.1038/s41380-021-01329-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/22/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 01/28/2023]
Abstract
Genome-wide association studies (GWASs) have revealed substantial genetic components comprised of single nucleotide polymorphisms (SNPs) in the heritable risk of psychiatric disorders. However, genetic risk factors not covered by GWAS also play pivotal roles in these illnesses. Tandem repeats, which are likely functional but frequently overlooked by GWAS, may account for an important proportion of the "missing heritability" of psychiatric disorders. Despite difficulties in characterizing and quantifying tandem repeats in the genome, studies have been carried out in an attempt to describe the impact of tandem repeats on gene regulation and human phenotypes. In this review, we introduce recent research progress regarding the genomic distribution and regulatory mechanisms of tandem repeats. We also summarize the current knowledge of the genetic architecture and biological underpinnings of psychiatric disorders brought by studies of tandem repeats. These findings suggest that tandem repeats, in candidate psychiatric risk genes or in different levels of linkage disequilibrium (LD) with psychiatric GWAS SNPs and haplotypes, may modulate biological phenotypes related to psychiatric disorders (e.g., cognitive function and brain physiology) by regulating alternative splicing, promoter activity, enhancer activity and so on. In addition, many tandem repeats have undergone tight natural selection in the human lineage, and likely play crucial roles in human brain evolution. Taken together, the putative roles of tandem repeats in the pathogenesis of psychiatric disorders are strongly implicated, and using examples from the previous literature, we wish to call for further attention to tandem repeats in the post-GWAS era of psychiatric disorders.
Affiliation(s)
- Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
- Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China; Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China; Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China.
- Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
- Tao Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China; Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China.
11
Natural selection at the RASGEF1C (GGC) repeat in human and divergent genotypes in late-onset neurocognitive disorder. Sci Rep 2021; 11:19235. [PMID: 34584172 PMCID: PMC8479062 DOI: 10.1038/s41598-021-98725-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/29/2021] [Accepted: 09/14/2021] [Indexed: 12/17/2022]
Abstract
Expression dysregulation of the neuron-specific gene, RASGEF1C (RasGEF Domain Family Member 1C), occurs in late-onset neurocognitive disorders (NCDs), such as Alzheimer's disease. This gene contains a (GGC)13 repeat spanning its core promoter and 5' untranslated region (RASGEF1C-201 ENST00000361132.9). Here we sequenced the (GGC)-repeat in a sample of human subjects (N = 269), consisting of late-onset NCDs (N = 115) and controls (N = 154). We also studied the status of this STR across various primate and non-primate species based on Ensembl 103. The 6-repeat allele was the predominant allele in the controls (frequency = 0.85) and NCD patients (frequency = 0.78). The NCD genotype compartment consisted of an excess of genotypes that lacked the 6-repeat (divergent genotypes) (Mid-P exact = 0.004). A number of those genotypes were not detected in the control group (Mid-P exact = 0.007). The RASGEF1C (GGC)-repeat expanded beyond 2-repeats specifically in primates, and was at maximum length in humans. We conclude that there is natural selection for the 6-repeat allele of the RASGEF1C (GGC)-repeat in humans, and significant divergence from that allele in late-onset NCDs. STR alleles that are predominantly abundant and genotypes that deviate from those alleles are underappreciated features, which may have deep evolutionary and pathological consequences.
12
Voicu AA, Krützen M, Bilgin Sonay T. Short Tandem Repeats as a High-Resolution Marker for Capturing Recent Orangutan Population Evolution. Front Bioinform 2021; 1:695784. [PMID: 36303734 PMCID: PMC9581056 DOI: 10.3389/fbinf.2021.695784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/15/2021] [Accepted: 07/26/2021] [Indexed: 11/30/2022]
Abstract
The genus Pongo is ideal for studying population-genetic adaptation, given its remarkable phenotypic divergence and the highly contrasting environmental conditions it has been exposed to. Studying its genetic variation promises to reveal a motion picture of these great apes' evolutionary and adaptive history, and also helps us expand our knowledge of the patterns of adaptation and evolution. In this work, we advance the understanding of the genetic variation among wild orangutans through a genome-wide study of short tandem repeats (STRs). Their elevated mutation rate makes STRs ideal markers for the study of recent evolution within a given population. Current technological and algorithmic advances have rendered their sequencing and discovery more accurate, so their potential can finally be leveraged in population genetics studies. To study patterns of population variation within the wild orangutan population, we genotyped the short tandem repeats in a population of 21 individuals spanning four Sumatran and Bornean (sub-) species and eight Southeast Asian regions. We studied the impact of sequencing depth on our ability to genotype STRs and found that STR copy number changes function as a powerful marker, correctly capturing the demographic history of these populations, even divergences as recent as 10 Kya. Moreover, gene ontology enrichments for genes close to STR variants are aligned with local adaptations in the two islands. Coupled with more advanced STR-compatible population models and selection tests, genomic studies based on STRs will be able to reduce the gap caused by the missing heritability for species with recent adaptations.
Affiliation(s)
- Michael Krützen
- Department of Anthropology, University of Zurich, Zurich, Switzerland
- Tugce Bilgin Sonay
- Department of Anthropology, University of Zurich, Zurich, Switzerland
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY, United States
- *Correspondence: Tugce Bilgin Sonay,
|
13
|
Verbiest MA, Delucchi M, Bilgin Sonay T, Anisimova M. Beyond Microsatellite Instability: Intrinsic Disorder as a Potential Link Between Protein Short Tandem Repeats and Cancer. Frontiers in Bioinformatics 2021; 1:685844. [DOI: 10.3389/fbinf.2021.685844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/25/2021] [Accepted: 05/21/2021] [Indexed: 12/28/2022] Open
Abstract
Short tandem repeats (STRs) are abundant in genomic sequences and are known for comparatively high mutation rates; STRs therefore are thought to be a potent source of genetic diversity. In protein-coding sequences STRs primarily encode disorder-promoting amino acids and are often located in intrinsically disordered regions (IDRs). STRs are frequently studied in the scope of microsatellite instability (MSI) in cancer, with little focus on the connection between protein STRs and IDRs. We believe, however, that this relationship should be explicitly included when ascertaining STR functionality in cancer. Here we explore this notion using all canonical human proteins from SwissProt, wherein we detected 3,699 STRs. Over 80% of these consisted completely of disorder-promoting amino acids. 62.1% of amino acids in STR sequences were predicted to also be in an IDR, compared to 14.2% for non-repeat sequences. Over-representation analysis showed STR-containing proteins to be primarily located in the nucleus where they perform protein- and nucleotide-binding functions and regulate gene expression. They were also enriched in cancer-related signaling pathways. Furthermore, we found enrichments of STR-containing proteins among those correlated with patient survival for cancers derived from eight different anatomical sites. Intriguingly, several of these cancer types are not known to have an MSI-high (MSI-H) phenotype, suggesting that protein STRs play a role in cancer pathology in non-MSI-H settings. Their intrinsic link with IDRs could therefore be an attractive topic of future research to further explore the role of STRs and IDRs in cancer. We speculate that our observations may be linked to the known dosage-sensitivity of disordered proteins, which could hint at a concentration-dependent gain-of-function mechanism in cancer for proteins containing STRs and IDRs.
|
14
|
Microsatellites as Agents of Adaptive Change: An RNA-Seq-Based Comparative Study of Transcriptomes from Five Helianthus Species. Symmetry (Basel) 2021. [DOI: 10.3390/sym13060933] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022] Open
Abstract
Mutations that provide environment-dependent selective advantages drive adaptive divergence among species. Many phenotypic differences among related species are more likely to result from gene expression divergence rather than from non-synonymous mutations. In this regard, cis-regulatory mutations play an important part in generating functionally significant variation. Some proposed mechanisms that explore the role of cis-regulatory mutations in gene expression divergence involve microsatellites. Microsatellites exhibit high mutation rates achieved through symmetric or asymmetric mutation processes and are abundant in both coding and non-coding regions in positions that could influence gene function and products. Here we tested the hypothesis that microsatellites contribute to gene expression divergence among species with 50 individuals from five closely related Helianthus species using an RNA-seq approach. Differential expression analyses of the transcriptomes revealed that genes containing microsatellites in non-coding regions (UTRs and introns) are more likely to be differentially expressed among species when compared to genes with microsatellites in the coding regions and transcripts lacking microsatellites. We detected a greater proportion of shared microsatellites in 5′UTRs and coding regions compared to 3′UTRs and non-coding transcripts among Helianthus spp. Furthermore, allele frequency differences measured by pairwise FST at single nucleotide polymorphisms (SNPs), indicate greater genetic divergence in transcripts containing microsatellites compared to those lacking microsatellites. A gene ontology (GO) analysis revealed that microsatellite-containing differentially expressed genes are significantly enriched for GO terms associated with regulation of transcription and transcription factor activity. Collectively, our study provides compelling evidence to support the role of microsatellites in gene expression divergence.
|
15
|
Eslami Rasekh M, Hernández Y, Drinan SD, Fuxman Bass J, Benson G. Genome-wide characterization of human minisatellite VNTRs: population-specific alleles and gene expression differences. Nucleic Acids Res 2021; 49:4308-4324. [PMID: 33849068 PMCID: PMC8096271 DOI: 10.1093/nar/gkab224] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/03/2020] [Revised: 03/06/2021] [Accepted: 03/18/2021] [Indexed: 11/12/2022] Open
Abstract
Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole genome sequencing datasets from 2770 individuals in order to detect minisatellite VNTRs, i.e., those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e. with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were found to be enriched in genomic regions with regulatory function, i.e. transcription start sites and enhancers. Investigation of the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles and that those could be used to classify individuals into super-populations with near-perfect accuracy. Search for expression quantitative trait loci (eQTLs), among the VNTRs proximal to genes, indicated that in 187 genes expression differences correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through the identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
Affiliation(s)
- Yözen Hernández
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Juan I Fuxman Bass
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biology, Boston University, Boston, MA 02215, USA
- Gary Benson
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biology, Boston University, Boston, MA 02215, USA
- Department of Computer Science, Boston University, Boston, MA 02215, USA
|
16
|
Kim MH, Yang GE, Jeong MS, Mun JY, Lee SY, Nam JK, Choi YH, Kim TN, Leem SH. VNTR polymorphism in the breakpoint region of ABL1 and susceptibility to bladder cancer. BMC Med Genomics 2021; 14:121. [PMID: 33952249 PMCID: PMC8097952 DOI: 10.1186/s12920-021-00968-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Accepted: 04/21/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: ABL1 is primarily known as a leukemia-related oncogene due to translocation, but about 2.2% of ABL1 mutations have been identified in bladder cancer, and high expression in solid cancer has also been detected. METHODS: Here, we used the NCBI database, UCSC genome browser gateway and Tandem Repeats Finder program to investigate the structural characterization of the ABL1 breakpoint region and to identify the variable number of tandem repeats (VNTR). To investigate the relationship between ABL1-MS1 and bladder cancer, a case-control study was conducted in 207 controls and 197 bladder cancer patients. We also examined the level of transcription of the reporter gene driven by the ABL1 promoter to determine if the VNTR region affects gene expression. RESULTS: In our study, one VNTR was identified in the breakpoint region, the intron 1 region of ABL1, and was named ABL1-MS1. In the control group, only two common alleles (TR13, TR15) were detected, but an additional two rare alleles (TR14, TR16) were detected in bladder cancer. A statistically significant association was identified between the rare ABL1-MS1 allele and bladder cancer risk: P = 0.013. Investigating the level of transcription of the reporter gene driven by the ABL1 promoter, VNTR showed inhibition of ABL1 expression in non-cancer 293T cells, but not in bladder cancer cells. In addition, ABL1-MS1 was accurately passed on to offspring according to Mendelian inheritance through meiosis. CONCLUSIONS: Therefore, the ABL1-MS1 region can affect ABL1 expression in bladder cancer. This study indicates that ABL1-MS1 can be used as a DNA fingerprinting marker. In addition, rare allele detection can predict susceptibility to bladder cancer.
Affiliation(s)
- Min-Hye Kim
- Department of Biomedical Sciences, Dong-A University, Busan, 49315 Korea
- Gi-Eun Yang
- Department of Biomedical Sciences, Dong-A University, Busan, 49315 Korea
- Department of Health Sciences, The Graduated of Dong-A University, Busan, 49315 Korea
- Mi-So Jeong
- Department of Biomedical Sciences, Dong-A University, Busan, 49315 Korea
- Jeong-Yeon Mun
- Department of Biomedical Sciences, Dong-A University, Busan, 49315 Korea
- Sang-Yeop Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, 28119 Korea
- Jong-Kil Nam
- Department of Urology, Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, 50612 Korea
- Yung Hyun Choi
- Department of Biochemistry, College of Oriental Medicine, Anti-Aging Research Center, Dong-Eui University, Busan, 47227 Korea
- Tae Nam Kim
- Department of Urology, Medical Research Institute, Pusan National University Hospital, Busan, 49241 Korea
- Sun-Hee Leem
- Department of Biomedical Sciences, Dong-A University, Busan, 49315 Korea
- Department of Health Sciences, The Graduated of Dong-A University, Busan, 49315 Korea
|
17
|
Bakhtiari M, Park J, Ding YC, Shleizer-Burko S, Neuhausen SL, Halldórsson BV, Stefánsson K, Gymrek M, Bafna V. Variable number tandem repeats mediate the expression of proximal genes. Nat Commun 2021; 12:2075. [PMID: 33824302 PMCID: PMC8024321 DOI: 10.1038/s41467-021-22206-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 05/13/2020] [Accepted: 02/17/2021] [Indexed: 12/12/2022] Open
Abstract
Variable number tandem repeats (VNTRs) account for significant genetic variation in many organisms. In humans, VNTRs have been implicated in both Mendelian and complex disorders, but are largely ignored by genomic pipelines due to the complexity of genotyping and the computational expense. We describe adVNTR-NN, a method that uses shallow neural networks to genotype a VNTR in 18 seconds on 55X whole genome data, while maintaining high accuracy. We use adVNTR-NN to genotype 10,264 VNTRs in 652 GTEx individuals. Associating VNTR length with gene expression in 46 tissues, we identify 163 "eVNTRs". Of the 22 eVNTRs in blood where independent data is available, 21 (95%) are replicated in terms of significance and direction of association. 49% of the eVNTR loci show a strong and likely causal impact on the expression of genes and 80% have maximum effect size at least 0.3. The impacted genes are involved in diseases including Alzheimer's, obesity and familial cancers, highlighting the importance of VNTRs for understanding the genetic basis of complex diseases.
Affiliation(s)
- Mehrdad Bakhtiari
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Yuan-Chun Ding
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Susan L Neuhausen
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Melissa Gymrek
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Vineet Bafna
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA.
|
18
|
Annear DJ, Vandeweyer G, Elinck E, Sanchis-Juan A, French CE, Raymond L, Kooy RF. Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. Sci Rep 2021; 11:2515. [PMID: 33510257 PMCID: PMC7844047 DOI: 10.1038/s41598-021-82050-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/16/2020] [Accepted: 12/29/2020] [Indexed: 11/09/2022] Open
Abstract
Expanded CGG-repeats have been linked to neurodevelopmental and neurodegenerative disorders, including the fragile X syndrome and fragile X-associated tremor/ataxia syndrome (FXTAS). We hypothesized that as of yet uncharacterised CGG-repeat expansions within the genome contribute to human disease. To catalogue the CGG-repeats, 544 human whole genomes were analyzed. In total, 6101 unique CGG-repeats were detected of which more than 93% were highly variable in repeat length. Repeats with a median size of 12 repeat units or more were always polymorphic but shorter repeats were often polymorphic, suggesting a potential intergenerational instability of the CGG region even for repeat units with a median length of four or less. 410 of the CGG repeats were associated with known neurodevelopmental disease genes or with strong candidate genes. Based on their frequency and genomic location, CGG repeats may thus be a currently overlooked cause of human disease.
Affiliation(s)
- Dale J Annear
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
- Geert Vandeweyer
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
- Ellen Elinck
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
- Alba Sanchis-Juan
- NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK; Department of Haematology, NHS Blood and Transplant Centre, University of Cambridge, Cambridge, CB2 0PT, UK
- Courtney E French
- Department of Paediatrics, University of Cambridge, Cambridge, CB2 0QQ, UK
- Lucy Raymond
- NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK; Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, CB2 0XY, UK
- R Frank Kooy
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium.
|
19
|
Mitsuhashi S, Frith MC, Matsumoto N. Genome-wide survey of tandem repeats by nanopore sequencing shows that disease-associated repeats are more polymorphic in the general population. BMC Med Genomics 2021; 14:17. [PMID: 33413375 PMCID: PMC7791882 DOI: 10.1186/s12920-020-00853-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/17/2020] [Accepted: 12/08/2020] [Indexed: 12/13/2022] Open
Abstract
Background: Tandem repeats are highly mutable and contribute to the development of human disease by a variety of mechanisms. It is difficult to predict which tandem repeats may cause a disease. One hypothesis is that changeable tandem repeats are the source of genetic diseases, because disease-causing repeats are polymorphic in healthy individuals. However, it is not clear whether disease-causing repeats are more polymorphic than other repeats. Methods: We performed a genome-wide survey of the millions of human tandem repeats using publicly available long read genome sequencing data from 21 humans. We measured tandem repeat copy number changes using tandem-genotypes. Length variation of known disease-associated repeats was compared to other repeat loci. Results: We found that known Mendelian disease-causing or disease-associated repeats, especially CAG and 5′UTR GGC repeats, are relatively long and polymorphic in the general population. We also show that repeat lengths of two disease-causing tandem repeats, in ATXN3 and GLS, are correlated with near-by GWAS SNP genotypes. Conclusions: We provide a catalog of polymorphic tandem repeats across a variety of repeat unit lengths and sequences, from long read sequencing data. This method, especially if used in genome-wide association studies, may indicate possible new candidates of pathogenic or biologically important tandem repeats in human genomes.
Affiliation(s)
- Satomi Mitsuhashi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan; Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, M&D Tower 24F, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan.
- Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan; Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
|
20
|
Zeisel SH. Precision (Personalized) Nutrition: Understanding Metabolic Heterogeneity. Annu Rev Food Sci Technol 2020; 11:71-92. [DOI: 10.1146/annurev-food-032519-051736] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 11/09/2022]
Abstract
People differ in their requirements for and responses to nutrients and bioactive molecules in the diet. Many inputs contribute to metabolic heterogeneity (including variations in genetics, epigenetics, microbiome, lifestyle, diet intake, and environmental exposure). Precision nutrition is not about developing unique prescriptions for individual people but rather about stratifying people into different subgroups of the population on the basis of biomarkers of the above-listed sources of metabolic variation and then using this stratification to better estimate the different subgroups’ dietary requirements, thereby enabling better dietary recommendations and interventions. The hope is that we will be able to subcategorize people into ever-smaller groups that can be targeted in terms of recommendations, but we will never achieve this at the individual level; thus the choice of precision nutrition rather than personalized nutrition to designate this new field. This review focuses mainly on genetically related sources of metabolic heterogeneity and identifies challenges that need to be overcome to achieve a full understanding of the complex interactions between the many sources of metabolic heterogeneity that make people differ from one another in their requirements for and responses to foods. It also discusses the commercial applications of precision nutrition.
Affiliation(s)
- Steven H. Zeisel
- Nutrition Research Institute, Department of Nutrition, University of North Carolina, Kannapolis, North Carolina 28081, USA
|
21
|
Shortt JA, Ruggiero RP, Cox C, Wacholder AC, Pollock DD. Finding and extending ancient simple sequence repeat-derived regions in the human genome. Mob DNA 2020; 11:11. [PMID: 32095164 PMCID: PMC7027126 DOI: 10.1186/s13100-020-00206-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2019] [Accepted: 02/04/2020] [Indexed: 12/19/2022] Open
Abstract
Background: Previously, 3% of the human genome has been annotated as simple sequence repeats (SSRs), similar to the proportion annotated as protein coding. The origin of much of the genome is not well annotated, however, and some of the unidentified regions are likely to be ancient SSR-derived regions not identified by current methods. The identification of these regions is complicated because SSRs appear to evolve through complex cycles of expansion and contraction, often interrupted by mutations that alter both the repeated motif and mutation rate. We applied an empirical, kmer-based approach to identify genome regions that are likely derived from SSRs. Results: The sequences flanking annotated SSRs are enriched for similar sequences and for SSRs with similar motifs, suggesting that the evolutionary remains of SSR activity abound in regions near obvious SSRs. Using our previously described P-clouds approach, we identified ‘SSR-clouds’, groups of similar kmers (or ‘oligos’) that are enriched near a training set of unbroken SSR loci, and then used the SSR-clouds to detect likely SSR-derived regions throughout the genome. Conclusions: Our analysis indicates that the amount of likely SSR-derived sequence in the human genome is 6.77%, over twice as much as previous estimates, including millions of newly identified ancient SSR-derived loci. SSR-clouds identified poly-A sequences adjacent to transposable element termini in over 74% of the oldest class of Alu (roughly, AluJ), validating the sensitivity of the approach. Poly-A’s annotated by SSR-clouds also had a length distribution that was more consistent with their poly-A origins, with mean about 35 bp even in older Alus. This work demonstrates that the high sensitivity provided by SSR-Clouds improves the detection of SSR-derived regions and will enable deeper analysis of how decaying repeats contribute to genome structure.
Affiliation(s)
- Jonathan A Shortt
- Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Robert P Ruggiero
- Department of Biology, Southeast Missouri State University, Cape Girardeau, MO 63701 USA
- Corey Cox
- Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Aaron C Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213 USA
- David D Pollock
- Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045 USA
|
22
|
Kinney N, Kang L, Eckstrand L, Pulenthiran A, Samuel P, Anandakrishnan R, Varghese RT, Michalak P, Garner HR. Abundance of ethnically biased microsatellites in human gene regions. PLoS One 2019; 14:e0225216. [PMID: 31830051 PMCID: PMC6907796 DOI: 10.1371/journal.pone.0225216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/30/2019] [Accepted: 10/29/2019] [Indexed: 12/16/2022] Open
Abstract
Microsatellites, a type of short tandem repeat (STR), have been used for decades as putatively neutral markers to study the genetic structure of diverse human populations. However, recent studies have demonstrated that some microsatellites contribute to gene expression, cis heritability, and phenotype. As a corollary, some microsatellites may contribute to differential gene expression and RNA/protein structure stability in distinct human populations. To test this hypothesis, we investigate genotype frequencies, functional relevance, and adaptive potential of microsatellites in five super-populations (ethnicities) drawn from the 1000 Genomes Project. We discover 3,984 ethnically-biased microsatellite loci (EBML); for each EBML at least one ethnicity has genotype frequencies statistically different from the remaining four. South Asian, East Asian, European, and American EBML show significant overlap; on the contrary, the set of African EBML is mostly unique. We cross-reference the 3,984 EBML with 2,060 previously identified expression STRs (eSTRs); repeats known to affect gene expression (64 total) are over-represented. The most significant pathway enrichments are those associated with the matrisome: a broad collection of genes encoding the extracellular matrix and its associated proteins. At least 14 of the EBML have established links to human disease. Analysis of the 3,984 EBML with respect to known selective sweep regions in the genome shows that allelic variation in some of them is likely associated with adaptive evolution.
Affiliation(s)
- Nick Kinney
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
- Lin Kang
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
- Laurel Eckstrand
- Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States of America
- Arichanah Pulenthiran
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Peter Samuel
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Ramu Anandakrishnan
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Robin T. Varghese
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- P. Michalak
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States of America
- Institute of Evolution, University of Haifa, Haifa, Israel
- Harold R. Garner
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
|
23
|
Cechova M, Harris RS, Tomaszkiewicz M, Arbeithuber B, Chiaromonte F, Makova KD. High satellite repeat turnover in great apes studied with short- and long-read technologies. Mol Biol Evol 2019; 36:2415-2431. [PMID: 31273383 PMCID: PMC6805231 DOI: 10.1093/molbev/msz156] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/12/2019] [Revised: 06/12/2019] [Accepted: 06/13/2019] [Indexed: 12/23/2022] Open
Abstract
Satellite repeats are a structural component of centromeres and telomeres, and in some instances, their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: 1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and 2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males versus females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions.
Affiliation(s)
- Monika Cechova
- Department of Biology, Pennsylvania State University, University Park, PA USA
- Robert S Harris
- Department of Biology, Pennsylvania State University, University Park, PA USA
- Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA USA
- Barbara Arbeithuber
- Department of Biology, Pennsylvania State University, University Park, PA USA
- Francesca Chiaromonte
- Department of Statistics, Pennsylvania State University, University Park, PA USA; EMbeDS, Sant'Anna School of Advanced Studies, Pisa, Italy; Center for Medical Genomics, Penn State, University Park, PA USA
|
24
|
De novo emergence and potential function of human-specific tandem repeats in brain-related loci. Hum Genet 2019; 138:661-672. [PMID: 31069507 DOI: 10.1007/s00439-019-02017-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/02/2018] [Accepted: 04/16/2019] [Indexed: 01/02/2023]
Abstract
Tandem repeats (TRs) are widespread in the genomes of all living organisms. In eukaryotes, they are found in both coding and noncoding regions and have potential roles in the regulation of cellular processes such as transcription, translation and in the modification of protein structure. Recent studies have highlighted TRs as a key regulator of gene expression and a potential contributor to human evolution. Thus, TRs are emerging as an important source of variation that can result in differential gene expression at intra- and inter-species levels. In this study, we performed a genome-wide survey to identify TRs that have emerged in the human lineage. We further examined these loci to explore their potential functional significance for human evolution. We identified 152 human-specific TR (HSTR) loci containing a repeat unit of more than ten bases, with most of them showing a repeat count of two. Gene set enrichment analysis showed that HSTR-associated genes were associated with biological functions in brain development and synapse function. In addition, we compared gene expression of human HSTR loci with orthologues from non-human primates (NHP) in seven different tissues. Strikingly, the expression level of HSTR-associated genes in brain tissues was significantly higher in human than in NHP. These results suggest the possibility that de novo emergence of TRs could have resulted in altered gene expression in humans within a short-time frame and contributed to the rapid evolution of human brain function.
|
25
|
Banguera-Hinestroza E, Ferrada E, Sawall Y, Flot JF. Computational Characterization of the mtORF of Pocilloporid Corals: Insights into Protein Structure and Function in Stylophora Lineages from Contrasting Environments. Genes (Basel) 2019; 10:E324. [PMID: 31035578 PMCID: PMC6562464 DOI: 10.3390/genes10050324] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/21/2019] [Revised: 04/22/2019] [Accepted: 04/23/2019] [Indexed: 01/15/2023] Open
Abstract
More than a decade ago, a new mitochondrial Open Reading Frame (mtORF) was discovered in corals of the family Pocilloporidae and has been used since then as an effective barcode for these corals. Recently, mtORF sequencing revealed the existence of two differentiated Stylophora lineages occurring in sympatry along the environmental gradient of the Red Sea (18.5°C to 33.9°C). In the endemic Red Sea lineage RS_LinB, the mtORF and the heat shock protein gene hsp70 uncovered similar phylogeographic patterns strongly correlated with environmental variations. This suggests that the mtORF too might be involved in thermal adaptation. Here, we used computational analyses to explore the features and putative function of this mtORF. In particular, we tested the likelihood that this gene encodes a functional protein and whether it may play a role in adaptation. Analyses of full mitogenomes showed that the mtORF originated in the common ancestor of Madracis and other pocilloporids, and that it encodes a transmembrane protein differing in length and domain architecture among genera. Homology-based annotation and the relative conservation of metal-binding sites revealed traces of an ancient hydrolase catalytic activity. Furthermore, signals of pervasive purifying selection, lack of stop codons in 1830 sequences analyzed, and a codon-usage bias similar to that of other mitochondrial genes indicate that the protein is functional, i.e., not a pseudogene. Other features, such as intrinsically disordered regions, tandem repeats, and signals of positive selection particularly in Stylophora RS_LinB populations, are consistent with a role of the mtORF in adaptive responses to environmental changes.
Affiliation(s)
- Eulalia Banguera-Hinestroza
- Evolutionary Biology and Ecology, Université libre de Bruxelles, B-1050 Brussels, Belgium.
- Interuniversity Institute of Bioinformatics in Brussels-(IB)2, 1050 Brussels, Belgium.
| | - Evandro Ferrada
- Center for Genomics and Bioinformatics, Universidad Mayor, Santiago, Chile.
| | - Yvonne Sawall
- Coral Reef Ecology, Bermuda Institute of Ocean Sciences (BIOS), St.George's GE 01, Bermuda.
| | - Jean-François Flot
- Evolutionary Biology and Ecology, Université libre de Bruxelles, B-1050 Brussels, Belgium.
- Interuniversity Institute of Bioinformatics in Brussels-(IB)2, 1050 Brussels, Belgium.
|
26
|
Press MO, Hall AN, Morton EA, Queitsch C. Substitutions Are Boring: Some Arguments about Parallel Mutations and High Mutation Rates. Trends Genet 2019; 35:253-264. [PMID: 30797597 PMCID: PMC6435258 DOI: 10.1016/j.tig.2019.01.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 09/24/2018] [Revised: 12/20/2018] [Accepted: 01/14/2019] [Indexed: 12/31/2022]
Abstract
Extant genomes are largely shaped by global transposition, copy-number fluctuation, and rearrangement of DNA sequences rather than by substitutions of single nucleotides. Although many of these large-scale mutations have low probabilities and are unlikely to repeat, others are recurrent or predictable in their effects, leading to stereotyped genome architectures and genetic variation in both eukaryotes and prokaryotes. Such recurrent, parallel mutation modes can profoundly shape the paths taken by evolution and undermine common models of evolutionary genetics. Similar patterns are also evident at the smaller scales of individual genes or short sequences. The scale and extent of this 'non-substitution' variation has recently come into focus through the advent of new genomic technologies; however, it is still not widely considered in genotype-phenotype association studies. In this review we identify common features of these disparate mutational phenomena and comment on the importance and interpretation of these mutational patterns.
Affiliation(s)
| | - Ashley N Hall
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Molecular and Cellular Biology, University of Washington, Seattle, WA 98195, USA
| | - Elizabeth A Morton
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
|
27
|
Chen F, Lai F, Luo M, Han YS, Cheng H, Zhou R. The genome-wide landscape of small insertion and deletion mutations in Monopterus albus. J Genet Genomics 2019; 46:75-86. [PMID: 30867123 DOI: 10.1016/j.jgg.2019.02.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/30/2018] [Revised: 12/21/2018] [Accepted: 02/01/2019] [Indexed: 11/17/2022]
Abstract
Insertion and deletion (indel) mutations, which can trigger single nucleotide substitutions on the flanking regions of genes, may generate abundant materials for disease defense, reproduction, species survival and evolution. However, genetic and evolutionary mechanisms of indels remain elusive. We establish a comparative genome-transcriptome-alignment approach for a large-scale identification of indels in Monopterus population. Over 2000 indels in 1738 indel genes, including 1-21 bp deletions and 1-15 bp insertions, were detected. Each indel gene had ∼1.1 deletions/insertions, and 2-4 alleles in population. Frequencies of deletions were prominently higher than those of insertions on both genome and population levels. Most of the indels led to in frame mutations with multiples of three and majorly occurred in non-domain regions, indicating functional constraint or tolerance of the indels. All indel genes showed higher expression levels than non-indel genes during sex reversal. Slide window analysis of global expression levels in gonads showed a significant positive correlation with indel density in the genome. Moreover, indel genes were evolutionarily conserved and evolved slowly compared to non-indel genes. Notably, population genetic structure of indels revealed divergent evolution of Monopterus population, as bottleneck effect of biogeographic isolation by Taiwan Strait, China.
Affiliation(s)
- Feng Chen
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
| | - Fengling Lai
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
| | - Majing Luo
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China
| | - Yu-San Han
- Institute of Fisheries Science, College of Life Science, "National Taiwan University", Taipei, 10617, Taiwan, China
| | - Hanhua Cheng
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China.
| | - Rongjia Zhou
- Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan, 430072, China.
|
28
|
Farnoud F, Schwartz M, Bruck J. Estimation of duplication history under a stochastic model for tandem repeats. BMC Bioinformatics 2019; 20:64. [PMID: 30727948 PMCID: PMC6364452 DOI: 10.1186/s12859-019-2603-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/27/2018] [Accepted: 01/03/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Tandem repeat sequences are common in the genomes of many organisms and are known to cause important phenomena such as gene silencing and rapid morphological changes. Due to the presence of multiple copies of the same pattern in tandem repeats and their high variability, they contain a wealth of information about the mutations that have led to their formation. The ability to extract this information can enhance our understanding of evolutionary mechanisms. RESULTS We present a stochastic model for the formation of tandem repeats via tandem duplication and substitution mutations. Based on the analysis of this model, we develop a method for estimating the relative mutation rates of duplications and substitutions, as well as the total number of mutations, in the history of a tandem repeat sequence. We validate our estimation method via Monte Carlo simulation and show that it outperforms the state-of-the-art algorithm for discovering the duplication history. We also apply our method to tandem repeat sequences in the human genome, where it demonstrates the different behaviors of micro- and mini-satellites and can be used to compare mutation rates across chromosomes. It is observed that chromosomes that exhibit the highest mutation activity in tandem repeat regions are the same as those thought to have the highest overall mutation rates. However, unlike previous works that rely on comparing human and chimpanzee genomes to measure mutation rates, the proposed method allows us to find chromosomes with the highest mutation activity based on a single genome, in essence by comparing (approximate) copies of the pattern in tandem repeats. CONCLUSION The prevalence of tandem repeats in most organisms and the efficiency of the proposed method enable studying various aspects of the formation of tandem repeats and the surrounding sequences in a wide range of settings. 
AVAILABILITY The implementation of the estimation method is available at http://ips.lab.virginia.edu/smtr.
Affiliation(s)
- Farzad Farnoud
- Department of Electrical and Computer Engineering, Department of Computer Science, University of Virginia, Charlottesville, USA
| | - Moshe Schwartz
- Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Jehoshua Bruck
- Department of Electrical Engineering, California Institute of Technology, Pasadena, USA
|
29
|
Zablotskaya A, Van Esch H, Verstrepen KJ, Froyen G, Vermeesch JR. Mapping the landscape of tandem repeat variability by targeted long read single molecule sequencing in familial X-linked intellectual disability. BMC Med Genomics 2018; 11:123. [PMID: 30567555 PMCID: PMC6299999 DOI: 10.1186/s12920-018-0446-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/07/2018] [Accepted: 12/06/2018] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The etiology of more than half of all patients with X-linked intellectual disability remains elusive, despite array-based comparative genomic hybridization, whole exome or genome sequencing. Since short read massive parallel sequencing approaches do not allow the detection of larger tandem repeat expansions, we hypothesized that such expansions could be a hidden cause of X-linked intellectual disability. METHODS We selectively captured over 1800 tandem repeats on the X chromosome and characterized them by long read single molecule sequencing in 3 families with idiopathic X-linked intellectual disability. RESULTS In male DNA samples, full tandem repeat length sequences were obtained for 88-93% of the targets and up to 99.6% of the repeats with a moderate guanine-cytosine content. The read length and analysis pipeline allow detection of tandem repeat expansions of > 900 bp. In one family, one repeat expansion co-occurs with down-regulation of the neighboring MIR222 gene. This gene has previously been implicated in intellectual disability and is apparently linked to FMR1 and NEFH overexpression associated with neurological disorders. CONCLUSIONS This study demonstrates the power of single molecule sequencing to measure tandem repeat lengths and detect expansions, and suggests that tandem repeat mutations may be a hidden cause of X-linked intellectual disability.
Affiliation(s)
- Alena Zablotskaya
- Department of Human Genetics and Center for Human Genetics, Laboratory for Cytogenetics and Genome Research, University Hospitals Leuven, KU Leuven, O&N I Herestraat 49 - box 606, 3000, Leuven, Belgium
| | - Hilde Van Esch
- Department of Human Genetics and Center for Human Genetics, Laboratory for Genetics of Cognition, University Hospitals Leuven, KU Leuven, O&N I Herestraat 49 - box 606, 3000, Leuven, Belgium
| | - Kevin J Verstrepen
- VIB Center for Microbiology and CMPG Lab for Genetics and Genomics, KU Leuven, Gaston Geenslaan 1 - box 2471, 3001, Leuven, Belgium
| | - Guy Froyen
- Clinical Biology, Laboratory for Molecular Diagnostics, Jessa Hospital, Stadsomvaart 11, 3500, Hasselt, Belgium
| | - Joris R Vermeesch
- Department of Human Genetics and Center for Human Genetics, Laboratory for Cytogenetics and Genome Research, University Hospitals Leuven, KU Leuven, O&N I Herestraat 49 - box 606, 3000, Leuven, Belgium.
|
30
|
Corney BPA, Widnall CL, Rees DJ, Davies JS, Crunelli V, Carter DA. Regulatory Architecture of the Neuronal Cacng2/Tarpγ2 Gene Promoter: Multiple Repressive Domains, a Polymorphic Regulatory Short Tandem Repeat, and Bidirectional Organization with Co-regulated lncRNAs. J Mol Neurosci 2018; 67:282-294. [PMID: 30478755 PMCID: PMC6373327 DOI: 10.1007/s12031-018-1208-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/01/2018] [Accepted: 11/08/2018] [Indexed: 12/14/2022]
Abstract
CACNG2 (TARPγ2, Stargazin) is a multi-functional regulator of excitatory neurotransmission and has been implicated in the pathological processes of several brain diseases. Cacng2 function is dependent upon expression level, but currently, little is known about the molecular mechanisms that control expression of this gene. To address this deficit and investigate disease-related gene variants, we have cloned and characterized the rat Cacng2 promoter and have defined three major features: (i) multiple repressive domains that include an array of RE-1 silencing transcription factor (REST) elements, and a calcium regulatory element-binding factor (CaRF) element, (ii) a (poly-GA) short tandem repeat (STR), and (iii) bidirectional organization with expressed lncRNAs. Functional activity of the promoter was demonstrated in transfected neuronal cell lines (HT22 and PC12), but although selective removal of REST and CaRF domains was shown to enhance promoter-driven transcription, the enhanced Cacng2 promoter constructs were still about fivefold weaker than a comparable rat Synapsin-1 promoter sequence. Direct evidence of REST activity at the Cacng2 promoter was obtained through co-transfection with an established dominant-negative REST (DNR) construct. Investigation of the GA-repeat STR revealed polymorphism across both animal strains and species, and size variation was also observed in absence epilepsy disease model cohorts (Genetic Absence Epilepsy Rats, Strasbourg [GAERS] and non-epileptic control [NEC] rats). These data provide evidence of a genotype (STR)-phenotype correlation that may be unique with respect to proximal gene regulatory sequence in the demonstrated absence of other promoter, or 3' UTR variants in GAERS rats. However, although transcriptional regulatory activity of the STR was demonstrated in further transfection studies, we did not find a GAERS vs. NEC difference, indicating that this specific STR length variation may only be relevant in the context of other (Cacna1h and Kcnk9) gene variants in this disease model. Additional studies revealed further (bidirectional) complexity at the Cacng2 promoter, and we identified novel, co-regulated, antisense rat lncRNAs that are paired with Cacng2 mRNA. These studies have provided novel insights into the organization of a synaptic protein gene promoter, describing multiple repressive and modulatory domains that can mediate diverse regulatory inputs.
Affiliation(s)
- B P A Corney
- School of Biosciences, Cardiff University, CF103AX, Cardiff, UK
| | - C L Widnall
- School of Biosciences, Cardiff University, CF103AX, Cardiff, UK
| | - D J Rees
- Molecular Neurobiology, Institute of Life Science, Swansea University, Swansea, SA2 8PP, UK
| | - J S Davies
- Molecular Neurobiology, Institute of Life Science, Swansea University, Swansea, SA2 8PP, UK
| | - V Crunelli
- School of Biosciences, Cardiff University, CF103AX, Cardiff, UK
| | - D A Carter
- School of Biosciences, Cardiff University, CF103AX, Cardiff, UK.
|
31
|
Fan W, Xu L, Cheng H, Li M, Liu H, Jiang Y, Guo Y, Zhou Z, Hou S. Characterization of Duck (Anas platyrhynchos) Short Tandem Repeat Variation by Population-Scale Genome Resequencing. Front Genet 2018; 9:520. [PMID: 30425731 PMCID: PMC6218588 DOI: 10.3389/fgene.2018.00520] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/04/2018] [Accepted: 10/15/2018] [Indexed: 12/30/2022] Open
Abstract
Short tandem repeats (STRs) are usually associated with genetic diseases and gene regulatory functions, and are also important genetic markers for evolutionary, genetic diversity and forensic analyses. However, for the majority of STRs in the duck genome, their population genetic properties and functional impacts remain poorly defined. The recent advent of next generation sequencing (NGS) has offered an opportunity for profiling large numbers of polymorphic STRs. Here, we report a population-scale analysis of STR variation using genome resequencing in mallard and Pekin duck. Our analysis provided the first genome-wide duck STR reference including 198,022 STR loci with motif size of 2–6 base pairs. We observed a relatively uneven distribution of STRs in different genomic regions, which indicates that the occurrence of STRs in the duck genome is not random, but is subject to directional selection pressure. Using genome resequencing data of 23 mallard and 26 Pekin ducks, we successfully identified 89,891 polymorphic STR loci. Intensive analysis of this dataset suggested that shorter repeat motif, longer reference tract length, higher purity, and residing outside of a coding region are all associated with an increase in STR variability. STR genotypes were utilized for population genetic analysis, and the results showed that population structure and divergence patterns among population groups can be efficiently captured. In addition, comparison between Pekin duck and mallard identified 3,122 STRs with extremely divergent allele frequency, which overlapped with a set of genes related to nervous system, energy metabolism and behavior. The evolutionary analysis revealed that the genes containing divergent STRs may play important roles in phenotypic changes during duck domestication. The population-scale variation analysis of STRs provides a valuable resource for future study of genetic diversity and genome evolution in duck.
Affiliation(s)
- Wenlei Fan
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China.,State Key Laboratory of Animal Nutrition, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Lingyang Xu
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Hong Cheng
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Ming Li
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Hehe Liu
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Yong Jiang
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Yuming Guo
- State Key Laboratory of Animal Nutrition, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Zhengkui Zhou
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Shuisheng Hou
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
|
32
|
Arabfard M, Kavousi K, Delbari A, Ohadi M. Link between short tandem repeats and translation initiation site selection. Hum Genomics 2018; 12:47. [PMID: 30373661 PMCID: PMC6206671 DOI: 10.1186/s40246-018-0181-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/16/2018] [Accepted: 10/10/2018] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Despite their vast biological implication, the relevance of short tandem repeats (STRs)/microsatellites to the protein-coding gene translation initiation sites (TISs) remains largely unknown. METHODS We performed an Ensembl-based comparative genomics study of all annotated orthologous TIS-flanking sequences in human and 46 other species across vertebrates, on the genomic DNA and cDNA platforms (755,956 TISs), aimed at identifying human-specific STRs in this interval. The collected data were used to examine the hypothesis of a link between STRs and TISs. BLAST was used to compare the initial five amino acids (excluding the initial methionine), codons of which were flanked by STRs in human, with the initial five amino acids of all annotated proteins for the orthologous genes in other vertebrates (total of 5,314,979 pair-wise TIS comparisons on the genomic DNA and cDNA platforms) in order to compare the number of events in which human-specific and non-specific STRs occurred with homologous and non-homologous TISs (i.e., ≥ 50% and < 50% similarity of the five amino acids). RESULTS We detected differential distribution of the human-specific STRs in comparison to the overall distribution of STRs on the genomic DNA and cDNA platforms (Mann Whitney U test p = 1.4 × 10⁻¹¹ and p < 7.9 × 10⁻¹¹, respectively). We also found excess occurrence of non-homologous TISs with human-specific STRs and excess occurrence of homologous TISs with non-specific STRs on both platforms (p < 0.00001). CONCLUSION We propose a link between STRs and TIS selection, based on the differential co-occurrence rate of human-specific STRs with non-homologous TISs and non-specific STRs with homologous TISs.
Affiliation(s)
- Masoud Arabfard
- Department of Bioinformatics, Kish International Campus University of Tehran, Kish, Iran
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Kaveh Kavousi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Ahmad Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Mina Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
|
33
|
Quinn JP, Savage AL, Bubb VJ. Non-coding genetic variation shaping mental health. Curr Opin Psychol 2018; 27:18-24. [PMID: 30099302 PMCID: PMC6624474 DOI: 10.1016/j.copsyc.2018.07.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/05/2018] [Accepted: 07/16/2018] [Indexed: 12/12/2022]
Abstract
Gene expression determined by the genome mediating a response to cell environment. Genetic variation results in distinct individual response in gene expression. Non-coding DNA is an important site for such functional genetic variation. Gene expression is a major modulator of brain chemistry and thus behavior.
Over 98% of our genome is non-coding and is now recognised to have a major role in orchestrating the tissue specific and stimulus inducible gene expression pattern which underpins our wellbeing and mental health. The non-coding genome responds functionally to our environment at all levels, encompassing the span from psychological to physiological challenge. The gene expression pattern, termed the transcriptome, ultimately gives us our neurochemistry. Therefore a major modulator of mental wellbeing is how our genes are regulated in response to life experiences. Superimposed on the aforementioned non-coding DNA framework is a vast body of genetic variation in the elements that control response to challenges. These differences, termed polymorphisms, allow for a differential response from a specific DNA element to the same challenge thus potentially allowing ‘individuality’ in the modulation of our transcriptome. This review will focus on a fundamental mechanism defining our psychological and psychiatric wellbeing, namely how genetic variation can be correlated with differential gene expression in response to specific challenges, thus resulting in altered neurochemistry which consequently may shape behaviour.
Affiliation(s)
- John P Quinn
- Department of Molecular and Clinical Pharmacology, Institute of Translational Medicine, The University of Liverpool, Liverpool L69 3BX, UK.
| | - Abigail L Savage
- Department of Molecular and Clinical Pharmacology, Institute of Translational Medicine, The University of Liverpool, Liverpool L69 3BX, UK
| | - Vivien J Bubb
- Department of Molecular and Clinical Pharmacology, Institute of Translational Medicine, The University of Liverpool, Liverpool L69 3BX, UK
|
34
|
Press MO, McCoy RC, Hall AN, Akey JM, Queitsch C. Massive variation of short tandem repeats with functional consequences across strains of Arabidopsis thaliana. Genome Res 2018; 28:1169-1178. [PMID: 29970452 PMCID: PMC6071631 DOI: 10.1101/gr.231753.117] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/01/2017] [Accepted: 06/26/2018] [Indexed: 11/24/2022]
Abstract
Short tandem repeat (STR) mutations may comprise more than half of the mutations in eukaryotic coding DNA, yet STR variation is rarely examined as a contributor to complex traits. We assessed this contribution across a collection of 96 strains of Arabidopsis thaliana, genotyping 2046 STR loci each, using highly parallel STR sequencing with molecular inversion probes. We found that 95% of examined STRs are polymorphic, with a median of six alleles per STR across these strains. STR expansions (large copy number increases) are found in most strains, several of which have evident functional effects. These include three of six intronic STR expansions we found to be associated with intron retention. Coding STRs were depleted of variation relative to noncoding STRs, and we detected a total of 56 coding STRs (11%) showing low variation consistent with the action of purifying selection. In contrast, some STRs show hypervariable patterns consistent with diversifying selection. Finally, we detected 133 novel STR-phenotype associations under stringent criteria, most of which could not be detected with SNPs alone, and validated some with follow-up experiments. Our results support the conclusion that STRs constitute a large, unascertained reservoir of functionally relevant genomic variation.
Affiliation(s)
- Maximilian O Press
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Rajiv C McCoy
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Ashley N Hall
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.,Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
| | - Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
35
|
Nazaripanah N, Adelirad F, Delbari A, Sahaf R, Abbasi-Asl T, Ohadi M. Genome-scale portrait and evolutionary significance of human-specific core promoter tri- and tetranucleotide short tandem repeats. Hum Genomics 2018; 12:17. [PMID: 29622039 PMCID: PMC5887250 DOI: 10.1186/s40246-018-0149-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/19/2017] [Accepted: 03/20/2018] [Indexed: 03/05/2023] Open
Abstract
BACKGROUND While there is an ongoing trend to identify single nucleotide substitutions (SNSs) that are linked to inter/intra-species differences and disease phenotypes, short tandem repeats (STRs)/microsatellites may be of equal (if not more) importance in the above processes. Genes that contain STRs in their promoters have higher expression divergence compared to genes with fixed or no STRs in the gene promoters. In line with the above, recent reports indicate a role of repetitive sequences in the rise of young transcription start sites (TSSs) in human evolution. RESULTS Following a comparative genomics study of all human protein-coding genes annotated in the GeneCards database, here we provide a genome-scale portrait of human-specific short- and medium-size (≥ 3-repeats) tri- and tetranucleotide STRs and STR motifs in the critical core promoter region between - 120 and + 1 to the TSS and evidence of skewing of this compartment in reference to the STRs that are not human-specific (Levene's test p < 0.001). Twenty-five percent and 26% enrichment of human-specific transcripts was detected in the tri and tetra human-specific compartments (mid-p < 0.00002 and mid-p < 0.002, respectively). CONCLUSION Our findings provide the first evidence of genome-scale skewing of STRs at a specific region of the human genome and a link between a number of these STRs and TSS selection/transcript specificity. The STRs and genes listed here may have a role in the evolution and development of characteristics and phenotypes that are unique to the human species.
Affiliation(s)
- N Nazaripanah
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - F Adelirad
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - A Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - R Sahaf
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - T Abbasi-Asl
- Department of Biostatistics, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - M Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
| |
|
36
|
Li C, Lenhard B, Luscombe NM. Integrated analysis sheds light on evolutionary trajectories of young transcription start sites in the human genome. Genome Res 2018; 28:676-688. [PMID: 29618487 PMCID: PMC5932608 DOI: 10.1101/gr.231449.117] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 10/19/2017] [Accepted: 03/21/2018] [Indexed: 01/06/2023]
Abstract
Understanding the molecular mechanisms and evolution of the gene regulatory system remains a major challenge in biology. Transcription start sites (TSSs) are especially interesting because they are central to initiating gene expression. Previous studies revealed widespread transcription initiation and fast turnover of TSSs in mammalian genomes. Yet, how new TSSs originate and how they evolve over time remain poorly understood. To address these questions, we analyzed ∼200,000 human TSSs by integrating evolutionary (inter- and intra-species) and functional genomic data, particularly focusing on evolutionarily young TSSs that emerged in the primate lineage. TSSs were grouped according to their evolutionary age using sequence alignment information as a proxy. Comparisons of young and old TSSs revealed that (1) new TSSs emerge through a combination of intrinsic factors, like the sequence properties of transposable elements and tandem repeats, and extrinsic factors such as their proximity to existing regulatory modules; (2) new TSSs undergo rapid evolution that reduces the inherent instability of repeat sequences associated with a high propensity of TSS emergence; and (3) once established, the transcriptional competence of surviving TSSs is gradually enhanced, with evolutionary changes subject to temporal (fewer regulatory changes in younger TSSs) and spatial constraints (fewer regulatory changes in more isolated TSSs). These findings advance our understanding of how regulatory innovations arise in the genome throughout evolution and highlight the genomic robustness and evolvability in these processes.
Affiliation(s)
- Cai Li
- The Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Boris Lenhard
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, United Kingdom.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom.,Sars International Centre for Marine Molecular Biology, University of Bergen, N-5008 Bergen, Norway
| | - Nicholas M Luscombe
- The Francis Crick Institute, London NW1 1AT, United Kingdom.,UCL Genetics Institute, University College London, London WC1E 6BT, United Kingdom.,Okinawa Institute of Science & Technology Graduate University, Okinawa, 904-0495, Japan
| |
|
37
|
Dahlhaus R. Of Men and Mice: Modeling the Fragile X Syndrome. Front Mol Neurosci 2018; 11:41. [PMID: 29599705 PMCID: PMC5862809 DOI: 10.3389/fnmol.2018.00041] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 11/23/2017] [Accepted: 01/31/2018] [Indexed: 12/26/2022]
Abstract
The Fragile X Syndrome (FXS) is one of the most common forms of inherited intellectual disability in all human societies. Caused by the transcriptional silencing of a single gene, the fragile X mental retardation gene FMR1, FXS is characterized by a variety of symptoms, which range from mental disabilities to autism and epilepsy. More than 20 years ago, a first animal model was described, the Fmr1 knock-out mouse. Several other models have been developed since then, including conditional knock-out mice, knock-out rats, a zebrafish and a Drosophila model. Using these model systems, various targets for potential pharmaceutical treatments have been identified and many treatments have been shown to be efficient in preclinical studies. However, all attempts to turn these findings into a therapy for patients have failed thus far. In this review, I will discuss underlying difficulties and address potential alternatives for our future research.
Affiliation(s)
- Regina Dahlhaus
- Institute for Biochemistry, Emil-Fischer Centre, University of Erlangen-Nürnberg, Erlangen, Germany
| |
|
38
|
Abstract
Accumulating evidence suggests that many classes of DNA repeats exhibit attributes that distinguish them from other genetic variants, including the fact that they are more liable to mutation; this enables them to mediate genetic plasticity. The expansion of tandem repeats, particularly of short tandem repeats, can cause a range of disorders (including Huntington disease, various ataxias, motor neuron disease, frontotemporal dementia, fragile X syndrome and other neurological disorders), and emerging data suggest that tandem repeat polymorphisms (TRPs) can also regulate gene expression in healthy individuals. TRPs in human genomes may also contribute to the missing heritability of polygenic disorders. A better understanding of tandem repeats and their associated repeatome, as well as their capacity for genetic plasticity via both germline and somatic mutations, is needed to transform our understanding of the role of TRPs in health and disease.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne.,Department of Anatomy and Neuroscience, University of Melbourne, Parkville, Victoria, Australia
| |
|
39
|
|
40
|
Bagshaw AT. Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 2017; 9:2428-2443. [PMID: 28957459 PMCID: PMC5622345 DOI: 10.1093/gbe/evx164] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Accepted: 08/23/2017] [Indexed: 02/06/2023]
Abstract
Microsatellite repeat DNA is best known for its length mutability, which is implicated in several neurological diseases and cancers, and often exploited as a genetic marker. Less well-known is the body of work exploring the widespread and surprisingly diverse functional roles of microsatellites. Recently, emerging evidence includes the finding that normal microsatellite polymorphism contributes substantially to the heritability of human gene expression on a genome-wide scale, calling attention to the task of elucidating the mechanisms involved. At present, these are underexplored, but several themes have emerged. I review evidence demonstrating roles for microsatellites in modulation of transcription factor binding, spacing between promoter elements, enhancers, cytosine methylation, alternative splicing, mRNA stability, selection of transcription start and termination sites, unusual structural conformations, nucleosome positioning and modification, higher order chromatin structure, noncoding RNA, and meiotic recombination hot spots.
|
41
|
Sousa AMM, Meyer KA, Santpere G, Gulden FO, Sestan N. Evolution of the Human Nervous System Function, Structure, and Development. Cell 2017; 170:226-247. [PMID: 28708995 DOI: 10.1016/j.cell.2017.06.036] [Citation(s) in RCA: 236] [Impact Index Per Article: 33.7] [Received: 07/08/2016] [Revised: 04/21/2017] [Accepted: 06/22/2017] [Indexed: 12/22/2022]
Abstract
The nervous system, in particular the brain and its cognitive abilities, is among humans' most distinctive and impressive attributes. How the nervous system has changed in the human lineage and how it differs from that of closely related primates is not well understood. Here, we consider recent comparative analyses of extant species that are uncovering new evidence for evolutionary changes in the size and the number of neurons in the human nervous system, as well as the cellular and molecular reorganization of its neural circuits. We also discuss the developmental mechanisms and underlying genetic and molecular changes that generate these structural and functional differences. As relevant new information and tools materialize at an unprecedented pace, the field is now ripe for systematic and functionally relevant studies of the development and evolution of human nervous system specializations.
Affiliation(s)
- André M M Sousa
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
| | - Kyle A Meyer
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
| | - Gabriel Santpere
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
| | - Forrest O Gulden
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
| | - Nenad Sestan
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA; Department of Genetics, Yale School of Medicine, New Haven, CT, USA; Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA; Section of Comparative Medicine, Yale School of Medicine, New Haven, CT, USA; Program in Cellular Neuroscience, Neurodegeneration and Repair, Yale School of Medicine, New Haven, CT, USA; Yale Child Study Center, Yale School of Medicine, New Haven, CT, USA; Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT, USA.
| |
|
42
|
Ishiguro T, Sato N, Ueyama M, Fujikake N, Sellier C, Kanegami A, Tokuda E, Zamiri B, Gall-Duncan T, Mirceta M, Furukawa Y, Yokota T, Wada K, Taylor JP, Pearson CE, Charlet-Berguerand N, Mizusawa H, Nagai Y, Ishikawa K. Regulatory Role of RNA Chaperone TDP-43 for RNA Misfolding and Repeat-Associated Translation in SCA31. Neuron 2017; 94:108-124.e7. [PMID: 28343865 PMCID: PMC5681996 DOI: 10.1016/j.neuron.2017.02.046] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 09/09/2015] [Revised: 10/04/2016] [Accepted: 02/24/2017] [Indexed: 12/20/2022]
Abstract
Microsatellite expansion disorders are pathologically characterized by RNA foci formation and repeat-associated non-AUG (RAN) translation. However, their underlying pathomechanisms and regulation of RAN translation remain unknown. We report that expression of expanded UGGAA (UGGAAexp) repeats, responsible for spinocerebellar ataxia type 31 (SCA31) in Drosophila, causes neurodegeneration accompanied by accumulation of UGGAAexp RNA foci and translation of repeat-associated pentapeptide repeat (PPR) proteins, consistent with observations in SCA31 patient brains. We revealed that motor-neuron disease (MND)-linked RNA-binding proteins (RBPs), TDP-43, FUS, and hnRNPA2B1, bind to and induce structural alteration of UGGAAexp. These RBPs suppress UGGAAexp-mediated toxicity in Drosophila by functioning as RNA chaperones for proper UGGAAexp folding and regulation of PPR translation. Furthermore, nontoxic short UGGAA repeat RNA suppressed mutated RBP aggregation and toxicity in MND Drosophila models. Thus, functional crosstalk of the RNA/RBP network regulates their own quality and balance, suggesting convergence of pathomechanisms in microsatellite expansion disorders and RBP proteinopathies.
Affiliation(s)
- Taro Ishiguro
- Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan; Center for Brain Integration Research (CBIR), Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan; Department of Degenerative Neurological Diseases, National Institute of Neuroscience, National Center of Neurology and Psychiatry, 4-1-1 Ogawa-Higashi, Kodaira, Tokyo 187-8502, Japan
| | - Nozomu Sato
- Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan; Center for Brain Integration Research (CBIR), Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan
| | - Morio Ueyama
- Department of Degenerative Neurological Diseases, National Institute of Neuroscience, National Center of Neurology and Psychiatry, 4-1-1 Ogawa-Higashi, Kodaira, Tokyo 187-8502, Japan; Department of Neurotherapeutics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, Osaka 565-0871, Japan
| | - Nobuhiro Fujikake
- Department of Degenerative Neurological Diseases, National Institute of Neuroscience, National Center of Neurology and Psychiatry, 4-1-1 Ogawa-Higashi, Kodaira, Tokyo 187-8502, Japan
| | - Chantal Sellier
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, University of Strasbourg, Illkirch 67400, France
| | - Akemi Kanegami
- Research Institute of Biomolecule Metrology, 807-133 Enokido, Tsukuba, Ibaraki 305-0853, Japan
| | - Eiichi Tokuda
- Department of Chemistry, Keio University, 3-14-1 Hiyoshi, Yokohama, Kanagawa 223-8522, Japan
| | - Bita Zamiri
- Department of Pharmaceutical Sciences, Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada; Department of Genetics, The Hospital for Sick Children, Peter Gilgan Centre for Research and Learning, 686 Bay Street, Toronto, ON M5G 0A4, Canada
| | - Terence Gall-Duncan
- Department of Genetics, The Hospital for Sick Children, Peter Gilgan Centre for Research and Learning, 686 Bay Street, Toronto, ON M5G 0A4, Canada; Program of Molecular Genetics, University of Toronto, Toronto, ON M5G 0A4, Canada
| | - Mila Mirceta
- Department of Genetics, The Hospital for Sick Children, Peter Gilgan Centre for Research and Learning, 686 Bay Street, Toronto, ON M5G 0A4, Canada; Program of Molecular Genetics, University of Toronto, Toronto, ON M5G 0A4, Canada
| | - Yoshiaki Furukawa
- Department of Chemistry, Keio University, 3-14-1 Hiyoshi, Yokohama, Kanagawa 223-8522, Japan
| | - Takanori Yokota
- Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan; Center for Brain Integration Research (CBIR), Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan
| | - Keiji Wada
- Department of Degenerative Neurological Diseases, National Institute of Neuroscience, National Center of Neurology and Psychiatry, 4-1-1 Ogawa-Higashi, Kodaira, Tokyo 187-8502, Japan
| | - J Paul Taylor
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Christopher E Pearson
- Department of Genetics, The Hospital for Sick Children, Peter Gilgan Centre for Research and Learning, 686 Bay Street, Toronto, ON M5G 0A4, Canada; Program of Molecular Genetics, University of Toronto, Toronto, ON M5G 0A4, Canada
| | - Nicolas Charlet-Berguerand
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, University of Strasbourg, Illkirch 67400, France
| | - Hidehiro Mizusawa
- Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan; Center for Brain Integration Research (CBIR), Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan
| | - Yoshitaka Nagai
- Department of Degenerative Neurological Diseases, National Institute of Neuroscience, National Center of Neurology and Psychiatry, 4-1-1 Ogawa-Higashi, Kodaira, Tokyo 187-8502, Japan; Department of Neurotherapeutics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, Osaka 565-0871, Japan.
| | - Kinya Ishikawa
- Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan; Center for Brain Integration Research (CBIR), Tokyo Medical and Dental University (TMDU), Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan; Center for Personalized Medicine for Healthy Aging, Tokyo Medical and Dental University, Yushima 1-5-45, Bunkyo-ku, Tokyo 113-8519, Japan.
| |
|
43
|
Bagshaw ATM, Horwood LJ, Fergusson DM, Gemmell NJ, Kennedy MA. Microsatellite polymorphisms associated with human behavioural and psychological phenotypes including a gene-environment interaction. BMC Med Genet 2017; 18:12. [PMID: 28158988 PMCID: PMC5291968 DOI: 10.1186/s12881-017-0374-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/16/2016] [Accepted: 01/25/2017] [Indexed: 02/05/2023]
Abstract
Background The genetic and environmental influences on human personality and behaviour are a complex matter of ongoing debate. Accumulating evidence indicates that short tandem repeats (STRs) in regulatory regions are good candidates to explain heritability not accessed by genome-wide association studies. Methods We tested for associations between the genotypes of four selected repeats and 18 traits relating to personality, behaviour, cognitive ability and mental health in a well-studied longitudinal birth cohort (n = 458-589) using one way analysis of variance. The repeats were a highly conserved poly-AC microsatellite in the upstream promoter region of the T-box brain 1 (TBR1) gene and three previously studied STRs in the activating enhancer-binding protein 2-beta (AP2-β) and androgen receptor (AR) genes. Where significance was found we used multiple regression to assess the influence of confounding factors. Results Carriers of the shorter, most common, allele of the AR gene’s GGN microsatellite polymorphism had fewer anxiety-related symptoms, which was consistent with previous studies, but in our study this was not significant following Bonferroni correction. No associations with two repeats in the AP2-β gene withstood this correction. A novel finding was that carriers of the minor allele of the TBR1 AC microsatellite were at higher risk of conduct problems in childhood at age 7-9 (p = 0.0007, which did pass Bonferroni correction). Including maternal smoking during pregnancy (MSDP) in models controlling for potentially confounding influences showed that an interaction between TBR1 genotype and MSDP was a significant predictor of conduct problems in childhood and adolescence (p < 0.001), and of self-reported criminal behaviour up to age 25 years (p ≤ 0.02). This interaction remained significant after controlling for possible confounders including maternal age at birth, socio-economic status and education, and offspring birth weight. 
Conclusions The potential functional importance of the TBR1 gene’s promoter microsatellite deserves further investigation. Our results suggest that it participates in a gene-environment interaction with MSDP and antisocial behaviour. However, previous evidence that mothers who smoke during pregnancy carry genes for antisocial behaviour suggests that epistasis may influence the interaction.
Affiliation(s)
- Andrew T M Bagshaw
- Department of Pathology, University of Otago, Christchurch, PO Box 4345, Christchurch, New Zealand.
| | - L John Horwood
- Department of Psychological Medicine, University of Otago, Christchurch, New Zealand
| | - David M Fergusson
- Department of Psychological Medicine, University of Otago, Christchurch, New Zealand
| | - Neil J Gemmell
- Department of Anatomy, University of Otago, Dunedin, New Zealand.,Gravida - National Centre for Growth and Development, University of Otago, Dunedin, New Zealand
| | - Martin A Kennedy
- Department of Pathology, University of Otago, Christchurch, PO Box 4345, Christchurch, New Zealand
| |
|
44
|
Tørresen OK, Star B, Jentoft S, Reinar WB, Grove H, Miller JR, Walenz BP, Knight J, Ekholm JM, Peluso P, Edvardsen RB, Tooming-Klunderud A, Skage M, Lien S, Jakobsen KS, Nederbragt AJ. An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics 2017; 18:95. [PMID: 28100185 PMCID: PMC5241972 DOI: 10.1186/s12864-016-3448-x] [Citation(s) in RCA: 115] [Impact Index Per Article: 16.4] [Received: 07/27/2016] [Accepted: 12/20/2016] [Indexed: 01/06/2023]
Abstract
BACKGROUND The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enable the generation of more contiguous genome assemblies. RESULTS By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. CONCLUSIONS The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.
Affiliation(s)
- Ole K. Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
| | - Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
| | - Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Department of Natural Sciences, University of Agder, Kristiansand, NO-4604 Norway
| | - William B. Reinar
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
| | - Harald Grove
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, NO-1432 Norway
| | - Jason R. Miller
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, 20850 MD USA
| | - Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, 20892 MD USA
| | - James Knight
- Yale School of Medicine, Yale University, New Haven, 06520 CT USA
| | - Ave Tooming-Klunderud
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
| | - Morten Skage
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
| | - Sigbjørn Lien
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, NO-1432 Norway
| | - Kjetill S. Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
| | - Alexander J. Nederbragt
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Biomedical Informatics Research Group, Department of Informatics, University of Oslo, Oslo, NO-0316 Norway
| |
|
45
|
Chen HY, Ma SL, Huang W, Ji L, Leung VHK, Jiang H, Yao X, Tang NLS. The mechanism of transactivation regulation due to polymorphic short tandem repeats (STRs) using IGF1 promoter as a model. Sci Rep 2016; 6:38225. [PMID: 27910883 PMCID: PMC5133613 DOI: 10.1038/srep38225] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/08/2016] [Accepted: 11/07/2016] [Indexed: 11/09/2022]
Abstract
Functional short tandem repeats (STR) are polymorphic in the population, and the number of repeats regulates the expression of nearby genes (known as expression STR, eSTR). STR in IGF1 promoter has been extensively studied for its association with IGF1 concentration in blood and various clinical traits and represents an important eSTR. We previously used an in-vitro luciferase reporter model to examine the interaction between STRs and SNPs in IGF1 promoter. Here, we further explored the mechanism how the number of repeats of the STR regulates gene transcription. An inverse correlation between the number of repeats and the extent of transactivation was found in a haplotype consisting of three promoter SNPs (C-STR-T-T). We showed that these adjacent SNPs located outside the STR were required for the STR to function as eSTR. The C allele of rs35767 provides a binding site for CCAAT/enhancer-binding-protein δ (C/EBPD), which is essential for the gradational transactivation property of eSTR and FOXA3 may also be involved. Therefore, we propose a mechanism in which the gradational transactivation by the eSTR is caused by the interaction of one or more transcriptional complexes located outside the STR, rather than by direct binding to a repeat motif of the STR.
Affiliation(s)
- Holly Y Chen
- Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
| | - Suk Ling Ma
- Department of Psychiatry, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
| | - Wei Huang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of Pharmaceutics, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Lindan Ji
- Department of Biochemistry and Molecular Biology, Zhejiang Provincial Key Laboratory of Pathophysiology, Ningbo University School of Medicine, Ningbo, China
| | - Vincent H K Leung
- Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
| | - Honglin Jiang
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
| | - Xiaoqiang Yao
- School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China
| | - Nelson L S Tang
- Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.,School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China.,Laboratory of Genetics of Disease Susceptibility, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.,Functional Genomics and Biostatistical Computing laboratory, Shenzhen Research Institute, The Chinese University of Hong Kong, China.,KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China
| |
|
46
|
Huang Y, Chen SY, Deng F. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction. Comput Struct Biotechnol J 2016; 14:298-303. [PMID: 27536341 PMCID: PMC4975701 DOI: 10.1016/j.csbj.2016.07.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/23/2016] [Revised: 07/06/2016] [Accepted: 07/12/2016] [Indexed: 12/31/2022]
Abstract
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have largely facilitated our understanding on a variety of biological questions. Although the computational prediction of protein-coding genes has already been well-established, we are also facing challenges to robustly find the non-coding RNA genes, such as miRNA and lncRNA. Two main aspects of ab initio gene prediction include the computed values for describing sequence features and used algorithm for training the discriminant function, and by which different combinations are employed into various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and applications to ab initio gene prediction. The main purpose of this article is to provide an overview to beginners who aim to develop the related bioinformatic tools.
Affiliation(s)
- Ying Huang
- College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China
| | - Shi-Yi Chen
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Corresponding author at: Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, 211# Huimin Road, Wenjiang 611130, Sichuan, China.
| | - Feilong Deng
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
| |
|
47
|
Quilez J, Guilmatre A, Garg P, Highnam G, Gymrek M, Erlich Y, Joshi RS, Mittelman D, Sharp AJ. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res 2016; 44:3750-62. [PMID: 27060133 PMCID: PMC4857002 DOI: 10.1093/nar/gkw219] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 11/16/2015] [Accepted: 03/22/2016] [Indexed: 01/23/2023]
Abstract
Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on neighboring gene expression and DNA methylation. Polymorphic promoter TRs were associated with increased variance in local gene expression and DNA methylation, suggesting functional consequences related to TR variation. We identified >100 TRs associated with expression/methylation levels of adjacent genes. These potential eQTL/mQTL TRs were enriched for overlaps with transcription factor binding and DNaseI hypersensitivity sites, providing a rationale for their effects. Moreover, we showed that most TR variants are poorly tagged by nearby single nucleotide polymorphisms (SNPs) markers, indicating that many functional TR variants are not effectively assayed by SNP-based approaches. Our study assigns biological significance to TR variations in the human genome, and suggests that a significant fraction of TR variations exert functional effects via alterations of local gene expression or epigenetics. We conclude that targeted studies that focus on genotyping TR variants are required to fully ascertain functional variation in the genome.
Affiliation(s)
- Javier Quilez
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Audrey Guilmatre
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Paras Garg
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Gareth Highnam
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Melissa Gymrek
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - New York Genome Center, New York, NY 10038, USA
- Yaniv Erlich
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA
  - Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA
- Ricky S Joshi
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- David Mittelman
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Andrew J Sharp
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
|
48
|
Abundant contribution of short tandem repeats to gene expression variation in humans. Nat Genet 2015; 48:22-9. [PMID: 26642241 DOI: 10.1038/ng.3461] [Citation(s) in RCA: 230] [Impact Index Per Article: 25.6] [Received: 08/17/2015] [Accepted: 11/12/2015] [Indexed: 12/16/2022]
Abstract
The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10-15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.
|
49
|
Tandem repeats and divergent gene expression. Nat Rev Genet 2015. [DOI: 10.1038/nrg4040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
50
|
Sonay TB, Koletou M, Wagner A. A survey of tandem repeat instabilities and associated gene expression changes in 35 colorectal cancers. BMC Genomics 2015; 16:702. [PMID: 26376692 PMCID: PMC4574073 DOI: 10.1186/s12864-015-1902-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 02/10/2015] [Accepted: 09/09/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Colorectal cancer is a major contributor to cancer morbidity and mortality. Tandem repeat instability and its effect on cancer phenotypes remain so far poorly studied on a genome-wide scale. RESULTS: Here we analyze the genomes of 35 colorectal tumors and their matched normal (healthy) tissues for two types of tandem repeat instability, de-novo repeat gain or loss and repeat copy number variation. Specifically, we study for the first time genome-wide repeat instability in the promoters and exons of 18,439 genes, and examine the association of repeat instability with genome-scale gene expression levels. We find that tumors with a microsatellite instable (MSI) phenotype are enriched in genes with repeat instability, and that tumor genomes have significantly more genes with repeat instability compared to healthy tissues. Genes in tumor genomes with repeat instability in their promoters are significantly less expressed and show slightly higher levels of methylation. Genes in well-studied cancer-associated signaling pathways also contain significantly more unstable repeats in tumor genomes. Genes with such unstable repeats in the tumor-suppressor p53 pathway have lower expression levels, whereas genes with repeat instability in the MAPK and Wnt signaling pathways are expressed at higher levels, consistent with the oncogenic role they play in cancer. CONCLUSIONS: Our results suggest that repeat instability in gene promoters and associated differential gene expression may play an important role in colorectal tumors, which is a first step towards the development of more effective molecular diagnostic approaches centered on repeat instability.
Affiliation(s)
- Tugce Bilgin Sonay
  - Anthropological Institute and Museum, University of Zurich, Zurich, Switzerland.
  - Institute of Evolutionary Biology and Environmental Sciences, University of Zurich, Zurich, Switzerland.
- Andreas Wagner
  - Institute of Evolutionary Biology and Environmental Sciences, University of Zurich, Zurich, Switzerland.
  - The Swiss Institute of Bioinformatics, Lausanne, Switzerland.
  - The Santa Fe Institute, Santa Fe, NM, United States of America.
|