1
Hu Z, Lin G, Zhang M, Piao S, Fan J, Liu J, Liu P, Fu S, Sun W, Li L, Qiu X, Zhang J, Yang Y, Zhou C. Mechanistic Characterization of De Novo Generation of Variable Number Tandem Repeats in Circular Plasmids during Site-Directed Mutagenesis and Optimization for Coding Gene Application. Adv Biol (Weinh) 2024; 8:e2400084. [PMID: 38880850 DOI: 10.1002/adbi.202400084] [Received: 02/21/2024] [Revised: 04/21/2024] [Indexed: 06/18/2024]
Abstract
Site-directed mutagenesis for creating point mutations sometimes gives rise to plasmids carrying local variable number tandem repeats (VNTRs), which are arbitrarily regarded as polymerase chain reaction (PCR) artifacts. Here, it is reported that the alternative end-joining mechanism, rather than PCR artifacts, largely accounts for VNTR formation and expansion. While generating a point mutation in the GPLD1 gene of the pcDNA3.1-GPLD1 plasmid, an unexpected formation of VNTRs that use the 31 bp mutagenesis primer as the repeat unit is observed. The 31 bp VNTRs are formed in 24.75% of the resulting clones, with copy numbers ranging from 2 to 13. All repeat units are aligned in the same orientation as the GPLD1 gene. Of the repeat junctions, 43.54% harbor nucleotide mutations, while the rest do not. Short primers spanning the 3' part of the mutagenesis primers are shown to be essential for the initial creation of 2-copy tandem repeats (TRs) in circular plasmids. Dimerization of the mutagenesis primers by alternative end-joining in the correct orientation is required for further expansion of the 2-copy TRs. Lastly, a half-double priming strategy is established, which verifies these findings and offers a simple method for creating VNTRs on coding genes in circular plasmids without junction mutations.
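The copy-number counts above (2 to 13 tandem copies of a 31 bp unit, all in the GPLD1 orientation) amount to finding the longest head-to-tail run of one unit on the forward strand. The Python sketch below is a minimal illustration under stated assumptions: the helper `count_tandem_copies` and the unit sequence are hypothetical, only exact matches are counted, and a junction carrying a nucleotide mutation therefore ends the run, so mutated clones would need approximate matching instead.

```python
def count_tandem_copies(seq: str, unit: str) -> int:
    """Return the largest number of exact, head-to-tail copies of `unit` in `seq`.

    Assumption: only same-orientation (forward-strand) copies are counted, and a
    mutated junction breaks the run, so this is a lower bound for such clones.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    k = len(unit)
    best = 0
    start = seq.find(unit)
    while start != -1:
        copies = 0
        pos = start
        # Step forward one unit length at a time while the next unit matches exactly.
        while seq.startswith(unit, pos):
            copies += 1
            pos += k
        best = max(best, copies)
        start = seq.find(unit, start + 1)
    return best


# Hypothetical 31 bp unit for illustration; not the primer sequence used in the study.
UNIT = "ATGCGTACCTTAGGCAATCGTTACGGATCCA"
print(count_tandem_copies("TTT" + UNIT * 3 + "GGG", UNIT))  # 3
```

Inserting a single extra base between two copies (`UNIT + "T" + UNIT`) drops the count to 1, which mirrors why junction mutations complicate copy-number calls.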
Affiliation(s)
- Ziqi Hu
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
- Guochao Lin
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
- Mingzhu Zhang
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
- Shengwen Piao
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
- Jiankun Fan
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
- Jichao Liu
  - The Second Affiliated Hospital, Harbin Medical University, Harbin, 150001, China
- Peng Liu
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
- Songbin Fu
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control in China, Harbin Medical University, Ministry of Education, China
- Wenjing Sun
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control in China, Harbin Medical University, Ministry of Education, China
- Li Li
  - The Second Affiliated Hospital, Harbin Medical University, Harbin, 150001, China
- Xiaohong Qiu
  - The Second Affiliated Hospital, Harbin Medical University, Harbin, 150001, China
- Jinwei Zhang
  - The Second Affiliated Hospital, Harbin Medical University, Harbin, 150001, China
- Yu Yang
  - The Second Affiliated Hospital, Harbin Medical University, Harbin, 150001, China
- Chunshui Zhou
  - The Laboratory of Medical Genetics, Harbin Medical University, Harbin, 150081, China
  - The Second Affiliated Hospital, Harbin Medical University, Harbin, 150001, China
  - Key Laboratory of Preservation of Human Genetic Resources and Disease Control in China, Harbin Medical University, Ministry of Education, China
2
He H, Leng Y, Cao X, Zhu Y, Li X, Yuan Q, Zhang B, He W, Wei H, Liu X, Xu Q, Guo M, Zhang H, Yang L, Lv Y, Wang X, Shi C, Zhang Z, Chen W, Zhang B, Wang T, Yu X, Qian H, Zhang Q, Dai X, Liu C, Cui Y, Wang Y, Zheng X, Xiong G, Zhou Y, Qian Q, Shang L. The pan-tandem repeat map highlights multiallelic variants underlying gene expression and agronomic traits in rice. Nat Commun 2024; 15:7291. [PMID: 39181885 PMCID: PMC11344853 DOI: 10.1038/s41467-024-51854-0] [Received: 06/23/2023] [Accepted: 08/20/2024] [Indexed: 08/27/2024]
Abstract
Tandem repeats (TRs) are genomic regions that vary in the number of tandemly repeated units and are often multiallelic. Their characteristics and contributions to gene expression and quantitative traits in rice are largely unknown. Here, we survey rice TR variations based on 231 genome assemblies and the rice pan-genome graph. We identify 227,391 multiallelic TR loci, including 54,416 TR variations that are absent from the Nipponbare reference genome. Only one-third of TR variations show strong linkage with nearby bi-allelic variants (SNPs, indels and PAVs). Using transcriptomic data from 193 panicle and 202 leaf samples, we show that 485 and 511 TRs, respectively, act as QTLs for nearby gene expression independently of other bi-allelic variations. Using plant height and grain width as examples, we identify and validate TR contributions to variation in rice agronomic traits. These findings should enhance our understanding of the functions of multiallelic variants and facilitate rice molecular breeding.
Affiliation(s)
- Huiying He
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yue Leng
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xinglan Cao
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng, 475004, China
  - Shenzhen Research Institute of Henan University, Shenzhen, 518000, China
- Yiwang Zhu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - Institute of Biotechnology, Fujian Academy of Agricultural Sciences/Fujian Provincial Key Laboratory of Genetic Engineering for Agriculture, Fuzhou, 350003, China
- Xiaoxia Li
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiaoling Yuan
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Bin Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - Yazhouwan National Laboratory, Sanya, 572024, China
- Wenchuang He
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hua Wei
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiangpei Liu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiang Xu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Mingliang Guo
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hong Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Longbo Yang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yang Lv
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xianmeng Wang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Chuanlin Shi
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Zhipeng Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Wu Chen
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Bintao Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Tianyi Wang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaoman Yu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hongge Qian
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qianqian Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaofan Dai
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Congcong Liu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yan Cui
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yuexing Wang
  - State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Xiaoming Zheng
  - National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, 100081, Beijing, China
- Guosheng Xiong
  - Academy for Advanced Interdisciplinary Studies, Plant Phenomics Research Center, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Yongfeng Zhou
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qian Qian
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - Yazhouwan National Laboratory, Sanya, 572024, China
  - State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Lianguang Shang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - Yazhouwan National Laboratory, Sanya, 572024, China
3
Ziaei Jam H, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. LongTR: genome-wide profiling of genetic variation at tandem repeats from long reads. Genome Biol 2024; 25:176. [PMID: 38965568 PMCID: PMC11229021 DOI: 10.1186/s13059-024-03319-2] [Received: 01/30/2024] [Accepted: 06/21/2024] [Indexed: 07/06/2024]
Abstract
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long-read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads generated by both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979.
Affiliation(s)
- Helyaneh Ziaei Jam
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD, USA
- Sara Javadzadeh
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Aarushi Sehgal
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
4
Hossam Abdelmonem B, Abdelaal NM, Anwer EKE, Rashwan AA, Hussein MA, Ahmed YF, Khashana R, Hanna MM, Abdelnaser A. Decoding the Role of CYP450 Enzymes in Metabolism and Disease: A Comprehensive Review. Biomedicines 2024; 12:1467. [PMID: 39062040 PMCID: PMC11275228 DOI: 10.3390/biomedicines12071467] [Received: 04/16/2024] [Revised: 06/13/2024] [Accepted: 06/14/2024] [Indexed: 07/28/2024]
Abstract
Cytochrome P450 (CYP450) is a group of enzymes that play an essential role in Phase I metabolism, with 57 functional genes classified into 18 families in the human genome, of which the CYP1, CYP2, and CYP3 families are prominent. Beyond drug metabolism, CYP enzymes metabolize endogenous compounds such as lipids, proteins, and hormones to maintain physiological homeostasis. Thus, dysregulation of CYP450 enzymes can lead to various endocrine disorders. Moreover, CYP450 enzymes contribute significantly to fatty acid metabolism, cholesterol synthesis, and bile acid biosynthesis, affecting cellular physiology and disease pathogenesis. Their diverse functions highlight their therapeutic potential in managing hypercholesterolemia and neurodegenerative diseases. Additionally, CYP450 enzymes are implicated in the onset and development of illnesses such as cancer, influencing chemotherapy outcomes. Assessment of CYP450 enzyme expression and activity aids in evaluating liver health and differentiating between liver diseases, guiding therapeutic decisions, and optimizing drug efficacy. Understanding the roles of CYP450 enzymes and the clinical effect of their genetic polymorphisms is crucial for developing personalized therapeutic strategies and enhancing drug responses in diverse patient populations.
Affiliation(s)
- Basma Hossam Abdelmonem
  - Institute of Global Health and Human Ecology, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
  - Department of Microbiology and Immunology, Faculty of Pharmacy, October University for Modern Sciences & Arts (MSA), Giza 12451, Egypt
- Noha M. Abdelaal
  - Biotechnology Graduate Program, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
- Eman K. E. Anwer
  - Biotechnology Graduate Program, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Modern University for Technology and Information, Cairo 4411601, Egypt
- Alaa A. Rashwan
  - Biotechnology Graduate Program, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
- Mohamed Ali Hussein
  - Institute of Global Health and Human Ecology, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
- Yasmin F. Ahmed
  - Institute of Global Health and Human Ecology, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
- Rana Khashana
  - Institute of Global Health and Human Ecology, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
- Mireille M. Hanna
  - Institute of Global Health and Human Ecology, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
- Anwar Abdelnaser
  - Institute of Global Health and Human Ecology, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt
5
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, particularly when mapping short reads to repetitive regions and inferring the lengths of expanded repeats. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
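The abstract defines STRs as runs of a 1-6 bp motif. As a hedged illustration of that definition only, the Python sketch below scans a sequence for perfect (uninterrupted) repeats, preferring the shortest period at each position. The function `find_perfect_strs`, the `STR` record and the threshold of three copies are illustrative assumptions; this is not the algorithm of any genotyping tool the review compares, which must also handle impure repeats, sequencing errors and reads that do not span the repeat.

```python
from dataclasses import dataclass

ACGT = set("ACGT")


@dataclass(frozen=True)
class STR:
    start: int   # 0-based start of the repeat tract
    motif: str   # repeat unit, 1-6 bp
    copies: int  # number of complete, uninterrupted copies


def find_perfect_strs(seq: str, min_copies: int = 3, max_motif: int = 6) -> list[STR]:
    """Greedy left-to-right scan for perfect tandem repeats of a 1..max_motif bp unit."""
    seq = seq.upper()
    hits: list[STR] = []
    i = 0
    while i < len(seq):
        hit = None
        for k in range(1, max_motif + 1):  # shortest period first
            motif = seq[i:i + k]
            # Stop at the end of the sequence or at ambiguous bases such as N.
            if len(motif) < k or not set(motif) <= ACGT:
                break
            copies = 1
            while seq.startswith(motif, i + copies * k):
                copies += 1
            if copies >= min_copies:
                hit = STR(i, motif, copies)
                break
        if hit is not None:
            hits.append(hit)
            i += hit.copies * len(hit.motif)  # resume scanning after the tract
        else:
            i += 1
    return hits


print(find_perfect_strs("TTCAGCAGCAGCAGGT"))
# [STR(start=2, motif='CAG', copies=4)]
```

Preferring the shortest period keeps a homopolymer such as `AAAAAA` from also being reported as `AA` repeated three times; the reported phase (for example `CAG` versus `AGC`) depends on where the scan enters the tract.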
Affiliation(s)
- Hope A Tanudisastro
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
6
Liang Y, Hao J, Wang J, Zhang G, Su Y, Liu Z, Wang T. Statistical Genomics Analysis of Simple Sequence Repeats from the Paphiopedilum Malipoense Transcriptome Reveals Control Knob Motifs Modulating Gene Expression. Adv Sci (Weinh) 2024; 11:e2304848. [PMID: 38647414 PMCID: PMC11200097 DOI: 10.1002/advs.202304848] [Received: 07/17/2023] [Revised: 02/26/2024] [Indexed: 04/25/2024]
Abstract
Simple sequence repeats (SSRs) are found in nonrandom distributions in genomes and are thought to affect gene expression. The distribution patterns of 48,295 SSRs of Paphiopedilum malipoense are mined and characterized based on the first full-length transcriptome and a comprehensive transcriptome dataset from 12 organs. Statistical genomics analyses are used to investigate how SSRs in transcripts affect gene expression. The results demonstrate correlations between SSR distributions, characteristics, and expression level. Nine expression-modulating motifs (expMotifs) are identified, and a model is proposed to explain the effect of their key features, potency, and gene function on an intra-transcribed-region scale. The expMotif-transcribed region combination is the predominant contributor to the expression-modulating effect of SSRs, and some intra-transcribed regions are critical for this effect. Genes containing the same type of expMotif-SSR elements in the same transcribed region are likely linked in function, regulation, or evolution. This study offers novel evidence to understand how SSRs regulate gene expression and provides potential regulatory elements for plant genetic engineering.
Affiliation(s)
- Yingyi Liang
  - College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Jing Hao
  - College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Jieyu Wang
  - College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, 510642, China
- Guoqiang Zhang
  - Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Yingjuan Su
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
  - Research Institute of Sun Yat-sen University in Shenzhen, Shenzhen, 518107, China
- Zhong-Jian Liu
  - Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Ting Wang
  - College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
7
Moya R, Wang X, Tsien RW, Maurano MT. Structural characterization of a polymorphic repeat at the CACNA1C schizophrenia locus. medRxiv 2024:2024.03.05.24303780. [PMID: 38798557 PMCID: PMC11118589 DOI: 10.1101/2024.03.05.24303780] [Indexed: 05/29/2024]
Abstract
Genetic variation within intron 3 of the CACNA1C calcium channel gene is associated with schizophrenia and bipolar disorder, but analysis of the causal variants and their effect is complicated by a nearby variable-number tandem repeat (VNTR). Here, we used 155 long-read genome assemblies from 78 diverse individuals to delineate the structure and population variability of the CACNA1C intron 3 VNTR. We categorized VNTR sequences into 7 Types of structural alleles using sequence differences among repeat units. Only 12 repeat units at the 5' end of the VNTR were shared across most Types, but several Types were related through a series of large and small duplications. The most diverged Types were rare and present only in individuals with African ancestry, but the multiallelic structural polymorphism Variable Region 2 (VR2) was present across populations at different frequencies, consistent with expansion of the VNTR preceding the emergence of early hominins. VR2 was in complete linkage disequilibrium with fine-mapped schizophrenia variants (SNPs) from genome-wide association studies (GWAS). This risk haplotype was associated with decreased CACNA1C gene expression in brain tissues profiled by the GTEx project. Our work suggests that sequence variation within a human-specific VNTR affects gene expression, and provides a detailed characterization of new alleles at a flagship neuropsychiatric locus.
Affiliation(s)
- Raquel Moya
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Xiaohan Wang
  - Neuroscience Institute, NYU School of Medicine, New York, NY 10016, USA
  - Department of Neuroscience and Physiology, New York University, New York, NY 10016, USA
- Richard W. Tsien
  - Neuroscience Institute, NYU School of Medicine, New York, NY 10016, USA
  - Department of Neuroscience and Physiology, New York University, New York, NY 10016, USA
- Matthew T. Maurano
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - Department of Pathology, NYU School of Medicine, New York, NY 10016, USA
8
Sirasangi MI, Roohi TF, Krishna KL, Kinattingal N, Wani SUD, Mehdi S. Dietary Co-supplements attenuate the chronic unpredictable mild stress-induced depression in mice. Behav Brain Res 2024; 459:114788. [PMID: 38036263 DOI: 10.1016/j.bbr.2023.114788] [Received: 07/10/2023] [Revised: 11/21/2023] [Accepted: 11/27/2023] [Indexed: 12/02/2023]
Abstract
Does what we eat make a difference to our mental health? Food and nutrients are essential not only for human biology and physical appearance but also for mental and emotional well-being. Reports of favourable effects of dietary supplements in the treatment of depression have increased significantly in recent years. Co-supplements could contribute substantially to the future management of depression and might help to lower the doses of standard antidepressant drugs, a strategic way to reduce their side effects. This study was designed to evaluate and compare the antidepressant effects of cholecalciferol-D3 (V.D3), n-3 polyunsaturated fatty acid (PUFA), and a combination of V.D3 + n-3 PUFA with fluoxetine treatment in a mouse model of chronic unpredictable mild stress (CUMS)-induced depression. We established a CUMS mouse model of depression and treated CUMS mice with V.D3, n-3 PUFA, and a combination of V.D3 + n-3 PUFA with fluoxetine. Behavioural changes were measured by the forced swim and tail suspension tests. Oxidative stress markers and antidepressant activity were assessed through parameters such as superoxide dismutase, reduced glutathione, lipid peroxidation, and serum corticosterone levels. Additionally, we measured the levels of the neurotransmitters dopamine and serotonin. CUMS-induced mice displayed depressive-like behaviours. Moreover, cholecalciferol-D3, n-3 PUFA, and a combination of cholecalciferol-D3 + n-3 PUFA with fluoxetine treatment attenuated the depressive-like behaviour in CUMS mice, accompanied by suppression of oxidative stress markers through up-regulated expression of an antioxidant signalling pathway. The results suggest that treatment with cholecalciferol-D3, n-3 PUFA, and a combination of cholecalciferol-D3 + n-3 PUFA with fluoxetine significantly ameliorated depressive-like behaviours in mice with CUMS-induced depression.
To delve further into the implications of these findings, future studies could explore the specific molecular mechanisms underlying the observed effects on oxidative stress markers and the antioxidant signaling pathway. This could provide valuable insights into the potential of dietary supplements in the management of depression and help in reducing the reliance on conventional antidepressant medications, thus improving the overall quality of treatment for this prevalent mental health condition.
Affiliation(s)
- Mahesh I Sirasangi
- Department of Pharmacology, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Mysuru 570 015, India
- Tamsheel Fatima Roohi
- Department of Pharmacology, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Mysuru 570 015, India
- K L Krishna
- Department of Pharmacology, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Mysuru 570 015, India
- Nabeel Kinattingal
- Department of Pharmacology, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Mysuru 570 015, India
- Shahid Ud Din Wani
- Department of Pharmaceutical Sciences, School of Applied Sciences and Technology, University of Kashmir, Srinagar 190 006, India.
- Seema Mehdi
- Department of Pharmacology, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Mysuru 570 015, India.
9
Hong EP, Ramos EM, Aziz NA, Massey TH, McAllister B, Lobanov S, Jones L, Holmans P, Kwak S, Orth M, Ciosi M, Lomeikaite V, Monckton DG, Long JD, Lucente D, Wheeler VC, Gillis T, MacDonald ME, Sequeiros J, Gusella JF, Lee JM. Modification of Huntington's disease by short tandem repeats. Brain Commun 2024; 6:fcae016. [PMID: 38449714 PMCID: PMC10917446 DOI: 10.1093/braincomms/fcae016] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2023] [Revised: 12/20/2023] [Accepted: 01/22/2024] [Indexed: 03/08/2024]
Abstract
Expansions of glutamine-coding CAG trinucleotide repeats cause a number of neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxias. In general, age-at-onset of the polyglutamine diseases is inversely correlated with the size of the respective inherited expanded CAG repeat. Expanded CAG repeats are also somatically unstable in certain tissues, and, as demonstrated by recent genome-wide association studies, age-at-onset of Huntington's disease corrected for individual HTT CAG repeat length (i.e. residual age-at-onset) is modified by DNA maintenance/repair genes related to repeat instability. Modification of one polyglutamine disease (e.g. Huntington's disease) by the repeat length of another (e.g. ATXN3, CAG expansions in which cause spinocerebellar ataxia 3) has also been hypothesized. We therefore tested whether age-at-onset in Huntington's disease is modified by the CAG repeats of other polyglutamine disease genes. We found that the measured CAG repeat sizes of other polyglutamine disease genes were polymorphic in Huntington's disease participants but did not influence Huntington's disease age-at-onset. Additional analysis focusing specifically on ATXN3 in a larger sample set (n = 1388) confirmed the lack of association between Huntington's disease residual age-at-onset and ATXN3 CAG repeat length. Additionally, neither our Huntington's disease onset modifier genome-wide association study single nucleotide polymorphism data nor imputed short tandem repeat data supported the involvement of other polyglutamine disease genes in modifying Huntington's disease. By contrast, our genome-wide association studies based on imputed short tandem repeats revealed significant modification signals for other genomic regions.
Together, our short tandem repeat genome-wide association studies show that modification of Huntington's disease is associated with short tandem repeats that do not involve other polyglutamine disease-causing genes, refining the landscape of Huntington's disease modification and highlighting the importance of rigorous data analysis, especially in genetic studies testing candidate modifiers.
Affiliation(s)
- Eun Pyo Hong
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
- Eliana Marisa Ramos
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- N Ahmad Aziz
- Population & Clinical Neuroepidemiology, German Center for Neurodegenerative Diseases, 53127 Bonn, Germany
- Department of Neurology, Faculty of Medicine, University of Bonn, Bonn D-53113, Germany
- Thomas H Massey
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Branduff McAllister
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Sergey Lobanov
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Lesley Jones
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Peter Holmans
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Seung Kwak
- Molecular System Biology, CHDI Foundation, Princeton, NJ 08540, USA
- Michael Orth
- University Hospital of Old Age Psychiatry and Psychotherapy, Bern University, CH-3000 Bern 60, Switzerland
- Marc Ciosi
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Vilija Lomeikaite
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Darren G Monckton
- School of Molecular Biosciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Jeffrey D Long
- Department of Psychiatry, Carver College of Medicine and Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA 52242, USA
- Diane Lucente
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Vanessa C Wheeler
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Tammy Gillis
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Marcy E MacDonald
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
- Jorge Sequeiros
- UnIGENe, IBMC—Institute for Molecular and Cell Biology, i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto 420-135, Portugal
- ICBAS School of Medicine and Biomedical Sciences, University of Porto, Porto 420-135, Portugal
- James F Gusella
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Jong-Min Lee
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Medical and Population Genetics Program, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
10
Manigbas CA, Jadhav B, Garg P, Shadrina M, Lee W, Martin-Trujillo A, Sharp AJ. A phenome-wide association study of tandem repeat variation in 168,554 individuals from the UK Biobank. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.01.22.24301630. [PMID: 38343850 PMCID: PMC10854328 DOI: 10.1101/2024.01.22.24301630] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2024]
Abstract
Most genetic association studies focus on binary variants. To identify the effects of multi-allelic variation of tandem repeats (TRs) on human traits, we performed direct TR genotyping and phenome-wide association studies in 168,554 individuals from the UK Biobank, identifying 47 TRs showing causal associations with 73 traits. We replicated 23 of 31 (74%) of these causal associations in the All of Us cohort. While this set included several known repeat expansion disorders, the novel associations we found were attributable to common polymorphic variation in TR length rather than to rare expansions, and include, for example, a coding polyhistidine motif in HRCT1 influencing risk of hypertension and a poly(CGC) in the 5'UTR of GNB2 influencing heart rate. Causal TRs were strongly enriched for associations with local gene expression and DNA methylation. Our study highlights the contribution of multi-allelic TRs to the "missing heritability" of the human genome.
11
Jam HZ, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. Genome-wide profiling of genetic variation at tandem repeat from long reads. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.20.576266. [PMID: 38328152 PMCID: PMC10849534 DOI: 10.1101/2024.01.20.576266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Tandem repeats (TRs) are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long-read sequencing technologies have the potential to greatly advance TR analysis, especially for long or complex repeats. Here we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Aarushi Sehgal
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
12
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023]
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of it, and are often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than that of single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as it pertains to disease risk, elucidating its potential for association with schizophrenia. An overview of next-generation sequencing-based methods that may be applied for genome-wide TR identification is provided, and some key methodological challenges in TR analyses are delineated.
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
13
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA, from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize these complex patterns of human variation and to associate them with disease, expression, and epigenetic differences. We predict that accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
14
Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. 
Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria 3010, Australia
- Department of Anatomy and Physiology, University of Melbourne, Parkville, Victoria 3010, Australia
15
Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. Nat Commun 2023; 14:6711. [PMID: 37872149 PMCID: PMC10593948 DOI: 10.1038/s41467-023-42278-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 03/10/2023] [Accepted: 10/05/2023] [Indexed: 10/25/2023]
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Yunjiang Qiu
- Illumina Incorporated, San Diego, CA, 92122, USA
- Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
16
Bzikadze AV, Pevzner PA. UniAligner: a parameter-free framework for fast sequence alignment. Nat Methods 2023; 20:1346-1354. [PMID: 37580559 DOI: 10.1038/s41592-023-01970-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 07/05/2023] [Indexed: 08/16/2023]
Abstract
Even though recent advances in 'complete genomics' have revealed previously inaccessible genomic regions, analysis of variation in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge, since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, a parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings that are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres, and quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.
Affiliation(s)
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
17
Mukamel RE, Handsaker RE, Sherman MA, Barton AR, Hujoel MLA, McCarroll SA, Loh PR. Repeat polymorphisms underlie top genetic risk loci for glaucoma and colorectal cancer. Cell 2023; 186:3659-3673.e23. [PMID: 37527660 PMCID: PMC10528368 DOI: 10.1016/j.cell.2023.07.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/04/2022] [Revised: 04/07/2023] [Accepted: 07/03/2023] [Indexed: 08/03/2023]
Abstract
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). To assess the phenotypic impact of VNTRs genome-wide, we applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants and 838 GTEx participants. Association and statistical fine-mapping analyses identified 58 VNTRs that appeared to influence a complex trait in UK Biobank, 18 of which also appeared to modulate expression or splicing of a nearby gene. Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold range of risk across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health and gene regulation.
Affiliation(s)
- Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA.
- Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Bioinformatics and Integrative Genomics Program, Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA.
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
18
Ren J, Gu B, Chaisson MJP. vamos: variable-number tandem repeats annotation using efficient motif sets. Genome Biol 2023; 24:175. [PMID: 37501141 PMCID: PMC10373352 DOI: 10.1186/s13059-023-03010-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/23/2022] [Accepted: 07/06/2023] [Indexed: 07/29/2023]
Abstract
Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): arrays of motifs of at least six bases. These loci are highly polymorphic, yet current approaches that define and merge variants based on alignment breakpoints do not capture their full diversity. Here we present vamos (VNTR Annotation using efficient Motif Sets), a method that instead annotates VNTRs by repeat composition under different levels of motif diversity. Applied to 74 haplotype-resolved human assemblies, vamos estimates 7.4-16.7 alleles per locus, compared with 4.0-5.5 alleles per locus estimated by breakpoint-based approaches.
Affiliation(s)
- Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, US
- Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, US
- Mark J. P. Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, US
19
Saei H, Morinière V, Heidet L, Gribouval O, Lebbah S, Tores F, Mautret-Godefroy M, Knebelmann B, Burtey S, Vuiblet V, Antignac C, Nitschké P, Dorval G. VNtyper enables accurate alignment-free genotyping of MUC1 coding VNTR using short-read sequencing data in autosomal dominant tubulointerstitial kidney disease. iScience 2023; 26:107171. [PMID: 37456840 PMCID: PMC10338300 DOI: 10.1016/j.isci.2023.107171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 05/06/2023] [Accepted: 06/14/2023] [Indexed: 07/18/2023]
Abstract
Approximately 3% of the human genome consists of tandem repeats of variable length (VNTRs), a few of which have been linked to rare human diseases. Autosomal dominant tubulointerstitial kidney disease-MUC1 (ADTKD-MUC1) is caused by specific frameshift variants in the coding VNTR of the MUC1 gene. Calling variants from VNTRs using short-read sequencing (SRS) is challenging due to poor read mappability. We developed a computational pipeline, VNtyper, for reliable detection of pathogenic MUC1 VNTR variants and demonstrated its clinical utility in two distinct cohorts: (1) a historical cohort including 108 families with ADTKD and (2) a naive replication cohort comprising 2,910 patients previously tested on a panel of genes involved in monogenic renal diseases. In the historical cohort, all cases known to carry pathogenic MUC1 variants were re-identified, and a new 25 bp frameshift insertion was detected in an additional mislaid family. In the replication cohort, we discovered and validated 30 new patients.
Affiliation(s)
- Hassan Saei
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Vincent Morinière
- Service de Médecine Génomique des Maladies Rares, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
- Laurence Heidet
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Service de Néphrologie Pédiatrique, Centre de Référence MARHEA, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
- Olivier Gribouval
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Said Lebbah
- Département de Santé Publique, Unité de Recherche Clinique, Hôpital Pitié-Salpêtrière, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
- Frederic Tores
- Plateforme Bio-informatique, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Manon Mautret-Godefroy
- Service de Médecine Génomique des Maladies Rares, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
- Bertrand Knebelmann
- Service de Néphrologie, Centre de Référence MARHEA, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
- Stéphane Burtey
- Inserm, C2VN, INRAE, C2VN, Aix-Marseille Université, Marseille, France
- Centre de Néphrologie et Transplantation Rénale, AP-HM Hôpital de la Conception, Marseille, France
- Vincent Vuiblet
- Service de Néphrologie, CHU de Reims, Reims, France
- Service de Pathologie, CHU de Reims, Reims, France
- Institut d'Intelligence Artificielle en Santé, Université de Reims Champagne-Ardenne et CHU de Reims, Reims, France
- Corinne Antignac
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Service de Médecine Génomique des Maladies Rares, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
- Patrick Nitschké
- Plateforme Bio-informatique, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Guillaume Dorval
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Service de Médecine Génomique des Maladies Rares, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
20
Leonard AS, Crysnanto D, Mapel XM, Bhati M, Pausch H. Graph construction method impacts variation representation and analyses in a bovine super-pangenome. Genome Biol 2023; 24:124. [PMID: 37217946 PMCID: PMC10204317 DOI: 10.1186/s13059-023-02969-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/17/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023]
Abstract
BACKGROUND Several models and algorithms have been proposed to build pangenomes from multiple input assemblies, but their impact on variant representation, and consequently downstream analyses, is largely unknown. RESULTS We create multi-species super-pangenomes using pggb, cactus, and minigraph with the Bos taurus taurus reference sequence and eleven haplotype-resolved assemblies from taurine and indicine cattle, bison, yak, and gaur. We recover 221 k nonredundant structural variations (SVs) from the pangenomes, of which 135 k (61%) are common to all three. SVs derived from assembly-based calling show high agreement with the consensus calls from the pangenomes (96%), but validate only a small proportion of variations private to each graph. Pggb and cactus, which also incorporate base-level variation, have approximately 95% exact matches with assembly-derived small variant calls, which significantly improves the edit rate when realigning assemblies compared to minigraph. We use the three pangenomes to investigate 9566 variable number tandem repeats (VNTRs), finding 63% have identical predicted repeat counts in the three graphs, while minigraph can over or underestimate the count given its approximate coordinate system. We examine a highly variable VNTR locus and show that repeat unit copy number impacts the expression of proximal genes and non-coding RNA. CONCLUSIONS Our findings indicate good consensus between the three pangenome methods but also show their individual strengths and weaknesses that need to be considered when analysing different types of variants from multiple input assemblies.
Affiliation(s)
- Alexander S Leonard
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
- Danang Crysnanto
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
- Xena M Mapel
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
- Meenu Bhati
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
- Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
21
Park J, Kaufman E, Valdmanis PN, Bafna V. TRviz: a Python library for decomposing and visualizing tandem repeat sequences. BIOINFORMATICS ADVANCES 2023; 3:vbad058. [PMID: 37168281 PMCID: PMC10166586 DOI: 10.1093/bioadv/vbad058] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/17/2023] [Revised: 04/14/2023] [Accepted: 04/24/2023] [Indexed: 05/13/2023]
Abstract
Summary TRviz is an open-source Python library for decomposing, encoding, aligning and visualizing tandem repeat (TR) sequences. TRviz takes a collection of alleles (TR containing sequences) and one or more motifs as input and generates a plot showing the motif composition of the TR sequences. Availability and implementation TRviz is an open-source Python library and freely available at https://github.com/Jong-hun-Park/trviz. Detailed documentation is available at https://trviz.readthedocs.io. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Jonghun Park
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Eli Kaufman
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Paul N Valdmanis
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Vineet Bafna
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA 92093, USA
22
Lu TY, Smaruj PN, Fudenberg G, Mancuso N, Chaisson MJP. The motif composition of variable number tandem repeats impacts gene expression. Genome Res 2023; 33:511-524. [PMID: 37037626 PMCID: PMC10234305 DOI: 10.1101/gr.276768.122] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/17/2022] [Accepted: 03/29/2023] [Indexed: 04/12/2023]
Abstract
Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up ∼3% of the human genome but are often excluded from association analysis owing to poor read mappability or divergent repeat content. Although methods exist to estimate VNTR length from short-read data, it is known that VNTRs vary in both length and repeat (motif) composition. Here, we use a repeat-pangenome graph (RPGG) constructed on 35 haplotype-resolved assemblies to detect variation in both VNTR length and repeat composition. We align population-scale data from the Genotype-Tissue Expression (GTEx) Consortium to examine how variations in sequence composition may be linked to expression, including cases independent of overall VNTR length. We find that 9422 out of 39,125 VNTRs are associated with nearby gene expression through motif variations, of which only 23.4% are accessible from length. Fine-mapping identifies 174 genes to be likely driven by variation in certain VNTR motifs and not overall length. We highlight two genes, CACNA1C and RNF213, that have expression associated with motif variation, showing the utility of RPGG analysis as a new approach for trait association in multiallelic and highly variable loci.
Affiliation(s)
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Paulina N Smaruj
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Geoffrey Fudenberg
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Nicholas Mancuso
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, California 90033, USA
23
Jam HZ, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.09.531600. [PMID: 36945429 PMCID: PMC10028971 DOI: 10.1101/2023.03.09.531600] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/14/2023]
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3,550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Affiliation(s)
- Helyaneh Ziaei Jam
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Yang Li
- Department of Medicine, University of California San Diego, La Jolla, CA
- Ross DeVito
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA
- Nichole Ma
- Department of Medicine, University of California San Diego, La Jolla, CA
- Ibra Lujumba
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Mikhail Maksimov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Bonnie Huang
- Department of Bioengineering, University of California San Diego, La Jolla, CA
- Yunjiang Qiu
- Illumina Incorporated, San Diego, California 92122, USA
- Fredrick Elishama Kakembo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Habi Joseph
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Blessing Onyido
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Jumoke Adeyemi
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Daudi Jjingo
- The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Department of Computer Science, Makerere University, Kampala, Uganda
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Department of Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun, 112233, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, 69120, Germany
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA
- Department of Medicine, University of California San Diego, La Jolla, CA
24
Hamanaka K, Yamauchi D, Koshimizu E, Watase K, Mogushi K, Ishikawa K, Mizusawa H, Tsuchida N, Uchiyama Y, Fujita A, Misawa K, Mizuguchi T, Miyatake S, Matsumoto N. Genome-wide identification of tandem repeats associated with splicing variation across 49 tissues in humans. Genome Res 2023; 33:435-447. [PMID: 37307504 PMCID: PMC10078293 DOI: 10.1101/gr.277335.122] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/17/2022] [Accepted: 02/22/2023] [Indexed: 03/29/2023]
Abstract
Tandem repeats (TRs) are one of the largest sources of polymorphism, and their length is associated with gene regulation. Although previous studies reported several tandem repeats regulating gene splicing in cis (spl-TRs), no large-scale study has been conducted. In this study, we established a genome-wide catalog of 9537 spl-TRs with a total of 58,290 significant TR-splicing associations across 49 tissues (false discovery rate 5%) by using Genotype-Tissue Expression (GTEx) Project data. Regression models explaining splicing variation by using spl-TRs and other flanking variants suggest that at least some of the spl-TRs directly modulate splicing. In our catalog, two spl-TRs are known loci for repeat expansion diseases, spinocerebellar ataxia 6 (SCA6) and 12 (SCA12). Splicing alterations by these spl-TRs were compatible with those observed in SCA6 and SCA12. Thus, our comprehensive spl-TR catalog may help elucidate the pathomechanism of genetic diseases.
Affiliation(s)
- Kohei Hamanaka
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
- Eriko Koshimizu
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
- Kei Watase
- Center for Brain Integration Research, Tokyo Medical and Dental University, Tokyo 113-8510, Japan
- Kaoru Mogushi
- Intractable Disease Research Center, Juntendo University Graduate School of Medicine, Tokyo 113-8421, Japan
- Kinya Ishikawa
- The Center for Personalized Medicine for Healthy Aging, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo 113-8510, Japan
- Hidehiro Mizusawa
- Department of Neurology, National Center of Neurology and Psychiatry, Kodaira, Tokyo 187-8551, Japan
- Naomi Tsuchida
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
- Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Kanagawa 236-0004, Japan
- Yuri Uchiyama
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
- Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Kanagawa 236-0004, Japan
- Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
- Kazuharu Misawa
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
- Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
- Satoko Miyatake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
- Clinical Genetics Department, Yokohama City University Hospital, Yokohama, Kanagawa 236-0004, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa 236-0004, Japan
25
Ding YC, Adamson AW, Bakhtiari M, Patrick C, Park J, Laitman Y, Weitzel JN, Bafna V, Friedman E, Neuhausen SL. Variable number tandem repeats (VNTRs) as modifiers of breast cancer risk in carriers of BRCA1 185delAG. Eur J Hum Genet 2023; 31:216-222. [PMID: 36434258 PMCID: PMC9905572 DOI: 10.1038/s41431-022-01238-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/01/2022] [Revised: 10/10/2022] [Accepted: 11/08/2022] [Indexed: 11/27/2022]
Abstract
Despite substantial efforts in identifying both rare and common variants affecting disease risk, in the majority of diseases, a large proportion of unexplained genetic risk remains. We propose that variable number tandem repeats (VNTRs) may explain a proportion of the missing genetic risk. Herein, in a pilot study with a retrospective cohort design, we tested whether VNTRs are causal modifiers of breast cancer risk in 347 female carriers of the BRCA1 185delAG pathogenic variant, an important group given their high risk of developing breast cancer. We performed targeted-capture to sequence VNTRs, called genotypes with adVNTR, tested the association of VNTRs and breast cancer risk using Cox regression models, and estimated the effect size using a retrospective likelihood approach. Of 303 VNTRs that passed quality control checks, 4 VNTRs were significantly associated with risk to develop breast cancer at false discovery rate [FDR] < 0.05 and an additional 4 VNTRs had FDR < 0.25. After determining the specific risk alleles, there was a significantly earlier age at diagnosis of breast cancer in carriers of the risk alleles compared to those without the risk alleles for seven of eight VNTRs. One example is a VNTR in exon 2 of LINC01973 with a per-allele hazard ratio of 1.58 (1.07-2.33) and 5.28 (2.79-9.99) for the homozygous risk-allele genotype. Results from this first systematic study of VNTRs demonstrate that VNTRs may explain a proportion of the unexplained genetic risk for breast cancer.
Affiliation(s)
- Yuan Chun Ding
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Aaron W Adamson
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
- Carmina Patrick
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
- Yael Laitman
- Oncogenetics Unit, Institute of Human Genetics, Sheba Medical Center, Ramat Gan, Israel
- Jeffrey N Weitzel
- Latin American School of Oncology, Tuxtla Gutiérrez, Chiapas, Mexico
- Natera, San Carlos, CA, USA
- Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
- Eitan Friedman
- Oncogenetics Unit, Institute of Human Genetics, Sheba Medical Center, Ramat Gan, Israel
- The Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- The Center for Preventive Personalized Medicine, Assuta Medical Center, Tel Aviv, Israel
- Susan L Neuhausen
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
26
Akaishi T, Fujiwara K, Ishii T. Variable number tandem repeats of a 9-base insertion in the N-terminal domain of severe acute respiratory syndrome coronavirus 2 spike gene. Front Microbiol 2023; 13:1089399. [PMID: 36687631 PMCID: PMC9846035 DOI: 10.3389/fmicb.2022.1089399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 12/12/2022] [Indexed: 01/06/2023]
Abstract
Introduction In 2022, the world was still struggling against the coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The pandemic has been facilitated by the intermittent emergence of variant strains, which have been explained and classified mainly by the patterns of point mutations in the spike (S) gene. However, the profiles of insertions/deletions (indels) in SARS-CoV-2 genomes during the pandemic remain largely unevaluated. Methods In this study, we first screened the genome for polymorphic indel sites by performing multiple sequence alignment; then, NCBI BLAST and GISAID database searches were performed to comprehensively investigate the indel profiles at the polymorphic indel hotspot and to elucidate the temporal and geographical emergence and spread of the indels. Results A polymorphic indel hotspot was identified in the N-terminal domain of the S gene at approximately nucleotide position 22,200, corresponding to amino acid positions 210-215 of the SARS-CoV-2 S protein. This polymorphic hotspot comprised an adjacent 3-base deletion (5'-ATT-3'; Spike_N211del) and 9-base insertion (5'-AGCCAGAAG-3'; Spike_ins214EPE). Through NCBI BLAST and GISAID database searches, we identified several types of tandem repeats of the 9-base insertion, creating an 18-base insertion (Spike_ins214EPEEPE, Spike_ins214EPDEPE). The results of the searches suggested that the two-cycle tandem repeats of the 9-base insertion were created in November 2021 in Central Europe, whereas the original one-cycle 9-base insertion (Spike_ins214EPE) dates back to mid-2020 and arose outside Central Europe. The identified 18-base insertions based on two-cycle tandem repeats of the 9-base insertion were collected between November 2021 and April 2022, suggesting that these mutations did not persist and have already been eliminated.
Discussion The GISAID database search implied that this polymorphic indel hotspot has one of the highest tolerances for indels in the SARS-CoV-2 S gene. In summary, the present study identified variable number tandem repeats of a 9-base insertion in the N-terminal domain of the SARS-CoV-2 S gene, and the repeat may have arisen at a different time from the original 9-base insertion.
Affiliation(s)
- Tetsuya Akaishi
- Department of Education and Support for Regional Medicine, Tohoku University, Sendai, Japan
- COVID-19 Testing Center, Tohoku University, Sendai, Japan
- Kei Fujiwara
- Department of Gastroenterology and Metabolism, Nagoya City University, Nagoya, Japan
- Tadashi Ishii
- Department of Education and Support for Regional Medicine, Tohoku University, Sendai, Japan
- COVID-19 Testing Center, Tohoku University, Sendai, Japan
27
Wang X, Budowle B, Ge J. USAT: a bioinformatic toolkit to facilitate interpretation and comparative visualization of tandem repeat sequences. BMC Bioinformatics 2022; 23:497. [PMID: 36402991 PMCID: PMC9675219 DOI: 10.1186/s12859-022-05021-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/10/2022] [Accepted: 10/29/2022] [Indexed: 11/21/2022]
Abstract
Background Tandem repeats (TRs), highly variable genomic variants, are widely used in individual identification, disease diagnostics, and evolutionary studies. Recent advances in sequencing technologies and bioinformatic tools facilitate genome-wide calling of TR haplotypes. Both length-based and sequence-based TR alleles are used in different applications. However, sequence-based TR alleles could provide the highest precision in characterizing TR haplotypes. An easy-to-use bioinformatic tool that identifies single-nucleotide differences between or among TR haplotypes is therefore essential. Results In this study, we developed a Universal STR Allele Toolkit (USAT) for TR haplotype analysis, which takes TR haplotype output from existing tools to perform allele size conversion, sequence comparison of haplotypes, figure plotting, comparison of allele distributions, and interactive visualization. An exemplary application of USAT to the CODIS core STR loci for DNA forensics, using benchmark human individuals, demonstrated its capabilities. USAT has user-friendly graphical interfaces and runs fast on major operating systems, with parallel computing enabled. Conclusion USAT is user-friendly bioinformatics software for the interpretation, visualization, and comparison of TRs. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-05021-1.
Affiliation(s)
- Xuewen Wang
- Center for Human Identification, Health Science Center, University of North Texas, Fort Worth, TX, USA
- Bruce Budowle
- Center for Human Identification, Health Science Center, University of North Texas, Fort Worth, TX, USA
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
- Jianye Ge
- Center for Human Identification, Health Science Center, University of North Texas, Fort Worth, TX, USA
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
28
Park J, Bakhtiari M, Popp B, Wiesener M, Bafna V. Detecting tandem repeat variants in coding regions using code-adVNTR. iScience 2022; 25:104785. [PMID: 35982790 PMCID: PMC9379575 DOI: 10.1016/j.isci.2022.104785] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/16/2022] [Revised: 05/16/2022] [Accepted: 07/13/2022] [Indexed: 11/25/2022]
Abstract
The human genome contains more than one million tandem repeats (TRs), DNA sequences containing multiple approximate copies of a motif repeated contiguously. TRs account for significant genetic variation, with more than 50 diseases attributed to changes in motif number. A few diseases have been shown to be caused by small indels in variable number tandem repeats (VNTRs), including medullary cystic kidney disease type 1 (MCKD1) and monogenic type 1 diabetes. However, small indels in VNTRs are largely unexplored, mainly because of the long and complex structure of VNTRs with multiple motifs. We developed a method, code-adVNTR, that uses multi-motif hidden Markov models to detect both motif count variation and small indels within VNTRs. In simulated data, code-adVNTR outperformed GATK-HaplotypeCaller in calling small indels within large VNTRs. We used code-adVNTR to characterize coding VNTRs in the 1000 Genomes data, identifying many population-specific variants, and to reliably call MUC1 mutations for MCKD1.
Affiliation(s)
- Jonghun Park
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Mehrdad Bakhtiari
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Bernt Popp
- Institute of Human Genetics, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Institute of Human Genetics, University of Leipzig Hospitals and Clinics, Leipzig, Germany
- Michael Wiesener
- Department of Nephrology and Hypertension, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Vineet Bafna
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA 92093, USA
29
Garg P, Jadhav B, Lee W, Rodriguez OL, Martin-Trujillo A, Sharp AJ. A phenome-wide association study identifies effects of copy-number variation of VNTRs and multicopy genes on multiple human traits. Am J Hum Genet 2022; 109:1065-1076. [PMID: 35609568 PMCID: PMC9247821 DOI: 10.1016/j.ajhg.2022.04.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/27/2021] [Accepted: 04/28/2022] [Indexed: 01/04/2023]
Abstract
The human genome contains tens of thousands of large tandem repeats and hundreds of genes that show common and highly variable copy-number changes. Due to their large size and repetitive nature, these variable number tandem repeats (VNTRs) and multicopy genes are generally recalcitrant to standard genotyping approaches and, as a result, this class of variation is poorly characterized. However, several recent studies have demonstrated that copy-number variation of VNTRs can modify local gene expression, epigenetics, and human traits, indicating that many have a functional role. Here, using read depth from whole-genome sequencing to profile copy number, we report results of a phenome-wide association study (PheWAS) of VNTRs and multicopy genes in a discovery cohort of ∼35,000 samples, identifying 32 traits associated with copy number of 38 VNTRs and multicopy genes at 1% FDR. We replicated many of these signals in an independent cohort and observed that VNTRs showing trait associations were significantly enriched for expression QTLs with nearby genes, providing strong support for our results. Fine-mapping studies indicated that in the majority (∼90%) of cases, the VNTRs and multicopy genes we identified represent the causal variants underlying the observed associations. Furthermore, several lie in regions where prior SNV-based GWASs have failed to identify any significant associations with these traits. Our study indicates that copy number of VNTRs and multicopy genes contributes to diverse human traits and suggests that complex structural variants potentially explain some of the so-called "missing heritability" of SNV-based GWASs.
Affiliation(s)
- Paras Garg
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- Bharati Jadhav
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- William Lee
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- Oscar L Rodriguez
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- Alejandro Martin-Trujillo
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- Andrew J Sharp
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA

30
Dorado P, Santos-Díaz G, Gutiérrez-Martín Y, Suárez-Santisteban MÁ. Frequency of CYP2C9 Promoter Variable Number Tandem Repeat Polymorphism in a Spanish Population: Linkage Disequilibrium with CYP2C9*3 Allele. J Pers Med 2022; 12:782. [PMID: 35629204 PMCID: PMC9143480 DOI: 10.3390/jpm12050782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2022] [Revised: 05/02/2022] [Accepted: 05/10/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND A promoter variable number tandem repeat polymorphism (pVNTR) of CYP2C9 has been described with three types of fragments: short (pVNTR-S), medium (pVNTR-M) and long (pVNTR-L). The pVNTR-S allele reduces the CYP2C9 mRNA level in the human liver, and it was found to be in high linkage disequilibrium (LD) with the CYP2C9*3 allele in a White American population. The aim of the present study is to determine the presence and frequency of CYP2C9pVNTR in a Spanish population and to analyze whether the pVNTR-S allele is in LD with the CYP2C9*3 allele in this population. SUBJECTS AND METHODS A total of 209 subjects from Spain participated in the study. The CYP2C9 promoter region was amplified and analyzed using capillary electrophoresis. Genotyping for the CYP2C9*2 and *3 variants was performed using a fluorescence-based allele-specific TaqMan allelic discrimination assay. RESULTS The frequencies of the CYP2C9pVNTR-L, M and S variant alleles are 0.10, 0.82 and 0.08, respectively. A high LD between the CYP2C9pVNTR-S and CYP2C9*3 variant alleles is observed (D' = 0.929, r² = 0.884). CONCLUSION The results from the present study show that CYP2C9pVNTR-S and CYP2C9*3 are in high LD, which could help to better understand the lower metabolic activity exhibited by CYP2C9*3 allele carriers. These data might be relevant for implementation in the diverse clinical guidelines for the pharmacogenetic analysis of the CYP2C9 gene before treatment with different drugs, such as non-steroidal anti-inflammatory drugs, warfarin, phenytoin and statins.
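The D' and r² values reported above are standard pairwise LD statistics for two biallelic loci. They are computed from a haplotype frequency and the two marginal allele frequencies. The sketch below implements the textbook formulas. It assumes the three-allele pVNTR is collapsed to "S" versus "not S", and it assumes haplotype frequencies are already known. The study itself would have had to estimate haplotype frequencies from unphased genotypes, and that step is not shown. The function name and example frequencies are hypothetical and are not the study's data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    Sketch only: assumes known haplotype frequencies (no phasing/EM step).
    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: marginal frequencies of A and B.
    D' is returned signed, in [-1, 1]; studies usually report |D'|.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from independence
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Illustrative frequencies only (not the study's haplotype data):
# A = pVNTR-S, B = CYP2C9*3, treating pVNTR as "S" versus "not S".
d, d_prime, r2 = ld_stats(p_ab=0.07, p_a=0.08, p_b=0.075)
print(f"D = {d:.4f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

D' reaches 1 whenever one of the four haplotypes is absent. r² reaches 1 only when the two alleles also have matching frequencies. For that reason a pair can show D' near 1 with a noticeably lower r².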
Affiliation(s)
- Pedro Dorado
- Departamento de Terapéutica Médico-Quirúrgica, Centro Universitario de Plasencia, Universidad de Extremadura, Avda. Virgen del Puerto s/n, 10600 Plasencia, Spain
- Instituto Universitario de Investigación Biosanitaria de Extremadura (INUBE), Avenida de la Investigación s/n, 06071 Badajoz, Spain
- Gracia Santos-Díaz
- Instituto Universitario de Investigación Biosanitaria de Extremadura (INUBE), Avenida de la Investigación s/n, 06071 Badajoz, Spain
- Yolanda Gutiérrez-Martín
- Bioscience Applied Techniques Services, Servicio de Apoyo a la Investigación, Universidad de Extremadura, Avenida de la Investigación s/n, 06071 Badajoz, Spain
- Miguel Ángel Suárez-Santisteban
- Instituto Universitario de Investigación Biosanitaria de Extremadura (INUBE), Avenida de la Investigación s/n, 06071 Badajoz, Spain
- Nephrology Department, Virgen del Puerto Hospital, Servicio Extremeño de Salud, 10600 Plasencia, Spain

31
Kasai S, Nishizawa D, Hasegawa J, Fukuda KI, Ichinohe T, Nagashima M, Hayashida M, Ikeda K. Short Tandem Repeat Variation in the CNR1 Gene Associated With Analgesic Requirements of Opioids in Postoperative Pain Management. Front Genet 2022; 13:815089. [PMID: 35360861 PMCID: PMC8963810 DOI: 10.3389/fgene.2022.815089] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/15/2021] [Accepted: 02/02/2022] [Indexed: 11/25/2022]
Abstract
Short tandem repeats (STRs) and variable number tandem repeats (VNTRs), which have been identified at approximately 0.7 and 0.5 million loci in the human genome, respectively, are highly multi-allelic variations, unlike single-nucleotide polymorphisms. Repeat numbers at more than a few thousand STRs have been associated with the expression of nearby genes, indicating that STRs are influential genetic variations in human traits. Analgesics act on the central nervous system via their intrinsic receptors to produce analgesic effects. In the present study, we focused on STRs and VNTRs in the CNR1, GRIN2A, PENK, and PDYN genes and analyzed two peripheral pain sensation-related traits and seven analgesia-related traits in postoperative pain management. A total of 192 volunteers who underwent the peripheral pain sensation tests and 139 and 252 patients who underwent open abdominal and orthognathic cosmetic surgeries, respectively, were included in the study. None of the four STRs or VNTRs were associated with peripheral pain sensation. Short tandem repeats in the CNR1, GRIN2A, and PENK genes were associated with the frequency of fentanyl use, fentanyl dose, and visual analog scale pain scores 3 h after orthognathic cosmetic surgery (Spearman's rank correlation coefficient ρ = 0.199, p = 0.002, ρ = 0.174, p = 0.006, and ρ = 0.135, p = 0.033, respectively), analgesic dose, including epidural analgesics, after open abdominal surgery (ρ = −0.200, p = 0.018), and visual analog scale pain scores 24 h after orthognathic cosmetic surgery (ρ = 0.143, p = 0.023), respectively. The associations between STRs in the CNR1 gene and the frequency of fentanyl use and fentanyl dose after orthognathic cosmetic surgery were confirmed by Holm's multiple-testing correction. These findings indicate that STRs in the CNR1 gene influence analgesia in the orofacial region.
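The associations above are reported as Spearman rank correlations (ρ). ρ is the Pearson correlation computed on ranks. Tied values receive the mean of the positions they occupy, which matters for discrete traits such as repeat counts or the frequency of fentanyl use. The abstract does not say how a diploid STR genotype was reduced to one number per subject. The example below therefore uses a hypothetical per-subject "repeat sum" and hypothetical doses. This is a sketch for the coefficient only. p-values are not computed. `scipy.stats.spearmanr` returns both the coefficient and a p-value.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical per-patient values (not study data): a repeat-length summary
# and a postoperative fentanyl dose.
repeat_sum = [18, 20, 20, 22, 23, 25]
fentanyl_dose = [1.0, 1.2, 0.9, 1.5, 1.4, 2.0]
print(round(spearman_rho(repeat_sum, fentanyl_dose), 3))
```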
Affiliation(s)
- Shinya Kasai
- Addictive Substance Project, Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Daisuke Nishizawa
- Addictive Substance Project, Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Junko Hasegawa
- Addictive Substance Project, Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Ken-ichi Fukuda
- Department of Oral Health Science, Tokyo Dental College, Tokyo, Japan
- Tatsuya Ichinohe
- Department of Dental Anesthesiology, Tokyo Dental College, Tokyo, Japan
- Makoto Nagashima
- Department of Surgery, Toho University Sakura Medical Center, Sakura, Japan
- Masakazu Hayashida
- Department of Anesthesiology and Pain Medicine, Juntendo University School of Medicine, Tokyo, Japan
- Kazutaka Ikeda
- Addictive Substance Project, Department of Psychiatry and Behavioral Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan

32
McHale P, Quinlan AR. trfermikit: a tool to discover VNTR-associated deletions. Bioinformatics 2022; 38:1231-1234. [PMID: 34864893 PMCID: PMC8826174 DOI: 10.1093/bioinformatics/btab805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Revised: 10/25/2021] [Accepted: 11/27/2021] [Indexed: 02/04/2023]
Abstract
SUMMARY We present trfermikit, a software tool designed to detect deletions larger than 50 bp occurring in Variable Number Tandem Repeats using Illumina DNA sequencing reads. In such regions, it achieves a better tradeoff between sensitivity and false discovery than a state-of-the-art structural variation caller, Manta, and complements it by recovering a significant number of deletions that Manta missed. trfermikit is based upon the fermikit pipeline, which performs read assembly, maps the assembly to the reference genome and calls variants from the alignment. AVAILABILITY AND IMPLEMENTATION https://github.com/petermchale/trfermikit. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
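The target class described above is deletions longer than 50 bp that fall in VNTRs. Selecting that class from a call set reduces to a length filter plus an interval-overlap test against a VNTR annotation. The sketch below shows only that selection step. It is not trfermikit's assembly-based calling pipeline. It assumes BED-style half-open coordinates, and all names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    # Hypothetical record type; coordinates follow the BED convention.
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def vntr_deletions(deletions, vntrs, min_len=50):
    """Keep deletions longer than min_len bp that overlap a VNTR interval.

    Sketch only: a linear scan per deletion; an interval tree or a sorted
    sweep would be used at genome scale.
    """
    return [
        d for d in deletions
        if d.end - d.start > min_len and any(overlaps(d, v) for v in vntrs)
    ]


# Hypothetical VNTR annotation and deletion calls.
vntrs = [Interval("chr1", 1_000, 1_600)]
calls = [
    Interval("chr1", 1_100, 1_300),  # 200 bp inside the VNTR: kept
    Interval("chr1", 1_200, 1_230),  # 30 bp: too short, dropped
    Interval("chr2", 1_100, 1_300),  # different chromosome: dropped
]
print(vntr_deletions(calls, vntrs))
```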
Affiliation(s)
- Peter McHale
- Department of Human Genetics and Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
- Aaron R Quinlan
- Department of Human Genetics and Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA

33
Jafari P, Baghernia S, Moghanibashi M, Mohamadynejad P. Significant Association of Variable Number Tandem Repeat Polymorphism rs58335419 in the MIR137 Gene With the Risk of Gastric and Colon Cancers. Br J Biomed Sci 2022; 79:10095. [PMID: 35996520 PMCID: PMC8915678 DOI: 10.3389/bjbs.2021.10095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 12/23/2021] [Indexed: 11/20/2022]
Abstract
The purpose of the article: The MIR137 gene acts as a tumor-suppressor gene in colon and gastric cancers. The aim of this study was to investigate the association of the functional variable number tandem repeat (VNTR) polymorphism rs58335419, located upstream of the MIR137 gene, with the risk of colon and gastric cancers. Materials and methods: In total, 429 individuals participated in the study, including 154 colon and 120 gastric cancer patients and 155 healthy controls. The target VNTR was genotyped in all samples using PCR and electrophoresis. Statistical analysis was performed using SPSS 21.0 software with t, χ² and logistic regression tests. Results: Excluding the rare genotypes, our results showed that genotype 3/5 (95% CI = 1.08–3.73, OR = 2.01, p = 0.026) significantly increased the risk of colon cancer but not of gastric cancer (95% CI = 0.88–3.30, OR = 1.70, p = 0.114). In the analysis stratified by VNTR genotype and sex, genotypes 3/4 (95% CI = 1.00–6.07, OR = 2.46, p = 0.049) and 3/5 (95% CI = 1.25–7.18, OR = 2.99, p = 0.014) significantly increased the risk of colon cancer in men but not in women. In addition, all genotypes, including the rare genotypes, taken as a group significantly increased the risk of gastric (95% CI = 1.14–3.00, OR = 1.85, p = 0.012) and colon (95% CI = 1.38–3.43, OR = 2.17, p = 0.001) cancers compared with genotype 3/3 as the reference. Conclusion: The results show that an increased copy number of the VNTR in the MIR137 gene increases the risk of colon and gastric cancers, and that this VNTR can serve as a marker of susceptibility to colon and gastric cancers.
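Each result above is an odds ratio (OR) with a 95% confidence interval for a genotype contrast, such as 3/5 versus the 3/3 reference. For one binary contrast with no covariates, the logistic-regression OR equals the cross-product ratio of the 2×2 case/control table. The sketch below computes that ratio with a Woolf (log-scale Wald) interval. The counts are hypothetical, not the study's genotype counts. Adjusted or stratified models like those in the study need a full logistic regression.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale Wald) confidence interval.

    Sketch only. Cell layout (hypothetical names):
    a: cases carrying the genotype      b: cases with the reference genotype
    c: controls carrying the genotype   d: controls with the reference genotype
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (use a continuity correction)")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for genotype 3/5 versus reference 3/3 (not the study's data).
or_, lo, hi = odds_ratio_ci(a=40, b=60, c=25, d=75)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to p < 0.05 under the same Wald approximation. This matches the pattern in the abstract, for example the gastric-cancer interval 0.88–3.30 with p = 0.114.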
Affiliation(s)
- Pegah Jafari
- Department of Biology, Faculty of Basic Sciences, Kazerun Branch, Islamic Azad University, Kazerun, Iran
- Sedighe Baghernia
- Department of Biology, Faculty of Basic Sciences, Kazerun Branch, Islamic Azad University, Kazerun, Iran
- Mehdi Moghanibashi
- Department of Genetics, School of Medicine, Kazerun Branch, Islamic Azad University, Kazerun, Iran
- *Correspondence: Mehdi Moghanibashi
- Parisa Mohamadynejad
- Department of Biology, Faculty of Basic Sciences, Shahrekord Branch, Islamic Azad University, Shahrekord, Iran

34
Ghamari R, Yazarlou F, Khosravizadeh Z, Moradkhani A, Abdollahi E, Alizadeh F. Serotonin transporter functional polymorphisms potentially increase risk of schizophrenia separately and as a haplotype. Sci Rep 2022; 12:1336. [PMID: 35079035 PMCID: PMC8789837 DOI: 10.1038/s41598-022-05206-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Accepted: 01/10/2022] [Indexed: 11/10/2022]
Abstract
Schizophrenia is a severe, disabling psychiatric disorder with unclear etiology. Family, twin, and adoption studies have shown that genetic factors contribute substantially to the occurrence of schizophrenia. To date, many studies have reported associations of schizophrenia and its comorbid symptoms with functional polymorphisms within serotonin reuptake pathway genes. Here, we aimed to investigate the association of three functional variable number tandem repeat (VNTR) polymorphisms in MAOA and SLC6A4 with schizophrenia in the Iranian population. Two hundred and forty-one subjects with schizophrenia and three hundred and seventy age- and sex-matched healthy controls were genotyped for the MAOA promoter uVNTR, 5-HTTLPR, and STin2 polymorphisms. Genotyping was performed by polymerase chain reaction (PCR) with locus-specific primers, and the PCR products were resolved by 2.5% agarose gel electrophoresis. Statistical inference was performed using the R programming language and Haploview software. MAOA promoter uVNTR allele frequencies did not differ between schizophrenia subjects and healthy controls in either males or females, and no significant difference in MAOA promoter uVNTR 4-repeat frequency was observed between female cases and female controls. There were also no differences between the schizophrenia and healthy control groups in 5-HTTLPR allele and genotype frequencies, but 5-HTTLPR S allele carriers were significantly more frequent among cases. In addition, STin2.12 repeats were significantly more frequent among schizophrenia patients. Genotype comparison suggested that 5-HTTLPR S allele and STin2.12 repeat carriers were significantly more frequent among schizophrenia cases, and that carrying the STin2.12 repeat significantly increased the risk of schizophrenia. Haplotype analysis also showed stronger linkage disequilibrium between 5-HTTLPR and STin2 within the haplotype block in cases than in controls. These results suggest that SLC6A4 functional polymorphisms may act as risk factors for schizophrenia.
Affiliation(s)
- Rana Ghamari
- Department of Genetics, Faculty of Biology, Kharazmi University, Tehran, Iran
- Fatemeh Yazarlou
- Department of Medical Genetics, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Zahra Khosravizadeh
- Clinical Research Development Unit, Infertility treatment clinic, Amiralmomenin Hospital, Arak University of Medical Sciences, Arak, Iran
- Atefeh Moradkhani
- Department of Biology, Faculty of Science, Zanjan Branch, Islamic Azad University, Zanjan, Iran
- Elaheh Abdollahi
- Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Fatemeh Alizadeh
- Department of Genomic Psychiatry and Behavioral Genomics (DGPBG), Roozbeh Hospital, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran

35
Annear DJ, Vandeweyer G, Sanchis-Juan A, Raymond FL, Kooy RF. Non-Mendelian inheritance patterns and extreme deviation rates of CGG repeats in autism. Genome Res 2022; 32:1967-1980. [PMID: 36351771 PMCID: PMC9808627 DOI: 10.1101/gr.277011.122] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/09/2022] [Accepted: 10/14/2022] [Indexed: 11/10/2022]
Abstract
As expansions of CGG short tandem repeats (STRs) are established as the genetic etiology of many neurodevelopmental disorders, we aimed to elucidate the inheritance patterns and role of CGG STRs in autism spectrum disorder (ASD). By genotyping 6063 CGG STR loci in a large cohort of trios and quads with an ASD-affected proband, we determined an unprecedented rate of CGG repeat length deviation across a single generation. Although the link between repeat length and deviation rate was solidified, we show that shorter STRs display greater degrees of size variation. We observed that CGG STRs did not segregate according to Mendelian principles but with a bias against longer repeats, which appeared to magnify as repeat length increased. Through logistic regression, we identified 19 genes that displayed significantly higher rates and degrees of CGG STR expansion within the ASD-affected probands (P < 1 × 10⁻⁵). This study not only highlights novel repeat expansions that may play a role in ASD but also reinforces the hypothesis that CGG STRs are specifically linked to human cognition.
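The trio analysis described above relies on three basic checks. The first asks whether a child's STR genotype can be explained by one maternal and one paternal allele. The second measures how far the child deviates when it cannot. The third records, for a heterozygous parent, whether the longer or the shorter allele was transmitted. Under Mendelian segregation the longer allele should be transmitted about half the time, so a deficit would indicate a bias against longer repeats. The sketch below implements these checks on repeat counts under stated assumptions: genotypes are error-free, diploid, and given as pairs of repeat counts. It is a simplified illustration, not the authors' method, and the function names are hypothetical.

```python
# Sketch only: genotypes are (allele1, allele2) repeat counts, assumed error-free.

def mendelian_consistent(child, mother, father):
    """True if one child allele matches a maternal and the other a paternal allele."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def min_deviation(child, mother, father):
    """Smallest total repeat-count change that explains the child's genotype."""
    c1, c2 = child
    return min(
        min(abs(c1 - m) + abs(c2 - f), abs(c2 - m) + abs(c1 - f))
        for m in mother
        for f in father
    )


def longer_allele_transmitted(parent, child):
    """For a heterozygous parent: True if only the longer allele is seen in the
    child, False if only the shorter one is, None if undecidable."""
    short, long_ = sorted(parent)
    if short == long_:
        return None  # homozygous parent: uninformative
    has_short, has_long = short in child, long_ in child
    if has_long and not has_short:
        return True
    if has_short and not has_long:
        return False
    return None  # both or neither allele present in the child


# Hypothetical trio: the child's 33-repeat allele matches no parental allele.
mother, father, child = (29, 30), (31, 32), (29, 33)
print(mendelian_consistent(child, mother, father))  # → False
print(min_deviation(child, mother, father))         # → 1
```

Counting True versus False from `longer_allele_transmitted` across many informative parents gives a simple transmission-bias estimate. It can then be compared with the 50% Mendelian expectation.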
Affiliation(s)
- Dale J. Annear
- Department of Medical Genetics, University of Antwerp, 2600 Antwerp, Belgium
- Geert Vandeweyer
- Department of Medical Genetics, University of Antwerp, 2600 Antwerp, Belgium
- Alba Sanchis-Juan
- NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, United Kingdom; Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Cambridge, CB2 0PT, United Kingdom
- F. Lucy Raymond
- NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, United Kingdom; Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, CB2 0XY, United Kingdom
- R. Frank Kooy
- Department of Medical Genetics, University of Antwerp, 2600 Antwerp, Belgium

36
Gall-Duncan T, Sato N, Yuen RKC, Pearson CE. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res 2022; 32:1-27. [PMID: 34965938 PMCID: PMC8744678 DOI: 10.1101/gr.269530.120] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 07/29/2020] [Accepted: 11/29/2021] [Indexed: 11/25/2022]
Abstract
Expansions of gene-specific DNA tandem repeats (TRs), first described in 1991 as a disease-causing mutation in humans, are now known to cause >60 phenotypes, not just disease, and not only in humans. TRs are a common form of genetic variation with biological consequences, observed so far in humans, dogs, plants, oysters, and yeast. Repeat diseases show atypical clinical features, genetic anticipation, and multiple and partially penetrant phenotypes among family members. Discovery of disease-causing repeat expansion loci has accelerated through technological advances in DNA sequencing and computational analyses. Between 2019 and 2021, 17 new disease-causing TR expansions were reported, totaling 63 TR loci (>69 diseases), with more discoveries likely, and in more organisms. Recent and historical lessons reveal that properly assessed clinical presentations, coupled with genetic and biological awareness, can guide discovery of disease-causing unstable TRs. We highlight critical but underrecognized aspects of TR mutations. Repeat motifs may be absent from current reference genomes but will be present in forthcoming gapless long-read references. Repeat motif size can range from a single nucleotide to kilobases per unit. At a given locus, repeat motif sequence purity can vary, with consequences. Pathogenic repeats can be "insertions" within nonpathogenic TRs. Expansions, contractions, and somatic length variations of TRs can have clinical/biological consequences. TR instabilities occur in humans and other organisms. TRs can be epigenetically modified and/or chromosomal fragile sites. We discuss the expanding field of disease-associated TR instabilities, highlighting prospects, clinical and genetic clues, tools, and challenges for further discoveries of disease-causing TR instabilities and for understanding their biological and pathological impacts, a vista that is about to expand.
Affiliation(s)
- Terence Gall-Duncan
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Nozomu Sato
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Ryan K C Yuen
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Christopher E Pearson
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada

37
Xiao X, Zhang CY, Zhang Z, Hu Z, Li M, Li T. Revisiting tandem repeats in psychiatric disorders from perspectives of genetics, physiology, and brain evolution. Mol Psychiatry 2022; 27:466-475. [PMID: 34650204 DOI: 10.1038/s41380-021-01329-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/22/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 01/28/2023]
Abstract
Genome-wide association studies (GWASs) have revealed substantial genetic components, consisting of single nucleotide polymorphisms (SNPs), in the heritable risk of psychiatric disorders. However, genetic risk factors not covered by GWASs also play pivotal roles in these illnesses. Tandem repeats, which are likely functional but frequently overlooked by GWASs, may account for an important proportion of the "missing heritability" of psychiatric disorders. Despite difficulties in characterizing and quantifying tandem repeats in the genome, studies have attempted to describe the impact of tandem repeats on gene regulation and human phenotypes. In this review, we introduce recent research progress regarding the genomic distribution and regulatory mechanisms of tandem repeats. We also summarize current knowledge of the genetic architecture and biological underpinnings of psychiatric disorders gained from studies of tandem repeats. These findings suggest that tandem repeats, in candidate psychiatric risk genes or in different levels of linkage disequilibrium (LD) with psychiatric GWAS SNPs and haplotypes, may modulate biological phenotypes related to psychiatric disorders (e.g., cognitive function and brain physiology) by regulating alternative splicing, promoter activity, enhancer activity and other processes. In addition, many tandem repeats have undergone tight natural selection in the human lineage and likely play crucial roles in human brain evolution. Taken together, a putative role of tandem repeats in the pathogenesis of psychiatric disorders is strongly implicated, and, using examples from the previous literature, we wish to call for further attention to tandem repeats in the post-GWAS era of psychiatric disorders.
Affiliation(s)
- Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
- Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China; Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China; Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
- Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Tao Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China; Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China

38
Rajabi F, Jabalameli N, Rezaei N. The Concept of Immunogenetics. Advances in Experimental Medicine and Biology 2022; 1367:1-17. [DOI: 10.1007/978-3-030-92616-8_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

39
Adiguzel Y. Information-theoretic approach in allometric scaling relations of DNA and proteins. Chem Biol Drug Des 2021; 99:331-343. [PMID: 34855304 DOI: 10.1111/cbdd.13988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Revised: 10/06/2021] [Accepted: 11/14/2021] [Indexed: 11/28/2022]
Abstract
Allometric scaling relations can be observed between molecular parameters. Hence, we looked for such a relation between the sizes (i.e., lengths) of proteins and genes. Protein lengths are reported in the literature as numbers of amino acids; they can also be derived from mRNA lengths. Here, we looked for an allometric scaling relation using such data and simultaneously compared the data with the sizes of genes and proteins obtained from our modified information-theoretic approach. The results implied the presence of a scaling relation in the calculated values, which was expected given the modification implemented in the information-theoretic calculation. The relation in the literature-based data lacked a high goodness-of-fit value. This could be due to physical factors and selective pressures, which resulted in deviations of the literature-sourced values from those in the model. Genome size is correlated with cell size. Intracellular volume, which is related to DNA size, would require a certain number of proteins, the sizes of which can therefore be correlated with the protein sizes. Cell sizes, genome sizes, and average protein and gene sizes, along with the number of proteins (namely, the expression levels of the genes), are the physical factors, and the molecular factors influence those physical factors. Selective pressures can act through the connections between these physical factors and limit their dynamic ranges. Biological measures could be prone to such forces and are likely to deviate from expected models, regardless of the validity of the assumptions, unless those forces are also implemented in the models. Yet the present discrepancies could point to the need for model improvement, to data imperfection, to invalid assumptions, etc. Still, the current work highlights the possible use of an information-theoretic approach in studies of allometric scaling relations.
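An allometric scaling relation has the form y = a·x^b. It becomes linear on log-log axes, log y = log a + b·log x, so a and b can be estimated by ordinary least squares on the logged values. r² on that scale serves as the goodness-of-fit measure mentioned above. The sketch below is a generic log-log regression, not the paper's information-theoretic model. The gene and protein lengths in the example are hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on log y = log a + b * log x.

    Sketch only: generic log-log regression. Returns (a, b, r_squared), with
    r_squared computed on the log-log scale.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    if any(v <= 0 for v in [*x, *y]):
        raise ValueError("a power-law fit needs strictly positive values")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx          # scaling exponent (slope on log-log axes)
    log_a = my - b * mx    # intercept on log-log axes
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(log_a), b, r_squared


# Hypothetical (gene length, protein length) pairs, for illustration only.
gene_len = [1200, 2500, 4100, 9000, 15000]
protein_len = [210, 330, 450, 700, 980]
a, b, r2 = fit_power_law(gene_len, protein_len)
print(f"a = {a:.3f}, b = {b:.3f}, r^2 = {r2:.3f}")
```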
Affiliation(s)
- Yekbun Adiguzel
- Department of Medical Biology, School of Medicine, Atilim University, Ankara, Turkey

40
Lu TY, Chaisson MJP. Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs. Nat Commun 2021; 12:4250. [PMID: 34253730 PMCID: PMC8275641 DOI: 10.1038/s41467-021-24378-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/24/2020] [Accepted: 06/10/2021] [Indexed: 12/11/2022]
Abstract
Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and loci associated with clinical disorders. It has been difficult to incorporate VNTR analysis into disease studies that use short-read sequencing, because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and the repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG and use the RPGG to estimate VNTR composition with short reads. We use this approach to discover VNTRs whose length is stratified by continental population, as well as VNTRs that are expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.
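A useful way to see how a graph can encode both population diversity and repeat structure is a k-mer de Bruijn graph built from several haplotype sequences of one locus. The sketch below assumes that simplified representation and should not be read as the published RPGG construction. In such a graph, a tandem repeat collapses into a cycle, and haplotypes that differ only in copy number yield the same graph. Copy number therefore has to be inferred from k-mer counts in the reads rather than from graph topology. Sequences, motif, and k are hypothetical.

```python
from collections import defaultdict


def de_bruijn(sequences, k):
    """Map each k-mer to the set of k-mers that follow it in any input sequence.

    Sketch only: a plain (uncompacted) de Bruijn graph; k-mers with no
    successor do not appear as keys.
    """
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k):
            graph[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return graph


def has_cycle(graph):
    """Detect a directed cycle with an iterative-colour depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # unseen nodes default to WHITE

    def visit(u):
        color[u] = GREY
        for v in graph.get(u, ()):
            if color[v] == GREY:
                return True  # back edge: cycle found
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(graph))


# Hypothetical locus: flanks around a 5-bp motif present in 3 or 5 copies.
g3 = de_bruijn(["GGC" + "ACGTT" * 3 + "CCA"], k=4)
g5 = de_bruijn(["GGC" + "ACGTT" * 5 + "CCA"], k=4)
print(g3 == g5, has_cycle(g3))  # → True True
```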
Affiliation(s)
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA

41
Khorsand P, Denti L, Bonizzoni P, Chikhi R, Hormozdiari F. Comparative genome analysis using sample-specific string detection in accurate long reads. Bioinformatics Advances 2021; 1:vbab005. [PMID: 36700094 PMCID: PMC9710709 DOI: 10.1093/bioadv/vbab005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
Abstract
Motivation: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics, including the discovery of genomic differences segregating in populations, case-control analysis in common diseases and the diagnosis of rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can study repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants).
Results: We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('sample-specific' strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach enables comparative genome analysis without mapping the reads and is therefore not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach accurately finds sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data).
Availability and implementation: Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Luca Denti
- Department of Computational Biology, Institut Pasteur, Paris 75015, France
- Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy. To whom correspondence should be addressed.
- Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, Paris 75015, France. To whom correspondence should be addressed.
- Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA 95616, USA
- UC Davis MIND Institute, Sacramento, CA 95817, USA
- Department of Biochemistry and Molecular Medicine, UC Davis, Sacramento, CA 95817, USA. To whom correspondence should be addressed.
42
Eslami Rasekh M, Hernández Y, Drinan SD, Fuxman Bass J, Benson G. Genome-wide characterization of human minisatellite VNTRs: population-specific alleles and gene expression differences. Nucleic Acids Res 2021; 49:4308-4324. [PMID: 33849068 PMCID: PMC8096271 DOI: 10.1093/nar/gkab224] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/03/2020] [Revised: 03/06/2021] [Accepted: 03/18/2021] [Indexed: 11/12/2022]
Abstract
Variable number tandem repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole-genome sequencing datasets from 2770 individuals to detect minisatellite VNTRs, i.e. those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e. with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were found to be enriched in genomic regions with regulatory function, i.e. transcription start sites and enhancers. Investigation of the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles and that these alleles could be used to classify individuals into super-populations with near-perfect accuracy. A search for expression quantitative trait loci (eQTLs) among the VNTRs proximal to genes indicated that, in 187 genes, expression differences correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through the identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
Affiliation(s)
- Yözen Hernández
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Juan I Fuxman Bass
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biology, Boston University, Boston, MA 02215, USA
- Gary Benson
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biology, Boston University, Boston, MA 02215, USA
- Department of Computer Science, Boston University, Boston, MA 02215, USA