1
Vijayaraghavan P, Batalov S, Ding Y, Sanford E, Kingsmore SF, Dimmock D, Hobbs C, Bainbridge M. The Genomic landscape of short tandem repeats across multiple ancestries. PLoS One 2023; 18:e0279430. [PMID: 36701310 PMCID: PMC9879404 DOI: 10.1371/journal.pone.0279430] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/10/2022] [Accepted: 12/07/2022] [Indexed: 01/27/2023]
Abstract
Short Tandem Repeats (STRs) have been found to play a role in a myriad of complex traits and genetic diseases. We examined the variability in the lengths of over 850,000 STR loci in 996 children with suspected genetic disorders and 1,178 parents across six separate ancestral groups: Africans, Europeans, East Asians, Admixed Americans, Non-admixed Americans, and Pacific Islanders. For each STR locus we compared allele length between and within each ancestry group. In relation to Europeans, admixed Americans had the most similar STR lengths with only 623 positions either significantly expanded or contracted, while the divergence was highest in Africans, with 4,933 chromosomal positions contracted or expanded. We also examined probands to identify STR expansions at known pathogenic loci. The genes TCF4, AR, and DMPK showed significant expansions with lengths 250% greater than their various average allele lengths in 49, 162, and 11 individuals respectively. All 49 individuals containing an expansion in TCF4 and six individuals containing an expansion in DMPK presented with allele lengths longer than the known pathogenic length for these genes. Next, we identified individuals with significant expansions in highly conserved loci across all ancestries. Eighty loci in conserved regions met criteria for divergence. Two of these individuals were found to have exonic STR expansions: one in ZBTB4 and the other in SLC9A7, which is associated with X-linked mental retardation. Finally, we used parent-child trios to detect and analyze de novo mutations. In total, we observed 3,219 de novo expansions, where proband allele lengths are greater than twice the longest parental allele length. This work helps lay the foundation for understanding STR lengths genome-wide across ancestries and may help identify new disease genes and novel mechanisms of pathogenicity in known disease genes.
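The de novo criterion stated in this abstract (a proband allele longer than twice the longest parental allele) can be sketched as below. This is a minimal illustration only: the function name and the representation of allele lengths as plain numbers are assumptions, not the authors' pipeline.

```python
def is_de_novo_expansion(proband_len, parental_lens):
    """Return True if the proband allele is more than twice as long
    as the longest allele seen in either parent.

    proband_len   -- allele length in the proband (hypothetical units, e.g. repeat count)
    parental_lens -- iterable of allele lengths from both parents
    """
    longest_parental = max(parental_lens)
    return proband_len > 2 * longest_parental


# Example trio: parental alleles of 20 and 22 units.
print(is_de_novo_expansion(45, [20, 22]))  # longer than 2 * 22
print(is_de_novo_expansion(44, [20, 22]))  # exactly 2 * 22, so not strictly greater
```

The strict inequality follows the wording "greater than twice"; a real analysis would also need genotype-quality filtering, which the abstract does not describe.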
Affiliation(s)
- Sergey Batalov
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, United States of America
- Yan Ding
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, United States of America
- Erica Sanford
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, United States of America
- Cedars-Sinai Medical Center, Los Angeles, CA, United States of America
- Stephen F. Kingsmore
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, United States of America
- David Dimmock
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, United States of America
- Charlotte Hobbs
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, United States of America
- Matthew Bainbridge
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, United States of America
2
Weisweiler M, Arlt C, Wu PY, Van Inghelandt D, Hartwig T, Stich B. Structural variants in the barley gene pool: precision and sensitivity to detect them using short-read sequencing and their association with gene expression and phenotypic variation. Theor Appl Genet 2022; 135:3511-3529. [PMID: 36029318 PMCID: PMC9519679 DOI: 10.1007/s00122-022-04197-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/11/2022] [Accepted: 08/03/2022] [Indexed: 06/15/2023]
Abstract
Structural variants (SV) of 23 barley inbreds, detected by the best combination of SV callers based on short-read sequencing, were associated with genome-wide and gene-specific gene expression and, thus, were evaluated to predict agronomic traits. In human genetics, several studies have shown that phenotypic variation is more likely to be caused by structural variants (SV) than by single nucleotide variants. However, accurate while cost-efficient discovery of SV in complex genomes remains challenging. The objectives of our study were to (i) facilitate SV discovery studies by benchmarking SV callers and their combinations with respect to their sensitivity and precision to detect SV in the barley genome, (ii) characterize the occurrence and distribution of SV clusters in the genomes of 23 barley inbreds that are the parents of a unique resource for mapping quantitative traits, the double round robin population, (iii) quantify the association of SV clusters with transcript abundance, and (iv) evaluate the use of SV clusters for the prediction of phenotypic traits. In our computer simulations based on a sequencing coverage of 25x, a sensitivity > 70% and precision > 95% was observed for all combinations of SV types and SV length categories if the best combination of SV callers was used. We observed a significant (P < 0.05) association of gene-associated SV clusters with global gene-specific gene expression. Furthermore, about 9% of all SV clusters that were within 5 kb of a gene were significantly (P < 0.05) associated with the gene expression of the corresponding gene. The prediction ability of SV clusters was higher compared to that of single-nucleotide polymorphisms from an array across the seven studied phenotypic traits. 
These findings suggest the usefulness of exploiting SV information when fine mapping and cloning the causal genes underlying quantitative traits as well as the high potential of using SV clusters for the prediction of phenotypes in diverse germplasm sets.
Affiliation(s)
- Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Christopher Arlt
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Po-Ya Wu
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Delphine Van Inghelandt
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Thomas Hartwig
- Institute for Molecular Physiology, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225, Düsseldorf, Germany
3
Pholtaisong J, Chaiyaratana N, Aporntewan C, Mutirangura A. Mononucleotide A-repeats may Play a Regulatory Role in Endothermic Housekeeping Genes. Evol Bioinform Online 2022; 18:11769343221110656. [PMID: 35860694 PMCID: PMC9290108 DOI: 10.1177/11769343221110656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Accepted: 07/02/2022] [Indexed: 11/24/2022]
Abstract
Background: Coding and non-coding short tandem repeats (STRs) facilitate a great diversity of phenotypic traits. The imbalance of mononucleotide A-repeats around transcription start sites (TSSs) was found in 3 mammals: H. sapiens, M. musculus, and R. norvegicus. Principal Findings: We found that the imbalance pattern originated in some vertebrates. A similar pattern was observed in mammals and birds, but not in amphibians and reptiles. We proposed that the enriched A-repeats upstream of TSSs is a novel hallmark of endotherms or warm-blooded animals. Gene ontology analysis indicates that the primary function of upstream A-repeats involves metabolism, cellular transportation, and sensory perception (smell and chemical stimulus) through housekeeping genes. Conclusions: Upstream A-repeats may play a regulatory role in the metabolic process of endothermic animals.
Affiliation(s)
- Jatuphol Pholtaisong
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Pathumwan, Bangkok, Thailand
- Nachol Chaiyaratana
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
- Division of Medical Genetics Research and Laboratory, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Chatchawit Aporntewan
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Pathumwan, Bangkok, Thailand
- Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Pathumwan, Bangkok, Thailand
- Omics Sciences and Bioinformatics Center, Chulalongkorn University, Pathumwan, Bangkok, Thailand
- Apiwat Mutirangura
- Center of Excellence in Molecular Genetics of Cancer and Human Diseases, Department of Anatomy, Faculty of Medicine, Chulalongkorn University, Pathumwan, Bangkok, Thailand
4
Xiao X, Zhang CY, Zhang Z, Hu Z, Li M, Li T. Revisiting tandem repeats in psychiatric disorders from perspectives of genetics, physiology, and brain evolution. Mol Psychiatry 2022; 27:466-475. [PMID: 34650204 DOI: 10.1038/s41380-021-01329-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/22/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 01/28/2023]
Abstract
Genome-wide association studies (GWASs) have revealed substantial genetic components comprised of single nucleotide polymorphisms (SNPs) in the heritable risk of psychiatric disorders. However, genetic risk factors not covered by GWAS also play pivotal roles in these illnesses. Tandem repeats, which are likely functional but frequently overlooked by GWAS, may account for an important proportion in the "missing heritability" of psychiatric disorders. Despite difficulties in characterizing and quantifying tandem repeats in the genome, studies have been carried out in an attempt to describe impact of tandem repeats on gene regulation and human phenotypes. In this review, we have introduced recent research progress regarding the genomic distribution and regulatory mechanisms of tandem repeats. We have also summarized the current knowledge of the genetic architecture and biological underpinnings of psychiatric disorders brought by studies of tandem repeats. These findings suggest that tandem repeats, in candidate psychiatric risk genes or in different levels of linkage disequilibrium (LD) with psychiatric GWAS SNPs and haplotypes, may modulate biological phenotypes related to psychiatric disorders (e.g., cognitive function and brain physiology) through regulating alternative splicing, promoter activity, enhancer activity and so on. In addition, many tandem repeats undergo tight natural selection in the human lineage, and likely exert crucial roles in human brain evolution. Taken together, the putative roles of tandem repeats in the pathogenesis of psychiatric disorders is strongly implicated, and using examples from previous literatures, we wish to call for further attention to tandem repeats in the post-GWAS era of psychiatric disorders.
Affiliation(s)
- Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
- Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China
- Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China
- National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
- Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Tao Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China
5
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biol 2021; 22:224. [PMID: 34389037 PMCID: PMC8361843 DOI: 10.1186/s13059-021-02447-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 02/03/2021] [Accepted: 07/26/2021] [Indexed: 12/11/2022]
Abstract
Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers an exciting avenue over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
6
Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet 2021; 53:779-786. [PMID: 33972781 DOI: 10.1038/s41588-021-00865-4] [Citation(s) in RCA: 123] [Impact Index Per Article: 41.0] [Received: 11/18/2019] [Accepted: 04/05/2021] [Indexed: 01/05/2023]
Abstract
Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.
7
Eslami Rasekh M, Hernández Y, Drinan SD, Fuxman Bass J, Benson G. Genome-wide characterization of human minisatellite VNTRs: population-specific alleles and gene expression differences. Nucleic Acids Res 2021; 49:4308-4324. [PMID: 33849068 PMCID: PMC8096271 DOI: 10.1093/nar/gkab224] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/03/2020] [Revised: 03/06/2021] [Accepted: 03/18/2021] [Indexed: 11/12/2022]
Abstract
Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole genome sequencing datasets from 2770 individuals in order to detect minisatellite VNTRs, i.e., those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e. with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were found to be enriched in genomic regions with regulatory function, i.e. transcription start sites and enhancers. Investigation of the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles and that those could be used to classify individuals into super-populations with near-perfect accuracy. Search for quantitative trait loci (eQTLs), among the VNTRs proximal to genes, indicated that in 187 genes expression differences correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through the identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
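The "commonly polymorphic" definition in this abstract (non-reference alleles occurring in more than 5% of the population) can be illustrated with a short sketch. The genotype representation and the function name are assumptions made for illustration, and "occurring in" is read here as the fraction of individuals carrying at least one non-reference allele; VNTRseek's actual computation may differ.

```python
def is_commonly_polymorphic(genotypes, ref_allele, threshold=0.05):
    """Classify a VNTR locus as commonly polymorphic.

    genotypes  -- list of per-individual allele tuples (copy numbers); hypothetical format
    ref_allele -- copy number of the reference allele
    threshold  -- minimum carrier fraction, exclusive (0.05 per the abstract)
    """
    if not genotypes:
        return False
    # Count individuals carrying at least one non-reference allele.
    carriers = sum(
        1 for alleles in genotypes if any(a != ref_allele for a in alleles)
    )
    return carriers / len(genotypes) > threshold


# 6 of 100 individuals carry a non-reference allele (copy number 4 vs. reference 3).
population = [(3, 3)] * 94 + [(3, 4)] * 6
print(is_commonly_polymorphic(population, ref_allele=3))
```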
Affiliation(s)
- Yözen Hernández
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Juan I Fuxman Bass
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biology, Boston University, Boston, MA 02215, USA
- Gary Benson
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biology, Boston University, Boston, MA 02215, USA
- Department of Computer Science, Boston University, Boston, MA 02215, USA
8
Li D, Pan S, Zhang H, Fu Y, Peng Z, Zhang L, Peng S, Xu F, Huang H, Shi R, Zheng H, Peng Y, Tan Z. A comprehensive microsatellite landscape of human Y-DNA at kilobase resolution. BMC Genomics 2021; 22:76. [PMID: 33482734 PMCID: PMC7821415 DOI: 10.1186/s12864-021-07389-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/13/2020] [Accepted: 01/13/2021] [Indexed: 12/12/2022]
Abstract
Background: Though interest in human simple sequence repeats (SSRs) is increasing, little is known about the exact distributional features of numerous SSRs in human Y-DNA at chromosomal level. Herein, totally 540 maps were established, which could clearly display SSR landscape in every bin of 1 k base pairs (Kbp) along the sequenced part of human reference Y-DNA (NC_000024.10), by our developed differential method for improving the existing method to reveal SSR distributional characteristics in large genomic sequences. Results: The maps show that SSRs accumulate significantly with forming density peaks in at least 2040 bins of 1 Kbp, which involve different coding, noncoding and intergenic regions of the Y-DNA, and 10 especially high density peaks were reported to associate with biological significances, suggesting that the other hundreds of especially high density peaks might also be biologically significant and worth further analyzing. In contrast, the maps also show that SSRs are extremely sparse in at least 207 bins of 1 Kbp, including many noncoding and intergenic regions of the Y-DNA, which is inconsistent with the widely accepted view that SSRs are mostly rich in these regions, and these sparse distributions are possibly due to powerfully regional selection. Additionally, many regions harbor SSR clusters with same or similar motif in the Y-DNA. Conclusions: These 540 maps may provide the important information of clearly position-related SSR distributional features along the human reference Y-DNA for better understanding the genome structures of the Y-DNA. This study may contribute to further exploring the biological significance and distribution law of the huge numbers of SSRs in human Y-DNA. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07389-5.
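The per-bin view described in this abstract (SSR counts in consecutive 1 Kbp bins along a sequence) can be sketched as follows. This is only a generic binning illustration under stated assumptions: SSR start coordinates are taken as 0-based integers, the function name is hypothetical, and the authors' differential method and their criteria for density peaks are not reproduced.

```python
from collections import Counter


def ssr_counts_per_bin(ssr_starts, bin_size=1000):
    """Count SSR loci per fixed-size bin.

    ssr_starts -- iterable of 0-based SSR start positions (assumed input format)
    bin_size   -- bin width in base pairs (1000 for the 1 Kbp bins in the abstract)
    Returns a Counter mapping bin index -> number of SSRs starting in that bin.
    """
    return Counter(pos // bin_size for pos in ssr_starts)


counts = ssr_counts_per_bin([5, 999, 1000, 2500, 2999])
for bin_index in sorted(counts):
    print(f"bin {bin_index}: {counts[bin_index]} SSR(s)")
```

Bins absent from the returned counter contain no SSR starts, which is one simple way to spot the sparse regions the abstract mentions.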
Affiliation(s)
- Douyue Li
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Saichao Pan
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Hongxi Zhang
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Yongzhuo Fu
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Zhuli Peng
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Liang Zhang
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Shan Peng
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Fei Xu
- Department of Mathematics, Wilfrid Laurier University, Waterloo, Ontario, N2L 3C5, Canada
- Hanrou Huang
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Ruixue Shi
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Heping Zheng
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Yousong Peng
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Zhongyang Tan
- Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
9
Kim J, Copeland CE, Seki K, Vögeli B, Kwon YC. Tuning the Cell-Free Protein Synthesis System for Biomanufacturing of Monomeric Human Filaggrin. Front Bioeng Biotechnol 2020; 8:590341. [PMID: 33195157 PMCID: PMC7658397 DOI: 10.3389/fbioe.2020.590341] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/01/2020] [Accepted: 10/05/2020] [Indexed: 12/20/2022]
Abstract
The modern cell-free protein synthesis (CFPS) system is expanding the opportunity of cell-free biomanufacturing as a versatile platform for synthesizing various therapeutic proteins. However, synthesizing human protein in the bacterial CFPS system remains challenging due to the low expression level, protein misfolding, inactivity, and more. These challenges limit the use of a bacterial CFPS system for human therapeutic protein synthesis. In this study, we demonstrated the improved performance of a customized CFPS platform for human therapeutic protein production by investigating the factors that limit cell-free transcription-translation. The improvement of the CFPS platform has been made in three ways. First, the cell extract was prepared from the rare tRNA expressed host strain, and CFPS was performed with a codon-optimized gene for Escherichia coli codon usage bias. The soluble protein yield was 15.2 times greater with the rare tRNA overexpressing host strain as cell extract and codon-optimized gene in the CFPS system. Next, we identify and prioritize the critical biomanufacturing factors for highly active crude cell lysate for human protein synthesis. Lastly, we engineer the CFPS reaction conditions to enhance protein yield. In this model, the therapeutic protein filaggrin expression was significantly improved by up to 23-fold, presenting 28 ± 5 μM of soluble protein yield. The customized CFPS system for filaggrin biomanufacturing described here demonstrates the potential of the CFPS system to be adapted for studying therapeutic proteins.
Affiliation(s)
- Jeehye Kim
- Department of Biological and Agricultural Engineering, Louisiana State University, Baton Rouge, LA, United States
- Caroline E Copeland
- Department of Biological and Agricultural Engineering, Louisiana State University, Baton Rouge, LA, United States
- Kosuke Seki
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, United States
- Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, United States
- Bastian Vögeli
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, United States
- Yong-Chan Kwon
- Department of Biological and Agricultural Engineering, Louisiana State University, Baton Rouge, LA, United States
- Louisiana State University Agricultural Center, Baton Rouge, LA, United States
10
Huang Y, Huang X, Zhou X, Wang J, Zhang R, Ma F, Wang K, Zhang Z, Dai X, Cao X, Zhang C, Han K, Ren Q. Immune activation by a multigene family of lectins with variable tandem repeats in oriental river prawn (Macrobrachium nipponense). Open Biol 2020; 10:200141. [PMID: 32931720 PMCID: PMC7536079 DOI: 10.1098/rsob.200141] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Genomic regions with repeated sequences are unstable and prone to rapid DNA diversification. However, the role of tandem repeats within the coding region is not fully characterized. Here, we have identified a new hypervariable C-type lectin gene family with different numbers of tandem repeats (Rlecs; R means repeat) in oriental river prawn (Macrobrachium nipponense). Two types of repeat units (33 or 30 bp) are identified in the second exon, and the number of repeat units vary from 1 to 9. Rlecs can be classified into 15 types through phylogenetic analysis. The amino acid sequences in the same type of Rlec are highly conservative outside the repeat regions. The main differences among the Rlec types are evident in exon 5. A variable number of tandem repeats in Rlecs may be produced by slip mispairing during gene replication. Alternative splicing contributes to the multiplicity of forms in this lectin gene family, and different types of Rlecs vary in terms of tissue distribution, expression quantity and response to bacterial challenge. These variations suggest that Rlecs have functional diversity. The results of experiments on sugar binding, microbial inhibition and clearance, regulation of antimicrobial peptide gene expression and prophenoloxidase activation indicate that the function of Rlecs with the motif of YRSKDD in innate immunity is enhanced when the number of tandem repeats increases. Our results suggest that Rlecs undergo gene expansion through gene duplication and alternative splicing, which ultimately leads to functional diversity.
Affiliation(s)
- Ying Huang
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- College of Oceanography, Hohai University, 1 Xikang Road, Nanjing, Jiangsu 210098, People's Republic of China
- Xin Huang
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Xuming Zhou
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China
- Jialin Wang
- Hubei Key Laboratory of Genetic Regulation and Integrative Biology, School of Life Sciences, Central China Normal University, Wuhan 430079, People's Republic of China
- Ruidong Zhang
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Futong Ma
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Kaiqiang Wang
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Zhuoxing Zhang
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Xiaoling Dai
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Xueying Cao
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Chao Zhang
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Keke Han
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Qian Ren
- College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Co-Innovation Center for Marine Bio-Industry Technology of Jiangsu Province, Lianyungang, Jiangsu 222005, People's Republic of China
11
Shortt JA, Ruggiero RP, Cox C, Wacholder AC, Pollock DD. Finding and extending ancient simple sequence repeat-derived regions in the human genome. Mob DNA 2020; 11:11. [PMID: 32095164 PMCID: PMC7027126 DOI: 10.1186/s13100-020-00206-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2019] [Accepted: 02/04/2020] [Indexed: 12/19/2022]
Abstract
Background Previously, 3% of the human genome has been annotated as simple sequence repeats (SSRs), similar to the proportion annotated as protein coding. The origin of much of the genome is not well annotated, however, and some of the unidentified regions are likely to be ancient SSR-derived regions not identified by current methods. The identification of these regions is complicated because SSRs appear to evolve through complex cycles of expansion and contraction, often interrupted by mutations that alter both the repeated motif and mutation rate. We applied an empirical, kmer-based, approach to identify genome regions that are likely derived from SSRs. Results The sequences flanking annotated SSRs are enriched for similar sequences and for SSRs with similar motifs, suggesting that the evolutionary remains of SSR activity abound in regions near obvious SSRs. Using our previously described P-clouds approach, we identified ‘SSR-clouds’, groups of similar kmers (or ‘oligos’) that are enriched near a training set of unbroken SSR loci, and then used the SSR-clouds to detect likely SSR-derived regions throughout the genome. Conclusions Our analysis indicates that the amount of likely SSR-derived sequence in the human genome is 6.77%, over twice as much as previous estimates, including millions of newly identified ancient SSR-derived loci. SSR-clouds identified poly-A sequences adjacent to transposable element termini in over 74% of the oldest class of Alu (roughly, AluJ), validating the sensitivity of the approach. Poly-A’s annotated by SSR-clouds also had a length distribution that was more consistent with their poly-A origins, with mean about 35 bp even in older Alus. This work demonstrates that the high sensitivity provided by SSR-Clouds improves the detection of SSR-derived regions and will enable deeper analysis of how decaying repeats contribute to genome structure.
Affiliation(s)
- Jonathan A Shortt
- Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO 80045 USA
| | - Robert P Ruggiero
- Department of Biology, Southeast Missouri State University, Cape Girardeau, MO 63701 USA
| | - Corey Cox
- Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO 80045 USA
| | - Aaron C Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213 USA
| | - David D Pollock
- Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045 USA
|
12
|
Zhang T, Xing Y, Xu L, Bao G, Zhan Z, Yang Y, Wang J, Li S, Zhang D, Kang T. Comparative analysis of the complete chloroplast genome sequences of six species of Pulsatilla Miller, Ranunculaceae. Chin Med 2019; 14:53. [PMID: 31798674 PMCID: PMC6883693 DOI: 10.1186/s13020-019-0274-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/23/2019] [Accepted: 11/04/2019] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Baitouweng is a traditional Chinese medicine with a long history of different applications. Although referred to as a single medicine, Baitouweng is actually comprised of many closely related species. It is therefore critically important to identify the different species that are utilized in these medicinal applications. Knowledge about their phylogenetic relationships can be derived from their chloroplast genomes and may provide additional insights into development of molecular markers. METHODS Genomic DNA was extracted from six species of Pulsatilla and then sequenced on an Illumina HiSeq 4000. Sequences were assembled into contigs by SOAPdenovo 2.04, aligned to the reference genome using BLAST, and then manually corrected. Genome annotation was performed by the online DOGMA tool. General characteristics of the cp genomes of the six species were analyzed and compared with closely related species. Additionally, phylogenetic trees were constructed, based on single nucleotide polymorphisms (SNPs) and 51 shared protein-coding gene sequences in the cp genome among all 31 species via maximum likelihood. RESULTS The size of cp genomes of P. chinensis (Bge.) Regel, P. chinensis (Bge.) Regel var. kissii (Mandl) S. H. Li et Y. H. Huang, P. cernua (Thunb.) Bercht. et Opiz f. plumbea J. X. Ji et Y. T. zhao, P. dahurica (Fisch.) Spreng, P. turczaninovii Kryl. et Serg, and P. cernua (Thunb.) Bercht. et Opiz. were 163,851 bp, 163,756 bp, 162,481 bp, 162,450 bp, 162,795 bp, and 162,924 bp, respectively. Each species included two inverted repeat regions, a small single-copy region, and a large single-copy region. A total of 134 genes were annotated, including 90 protein-coding genes, 36 tRNAs, and eight rRNAs across all species. In simple sequence repeat analysis, only P. dahurica was found to contain hexanucleotide repeats. A total of 26, 39, 32, 37, 32 and 43 large repeat sequences were identified in the genic regions of the six Pulsatilla species. 
Nucleotide diversity analysis revealed that the rpl36 gene and ccsA-ndhD region have the highest Pi values. In addition, two phylogenetic trees of the cp genomes were constructed, which placed all Pulsatilla species into one branch within Ranunculaceae. CONCLUSIONS We identified and analyzed the cp genome features of six species of Pulsatilla Miller, with implications for species identification and phylogenetic analysis.
Affiliation(s)
- Tingting Zhang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Yanping Xing
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Liang Xu
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
- Liaoning Quality Monitoring and Technology Service Center for Chinese Materia Medica Raw Materials, Dalian, China
| | - Guihua Bao
- School of Mongol Medicine, Inner Mongolia University for Nationalities, Tongliao, China
| | - Zhilai Zhan
- Traditional Chinese Medicine Resource Center, Chinese Academy of Traditional Chinese Medicine, Beijing, China
| | - Yanyun Yang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Jiahao Wang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Shengnan Li
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Dachuan Zhang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Tingguo Kang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
- Liaoning Quality Monitoring and Technology Service Center for Chinese Materia Medica Raw Materials, Dalian, China
|
13
|
Midha MK, Wu M, Chiu KP. Long-read sequencing in deciphering human genetics to a greater depth. Hum Genet 2019; 138:1201-1215. [PMID: 31538236 DOI: 10.1007/s00439-019-02064-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 05/22/2019] [Accepted: 09/13/2019] [Indexed: 12/12/2022]
Abstract
Through four decades of development, DNA sequencing has moved into the era of single-molecule sequencing (SMS), or third-generation sequencing (TGS), as represented by two distinct technical approaches developed independently by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). Historically, each generation of sequencing technologies was marked by innovative technological achievements and novel applications. Long reads (LRs) are considered the most advantageous feature of SMS, shared by both PacBio and ONT, and distinguish SMS from next-generation sequencing (NGS, or second-generation sequencing) and Sanger sequencing (first-generation sequencing). Long reads overcome the limitations of NGS and drastically improve the quality of genome assembly. In addition, ONT contributes several unique features, including ultra-long reads (ULRs) with read lengths above 300 kb and some close to 1 million bp, direct RNA sequencing, and superior portability made possible by the pocket-sized MinION sequencer. Here, we review the history of DNA sequencing technologies and associated applications, with a special focus on the advantages as well as the limitations of ULR sequencing in genome assembly.
Affiliation(s)
- Mohit K Midha
- Genomics Research Center, Academia Sinica, 128 Academia Road, Sec. 2, Nankang District, Taipei, 115, Taiwan.,Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan
| | - Mengchu Wu
- Health GeneTech, 22F No. 99, Xin Pu 6th St., Taoyuan, Taiwan
| | - Kuo-Ping Chiu
- Genomics Research Center, Academia Sinica, 128 Academia Road, Sec. 2, Nankang District, Taipei, 115, Taiwan.,Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan.,Department of Life Sciences, College of Life Sciences, National Taiwan University, Taipei, Taiwan
|
14
|
Beh CW, Zhang Y, Zheng YL, Sun B, Wang TH. Fluorescence spectroscopic detection and measurement of single telomere molecules. Nucleic Acids Res 2019; 46:e117. [PMID: 30010842 PMCID: PMC6212783 DOI: 10.1093/nar/gky627] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/29/2017] [Accepted: 06/28/2018] [Indexed: 01/26/2023] Open
Abstract
Telomeres are the end-caps of chromosomes that serve to protect the integrity of the genome. Below certain critical lengths, the telomeres can no longer fulfill their protective function, and chromosomal instability ensues. Telomeres shorten during normal cell division due to the end replication problem and are implicated in the development of various aging-associated diseases, including cancer. Telomere length has the potential to serve as a useful biomarker in the field of aging and cancer. However, existing methods of telomere measurement are either too laborious, unable to provide absolute measurement of individual telomere lengths, or limited to certain chromosomes or cell types. Here, we describe an easy single-molecule, fluorescence spectroscopic method for measuring the length of telomeres that permits the profiling of absolute telomere lengths in any DNA sample. We have demonstrated the accurate detection of telomeres as short as 100 bp using cloned telomere standards, and have profiled telomere lengths in human cancer cell lines and primary cells. Since this method allows direct comparison between samples, it could greatly improve the clinical utility of telomere biomarkers.
Affiliation(s)
- Cyrus W Beh
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Ye Zhang
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Yun-Ling Zheng
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
| | - Bing Sun
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
| | - Tza-Huei Wang
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.,Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.,Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, MD 21218, USA
|
15
|
Zablotskaya A, Van Esch H, Verstrepen KJ, Froyen G, Vermeesch JR. Mapping the landscape of tandem repeat variability by targeted long read single molecule sequencing in familial X-linked intellectual disability. BMC Med Genomics 2018; 11:123. [PMID: 30567555 PMCID: PMC6299999 DOI: 10.1186/s12920-018-0446-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/07/2018] [Accepted: 12/06/2018] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The etiology of more than half of all patients with X-linked intellectual disability remains elusive, despite array-based comparative genomic hybridization, whole exome or genome sequencing. Since short read massively parallel sequencing approaches do not allow the detection of larger tandem repeat expansions, we hypothesized that such expansions could be a hidden cause of X-linked intellectual disability. METHODS We selectively captured over 1800 tandem repeats on the X chromosome and characterized them by long read single molecule sequencing in 3 families with idiopathic X-linked intellectual disability. RESULTS In male DNA samples, full tandem repeat length sequences were obtained for 88-93% of the targets and up to 99.6% of the repeats with a moderate guanine-cytosine content. The read length and analysis pipeline allow detection of tandem repeat expansions of > 900 bp. In one family, one repeat expansion co-occurs with down-regulation of the neighboring MIR222 gene. This gene has previously been implicated in intellectual disability and is apparently linked to FMR1 and NEFH overexpression associated with neurological disorders. CONCLUSIONS This study demonstrates the power of single molecule sequencing to measure tandem repeat lengths and detect expansions, and suggests that tandem repeat mutations may be a hidden cause of X-linked intellectual disability.
Affiliation(s)
- Alena Zablotskaya
- Department of Human Genetics and Center for Human Genetics, Laboratory for Cytogenetics and Genome Research, University Hospitals Leuven, KU Leuven, O&N I Herestraat 49 - box 606, 3000, Leuven, Belgium
| | - Hilde Van Esch
- Department of Human Genetics and Center for Human Genetics, Laboratory for Genetics of Cognition, University Hospitals Leuven, KU Leuven, O&N I Herestraat 49 - box 606, 3000, Leuven, Belgium
| | - Kevin J Verstrepen
- VIB Center for Microbiology and CMPG Lab for Genetics and Genomics, KU Leuven, Gaston Geenslaan 1 - box 2471, 3001, Leuven, Belgium
| | - Guy Froyen
- Clinical Biology, Laboratory for Molecular Diagnostics, Jessa Hospital, Stadsomvaart 11, 3500, Hasselt, Belgium
| | - Joris R Vermeesch
- Department of Human Genetics and Center for Human Genetics, Laboratory for Cytogenetics and Genome Research, University Hospitals Leuven, KU Leuven, O&N I Herestraat 49 - box 606, 3000, Leuven, Belgium.
|
16
|
Lv J, Jiao W, Guo H, Liu P, Wang R, Zhang L, Zeng Q, Hu X, Bao Z, Wang S. HD-Marker: a highly multiplexed and flexible approach for targeted genotyping of more than 10,000 genes in a single-tube assay. Genome Res 2018; 28:1919-1930. [PMID: 30409770 PMCID: PMC6280760 DOI: 10.1101/gr.235820.118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/11/2018] [Accepted: 10/25/2018] [Indexed: 01/03/2023]
Abstract
Targeted genotyping of transcriptome-scale genetic markers is highly attractive for genetic, ecological, and evolutionary studies, but achieving this goal in a cost-effective manner remains a major challenge, especially for laboratories working on nonmodel organisms. Here, we develop a high-throughput, sequencing-based GoldenGate approach (called HD-Marker), which addresses the array-related issues of original GoldenGate methodology and allows for highly multiplexed and flexible targeted genotyping of more than 12,000 loci in a single-tube assay (in contrast to fewer than 3100 in the original GoldenGate assay). We perform extensive analyses to demonstrate the power and performance of HD-Marker on various multiplex levels (296, 795, 1293, and 12,472 genic SNPs) across two sequencing platforms in two nonmodel species (the scallops Chlamys farreri and Patinopecten yessoensis), with extremely high capture rate (98%-99%) and genotyping accuracy (97%-99%). We also demonstrate the potential of HD-Marker for high-throughput targeted genotyping of alternative marker types (e.g., microsatellites and indels). With its remarkable cost-effectiveness (as low as $0.002 per genotype) and high flexibility in choice of multiplex levels and marker types, HD-Marker provides a highly attractive tool over array-based platforms for fulfilling genome/transcriptome-wide targeted genotyping applications, especially in nonmodel organisms.
Affiliation(s)
- Jia Lv
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Wenqian Jiao
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Haobing Guo
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Pingping Liu
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Ruijia Wang
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Lingling Zhang
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Qifan Zeng
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Xiaoli Hu
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Zhenmin Bao
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Shi Wang
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
|
17
|
Witkos TM, Krzyzosiak WJ, Fiszer A, Koscianska E. A potential role of extended simple sequence repeats in competing endogenous RNA crosstalk. RNA Biol 2018; 15:1399-1409. [PMID: 30381983 PMCID: PMC6284579 DOI: 10.1080/15476286.2018.1536593] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/28/2022] Open
Abstract
MicroRNA (miRNA)-mediated crosstalk between coding and non-coding RNAs of various types is known as the competing endogenous RNA (ceRNA) concept. Here, we propose that there is a specific variant of the ceRNA language that takes advantage of simple sequence repeat (SSR) wording. We applied bioinformatics tools to identify human transcripts that may be regarded as repeat-associated ceRNAs (raceRNAs). Multiple protein-coding transcripts, transcribed pseudogenes, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) showing this potential were identified, and numerous miRNAs were predicted to bind to SSRs. We propose that simple repeats expanded in various hereditary neurological diseases may act as sponges for miRNAs containing complementary repeats that would affect raceRNA crosstalk. Based on the representation of specific SSRs in transcripts, expression data for SSR-binding miRNAs and expression profiling data from patients, we determined that raceRNA crosstalk is most likely to be perturbed in the case of myotonic dystrophy type 1 (DM1) and type 2 (DM2).
Affiliation(s)
- Tomasz M Witkos
- Department of Molecular Biomedicine, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Wlodzimierz J Krzyzosiak
- Department of Molecular Biomedicine, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Agnieszka Fiszer
- Department of Molecular Biomedicine, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Edyta Koscianska
- Department of Molecular Biomedicine, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
|
18
|
Genetic structure and polymorphisms of Gelao ethnicity residing in southwest China revealed by X-chromosomal genetic markers. Sci Rep 2018; 8:14585. [PMID: 30275508 PMCID: PMC6167355 DOI: 10.1038/s41598-018-32945-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/25/2018] [Accepted: 09/19/2018] [Indexed: 01/10/2023] Open
Abstract
X-chromosome short tandem repeat markers (X-STRs), due to their special inheritance models, physical location on a single chromosome and the absence of recombination in male meiosis, play an important role in forensic and population genetics. While the genetic diversity and forensic characteristics of X-STRs have been well studied in ethnically/linguistically diverse and demographically large Chinese populations, genetic evidence from the Gelao ethnic group is still sparse. Here, we genotyped the first batch of 19 X-STRs in 513 Chinese Gelao individuals (265 females and 248 males), and reported genetic polymorphisms and forensic characteristics based on single loci and seven linkage groups. DXS10135, with the highest PIC (0.9106), and LG1 (DXS10148-DXS10135-DXS8378), with the largest HD (0.9970), are polymorphic and informative. The CPDs in Gelao males and females are larger than 0.999999999997095 and 0.99999999999999999999918, respectively, and the combined MECs are larger than 0.999999975715109. Subsequently, we investigated the population relationships among 14 Chinese populations based on 19 X-STRs and among 23 populations based on 11 overlapped X-STRs. Our results revealed genetic differentiations among Tibeto-Burman, Altaic and other Chinese homogenous populations, and demonstrated that Guizhou Gelao has genetically closer relationships with Han Chinese and the geographically close Guizhou Miao.
|
19
|
Kristmundsdóttir S, Sigurpálsdóttir BD, Kehr B, Halldórsson BV. popSTR: population-scale detection of STR variants. Bioinformatics 2018; 33:4041-4048. [PMID: 27591079 DOI: 10.1093/bioinformatics/btw568] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 04/04/2016] [Accepted: 08/26/2016] [Indexed: 11/14/2022] Open
Abstract
Motivation Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant types of variation in the human genome, after single nucleotide polymorphisms (SNPs) and indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and construction of genetic genealogy. However, microsatellite variations are rarely considered in whole-genome sequencing studies, in large part due to a lack of tools capable of analyzing them. Results Here we present a microsatellite genotyper, optimized for Illumina WGS data, which is both faster and more accurate than other methods previously presented. There are two main ingredients to our improvements. First, we reduce the amount of sequencing data necessary for creating microsatellite profiles by using previously aligned sequencing data. Second, we use population information to train microsatellite and individual specific error profiles. By comparing our genotyping results to genotypes generated by capillary electrophoresis we show that our error rates are 50% lower than those of lobSTR, another program specifically developed to determine microsatellite genotypes. Availability and Implementation Source code is available on GitHub: https://github.com/DecodeGenetics/popSTR. Contact snaedis.kristmundsdottir@decode.is or bjarni.halldorsson@decode.is.
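As a rough illustration of the microsatellite definition in this abstract (tandemly repeated motifs of two to six bases), the following Python sketch scans a plain DNA string for such tracts. It is not popSTR's method, which genotypes aligned Illumina reads using population-trained error profiles; the function name `find_strs` and the `min_copies` threshold are hypothetical choices made only for this example.

```python
import re


def find_strs(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    Illustrative sketch only: motif sizes follow the 2-6 base definition
    quoted in the abstract; min_copies is an assumed threshold.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A motif of `unit` bases followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" or "AAA"),
            # so each tract is reported under its smallest repeating unit.
            if any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Example input: a (CAG)5 tract and an (AT)4 tract embedded in flanking sequence.
example = "TTA" + "CAG" * 5 + "TCC" + "AT" * 4 + "GG"
example_hits = find_strs(example)
```

Because the scan only finds perfect repeats in a single sequence, it says nothing about genotypes or sequencing error, which are the problems the abstract describes popSTR as addressing.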
Affiliation(s)
- Bjarni V Halldórsson
- deCODE genetics/Amgen.,School of Science and Engineering, Reykjavík University, Reykjavík, 101, Iceland
|
20
|
Repeat length variations in polyglutamine disease-associated genes affect body mass index. Int J Obes (Lond) 2018; 43:440-449. [PMID: 30120431 DOI: 10.1038/s41366-018-0161-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/27/2017] [Revised: 05/15/2018] [Accepted: 06/15/2018] [Indexed: 11/08/2022]
Abstract
BACKGROUND The worldwide prevalence of obesity, a major risk factor for numerous debilitating chronic disorders, is increasing rapidly. Although a substantial amount of the variation in body mass index (BMI) is estimated to be heritable, the largest meta-analysis of genome-wide association studies (GWAS) to date explained only ~2.7% of the variation. To tackle this 'missing heritability' problem of obesity, here we focused on the contribution of DNA repeat length polymorphisms which are not detectable by GWAS. SUBJECTS AND METHODS We determined the cytosine-adenine-guanine (CAG) repeat length in the nine known polyglutamine disease-associated genes (ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, TBP, HTT, ATN1 and AR) in two large cohorts consisting of 12,457 individuals and analyzed their association with BMI, using generalized linear mixed-effect models. RESULTS We found a significant association between BMI and the length of CAG repeats in seven polyglutamine disease-associated genes (including ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, TBP and AR). Importantly, these repeat variations could account for 0.75% of the total BMI variation. CONCLUSIONS Our findings incriminate repeat polymorphisms as an important novel class of genetic risk factors of obesity and highlight the role of the brain in its pathophysiology.
|
21
|
Ganesamoorthy D, Cao MD, Duarte T, Chen W, Coin L. GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing. BMC Bioinformatics 2018; 19:267. [PMID: 30012093 PMCID: PMC6048696 DOI: 10.1186/s12859-018-2282-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2018] [Accepted: 07/09/2018] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. RESULTS We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68 and 83% for capture sequence data and 200X WGS data respectively, improving to 87 and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25, 14, 12 and 8% of its midpoint copy number value for 30X, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. CONCLUSIONS The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.
Affiliation(s)
- Devika Ganesamoorthy
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
| | - Minh Duc Cao
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
| | - Tania Duarte
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
| | - Wenhan Chen
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
| | - Lachlan Coin
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
|
22
|
Genovese LM, Geraci F, Corrado L, Mangano E, D'Aurizio R, Bordoni R, Severgnini M, Manzini G, De Bellis G, D'Alfonso S, Pellegrini M. A Census of Tandemly Repeated Polymorphic Loci in Genic Regions Through the Comparative Integration of Human Genome Assemblies. Front Genet 2018; 9:155. [PMID: 29770143 PMCID: PMC5941971 DOI: 10.3389/fgene.2018.00155] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/18/2017] [Accepted: 04/13/2018] [Indexed: 11/29/2022] Open
Abstract
Polymorphic Tandem Repeat (PTR) is a common form of polymorphism in the human genome. A PTR consists in a variation found in an individual (or in a population) of the number of repeating units of a Tandem Repeat (TR) locus of the genome with respect to the reference genome. Several phenotypic traits and diseases have been discovered to be strongly associated with or caused by specific PTR loci. PTR are further distinguished in two main classes: Short Tandem Repeats (STR) when the repeating unit has size up to 6 base pairs, and Variable Number Tandem Repeats (VNTR) for repeating units of size above 6 base pairs. As larger and larger populations are screened via high throughput sequencing projects, it becomes technically feasible and desirable to explore the association between PTR and a panoply of such traits and conditions. In order to facilitate these studies, we have devised a method for compiling catalogs of PTR from assembled genomes, and we have produced a catalog of PTR for genic regions (exons, introns, UTR and adjacent regions) of the human genome (GRCh38). We applied four different TR discovery software tools to uncover in the first phase 55,223,485 TR (after duplicate removal) in GRCh38, of which 373,173 were determined to be PTR in the second phase by comparison with five assembled human genomes. Of these, 263,266 are not included by state-of-the-art PTR catalogs. The new methodology is mainly based on a hierarchical and systematic application of alignment-based sequence comparisons to identify and measure the polymorphism of TR. While previous catalogs focus on the class of STR of small total size, we remove any size restrictions, aiming at the more general class of PTR, and we also target fuzzy TR by using specific detection tools. Similarly to other previous catalogs of human polymorphic loci, we focus our catalog toward applications in the discovery of disease-associated loci. 
Validation by cross-referencing with existing catalogs on common clinically-relevant loci shows good concordance. Overall, this proposed census of human PTR in genic regions is a shared resource (web accessible), complementary to existing catalogs, facilitating future genome-wide studies involving PTR.
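The STR/VNTR distinction stated in this abstract (repeating unit of up to 6 bp versus more than 6 bp) and the notion of a PTR as a copy-number difference relative to the reference can be sketched as follows. This is only an illustration, not the authors' pipeline: the `TandemRepeat` record, the function names, and the `min_copy_diff` threshold are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# From the abstract: repeating units of up to 6 bp are STRs, longer units are VNTRs.
STR_MAX_UNIT_BP = 6


@dataclass(frozen=True)
class TandemRepeat:
    """Hypothetical record for one tandem repeat locus (illustration only)."""
    motif: str            # repeating unit, e.g. "CAG"
    ref_copies: float     # copy number in the reference genome
    sample_copies: float  # copy number observed in another assembly


def repeat_class(tr: TandemRepeat) -> str:
    """Classify a locus as STR or VNTR by the length of its repeating unit."""
    return "STR" if len(tr.motif) <= STR_MAX_UNIT_BP else "VNTR"


def is_polymorphic(tr: TandemRepeat, min_copy_diff: float = 1.0) -> bool:
    """Treat a locus as polymorphic when its copy number differs from the
    reference by at least min_copy_diff (an assumed threshold)."""
    return abs(tr.sample_copies - tr.ref_copies) >= min_copy_diff
```

For instance, a `CAG` locus with 20 reference copies and 25 sample copies would be classed as a polymorphic STR under these assumptions, while a locus with a 7-bp motif would be classed as a VNTR.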
Affiliation(s)
- Filippo Geraci
- Institute for Informatics and Telematics of CNR, Pisa, Italy
- Lucia Corrado
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
- Roberta Bordoni
- Institute for Biomedical Technologies of CNR, Segrate, Italy
- Giovanni Manzini
- Institute for Informatics and Telematics of CNR, Pisa, Italy; Department of Science and Technological Innovation, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
- Sandra D'Alfonso
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
|
23
|
Abstract
Accumulating evidence suggests that many classes of DNA repeats exhibit attributes that distinguish them from other genetic variants, including the fact that they are more liable to mutation; this enables them to mediate genetic plasticity. The expansion of tandem repeats, particularly of short tandem repeats, can cause a range of disorders (including Huntington disease, various ataxias, motor neuron disease, frontotemporal dementia, fragile X syndrome and other neurological disorders), and emerging data suggest that tandem repeat polymorphisms (TRPs) can also regulate gene expression in healthy individuals. TRPs in human genomes may also contribute to the missing heritability of polygenic disorders. A better understanding of tandem repeats and their associated repeatome, as well as their capacity for genetic plasticity via both germline and somatic mutations, is needed to transform our understanding of the role of TRPs in health and disease.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne; Department of Anatomy and Neuroscience, University of Melbourne, Parkville, Victoria, Australia
|
24
|
Franco ME, Bitencourt TA, Marins M, Fachin AL. In silico characterization of tandem repeats in Trichophyton rubrum and related dermatophytes provides new insights into their role in pathogenesis. Database (Oxford) 2018; 2017:3866792. [PMID: 29220431 PMCID: PMC5502367 DOI: 10.1093/database/bax035] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/25/2016] [Accepted: 03/28/2017] [Indexed: 01/01/2023]
Abstract
Trichophyton rubrum is the most common etiological agent of dermatophytoses worldwide, which is able to degrade keratinized tissues. The sequencing of the genome of different dermatophyte species has provided a large amount of data, including tandem repeats that may play a role in genetic variability and in the pathogenesis of these fungi. Tandem repeats are adjacent DNA sequences of 2–200 nucleotides in length, which exert regulatory and adaptive functions. These repetitive DNA sequences are found in different classes of fungal proteins, especially those involved in cell adhesion, a determinant factor for the establishment of fungal infection. The objective of this study was to develop a Dermatophyte Tandem Repeat Database (DTRDB) for the storage and identification of tandem repeats in T. rubrum and six other dermatophyte species. The current version of the database contains 35 577 tandem repeats detected in 16 173 coding sequences. The repeats can be searched using entry parameters such as repeat unit length (nt: nucleotide), repeat number, variability score, and repeat sequence motif. These data were used to study the relative frequency and distribution of repeats in the sequences, as well as their possible functions in dermatophytes. A search of the database revealed that these repeats occur in 22–33% of genes transcribed in dermatophytes where they could be involved in the success of adaptation to the host tissue and establishment of infection. The repeats were detected in transcripts that are mainly related to three biological processes: regulation, adhesion, and metabolism. The database developed enables users to identify and analyse tandem repeat regions in target genes related to pathogenicity and fungal–host interactions in dermatophytes and may contribute to the discovery of new targets for the development of antifungal agents. Database URL: http://comp.mch.ifsuldeminas.edu.br/dtrdb/
Affiliation(s)
- Matheus Eloy Franco
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Federal Institute of Education, Science and Technology of South of Minas Gerais - IFSULDEMINAS, 37750-000, Brazil
- Tamires Aparecida Bitencourt
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Departamento de Genetica, 049-900, FMRP-USP, SP, Brazil
- Mozart Marins
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Curso de Medicina, Universidade de Ribeirão Preto, SP, Brazil
- Ana Lúcia Fachin
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Curso de Medicina, Universidade de Ribeirão Preto, SP, Brazil
|
25
|
A microsatellite repeat in PCA3 long non-coding RNA is associated with prostate cancer risk and aggressiveness. Sci Rep 2017; 7:16862. [PMID: 29203868 PMCID: PMC5715103 DOI: 10.1038/s41598-017-16700-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/15/2017] [Accepted: 11/10/2017] [Indexed: 01/08/2023] Open
Abstract
Short tandem repeats (STRs) are repetitive sequences of a polymorphic stretch of two to six nucleotides. We hypothesized that STRs are associated with prostate cancer development and/or progression. We undertook RNA sequencing analysis of prostate tumors and adjacent non-malignant cells to identify polymorphic STRs that are readily expressed in these cells. Most of the expressed STRs in the clinical samples mapped to intronic and intergenic DNA. Our analysis indicated that three of these STRs (TAAA-ACTG2, TTTTG-TRIB1, and TG-PCA3) are polymorphic and differentially expressed in prostate tumors compared to adjacent non-malignant cells. TG-PCA3 STR expression was repressed by the anti-androgen drug enzalutamide in prostate cancer cells. Genetic analysis of prostate cancer patients and healthy controls (N > 2,000) showed a significant association of the most common 11 repeat allele of TG-PCA3 STR with prostate cancer risk (OR = 1.49; 95% CI 1.11–1.99; P = 0.008). A significant association was also observed with aggressive disease (OR = 2.00; 95% CI 1.06–3.76; P = 0.031) and high mortality rates (HR = 3.0; 95% CI 1.03–8.77; P = 0.045). We propose that TG-PCA3 STR has both diagnostic and prognostic potential for prostate cancer. We provided a proof of concept to be applied to other RNA sequencing datasets to identify disease-associated STRs for future clinical exploratory studies.
|
26
|
Tang H, Nzabarushimana E. STRScan: targeted profiling of short tandem repeats in whole-genome sequencing data. BMC Bioinformatics 2017; 18:398. [PMID: 28984185 PMCID: PMC5629557 DOI: 10.1186/s12859-017-1800-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Short tandem repeats (STRs) are found in many prokaryotic and eukaryotic genomes, and are commonly used as genetic markers, in particular for identity and parental testing in DNA forensics. The unstable expansion of some STRs was associated with various genetic disorders (e.g., the Huntington disease), and thus was used in genetic testing for screening individuals at high risk. Traditional STR analyses were based on the PCR amplification of STR loci followed by gel electrophoresis. With the availability of massive whole genome sequencing data, it becomes practical to mine STR profiles in silico from genome sequences. Software tools such as lobSTR and STR-FM have been developed to address these demands, which are, however, built upon whole genome reads mapping tools, and thus may not be sensitive enough. RESULTS: In this paper, we present a standalone software tool STRScan that uses a greedy algorithm for targeted STR profiling in next-generation sequencing (NGS) data. STRScan was tested on the whole genome sequencing data from Venter genome sequencing and 1000 Genomes Project. The results showed that STRScan can profile 20% more STRs in the target set that are missed by lobSTR. CONCLUSION: STRScan is particularly useful for the NGS-based targeted STR profiling, e.g., in genetic and human identity testing. STRScan is available as open-source software at http://darwin.informatics.indiana.edu/str/.
Affiliation(s)
- Haixu Tang
- School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, 47405, IN, USA
- Etienne Nzabarushimana
- School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, 47405, IN, USA
|
27
|
Prentice MB, Bowman J, Lalor JL, McKay MM, Thomson LA, Watt CM, McAdam AG, Murray DL, Wilson PJ. Signatures of selection in mammalian clock genes with coding trinucleotide repeats: Implications for studying the genomics of high-pace adaptation. Ecol Evol 2017; 7:7254-7276. [PMID: 28944015 PMCID: PMC5606889 DOI: 10.1002/ece3.3223] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/03/2017] [Revised: 05/31/2017] [Accepted: 06/06/2017] [Indexed: 12/14/2022] Open
Abstract
Climate change is predicted to affect the reproductive ecology of wildlife; however, we have yet to understand if and how species can adapt to the rapid pace of change. Clock genes are functional genes likely critical for adaptation to shifting seasonal conditions through shifts in timing cues. Many of these genes contain coding trinucleotide repeats, which offer the potential for higher rates of change than single nucleotide polymorphisms (SNPs) at coding sites, and, thus, may translate to faster rates of adaptation in changing environments. We characterized repeats in 22 clock genes across all annotated mammal species and evaluated the potential for selection on repeat motifs in three clock genes (NR1D1, CLOCK, and PER1) in three congeneric species pairs with different latitudinal range limits: Canada lynx and bobcat (Lynx canadensis and L. rufus), northern and southern flying squirrels (Glaucomys sabrinus and G. volans), and white-footed and deer mouse (Peromyscus leucopus and P. maniculatus). Signatures of positive selection were found in both the interspecific comparison of Canada lynx and bobcat, and intraspecific analyses in Canada lynx. Northern and southern flying squirrels showed differing frequencies at common CLOCK alleles and a signature of balancing selection. Regional excess homozygosity was found in the deer mouse at PER1 suggesting disruptive selection, and further analyses suggested balancing selection in the white-footed mouse. These preliminary signatures of selection and the presence of trinucleotide repeats within many clock genes warrant further consideration of the importance of candidate gene motifs for adaptation to climate change.
Affiliation(s)
- Melanie B Prentice
- Department of Environmental and Life Sciences Trent University Peterborough ON Canada
- Jeff Bowman
- Wildlife Research and Monitoring Section Ontario Ministry of Natural Resources and Forestry Peterborough ON Canada
- Michelle M McKay
- Department of Environmental and Life Sciences Trent University Peterborough ON Canada
- Cristen M Watt
- Department of Environmental and Life Sciences Trent University Peterborough ON Canada
- Andrew G McAdam
- Department of Integrative Biology University of Guelph Guelph ON Canada
- Paul J Wilson
- Biology Department Trent University Peterborough ON Canada
|
28
|
Shin G, Grimes SM, Lee H, Lau BT, Xia LC, Ji HP. CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis. Nat Commun 2017; 8:14291. [PMID: 28169275 PMCID: PMC5309709 DOI: 10.1038/ncomms14291] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 06/25/2016] [Accepted: 12/15/2016] [Indexed: 11/09/2022] Open
Abstract
Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel, and provides the accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR-Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.
Affiliation(s)
- GiWon Shin
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
- Susan M Grimes
- Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
- Billy T Lau
- Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
- Li C Xia
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
- Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA; Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
|
29
|
Ho PW, Swinnen S, Duitama J, Nevoigt E. The sole introduction of two single-point mutations establishes glycerol utilization in Saccharomyces cerevisiae CEN.PK derivatives. Biotechnol Biofuels 2017; 10:10. [PMID: 28053667 PMCID: PMC5209837 DOI: 10.1186/s13068-016-0696-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/08/2016] [Accepted: 12/23/2016] [Indexed: 06/06/2023]
Abstract
BACKGROUND: Glycerol is an abundant by-product of biodiesel production and has several advantages as a substrate in biotechnological applications. Unfortunately, the popular production host Saccharomyces cerevisiae can barely metabolize glycerol by nature. RESULTS: In this study, two evolved derivatives of the strain CEN.PK113-1A were created that were able to grow in synthetic glycerol medium (strains PW-1 and PW-2). Their growth performances on glycerol were compared with that of the previously published evolved CEN.PK113-7D derivative JL1. As JL1 showed a higher maximum specific growth rate on glycerol (0.164 h⁻¹ compared to 0.119 h⁻¹ for PW-1 and 0.127 h⁻¹ for PW-2), its genomic DNA was subjected to whole-genome resequencing. Two point mutations in the coding sequences of the genes UBR2 and GUT1 were identified to be crucial for growth in synthetic glycerol medium and subsequently verified by reverse engineering of the wild-type strain CEN.PK113-7D. The growth rate of the resulting reverse-engineered strain was 0.130 h⁻¹. Sanger sequencing of the GUT1 and UBR2 alleles of the above-mentioned evolved strains PW-1 and PW-2 also revealed one single-point mutation in these two genes, and both mutations were demonstrated to be also crucial and sufficient for obtaining a maximum specific growth rate on glycerol of ~0.120 h⁻¹. CONCLUSIONS: The current work confirmed the importance of UBR2 and GUT1 as targets for establishing glycerol utilization in strains of the CEN.PK family. In addition, it shows that a growth rate on glycerol of 0.130 h⁻¹ can be established in reverse-engineered CEN.PK strains by solely replacing a single amino acid in the coding sequences of both Ubr2 and Gut1.
Affiliation(s)
- Ping-Wei Ho
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany
- Steve Swinnen
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany
- Jorge Duitama
- Systems and Computing Engineering Department, Universidad de los Andes, Cra 1 Este No 19A-40, Bogotá, Colombia
- Elke Nevoigt
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany
|
30
|
Quilez J, Guilmatre A, Garg P, Highnam G, Gymrek M, Erlich Y, Joshi RS, Mittelman D, Sharp AJ. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res 2016; 44:3750-62. [PMID: 27060133 PMCID: PMC4857002 DOI: 10.1093/nar/gkw219] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 11/16/2015] [Accepted: 03/22/2016] [Indexed: 01/23/2023] Open
Abstract
Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on neighboring gene expression and DNA methylation. Polymorphic promoter TRs were associated with increased variance in local gene expression and DNA methylation, suggesting functional consequences related to TR variation. We identified >100 TRs associated with expression/methylation levels of adjacent genes. These potential eQTL/mQTL TRs were enriched for overlaps with transcription factor binding and DNaseI hypersensitivity sites, providing a rationale for their effects. Moreover, we showed that most TR variants are poorly tagged by nearby single nucleotide polymorphisms (SNPs) markers, indicating that many functional TR variants are not effectively assayed by SNP-based approaches. Our study assigns biological significance to TR variations in the human genome, and suggests that a significant fraction of TR variations exert functional effects via alterations of local gene expression or epigenetics. We conclude that targeted studies that focus on genotyping TR variants are required to fully ascertain functional variation in the genome.
Affiliation(s)
- Javier Quilez
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Audrey Guilmatre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Paras Garg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Gareth Highnam
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Melissa Gymrek
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; New York Genome Center, New York, NY 10038, USA
- Yaniv Erlich
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA
- Ricky S Joshi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- David Mittelman
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Andrew J Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
|
31
|
Zhang YJ, Hou JX, Zhang S, Hausner G, Liu XZ, Li WJ. The intronic minisatellite OsMin1 within a serine protease gene in the Chinese caterpillar fungus Ophiocordyceps sinensis. Appl Microbiol Biotechnol 2016; 100:3599-610. [PMID: 26754819 DOI: 10.1007/s00253-016-7287-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2015] [Revised: 12/24/2015] [Accepted: 12/29/2015] [Indexed: 12/01/2022]
Abstract
Repetitive DNA sequences make up a significant portion of all genomes and may occur in intergenic, regulatory, coding, or even intronic regions. A partial sequence of a serine protease gene csp1 was previously used as a population genetic marker of the Chinese caterpillar fungus Ophiocordyceps sinensis, but its first intron region was excluded due to ambiguous alignment. In this study, we report the presence of a minisatellite OsMin1 within this intron, where a 20(19)-bp repeat motif is duplicated two to six times in different isolates. Fourteen intron alleles and 13 OsMin1 alleles were identified among 125 O. sinensis samples distributed broadly on the Tibetan Plateau. Two OsMin1 alleles were prevalent, corresponding to either two or five repeats of the core sequence motif. OsMin1 appears to be a single locus marker in the O. sinensis genome, but its origin is undetermined. Abundant recombination signals were detected between upstream and downstream flanking regions of OsMin1, suggesting that OsMin1 mutates by unequal crossing over. Geographic distribution, fungal phylogeny, and host insect phylogeny all significantly affected intron distribution patterns but with the greatest influence noted for fungal genotypes and the least for geography. As far as we know, OsMin1 is the first minisatellite found in O. sinensis and the second found in fungal introns. OsMin1 may be useful in designing an efficient protocol to discriminate authentic O. sinensis from counterfeits.
Affiliation(s)
- Yong-Jie Zhang
- School of Life Sciences, Shanxi University, Taiyuan, 030006, China
- Jun-Xiu Hou
- School of Life Sciences, Shanxi University, Taiyuan, 030006, China
- Shu Zhang
- Institute of Applied Chemistry, Shanxi University, Taiyuan, 030006, China
- Georg Hausner
- Department of Microbiology, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada
- Xing-Zhong Liu
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China
- Wen-Jia Li
- Sunshine Lake Pharma Co., LTD, Dongguan, 523808, China
|
32
|
Isaza JP, Galván AL, Polanco V, Huang B, Matveyev AV, Serrano MG, Manque P, Buck GA, Alzate JF. Revisiting the reference genomes of human pathogenic Cryptosporidium species: reannotation of C. parvum Iowa and a new C. hominis reference. Sci Rep 2015; 5:16324. [PMID: 26549794 PMCID: PMC4637869 DOI: 10.1038/srep16324] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 05/16/2015] [Accepted: 10/08/2015] [Indexed: 11/09/2022] Open
Abstract
Cryptosporidium parvum and C. hominis are the most relevant species of this genus for human health. Both cause a self-limiting diarrhea in immunocompetent individuals, but cause potentially life-threatening disease in the immunocompromised. Despite the importance of these pathogens, only one reference genome of each has been analyzed and published. These two reference genomes were sequenced using automated capillary sequencing; as of yet, no next generation sequencing technology has been applied to improve their assemblies and annotations. For C. hominis, the main challenge that prevents a larger number of genomes to be sequenced is its resistance to axenic culture. In the present study, we employed next generation technology to analyse the genomic DNA and RNA to generate a new reference genome sequence of a C. hominis strain isolated directly from human stool and a new genome annotation of the C. parvum Iowa reference genome.
Affiliation(s)
- Juan P Isaza
- Grupo de Parasitología, Facultad de Medicina, Universidad de Antioquia Carrera 53 No. 61-30, Medellin, Antioquia 05001, Colombia; Centro Nacional de Secuenciación Genómica-CNSG, Universidad de Antioquia Carrera 53 No. 61-30, Medellin, Antioquia 05001, Colombia
- Ana Luz Galván
- Grupo de Parasitología, Facultad de Medicina, Universidad de Antioquia Carrera 53 No. 61-30, Medellin, Antioquia 05001, Colombia
- Victor Polanco
- Universidad Mayor de Chile-Centro de Genómica y Bioinformatica Camino La piramide 5750 Huechuraba, Santiago de Chile, 8580000, Chile
- Bernice Huang
- Virginia Commonwealth University - Center for the Study of Biological Complexity 1101 E. Marshall St., Virginia 23298-0678, US
- Andrey V Matveyev
- Virginia Commonwealth University - Center for the Study of Biological Complexity 1101 E. Marshall St., Virginia 23298-0678, US
- Myrna G Serrano
- Virginia Commonwealth University - Center for the Study of Biological Complexity 1101 E. Marshall St., Virginia 23298-0678, US
- Patricio Manque
- Universidad Mayor de Chile-Centro de Genómica y Bioinformatica Camino La piramide 5750 Huechuraba, Santiago de Chile, 8580000, Chile
- Gregory A Buck
- Virginia Commonwealth University - Center for the Study of Biological Complexity 1101 E. Marshall St., Virginia 23298-0678, US
- Juan F Alzate
- Grupo de Parasitología, Facultad de Medicina, Universidad de Antioquia Carrera 53 No. 61-30, Medellin, Antioquia 05001, Colombia; Centro Nacional de Secuenciación Genómica-CNSG, Universidad de Antioquia Carrera 53 No. 61-30, Medellin, Antioquia 05001, Colombia
|
33
|
Bilgin Sonay T, Carvalho T, Robinson MD, Greminger MP, Krützen M, Comas D, Highnam G, Mittelman D, Sharp A, Marques-Bonet T, Wagner A. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res 2015; 25:1591-1599. [PMID: 26290536 DOI: 10.1101/015784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/16/2015] [Accepted: 08/14/2015] [Indexed: 05/25/2023]
Abstract
Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly. They are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics. It has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation in humans and closely related species have been scarce due to technical difficulties derived from short-read technology. Here we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, in a total of six different species, and studied their impact on gene expression evolution. We found that population diversity patterns can be efficiently captured with short TRs (repeat unit length, 1-5 bp). We examined the potential evolutionary role of TRs in gene expression differences between humans and primates by using 30,275 larger TRs (repeat unit length, 2-50 bp). Genes that contained TRs in the promoters, in their 3' untranslated region, in introns, and in exons had higher expression divergence than genes without repeats in the regions. Polymorphic small repeats (1-5 bp) had also higher expression divergence compared with genes with fixed or no TRs in the gene promoters. Our findings highlight the potential contribution of TRs to human evolution through gene regulation.
Affiliation(s)
- Tugce Bilgin Sonay
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tiago Carvalho
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Mark D Robinson
- The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Maja P Greminger
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
- Michael Krützen
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
- David Comas
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Gareth Highnam
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
- David Mittelman
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
- Andrew Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai School, New York, New York 10029, USA
- Tomàs Marques-Bonet
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain; Centro Nacional de Análisis Genómico (CNAG), PCB, Barcelona, 08028 Catalonia, Spain; Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; The Santa Fe Institute, Santa Fe, New Mexico 87501, USA
|
34
|
Prentice MB, Bowman J, Wilson PJ. A test of somatic mosaicism in the androgen receptor gene of Canada lynx (Lynx canadensis). BMC Genet 2015; 16:125. [PMID: 26503624 PMCID: PMC4623281 DOI: 10.1186/s12863-015-0284-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2015] [Accepted: 10/19/2015] [Indexed: 11/11/2022] Open
Abstract
Background: The androgen receptor, an X-linked gene, has been widely studied in human populations because it contains highly polymorphic trinucleotide repeat motifs that have been associated with a number of adverse human health and behavioral effects. A previous study on the androgen receptor gene in carnivores reported somatic mosaicism in the tissues of a number of species including Eurasian lynx (Lynx lynx). We investigated this claim in a closely related species, Canada lynx (Lynx canadensis). The presence of somatic mosaicism in lynx tissues could have implications for the future study of exonic trinucleotide repeats in landscape genomic studies, in which the accurate reporting of genotypes would be highly problematic.
Methods: To determine whether mosaicism occurs in Canada lynx, two lynx individuals were sampled for a variety of tissue types (lynx 1) and tissue locations (lynx 1 and 2), and 1,672 individuals of known sex were genotyped to further rule out mosaicism.
Results: We found no evidence of mosaicism in tissues from the two necropsied individuals, or any of our genotyped samples.
Conclusions: Our results indicate that mosaicism does not manifest in Canada lynx. Therefore, the use of hide samples for further work involving trinucleotide repeat polymorphisms in Canada lynx is warranted.
Electronic supplementary material: The online version of this article (doi:10.1186/s12863-015-0284-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Melanie B Prentice
- Department of Environmental & Life Sciences, Trent University, 1600 West Bank Drive, Peterborough, K9J 7B8, ON, Canada.
| | - Jeff Bowman
- Wildlife Research and Monitoring Section, Ontario Ministry of Natural Resources and Forestry, 2140 East Bank Drive, Peterborough, K9J 7B8, ON, Canada.
| | - Paul J Wilson
- Biology Department, Trent University, 1600 West Bank Drive, Peterborough, K9J 7B8, ON, Canada.
|
35
|
Bilgin Sonay T, Carvalho T, Robinson MD, Greminger MP, Krützen M, Comas D, Highnam G, Mittelman D, Sharp A, Marques-Bonet T, Wagner A. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res 2015; 25:1591-9. [PMID: 26290536 PMCID: PMC4617956 DOI: 10.1101/gr.190868.115] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 02/16/2015] [Accepted: 08/14/2015] [Indexed: 12/20/2022]
Abstract
Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly. They are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics. It has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation in humans and closely related species have been scarce due to technical difficulties derived from short-read technology. Here we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, in a total of six different species, and studied their impact on gene expression evolution. We found that population diversity patterns can be efficiently captured with short TRs (repeat unit length, 1–5 bp). We examined the potential evolutionary role of TRs in gene expression differences between humans and primates by using 30,275 larger TRs (repeat unit length, 2–50 bp). Genes that contained TRs in the promoters, in their 3′ untranslated region, in introns, and in exons had higher expression divergence than genes without repeats in the regions. Polymorphic small repeats (1–5 bp) had also higher expression divergence compared with genes with fixed or no TRs in the gene promoters. Our findings highlight the potential contribution of TRs to human evolution through gene regulation.
Affiliation(s)
- Tugce Bilgin Sonay
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tiago Carvalho
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Mark D Robinson
- The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
| | - Maja P Greminger
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
| | - Michael Krützen
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
| | - David Comas
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Gareth Highnam
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - David Mittelman
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - Andrew Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai School, New York, New York 10029, USA
| | - Tomàs Marques-Bonet
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain; Centro Nacional de Análisis Genómico (CNAG), PCB, Barcelona, 08028 Catalonia, Spain; Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
| | - Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; The Santa Fe Institute, Santa Fe, New Mexico 87501, USA
|
36
|
Gemayel R, Chavali S, Pougach K, Legendre M, Zhu B, Boeynaems S, van der Zande E, Gevaert K, Rousseau F, Schymkowitz J, Babu MM, Verstrepen KJ. Variable Glutamine-Rich Repeats Modulate Transcription Factor Activity. Mol Cell 2015; 59:615-27. [PMID: 26257283 PMCID: PMC4543046 DOI: 10.1016/j.molcel.2015.07.003] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 05/21/2015] [Revised: 06/26/2015] [Accepted: 07/01/2015] [Indexed: 12/15/2022]
Abstract
Excessive expansions of glutamine (Q)-rich repeats in various human proteins are known to result in severe neurodegenerative disorders such as Huntington's disease and several ataxias. However, the physiological role of these repeats and the consequences of more moderate repeat variation remain unknown. Here, we demonstrate that Q-rich domains are highly enriched in eukaryotic transcription factors where they act as functional modulators. Incremental changes in the number of repeats in the yeast transcriptional regulator Ssn6 (Cyc8) result in systematic, repeat-length-dependent variation in expression of target genes that result in direct phenotypic changes. The function of Ssn6 increases with its repeat number until a certain threshold where further expansion leads to aggregation. Quantitative proteomic analysis reveals that the Ssn6 repeats affect its solubility and interactions with Tup1 and other regulators. Thus, Q-rich repeats are dynamic functional domains that modulate a regulator's innate function, with the inherent risk of pathogenic repeat expansions.
Affiliation(s)
- Rita Gemayel
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
| | - Sreenivas Chavali
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Ksenia Pougach
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
| | - Matthieu Legendre
- Structural and Genomic Information Laboratory, IGS UMR7256, Centre National de la Recherche Scientifique, Aix-Marseille Université, Institut de Microbiologie de la Méditerranée (IMM), 13288 Marseille Cedex 9, France
| | - Bo Zhu
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
| | - Steven Boeynaems
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
| | - Elisa van der Zande
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
| | - Kris Gevaert
- Department of Medical Protein Research, VIB, 9000 Ghent, Belgium; Department of Biochemistry, Ghent University, 9000 Ghent, Belgium
| | - Frederic Rousseau
- Switch Laboratory, VIB, Campus Gasthuisberg, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
| | - Joost Schymkowitz
- Switch Laboratory, VIB, Campus Gasthuisberg, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
| | - M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Kevin J Verstrepen
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium.
|
37
|
Abstract
Cancer is widely recognized as a genetic disease in which somatic mutations are sequentially accumulated to drive tumor progression. Although genomic landscape studies are informative for individual cancer types, a comprehensive comparative study of tumorigenic mutations across cancer types based on integrative data sources is still a pressing need. We systematically analyzed ~10^6 non-synonymous mutations extracted from COSMIC, involving ~8000 genome-wide screened samples across 23 major human cancers at both the amino acid and gene levels. Our analysis identified cancer-specific heterogeneity that traditional nucleotide variation analysis alone usually overlooked. Particularly, the amino acid arginine (R) turns out to be the most favorable target of amino acid alteration in most cancer types studied (P < 10^-9, binomial test), reflecting its important role in cellular physiology. The tumor suppressor gene TP53 is mutated exclusively with the HYDIN, KRAS, and PTEN genes in large intestine, lung, and endometrial cancers respectively, indicating that TP53 takes part in different signaling pathways in different cancers. While some of our analyses corroborated previous observations, others indicated relevant candidates with high priority for further experimental validation. Our findings have many ramifications in understanding the etiology of cancer and the underlying molecular mechanisms in particular cancers.
|
38
|
Carlson KD, Sudmant PH, Press MO, Eichler EE, Shendure J, Queitsch C. MIPSTR: a method for multiplex genotyping of germline and somatic STR variation across many individuals. Genome Res 2015; 25:750-61. [PMID: 25659649 PMCID: PMC4417122 DOI: 10.1101/gr.182212.114] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 07/25/2014] [Accepted: 02/05/2015] [Indexed: 12/21/2022]
Abstract
Short tandem repeats (STRs) are highly mutable genetic elements that often reside in regulatory and coding DNA. The cumulative evidence of genetic studies on individual STRs suggests that STR variation profoundly affects phenotype and contributes to trait heritability. Despite recent advances in sequencing technology, STR variation has remained largely inaccessible across many individuals compared to single nucleotide variation or copy number variation. STR genotyping with short-read sequence data is confounded by (1) the difficulty of uniquely mapping short, low-complexity reads; and (2) the high rate of STR amplification stutter. Here, we present MIPSTR, a robust, scalable, and affordable method that addresses these challenges. MIPSTR uses targeted capture of STR loci by single-molecule Molecular Inversion Probes (smMIPs) and a unique mapping strategy. Targeted capture and our mapping strategy resolve the first challenge; the use of single molecule information resolves the second challenge. Unlike previous methods, MIPSTR is capable of distinguishing technical error due to amplification stutter from somatic STR mutations. In proof-of-principle experiments, we use MIPSTR to determine germline STR genotypes for 102 STR loci with high accuracy across diverse populations of the plant A. thaliana. We show that putatively functional STRs may be identified by deviation from predicted STR variation and by association with quantitative phenotypes. Using DNA mixing experiments and a mutant deficient in DNA repair, we demonstrate that MIPSTR can detect low-frequency somatic STR variants. MIPSTR is applicable to any organism with a high-quality reference genome and is scalable to genotyping many thousands of STR loci in thousands of individuals.
Affiliation(s)
- Keisha D Carlson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Peter H Sudmant
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Maximilian O Press
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
39
|
Willems T, Gymrek M, Highnam G, Mittelman D, Erlich Y. The landscape of human STR variation. Genome Res 2014; 24:1894-904. [PMID: 25135957 PMCID: PMC4216929 DOI: 10.1101/gr.177774.114] [Citation(s) in RCA: 176] [Impact Index Per Article: 17.6] [Received: 04/30/2014] [Accepted: 08/15/2014] [Indexed: 02/06/2023]
Abstract
Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.
Affiliation(s)
- Thomas Willems
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Computational and Systems Biology Program, MIT, Cambridge, Massachusetts 02139, USA
| | - Melissa Gymrek
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Gareth Highnam
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - David Mittelman
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA; Gene by Gene, Ltd., Houston, Texas 77008, USA
| | - Yaniv Erlich
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA;
|
40
|
Press MO, Carlson KD, Queitsch C. The overdue promise of short tandem repeat variation for heritability. Trends Genet 2014; 30:504-12. [PMID: 25182195 DOI: 10.1016/j.tig.2014.07.008] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 06/12/2014] [Revised: 07/23/2014] [Accepted: 07/24/2014] [Indexed: 12/11/2022]
Abstract
Short tandem repeat (STR) variation has been proposed as a major explanatory factor in the heritability of complex traits in humans and model organisms. However, we still struggle to incorporate STR variation into genotype-phenotype maps. We review here the promise of STRs in contributing to complex trait heritability and highlight the challenges that STRs pose due to their repetitive nature. We argue that STR variants are more likely than single-nucleotide variants to have epistatic interactions, reiterate the need for targeted assays to genotype STRs accurately, and call for more appropriate statistical methods in detecting STR-phenotype associations. Lastly, we suggest that somatic STR variation within individuals may serve as a read-out of disease susceptibility, and is thus potentially a valuable covariate for future association studies.
Affiliation(s)
- Maximilian O Press
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Keisha D Carlson
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA.
|