51
Reinar WB, Lalun VO, Reitan T, Jakobsen KS, Butenko MA. Length variation in short tandem repeats affects gene expression in natural populations of Arabidopsis thaliana. THE PLANT CELL 2021; 33:2221-2234. [PMID: 33848350 PMCID: PMC8364236 DOI: 10.1093/plcell/koab107] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 02/19/2021] [Accepted: 04/07/2021] [Indexed: 06/12/2023]
Abstract
The genetic basis for the fine-tuned regulation of gene expression is complex and ultimately influences the phenotype and thus the local adaptation of natural populations. Short tandem repeats (STRs) consisting of repetitive DNA motifs have been shown to regulate gene expression. STRs are variable in length within a population and serve as a heritable, but semi-reversible, reservoir of standing genetic variation. For sessile organisms, such as plants, STRs could be of major importance in fine-tuning gene expression as a response to a shifting local environment. Here, we used a transcriptome dataset from natural accessions of Arabidopsis thaliana to investigate population-wide gene expression patterns in light of genome-wide STR variation. We empirically modeled gene expression as a response to the STR length within and around the gene and demonstrated that an association between gene expression and STR length variation is unequivocally present in the sampled population. To support our model, we explored the promoter activity in a transcriptional regulator involved in root hair formation and provided experimentally determined causality between coding sequence length variation and promoter activity. Our results support a general link between gene expression variation and STR length variation in A. thaliana.
Affiliation(s)
- William B. Reinar
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Vilde O. Lalun
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Trond Reitan
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Kjetill S. Jakobsen
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Melinka A. Butenko
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
52
PolyG-DS: An ultrasensitive polyguanine tract-profiling method to detect clonal expansions and trace cell lineage. Proc Natl Acad Sci U S A 2021; 118:2023373118. [PMID: 34330826 DOI: 10.1073/pnas.2023373118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Polyguanine tracts (PolyGs) are short guanine homopolymer repeats that are prone to accumulating mutations when cells divide. This feature makes them especially suitable for cell lineage tracing, which has been exploited to detect and characterize precancerous and cancerous somatic evolution. PolyG genotyping, however, is challenging because of the inherent biochemical difficulties in amplifying and sequencing repetitive regions. To overcome this limitation, we developed PolyG-DS, a next-generation sequencing (NGS) method that combines the error-correction capabilities of duplex sequencing (DS) with enrichment of PolyG loci using CRISPR-Cas9-targeted genomic fragmentation. PolyG-DS markedly reduces technical artifacts by comparing the sequences derived from the complementary strands of each original DNA molecule. We demonstrate that PolyG-DS genotyping is accurate, reproducible, and highly sensitive, enabling the detection of low-frequency alleles (<0.01) in spike-in samples using a panel of only 19 PolyG markers. PolyG-DS replicated prior results based on PolyG fragment length analysis by capillary electrophoresis, and exhibited higher sensitivity for identifying clonal expansions in the nondysplastic colon of patients with ulcerative colitis. We illustrate the utility of this method for resolving the phylogenetic relationship among precancerous lesions in ulcerative colitis and for tracing the metastatic dissemination of ovarian cancer. PolyG-DS enables the study of tumor evolution without prior knowledge of tumor driver mutations and provides a tool to perform cost-effective and easily scalable ultra-accurate NGS-based PolyG genotyping for multiple applications in biology, genetics, and cancer research.
53
Eslami Rasekh M, Hernández Y, Drinan SD, Fuxman Bass J, Benson G. Genome-wide characterization of human minisatellite VNTRs: population-specific alleles and gene expression differences. Nucleic Acids Res 2021; 49:4308-4324. [PMID: 33849068 PMCID: PMC8096271 DOI: 10.1093/nar/gkab224] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/03/2020] [Revised: 03/06/2021] [Accepted: 03/18/2021] [Indexed: 11/12/2022]
Abstract
Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole genome sequencing datasets from 2770 individuals in order to detect minisatellite VNTRs, i.e., those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e., with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were found to be enriched in genomic regions with regulatory function, i.e., transcription start sites and enhancers. Investigation of the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles and that those could be used to classify individuals into super-populations with near-perfect accuracy. A search for expression quantitative trait loci (eQTLs) among the VNTRs proximal to genes indicated that, in 187 genes, expression differences correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through the identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
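The ">5% of the population" criterion in the abstract above can be illustrated with a small sketch. This is not VNTRseek code: the function name `is_commonly_polymorphic`, the genotype layout (one list of allele copy numbers per individual), and the reading of the criterion as "fraction of individuals carrying at least one non-reference allele" are all assumptions made for illustration.

```python
from typing import List


def is_commonly_polymorphic(
    genotypes: List[List[int]],
    ref_copies: int,
    threshold: float = 0.05,
) -> bool:
    """Hypothetical helper, not part of VNTRseek.

    genotypes: one entry per individual, each a list of allele copy numbers.
    ref_copies: copy number of the reference allele at this locus.
    Returns True when the fraction of individuals carrying at least one
    non-reference allele is strictly greater than `threshold`.
    """
    if not genotypes:
        return False
    # Count individuals with any allele that differs from the reference.
    carriers = sum(
        1 for alleles in genotypes if any(a != ref_copies for a in alleles)
    )
    return carriers / len(genotypes) > threshold
```

With 100 individuals, a locus where 6 carry a non-reference allele would pass this check, while one where exactly 5 do would not, because the comparison is strict.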
Affiliation(s)
- Yözen Hernández
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Juan I Fuxman Bass
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
  - Department of Biology, Boston University, Boston, MA 02215, USA
- Gary Benson
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
  - Department of Biology, Boston University, Boston, MA 02215, USA
  - Department of Computer Science, Boston University, Boston, MA 02215, USA
54
Depienne C, Mandel JL. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? Am J Hum Genet 2021; 108:764-785. [PMID: 33811808 PMCID: PMC8205997 DOI: 10.1016/j.ajhg.2021.03.011] [Citation(s) in RCA: 193] [Impact Index Per Article: 48.3] [Received: 12/15/2020] [Accepted: 03/05/2021] [Indexed: 12/13/2022]
Abstract
Tandem repeats represent one of the most abundant classes of variation in human genomes; they are polymorphic by nature and become highly unstable in a length-dependent manner. The expansion of repeat length across generations is a well-established process that results in human disorders mainly affecting the central nervous system. At least 50 disorders associated with expansion loci have been described to date, with half recognized only in the last ten years, as prior methodological difficulties limited their identification. These limitations still apply to the current widely used molecular diagnostic methods (exome or gene panels) and thus result in missed diagnoses, to the detriment of affected individuals and their families, especially for disorders that are very rare and/or not clinically recognizable. Most of these disorders have been identified through family-driven approaches, and many others likely remain to be identified. The recent development of long-read technologies provides a unique opportunity to systematically investigate the contribution of tandem repeats and repeat expansions to the genetic architecture of human disorders. In this review, we summarize the most recent knowledge about the genetics of repeat expansion disorders and the diversity of their pathophysiological mechanisms, and outline the prospects for developing personalized treatments in the future.
Affiliation(s)
- Christel Depienne
  - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany; Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, 75013 Paris, France.
- Jean-Louis Mandel
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch 67400, France; Centre National de la Recherche Scientifique, UMR 7104, Illkirch 67400, France; Institut National de la Santé et de la Recherche Médicale, U 1258, Illkirch 67400, France; Université de Strasbourg, Illkirch 67400, France; USIAS University of Strasbourg Institute of Advanced study, 67000 Strasbourg, France.
55
Bakhtiari M, Park J, Ding YC, Shleizer-Burko S, Neuhausen SL, Halldórsson BV, Stefánsson K, Gymrek M, Bafna V. Variable number tandem repeats mediate the expression of proximal genes. Nat Commun 2021; 12:2075. [PMID: 33824302 PMCID: PMC8024321 DOI: 10.1038/s41467-021-22206-z] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 05/13/2020] [Accepted: 02/17/2021] [Indexed: 12/12/2022]
Abstract
Variable number tandem repeats (VNTRs) account for significant genetic variation in many organisms. In humans, VNTRs have been implicated in both Mendelian and complex disorders, but are largely ignored by genomic pipelines due to the complexity of genotyping and the computational expense. We describe adVNTR-NN, a method that uses shallow neural networks to genotype a VNTR in 18 seconds on 55X whole genome data, while maintaining high accuracy. We use adVNTR-NN to genotype 10,264 VNTRs in 652 GTEx individuals. Associating VNTR length with gene expression in 46 tissues, we identify 163 "eVNTRs". Of the 22 eVNTRs in blood where independent data is available, 21 (95%) are replicated in terms of significance and direction of association. 49% of the eVNTR loci show a strong and likely causal impact on the expression of genes and 80% have maximum effect size at least 0.3. The impacted genes are involved in diseases including Alzheimer's, obesity and familial cancers, highlighting the importance of VNTRs for understanding the genetic basis of complex diseases.
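The idea of associating VNTR length with expression, as described in the abstract above, can be sketched with an ordinary least-squares slope and a Pearson correlation. This is only an illustration under simple assumptions (one locus, one gene, one tissue, no covariates); it is not adVNTR-NN or the authors' statistical model, and the function name `length_expression_association` is hypothetical.

```python
from typing import Sequence, Tuple


def length_expression_association(
    lengths: Sequence[float],
    expression: Sequence[float],
) -> Tuple[float, float]:
    """Hypothetical helper: return (slope, pearson_r) for expression vs. length.

    lengths: VNTR length (e.g., repeat copy number) per individual.
    expression: expression value of the proximal gene per individual.
    """
    n = len(lengths)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(lengths) / n
    mean_y = sum(expression) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in lengths or expression")
    slope = sxy / sxx
    pearson_r = sxy / (sxx * syy) ** 0.5
    return slope, pearson_r
```

A real eQTL-style analysis would also normalize expression, adjust for covariates such as ancestry and batch, and correct for multiple testing across loci and tissues; none of that is attempted here.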
Affiliation(s)
- Mehrdad Bakhtiari
  - Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Jonghun Park
  - Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Yuan-Chun Ding
  - Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Susan L Neuhausen
  - Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Melissa Gymrek
  - Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Vineet Bafna
  - Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA.
56
Jian H, Wang L, Lv M, Tan Y, Zhang R, Qu S, Wang J, Zha L, Zhang L, Liang W. A Novel SNP-STR System Based on a Capillary Electrophoresis Platform. Front Genet 2021; 12:636821. [PMID: 33613649 PMCID: PMC7893108 DOI: 10.3389/fgene.2021.636821] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/02/2020] [Accepted: 01/14/2021] [Indexed: 11/13/2022]
Abstract
Various compound markers encompassing two or more variants within a small region can be regarded as generalized microhaplotypes. Many of these markers have been investigated for various forensic purposes, such as individual identification, deconvolution of DNA mixtures, or forensic ancestry inference. SNP-STR is a compound biomarker composed of a single nucleotide polymorphism (SNP) and a closely linked short tandem repeat polymorphism (STR), and possesses the advantages of both SNPs and STRs. In addition, in conjunction with a polymerase chain reaction (PCR) technique based on the amplification refractory mutation system (ARMS), SNP-STRs can be used for forensic unbalanced DNA mixture analysis based on capillary electrophoresis (CE), which is the most commonly used platform in forensic laboratories worldwide. Our previous research reported 11 SNP-STRs, but few of them are derived from the commonly used STR loci, for which existing STR databases can be used as a reference. For maximum compatibility with existing DNA databases, in this study, we screened 18 SNP-STR loci, of which 14 were derived from the expanded CODIS core loci set. Stable and sensitive SNP-STR multiplex PCR panels based on the CE platform were established. Assays on simulated two-person DNA mixtures showed that all allele-specific primers could detect minor DNA components in 1:500 mixtures. Population data based on 113 unrelated Chengdu Han individuals were investigated. A Bayesian framework was developed for the likelihood ratio (LR) evaluation of SNP-STR profiling results obtained from two-person mixtures. Furthermore, we report on the first use of SNP-STRs in casework to show the advantages and limitations for use in practice. Compared to 2.86 × 10³ for autosomal STR kits, the combined LR reached 7.14 × 10⁷ using the SNP-STR method in this casework example.
Affiliation(s)
- Hui Jian
  - Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Li Wang
  - Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, China
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
- Meili Lv
  - Department of Immunology, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Yu Tan
  - Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, China
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
- Ranran Zhang
  - Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Shengqiu Qu
  - Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Jijun Wang
  - HI-TECH Industrial Sub-Branch of Chengdu Municipal Public Security Bureau, Chengdu, China
- Lagabaiyila Zha
  - Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, China
- Lin Zhang
  - Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Weibo Liang
  - Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
57
Mitsuhashi S, Frith MC, Matsumoto N. Genome-wide survey of tandem repeats by nanopore sequencing shows that disease-associated repeats are more polymorphic in the general population. BMC Med Genomics 2021; 14:17. [PMID: 33413375 PMCID: PMC7791882 DOI: 10.1186/s12920-020-00853-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/17/2020] [Accepted: 12/08/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Tandem repeats are highly mutable and contribute to the development of human disease by a variety of mechanisms. It is difficult to predict which tandem repeats may cause a disease. One hypothesis is that changeable tandem repeats are the source of genetic diseases, because disease-causing repeats are polymorphic in healthy individuals. However, it is not clear whether disease-causing repeats are more polymorphic than other repeats. METHODS: We performed a genome-wide survey of the millions of human tandem repeats using publicly available long-read genome sequencing data from 21 humans. We measured tandem repeat copy number changes using tandem-genotypes. Length variation of known disease-associated repeats was compared to other repeat loci. RESULTS: We found that known Mendelian disease-causing or disease-associated repeats, especially CAG and 5'UTR GGC repeats, are relatively long and polymorphic in the general population. We also show that repeat lengths of two disease-causing tandem repeats, in ATXN3 and GLS, are correlated with nearby GWAS SNP genotypes. CONCLUSIONS: We provide a catalog of polymorphic tandem repeats across a variety of repeat unit lengths and sequences, from long-read sequencing data. This method, especially if used in genome-wide association studies, may indicate possible new candidates for pathogenic or biologically important tandem repeats in human genomes.
Affiliation(s)
- Satomi Mitsuhashi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
  - Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, M&D Tower 24F, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan.
- Martin C Frith
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo, Japan
- Naomichi Matsumoto
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
58
Beecroft SJ, Lamont PJ, Edwards S, Goullée H, Davis MR, Laing NG, Ravenscroft G. The Impact of Next-Generation Sequencing on the Diagnosis, Treatment, and Prevention of Hereditary Neuromuscular Disorders. Mol Diagn Ther 2020; 24:641-652. [PMID: 32997275 DOI: 10.1007/s40291-020-00495-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Accepted: 09/05/2020] [Indexed: 12/13/2022]
Abstract
The impact of high-throughput sequencing in genetic neuromuscular disorders cannot be overstated. The ability to rapidly and affordably sequence multiple genes simultaneously has enabled a second golden age of Mendelian disease gene discovery, with flow-on impacts for rapid genetic diagnosis, evidence-based treatment, tailored therapy development, carrier-screening, and prevention of disease recurrence in families. However, there are likely many more neuromuscular disease genes and mechanisms to be discovered. Many patients and families remain without a molecular diagnosis following targeted panel sequencing, clinical exome sequencing, or even genome sequencing. Here we review how massively parallel, or next-generation, sequencing has changed the field of genetic neuromuscular disorders, and anticipate future benefits of recent technological innovations such as RNA-seq implementation and detection of tandem repeat expansions from short-read sequencing.
Affiliation(s)
- Sarah J Beecroft
  - Neurogenetic Diseases Group, Centre for Medical Research, QEII Medical Centre, University of Western Australia, 6 Verdun St, Nedlands, WA, 6009, Australia
  - Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, WA, 6009, Australia
- Samantha Edwards
  - Neurogenetic Diseases Group, Centre for Medical Research, QEII Medical Centre, University of Western Australia, 6 Verdun St, Nedlands, WA, 6009, Australia
  - Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, WA, 6009, Australia
- Hayley Goullée
  - Neurogenetic Diseases Group, Centre for Medical Research, QEII Medical Centre, University of Western Australia, 6 Verdun St, Nedlands, WA, 6009, Australia
  - Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, WA, 6009, Australia
- Mark R Davis
  - Neurogenetic Unit, Department of Diagnostic Genomics, PP Block, QEII Medical Centre, Nedlands, WA, Australia
- Nigel G Laing
  - Neurogenetic Diseases Group, Centre for Medical Research, QEII Medical Centre, University of Western Australia, 6 Verdun St, Nedlands, WA, 6009, Australia
  - Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, WA, 6009, Australia
  - Neurogenetic Clinic, Royal Perth Hospital, Perth, Australia
- Gianina Ravenscroft
  - Neurogenetic Diseases Group, Centre for Medical Research, QEII Medical Centre, University of Western Australia, 6 Verdun St, Nedlands, WA, 6009, Australia
  - Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, WA, 6009, Australia.
59
Wyner N, Barash M, McNevin D. Forensic Autosomal Short Tandem Repeats and Their Potential Association With Phenotype. Front Genet 2020; 11:884. [PMID: 32849844 PMCID: PMC7425049 DOI: 10.3389/fgene.2020.00884] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 05/19/2020] [Accepted: 07/17/2020] [Indexed: 12/11/2022]
Abstract
Forensic DNA profiling utilizes autosomal short tandem repeat (STR) markers to establish identity of missing persons, confirm familial relations, and link persons of interest to crime scenes. It is a widely accepted notion that genetic markers used in forensic applications are not predictive of phenotype. At present, there has been no demonstration of forensic STR variants directly causing or predicting disease. Such a demonstration would have many legal and ethical implications. For example, is there a duty to inform a DNA donor if a medical condition is discovered during routine analysis of their sample? In this review, we evaluate the possibility that forensic STRs could provide information beyond mere identity. An extensive search of the literature returned 107 articles associating a forensic STR with a trait. A total of 57 of these studies met our inclusion criteria: a reported link between a STR-inclusive gene and a phenotype and a statistical analysis reporting a p-value less than 0.05. A total of 50 unique traits were associated with the 24 markers included in the 57 studies. TH01 had the greatest number of associations with 27 traits reportedly linked to 40 different genotypes. Five of the articles associated TH01 with schizophrenia. None of the associations found were independently causative or predictive of disease. Regardless, the likelihood of identifying significant associations is increasing as the function of non-coding STRs in gene expression is steadily revealed. It is recommended that regular reviews take place in order to remain aware of future studies that identify a functional role for any forensic STRs.
Affiliation(s)
- Nicole Wyner
  - Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Sydney, NSW, Australia
- Mark Barash
  - Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Sydney, NSW, Australia
  - Department of Justice Studies, San José State University, San Jose, CA, United States
- Dennis McNevin
  - Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Sydney, NSW, Australia
60
Expanding genes, repeating themes and therapeutic schemes: The neurobiology of tandem repeat disorders. Neurobiol Dis 2020; 144:105053. [PMID: 32810583 DOI: 10.1016/j.nbd.2020.105053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
61
Weissensteiner MH, Bunikis I, Catalán A, Francoijs KJ, Knief U, Heim W, Peona V, Pophaly SD, Sedlazeck FJ, Suh A, Warmuth VM, Wolf JBW. Discovery and population genomics of structural variation in a songbird genus. Nat Commun 2020; 11:3403. [PMID: 32636372 PMCID: PMC7341801 DOI: 10.1038/s41467-020-17195-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 12/06/2019] [Accepted: 06/16/2020] [Indexed: 02/07/2023]
Abstract
Structural variation (SV) constitutes an important type of genetic mutation, providing the raw material for evolution. Here, we uncover the genome-wide spectrum of intra- and interspecific SV segregating in natural populations of seven songbird species in the genus Corvus. Combining short-read (N = 127) and long-read re-sequencing (N = 31), as well as optical mapping (N = 16), we apply both assembly-based and read-mapping approaches to detect SV and characterize a total of 220,452 insertions, deletions and inversions. We exploit sampling across wide phylogenetic timescales to validate SV genotypes and assess the contribution of SV to evolutionary processes in an avian model of incipient speciation. We reveal an evolutionarily young (~530,000 years) cis-acting 2.25-kb LTR retrotransposon insertion reducing expression of the NDP gene with consequences for premating isolation. Our results attest to the wealth and evolutionary significance of SV segregating in natural populations and highlight the need for reliable SV genotyping.
Affiliation(s)
- Matthias H Weissensteiner
  - Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden.
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany.
  - Department of Biology, Pennsylvania State University, 310 Wartik Lab, University Park, PA, 16802, USA.
- Ignas Bunikis
  - Uppsala Genome Center, Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, BMC, Box 815, 752 37, Uppsala, Sweden
- Ana Catalán
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
- Ulrich Knief
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
- Wieland Heim
  - Institute of Landscape Ecology, University of Münster, Heisenbergstrasse 2, 48149, Münster, Germany
- Valentina Peona
  - Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden
  - Department of Organismal Biology - Systematic Biology, Uppsala University, 752 36, Uppsala, Sweden
- Saurabh D Pophaly
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
- Fritz J Sedlazeck
  - Human Genome Sequencing Center at Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Alexander Suh
  - Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden
  - Department of Organismal Biology - Systematic Biology, Uppsala University, 752 36, Uppsala, Sweden
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TU, UK
- Vera M Warmuth
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
- Jochen B W Wolf
  - Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden.
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany.
62
Yu H, Zhao S, Ness S, Kang H, Sheng Q, Samuels DC, Oyebamiji O, Zhao YY, Guo Y. Non-canonical RNA-DNA differences and other human genomic features are enriched within very short tandem repeats. PLoS Comput Biol 2020; 16:e1007968. [PMID: 32511223 PMCID: PMC7302867 DOI: 10.1371/journal.pcbi.1007968] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/02/2019] [Revised: 06/18/2020] [Accepted: 05/19/2020] [Indexed: 11/19/2022]
Abstract
Very short tandem repeats bear substantial genetic, evolutionary, and pathological significance in genome analyses. Here, we compiled a census of tandem mono-nucleotide/di-nucleotide/tri-nucleotide repeats (MNRs/DNRs/TNRs) in GRCh38, which we term "polytracts" in general. Of the human genome, 144.4 million nucleotides (4.7%) are occupied by polytracts, and 0.47 million single nucleotides are identified as polytract hinges, i.e., break-points of tandem polytracts. Preliminary exploration of the census suggested that polytract hinge sites and boundaries of AAC polytracts may bear a higher mapping error rate than other polytract regions. Further, we revealed landscapes of polytract enrichment with respect to nearly a hundred genomic features. We found that MNRs, DNRs, and TNRs displayed noticeable differences in locational enrichment for miscellaneous genomic features, especially RNA editing events. Non-canonical and C-to-U RNA-editing events are enriched inside and/or adjacent to MNRs, while all categories of RNA-editing events are under-represented in DNRs. A-to-I RNA-editing events are generally under-represented in polytracts. The selective enrichment of non-canonical RNA-editing events within MNR adjacency provides evidence against their authenticity. To enable similar locational enrichment analyses in relation to polytracts, we developed a software tool, Polytrap, which can handle 11 reference genomes. Additionally, we compiled polytracts of four model organisms into a Track Hub which can be integrated into the UCSC Genome Browser as an official track for convenient visualization of polytracts.
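A census of mono-, di-, and tri-nucleotide tandem repeats like the one described in the abstract above can be approximated with a simple period scan. The sketch below is not Polytrap and does not reproduce the authors' polytract or hinge definitions; the function name `find_polytracts` and the `min_copies` cutoff are assumptions chosen for illustration.

```python
from typing import List, Tuple


def find_polytracts(seq: str, min_copies: int = 3) -> List[Tuple[int, int, str]]:
    """Hypothetical helper, not part of Polytrap.

    Returns (start, end, motif) for maximal runs with period 1, 2, or 3 that
    span at least `min_copies` motif lengths. Coordinates are 0-based and
    half-open. Motifs made of a single repeated base (e.g., "AA", "AAA") are
    skipped for periods 2 and 3 so that homopolymers are reported only once.
    """
    seq = seq.upper()
    n = len(seq)
    tracts = []
    for k in (1, 2, 3):
        i = 0
        while i + k <= n:
            j = i
            # Extend while each base equals the base k positions downstream.
            while j + k < n and seq[j] == seq[j + k]:
                j += 1
            length = (j - i) + k  # the run covers seq[i : j + k]
            motif = seq[i:i + k]
            is_primitive = k == 1 or len(set(motif)) > 1
            if length >= k * min_copies and is_primitive:
                tracts.append((i, i + length, motif))
            i = j + 1  # resume after the first mismatching position
    return sorted(tracts)
```

For example, under these assumptions `find_polytracts("TTCAGCAGCAGTT")` reports a single CAG tract covering positions 2 to 11, since the flanking TT runs are shorter than three copies.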
Affiliation(s)
- Hui Yu
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
- * E-mail: (HY); (YG)
- Shilin Zhao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Scott Ness
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
- Huining Kang
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
- Quanhu Sheng
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- David C. Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Olufunmilola Oyebamiji
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
- Ying-yong Zhao
- Key Laboratory of Resource Biology and Biotechnology in Western China, School of Life Sciences, Northwest University, Xi'an, Shaanxi, China
- Yan Guo
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
- * E-mail: (HY); (YG)
|
63
|
Rocca MS, Ferrarini M, Msaki A, Vinanzi C, Ghezzi M, De Rocco Ponce M, Foresta C, Ferlin A. Comparison of NGS panel and Sanger sequencing for genotyping CAG repeats in the AR gene. Mol Genet Genomic Med 2020; 8:e1207. [PMID: 32216057 PMCID: PMC7284049 DOI: 10.1002/mgg3.1207] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/23/2019] [Revised: 02/19/2020] [Accepted: 02/22/2020] [Indexed: 12/30/2022] Open
Abstract
Background: The androgen receptor (AR) is a nuclear receptor, encoded by the AR gene on the X chromosome. Within the first exon of the AR gene, two short tandem repeats (STRs), CAG and GGC, are a source of polymorphism in the population. Therefore, high‐throughput methods for screening AR, such as next‐generation sequencing (NGS), are sought after; however, data generated by NGS are limited by the availability of bioinformatics tools. Here, we evaluated the accuracy of the bioinformatics tool HipSTR in detecting and quantifying CAG repeats within the AR gene. Method: The AR gene of 228 infertile men was sequenced using an NGS gene panel. The data generated were analyzed with HipSTR to detect CAG repeats. The accuracy was compared with the results obtained with Sanger sequencing. Results: We found that HipSTR was more accurate than Sanger sequencing in genotyping men with a normal karyotype (46,XY); however, it was more likely to misidentify homozygote genotypes in men with Klinefelter syndrome (47,XXY). Conclusion: Our findings show that the bioinformatics tool HipSTR is 100% accurate in detecting and assessing AR CAG repeats in infertile men (46,XY) as well as in men with low‐level mosaicism.
Affiliation(s)
- Maria Santa Rocca
- Unit of Andrology and Reproductive Medicine, Department of Medicine, University of Padua, Padua, Italy
- Margherita Ferrarini
- Unit of Andrology and Reproductive Medicine, Department of Medicine, University of Padua, Padua, Italy
- Aichi Msaki
- Unit of Andrology and Reproductive Medicine, Department of Medicine, University of Padua, Padua, Italy
- Cinzia Vinanzi
- Unit of Andrology and Reproductive Medicine, Department of Medicine, University of Padua, Padua, Italy
- Marco Ghezzi
- Unit of Andrology and Reproductive Medicine, Department of Medicine, University of Padua, Padua, Italy
- Maurizio De Rocco Ponce
- Unit of Andrology and Reproductive Medicine, Department of Medicine, University of Padua, Padua, Italy
- Carlo Foresta
- Unit of Andrology and Reproductive Medicine, Department of Medicine, University of Padua, Padua, Italy
- Alberto Ferlin
- Department of Clinical and Experimental Sciences, University of Brescia, Brescia, Italy
|
64
|
Lepais O, Chancerel E, Boury C, Salin F, Manicki A, Taillebois L, Dutech C, Aissi A, Bacles CF, Daverat F, Launey S, Guichoux E. Fast sequence-based microsatellite genotyping development workflow. PeerJ 2020; 8:e9085. [PMID: 32411534 PMCID: PMC7204839 DOI: 10.7717/peerj.9085] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/03/2020] [Accepted: 04/08/2020] [Indexed: 12/21/2022] Open
Abstract
Application of high-throughput sequencing technologies to microsatellite genotyping (SSRseq) has been shown to remove many of the limitations of electrophoresis-based methods and to refine inference of population genetic diversity and structure. We present here a streamlined SSRseq development workflow that includes microsatellite development, multiplexed marker amplification and sequencing, and automated bioinformatics data analysis. We illustrate its application to five groups of species across phyla (fungi, plant, insect and fish) with different levels of genomic resource availability. We found that relying on previously developed microsatellite assays is not optimal and results in a low number of reliable loci being genotyped. In contrast, de novo ad hoc primer design gives highly multiplexed microsatellite assays that can be sequenced to produce high quality genotypes for 20-40 loci. We highlight critical upfront development factors to consider for effective SSRseq setup in a wide range of situations. Sequence analysis accounting for all linked polymorphisms along the sequence quickly generates a powerful multi-allelic haplotype-based genotypic dataset, calling for new theoretical and analytical frameworks to extract more information from multi-nucleotide polymorphism marker systems.
Affiliation(s)
- Olivier Lepais
- INRAE, Univ. Bordeaux, BIOGECO, Cestas, France
- INRAE, Université de Pau et Pays de l’Adour, ECOBIOP, Saint-Pée-sur-Nivelle, France
- Aurélie Manicki
- INRAE, Université de Pau et Pays de l’Adour, ECOBIOP, Saint-Pée-sur-Nivelle, France
- Laura Taillebois
- INRAE, Université de Pau et Pays de l’Adour, ECOBIOP, Saint-Pée-sur-Nivelle, France
- Cecile F.E. Bacles
- INRAE, Université de Pau et Pays de l’Adour, ECOBIOP, Saint-Pée-sur-Nivelle, France
- Sophie Launey
- INRAE, Agrocampus Ouest, ESE, Ecology and Ecosystem Health, Rennes, France
|
65
|
Development and application of a nonbinary SNP-based microhaplotype panel for paternity testing involving close relatives. Forensic Sci Int Genet 2020; 46:102255. [DOI: 10.1016/j.fsigen.2020.102255] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/01/2019] [Revised: 12/12/2019] [Accepted: 01/20/2020] [Indexed: 11/22/2022]
|
66
|
Abstract
Background: Short tandem repeats are an important source of genetic variation. They are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeat genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. Methods: The analysis was performed on the Simons Simplex Collection dataset, where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. Results: We determined a relatively good performance of all tools when genotyping repeats of 3-6 bp in length, which could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by the AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. Conclusions: All tools have different strengths and weaknesses and the choice may depend on the application. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls.
Affiliation(s)
- Andreas Halman
- Murdoch Children’s Research Institute, Royal Children’s Hospital, Parkville, VIC, 3052, Australia
- Peter MacCallum Cancer Centre, 305 Grattan St, Melbourne, VIC, 3000, Australia
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, VIC, 3052, Australia
- School of Natural Sciences and Health, Tallinn University, Tallinn, 10120, Estonia
- Alicia Oshlack
- Murdoch Children’s Research Institute, Royal Children’s Hospital, Parkville, VIC, 3052, Australia
- Peter MacCallum Cancer Centre, 305 Grattan St, Melbourne, VIC, 3000, Australia
- School of BioSciences, University of Melbourne, Parkville, VIC, 3052, Australia
|
67
|
McDew-White M, Li X, Nkhoma SC, Nair S, Cheeseman I, Anderson TJC. Mode and Tempo of Microsatellite Length Change in a Malaria Parasite Mutation Accumulation Experiment. Genome Biol Evol 2020; 11:1971-1985. [PMID: 31273388 PMCID: PMC6644851 DOI: 10.1093/gbe/evz140] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Accepted: 06/29/2019] [Indexed: 12/12/2022] Open
Abstract
Malaria parasites have small extremely AT-rich genomes: microsatellite repeats (1–9 bp) comprise 11% of the genome and genetic variation in natural populations is dominated by repeat changes in microsatellites rather than point mutations. This experiment was designed to quantify microsatellite mutation patterns in Plasmodium falciparum. We established 31 parasite cultures derived from a single parasite cell and maintained these for 114–267 days with frequent reductions to a single cell, so parasites accumulated mutations during ∼13,207 cell divisions. We Illumina sequenced the genomes of both progenitor and end-point mutation accumulation (MA) parasite lines in duplicate to validate stringent calling parameters. Microsatellite calls were 99.89% (GATK), 99.99% (freeBayes), and 99.96% (HipSTR) concordant in duplicate sequence runs from independent sequence libraries, whereas introduction of microsatellite mutations into the reference genome revealed a low false negative calling rate (0.68%). We observed 98 microsatellite mutations. We highlight several conclusions: microsatellite mutation rates (3.12 × 10⁻⁷ to 2.16 × 10⁻⁸/cell division) are associated with both repeat number and repeat motif like other organisms studied. However, 41% of changes resulted from loss or gain of more than one repeat: this was particularly true for long repeat arrays. Unlike other eukaryotes, we found no insertions or deletions that were not associated with repeats or homology regions. Overall, microsatellite mutation rates are among the lowest recorded and comparable to those in another AT-rich protozoan (Dictyostelium). However, a single infection (>10¹¹ parasites) will still contain over 2.16 × 10³ to 3.12 × 10⁴ independent mutations at any single microsatellite locus.
Affiliation(s)
- Xue Li
- Texas Biomedical Research Institute, San Antonio, Texas
- Standwell C Nkhoma
- Texas Biomedical Research Institute, San Antonio, Texas; Malaria Research and Reference Reagent Resource Center (MR4), BEI Resources, American Type Culture Collection, 10801 University Boulevard, Manassas, VA
- Shalini Nair
- Texas Biomedical Research Institute, San Antonio, Texas
- Ian Cheeseman
- Texas Biomedical Research Institute, San Antonio, Texas
|
68
|
Chaley M, Kutyrkin V. Stochastic models for description of structural-statistical properties in DNA sequences. J Theor Biol 2019; 496:110126. [PMID: 31866393 DOI: 10.1016/j.jtbi.2019.110126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2019] [Revised: 12/02/2019] [Accepted: 12/18/2019] [Indexed: 10/25/2022]
Abstract
New stochastic models based on the notion of a stochastic codon are proposed. These models, represented by special random strings, describe practical structural-statistical properties that are peculiar to coding DNA from both prokaryotic and eukaryotic genomes. In this case, coding regions are considered as realizations of random strings. The models introduced explain the existence in the coding regions of latent profile periodicity with a period that is not only equal to three but may also be a multiple of three. For sequences with a latent profile period that is a multiple of three, but not equal to three, the proposed models ensure the existence of a special property of 3-regularity in these sequences, which is practically recognized in all coding sequences of the genomes analyzed. The feasibility of the proposed stochastic models was tested in numerical experiments with binary re-encoded paragraphs of literary texts (in English and Italian), used as an analog of DNA coding regions.
Affiliation(s)
- Maria Chaley
- Institute of Mathematical Problems of Biology RAS - Branch of Keldysh Institute of Applied Mathematics RAS, Professor Vitkevich St., 1, 142290 Pushchino, Russia
- Vladimir Kutyrkin
- Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya St., 5, 105005 Moscow, Russia
|
69
|
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 169] [Impact Index Per Article: 28.2] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Patryk Jarnot
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Universite Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
|
70
|
Harhay GP, Harhay DM, Bono JL, Capik SF, DeDonder KD, Apley MD, Lubbers BV, White BJ, Larson RL, Smith TPL. A Computational Method to Quantify the Effects of Slipped Strand Mispairing on Bacterial Tetranucleotide Repeats. Sci Rep 2019; 9:18087. [PMID: 31792233 PMCID: PMC6889271 DOI: 10.1038/s41598-019-53866-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/31/2019] [Accepted: 11/04/2019] [Indexed: 01/17/2023] Open
Abstract
The virulence and pathogenicity of bacterial pathogens are related to their adaptability to changing environments. One process enabling adaptation is based on minor changes in genome sequence, as small as a few base pairs, within segments of genome called simple sequence repeats (SSRs) that consist of multiple copies of a short sequence (from one to several nucleotides), repeated in series. SSRs are found in eukaryotes as well as prokaryotes, and length variation in them occurs at frequencies up to a million-fold higher than bacterial point mutations through the process of slipped strand mispairing (SSM) by DNA polymerase during replication. The characterization of SSR length by standard sequencing methods is complicated by the appearance of length variation introduced during the sequencing process that obscures the lower abundance repeat number variants in a population. Here we report a computational approach to correct for sequencing process-induced artifacts, validated for tetranucleotide repeats by use of synthetic constructs of fixed, known length. We apply this method to a laboratory culture of Histophilus somni, prepared from a single colony, and demonstrate that the culture consists of populations of distinct sequence phase and length variants at individual tetranucleotide SSR loci.
Affiliation(s)
- Gregory P Harhay
- USDA ARS US Meat Animal Research Center, Clay Center, NE, United States
- Dayna M Harhay
- USDA ARS US Meat Animal Research Center, Clay Center, NE, United States
- James L Bono
- USDA ARS US Meat Animal Research Center, Clay Center, NE, United States
- Sarah F Capik
- Texas A&M AgriLife Research, Amarillo, TX and the College of Veterinary Medicine & Biomedical Sciences, Texas A&M University System, College Station, TX, United States
- Keith D DeDonder
- Veterinary and Biomedical Research Center, Inc, Manhattan, KS, United States
- Michael D Apley
- Kansas State University, College of Veterinary Medicine, Manhattan, KS, United States
- Brian V Lubbers
- Kansas State University, College of Veterinary Medicine, Manhattan, KS, United States
- Bradley J White
- Kansas State University, College of Veterinary Medicine, Manhattan, KS, United States
- Robert L Larson
- Kansas State University, College of Veterinary Medicine, Manhattan, KS, United States
- Timothy P L Smith
- USDA ARS US Meat Animal Research Center, Clay Center, NE, United States
|
71
|
Kinney N, Titus-Glover K, Wren JD, Varghese RT, Michalak P, Liao H, Anandakrishnan R, Pulenthiran A, Kang L, Garner HR. CAGm: a repository of germline microsatellite variations in the 1000 genomes project. Nucleic Acids Res 2019; 47:D39-D45. [PMID: 30329086 PMCID: PMC6323891 DOI: 10.1093/nar/gky969] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/07/2018] [Revised: 10/04/2018] [Accepted: 10/05/2018] [Indexed: 12/14/2022] Open
Abstract
The human genome harbors an abundance of repetitive DNA; however, its function continues to be debated. Microsatellites, a class of short tandem repeat, are established as an important source of genetic variation. Array length variants are common among microsatellites and affect gene expression, but efforts to understand the role and diversity of microsatellite variation have been hampered by several challenges. Without adequate depth, both long-read and short-read sequencing may not detect the variants present in a sample; additionally, large sample sizes are needed to reveal the degree of population-level polymorphism. To address these challenges we present the Comparative Analysis of Germline Microsatellites (CAGm): a database of germline microsatellites from 2529 individuals in the 1000 genomes project. A key novelty of CAGm is the ability to aggregate microsatellite variation by population, ethnicity (super population) and gender. The database provides advanced searching for microsatellites embedded in genes and functional elements. All data can be downloaded as Microsoft Excel spreadsheets. Two use-case scenarios are presented to demonstrate its utility: a mononucleotide (A) microsatellite at the BAT-26 locus and a dinucleotide (CA) microsatellite in the coding region of FGFRL1. CAGm is freely available at http://www.cagmdb.org/.
Affiliation(s)
- Nicholas Kinney
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Kyle Titus-Glover
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Jonathan D Wren
- Arthritis and Clinical Immunology Research Program, Division of Genomics and Data Sciences Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Robin T Varghese
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Pawel Michalak
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- One Health Research Center, Virginia-Maryland College of Veterinary Medicine, 1410 Prices Fork Rd, Blacksburg, VA 24060, USA
- Institute of Evolution, University of Haifa, Abba Khoushy Ave 199, Haifa, 3498838, Israel
- Han Liao
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Ramu Anandakrishnan
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Arichanah Pulenthiran
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Lin Kang
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Harold R Garner
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Gibbs Cancer Center & Research Institute, 101 E Wood St., Spartanburg, SC 29303, USA
|
72
|
Mao Z, Fu Y, Wang Y, Wang S, Zhang M, Gao X, Luo K, Qin Q, Zhang C, Tao M, Yao Z, Liu S. Evidence for paternal DNA transmission to gynogenetic grass carp. BMC Genet 2019; 20:3. [PMID: 30616510 PMCID: PMC6323743 DOI: 10.1186/s12863-018-0712-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/06/2018] [Accepted: 12/26/2018] [Indexed: 12/24/2022] Open
Abstract
Background: Grass carp (Ctenopharyngodon idellus, GC), as the highest-output fish in China, is economically important. The production of gynogenetic grass carp (GGC) will provide an important germplasm resource for producing improved GC. At present, little is known about heterologous sperm DNA in gynogenetic offspring. Thus, revealing paternal DNA in GGC at the molecular level would be highly significant for fish genetic breeding. Result: In this study, ultraviolet-treated sperm of koi carp (Cyprinus carpio haematopterus, KOC, 2n = 100) was used to activate the eggs of GC (2n = 48). Afterwards, cold shock (0–4 °C) was administered for 12 min to double the chromosomes, resulting in GGC. No significant difference (p > 0.05) was found between GGC and GC in appearance, erythrocyte size and chromosome number. However, at the molecular level, a specific microsatellite DNA fragment (MFW1-gynogenetic grass carp, MFW1-G) derived from the paternal parent KOC was found to be transmitted into GGC. Conclusions: For the first time, this study provided evidence at the molecular level that a DNA fragment derived from the paternal parent occurred in GGC. This finding is of great significance for fish genetic breeding.
Affiliation(s)
- Zhuangwen Mao
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Yeqing Fu
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Yude Wang
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Shi Wang
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Minghe Zhang
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Xin Gao
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Kaikun Luo
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Qinbo Qin
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Chun Zhang
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Min Tao
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Zhanzhou Yao
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
- Shaojun Liu
- State Key Laboratory of Developmental Biology of Freshwater Fish, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China; College of Life Sciences, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China
|
73
|
Zou X, Wang Z, He G, Wang M, Su Y, Liu J, Chen P, Wang S, Gao B, Li Z, Hou Y. Population Genetic Diversity and Phylogenetic Characteristics for High-Altitude Adaptive Kham Tibetan Revealed by DNATyper™ 19 Amplification System. Front Genet 2018; 9:630. [PMID: 30619458 PMCID: PMC6304359 DOI: 10.3389/fgene.2018.00630] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 09/21/2018] [Accepted: 11/26/2018] [Indexed: 11/13/2022] Open
Abstract
Tibetans residing in the inhospitable high-altitude environment have undergone significant natural selection of their genetic architecture. Recently, highly mutable autosomal short tandem repeats have been widely used not only in anthropology and population genetics to investigate genetic structure and relationships, but also in medical genetics to explore the pathogenesis of multiple genetic diseases and in forensic science to identify individuals and parentage relatedness. However, the genetic variants and forensic efficiency of the DNATyper™ 19 amplification system and the genetic background of Kham Tibetans remain uncharacterized. Thus, we genotyped 19 forensic genetic markers in 11,402 Kham Tibetans to gain insight into the genetic diversity of a Chinese high-altitude adaptive population. Highly discriminating and polymorphic forensic measures were observed, which indicated that the newly developed DNATyper™ 19 PCR amplification system is suitable for routine forensic identification purposes and for establishing the Chinese national DNA database. Pairwise genetic distances among the comprehensive population comparisons suggested that the high-altitude adaptive Kham Tibetans have genetically closer relationships with lowlanders of Tibeto-Burman-speaking populations (Chengdu Tibetan, Liangshan Tibetan, and Liangshan Yi). Genetic substructure analyses via phylogenetic reconstruction, principal component analysis, and multidimensional scaling analysis in both nationwide and worldwide contexts suggested that genetic proximity exists along linguistic, ethnic, and continental geographical boundaries. Further studies with whole-genome sequencing of modern or archaic Kham Tibetans would be useful in reconstructing the Tibetan population history.
Affiliation(s)
- Xing Zou
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Zheng Wang
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Guanglin He
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Mengge Wang
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Yongdong Su
- Forensic Identification Center, Public Security Bureau of Tibet Autonomous Region, Lhasa, China
- Jing Liu
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Pengyu Chen
- Center of Forensic Expertise, Affiliated Hospital of Zunyi Medical University, Zunyi, China; School of Forensic Medicine, Zunyi Medical University, Zunyi, China
- Shouyu Wang
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Bo Gao
- Institute of Forensic Science, Yili Public Security Bureau of Xinjiang, Kuytun, China
- Zhao Li
- Department of Criminal Investigation, Mianyang Public Security Bureau, Mianyang, China
- Yiping Hou
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China

74
Fan W, Xu L, Cheng H, Li M, Liu H, Jiang Y, Guo Y, Zhou Z, Hou S. Characterization of Duck (Anas platyrhynchos) Short Tandem Repeat Variation by Population-Scale Genome Resequencing. Front Genet 2018; 9:520. [PMID: 30425731 PMCID: PMC6218588 DOI: 10.3389/fgene.2018.00520] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/04/2018] [Accepted: 10/15/2018] [Indexed: 12/30/2022]
Abstract
Short tandem repeats (STRs) are usually associated with genetic diseases and gene regulatory functions, and are also important genetic markers for analysis of evolutionary, genetic diversity and forensic. However, for the majority of STRs in the duck genome, their population genetic properties and functional impacts remain poorly defined. Recent advent of next generation sequencing (NGS) has offered an opportunity for profiling large numbers of polymorphic STRs. Here, we reported a population-scale analysis of STR variation using genome resequencing in mallard and Pekin duck. Our analysis provided the first genome-wide duck STR reference including 198,022 STR loci with motif size of 2–6 base pairs. We observed a relatively uneven distribution of STRs in different genomic regions, which indicates that the occurrence of STRs in duck genome is not random, but undergoes a directional selection pressure. Using genome resequencing data of 23 mallard and 26 Pekin ducks, we successfully identified 89,891 polymorphic STR loci. Intensive analysis of this dataset suggested that shorter repeat motif, longer reference tract length, higher purity, and residing outside of a coding region are all associated with an increase in STR variability. STR genotypes were utilized for population genetic analysis, and the results showed that population structure and divergence patterns among population groups can be efficiently captured. In addition, comparison between Pekin duck and mallard identified 3,122 STRs with extremely divergent allele frequency, which overlapped with a set of genes related to nervous system, energy metabolism and behavior. The evolutionary analysis revealed that the genes containing divergent STRs may play important roles in phenotypic changes during duck domestication. The variation analysis of STRs in population scale provides valuable resource for future study of genetic diversity and genome evolution in duck.
Affiliation(s)
- Wenlei Fan
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China; State Key Laboratory of Animal Nutrition, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Lingyang Xu
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Hong Cheng
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Ming Li
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Hehe Liu
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Yong Jiang
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Yuming Guo
- State Key Laboratory of Animal Nutrition, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Zhengkui Zhou
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Shuisheng Hou
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China

75
Genetic structure and polymorphisms of Gelao ethnicity residing in southwest china revealed by X-chromosomal genetic markers. Sci Rep 2018; 8:14585. [PMID: 30275508 PMCID: PMC6167355 DOI: 10.1038/s41598-018-32945-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 05/25/2018] [Accepted: 09/19/2018] [Indexed: 01/10/2023]
Abstract
X-chromosome short tandem repeat markers (X-STRs), due to their special inheritance models, physical location on a single chromosome and the absence of recombination in male meiosis, play an important role in forensic and population genetics. While a series of genetic analyses focusing on the genetic diversity and forensic characteristics of X-STRs are well studied for ethnically/linguistically diverse and demographically large Chinese populations, genetic evidence from Gelao ethnicity is still sparse. Here, we genotyped the first batch of 19 X-STRs in 513 Chinese Gelao individuals (265 females and 248 males), and reported genetic polymorphisms, forensic characteristics based on the single locus and seven linkage groups. DXS10135 with the highest PIC (0.9106) and LG1 (DXS10148-DXS10135-DXS8378) with the largest HD (0.9970) are polymorphic and informative. The CPDs in Gelao males and females are respectively larger than 0.999999999997095 and 0.99999999999999999999918, and the combined MECs are larger than 0.999999975715109. Subsequently, we investigated the population relationships among 14 Chinese populations based on 19 X-STRs and among 23 populations based on 11 overlapped X-STRs. Our results revealed genetic differentiations among Tibeto-Burman, Altaic and other Chinese homogenous populations, and demonstrated that Guizhou Gelao has the genetically closer relationships with Han Chinese and geographically close Guizhou Miao.

76
Allele frequencies of 15 autosomal STRs in Chinese Nakhi and Yi populations. Int J Legal Med 2018; 133:105-108. [PMID: 30218175 DOI: 10.1007/s00414-018-1931-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/29/2018] [Accepted: 08/31/2018] [Indexed: 01/26/2023]
Abstract
Genetic characterization of ethnically and geographically diverse populations via short tandem repeats (STRs) is relevant to various fundamental and applied areas of forensic genetics, population studies, and even molecular anthropology. In the present study, genetic polymorphisms of 15 autosomal STR loci were firstly obtained from 918 individuals (495 Nakhis and 423 Yis) residing in the foothills of the Himalayas. The cumulative powers of discrimination and probabilities of exclusion in the two studied ethnic groups were both larger than 0.999999999999999982 and 0.9999961, respectively. Genetic similarities and differences among 61 populations were subsequently investigated by pairwise Cavalli-Sforza genetic distance, multidimensional scaling plots, principal component analysis, and phylogenetic relationship reconstruction. Both Nakhi and Yi had the genetically close relationships with Yunnan Bai and distinct relationships with Xinjiang Turkic-speaking populations (Uyghur and Kazakh) and Vietnamese.
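Cumulative forensic parameters such as those quoted above are conventionally obtained by combining per-locus values under the assumption that loci are independent: the cumulative value is 1 minus the product of (1 - p_i) over all loci. The sketch below illustrates that arithmetic only. The function names and the per-locus values are hypothetical and are not taken from the study.

```python
from math import prod


def combined_complement(per_locus):
    """Product of (1 - p_i): the chance that no locus discriminates (or excludes)."""
    return prod(1.0 - p for p in per_locus)


def cumulative_power(per_locus):
    """Cumulative power across loci, assuming the loci are independent."""
    return 1.0 - combined_complement(per_locus)


# Hypothetical per-locus powers of discrimination, for illustration only.
example_pd = [0.90, 0.95, 0.85]
print(cumulative_power(example_pd))
```

With 15 or more highly polymorphic loci the cumulative value lies so close to 1 that double precision cannot resolve it, so in practice it may be preferable to report `combined_complement` directly or to use arbitrary-precision arithmetic.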

77
Dashnow H, Lek M, Phipson B, Halman A, Sadedin S, Lonsdale A, Davis M, Lamont P, Clayton JS, Laing NG, MacArthur DG, Oshlack A. STRetch: detecting and discovering pathogenic short tandem repeat expansions. Genome Biol 2018; 19:121. [PMID: 30129428 PMCID: PMC6102892 DOI: 10.1186/s13059-018-1505-2] [Citation(s) in RCA: 93] [Impact Index Per Article: 13.3] [Received: 12/18/2017] [Accepted: 08/07/2018] [Indexed: 11/10/2022]
Abstract
Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Most existing tools for detecting STR variation with short reads do so within the read length and so are unable to detect the majority of pathogenic expansions. Here we present STRetch, a new genome-wide method to scan for STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting STR expansions using short-read whole-genome sequencing data at known pathogenic loci as well as novel STR loci. STRetch is open source software, available from github.com/Oshlack/STRetch.
Affiliation(s)
- Harriet Dashnow
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia; School of Biosciences, The University of Melbourne, Parkville, VIC, Australia
- Monkol Lek
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Belinda Phipson
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia
- Andreas Halman
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia; Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, VIC, Australia
- Simon Sadedin
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia
- Andrew Lonsdale
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia
- Mark Davis
- Department of Diagnostic Genomics, PathWest Laboratory Medicine, QEII Medical Centre, Nedlands, WA, Australia
- Phillipa Lamont
- Neurogenetic Unit, Royal Perth Hospital, Perth, WA, Australia
- Joshua S Clayton
- Harry Perkins Institute of Medical Research, Centre for Medical Research, University of Western Australia, Nedlands, WA, Australia
- Nigel G Laing
- Harry Perkins Institute of Medical Research, Centre for Medical Research, University of Western Australia, Nedlands, WA, Australia
- Daniel G MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alicia Oshlack
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia; School of Biosciences, The University of Melbourne, Parkville, VIC, Australia

78
Genovese LM, Geraci F, Corrado L, Mangano E, D'Aurizio R, Bordoni R, Severgnini M, Manzini G, De Bellis G, D'Alfonso S, Pellegrini M. A Census of Tandemly Repeated Polymorphic Loci in Genic Regions Through the Comparative Integration of Human Genome Assemblies. Front Genet 2018; 9:155. [PMID: 29770143 PMCID: PMC5941971 DOI: 10.3389/fgene.2018.00155] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/18/2017] [Accepted: 04/13/2018] [Indexed: 11/29/2022]
Abstract
Polymorphic Tandem Repeat (PTR) is a common form of polymorphism in the human genome. A PTR consists in a variation found in an individual (or in a population) of the number of repeating units of a Tandem Repeat (TR) locus of the genome with respect to the reference genome. Several phenotypic traits and diseases have been discovered to be strongly associated with or caused by specific PTR loci. PTR are further distinguished in two main classes: Short Tandem Repeats (STR) when the repeating unit has size up to 6 base pairs, and Variable Number Tandem Repeats (VNTR) for repeating units of size above 6 base pairs. As larger and larger populations are screened via high throughput sequencing projects, it becomes technically feasible and desirable to explore the association between PTR and a panoply of such traits and conditions. In order to facilitate these studies, we have devised a method for compiling catalogs of PTR from assembled genomes, and we have produced a catalog of PTR for genic regions (exons, introns, UTR and adjacent regions) of the human genome (GRCh38). We applied four different TR discovery software tools to uncover in the first phase 55,223,485 TR (after duplicate removal) in GRCh38, of which 373,173 were determined to be PTR in the second phase by comparison with five assembled human genomes. Of these, 263,266 are not included by state-of-the-art PTR catalogs. The new methodology is mainly based on a hierarchical and systematic application of alignment-based sequence comparisons to identify and measure the polymorphism of TR. While previous catalogs focus on the class of STR of small total size, we remove any size restrictions, aiming at the more general class of PTR, and we also target fuzzy TR by using specific detection tools. Similarly to other previous catalogs of human polymorphic loci, we focus our catalog toward applications in the discovery of disease-associated loci. 
Validation by cross-referencing with existing catalogs on common clinically-relevant loci shows good concordance. Overall, this proposed census of human PTR in genic regions is a shared resource (web accessible), complementary to existing catalogs, facilitating future genome-wide studies involving PTR.
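The two definitions given in the abstract above (STR for repeat units of up to 6 base pairs, VNTR for longer units, and polymorphism as a difference in unit count relative to the reference genome) can be sketched in a few lines. This is only an illustration of those definitions, not the cataloguing method of the paper; the function names are hypothetical.

```python
def classify_tandem_repeat(unit: str) -> str:
    """Return 'STR' for repeat units of 1-6 bp and 'VNTR' for longer units."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    return "STR" if len(unit) <= 6 else "VNTR"


def copy_number_difference(sample_copies: int, reference_copies: int) -> int:
    """Signed difference in repeat-unit count relative to the reference genome."""
    return sample_copies - reference_copies


def is_polymorphic(sample_copies: int, reference_copies: int) -> bool:
    """A locus counts as polymorphic here if the unit count differs from the reference."""
    return copy_number_difference(sample_copies, reference_copies) != 0


print(classify_tandem_repeat("CAG"))      # 3 bp unit
print(classify_tandem_repeat("ACGTACG"))  # 7 bp unit
```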
Affiliation(s)
- Filippo Geraci
- Institute for Informatics and Telematics of CNR, Pisa, Italy
- Lucia Corrado
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
- Roberta Bordoni
- Institute for Biomedical Technologies of CNR, Segrate, Italy
- Giovanni Manzini
- Institute for Informatics and Telematics of CNR, Pisa, Italy; Department of Science and Technological Innovation, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
- Sandra D'Alfonso
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy

79
Yu C, Baune BT, Wong ML, Licinio J. Investigation of short tandem repeats in major depression using whole-genome sequencing data. J Affect Disord 2018; 232:305-309. [PMID: 29501989 DOI: 10.1016/j.jad.2018.02.046] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 11/08/2017] [Revised: 01/02/2018] [Accepted: 02/16/2018] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Major depressive disorder (MDD) is a leading contributor to global disease burden. Recent studies have shown that genetic factors play significant roles in the susceptibility to this condition; however, the underlying genetic basis currently remains largely unknown. Short tandem repeat (STR) has been proposed as an explanatory factor in the "missing heritability" of complex diseases or traits. METHODS: We investigated STR variations from 15 MDD patients and 10 ethnically matched healthy controls based on their deep whole-genome sequencing (WGS) data. The lobSTR software was used to computationally determine STRs. RESULTS: The results of the Mexican-American sample showed that STRs are significantly richer in healthy controls than in MDD cases on each of the 23 chromosomes (all false discovery rates, FDR P-values < 0.0062); while for the Australian of European-ancestry sample, there was no statistically significant STRs difference between MDD cases and controls. LIMITATIONS: High quality WGS costs limited obtaining larger datasets. CONCLUSIONS: This preliminary work is the first study that STR variations are applied to investigate MDD based on WGS data. The results on Mexican-American population may imply that within the same ancestry, targeted sequencing on a specific chromosome or region of genome would be sufficient for examining the relationship between STR and MDD. Further studies should examine larger sequencing datasets on other ethnic groups.
Affiliation(s)
- Chenglong Yu
- Robinson Research Institute, Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; Mind and Brain Theme, South Australian Health and Medical Research Institute, North Terrace, Adelaide, SA 5000, Australia; School of Medicine, Faculty of Medicine, Nursing and Health Sciences, Flinders University, Bedford Park, SA 5042, Australia
- Bernhard T Baune
- Discipline of Psychiatry, Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia
- Ma-Li Wong
- Mind and Brain Theme, South Australian Health and Medical Research Institute, North Terrace, Adelaide, SA 5000, Australia; School of Medicine, Faculty of Medicine, Nursing and Health Sciences, Flinders University, Bedford Park, SA 5042, Australia; Department of Psychiatry, College of Medicine, State University of New York, Upstate Medical University, Syracuse, NY 13210, USA
- Julio Licinio
- Department of Psychiatry, College of Medicine, State University of New York, Upstate Medical University, Syracuse, NY 13210, USA; Departments of Pharmacology and Medicine, College of Medicine, State University of New York, Upstate Medical University, Syracuse, NY 13210, USA

80
Abstract
Accumulating evidence suggests that many classes of DNA repeats exhibit attributes that distinguish them from other genetic variants, including the fact that they are more liable to mutation; this enables them to mediate genetic plasticity. The expansion of tandem repeats, particularly of short tandem repeats, can cause a range of disorders (including Huntington disease, various ataxias, motor neuron disease, frontotemporal dementia, fragile X syndrome and other neurological disorders), and emerging data suggest that tandem repeat polymorphisms (TRPs) can also regulate gene expression in healthy individuals. TRPs in human genomes may also contribute to the missing heritability of polygenic disorders. A better understanding of tandem repeats and their associated repeatome, as well as their capacity for genetic plasticity via both germline and somatic mutations, is needed to transform our understanding of the role of TRPs in health and disease.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne; Department of Anatomy and Neuroscience, University of Melbourne, Parkville, Victoria, Australia

81
Bagshaw AT. Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 2017; 9:2428-2443. [PMID: 28957459 PMCID: PMC5622345 DOI: 10.1093/gbe/evx164] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Accepted: 08/23/2017] [Indexed: 02/06/2023]
Abstract
Microsatellite repeat DNA is best known for its length mutability, which is implicated in several neurological diseases and cancers, and often exploited as a genetic marker. Less well-known is the body of work exploring the widespread and surprisingly diverse functional roles of microsatellites. Recently, emerging evidence includes the finding that normal microsatellite polymorphism contributes substantially to the heritability of human gene expression on a genome-wide scale, calling attention to the task of elucidating the mechanisms involved. At present, these are underexplored, but several themes have emerged. I review evidence demonstrating roles for microsatellites in modulation of transcription factor binding, spacing between promoter elements, enhancers, cytosine methylation, alternative splicing, mRNA stability, selection of transcription start and termination sites, unusual structural conformations, nucleosome positioning and modification, higher order chromatin structure, noncoding RNA, and meiotic recombination hot spots.