1
Li W, Liu F, Chen S, Wingfield MJ, Duong TA. High Genetic Diversity and Limited Regional Population Differentiation in Populations of Calonectria pseudoreteaudii from Eucalyptus Plantations. Phytopathology 2025; 115:97-105. [PMID: 39320987 DOI: 10.1094/phyto-05-24-0154-r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024]
Abstract
Calonectria pseudoreteaudii causes a serious and widespread disease known as Calonectria leaf blight in Eucalyptus plantations of southern China. Little is known regarding the population biology or reproductive biology of this pathogen in the affected areas. The aims of this study were to investigate the genetic diversity, population structure, and reproductive mode of C. pseudoreteaudii from affected Eucalyptus plantations of southern China. Ten polymorphic simple sequence repeat markers were developed for the species and were used to genotype 311 isolates from eight populations. The mating types of all isolates were identified using the MAT gene primers. The results revealed a high level of genetic diversity of the pathogen in all investigated populations. Of the 90 multilocus genotypes detected, 10 were shared between at least two populations. With the exception of one population from HuiZhou, GuangDong (7HZ), the most dominant genotype was shared in the seven remaining populations. Discriminant analysis of principal components and population differentiation analyses showed that the 7HZ population was well differentiated from the others and that there was no significant differentiation between the remaining populations. Analysis of molecular variance suggested that most molecular variation was within populations (86%). Index of association analysis was consistent with a predominantly asexual life cycle for C. pseudoreteaudii in the studied regions. Although both mating types were detected in seven of the eight populations, the MAT1-1/MAT1-2 ratios in these populations deviated significantly from the 1:1 ratio expected in a randomly mating population.
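The mating-type comparison described in this abstract can be illustrated with a chi-square goodness-of-fit test against a 1:1 ratio. This is a minimal sketch, not the authors' procedure: the abstract does not say which test was used, the function name `mating_type_ratio_test` is hypothetical, and the counts in the usage note are invented for illustration.

```python
import math
from typing import Tuple


def mating_type_ratio_test(n_mat1_1: int, n_mat1_2: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test of observed mating-type counts against 1:1.

    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    total = n_mat1_1 + n_mat1_2
    if total == 0:
        raise ValueError("at least one isolate is required")
    expected = total / 2  # under random mating each type is expected in half the isolates
    chi2 = ((n_mat1_1 - expected) ** 2 + (n_mat1_2 - expected) ** 2) / expected
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, not taken from the paper.
chi2, p = mating_type_ratio_test(30, 10)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A small p-value indicates that the observed ratio is unlikely under random mating, which is the kind of deviation the abstract reports for seven of the eight populations.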
Affiliation(s)
- WenWen Li
  - Department of Plant and Soil Sciences, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria 0028, South Africa
  - Research Institute of Fast-growing Trees (RIFT), Chinese Academy of Forestry (CAF), ZhanJiang 524022, GuangDong Province, China
- FeiFei Liu
  - Research Institute of Fast-growing Trees (RIFT), Chinese Academy of Forestry (CAF), ZhanJiang 524022, GuangDong Province, China
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria 0028, South Africa
- ShuaiFei Chen
  - Research Institute of Fast-growing Trees (RIFT), Chinese Academy of Forestry (CAF), ZhanJiang 524022, GuangDong Province, China
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria 0028, South Africa
- Michael J Wingfield
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria 0028, South Africa
- Tuan A Duong
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria 0028, South Africa
2
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
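The definition of an STR given here (repeated copies of a 1-6-bp motif) can be made concrete with a small regular-expression scan. This is a minimal sketch that finds only perfect, uninterrupted repeats; it is not one of the genotyping tools compared in the review, and the function name `find_perfect_strs` and the `min_copies` threshold are assumptions chosen for illustration.

```python
import re
from typing import List, Tuple


def find_perfect_strs(seq: str, min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats of a 1-6 bp motif."""
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # Lazy {1,6}? tries the shortest motif first; \1 requires exact copies of it.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = (match.end() - match.start()) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence containing four copies of CAG starting at 0-based position 4.
print(find_perfect_strs("TTGACAGCAGCAGCAGTCC"))
```

Real STR loci often contain interruptions and sequencing errors, so a scan like this only illustrates the motif-and-copy-number representation, not the read-mapping difficulties the abstract describes.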
Affiliation(s)
- Hope A Tanudisastro
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
3
Mandape SN, Budowle B, Mittelman K, Mittelman D. Dense single nucleotide polymorphism testing revolutionizes scope and degree of certainty for source attribution in forensic investigations. Croat Med J 2024; 65:249-260. [PMID: 38868971 PMCID: PMC11157251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 05/06/2024] [Indexed: 06/14/2024]
Abstract
The field of forensic DNA analysis has experienced significant advancements over the years, such as the advent of DNA fingerprinting, the introduction of the polymerase chain reaction for increased sensitivity, the shift to a primary genetic marker system based on short tandem repeats, and implementation of national DNA databases. Now, the forensics field is poised for another revolution with the advent of dense single nucleotide polymorphism (SNP) testing. SNP testing holds the potential to significantly enhance source attribution in forensic cases, particularly those involving low-quantity or low-quality samples. When coupled with genetic genealogy and kinship analysis, it can resolve countless active cases as well as cold cases and cases of unidentified human remains, which are hindered by the limitations of existing forensic capabilities that fail to generate viable investigative leads with DNA. The field of forensic genetic genealogy combined with genome-wide sequencing can associate relatives as distant as the seventh degree and beyond. By leveraging volunteer-populated databases to locate near and distant relatives, genetic genealogy can effectively narrow the candidates linked to crime scene evidence or aid in determining the identity of human remains. With decreasing DNA sequencing costs and improving sensitivity of detection, forensic genetic genealogy is expanding its capabilities to generate investigative leads from a wide range of biological evidence.
Affiliation(s)
- David Mittelman
  - Othram Inc., 2829 Technology Forest Blvd STE 100, The Woodlands, Texas 77381, USA
4
Edwards SV, Cloutier A, Cockburn G, Driver R, Grayson P, Katoh K, Baldwin MW, Sackton TB, Baker AJ. A nuclear genome assembly of an extinct flightless bird, the little bush moa. Science Advances 2024; 10:eadj6823. [PMID: 38781323 PMCID: PMC11809649 DOI: 10.1126/sciadv.adj6823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024]
Abstract
We present a draft genome of the little bush moa (Anomalopteryx didiformis)-one of approximately nine species of extinct flightless birds from Aotearoa, New Zealand-using ancient DNA recovered from a fossil bone from the South Island. We recover a complete mitochondrial genome at 249.9× depth of coverage and almost 900 megabases of a male moa nuclear genome at ~4 to 5× coverage, with sequence contiguity sufficient to identify more than 85% of avian universal single-copy orthologs. We describe a diverse landscape of transposable elements and satellite repeats, estimate a long-term effective population size of ~240,000, identify a diverse suite of olfactory receptor genes and an opsin repertoire with sensitivity in the ultraviolet range, show that the wingless moa phenotype is likely not attributable to gene loss or pseudogenization, and identify potential function-altering coding sequence variants in moa that could be synthesized for future functional assays. This genomic resource should support further studies of avian evolution and morphological divergence.
Affiliation(s)
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
  - Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Alison Cloutier
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Glenn Cockburn
  - Evolution of Sensory Systems Research Group, Max Planck Institute for Biological Intelligence, 82319 Seewiesen, Germany
- Robert Driver
  - Department of Biology, East Carolina University, E 5th Street, Greenville, NC 27605, USA
- Phil Grayson
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
  - Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Kazutaka Katoh
  - Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Maude W. Baldwin
  - Evolution of Sensory Systems Research Group, Max Planck Institute for Biological Intelligence, 82319 Seewiesen, Germany
- Timothy B. Sackton
  - Informatics Group, Harvard University, 38 Oxford Street, Cambridge, MA 02138, USA
- Allan J. Baker
  - Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, ON M5S 3B2, Canada
  - Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, ON M5S 2C6, Canada
5
McComish BJ, Charleston MA, Parks M, Baroni C, Salvatore MC, Li R, Zhang G, Millar CD, Holland BR, Lambert DM. Ancient and Modern Genomes Reveal Microsatellites Maintain a Dynamic Equilibrium Through Deep Time. Genome Biol Evol 2024; 16:evae017. [PMID: 38412309 PMCID: PMC10972684 DOI: 10.1093/gbe/evae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 12/22/2023] [Accepted: 01/23/2024] [Indexed: 02/29/2024]
Abstract
Microsatellites are widely used in population genetics, but their evolutionary dynamics remain poorly understood. It is unclear whether microsatellite loci drift in length over time. This is important because the mutation processes that underlie these important genetic markers are central to the evolutionary models that employ microsatellites. We identify more than 27 million microsatellites using a novel and unique dataset of modern and ancient Adélie penguin genomes along with data from 63 published chordate genomes. We investigate microsatellite evolutionary dynamics over 2 timescales: one based on Adélie penguin samples dating to ∼46.5 ka and the other dating to the diversification of chordates aged more than 500 Ma. We show that the process of microsatellite allele length evolution is at dynamic equilibrium; while there is length polymorphism among individuals, the length distribution for a given locus remains stable. Many microsatellites persist over very long timescales, particularly in exons and regulatory sequences. These often retain length variability, suggesting that they may play a role in maintaining phenotypic variation within populations.
Affiliation(s)
- Bennet J McComish
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS 7001, Australia
- Matthew Parks
  - Australian Research Centre for Human Evolution, Griffith University, Nathan, QLD 4111, Australia
  - Department of Biology, University of Central Oklahoma, Edmond, OK 73034, USA
- Carlo Baroni
  - Dipartimento di Scienze della Terra, University of Pisa, Pisa, Italy
  - CNR-IGG, Institute of Geosciences and Earth Resources, Pisa, Italy
- Maria Cristina Salvatore
  - Dipartimento di Scienze della Terra, University of Pisa, Pisa, Italy
  - CNR-IGG, Institute of Geosciences and Earth Resources, Pisa, Italy
- Ruiqiang Li
  - Novogene Bioinformatics Technology Co. Ltd., Beijing 100083, China
- Guojie Zhang
  - China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
  - Department of Biology, Centre for Social Evolution, University of Copenhagen, Copenhagen DK-2100, Denmark
- Craig D Millar
  - School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Barbara R Holland
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- David M Lambert
  - Australian Research Centre for Human Evolution, Griffith University, Nathan, QLD 4111, Australia
6
Birnbaum R. Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities. Transl Psychiatry 2023; 13:402. [PMID: 38123544 PMCID: PMC10733427 DOI: 10.1038/s41398-023-02689-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 11/23/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023]
Abstract
Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of the genome, and often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as pertains to disease risk, elucidating their potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for TR genome-wide identification is provided, and some key methodological challenges in TR analyses are delineated.
Affiliation(s)
- Rebecca Birnbaum
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
7
Panoyan MA, Wendt FR. The role of tandem repeat expansions in brain disorders. Emerg Top Life Sci 2023; 7:249-263. [PMID: 37401564 DOI: 10.1042/etls20230022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
The human genome contains numerous genetic polymorphisms contributing to different health and disease outcomes. Tandem repeat (TR) loci are highly polymorphic yet under-investigated in large genomic studies, which has prompted research efforts to identify novel variations and gain a deeper understanding of their role in human biology and disease outcomes. We summarize the current understanding of TRs and their implications for human health and disease, including an overview of the challenges encountered when conducting TR analyses and potential solutions to overcome these challenges. By shedding light on these issues, this article aims to contribute to a better understanding of the impact of TRs on the development of new disease treatments.
Affiliation(s)
- Mary Anne Panoyan
  - Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Frank R Wendt
  - Department of Anthropology, University of Toronto, Mississauga, ON, Canada
  - Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Forensic Science Program, University of Toronto, Mississauga, ON, Canada
8
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
  - The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
  - Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
  - Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
  - Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
9
Wang X, Huang M, Budowle B, Ge J. TRcaller: a novel tool for precise and ultrafast tandem repeat variant genotyping in massively parallel sequencing reads. Front Genet 2023; 14:1227176. [PMID: 37533432 PMCID: PMC10390829 DOI: 10.3389/fgene.2023.1227176] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Accepted: 06/13/2023] [Indexed: 08/04/2023]
Abstract
Calling tandem repeat (TR) variants from DNA sequences is of both theoretical and practical significance. Some bioinformatics tools have been developed for detecting or genotyping TRs. However, little work has been done on genotyping TR alleles from long-read sequencing data, and the accuracy of genotyping TR alleles from next-generation sequencing data still needs to be improved. Herein, a novel algorithm is described to retrieve TR regions from sequence alignment, and a software program, TRcaller, has been developed and integrated into a web portal to call TR alleles from both short- and long-read sequences, covering both whole-genome and targeted sequences generated on multiple sequencing platforms. All TR alleles are genotyped as haplotypes, and robust alleles are reported, including multiple alleles in a DNA mixture. TRcaller can provide substantially higher accuracy (>99% in 289 human individuals) in detecting TR alleles and runs orders of magnitude faster (e.g., ∼2 s for 300x human sequence data) than mainstream software tools. The web portal offers 119 preselected TR loci drawn from forensic, genealogy, and disease-related TR loci. TRcaller has been validated as scalable across applications such as DNA forensics and disease diagnosis, and could be extended to other fields such as breeding programs. Availability: TRcaller is available at https://www.trcaller.com/SignIn.aspx.
Affiliation(s)
- Xuewen Wang
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Meng Huang
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Bruce Budowle
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
  - Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
- Jianye Ge
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
  - Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
10
Ludwig S, Pimentel JDSM, Cardoso Resende L, Kalapothakis E. Eco-evolutionary factors that influence its demographic oscillations in Prochilodus costatus (Actinopterygii: Characiformes) populations evidenced through a genetic spatial-temporal evaluation. Evol Appl 2023; 16:895-910. [PMID: 37124086 PMCID: PMC10130561 DOI: 10.1111/eva.13544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2019] [Revised: 02/10/2020] [Accepted: 03/04/2020] [Indexed: 05/02/2023]
Abstract
The impact of human activity on wild animal populations is reflected in eco-evolutionary and demographic processes, as well as in the populations' survival and capacity to evolve; consequently, such data can contribute toward enhancing genetic-based conservation programs. In this context, knowledge of life history and eco-evolutionary processes is required to understand extant patterns of population structure in Prochilodus costatus, a Neotropical migratory fish that has been threatened since the 1960s by the loss and fragmentation of its natural habitat, driven by the expansion of hydroelectric power plant construction programs. This study evaluated the eco-evolutionary parameters that cause oscillations in the demography and structure of P. costatus populations. An integrated approach was used, including temporal and spatial sampling, next-generation sequencing of eight microsatellite loci, multivariate genetic analysis, and demographic life-history reconstruction. The results provided evidence of the complex interplay of ecological-evolutionary and human-interference events on the life history of this species in the upper basin. In particular, spawning wave behavior might have ecological triggers that result in the overlapping of distinct genetic generations and give rise to distinct migratory and nonmigratory genetic patterns in the same area. An abrupt decrease in the effective population size of the P. costatus populations in the recent past (1960-80) was likely driven by environment fragmentation promoted by the construction of the Três Marias hydropower dam. The low allelic diversity that resulted from this event is still detected today; thus, active stocking programs are not effective at expanding the genetic diversity of this species in the river basin.
Finally, this study highlights the importance of using mixed methods to understand spatial and temporal variation in genetic structure for effective mitigation and conservation programs for threatened species that are directly affected by human actions.
Affiliation(s)
- Sandra Ludwig
  - Department of Genetics, Ecology and Evolution, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Leonardo Cardoso Resende
  - Department of Genetics, Ecology and Evolution, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Evanguedes Kalapothakis
  - Department of Genetics, Ecology and Evolution, Federal University of Minas Gerais, Belo Horizonte, Brazil
11
Dunn T, Blaauw D, Das R, Narayanasamy S. nPoRe: n-polymer realigner for improved pileup-based variant calling. BMC Bioinformatics 2023; 24:98. [PMID: 36927439 PMCID: PMC10022090 DOI: 10.1186/s12859-023-05193-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2022] [Accepted: 02/19/2023] [Indexed: 03/18/2023]
Abstract
Despite recent improvements in nanopore basecalling accuracy, germline variant calling of small insertions and deletions (INDELs) remains poor. Although precision and recall for single nucleotide polymorphisms (SNPs) now exceeds 99.5%, INDEL recall remains below 80% for standard R9.4.1 flow cells. We show that read phasing and realignment can recover a significant portion of false negative INDELs. In particular, we extend Needleman-Wunsch affine gap alignment by introducing new gap penalties for more accurately aligning repeated n-polymer sequences such as homopolymers ([Formula: see text]) and tandem repeats ([Formula: see text]). At the same precision, haplotype phasing improves INDEL recall from 63.76 to [Formula: see text] and nPoRe realignment improves it further to [Formula: see text].
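For orientation, the baseline that this abstract says it extends, Needleman-Wunsch alignment with affine gap penalties, can be sketched as a three-matrix dynamic program (the Gotoh formulation). This is the standard algorithm only: it does not include the additional n-polymer gap penalties that nPoRe introduces, and the scoring values and the function name `affine_gap_score` are assumptions chosen for illustration.

```python
def affine_gap_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap_open: int = 3, gap_extend: int = 1) -> float:
    """Global alignment score where a gap of length L costs gap_open + (L - 1) * gap_extend."""
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    X = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Y = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open; continuing one in the same state pays gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


# Three-base deletion inside a short toy sequence: 7 matches minus one gap of length 3.
print(affine_gap_score("ACGTTTACGT", "ACGACGT"))
```

Under plain affine penalties, every gap of a given length costs the same wherever it falls. The abstract's point is that gaps inside homopolymers and tandem repeats deserve different penalties, which this baseline does not model.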
Affiliation(s)
- Tim Dunn
  - University of Michigan, Ann Arbor, USA
12
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Affiliation(s)
- Max Verbiest
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Mikhail Maksimov
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Ye Jin
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
  - Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Maria Anisimova
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Melissa Gymrek
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Tugce Bilgin Sonay
  - Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
13
Dashnow H, Pedersen BS, Hiatt L, Brown J, Beecroft SJ, Ravenscroft G, LaCroix AJ, Lamont P, Roxburgh RH, Rodrigues MJ, Davis M, Mefford HC, Laing NG, Quinlan AR. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci. Genome Biol 2022; 23:257. [PMID: 36517892 PMCID: PMC9753380 DOI: 10.1186/s13059-022-02826-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/29/2021] [Accepted: 11/30/2022] [Indexed: 12/23/2022]
Abstract
Expansions of short tandem repeats (STRs) cause many rare diseases. Expansion detection is challenging with short-read DNA sequencing data since supporting reads are often mapped incorrectly. Detection is particularly difficult for "novel" STRs, which include new motifs at known loci or STRs absent from the reference genome. We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. STRling is sensitive to known STR disease loci, has a low false discovery rate, and resolves novel STR expansions to base-pair position accuracy. It is fast, scalable, open-source, and available at: github.com/quinlan-lab/STRling .
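The general idea of using k-mer counts to spot reads dominated by a repeat motif can be sketched in a few lines. This is an illustration of the concept only and not STRling's algorithm: the function name `dominant_repeat_unit`, the rotation-based canonical form, and the use of a simple fraction are all assumptions, and reverse complements are deliberately ignored.

```python
from collections import Counter
from typing import Tuple


def canonical_rotation(kmer: str) -> str:
    """Smallest rotation of a k-mer, so CAG, AGC and GCA count as the same repeat unit."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_repeat_unit(read: str, k: int) -> Tuple[str, float]:
    """Return the most frequent rotation-normalised k-mer and the fraction of k-mers it covers."""
    if k < 1 or len(read) < k:
        raise ValueError("read must be at least k bases long")
    kmers = [canonical_rotation(read[i:i + k]) for i in range(len(read) - k + 1)]
    unit, count = Counter(kmers).most_common(1)[0]
    return unit, count / len(kmers)


# A toy read made entirely of CAG copies: every 3-mer is a rotation of the same unit.
print(dominant_repeat_unit("CAGCAGCAGCAGCAG", 3))
```

A read whose k-mers are mostly rotations of one unit is a candidate repeat-derived read even if it maps poorly, which is the kind of informative read the abstract says is recovered.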
Affiliation(s)
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utrecht University Medical Center, Utrecht, The Netherlands
- Laurel Hiatt
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Joe Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Sarah J Beecroft
- Pawsey Supercomputing Research Centre, Kensington, WA, Australia
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
- Gianina Ravenscroft
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
- Amy J LaCroix
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA, 98195, USA
- Phillipa Lamont
- Neurogenetic Unit, Royal Perth Hospital, Perth, WA, Australia
- Miriam J Rodrigues
- Neurology, Auckland City Hospital, Auckland, New Zealand
- Centre for Brain Research, University of Auckland, Auckland, New Zealand
- Mark Davis
- Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health, Nedlands, Australia
- Heather C Mefford
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA, 98195, USA
- Nigel G Laing
- Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA, Australia
- Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health, Nedlands, Australia
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA

14
Wang Z, Moffitt AB, Andrews P, Wigler M, Levy D. Accurate measurement of microsatellite length by disrupting its tandem repeat structure. Nucleic Acids Res 2022; 50:e116. [PMID: 36095132 PMCID: PMC9723644 DOI: 10.1093/nar/gkac723] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2022] [Revised: 08/03/2022] [Accepted: 08/15/2022] [Indexed: 12/24/2022]
Abstract
Tandem repeats of simple sequence motifs, also known as microsatellites, are abundant in the genome. Because their repeat structure makes replication error-prone, variant microsatellite lengths are often generated during germline and other somatic expansions. As such, microsatellite length variations can serve as markers for cancer. However, accurate error-free measurement of microsatellite lengths is difficult with current methods precisely because of this high error rate during amplification. We have solved this problem by using partial mutagenesis to disrupt enough of the repeat structure of initial templates so that their sequence lengths replicate faithfully. In this work, we use bisulfite mutagenesis to convert a C to a U, later read as T. Compared to untreated templates, we achieve three orders of magnitude reduction in the error rate per round of replication. By requiring agreement from two independent first copies of an initial template, we reach error rates below one in a million. We apply this method to a thousand microsatellite loci from the human genome, revealing microsatellite length distributions not observable without mutagenesis.
Affiliation(s)
- Peter Andrews
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Dan Levy
- To whom correspondence should be addressed. Tel: +1 516 367 5039; Fax: +1 516 367 8381

15
Fang L, Liu Q, Monteys AM, Gonzalez-Alegre P, Davidson BL, Wang K. DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing. Genome Biol 2022; 23:108. [PMID: 35484600 PMCID: PMC9052667 DOI: 10.1186/s13059-022-02670-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/04/2021] [Accepted: 04/08/2022] [Indexed: 12/12/2022]
Abstract
Despite recent improvements in basecalling accuracy, nanopore sequencing still has higher error rates on short-tandem repeats (STRs). Instead of using basecalled reads, we developed DeepRepeat which converts ionic current signals into red-green-blue channels, thus transforming the repeat detection problem into an image recognition problem. DeepRepeat identifies and accurately quantifies telomeric repeats in the CHM13 cell line and achieves higher accuracy in quantifying repeats in long STRs than competing methods. We also evaluate DeepRepeat on genome-wide or candidate region datasets from seven different sources. In summary, DeepRepeat enables accurate quantification of long STRs and complements existing methods relying on basecalled reads.
Affiliation(s)
- Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- School of Life Sciences, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA
- Nevada Institute of Personalized Medicine, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA
- Alex Mas Monteys
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Pedro Gonzalez-Alegre
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Beverly L Davidson
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA

16
Han J, Munro JE, Kocoski A, Barry AE, Bahlo M. Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species. PLoS Genet 2022; 18:e1009604. [PMID: 35007277 PMCID: PMC8782505 DOI: 10.1371/journal.pgen.1009604] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/20/2021] [Revised: 01/21/2022] [Accepted: 12/14/2021] [Indexed: 11/18/2022]
Abstract
Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax published whole-genome sequence data from samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) were used to study Plasmodium genetic diversity, population structures and genomic signatures of selection and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax have been available in an interactive web-based R Shiny application PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).
Affiliation(s)
- Jiru Han
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Jacob E. Munro
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Anthony Kocoski
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
- Alyssa E. Barry
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Disease Elimination Program, Burnet Institute, Melbourne, Australia
- IMPACT Institute for Innovation in Mental and Physical Health and Clinical Translation, Deakin University, Geelong, Australia
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Australia

17
An Introductory Overview of Open-Source and Commercial Software Options for the Analysis of Forensic Sequencing Data. Genes (Basel) 2021; 12:genes12111739. [PMID: 34828345 PMCID: PMC8618049 DOI: 10.3390/genes12111739] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Revised: 10/27/2021] [Accepted: 10/27/2021] [Indexed: 12/30/2022]
Abstract
The top challenges of adopting new methods to forensic DNA analysis in routine laboratories are often the capital investment and the expertise required to implement and validate such methods locally. In the case of next-generation sequencing, in the last decade, several specifically forensic commercial options became available, offering reliable and validated solutions. Despite this, the readily available expertise to analyze, interpret and understand such data is still perceived to be lagging behind. This review gives an introductory overview for the forensic scientists who are at the beginning of their journey with implementing next-generation sequencing locally and because most in the field do not have a bioinformatics background may find it difficult to navigate the new terms and analysis options available. The currently available open-source and commercial software for forensic sequencing data analysis are summarized here to provide an accessible starting point for those fairly new to the forensic application of massively parallel sequencing.

18
Species Delimitation and Conservation in Taxonomically Challenging Lineages: The Case of Two Clades of Capurodendron (Sapotaceae) in Madagascar. PLANTS 2021; 10:plants10081702. [PMID: 34451747 PMCID: PMC8400537 DOI: 10.3390/plants10081702] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/30/2021] [Revised: 08/11/2021] [Accepted: 08/11/2021] [Indexed: 01/23/2023]
Abstract
Capurodendron is the largest endemic genus of plants from Madagascar, with around 76% of its species threatened by deforestation and illegal logging. However, some species are not well circumscribed and many of them remain undescribed, impeding a confident evaluation of their conservation status. Here we focus on taxa delimitation and conservation of two species complexes within Capurodendron: the Arid and Western complexes, each containing undescribed morphologies as well as intermediate specimens alongside well-delimited taxa. To solve these taxonomic issues, we studied 381 specimens morphologically and selected 85 of them to obtain intergenic, intronic, and exonic protein-coding sequences of 794 nuclear genes and 227 microsatellite loci. These data were used to test species limits and putative hybrid patterns using different approaches such as phylogenies, PCA, structure analyses, heterozygosity level, FST, and ABBA-BABA tests. The potential distributions were furthermore estimated for each inferred species. The results show that the Capurodendron Western Complex contains three well-delimited species, C. oblongifolium, C. perrieri, and C. pervillei, the first two hybridizing sporadically with the last and producing morphologies similar to, but genetically distinct from C. pervillei. The Arid Complex shows a more intricate situation, as it contains three species morphologically well-delimited but genetically intermixed. Capurodendron mikeorum nom. prov. is shown to be an undescribed species with a restricted distribution, while C. androyense and C. mandrarense have wider and mostly sympatric distributions. Each of the latter two species contains two major genetic pools, one showing interspecific admixture in areas where both taxa coexist, and the other being less admixed and comprising allopatric populations having fewer contacts with the other species. 
Only two specimens out of 172 showed clear genetic and morphological signals of recent hybridization, while all the others were morphologically well-delimited, independent of their degree of genetic admixture. Hybridization between Capurodendron androyense and C. microphyllum, the sister species of the Arid Complex, was additionally detected in areas where both species coexist, producing intermediate morphologies. Among the two complexes, species are well-defined morphologically with the exception of seven specimens (1.8%) displaying intermediate patterns and genetic signals compatible with a F1 hybridization. A provisional conservation assessment for each species is provided.

19
Rajan-Babu IS, Peng JJ, Chiu R, Li C, Mohajeri A, Dolzhenko E, Eberle MA, Birol I, Friedman JM. Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions. Genome Med 2021; 13:126. [PMID: 34372915 PMCID: PMC8351082 DOI: 10.1186/s13073-021-00932-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 09/03/2020] [Accepted: 07/05/2021] [Indexed: 02/01/2023]
Abstract
Background: Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and demonstrate its clinical utility.
Methods: We characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and genomes simulated to harbor repeat expansions at selected loci and optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders.
Results: We identified 10 individuals with full-mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions between the established repeat size range for reduced-penetrance and full-penetrance full-mutation and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify aberrant expansion. We performed clinical PCR tests on seven (out of 10) full-mutation samples identified by our pipeline and confirmed the expansion status in all, showing absolute concordance between our bioinformatics and molecular findings.
Conclusions: We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders.
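The F1-score reported for the ensemble follows from the stated precision and recall, since F1 is their harmonic mean. As a quick consistency check, with P and R denoting the precision (0.82) and recall (1.00) given in the abstract:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.82 \times 1.00}{0.82 + 1.00}
    = \frac{1.64}{1.82}
    \approx 0.90
```

This rounds to the 90% quoted above.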
Affiliation(s)
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
- Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Junran J Peng
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada
- Chenkai Li
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, V6T1Z4, Canada
- Arezoo Mohajeri
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
- Inanc Birol
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z4S6, Canada
- Jan M Friedman
- Department of Medical Genetics, University of British Columbia and Children's & Women's Hospital, Vancouver, BC, V6H3N1, Canada

20
Microsatellites as Agents of Adaptive Change: An RNA-Seq-Based Comparative Study of Transcriptomes from Five Helianthus Species. Symmetry (Basel) 2021. [DOI: 10.3390/sym13060933] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Mutations that provide environment-dependent selective advantages drive adaptive divergence among species. Many phenotypic differences among related species are more likely to result from gene expression divergence rather than from non-synonymous mutations. In this regard, cis-regulatory mutations play an important part in generating functionally significant variation. Some proposed mechanisms that explore the role of cis-regulatory mutations in gene expression divergence involve microsatellites. Microsatellites exhibit high mutation rates achieved through symmetric or asymmetric mutation processes and are abundant in both coding and non-coding regions in positions that could influence gene function and products. Here we tested the hypothesis that microsatellites contribute to gene expression divergence among species with 50 individuals from five closely related Helianthus species using an RNA-seq approach. Differential expression analyses of the transcriptomes revealed that genes containing microsatellites in non-coding regions (UTRs and introns) are more likely to be differentially expressed among species when compared to genes with microsatellites in the coding regions and transcripts lacking microsatellites. We detected a greater proportion of shared microsatellites in 5′UTRs and coding regions compared to 3′UTRs and non-coding transcripts among Helianthus spp. Furthermore, allele frequency differences measured by pairwise FST at single nucleotide polymorphisms (SNPs), indicate greater genetic divergence in transcripts containing microsatellites compared to those lacking microsatellites. A gene ontology (GO) analysis revealed that microsatellite-containing differentially expressed genes are significantly enriched for GO terms associated with regulation of transcription and transcription factor activity. Collectively, our study provides compelling evidence to support the role of microsatellites in gene expression divergence.

21
Genome-wide simple sequence repeats (SSR) markers discovered from whole-genome sequence comparisons of multiple spinach accessions. Sci Rep 2021; 11:9999. [PMID: 33976335 PMCID: PMC8113571 DOI: 10.1038/s41598-021-89473-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/13/2020] [Accepted: 04/13/2021] [Indexed: 02/03/2023]
Abstract
The availability of well-assembled genome sequences and reduced sequencing costs have enabled the resequencing of many additional accessions in several crops, thus facilitating the rapid discovery and development of simple sequence repeat (SSR) markers. Although the genome sequence of inbred spinach line Sp75 is available, previous efforts have resulted in a limited number of useful SSR markers. Identification of additional polymorphic SSR markers will support genetics and breeding research in spinach. This study aimed to use the available genomic resources to mine and catalog a large number of polymorphic SSR markers. A search for SSR loci on six chromosome sequences of spinach line Sp75 using GMATA identified a total of 42,155 loci with repeat motifs of two to six nucleotides in the Sp75 reference genome. Whole-genome sequences (30x) of additional 21 accessions were aligned against the chromosome sequences of the reference genome and in silico genotyped using the HipSTR program by comparing and counting repeat numbers variation across the SSR loci among the accessions. The HipSTR program generated SSR genotype data were filtered for monomorphic and high missing loci, and a final set of the 5986 polymorphic SSR loci were identified. The polymorphic SSR loci were present at a density of 12.9 SSRs/Mb and were physically mapped. Out of 36 randomly selected SSR loci for validation, two failed to amplify, while the remaining were all polymorphic in a set of 48 spinach accessions from 34 countries. Genetic diversity analysis performed using the SSRs allele score data on the 48 spinach accessions showed three main population groups. This strategy to mine and develop polymorphic SSR markers by a comparative analysis of the genome sequences of multiple accessions and computational genotyping of the candidate SSR loci eliminates the need for laborious experimental screening. 
Our approach increased the efficiency of discovering a large set of novel polymorphic SSR markers, as demonstrated in this report.
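To make the SSR-mining step in this abstract concrete, the sketch below scans a DNA string for perfect tandem repeats with motifs of two to six nucleotides, the motif range stated above. It is a minimal regular-expression illustration, not the algorithm used by GMATA or HipSTR: the function name `find_perfect_ssrs` and the `min_repeats` threshold of three are assumptions chosen for the example.

```python
import re


def find_perfect_ssrs(seq, min_motif=2, max_motif=6, min_repeats=3):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Hypothetical helper for illustration. The lazy quantifier makes the
    regex prefer the shortest motif, so ACACAC is reported as (AC)3
    rather than a longer unit. Caveat: a homopolymer run such as AAAAAA
    is reported with the dinucleotide motif AA.
    """
    pattern = re.compile(
        rf"([ACGT]{{{min_motif},{max_motif}}}?)\1{{{min_repeats - 1},}}"
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    example = "GG" + "AC" * 5 + "CC" + "GAT" * 4 + "C"
    for start, motif, count in find_perfect_ssrs(example):
        print(start, motif, count)
```

On the example string this reports an (AC)5 repeat at position 2 and a (GAT)4 repeat at position 14. Comparing repeat counts at the same locus across accessions is what would mark a locus as polymorphic, which the abstract describes as done with HipSTR on aligned reads rather than by direct string scanning.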

22
Bhattarai G, Shi A, Kandel DR, Solís-Gracia N, da Silva JA, Avila CA. Genome-wide simple sequence repeats (SSR) markers discovered from whole-genome sequence comparisons of multiple spinach accessions. Sci Rep 2021. [PMID: 33976335 DOI: 10.1038/s41598-021-89472-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 04/21/2023]
Affiliation(s)
- Gehendra Bhattarai
- Department of Horticulture, University of Arkansas, Fayetteville, AR, 72701, USA
- Ainong Shi
- Department of Horticulture, University of Arkansas, Fayetteville, AR, 72701, USA
- Devi R Kandel
- Texas A&M AgriLife Research and Extension Center, Weslaco, TX, 78596, USA
- Nora Solís-Gracia
- Texas A&M AgriLife Research and Extension Center, Weslaco, TX, 78596, USA
- Jorge Alberto da Silva
- Texas A&M AgriLife Research and Extension Center, Weslaco, TX, 78596, USA
- Department of Crop and Soil Sciences, Texas A&M University, College Station, TX, 77843, USA
- Carlos A Avila
- Texas A&M AgriLife Research and Extension Center, Weslaco, TX, 78596, USA
- Department of Horticultural Sciences, Texas A&M University, College Station, TX, 77843, USA

23
Kinney N, Kang L, Bains H, Lawson E, Husain M, Husain K, Sandhu I, Shin Y, Carter JK, Anandakrishnan R, Michalak P, Garner H. Ethnically biased microsatellites contribute to differential gene expression and glutathione metabolism in Africans and Europeans. PLoS One 2021; 16:e0249148. [PMID: 33765058 PMCID: PMC7993785 DOI: 10.1371/journal.pone.0249148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/14/2021] [Accepted: 03/11/2021] [Indexed: 12/28/2022]
Abstract
Approximately three percent of the human genome is occupied by microsatellites: a type of short tandem repeat (STR). Microsatellites have well established effects on (a) the genetic structure of diverse human populations and (b) expression of nearby genes. These lines of inquiry have uncovered 3,984 ethnically biased microsatellite loci (EBML) and 28,375 expression STRs (eSTRs), respectively. We hypothesize that a combination of EBML, eSTRs, and gene expression data (RNA-seq) can be used to show that microsatellites contribute to differential gene expression and phenotype in human populations. In fact, our previous study demonstrated a degree of mutual overlap between EBML and eSTRs but fell short of quantifying effects on gene expression. The present work aims to narrow the gap. First, we identify 313 overlapping EBML/eSTRs and recapitulate their mutual overlap. The 313 EBML/eSTRs are then characterized across ethnicity and tissue type. We use RNA-seq data to pursue validation of 49 regions that affect whole blood gene expression; 32 out of 54 affected genes are differentially expressed in Africans and Europeans. We quantify the relative contribution of these 32 genes to differential expression; fold change tends to be less than other differentially expressed genes. Repeat length correlates with expression for 15 of the 32 genes; two are conspicuously involved in glutathione metabolism. Finally, we repurpose a mathematical model of glutathione metabolism to investigate how a single polymorphic microsatellite affects phenotype. We conclude with a testable prediction that microsatellite polymorphisms affect GPX7 expression and oxidative stress in Africans and Europeans.
Affiliation(s)
- Nick Kinney
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, South Carolina, United States of America
- Lin Kang
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, South Carolina, United States of America
- Harpal Bains
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Elizabeth Lawson
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Mesam Husain
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Kumayl Husain
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Inderjit Sandhu
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Yongdeok Shin
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Javan K. Carter
- University of Colorado Boulder, Boulder, Colorado, United States of America
- Ramu Anandakrishnan
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, South Carolina, United States of America
- Pawel Michalak
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, South Carolina, United States of America
- Institute of Evolution, University of Haifa, Haifa, Israel
- Harold Garner
- Edward Via College of Osteopathic Medicine, Blacksburg, Virginia, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, South Carolina, United States of America

24
Song X, Yang T, Zhang X, Yuan Y, Yan X, Wei Y, Zhang J, Zhou C. Comparison of the Microsatellite Distribution Patterns in the Genomes of Euarchontoglires at the Taxonomic Level. Front Genet 2021; 12:622724. [PMID: 33719337 PMCID: PMC7953163 DOI: 10.3389/fgene.2021.622724] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/29/2020] [Accepted: 02/05/2021] [Indexed: 02/05/2023]
Abstract
Microsatellite or simple sequence repeat (SSR) instability within genes can induce genetic variation. The SSR signatures remain largely unknown in different clades within Euarchontoglires, one of the most successful mammalian radiations. Here, we conducted a genome-wide characterization of microsatellite distribution patterns at different taxonomic levels in 153 Euarchontoglires genomes. Our results showed that the abundance and density of the SSRs were significantly positively correlated with primate genome size, but no significant relationship with the genome size of rodents was found. Furthermore, a higher level of complexity for perfect SSR (P-SSR) attributes was observed in rodents than in primates. The most frequent type of P-SSR was the mononucleotide P-SSR in the genomes of primates, tree shrews, and colugos, while mononucleotide or dinucleotide motif types were dominant in the genomes of rodents and lagomorphs. Furthermore, (A)n was the most abundant motif in primate genomes, but (A)n, (AC)n, or (AG)n was the most abundant motif in rodent genomes which even varied within the same genus. The GC content and the repeat copy numbers of P-SSRs varied in different species when compared at different taxonomic levels, reflecting underlying differences in SSR mutation processes. Notably, the CDSs containing P-SSRs were categorized by functions and pathways using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes annotations, highlighting their roles in transcription regulation. Generally, this work will aid future studies of the functional roles of the taxonomic features of microsatellites during the evolution of mammals in Euarchontoglires.
Affiliation(s)
- Xuhao Song
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China.,Institute of Ecology, China West Normal University, Nanchong, China
| | - Tingbang Yang
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China.,Institute of Ecology, China West Normal University, Nanchong, China
| | - Xinyi Zhang
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
| | - Ying Yuan
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
| | - Xianghui Yan
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
| | - Yi Wei
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China.,Institute of Ecology, China West Normal University, Nanchong, China
| | - Jun Zhang
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China.,Institute of Ecology, China West Normal University, Nanchong, China
| | - Caiquan Zhou
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China.,Institute of Ecology, China West Normal University, Nanchong, China
| |
|
25
|
New challenges, new opportunities: Next generation sequencing and its place in the advancement of HLA typing. Hum Immunol 2021; 82:478-487. [PMID: 33551127 DOI: 10.1016/j.humimm.2021.01.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 04/17/2020] [Revised: 12/29/2020] [Accepted: 01/18/2021] [Indexed: 02/07/2023]
Abstract
The Human Leukocyte Antigen (HLA) system has a critical role in immunorecognition, transplantation, and disease association. Early typing techniques provided the foundation for genotyping methods that revealed HLA as one of the most complex, polymorphic regions of the human genome. Next Generation Sequencing (NGS), the latest molecular technology introduced in clinical tissue typing laboratories, has demonstrated advantages over other established methods. NGS offers high-resolution sequencing of entire genes in time frames and price points considered unthinkable just a few years ago, contributing a wealth of data informing histocompatibility assessment and standards of clinical care. Although the NGS platforms share a high-throughput massively parallel processing model, differing chemistries provide specific strengths and weaknesses. Research-oriented Third Generation Sequencing and related advances in bioengineering continue to broaden the future of NGS in clinical settings. These diverse applications have demanded equally innovative strategies for data management and computational bioinformatics to support and analyze the unprecedented volume and complexity of data generated by NGS. We discuss some of the challenges and opportunities associated with NGS technologies, providing a comprehensive picture of the historical developments that paved the way for the NGS revolution, its current state and future possibilities for HLA typing.
|
26
|
Roy D, Lehnert SJ, Venney CJ, Walter R, Heath DD. NGS-μsat: bioinformatics framework supporting high throughput microsatellite genotyping from next generation sequencing platforms. CONSERV GENET RESOUR 2021. [DOI: 10.1007/s12686-020-01186-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
|
27
|
Liu Q, Tong Y, Wang K. Genome-wide detection of short tandem repeat expansions by long-read sequencing. BMC Bioinformatics 2020; 21:542. [PMID: 33371889 PMCID: PMC7768641 DOI: 10.1186/s12859-020-03876-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/10/2020] [Accepted: 11/13/2020] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND: A short tandem repeat (STR), or "microsatellite", is a tract of DNA in which a specific motif (typically < 10 base pairs) is repeated multiple times. STRs are abundant throughout the human genome, and specific repeat expansions may be associated with human diseases. Long-read sequencing coupled with bioinformatics tools enables the estimation of repeat counts for STRs. However, with the exception of a few well-known disease-relevant STRs, normal ranges of repeat counts for most STRs in human populations are not well known, preventing the prioritization of STRs that may be associated with human diseases. RESULTS: In this study, we extend the computational tool RepeatHMM to infer normal ranges of 432,604 STRs using 21 long-read sequencing datasets on human genomes, and build a genomic-scale database called RepeatHMM-DB with normal repeat ranges for these STRs. Evaluation on 13 well-known repeats shows that the inferred repeat ranges provide good estimates of the repeat ranges reported in the literature from population-scale studies. This database, together with a repeat expansion estimation tool such as RepeatHMM, enables genomic-scale scanning of repeat regions in newly sequenced genomes to identify disease-relevant repeat expansions. As a case study of using RepeatHMM-DB, we evaluate the CAG repeats of ATXN3 for 20 patients with spinocerebellar ataxia type 3 (SCA3) and 5 unaffected individuals, and correctly classify each individual. CONCLUSIONS: In summary, RepeatHMM-DB can facilitate prioritization and identification of disease-relevant STRs from whole-genome long-read sequencing data on patients with undiagnosed diseases. RepeatHMM-DB is incorporated into RepeatHMM and is available at https://github.com/WGLab/RepeatHMM.
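To make the notion of a repeat count concrete, the following is a minimal sketch that counts the longest uninterrupted run of a motif in a single error-free sequence. It is an illustration only and is not how RepeatHMM estimates repeat counts from error-prone long reads; the function name `longest_repeat_run` and the toy read are hypothetical.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    # Match one or more consecutive, uninterrupted copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Hypothetical toy read: short flanks around five uninterrupted CAG copies.
read = "TTGA" + "CAG" * 5 + "GGCT"
print(longest_repeat_run(read, "CAG"))
```

A count obtained this way could then be compared against a normal range for the locus; exact matching like this breaks down as soon as a read contains sequencing errors or interrupted repeats, which is the problem dedicated tools address.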
Affiliation(s)
- Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Yao Tong
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
28
|
Bolognini D, Magi A, Benes V, Korbel JO, Rausch T. TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data. Gigascience 2020; 9:giaa101. [PMID: 33034633 PMCID: PMC7539535 DOI: 10.1093/gigascience/giaa101] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/04/2020] [Revised: 08/07/2020] [Accepted: 09/07/2020] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. RESULTS: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. CONCLUSIONS: TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.
Affiliation(s)
- Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Viale Pieraccini 6, Florence 50134, Italy
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Alberto Magi
- Department of Information Engineering, University of Florence, Via di S. Marta 3, Florence 50134, Italy
| | - Vladimir Benes
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
| |
|
29
|
Pimentel JDSM, Ludwig S, Resende LC, Brandão-Dias PFP, Pereira AH, de Abreu NL, Rosse IC, Martins APV, Facchin S, Lopes JDM, Santos GB, Alves CBM, Kalapothakis E. Genetic evaluation of migratory fish: Implications for conservation and stocking programs. Ecol Evol 2020; 10:10314-10324. [PMID: 33072261 PMCID: PMC7548202 DOI: 10.1002/ece3.6231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/17/2019] [Revised: 12/09/2019] [Accepted: 02/19/2020] [Indexed: 02/04/2023] Open
Abstract
Fish stocking programs have been implemented to mitigate the blockage of original riverbeds by the construction of hydropower dams, which affects the natural migration of fish populations. However, this method raises concerns regarding the genetic rescue of the original populations of migratory fish species. We investigated the spatial distribution of genetic properties, such as genetic diversity, population structure, and gene flow (migration), of the Neotropical migratory fish Prochilodus costatus in the Três Marias dam in the São Francisco River basin, Brazil, and examined the possible effects of fish stocking programs on P. costatus populations in this region. In total, 1,017 specimens were sampled from 12 natural sites and a fish stocking program, and genotyped for high‐throughput sequencing at 8 microsatellite loci. The populations presented low genetic variability, with evidence of inbreeding and the presence of only four genetic pools; three pools were observed throughout the study region, and the fourth was exclusive to one area in the Paraopeba River. Additionally, we identified high unidirectional gene flow between regions, and a preferred migratory route between the Pará River and the upper portion of the São Francisco River. The fish stocking program succeeded in transposing the genetic pools from downstream to upstream of the Três Marias dam, but, regrettably, promoted genetic homogenization in the upper São Francisco River basin. Moreover, the data show the fragility of this species at the genetic level. This monitoring strategy could be a model for the development of conservation and management measures for migratory fish populations that are consumed by humans.
Affiliation(s)
- Juliana da Silva Martins Pimentel
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil.,Pitágoras College Belo Horizonte Brazil
| | - Sandra Ludwig
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil.,Department of Zoology Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil
| | - Leonardo Cardoso Resende
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil
| | - Pedro Ferreira Pinto Brandão-Dias
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil
| | - Adriana Heloísa Pereira
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil
| | - Nazaré Lúcio de Abreu
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil
| | - Izinara Cruz Rosse
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil.,Department of Pharmacy Federal University of Ouro Preto Ouro Preto Brazil
| | - Ana Paula Vimieiro Martins
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil
| | - Susanne Facchin
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil
| | | | | | | | - Evanguedes Kalapothakis
- Department of Genetic, Ecology and Evolution Institute of Biological Sciences Federal University of Minas Gerais Belo Horizonte Brazil
| |
|
30
|
Mode and Tempo of Microsatellite Evolution across 300 Million Years of Insect Evolution. Genes (Basel) 2020; 11:genes11080945. [PMID: 32824315 PMCID: PMC7464534 DOI: 10.3390/genes11080945] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/12/2020] [Revised: 08/11/2020] [Accepted: 08/14/2020] [Indexed: 01/02/2023] Open
Abstract
Microsatellites are short, repetitive DNA sequences that can rapidly expand and contract due to slippage during DNA replication. Despite their impacts on transcription, genome structure, and disease, relatively little is known about the evolutionary dynamics of these short sequences across long evolutionary periods. To address this gap in our knowledge, we performed comparative analyses of 304 available insect genomes. We investigated the impact of sequence assembly methods and assembly quality on the inference of microsatellite content, and we explored the influence of chromosome type and number on the tempo and mode of microsatellite evolution across one of the most speciose clades on the planet. Diploid chromosome number had no impact on the rate of microsatellite evolution or the amount of microsatellite content in genomes. We found that centromere type (holocentric or monocentric) is not associated with a difference in the amount of microsatellite content; however, in those species with monocentric chromosomes, microsatellite content tends to evolve faster than in species with holocentric chromosomes.
|
31
|
The Potential of HTS Approaches for Accurate Genotyping in Grapevine (Vitis vinifera L.). Genes (Basel) 2020; 11:genes11080917. [PMID: 32785184 PMCID: PMC7464945 DOI: 10.3390/genes11080917] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/08/2020] [Revised: 08/03/2020] [Accepted: 08/06/2020] [Indexed: 11/16/2022] Open
Abstract
The main challenge associated with genotyping based on conventional length polymorphisms is the cross-laboratory standardization of allele sizes. This step requires the inclusion of standards and manual sizing to avoid false results. Capillary electrophoresis (CE) approaches limit the information to the length polymorphism and do not allow the determination of a complete marker sequence. As an alternative, high-throughput sequencing (HTS) offers complete information regarding marker sequences and their flanking regions. In this work, we investigated the suitability of a semi-quantitative sequencing approach for microsatellite genotyping using Illumina paired-end technology. Twelve microsatellite loci that are well established for grapevine CE typing were analysed on 96 grapevine samples from six different countries. We redesigned the primers to shorten the amplicons (~100 bp) for short-read sequencing. Each primer pair was flanked with a 10 bp overhang for the introduction of barcodes on both sides of the amplicon to enable high multiplexing. The highest data peaks were determined as simple sequence repeat (SSR) alleles and compared with the CE dataset based on 12 reference samples. The comparison showed that HTS SSR genotyping can successfully replace the CE system in further experiments. We believe that, with next-generation sequencing, genotyping can be improved in terms of its speed, accuracy, and price.
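The idea of taking the highest peaks as SSR alleles can be sketched as follows. This is a hedged illustration, not the procedure used in the study: the function name `call_ssr_alleles`, the use of amplicon length as the allele label, and the 0.5 ratio threshold for accepting a second allele are all assumptions made for the example.

```python
from collections import Counter


def call_ssr_alleles(read_lengths, min_second_peak_ratio=0.5):
    """Call a diploid genotype from amplicon read lengths at one locus.

    The most frequent length is taken as the first allele. The second most
    frequent length is accepted as a second allele only if its read count is
    at least `min_second_peak_ratio` times the top count (assumed threshold);
    otherwise the sample is called homozygous.
    """
    counts = Counter(read_lengths)
    if not counts:
        return ()
    ranked = counts.most_common(2)
    top_len, top_n = ranked[0]
    if len(ranked) == 2 and ranked[1][1] >= min_second_peak_ratio * top_n:
        return tuple(sorted((top_len, ranked[1][0])))
    return (top_len, top_len)


# Hypothetical read-length data for two samples at one locus.
heterozygous_sample = [102] * 40 + [106] * 35 + [104] * 5
homozygous_sample = [98] * 50 + [96] * 6  # small 96 bp peak treated as noise

print(call_ssr_alleles(heterozygous_sample))
print(call_ssr_alleles(homozygous_sample))
```

In practice the threshold would need tuning per locus, because stutter products and unequal amplification of alleles both distort peak heights.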
|
32
|
Yu H, Zhao S, Ness S, Kang H, Sheng Q, Samuels DC, Oyebamiji O, Zhao YY, Guo Y. Non-canonical RNA-DNA differences and other human genomic features are enriched within very short tandem repeats. PLoS Comput Biol 2020; 16:e1007968. [PMID: 32511223 PMCID: PMC7302867 DOI: 10.1371/journal.pcbi.1007968] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/02/2019] [Revised: 06/18/2020] [Accepted: 05/19/2020] [Indexed: 11/19/2022] Open
Abstract
Very short tandem repeats bear substantial genetic, evolutionary, and pathological significance in genome analyses. Here, we compiled a census of tandem mono-nucleotide/di-nucleotide/tri-nucleotide repeats (MNRs/DNRs/TNRs) in GRCh38, which we collectively term "polytracts". Of the human genome, 144.4 million nucleotides (4.7%) are occupied by polytracts, and 0.47 million single nucleotides are identified as polytract hinges, i.e., break-points of tandem polytracts. Preliminary exploration of the census suggested that polytract hinge sites and boundaries of AAC polytracts may bear a higher mapping error rate than other polytract regions. Further, we revealed landscapes of polytract enrichment with respect to nearly a hundred genomic features. We found that MNRs, DNRs, and TNRs displayed noticeable differences in locational enrichment for miscellaneous genomic features, especially RNA editing events. Non-canonical and C-to-U RNA-editing events are enriched inside and/or adjacent to MNRs, while all categories of RNA-editing events are under-represented in DNRs. A-to-I RNA-editing events are generally under-represented in polytracts. The selective enrichment of non-canonical RNA-editing events within MNR adjacency provides evidence against their authenticity. To enable similar locational enrichment analyses in relation to polytracts, we developed the software Polytrap, which can handle 11 reference genomes. Additionally, we compiled polytracts of four model organisms into a Track Hub, which can be integrated into the UCSC Genome Browser as an official track for convenient visualization of polytracts.
Affiliation(s)
- Hui Yu
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
| | - Shilin Zhao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Scott Ness
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
| | - Huining Kang
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
| | - Quanhu Sheng
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - David C. Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Olufunmilola Oyebamiji
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
| | - Ying-yong Zhao
- Key Laboratory of Resource Biology and Biotechnology in Western China, School of Life Sciences, Northwest University, Xi'an, Shaanxi, China
| | - Yan Guo
- Comprehensive Cancer Center, University of New Mexico, Albuquerque, New Mexico, United States of America
| |
|
33
|
Rocca MS, Ferrarini M, Msaki A, Vinanzi C, Ghezzi M, De Rocco Ponce M, Foresta C, Ferlin A. Comparison of NGS panel and Sanger sequencing for genotyping CAG repeats in the AR gene. Mol Genet Genomic Med 2020; 8:e1207. [PMID: 32216057 PMCID: PMC7284049 DOI: 10.1002/mgg3.1207] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/23/2019] [Revised: 02/19/2020] [Accepted: 02/22/2020] [Indexed: 12/30/2022] Open
Abstract
Background: The androgen receptor (AR) is a nuclear receptor encoded by the AR gene on the X chromosome. Within the first exon of the AR gene, two short tandem repeats (STRs), CAG and GGC, are a source of polymorphism in the population. Therefore, high‐throughput methods for screening AR, such as next‐generation sequencing (NGS), are sought after; however, data generated by NGS are limited by the availability of bioinformatics tools. Here, we evaluated the accuracy of the bioinformatics tool HipSTR in detecting and quantifying CAG repeats within the AR gene. Method: The AR gene of 228 infertile men was sequenced using an NGS gene panel. Data generated were analyzed with HipSTR to detect CAG repeats. The accuracy was compared with the results obtained with Sanger sequencing. Results: We found that HipSTR was more accurate than Sanger sequencing in genotyping men with a normal karyotype (46,XY); however, it was more likely to misidentify homozygote genotypes in men with Klinefelter syndrome (47,XXY). Conclusion: Our findings show that the bioinformatics tool HipSTR is 100% accurate in detecting and assessing AR CAG repeats in infertile men (46,XY) as well as in men with low‐level mosaicism.
Affiliation(s)
- Maria Santa Rocca
- Unit of Andrology and Reproductive Medicine Department of Medicine University of Padua Padua Italy
| | - Margherita Ferrarini
- Unit of Andrology and Reproductive Medicine Department of Medicine University of Padua Padua Italy
| | - Aichi Msaki
- Unit of Andrology and Reproductive Medicine Department of Medicine University of Padua Padua Italy
| | - Cinzia Vinanzi
- Unit of Andrology and Reproductive Medicine Department of Medicine University of Padua Padua Italy
| | - Marco Ghezzi
- Unit of Andrology and Reproductive Medicine Department of Medicine University of Padua Padua Italy
| | - Maurizio De Rocco Ponce
- Unit of Andrology and Reproductive Medicine Department of Medicine University of Padua Padua Italy
| | - Carlo Foresta
- Unit of Andrology and Reproductive Medicine Department of Medicine University of Padua Padua Italy
| | - Alberto Ferlin
- Department of Clinical and Experimental Sciences University of Brescia Brescia Italy
| |
|
34
|
Ranathunge C, Wheeler GL, Chimahusky ME, Perkins AD, Pramod S, Welch ME. Transcribed microsatellite allele lengths are often correlated with gene expression in natural sunflower populations. Mol Ecol 2020; 29:1704-1716. [PMID: 32285554 DOI: 10.1111/mec.15440] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/16/2018] [Revised: 03/15/2020] [Accepted: 04/02/2020] [Indexed: 12/23/2022]
Abstract
Microsatellites are common in genomes of most eukaryotic species. Due to their high mutability, an adaptive role for microsatellites has been considered. However, little is known concerning the contribution of microsatellites towards phenotypic variation. We used populations of the common sunflower (Helianthus annuus) at two latitudes to quantify the effect of microsatellite allele length on phenotype at the level of gene expression. We conducted a common garden experiment with seed collected from sunflower populations in Kansas and Oklahoma followed by an RNA-Seq experiment on 95 individuals. The effect of microsatellite allele length on gene expression was assessed across 3,325 microsatellites that could be consistently scored. Our study revealed 479 microsatellites at which allele length significantly correlates with gene expression (eSTRs). When irregular allele sizes not conforming to the motif length were removed, the number of eSTRs rose to 2,379. The percentage of variation in gene expression explained by eSTRs ranged from 1%-86% when controlling for population and allele-by-population interaction effects at the 479 eSTRs. Of these eSTRs, 70.4% are in untranslated regions (UTRs). A gene ontology (GO) analysis revealed that eSTRs are significantly enriched for GO terms associated with cis- and trans-regulatory processes. Our findings suggest that a substantial number of transcribed microsatellites can influence gene expression.
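As a rough illustration of what it means for allele length to correlate with gene expression, the sketch below computes a plain Pearson correlation for one microsatellite. It is an assumption-laden simplification: the study controlled for population and allele-by-population interaction effects, which this example does not, and the function name and the data values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: mean allele length (in repeat units) per individual and
# a normalised expression value for the gene carrying the microsatellite.
allele_lengths = [10, 12, 14, 16]
expression = [1.0, 2.0, 3.0, 4.0]

r = pearson_r(allele_lengths, expression)
print(r, r ** 2)  # r squared is the share of variance explained in a simple linear fit
```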
Affiliation(s)
- Chathurani Ranathunge
- Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
| | - Gregory L Wheeler
- Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
| | - Melody E Chimahusky
- Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
| | - Andy D Perkins
- Department of Computer Science and Engineering, Mississippi State University, Starkville, MS, USA
| | - Sreepriya Pramod
- Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
| | - Mark E Welch
- Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
| |
|
35
|
Abstract
Background: Short tandem repeats are an important source of genetic variation. They are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeat genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. Methods: The analysis was performed on the Simons Simplex Collection dataset, where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total, we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. Results: We determined a relatively good performance of all tools when genotyping repeats of 3-6 bp in length, which could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools, and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by the AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. Conclusions: All tools have different strengths and weaknesses, and the choice may depend on the application. In this analysis, we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls.
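The evaluation idea described above rests on the fact that males carry a single X chromosome, so a heterozygous call at an X-linked repeat in a male sample can be counted as an error. A minimal sketch of that metric follows; the function name and the genotype tuples are hypothetical, and the sketch assumes that loci in the pseudoautosomal regions (where males do have two copies) have already been excluded.

```python
def heterozygous_error_rate(calls):
    """Fraction of heterozygous genotype calls among male X-chromosome calls.

    `calls` is an iterable of (allele_a, allele_b) repeat-count pairs.
    Assumes pseudoautosomal loci were removed beforehand, so every
    heterozygous call is treated as a genotyping error.
    """
    calls = list(calls)
    if not calls:
        raise ValueError("no genotype calls supplied")
    heterozygous = sum(1 for allele_a, allele_b in calls if allele_a != allele_b)
    return heterozygous / len(calls)


# Hypothetical calls from one tool on male chrX repeats: one of four is heterozygous.
male_chrx_calls = [(12, 12), (15, 15), (9, 10), (20, 20)]
print(heterozygous_error_rate(male_chrx_calls))
```

Note that this measures only one kind of error: a call that is homozygous but has the wrong repeat count would still be scored as correct.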
Affiliation(s)
- Andreas Halman
- Murdoch Children’s Research Institute, Royal Children’s Hospital, Parkville, VIC, 3052, Australia
- Peter MacCallum Cancer Centre, 305 Grattan St, Melbourne, VIC, 3000, Australia
- Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, VIC, 3052, Australia
- School of Natural Sciences and Health, Tallinn University, Tallinn, 10120, Estonia
| | - Alicia Oshlack
- Murdoch Children’s Research Institute, Royal Children’s Hospital, Parkville, VIC, 3052, Australia
- Peter MacCallum Cancer Centre, 305 Grattan St, Melbourne, VIC, 3000, Australia
- School of BioSciences, University of Melbourne, Parkville, VIC, 3052, Australia
| |
|
36
|
Wang D, Tao R, Li Z, Pan D, Wang Z, Li C, Shi Y. STRsearch: a new pipeline for targeted profiling of short tandem repeats in massively parallel sequencing data. Hereditas 2020; 157:8. [PMID: 32172688 PMCID: PMC7075041 DOI: 10.1186/s41065-020-00120-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/07/2019] [Accepted: 02/18/2020] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND: Short tandem repeats (STRs) are important polymorphism markers for human identification and kinship analyses in forensic science. With the continuous development of massively parallel sequencing (MPS), more laboratories have utilized this technology for forensic applications. Existing STR genotyping tools, mostly developed for whole-genome sequencing data, are not effective for MPS data. More importantly, their backward compatibility with the conventional capillary electrophoresis (CE) technology has not been evaluated and guaranteed. RESULTS: In this study, we developed a new end-to-end pipeline called STRsearch for STR-MPS data analysis. STRsearch not only determines the allele by counting repeat patterns and INDELs that are actually in the STR region but also translates MPS results into standard STR nomenclature (numbers and letters). We evaluated the performance of STRsearch in two forensic sequencing datasets, and the concordance with CE genotypes was 75.73% and 75.75%, respectively, which is 12.32% and 9.05% higher than that of the existing tool STRScan. Additionally, we trained a base classifier using sequence properties and used it to predict the probability of correct genotyping at a given locus, resulting in the highest accuracy of 96.13%. CONCLUSIONS: All these results demonstrated that STRsearch was a better tool for preserving backward compatibility with CE for targeted STR profiling in MPS data. STRsearch is available as open-source software at https://github.com/AnJingwd/STRsearch.
Affiliation(s)
- Dong Wang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Ruiyang Tao
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Ministry of Justice, Shanghai, 200063, People's Republic of China
- Zhiqiang Li
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Dun Pan
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Zhuo Wang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai, China.
- Chengtao Li
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Ministry of Justice, Shanghai, 200063, People's Republic of China.
- Yongyong Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai, China.
37
Guo L, Yang Q, Yang JW, Zhang N, Liu BS, Zhu KC, Guo HY, Jiang SG, Zhang DC. MultiplexSSR: A pipeline for developing multiplex SSR-PCR assays from resequencing data. Ecol Evol 2020; 10:3055-3067. [PMID: 32211176 PMCID: PMC7083706 DOI: 10.1002/ece3.6121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/05/2019] [Revised: 02/02/2020] [Accepted: 02/05/2020] [Indexed: 12/15/2022] Open
Abstract
Next-generation sequencing has greatly promoted the investigation of single nucleotide polymorphisms, while studies of simple sequence repeats are sharply decreasing. However, simple sequence repeats still present some advantages in conservation genetics. In this study, an end-to-end pipeline referred to as MultiplexSSR was established to develop, in batches and from resequencing data, multiplex PCR assays with highly polymorphic simple sequence repeats for capillary platforms. The distribution of simple sequence repeats in the genome, the error profiles of genotypes and allelotypes, and the increase in the allele length range depending on the number of individuals were investigated. A total of 98% of simple sequence repeats presented lengths of less than 100 bp. The error rate of the genotyping and allelotyping of dimeric patterns was ten times higher than that for other patterns. The error rate of allelotyping was less than that of genotyping. The allele length range reached approximate saturation with 10 individuals. This pipeline uses allele numbers to select highly polymorphic loci, masks loci with variation, and applies in silico PCR to improve primer specificity. The application of the developed multiplex SSR-PCR assays validated the pipeline's robustness, showing higher polymorphism and stability for the developed simple sequence repeats, a lower cost for genotyping, and that low-depth resequencing data from fewer than a dozen individuals suffice for the development of markers. This pipeline fills the gap between next-generation sequencing and multiplex SSR-PCR.
Affiliation(s)
- Liang Guo
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
- Quan Yang
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
- National Demonstration Center for Experimental Fisheries Science Education Shanghai Ocean University Shanghai China
- Jing-Wen Yang
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
- Nan Zhang
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
- Bao-Suo Liu
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
- Ke-Cheng Zhu
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
- Hua-Yang Guo
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
- Shi-Gui Jiang
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
- Dian-Chang Zhang
- Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization Ministry of Agriculture and Rural Affairs South China Sea Fisheries Research Institute Chinese Academy of Fishery Sciences Guangzhou China
- Guangdong Provincial Engineer Technology Research Center of Marine Biological Seed Industry Guangzhou China
38
Rivero-Hinojosa S, Kinney N, Garner HR, Rood BR. Germline microsatellite genotypes differentiate children with medulloblastoma. Neuro Oncol 2020; 22:152-162. [PMID: 31562520 PMCID: PMC6954392 DOI: 10.1093/neuonc/noz179] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND The germline genetic events underpinning medulloblastoma (MB) initiation, and therefore the ability to determine who is at risk, are still unknown for the majority of cases. Microsatellites are short repeated sequences that make up ~3% of the genome. Repeat lengths vary among individuals and are often nonrandomly associated with disease, including several cancers such as breast, glioma, lung, and ovarian. Due to their effects on gene function, they have been called the "tuning knobs of the genome." METHODS We have developed a novel approach for identifying a microsatellite-based signature to differentiate MB patients from controls using germline DNA. RESULTS Analyzing germline whole exome sequencing data from a training set of 120 MB subjects and 425 controls, we identified 139 individual microsatellite loci whose genotypes differ significantly between the groups. Using a genetic algorithm, we identified a subset of 43 microsatellites that distinguish MB subjects from controls with a sensitivity and specificity of 92% and 88%, respectively. This microsatellite signature was validated in an independent dataset consisting of 102 subjects and 428 controls, with comparable sensitivity and specificity of 95% and 90%, respectively. Analysis of the allele genotypes of those 139 informative loci demonstrates that their association with MB is a consequence of individual microsatellites' genotypes rather than their hypermutability. Finally, an analysis of the genes harboring these microsatellite loci reveals cellular functions important for tumorigenesis. CONCLUSION This study demonstrates that MB-specific germline microsatellite variations mark those at risk for MB development and suggests mechanisms of predisposition.
Affiliation(s)
- Samuel Rivero-Hinojosa
- Center for Cancer and Immunology Research, Children's Research Institute, Children's National Medical Center (CNMC), Washington, DC
- Nicholas Kinney
- Center for Bioinformatics and Genetics, Edward Via College of Osteopathic Medicine, Blacksburg, Virginia
- Gibbs Cancer Center and Research Institute, Spartanburg, South Carolina
- Harold R Garner
- Center for Bioinformatics and Genetics, Edward Via College of Osteopathic Medicine, Blacksburg, Virginia
- Gibbs Cancer Center and Research Institute, Spartanburg, South Carolina
- Brian R Rood
- Center for Cancer and Immunology Research, Children's Research Institute, Children's National Medical Center (CNMC), Washington, DC
39
Kinney N, Kang L, Eckstrand L, Pulenthiran A, Samuel P, Anandakrishnan R, Varghese RT, Michalak P, Garner HR. Abundance of ethnically biased microsatellites in human gene regions. PLoS One 2019; 14:e0225216. [PMID: 31830051 PMCID: PMC6907796 DOI: 10.1371/journal.pone.0225216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/30/2019] [Accepted: 10/29/2019] [Indexed: 12/16/2022] Open
Abstract
Microsatellites, a type of short tandem repeat (STR), have been used for decades as putatively neutral markers to study the genetic structure of diverse human populations. However, recent studies have demonstrated that some microsatellites contribute to gene expression, cis heritability, and phenotype. As a corollary, some microsatellites may contribute to differential gene expression and RNA/protein structure stability in distinct human populations. To test this hypothesis, we investigate genotype frequencies, functional relevance, and adaptive potential of microsatellites in five super-populations (ethnicities) drawn from the 1000 Genomes Project. We discover 3,984 ethnically-biased microsatellite loci (EBML); for each EBML at least one ethnicity has genotype frequencies statistically different from the remaining four. South Asian, East Asian, European, and American EBML show significant overlap; by contrast, the set of African EBML is mostly unique. We cross-reference the 3,984 EBML with 2,060 previously identified expression STRs (eSTRs); repeats known to affect gene expression (64 total) are over-represented. The most significant pathway enrichments are those associated with the matrisome: a broad collection of genes encoding the extracellular matrix and its associated proteins. At least 14 of the EBML have established links to human disease. Analysis of the 3,984 EBML with respect to known selective sweep regions in the genome shows that allelic variation in some of them is likely associated with adaptive evolution.
Affiliation(s)
- Nick Kinney
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
- Lin Kang
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
- Laurel Eckstrand
- Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States of America
- Arichanah Pulenthiran
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Peter Samuel
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Ramu Anandakrishnan
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Robin T. Varghese
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- P. Michalak
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States of America
- Institute of Evolution, University of Haifa, Haifa, Israel
- Harold R. Garner
- Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America
- Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America
40
Li Z, Löytynoja A, Fraimout A, Merilä J. Effects of marker type and filtering criteria on QST-FST comparisons. ROYAL SOCIETY OPEN SCIENCE 2019; 6:190666. [PMID: 31827824 PMCID: PMC6894560 DOI: 10.1098/rsos.190666] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/09/2019] [Accepted: 09/16/2019] [Indexed: 06/10/2023]
Abstract
Comparative studies of quantitative and neutral genetic differentiation (QST-FST tests) provide means to detect adaptive population differentiation. However, QST-FST tests can be overly liberal if the markers used deflate FST below its expectation, or overly conservative if methodological biases lead to inflated FST estimates. We investigated how marker type and filtering criteria for marker selection influence QST-FST comparisons through their effects on FST using simulations and empirical data on over 18 000 in silico genotyped microsatellites and 3.8 million single-nucleotide polymorphism (SNP) loci from four populations of nine-spined sticklebacks (Pungitius pungitius). Empirical and simulated data revealed that FST decreased with increasing marker variability, and was generally higher with SNPs than with microsatellites. The estimated baseline FST levels were also sensitive to filtering criteria for SNPs: both minor alleles and linkage disequilibrium (LD) pruning influenced FST estimation, as did marker ascertainment. However, in the case of stickleback data used here where QST is high, the choice of marker type, their genomic location, ascertainment and filtering made little difference to outcomes of QST-FST tests. Nevertheless, we recommend that QST-FST tests using microsatellites should discard the most variable loci, and those using SNPs should pay attention to marker ascertainment and properly account for LD before filtering SNPs. This may be especially important when level of quantitative trait differentiation is low and levels of neutral differentiation high.
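The dependence of FST on marker variability can be seen in a small worked example. The sketch below uses the classic heterozygosity-based definition FST = (HT - HS) / HT with equally weighted populations. It is an illustrative estimator and not necessarily the one used in the study, and the allele frequencies are invented for the example.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """Heterozygosity-based FST for equally weighted populations.

    pop_freqs: one allele-frequency list per population, same allele order.
    """
    n_pops = len(pop_freqs)
    # HS: mean within-population expected heterozygosity.
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # HT: expected heterozygosity of the pooled (averaged) allele frequencies.
    pooled = [sum(column) / n_pops for column in zip(*pop_freqs)]
    ht = expected_heterozygosity(pooled)
    return (ht - hs) / ht


# Invented data: a biallelic SNP-like locus with strongly diverged frequencies.
snp_like = [[0.9, 0.1], [0.1, 0.9]]
# Invented data: a highly variable locus where the populations share no alleles.
msat_like = [[0.1] * 10 + [0.0] * 10, [0.0] * 10 + [0.1] * 10]

print(round(fst(snp_like), 3))    # -> 0.64
print(round(fst(msat_like), 3))   # -> 0.053
```

Even though the two populations share no alleles at the highly variable locus, its FST is low because within-population heterozygosity is already close to the total, which is the kind of deflation the abstract attributes to highly variable markers.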
Affiliation(s)
- Zitong Li
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki 00014, Finland
- Ari Löytynoja
- Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland
- Antoine Fraimout
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki 00014, Finland
- Juha Merilä
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki 00014, Finland
41
Dolzhenko E, Deshpande V, Schlesinger F, Krusche P, Petrovski R, Chen S, Emig-Agius D, Gross A, Narzisi G, Bowman B, Scheffler K, van Vugt JJFA, French C, Sanchis-Juan A, Ibáñez K, Tucci A, Lajoie BR, Veldink JH, Raymond FL, Taft RJ, Bentley DR, Eberle MA. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. BIOINFORMATICS (OXFORD, ENGLAND) 2019; 35:4754-4756. [PMID: 31134279 DOI: 10.1101/361162] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/10/2019] [Revised: 04/26/2019] [Accepted: 05/23/2019] [Indexed: 05/25/2023]
Abstract
SUMMARY We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci. AVAILABILITY AND IMPLEMENTATION ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peter Krusche
- Illumina Cambridge Ltd, Illumina Centre, 19 Granta Park, Great Abington, Cambridge CB21 6DF, UK
- Roman Petrovski
- Illumina Cambridge Ltd, Illumina Centre, 19 Granta Park, Great Abington, Cambridge CB21 6DF, UK
- Sai Chen
- Illumina Inc., San Diego, CA 92122, USA
- Giuseppe Narzisi
- Computational Biology, New York Genome Center, New York, NY 10013, USA
- Joke J F A van Vugt
- UMC Utrecht Brain Center, Utrecht University, 3508 AB Utrecht, The Netherlands
- Courtney French
- Department of Medical Genetics, NHS Blood and Transplant Centre, Cambridge, CB2 0PT, UK
- Alba Sanchis-Juan
- Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Cambridge, CB2 0PT, UK
- NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- Kristina Ibáñez
- Genomics England, Queen Mary University London, London EC1M 6BQ, UK
- Arianna Tucci
- Genomics England, Queen Mary University London, London EC1M 6BQ, UK
- Jan H Veldink
- UMC Utrecht Brain Center, Utrecht University, 3508 AB Utrecht, The Netherlands
- F Lucy Raymond
- Department of Medical Genetics, NHS Blood and Transplant Centre, Cambridge, CB2 0PT, UK
- David R Bentley
- Illumina Cambridge Ltd, Illumina Centre, 19 Granta Park, Great Abington, Cambridge CB21 6DF, UK
42
Raz O, Biezuner T, Spiro A, Amir S, Milo L, Titelman A, Onn A, Chapal-Ilani N, Tao L, Marx T, Feige U, Shapiro E. Short tandem repeat stutter model inferred from direct measurement of in vitro stutter noise. Nucleic Acids Res 2019; 47:2436-2445. [PMID: 30698816 PMCID: PMC6412005 DOI: 10.1093/nar/gky1318] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/27/2018] [Revised: 12/19/2018] [Accepted: 01/02/2019] [Indexed: 11/14/2022] Open
Abstract
Short tandem repeats (STRs) are polymorphic genomic loci valuable for various applications such as research, diagnostics and forensics. However, their polymorphic nature also introduces noise during in vitro amplification, making them difficult to analyze. Although it is possible to overcome stutter noise by using amplification-free library preparation, such protocols are presently incompatible with single cell analysis and with targeted-enrichment protocols. To address this challenge, we have designed a method for direct measurement of in vitro noise. Using a synthetic STR sequencing library, we have calibrated a Markov model for the prediction of stutter patterns at any amplification cycle. By employing this model, we have managed to genotype accurately cases of severe amplification bias, and biallelic STR signals, and validated our model for several high-fidelity PCR enzymes. Finally, we compared this model in the context of a naïve STR genotyping strategy against the state-of-the-art on a benchmark of single cells, demonstrating superior accuracy.
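To make the idea of a per-cycle stutter model concrete, here is a toy Markov chain over repeat-length offsets. It is not the calibrated model from the paper: the slip probabilities `p_minus` and `p_plus` are made-up values, the function name is hypothetical, and the chain follows a single lineage of copies rather than the full population of molecules in a PCR.

```python
def stutter_distribution(cycles, p_minus=0.01, p_plus=0.001):
    """Distribution of repeat-length offsets after a number of copy events.

    Toy model with assumed parameters: at each cycle a copy loses one repeat
    unit with probability p_minus, gains one with probability p_plus, and is
    otherwise copied faithfully. Offset 0 is the true allele length.
    """
    stay = 1.0 - p_minus - p_plus
    dist = {0: 1.0}  # start with all probability mass on the true allele
    for _ in range(cycles):
        nxt = {}
        for offset, prob in dist.items():
            for step, p_step in ((0, stay), (-1, p_minus), (1, p_plus)):
                nxt[offset + step] = nxt.get(offset + step, 0.0) + prob * p_step
        dist = nxt
    return dist


for n_cycles in (1, 30):
    dist = stutter_distribution(n_cycles)
    # Report the true allele and its immediate -1 / +1 stutter neighbours.
    print(n_cycles, {k: round(dist[k], 4) for k in (-1, 0, 1)})
```

In this sketch the share of the true allele shrinks as cycles accumulate and the -1 product dominates the +1 product, simply because `p_minus` was chosen larger than `p_plus`. A measured model would replace these constants with values fitted to data.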
Affiliation(s)
- Ofir Raz
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Tamir Biezuner
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Adam Spiro
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Shiran Amir
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Lilach Milo
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Alon Titelman
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Amos Onn
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Noa Chapal-Ilani
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Liming Tao
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Tzipy Marx
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Uriel Feige
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Ehud Shapiro
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
43
Demir G, Alkan C. Characterizing microsatellite polymorphisms using assembly-based and mapping-based tools. Turk J Biol 2019; 43:264-273. [PMID: 31496881 PMCID: PMC6710001 DOI: 10.3906/biy-1903-16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023]
Abstract
Microsatellite polymorphism has always been a challenge for genome assembly and sequence alignment due to sequencing errors, short read lengths, and high incidence of polymerase slippage in microsatellite regions. Although the information they carry is very valuable, microsatellite variations have not gained enough attention to become a routine step in genome sequence analysis pipelines. After the completion of the 1000 Genomes Project, which aimed to establish the most detailed genetic variation catalog for humans, the consortium released only two microsatellite prediction sets generated by two tools. Many other large research efforts have failed to shed light on microsatellite variations. We evaluated the performance of three different local assembly methods on three different experimental settings, focusing on genotype-based performance, coverage impact, and preprocessing including flanking regions. All these experiments supported our initial expectations on assembly. We also demonstrate that overlap-layout-consensus (OLC)-based assembly methods show higher sensitivity to microsatellite variant calling when compared to a de Bruijn graph-based approach. We conclude that assembly with OLC is the better method for genotyping microsatellites. Our pipeline is available at https://github.com/gulfemd/STRAssembly.
Affiliation(s)
- Gülfem Demir
- Department of Computer Engineering, Faculty of Engineering, Bilkent University, Bilkent, Ankara Turkey
- Can Alkan
- Department of Computer Engineering, Faculty of Engineering, Bilkent University, Bilkent, Ankara Turkey
44
Mousavi N, Shleizer-Burko S, Yanicky R, Gymrek M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res 2019; 47:e90. [PMID: 31194863 PMCID: PMC6735967 DOI: 10.1093/nar/gkz501] [Citation(s) in RCA: 139] [Impact Index Per Article: 23.2] [Received: 03/06/2019] [Revised: 05/15/2019] [Accepted: 05/28/2019] [Indexed: 12/15/2022] Open
Abstract
Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington's Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and expanded TRs. GangSTR extracts information from paired-end reads into a unified model to estimate maximum likelihood TR lengths. We validate GangSTR on real and simulated data and show that GangSTR outperforms alternative methods in both accuracy and speed. We apply GangSTR to a deeply sequenced trio to profile the landscape of TR expansions in a healthy family and validate novel expansions using orthogonal technologies. Our analysis reveals that healthy individuals harbor dozens of long TR alleles not captured by current genome-wide methods. GangSTR will likely enable discovery of novel disease-associated variants not currently accessible from NGS.
Affiliation(s)
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
- Sharona Shleizer-Burko
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
- Richard Yanicky
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
- Melissa Gymrek
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA
45
Kinney N, Titus-Glover K, Wren JD, Varghese RT, Michalak P, Liao H, Anandakrishnan R, Pulenthiran A, Kang L, Garner HR. CAGm: a repository of germline microsatellite variations in the 1000 genomes project. Nucleic Acids Res 2019; 47:D39-D45. [PMID: 30329086 PMCID: PMC6323891 DOI: 10.1093/nar/gky969] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/07/2018] [Revised: 10/04/2018] [Accepted: 10/05/2018] [Indexed: 12/14/2022] Open
Abstract
The human genome harbors an abundance of repetitive DNA; however, its function continues to be debated. Microsatellites, a class of short tandem repeat, are established as an important source of genetic variation. Array length variants are common among microsatellites and affect gene expression, but efforts to understand the role and diversity of microsatellite variation have been hampered by several challenges. Without adequate depth, both long-read and short-read sequencing may not detect the variants present in a sample; additionally, large sample sizes are needed to reveal the degree of population-level polymorphism. To address these challenges we present the Comparative Analysis of Germline Microsatellites (CAGm): a database of germline microsatellites from 2529 individuals in the 1000 genomes project. A key novelty of CAGm is the ability to aggregate microsatellite variation by population, ethnicity (super population) and gender. The database provides advanced searching for microsatellites embedded in genes and functional elements. All data can be downloaded as Microsoft Excel spreadsheets. Two use-case scenarios are presented to demonstrate its utility: a mononucleotide (A) microsatellite at the BAT-26 locus and a dinucleotide (CA) microsatellite in the coding region of FGFRL1. CAGm is freely available at http://www.cagmdb.org/.
Affiliation(s)
- Nicholas Kinney
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Kyle Titus-Glover
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Jonathan D Wren
- Arthritis and Clinical Immunology Research Program, Division of Genomics and Data Sciences Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Robin T Varghese
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Pawel Michalak
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- One Health Research Center, Virginia-Maryland College of Veterinary Medicine, 1410 Prices Fork Rd, Blacksburg, VA 24060, USA
- Institute of Evolution, University of Haifa, Abba Khoushy Ave 199, Haifa, 3498838, Israel
- Han Liao
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Ramu Anandakrishnan
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Arichanah Pulenthiran
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Lin Kang
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Harold R Garner
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA
- Gibbs Cancer Center & Research Institute, 101 E Wood St., Spartanburg, SC 29303, USA
46
Tankard RM, Bennett MF, Degorski P, Delatycki MB, Lockhart PJ, Bahlo M. Detecting Expansions of Tandem Repeats in Cohorts Sequenced with Short-Read Sequencing Data. Am J Hum Genet 2018; 103:858-873. [PMID: 30503517 PMCID: PMC6288141 DOI: 10.1016/j.ajhg.2018.10.015] [Citation(s) in RCA: 82] [Impact Index Per Article: 11.7] [Received: 07/13/2017] [Accepted: 10/16/2018] [Indexed: 10/27/2022] Open
Abstract
Repeat expansions cause more than 30 inherited disorders, predominantly neurogenetic. These can present with overlapping clinical phenotypes, making molecular diagnosis challenging. Single-gene or small-panel PCR-based methods can help to identify the precise genetic cause, but they can be slow and costly and often yield no result. Researchers are increasingly performing genomic analysis via whole-exome and whole-genome sequencing (WES and WGS) to diagnose genetic disorders. However, until recently, analysis protocols could not identify repeat expansions in these datasets. We developed exSTRa (expanded short tandem repeat algorithm), a method that uses either WES or WGS to identify repeat expansions. Performance of exSTRa was assessed in a simulation study. In addition, four retrospective cohorts of individuals with eleven different known repeat-expansion disorders were analyzed with exSTRa. We assessed results by comparing the findings to known disease status. Performance was also compared to three other analysis methods (ExpansionHunter, STRetch, and TREDPARSE), which were developed specifically for WGS data. Expansions in the assessed STR loci were successfully identified in WES and WGS datasets by all four methods with high specificity and sensitivity. Overall, exSTRa demonstrated more robust and superior performance for WES data than did the other three methods. We demonstrate that exSTRa can be effectively utilized as a screening tool for detecting repeat expansions in WES and WGS data, although the best performance would be produced by consensus calling, wherein at least two out of the four currently available screening methods call an expansion.
Affiliation(s)
- Rick M Tankard
- Population Health and Immunity Division, the Walter and Eliza Hall Institute of Medical Research, Parkville 3052, VIC, Australia; Department of Medical Biology, The University of Melbourne, Melbourne 3010, VIC, Australia; Mathematics and Statistics, Murdoch University, Murdoch 6150, WA, Australia
- Mark F Bennett
- Population Health and Immunity Division, the Walter and Eliza Hall Institute of Medical Research, Parkville 3052, VIC, Australia; Department of Medical Biology, The University of Melbourne, Melbourne 3010, VIC, Australia; Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg 3084, VIC, Australia
- Peter Degorski
- Population Health and Immunity Division, the Walter and Eliza Hall Institute of Medical Research, Parkville 3052, VIC, Australia; Department of Medical Biology, The University of Melbourne, Melbourne 3010, VIC, Australia
- Martin B Delatycki
- Bruce Lefroy Centre for Genetic Health Research, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville 3052, VIC, Australia; Victorian Clinical Genetics Services, Parkville 3052, VIC, Australia; Department of Paediatrics, University of Melbourne, Parkville 3058, VIC, Australia
- Paul J Lockhart
- Bruce Lefroy Centre for Genetic Health Research, Murdoch Children's Research Institute, Royal Children's Hospital, Parkville 3052, VIC, Australia; Department of Paediatrics, University of Melbourne, Parkville 3058, VIC, Australia
- Melanie Bahlo
- Population Health and Immunity Division, the Walter and Eliza Hall Institute of Medical Research, Parkville 3052, VIC, Australia; Department of Medical Biology, The University of Melbourne, Melbourne 3010, VIC, Australia
47
Velmurugan KR, Michalak P, Kang L, Fonville NC, Garner HR. Dysfunctional DNA repair pathway via defective FANCD2 gene engenders multifarious exomic and transcriptomic effects in Fanconi anemia. Mol Genet Genomic Med 2018; 6:1199-1208. [PMID: 30450770 PMCID: PMC6305641 DOI: 10.1002/mgg3.502] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/15/2018] [Revised: 09/20/2018] [Accepted: 10/10/2018] [Indexed: 01/27/2023]
Abstract
Background: Fanconi anemia (FA) affects only one in 130,000 births, but has severe and diverse clinical consequences. It has been theorized that defects in the FA DNA cross‐link repair complex lead to a spectrum of variants that are responsible for those diverse clinical phenotypes. Methods: Using NextGen sequencing, we show that a clinically derived FA cell line had accumulated numerous genetic variants, including high‐impact mutations, such as deletion of start codons, introduction of premature stop codons, missense mutations, and INDELs. Results: About 65% of SNPs and 55% of INDELs were found to be commonly present in both the FA dysfunctional and retrovirally corrected cell lines, showing their common origin. The number of INDELs, but not SNPs, is decreased in FANCD2‐corrected samples, suggesting that FANCD2 deficiency preferentially promotes the origin of INDELs. These genetic modifications had a considerable effect on the transcriptome, with statistically significant changes in the expression of 270 genes. These genetic and transcriptomic variants significantly impacted pathways and molecular functions, spanning a diverse spectrum of disease phenotypes/symptoms, consistent with the disease diversity seen in FA patients. Conclusion: These results underscore the consequences of defects in the DNA cross‐link repair mechanism and indicate that accumulating diverse mutations from individual parent cells may make it difficult to anticipate the longitudinal clinical behavior of emerging disease states in an individual with FA.
Affiliation(s)
- Karthik Raja Velmurugan
- Primary Care Research Network and the Center for Bioinformatics and Genetics, Edward Via College of Osteopathic Medicine, Blacksburg, Virginia
- Pawel Michalak
- Primary Care Research Network and the Center for Bioinformatics and Genetics, Edward Via College of Osteopathic Medicine, Blacksburg, Virginia; Center for One Health Research, Virginia-Maryland College of Veterinary Medicine, Blacksburg, Virginia; Institute of Evolution, University of Haifa, Haifa, Israel
- Lin Kang
- Primary Care Research Network and the Center for Bioinformatics and Genetics, Edward Via College of Osteopathic Medicine, Blacksburg, Virginia
- Harold R Garner
- Primary Care Research Network and the Center for Bioinformatics and Genetics, Edward Via College of Osteopathic Medicine, Blacksburg, Virginia; The Gibbs Cancer Center and Research Institute, Spartanburg, South Carolina
48
Kristmundsdóttir S, Sigurpálsdóttir BD, Kehr B, Halldórsson BV. popSTR: population-scale detection of STR variants. Bioinformatics 2018; 33:4041-4048. [PMID: 27591079 DOI: 10.1093/bioinformatics/btw568] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 04/04/2016] [Accepted: 08/26/2016] [Indexed: 11/14/2022]
Abstract
Motivation: Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant types of variation in the human genome, after single nucleotide polymorphisms (SNPs) and indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and construction of genetic genealogy. However, microsatellite variations are rarely considered in whole-genome sequencing studies, in large part due to a lack of tools capable of analyzing them. Results: Here we present a microsatellite genotyper, optimized for Illumina WGS data, which is both faster and more accurate than other methods previously presented. There are two main ingredients to our improvements. First, we reduce the amount of sequencing data necessary for creating microsatellite profiles by using previously aligned sequencing data. Second, we use population information to train microsatellite-specific and individual-specific error profiles. By comparing our genotyping results to genotypes generated by capillary electrophoresis, we show that our error rates are 50% lower than those of lobSTR, another program specifically developed to determine microsatellite genotypes. Availability and Implementation: Source code is available on GitHub: https://github.com/DecodeGenetics/popSTR. Contact: snaedis.kristmundsdottir@decode.is or bjarni.halldorsson@decode.is.
Affiliation(s)
- Bjarni V Halldórsson
- deCODE genetics/Amgen; School of Science and Engineering, Reykjavík University, Reykjavík, 101, Iceland
49
Dashnow H, Lek M, Phipson B, Halman A, Sadedin S, Lonsdale A, Davis M, Lamont P, Clayton JS, Laing NG, MacArthur DG, Oshlack A. STRetch: detecting and discovering pathogenic short tandem repeat expansions. Genome Biol 2018; 19:121. [PMID: 30129428 PMCID: PMC6102892 DOI: 10.1186/s13059-018-1505-2] [Citation(s) in RCA: 93] [Impact Index Per Article: 13.3] [Received: 12/18/2017] [Accepted: 08/07/2018] [Indexed: 11/10/2022]
Abstract
Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Most existing tools for detecting STR variation with short reads do so within the read length and so are unable to detect the majority of pathogenic expansions. Here we present STRetch, a new genome-wide method to scan for STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting STR expansions using short-read whole-genome sequencing data at known pathogenic loci as well as novel STR loci. STRetch is open source software, available from github.com/Oshlack/STRetch.
Affiliation(s)
- Harriet Dashnow
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia; School of Biosciences, The University of Melbourne, Parkville, VIC, Australia
- Monkol Lek
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Belinda Phipson
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia
- Andreas Halman
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia; Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, VIC, Australia
- Simon Sadedin
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia
- Andrew Lonsdale
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia
- Mark Davis
- Department of Diagnostic Genomics, PathWest Laboratory Medicine, QEII Medical Centre, Nedlands, WA, Australia
- Phillipa Lamont
- Neurogenetic Unit, Royal Perth Hospital, Perth, WA, Australia
- Joshua S Clayton
- Harry Perkins Institute of Medical Research, Centre for Medical Research, University of Western Australia, Nedlands, WA, Australia
- Nigel G Laing
- Harry Perkins Institute of Medical Research, Centre for Medical Research, University of Western Australia, Nedlands, WA, Australia
- Daniel G MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alicia Oshlack
- Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, Australia; School of Biosciences, The University of Melbourne, Parkville, VIC, Australia
50
Ganesamoorthy D, Cao MD, Duarte T, Chen W, Coin L. GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing. BMC Bioinformatics 2018; 19:267. [PMID: 30012093 PMCID: PMC6048696 DOI: 10.1186/s12859-018-2282-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2018] [Accepted: 07/09/2018] [Indexed: 11/27/2022]
Abstract
BACKGROUND: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. RESULTS: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68 and 83% for capture sequence data and 200X WGS data respectively, improving to 87 and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25, 14, 12 and 8% of its midpoint copy number value for 30X, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. CONCLUSIONS: The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS studies of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.
Affiliation(s)
- Devika Ganesamoorthy
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
- Minh Duc Cao
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
- Tania Duarte
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
- Wenhan Chen
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
- Lachlan Coin
- Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia