51
Popic V, Rohlicek C, Cunial F, Hajirasouliha I, Meleshko D, Garimella K, Maheshwari A. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat Methods 2023; 20:559-568. [PMID: 36959322 PMCID: PMC10152467 DOI: 10.1038/s41592-023-01799-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 04/30/2022] [Accepted: 01/29/2023] [Indexed: 03/25/2023]
Abstract
Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.
Affiliation(s)
- Fabio Cunial
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Iman Hajirasouliha
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Dmitry Meleshko
- Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Tri-Institutional Computational Biology and Medicine Program, Weill Cornell Medicine, New York, NY, USA
- Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
52
Points to consider in the detection of germline structural variants using next-generation sequencing: A statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2023; 25:100316. [PMID: 36507974 DOI: 10.1016/j.gim.2022.09.017] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/26/2022] [Revised: 09/29/2022] [Accepted: 09/30/2022] [Indexed: 12/14/2022]
53
Wang S, Liu Y, Wang J, Zhu X, Shi Y, Wang X, Liu T, Xiao X, Wang J. Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features. Front Genet 2023; 13:1096797. [PMID: 36685885 PMCID: PMC9852890 DOI: 10.3389/fgene.2022.1096797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Accepted: 12/14/2022] [Indexed: 01/07/2023]
Abstract
Many bioinformatics tools have been released over the past decade to detect structural variants from sequencing data. For a data analyst, a natural question is which tool best fits the data at hand. This study therefore presents an automatic tool recommendation method to facilitate data analysis. Given a sequencing dataset, the method recommends the optimal variant calling tool from a set of state-of-the-art bioinformatics tools. The recommendation method was implemented under a meta-learning framework that identifies the relationships between data features and tool performance. First, meta-features were extracted to characterize the sequencing data, and meta-targets were identified to pinpoint the optimal caller for the sequencing data. Second, a meta-model was constructed to bridge the meta-features and meta-targets. Finally, the recommendation was made according to the evaluation from the meta-model. A series of experiments was conducted to validate this recommendation method on both simulated and real sequencing data. The results revealed that different SV callers often fit different sequencing data. The recommendation accuracy averaged more than 80% across all experimental configurations, outperforming the random-pick and fixed-pick strategies. To further facilitate the research community, we incorporated the recommendation method into an online cloud service for genomic data analysis, which is available at https://c.solargenomics.com/ via a simple registration. In addition, the source code and a pre-trained model are available at https://github.com/hello-json/CallerRecommendation for academic use only.
Affiliation(s)
- Shenjie Wang
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Yuqian Liu
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Juan Wang
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China, Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China, *Correspondence: Juan Wang; Jiayin Wang
- Xiaoyan Zhu
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Yuzhi Shi
- Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China
- Xuwen Wang
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Tao Liu
- Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China
- Xiao Xiao
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China, Geneplus Shenzhen, Shenzhen, China
- Jiayin Wang
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China, *Correspondence: Juan Wang; Jiayin Wang
54
Rodriguez OL, Silver CA, Shields K, Smith ML, Watson CT. Targeted long-read sequencing facilitates phased diploid assembly and genotyping of the human T cell receptor alpha, delta, and beta loci. Cell Genomics 2022; 2:100228. [PMID: 36778049 PMCID: PMC9903726 DOI: 10.1016/j.xgen.2022.100228] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/02/2022] [Revised: 08/25/2022] [Accepted: 11/05/2022] [Indexed: 12/02/2022]
Abstract
T cell receptors (TCRs) recognize peptide fragments presented by the major histocompatibility complex (MHC) and are critical to T cell-mediated immunity. Recent data have indicated that genetic diversity within TCR-encoding gene regions is underexplored, limiting understanding of the impact of TCR loci polymorphisms on TCR function in disease, even though TCR repertoire signatures (1) are heritable and (2) associate with disease phenotypes. To address this, we developed a targeted long-read sequencing approach to generate highly accurate haplotype-resolved assemblies of the TCR beta (TRB) and alpha/delta (TRA/D) loci, facilitating the genotyping of all variant types, including structural variants. We validate our approach using two mother-father-child trios and five unrelated donors representing multiple populations. This resulted in improved genotyping accuracy and the discovery of 84 undocumented V, D, J, and C alleles, demonstrating the utility of this framework for improving our understanding of TCR diversity and function in disease.
Affiliation(s)
- Oscar L. Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Catherine A. Silver
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Kaitlyn Shields
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Melissa L. Smith
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Corey T. Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA (corresponding author)
55
Balachandran P, Walawalkar IA, Flores JI, Dayton JN, Audano PA, Beck CR. Transposable element-mediated rearrangements are prevalent in human genomes. Nat Commun 2022; 13:7115. [PMID: 36402840 PMCID: PMC9675761 DOI: 10.1038/s41467-022-34810-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 04/26/2022] [Accepted: 11/08/2022] [Indexed: 11/21/2022]
Abstract
Transposable elements constitute about half of human genomes, and their role in generating human variation through retrotransposition is broadly studied and appreciated. Structural variants mediated by transposons, which we call transposable element-mediated rearrangements (TEMRs), are less well studied, and the mechanisms leading to their formation as well as their broader impact on human diversity are poorly understood. Here, we identify 493 unique TEMRs across the genomes of three individuals. While homology directed repair is the dominant driver of TEMRs, our sequence-resolved TEMR resource allows us to identify complex inversion breakpoints, triplications or other high copy number polymorphisms, and additional complexities. TEMRs are enriched in genic loci and can create potentially important risk alleles such as a deletion in TRIM65, a known cancer biomarker and therapeutic target. These findings expand our understanding of this important class of structural variation, the mechanisms responsible for their formation, and establish them as an important driver of human diversity.
Affiliation(s)
- Jacob I Flores
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jacob N Dayton
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
56
Wang Y, Ling Y, Gong J, Zhao X, Zhou H, Xie B, Lou H, Zhuang X, Jin L, Fan S, Zhang G, Xu S. PGG.SV: a whole-genome-sequencing-based structural variant resource and data analysis platform. Nucleic Acids Res 2022; 51:D1109-D1116. [PMID: 36243989 PMCID: PMC9825616 DOI: 10.1093/nar/gkac905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/21/2022] [Accepted: 10/04/2022] [Indexed: 01/30/2023]
Abstract
Structural variations (SVs) play important roles in human evolution and diseases, but there is a lack of data resources concerning representative samples, especially for East Asians. Taking advantage of both next-generation sequencing and third-generation sequencing data at the whole-genome level, we developed the database PGG.SV to provide a practical platform for both regionally and globally representative structural variants. In its current version, PGG.SV archives 584 277 SVs obtained from whole-genome sequencing data of 6048 samples, including 1030 long-read sequencing genomes representing 177 global populations. PGG.SV provides (i) high-quality SVs with fine-scale and precise genomic locations in both GRCh37 and GRCh38, covering underrepresented SVs in existing sequencing and microarray data; (ii) hierarchical estimation of SV prevalence in geographical populations; (iii) informative annotations of SV-related genes, potential functions and clinical effects; (iv) an analysis platform to facilitate SV-based case-control association studies and (v) various visualization tools for understanding the SV structures in the human genome. Taken together, PGG.SV provides a user-friendly online interface, easy-to-use analysis tools and a detailed presentation of results. PGG.SV is freely accessible via https://www.biosino.org/pggsv.
Affiliation(s)
- Xiaohan Zhao
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai 201203, China
- Hanwen Zhou
- Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Bo Xie
- Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Haiyi Lou
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China
- Xinhao Zhuang
- Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Li Jin
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai 201203, China
- Shaohua Fan
- Correspondence may also be addressed to Shaohua Fan.
- Guoqing Zhang
- Correspondence may also be addressed to Guoqing Zhang.
- Shuhua Xu
- To whom correspondence should be addressed. Tel: +86 21 31246617; Fax: +86 21 31246617
57
Han S, Dias GB, Basting PJ, Viswanatha R, Perrimon N, Bergman CM. Local assembly of long reads enables phylogenomics of transposable elements in a polyploid cell line. Nucleic Acids Res 2022; 50:e124. [PMID: 36156149 PMCID: PMC9757076 DOI: 10.1093/nar/gkac794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022]
Abstract
Animal cell lines often undergo extreme genome restructuring events, including polyploidy and segmental aneuploidy that can impede de novo whole-genome assembly (WGA). In some species like Drosophila, cell lines also exhibit massive proliferation of transposable elements (TEs). To better understand the role of transposition during animal cell culture, we sequenced the genome of the tetraploid Drosophila S2R+ cell line using long-read and linked-read technologies. WGAs for S2R+ were highly fragmented and generated variable estimates of TE content across sequencing and assembly technologies. We therefore developed a novel WGA-independent bioinformatics method called TELR that identifies, locally assembles, and estimates allele frequency of TEs from long-read sequence data (https://github.com/bergmanlab/telr). Application of TELR to a ∼130x PacBio dataset for S2R+ revealed many haplotype-specific TE insertions that arose by transposition after initial cell line establishment and subsequent tetraploidization. Local assemblies from TELR also allowed phylogenetic analysis of paralogous TEs, which revealed that proliferation of TE families in vitro can be driven by single or multiple source lineages. Our work provides a model for the analysis of TEs in complex heterozygous or polyploid genomes that are recalcitrant to WGA and yields new insights into the mechanisms of genome evolution in animal cell culture.
Affiliation(s)
- Preston J Basting
- Institute of Bioinformatics, University of Georgia, 120 E. Green St., Athens, GA, USA
- Raghuvir Viswanatha
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA
- Norbert Perrimon
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA, Howard Hughes Medical Institute, Boston, MA, USA
- Casey M Bergman
- To whom correspondence should be addressed. Tel: +1 706 542 1764; Fax: +1 706 542 3910
58
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, Fairley S, Runnels A, Winterkorn L, Lowy E, Flicek P, Germer S, Brand H, Hall IM, Talkowski ME, Narzisi G, Zody MC. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 2022; 185:3426-3440.e19. [PMID: 36055201 PMCID: PMC9439720 DOI: 10.1016/j.cell.2022.08.004] [Citation(s) in RCA: 363] [Impact Index Per Article: 121.0] [Received: 11/12/2021] [Revised: 06/21/2022] [Accepted: 08/03/2022] [Indexed: 01/05/2023]
Abstract
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.
Affiliation(s)
- Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Haley J Abel
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Wayne E Clarke
- New York Genome Center, New York, NY 10013, USA; Outlier Informatics Inc., Saskatoon, SK S7H 1L4, Canada
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ernesto Lowy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Harrison Brand
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ira M Hall
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA; Center for Genomic Health, Yale University School of Medicine, New Haven, CT 06510, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
59
Fu JM, Satterstrom FK, Peng M, Brand H, Collins RL, Dong S, Wamsley B, Klei L, Wang L, Hao SP, Stevens CR, Cusick C, Babadi M, Banks E, Collins B, Dodge S, Gabriel SB, Gauthier L, Lee SK, Liang L, Ljungdahl A, Mahjani B, Sloofman L, Smirnov AN, Barbosa M, Betancur C, Brusco A, Chung BHY, Cook EH, Cuccaro ML, Domenici E, Ferrero GB, Gargus JJ, Herman GE, Hertz-Picciotto I, Maciel P, Manoach DS, Passos-Bueno MR, Persico AM, Renieri A, Sutcliffe JS, Tassone F, Trabetti E, Campos G, Cardaropoli S, Carli D, Chan MCY, Fallerini C, Giorgio E, Girardi AC, Hansen-Kiss E, Lee SL, Lintas C, Ludena Y, Nguyen R, Pavinato L, Pericak-Vance M, Pessah IN, Schmidt RJ, Smith M, Costa CIS, Trajkova S, Wang JYT, Yu MHC, Cutler DJ, De Rubeis S, Buxbaum JD, Daly MJ, Devlin B, Roeder K, Sanders SJ, Talkowski ME. Rare coding variation provides insight into the genetic architecture and phenotypic context of autism. Nat Genet 2022; 54:1320-1331. [PMID: 35982160 PMCID: PMC9653013 DOI: 10.1038/s41588-022-01104-0] [Citation(s) in RCA: 249] [Impact Index Per Article: 83.0] [Received: 12/07/2021] [Accepted: 05/24/2022] [Indexed: 01/11/2023]
Abstract
Some individuals with autism spectrum disorder (ASD) carry functional mutations rarely observed in the general population. We explored the genes disrupted by these variants from joint analysis of protein-truncating variants (PTVs), missense variants and copy number variants (CNVs) in a cohort of 63,237 individuals. We discovered 72 genes associated with ASD at false discovery rate (FDR) ≤ 0.001 (185 at FDR ≤ 0.05). De novo PTVs, damaging missense variants and CNVs represented 57.5%, 21.1% and 8.44% of association evidence, while CNVs conferred greatest relative risk. Meta-analysis with cohorts ascertained for developmental delay (DD) (n = 91,605) yielded 373 genes associated with ASD/DD at FDR ≤ 0.001 (664 at FDR ≤ 0.05), some of which differed in relative frequency of mutation between ASD and DD cohorts. The DD-associated genes were enriched in transcriptomes of progenitor and immature neuronal cells, whereas genes showing stronger evidence in ASD were more enriched in maturing neurons and overlapped with schizophrenia-associated genes, emphasizing that these neuropsychiatric disorders may share common pathways to risk.
Affiliation(s)
- Jack M Fu
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- F Kyle Satterstrom
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Minshi Peng
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Pediatric Surgical Research Laboratories, Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
- Shan Dong
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
- Brie Wamsley
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Lambertus Klei
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Lily Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
- Stephanie P Hao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Pediatric Surgical Research Laboratories, Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Christine R Stevens
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Caroline Cusick
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mehrtash Babadi
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric Banks
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Brett Collins
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sheila Dodge
- Genomics Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stacey B Gabriel
- Genomics Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura Gauthier
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Samuel K Lee
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lindsay Liang
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
- Alicia Ljungdahl
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
- Behrang Mahjani
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Laura Sloofman
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Andrey N Smirnov
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mafalda Barbosa
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Catalina Betancur
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine, Institut de Biologie Paris Seine, Paris, France
- Alfredo Brusco
- Department of Medical Sciences, University of Torino, Turin, Italy
- Medical Genetics Unit, 'Città della Salute e della Scienza' University Hospital, Turin, Italy
- Brian H Y Chung
- Department of Pediatrics and Adolescent Medicine, Duchess of Kent Children's Hospital, The University of Hong Kong, Hong Kong Special Administrative Region, China
- Edwin H Cook
- Institute for Juvenile Research, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA
- Michael L Cuccaro
- The John P Hussman Institute for Human Genomics, The University of Miami Miller School of Medicine, Miami, FL, USA
- Enrico Domenici
- Department of Cellular, Computational and Integrative Biology, University of Trento, Trento, Italy
| | | | - J Jay Gargus
- Center for Autism Research and Translation, University of California Irvine, Irvine, CA, USA
| | - Gail E Herman
- The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA
| | - Irva Hertz-Picciotto
- MIND (Medical Investigation of Neurodevelopmental Disorders) Institute, University of California Davis, Davis, CA, USA
| | - Patricia Maciel
- Life and Health Sciences Research Institute, School of Medicine, University of Minho, Braga, Portugal
| | - Dara S Manoach
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Maria Rita Passos-Bueno
- Centro de Pesquisas sobre o Genoma Humano e Células tronco, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
| | - Antonio M Persico
- Interdepartmental Program 'Autism 0-90', 'Gaetano Martino' University Hospital, University of Messina, Messina, Italy
| | - Alessandra Renieri
- Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Medical Genetics, , University of Siena, Siena, Italy
- Genetica Medica, Azienda Ospedaliera Universitaria Senese, Siena, Italy
| | - James S Sutcliffe
- Department of Molecular Physiology & Biophysics and Psychiatry, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Flora Tassone
- MIND (Medical Investigation of Neurodevelopmental Disorders) Institute, University of California Davis, Davis, CA, USA
- Department of Biochemistry and Molecular Medicine, University of California Davis, School of Medicine, Sacramento, CA, USA
| | - Elisabetta Trabetti
- Department of Neurosciences, Biomedicine and Movement Sciences, Section of Biology and Genetics, University of Verona, Verona, Italy
| | - Gabriele Campos
- Centro de Pesquisas sobre o Genoma Humano e Células tronco, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
| | - Simona Cardaropoli
- Department of Public Health and Pediatrics, University of Torino, Turin, Italy
| | - Diana Carli
- Department of Public Health and Pediatrics, University of Torino, Turin, Italy
| | - Marcus C Y Chan
- Department of Pediatrics and Adolescent Medicine, Duchess of Kent Children's Hospital, The University of Hong Kong, Hong Kong Special Administrative Region, China
| | - Chiara Fallerini
- Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Medical Genetics, University of Siena, Siena, Italy
| | - Elisa Giorgio
- Department of Medical Sciences, University of Torino, Turin, Italy
| | - Ana Cristina Girardi
- Centro de Pesquisas sobre o Genoma Humano e Células tronco, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
| | - Emily Hansen-Kiss
- Department of Diagnostic and Biomedical Sciences, University of Texas Health Science Center at Houston, School of Dentistry, Houston, TX, USA
| | - So Lun Lee
- Department of Pediatrics and Adolescent Medicine, Duchess of Kent Children's Hospital, The University of Hong Kong, Hong Kong Special Administrative Region, China
| | - Carla Lintas
- Service for Neurodevelopmental Disorders, University Campus Bio-medico of Rome, Rome, Italy
| | - Yunin Ludena
- MIND (Medical Investigation of Neurodevelopmental Disorders) Institute, University of California Davis, Davis, CA, USA
| | - Rachel Nguyen
- Center for Autism Research and Translation, University of California Irvine, Irvine, CA, USA
| | - Lisa Pavinato
- Department of Medical Sciences, University of Torino, Turin, Italy
| | - Margaret Pericak-Vance
- The John P Hussman Institute for Human Genomics, The University of Miami Miller School of Medicine, Miami, FL, USA
| | - Isaac N Pessah
- MIND (Medical Investigation of Neurodevelopmental Disorders) Institute, University of California Davis, Davis, CA, USA
- Department of Molecular Biosciences, University of California Davis, School of Veterinary Medicine, Davis, CA, USA
| | - Rebecca J Schmidt
- MIND (Medical Investigation of Neurodevelopmental Disorders) Institute, University of California Davis, Davis, CA, USA
| | - Moyra Smith
- Center for Autism Research and Translation, University of California Irvine, Irvine, CA, USA
| | - Claudia I S Costa
- Centro de Pesquisas sobre o Genoma Humano e Células tronco, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
| | - Slavica Trajkova
- Department of Medical Sciences, University of Torino, Turin, Italy
| | - Jaqueline Y T Wang
- Centro de Pesquisas sobre o Genoma Humano e Células tronco, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
| | - Mullin H C Yu
- Department of Pediatrics and Adolescent Medicine, Duchess of Kent Children's Hospital, The University of Hong Kong, Hong Kong Special Administrative Region, China
| | - David J Cutler
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
| | - Silvia De Rubeis
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Joseph D Buxbaum
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Mark J Daly
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Harvard Medical School, Boston, MA, USA.
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.
| | - Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
| | - Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA.
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA.
| | - Stephan J Sanders
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA.
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA.
| |
|
60
|
Sarwal V, Niehus S, Ayyala R, Kim M, Sarkar A, Chang S, Lu A, Rajkumar N, Darci-Maher N, Littman R, Chhugani K, Soylev A, Comarova Z, Wesel E, Castellanos J, Chikka R, Distler MG, Eskin E, Flint J, Mangul S. A comprehensive benchmarking of WGS-based deletion structural variant callers. Brief Bioinform 2022; 23:bbac221. [PMID: 35753701 PMCID: PMC9294411 DOI: 10.1093/bib/bbac221] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/08/2021] [Revised: 04/30/2022] [Accepted: 05/11/2022] [Indexed: 01/10/2023] Open
Abstract
Advances in whole-genome sequencing (WGS) promise to enable the accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to the previous benchmarking studies, our gold standard dataset included a complete set of SVs allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
Affiliation(s)
- Varuni Sarwal
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Indian Institute of Technology Delhi, Hauz Khas, New Delhi, Delhi 110016, India
| | - Sebastian Niehus
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany
- Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
| | - Ram Ayyala
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Minyoung Kim
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089
| | - Aditya Sarkar
- School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Kamand, Mandi, Himachal Pradesh 175001, India
| | - Sei Chang
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Angela Lu
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Neha Rajkumar
- Department of Bioengineering, University of California Los Angeles, Los Angeles, CA, 90095
| | - Nicholas Darci-Maher
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Russell Littman
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Karishma Chhugani
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA 90089-9121
| | - Arda Soylev
- Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
| | - Zoia Comarova
- Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA, United States
| | - Emily Wesel
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Jacqueline Castellanos
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Rahul Chikka
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Margaret G Distler
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, 695 Charles E. Young Drive South, Box 708822, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, 73-235 CHS, Los Angeles, CA, 90095, USA
| | - Jonathan Flint
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90095, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA 90089-9121
| |
|
61
|
Hamdan A, Ewing A. Unravelling the tumour genome: The evolutionary and clinical impacts of structural variants in tumourigenesis. J Pathol 2022; 257:479-493. [PMID: 35355264 PMCID: PMC9321913 DOI: 10.1002/path.5901] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/21/2022] [Revised: 03/16/2022] [Accepted: 03/28/2022] [Indexed: 11/15/2022]
Abstract
Structural variants (SVs) represent a major source of aberration in tumour genomes. Given the diversity in the size and type of SVs present in tumours, the accurate detection and interpretation of SVs in tumours is challenging. New classes of complex structural events in tumours are discovered frequently, and the definitions of the genomic consequences of complex events are constantly being refined. Detailed analyses of short-read whole-genome sequencing (WGS) data from large tumour cohorts facilitate the interrogation of SVs at orders of magnitude greater scale and depth. However, the inherent technical limitations of short-read WGS prevent us from accurately detecting and investigating the impact of all the SVs present in tumours. The expanded use of long-read WGS will be critical for improving the accuracy of SV detection, and in fully resolving complex SV events, both of which are crucial for determining the impact of SVs on tumour progression and clinical outcome. Despite the present limitations, we demonstrate that SVs play an important role in tumourigenesis. In particular, SVs contribute significantly to late-stage tumour development and to intratumoural heterogeneity. The evolutionary trajectories of SVs represent a window into the clonal dynamics in tumours, a comprehensive understanding of which will be vital for influencing patient outcomes in the future. Recent findings have highlighted many clinical applications of SVs in cancer, from early detection to biomarkers for treatment response and prognosis. As the methods to detect and interpret SVs improve, elucidating the full breadth of the complex SV landscape and determining how these events modulate tumour evolution will improve our understanding of cancer biology and our ability to capitalise on the utility of SVs in the clinical management of cancer patients. © 2022 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
Affiliation(s)
- Alhafidz Hamdan
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Cancer Research UK Edinburgh Centre, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
| | - Ailith Ewing
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Cancer Research UK Edinburgh Centre, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
| |
|
62
|
Maldonado-Taipe N, Barbier F, Schmid K, Jung C, Emrani N. High-Density Mapping of Quantitative Trait Loci Controlling Agronomically Important Traits in Quinoa (Chenopodium quinoa Willd.). Front Plant Sci 2022; 13:916067. [PMID: 35812962 PMCID: PMC9261497 DOI: 10.3389/fpls.2022.916067] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/08/2022] [Accepted: 05/17/2022] [Indexed: 06/15/2023]
Abstract
Quinoa is a pseudocereal originating from the Andean regions. Despite quinoa's long cultivation history, genetic analysis of this crop is still in its infancy. We aimed to localize quantitative trait loci (QTL) contributing to the phenotypic variation of agronomically important traits. We crossed the Chilean accession PI-614889 and the Peruvian accession CHEN-109, which depicted significant differences in days to flowering, days to maturity, plant height, panicle length, thousand kernel weight (TKW), saponin content, and mildew susceptibility. We observed sizeable phenotypic variation across F2 plants and F3 families grown in the greenhouse and the field, respectively. We used Skim-seq to genotype the F2 population and constructed a high-density genetic map with 133,923 single nucleotide polymorphisms (SNPs). Fifteen QTL were found for ten traits. Two significant QTL, common in F2 and F3 generations, depicted pleiotropy for days to flowering, plant height, and TKW. The pleiotropic QTL harbored several putative candidate genes involved in photoperiod response and flowering time regulation. This study presents the first high-density genetic map of quinoa that incorporates QTL for several important agronomical traits. The pleiotropic loci can facilitate marker-assisted selection in quinoa breeding programs.
Affiliation(s)
| | - Federico Barbier
- Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Karl Schmid
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
| | - Christian Jung
- Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Nazgol Emrani
- Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany
| |
|
63
|
Lin X, Yang Y, Melton PE, Singh V, Simpson-Yap S, Burdon KP, Taylor BV, Zhou Y. Integrating Genetic Structural Variations and Whole-Genome Sequencing Into Clinical Neurology. Neurol Genet 2022. [DOI: 10.1212/nxg.0000000000200005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Advances in genome sequencing technologies have unlocked new possibilities in identifying disease-associated and causative genetic markers, which may in turn enhance disease diagnosis and improve prognostication and management strategies. With the capability of examining genetic variations ranging from single-nucleotide mutations to large structural variants, whole-genome sequencing (WGS) is an increasingly adopted approach to dissect the complex genetic architecture of neurologic diseases. There is emerging evidence for different structural variants and their roles in major neurologic and neurodevelopmental diseases. This review first describes different structural variants and their implicated roles in major neurologic and neurodevelopmental diseases, and then discusses the clinical relevance of WGS applications in neurology. Notably, WGS-based detection of structural variants has shown promising potential in enhancing diagnostic power of genetic tests in clinical settings. Ongoing WGS-based research in structural variations and quantifying mutational constraints can also yield clinical benefits by improving variant interpretation and disease diagnosis, while supporting biomarker discovery and therapeutic development. As a result, wider integration of WGS technologies into health care will likely increase diagnostic yields in difficult-to-diagnose conditions and define potential therapeutic targets or intervention points for genome-editing strategies.
|
64
|
Yang J, Chaisson MJP. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. Genome Biol 2022; 23:110. [PMID: 35524317 PMCID: PMC9077962 DOI: 10.1186/s13059-022-02666-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/29/2021] [Accepted: 03/30/2022] [Indexed: 01/30/2023] Open
Abstract
Variant benchmarking is often performed by comparing a test callset to a gold standard set of variants. In repetitive regions of the genome, it may be difficult to establish what is the truth for a call, for example, when different alignment scoring metrics provide equally supported but different variant calls on the same data. Here, we provide an alternative approach, TT-Mars, that takes advantage of the recent production of high-quality haplotype-resolved genome assemblies by providing false discovery rates for variant calls based on how well their call reflects the content of the assembly, rather than comparing calls themselves.
Affiliation(s)
- Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
| |
|
65
|
Danis D, Jacobsen JOB, Balachandran P, Zhu Q, Yilmaz F, Reese J, Haimel M, Lyon GJ, Helbig I, Mungall CJ, Beck CR, Lee C, Smedley D, Robinson PN. SvAnna: efficient and accurate pathogenicity prediction of coding and regulatory structural variants in long-read genome sequencing. Genome Med 2022; 14:44. [PMID: 35484572 PMCID: PMC9047340 DOI: 10.1186/s13073-022-01046-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 04/12/2022] [Indexed: 01/18/2023] Open
Abstract
Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to sequencing technology limitations. Long-read sequencing enables comprehensive detection of SVs, but approaches for prioritization of candidate SVs are needed. Structural variant Annotation and analysis (SvAnna) assesses all classes of SVs and their intersection with transcripts and regulatory sequences, relating predicted effects on gene function with clinical phenotype data. SvAnna places 87% of deleterious SVs in the top ten ranks. The interpretable prioritizations offered by SvAnna will facilitate the widespread adoption of long-read sequencing in diagnostic genomics. SvAnna is available at https://github.com/TheJacksonLaboratory/SvAnna.
Affiliation(s)
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - Julius O B Jacobsen
- William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
| | | | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - Justin Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Matthias Haimel
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
- St. Anna Children's Cancer Research Institute, Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Present address: Global Computational Biology and Digital Sciences, Boehringer Ingelheim Regional Center Vienna GmbH & Co KG, 1120, Vienna, Austria
| | - Gholson J Lyon
- Department of Human Genetics, New York State Institute for Basic Research in Developmental Disabilities, Staten Island, New York, USA
- Biology PhD Program, The Graduate Center, The City University of New York, New York, USA
| | - Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, 06032, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - Damian Smedley
- William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK.
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, 06032, USA.
| |
|
66
|
Austin-Tse CA, Jobanputra V, Perry DL, Bick D, Taft RJ, Venner E, Gibbs RA, Young T, Barnett S, Belmont JW, Boczek N, Chowdhury S, Ellsworth KA, Guha S, Kulkarni S, Marcou C, Meng L, Murdock DR, Rehman AU, Spiteri E, Thomas-Wilson A, Kearney HM, Rehm HL. Best practices for the interpretation and reporting of clinical whole genome sequencing. NPJ Genom Med 2022; 7:27. [PMID: 35395838 PMCID: PMC8993917 DOI: 10.1038/s41525-022-00295-z] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 08/04/2021] [Accepted: 02/17/2022] [Indexed: 01/19/2023] Open
Abstract
Whole genome sequencing (WGS) shows promise as a first-tier diagnostic test for patients with rare genetic disorders. However, standards addressing the definition and deployment practice of a best-in-class test are lacking. To address these gaps, the Medical Genome Initiative, a consortium of leading health care and research organizations in the US and Canada, was formed to expand access to high quality clinical WGS by convening experts and publishing best practices. Here, we present best practice recommendations for the interpretation and reporting of clinical diagnostic WGS, including discussion of challenges and emerging approaches that will be critical to harness the full potential of this comprehensive test.
Affiliation(s)
- Christina A Austin-Tse
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Laboratory for Molecular Medicine, Mass General Brigham Personalized Medicine, Cambridge, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Vaidehi Jobanputra
- Molecular Diagnostics Laboratory, New York Genome Center, New York, NY, USA
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
| | | | - David Bick
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | | | - Eric Venner
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Ted Young
- Genome Diagnostics, Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, ON, Canada
| | - Sarah Barnett
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | | | - Nicole Boczek
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Center for Individualized Medicine, College of Medicine, Mayo Clinic, Rochester, MN, USA
| | - Shimul Chowdhury
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
| | | | - Saurav Guha
- Molecular Diagnostics Laboratory, New York Genome Center, New York, NY, USA
| | - Shashikant Kulkarni
- Baylor Genetics and Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Cherisse Marcou
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - Linyan Meng
- Baylor Genetics and Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - David R Murdock
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Atteeq U Rehman
- Molecular Diagnostics Laboratory, New York Genome Center, New York, NY, USA
| | - Elizabeth Spiteri
- Department of Pathology, Stanford Medicine, Stanford University, Stanford, CA, USA
| | | | - Hutton M Kearney
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| |
|
67
|
Ebler J, Ebert P, Clarke WE, Rausch T, Audano PA, Houwaart T, Mao Y, Korbel JO, Eichler EE, Zody MC, Dilthey AT, Marschall T. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat Genet 2022; 54:518-525. [PMID: 35410384 PMCID: PMC9005351 DOI: 10.1038/s41588-022-01043-w] [Citation(s) in RCA: 101] [Impact Index Per Article: 33.7] [Received: 02/15/2021] [Accepted: 03/03/2022] [Indexed: 12/30/2022]
Abstract
Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
Affiliation(s)
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | | | - Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- European Molecular Biology Laboratory, GeneCore, Heidelberg, Germany
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | | | - Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
|
68
|
Schuler BA, Nelson ET, Koziura M, Cogan JD, Hamid R, Phillips JA. Lessons learned: next-generation sequencing applied to undiagnosed genetic diseases. J Clin Invest 2022; 132:e154942. [PMID: 35362483 PMCID: PMC8970663 DOI: 10.1172/jci154942] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/12/2022]
Abstract
Rare genetic disorders, when considered together, are relatively common. Despite advancements in genetics and genomics technologies as well as increased understanding of genomic function and dysfunction, many genetic diseases continue to be difficult to diagnose. The goal of this Review is to increase the familiarity of genetic testing strategies for non-genetics providers. As genetic testing is increasingly used in primary care, many subspecialty clinics, and various inpatient settings, it is important that non-genetics providers have a fundamental understanding of the strengths and weaknesses of various genetic testing strategies as well as develop an ability to interpret genetic testing results. We provide background on commonly used genetic testing approaches, give examples of phenotypes in which the various genetic testing approaches are used, describe types of genetic and genomic variations, cover challenges in variant identification, provide examples in which next-generation sequencing (NGS) failed to uncover the variant responsible for a disease, and discuss opportunities for continued improvement in the application of NGS clinically. As genetic testing becomes increasingly a part of all areas of medicine, familiarity with genetic testing approaches and result interpretation is vital to decrease the burden of undiagnosed disease.
Affiliation(s)
- Bryce A. Schuler
- Division of Medical Genetics and Genomics and Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Erica T. Nelson
- Division of Medical Genetics and Genomics and Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Mary Koziura
- Division of Medical Genetics and Genomics and Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Joy D. Cogan
- Division of Medical Genetics and Genomics and Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Rizwan Hamid
- Division of Medical Genetics and Genomics and Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - John A. Phillips
- Division of Medical Genetics and Genomics and Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
69
|
Gardner EJ, Sifrim A, Lindsay SJ, Prigmore E, Rajan D, Danecek P, Gallone G, Eberhardt RY, Martin HC, Wright CF, FitzPatrick DR, Firth HV, Hurles ME. Detecting cryptic clinically relevant structural variation in exome-sequencing data increases diagnostic yield for developmental disorders. Am J Hum Genet 2021; 108:2186-2194. [PMID: 34626536 PMCID: PMC8595893 DOI: 10.1016/j.ajhg.2021.09.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/08/2020] [Accepted: 09/15/2021] [Indexed: 11/29/2022]
Abstract
Structural variation (SV) describes a broad class of genetic variation greater than 50 bp in size. SVs can cause a wide range of genetic diseases and are prevalent in rare developmental disorders (DDs). Individuals presenting with DDs are often referred for diagnostic testing with chromosomal microarrays (CMAs) to identify large copy-number variants (CNVs) and/or with single-gene, gene-panel, or exome sequencing (ES) to identify single-nucleotide variants, small insertions/deletions, and CNVs. However, individuals with pathogenic SVs undetectable by conventional analysis often remain undiagnosed. Consequently, we have developed the tool InDelible, which interrogates short-read sequencing data for split-read clusters characteristic of SV breakpoints. We applied InDelible to 13,438 probands with severe DDs recruited as part of the Deciphering Developmental Disorders (DDD) study and discovered 63 rare, damaging variants in genes previously associated with DDs missed by standard SNV, indel, or CNV discovery approaches. Clinical review of these 63 variants determined that about half (30/63) were plausibly pathogenic. InDelible was particularly effective at ascertaining variants between 21 and 500 bp in size and increased the total number of potentially pathogenic variants identified by DDD in this size range by 42.9%. Of particular interest were seven confirmed de novo variants in MECP2, which represent 35.0% of all de novo protein-truncating variants in MECP2 among DDD study participants. InDelible provides a framework for the discovery of pathogenic SVs that are most likely missed by standard analytical workflows and has the potential to improve the diagnostic yield of ES across a broad range of genetic diseases.
Affiliation(s)
- Eugene J Gardner
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK
| | - Alejandro Sifrim
- Department of Human Genetics, KU Leuven, Herestraat 49, Box 602, Leuven 3000, Belgium
| | - Sarah J Lindsay
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK
| | - Elena Prigmore
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK
| | - Diana Rajan
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK
| | - Petr Danecek
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK
| | - Giuseppe Gallone
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK
| | - Ruth Y Eberhardt
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK
| | - Hilary C Martin
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK
| | - Caroline F Wright
- University of Exeter Medical School, Institute of Biomedical and Clinical Science, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - David R FitzPatrick
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, WGH, Edinburgh EH4 2SP, UK
| | - Helen V Firth
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK; East Anglian Medical Genetics Service, Box 134, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
| | - Matthew E Hurles
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton CB10 1SA, UK.
|
70
|
Yan SM, Sherman RM, Taylor DJ, Nair DR, Bortvin AN, Schatz MC, McCoy RC. Local adaptation and archaic introgression shape global diversity at human structural variant loci. eLife 2021; 10:e67615. [PMID: 34528508 PMCID: PMC8492059 DOI: 10.7554/elife.67615] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 02/18/2021] [Accepted: 09/14/2021] [Indexed: 12/13/2022]
Abstract
Large genomic insertions and deletions are a potent source of functional variation, but are challenging to resolve with short-read sequencing, limiting knowledge of the role of such structural variants (SVs) in human evolution. Here, we used a graph-based method to genotype long-read-discovered SVs in short-read data from diverse human genomes. We then applied an admixture-aware method to identify 220 SVs exhibiting extreme patterns of frequency differentiation - a signature of local adaptation. The top two variants traced to the immunoglobulin heavy chain locus, tagging a haplotype that swept to near fixation in certain southeast Asian populations, but is rare in other global populations. Further investigation revealed evidence that the haplotype traces to gene flow from Neanderthals, corroborating the role of immune-related genes as prominent targets of adaptive introgression. Our study demonstrates how recent technical advances can help resolve signatures of key evolutionary events that remained obscured within technically challenging regions of the genome.
Affiliation(s)
- Stephanie M Yan
- Department of Biology, Johns Hopkins University, Baltimore, United States
| | - Rachel M Sherman
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, United States
| | - Divya R Nair
- Department of Biology, Johns Hopkins University, Baltimore, United States
| | - Andrew N Bortvin
- Department of Biology, Johns Hopkins University, Baltimore, United States
| | - Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, United States
|
71
|
Trost B, Loureiro LO, Scherer SW. Discovery of genomic variation across a generation. Hum Mol Genet 2021; 30:R174-R186. [PMID: 34296264 PMCID: PMC8490016 DOI: 10.1093/hmg/ddab209] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/28/2021] [Revised: 07/09/2021] [Accepted: 07/19/2021] [Indexed: 11/12/2022]
Abstract
Over the past 30 years (the timespan of a generation), advances in genomics technologies have revealed tremendous and unexpected variation in the human genome and have provided increasingly accurate answers to long-standing questions of how much genetic variation exists in human populations and to what degree the DNA complement changes between parents and offspring. Tracking the characteristics of these inherited and spontaneous (or de novo) variations has been the basis of the study of human genetic disease. From genome-wide microarray and next-generation sequencing scans, we now know that each human genome contains over 3 million single nucleotide variants when compared with the ~3 billion base pairs in the human reference genome, along with roughly an order of magnitude more DNA, approximately 30 megabase pairs (Mb), being ‘structurally variable’, mostly in the form of indels and copy number changes. Additional large-scale variations include balanced inversions (average of 18 Mb) and complex, difficult-to-resolve alterations. Collectively, ~1% of an individual’s genome will differ from the human reference sequence. When comparing across a generation, fewer than 100 new genetic variants are typically detected in the euchromatic portion of a child’s genome. Driven by increasingly higher-resolution and higher-throughput sequencing technologies, newer and more accurate databases of genetic variation (for instance, more comprehensive structural variation data and phasing of combinations of variants along chromosomes) of worldwide populations will emerge to underpin the next era of discovery in human molecular genetics.
Affiliation(s)
- Brett Trost
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Livia O Loureiro
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Stephen W Scherer
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- McLaughlin Centre and Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
|