1
Collins RL, Talkowski ME. Diversity and consequences of structural variation in the human genome. Nat Rev Genet 2025:10.1038/s41576-024-00808-9. [PMID: 39838028 DOI: 10.1038/s41576-024-00808-9] [Accepted: 11/26/2024] [Indexed: 01/23/2025]
Abstract
The biomedical community is increasingly invested in capturing all genetic variants across human genomes, interpreting their functional consequences and translating these findings to the clinic. A crucial component of this endeavour is the discovery and characterization of structural variants (SVs), which are ubiquitous in the human population, heterogeneous in their mutational processes, key substrates for evolution and adaptation, and profound drivers of human disease. The recent emergence of new technologies and the remarkable scale of sequence-based population studies have begun to crystalize our understanding of SVs as a mutational class and their widespread influence across phenotypes. In this Review, we summarize recent discoveries and new insights into SVs in the human genome in terms of their mutational patterns, population genetics, functional consequences, and impact on human traits and disease. We conclude by outlining three frontiers to be explored by the field over the next decade.
Affiliation(s)
- Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
2
Gillani R, Collins RL, Crowdis J, Garza A, Jones JK, Walker M, Sanchis-Juan A, Whelan CW, Pierce-Hoffman E, Talkowski ME, Brand H, Haigis K, LoPiccolo J, AlDubayan SH, Gusev A, Crompton BD, Janeway KA, Van Allen EM. Rare germline structural variants increase risk for pediatric solid tumors. Science 2025; 387:eadq0071. [PMID: 39745975 DOI: 10.1126/science.adq0071] [Received: 04/24/2024] [Accepted: 10/25/2024] [Indexed: 01/04/2025]
Abstract
Pediatric solid tumors are a leading cause of childhood disease mortality. In this work, we examined germline structural variants (SVs) as risk factors for pediatric extracranial solid tumors using germline genome sequencing of 1765 affected children, their 943 unaffected parents, and 6665 adult controls. We discovered a sex-biased association between very large (>1 megabase) germline chromosomal abnormalities and increased risk of solid tumors in male children. The overall impact of germline SVs was greatest in neuroblastoma, where we uncovered burdens of ultrarare SVs that cause loss of function of highly expressed, mutationally constrained genes, as well as noncoding SVs predicted to disrupt chromatin domain boundaries. Collectively, we estimate that rare germline SVs explain 1.1 to 5.6% of pediatric cancer liability, establishing them as an important component of disease predisposition.
Affiliation(s)
- Riaz Gillani
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Boston Children's Hospital, Boston, MA, USA
- Ryan L Collins
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Amanda Garza
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Jill K Jones
- Harvard Medical School, Boston, MA, USA
- Boston Children's Hospital, Boston, MA, USA
- Harvard/MIT MD-PhD Program, Harvard Medical School, Boston, MA, USA
- Mark Walker
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alba Sanchis-Juan
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Christopher W Whelan
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Emma Pierce-Hoffman
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael E Talkowski
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Harrison Brand
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Kevin Haigis
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Jaclyn LoPiccolo
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Saud H AlDubayan
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
- College of Medicine, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Alexander Gusev
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Brian D Crompton
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Boston Children's Hospital, Boston, MA, USA
- Katherine A Janeway
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Boston Children's Hospital, Boston, MA, USA
- Eliezer M Van Allen
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, MA, USA
3
Paschal CR, Zalusky MPG, Beck AE, Gillentine MA, Narayanan J, Damaraju N, Goffena J, Storz SHR, Miller DE. Concordance of Whole-Genome Long-Read Sequencing with Standard Clinical Testing for Prader-Willi and Angelman Syndromes. J Mol Diagn 2025:S1525-1578(25)00001-7. [PMID: 39756651 DOI: 10.1016/j.jmoldx.2024.12.003] [Received: 04/02/2024] [Revised: 09/27/2024] [Accepted: 12/05/2024] [Indexed: 01/07/2025]
Abstract
Current clinical testing approaches for individuals with suspected imprinting disorders are complex, often requiring multiple tests performed in a stepwise manner to make a precise molecular diagnosis. We investigated whether whole-genome long-read sequencing could be used as a single data source to simultaneously evaluate copy number variants, single-nucleotide variants, structural variants, and differences in methylation in a cohort of individuals known to have either Prader-Willi or Angelman syndrome. Twenty-five individuals sequenced to an average depth of coverage of 36× on an Oxford Nanopore Technologies PromethION were evaluated. A custom one-page report was generated that could be used to assess copy number, single-nucleotide variants, and methylation patterns at select CpG sites within the 15q11.2-q13.1 region and prioritize candidate pathogenic variants in UBE3A. After training with three positive controls, three analysts blinded to the known clinical diagnosis arrived at the correct molecular diagnosis for 22 of 22 cases (20 true positive, 2 negative controls). Our findings demonstrate the utility of long-read sequencing as a single, comprehensive data source for complex clinical testing, offering potential benefits, such as reduced testing costs, increased diagnostic yield, and shorter turnaround times, in the clinical laboratory.
Affiliation(s)
- Cate R Paschal
- Department of Laboratories, Seattle Children's Hospital, Seattle, Washington; Department of Laboratory Medicine and Pathology, University of Washington and Seattle Children's Hospital, Seattle, Washington
- Miranda P G Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, Washington
- Anita E Beck
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, Washington
- Jaya Narayanan
- Department of Laboratories, Seattle Children's Hospital, Seattle, Washington
- Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, Washington; Institute for Public Health Genetics, University of Washington, Seattle, Washington
- Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, Washington
- Sophie H R Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, Washington
- Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington and Seattle Children's Hospital, Seattle, Washington; Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, Washington; Department of Genome Sciences, University of Washington, Seattle, Washington; Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, Washington.
4
Secomandi S, Gallo GR, Rossi R, Rodríguez Fernandes C, Jarvis ED, Bonisoli-Alquati A, Gianfranceschi L, Formenti G. Pangenome graphs and their applications in biodiversity genomics. Nat Genet 2025; 57:13-26. [PMID: 39779953 DOI: 10.1038/s41588-024-02029-6] [Received: 05/23/2024] [Accepted: 11/08/2024] [Indexed: 01/11/2025]
Abstract
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
Affiliation(s)
- Simona Secomandi
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- Riccardo Rossi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Carlos Rodríguez Fernandes
- Centre for Ecology, Evolution and Environmental Changes (CE3C) and CHANGE, Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- Erich D Jarvis
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- The Vertebrate Genome Laboratory, New York, NY, USA
- Andrea Bonisoli-Alquati
- Department of Biological Sciences, California State Polytechnic University, Pomona, Pomona, CA, USA
5
Climent-Cantó P, Subirana-Granés M, Ramos-Rodríguez M, Dámaso E, Marín F, Vara C, Pérez-González B, Raurell H, Munté E, Soto JL, Alonso Á, Shin G, Ji H, Hitchins M, Capellá G, Pasquali L, Pineda M. Altered chromatin landscape and 3D interactions associated with primary constitutional MLH1 epimutations. Clin Epigenetics 2024; 16:193. [PMID: 39741348 DOI: 10.1186/s13148-024-01770-3] [Received: 07/19/2024] [Accepted: 10/30/2024] [Indexed: 01/02/2025]
Abstract
BACKGROUND: Lynch syndrome (LS), characterised by an increased risk for cancer, is mainly caused by germline pathogenic variants affecting a mismatch repair gene (MLH1, MSH2, MSH6, PMS2). Occasionally, LS may be caused by constitutional MLH1 epimutation (CME) characterised by soma-wide methylation of one allele of the MLH1 promoter. Most of these are "primary" epimutations, arising de novo without any apparent underlying cis-genetic cause, and are reversible between generations. We aimed to characterise genetic and gene regulatory changes associated with primary CME to elucidate possible underlying molecular mechanisms.
METHODS: Four carriers of a primary CME and three non-methylated relatives carrying the same genetic haplotype were included. Genetic alterations were sought using linked-read WGS in blood DNA. Transcriptome (RNA-seq), chromatin landscape (ATAC-seq, H3K27ac CUT&Tag) and 3D chromatin interactions (UMI-4C) were studied in lymphoblastoid cell lines. The MLH1 promoter SNP (c.-93G>A, rs1800734) was used as a reporter in heterozygotes to assess allele-specific chromatin conformation states.
RESULTS: MLH1 epimutant alleles presented a closed chromatin conformation and decreased levels of H3K27ac, as compared to the unmethylated allele. Moreover, the epimutant MLH1 promoter exhibited differential 3D chromatin contacts, including lost and gained interactions with distal regulatory elements. Of note, rare genetic alterations potentially affecting transcription factor binding sites were found in the promoter-contacting region of CME carriers.
CONCLUSIONS: Primary CMEs present allele-specific differential interaction patterns with neighbouring genes and regulatory elements. The role of the identified cis-regulatory regions in the molecular mechanism underlying the origin and maintenance of CME requires further investigation.
Affiliation(s)
- Paula Climent-Cantó
- Hereditary Cancer Group, ONCOBELL Program, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Spain
- Hereditary Cancer Program, Institut Català d'Oncologia (ICO), L'Hospitalet de Llobregat, Spain
- Marc Subirana-Granés
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Mireia Ramos-Rodríguez
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Estela Dámaso
- Hereditary Cancer Group, ONCOBELL Program, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Spain
- Molecular Genetics Laboratory, Foundation for the Promotion of Health and Biomedical Research of Valencia Region (FISABIO), University Hospital of Elche, 03203, Elche, Alicante, Spain
- Hereditary Cancer Program, Institut Català d'Oncologia (ICO), L'Hospitalet de Llobregat, Spain
- Fátima Marín
- Hereditary Cancer Group, ONCOBELL Program, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Spain
- Ciber Oncología (CIBERONC), Instituto Salud Carlos III, Madrid, Spain
- Hereditary Cancer Program, Institut Català d'Oncologia (ICO), L'Hospitalet de Llobregat, Spain
- Covadonga Vara
- Hereditary Cancer Group, ONCOBELL Program, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Spain
- Hereditary Cancer Program, Institut Català d'Oncologia (ICO), L'Hospitalet de Llobregat, Spain
- Beatriz Pérez-González
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Helena Raurell
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Elisabet Munté
- Hereditary Cancer Group, ONCOBELL Program, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Spain
- Hereditary Cancer Program, Institut Català d'Oncologia (ICO), L'Hospitalet de Llobregat, Spain
- José Luis Soto
- Molecular Genetics Laboratory, Foundation for the Promotion of Health and Biomedical Research of Valencia Region (FISABIO), University Hospital of Elche, 03203, Elche, Alicante, Spain
- Ángel Alonso
- Genomics Medicine Unit, Navarrabiomed, Hospital Universitario de Navarra (HUN), Universidad Pública de Navarra (UPNA), IdiSNA, 31008, Pamplona, Spain
- GiWon Shin
- Department of Medicine (Oncology), Stanford Cancer Institute, Stanford University, Stanford, CA, 94305, USA
- Hanlee Ji
- Department of Medicine (Oncology), Stanford Cancer Institute, Stanford University, Stanford, CA, 94305, USA
- Megan Hitchins
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Gabriel Capellá
- Hereditary Cancer Group, ONCOBELL Program, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Spain.
- Ciber Oncología (CIBERONC), Instituto Salud Carlos III, Madrid, Spain.
- Hereditary Cancer Program, Institut Català d'Oncologia (ICO), L'Hospitalet de Llobregat, Spain.
- Lorenzo Pasquali
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain.
- Marta Pineda
- Hereditary Cancer Group, ONCOBELL Program, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Spain.
- Ciber Oncología (CIBERONC), Instituto Salud Carlos III, Madrid, Spain.
- Hereditary Cancer Program, Institut Català d'Oncologia (ICO), L'Hospitalet de Llobregat, Spain.
6
Zhou W, Mumm C, Gan Y, Switzenberg JA, Wang J, De Oliveira P, Kathuria K, Losh SJ, McDonald TL, Bessell B, Van Deynze K, McConnell MJ, Boyle AP, Mills RE. A personalized multi-platform assessment of somatic mosaicism in the human frontal cortex. bioRxiv 2024:2024.12.18.629274. [PMID: 39763954 PMCID: PMC11702624 DOI: 10.1101/2024.12.18.629274] [Indexed: 01/16/2025]
Abstract
Somatic mutations in individual cells lead to genomic mosaicism, contributing to the intricate regulatory landscape of genetic disorders and cancers. To evaluate and refine the detection of somatic mosaicism across different technologies with personalized donor-specific assembly (DSA), we obtained tissue from the dorsolateral prefrontal cortex (DLPFC) of a post-mortem neurotypical 31-year-old individual. We sequenced bulk DLPFC tissue using Oxford Nanopore Technologies (~60X), NovaSeq (~30X), and linked-read sequencing (~28X). Additionally, we applied Cas9 capture methodology coupled with long-read sequencing (TEnCATS), targeting active transposable elements. We also isolated and amplified DNA from flow-sorted single DLPFC neurons using MALBAC, sequencing 115 of these MALBAC libraries on Nanopore and 94 on NovaSeq. We constructed a haplotype-resolved assembly with a total length of 5.77 Gb and a phase block length of 2.67 Mb (N50) to facilitate cross-platform analysis of somatic genetic variations. We observed an increase in the phasing rate from 11.6% to 38.0% between short-read and long-read technologies. By generating a catalog of phased germline SNVs, CNVs, and TEs from the assembled genome, we applied standard approaches to recall these variants across sequencing technologies. We achieved aggregated recall rates from 97.3% to 99.4% based on long-read bulk tissue data, setting an upper bound for detection limits. Moreover, utilizing haplotype-based analysis from DSA, we achieved a remarkable reduction in false positive somatic calls in bulk tissue, ranging from 14.9% to 72.4%. We developed pipelines leveraging DSA information to enhance somatic large genetic variant calling in long-read single cells. By examining somatic variation using long-reads in 115 individual neurons, we identified 468 candidate somatic heterozygous large deletions (1.5Mb - 20Mb), 137 of which intersected with short-read single-cell data. 
Additionally, we identified 61 putative somatic TEs (60 Alus, one LINE-1) in the single-cell data. Collectively, our analysis spans personalized assembly to single-cell somatic variant calling, providing a comprehensive ab initio ad finem approach and resource in real human tissue.
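The abstract above summarizes assembly contiguity as a phase block N50. As a rough illustration of that statistic only, here is a minimal Python sketch; the function name `n50` and the toy block lengths are assumptions made for this example and are not taken from the preprint or its pipelines.

```python
def n50(lengths):
    """Return the N50: the length L such that blocks of length >= L
    together cover at least half of the summed length."""
    if not lengths:
        raise ValueError("need at least one block length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy phase block lengths in Mb (hypothetical values, not study data).
toy_blocks_mb = [4.0, 3.0, 2.0, 1.0]
print(n50(toy_blocks_mb))
```

Because N50 is weighted by length, a few long phase blocks can dominate the value even when many short blocks are present.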
Affiliation(s)
- Weichen Zhou
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Camille Mumm
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Yanming Gan
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jessica A. Switzenberg
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jinhao Wang
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Kunal Kathuria
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Steven J. Losh
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Torrin L. McDonald
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Brandt Bessell
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Kinsey Van Deynze
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Alan P. Boyle
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Ryan E. Mills
- Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
7
Mostovoy Y, Boone PM, Huang Y, Garimella KV, Tan KT, Russell BE, Salani M, de Esch CEF, Lemanski J, Curall B, Hauenstein J, Lucente D, Bowers T, DeSmet T, Gabriel S, Morton CC, Meyerson M, Hastie AR, Gusella J, Quintero-Rivera F, Brand H, Talkowski ME. Resolution of ring chromosomes, Robertsonian translocations, and complex structural variants from long-read sequencing and telomere-to-telomere assembly. Am J Hum Genet 2024; 111:2693-2706. [PMID: 39520989 PMCID: PMC11639088 DOI: 10.1016/j.ajhg.2024.10.006] [Received: 09/07/2023] [Revised: 10/04/2024] [Accepted: 10/09/2024] [Indexed: 11/16/2024]
Abstract
Delineation of structural variants (SVs) at sequence resolution in highly repetitive genomic regions has long been intractable. The sequence properties, origins, and functional effects of classes of genomic rearrangements such as ring chromosomes and Robertsonian translocations thus remain unknown. To resolve these complex structures, we leveraged several recent milestones in the field, including (1) the emergence of long-read sequencing, (2) the gapless telomere-to-telomere (T2T) assembly, and (3) a tool (BigClipper) to discover chromosomal rearrangements from long reads. We applied these technologies across 13 cases with ring chromosomes, Robertsonian translocations, and complex SVs that were unresolved by short reads, followed by validation using optical genome mapping (OGM). Our analyses resolved 10 of 13 cases, including a Robertsonian translocation and all ring chromosomes. Multiple breakpoints were localized to genomic regions previously recalcitrant to sequencing such as acrocentric p-arms, ribosomal DNA arrays, and telomeric repeats, and involved complex structures such as a deletion-inversion and interchromosomal dispersed duplications. We further performed methylation profiling from long-read data to discover phased differential methylation in a gene promoter proximal to a ring fusion, suggesting a long-range position effect (LRPE) with heterochromatin spreading. Breakpoint sequences suggested mechanisms of SV formation such as microhomology-mediated and non-homologous end-joining, as well as non-allelic homologous recombination. These methods provide some of the first glimpses into the sequence resolution of Robertsonian translocations and illuminate the structural diversity of ring chromosomes and complex chromosomal rearrangements with implications for genome biology, prediction of LRPEs from integrated multi-omics technologies, and molecular diagnostics in rare disease cases.
Affiliation(s)
- Yulia Mostovoy
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA
- Philip M Boone
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
- Yongqing Huang
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kiran V Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kar-Tong Tan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Bianca E Russell
- Division of Genetics, Department of Pediatrics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Monica Salani
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Celine E F de Esch
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA
- John Lemanski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Benjamin Curall
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA
- Diane Lucente
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Tera Bowers
- Genomics Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Tim DeSmet
- Genomics Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stacey Gabriel
- Genomics Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Cynthia C Morton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Departments of Obstetrics and Gynecology and of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Manchester Center for Audiology and Deafness, School of Health Sciences, University of Manchester, Manchester M13 9PL, UK
- Matthew Meyerson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- James Gusella
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Fabiola Quintero-Rivera
- Departments of Pathology, Laboratory Medicine, and Pediatrics, Division of Genetic and Genomic Medicine, University of California, Irvine, Irvine, CA 92697, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA; Pediatric Surgery Research Laboratory, Department of Pediatrics, Boston, MA 02114, USA
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| |
|
8
|
Devadoss Gandhi G, Aliyev E, Syed N, Vempalli FR, Saad C, Mbarek H, Al-Saei O, Al-Maraghi A, Abdi M, Krishnamoorthy N, Badii R, Akil AA, Ben-Omran T, Fakhro KA. Mapping the genetic landscape of treatable inherited metabolic disorders in a large Middle Eastern biobank. Genet Med 2024; 26:101268. [PMID: 39286960 DOI: 10.1016/j.gim.2024.101268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 09/10/2024] [Accepted: 09/10/2024] [Indexed: 09/19/2024] Open
Abstract
PURPOSE To date, approximately 1400 inherited metabolic disorders (IMDs) have been described, some of which are treatable. It is estimated that 2% to 3% of live births worldwide are affected by treatable IMDs. Roughly 80% of IMDs are autosomal recessive, leading to a potentially higher incidence in regions with high consanguinity. METHODOLOGY The study utilized genome sequencing data from 14,060 adult Qatari participants who were recruited by the Qatar Biobank and sequenced by the Qatar Genome Program. The genome sequencing data were analyzed for 125 nuclear genes known to be associated with 115 treatable IMDs. RESULTS Our study identified 253 pathogenic/likely pathogenic single-nucleotide variations associated with 69 treatable IMDs, including 211 known and 42 novel predicted loss-of-function variants. We estimated that approximately 1 in 13 unrelated individuals (8%) carry a heterozygous pathogenic variant for at least 1 of 46 treatable IMDs. Notably, phenylketonuria/hyperphenylalaninemia and homocystinuria had among the highest carrier frequencies (1 in 68 and 1 in 85, respectively). CONCLUSION Population-based studies of treatable IMDs, particularly in globally under-studied populations, can identify high-frequency alleles segregating in the community and inform public health policies, including carrier and newborn screening.
Affiliation(s)
| | - Elbay Aliyev
- Human Genetics Department, Sidra Medicine, Doha, Qatar
| | - Najeeb Syed
- Human Genetics Department, Sidra Medicine, Doha, Qatar
| | | | - Chadi Saad
- Qatar Genome Program, Qatar Foundation Research Development and Innovation, Doha, Qatar
| | - Hamdi Mbarek
- Qatar Genome Program, Qatar Foundation Research Development and Innovation, Doha, Qatar
| | | | | | - Mona Abdi
- Human Genetics Department, Sidra Medicine, Doha, Qatar
| | | | - Ramin Badii
- Molecular Genetics Laboratory, Hamad Medical Corporation, Doha, Qatar
| | - Ammira A Akil
- Genetics and Metabolic Clinical Research Program, Translational Medicine, Research Department, Sidra Medicine, Doha, Qatar
| | - Tawfeg Ben-Omran
- Division of Genetic & Genomics Medicine, Sidra Medicine, Doha, Qatar; Department of Medical Genetics, Hamad Medical Corporation, Doha, Qatar; Department of Pediatric, Weill Cornell Medical College, Doha, Qatar
| | - Khalid A Fakhro
- Human Genetics Department, Sidra Medicine, Doha, Qatar; College of Health and Life Sciences, Hamad Bin Khalifa University (HBKU), Doha, Qatar; Department of Genetic Medicine, Weill Cornell Medicine, Qatar (WCM-Q).
| |
|
9
|
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson ZB, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Res 2024; 34:2061-2073. [PMID: 39358015 DOI: 10.1101/gr.279273.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 09/16/2024] [Indexed: 10/04/2024]
Abstract
Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
Affiliation(s)
- Jonas A Gustafson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
| | - Sophia B Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
- Institute for Public Health Genetics, University of Washington, Seattle, Washington 98195, USA
| | - Miranda P G Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - David Twesigomwe
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg 2193, South Africa
| | - Lei Yang
- Pacific Northwest Research Institute, Seattle, Washington 98122, USA
| | - Anthony A Snead
- Department of Biology, New York University, New York, New York 10003, USA
| | | | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp 2650, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp 2000, Belgium
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Human Technopole, Milan 20157, Italy
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Angela L Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Zachary B Anderson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Sophie H R Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Sydney A Ward
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Maisha Sinha
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Claudia Gonzaga-Jauregui
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Mexico City 76230, Mexico
| | - Wayne E Clarke
- New York Genome Center, New York, New York 10013, USA
- Outlier Informatics Inc., Saskatoon, Saskatchewan S7H 1L4, Canada
| | - Anna O Basile
- New York Genome Center, New York, New York 10013, USA
| | - André Corvelo
- New York Genome Center, New York, New York 10013, USA
| | | | | | | | - Mahler Revsine
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Karynne E Patterson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Cate R Paschal
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington 98195, USA
- Department of Laboratories, Seattle Children's Hospital, Seattle, Washington 98195, USA
| | - Christina Zakarian
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Tanner D Jensen
- Department of Genetics, Stanford University, Stanford, California 94305, USA
| | - Esther Robb
- Department of Computer Science, Stanford University, Stanford, California 94305, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Computer Science, Rice University, Houston, Texas 77251, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | | | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
| | - Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, Maryland 20892, USA
| | | | - Richard N McLaughlin
- Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
- Pacific Northwest Research Institute, Seattle, Washington 98122, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| | - Michael C Zody
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Mexico City 76230, Mexico
| | - Matt Loose
- Deep Seq, School of Life Sciences, University of Nottingham, Nottingham NG7 2TQ, UK
| | - Miten Jain
- Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, USA
- Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, Washington 98195, USA
| |
|
10
|
Chang L, Niu X, Huang S, Song D, Ran X, Wang J. Detection of structural variants linked to mutton flavor and odor in two closely related black goat breeds. BMC Genomics 2024; 25:979. [PMID: 39425017 PMCID: PMC11490145 DOI: 10.1186/s12864-024-10874-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Accepted: 10/08/2024] [Indexed: 10/21/2024] Open
Abstract
BACKGROUND Mutton quality is closely related to genetic variants and gene expression alterations during growth and development, resulting in differences in nutritional values, flavor, and odor. RESULTS We first evaluated and compared the composition of crude protein, crude fat, cholesterol, amino acid (AA), and fatty acid (FA) in the longissimus dorsi muscle of Guizhou black goats (GZB, n = 5) and Yunshang black goats (YBG, n = 6). The contents of cholesterol and FA related to odor in GZB were significantly lower than that in YBG, while the concentrations of umami amino acids and intramuscular fat were significantly higher in GZB. Furthermore, structural variants (SVs) in the genomes of GZB (n = 30) and YBG (n = 11) were explored. It was found that some regions in Chr 10/12/18 were densely involved with a large number of SVs in the genomes of GZB and YBG. By setting FST ≥ 0.25, we got 837 stratified SVs, of which 25 SVs (involved in 12 genes, e.g., CORO1A, CLIC6, PCSK2, and TMEM9) were limited in GZB. Functional enrichment analysis of 14 protein-coding genes (e.g., ENPEP, LIPC, ABCA5, and SLC6A15) revealed multiple terms and pathways related with metabolisms of AA, FA, and cholesterol. The SVs (n = 10) obtained by the whole genome resequencing were confirmed in percentages of 36.67 to 86.67% (n = 96) by PCR method. The SVa and SVd polymorphisms indicated a moderate negative correlation with HMGCS1 activity (n = 17). CONCLUSION This study is the first to comprehensively reveal potential SVs related to mutton nutritional values, flavor, and odor based on genomic compare between two black goat breeds with closely genetic relationship. The SVs generated in this study provide a data resource for deeper studies to understand the genomic characteristics and possible evolutionary outcomes with better nutritional values, flavor and extremely light odor.
Affiliation(s)
- Lingle Chang
- College of Animal Science, Key Laboratory of Animal Genetics, Breeding and Reproduction in the Plateau Mountainous Region (Ministry of Education), Guizhou University, Guiyang, 550025, China
| | - Xi Niu
- Institute of Agro-Bioengineering/Key Laboratory of Plant Resource Conservative and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, Guizhou University, Guiyang, 550025, China
| | - Shihui Huang
- College of Animal Science, Key Laboratory of Animal Genetics, Breeding and Reproduction in the Plateau Mountainous Region (Ministry of Education), Guizhou University, Guiyang, 550025, China
| | - Derong Song
- Bijie Academy of Agricultural Sciences, Bijie, 551700, China
| | - Xueqin Ran
- College of Animal Science, Key Laboratory of Animal Genetics, Breeding and Reproduction in the Plateau Mountainous Region (Ministry of Education), Guizhou University, Guiyang, 550025, China.
| | - Jiafu Wang
- Institute of Agro-Bioengineering/Key Laboratory of Plant Resource Conservative and Germplasm Innovation in Mountainous Region (Ministry of Education), College of Life Sciences, Guizhou University, Guiyang, 550025, China.
| |
|
11
|
Sherman CA, Claw KG, Lee SB. Pharmacogenetic analysis of structural variation in the 1000 genomes project using whole genome sequences. Sci Rep 2024; 14:22774. [PMID: 39354004 PMCID: PMC11445439 DOI: 10.1038/s41598-024-73748-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 09/20/2024] [Indexed: 10/03/2024] Open
Abstract
While significant strides have been made in understanding pharmacogenetics (PGx) and gene-drug interactions, there remains limited characterization of population-level PGx variation. This study aims to comprehensively profile global star alleles (haplotype patterns) and phenotype frequencies in 58 pharmacogenes associated with drug absorption, distribution, metabolism, and excretion. PyPGx, a star-allele calling tool, was employed to identify star alleles within high-coverage whole genome sequencing (WGS) data from the 1000 Genomes Project (N = 2504; 26 global populations). This process involved detecting structural variants (SVs), such as gene deletions, duplications, hybrids, as well as single nucleotide variants and insertion-deletion variants. The majority of our PyPGx calls for star alleles and phenotype frequencies aligned with the Pharmacogenomics Knowledge Base, although notable population-specific frequencies differed at least twofold. Validation efforts confirmed known SVs while uncovering several novel SVs currently undefined as star alleles. Additionally, we identified 210 small nucleotide variants associated with severe functional consequences that are not defined as star alleles. The study serves as a valuable resource, providing updated population-level star allele and phenotype frequencies while incorporating SVs. It also highlights the burgeoning potential of cost-effective WGS for PGx genotyping, offering invaluable insights to improve tailored drug therapies across diverse populations.
Affiliation(s)
- Carissa A Sherman
- Department of Biomedical Informatics, Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Katrina G Claw
- Department of Biomedical Informatics, Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | |
|
12
|
Nakamichi K, Huey J, Sangermano R, Place EM, Bujakowska KM, Marra M, Everett LA, Yang P, Chao JR, Van Gelder RN, Mustafi D. Targeted long-read sequencing enriches disease-relevant genomic regions of interest to provide complete Mendelian disease diagnostics. JCI Insight 2024; 9:e183902. [PMID: 39264853 PMCID: PMC11530123 DOI: 10.1172/jci.insight.183902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Accepted: 09/10/2024] [Indexed: 09/14/2024] Open
Abstract
Despite advances in sequencing technologies, a molecular diagnosis remains elusive in many patients with Mendelian disease. Current short-read clinical sequencing approaches cannot provide chromosomal phase information or epigenetic information without further sample processing, which is not routinely done and can result in an incomplete molecular diagnosis in patients. The ability to provide phased genetic and epigenetic information from a single sequencing run would improve the diagnostic rate of Mendelian conditions. Here, we describe targeted long-read sequencing of Mendelian disease genes (TaLon-SeqMD) using a real-time adaptive sequencing approach. Optimization of bioinformatic targeting enabled selective enrichment of multiple disease-causing regions of the human genome. Haplotype-resolved variant calling and simultaneous resolution of epigenetic base modification could be achieved in a single sequencing run. The TaLon-SeqMD approach was validated in a cohort of 18 individuals with previous genetic testing targeting 373 inherited retinal disease (IRD) genes, yielding the complete molecular diagnosis in each case. This approach was then applied in 2 IRD cases with inconclusive testing, which uncovered noncoding and structural variants that were difficult to characterize by standard short-read sequencing. Overall, these results demonstrate TaLon-SeqMD as an approach to provide rapid phased-variant calling to provide the molecular basis of Mendelian diseases.
Affiliation(s)
- Kenji Nakamichi
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
| | - Jennifer Huey
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
| | - Riccardo Sangermano
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA
| | - Emily M. Place
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA
| | - Kinga M. Bujakowska
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA
| | - Molly Marra
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
| | - Lesley A. Everett
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
| | - Paul Yang
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
| | - Jennifer R. Chao
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
| | - Russell N. Van Gelder
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
- Departments of Laboratory Medicine and Pathology and Biological Structure, University of Washington, Seattle, Washington, USA
| | - Debarshi Mustafi
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
- Brotman Baty Institute for Precision Medicine, Seattle, Washington, USA
- Division of Ophthalmology, Seattle Children’s Hospital, Seattle, Washington, USA
| |
|
13
|
Hamvas A, Chaudhari BP, Nogee LM. Genetic testing for diffuse lung diseases in children. Pediatr Pulmonol 2024; 59:2286-2297. [PMID: 37191361 DOI: 10.1002/ppul.26447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 04/04/2023] [Accepted: 04/23/2023] [Indexed: 05/17/2023]
Abstract
Newly developing genomic technologies are an increasingly important part of clinical care and thus, it is not only important to understand the technologies and their limitations, but to also interpret the findings in an actionable fashion. Clinical geneticists and genetic counselors are now an integral part of the clinical team and are able to bridge the complexities of this rapidly changing science between the bedside clinicians and patients. This manuscript reviews the terminology, the current technology, some of the known genetic disorders that result in lung disease, and indications for genetic testing with associated caveats. Because this field is evolving quickly, we also provide links to websites that provide continuously updated information important for integrating genomic technology results into clinical decision-making.
Affiliation(s)
- Aaron Hamvas
- Department of Pediatrics, Division of Neonatology, Ann and Robert H. Lurie Children's Hospital of Chicago and Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Bimal P Chaudhari
- Divisions of Genetics and Genomic Medicine, Neonatology, Nationwide Children's Hospital, The Ohio State University College of Medicine, Columbus, Ohio, USA
| | - Lawrence M Nogee
- Department of Pediatrics, Eudowood Neonatal Pulmonary Division, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| |
|
14
|
Hou K, Zheng X. A 10-Year Review on Advancements in Identifying and Treating Intellectual Disability Caused by Genetic Variations. Genes (Basel) 2024; 15:1118. [PMID: 39336708 PMCID: PMC11431063 DOI: 10.3390/genes15091118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 08/19/2024] [Accepted: 08/23/2024] [Indexed: 09/30/2024] Open
Abstract
Intellectual disability (ID) is a prevalent neurodevelopmental disorder characterized by neurodevelopmental defects such as the congenital impairment of intellectual function and restricted adaptive behavior. However, genetic studies have been significantly hindered by the extreme clinical and genetic heterogeneity of the subjects under investigation. With the development of gene sequencing technologies, more genetic variations have been discovered, assisting efforts in ID identification and treatment. In this review, the physiological basis of gene variations in ID is systematically explained, the diagnosis and therapy of ID is comprehensively described, and the potential of genetic therapies and exercise therapy in the rehabilitation of individuals with intellectual disabilities are highlighted, offering new perspectives for treatment approaches.
Affiliation(s)
- Kexin Hou
- School of Exercise and Health, Shanghai University of Sport, 200 Hengren Road, Yangpu, Shanghai 200438, China
| | - Xinyan Zheng
- School of Exercise and Health, Shanghai University of Sport, 200 Hengren Road, Yangpu, Shanghai 200438, China
| |
|
15
|
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Benedict Paten
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| |
|
16
|
Ormond C, Ryan NM, Byerley W, Heron EA, Corvin A. Investigating copy number variants in schizophrenia pedigrees using a new consensus pipeline called PECAN. Sci Rep 2024; 14:17518. [PMID: 39080331 PMCID: PMC11289470 DOI: 10.1038/s41598-024-66021-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 06/26/2024] [Indexed: 08/02/2024] Open
Abstract
Copy number variants (CNVs) have been implicated in many human diseases, including psychiatric disorders. Whole genome sequencing offers advantages in CNV calling compared to previous array-based methods. Here we present a robust and transparent CNV calling pipeline, PECAN (PEdigree Copy number vAriaNt calling), for short-read, whole genome sequencing data, comprised of a novel combination of four calling methods and structural variant genotyping. This method is scalable and can incorporate pedigree information to retain lower-confidence CNVs that would otherwise be discarded. We have robustly benchmarked PECAN using gold-standard CNV calls for two well-established evaluation samples, NA12878 and HG002, showing that PECAN performs with high precision and recall on both datasets, outperforming another pedigree-based CNV calling pipeline. As part of this work, we provide a list of high-confidence gold standard CNVs for the NA12878 reference sample, curated from multiple studies. We applied PECAN to a collection of pedigrees multiply affected with schizophrenia and identified a rare deletion that perfectly co-segregates with schizophrenia in one of the pedigrees. The CNV overlaps the gene PITRM1, which has been implicated in a complex phenotype including ataxia, developmental delay, and schizophrenia-like episodes in affected adults.
Affiliation(s)
- Cathal Ormond
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity Centre for Health Sciences, Trinity College Dublin, James' Street, Dublin 8, Ireland
| | - Niamh M Ryan
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity Centre for Health Sciences, Trinity College Dublin, James' Street, Dublin 8, Ireland
| | - William Byerley
- Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, CA, USA
| | - Elizabeth A Heron
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity Centre for Health Sciences, Trinity College Dublin, James' Street, Dublin 8, Ireland
| | - Aiden Corvin
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Trinity Centre for Health Sciences, Trinity College Dublin, James' Street, Dublin 8, Ireland.
| |
|
17
|
Kramer M, Goodwin S, Wappel R, Borio M, Offit K, Feldman DR, Stadler ZK, McCombie WR. Exploring the genetic and epigenetic underpinnings of early-onset cancers: Variant prioritization for long read whole genome sequencing from family cancer pedigrees. bioRxiv 2024:2024.06.27.601096. [PMID: 39005350 PMCID: PMC11244929 DOI: 10.1101/2024.06.27.601096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Despite significant advances in our understanding of genetic cancer susceptibility, known inherited cancer predisposition syndromes explain at most 20% of early-onset cancers. As early-onset cancer prevalence continues to increase, the need to assess previously inaccessible areas of the human genome, harnessing a trio or quad family-based architecture for variant filtration, may reveal further insights into cancer susceptibility. To assess a broader spectrum of variation than can be ascertained by multi-gene panel sequencing, or even whole genome sequencing with short reads, we employed long read whole genome sequencing using an Oxford Nanopore Technology (ONT) PromethION of 3 families containing an early-onset cancer proband using a trio or quad family architecture. Analysis included 2 early-onset colorectal cancer family trios and one quad consisting of two siblings with testicular cancer, all with unaffected parents. Structural variants (SVs), epigenetic profiles and single nucleotide variants (SNVs) were determined for each individual, and a filtering strategy was employed to refine and prioritize candidate variants based on the family architecture. The family architecture enabled us to focus on inapposite variants while filtering variants shared with the unaffected parents, significantly decreasing background variation that can hamper identification of potentially disease causing differences. Candidate de novo and compound heterozygous variants were identified in this way. Gene expression, in matched neoplastic and pre-neoplastic lesions, was assessed for one trio. Our study demonstrates the feasibility of a streamlined analysis of genomic variants from long read ONT whole genome sequencing and a way to prioritize key variants for further evaluation of pathogenicity, while revealing what may be missing from panel based analyses.
|
18
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
| | - Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
| |
|
19
|
Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian N, Chao KR, Walker MA, Lyu Y, Rehm HL, Neale BM, Talkowski ME, Daly MJ, Brand H, Karczewski KJ, Atkinson EG, Martin AR. A harmonized public resource of deeply sequenced diverse human genomes. Genome Res 2024; 34:796-809. [PMID: 38749656 PMCID: PMC11216312 DOI: 10.1101/gr.278378.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 05/07/2024] [Indexed: 05/18/2024]
Abstract
Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.
Affiliation(s)
- Zan Koenig
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Mary T Yohannes
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Lethukuthula L Nkambule
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Xuefang Zhao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Julia K Goodrich
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Heesu Ally Kim
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Michael W Wilson
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Stephanie P Hao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Nareh Sahakian
- Broad Genomics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02141, USA
| | - Katherine R Chao
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Mark A Walker
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Data Sciences Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Yunfei Lyu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Benjamin M Neale
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Michael E Talkowski
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Mark J Daly
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- Institute for Molecular Medicine Finland, 00290 Helsinki, Finland
| | - Harrison Brand
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
| | - Konrad J Karczewski
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Elizabeth G Atkinson
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Alicia R Martin
- Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| |
|
20
|
Miano-Burkhardt A, Alvarez Jerez P, Daida K, Bandres Ciga S, Billingsley KJ. The Role of Structural Variants in the Genetic Architecture of Parkinson's Disease. Int J Mol Sci 2024; 25:4801. [PMID: 38732020 PMCID: PMC11084710 DOI: 10.3390/ijms25094801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 04/17/2024] [Accepted: 04/22/2024] [Indexed: 05/13/2024] Open
Abstract
Parkinson's disease (PD) significantly impacts millions of individuals worldwide. Although our understanding of the genetic foundations of PD has advanced, a substantial portion of the genetic variation contributing to disease risk remains unknown. Current PD genetic studies have primarily focused on one form of genetic variation, single nucleotide variants (SNVs), while other important forms of genetic variation, such as structural variants (SVs), are mostly ignored due to the complexity of detecting these variants with traditional sequencing methods. Yet, these forms of genetic variation play crucial roles in gene expression and regulation in the human brain and are causative of numerous neurological disorders, including forms of PD. This review aims to provide a comprehensive overview of our current understanding of the involvement of coding and noncoding SVs in the genetic architecture of PD.
Affiliation(s)
- Abigail Miano-Burkhardt
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA; (A.M.-B.); (K.D.)
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
| | - Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
| | - Kensuke Daida
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA; (A.M.-B.); (K.D.)
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
| | - Sara Bandres Ciga
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
| | - Kimberley J. Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA; (A.M.-B.); (K.D.)
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
| |
|
21
|
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population-scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes are integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
|
22
|
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson Z, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. medRxiv 2024:2024.03.05.24303792. [PMID: 38496498 PMCID: PMC10942501 DOI: 10.1101/2024.03.05.24303792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
Affiliation(s)
- Jonas A. Gustafson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Sophia B. Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
| | - Miranda PG Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - David Twesigomwe
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Lei Yang
- Pacific Northwest Research Institute, Seattle, WA, USA
| | | | | | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Human Technopole, Milan, Italy
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Angela L. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Zachery Anderson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Sophie HR Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Sydney A. Ward
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Maisha Sinha
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Claudia Gonzaga-Jauregui
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México
| | - Wayne E. Clarke
- New York Genome Center, New York, NY, USA
- Outlier Informatics Inc., Saskatoon, SK, Canada
| | | | | | | | | | | | - Mahler Revsine
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | | | - Cate R. Paschal
- Department of Laboratories, Seattle Children’s Hospital, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Christina Zakarian
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Esther Robb
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | | | | | | | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Richard N. McLaughlin
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Pacific Northwest Research Institute, Seattle, WA, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | | | - Matt Loose
- Deep Seq, School of Life Sciences, University of Nottingham, Nottingham, England
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Danny E. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
| |
|
23
|
Kehl A, Aupperle-Lellbach H, de Brot S, van der Weyden L. Review of Molecular Technologies for Investigating Canine Cancer. Animals (Basel) 2024; 14:769. [PMID: 38473154 PMCID: PMC10930838 DOI: 10.3390/ani14050769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/09/2024] [Accepted: 02/27/2024] [Indexed: 03/14/2024] Open
Abstract
Genetic molecular testing is starting to gain traction as part of standard clinical practice for dogs with cancer due to its multi-faceted benefits, such as potentially being able to provide diagnostic, prognostic and/or therapeutic information. However, the benefits and ultimate success of genomic analysis in the clinical setting are reliant on the robustness of the tools used to generate the results, which continually expand as new technologies are developed. To this end, we review the different materials from which tumour cells, DNA, RNA and the relevant proteins can be isolated and what methods are available for interrogating their molecular profile, including analysis of the genetic alterations (both somatic and germline), transcriptional changes and epigenetic modifications (including DNA methylation/acetylation and microRNAs). We also look to the future and the tools that are currently being developed, such as using artificial intelligence (AI) to identify genetic mutations from histomorphological criteria. In summary, we find that the molecular genetic characterisation of canine neoplasms has made a promising start. As we understand more of the genetics underlying these tumours and more targeted therapies become available, it will no doubt become a mainstay in the delivery of precision veterinary care to dogs with cancer.
Affiliation(s)
- Alexandra Kehl
- Laboklin GmbH & Co. KG, Steubenstr. 4, 97688 Bad Kissingen, Germany; (A.K.); (H.A.-L.)
- School of Medicine, Institute of Pathology, Technical University of Munich, Trogerstr. 18, 81675 München, Germany
| | - Heike Aupperle-Lellbach
- Laboklin GmbH & Co. KG, Steubenstr. 4, 97688 Bad Kissingen, Germany; (A.K.); (H.A.-L.)
- School of Medicine, Institute of Pathology, Technical University of Munich, Trogerstr. 18, 81675 München, Germany
| | - Simone de Brot
- Institute of Animal Pathology, COMPATH, University of Bern, 3012 Bern, Switzerland
| | | |
|
24
|
Smeds L, Huson LSA, Ellegren H. Structural genomic variation in the inbred Scandinavian wolf population contributes to the realized genetic load but is positively affected by immigration. Evol Appl 2024; 17:e13652. [PMID: 38333557 PMCID: PMC10848878 DOI: 10.1111/eva.13652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 01/08/2024] [Accepted: 01/16/2024] [Indexed: 02/10/2024] Open
Abstract
When populations decrease in size and may become isolated, genomic erosion by loss of diversity from genetic drift and accumulation of deleterious mutations is likely an inevitable consequence. In such cases, immigration (genetic rescue) is necessary to restore levels of genetic diversity and counteract inbreeding depression. Recent work in conservation genomics has studied these processes focusing on the genetic diversity of single nucleotide polymorphisms. In contrast, our knowledge about structural genomic variation (insertions, deletions, duplications and inversions) in endangered species is limited. We analysed whole-genome, short-read sequences from 212 wolves from the inbred Scandinavian population and from neighbouring populations in Finland and Russia, and detected >35,000 structural variants (SVs) after stringent quality and genotype frequency filtering; >26,000 high-confidence variants remained after manual curation. The majority of variants were shorter than 1 kb, with a distinct peak in the length distribution of deletions at 190 bp, corresponding to insertion events of SINE/tRNA-Lys elements. The site frequency spectrum of SVs in protein-coding regions was significantly shifted towards rare alleles compared to putatively neutral variants, consistent with purifying selection. The realized genetic load of SVs in protein-coding regions increased with inbreeding levels in the Scandinavian population, but immigration provided a genetic rescue effect by lowering the load and reintroducing ancestral alleles at loci fixed for derived SVs. Our study shows that structural variation comprises a common type of in part deleterious mutations in endangered species and that establishing gene flow is necessary to mitigate the negative consequences of loss of diversity.
Affiliation(s)
- Linnéa Smeds
- Department of Ecology and Genetics, Evolutionary BiologyUppsala UniversityUppsalaSweden
| | - Lars S. A. Huson
- Department of Ecology and Genetics, Evolutionary BiologyUppsala UniversityUppsalaSweden
| | - Hans Ellegren
- Department of Ecology and Genetics, Evolutionary BiologyUppsala UniversityUppsalaSweden
| |
Collapse
|
25
|
Groza C, Schwendinger-Schreck C, Cheung WA, Farrow EG, Thiffault I, Lake J, Rizzo WB, Evrony G, Curran T, Bourque G, Pastinen T. Pangenome graphs improve the analysis of structural variants in rare genetic diseases. Nat Commun 2024; 15:657. [PMID: 38253606 PMCID: PMC10803329 DOI: 10.1038/s41467-024-44980-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 01/10/2024] [Indexed: 01/24/2024] Open
Abstract
Rare DNA alterations that cause heritable diseases are only partially resolvable by clinical next-generation sequencing due to the difficulty of detecting structural variation (SV) in all genomic contexts. Long-read, high fidelity genome sequencing (HiFi-GS) detects SVs with increased sensitivity and enables assembling personal and graph genomes. We leverage standard reference genomes, public assemblies (n = 94) and a large collection of HiFi-GS data from a rare disease program (Genomic Answers for Kids, GA4K, n = 574 assemblies) to build a graph genome representing a unified SV callset in GA4K, identify common variation and prioritize SVs that are more likely to cause genetic disease (MAF < 0.01). Using graphs, we obtain a higher level of reproducibility than the standard reference approach. We observe over 200,000 SV alleles unique to GA4K, including nearly 1000 rare variants that impact coding sequence. With improved specificity for rare SVs, we isolate 30 candidate SVs in phenotypically prioritized genes, including known disease SVs. We isolate a novel diagnostic SV in KMT2E, demonstrating use of personal assemblies coupled with pangenome graphs for rare disease genomics. The community may interrogate our pangenome with additional assemblies to discover new SVs within the allele frequency spectrum relevant to genetic diseases.
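The rare-variant prioritization step mentioned in this abstract (MAF < 0.01) can be illustrated with a minimal sketch. The function names, record layout, and allele counts below are hypothetical and are not taken from the GA4K pipeline; only the 0.01 threshold comes from the abstract.

```python
def minor_allele_frequency(alt_count, total_alleles):
    """Fold the alternate-allele frequency so it never exceeds 0.5."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    af = alt_count / total_alleles
    return min(af, 1.0 - af)


def rare_svs(svs, threshold=0.01):
    """Return the ids of SVs whose minor allele frequency is below the threshold."""
    return [
        sv["id"]
        for sv in svs
        if minor_allele_frequency(sv["alt_count"], sv["total_alleles"]) < threshold
    ]


# Hypothetical allele counts, for illustration only.
example_svs = [
    {"id": "sv_a", "alt_count": 3, "total_alleles": 1000},    # AF 0.003, rare
    {"id": "sv_b", "alt_count": 250, "total_alleles": 1000},  # AF 0.25, common
    {"id": "sv_c", "alt_count": 995, "total_alleles": 1000},  # minor allele is the reference
]

print(rare_svs(example_svs))
```

Folding the frequency means that an SV carried by almost every sample is also flagged, because the rare allele is then the reference allele. Whether that behaviour is wanted depends on the analysis.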
Affiliation(s)
- Cristian Groza
  - Quantitative Life Sciences, McGill University, Montréal, QC, Canada
- Warren A Cheung
  - Genomic Medicine Center, Children's Mercy Hospital and Research Institute, Kansas City, MO, USA
- Emily G Farrow
  - Genomic Medicine Center, Children's Mercy Hospital and Research Institute, Kansas City, MO, USA
- Isabelle Thiffault
  - Genomic Medicine Center, Children's Mercy Hospital and Research Institute, Kansas City, MO, USA
- William B Rizzo
  - Child Health Research Institute, Department of Pediatrics, Nebraska Medical Center, Omaha, NE, USA
- Gilad Evrony
  - Center for Human Genetics and Genomics, Department of Pediatrics, Neuroscience & Physiology, New York University Grossman School of Medicine, New York, NY, USA
- Tom Curran
  - Children's Mercy Research Institute, Kansas City, MO, USA
- Guillaume Bourque
  - Canadian Center for Computational Genomics, McGill University, Montréal, QC, Canada
  - Department of Human Genetics, McGill University, Montréal, QC, Canada
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy Hospital and Research Institute, Kansas City, MO, USA
26
Bailey SM, Cross EM, Kinner-Bibeau L, Sebesta HC, Bedford JS, Tompkins CJ. Monitoring Genomic Structural Rearrangements Resulting from Gene Editing. J Pers Med 2024; 14:110. [PMID: 38276232 PMCID: PMC10817574 DOI: 10.3390/jpm14010110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/04/2024] [Accepted: 01/13/2024] [Indexed: 01/27/2024] Open
Abstract
The cytogenomics-based methodology of directional genomic hybridization (dGH) enables the detection and quantification of a more comprehensive spectrum of genomic structural variants than any other approach currently available, and importantly, does so on a single-cell basis. Thus, dGH is well-suited for testing and/or validating new advancements in CRISPR-Cas9 gene editing systems. In addition to aberrations detected by traditional cytogenetic approaches, the strand specificity of dGH facilitates detection of otherwise cryptic intra-chromosomal rearrangements, specifically small inversions. As such, dGH represents a powerful, high-resolution approach for the quantitative monitoring of potentially detrimental genomic structural rearrangements resulting from exposure to agents that induce DNA double-strand breaks (DSBs), including restriction endonucleases and ionizing radiations. For intentional genome editing strategies, it is critical that any undesired effects of DSBs induced either by the editing system itself or by mis-repair with other endogenous DSBs are recognized and minimized. In this paper, we discuss the application of dGH for assessing gene editing-associated structural variants and the potential heterogeneity of such rearrangements among cells within an edited population, highlighting its relevance to personalized medicine strategies.
Affiliation(s)
- Susan M. Bailey
  - Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
  - KromaTiD, Inc., Longmont, CO 80501, USA
- Erin M. Cross
  - KromaTiD, Inc., Longmont, CO 80501, USA
- Henry C. Sebesta
  - KromaTiD, Inc., Longmont, CO 80501, USA
- Joel S. Bedford
  - Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
  - KromaTiD, Inc., Longmont, CO 80501, USA
27
Damaraju N, Miller AL, Miller DE. Long-Read DNA and RNA Sequencing to Streamline Clinical Genetic Testing and Reduce Barriers to Comprehensive Genetic Testing. J Appl Lab Med 2024; 9:138-150. [PMID: 38167773 DOI: 10.1093/jalm/jfad107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 10/24/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND Obtaining a precise molecular diagnosis through clinical genetic testing provides information about disease prognosis or progression, allows accurate counseling about recurrence risk, and empowers individuals to benefit from precision therapies or take part in N-of-1 trials. Unfortunately, more than half of individuals with a suspected Mendelian condition remain undiagnosed after a comprehensive clinical evaluation, and the results of any individual clinical genetic test ordered during a typical evaluation may take weeks or months to return. Furthermore, commonly used technologies, such as short-read sequencing, are limited in the types of disease-causing variation they can identify. New technologies, such as long-read sequencing (LRS), are poised to solve these problems. CONTENT Recent technical advances have improved accuracy, increased throughput, and decreased the costs of commercially available LRS technologies. This has resolved many historical concerns about the use of LRS in the clinical environment and opened the door to widespread clinical adoption of LRS. Here, we review LRS technology, how it has been used in the research setting to clarify complex variants or identify disease-causing variation missed by prior clinical testing, and how it may be used clinically in the near future. SUMMARY LRS is unique in that, as a single data source, it has the potential to replace nearly every other clinical genetic test offered today. When analyzed in a stepwise fashion, LRS will simplify laboratory processes, reduce barriers to comprehensive genetic testing, increase the rate of genetic diagnoses, and shorten the amount of time required to make a molecular diagnosis.
Affiliation(s)
- Nikhita Damaraju
  - Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, United States
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, United States
- Angela L Miller
  - Department of Pediatrics, University of Washington, Seattle, WA 98195, United States
- Danny E Miller
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, United States
  - Department of Pediatrics, University of Washington, Seattle, WA 98195, United States
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, United States
28
Poot M. Methods of Detection and Mechanisms of Origin of Complex Structural Genome Variations. Methods Mol Biol 2024; 2825:39-65. [PMID: 38913302 DOI: 10.1007/978-1-0716-3946-7_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Based on classical karyotyping, structural genome variations (SVs) have generally been considered to be either "simple" (with one or two breakpoints) or "complex" (with more than two breakpoints). Studying the breakpoints of SVs at nucleotide resolution revealed additional, subtle structural variations, such that even "simple" SVs turned out to be "complex." Genome-wide sequencing methods, such as fosmid and paired-end mapping, short-read and long-read whole genome sequencing, and single-molecule optical mapping, also indicated that the number of SVs per individual was considerably larger than expected from karyotyping and high-resolution chromosomal array-based studies. Interestingly, SVs were detected in studies of cohorts of individuals without clinical phenotypes. The common denominator of all SVs appears to be a failure to accurately repair DNA double-strand breaks (DSBs) or to halt cell cycle progression if DSBs persist. This review discusses the various DSB response mechanisms during the mitotic cell cycle and during meiosis and their regulation. Emphasis is given to the molecular mechanisms involved in the formation of translocations, deletions, duplications, and inversions during or shortly after meiosis I. Recently, CRISPR-Cas9 studies have provided unexpected insights into the formation of translocations and chromothripsis by both breakage-fusion-bridge and micronucleus-dependent mechanisms.
Affiliation(s)
- Martin Poot
  - Department of Human Genetics, University of Wuerzburg, Wuerzburg, Germany
29
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
  - The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
  - Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
  - Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
  - Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
30
Ghasemi A, Sadr Z, Babanejad M, Rohani M, Alavi A. Copy Number Variations in Hereditary Spastic Paraplegia-Related Genes: Evaluation of an Iranian Hereditary Spastic Paraplegia Cohort and Literature Review. Mol Syndromol 2023; 14:477-484. [PMID: 38058755 PMCID: PMC10697729 DOI: 10.1159/000531507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 06/07/2023] [Indexed: 12/08/2023] Open
Abstract
Introduction: Copy number variations (CNVs) are a considerable underlying cause of human genetic disorders. CNVs are generally detected by array-based methods but can also be discovered by read-depth analysis of whole-exome sequencing (WES) data. We performed WES-based CNV identification in a cohort of 35 Iranian families with hereditary spastic paraplegia (HSP) patients. Methods: Thirty-five patients whose routine single-nucleotide variant (SNV) and insertion/deletion analyses from exome data were unrevealing underwent a CNV analysis pipeline using the read-depth detection method. Subsequently, a comprehensive literature search for CNVs in all 84 known HSP-causing genes was carried out across all HSP cases reported so far. Results and Discussion: CNV analysis of exome data indicated that 1 patient harbored a heterozygous deletion in exon 17 of the SPAST gene. Multiplex ligation-dependent probe amplification analysis confirmed this deletion in the proband and his affected father. Literature review demonstrated that, to date, pathogenic CNVs have been identified in 30 out of 84 HSP-causing genes (∼36%). However, CNVs in only 17 of these genes were specifically associated with the HSP phenotype. Among them, CNVs were more common in the L1CAM, PLP1, SPAST, SPG7, SPG11, and REEP1 genes. The identification of the CNV in 1 of our patients suggests that WES allows the detection of both SNVs and CNVs with a single method, without additional cost or execution time. However, because of intrinsic limitations of WES in detecting large rearrangements, it may not yet replace dedicated CNV detection methods in standard clinical practice.
Affiliation(s)
- Aida Ghasemi
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Zahra Sadr
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Mojgan Babanejad
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Mohammad Rohani
  - Department of Neurology, Iran University of Medical Sciences, Hazrat Rasool Hospital, Tehran, Iran
- Afagh Alavi
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
  - Neuromuscular Research Center, Tehran University of Medical Sciences, Tehran, Iran
31
Moore AR, Yu J, Pei Y, Cheng EWY, Taylor Tavares AL, Walker WT, Thomas NS, Kamath A, Ibitoye R, Josifova D, Wilsdon A, Ross A, Calder AD, Offiah AC, Wilkie AOM, Taylor JC, Pagnamenta AT. Use of genome sequencing to hunt for cryptic second-hit variants: analysis of 31 cases recruited to the 100 000 Genomes Project. J Med Genet 2023; 60:1235-1244. [PMID: 37558402 PMCID: PMC10715503 DOI: 10.1136/jmg-2023-109362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 07/28/2023] [Indexed: 08/11/2023]
Abstract
BACKGROUND Current clinical testing methods used to uncover the genetic basis of rare disease have inherent limitations, which can lead to causative pathogenic variants being missed. Within the rare disease arm of the 100 000 Genomes Project (100kGP), families were recruited under the clinical indication 'single autosomal recessive mutation in rare disease'. These participants presented with strong clinical suspicion for a specific autosomal recessive disorder, but only one suspected pathogenic variant had been identified through standard-of-care testing. Whole genome sequencing (WGS) aimed to identify cryptic 'second-hit' variants. METHODS To investigate the 31 families with available data that remained unsolved following formal review within the 100kGP, SVRare was used to aggregate structural variants present in <1% of 100kGP participants. Small variants were assessed using population allele frequency data and SpliceAI. Literature searches and publicly available online tools were used for further annotation of pathogenicity. RESULTS Using these strategies, 8/31 cases were solved, increasing the overall diagnostic yield of this cohort from 10/41 (24.4%) to 18/41 (43.9%). Exemplar cases include a patient with cystic fibrosis harbouring a novel exonic LINE1 insertion in CFTR and a patient with generalised arterial calcification of infancy with complex interlinked duplications involving exons 2-6 of ENPP1. Although ambiguous by short-read WGS, the ENPP1 variant structure was resolved using optical genome mapping and RNA analysis. CONCLUSION Systematic examination of cryptic variants across a multi-disease cohort successfully identifies additional pathogenic variants. WGS data analysis in autosomal recessive rare disease should consider complex structural and small intronic variants as potentially pathogenic second hits.
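The diagnostic-yield figures quoted in this abstract can be reproduced with a short calculation. The sketch below uses only the counts given in the abstract (10 cases solved before reanalysis, 8 more solved afterwards, 41 cases in the denominator). The helper name `yield_percent` is an illustrative choice.

```python
def yield_percent(solved, total):
    """Diagnostic yield as a percentage, rounded to one decimal place."""
    return round(100 * solved / total, 1)


cohort_total = 41      # denominator reported in the abstract
solved_before = 10     # solved before the reanalysis
newly_solved = 8       # solved among the 31 previously unsolved families

before = yield_percent(solved_before, cohort_total)
after = yield_percent(solved_before + newly_solved, cohort_total)
gain = yield_percent(newly_solved, cohort_total)

print(f"before: {before}%  after: {after}%  gain: {gain} percentage points")
```

The gain of about 19.5 percentage points corresponds to the 8 newly solved cases expressed over the same denominator of 41.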
Affiliation(s)
- A Rachel Moore
  - Wellcome Centre for Human Genetics, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
  - Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Jing Yu
  - Wellcome Centre for Human Genetics, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
  - Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Yang Pei
  - Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Woolf T Walker
  - School of Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, UK
  - PCD Centre, University Hospital Southampton NHS Foundation Trust, Southampton, UK
- N Simon Thomas
  - Wessex Regional Genetics Laboratory, Salisbury NHS Foundation Trust, Salisbury, UK
- Arveen Kamath
  - All Wales Medical Genomics Service, University Hospital of Wales, Cardiff, UK
- Rita Ibitoye
  - North West Thames Regional Genetics Service, Northwick Park Hospital, Harrow, London, UK
- Dragana Josifova
  - Department of Clinical Genetics, Guy's and St Thomas' Hospitals NHS Trust, London, UK
- Anna Wilsdon
  - Clinical Genetics, Nottingham City Hospital, Nottingham, UK
- Alison Ross
  - Clinical Genetics, NHS Grampian, Aberdeen, UK
- Alistair D Calder
  - Radiology Department, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Amaka C Offiah
  - Department of Oncology and Metabolism, The University of Sheffield, Sheffield, UK
- Andrew O M Wilkie
  - Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Jenny C Taylor
  - Wellcome Centre for Human Genetics, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
- Alistair T Pagnamenta
  - Wellcome Centre for Human Genetics, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
32
Magi A, Mattei G, Mingrino A, Caprioli C, Ronchini C, Frigè G, Semeraro R, Baragli M, Bolognini D, Colombo E, Mazzarella L, Pelicci PG. GASOLINE: detecting germline and somatic structural variants from long-reads data. Sci Rep 2023; 13:20817. [PMID: 38012350 PMCID: PMC10682169 DOI: 10.1038/s41598-023-48285-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023] Open
Abstract
Long-read sequencing allows analyses of single nucleic-acid molecules and produces sequences in the order of tens to hundreds of kilobases. Its application to whole-genome analyses allows identification of complex genomic structural variants (SVs) with unprecedented resolution. SV identification, however, requires complex computational methods, based on either read-depth or intra- and inter-alignment signature approaches, which are limited by size or type of SVs. Moreover, most currently available tools only detect germline variants, thus requiring separate computation of sample pairs for comparative analyses. To overcome these limits, we developed a novel tool (Germline And SOmatic structuraL varIants detectioN and gEnotyping; GASOLINE) that groups SV signatures using a sophisticated clustering procedure based on a modified reciprocal overlap criterion, and is designed to identify germline SVs from single samples and somatic SVs from paired test and control samples. GASOLINE is a collection of Perl, R and Fortran codes; it analyzes aligned data in BAM format and produces VCF files with statistically significant somatic SVs. Germline or somatic analysis of 30× sequencing coverage experiments requires 4-5 h with 20 threads. GASOLINE outperformed currently available methods in the detection of both germline and somatic SVs in synthetic and real long-reads datasets. Notably, when applied on a pair of metastatic melanoma and matched-normal sample, GASOLINE identified five genuine somatic SVs that were missed using five different sequencing technologies and state-of-the-art SV calling approaches. Thus, GASOLINE identifies germline and somatic SVs with unprecedented accuracy and resolution, outperforming currently available state-of-the-art WGS long-reads computational methods.
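For orientation, the standard reciprocal overlap test that clustering criteria of this kind build on can be sketched as follows. This is the plain textbook form, not GASOLINE's modified criterion, which the abstract does not specify. The function names, the half-open interval convention, and the 0.5 threshold are assumptions made for the example.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions between half-open intervals a and b."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0  # disjoint or merely touching intervals
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def same_event(a, b, threshold=0.5):
    """Treat two SV signatures as one event if each covers enough of the other."""
    return reciprocal_overlap(a[0], a[1], b[0], b[1]) >= threshold


# Two deletion signatures with slightly different breakpoints (hypothetical coordinates).
print(same_event((100, 200), (110, 205)))
# A small call nested in a much larger one fails the reciprocal test.
print(same_event((0, 1000), (400, 450)))
```

Taking the minimum of the two fractions is what makes the test reciprocal: a small call nested inside a large one is not merged with it, even though the small call is fully covered.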
Affiliation(s)
- Alberto Magi
  - Department of Information Engineering, University of Florence, 50100, Florence, Italy
  - Institute for Biomedical Technologies, National Research Council, Segrate, Milan, Italy
- Gianluca Mattei
  - Department of Information Engineering, University of Florence, 50100, Florence, Italy
- Alessandra Mingrino
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Chiara Caprioli
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Chiara Ronchini
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Gianmaria Frigè
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Roberto Semeraro
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Marta Baragli
  - Department of Information Engineering, University of Florence, 50100, Florence, Italy
- Davide Bolognini
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Emanuela Colombo
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Luca Mazzarella
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Pier Giuseppe Pelicci
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
33
Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, Blackmon H, Charles M, Cheng HH, Fedrigo O, Fiddaman SR, Formenti G, Frantz LAF, Gilbert MTP, Hearn CJ, Jarvis ED, Klopp C, Marcos S, Mason AS, Velez-Irizarry D, Xu L, Warren WC. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol 2023; 21:267. [PMID: 37993882 PMCID: PMC10664547 DOI: 10.1186/s12915-023-01758-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/14/2023] [Accepted: 11/02/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND The red junglefowl, the wild outgroup of domestic chickens, has historically served as a reference for genomic studies of domestic chickens. These studies have provided insight into the etiology of traits of commercial importance. However, the use of a single reference genome does not capture diversity present among modern breeds, many of which have accumulated molecular changes due to drift and selection. While reference-based resequencing is well-suited to cataloging simple variants such as single-nucleotide changes and short insertions and deletions, it is mostly inadequate to discover more complex structural variation in the genome. METHODS We present a pangenome for the domestic chicken consisting of thirty assemblies of chickens from different breeds and research lines. RESULTS We demonstrate how this pangenome can be used to catalog structural variants present in modern breeds and untangle complex nested variation. We show that alignment of short reads from 100 diverse wild and domestic chickens to this pangenome reduces reference bias by 38%, which affects downstream genotyping results. This approach also allows for the accurate genotyping of a large and complex pair of structural variants at the K feathering locus using short reads, which would not be possible using a linear reference. CONCLUSIONS We expect that this new paradigm of genomic reference will allow better pinpointing of exact mutations responsible for specific phenotypes, which will in turn be necessary for breeding chickens that meet new sustainability criteria and are resilient to quickly evolving pathogen threats.
Affiliation(s)
- Edward S Rice
  - Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
  - Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
- Antton Alberdi
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- James Alfieri
  - Department of Ecology & Evolutionary Biology, Texas A&M University, College Station, TX, USA
- Giridhar Athrey
  - Department of Poultry Science, Texas A&M University, College Station, TX, USA
- Jennifer R Balacco
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Philippe Bardou
  - Sigenae, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, 31326, France
- Heath Blackmon
  - Department of Biology, Texas A&M University, College Station, TX, USA
- Mathieu Charles
  - University Paris-Saclay, INRAE, AgroParisTech, GABI, Sigenae, Jouy-en-Josas, France
- Hans H Cheng
  - Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Laurent A F Frantz
  - Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London, E1 4DQ, UK
- M Thomas P Gilbert
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- Cari J Hearn
  - Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Erich D Jarvis
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
  - The Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Christophe Klopp
  - Sigenae, Genotoul Bioinfo, MIAT UR875, INRAE, Castanet Tolosan, France
- Sofia Marcos
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
  - Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
- Luohao Xu
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing, 400715, China
- Wesley C Warren
  - Department of Animal Sciences, University of Missouri, Columbia, MO, USA
34
Zhang S, Cui Q, Yang S, Zhang F, Li C, Wang X, Lei B, Sheng X. Exome and genome sequencing to unravel the precise breakpoints of partial trisomy 6q and partial Monosomy 2q. BMC Pediatr 2023; 23:586. [PMID: 37993819 PMCID: PMC10664609 DOI: 10.1186/s12887-023-04368-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 10/15/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND Patients with complex phenotypes and a chromosomal translocation are particularly challenging, since several potentially pathogenic mechanisms need to be investigated. CASE PRESENTATION Here, we combined exome and genome sequencing techniques to identify the precise breakpoints of heterozygous microduplications in the 6q25.3-q27 region and microdeletions in the 2q37.1-q37.3 region in a proband. The 5-year-old girl exhibited a severe form of congenital cranial dysinnervation disorder (CCDD) in addition to skeletal dysmorphism anomalies and severe intellectual disability. This is the second case affecting chromosomes 2q and 6q. The individual's karyotype showed an unbalanced translocation 46,XX,del(2)t(2;6)(q37.1;q25.3), which was inherited from her unaffected father [46,XY,t(2;6)(q37.1;q25.3)]. We also obtained the precise breakpoints of a de novo heterozygous copy number deletion [del(2)(q37.1q37.3)chr2:g.232963568_24305260del] and a copy number duplication [dup(6)(q25.3q27)chr6:g.158730978_170930050dup]. The parental origin of the observed balanced translocation was not clear because the parents declined genetic testing. CONCLUSION Patients with a 2q37 deletion and 6q25.3 duplication may exhibit severe significant neurological and skeletal dysmorphisms, and the utilization of exome and genome sequencing techniques has the potential to unveil the entire translocation of the CNV and the precise breakpoint.
Affiliation(s)
- Shuang Zhang
- People's Hospital of Ningxia Hui Autonomous Region (Ningxia Medical University), Ningxia Eye Hospital, Yinchuan, 750001, China
| | - Qianwei Cui
- People's Hospital of Ningxia Hui Autonomous Region (Ningxia Medical University), Ningxia Eye Hospital, Yinchuan, 750001, China
| | - Shangying Yang
- People's Hospital of Ningxia Hui Autonomous Region (Ningxia Medical University), Ningxia Eye Hospital, Yinchuan, 750001, China
| | - Fangxia Zhang
- People's Hospital of Ningxia Hui Autonomous Region (Ningxia Medical University), Ningxia Eye Hospital, Yinchuan, 750001, China
| | - Chunxia Li
- People's Hospital of Ningxia Hui Autonomous Region (Ningxia Medical University), Ningxia Eye Hospital, Yinchuan, 750001, China
| | - Xiaoguang Wang
- People's Hospital of Ningxia Hui Autonomous Region (Ningxia Medical University), Ningxia Eye Hospital, Yinchuan, 750001, China
| | - Bo Lei
- Henan Eye Institute, Henan Eye Hospital, People's Hospital of Zhengzhou University, Henan Provincial People's Hospital, Zhengzhou, Henan, 450003, China.
| | - Xunlun Sheng
- Gansu Aier Ophthalmology & Optometry Hospital, Lanzhou, 730030, China.
35
|
Hansen MH, Cédile O, Kjeldsen MLG, Thomassen M, Preiss B, von Neuhoff N, Abildgaard N, Nyvold CG. Toward Cytogenomics: Technical Assessment of Long-Read Nanopore Whole-Genome Sequencing for Detecting Large Chromosomal Alterations in Mantle Cell Lymphoma. J Mol Diagn 2023; 25:796-805. [PMID: 37683892 DOI: 10.1016/j.jmoldx.2023.08.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/23/2022] [Revised: 06/20/2023] [Accepted: 08/14/2023] [Indexed: 09/10/2023] Open
Abstract
The current advances and success of next-generation sequencing hold the potential for the transition of cancer cytogenetics toward comprehensive cytogenomics. However, the conventional use of short reads impedes the resolution of chromosomal aberrations. Thus, this study evaluated the detection and reproducibility of extensive copy number alterations and chromosomal translocations using long-read Oxford Nanopore Technologies whole-genome sequencing compared with short-read Illumina sequencing. Using the mantle cell lymphoma cell line Granta-519, almost 99% copy-number reproducibility at the 100-kilobase resolution between replicates was demonstrated, with 98% concordance to Illumina. Collectively, the performance of copy number calling from 1.5 million to 7.5 million long reads was comparable to 1 billion Illumina-based reads (50× coverage). Expectedly, the long-read resolution of canonical translocation t(11;14)(q13;q32) was superior, with a sequence similarity of 89% to the already published CCND1/IGH junction (9× coverage), spanning up to 69 kilobases. The cytogenetic profile of Granta-519 was in general agreement with the literature and karyotype, although several differences remained unresolved. In conclusion, contemporary long-read sequencing is primed for future cytogenomics or sequencing-guided cytogenetics. The combined strength of long- and short-read sequencing is apparent, where the high-precision junctional mapping complements and splits paired-end reads. The potential is emphasized by the flexible single-sample genomic data acquisition of Oxford Nanopore Technologies with the high resolution of allelic imbalances using Illumina short-read sequencing.
Affiliation(s)
- Marcus H Hansen
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark.
| | - Oriane Cédile
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark; OPEN, Odense Patient Data Explorative Network, Odense University Hospital, Odense, Denmark
| | - Marie L G Kjeldsen
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
| | - Mads Thomassen
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
| | - Birgitte Preiss
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Pathology, Odense University Hospital, Odense, Denmark
| | - Nils von Neuhoff
- Department of Pediatric Hematology and Oncology, Essen University Hospital and University of Duisburg-Essen, Essen, Germany
| | - Niels Abildgaard
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark
| | - Charlotte G Nyvold
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark; OPEN, Odense Patient Data Explorative Network, Odense University Hospital, Odense, Denmark
36
|
Bonnet K, Marschall T, Doerr D. Constructing founder sets under allelic and non-allelic homologous recombination. Algorithms Mol Biol 2023; 18:15. [PMID: 37775806 PMCID: PMC10543304 DOI: 10.1186/s13015-023-00241-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2023] [Accepted: 08/23/2023] [Indexed: 10/01/2023] Open
Abstract
Homologous recombination between the maternal and paternal copies of a chromosome is a key mechanism for human inheritance and shapes population genetic properties of our species. However, a similar mechanism can also act between different copies of the same sequence, and is then called non-allelic homologous recombination (NAHR). This process can result in genomic rearrangements (including deletion, duplication, and inversion) and underlies many genomic disorders. Despite its importance for genome evolution and disease, there is a lack of computational models to study genomic loci prone to NAHR. In this work, we propose such a computational model, providing a unified framework for both (allelic) homologous recombination and NAHR. Our model represents a set of genomes as a graph, where haplotypes correspond to walks through this graph. We formulate two founder set problems under our recombination model, provide flow-based algorithms for their solution, describe exact methods to characterize the number of recombinations, and demonstrate scalability to problem instances arising in practice.
Affiliation(s)
- Konstantinn Bonnet
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, and Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225, Düsseldorf, Germany
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, and Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225, Düsseldorf, Germany.
| | - Daniel Doerr
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, and Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225, Düsseldorf, Germany.
37
|
Lowther C, Valkanas E, Giordano JL, Wang HZ, Currall BB, O'Keefe K, Pierce-Hoffman E, Kurtas NE, Whelan CW, Hao SP, Weisburd B, Jalili V, Fu J, Wong I, Collins RL, Zhao X, Austin-Tse CA, Evangelista E, Lemire G, Aggarwal VS, Lucente D, Gauthier LD, Tolonen C, Sahakian N, Stevens C, An JY, Dong S, Norton ME, MacKenzie TC, Devlin B, Gilmore K, Powell BC, Brandt A, Vetrini F, DiVito M, Sanders SJ, MacArthur DG, Hodge JC, O'Donnell-Luria A, Rehm HL, Vora NL, Levy B, Brand H, Wapner RJ, Talkowski ME. Systematic evaluation of genome sequencing for the diagnostic assessment of autism spectrum disorder and fetal structural anomalies. Am J Hum Genet 2023; 110:1454-1469. [PMID: 37595579 PMCID: PMC10502737 DOI: 10.1016/j.ajhg.2023.07.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/04/2023] [Revised: 07/25/2023] [Accepted: 07/25/2023] [Indexed: 08/20/2023] Open
Abstract
Short-read genome sequencing (GS) holds the promise of becoming the primary diagnostic approach for the assessment of autism spectrum disorder (ASD) and fetal structural anomalies (FSAs). However, few studies have comprehensively evaluated its performance against current standard-of-care diagnostic tests: karyotype, chromosomal microarray (CMA), and exome sequencing (ES). To assess the clinical utility of GS, we compared its diagnostic yield against these three tests in 1,612 quartet families including an individual with ASD and in 295 prenatal families. Our GS analytic framework identified a diagnostic variant in 7.8% of ASD probands, almost 2-fold more than CMA (4.3%) and 3-fold more than ES (2.7%). However, when we systematically captured copy-number variants (CNVs) from the exome data, the diagnostic yield of ES (7.4%) was brought much closer to, but did not surpass, GS. Similarly, we estimated that GS could achieve an overall diagnostic yield of 46.1% in unselected FSAs, representing a 17.2% increased yield over karyotype, 14.1% over CMA, and 4.1% over ES with CNV calling or 36.1% increase without CNV discovery. Overall, GS provided an added diagnostic yield of 0.4% and 0.8% beyond the combination of all three standard-of-care tests in ASD and FSAs, respectively. This corresponded to nine GS unique diagnostic variants, including sequence variants in exons not captured by ES, structural variants (SVs) inaccessible to existing standard-of-care tests, and SVs where the resolution of GS changed variant classification. Overall, this large-scale evaluation demonstrated that GS significantly outperforms each individual standard-of-care test while also outperforming the combination of all three tests, thus warranting consideration as the first-tier diagnostic approach for the assessment of ASD and FSAs.
Affiliation(s)
- Chelsea Lowther
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Elise Valkanas
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Biological and Biomedical Sciences, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
| | - Jessica L Giordano
- Department of Obstetrics & Gynecology, Columbia University Medical Center, New York, NY, USA
| | - Harold Z Wang
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Benjamin B Currall
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Kathryn O'Keefe
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Emma Pierce-Hoffman
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nehir E Kurtas
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Christopher W Whelan
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Stephanie P Hao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ben Weisburd
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vahid Jalili
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jack Fu
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Isaac Wong
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Christina A Austin-Tse
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Pathology, Harvard Medical School, Boston, MA, USA
| | - Emily Evangelista
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gabrielle Lemire
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vimla S Aggarwal
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Diane Lucente
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Laura D Gauthier
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Charlotte Tolonen
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nareh Sahakian
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Christine Stevens
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joon-Yong An
- School of Biosystem and Biomedical Science, Korea University, Seoul, South Korea
| | - Shan Dong
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Mary E Norton
- Center for Maternal-Fetal Precision Medicine, University of California, San Francisco, San Francisco, CA, USA; Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Francisco, San Francisco, California, USA
| | - Tippi C MacKenzie
- Center for Maternal-Fetal Precision Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Kelly Gilmore
- Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Bradford C Powell
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Alicia Brandt
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Francesco Vetrini
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Michelle DiVito
- Department of Obstetrics & Gynecology, Columbia University Medical Center, New York, NY, USA
| | - Stephan J Sanders
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Daniel G MacArthur
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Centre for Population Genomics, Garvan Institute of Medical Research, and University of New South Wales Sydney, Sydney, NSW, Australia; Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, VIC, Australia
| | - Jennelle C Hodge
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Anne O'Donnell-Luria
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Neeta L Vora
- Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Brynn Levy
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Ronald J Wapner
- Department of Obstetrics & Gynecology, Columbia University Medical Center, New York, NY, USA
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology, Harvard Medical School, Boston, MA, USA; Program in Biological and Biomedical Sciences, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA; Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA.
38
|
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Affiliation(s)
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert P Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
39
|
Brandes D, Yasin L, Nebral K, Ebler J, Schinnerl D, Picard D, Bergmann AK, Alam J, Köhrer S, Haas OA, Attarbaschi A, Marschall T, Stanulla M, Borkhardt A, Brozou T, Fischer U, Wagener R. Optical Genome Mapping Identifies Novel Recurrent Structural Alterations in Childhood ETV6::RUNX1+ and High Hyperdiploid Acute Lymphoblastic Leukemia. Hemasphere 2023; 7:e925. [PMID: 37469802 PMCID: PMC10353714 DOI: 10.1097/hs9.0000000000000925] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/31/2023] [Accepted: 06/01/2023] [Indexed: 07/21/2023] Open
Abstract
The mutational landscape of B-cell precursor acute lymphoblastic leukemia (BCP-ALL), the most common pediatric cancer, is not fully described partially because commonly applied short-read next generation sequencing has a limited ability to identify structural variations. By combining comprehensive analysis of structural variants (SVs), single-nucleotide variants (SNVs), and small insertions-deletions, new subtype-defining and therapeutic targets may be detected. We analyzed the landscape of somatic alterations in 60 pediatric patients diagnosed with the most common BCP-ALL subtypes, ETV6::RUNX1+ and classical hyperdiploid (HD), using conventional cytogenetics, single nucleotide polymorphism (SNP) array, whole exome sequencing (WES), and the novel optical genome mapping (OGM) technique. Ninety-five percent of SVs detected by cytogenetics and SNP-array were verified by OGM. OGM detected an additional 677 SVs not identified using the conventional methods, including (subclonal) IKZF1 deletions. Based on OGM, ETV6::RUNX1+ BCP-ALL harbored 2.7 times more SVs than HD BCP-ALL, mainly focal deletions. Besides SVs in known leukemia development genes (ETV6, PAX5, BTG1, CDKN2A), we identified 19 novel recurrently altered regions (in n ≥ 3) including 9p21.3 (FOCAD/HACD4), 8p11.21 (IKBKB), 1p34.3 (ZMYM1), 4q24 (MANBA), 8p23.1 (MSRA), and 10p14 (SFMBT2), as well as ETV6::RUNX1+ subtype-specific SVs (12p13.1 (GPRC5A), 12q24.21 (MED13L), 18q11.2 (MIB1), 20q11.22 (NCOA6)). We detected 3 novel fusion genes (SFMBT2::DGKD, PDS5B::STAG2, and TDRD5::LPCAT2), for which the sequence and expression were validated by long-read and whole transcriptome sequencing, respectively. OGM and WES identified double hits of SVs and SNVs (ETV6, BTG1, STAG2, MANBA, TBL1XR1, NSD2) in the same patient demonstrating the power of the combined approach to define the landscape of genomic alterations in BCP-ALL.
Affiliation(s)
- Danielle Brandes
- Pediatric Oncology, Hematology and Clinical Immunology, Medical Faculty, Heinrich-Heine University and University Hospital Dusseldorf, Germany
- Dusseldorf School of Oncology (DSO), Medical Faculty, Heinrich-Heine University, Dusseldorf, Germany
| | - Layal Yasin
- Pediatric Oncology, Hematology and Clinical Immunology, Medical Faculty, Heinrich-Heine University and University Hospital Dusseldorf, Germany
| | - Karin Nebral
- Labdia Labordiagnostik, Clinical Genetics, Vienna, Austria
- St. Anna Children’s Cancer Research Institute (CCRI), Vienna, Austria
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich-Heine University, Dusseldorf, Germany
- Center for Digital Medicine, Heinrich-Heine University, Dusseldorf, Germany
| | - Dagmar Schinnerl
- St. Anna Children’s Cancer Research Institute (CCRI), Vienna, Austria
| | - Daniel Picard
- Pediatric Oncology, Hematology and Clinical Immunology, Medical Faculty, Heinrich-Heine University and University Hospital Dusseldorf, Germany
| | - Anke K. Bergmann
- Institute of Human Genetics, Hannover Medical School (MHH), Hannover, Germany
| | - Jubayer Alam
- Pediatric Oncology, Hematology and Clinical Immunology, Medical Faculty, Heinrich-Heine University and University Hospital Dusseldorf, Germany
| | - Stefan Köhrer
- Labdia Labordiagnostik, Clinical Genetics, Vienna, Austria
- St. Anna Children’s Cancer Research Institute (CCRI), Vienna, Austria
| | - Oskar A. Haas
- St. Anna Children’s Hospital, Department of Pediatric Hematology/Oncology, Pediatric Clinic, Medical University, Vienna, Austria
| | - Andishe Attarbaschi
- St. Anna Children’s Cancer Research Institute (CCRI), Vienna, Austria
- St. Anna Children’s Hospital, Department of Pediatric Hematology/Oncology, Pediatric Clinic, Medical University, Vienna, Austria
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich-Heine University, Dusseldorf, Germany
- Center for Digital Medicine, Heinrich-Heine University, Dusseldorf, Germany
| | - Martin Stanulla
- Pediatric Hematology and Oncology, Hannover Medical School (MHH), Hannover, Germany
| | - Arndt Borkhardt
- Pediatric Oncology, Hematology and Clinical Immunology, Medical Faculty, Heinrich-Heine University and University Hospital Dusseldorf, Germany
- German Cancer Consortium (DKTK), partner site Essen/Dusseldorf, Germany
| | - Triantafyllia Brozou
- Pediatric Oncology, Hematology and Clinical Immunology, Medical Faculty, Heinrich-Heine University and University Hospital Dusseldorf, Germany
- German Cancer Consortium (DKTK), partner site Essen/Dusseldorf, Germany
| | - Ute Fischer
- Pediatric Oncology, Hematology and Clinical Immunology, Medical Faculty, Heinrich-Heine University and University Hospital Dusseldorf, Germany
- German Cancer Consortium (DKTK), partner site Essen/Dusseldorf, Germany
| | - Rabea Wagener
- Pediatric Oncology, Hematology and Clinical Immunology, Medical Faculty, Heinrich-Heine University and University Hospital Dusseldorf, Germany
- German Cancer Consortium (DKTK), partner site Essen/Dusseldorf, Germany
40
|
Ohori S, Miyauchi A, Osaka H, Lourenco CM, Arakaki N, Sengoku T, Ogata K, Honjo RS, Kim CA, Mitsuhashi S, Frith MC, Seyama R, Tsuchida N, Uchiyama Y, Koshimizu E, Hamanaka K, Misawa K, Miyatake S, Mizuguchi T, Saito K, Fujita A, Matsumoto N. Biallelic structural variations within FGF12 detected by long-read sequencing in epilepsy. Life Sci Alliance 2023; 6:e202302025. [PMID: 37286232 PMCID: PMC10248215 DOI: 10.26508/lsa.202302025] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/06/2023] [Revised: 05/22/2023] [Accepted: 05/23/2023] [Indexed: 06/09/2023] Open
Abstract
We discovered biallelic intragenic structural variations (SVs) in FGF12 by applying long-read whole genome sequencing to an exome-negative patient with developmental and epileptic encephalopathy (DEE). We also found another DEE patient carrying a biallelic (homozygous) single-nucleotide variant (SNV) in FGF12 that was detected by exome sequencing. FGF12 heterozygous recurrent missense variants with gain-of-function or heterozygous entire duplication of FGF12 are known causes of epilepsy, but biallelic SNVs/SVs have never been described. FGF12 encodes intracellular proteins interacting with the C-terminal domain of the alpha subunit of voltage-gated sodium channels 1.2, 1.5, and 1.6, promoting excitability by delaying fast inactivation of the channels. To validate the molecular pathomechanisms of these biallelic FGF12 SVs/SNV, highly sensitive gene expression analyses using lymphoblastoid cells from the patient with biallelic SVs, structural considerations, and Drosophila in vivo functional analysis of the SNV were performed, confirming loss-of-function. Our study highlights the importance of small SVs in Mendelian disorders, which may be overlooked by exome sequencing but can be detected efficiently by long-read whole genome sequencing, providing new insights into the pathomechanisms of human diseases.
Affiliation(s)
- Sachiko Ohori
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Department of Genetics, Kitasato University Hospital, Sagamihara, Japan
| | - Akihiko Miyauchi
- Department of Pediatrics, Jichi Medical School, Shimotsuke, Japan
| | - Hitoshi Osaka
- Department of Pediatrics, Jichi Medical School, Shimotsuke, Japan
| | - Charles Marques Lourenco
- Neurogenetics Department, Faculdade de Medicina de São José do Rio Preto, São José do Rio Preto, Brazil
- Personalized Medicine Department, Special Education Sector at DLE/Grupo Pardini, Belo Horizonte, Brazil
| | - Naohiro Arakaki
- Department of Chromosome Science, National Institute of Genetics, Research Organization of Information and Systems (ROIS), Shizuoka, Japan
- Graduate Institute for Advanced Studies, SOKENDAI, Shizuoka, Japan
| | - Toru Sengoku
- Department of Biochemistry, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Kazuhiro Ogata
- Department of Biochemistry, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Rachel Sayuri Honjo
- Unidade de Genética Médica do Instituto da Criança, Hospital das Clinicas HCFMUSP, Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
| | - Chong Ae Kim
- Unidade de Genética Médica do Instituto da Criança, Hospital das Clinicas HCFMUSP, Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
| | - Satomi Mitsuhashi
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
| | - Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
- Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
| | - Rie Seyama
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Department of Obstetrics and Gynecology, Juntendo University, Tokyo, Japan
| | - Naomi Tsuchida
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan
| | - Yuri Uchiyama
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan
| | - Eriko Koshimizu
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Kohei Hamanaka
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Kazuharu Misawa
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Satoko Miyatake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Department of Clinical Genetics, Yokohama City University Hospital, Yokohama, Japan
| | - Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Kuniaki Saito
- Department of Chromosome Science, National Institute of Genetics, Research Organization of Information and Systems (ROIS), Shizuoka, Japan
- Graduate Institute for Advanced Studies, SOKENDAI, Shizuoka, Japan
| | - Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| |
|
41
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Jonathan Elliot Perdomo
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
42
|
Wilson TE, Ahmed S, Higgins J, Salk J, Glover T. svCapture: efficient and specific detection of very low frequency structural variant junctions by error-minimized capture sequencing. NAR Genom Bioinform 2023; 5:lqad042. [PMID: 37181851 PMCID: PMC10167630 DOI: 10.1093/nargab/lqad042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 03/15/2023] [Accepted: 04/28/2023] [Indexed: 05/16/2023] Open
Abstract
Error-corrected sequencing of genomic targets enriched by probe-based capture has become a standard approach for detecting single-nucleotide variants (SNVs) and small insertion/deletions (indels) present at very low variant allele frequencies. Less attention has been given to comparable strategies for rare structural variant (SV) junctions, where different error mechanisms must be addressed. Working from samples with known SV properties, we demonstrate that duplex sequencing (DuplexSeq), which demands confirmation of variants on both strands of a source DNA molecule, eliminates false SV junctions arising from chimeric PCR. DuplexSeq could not address frequent intermolecular ligation artifacts that arise during Y-adapter addition prior to strand denaturation without requiring multiple source molecules. In contrast, tagmentation libraries coupled with data filtering based on strand family size greatly reduced both artifact classes and enabled efficient and specific detection of single-molecule SV junctions. The throughput of SV capture sequencing (svCapture) and base-level accuracy of DuplexSeq provided detailed views of the microhomology profile and limited occurrence of de novo SNVs near the junctions of hundreds of newly created SVs, suggesting end joining as a possible formation mechanism. The open source svCapture pipeline enables rare SV detection as a routine addition to SNVs/indels in properly prepared capture sequencing libraries.
Affiliation(s)
- Thomas E Wilson
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Samreen Ahmed
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jake Higgins
- TwinStrand Biosciences Inc., Seattle, WA 98121, USA
| | - Jesse J Salk
- TwinStrand Biosciences Inc., Seattle, WA 98121, USA
| | - Thomas W Glover
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
43
|
Lin J, Jia P, Wang S, Kosters W, Ye K. Comparison and benchmark of structural variants detected from long read and long-read assembly. Brief Bioinform 2023:7169138. [PMID: 37200087 DOI: 10.1093/bib/bbad188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/20/2023] Open
Abstract
Structural variant (SV) detection is essential for genomic studies, and long-read sequencing technologies have advanced our capacity to detect SVs directly from read or de novo assembly, also known as read-based and assembly-based strategy. However, to date, no independent studies have compared and benchmarked the two strategies. Here, on the basis of SVs detected by 20 read-based and eight assembly-based detection pipelines from six datasets of HG002 genome, we investigated the factors that influence the two strategies and assessed their performance with well-curated SVs. We found that up to 80% of the SVs could be detected by both strategies among different long-read datasets, whereas variant type, size, and breakpoint detected by read-based strategy were greatly affected by aligners. For the high-confident insertions and deletions at non-tandem repeat regions, a remarkable subset of them (82% in assembly-based calls and 93% in read-based calls), accounting for around 4000 SVs, could be captured by both reads and assemblies. However, discordance between two strategies was largely caused by complex SVs and inversions, which resulted from inconsistent alignment of reads and assemblies at these loci. Finally, benchmarking with SVs at medically relevant genes, the recall of read-based strategy reached 77% on 5X coverage data, whereas assembly-based strategy required 20X coverage data to achieve similar performance. Therefore, integrating SVs from read and assembly is suggested for general-purpose detection because of inconsistently detected complex SVs and inversions, whereas assembly-based strategy is optional for applications with limited resources.
Affiliation(s)
- Jiadong Lin
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden 2311 EZ, The Netherlands
| | - Peng Jia
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Songbo Wang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Walter Kosters
- Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden 2311 EZ, The Netherlands
| | - Kai Ye
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- The School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Faculty of Science, Leiden University, Leiden 2311 EZ, The Netherlands
| |
|
44
|
Ferraj A, Audano PA, Balachandran P, Czechanski A, Flores JI, Radecki AA, Mosur V, Gordon DS, Walawalkar IA, Eichler EE, Reinholdt LG, Beck CR. Resolution of structural variation in diverse mouse genomes reveals chromatin remodeling due to transposable elements. Cell Genomics 2023; 3:100291. [PMID: 37228752 PMCID: PMC10203049 DOI: 10.1016/j.xgen.2023.100291] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/14/2022] [Revised: 02/03/2023] [Accepted: 03/10/2023] [Indexed: 05/25/2023]
Abstract
Diverse inbred mouse strains are important biomedical research models, yet genome characterization of many strains is fundamentally lacking in comparison with humans. In particular, catalogs of structural variants (SVs) (variants ≥ 50 bp) are incomplete, limiting the discovery of causative alleles for phenotypic variation. Here, we resolve genome-wide SVs in 20 genetically distinct inbred mice with long-read sequencing. We report 413,758 site-specific SVs affecting 13% (356 Mbp) of the mouse reference assembly, including 510 previously unannotated coding variants. We substantially improve the Mus musculus transposable element (TE) callset, and we find that TEs comprise 39% of SVs and account for 75% of altered bases. We further utilize this callset to investigate how TE heterogeneity affects mouse embryonic stem cells and find multiple TE classes that influence chromatin accessibility. Our work provides a comprehensive analysis of SVs found in diverse mouse genomes and illustrates the role of TEs in epigenetic differences.
Affiliation(s)
- Ardian Ferraj
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Peter A. Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | | | | | - Jacob I. Flores
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Alexander A. Radecki
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Varun Mosur
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - David S. Gordon
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Isha A. Walawalkar
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Evan E. Eichler
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | | | - Christine R. Beck
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| |
|
45
|
Lee YL, Bosse M, Takeda H, Moreira GCM, Karim L, Druet T, Oget-Ebrad C, Coppieters W, Veerkamp RF, Groenen MAM, Georges M, Bouwman AC, Charlier C. High-resolution structural variants catalogue in a large-scale whole genome sequenced bovine family cohort data. BMC Genomics 2023; 24:225. [PMID: 37127590 PMCID: PMC10152703 DOI: 10.1186/s12864-023-09259-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/28/2022] [Accepted: 03/20/2023] [Indexed: 05/03/2023] Open
Abstract
BACKGROUND Structural variants (SVs) are chromosomal segments that differ between genomes, such as deletions, duplications, insertions, inversions and translocations. The genomics revolution enabled the discovery of sub-microscopic SVs via array and whole-genome sequencing (WGS) data, paving the way to unravel the functional impact of SVs. Recent human expression QTL mapping studies demonstrated that SVs play a disproportionally large role in altering gene expression, underlining the importance of including SVs in genetic analyses. Therefore, this study aimed to generate and explore a high-quality bovine SV catalogue exploiting a unique cattle family cohort data (total 266 samples, forming 127 trios). RESULTS We curated 13,731 SVs segregating in the population, consisting of 12,201 deletions, 1,509 duplications, and 21 multi-allelic CNVs (> 50-bp). Of these, we validated a subset of copy number variants (CNVs) utilising a direct genotyping approach in an independent cohort, indicating that at least 62% of the CNVs are true variants, segregating in the population. Among gene-disrupting SVs, we prioritised two likely high impact duplications, encompassing ORM1 and POPDC3 genes, respectively. Liver expression QTL mapping results revealed that these duplications are likely causing altered gene expression, confirming the functional importance of SVs. Although most of the accurately genotyped CNVs are tagged by single nucleotide polymorphisms (SNPs) ascertained in WGS data, most CNVs were not captured by individual SNPs obtained from a 50K genotyping array. CONCLUSION We generated a high-quality SV catalogue exploiting unique whole genome sequenced bovine family cohort data. Two high impact duplications upregulating the ORM1 and POPDC3 are putative candidates for postpartum feed intake and hoof health traits, thus warranting further investigation. Generally, CNVs were in low LD with SNPs on the 50K array. 
Hence, it remains crucial to incorporate CNVs via means other than tagging SNPs, such as investigation of tagging haplotypes, direct imputation of CNVs, or direct genotyping as done in the current study. The SV catalogue and the custom genotyping array generated in the current study will serve as valuable resources accelerating utilisation of full spectrum of genetic variants in bovine genomes.
Affiliation(s)
- Young-Lim Lee
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands.
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium.
| | - Mirte Bosse
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
| | - Haruko Takeda
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
| | | | - Latifa Karim
- GIGA Institute, GIGA Genomics Platform, University of Liège, Liège, Belgium
| | - Tom Druet
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
| | - Claire Oget-Ebrad
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
| | - Wouter Coppieters
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
- GIGA Institute, GIGA Genomics Platform, University of Liège, Liège, Belgium
| | - Roel F Veerkamp
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
| | - Martien A M Groenen
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
| | - Michel Georges
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
| | - Aniek C Bouwman
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
| | - Carole Charlier
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
| |
|
46
|
Billingsley KJ, Ding J, Jerez PA, Illarionova A, Levine K, Grenn FP, Makarious MB, Moore A, Vitale D, Reed X, Hernandez D, Torkamani A, Ryten M, Hardy J, Chia R, Scholz SW, Traynor BJ, Dalgard CL, Ehrlich DJ, Tanaka T, Ferrucci L, Beach T, Serrano GE, Quinn JP, Bubb VJ, Collins RL, Zhao X, Walker M, Pierce-Hoffman E, Brand H, Talkowski ME, Casey B, Cookson MR, Markham A, Nalls MA, Mahmoud M, Sedlazeck FJ, Blauwendraat C, Gibbs JR, Singleton AB. Genome-Wide Analysis of Structural Variants in Parkinson Disease. Ann Neurol 2023; 93:1012-1022. [PMID: 36695634 PMCID: PMC10192042 DOI: 10.1002/ana.26608] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 09/01/2022] [Revised: 01/03/2023] [Accepted: 01/16/2023] [Indexed: 01/26/2023]
Abstract
OBJECTIVE Identification of genetic risk factors for Parkinson disease (PD) has to date been primarily limited to the study of single nucleotide variants, which only represent a small fraction of the genetic variation in the human genome. Consequently, causal variants for most PD risk are not known. Here we focused on structural variants (SVs), which represent a major source of genetic variation in the human genome. We aimed to discover SVs associated with PD risk by performing the first large-scale characterization of SVs in PD. METHODS We leveraged a recently developed computational pipeline to detect and genotype SVs from 7,772 Illumina short-read whole genome sequencing samples. Using this set of SV variants, we performed a genome-wide association study using 2,585 cases and 2,779 controls and identified SVs associated with PD risk. Furthermore, to validate the presence of these variants, we generated a subset of matched whole-genome long-read sequencing data. RESULTS We genotyped and tested 3,154 common SVs, representing over 412 million nucleotides of previously uncatalogued genetic variation. Using long-read sequencing data, we validated the presence of three novel deletion SVs that are associated with risk of PD from our initial association analysis, including a 2 kb intronic deletion within the gene LRRN4. INTERPRETATION We identified three SVs associated with genetic risk of PD. This study represents the most comprehensive assessment of the contribution of SVs to the genetic risk of PD to date. ANN NEUROL 2023;93:1012-1022.
Affiliation(s)
- Kimberley J. Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - Jinhui Ding
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Pilar Alvarez Jerez
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | | | | | - Francis P. Grenn
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Mary B. Makarious
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Anni Moore
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Daniel Vitale
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
- Data Tecnica International, Washington, DC, USA
| | - Xylena Reed
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - Dena Hernandez
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Ali Torkamani
- The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Mina Ryten
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - John Hardy
- UK Dementia Research Institute and Department of Neurodegenerative Disease and Reta Lila Weston Institute, UCL Queen Square Institute of Neurology and UCL Movement Disorders Centre, University College London, London, UK
- Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | | | - Ruth Chia
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Sonja W. Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, Maryland, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, Maryland, USA
| | - Bryan J. Traynor
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, Maryland, USA
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Therapeutic Development Branch, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, USA
- National Institute of Neurological Disorders and Stroke, Bethesda, MD 20892
- Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, University College London, London WC1N 1PJ, UK
| | - Clifton L. Dalgard
- Department of Anatomy Physiology & Genetics, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - Debra J. Ehrlich
- Parkinson’s Disease Clinic, Office of the Clinical Director, National Institute of Neurological Disorders and Stroke, Bethesda, Maryland, USA
| | - Toshiko Tanaka
- Translational Gerontology Branch, National Institute on Aging, NIH, Baltimore, MD 21224, USA
| | - Luigi Ferrucci
- Translational Gerontology Branch, National Institute on Aging, NIH, Baltimore, MD 21224, USA
| | - Thomas G. Beach
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ
| | - Geidy E. Serrano
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ
| | - John P. Quinn
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
| | - Vivien J. Bubb
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
| | - Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
| | - Mark Walker
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Data Sciences Platform, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
| | - Emma Pierce-Hoffman
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Data Sciences Platform, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115
| | - Michael E. Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (M.I.T) and Harvard USA Cambridge, MA 02142, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Bradford Casey
- The Michael J. Fox Foundation for Parkinson’s Research, New York, NY 10001
| | - Mark R Cookson
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | | | - Mike A. Nalls
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
- Data Tecnica International, Washington, DC, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, US
| | - Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - J. Raphael Gibbs
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
| | - Andrew B. Singleton
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| |
|
47
|
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023; 617:312-324. [PMID: 37165242 PMCID: PMC10172123 DOI: 10.1038/s41586-023-05896-x] [Citation(s) in RCA: 363] [Impact Index Per Article: 181.5] [Received: 07/09/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Affiliation(s)
- Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Mobin Asri
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Daniel Doerr
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Marina Haukness
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
| | - Julian K Lucas
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jean Monlong
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haley J Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
| | - Xian H Chang
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Robert S Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
| | - Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
| | - Miten Jain
- Northeastern University, Boston, MA, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Charles Markello
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Adam M Novak
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Hugh E Olsen
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Trevor Pesout
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jonas A Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Carl A Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | | | - Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Robert M Cook-Deegan
- Barrett and O'Connor Washington Center, Arizona State University, Washington, DC, USA
| | - Omar E Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Mark Diekhans
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam L Felsenfeld
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Nanibaa' A Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Barbara A Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, CA, USA
| | | | - Jan O Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hugo Magalhães
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Pierre Marijon
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | | | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Alice B Popejoy
- Department of Public Health Sciences, University of California, Davis, CA, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I Schultz
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Michael W Smith
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J Sofia
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children's Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Aleksey V Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - David Haussler
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Karen H Miga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany.
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany.
| | - Ira M Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA.
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA.
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, CA, USA.
|
48
|
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
| | | | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
|
49
|
Bezdvornykh I, Cherkasov N, Kanapin A, Samsonova A. A collection of read depth profiles at structural variant breakpoints. Sci Data 2023; 10:186. [PMID: 37024526 PMCID: PMC10079824 DOI: 10.1038/s41597-023-02076-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 03/16/2023] [Indexed: 04/08/2023] Open
Abstract
SWaveform, a newly created open genome-wide resource for read depth signal in the vicinity of structural variant (SV) breakpoints, aims to boost development of computational tools and algorithms for discovery of genomic rearrangement events from sequencing data. SVs are a dominant force shaping genomes and substantially contributing to genetic diversity. Still, there are challenges in reliable and efficient genotyping of SVs from whole genome sequencing data, thus delaying translation into clinical applications and wasting valuable resources. SWaveform includes a database containing ~7 million read depth profiles at SV breakpoints extracted from 911 sequencing samples generated by the Human Genome Diversity Project, generalised patterns of the signal at breakpoints, an interface for navigation and download, as well as a toolbox for local deployment with users' data. The dataset can be of immense value to bioinformatics and engineering communities as it empowers smooth application of intelligent signal processing and machine learning techniques for discovery of genomic rearrangement events and thus opens the floodgates for development of innovative algorithms and software.
Affiliation(s)
- Igor Bezdvornykh
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, 199004, Russia
| | - Nikolay Cherkasov
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, 199004, Russia
| | - Alexander Kanapin
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, 199004, Russia
| | - Anastasia Samsonova
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, 199004, Russia.
|
50
|
Denti L, Khorsand P, Bonizzoni P, Hormozdiari F, Chikhi R. SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads. Nat Methods 2023; 20:550-558. [PMID: 36550274 DOI: 10.1038/s41592-022-01674-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/23/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022]
Abstract
Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome, and to the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS), a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.
Affiliation(s)
- Luca Denti
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
| | | | - Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy.
| | - Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA, USA.
- UC Davis MIND Institute, Sacramento, CA, USA.
- Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA, USA.
| | - Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.
|