51
|
Tesi N, Salazar A, Zhang Y, van der Lee S, Hulsman M, Knoop L, Wijesekera S, Krizova J, Schneider AF, Pennings M, Sleegers K, Kamsteeg EJ, Reinders M, Holstege H. Characterizing tandem repeat complexities across long-read sequencing platforms with TREAT and otter. Genome Res 2024; 34:1942-1953. [PMID: 39406499 DOI: 10.1101/gr.279351.124] [Received: 03/15/2024] [Accepted: 10/03/2024] [Indexed: 11/09/2024]
Abstract
Tandem repeats (TRs) play important roles in genomic variation and disease risk in humans. Long-read sequencing allows for the accurate characterization of TRs; however, the underlying bioinformatics perspectives remain challenging. We present otter and TREAT: otter is a fast targeted local assembler, cross-compatible across different sequencing platforms. It is integrated in TREAT, an end-to-end workflow for TR characterization, visualization, and analysis across multiple genomes. In a comparison with existing tools based on long-read sequencing data from both Oxford Nanopore Technologies (ONT, Simplex and Duplex) and Pacific Biosciences (PacBio, Sequel II and Revio), otter and TREAT achieve state-of-the-art genotyping and motif characterization accuracy. Applied to clinically relevant TRs, TREAT/otter significantly identify individuals with pathogenic TR expansions. When applied to a case-control setting, we replicate previously reported associations of TRs with Alzheimer's disease, including those near or within APOC1 (P = 2.63 × 10⁻⁹), SPI1 (P = 6.5 × 10⁻³), and ABCA7 (P = 0.04) genes. Finally, we use TREAT/otter to systematically evaluate potential biases when genotyping TRs using diverse ONT and PacBio long-read sequencing data sets. We show that, in rare cases (0.06%), long-read sequencing suffers from coverage drops in TRs, including the disease-associated TRs in ABCA7 and RFC1 genes. Such coverage drops can lead to TR misgenotyping, hampering the accurate characterization of TR alleles. Taken together, our tools can accurately genotype TRs across different sequencing technologies and with minimal requirements, allowing end-to-end analysis and comparisons of TRs in human genomes, with broad applications in research and clinical fields.
Affiliation(s)
- Niccoló Tesi
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Alex Salazar
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Yaran Zhang
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Sven van der Lee
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Marc Hulsman
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Lydian Knoop
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Sanduni Wijesekera
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Jana Krizova
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Anne-Fleur Schneider
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
| | - Maartje Pennings
- Department of Genome Diagnostics, Radboud University Medical Center, 6525GA Nijmegen, The Netherlands
| | - Kristel Sleegers
- Complex Genetics of Alzheimer's Disease Group, Antwerp Center for Molecular Neurology, VIB, Antwerp B-2650, Belgium
| | - Erik-Jan Kamsteeg
- Department of Genome Diagnostics, Radboud University Medical Center, 6525GA Nijmegen, The Netherlands
| | - Marcel Reinders
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
| | - Henne Holstege
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
|
52
|
Kirby E, Bernier A, Guigó R, Wold B, Arzuaga F, Kusunose M, Zawati M, Knoppers BM. Data sharing ethics toolkit: The Human Cell Atlas. Nat Commun 2024; 15:9901. [PMID: 39567529 PMCID: PMC11579383 DOI: 10.1038/s41467-024-54300-3] [Received: 09/10/2024] [Accepted: 11/06/2024] [Indexed: 11/22/2024] Open
Abstract
Striving to build an exhaustive guidebook of the types and properties of human cells, the Human Cell Atlas' (HCA) success relies on the sampling of diverse populations, developmental stages, and tissue types. Its open science philosophy preconizes the rapid, seamless sharing of data - as openly as possible. In light of the scope and ambition of such an international initiative, the HCA Ethics Working Group (EWG) has been working to build a solid foundation to address the complexities of data collection and sharing as part of Atlas development. Indeed, a particular challenge of the HCA is the diversity of sampling scenarios (e.g., living participants, deceased donors, pediatric populations, culturally diverse backgrounds, tissues from various developmental stages, etc.), and associated ethical and legal norms, which vary across countries contributing to the effort. Hence, to the extent possible, the EWG set out to provide harmonised, international and interoperable policies and tools, to guide its research community. This paper provides a high-level overview of the types of challenges and approaches proposed by the EWG.
Affiliation(s)
- Emily Kirby
- Centre of Genomics and Policy, School of Biomedical Sciences, Faculty of Medicine and Health Sciences, McGill University, 740 Dr. Penfield, Suite 5200, Montreal, QC, Canada.
| | - Alexander Bernier
- Centre of Genomics and Policy, School of Biomedical Sciences, Faculty of Medicine and Health Sciences, McGill University, 740 Dr. Penfield, Suite 5200, Montreal, QC, Canada
| | - Roderic Guigó
- Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Fabiana Arzuaga
- Interministerial Commission on Advanced Therapies, Ministry of Science, Technology and Innovation, Argentina. Godoy Cruz 2320, 4th Floor, Ciudad Autónoma de Buenos Aires, Argentina
| | - Mayumi Kusunose
- Center for Integrative Medical Sciences, RIKEN. 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, Japan
| | - Ma'n Zawati
- Centre of Genomics and Policy, School of Biomedical Sciences, Faculty of Medicine and Health Sciences, McGill University, 740 Dr. Penfield, Suite 5200, Montreal, QC, Canada
| | - Bartha M Knoppers
- Centre of Genomics and Policy, School of Biomedical Sciences, Faculty of Medicine and Health Sciences, McGill University, 740 Dr. Penfield, Suite 5200, Montreal, QC, Canada
|
53
|
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson ZB, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Res 2024; 34:2061-2073. [PMID: 39358015 DOI: 10.1101/gr.279273.124] [Received: 03/04/2024] [Accepted: 09/16/2024] [Indexed: 10/04/2024]
Abstract
Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
Affiliation(s)
- Jonas A Gustafson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
| | - Sophia B Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
- Institute for Public Health Genetics, University of Washington, Seattle, Washington 98195, USA
| | - Miranda P G Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - David Twesigomwe
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg 2193, South Africa
| | - Lei Yang
- Pacific Northwest Research Institute, Seattle, Washington 98122, USA
| | - Anthony A Snead
- Department of Biology, New York University, New York, New York 10003, USA
| | | | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp 2650, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp 2000, Belgium
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Human Technopole, Milan 20157, Italy
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Angela L Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Zachary B Anderson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Sophie H R Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Sydney A Ward
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Maisha Sinha
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Claudia Gonzaga-Jauregui
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Mexico City 76230, Mexico
| | - Wayne E Clarke
- New York Genome Center, New York, New York 10013, USA
- Outlier Informatics Inc., Saskatoon, Saskatchewan S7H 1L4, Canada
| | - Anna O Basile
- New York Genome Center, New York, New York 10013, USA
| | - André Corvelo
- New York Genome Center, New York, New York 10013, USA
| | | | | | | | - Mahler Revsine
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Karynne E Patterson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Cate R Paschal
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington 98195, USA
- Department of Laboratories, Seattle Children's Hospital, Seattle, Washington 98195, USA
| | - Christina Zakarian
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Tanner D Jensen
- Department of Genetics, Stanford University, Stanford, California 94305, USA
| | - Esther Robb
- Department of Computer Science, Stanford University, Stanford, California 94305, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Computer Science, Rice University, Houston, Texas 77251, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | | | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
| | - Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, Maryland 20892, USA
| | | | - Richard N McLaughlin
- Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
- Pacific Northwest Research Institute, Seattle, Washington 98122, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| | - Michael C Zody
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Mexico City 76230, Mexico
| | - Matt Loose
- Deep Seq, School of Life Sciences, University of Nottingham, Nottingham NG7 2TQ, UK
| | - Miten Jain
- Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, USA
- Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, Washington 98195, USA
|
54
|
Amit I, Ardlie K, Arzuaga F, Awandare G, Bader G, Bernier A, Carninci P, Donnelly S, Eils R, Forrest ARR, Greely HT, Guigo R, Hacohen N, Haniffa M, Kirby ES, Knoppers BM, Kriegstein A, Lein ES, Linnarsson S, Majumder PP, Merad M, Meyer K, Mhlanga MM, Nolan G, Ntusi NAB, Pe'er D, Prabhakar S, Raven-Adams M, Regev A, Rozenblatt-Rosen O, Saha S, Saltzman A, Shalek AK, Shin JW, Stunnenberg H, Teichmann SA, Tickle T, Villani AC, Wells C, Wold B, Yang H, Zhuang X. The commitment of the human cell atlas to humanity. Nat Commun 2024; 15:10019. [PMID: 39567491 PMCID: PMC11579494 DOI: 10.1038/s41467-024-54306-x] [Received: 10/26/2023] [Accepted: 11/06/2024] [Indexed: 11/22/2024] Open
Abstract
The Human Cell Atlas (HCA) is a global partnership "to create comprehensive reference maps of all human cells-the fundamental units of life - as a basis for both understanding human health and diagnosing, monitoring, and treating disease." ( https://www.humancellatlas.org/ ) The atlas shall characterize cells from diverse individuals across the globe to better understand human biology. HCA proactively considers the priorities of, and benefits accrued to, contributing communities. Here, we lay out principles and action items that have been adopted to affirm HCA's commitment to equity so that the atlas is beneficial to all of humanity.
Affiliation(s)
- Ido Amit
- Weizmann Institute of Science, Rehovot, Israel
| | | | - Fabiana Arzuaga
- Ministry of Science, Technology and Productive Innovation, Buenos Aires, Argentina
| | - Gordon Awandare
- West African Center for Cell Biology of Infectious Pathogens, University of Ghana, Legon, Ghana
| | | | | | - Piero Carninci
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Human Technopole, Milan, Italy
| | | | - Roland Eils
- BIH@Charité - Center for Digital Health, Berlin, Germany
| | | | - Henry T Greely
- Stanford Law School, Stanford University, California, USA
| | - Roderic Guigo
- Center for Genomic Regulation, Universitat Pompeu Fabra, Barcelona, Spain
| | - Nir Hacohen
- Center for Cancer Immunotherapy, Mass General Hospital, Charlestown, USA
| | - Muzlifah Haniffa
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | | | | | | | - Ed S Lein
- University of Washington, Seattle, USA
| | | | - Partha P Majumder
- John C. Martin Centre for Liver Research & Innovation, Kolkata, India.
- Indian Statistical Institute, Kolkata, India.
| | | | - Kerstin Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | | | - Garry Nolan
- Stanford University School of Medicine, California, USA
| | - Ntobeko A B Ntusi
- University of Cape Town, South Africa, and SAMRC Extramural Unit on Intersection of Noncommunicable Diseases and Infectious Diseases, Cape Town, South Africa
| | - Dana Pe'er
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA
| | - Shyam Prabhakar
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Maili Raven-Adams
- Global Alliance for Genomics and Health, Wellcome Sanger Institute, Hinxton, UK
| | - Aviv Regev
- Human Cell Atlas, South San Francisco, USA
| | | | - Senjuti Saha
- Child Health Research Foundation, Dhaka, Bangladesh
| | | | - Alex K Shalek
- Broad Institute, Cambridge, USA
- Massachusetts Institute of Technology, Boston, USA
| | - Jay W Shin
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Henk Stunnenberg
- Radboud Institute for Molecular Life Sciences, Nijmegen, The Netherlands
| | - Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | | | | | | | - Barbara Wold
- California Institute of Technology, California, USA
|
55
|
Fang B, Edwards SV. Fitness consequences of structural variation inferred from a House Finch pangenome. Proc Natl Acad Sci U S A 2024; 121:e2409943121. [PMID: 39531493 PMCID: PMC11588099 DOI: 10.1073/pnas.2409943121] [Received: 05/17/2024] [Accepted: 10/03/2024] [Indexed: 11/16/2024] Open
Abstract
Genomic structural variants (SVs) play a crucial role in adaptive evolution, yet their average fitness effects and characterization with pangenome tools are understudied in wild animal populations. We constructed a pangenome for House Finches (Haemorhous mexicanus), a model for studies of host-pathogen coevolution, using long-read sequence data on 16 individuals (32 de novo-assembled haplotypes) and one outgroup. We identified 887,118 SVs larger than 50 base pairs, mostly (60%) involving repetitive elements, with reduced SV diversity in the eastern US as a result of its introduction by humans. The distribution of fitness effects of genome-wide SVs was estimated using maximum likelihood approaches and revealed that SVs in both coding and noncoding regions were on average more deleterious than smaller indels or single nucleotide polymorphisms. The reference-free pangenome facilitated identification of a >10-My-old, 11-megabase-long pericentric inversion on chromosome 1. We found that the genotype frequencies of the inversion, estimated from 135 birds widely sampled temporally and geographically, increased steadily over the 25 y since House Finches were first exposed to the bacterial pathogen Mycoplasma gallisepticum and showed signatures of balancing selection, capturing genes related to immunity and telomerase activity. We also observed shorter telomeres in populations with a greater number of years of exposure to Mycoplasma. Our study illustrates the utility of long-read sequencing and pangenome methods for understanding wild animal populations, estimating fitness effects of genome-wide SVs, and advancing our understanding of adaptive evolution through structural variation.
Affiliation(s)
- Bohao Fang
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
| | - Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
|
56
|
Dekker J, Mirny LA. The chromosome folding problem and how cells solve it. Cell 2024; 187:6424-6450. [PMID: 39547207 PMCID: PMC11569382 DOI: 10.1016/j.cell.2024.10.026] [Received: 08/11/2024] [Revised: 10/15/2024] [Accepted: 10/15/2024] [Indexed: 11/17/2024]
Abstract
Every cell must solve the problem of how to fold its genome. We describe how the folded state of chromosomes is the result of the combined activity of multiple conserved mechanisms. Homotypic affinity-driven interactions lead to spatial partitioning of active and inactive loci. Molecular motors fold chromosomes through loop extrusion. Topological features such as supercoiling and entanglements contribute to chromosome folding and its dynamics, and tethering loci to sub-nuclear structures adds additional constraints. Dramatically diverse chromosome conformations observed throughout the cell cycle and across the tree of life can be explained through differential regulation and implementation of these basic mechanisms. We propose that the first functions of chromosome folding are to mediate genome replication, compaction, and segregation and that mechanisms of folding have subsequently been co-opted for other roles, including long-range gene regulation, in different conditions, cell types, and species.
Affiliation(s)
- Job Dekker
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA.
| | - Leonid A Mirny
- Institute for Medical Engineering and Science and Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA.
|
57
|
Zararsiz GE, Lintelmann J, Cecil A, Kirwan J, Poschet G, Gegner HM, Schuchardt S, Guan XL, Saigusa D, Wishart D, Zheng J, Mandal R, Adams K, Thompson JW, Snyder MP, Contrepois K, Chen S, Ashrafi N, Akyol S, Yilmaz A, Graham SF, O’Connell TM, Kalecký K, Bottiglieri T, Limonciel A, Pham HT, Koal T, Adamski J, Kastenmüller G. Interlaboratory comparison of standardised metabolomics and lipidomics analyses in human and rodent blood using the MxP® Quant 500 kit. bioRxiv 2024:2024.11.13.619447. [PMID: 39605511 PMCID: PMC11601468 DOI: 10.1101/2024.11.13.619447] [Indexed: 11/29/2024]
Abstract
Metabolomics and lipidomics are pivotal in understanding phenotypic variations beyond genomics. However, quantification and comparability of mass spectrometry (MS)-derived data are challenging. Standardised assays can enhance data comparability, enabling applications in multi-center epidemiological and clinical studies. Here we evaluated the performance and reproducibility of the MxP® Quant 500 kit across 14 laboratories. The kit allows quantification of 634 different metabolites from 26 compound classes using triple quadrupole MS. Each laboratory analysed twelve samples, including human plasma and serum, lipaemic plasma, NIST SRM 1950, and mouse and rat plasma, in triplicates. 505 out of the 634 metabolites were measurable above the limit of detection in all laboratories, while eight metabolites were undetectable in our study. Out of the 505 metabolites, 412 were observed in both human and rodent samples. Overall, the kit exhibited high reproducibility with a median coefficient of variation (CV) of 14.3 %. CVs in NIST SRM 1950 reference plasma were below 25 % and 10 % for 494 and 138 metabolites, respectively. To facilitate further inspection of reproducibility for any compound, we provide detailed results from the in-depth evaluation of reproducibility across concentration ranges using Deming regression. Interlaboratory reproducibility was similar across sample types, with some species-, matrix-, and phenotype-specific differences due to variations in concentration ranges. Comparisons with previous studies on the performance of MS-based kits (including the AbsoluteIDQ p180 and the Lipidyzer) revealed good concordance of reproducibility results and measured absolute concentrations in NIST SRM 1950 for most metabolites, making the MxP® Quant 500 kit a relevant tool to apply metabolomics and lipidomics in multi-center studies.
Affiliation(s)
- Gözde Ertürk Zararsiz
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
- Department of Biostatistics, Erciyes University School of Medicine, Kayseri, Turkey
- Drug Application and Research Center (ERFARMA), Erciyes University, Kayseri, Turkey
| | - Jutta Lintelmann
- Metabolomics and Proteomics Core, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Alexander Cecil
- Metabolomics and Proteomics Core, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Jennifer Kirwan
- Metabolomics Platform, Berlin Institute of Health at Charité, Berlin, Germany
| | - Gernot Poschet
- Metabolomics Core Technology Platform, Centre for Organismal Studies, Heidelberg University, Heidelberg, Germany
| | - Hagen M. Gegner
- Metabolomics Core Technology Platform, Centre for Organismal Studies, Heidelberg University, Heidelberg, Germany
| | - Sven Schuchardt
- Fraunhofer Institute for Toxicology and Experimental Medicine, Hannover, Germany
| | - Xue Li Guan
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Daisuke Saigusa
- Laboratory of Biomedical and Analytical Sciences, Faculty of Pharmaceutical Science, Teikyo University, Tokyo, Japan
| | - David Wishart
- Department of Biological Sciences, University of Alberta, Edmonton, Canada
| | - Jiamin Zheng
- Department of Biological Sciences, University of Alberta, Edmonton, Canada
| | - Rupasri Mandal
- Department of Biological Sciences, University of Alberta, Edmonton, Canada
| | - Kendra Adams
- Duke Proteomics and Metabolomics Shared Resource, Center for Genomic and Computational Biology, Duke University, Durham (NC), USA
| | - J. Will Thompson
- Duke Proteomics and Metabolomics Shared Resource, Center for Genomic and Computational Biology, Duke University, Durham (NC), USA
| | - Michael P. Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford (CA), USA
| | - Kevin Contrepois
- Department of Genetics, Stanford University School of Medicine, Stanford (CA), USA
| | - Songjie Chen
- Department of Genetics, Stanford University School of Medicine, Stanford (CA), USA
| | - Nadia Ashrafi
- Corewell Health Research Institute, Metabolomics Department, Royal Oak (MI), USA
- Corewell Health William Beaumont University Hospital, Royal Oak (MI), USA
| | - Sumeyya Akyol
- Corewell Health Research Institute, Metabolomics Department, Royal Oak (MI), USA
| | - Ali Yilmaz
- Corewell Health Research Institute, Metabolomics Department, Royal Oak (MI), USA
- Corewell Health William Beaumont University Hospital, Royal Oak (MI), USA
- Oakland University-William Beaumont School of Medicine, Rochester (MI), USA
| | - Stewart F. Graham
- Corewell Health Research Institute, Metabolomics Department, Royal Oak (MI), USA
- Corewell Health William Beaumont University Hospital, Royal Oak (MI), USA
- Oakland University-William Beaumont School of Medicine, Rochester (MI), USA
| | | | - Karel Kalecký
- Center of Metabolomics, Institute of Metabolic Disease, Baylor Scott & White Research Institute, Dallas (TX), USA
| | - Teodoro Bottiglieri
- Center of Metabolomics, Institute of Metabolic Disease, Baylor Scott & White Research Institute, Dallas (TX), USA
| | | | | | | | - Jerzy Adamski
- Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
| | - Gabi Kastenmüller
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
| |
|
58
|
Bilgrav Saether K, Eisfeldt J. Detecting transposable elements in long-read genomes using sTELLeR. Bioinformatics 2024; 40:btae686. [PMID: 39558574 PMCID: PMC11601167 DOI: 10.1093/bioinformatics/btae686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 11/05/2024] [Accepted: 11/14/2024] [Indexed: 11/20/2024] Open
Abstract
MOTIVATION Repeat elements, such as transposable elements (TEs), are highly repetitive DNA sequences that compose around 50% of the genome. TEs such as Alu, SVA, HERV, and L1 elements can cause disease by disrupting genes, causing frameshift mutations, or altering splicing patterns. These elements are challenging to characterize using short-read genome sequencing, due to its read length and the repetitive nature of TEs. Long-read genome sequencing (lrGS) enables bridging of TEs, allowing increased resolution across repetitive DNA sequences. lrGS therefore presents an opportunity for improved TE detection and analysis, not only from a research perspective but also for future clinical detection. When choosing an lrGS TE caller, parameters such as runtime, CPU hours, sensitivity, precision, and compatibility with inclusion into pipelines are crucial for efficient detection. RESULTS We therefore developed sTELLeR, (s) Transposable ELement in Long (e) Read, for accurate, fast, and effective TE detection. In particular, sTELLeR exhibits higher precision and sensitivity for calling Alu elements than similar tools. The caller is 5-48× as fast and uses <2% of the CPU hours compared to competitive callers. The caller is haplotype aware and outputs results in a variant call format (VCF) file, enabling compatibility with other variant callers and downstream analysis. AVAILABILITY AND IMPLEMENTATION sTELLeR is a Python-based tool and is available at https://github.com/kristinebilgrav/sTELLeR. Altogether, we show that sTELLeR is a fast, sensitive, and precise caller for detection of TEs, and can easily be implemented into variant calling workflows.
Affiliation(s)
- Kristine Bilgrav Saether
- Department of Molecular Medicine and Surgery, Karolinska Institute, Stockholm 171 76, Sweden
- Clinical Genomics Facility, Science for Life Laboratory, Stockholm 171 76, Sweden
| | - Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institute, Stockholm 171 76, Sweden
- Clinical Genomics Facility, Science for Life Laboratory, Stockholm 171 76, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm 171 77, Sweden
| |
|
59
|
Frampton S, Smith R, Ferson L, Gibson J, Hollox EJ, Cragg MS, Strefford JC. Fc gamma receptors: Their evolution, genomic architecture, genetic variation, and impact on human disease. Immunol Rev 2024; 328:65-97. [PMID: 39345014 PMCID: PMC11659932 DOI: 10.1111/imr.13401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Fc gamma receptors (FcγRs) are a family of receptors that bind IgG antibodies and interface at the junction of humoral and innate immunity. Precise regulation of receptor expression provides the necessary balance to achieve healthy immune homeostasis by establishing an appropriate immune threshold to limit autoimmunity but respond effectively to infection. The underlying genetics of the FCGR gene family are central to achieving this immune threshold by regulating affinity for IgG, signaling efficacy, and receptor expression. The FCGR gene locus was duplicated during evolution, retaining very high homology and resulting in a genomic region that is technically difficult to study. Here, we review the recent evolution of the gene family in mammals, its complexity and variation through copy number variation and single-nucleotide polymorphism, and impact of these on disease incidence, resolution, and therapeutic antibody efficacy. We also discuss the progress and limitations of current approaches to study the region and emphasize how new genomics technologies will likely resolve much of the current confusion in the field. This will lead to definitive conclusions on the impact of genetic variation within the FCGR gene locus on immune function and disease.
Affiliation(s)
- Sarah Frampton
- Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
| | - Rosanna Smith
- Antibody and Vaccine Group, Faculty of Medicine, School of Cancer Sciences, Centre for Cancer Immunology, University of Southampton, Southampton, UK
| | - Lili Ferson
- Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
| | - Jane Gibson
- Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
| | - Edward J. Hollox
- Department of Genetics, Genomics and Cancer Sciences, College of Life Sciences, University of Leicester, Leicester, UK
| | - Mark S. Cragg
- Antibody and Vaccine Group, Faculty of Medicine, School of Cancer Sciences, Centre for Cancer Immunology, University of Southampton, Southampton, UK
| | - Jonathan C. Strefford
- Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
| |
|
60
|
van Baardwijk MN, Heijnen LSEM, Zhao H, Baudis M, Stubbs AP. A systematic benchmark of copy number variation detection tools for high density SNP genotyping arrays. Genomics 2024; 116:110962. [PMID: 39547585 DOI: 10.1016/j.ygeno.2024.110962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 10/20/2024] [Accepted: 11/09/2024] [Indexed: 11/17/2024]
Abstract
Copy Number Variations (CNVs) are crucial in various diseases, especially cancer, but detecting them accurately from SNP genotyping arrays remains challenging. Therefore, this study benchmarked five CNV detection tools (PennCNV, QuantiSNP, iPattern, EnsembleCNV, and R-GADA) using SNP array and WGS data from 2002 individuals of the DRAGEN re-analysis of the 1000 Genomes project. Results showed significant variability in tool performance. R-GADA had the highest recall but low precision, while PennCNV was the most reliable in terms of precision and F1 score. EnsembleCNV improved recall by combining multiple callers but increased false positives. Overall, current tools, including new methods, do not outperform PennCNV in precise CNV detection. Improved reference data and consensus on true positive CNV calls are necessary. This study provides valuable insights and scalable workflows for researchers selecting CNV detection methods in future studies.
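The precision, recall, and F1 figures mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a call counts as a true positive when it has at least 50% reciprocal overlap with a reference CNV. That matching rule, the function names, and the toy intervals are illustrative assumptions and are not taken from the benchmark itself.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smallest fraction of either interval covered by their intersection."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return min(shared / (end_a - start_a), shared / (end_b - start_b))


def benchmark(calls: List[Interval], truth: List[Interval],
              min_overlap: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under the assumed matching rule."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    recovered_truth = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = recovered_truth / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: one of two calls matches, one of three reference CNVs is recovered.
truth = [("chr1", 100, 200), ("chr1", 500, 700), ("chr2", 50, 150)]
calls = [("chr1", 110, 210), ("chr1", 1000, 1100)]
precision, recall, f1 = benchmark(calls, truth)
```

On this toy input precision is 0.5, recall is 1/3, and F1 is 0.4, which shows how a caller can look acceptable on one metric and weak on another.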
Affiliation(s)
- M N van Baardwijk
- Department of Pathology and Clinical Bioinformatics, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands; Department of Surgery, Division of HPB & Transplant Surgery, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - L S E M Heijnen
- Department of Pathology and Clinical Bioinformatics, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - H Zhao
- Department of Molecular Life Sciences, University of Zurich, Switzerland; Swiss Institute of Bioinformatics, Switzerland
| | - M Baudis
- Department of Molecular Life Sciences, University of Zurich, Switzerland; Swiss Institute of Bioinformatics, Switzerland
| | - A P Stubbs
- Department of Pathology and Clinical Bioinformatics, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
| |
|
61
|
Xu Z, Wei P. A novel statistical framework for meta-analysis of total mediation effect with high-dimensional omics mediators in large-scale genomic consortia. PLoS Genet 2024; 20:e1011483. [PMID: 39561194 PMCID: PMC11614268 DOI: 10.1371/journal.pgen.1011483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 12/03/2024] [Accepted: 11/03/2024] [Indexed: 11/21/2024] Open
Abstract
Meta-analysis is used to aggregate the effects of interest across multiple studies, while its methodology is largely underexplored in mediation analysis, particularly in estimating the total mediation effect of high-dimensional omics mediators. Large-scale genomic consortia, such as the Trans-Omics for Precision Medicine (TOPMed) program, comprise multiple cohorts with diverse technologies to elucidate the genetic architecture and biological mechanisms underlying complex human traits and diseases. Leveraging the recently established asymptotic standard error of the R-squared (R2)-based mediation effect estimation for high-dimensional omics mediators, we have developed a novel meta-analysis framework requiring only summary statistics and allowing inter-study heterogeneity. Whereas the proposed meta-analysis can uniquely evaluate and account for potential effect heterogeneity across studies due to, for example, varying genomic profiling platforms, our extensive simulations showed that the developed method was more computationally efficient and yielded satisfactory operating characteristics comparable to analysis of the pooled individual-level data when there was no inter-study heterogeneity. We applied the developed method to 5 TOPMed studies with over 5800 participants to estimate the mediation effects of gene expression on age-related variation in systolic blood pressure and sex-related variation in high-density lipoprotein (HDL) cholesterol. The proposed method is available in the R package MetaR2M on GitHub.
Affiliation(s)
- Zhichao Xu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
| | - Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
| |
|
62
|
Hemstrom W, Grummer JA, Luikart G, Christie MR. Next-generation data filtering in the genomics era. Nat Rev Genet 2024; 25:750-767. [PMID: 38877133 DOI: 10.1038/s41576-024-00738-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Accepted: 04/25/2024] [Indexed: 06/16/2024]
Abstract
Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering - removing sequencing bases, reads, genetic variants and/or individuals from a dataset - to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy-Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima's D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).
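Two of the filters named in this abstract, minor allele frequency and missing data per locus, can be sketched as follows. The genotype encoding (0, 1, or 2 copies of the alternate allele, `None` for a missing call), the function names, and the default thresholds are assumptions chosen for illustration. They are not recommendations drawn from the review.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0, 1 or 2 alternate-allele copies; None means missing


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals without a genotype call at this locus."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Frequency of the rarer allele among the non-missing diploid calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_frequency = sum(observed) / (2 * len(observed))
    return min(alt_frequency, 1.0 - alt_frequency)


def filter_loci(matrix: List[List[Genotype]],
                min_maf: float = 0.05,
                max_missing: float = 0.2) -> List[int]:
    """Return indices of loci that pass both filters; matrix[locus][individual]."""
    kept = []
    for index, calls in enumerate(matrix):
        if missing_rate(calls) > max_missing:
            continue  # too many missing calls at this locus
        if minor_allele_frequency(calls) < min_maf:
            continue  # minor allele too rare (or locus monomorphic)
        kept.append(index)
    return kept


genotypes = [
    [0, 1, 2, 1],        # common variant, complete data
    [0, 0, 0, 0],        # monomorphic
    [None, None, 1, 0],  # half of the calls missing
    [0, 0, 0, 1],        # rare variant, MAF 0.125
]
lenient = filter_loci(genotypes)               # [0, 3]
strict = filter_loci(genotypes, min_maf=0.2)   # [0]
```

Changing only `min_maf` removes the rare variant from the retained set, a small-scale version of the threshold sensitivity the abstract describes.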
Affiliation(s)
- William Hemstrom
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
| | - Jared A Grummer
- Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
| | - Gordon Luikart
- Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
| | - Mark R Christie
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA.
| |
|
63
|
Luo J, Zhang Z, Ma X, Yan C, Luo H. GTasm: a genome assembly method using graph transformers and HiFi reads. Front Genet 2024; 15:1495657. [PMID: 39525812 PMCID: PMC11543488 DOI: 10.3389/fgene.2024.1495657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 10/14/2024] [Indexed: 11/16/2024] Open
Abstract
Motivation Genome assembly aims to reconstruct the whole chromosome-scale genome sequence. Obtaining an accurate and complete chromosome-scale genome sequence serves as an indispensable foundation for downstream genomics analyses. Due to the complex repeat regions contained in genome sequences, assembly results are commonly fragmented. Long reads with a high accuracy rate can greatly enhance the integrity of genome assembly results. Results Here we introduce GTasm, an assembly method that uses a graph transformer network to find optimal assembly results based on assembly graphs. Based on the assembly graph, GTasm first extracts features of vertices and edges. Then, GTasm scores the edges with a graph transformer model and adopts a heuristic algorithm to find optimal paths in the assembly graph, each path corresponding to a contig. The graph transformer model is trained using simulated HiFi reads from CHM13, and GTasm is compared with other assembly methods using a real HiFi read set. In these experiments, GTasm produces good assembly results and achieves good performance on the NA50 and NGA50 evaluation indicators. Applying deep learning models to genome assembly can improve the continuity and accuracy of assembly results. The code is available from https://github.com/chu-xuezhe/GTasm.
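As background for the NA50 and NGA50 indicators named in this abstract, the sketch below computes the underlying N50-style statistic from a list of lengths. It assumes the lengths are already available: contig lengths give N50 or NG50, while lengths of aligned blocks (contigs split at misassemblies by an external evaluation tool) give NA50 or NGA50. The function name `nx50` and the toy numbers are hypothetical, and the alignment step itself is not shown.

```python
from typing import Iterable, Optional


def nx50(lengths: Iterable[int], reference_size: Optional[int] = None) -> int:
    """Length L such that pieces of length >= L cover half of the target size.

    Without reference_size the target is the summed input length (N50 / NA50).
    With reference_size the target is the reference genome size (NG50 / NGA50).
    Returns 0 when the pieces never reach half of the target.
    """
    ordered = sorted(lengths, reverse=True)
    total = reference_size if reference_size is not None else sum(ordered)
    target = total / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0


block_lengths = [80, 70, 50, 40, 30, 20]  # toy aligned-block lengths
na50 = nx50(block_lengths)                # 70: 80 + 70 covers half of 290
nga50 = nx50(block_lengths, 400)          # 50: 80 + 70 + 50 covers half of 400
```

Normalizing by the reference size rather than the assembly size is what makes the "G" variants comparable across assemblies of different total length.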
Affiliation(s)
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, China
| | - Ziheng Zhang
- School of Software, Henan Polytechnic University, Jiaozuo, China
| | - Xinliang Ma
- School of Software, Henan Polytechnic University, Jiaozuo, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| |
|
64
|
Parmar JM, Laing NG, Kennerson ML, Ravenscroft G. Genetics of inherited peripheral neuropathies and the next frontier: looking backwards to progress forwards. J Neurol Neurosurg Psychiatry 2024; 95:992-1001. [PMID: 38744462 PMCID: PMC11503175 DOI: 10.1136/jnnp-2024-333436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 04/10/2024] [Indexed: 05/16/2024]
Abstract
Inherited peripheral neuropathies (IPNs) encompass a clinically and genetically heterogeneous group of disorders causing length-dependent degeneration of peripheral autonomic, motor and/or sensory nerves. Despite gold-standard diagnostic testing for pathogenic variants in over 100 known associated genes, many patients with IPN remain genetically unsolved. Providing patients with a diagnosis is critical for reducing their 'diagnostic odyssey', improving clinical care, and for informed genetic counselling. The last decade of massively parallel sequencing technologies has seen a rapid increase in the number of newly described IPN-associated gene variants contributing to IPN pathogenesis. However, the scarcity of additional families and functional data supporting variants in potential novel genes is prolonging patient diagnostic uncertainty and contributing to the missing heritability of IPNs. We review the last decade of IPN disease gene discovery to highlight novel genes, structural variation and short tandem repeat expansions contributing to IPN pathogenesis. From the lessons learnt, we provide our vision for IPN research as we anticipate the future, providing examples of emerging technologies, resources and tools that we propose will expedite the genetic diagnosis of unsolved IPN families.
Affiliation(s)
- Jevin M Parmar
- Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
| | - Nigel G Laing
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
- Preventive Genetics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
| | - Marina L Kennerson
- Northcott Neuroscience Laboratory, ANZAC Research Institute, Concord, New South Wales, Australia
- Molecular Medicine Laboratory, Concord Hospital, Concord, New South Wales, Australia
| | - Gianina Ravenscroft
- Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
| |
|
65
|
Groza C, Chen X, Wheeler TJ, Bourque G, Goubert C. A unified framework to analyze transposable element insertion polymorphisms using graph genomes. Nat Commun 2024; 15:8915. [PMID: 39414821 PMCID: PMC11484939 DOI: 10.1038/s41467-024-53294-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 10/02/2024] [Indexed: 10/18/2024] Open
Abstract
Transposable elements are ubiquitous mobile DNA sequences generating insertion polymorphisms, contributing to genomic diversity. We present GraffiTE, a flexible pipeline to analyze polymorphic mobile element insertions. By integrating state-of-the-art structural variant detection algorithms and graph genomes, GraffiTE identifies polymorphic mobile elements from genomic assemblies or long-read sequencing data, and genotypes these variants using short or long read sets. Benchmarking on simulated and real datasets reports high precision and recall rates. GraffiTE is designed to allow non-expert users to perform comprehensive analyses, including in models with limited transposable element knowledge, and is compatible with various sequencing technologies. Here, we demonstrate the versatility of GraffiTE by analyzing human, Drosophila melanogaster, maize, and Cannabis sativa pangenome data. These analyses reveal the landscapes of polymorphic mobile elements and their frequency variations across individuals, strains, and cultivars.
Affiliation(s)
- Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, QC, Canada
| | - Xun Chen
- Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto, Japan
| | - Travis J Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
| | - Guillaume Bourque
- Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto, Japan
- Canadian Centre for Computational Genomics, McGill University, Montréal, QC, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada
- Human Genetics, McGill University, Montréal, QC, Canada
| | - Clément Goubert
- Human Genetics, McGill University, Montréal, QC, Canada.
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA.
| |
|
66
|
Chandra G, Gibney D, Jain C. Haplotype-aware sequence alignment to pangenome graphs. Genome Res 2024; 34:1265-1275. [PMID: 39013594 PMCID: PMC11529843 DOI: 10.1101/gr.279143.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 06/24/2024] [Indexed: 07/18/2024]
Abstract
Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing alignments that are consistent with the known haplotypes improves genotyping accuracy. However, the existing rigorous formulations for colinear chaining and alignment problems do not consider the haplotype paths in a pangenome graph. This often leads to spurious read alignments to those paths that are unlikely recombinations of the known haplotypes. In this paper, we develop novel formulations and algorithms for sequence-to-graph alignment and chaining problems. Inspired by the genotype imputation models, we assume that a query sequence is an imperfect mosaic of reference haplotypes. Accordingly, we introduce a recombination penalty in the scoring functions for each haplotype switch. First, we solve haplotype-aware sequence-to-graph alignment in [Formula: see text] time, where Q is the query sequence, E is the set of edges, and H is the set of haplotypes represented in the graph. To complement our solution, we prove that an algorithm significantly faster than [Formula: see text] is impossible under the strong exponential time hypothesis (SETH). Second, we propose a haplotype-aware chaining algorithm that runs in [Formula: see text] time after graph preprocessing, where N is the count of input anchors. We then establish that a chaining algorithm significantly faster than [Formula: see text] is impossible under SETH. As a proof-of-concept, we implemented our chaining algorithm in the Minichain aligner. By aligning sequences sampled from the human major histocompatibility complex (MHC) to a pangenome graph of 60 MHC haplotypes, we demonstrate that our algorithm achieves better consistency with ground-truth recombinations compared with a haplotype-agnostic algorithm.
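The idea of scoring a query as an imperfect mosaic of reference haplotypes, with a penalty for each haplotype switch, can be illustrated with a deliberately simplified sketch. It assumes the haplotypes are equal-length strings already aligned column by column with the query, and it allows only mismatches, so there is no graph, no indels, and no chaining. This is a textbook-style dynamic program written for illustration, not the algorithm from the paper or the Minichain implementation. The function name and the cost values are hypothetical.

```python
from typing import List, Sequence, Tuple


def mosaic_cost(query: str, haplotypes: Sequence[str],
                mismatch: int = 1, recombination: int = 3) -> Tuple[int, List[int]]:
    """Cheapest way to copy `query` from the haplotypes, position by position.

    Returns (total cost, haplotype index used at each position).
    """
    if not query or not haplotypes:
        raise ValueError("query and haplotypes must be non-empty")
    if any(len(h) != len(query) for h in haplotypes):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")

    count = len(haplotypes)
    # cost[h] = best cost of explaining the prefix so far while ending on haplotype h
    cost = [mismatch * (haplotypes[h][0] != query[0]) for h in range(count)]
    back: List[List[int]] = []

    for i in range(1, len(query)):
        cheapest = min(range(count), key=lambda h: cost[h])
        new_cost, pointers = [], []
        for h in range(count):
            stay = cost[h]
            switch = cost[cheapest] + recombination  # pay once per haplotype switch
            previous, base = (h, stay) if stay <= switch else (cheapest, switch)
            new_cost.append(base + mismatch * (haplotypes[h][i] != query[i]))
            pointers.append(previous)
        cost = new_cost
        back.append(pointers)

    end = min(range(count), key=lambda h: cost[h])
    path = [end]
    for pointers in reversed(back):  # walk the traceback from the last position
        path.append(pointers[path[-1]])
    path.reverse()
    return cost[end], path


panel = ["AAAAAAAA", "CCCCCCCC"]
total, path = mosaic_cost("AAAACCCC", panel)
# total == 3 and path == [0, 0, 0, 0, 1, 1, 1, 1]: one switch beats four mismatches
```

Raising `recombination` above the mismatch total (for example to 10) makes the switch unattractive, and the cheapest explanation becomes a single haplotype with four mismatches. That trade-off is the effect a recombination penalty is meant to control.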
Affiliation(s)
- Ghanshyam Chandra
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore Karnataka 560012, India
| | - Daniel Gibney
- Department of Computer Science, The University of Texas at Dallas, Richardson, Texas 75080, USA
| | - Chirag Jain
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore Karnataka 560012, India
| |
|
67
|
Henglin M, Ghareghani M, Harvey WT, Porubsky D, Koren S, Eichler EE, Ebert P, Marschall T. Graphasing: phasing diploid genome assembly graphs with single-cell strand sequencing. Genome Biol 2024; 25:265. [PMID: 39390579 PMCID: PMC11466045 DOI: 10.1186/s13059-024-03409-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 09/30/2024] [Indexed: 10/12/2024] Open
Abstract
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.
Affiliation(s)
- Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Maryam Ghareghani
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
| |
|
68
|
Boakye Serebour T, Cribbs AP, Baldwin MJ, Masimirembwa C, Chikwambi Z, Kerasidou A, Snelling SJB. Overcoming barriers to single-cell RNA sequencing adoption in low- and middle-income countries. Eur J Hum Genet 2024; 32:1206-1213. [PMID: 38565638 PMCID: PMC11499908 DOI: 10.1038/s41431-024-01564-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 01/29/2024] [Accepted: 02/06/2024] [Indexed: 04/04/2024] Open
Abstract
The advent of single-cell resolution sequencing and spatial transcriptomics has enabled the delivery of cellular and molecular atlases of tissues and organs, providing new insights into tissue health and disease. However, if the full potential of these technologies is to be equitably realised, ancestral inclusivity is paramount. Such a goal requires greater inclusion of both researchers and donors in low- and middle-income countries (LMICs). In this perspective, we describe the current landscape of ancestral inclusivity in genomic and single-cell transcriptomic studies. We discuss the collaborative efforts needed to scale the barriers to establishing, expanding, and adopting single-cell sequencing research in LMICs and to enable globally impactful outcomes of these technologies.
Affiliation(s)
- Tracy Boakye Serebour
- The Botnar Institute for Musculoskeletal Science, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Adam P Cribbs
- The Botnar Institute for Musculoskeletal Science, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Mathew J Baldwin
- The Botnar Institute for Musculoskeletal Science, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Collen Masimirembwa
- The African Institute of Biomedical Science and Technology, Harare, Zimbabwe
| | - Zedias Chikwambi
- The African Institute of Biomedical Science and Technology, Harare, Zimbabwe
| | - Angeliki Kerasidou
- The Ethox Centre and the Wellcome Centre for Ethics and Humanities, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Sarah J B Snelling
- The Botnar Institute for Musculoskeletal Science, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.
| |
|
69
|
Lau SHM, Jiin Ying L, Goh CYJ, Choo J, Chow C, Ling S, Ng YH, Yi Hua T, Teo JX, Chua KP, Chin M, Lim WK, Jamuar SS, Lai AHM, Goh JLK. Dilated aorta in CNOT3-related neurodevelopmental disorder: 'expanding' the phenotype. Clin Dysmorphol 2024; 33:176-182. [PMID: 39140378 DOI: 10.1097/mcd.0000000000000495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
INTRODUCTION Neurodevelopmental disorders (NDDs) comprise conditions that emerge during the child's development and contribute significantly to global health and economic burdens. De novo variants in CNOT3 have been linked to NDDs and understanding the genotype-phenotype relationship between CNOT3 and NDDs will aid in improving diagnosis and management. METHODS In this study, we report a case of a patient with CNOT3 -related NDD who presented with progressive aortic dilatation, a feature not reported previously. RESULTS Our patient presented with intellectual disorder, dysmorphic facial features, and cardiac anomalies, notably progressive aortic dilatation - a novel finding in CNOT3 -related NDD. Genetic testing identified a de novo 6.3 kbp intragenic deletion in CNOT3 , providing a possible genetic basis for her condition. CONCLUSION This study presents the first case of CNOT3 -related NDD in Southeast Asia, expanding the phenotype to include progressive aortic dilatation and suggesting merit in cardiac surveillance of patients with CNOT3 -related NDD. It also emphasizes the importance of genetic testing in diagnosing complex NDD cases as well as reanalysis of 'negative' cases using advanced sequencing technologies to uncover potential hidden genetic etiologies in undiagnosed NDDs.
Affiliation(s)
| | - Lim Jiin Ying
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital
- SingHealth Duke-NUS Genomic Medicine Centre
| | - Chew Yin Jasmine Goh
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital
- SingHealth Duke-NUS Genomic Medicine Centre
- Division of Nursing - Nursing Clinical Service, KK Women's and Children's Hospital
| | - Jonathan Choo
- Cardiology Service, Department of Paediatric Subspecialties
| | - Cristelle Chow
- Paediatric Academic Clinical Programme, Duke-NUS Medical School
- Complex Care Service, Department of Paediatrics
| | - Simon Ling
- Paediatric Academic Clinical Programme, Duke-NUS Medical School
- Neurology Service, Department of Paediatrics
| | - Yong Hong Ng
- Paediatric Academic Clinical Programme, Duke-NUS Medical School
- Nephrology Service, Department of Paediatrics
| | - Tan Yi Hua
- Paediatric Academic Clinical Programme, Duke-NUS Medical School
- Respiratory Medicine Service, Department of Paediatrics, KK Women's and Children's Hospital
| | - Jing Xian Teo
- SingHealth Duke-NUS Genomic Medicine Centre
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
| | - Khi Pin Chua
- Pacific BioSciences, Menlo Park, California, USA
| | - Minning Chin
- Pacific BioSciences, Menlo Park, California, USA
| | - Weng Khong Lim
- SingHealth Duke-NUS Genomic Medicine Centre
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
- Singapore Cancer and Stem Cell Biology Program, Duke-NUS Medical School
- Singapore Laboratory of Genome Variation Analytics, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
| | - Saumya Shekhar Jamuar
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital
- SingHealth Duke-NUS Genomic Medicine Centre
- Paediatric Academic Clinical Programme, Duke-NUS Medical School
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
| | - Angeline Hwei Meeng Lai
- Lee Kong Chian School of Medicine, Nanyang Technological University
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital
- SingHealth Duke-NUS Genomic Medicine Centre
- Paediatric Academic Clinical Programme, Duke-NUS Medical School
| | - Jeannette Lay Kuan Goh
- Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital
- SingHealth Duke-NUS Genomic Medicine Centre
70
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024; 42:1606-1614. [PMID: 38168995 PMCID: PMC11921810 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 43.0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
Affiliation(s)
| | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Harriet Dashnow
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | | | - Tom Mokveld
- Pacific Biosciences of California, Menlo Park, CA, USA
| | | | | | | | - Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Warren A Cheung
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Chengpeng Bi
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Emily Farrow
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Aaron Wenger
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Khi Pin Chua
- Pacific Biosciences of California, Menlo Park, CA, USA
| | - Verónica Martínez-Cerdeño
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
| | - Trevor D Bartley
- Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
- Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
| | - Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
| | - David L Nelson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
| | - Aaron R Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
71
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024; 42:1571-1580. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 102] [Impact Index Per Article: 102.0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat-aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SV showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Karl Hong
- Bionano Genomics, San Diego, CA, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Department of Computer Science, Rice University, Houston, TX, USA
72
Soto DC, Uribe-Salazar JM, Kaya G, Valdarrago R, Sekar A, Haghani NK, Hino K, La GN, Mariano NAF, Ingamells C, Baraban AE, Turner TN, Green ED, Simó S, Quon G, Andrés AM, Dennis MY. Gene expansions contributing to human brain evolution. bioRxiv 2024:2024.09.26.615256. [PMID: 39386494 PMCID: PMC11463660 DOI: 10.1101/2024.09.26.615256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Genomic drivers of human-specific neurological traits remain largely undiscovered. Duplicated genes expanded uniquely in the human lineage likely contributed to brain evolution, including the increased complexity of synaptic connections between neurons and the dramatic expansion of the neocortex. Discovering duplicate genes is challenging because the similarity of paralogs makes them prone to sequence-assembly errors. To mitigate this issue, we analyzed a complete telomere-to-telomere human genome sequence (T2T-CHM13) and identified 213 duplicated gene families likely containing human-specific paralogs (>98% identity). Positing that genes important in universal human brain features should exist with at least one copy in all modern humans and exhibit expression in the brain, we narrowed in on 362 paralogs with at least one copy across thousands of ancestrally diverse genomes and present in human brain transcriptomes. Of these, 38 paralogs co-express in gene modules enriched for autism-associated genes and potentially contribute to human language and cognition. We narrowed in on 13 duplicate gene families with human-specific paralogs that are fixed among modern humans and show convincing brain expression patterns. Long-read DNA sequencing revealed hidden variation across 200 modern humans of diverse ancestries, uncovering signatures of selection not previously identified, including possible balancing selection of CD8B. To understand the roles of duplicated genes in brain development, we generated zebrafish CRISPR "knockout" models of nine orthologs and transiently introduced mRNA-encoding paralogs, effectively "humanizing" the larvae. Morphometric, behavioral, and single-cell RNA-seq screening highlighted, for the first time, a possible role for GPR89B in dosage-mediated brain expansion and FRMPD2B function in altered synaptic signaling, both hallmark features of the human brain. Our holistic approach provides important insights into human brain evolution as well as a resource to the community for studying additional gene expansion drivers of human brain evolution.
Affiliation(s)
- Daniela C. Soto
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - José M. Uribe-Salazar
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Gulhan Kaya
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Ricardo Valdarrago
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA
| | - Aarthi Sekar
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Nicholas K. Haghani
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Keiko Hino
- Department of Cell Biology & Human Anatomy, University of California, Davis, CA 95616, USA
| | - Gabriana N. La
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Natasha Ann F. Mariano
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
- Postbaccalaureate Research Education Program, University of California, Davis, CA 95616, USA
| | - Cole Ingamells
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Aidan E. Baraban
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Tychele N. Turner
- Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA
| | - Eric D. Green
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Sergi Simó
- Department of Cell Biology & Human Anatomy, University of California, Davis, CA 95616, USA
| | - Gerald Quon
- Genome Center, University of California, Davis, CA 95616, USA
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA
| | - Aida M. Andrés
- UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, WC1E 6BT, UK
| | - Megan Y. Dennis
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
73
Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D, Paisie CA, Harvey WT, Zhao X, Martino GV, Henglin M, Munson KM, Rabbani K, Chin CS, Gu B, Ashraf H, Austine-Orimoloye O, Balachandran P, Bonder MJ, Cheng H, Chong Z, Crabtree J, Gerstein M, Guethlein LA, Hasenfeld P, Hickey G, Hoekzema K, Hunt SE, Jensen M, Jiang Y, Koren S, Kwon Y, Li C, Li H, Li J, Norman PJ, Oshima KK, Paten B, Phillippy AM, Pollock NR, Rausch T, Rautiainen M, Scholz S, Song Y, Söylev A, Sulovari A, Surapaneni L, Tsapalou V, Zhou W, Zhou Y, Zhu Q, Zody MC, Mills RE, Devine SE, Shi X, Talkowski ME, Chaisson MJP, Dilthey AT, Konkel MK, Korbel JO, Lee C, Beck CR, Eichler EE, Marschall T. Complex genetic variation in nearly complete human genomes. bioRxiv 2024:2024.09.24.614721. [PMID: 39372794 PMCID: PMC11451754 DOI: 10.1101/2024.09.24.614721] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 10/08/2024]
Abstract
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
Affiliation(s)
- Glennis A Logsdon
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Mark Loftus
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Timofey Prodanov
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Carolyn A Paisie
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Gianni V Martino
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Medical University of South Carolina, College of Graduate Studies, Charleston, SC, USA
| | - Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Keon Rabbani
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | - Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Hufsah Ashraf
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | | | - Marc Jan Bonder
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Oncode Institute, Utrecht, The Netherlands
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
| | - Haoyu Cheng
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
| | - Zechen Chong
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
| | - Jonathan Crabtree
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Lisbeth A Guethlein
- Department of Structural Biology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Matthew Jensen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Yunzhe Jiang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Youngjun Kwon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Chong Li
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
| | - Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Jiaqi Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Paul J Norman
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA
| | - Keisuke K Oshima
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nicholas R Pollock
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Mikko Rautiainen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Stephan Scholz
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Yuwei Song
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
| | - Arda Söylev
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Vasiliki Tsapalou
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Weichen Zhou
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
| | - Ying Zhou
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Stanford Health Care, Palo Alto, CA, USA
| | | | - Ryan E Mills
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Xinghua Shi
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
| | - Mike E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Alexander T Dilthey
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Miriam K Konkel
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
74
Mazein I, Rougny A, Mazein A, Henkel R, Gütebier L, Michaelis L, Ostaszewski M, Schneider R, Satagopam V, Jensen LJ, Waltemath D, Wodke JAH, Balaur I. Graph databases in systems biology: a systematic review. Brief Bioinform 2024; 25:bbae561. [PMID: 39565895 PMCID: PMC11578065 DOI: 10.1093/bib/bbae561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 09/28/2024] [Accepted: 10/21/2024] [Indexed: 11/22/2024] Open
Abstract
Graph databases are becoming increasingly popular across scientific disciplines, being highly suitable for storing and connecting complex heterogeneous data. In systems biology, they are used as a backend solution for biological data repositories, ontologies, networks, pathways, and knowledge graph databases. In this review, we analyse all publications using or mentioning graph databases retrieved from PubMed and PubMed Central full-text search, focusing on the top 16 available graph databases. Publications are categorized according to their domain and application, focusing on pathway and network biology and relevant ontologies and tools. We detail different approaches and highlight the advantages of outstanding resources, such as UniProtKB, Disease Ontology, and Reactome, which provide graph-based solutions. We discuss ongoing efforts of the systems biology community to standardize and harmonize knowledge graph creation and the maintenance of integrated resources. Outlining prospects, including the use of graph databases as a way of communication between biological data repositories, we conclude that efficient design, querying, and maintenance of graph databases will be key for knowledge generation in systems biology and other research fields with heterogeneous data.
Affiliation(s)
- Ilya Mazein
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
| | - Adrien Rougny
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
| | - Alexander Mazein
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
| | - Ron Henkel
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
| | - Lea Gütebier
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
| | - Lea Michaelis
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
| | - Marek Ostaszewski
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
| | - Reinhard Schneider
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
| | - Venkata Satagopam
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
| | - Lars Juhl Jensen
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 15, 1870 Frederiksberg C, Denmark
| | - Dagmar Waltemath
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
| | - Judith A H Wodke
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
| | - Irina Balaur
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
| |
|
75
|
Ma Z, Zuo T, Frey N, Rangrez AY. A systematic framework for understanding the microbiome in human health and disease: from basic principles to clinical translation. Signal Transduct Target Ther 2024; 9:237. [PMID: 39307902 PMCID: PMC11418828 DOI: 10.1038/s41392-024-01946-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 03/05/2024] [Revised: 07/03/2024] [Accepted: 08/01/2024] [Indexed: 09/26/2024] Open
Abstract
The human microbiome is a complex and dynamic system that plays important roles in human health and disease. However, there remain limitations and theoretical gaps in our current understanding of the intricate relationship between microbes and humans. In this narrative review, we integrate the knowledge and insights from various fields, including anatomy, physiology, immunology, histology, genetics, and evolution, to propose a systematic framework. It introduces key concepts such as the 'innate and adaptive genomes', which enhance genetic and evolutionary comprehension of the human genome. The 'germ-free syndrome' challenges the traditional 'microbes as pathogens' view, advocating for the necessity of microbes for health. The 'slave tissue' concept underscores the symbiotic intricacies between human tissues and their microbial counterparts, highlighting the dynamic health implications of microbial interactions. 'Acquired microbial immunity' positions the microbiome as an adjunct to human immune systems, providing a rationale for probiotic therapies and prudent antibiotic use. The 'homeostatic reprogramming hypothesis' integrates the microbiome into the internal environment theory, potentially explaining the change in homeostatic indicators post-industrialization. The 'cell-microbe co-ecology model' elucidates the symbiotic regulation affecting cellular balance, while the 'meta-host model' broadens the host definition to include symbiotic microbes. The 'health-illness conversion model' encapsulates the innate and adaptive genomes' interplay and dysbiosis patterns. The aim here is to provide a more focused and coherent understanding of the microbiome and highlight future research avenues that could lead to a more effective and efficient healthcare system.
Affiliation(s)
- Ziqi Ma
- Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
| | - Tao Zuo
- Key Laboratory of Human Microbiome and Chronic Diseases (Sun Yat-sen University), Ministry of Education, Guangzhou, China
- Guangdong Institute of Gastroenterology, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Norbert Frey
- Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
| | - Ashraf Yusuf Rangrez
- Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
- DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
| |
|
76
|
Matthews CA, Watson-Haigh NS, Burton RA, Sheppard AE. A gentle introduction to pangenomics. Brief Bioinform 2024; 25:bbae588. [PMID: 39552065 PMCID: PMC11570541 DOI: 10.1093/bib/bbae588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 09/12/2024] [Accepted: 11/01/2024] [Indexed: 11/19/2024] Open
Abstract
Pangenomes have emerged in response to limitations associated with traditional linear reference genomes. In contrast to a traditional reference that is (usually) assembled from a single individual, pangenomes aim to represent all of the genomic variation found in a group of organisms. The term 'pangenome' is currently used to describe multiple different types of genomic information, and limited language is available to differentiate between them. This is frustrating for researchers working in the field and confusing for researchers new to the field. Here, we provide an introduction to pangenomics relevant to both prokaryotic and eukaryotic organisms and propose a formalization of the language used to describe pangenomes (see the Glossary) to improve the specificity of discussion in the field.
Affiliation(s)
- Chelsea A Matthews
- School of Agriculture, Food and Wine, Waite Campus, University of Adelaide, Urrbrae, South Australia 5064, Australia
| | - Nathan S Watson-Haigh
- Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Victoria 3000, Australia
- South Australian Genomics Centre, SAHMRI, North Terrace, Adelaide, South Australia 5000, Australia
- Alkahest Inc., San Carlos, CA 94070, United States
| | - Rachel A Burton
- School of Agriculture, Food and Wine, Waite Campus, University of Adelaide, Urrbrae, South Australia 5064, Australia
| | - Anna E Sheppard
- School of Biological Sciences, University of Adelaide, Adelaide, South Australia 5005, Australia
| |
|
77
|
Hung TK, Liu WC, Lai SK, Chuang HW, Lee YC, Lin HY, Hsu CL, Chen CY, Yang YC, Hsu JS, Chen PL. Genetic complexity of killer-cell immunoglobulin-like receptor genes in human pangenome assemblies. Genome Res 2024; 34:1211-1223. [PMID: 39251346 PMCID: PMC11444179 DOI: 10.1101/gr.278358.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Accepted: 08/14/2024] [Indexed: 09/11/2024]
Abstract
The killer-cell immunoglobulin-like receptor (KIR) gene complex, a highly polymorphic region of the human genome that encodes proteins involved in immune responses, poses strong challenges in genotyping owing to its remarkable genetic diversity and structural intricacy. Accurate analysis of KIR alleles, including their structural variations, is crucial for understanding their roles in various immune responses. Leveraging the high-quality genome assemblies from the Human Pangenome Reference Consortium (HPRC), we present a novel bioinformatic tool, the structural KIR annoTator (SKIRT), to investigate gene diversity and facilitate precise KIR allele analysis. In 47 HPRC-phased assemblies, SKIRT identifies a recurrent novel KIR2DS4/3DL1 fusion gene in the paternal haplotype of HG02630 and maternal haplotype of NA19240. Additionally, SKIRT accurately identifies eight structural variants and 15 novel nonsynonymous alleles, all of which are independently validated using short-read data or quantitative polymerase chain reaction. Our study has discovered a total of 570 novel alleles, among which eight haplotypes harbor at least one KIR gene duplication, six haplotypes have lost at least one framework gene, and 75 out of 94 haplotypes (79.8%) carry at least five novel alleles, thus confirming KIR genetic diversity. These findings are pivotal in providing insights into KIR gene diversity and serve as a solid foundation for understanding the functional consequences of KIR structural variations. High-resolution genome assemblies offer unprecedented opportunities to explore polymorphic regions that are challenging to investigate using short-read sequencing methods. The SKIRT pipeline emerges as a highly efficient tool, enabling the comprehensive detection of the complete spectrum of KIR alleles within human genome assemblies.
Affiliation(s)
- Tsung-Kai Hung
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei 100233, Taiwan
| | - Wan-Chi Liu
- Department of Clinical Laboratory Sciences and Medical Biotechnology, College of Medicine, National Taiwan University, Taipei 100229, Taiwan
| | - Sheng-Kai Lai
- Department of Medical Genetics, National Taiwan University Hospital, Taipei 100229, Taiwan
- Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei 10617, Taiwan
| | - Hui-Wen Chuang
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei 100233, Taiwan
| | - Yi-Che Lee
- Department of Clinical Laboratory Sciences and Medical Biotechnology, College of Medicine, National Taiwan University, Taipei 100229, Taiwan
| | - Hong-Ye Lin
- Department of Biomechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan
| | - Chia-Lang Hsu
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei 100233, Taiwan
- Department of Medical Research, National Taiwan University Hospital, Taipei 100229, Taiwan
| | - Chien-Yu Chen
- Department of Biomechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan
| | - Ya-Chien Yang
- Department of Clinical Laboratory Sciences and Medical Biotechnology, College of Medicine, National Taiwan University, Taipei 100229, Taiwan
- Department of Laboratory Medicine, National Taiwan University Hospital, Taipei 100229, Taiwan
| | - Jacob Shujui Hsu
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei 100233, Taiwan
| | - Pei-Lung Chen
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei 100233, Taiwan
- Department of Medical Genetics, National Taiwan University Hospital, Taipei 100229, Taiwan
- Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei 10617, Taiwan
- Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei 100229, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, National Taiwan University Hospital, Taipei 100229, Taiwan
| |
|
78
|
Höps W, Rausch T, Jendrusch M, Korbel JO, Sedlazeck FJ. Impact and characterization of serial structural variations across humans and great apes. Nat Commun 2024; 15:8007. [PMID: 39266513 PMCID: PMC11393467 DOI: 10.1038/s41467-024-52027-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 08/23/2024] [Indexed: 09/14/2024] Open
Abstract
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.
Affiliation(s)
- Wolfram Höps
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
| | - Michael Jendrusch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| |
|
79
|
Record CJ, Pipis M, Skorupinska M, Blake J, Poh R, Polke JM, Eggleton K, Nanji T, Zuchner S, Cortese A, Houlden H, Rossor AM, Laura M, Reilly MM. Whole genome sequencing increases the diagnostic rate in Charcot-Marie-Tooth disease. Brain 2024; 147:3144-3156. [PMID: 38481354 PMCID: PMC11370804 DOI: 10.1093/brain/awae064] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/09/2023] [Revised: 01/17/2024] [Accepted: 02/07/2024] [Indexed: 09/04/2024] Open
Abstract
Charcot-Marie-Tooth disease (CMT) is one of the most common and genetically heterogeneous inherited neurological diseases, with more than 130 disease-causing genes. Whole genome sequencing (WGS) has improved diagnosis across genetic diseases, but the diagnostic impact in CMT is yet to be fully reported. We present the diagnostic results from a single specialist inherited neuropathy centre, including the impact of WGS diagnostic testing. Patients were assessed at our specialist inherited neuropathy centre from 2009 to 2023. Genetic testing was performed using single gene testing, next-generation sequencing targeted panels, research whole exome sequencing and WGS and, latterly, WGS through the UK National Health Service. Variants were assessed using the American College of Medical Genetics and Genomics and Association for Clinical Genomic Science criteria. Excluding patients with hereditary ATTR amyloidosis, 1515 patients with a clinical diagnosis of CMT and related disorders were recruited. In summary, 621 patients had CMT1 (41.0%), 294 CMT2 (19.4%), 205 intermediate CMT (CMTi, 13.5%), 139 hereditary motor neuropathy (HMN, 9.2%), 93 hereditary sensory neuropathy (HSN, 6.1%), 38 sensory ataxic neuropathy (2.5%), 72 hereditary neuropathy with liability to pressure palsies (HNPP, 4.8%) and 53 'complex' neuropathy (3.5%). Overall, a genetic diagnosis was reached in 76.9% (1165/1515). A diagnosis was most likely in CMT1 (96.8%, 601/621), followed by CMTi (81.0%, 166/205) and then HSN (69.9%, 65/93). Diagnostic rates remained less than 50% in CMT2, HMN and complex neuropathies. The most common genetic diagnosis was PMP22 duplication (CMT1A; 505/1165, 43.3%), then GJB1 (CMTX1; 151/1165, 13.0%), PMP22 deletion (HNPP; 72/1165, 6.2%) and MFN2 (CMT2A; 46/1165, 3.9%). 
We recruited 233 cases to the UK 100 000 Genomes Project (100KGP), of which 74 (31.8%) achieved a diagnosis; 28 had been otherwise diagnosed since recruitment, leaving a true diagnostic rate of WGS through the 100KGP of 19.7% (46/233). However, almost half of the solved cases (35/74) received a negative report from the study, and the diagnosis was made through our research access to the WGS data. The overall diagnostic uplift of WGS for the entire cohort was 3.5%. Our diagnostic rate is the highest reported from a single centre and has benefitted from the use of WGS, particularly access to the raw data. However, almost one-quarter of all cases remain unsolved, and a new reference genome and novel technologies will be important to narrow the 'diagnostic gap'.
Affiliation(s)
- Christopher J Record
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
| | - Menelaos Pipis
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
| | - Mariola Skorupinska
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
| | - Julian Blake
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Department of Clinical Neurophysiology, Norfolk and Norwich University Hospital, Norwich NR4 7UY, UK
| | - Roy Poh
- Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London WC1N 3BG, UK
| | - James M Polke
- Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London WC1N 3BG, UK
| | - Kelly Eggleton
- Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London WC1N 3BG, UK
| | - Tina Nanji
- Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London WC1N 3BG, UK
| | - Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL 33136, USA
| | - Andrea Cortese
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
| | - Henry Houlden
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
| | - Alexander M Rossor
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
| | - Matilde Laura
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
| | - Mary M Reilly
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
| |
|
80
|
IGVF Consortium, Writing group (ordered by contribution), Engreitz JM, Lawson HA, Singh H, Starita LM, Hon GC, Carter H, Sahni N, Reddy TE, Lin X, Li Y, Munshi NV, Chahrour MH, Boyle AP, Hitz BC, Mortazavi A, Craven M, Mohlke KL, Pinello L, Wang T, Steering Committee Co-Chairs (alphabetical by last name), Kundaje A, Yue F, Code of Conduct Committee (alphabetical by last name), Cody S, Farrell NP, Love MI, Muffley LA, Pazin MJ, Reese F, Van Buren E, Working Group and Focus
Group Co-Chairs (alphabetical by last name), Catalog, Dey KK, Characterization, Kircher M, Computational Analysis, Modeling, and Prediction, Ma J, Radivojac P, Project Design, Balliu B, Mapping, Williams BA, Networks, Huangfu D, Standards and Pipelines, Cardiometabolic, Park CY, Quertermous T, Cellular Programs and Networks, Das J, Coding Variants, Calderwood MA, Fowler DM, Vidal M, CRISPR, Ferreira L, Defining and Systematizing Function, Mooney SD, Pejaver V, Enumerating Variants, Zhao J, Evolution, Gazal S, Koch E, Reilly SK, Sunyaev S, Imaging, Carpenter AE, Immune, Buenrostro JD, Leslie CS, Savage RE, Impact on Diverse Populations, Giric S, iPSC, Luo C, Plath K, MPRA, Barrera A, Schubach M, Noncoding Variants, Gschwind AR, Moore JE, Neuro, Ahituv N, Phenotypic Impact and Function, Yi SS, QTL/Statgen, Hallgrimsdottir I, Gaulton KJ, Sakaue S, Single Cell, Booeshaghi S, Mattei E, Nair S, Pachter L, Wang AT, Characterization Awards (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), UM1HG011966, Shendure J, Agarwal V, Blair A, Chalkiadakis T, Chardon FM, Dash PM, Deng C, Hamazaki N, Keukeleire P, Kubo C, Lalanne JB, Maass T, Martin B, McDiarmid TA, Nobuhara M, Page NF, Regalado S, Sims J, Ushiki A, UM1HG011969, Best SM, Boyle G, Camp N, Casadei S, Da EY, Dawood M, Dawson SC, Fayer S, Hamm A, James RG, Jarvik GP, McEwen AE, Moore N, Pendyala S, Popp NA, Post M, Rubin AF, Smith NT, Stone J, Tejura M, Wang ZR, Wheelock MK, Woo I, Zapp BD, UM1HG011972, Amgalan D, Aradhana A, Arana SM, Bassik MC, Bauman JR, Bhattacharya A, Cai XS, Chen Z, Conley S, Deshpande S, Doughty BR, Du PP, Galante JA, Gifford C, Greenleaf WJ, Guo K, Gupta R, Isobe S, Jagoda E, Jain N, Jones H, Kang HY, Kim SH, Kim Y, Klemm S, Kundu R, Kundu S, Lago-Docampo M, Lee-Yow YC, Levin-Konigsberg R, Li DY, Lindenhofer D, Ma XR, Marinov GK, Martyn GE, McCreery CV, Metzl-Raz E, Monteiro JP, Montgomery MT, Mualim KS, Munger C, Munson G, Nguyen TC, Nguyen T, Palmisano 
BT, Pampari A, Rabinovitch M, Ramste M, Ray J, Roy KR, Rubio OM, Schaepe JM, Schnitzler G, Schreiber J, Sharma D, Sheth MU, Shi H, Singh V, Sinha R, Steinmetz LM, Tan J, Tan A, Tycko J, Valbuena RC, Amiri VVP, van Kooten MJFM, Vaughan-Jackson A, Venida A, Weldy CS, Worssam MD, Xia F, Yao D, Zeng T, Zhao Q, Zhou R, UM1HG011989, Chen ZS, Cimini BA, Coppin G, Coté AG, Haghighi M, Hao T, Hill DE, Lacoste J, Laval F, Reno C, Roth FP, Singh S, Spirohn-Fitzgerald K, Taipale M, Teelucksingh T, Tixhon M, Yadav A, Yang Z, UM1HG011996, Kraus WL, Armendariz DA, Dederich AE, Gogate A, El Hayek L, Goetsch SC, Kaur K, Kim HB, McCoy MK, Nzima MZ, Pinzón-Arteaga CA, Posner BA, Schmitz DA, Sivakumar S, Sundarrajan A, Wang L, Wang Y, Wu J, Xu L, Xu J, Yu L, Zhang Y, Zhao H, Zhou Q, UM1HG012003, Won H, Bell JL, Broadaway KA, Degner KN, Etheridge AS, Koller BH, Mah W, Mu W, Ritola KD, Rosen JD, Schoenrock SA, Sharp RA, UM1HG012010, Bauer D, Lettre G, Sherwood R, Becerra B, Blaine LJ, Che E, Francoeur MJ, Gibbs EN, Kim N, King EM, Kleinstiver BP, Lecluze E, Li Z, Patel ZM, Phan QV, Ryu J, Starr ML, Wu T, UM1HG012053, Gersbach CA, Crawford GE, Allen AS, Majoros WH, Iglesias N, Rai R, Venukuttan R, Li B, Anglen T, Bounds LR, Hamilton MC, Liu S, McCutcheon SR, McRoberts Amador CD, Reisman SJ, ter Weele MA, Bodle JC, Streff HL, Siklenka K, Strouse K, Mapping Awards (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), UM1HG011986, Bernstein BE, Babu J, Corona GB, Dong K, Duarte FM, Durand NC, Epstein CB, Fan K, Gaskell E, Hall AW, Ham AM, Knudson MK, Shoresh N, Wekhande S, White CM, Xi W, UM1HG012076, Satpathy AT, Corces MR, Chang SH, Chin IM, Gardner JM, Gardell ZA, Gutierrez JC, Johnson AW, Kampman L, Kasowski M, Lareau CA, Liu V, Ludwig LS, McGinnis CS, Menon S, Qualls A, Sandor K, Turner AW, Ye CJ, Yin Y, Zhang W, UM1HG012077, Wold BJ, Carilli M, Cheong D, Filibam G, Green K, Kawauchi S, Kim C, Liang H, Loving R, Luebbert L, MacGregor G, Merchan AG, 
Rebboah E, Rezaie N, Sakr J, Sullivan DK, Swarna N, Trout D, Upchurch S, Weber R, Predictive Modeling Awards (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), U01HG011952, Castro CP, Chou E, Feng F, Guerra A, Huang Y, Jiang L, Liu J, Mills RE, Qian W, Qin T, Sartor MA, Sherpa RN, Wang J, Wang Y, Welch JD, Zhang Z, Zhao N, U01HG011967, Mukherjee S, Page CD, Clarke S, Doty RW, Duan Y, Gordan R, Ko KY, Li S, Li B, Thomson A, U01HG012009, Raychaudhuri S, Price A, Ali TA, Dey KK, Durvasula A, Kellis M, U01HG012022, Iakoucheva LM, Kakati T, Chen Y, Benazouz M, Jain S, Zeiberg D, De Paolis Kaluza MC, Velyunskiy M, U01HG012039, Gasch A, Huang K, Jin Y, Lu Q, Miao J, Ohtake M, Scopel E, Steiner RD, Sverchkov Y, U01HG012064, Weng Z, Garber M, Fu Y, Haas N, Li X, Phalke N, Shan SC, Shedd N, Yu T, Zhang Y, Zhou H, U01HG012069, Battle A, Jerby L, Kotler E, Kundu S, Marderstein AR, Montgomery SB, Nigam A, Padhi EM, Patel A, Pritchard J, Raine I, Ramalingam V, Rodrigues KB, Schreiber JM, Singhal A, Sinha R, Wang AT, Network Projects (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), U01HG012041, Abundis M, Bisht D, Chakraborty T, Fan J, Hall DR, Rarani ZH, Jain AK, Kaundal B, Keshari S, McGrail D, Pease NA, Yi VF, U01HG012047, Wu H, Kannan S, Song H, Cai J, Gao Z, Kurzion R, Leu JI, Li F, Liang D, Ming GL, Musunuru K, Qiu Q, Shi J, Su Y, Tishkoff S, Xie N, Yang Q, Yang W, Zhang H, Zhang Z, U01HG012051, Beer MA, Hadjantonakis AK, Adeniyi S, Cho H, Cutler R, Glenn RA, Godovich D, Hu N, Jovanic S, Luo R, Oh JW, Razavi-Mohseni M, Shigaki D, Sidoli S, Vierbuchen T, Wang X, Williams B, Yan J, Yang D, Yang Y, U01HG012059, Sander M, Gaulton KJ, Ren B, Bartosik W, Indralingam HS, Klie A, Mummey H, Okino ML, Wang G, Zemke NR, Zhang K, Zhu H, U01HG012079, Zaitlen N, Ernst J, Langerman J, Li T, Sun Y, U01HG012103, Rudensky AY, Periyakoil PK, Gao VR, Smith MH, Thomas NM, Donlin LT, Lakhanpal A, Southard KM, Ardy 
RC, Data and Administrative Coordinating Center Awards (contact PI, MPIs (alphabetical by last name), other members (alphabetical by last name)), U24HG012012, Cherry JM, Gerstein MB, Andreeva K, Assis PR, Borsari B, Douglass E, Dong S, Gabdank I, Graham K, Jolanki O, Jou J, Kagda MS, Lee JW, Li M, Lin K, Miyasato SR, Rozowsky J, Small C, Spragins E, Tanaka FY, Whaling IM, Youngworth IA, Sloan CA, U24HG012070, Belter E, Chen X, Chisholm RL, Dickson P, Fan C, Fulton L, Li D, Lindsay T, Luan Y, Luo Y, Lyu H, Ma X, Macias-Velasco J, Miga KH, Quaid K, Stitziel N, Stranger BE, Tomlinson C, Wang J, Zhang W, Zhang B, Zhao G, Zhuo X, IGVF Affiliate Member Projects (contact PIs, other members (alphabetical by last name)), Brennand lab, Brennand K, Ciccia lab, Ciccia A, Hayward SB, Huang JW, Leuzzi G, Taglialatela A, Thakar T, Vaitsiankova A, Dey lab, Dey KK, Ali TA, Gazal lab, Kim A, Grimes lab, Grimes HL, Salomonis N, Gupta lab, Gupta R, Fang S, Lee-Kim V, Heinig lab, Heinig M, Losert C, Jones lab, Jones TR, Donnard E, Murphy M, Roberts E, Song S, Moore lab, Mostafavi lab, Mostafavi S, Sasse A, Spiro A, Pennacchio and Visel lab, Pennacchio LA, Kato M, Kosicki M, Mannion B, Slaven N, Visel A, Pollard lab, Pollard KS, Drusinsky S, Whalen S, Ray lab, Ray J, Harten IA, Ho CH, Reilly lab, Sanjana lab, Sanjana NE, Caragine C, Morris JA, Seruggia lab, Seruggia D, Kutschat AP, Wittibschlager S, Xu lab, Xu H, Fu R, He W, Zhang L, Yi lab, Osorio D, NHGRI Program Management (alphabetical by last name), Bly Z, Calluori S, Gilchrist DA, Hutter CM, Morris SA, Samer EK. Deciphering the impact of genomic variation on function. Nature 2024; 633:47-57. 
[PMID: 39232149 PMCID: PMC11973978 DOI: 10.1038/s41586-024-07510-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 04/11/2023] [Accepted: 05/02/2024] [Indexed: 09/06/2024]
Abstract
Our genomes influence nearly every aspect of human biology, from molecular and cellular functions to phenotypes in health and disease. Studying the differences in DNA sequence between individuals (genomic variation) could reveal previously unknown mechanisms of human biology, uncover the basis of genetic predispositions to diseases, and guide the development of new diagnostic tools and therapeutic agents. Yet, understanding how genomic variation alters genome function to influence phenotype has proved challenging. To unlock these insights, we need a systematic and comprehensive catalogue of genome function and the molecular and cellular effects of genomic variants. Towards this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations and predictive modelling to investigate the relationships among genomic variation, genome function and phenotypes. IGVF will create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how such effects connect through gene-regulatory and protein-interaction networks. These experimental data, computational predictions and accompanying standards and pipelines will be integrated into an open resource that will catalyse community efforts to explore how our genomes influence biology and disease across populations.
|
81
|
Kullo IJ. Promoting equity in polygenic risk assessment through global collaboration. Nat Genet 2024; 56:1780-1787. [PMID: 39103647 DOI: 10.1038/s41588-024-01843-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 06/24/2024] [Indexed: 08/07/2024]
Abstract
The long delay before genomic technologies become available in low- and middle-income countries is a concern from both scientific and ethical standpoints. Polygenic risk scores (PRSs), a relatively recent advance in genomics, could have a substantial impact on promoting health by improving disease risk prediction and guiding preventive strategies. However, clinical use of PRSs in their current forms might widen global health disparities, as their portability to diverse groups is limited. This Perspective highlights the need for global collaboration to develop and implement PRSs that perform equitably across the world. Such collaboration requires capacity building and the generation of new data in low-resource settings, the sharing of harmonized genotype and phenotype data securely across borders, novel population genetics and statistical methods to improve PRS performance, and thoughtful clinical implementation in diverse settings. All this needs to occur while considering the ethical, legal and social implications, with support from regulatory and funding agencies and policymakers.
Affiliation(s)
- Iftikhar J Kullo
- Department of Cardiovascular Medicine and the Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA.
|
82
|
Ozcelik F, Dundar MS, Yildirim AB, Henehan G, Vicente O, Sánchez-Alcázar JA, Gokce N, Yildirim DT, Bingol NN, Karanfilska DP, Bertelli M, Pojskic L, Ercan M, Kellermayer M, Sahin IO, Greiner-Tollersrud OK, Tan B, Martin D, Marks R, Prakash S, Yakubi M, Beccari T, Lal R, Temel SG, Fournier I, Ergoren MC, Mechler A, Salzet M, Maffia M, Danalev D, Sun Q, Nei L, Matulis D, Tapaloaga D, Janecke A, Bown J, Cruz KS, Radecka I, Ozturk C, Nalbantoglu OU, Sag SO, Ko K, Arngrimsson R, Belo I, Akalin H, Dundar M. The impact and future of artificial intelligence in medical genetics and molecular medicine: an ongoing revolution. Funct Integr Genomics 2024; 24:138. [PMID: 39147901 DOI: 10.1007/s10142-024-01417-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 08/01/2024] [Accepted: 08/05/2024] [Indexed: 08/17/2024]
Abstract
Artificial intelligence (AI) platforms have emerged as pivotal tools in genetics and molecular medicine, as in many other fields. The growth in patient data, identification of new diseases and phenotypes, discovery of new intracellular pathways, availability of greater sets of omics data, and the need to continuously analyse them have led to the development of new AI platforms. AI continues to weave its way into the fabric of genetics with the potential to unlock new discoveries and enhance patient care. This technology is setting the stage for breakthroughs across various domains, including dysmorphology, rare hereditary diseases, cancers, clinical microbiomics, the investigation of zoonotic diseases, and omics studies in all medical disciplines. AI's role in facilitating a deeper understanding of these areas heralds a new era of personalised medicine, where treatments and diagnoses are tailored to the individual's molecular features, offering a more precise approach to combating genetic or acquired disorders. The significance of these AI platforms is growing as they assist healthcare professionals in the diagnostic and treatment processes, marking a pivotal shift towards more informed, efficient, and effective medical practice. In this review, we will explore the range of AI tools available and show how they have become vital in various sectors of genomic research supporting clinical decisions.
Affiliation(s)
- Firat Ozcelik
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
| | - Mehmet Sait Dundar
- Department of Electrical and Computer Engineering, Graduate School of Engineering and Sciences, Abdullah Gul University, Kayseri, Turkey
| | - A Baki Yildirim
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
| | - Gary Henehan
- School of Food Science and Environmental Health, Technological University of Dublin, Dublin, Ireland
| | - Oscar Vicente
- Institute for the Conservation and Improvement of Valencian Agrodiversity (COMAV), Universitat Politècnica de València, Valencia, Spain
| | - José A Sánchez-Alcázar
- Centro de Investigación Biomédica en Red: Enfermedades Raras, Centro Andaluz de Biología del Desarrollo (CABD-CSIC-Universidad Pablo de Olavide), Instituto de Salud Carlos III, Sevilla, Spain
| | - Nuriye Gokce
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
| | - Duygu T Yildirim
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
| | - Nurdeniz Nalbant Bingol
- Department of Translational Medicine, Institute of Health Sciences, Bursa Uludag University, Bursa, Turkey
| | - Dijana Plaseska Karanfilska
- Research Centre for Genetic Engineering and Biotechnology, Macedonian Academy of Sciences and Arts, Skopje, Macedonia
| | | | - Lejla Pojskic
- Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Mehmet Ercan
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
| | - Miklos Kellermayer
- Department of Biophysics and Radiation Biology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
| | - Izem Olcay Sahin
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
| | | | - Busra Tan
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
| | - Donald Martin
- University Grenoble Alpes, CNRS, TIMC-IMAG/SyNaBi (UMR 5525), Grenoble, France
| | - Robert Marks
- Avram and Stella Goldstein-Goren Department of Biotechnology Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
| | - Satya Prakash
- Department of Biomedical Engineering, University of McGill, Montreal, QC, Canada
| | - Mustafa Yakubi
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
| | - Tommaso Beccari
- Department of Pharmaceutical Sciences, University of Perugia, Perugia, Italy
| | - Ratnesh Lal
- Neuroscience Research Institute, University of California, Santa Barbara, USA
| | - Sehime G Temel
- Department of Translational Medicine, Institute of Health Sciences, Bursa Uludag University, Bursa, Turkey
- Department of Medical Genetics, Bursa Uludag University Faculty of Medicine, Bursa, Turkey
- Department of Histology and Embryology, Faculty of Medicine, Bursa Uludag University, Bursa, Turkey
| | - Isabelle Fournier
- Réponse Inflammatoire et Spectrométrie de Masse-PRISM, University of Lille, Lille, France
| | - M Cerkez Ergoren
- Department of Medical Genetics, Near East University Faculty of Medicine, Nicosia, Cyprus
| | - Adam Mechler
- Department of Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
| | - Michel Salzet
- Réponse Inflammatoire et Spectrométrie de Masse-PRISM, University of Lille, Lille, France
| | - Michele Maffia
- Department of Experimental Medicine, University of Salento, Via Lecce-Monteroni, Lecce, 73100, Italy
| | - Dancho Danalev
- University of Chemical Technology and Metallurgy, Sofia, Bulgaria
| | - Qun Sun
- Department of Food Science and Technology, Sichuan University, Chengdu, China
| | - Lembit Nei
- School of Engineering Tallinn University of Technology, Tartu College, Tartu, Estonia
| | - Daumantas Matulis
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Dana Tapaloaga
- Faculty of Veterinary Medicine, University of Agronomic Sciences and Veterinary Medicine of Bucharest, Bucharest, Romania
| | - Andres Janecke
- Department of Paediatrics I, Medical University of Innsbruck, Innsbruck, Austria
- Division of Human Genetics, Medical University of Innsbruck, Innsbruck, Austria
| | - James Bown
- School of Science, Engineering and Technology, Abertay University, Dundee, UK
| | | | - Iza Radecka
- School of Science, Faculty of Science and Engineering, University of Wolverhampton, Wolverhampton, UK
| | - Celal Ozturk
- Department of Software Engineering, Erciyes University, Kayseri, Turkey
| | - Ozkan Ufuk Nalbantoglu
- Department of Computer Engineering, Engineering Faculty, Erciyes University, Kayseri, Turkey
| | - Sebnem Ozemri Sag
- Department of Medical Genetics, Bursa Uludag University Faculty of Medicine, Bursa, Turkey
| | - Kisung Ko
- Department of Medicine, College of Medicine, Chung-Ang University, Seoul, Korea
| | - Reynir Arngrimsson
- Iceland Landspitali University Hospital, University of Iceland, Reykjavik, Iceland
| | - Isabel Belo
- Centre of Biological Engineering, University of Minho, Braga, Portugal
| | - Hilal Akalin
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey.
| | - Munis Dundar
- Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey.
|
83
|
Zhu Y, Watson C, Safonova Y, Pennell M, Bankevich A. Assessing Assembly Errors in Immunoglobulin Loci: A Comprehensive Evaluation of Long-read Genome Assemblies Across Vertebrates. bioRxiv 2024:2024.07.19.604360. [PMID: 39091785 PMCID: PMC11291089 DOI: 10.1101/2024.07.19.604360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Long-read sequencing technologies have revolutionized genome assembly, producing near-complete chromosome assemblies for numerous organisms, which are invaluable to research in many fields. However, regions with complex repetitive structure continue to represent a challenge for genome assembly algorithms, particularly in areas with high heterozygosity. Robust and comprehensive solutions for the assessment of assembly accuracy and completeness in these regions do not exist. In this study we focus on the assembly of biomedically important antibody-encoding immunoglobulin (IG) loci, which are characterized by complex duplications and repeat structures. High-quality full-length assemblies for these loci are critical for resolving haplotype-level annotations of IG genes, without which functional and evolutionary studies of antibody immunity across vertebrates are not tractable. To address these challenges, we developed a pipeline, "CloseRead", that generates multiple assembly verification metrics for analysis and visualization. These metrics expand upon those of existing quality assessment tools and specifically target complex and highly heterozygous regions. Using CloseRead, we systematically assessed the accuracy and completeness of IG loci in publicly available assemblies of 74 vertebrate species, identifying problematic regions. We also demonstrated that inspecting assembly graphs for problematic regions can both identify the root cause of assembly errors and illuminate solutions for improving erroneous assemblies. For a subset of species, we were able to correct assembly errors through targeted reassembly. Together, our analysis demonstrated the utility of assembly assessment in improving the completeness and accuracy of IG loci across species.
Affiliation(s)
- Yixin Zhu
- Department of Quantitative and Computational Biology and Biological Sciences, University of Southern California, Los Angeles, CA, United States
| | - Corey Watson
- Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, Louisville, KY, United States
| | - Yana Safonova
- Department of Computer Science and Engineering, Pennsylvania State University, PA, United States
| | - Matt Pennell
- Department of Quantitative and Computational Biology and Biological Sciences, University of Southern California, Los Angeles, CA, United States
| | - Anton Bankevich
- Department of Computer Science and Engineering, Pennsylvania State University, PA, United States
|
84
|
Rocha JL, Lou RN, Sudmant PH. Structural variation in humans and our primate kin in the era of telomere-to-telomere genomes and pangenomics. Curr Opin Genet Dev 2024; 87:102233. [PMID: 39042999 PMCID: PMC11695101 DOI: 10.1016/j.gde.2024.102233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/02/2024] [Accepted: 07/05/2024] [Indexed: 07/25/2024]
Abstract
Structural variants (SVs) account for the majority of base pair differences both within and between primate species. However, our understanding of inter- and intra-species SV has been historically hampered by the quality of draft primate genomes and the absence of genome resources for key taxa. Recently, advances in long-read sequencing and genome assembly have begun to radically reshape our understanding of SVs. Two landmark achievements include the publication of a human telomere-to-telomere (T2T) genome as well as the development of the first human pangenome reference. In this review, we first look back to the major works laying the foundation for these projects. We then examine the ways in which T2T genome assemblies and pangenomes are transforming our understanding of and approach to primate SV. Finally, we discuss what the future of primate SV research may look like in the era of T2T genomes and pangenomics.
Affiliation(s)
- Joana L Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@joanocha
| | - Runyang N Lou
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@NicolasLou10
| | - Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, USA.
|
85
|
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Benedict Paten
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
|
86
|
Wang C, Liu H, Li XY, Ma J, Gu Z, Feng X, Xie S, Tang BS, Chen S, Wang W, Wang J, Zhang J, Chan P. High-depth whole-genome sequencing identifies structure variants, copy number variants and short tandem repeats associated with Parkinson's disease. NPJ Parkinsons Dis 2024; 10:134. [PMID: 39043730 PMCID: PMC11266557 DOI: 10.1038/s41531-024-00722-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 05/10/2024] [Indexed: 07/25/2024] Open
Abstract
While numerous single nucleotide variants and small indels have been identified in Parkinson's disease (PD), the contribution of structural variants (SVs), copy number variants (CNVs), and short tandem repeats (STRs) remains poorly understood. Here we investigated this association using high-depth whole-genome sequencing data from 466 Chinese PD patients and 513 controls. In total, we identified 29,561 SVs, 32,153 CNVs, and 174,905 STRs, and found that CNV deletions were significantly enriched in the end-proportion of autosomal chromosomes in PD. After genome-wide association analysis and replication in an external cohort of 352 cases and 547 controls, we validated that the 1.6 kb deletion neighboring MUC19, the 12.4 kb deletion near RXFP1, and the GGGAAA repeats in SLC2A13 were significantly associated with PD. Moreover, the MUC19 deletion and the SLC2A13 5-copy repeat reduced the penetrance of the LRRK2 G2385R variant. In addition, genes with these variants were dosage-sensitive. These data provide novel insights into the genetic architecture of PD.
Affiliation(s)
- Chaodong Wang
- Department of Neurology & Neurobiology, Xuanwu Hospital of Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing, 100053, China
| | - Hankui Liu
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, 518083, China
| | - Xu-Ying Li
- Department of Neurology & Neurobiology, Xuanwu Hospital of Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing, 100053, China
| | - Jinghong Ma
- Department of Neurology & Neurobiology, Xuanwu Hospital of Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing, 100053, China
| | - Zhuqin Gu
- Department of Neurology & Neurobiology, Xuanwu Hospital of Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing, 100053, China
| | - Xiuli Feng
- National Human Genome Center in Beijing, Beijing Economic-Technological Development Zone, Beijing, 100176, China
| | - Shu Xie
- National Human Genome Center in Beijing, Beijing Economic-Technological Development Zone, Beijing, 100176, China
| | - Bei-Sha Tang
- Department of Neurology, Xiangya Hospital, Central South University, State Key Laboratory of Medical Genetics, Changsha, China
| | - Shengdi Chen
- Department of Neurology, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Wei Wang
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, 518083, China
| | - Jian Wang
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, 518083, China
| | - Jianguo Zhang
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, 518083, China.
- Hebei Industrial Technology Research Institute of Genomics in Maternal & Child Health, Shijiazhuang, 050000, China.
| | - Piu Chan
- Department of Neurology & Neurobiology, Xuanwu Hospital of Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing, 100053, China.
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, China.
- Clinical Center for Parkinson's Disease, Capital Medical University, Key Laboratory for Neurodegenerative Disease of the Ministry of Education, Beijing Key Laboratory for Parkinson's Disease, Beijing, China.
- Beijing Institute of Brain Disorders, Collaborative Innovation Center for Brain Disorders, Capital Medical University, Beijing, China.
|
87
|
Lin MJ, Langmead B, Safonova Y. IGLoo: Profiling the Immunoglobulin Heavy chain locus in Lymphoblastoid Cell Lines with PacBio High-Fidelity Sequencing reads. bioRxiv 2024:2024.07.20.604421. [PMID: 39091872 PMCID: PMC11291057 DOI: 10.1101/2024.07.20.604421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
New high-quality human genome assemblies derived from lymphoblastoid cell lines (LCLs) provide reference genomes and pangenomes for genomics studies. However, the characteristics of LCLs pose technical challenges to profiling immunoglobulin (IG) genes. IG loci in LCLs contain a mixture of germline and somatically recombined haplotypes, making them difficult to genotype or assemble accurately. To address these challenges, we introduce IGLoo, a software tool that implements novel methods for analyzing sequence data and genome assemblies derived from LCLs. IGLoo characterizes somatic V(D)J recombination events in the sequence data and identifies the breakpoints and missing IG genes in the LCL-based assemblies. Furthermore, IGLoo implements a novel reassembly framework to improve germline assembly quality by integrating information about somatic events and population structural variations in the IG loci. We applied IGLoo to study the assemblies from the Human Pangenome Reference Consortium, providing new insights into the mechanisms, gene usage, and patterns of V(D)J recombination, causes of assembly fragmentation in the IG heavy chain (IGH) locus, and improved representation of the IGH assemblies.
Affiliation(s)
- Mao-Jan Lin
- Department of Computer Science, Johns Hopkins University
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University
| | - Yana Safonova
- Department of Computer Science, Johns Hopkins University
- Computer Science and Engineering Department, Pennsylvania State University
- Huck Institutes of Life Sciences, Pennsylvania State University
|
88
|
Ungar RA, Goddard PC, Jensen TD, Degalez F, Smith KS, Jin CA, Bonner DE, Bernstein JA, Wheeler MT, Montgomery SB. Impact of genome build on RNA-seq interpretation and diagnostics. Am J Hum Genet 2024; 111:1282-1300. [PMID: 38834072 PMCID: PMC11267525 DOI: 10.1016/j.ajhg.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 05/04/2024] [Accepted: 05/06/2024] [Indexed: 06/06/2024] Open
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent to which genome build also impacts transcriptomics analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network and Genomics Research to Elucidate the Genetics of Rare Disease Consortium. Across six routinely collected biospecimens, 61% of quantified genes were not influenced by genome build. However, we identified 1,492 genes with build-dependent quantification, 3,377 genes with build-exclusive expression, and 9,077 genes with annotation-specific expression, including 566 clinically relevant and 512 known OMIM genes. Further, we demonstrate that between builds for a given gene, a larger difference in quantification is well correlated with a larger change in expression outlier calling. Combined, we provide a database of genes impacted by build choice and recommend that transcriptomics-guided analyses and diagnoses be cross-referenced with these data for robustness.
Affiliation(s)
- Rachel A Ungar
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Pagé C Goddard
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Tanner D Jensen
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | | | - Kevin S Smith
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Christopher A Jin
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
| | - Devon E Bonner
- Department of Pediatrics, School of Medicine, Stanford University, Stanford, CA, USA; Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
| | - Jonathan A Bernstein
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
| | - Matthew T Wheeler
- Department of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | - Stephen B Montgomery
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
|
89
|
Miga KH. From complete genomes to pangenomes. Am J Hum Genet 2024; 111:1265-1268. [PMID: 38996470 PMCID: PMC11308102 DOI: 10.1016/j.ajhg.2024.05.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 07/14/2024] Open
Abstract
Highlighting the Distinguished Speakers Symposium on "The Future of Human Genetics and Genomics," this collection of articles is based on presentations at the ASHG 2023 Annual Meeting in Washington, DC, in celebration of all our field has accomplished in the past 75 years, since the founding of ASHG in 1948.
Affiliation(s)
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
|
90
|
Sarawad A, Hosagoudar S, Parvatikar P. Pan-genomics: Insight into the Functional Genome, Applications, Advancements, and Challenges. Curr Genomics 2024; 26:2-14. [PMID: 39911277 PMCID: PMC11793047 DOI: 10.2174/0113892029311541240627111506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/30/2024] [Accepted: 05/29/2024] [Indexed: 02/07/2025] Open
Abstract
A pan-genome is a compilation of the common and unique genomes found in a given species. It incorporates the genetic information from all of the genomes sampled, producing a large and diverse set of genetic material. Pan-genomic analysis has various advantages over typical genomics research. Although the most recent era of pan-genomic studies has used cutting-edge sequencing technology to shed fresh light on biological variety and improvement, the potential uses of pan-genomics in improvement have not yet been fully realized. Pan-genome research in various organisms has demonstrated that missing genetic components and the detection of significant Structural Variants (SVs) can be investigated using pan-genomic methods. Many individual-specific sequences have been linked to biological adaptability and to phenotypic and key economic attributes. This study aims to focus on how pangenome analysis uncovers genetic differences in various organisms, including humans, and their effects on phenotypes, as well as how this might help us comprehend the diversity of species. The review also concentrates on potential problems and the prospects for future pangenome research.
Affiliation(s)
- Akansha Sarawad
- Department of Biotechnology, Applied School of Science and Technology, BLDE (DU), Vijayapura, Karnataka, India
| | - Spoorti Hosagoudar
- Department of Biotechnology, Applied School of Science and Technology, BLDE (DU), Vijayapura, Karnataka, India
| | - Prachi Parvatikar
- Department of Biotechnology, Applied School of Science and Technology, BLDE (DU), Vijayapura, Karnataka, India
|
91
|
Coggi M, Sgarlata A, Di Donato GW, Santambrogio MD. On the optimization of GWFA algorithm: enabling real-case applications supporting alignment backtracking. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-4. [PMID: 40039311 DOI: 10.1109/embc53108.2024.10781891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2025]
Abstract
The Human Pangenome Reference Consortium (HPRC) proved that pangenome graphs represent a population's genetic variability more efficiently and accurately than linear references. Graphs can intrinsically encode variations as alternative paths inside a directed set of sequence nodes connected by edges. Despite their higher complexity, graph-based genome analysis pipelines are gaining significant interest, and the first sequence-to-graph aligners have already shown improvements in semi-global alignment. However, in pangenomics studies, the global alignment of long reads is fundamental for identifying structural variations and haplotype phasing. In this context, the Graph Wavefront Alignment (GWFA) algorithm emerged as the fastest strategy for aligning long reads to genomic graphs. However, the available GWFA implementation does not support alignment backtracking, a crucial feature in real-case studies. In this paper, we propose a new open-source implementation of the GWFA algorithm that computes and reports the complete traceback in the standard GAF format. Our work achieves a 20× speedup in execution time compared to the state-of-the-art tool GraphAligner and competitive memory usage.
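As an illustration of the abstract's statement that graphs encode variations as alternative paths, the following minimal sketch builds a toy sequence graph and spells out the sequence of every path through it. This is not the GWFA algorithm and not the authors' implementation; the node sequences, the `spell_paths` helper, and the dictionary layout are all hypothetical and chosen only for illustration.

```python
def spell_paths(nodes, edges, start, end):
    """Yield the sequence spelled by every path from start to end.

    Hypothetical helper: plain depth-first traversal, assumes the graph is acyclic.
    """
    stack = [(start, nodes[start])]
    while stack:
        node, seq = stack.pop()
        if node == end:
            yield seq
            continue
        for nxt in edges.get(node, []):
            stack.append((nxt, seq + nodes[nxt]))


# Toy graph: a single-base variant (T or C) between two shared flanks.
nodes = {1: "ACG", 2: "T", 3: "C", 4: "GGA"}
edges = {1: [2, 3], 2: [4], 3: [4]}

haplotypes = sorted(spell_paths(nodes, edges, 1, 4))
print(haplotypes)
```

Each path through the bubble formed by nodes 2 and 3 corresponds to one allele, which is why a single graph can stand in for several linear sequences.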
|
92
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
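To make the definition above concrete (repeated copies of a 1-6-bp motif), here is a minimal sketch that scans a DNA string for perfect tandem repeats with a regular expression. It is a toy, not one of the genotyping tools compared in the review: the `find_strs` name, the `min_copies` threshold of 3, and the example sequence are assumptions made for illustration, and real STR callers must also handle interrupted repeats, sequencing errors, and read alignment.

```python
import re


def find_strs(seq, min_copies=3, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Hypothetical helper: the lazy quantifier makes the regex try the
    shortest motif first, so "AAAAAA" is reported as motif "A", not "AA".
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


# Example: five copies of the CAG motif between short flanks.
example = "GT" + "CAG" * 5 + "TT"
print(find_strs(example))
```

The copy number is simply the matched length divided by the motif length, which is the quantity that differs between alleles at an STR locus.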
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
93
|
Eynard SE, Klopp C, Canale-Tabet K, Marande W, Vandecasteele C, Roques C, Donnadieu C, Boone Q, Servin B, Vignal A. The black honey bee genome: insights on specific structural elements and a first step towards pangenomes. Genet Sel Evol 2024; 56:51. [PMID: 38943059 PMCID: PMC11212449 DOI: 10.1186/s12711-024-00917-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Accepted: 06/04/2024] [Indexed: 07/01/2024] Open
Abstract
BACKGROUND The honey bee reference genome, HAv3.1, was produced from a commercial line sample that was thought to have a largely dominant Apis mellifera ligustica genetic background. Apis mellifera mellifera, often referred to as the black bee, has a separate evolutionary history and is the original type in western and northern Europe. Growing interest in this subspecies for conservation and non-professional apicultural practices, together with the necessity of deciphering genome backgrounds in hybrids, triggered the necessity for a specific genome assembly. Moreover, having several high-quality genomes is becoming key for taking structural variations into account in pangenome analyses.
RESULTS Pacific Bioscience technology long reads were produced from a single haploid black bee drone. Scaffolding contigs into chromosomes was done using a high-density genetic map. This allowed for re-estimation of the recombination rate, which was over-estimated in some previous studies due to mis-assemblies, which resulted in spurious inversions in the older reference genomes. The sequence continuity obtained was very high and the only limit towards continuous chromosome-wide sequences seemed to be due to tandem repeat arrays that were usually longer than 10 kb and that belonged to two main families, the 371 and 91 bp repeats, causing problems in the assembly process due to high internal sequence similarity. Our assembly was used together with the reference genome to genotype two structural variants by a pangenome graph approach with Graphtyper2. Genotypes obtained were either correct or missing, when compared to an approach based on sequencing depth analysis, and genotyping rates were 89 and 76% for the two variants.
CONCLUSIONS Our new assembly for the Apis mellifera mellifera honey bee subspecies demonstrates the utility of multiple high-quality genomes for the genotyping of structural variants, with a test case on two insertions and deletions. It will therefore be an invaluable resource for future studies, for instance by including structural variants in GWAS. Having used a single haploid drone for sequencing allowed a refined analysis of very large tandem repeat arrays, raising the question of their function in the genome. High quality genome assemblies for multiple subspecies such as presented here, are crucial for emerging projects using pangenomes.
Affiliation(s)
- Sonia E Eynard
- GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Kamila Canale-Tabet
- GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Céline Roques
- INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Quentin Boone
- GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Sigenae, MIAT, INRAE, Castanet Tolosan, France
- Bertrand Servin
- GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Alain Vignal
- GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
94
|
Fu Y, Aganezov S, Mahmoud M, Beaulaurier J, Juul S, Treangen TJ, Sedlazeck FJ. MethPhaser: methylation-based long-read haplotype phasing of human genomes. Nat Commun 2024; 15:5327. [PMID: 38909018 PMCID: PMC11193733 DOI: 10.1038/s41467-024-49588-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 06/11/2024] [Indexed: 06/24/2024] Open
Abstract
The assignment of variants across haplotypes, phasing, is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that utilizes methylation signals from Oxford Nanopore Technologies to extend Single Nucleotide Variation (SNV)-based phasing. We demonstrate that haplotype-specific methylation exists extensively in human genomes and that the advent of long-read technologies has enabled direct reporting of methylation signals. For ONT R9 and R10 cell line data, we increase the phase length N50 by 78%-151% at a phasing accuracy of 83.4-98.7%. To assess the impact of tissue purity and random methylation signals due to inactivation, we also applied MethPhaser on blood samples from 4 patients, still showing improvements over SNV-only phasing. MethPhaser further improves phasing across HLA and multiple other medically relevant genes, improving our understanding of how mutations interact across multiple phenotypes. The concept of MethPhaser can also be extended to non-human diploid genomes. MethPhaser is available at https://github.com/treangenlab/methphaser.
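The phase length N50 quoted above is a summary statistic over phase block lengths. The sketch below assumes the usual N50 definition (the length L such that blocks of length at least L cover half of the total phased length). It is not MethPhaser code, and the block lengths are made-up numbers for illustration only.

```python
def n50(lengths):
    """Largest length L such that blocks of length >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical phase blocks (arbitrary units) before and after two
# neighbouring blocks (20 and 30) are joined into a single block of 50.
before = n50([10, 20, 30, 40])
after = n50([10, 50, 40])
```

In this toy case, joining two adjacent blocks raises the N50 from 30 to 50, which is the kind of gain that extending phase blocks across SNV-poor regions would produce.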
Affiliation(s)
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Sissel Juul
- Oxford Nanopore Technologies Inc, New York, NY, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
- Department of Computer Science, Rice University, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
95
|
Henglin M, Ghareghani M, Harvey W, Porubsky D, Koren S, Eichler EE, Ebert P, Marschall T. Phasing Diploid Genome Assembly Graphs with Single-Cell Strand Sequencing. bioRxiv 2024:2024.02.15.580432. [PMID: 38529499 PMCID: PMC10962706 DOI: 10.1101/2024.02.15.580432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de-novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de-novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio-phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.
Affiliation(s)
- Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
- Maryam Ghareghani
- Department of Mathematics and Computer Science, Freie Universität Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- William Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
96
|
Schmidt TT, Tyer C, Rughani P, Haggblom C, Jones JR, Dai X, Frazer KA, Gage FH, Juul S, Hickey S, Karlseder J. High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer. Nat Commun 2024; 15:5149. [PMID: 38890299 PMCID: PMC11189484 DOI: 10.1038/s41467-024-48917-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 05/15/2024] [Indexed: 06/20/2024] Open
Abstract
Telomeres are the protective nucleoprotein structures at the end of linear eukaryotic chromosomes. Telomeres' repetitive nature and length have traditionally challenged the precise assessment of the composition and length of individual human telomeres. Here, we present Telo-seq to resolve bulk, chromosome arm-specific and allele-specific human telomere lengths using Oxford Nanopore Technologies' native long-read sequencing. Telo-seq resolves telomere shortening in five population doubling increments and reveals intrasample, chromosome arm-specific, allele-specific telomere length heterogeneity. Telo-seq can reliably discriminate between telomerase- and ALT-positive cancer cell lines. Thus, Telo-seq is a tool to study telomere biology during development, aging, and cancer at unprecedented resolution.
Affiliation(s)
- Carly Tyer
- Oxford Nanopore Technologies, Inc., New York, NY, USA
- Candy Haggblom
- Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Jeffrey R Jones
- Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Xiaoguang Dai
- Oxford Nanopore Technologies, Inc., New York, NY, USA
- Kelly A Frazer
- Institute of Genomic Medicine, University of California, San Diego, La Jolla, CA, 92093-0761, USA
- Fred H Gage
- Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Sissel Juul
- Oxford Nanopore Technologies, Inc., New York, NY, USA
- Scott Hickey
- Oxford Nanopore Technologies, Inc., New York, NY, USA
- Jan Karlseder
- Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
97
|
Sedlić F, Sertić J, Markotić A, Primorac D, Slavica A, Zibar L, Vlahoviček K, Kušec V, Barić I, Paar V, Borovečki F, Žmak L, Kurolt IC, Canki-Klain N, Roksandić S, Rinčić I, Jurić H, Škaro V, Marjanović D, Projić P, Primorac D, Starčević A, Vujaklija D, Šikić M, Križanović K, Gamulin S. The Applied Genomics Development Strategy by the Croatian Academy of Sciences and Arts paves the way for the future development of applied genomics in Croatia. Croat Med J 2024; 65:297-302. [PMID: 38868976 PMCID: PMC11157260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2024] Open
Affiliation(s)
- Dragan Primorac
- Dragan Primorac, St. Catherine Specialty Hospital, Zagreb, Croatia
98
|
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
99
|
Lee AT, Chang EF, Paredes MF, Nowakowski TJ. Large-scale neurophysiology and single-cell profiling in human neuroscience. Nature 2024; 630:587-595. [PMID: 38898291 PMCID: PMC12049086 DOI: 10.1038/s41586-024-07405-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 04/09/2024] [Indexed: 06/21/2024]
Abstract
Advances in large-scale single-unit human neurophysiology, single-cell RNA sequencing, spatial transcriptomics and long-term ex vivo tissue culture of surgically resected human brain tissue have provided an unprecedented opportunity to study human neuroscience. In this Perspective, we describe the development of these paradigms, including Neuropixels and recent brain-cell atlas efforts, and discuss how their convergence will further investigations into the cellular underpinnings of network-level activity in the human brain. Specifically, we introduce a workflow in which functionally mapped samples of human brain tissue resected during awake brain surgery can be cultured ex vivo for multi-modal cellular and functional profiling. We then explore how advances in human neuroscience will affect clinical practice, and conclude by discussing societal and ethical implications to consider. Potential findings from the field of human neuroscience will be vast, ranging from insights into human neurodiversity and evolution to providing cell-type-specific access to study and manipulate diseased circuits in pathology. This Perspective aims to provide a unifying framework for the field of human neuroscience as we welcome an exciting era for understanding the functional cytoarchitecture of the human brain.
Affiliation(s)
- Anthony T Lee
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Mercedes F Paredes
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Tomasz J Nowakowski
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA
- Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, USA
- Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
100
|
Phanthunane C, Pongcharoen S, Pannarunothai S, Roboon J, Phanthunane P, Nontarak J. Precision medicine in Asia enhanced by next-generation sequencing: Implications for Thailand through a scoping review and interview study. Clin Transl Sci 2024; 17:e13868. [PMID: 38924657 PMCID: PMC11197108 DOI: 10.1111/cts.13868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 06/05/2024] [Accepted: 06/06/2024] [Indexed: 06/28/2024] Open
Abstract
Next-generation sequencing (NGS) significantly enhances precision medicine (PM) by offering personalized approaches to diagnosis, treatment, and prevention of unmet medical needs. Little is known about the current situation of PM in Asia. Thus, we aimed to conduct an overview of the progress and gaps in PM in Asia and enrich it with in-depth insight into the possibilities of future PM in Thailand. This scoping review focused on Asian countries starting with non-cancer studies, including rare and undiagnosed diseases (RUDs), non-communicable diseases (NCDs), infectious diseases (IDs), and pharmacogenomics, with a focus on NGS. Subsequent in-depth interviews with experts in Thailand were performed, and a thematic analysis served as the main qualitative methodology. Out of 2898 searched articles, 387 studies were included after the review. Although most of the studies focused on cancer, 89 (23.0%) studies were related to RUDs (17.1%), NCDs (2.8%), IDs (1.8%), and pharmacogenomics (1.3%). Apart from medicine and related sciences, the studies were mostly composed of PM (61.8%), followed by genetics medicine and bioinformatics. Interestingly, 28% of articles were conducted exclusively within the fields of medicine and related sciences, emphasizing interdisciplinary integration. The experts emphasized the need for sustainability-driven political will, nurturing collaboration, reinforcing computational infrastructure, and expanding the bioinformatic workforce. In Asia, developments of NGS have made remarkable progress in PM. Thailand has extended PM beyond cancer and focused on clinical implementation. We summarized the PM challenges, including equity and efficiency targeting, guided research funding, sufficient sample size, integrated collaboration, computational infrastructure, and sufficient trained human resources.
Affiliation(s)
- Chumut Phanthunane
- Division of Medical Oncology, Chulabhorn Hospital, Chulabhorn Royal Academy, Bangkok, Thailand
- Sutatip Pongcharoen
- Department of Medicine, Faculty of Medicine, Naresuan University, Phitsanulok, Thailand
- Jureepon Roboon
- Department of Anatomy, Faculty of Medical Science, Naresuan University, Phitsanulok, Thailand
- Centre of Excellence in Medical Biotechnology, Naresuan University, Phitsanulok, Thailand
- Pudtan Phanthunane
- Department of Economics, Faculty of Business, Economics and Communications, Naresuan University, Phitsanulok, Thailand
- Jiraluck Nontarak
- Department of Epidemiology, Faculty of Public Health, Mahidol University, Bangkok, Thailand