1
van Westerhoven A, Fokkens L, Wissink K, Kema GJ, Rep M, Seidl M. Reference-free identification and pangenome analysis of accessory chromosomes in a major fungal plant pathogen. NAR Genom Bioinform 2025; 7:lqaf034. [PMID: 40176926 PMCID: PMC11963757 DOI: 10.1093/nargab/lqaf034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Revised: 02/19/2025] [Accepted: 03/14/2025] [Indexed: 04/05/2025]
Abstract
Accessory chromosomes, found in some but not all individuals of a species, play an important role in pathogenicity and host specificity in fungal plant pathogens. However, their variability complicates reference-based analysis, especially when these chromosomes are missing in the reference genome. Pangenome variation graphs offer a reference-free alternative for studying these chromosomes. Here, we constructed a pangenome variation graph for 73 diverse Fusarium oxysporum genomes, a major fungal plant pathogen with a compartmentalized genome that includes conserved core as well as variable accessory chromosomes. To obtain insights into accessory chromosome dynamics, we first constructed a chromosome similarity network using all-vs-all similarity mapping. We identified eleven core chromosomes conserved across all strains and a substantial number of highly variable accessory chromosomes. Some of these accessory chromosomes are host-specific and likely play a role in determining host range. Using a k-mer based approach, we further identified the presence of these accessory chromosomes in all available (581) F. oxysporum assemblies and corroborated the occurrence of host-specific accessory chromosomes. To further analyze the evolution of chromosomes in F. oxysporum, we constructed a pangenome variation graph per group of homologous chromosomes. This reveals that accessory chromosomes are composed of different stretches of accessory regions, and possibly rearrangements between accessory regions gave rise to these mosaic accessory chromosomes. Furthermore, we show that accessory chromosomes are likely horizontally transferred in natural populations. Our findings demonstrate that a pangenome variation graph is a powerful approach to elucidate the evolutionary dynamics of accessory chromosomes in F. oxysporum, which is not only a useful resource for Fusarium but also provides a framework for similar analyses in other species containing accessory chromosomes.
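The k-mer-based presence screen mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the tool, the k-mer size, or any threshold, so the function names, `k=5`, and the toy sequences below are all assumptions chosen for illustration, not the authors' method. The sketch scores the fraction of a query chromosome's distinct canonical k-mers that also occur in an assembly.

```python
# Hypothetical illustration: k-mer containment of a query sequence in an assembly.
# k, the function names, and the sequences are assumptions, not taken from the paper.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Set of canonical k-mers (the smaller of a k-mer and its reverse complement)."""
    seq = seq.upper()
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            result.add(min(kmer, reverse_complement(kmer)))
    return result


def containment(query, target, k=5):
    """Fraction of the query's distinct canonical k-mers that occur in the target."""
    query_kmers = canonical_kmers(query, k)
    if not query_kmers:
        return 0.0
    return len(query_kmers & canonical_kmers(target, k)) / len(query_kmers)


accessory = "ACGTACGGTCAGT"                  # toy "accessory chromosome"
assembly_with = "TTTT" + accessory + "GGGG"  # toy assembly containing it
assembly_without = "TTTTTTTTTTGGGGGGGGGG"    # toy assembly lacking it
```

A containment close to 1 suggests the query is present in the assembly and a value close to 0 suggests it is absent. Real genomes would need a much larger k and a calibrated cutoff.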
Affiliation(s)
- Anouk C van Westerhoven
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3583CH, Utrecht, the Netherlands
  - Laboratory of Phytopathology, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, the Netherlands
- Like Fokkens
  - Laboratory of Phytopathology, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, the Netherlands
- Kyran Wissink
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3583CH, Utrecht, the Netherlands
- Gert H J Kema
  - Laboratory of Phytopathology, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, the Netherlands
- Martijn Rep
  - Molecular Plant Pathology, Swammerdam Institute of Life Sciences, University of Amsterdam, 1090GE, Amsterdam, the Netherlands
- Michael F Seidl
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3583CH, Utrecht, the Netherlands
2
Luo L, Wang M, Liu Y, Li J, Bu F, Yuan H, Tang R, Liu C, He G. Sequencing and characterizing human mitochondrial genomes in the biobank-based genomic research paradigm. Sci China Life Sci 2025; 68:1610-1625. [PMID: 39843848 DOI: 10.1007/s11427-024-2736-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/25/2024] [Accepted: 09/18/2024] [Indexed: 01/24/2025]
Abstract
Human mitochondrial DNA (mtDNA) harbors essential mutations linked to aging, neurodegenerative diseases, and complex muscle disorders. Due to its uniparental and haploid inheritance, mtDNA captures matrilineal evolutionary trajectories, playing a crucial role in population and medical genetics. However, critical questions about the genomic diversity patterns, inheritance models, and evolutionary and medical functions of mtDNA remain unresolved or underexplored, particularly in the transition from traditional genotyping to large-scale genomic analyses. This review summarizes recent advancements in data-driven genomic research and technological innovations that address these questions and clarify the biological impact of nuclear-mitochondrial segments (NUMTs) and mtDNA variants on human health, disease, and evolution. We propose a streamlined pipeline to comprehensively identify mtDNA and NUMT genomic diversity using advanced sequencing and computational technologies. Haplotype-resolved mtDNA sequencing and assembly can distinguish authentic mtDNA variants from NUMTs, reduce diagnostic inaccuracies, and provide clearer insights into heteroplasmy patterns and the authenticity of paternal inheritance. This review emphasizes the need for integrative multi-omics approaches and emerging long-read sequencing technologies to gain new insights into mutation mechanisms, the influence of heteroplasmy and paternal inheritance on mtDNA diversity and disease susceptibility, and the detailed functions of NUMTs.
Affiliation(s)
- Lintao Luo
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu, 610000, China
  - Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
- Mengge Wang
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu, 610000, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China
- Yunhui Liu
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu, 610000, China
  - Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
- Jianbo Li
  - Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
- Fengxiao Bu
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu, 610000, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- Huijun Yuan
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu, 610000, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- Renkuan Tang
  - Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
- Chao Liu
  - Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
  - Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China
- Guanglin He
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu, 610000, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China
3
Lawson LP, Parameswaran S, Panganiban RA, Constantine GM, Weirauch MT, Kottyan LC. Update on the genetics of allergic diseases. J Allergy Clin Immunol 2025; 155:1738-1752. [PMID: 40139464 PMCID: PMC12145254 DOI: 10.1016/j.jaci.2025.03.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Revised: 02/24/2025] [Accepted: 03/09/2025] [Indexed: 03/29/2025]
Abstract
The field of genetic etiology of allergic diseases has advanced significantly in recent years. Shared risk loci reflect the contribution of genetic factors to the sequential development of allergic conditions across the atopic march, while unique risk loci provide opportunities to understand tissue specific manifestations of allergic disease. Most identified risk variants are noncoding, indicating that they likely influence gene expression through gene regulatory mechanisms. Despite recent advances, challenges persist, particularly regarding the need for increased ancestral diversity in research populations. Further, while polygenic risk scores show promise for identifying individuals at higher genetic risk for allergic diseases, their predictive accuracy varies across different ancestries and can be difficult to translate to an individual's absolute risk of developing a disease. Methodologies, including "nearest gene," 3D chromatin interaction analysis, expression quantitative trait locus analysis, experimental screens, and integrative bioinformatic models, have established connections between genetic variants and their regulatory targets, enhancing our understanding of disease risk and phenotypic variability. In this review, we focus on the state of knowledge of allergic sensitization and 5 allergic diseases: asthma, atopic dermatitis, allergic rhinitis, food allergy, and eosinophilic esophagitis. We summarize recent progress and highlight opportunities for advancing our understanding of their genetic etiology.
Affiliation(s)
- Lucinda P Lawson
  - Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Sreeja Parameswaran
  - Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Ronald A Panganiban
  - Asthma Research, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio
- Gregory M Constantine
  - Human Eosinophil Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Md
- Matthew T Weirauch
  - Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio
- Leah C Kottyan
  - Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio
4
Nakashima T, Miyauchi T, Takeuchi R, Sugihara Y, Funakoshi Y, Ohka F, Maeda S, Hirato J, Yoshioka T, Okita H, Narita Y, Kanemura Y, Kojima Y, Watanabe Y, Saito R, Suzuki H. Diversity of U1 Small Nuclear RNAs and Diagnostic Methods for Their Mutations. Cancer Sci 2025. [PMID: 40425278 DOI: 10.1111/cas.70110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2025] [Revised: 05/16/2025] [Accepted: 05/20/2025] [Indexed: 05/29/2025]
Abstract
U1 small nuclear RNA (snRNA) mutations are recurrent non-coding alterations found in various malignancies, yet their identification has proven challenging due to their repetitive nature. We characterized the complex interindividual diversity and genomic architecture of U1 snRNA loci using sequencing data and a pangenome reference. Our analysis uncovered copy number variations and the diversity of single-nucleotide variants in regions not predicted to have significant functional impact. Compared to traditional linear reference-based analyses for mutations, the pangenome graph demonstrated the best accuracy, successfully identifying previously undetectable mutations. This underscores the utility of pangenome graph references for cancer genome research, particularly in repetitive and highly diverse genomic regions. Additionally, we developed mutation detection methods employing targeted capture sequencing, rapid quantitative polymerase chain reaction, and a machine learning approach based on splicing patterns, all exhibiting high precision in identifying U1 snRNA mutations. Our findings elucidate the structural complexity of U1 snRNA loci and establish robust methodologies for precise mutation detection in these regions.
Affiliation(s)
- Takuma Nakashima
  - Division of Brain Tumor Translational Research, National Cancer Center Research Institute, Chuo City, Japan
  - Department of Neurosurgery, Nagoya University School of Medicine, Nagoya, Japan
- Tsubasa Miyauchi
  - Division of Brain Tumor Translational Research, National Cancer Center Research Institute, Chuo City, Japan
- Ryota Takeuchi
  - In Vitro Diagnostics Business, KYORIN Pharmaceutical Co., Ltd, Tokyo, Japan
- Yuriko Sugihara
  - Division of Brain Tumor Translational Research, National Cancer Center Research Institute, Chuo City, Japan
- Yusuke Funakoshi
  - Division of Brain Tumor Translational Research, National Cancer Center Research Institute, Chuo City, Japan
- Fumiharu Ohka
  - Department of Neurosurgery, Nagoya University School of Medicine, Nagoya, Japan
- Sachi Maeda
  - Department of Neurosurgery, Nagoya University School of Medicine, Nagoya, Japan
- Junko Hirato
  - Department of Pathology, Public Tomioka General Hospital, Tomioka, Japan
- Takako Yoshioka
  - Department of Pathology, National Center for Child Health and Development, Setagaya, Japan
- Hajime Okita
  - Division of Diagnostic Pathology, Keio University School of Medicine, Minato City, Japan
- Yoshitaka Narita
  - Department of Neurosurgery and Neuro-Oncology, National Cancer Center Hospital, Chuo City, Japan
- Yonehiro Kanemura
  - Department of Biomedical Research and Innovation, Institute for Clinical Research, NHO Osaka National Hospital, Osaka, Japan
  - Department of Neurosurgery, NHO Osaka National Hospital, Osaka, Japan
- Yasuhiro Kojima
  - Laboratory of Computational Life Science, National Cancer Center Research Institute, Tokyo, Japan
- Yuko Watanabe
  - Department of Pediatric Oncology, National Cancer Center Hospital, Chuo City, Japan
- Ryuta Saito
  - Department of Neurosurgery, Nagoya University School of Medicine, Nagoya, Japan
- Hiromichi Suzuki
  - Division of Brain Tumor Translational Research, National Cancer Center Research Institute, Chuo City, Japan
5
Cui Y, Peng C, Xia Z, Yang C, Guo Y. A survey of sequence-to-graph mapping algorithms in the pangenome era. Genome Biol 2025; 26:138. [PMID: 40405275 PMCID: PMC12096488 DOI: 10.1186/s13059-025-03606-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Accepted: 05/06/2025] [Indexed: 05/24/2025]
Abstract
A pangenome can reveal the genetic diversity across different individuals simultaneously. It offers a more comprehensive reference for genome analysis compared to a single linear genome that may introduce allele bias. Pangenomes are often represented as genome graphs, making sequence-to-graph mapping a fundamental task for pangenome construction and analysis. Numerous sequence-to-graph mapping algorithms have been developed over the past few years. Here, we provide a review of the advancements in sequence-to-graph mapping algorithms in the pangenome era. We also discuss the challenges and opportunities that arise in the context of pangenome graphs.
Affiliation(s)
- Yingbo Cui
  - College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China
- Chenchen Peng
  - College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China
- Zeyu Xia
  - College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China
- Canqun Yang
  - College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China
  - National Supercomputer Center in Tianjin, No.10 Xinhuan West Rd, 300457, Tianjin, People's Republic of China
- Yifei Guo
  - College of Computer Science and Technology, National University of Defense Technology, No.137 Yanwachi St, 410073, Changsha, People's Republic of China
6
Zhu Y, Watson C, Safonova Y, Pennell M, Bankevich A. CloseRead: a tool for assessing assembly errors in immunoglobulin loci applied to vertebrate long-read genome assemblies. Genome Biol 2025; 26:131. [PMID: 40394681 PMCID: PMC12090573 DOI: 10.1186/s13059-025-03594-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Accepted: 04/28/2025] [Indexed: 05/22/2025]
Abstract
Despite tremendous advances in long-read sequencing, some structurally complex and repeat-rich genomic regions remain challenging to assemble. Furthermore, we lack tools to assess local assembly quality, making it hard to identify problems and assess progress. Here we develop a new approach "CloseRead" for visualizing local assembly quality and diagnosing errors using multiple metrics. We apply CloseRead to evaluate how well immunoglobulin loci, paradigmatic cases of structurally complex regions, are assembled in 74 state-of-the-art vertebrate genomes. We then show that targeted, local re-assembly can correct the specific errors identified by CloseRead, highlighting the value of an iterative approach to genome assembly.
Affiliation(s)
- Yixin Zhu
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Corey Watson
  - Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, Louisville, KY, USA
- Yana Safonova
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Matt Pennell
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Anton Bankevich
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
7
Wang G, Korody ML, Brändl B, Hernandez-Toro CJ, Rohrandt C, Hong K, Pang AWC, Lee J, Migliorelli G, Stanke M, Ford SM, Pollmann I, Houck ML, Lewin HA, Lear TL, Ryder OA, Meissner A, Loring JF, Müller FJ. Genomic map of the functionally extinct northern white rhinoceros (Ceratotherium simum cottoni). Proc Natl Acad Sci U S A 2025; 122:e2401207122. [PMID: 40359041 DOI: 10.1073/pnas.2401207122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 03/20/2025] [Indexed: 05/15/2025]
Abstract
The northern white rhinoceros (NWR; Ceratotherium simum cottoni) is functionally extinct, with only two nonreproductive females alive. Efforts to rescue the NWR from its inevitable demise have inspired the exploration of unconventional conservation methods, including the development of induced pluripotent stem cells (iPSCs) for the in vitro generation of artificial gametes. The integrity of iPSC genomes is critical for in vitro gametogenesis to be used for assisted reproductive technologies using NWR iPSCs. We generated a chromosome-level NWR reference genome that meets or exceeds the metrics proposed by the Vertebrate Genome Project, using complementary sequencing and mapping methods. The genome represents 40 autosomes, an X and a partially resolved Y chromosome, and the mitochondrial genome. Using comparative FISH mapping, we confirmed a general gene order conservation between the NWR and horse genomes. We aligned the NWR genome with that of the southern white rhinoceros (SWR; Ceratotherium simum simum), a population that has been physically separated from the NWR for tens of thousands of years, and we found that the two subspecies are very similar on the chromosome level. Comparing long-read data from NWR iPSC lines and the fibroblast cultures used for reprogramming, we identified copy number variations that were likely to have been introduced during in vitro iPSC expansion. The NWR reference genome allows for efficient, rapid, and accurate assessment of the genomic integrity of iPSC lines to direct their differentiation. This will assist in strategies to rescue the NWR through extraordinary measures like cloning and the generation of embryos from iPSC-derived gametes.
Affiliation(s)
- Gaojianyong Wang
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
  - Department of Psychiatry and Psychotherapy, Christian-Albrechts Universität, Kiel 24105, Germany
  - Zentrum für Integrative Psychiatrie, University Hospital Schleswig-Holstein, Kiel 24105, Germany
- Björn Brändl
  - Department of Psychiatry and Psychotherapy, Christian-Albrechts Universität, Kiel 24105, Germany
  - Zentrum für Integrative Psychiatrie, University Hospital Schleswig-Holstein, Kiel 24105, Germany
- Christian Rohrandt
  - Department of Psychiatry and Psychotherapy, Christian-Albrechts Universität, Kiel 24105, Germany
  - Zentrum für Integrative Psychiatrie, University Hospital Schleswig-Holstein, Kiel 24105, Germany
  - Institute for Communications Technologies and Embedded Systems, Kiel University of Applied Sciences, Kiel 24149, Germany
- Karl Hong
  - Bionano Genomics Inc, San Diego CA, 92121
- Joyce Lee
  - Bionano Genomics Inc, San Diego CA, 92121
- Giovanna Migliorelli
  - Institute of Mathematics and Computer Science, and Center for Functional Genomics of Microbes, University of Greifswald, Greifswald 17489, Germany
- Mario Stanke
  - Institute of Mathematics and Computer Science, and Center for Functional Genomics of Microbes, University of Greifswald, Greifswald 17489, Germany
- Sarah M Ford
  - San Diego Zoo Wildlife Alliance, Escondido, CA, 92027
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95060
- Iris Pollmann
  - Department of Psychiatry and Psychotherapy, Christian-Albrechts Universität, Kiel 24105, Germany
  - Zentrum für Integrative Psychiatrie, University Hospital Schleswig-Holstein, Kiel 24105, Germany
- Harris A Lewin
  - The Genome Center, University of California, Davis, CA 95616
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
  - John Muir Institute for the Environment, University of California, Davis, CA 95616
- Teri L Lear
  - Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Lexington, KY 40546
- Alexander Meissner
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Franz-Josef Müller
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
  - Department of Psychiatry and Psychotherapy, Christian-Albrechts Universität, Kiel 24105, Germany
  - Zentrum für Integrative Psychiatrie, University Hospital Schleswig-Holstein, Kiel 24105, Germany
8
Lin MJ, Langmead B, Safonova Y. IGLoo enables comprehensive analysis and assembly of immunoglobulin heavy-chain loci in lymphoblastoid cell lines using PacBio high-fidelity reads. Cell Rep Methods 2025; 5:101033. [PMID: 40315852 DOI: 10.1016/j.crmeth.2025.101033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 01/29/2025] [Accepted: 04/07/2025] [Indexed: 05/04/2025]
Abstract
High-quality human genome assemblies derived from lymphoblastoid cell lines (LCLs) provide reference genomes and pangenomes for genomics studies. However, LCLs pose technical challenges for profiling immunoglobulin (IG) genes, as their IG loci contain a mixture of germline and somatically recombined haplotypes, making genotyping and assembly difficult with widely used frameworks. To address this, we introduce IGLoo, a software tool that analyzes sequence data and assemblies derived from LCLs, characterizing somatic V(D)J recombination events and identifying breakpoints and missing IG genes in the assemblies. Furthermore, IGLoo implements a reassembly framework to improve germline assembly quality by integrating information on somatic events and population structural variations in IG loci. Applying IGLoo to the assemblies from the Human Pangenome Reference Consortium, we gained valuable insights into the mechanisms, gene usage, and patterns of V(D)J recombination and the causes of assembly artifacts in the IG heavy-chain (IGH) locus, and we improved the representation of IGH assemblies.
Affiliation(s)
- Mao-Jan Lin
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Yana Safonova
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Computer Science and Engineering Department, Pennsylvania State University, University Park, PA 16802, USA; Huck Institutes of Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
9
Lehmann B, Bräuninger L, Cho Y, Falck F, Jayadeva S, Katell M, Nguyen T, Perini A, Tallman S, Mackintosh M, Silver M, Kuchenbäcker K, Leslie D, Chatterjee N, Holmes C. Methodological opportunities in genomic data analysis to advance health equity. Nat Rev Genet 2025:10.1038/s41576-025-00839-w. [PMID: 40369311 DOI: 10.1038/s41576-025-00839-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/27/2025] [Indexed: 05/16/2025]
Abstract
The causes and consequences of inequities in genomic research and medicine are complex and widespread. However, it is widely acknowledged that underrepresentation of diverse populations in human genetics research risks exacerbating existing health disparities. Efforts to improve diversity are ongoing, but an often-overlooked source of inequity is the choice of analytical methods used to process, analyse and interpret genomic data. This choice can influence all areas of genomic research, from genome-wide association studies and polygenic score development to variant prioritization and functional genomics. New statistical and machine learning techniques to understand, quantify and correct for the impact of biases in genomic data are emerging within the wider genomic research and genomic medicine ecosystems. At this crucial time point, it is important to clarify where improvements in methods and practices can, or cannot, have a role in improving equity in genomics. Here, we review existing approaches to promote equity and fairness in statistical analysis for genomics, and propose future methodological developments that are likely to yield the most impact for equity.
Affiliation(s)
- Brieuc Lehmann
  - Department of Statistical Science, University College London, London, UK
- Leandra Bräuninger
  - Department of Statistical Science, University College London, London, UK
  - The Alan Turing Institute, London, UK
- Yoonsu Cho
  - Genomics England, London, UK
  - Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Fabian Falck
  - The Alan Turing Institute, London, UK
  - Department of Statistics, University of Oxford, Oxford, UK
- Matt Silver
  - Genomics England, London, UK
  - Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, Banjul, The Gambia
- Karoline Kuchenbäcker
  - Genomics England, London, UK
  - Division of Psychiatry, University College London, London, UK
- Nilanjan Chatterjee
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Chris Holmes
  - Department of Statistics, University of Oxford, Oxford, UK
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
10
Prokopov D, Tunbak H, Leddy E, Drylie B, Camera F, Deniz Ö. Transposable elements as genome regulators in normal and malignant haematopoiesis. Blood Cancer J 2025; 15:87. [PMID: 40328728 PMCID: PMC12056191 DOI: 10.1038/s41408-025-01295-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2025] [Revised: 04/16/2025] [Accepted: 04/23/2025] [Indexed: 05/08/2025]
Abstract
Transposable elements (TEs) constitute over half of the human genome and have played a profound role in genome evolution. While most TEs have lost the ability to transpose, many retain functional elements that serve as drivers of genome innovation, including the emergence of novel genes and regulatory elements. Recent advances in experimental and bioinformatic methods have provided new insights into their roles in human biology, both in health and disease. In this review, we discuss the multifaceted roles of TEs in haematopoiesis, highlighting their contributions to both normal and pathological contexts. TEs influence gene regulation by reshaping gene-regulatory networks, modulating transcriptional activity, and creating novel regulatory elements. These activities play key roles in maintaining normal haematopoietic processes and supporting cellular regeneration. However, in haematological malignancies, TE reactivation can disrupt genomic integrity, induce structural variations, and dysregulate transcriptional programmes, thereby driving oncogenesis. By examining the impact of TE activity on genome regulation and variation, we highlight their pivotal roles in both normal haematopoietic processes and haematological cancers.
Affiliation(s)
- Dmitry Prokopov
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London, UK
- QMUL Centre for Epigenetics, Queen Mary University of London, London, UK
- Hale Tunbak
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London, UK
- QMUL Centre for Epigenetics, Queen Mary University of London, London, UK
- Eve Leddy
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London, UK
- QMUL Centre for Epigenetics, Queen Mary University of London, London, UK
- Bryce Drylie
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London, UK
- QMUL Centre for Epigenetics, Queen Mary University of London, London, UK
- Francesco Camera
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London, UK
- QMUL Centre for Epigenetics, Queen Mary University of London, London, UK
- Özgen Deniz
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London, UK.
- QMUL Centre for Epigenetics, Queen Mary University of London, London, UK.
|
11
|
Gao Y, Yang L, Kuhn K, Li W, Zanton G, Bowman M, Zhao P, Zhou Y, Fang L, Cole JB, Rosen BD, Ma L, Li C, Baldwin RL, Van Tassell CP, Zhang Z, Smith TPL, Liu GE. Long read and preliminary pangenome analyses reveal breed-specific structural variations and novel sequences in Holstein and Jersey cattle. J Adv Res 2025:S2090-1232(25)00258-9. [PMID: 40258473 DOI: 10.1016/j.jare.2025.04.014] [Received: 10/13/2024] [Revised: 04/06/2025] [Accepted: 04/10/2025] [Indexed: 04/23/2025] Open
Abstract
INTRODUCTION: Most SV studies in livestock rely on short-read sequencing, posing challenges in accurately characterizing large genomic variants due to their limited read length. OBJECTIVES: Our goal is to reveal structural variation and novel sequences specific to Holstein and Jersey cattle breeds using long-read and pan-genome analyses. METHODS: We sequenced 20 Holsteins and 8 Jersey cattle using PacBio HiFi to 20×, and integrated five read-based and one assembly-based SV caller to determine SVs. RESULTS: We assembled the 28 genomes averaging 3.25 Gb with a contig N50 of 69.36 Mb and using the ARS-UCD1.2 reference, we acquired Holstein/Jersey SV catalogs with 74,068/54,689 events spanning 202/135 Mb (7.43%/4.97% of the genome). SVs were enriched in less conserved, non-coding, and non-regulatory regions. Comparing Holsteins with differing feed efficiency (FE), SVs unique to high FE were linked to energy metabolism and olfactory receptors, while those specific to low FE were associated with material transport. We constructed Holstein/Jersey pangenome graphs with 148,598/105,875 nodes and 208,891/147,990 edges, representing 47,028/37,137 biallelic and multi-allelic events, and 63.75/42.34 Mb of novel sequence. We observed SV count saturation with 20 Holsteins, while adding Jerseys significantly increased the SV count, highlighting breed-specific SV events. CONCLUSION: Our long-read data and SV catalogs are valuable resources, revealing that the cattle genome is more complex than previously thought.
Affiliation(s)
- Yahui Gao
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China; Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Liu Yang
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Kristen Kuhn
- USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
- Wenli Li
- US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
- Geoffrey Zanton
- US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
- Mary Bowman
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Pengju Zhao
- Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China.
- Yang Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
- Lingzhao Fang
- Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark.
- John B Cole
- Council on Dairy Cattle Breeding, 4201 Northview Dr, Bowie, MD 20716, USA; Department of Animal Sciences, Donald Henry Barron Reproductive and Perinatal Biology Research Program, and the Genetics Institute, University of Florida, Gainesville, FL 32611-0910, USA; Department of Animal Science, North Carolina State University, Raleigh, NC 27695-7621, USA.
- Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Congjun Li
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Ransom L Baldwin
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Curtis P Van Tassell
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
- Timothy P L Smith
- USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
- George E Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
|
12
|
Vrček L, Bresson X, Laurent T, Schmitz M, Kawaguchi K, Šikić M. Geometric deep learning framework for de novo genome assembly. Genome Res 2025; 35:839-849. [PMID: 39472021 PMCID: PMC12047240 DOI: 10.1101/gr.279307.124] [Received: 03/11/2024] [Accepted: 10/18/2024] [Indexed: 03/16/2025]
Abstract
The critical stage of every de novo genome assembler is identifying paths in assembly graphs that correspond to the reconstructed genomic sequences. The existing algorithmic methods struggle with this, primarily due to repetitive regions causing complex graph tangles, leading to fragmented assemblies. Here, we introduce GNNome, a framework for path identification based on geometric deep learning that enables training models on assembly graphs without relying on existing assembly strategies. By leveraging only the symmetries inherent to the problem, GNNome reconstructs assemblies from PacBio HiFi reads with contiguity and quality comparable to those of the state-of-the-art tools across several species. With every new genome assembled telomere-to-telomere, the amount of reliable training data at our disposal increases. Combining the straightforward generation of abundant simulated data for diverse genomic structures with the AI approach makes the proposed framework a plausible cornerstone for future work on reconstructing complex genomes with different degrees of ploidy and aneuploidy. To facilitate such developments, we make the framework and the best-performing model publicly available, provided as a tool that can directly be used to assemble new haploid genomes.
Affiliation(s)
- Lovro Vrček
- Genome Institute of Singapore, A*STAR, Singapore 138672;
- Faculty of Electrical Engineering and Computing, University of Zagreb, 10000, Zagreb, Croatia
- Xavier Bresson
- School of Computing, National University of Singapore, Singapore 117417
- Thomas Laurent
- Department of Mathematics, Loyola Marymount University, Los Angeles, California 90045, USA
- Martin Schmitz
- Genome Institute of Singapore, A*STAR, Singapore 138672
- School of Computing, National University of Singapore, Singapore 117417
- Kenji Kawaguchi
- School of Computing, National University of Singapore, Singapore 117417
- Mile Šikić
- Genome Institute of Singapore, A*STAR, Singapore 138672;
- Faculty of Electrical Engineering and Computing, University of Zagreb, 10000, Zagreb, Croatia
|
13
|
Zhang Y, Hulsman M, Salazar A, Tesi N, Knoop L, van der Lee S, Wijesekera S, Krizova J, Kamsteeg EJ, Holstege H. Multisample motif discovery and visualization for tandem repeats. Genome Res 2025; 35:850-862. [PMID: 39537359 PMCID: PMC12047238 DOI: 10.1101/gr.279278.124] [Received: 03/08/2024] [Accepted: 10/31/2024] [Indexed: 11/16/2024]
Abstract
Tandem repeats (TRs) occupy a significant portion of the human genome and are a source of polymorphisms due to variations in sizes and motif compositions. Some of these variations have been associated with various neuropathological disorders, highlighting the clinical importance of assessing the motif structure of TRs. Moreover, assessing the TR motif variation can offer valuable insights into evolutionary dynamics and population structure. Previously, characterizations of TRs were limited by short-read sequencing technology, which lacks the ability to accurately capture the full TR sequences. As long-read sequencing becomes more accessible and can capture the full complexity of TRs, there is now also a need for tools to characterize and analyze TRs using long-read data across multiple samples. In this study, we present MotifScope, a novel algorithm for the characterization and visualization of TRs based on a de novo k-mer approach for motif discovery. Comparative analysis against established tools reveals that MotifScope can identify a greater number of motifs and more accurately represent the underlying repeat sequences. Moreover, MotifScope has been specifically designed to enable motif composition comparisons across assemblies of different individuals, as well as across long-read sequencing reads within an individual, through combined motif discovery and sequence alignment. We showcase potential applications of MotifScope in diverse fields, including population genetics, clinical settings, and forensic analyses.
Affiliation(s)
- Yaran Zhang
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Marc Hulsman
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
- Alex Salazar
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Niccolò Tesi
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
- Lydian Knoop
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Sven van der Lee
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
- Amsterdam Neuroscience, Neurodegeneration, 1081HV Amsterdam, The Netherlands
- Sanduni Wijesekera
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Jana Krizova
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands
- Erik-Jan Kamsteeg
- Department of Human Genetics, Radboud University Medical Center, 6525GA Nijmegen, The Netherlands
- Henne Holstege
- Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands;
- Delft Bioinformatics Lab, Delft University of Technology, 2628CD Delft, The Netherlands
- Amsterdam Neuroscience, Neurodegeneration, 1081HV Amsterdam, The Netherlands
- Alzheimer Center Amsterdam, Neurology, Vrije Universiteit Amsterdam, Amsterdam UMC location VUmc, 1081HV Amsterdam, The Netherlands
|
14
|
Azam S, Sahu A, Pandey NK, Neupane M, Van Tassell CP, Rosen BD, Gandham RK, Rath SN, Majumdar SS. Constructing a draft Indian cattle pangenome using short-read sequencing. Commun Biol 2025; 8:605. [PMID: 40223124 PMCID: PMC11994783 DOI: 10.1038/s42003-025-07978-0] [Received: 09/20/2024] [Accepted: 03/21/2025] [Indexed: 04/15/2025] Open
Abstract
Indian desi cattle, known for their adaptability and phenotypic diversity, represent a valuable genetic resource. However, a single reference genome often fails to capture the full extent of their genetic variation. To address this, we construct a pangenome for desi cattle by identifying and characterizing non-reference novel sequences (NRNS). We sequence 68 genomes from seven breeds, generating 48.35 billion short reads. Using the PanGenome Analysis (PanGA) pipeline, we identify 13,065 NRNS (~41 Mbp), with substantial variation across the population. Most NRNS were unique to desi cattle, with minimal overlap (4.1%) with the Chinese indicine pangenome. Approximately 40% of NRNS exhibited ancestral origins within the Bos genus and were enriched in genic regions, suggesting functional roles. These sequences are linked to quantitative trait loci for traits such as milk production. The pangenome approach enhances read mapping accuracy, reduces spurious single nucleotide polymorphism calls, and uncovers novel genetic variants, offering a deeper understanding of desi cattle genomics.
Affiliation(s)
- Sarwar Azam
- National Institute of Animal Biotechnology, Hyderabad, India
- Indian Institute of Technology Hyderabad, Sangareddy, India
- Abhisek Sahu
- National Institute of Animal Biotechnology, Hyderabad, India
- Mahesh Neupane
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
- Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA.
|
15
|
Ma W, Chaisson M. Genotyping sequence-resolved copy number variation using pangenomes reveals paralog-specific global diversity and expression divergence of duplicated genes. bioRxiv 2025:2024.08.11.607269. [PMID: 39149335 PMCID: PMC11326217 DOI: 10.1101/2024.08.11.607269] [Indexed: 08/17/2024]
Abstract
Copy number variant (CNV) genes are important in evolution and disease, yet sequence variation in CNV genes remains a blind spot in large-scale studies. We present ctyper, a method that leverages pangenomes to produce allele-specific copy numbers with locally phased variants from next-generation sequencing (NGS) reads. Benchmarking on 3,351 CNV genes, including HLA, SMN, and CYP2D6, and 212 challenging medically relevant (CMR) genes that are poorly mapped by NGS, ctyper captures 96.5% of phased variants with ≥99.1% correctness of copy number on CNV genes and 94.8% of phased variants on CMR genes. Applying alignment-free algorithms, ctyper requires 1.5 hours per genome on a single CPU. The results improve prediction of gene expression compared to known expression quantitative trait loci (eQTL) variants. Allele-specific expression quantified divergent expression on 7.94% of paralogs and tissue-specific biases on 4.68% of paralogs. We found reduced expression of SMN-2 due to SMN1 conversion, potentially affecting spinal muscular atrophy, and increased expression of translocated duplications of AMY2B. Overall, ctyper enables biobank-scale genotyping of CNV and CMR genes.
|
16
|
Adam CL, Rocha J, Sudmant P, Rohlfs R. TRACKing tandem repeats: a customizable pipeline for identification and cross-species comparison. Bioinform Adv 2025; 5:vbaf066. [PMID: 40351869 PMCID: PMC12064168 DOI: 10.1093/bioadv/vbaf066] [Received: 12/12/2024] [Revised: 03/14/2025] [Accepted: 04/07/2025] [Indexed: 05/14/2025]
Abstract
Summary: TRACK is a user-friendly Snakemake workflow designed to streamline the discovery and comparison of tandem repeats (TRs) across species. TRACK facilitates the cataloging and filtering of TRs based on reference genomes or T2T transcripts, and applies reciprocal LiftOver and sequence alignment methods to identify putative homologous TRs between species. For further analyses, TRACK can be used to genotype TRs and subsequently estimate and plot basic population genetic statistics. By incorporating key functionalities within an integrated workflow, TRACK enhances TR analysis accessibility and reproducibility, while offering flexibility for the user. Availability and implementation: The TRACK toolkit with step-by-step tutorial is freely available at https://github.com/caroladam/track.
Affiliation(s)
- Carolina L Adam
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, United States
- Joana Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, United States
- Peter Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, United States
- Rori Rohlfs
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, United States
- School of Computer and Data Sciences, University of Oregon, Eugene, OR 97403, United States
|
17
|
Fang L, Teng J, Lin Q, Bai Z, Liu S, Guan D, Li B, Gao Y, Hou Y, Gong M, Pan Z, Yu Y, Clark EL, Smith J, Rawlik K, Xiang R, Chamberlain AJ, Goddard ME, Littlejohn M, Larson G, MacHugh DE, O'Grady JF, Sørensen P, Sahana G, Lund MS, Jiang Z, Pan X, Gong W, Zhang H, He X, Zhang Y, Gao N, He J, Yi G, Liu Y, Tang Z, Zhao P, Zhou Y, Fu L, Wang X, Hao D, Liu L, Chen S, Young RS, Shen X, Xia C, Cheng H, Ma L, Cole JB, Baldwin RL, Li CJ, Van Tassell CP, Rosen BD, Bhowmik N, Lunney J, Liu W, Guan L, Zhao X, Ibeagha-Awemu EM, Luo Y, Lin L, Canela-Xandri O, Derks MFL, Crooijmans RPMA, Gòdia M, Madsen O, Groenen MAM, Koltes JE, Tuggle CK, McCarthy FM, Rocha D, Giuffra E, Amills M, Clop A, Ballester M, Tosser-Klopp G, Li J, Fang C, Fang M, Wang Q, Hou Z, Wang Q, Zhao F, Jiang L, Zhao G, Zhou Z, Zhou R, Liu H, Deng J, Jin L, Li M, Mo D, Liu X, Chen Y, Yuan X, Li J, Zhao S, Zhang Y, Ding X, Sun D, Sun HZ, Li C, Wang Y, Jiang Y, Wu D, Wang W, Fan X, Zhang Q, Li K, Zhang H, Yang N, Hu X, Huang W, Song J, Wu Y, Yang J, Wu W, Kasper C, Liu X, Yu X, Cui L, Zhou X, Kim S, Li W, Im HK, Buckler ES, Ren B, Schatz MC, Li JJ, Palmer AA, Frantz L, Zhou H, Zhang Z, Liu GE. The Farm Animal Genotype-Tissue Expression (FarmGTEx) Project. Nat Genet 2025; 57:786-796. [PMID: 40097783 DOI: 10.1038/s41588-025-02121-5] [Received: 05/21/2024] [Accepted: 02/06/2025] [Indexed: 03/19/2025]
Abstract
Genetic mutation and drift, coupled with natural and human-mediated selection and migration, have produced a wide variety of genotypes and phenotypes in farmed animals. We here introduce the Farm Animal Genotype-Tissue Expression (FarmGTEx) Project, which aims to elucidate the genetic determinants of gene expression across 16 terrestrial and aquatic domestic species under diverse biological and environmental contexts. For each species, we aim to collect multiomics data, particularly genomics and transcriptomics, from 50 tissues of 1,000 healthy adults and 200 additional animals representing a specific context. This Perspective provides an overview of the priorities of FarmGTEx and advocates for coordinated strategies of data analysis and resource-sharing initiatives. FarmGTEx aims to serve as a platform for investigating context-specific regulatory effects, which will deepen our understanding of molecular mechanisms underlying complex phenotypes. The knowledge and insights provided by FarmGTEx will contribute to improving sustainable agriculture-based food systems, comparative biology and eventual human biomedicine.
Affiliation(s)
- Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark.
- Jinyan Teng
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Qing Lin
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhonghao Bai
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Shuli Liu
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
- School of Life Sciences, Westlake University, Hangzhou, China
- Dailu Guan
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Bingjie Li
- Department of Animal and Veterinary Sciences, Scotland's Rural College, Midlothian, UK
- Yahui Gao
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Yali Hou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Mian Gong
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhangyuan Pan
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Ying Yu
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Emily L Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, UK
- Jacqueline Smith
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, UK
- Konrad Rawlik
- Baillie Gifford Pandemic Science Hub, Centre for Inflammation Research, Institute for Regeneration and Repair, the University of Edinburgh, Edinburgh, UK
- Ruidong Xiang
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Agriculture, Food and Ecosystem Sciences, the University of Melbourne, Parkville, Victoria, Australia
- Amanda J Chamberlain
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, Australia
- Michael E Goddard
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
- School of Agriculture, Food and Ecosystem Sciences, the University of Melbourne, Parkville, Victoria, Australia
- Mathew Littlejohn
- Research and Development, Livestock Improvement Corporation, Hamilton, New Zealand
- AL Rae Centre for Genetics and Breeding, Massey University, Palmerston North, New Zealand
- Greger Larson
- The Palaeogenomics and Bio-Archaeology Research Network, School of Archaeology, University of Oxford, Oxford, UK
- David E MacHugh
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Dublin, Ireland
- UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin, Ireland
- UCD One Health Centre, University College Dublin, Belfield, Dublin, Ireland
- John F O'Grady
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Dublin, Ireland
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Zhihua Jiang
- Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA, USA
- Xiangchun Pan
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Wentao Gong
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Haihan Zhang
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Xi He
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Yuebo Zhang
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Ning Gao
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Jun He
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Guoqiang Yi
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Zhonglin Tang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Pengju Zhao
- Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya, China
- Yang Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, China
- Yazhouwan National Laboratory, Sanya, China
- Liangliang Fu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, China
- Xiao Wang
- Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Dan Hao
- Poultry Institute, Shandong Academy of Agricultural Sciences, Jinan, China
- Lei Liu
- Yazhouwan National Laboratory, Sanya, China
- Siqian Chen
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Robert S Young
- Usher Institute, University of Edinburgh, Edinburgh, UK
- Zhejiang University-University of Edinburgh Institute, Zhejiang University, Haining, P. R. China
- Xia Shen
- Usher Institute, University of Edinburgh, Edinburgh, UK
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
- Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), Fudan University, Guangzhou, China
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Charley Xia
- Lothian Birth Cohort studies, University of Edinburgh, Edinburgh, UK
- Department of Psychology, University of Edinburgh, Edinburgh, UK
| | - Hao Cheng
- Department of Animal Science, University of California, Davis, Davis, CA, USA
| | - Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
| | - John B Cole
- Council on Dairy Cattle Breeding, Bowie, MD, USA
- Department of Animal Sciences, Donald Henry Barron Reproductive and Perinatal Biology Research Program and the Genetics Institute, University of Florida, Gainesville, FL, USA
- Department of Animal Science, North Carolina State University, Raleigh, NC, USA
| | - Ransom L Baldwin
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
| | - Cong-Jun Li
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
| | - Curtis P Van Tassell
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
| | - Nayan Bhowmik
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
| | - Joan Lunney
- Animal Parasitic Diseases Laboratory, BARC, NEA, ARS, USDA, Beltsville, MD, USA
| | - Wansheng Liu
- Department of Animal Science, Center for Reproductive Biology and Health, College of Agricultural Sciences, the Pennsylvania State University, University Park, PA, USA
| | - Leluo Guan
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada
- Faculty of Land and Food Systems, University of British Columbia, Vancouver, British Columbia, Canada
| | - Xin Zhao
- Department of Animal Science, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
| | - Eveline M Ibeagha-Awemu
- Sherbrooke Research and Development Centre, Agriculture and Agri-Food Canada, Sherbrooke, Quebec, Canada
| | - Yonglun Luo
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
| | - Lin Lin
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
| | - Oriol Canela-Xandri
- MRC Human Genetics Unit at the Institute of Genetics and Cancer, the University of Edinburgh, Edinburgh, UK
| | - Martijn F L Derks
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
| | | | - Marta Gòdia
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
| | - Ole Madsen
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
| | - Martien A M Groenen
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
| | - James E Koltes
- Department of Animal Science, Iowa State University, Ames, IA, USA
| | | | | | - Dominique Rocha
- GABI, AgroParisTech, INRAE, Paris-Saclay University, Jouy-en-Josas, France
| | - Elisabetta Giuffra
- GABI, AgroParisTech, INRAE, Paris-Saclay University, Jouy-en-Josas, France
| | - Marcel Amills
- Department of Animal Genetics, Centre for Research in Agricultural Genomics, CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, Bellaterra, Spain
- Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, Bellaterra, Spain
| | - Alex Clop
- Department of Animal Genetics, Centre for Research in Agricultural Genomics, CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, Bellaterra, Spain
- Consejo Superior de Investigaciones Científicas, Barcelona, Spain
| | - Maria Ballester
- Animal Breeding and Genetics Programme, Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Torre Marimon, Caldes de Montbui, Spain
| | | | - Jing Li
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- School of Agriculture and Life Sciences, Kunming University, Kunming, China
| | - Chao Fang
- LC-Bio Technologies, Co., Ltd, Hangzhou, China
| | - Ming Fang
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Jimei University, Xiamen, China
| | - Qishan Wang
- College of Animal Sciences, Zhejiang University, Hangzhou, China
| | - Zhuocheng Hou
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Qin Wang
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Fuping Zhao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lin Jiang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Guiping Zhao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Zhengkui Zhou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Rong Zhou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Hehe Liu
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, China
| | - Juan Deng
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, China
| | - Long Jin
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, China
| | - Mingzhou Li
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, China
| | - Delin Mo
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Xiaohong Liu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Yaosheng Chen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Xiaolong Yuan
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Jiaqi Li
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, China
- Yazhouwan National Laboratory, Sanya, China
| | - Yi Zhang
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Xiangdong Ding
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Dongxiao Sun
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Hui-Zeng Sun
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Sciences, Zhejiang University, Hangzhou, China
| | - Cong Li
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Yu Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Yu Jiang
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Dongdong Wu
- Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
| | - Wenwen Wang
- Shandong Provincial Key Laboratory for Livestock Germplasm Innovation and Utilization, College of Animal Science, Shandong Agricultural University, Tai'an, China
| | - Xinzhong Fan
- Shandong Provincial Key Laboratory for Livestock Germplasm Innovation and Utilization, College of Animal Science, Shandong Agricultural University, Tai'an, China
| | - Qin Zhang
- Shandong Provincial Key Laboratory for Livestock Germplasm Innovation and Utilization, College of Animal Science, Shandong Agricultural University, Tai'an, China
| | - Kui Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Hao Zhang
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Ning Yang
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Xiaoxiang Hu
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Wen Huang
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
| | - Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
| | - Yang Wu
- Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, China
| | - Jian Yang
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
- School of Life Sciences, Westlake University, Hangzhou, China
| | - Weiwei Wu
- Institute of Animal Science, Xinjiang Academy of Animal Science, Ürümqi City, China
| | - Claudia Kasper
- Animal GenoPhenomics, Animal Production Systems and Animal Health, Agroscope Posieux, Fribourg, Switzerland
| | - Xinfeng Liu
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
| | - Xiaofei Yu
- College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Leilei Cui
- School of Life Sciences, Nanchang University, Nanchang, China
- Jiangxi Province Key Laboratory of Aging and Disease, Human Aging Research Institute and School of Life Science, Nanchang University, Jiangxi, China
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Seyoung Kim
- Department of Epidemiology, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Wei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
| | - Hae Kyung Im
- Department of Medicine and Human Genetics, the University of Chicago, Chicago, IL, USA
| | - Edward S Buckler
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY, USA
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA
- Agricultural Research Service, United States Department of Agriculture, Ithaca, NY, USA
| | - Bing Ren
- Department of Cellular and Molecular Medicine, Center for Epigenomics, Moores Cancer Center and Institute of Genomic Medicine, University of California San Diego, School of Medicine, La Jolla, CA, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA.
| | - Abraham A Palmer
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA.
| | - Laurent Frantz
- Palaeogenomics Group, Institute of Palaeoanatomy, Domestication Research and the History of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany.
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK.
| | - Huaijun Zhou
- Department of Animal Science, University of California, Davis, Davis, CA, USA.
| | - Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China.
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA.
|
18
|
Kovaka S, Hook PW, Jenike KM, Shivakumar V, Morina LB, Razaghi R, Timp W, Schatz MC. Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment. Nat Methods 2025; 22:681-691. [PMID: 40155722 PMCID: PMC11978507 DOI: 10.1038/s41592-025-02631-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/15/2024] [Accepted: 02/16/2025] [Indexed: 04/01/2025]
Abstract
Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic or transcriptomic and epigenetic information without additional library preparation. At present, only a limited set of modifications can be directly basecalled (for example, 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in Oxford Nanopore Technologies' state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open source at github.com/skovaka/uncalled4.
Affiliation(s)
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Vikram Shivakumar
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Luke B Morina
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Roham Razaghi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
|
19
|
Javadzadeh S, Adamson A, Park J, Jo SY, Ding YC, Bakhtiari M, Bansal V, Neuhausen SL, Bafna V. Analysis of targeted and whole genome sequencing of PacBio HiFi reads for a comprehensive genotyping of gene-proximal and phenotype-associated Variable Number Tandem Repeats. PLoS Comput Biol 2025; 21:e1012885. [PMID: 40193344 PMCID: PMC11975116 DOI: 10.1371/journal.pcbi.1012885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Accepted: 02/17/2025] [Indexed: 04/09/2025] Open
Abstract
Variable Number Tandem Repeats (VNTRs) refer to repeating motifs of size greater than five bp. VNTRs are an important source of genetic variation, and have been associated with multiple Mendelian and complex phenotypes. However, the highly repetitive structures require reads to span the region for accurate genotyping. Pacific Biosciences HiFi sequencing spans large regions and is highly accurate but relatively expensive. Therefore, targeted sequencing approaches coupled with long-read sequencing have been proposed to improve efficiency and throughput. In this paper, we systematically explored the trade-off between targeted and whole genome HiFi sequencing for genotyping VNTRs. We curated a set of 10,787 gene-proximal (G-)VNTRs, and 48 phenotype-associated (P-)VNTRs of interest. Illumina reads only spanned 46% of the G-VNTRs and 71% of P-VNTRs, motivating the use of HiFi sequencing. We performed targeted sequencing with hybridization by designing custom probes for 9,999 VNTRs and sequenced 8 samples using HiFi and Illumina sequencing, followed by adVNTR genotyping. We compared these results against HiFi whole genome sequencing (WGS) data from 28 samples in the Human Pangenome Reference Consortium (HPRC). With the targeted approach only 4,091 (41%) G-VNTRs and only 4 (8%) of P-VNTRs were spanned with at least 15 reads. A smaller subset of 3,579 (36%) G-VNTRs had higher median coverage of at least 63 spanning reads. The spanning behavior was consistent across all 8 samples. Among 5,638 VNTRs with low coverage (<15), 67% were located within GC-rich regions (>60%). In contrast, the 40X WGS HiFi dataset spanned 98% of all VNTRs and 49 (98%) of P-VNTRs with at least 15 spanning reads, albeit with lower coverage. Spanning reads were sufficient for accurate genotyping in both cases. Our findings demonstrate that targeted sequencing provides consistently high coverage for a small subset of low-GC VNTRs, but WGS is more effective for broad and sufficient sampling of a large number of VNTRs.
Affiliation(s)
- Sara Javadzadeh
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
| | - Aaron Adamson
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, United States of America
| | - Jonghun Park
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
| | - Se-Young Jo
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, South Korea
| | - Yuan-Chun Ding
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, United States of America
| | - Mehrdad Bakhtiari
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
| | - Vikas Bansal
- School of Medicine, University of California San Diego, La Jolla, California, United States of America
| | - Susan L. Neuhausen
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, United States of America
| | - Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
|
20
|
Jiao D, Dong X, Fan S, Liu X, Yu Y, Wei C. Gastric cancer genomics study using reference human pangenomes. Life Sci Alliance 2025; 8:e202402977. [PMID: 39870503 PMCID: PMC11772497 DOI: 10.26508/lsa.202402977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 01/16/2025] [Accepted: 01/16/2025] [Indexed: 01/29/2025] Open
Abstract
A pangenome is the sum of the genetic information of all individuals in a species or a population. Genomics research has gradually shifted to a paradigm that uses a pangenome as the reference. However, in disease genomics studies, pangenome-based analysis is still in its infancy. In this study, we introduced a graph-based pangenome, GGCPan, built from 185 patients with gastric cancer. We then systematically compared cancer genomics results obtained using GGCPan, a linear pangenome (GCPan), and the human reference genome as the reference. For small variant detection and microsatellite instability status identification, there was little difference among the three references. Using GGCPan as the reference had a significant advantage in structural variant identification. A total of 24 candidate gastric cancer driver genes were detected using the three reference genomes, of which eight were common to all and five were detected only with the pangenomes. Our results show that a disease-specific pangenome is promising as a reference, and that a full set of tools still needs to be developed or improved for disease genomics studies in the pangenome era.
Affiliation(s)
- Du Jiao
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaorui Dong
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Shiyu Fan
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Xinyi Liu
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Yingyan Yu
- Department of General Surgery of Ruijin Hospital, Shanghai Institute of Digestive Surgery, and Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Chaochun Wei
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
|
21
|
Wang J, Peng S. Exploring the association between 91 circulating inflammatory proteins and the risk of carcinoid syndrome: a Mendelian randomization analysis. Discov Oncol 2025; 16:434. [PMID: 40163170 PMCID: PMC11958927 DOI: 10.1007/s12672-025-02147-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2024] [Accepted: 03/13/2025] [Indexed: 04/02/2025] Open
Abstract
This study aims to explore the potential correlation between circulating inflammatory proteins and carcinoid syndrome (CS). Using summary data from genome-wide association studies (GWAS), we conducted a two-sample Mendelian randomization (MR) analysis, treating 91 circulating inflammatory proteins as exposure factors and CS as the outcome. With genetic loci closely associated with circulating inflammatory proteins selected as instrumental variables, we primarily employed the inverse-variance weighted (IVW) method, combined with the weighted median method (WM), simple median method (SM), weighted mode estimation (WME), and MR-Egger regression for comprehensive analysis. Initial IVW results revealed significant causal effects of six circulating inflammatory proteins on CS. Specifically, Interleukin-17C levels were negatively correlated with CS risk, indicating a protective effect, whereas beta-nerve growth factor, C-C motif chemokine 20, Natural killer cell receptor 2B4, C-X-C motif chemokine 5, and Leukemia inhibitory factor levels were positively correlated with CS risk, suggesting detrimental effects. In heterogeneity tests, the selected single-nucleotide polymorphisms (SNPs) did not show heterogeneity, and the Egger intercept and MR-PRESSO tests did not detect pleiotropy of SNPs, supporting the reliability of the study. Furthermore, leave-one-out sensitivity analysis confirmed the robustness of the results. In summary, this MR analysis identified significant causal relationships between CS risk and six inflammatory proteins (Interleukin-17C, beta-nerve growth factor, C-C motif chemokine 20, Natural killer cell receptor 2B4, C-X-C motif chemokine 5, and Leukemia inhibitory factor). This finding not only emphasizes the important role of inflammation in the pathogenesis of CS but also suggests the potential value of inflammatory proteins as targets for early diagnosis and therapeutic interventions.
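For readers unfamiliar with the IVW method named in this abstract, the following is a minimal sketch of the standard fixed-effect inverse-variance weighted estimator computed from per-SNP summary statistics. It is not the authors' code; the function name, argument names, and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect.

    Each SNP contributes a Wald ratio (outcome effect / exposure effect),
    weighted by beta_exposure**2 / se_outcome**2 (first-order weights).
    Returns (estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one instrument is required")

    # Per-SNP Wald ratios and their inverse-variance weights.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]

    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error
```

With two hypothetical instruments whose Wald ratios are both 0.5, for example `ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.02])`, the pooled estimate is also 0.5, because IVW is a weighted average of the per-SNP ratios.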
Affiliation(s)
- Jingzhi Wang
- Department of Radiotherapy Oncology, The Affiliated Yancheng First Hospital of Nanjing University Medical School, the First People's Hospital of Yancheng, Yancheng, China
| | - Simin Peng
- Department of Pulmonary Diseases, Shenzhen Hospital of Integrated Traditional Chinese and Western Medicine, No. 528, Xinsha Road, Shajing Street, Shajing Subdistrict, Bao'an District, Shenzhen City, 518104, Guangdong Province, China.
|
22
|
Hiatt L, Weisburd B, Dolzhenko E, Rubinetti V, Avvaru AK, VanNoy GE, Kurtas NE, Rehm HL, Quinlan AR, Dashnow H. STRchive: a dynamic resource detailing population-level and locus-specific insights at tandem repeat disease loci. Genome Med 2025; 17:29. [PMID: 40140942 PMCID: PMC11938676 DOI: 10.1186/s13073-025-01454-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Accepted: 03/11/2025] [Indexed: 03/28/2025] Open
Abstract
Approximately 8% of the human genome consists of repetitive elements called tandem repeats (TRs): short tandem repeats (STRs) of 1-6 bp motifs and variable number tandem repeats (VNTRs) of 7+ bp motifs. TR variants contribute to several dozen monogenic diseases but remain understudied and enigmatic. It remains comparatively challenging to interpret the clinical significance of TR variants, particularly relative to single nucleotide variants. We present STRchive (http://strchive.org/), a dynamic resource consolidating information on TR disease loci from the research literature, up-to-date clinical resources, and large-scale genomic databases, streamlining TR variant interpretation at disease-associated loci.
Affiliation(s)
- Laurel Hiatt
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Vincent Rubinetti
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Akshay K Avvaru
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Grace E VanNoy
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ambry Genetics, Aliso Viejo, CA, USA
| | - Nehir Edibe Kurtas
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
|
23
|
Miao J, Wang Q, Zhang Z, Wang Q, Pan Y, Wang Z. Pangenome graph mitigates heterozygosity overestimation from mapping bias: a case study in Chinese indigenous pigs. BMC Biol 2025; 23:89. [PMID: 40140905 PMCID: PMC11948684 DOI: 10.1186/s12915-025-02194-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2024] [Accepted: 03/18/2025] [Indexed: 03/28/2025] Open
Abstract
BACKGROUND: Breeds genetically distant from the reference genome often show considerable differences in DNA fragments, making it difficult to achieve accurate mappings. The genetic differences between the pig reference genome (Sscrofa11.1) and Chinese indigenous pigs may lead to mapping bias and affect subsequent analyses. RESULTS: Our analysis revealed that the pangenome exhibited superior mapping accuracy compared with Sscrofa11.1, reducing false-positive mappings by 1.4% and erroneous mappings by 0.8%. Furthermore, the pangenome yielded more accurate SNP genotypes (F1: 0.9660 vs. 0.9607) and INDEL genotypes (F1: 0.9226 vs. 0.9222) than Sscrofa11.1. In real sequencing data, the inconsistent SNPs called from the pangenome exhibited lower genome heterozygosity, in terms of both observed heterozygosity and nucleotide diversity, than those identified with Sscrofa11.1. The same reduction of heterozygosity overestimation was also found in the chicken pangenome. CONCLUSIONS: This study quantifies the mapping bias of Sscrofa11.1 in Chinese indigenous pigs, demonstrating that mapping bias can lead to an overestimation of heterozygosity in Chinese indigenous pig breeds. The adoption of a pig pangenome mitigates this bias and provides a more accurate representation of genetic diversity in these populations.
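The F1 values quoted in this abstract are harmonic means of precision and recall. As a hedged illustration of how such a score is computed from genotype-call counts (this is not the authors' evaluation pipeline, and the function name and the counts in the usage note are hypothetical):

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score, the harmonic mean of precision and recall.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    Returns 0.0 when there are no true positives, where the score is
    conventionally taken to be zero.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For instance, 90 correct calls with 10 false positives and 10 missed variants give precision and recall of 0.9 each, so `f1_score(90, 10, 10)` is 0.9.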
Affiliation(s)
- Jian Miao
  - College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Qingyu Wang
  - College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Zhe Zhang
  - College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Qishan Wang
  - College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Hainan Institute of Zhejiang University, Yazhou Bay Science and Technology City, Building 11, Yongyou Industrial Park, Yazhou District, Sanya, Hainan, 572025, China
- Yuchun Pan
  - College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China.
  - Hainan Institute of Zhejiang University, Yazhou Bay Science and Technology City, Building 11, Yongyou Industrial Park, Yazhou District, Sanya, Hainan, 572025, China.
- Zhen Wang
  - College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China.
|
24
|
Sharma J, Jangale V, Shekhawat RS, Yadav P. Improving genetic variant identification for quantitative traits using ensemble learning-based approaches. BMC Genomics 2025; 26:237. [PMID: 40075256 PMCID: PMC11899862 DOI: 10.1186/s12864-025-11443-x] [Received: 10/16/2024] [Accepted: 03/04/2025] [Indexed: 03/14/2025] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) are rapidly advancing due to the improved resolution and completeness provided by Telomere-to-Telomere (T2T) and pangenome assemblies. While recent advancements in GWAS methods have primarily focused on identifying genetic variants associated with discrete phenotypes, approaches for quantitative traits (QTs) remain underdeveloped. This has often led to significant variants being overlooked due to biases from genotype multicollinearity and strict p-value thresholds. RESULTS We propose an enhanced ensemble learning approach for QT analysis that integrates regularized variant selection with machine learning-based association methods, validated through comprehensive biological enrichment analysis. We benchmarked four widely recognized single nucleotide polymorphism (SNP) feature selection methods (least absolute shrinkage and selection operator, ridge regression, elastic-net, and mutual information) alongside four association methods: linear regression, random forest, support vector regression (SVR), and XGBoost. Our approach is evaluated on simulated datasets and validated using a subset of the PennCATH real dataset, including imputed versions, focusing on low-density lipoprotein (LDL)-cholesterol levels as a QT. The combination of elastic-net with SVR outperformed other methods across all datasets. Functional annotation of top 100 SNPs identified through this superior ensemble method revealed their expression in tissues involved in LDL cholesterol regulation. We also confirmed the involvement of six known genes (APOB, TRAPPC9, RAB2A, CCL24, FCHO2, and EEPD1) in cholesterol-related pathways and identified potential drug targets, including APOB, PTK2B, and PTPN12. CONCLUSIONS In conclusion, our ensemble learning approach effectively identifies variants associated with QTs, and we expect its performance to improve further with the integration of T2T and pangenome references in future GWAS.
Affiliation(s)
- Jyoti Sharma
  - Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India
- Vaishnavi Jangale
  - Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India
- Rajveer Singh Shekhawat
  - Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India
- Pankaj Yadav
  - Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India.
  - School of Artificial Intelligence and Data Science, Indian Institute of Technology, Jodhpur, 342030, Rajasthan, India.
|
25
|
Smith LA, Cahill JA, Lee JH, Graim K. Equitable machine learning counteracts ancestral bias in precision medicine. Nat Commun 2025; 16:2144. [PMID: 40064867 PMCID: PMC11894161 DOI: 10.1038/s41467-025-57216-8] [Received: 02/12/2024] [Accepted: 02/05/2025] [Indexed: 03/14/2025] Open
Abstract
Gold standard genomic datasets severely under-represent non-European populations, leading to inequities and a limited understanding of human disease. Therapeutics and outcomes remain hidden because we lack insights that could be gained from analyzing ancestrally diverse genomic data. To address this significant gap, we present PhyloFrame, a machine learning method for equitable genomic precision medicine. PhyloFrame corrects for ancestral bias by integrating functional interaction networks and population genomics data with transcriptomic training data. Application of PhyloFrame to breast, thyroid, and uterine cancers shows marked improvements in predictive power across all ancestries, less model overfitting, and a higher likelihood of identifying known cancer-related genes. Validation in fourteen ancestrally diverse datasets demonstrates that PhyloFrame is better able to adjust for ancestry bias across all populations. The ability to provide accurate predictions for underrepresented groups, in particular, is substantially increased. Analysis of performance in the most diverse continental ancestry group, African, illustrates how phylogenetic distance from training data negatively impacts model performance, as well as PhyloFrame's capacity to mitigate these effects. These results demonstrate how equitable artificial intelligence (AI) approaches can mitigate ancestral bias in training data and contribute to equitable representation in medical research.
Affiliation(s)
- Leslie A Smith
  - Department of Computer & Information Science & Engineering, University of Florida, 1889 Museum Rd, Gainesville, 32611, FL, USA
- James A Cahill
  - Environmental Engineering Sciences Department, University of Florida, 365 Weil Hall, Gainesville, 32611, FL, USA
  - UF Genetics Institute, University of Florida, 2033 Mowry Rd, Gainesville, 32610, FL, USA
- Ji-Hyun Lee
  - Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, 32603, FL, USA
  - UF Health Cancer Center, University of Florida, 2033 Mowry Rd, Gainesville, 32610, FL, USA
- Kiley Graim
  - Department of Computer & Information Science & Engineering, University of Florida, 1889 Museum Rd, Gainesville, 32611, FL, USA.
  - UF Genetics Institute, University of Florida, 2033 Mowry Rd, Gainesville, 32610, FL, USA.
  - UF Health Cancer Center, University of Florida, 2033 Mowry Rd, Gainesville, 32610, FL, USA.
|
26
|
Chen X, Baker D, Dolzhenko E, Devaney JM, Noya J, Berlyoung AS, Brandon R, Hruska KS, Lochovsky L, Kruszka P, Newman S, Farrow E, Thiffault I, Pastinen T, Kasperaviciute D, Gilissen C, Vissers L, Hoischen A, Berger S, Vilain E, Délot E, Eberle MA. Genome-wide profiling of highly similar paralogous genes using HiFi sequencing. Nat Commun 2025; 16:2340. [PMID: 40057485 PMCID: PMC11890787 DOI: 10.1038/s41467-025-57505-2] [Received: 04/24/2024] [Accepted: 02/21/2025] [Indexed: 05/13/2025] Open
Abstract
Variant calling is hindered in segmental duplications by sequence homology. We developed Paraphase, a HiFi-based informatics method that resolves highly similar genes by phasing all haplotypes of paralogous genes together. We applied Paraphase to 160 long (>10 kb) segmental duplication regions across the human genome with high (>99%) sequence similarity, encoding 316 genes. Analysis across five ancestral populations revealed highly variable copy numbers of these regions. We identified 23 paralog groups with exceptionally low within-group diversity, where extensive gene conversion and unequal crossing over contribute to highly similar gene copies. Furthermore, our analysis of 36 trios identified 7 de novo SNVs and 4 de novo gene conversion events, 2 of which are non-allelic. Finally, we summarized extensive genetic diversity in 9 medically relevant genes previously considered challenging to genotype. Paraphase provides a framework for resolving gene paralogs, enabling accurate testing in medically relevant genes and population-wide studies of previously inaccessible genes.
Affiliation(s)
- Emily Farrow
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
  - UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
  - Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO, USA
- Isabelle Thiffault
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
  - UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
  - Department of Pathology and Laboratory Medicine, Children's Mercy Kansas City, Kansas City, MO, USA
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
  - UMKC School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
- Christian Gilissen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
- Lisenka Vissers
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
- Alexander Hoischen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
  - Research Institute for Medical Innovation, Radboud University Medical Center, Nijmegen, The Netherlands
  - Radboud Center for Infectious Diseases (RCI), Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
  - Radboud Expertise Center for Immunodeficiency and Autoinflammation and Radboud Center for Infectious Disease (RCI), Radboud University Medical Center, Nijmegen, The Netherlands
- Seth Berger
  - Center for Genetics Medicine Research, Children's National Hospital, Washington, DC, USA
- Eric Vilain
  - Institute for Clinical and Translational Science, University of California, Irvine, CA, USA
- Emmanuèle Délot
  - Institute for Clinical and Translational Science, University of California, Irvine, CA, USA
|
27
|
Hatchell KE, Poll SR, Russell EM, Williams TJ, Ellsworth RE, Facio FM, Aguilar S, Esplin ED, Popejoy AB, Nussbaum RL, Aradhya S. Experience using conventional compared to ancestry-based population descriptors in clinical genomics laboratories. Am J Hum Genet 2025; 112:481-491. [PMID: 39884281 PMCID: PMC11947177 DOI: 10.1016/j.ajhg.2025.01.008] [Received: 10/04/2024] [Revised: 01/04/2025] [Accepted: 01/06/2025] [Indexed: 02/01/2025] Open
Abstract
Various scientific and professional groups, including the American Medical Association (AMA), American Society of Human Genetics (ASHG), American College of Medical Genetics (ACMG), and the National Academies of Sciences, Engineering, and Medicine (NASEM), have appropriately clarified that certain population descriptors, such as race and ethnicity, are social and cultural constructs with no basis in genetics. Nevertheless, these conventional population descriptors are routinely collected during the course of clinical genetic testing and may be used to interpret test results. Experts who have examined the use of population descriptors, both conventional and ancestry based, in human genetics and genomics have offered guidance on using these descriptors in research but not in clinical laboratory settings. This perspective piece is based on a decade of experience in a clinical genomics laboratory and provides insight into the relevance of conventional and ancestry-based population descriptors for clinical genetic testing, reporting, and clinical research on aggregated data. As clinicians, laboratory geneticists, genetic counselors, and researchers, we describe real-world experiences collecting conventional population descriptors in the course of clinical genetic testing and expose challenges in ensuring clarity and consistency in the use of population descriptors. Current practices in clinical genomics laboratories that are influenced by population descriptors are identified and discussed through case examples. In relation to this, we describe specific types of clinical research projects in which population descriptors were used and helped derive useful insights related to practicing and improving genomic medicine.
Affiliation(s)
- Kathryn E Hatchell
  - Labcorp Genetics, Inc. (formerly Invitae Corp.), San Francisco, CA, USA.
- Sarah R Poll
  - Labcorp Genetics, Inc. (formerly Invitae Corp.), San Francisco, CA, USA
- Emily M Russell
  - Labcorp Genetics, Inc. (formerly Invitae Corp.), San Francisco, CA, USA
- Trevor J Williams
  - Labcorp Genetics, Inc. (formerly Invitae Corp.), San Francisco, CA, USA
- Flavia M Facio
  - Labcorp Genetics, Inc. (formerly Invitae Corp.), San Francisco, CA, USA
- Sienna Aguilar
  - Labcorp Genetics, Inc. (formerly Invitae Corp.), San Francisco, CA, USA
- Edward D Esplin
  - Labcorp Genetics, Inc. (formerly Invitae Corp.), San Francisco, CA, USA
- Alice B Popejoy
  - Department of Public Health Sciences (Epidemiology Division), University of California Davis School of Medicine, Davis, CA, USA; UCDavis Health Comprehensive Cancer Center, University of California Davis Medical Center, Sacramento, CA, USA
- Robert L Nussbaum
  - Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
- Swaroop Aradhya
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
|
28
|
Palma-Martínez MJ, Posadas-García YS, Shaukat A, López-Ángeles BE, Sohail M. Evolution, genetic diversity, and health. Nat Med 2025; 31:751-761. [PMID: 40055519 DOI: 10.1038/s41591-025-03558-1] [Received: 10/07/2024] [Accepted: 02/03/2025] [Indexed: 03/21/2025]
Abstract
Human genetic diversity in today's world has been shaped by evolutionary history, demographic shifts and environmental exposures, influencing complex traits, disease susceptibility and drug responses. Capturing this diversity is essential for advancing precision medicine and promoting equitable healthcare. Despite the great progress achieved with initiatives such as the human Pangenome and large biobanks that aim for a better representation of human diversity, important challenges remain. In this Perspective, we discuss the importance of diversity in clinical genomics through an evolutionary lens. We highlight progress and challenges and outline key clinical applications of diverse genetic data. We argue that diversifying both datasets and methodologies (integrating ancestral and environmental factors) is crucial for fully understanding the genetic basis of human health and disease.
Affiliation(s)
- María J Palma-Martínez
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Amara Shaukat
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Brenda E López-Ángeles
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Mashaal Sohail
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México.
|
29
|
An Z, Jiang A, Chen J. Toward understanding the role of genomic repeat elements in neurodegenerative diseases. Neural Regen Res 2025; 20:646-659. [PMID: 38886931 PMCID: PMC11433896 DOI: 10.4103/nrr.nrr-d-23-01568] [Received: 09/18/2023] [Revised: 12/21/2023] [Accepted: 03/02/2024] [Indexed: 06/20/2024] Open
Abstract
Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, the complex molecular mechanisms thereof are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in search of disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Affiliation(s)
- Zhengyu An
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Aidi Jiang
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Jingqi Chen
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Zhangjiang Fudan International Innovation Center, Shanghai, China
|
30
|
Tian M, Gao Y, Xue C, Jin C, Zhang H. Molecular imaging: The bridge from human phenome to personalized precision medicine. Eur J Nucl Med Mol Imaging 2025; 52:1233-1236. [PMID: 39724182 DOI: 10.1007/s00259-024-07048-3] [Indexed: 12/28/2024]
Affiliation(s)
- Mei Tian
  - Human Phenome Institute, Fudan University, Shanghai, China.
- Yidan Gao
  - Human Phenome Institute, Fudan University, Shanghai, China
- Chenxi Xue
  - Human Phenome Institute, Fudan University, Shanghai, China
- Chentao Jin
  - Department of Nuclear Medicine and PET Center, The Second Affiliated Hospital of Zhejiang University School of Medicine, 88 Jiefang Road, Hangzhou, 310009, Zhejiang, China
  - Institute of Nuclear Medicine and Molecular Imaging of Zhejiang University, Hangzhou, China
  - Key Laboratory of Medical Molecular Imaging of Zhejiang Province, Hangzhou, China
- Hong Zhang
  - Department of Nuclear Medicine and PET Center, The Second Affiliated Hospital of Zhejiang University School of Medicine, 88 Jiefang Road, Hangzhou, 310009, Zhejiang, China.
  - Institute of Nuclear Medicine and Molecular Imaging of Zhejiang University, Hangzhou, China.
  - Key Laboratory of Medical Molecular Imaging of Zhejiang Province, Hangzhou, China.
  - College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou, China.
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, China.
|
31
|
Hwang S, Brown NK, Ahmed OY, Jenike KM, Kovaka S, Schatz MC, Langmead B. MEM-based pangenome indexing for k-mer queries. Algorithms Mol Biol 2025; 20:3. [PMID: 40025556 PMCID: PMC11871630 DOI: 10.1186/s13015-025-00272-y] [Received: 10/30/2024] [Accepted: 02/13/2025] [Indexed: 03/04/2025] Open
Abstract
Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and that count the number of genomes containing k-mers in a window (conservation queries). MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 s, 2.5× faster than other approaches. MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
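The membership and conservation queries defined in this abstract can be made concrete with a naive baseline. The sketch below builds a per-genome k-mer set for one fixed k, which is exactly the limitation the abstract attributes to k-mer indexes; it does not reproduce MEMO's MEM-based index, and all function names are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def membership_query(window, genome, k):
    """For each k-mer start position in the window, report presence in one genome."""
    genome_kmers = kmers(genome, k)
    return [window[i:i + k] in genome_kmers for i in range(len(window) - k + 1)]


def conservation_query(window, genomes, k):
    """For each k-mer start position in the window, count the genomes containing it."""
    per_genome = [kmers(genome, k) for genome in genomes]
    counts = []
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        counts.append(sum(1 for genome_kmers in per_genome if kmer in genome_kmers))
    return counts
```

Because the k-mer sets must be rebuilt for every new k, this baseline illustrates why an index that answers arbitrary-length queries from a single structure is attractive.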
Affiliation(s)
- Stephen Hwang
  - XDBio Program, Johns Hopkins University, Baltimore, MD, USA
- Nathaniel K Brown
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Omar Y Ahmed
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Katharine M Jenike
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Sam Kovaka
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
|
32
|
Sanaullah A, Villalobos S, Zhi D, Zhang S. Haplotype Matching with GBWT for Pangenome Graphs. bioRxiv 2025:2025.02.03.634410. [PMID: 39975036 PMCID: PMC11838520 DOI: 10.1101/2025.02.03.634410] [Indexed: 02/21/2025]
Abstract
Traditionally, variations from a linear reference genome were used to represent large sets of haplotypes compactly. In the linear reference genome based paradigm, the positional Burrows-Wheeler transform (PBWT) has traditionally been used to perform efficient haplotype matching. Pangenome graphs have recently been proposed as an alternative to linear reference genomes for representing the full spectrum of variations in the human genome. However, haplotype matches in pangenome graph based haplotype sets are not trivially generalizable from haplotype matches in the linear reference genome based haplotype sets. Work has been done to represent large sets of haplotypes as paths through a pangenome graph. The graph Burrows-Wheeler transform (GBWT) is one such work. The GBWT essentially stores the haplotype paths in a run length compressed BWT with compressed local alphabets. Although efficient in practice count and locate queries on the GBWT were provided by the original authors, the efficient haplotype matching capabilities of the PBWT have never been shown on the GBWT. In this paper, we formally define the notion of haplotype matches in pangenome graph-based haplotype sets by generalizing from haplotype matches in linear reference genome-based haplotype sets. We also describe the relationship between set maximal matches, long matches, locally maximal matches, and text maximal matches on the GBWT, PBWT, and the BWT. We provide algorithms for outputting some of these matches by applying the data structures of the r-index (introduced by Gagie et al.) to the GBWT. We show that these structures enable set maximal match and long match queries on the GBWT in almost linear time and in space close to linear in the number of runs in the GBWT. We also provide multiple versions of the query algorithms for different combinations of the available data structures. 
The long match query algorithms presented here even run on the BWT in the same time complexity as the GBWT due to their similarity.
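The idea of a run-length compressed BWT mentioned in this abstract can be shown on a plain string. The sketch below is an illustration only: it uses the quadratic sorted-rotations construction on a single text with a `$` sentinel, whereas the GBWT works over node paths in a graph with local alphabets, which is not reproduced here. Function names are hypothetical.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (quadratic; illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse a string into (symbol, run length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs
```

Repetitive inputs, such as many similar haplotypes, tend to produce long runs in the transform, which is why space bounds for such indexes are stated in terms of the number of runs.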
Affiliation(s)
- Ahsan Sanaullah
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Seba Villalobos
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Degui Zhi
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL, USA
|
33
|
He G, Liu C, Wang M. Perspectives and opportunities in forensic human, animal, and plant integrative genomics in the Pangenome era. Forensic Sci Int 2025; 367:112370. [PMID: 39813779 DOI: 10.1016/j.forsciint.2025.112370] [Received: 11/18/2024] [Revised: 12/24/2024] [Accepted: 01/08/2025] [Indexed: 01/18/2025]
Abstract
The Human Pangenome Reference Consortium, the Chinese Pangenome Consortium, and other plant and animal pangenome projects have announced the completion of pilot work aimed at constructing high-quality, haplotype-resolved reference graph genomes representative of global ethno-linguistically different populations or different plant and animal species. These graph-based, gapless pangenome references, which are enriched in terms of genomic diversity, completeness, and contiguity, have the potential for enhancing long-read sequencing (LRS)-based genomic research, as well as improving mappability and variant genotyping on traditional short-read sequencing platforms. We comprehensively discuss the advancements in pangenome-based genomic integrative genomic discoveries across forensic-related species (humans, animals, and plants) and summarize their applications in variant identification and forensic genomics, epigenetics, transcriptomics, and microbiome research. Recent developments in multiplexed array sequencing have introduced a highly efficient and programmable technique to overcome the limitations of short forensic marker lengths in LRS platforms. This technique enables the concatenation of short RNA transcripts and DNA fragments into LRS-optimal molecules for sequencing, assembly, and genotyping. The integration of new pangenome reference coordinates and corresponding computational algorithms will benefit forensic integrative genomics by facilitating new marker identification, accurate genotyping, high-resolution panel development, and the updating of statistical algorithms. This review highlights the necessity of integrating LRS-based platforms, pangenome-based study designs, and graph-based pangenome references in short-read mapping and LRS-based innovations to achieve precision forensic science.
Affiliation(s)
- Guanglin He
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China; Center for Archaeological Science, Sichuan University, Chengdu 610000, China.
- Chao Liu
  - Anti-Drug Technology Center of Guangdong Province, Guangzhou 510230, China.
- Mengge Wang
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China; Center for Archaeological Science, Sichuan University, Chengdu 610000, China; Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing 400331, China.
|
34
|
Ainsworth HC, Baker Frost D, Lim SS, Ramos PS. Breaking research silos to achieve equitable precision medicine in rheumatology. Nat Rev Rheumatol 2025; 21:98-110. [PMID: 39794514 PMCID: PMC11910143 DOI: 10.1038/s41584-024-01204-7] [Accepted: 11/27/2024] [Indexed: 01/13/2025]
Abstract
Health disparities in rheumatic disease are well established and urgently need addressing. Obstacles to precision medicine equity span both the clinical and the research domains, with a focus placed on structural barriers limiting equitable health care access and inclusivity in research. Less articulated factors include the use of inaccurate population descriptors and the existence of research silos in rheumatology research, which creates a knowledge gap that precludes addressing the health disparities and fulfilling the goals of precision medicine to understand the 'full patient'. The biopsychosocial model is a research framework that intertwines layers of biological and environmental effects to understand disease. However, very limited rheumatology research bridges across molecular and epidemiological studies of environmental exposures, such as physical and social determinants of health. In this Review, we discuss clinical obstacles to health care equity, including access to health care and the use of inaccurate language when labelling population groups. We explore the goals and data needed for research under the biopsychosocial model. We describe results from a rheumatic disease literature search that highlights the paucity of studies investigating the molecular influences of systemic exposures. We conclude with a list of considerations and recommendations to help achieve equitable precision medicine.
Affiliation(s)
- Hannah C Ainsworth
  - Department of Biostatistics and Data Science, Division of Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, NC, USA
  - Wake Forest Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- DeAnna Baker Frost
  - Department of Medicine, Division of Rheumatology, Medical University of South Carolina, Charleston, SC, USA
- S Sam Lim
  - Department of Medicine, Division of Rheumatology, Emory University School of Medicine, Atlanta, GA, USA
- Paula S Ramos
  - Department of Medicine, Division of Rheumatology, Medical University of South Carolina, Charleston, SC, USA.
  - Department of Medicine, Division of Rheumatology, Emory University School of Medicine, Atlanta, GA, USA.
|
35
|
LoTempio JE, Moreno JD. Overcoming challenges associated with broad sharing of human genomic data. Nat Genet 2025; 57:287-294. [PMID: 39843657 PMCID: PMC11849138 DOI: 10.1038/s41588-024-02049-2] [Received: 07/14/2024] [Accepted: 12/04/2024] [Indexed: 01/24/2025]
Abstract
Since the Human Genome Project, the consensus position in genomics has been that data should be shared widely to achieve the greatest societal benefit. This position relies on imprecise definitions of the concept of 'broad data sharing'. Accordingly, the implementation of data sharing varies among landmark genomic studies. In this Perspective, we identify definitions of broad that have been used interchangeably, despite their distinct implications. We further offer a framework with clarified concepts for genomic data sharing and probe six examples in genomics that produced public data. Finally, we articulate three challenges. First, we explore the need to reinterpret the limits of general research use data. Second, we consider the governance of public data deposition from extant samples. Third, we ask whether, in light of changing concepts of broad, participants should be encouraged to share their status as participants publicly or not. Each of these challenges is followed with recommendations.
Affiliation(s)
- Jonathan E LoTempio
  - Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jonathan D Moreno
  - Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - David and Lyn Silfen University Professor Emeritus, University of Pennsylvania, Philadelphia, PA, USA
36
Alser M, Eudine J, Mutlu O. Taming large-scale genomic analyses via sparsified genomics. Nat Commun 2025; 16:876. [PMID: 39837860 PMCID: PMC11751491 DOI: 10.1038/s41467-024-55762-1] [Received: 01/19/2023] [Accepted: 12/20/2024] [Indexed: 01/23/2025] Open
Abstract
Searching for similar genomic sequences is an essential and fundamental step in biomedical research. State-of-the-art computational methods performing such comparisons fail to cope with the exponential growth of genomic sequencing data. We introduce the concept of sparsified genomics where we systematically exclude a large number of bases from genomic sequences and enable faster and memory-efficient processing of the sparsified, shorter genomic sequences, while providing comparable accuracy to processing non-sparsified sequences. Sparsified genomics provides benefits to many genomic analyses and has broad applicability. Sparsifying genomic sequences accelerates the state-of-the-art read mapper (minimap2) by 2.57-5.38x, 1.13-2.78x, and 3.52-6.28x using real Illumina, HiFi, and ONT reads, respectively, while providing comparable memory footprint, 2x smaller index size, and more correctly detected variations compared to minimap2. Sparsifying genomic sequences makes containment search through very large genomes and large databases 72.7-75.88x (1.62-1.9x when indexing is preprocessed) faster and 723.3x more storage-efficient than searching through non-sparsified genomic sequences (with CMash and KMC3). Sparsifying genomic sequences enables robust microbiome discovery by providing 54.15-61.88x (1.58-1.71x when indexing is preprocessed) faster and 720x more storage-efficient taxonomic profiling of metagenomic samples over the state-of-the-art tool (Metalign).
Affiliation(s)
- Mohammed Alser
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
  - Department of Clinical Pharmacy, University of Southern California, LA, CA, USA
- Julien Eudine
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Onur Mutlu
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
37
Collins RL, Talkowski ME. Diversity and consequences of structural variation in the human genome. Nat Rev Genet 2025:10.1038/s41576-024-00808-9. [PMID: 39838028 DOI: 10.1038/s41576-024-00808-9] [Accepted: 11/26/2024] [Indexed: 01/23/2025]
Abstract
The biomedical community is increasingly invested in capturing all genetic variants across human genomes, interpreting their functional consequences and translating these findings to the clinic. A crucial component of this endeavour is the discovery and characterization of structural variants (SVs), which are ubiquitous in the human population, heterogeneous in their mutational processes, key substrates for evolution and adaptation, and profound drivers of human disease. The recent emergence of new technologies and the remarkable scale of sequence-based population studies have begun to crystalize our understanding of SVs as a mutational class and their widespread influence across phenotypes. In this Review, we summarize recent discoveries and new insights into SVs in the human genome in terms of their mutational patterns, population genetics, functional consequences, and impact on human traits and disease. We conclude by outlining three frontiers to be explored by the field over the next decade.
Affiliation(s)
- Ryan L Collins
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Michael E Talkowski
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
38
Kim J, Park J, Yang J, Kim S, Joe S, Park G, Hwang T, Cho MJ, Lee S, Lee JE, Park JH, Yeo MK, Kim SY. Highly accurate Korean draft genomes reveal structural variation highlighting human telomere evolution. Nucleic Acids Res 2025; 53:gkae1294. [PMID: 39778865 PMCID: PMC11707537 DOI: 10.1093/nar/gkae1294] [Received: 05/07/2023] [Revised: 12/09/2024] [Accepted: 01/06/2025] [Indexed: 01/11/2025] Open
Abstract
Given the presence of highly repetitive genomic regions such as subtelomeric regions, understanding human genomic evolution remains challenging. Recently, long-read sequencing technology has facilitated the identification of complex genetic variants, including structural variants (SVs), at the single-nucleotide level. Here, we resolved SVs and their underlying DNA damage-repair mechanisms in subtelomeric regions, which are among the most uncharted genomic regions. We generated ∼20 × high-fidelity long-read sequencing data from three Korean individuals and their partially phased high-quality de novo genome assemblies (contig N50: 6.3-58.2 Mb). We identified 131 138 deletion and 121 461 insertion SVs, 41.6% of which were prevalent in the East Asian population. The commonality of the SVs identified among the Korean population was examined by short-read sequencing data from 103 Korean individuals, providing the first comprehensive SV set representing the population based on the long-read assemblies. Manual investigation of 19 large subtelomeric SVs (≥5 kb) and their associated repair signatures revealed the potential repair mechanisms leading to the formation of these SVs. Our study provides mechanistic insight into human telomere evolution and can facilitate our understanding of human SV formation.
Affiliation(s)
- Jun Kim
  - Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
  - Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Jong Lyul Park
  - Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - Department of Bioscience, University of Science and Technology (UST), 217, Gajeong-ro, Yuseong-gu, Daejeon 34113, Republic of Korea
- Jin Ok Yang
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science & Technology (KAIST), 291, Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Sangok Kim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - Department of Bioscience, University of Science and Technology (UST), 217, Gajeong-ro, Yuseong-gu, Daejeon 34113, Republic of Korea
- Soobok Joe
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Gunwoo Park
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Taeyeon Hwang
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Mun-Jeong Cho
  - Department of Bioscience, University of Science and Technology (UST), 217, Gajeong-ro, Yuseong-gu, Daejeon 34113, Republic of Korea
- Seungjae Lee
  - DNALink, Inc, 31, Magokjungang 8-ro 3-gil, Gangseo-gu, Seoul 07793, Republic of Korea
- Jong-Eun Lee
  - DNALink, Inc, 31, Magokjungang 8-ro 3-gil, Gangseo-gu, Seoul 07793, Republic of Korea
- Ji-Hwan Park
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - Department of Biological Science, Ajou University, 206, World cup-ro, Yeongtong-gu, Suwon 16499, Republic of Korea
- Min-Kyung Yeo
  - Department of Pathology, Chungnam National University School of Medicine, 282, Munhwa-ro, Jung-gu, Daejeon 35015, Republic of Korea
- Seon-Young Kim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, 125, Gwahak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - Department of Bioscience, University of Science and Technology (UST), 217, Gajeong-ro, Yuseong-gu, Daejeon 34113, Republic of Korea
39
Secomandi S, Gallo GR, Rossi R, Rodríguez Fernandes C, Jarvis ED, Bonisoli-Alquati A, Gianfranceschi L, Formenti G. Pangenome graphs and their applications in biodiversity genomics. Nat Genet 2025; 57:13-26. [PMID: 39779953 DOI: 10.1038/s41588-024-02029-6] [Received: 05/23/2024] [Accepted: 11/08/2024] [Indexed: 01/11/2025]
Abstract
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
Affiliation(s)
- Simona Secomandi
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- Riccardo Rossi
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Carlos Rodríguez Fernandes
  - Centre for Ecology, Evolution and Environmental Changes (CE3C) and CHANGE, Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  - Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- Erich D Jarvis
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
  - The Vertebrate Genome Laboratory, New York, NY, USA
- Andrea Bonisoli-Alquati
  - Department of Biological Sciences, California State Polytechnic University, Pomona, Pomona, CA, USA
40
Suzuki T, Ninomiya K, Funayama T, Okamura Y, Tadaka S, Kinoshita K, Yamamoto M, Kure S, Kikuchi A, Tamiya G, Takayama J. Next-generation sequencing analysis with a population-specific human reference genome. Genes Genet Syst 2024; 99:n/a. [PMID: 39462538 DOI: 10.1266/ggs.24-00112] [Indexed: 10/29/2024] Open
Abstract
Next-generation sequencing (NGS) has become widely available and is routinely used in basic research and clinical practice. The reference genome sequence is an essential resource for NGS analysis, and several population-specific reference genomes have recently been constructed to provide a choice to deal with the vast genetic diversity of human samples. However, resources supporting population-specific references are insufficient, and it is burdensome to perform analysis using these reference genomes. Here, we constructed a set of resources to support NGS analysis using the Japanese reference genome, JG. We created resources for variant calling, variant effect prediction, gene and repeat element annotations, read mappability and RNA-seq analysis. We also provide a resource for reference coordinate conversion for further annotation enrichment. We then provide a variant calling protocol with JG. Our resources provide a guide to prepare sufficient resources for the use of population-specific reference genomes and can facilitate the migration of reference genomes.
Affiliation(s)
- Tomohisa Suzuki
  - Department of AI and Innovative Medicine, Tohoku University School of Medicine
  - Department of Pediatrics, Tohoku University School of Medicine
- Kota Ninomiya
  - Department of AI and Innovative Medicine, Tohoku University School of Medicine
  - Deceased July 13, 2024
- Takamitsu Funayama
  - Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University
  - RIKEN Center for Advanced Intelligence Project
- Yasunobu Okamura
  - Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University
- Shu Tadaka
  - Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University
- Kengo Kinoshita
  - Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University
  - Tohoku Medical Megabank Organization, Tohoku University
  - Department of Applied Information Sciences, Graduate School of Information Sciences, Tohoku University
  - Department of In Silico Analyses, Institute of Development, Aging and Cancer, Tohoku University
- Masayuki Yamamoto
  - Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University
  - Department of Biochemistry and Molecular Biology, Tohoku Medical Megabank Organization, Tohoku University
- Shigeo Kure
  - Department of Pediatrics, Tohoku University School of Medicine
  - Miyagi Children's Hospital
- Atsuo Kikuchi
  - Department of Pediatrics, Tohoku University School of Medicine
- Gen Tamiya
  - Department of AI and Innovative Medicine, Tohoku University School of Medicine
  - Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University
  - RIKEN Center for Advanced Intelligence Project
- Jun Takayama
  - Department of AI and Innovative Medicine, Tohoku University School of Medicine
  - Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University
  - RIKEN Center for Advanced Intelligence Project
41
Zhou W, Mumm C, Gan Y, Switzenberg JA, Wang J, De Oliveira P, Kathuria K, Losh SJ, McDonald TL, Bessell B, Van Deynze K, McConnell MJ, Boyle AP, Mills RE. A personalized multi-platform assessment of somatic mosaicism in the human frontal cortex. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.12.18.629274. [PMID: 39763954 PMCID: PMC11702624 DOI: 10.1101/2024.12.18.629274] [Indexed: 01/16/2025]
Abstract
Somatic mutations in individual cells lead to genomic mosaicism, contributing to the intricate regulatory landscape of genetic disorders and cancers. To evaluate and refine the detection of somatic mosaicism across different technologies with personalized donor-specific assembly (DSA), we obtained tissue from the dorsolateral prefrontal cortex (DLPFC) of a post-mortem neurotypical 31-year-old individual. We sequenced bulk DLPFC tissue using Oxford Nanopore Technologies (~60X), NovaSeq (~30X), and linked-read sequencing (~28X). Additionally, we applied Cas9 capture methodology coupled with long-read sequencing (TEnCATS), targeting active transposable elements. We also isolated and amplified DNA from flow-sorted single DLPFC neurons using MALBAC, sequencing 115 of these MALBAC libraries on Nanopore and 94 on NovaSeq. We constructed a haplotype-resolved assembly with a total length of 5.77 Gb and a phase block length of 2.67 Mb (N50) to facilitate cross-platform analysis of somatic genetic variations. We observed an increase in the phasing rate from 11.6% to 38.0% between short-read and long-read technologies. By generating a catalog of phased germline SNVs, CNVs, and TEs from the assembled genome, we applied standard approaches to recall these variants across sequencing technologies. We achieved aggregated recall rates from 97.3% to 99.4% based on long-read bulk tissue data, setting an upper bound for detection limits. Moreover, utilizing haplotype-based analysis from DSA, we achieved a remarkable reduction in false positive somatic calls in bulk tissue, ranging from 14.9% to 72.4%. We developed pipelines leveraging DSA information to enhance somatic large genetic variant calling in long-read single cells. By examining somatic variation using long-reads in 115 individual neurons, we identified 468 candidate somatic heterozygous large deletions (1.5Mb - 20Mb), 137 of which intersected with short-read single-cell data. 
Additionally, we identified 61 putative somatic TEs (60 Alus, one LINE-1) in the single-cell data. Collectively, our analysis spans personalized assembly to single-cell somatic variant calling, providing a comprehensive ab initio ad finem approach and resource in real human tissue.
Affiliation(s)
- Weichen Zhou
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Camille Mumm
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Yanming Gan
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jessica A. Switzenberg
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jinhao Wang
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Kunal Kathuria
  - Lieber Institute for Brain Development, Baltimore, MD, USA
- Steven J. Losh
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Torrin L. McDonald
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Brandt Bessell
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Kinsey Van Deynze
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Alan P. Boyle
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Ryan E. Mills
  - Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
42
Manchel A, Gee M, Vadigepalli R. From sampling to simulating: Single-cell multiomics in systems pathophysiological modeling. iScience 2024; 27:111322. [PMID: 39628578 PMCID: PMC11612781 DOI: 10.1016/j.isci.2024.111322] [Indexed: 12/06/2024] Open
Abstract
As single-cell omics data sampling and acquisition methods have accumulated at an unprecedented rate, various data analysis pipelines have been developed for the inference of cell types, cell states and their distribution, state transitions, state trajectories, and state interactions. This presents a new opportunity in which single-cell omics data can be utilized to generate high-resolution, high-fidelity computational models. In this review, we discuss how single-cell omics data can be used to build computational models to simulate biological systems at various scales. We propose that single-cell data can be integrated with physiological information to generate organ-specific models, which can then be assembled to generate multi-organ systems pathophysiological models. Finally, we discuss how generic multi-organ models can be brought to the patient-specific level thus permitting their use in the clinical setting.
Affiliation(s)
- Alexandra Manchel
  - Daniel Baugh Institute of Functional Genomics/Computational Biology, Department of Pathology and Genomic Medicine, Thomas Jefferson University, Philadelphia, PA, USA
- Michelle Gee
  - Daniel Baugh Institute of Functional Genomics/Computational Biology, Department of Pathology and Genomic Medicine, Thomas Jefferson University, Philadelphia, PA, USA
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
- Rajanikanth Vadigepalli
  - Daniel Baugh Institute of Functional Genomics/Computational Biology, Department of Pathology and Genomic Medicine, Thomas Jefferson University, Philadelphia, PA, USA
43
Sarashetti P, Lipovac J, Tomas F, Šikić M, Liu J. Evaluating data requirements for high-quality haplotype-resolved genomes for creating robust pangenome references. Genome Biol 2024; 25:312. [PMID: 39696427 DOI: 10.1186/s13059-024-03452-y] [Received: 03/29/2024] [Accepted: 11/29/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND: Long-read technologies from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have transformed genomics research by providing diverse data types like HiFi, Duplex, and ultra-long ONT. Despite recent strides in achieving haplotype-phased gapless genome assemblies using long-read technologies, concerns persist regarding the representation of genetic diversity, prompting the development of pangenome references. However, pangenome studies face challenges related to data types, volumes, and cost considerations for each assembled genome, while striving to maintain sensitivity. The absence of comprehensive guidance on optimal data selection exacerbates these challenges.
RESULTS: Our study evaluates recommended data types and volumes required to establish a robust de novo genome assembly pipeline for population-level pangenome projects, extensively examining performance between ONT's Duplex and PacBio HiFi datasets in the context of achieving high-quality phased genomes with enhanced contiguity and completeness. The results show that achieving chromosome-level haplotype-resolved assembly requires 20 × high-quality long reads such as PacBio HiFi or ONT Duplex, combined with 15-20 × of ultra-long ONT per haplotype and 10 × of long-range data such as Omni-C or Hi-C. High-quality long reads from both platforms yield assemblies with comparable contiguity, with HiFi excelling in phasing accuracies, while Duplex generates more T2T contigs.
CONCLUSION: Our study provides insights into optimal data types and volumes for robust de novo genome assembly in population-level pangenome projects. Reassessing the recommended data types and volumes in this study and aligning them with practical economic limitations are vital to the pangenome research community, contributing to their efforts and pushing genomic studies with broader impacts.
Affiliation(s)
- Prasad Sarashetti
  - Laboratory of Human Genomics, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Josipa Lipovac
  - Laboratory for Bioinformatics and Computational Biology, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
- Filip Tomas
  - Laboratory for Bioinformatics and Computational Biology, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
- Mile Šikić
  - Laboratory for Bioinformatics and Computational Biology, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
  - Laboratory of AI in Genomics, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Jianjun Liu
  - Laboratory of Human Genomics, Genome Institute of Singapore, A*STAR, Singapore, Singapore
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
44
Dyshlovoy SA, Paigin S, Afflerbach AK, Lobermeyer A, Werner S, Schüller U, Bokemeyer C, Schuh AH, Bergmann L, von Amsberg G, Joosse SA. Applications of Nanopore sequencing in precision cancer medicine. Int J Cancer 2024; 155:2129-2140. [PMID: 39031959 DOI: 10.1002/ijc.35100] [Received: 03/09/2024] [Revised: 04/25/2024] [Accepted: 06/25/2024] [Indexed: 07/22/2024]
Abstract
Oxford Nanopore Technologies sequencing, also referred to as Nanopore sequencing, stands at the forefront of a revolution in clinical genetics, offering the potential for rapid, long read, and real-time DNA and RNA sequencing. This technology is currently making sequencing more accessible and affordable. In this comprehensive review, we explore its potential regarding precision cancer diagnostics and treatment. We encompass a critical analysis of clinical cases where Nanopore sequencing was successfully applied to identify point mutations, splice variants, gene fusions, epigenetic modifications, non-coding RNAs, and other pivotal biomarkers that defined subsequent treatment strategies. Additionally, we address the challenges of clinical applications of Nanopore sequencing and discuss the current efforts to overcome them.
Affiliation(s)
- Sergey A Dyshlovoy
  - Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Oxford, UK
  - Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Stefanie Paigin
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Institute of Pathology and Neuropathology, University Hospital Tübingen, Tübingen, Germany
- Ann-Kristin Afflerbach
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Annabelle Lobermeyer
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Stefan Werner
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Ulrich Schüller
  - Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
  - Institute for Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Department of Paediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Carsten Bokemeyer
  - Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Anna H Schuh
  - Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Oxford, UK
- Lina Bergmann
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Gunhild von Amsberg
  - Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Martini-Klinik, Prostate Cancer Center, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Simon A Joosse
  - Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Mildred Scheel Cancer Career Center HaTriCS4, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
45
Holt JM, Harting J, Chen X, Baker D, Saunders CT, Kronenberg Z, Gonzaludo N, Yoo B, Hudjashov G, Jõeloo M, Lawlor JMJ, Lim WK, Jamuar SS, Cooper GM, Milani L, Pastinen T, Eberle MA. StarPhase: Comprehensive Phase-Aware Pharmacogenomic Diplotyper for Long-Read Sequencing Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.12.10.627527. [PMID: 39713404 PMCID: PMC11661245 DOI: 10.1101/2024.12.10.627527] [Indexed: 12/24/2024]
Abstract
Pharmacogenomics is central to precision medicine, informing medication safety and efficacy. Pharmacogenomic diplotyping of complex genes requires full-length DNA sequences and detection of structural rearrangements. We introduce StarPhase, a tool that leverages PacBio HiFi sequence data to diplotype 21 CPIC Level A pharmacogenes and provides detailed haplotypes and supporting visualizations for HLA-A, HLA-B, and CYP2D6. StarPhase diplotypes have high concordance with benchmarks where 99.5% are either exact matches or minor discrepancies. Manual inspection of the 0.5% mismatches indicates they were correctly called by StarPhase. With StarPhase, we update or correct 26.2% of GeT-RM pharmacogenomic diplotypes. Population distributions from StarPhase mostly reflect those of the All of Us cohort, while also highlighting gaps in existing pharmacogenomic databases that long-read sequencing can fill. With a single HiFi whole genome sequencing assay, StarPhase enables robust PGx diplotyping even as additional pharmacogenes and haplotypes are discovered.
Collapse
Affiliation(s)
- James M Holt
- PacBio, 1305 O'Brien Drive, Menlo Park, CA 94025, USA
| | - John Harting
- PacBio, 1305 O'Brien Drive, Menlo Park, CA 94025, USA
| | - Xiao Chen
- PacBio, 1305 O'Brien Drive, Menlo Park, CA 94025, USA
| | - Daniel Baker
- PacBio, 1305 O'Brien Drive, Menlo Park, CA 94025, USA
| | | | | | | | - Byunggil Yoo
- Children's Mercy Kansas City, 2401 Gillham Road, Kansas City, MO 64108, USA
| | - Georgi Hudjashov
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Estonia
| | - Maarja Jõeloo
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Estonia
| | - James M J Lawlor
- HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806, USA
| | - Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive, Singapore 169609, Singapore
- Cancer & Stem Cell Biology Program, Duke-NUS Medical School, Singapore, 169857, Singapore
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
| | - Saumya S Jamuar
- SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive, Singapore 169609, Singapore
- Genetics service, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore 229899
| | - Gregory M Cooper
- HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806, USA
| | - Lili Milani
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Estonia
| | - Tomi Pastinen
- Children's Mercy Kansas City, 2401 Gillham Road, Kansas City, MO 64108, USA
46
|
van Westerhoven AC, Dijkstra J, Aznar Palop JL, Wissink K, Bell J, Kema GHJ, Seidl MF. Frequent genetic exchanges revealed by a pan-mitogenome graph of a fungal plant pathogen. mBio 2024; 15:e0275824. [PMID: 39535230 PMCID: PMC11633160 DOI: 10.1128/mbio.02758-24] [Received: 09/10/2024] [Accepted: 10/15/2024] [Indexed: 11/16/2024]
Abstract
Mitochondria are present in almost all eukaryotic lineages. The mitochondrial genomes (mitogenomes) evolve separately from nuclear genomes, and they can therefore provide relevant insights into the evolution of their host species. Fusarium oxysporum is a major fungal plant pathogen that is assumed to reproduce clonally. However, horizontal chromosome transfer between strains can occur through heterokaryon formation, and recently, signs of sexual recombination have been observed. Similarly, signs of recombination in F. oxysporum mitogenomes challenged the prevailing assumption of clonal reproduction in this species. Here, we construct, to our knowledge, the first fungal pan-mitogenome graph of nearly 500 F. oxysporum mitogenome assemblies to uncover the variation and evolution. In general, the gene order of fungal mitogenomes is not well conserved, yet the mitogenomes of F. oxysporum and related species are highly colinear. We observed two strikingly contrasting regions in the F. oxysporum pan-mitogenome, comprising a highly conserved core mitogenome and a long variable region (6-16 kb in size), of which we identified three distinct types. The pan-mitogenome graph reveals that only five intron insertions occurred in the core mitogenome and that the long variable regions drive the difference between mitogenomes. Moreover, we observed that their evolution is neither concurrent with the core mitogenome nor with the nuclear genome. Our large-scale analysis of long variable regions uncovers frequent recombination between mitogenomes, even between strains that belong to different taxonomic clades. This challenges the common assumption of incompatibility between genetically diverse F. oxysporum strains and provides new insights into the evolution of this fungal species.
IMPORTANCE: Insights into plant pathogen evolution are essential for the understanding and management of disease. Fusarium oxysporum is a major fungal pathogen that can infect many economically important crops. Pathogenicity can be transferred between strains by the horizontal transfer of pathogenicity chromosomes. The fungus has been thought to evolve clonally, yet recent evidence suggests active sexual recombination between related isolates, which could at least partially explain the horizontal transfer of pathogenicity chromosomes. By constructing a pan-genome graph of nearly 500 mitochondrial genomes, we describe the genetic variation of mitochondria in unprecedented detail and demonstrate frequent mitochondrial recombination. Importantly, recombination can occur between genetically diverse isolates from distinct taxonomic clades and thus can shed light on genetic exchange between fungal strains.
Affiliation(s)
- Anouk C. van Westerhoven
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
- Laboratory of Phytopathology, Wageningen University and Research, Wageningen, Netherlands
| | - Jelmer Dijkstra
- Laboratory of Phytopathology, Wageningen University and Research, Wageningen, Netherlands
| | - Jose L. Aznar Palop
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
| | - Kyran Wissink
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
| | - Jasper Bell
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
| | - Gert H. J. Kema
- Laboratory of Phytopathology, Wageningen University and Research, Wageningen, Netherlands
| | - Michael F. Seidl
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
47
|
Ramos MA, Bonini KE, Scarimbolo L, Kelly NR, Insel B, Suckiel SA, Brown K, Di Biase M, Gallagher KM, Lopez J, Aguiñiga KL, Marathe PN, Maria E, Odgis JA, Rodriguez JE, Rodriguez MA, Ruiz N, Sebastin M, Yelton NM, Cunningham-Rundles C, Gertner M, Laguerre I, McDonald TV, McGoldrick PE, Robinson M, Rubinstein A, Shulman LH, Williams T, Wolf SM, Yozawitz EG, Zinberg RE, Abul-Husn NS, Bauman LJ, Diaz GA, Ferket BS, Greally JM, Jobanputra V, Gelb BD, Kenny EE, Wasserstein MP, Horowitz CR. Employing effective recruitment and retention strategies to engage a diverse pediatric population in genomics research. Am J Hum Genet 2024; 111:2607-2617. [PMID: 39566494 PMCID: PMC11639093 DOI: 10.1016/j.ajhg.2024.10.015] [Received: 04/18/2024] [Revised: 10/18/2024] [Accepted: 10/23/2024] [Indexed: 11/22/2024]
Abstract
Underrepresentation in clinical genomics research limits the generalizability of findings and the benefits of scientific discoveries. We describe the impact of patient-centered, data-driven recruitment and retention strategies in a pediatric genome sequencing study. We collaborated with a stakeholder board, conducted formative research with adults whose children had undergone genomic testing, and piloted and revised study approaches and materials. Our approaches included racially, ethnically, and linguistically congruent study staff, relational interactions, study visit flexibility, and data-informed quality improvement. Of 1,656 eligible children, only 6.5% declined. Their parents/legal guardians were 76.9% non-White, 65.6% had public health insurance for the child, 49.9% lived below the federal poverty level, and 52.8% resided in a medically underserved area. Among those enrolled, 87.3% completed all study procedures. There were no sociodemographic differences between those who enrolled and declined or between those retained and lost to follow-up. We outline stakeholder-engaged approaches that may have led to the successful enrollment and retention of diverse families. These approaches may inform future research initiatives aiming to engage and retain underrepresented populations in genomics medicine research.
Affiliation(s)
- Michelle A Ramos
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Institute for Health Equity Research, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Katherine E Bonini
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Laura Scarimbolo
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Nicole R Kelly
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Beverly Insel
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Sabrina A Suckiel
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Kaitlyn Brown
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA; Illumina, Inc., Foster City, CA, USA
| | - Miranda Di Biase
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Katie M Gallagher
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Jessenia Lopez
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Karla López Aguiñiga
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Priya N Marathe
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Estefany Maria
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Jacqueline A Odgis
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Jessica E Rodriguez
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Michelle A Rodriguez
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Nairovylex Ruiz
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Monisha Sebastin
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Nicole M Yelton
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Charlotte Cunningham-Rundles
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Melvin Gertner
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Irma Laguerre
- Institute for Health Equity Research, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Thomas V McDonald
- Department of Medicine (Cardiology), Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Patricia E McGoldrick
- Department of Pediatrics, Division of Child Neurology, New York Medical College, Valhalla, NY, USA; Pediatric Neurology, Boston Children's Health Physicians/Maria Fareri Children's Hospital, Hawthorne, NY, USA
| | - Arye Rubinstein
- Department of Allergy and Immunology, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Lisa H Shulman
- Department of Pediatrics, Division of Developmental Medicine, Rose F. Kennedy Children's Evaluation & Rehabilitation Center at Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Steven M Wolf
- Department of Pediatrics, Division of Child Neurology, New York Medical College, Valhalla, NY, USA; Pediatric Neurology, Boston Children's Health Physicians/Maria Fareri Children's Hospital, Hawthorne, NY, USA
| | - Elissa G Yozawitz
- Isabelle Rapin Division of Child Neurology of the Saul R. Korey Department of Neurology at Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA; Department of Pediatrics, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Randi E Zinberg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Obstetrics, Gynecology and Reproductive Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Noura S Abul-Husn
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; 23andMe, Inc., Sunnyvale, CA, USA
| | - Laurie J Bauman
- Department of Pediatrics, Division of Ambulatory Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA
| | - George A Diaz
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bart S Ferket
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - John M Greally
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Vaidehi Jobanputra
- Molecular Diagnostics, New York Genome Center, New York, NY, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Bruce D Gelb
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Melissa P Wasserstein
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
| | - Carol R Horowitz
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Institute for Health Equity Research, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
48
|
Rajaby R, Sung WK. SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads. Nat Commun 2024; 15:10473. [PMID: 39622819 PMCID: PMC11612505 DOI: 10.1038/s41467-024-53087-7] [Received: 10/31/2023] [Accepted: 09/30/2024] [Indexed: 12/06/2024]
Abstract
Deletions and tandem duplications (commonly called CNVs) represent the majority of structural variations in a human genome. They can be identified using short reads, but because they frequently occur in repetitive regions, existing methods fail to detect most of them. This is because CNVs in repetitive regions often do not produce the evidence needed by existing short-read-based callers (split reads, discordant pairs or read depth change). Here, we introduce a new short-read-based CNV caller named SurVIndel2. SurVIndel2 builds on statistical techniques we previously developed, but also employs a novel type of evidence, hidden split reads, that can uncover many CNVs missed by existing algorithms. We use public benchmarks to show that SurVIndel2 outperforms other popular callers, both on human and non-human datasets. Then, we demonstrate the practical utility of the method by generating a catalogue of CNVs for the 1000 Genomes Project that contains hundreds of thousands of CNVs missing from the most recent public catalogue. We also show that SurVIndel2 is able to complement small indels predicted by Google DeepVariant, and the two tools used in tandem produce a remarkably complete catalogue of variants in an individual. Finally, we characterise how the limitations of current sequencing technologies contribute significantly to the missing CNVs.
Affiliation(s)
- Ramesh Rajaby
- Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- A*STAR Genome Institute of Singapore, Singapore, Singapore
- Shibuya Lab, Division of Medical Data Informatics, Human Genome Center, University of Tokyo, Tokyo, Japan
| | - Wing-Kin Sung
- Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- A*STAR Genome Institute of Singapore, Singapore, Singapore
- JC STEM Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
- School of Computing, National University of Singapore, Singapore, Singapore
49
|
Öztürk Ü, Mattavelli M, Ribeca P. GIN-TONIC: non-hierarchical full-text indexing for graph genomes. NAR Genom Bioinform 2024; 6:lqae159. [PMID: 39664816 PMCID: PMC11632618 DOI: 10.1093/nargab/lqae159] [Received: 05/21/2024] [Revised: 10/08/2024] [Accepted: 11/01/2024] [Indexed: 12/13/2024]
Abstract
This paper presents a new data structure, GIN-TONIC (Graph INdexing Through Optimal Near Interval Compaction), designed to index arbitrary string-labelled directed graphs representing, for instance, pangenomes or transcriptomes. GIN-TONIC provides several capabilities not offered by other graph-indexing methods based on the FM-Index. It is non-hierarchical, handling a graph as a monolithic object; it indexes at nucleotide resolution all possible walks in the graph without the need to explicitly store them; it supports exact substring queries in polynomial time and space for all possible walk roots in the graph, even if there are exponentially many walks corresponding to such roots. Specific ad-hoc optimizations, such as precomputed caches, allow GIN-TONIC to achieve excellent performance for input graphs of various topologies and sizes. Robust scalability capabilities and a querying performance close to that of a linear FM-Index are demonstrated for two real-world applications on the scale of human pangenomes and transcriptomes. Source code and associated benchmarks are available on GitHub.
Affiliation(s)
- Ünsal Öztürk
- SCI-STI-MM, EPFL, ELB 118, Station 11, 1015, Lausanne, Switzerland
| | - Marco Mattavelli
- SCI-STI-MM, EPFL, ELB 118, Station 11, 1015, Lausanne, Switzerland
| | - Paolo Ribeca
- Biomathematics and Statistics Scotland, The James Hutton Institute, Peter Guthrie Tait Road, EH9 3FD, Edinburgh, United Kingdom
- Clinical and Emerging Infection, UK Health Security Agency, 61 Colindale Avenue, NW9 5EQ, London, United Kingdom
- NIHR Health Protection Research Unit in Genomics and Enabling Data, University of Warwick, Gibbet Hill Road, CV4 7AL, Coventry, United Kingdom
- NIHR Health Protection Research Unit in Gastrointestinal Infections, University of Liverpool, 8 West Derby Street, L69 7BE, Liverpool, United Kingdom
50
|
Kalbfleisch TS, Smith ML, Ciosek JL, Li K, Doris PA. Three decades of rat genomics: approaching the finish(ed) line. Physiol Genomics 2024; 56:807-818. [PMID: 39348459 PMCID: PMC11573253 DOI: 10.1152/physiolgenomics.00110.2024] [Received: 07/26/2024] [Revised: 09/11/2024] [Accepted: 09/26/2024] [Indexed: 10/02/2024]
Abstract
The rat, Rattus norvegicus, has provided an important model for investigation of a range of characteristics of biomedical importance. Here we survey the origins of this species, its introduction into laboratory research, and the emergence of genetic and genomic methods that utilize this model organism. Genomic studies have yielded important progress and provided new insight into several biologically important traits. However, some studies have been impeded by the lack of a complete and accurate reference genome for this species. New sequencing and genome assembly methods applied to the rat have resulted in a new reference genome assembly, GRCr8, which is a near telomere-to-telomere assembly of high base-level accuracy that incorporates several elements not captured in prior assemblies. As genome assembly methods continue to advance and production costs become a less significant obstacle, genome assemblies for multiple inbred rat strains are emerging. These assemblies will allow a rat pangenome assembly to be constructed that captures all the genetic variations in strains selected for their utility in research and will overcome reference bias, a limitation associated with reliance on a single reference assembly. By this means, the full utility of this model organism to genomic studies will begin to be revealed.
Affiliation(s)
- Theodore S Kalbfleisch
- Gluck Equine Research Center, University of Kentucky, Lexington, Kentucky, United States
| | - Melissa L Smith
- Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, Louisville, Kentucky, United States
| | - Julia L Ciosek
- Gluck Equine Research Center, University of Kentucky, Lexington, Kentucky, United States
| | - Kai Li
- Gluck Equine Research Center, University of Kentucky, Lexington, Kentucky, United States
| | - Peter A Doris
- Center for Human Genetics, Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Texas, United States