1
Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024; 11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024]
Abstract
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Affiliation(s)
- Tiffany J Callahan
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Ignacio J Tripodi
- Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
- Adrianne L Stefanski
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Luca Cappelletti
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Sanya B Taneja
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Jordan M Wyrwa
- Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Elena Casiraghi
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jonathan C Silverstein
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Charles Tapley Hoyt
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Richard D Boyce
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Scott A Malec
- Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Deepak R Unni
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Marcin P Joachimiak
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
- Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Emanuele Cavalleri
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Tommaso Fontana
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
- Marco Mesiti
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Lucas A Gillenwater
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brook Santangelo
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
- Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tellen D Bennett
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Patrick B Ryan
- Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Michael G Kahn
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael Bada
- Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- William A Baumgartner
- Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Lawrence E Hunter
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
2
Ma C, Liu S, Koslicki D. MetagenomicKG: a knowledge graph for metagenomic applications. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.14.585056. [PMID: 38559251 PMCID: PMC10980061 DOI: 10.1101/2024.03.14.585056] [Indexed: 04/04/2024]
Abstract
Motivation The sheer volume and variety of genomic content within microbial communities makes metagenomics a field rich in biomedical knowledge. To traverse these complex communities and their vast unknowns, metagenomic studies often depend on distinct reference databases, such as the Genome Taxonomy Database (GTDB), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), for various analytical purposes. These databases are crucial for genetic and functional annotation of microbial communities. Nevertheless, the inconsistent nomenclature or identifiers of these databases present challenges for effective integration, representation, and utilization. Knowledge graphs (KGs) offer an appropriate solution by organizing biological entities and their interrelations into a cohesive network. The graph structure not only facilitates the unveiling of hidden patterns but also enriches our biological understanding with deeper insights. Despite KGs having shown potential in various biomedical fields, their application in metagenomics remains underexplored. Results We present MetagenomicKG, a novel knowledge graph specifically tailored for metagenomic analysis. MetagenomicKG integrates taxonomic, functional, and pathogenesis-related information from widely used databases, and further links these with established biomedical knowledge graphs to expand biological connections. Through several use cases, we demonstrate its utility in enabling hypothesis generation regarding the relationships between microbes and diseases, generating sample-specific graph embeddings, and providing robust pathogen prediction. Availability and Implementation The source code and technical details for constructing the MetagenomicKG and reproducing all analyses are available at Github: https://github.com/KoslickiLab/MetagenomicKG. We also host a Neo4j instance: http://mkg.cse.psu.edu:7474 for accessing and querying this graph.
Affiliation(s)
- Chunyu Ma
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- Shaopeng Liu
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- David Koslicki
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania, USA
- Department of Biology, Pennsylvania State University, State College, Pennsylvania, USA
- The One Health Microbiome Center, Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
3
Kilicoglu H, Ensan F, McInnes B, Wang LL. Semantics-enabled biomedical literature analytics. J Biomed Inform 2024; 150:104588. [PMID: 38244957 DOI: 10.1016/j.jbi.2024.104588] [Received: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 01/22/2024]
Affiliation(s)
- Halil Kilicoglu
- School of Information Sciences, University of Illinois Urbana Champaign, Champaign, IL, USA
- Faezeh Ensan
- Department of Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON, Canada
- Bridget McInnes
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lucy Lu Wang
- Information School, University of Washington, Seattle, WA, USA
4
Liu F, Patt A, Chen C, Huang R, Xu Y, Mathé EA, Zhu Q. Exploring NCATS in-house biomedical data for evidence-based drug repurposing. PLoS One 2024; 19:e0289518. [PMID: 38271343 PMCID: PMC10810548 DOI: 10.1371/journal.pone.0289518] [Received: 07/19/2023] [Accepted: 11/08/2023] [Indexed: 01/27/2024]
Abstract
Drug repurposing is a strategy for identifying new uses of approved or investigational drugs that are outside the scope of the original medical indication. Even though many repurposed drugs have been found serendipitously in the past, the increasing availability of large volumes of biomedical data has enabled more systematic, data-driven approaches for drug candidate identification. At the National Center for Advancing Translational Sciences (NCATS), we develop new methods to generate new data and make information publicly available to spur innovation and scientific discovery. In this study, we aimed to explore and demonstrate the use of biomedical data generated and collected via two NCATS research programs, the Toxicology in the 21st Century program (Tox21) and the Biomedical Data Translator (Translator), for drug repurposing. These two programs provide complementary types of biomedical data: bioassay screening data from Tox21 help uncover underlying biological mechanisms and support chemical clustering, and scientific evidence mined from the Translator enriches the clustered chemicals toward drug repurposing. In total, 129 chemical clusters were generated, and three of them were investigated further for drug repurposing candidate identification, as detailed in the case studies.
Affiliation(s)
- Fang Liu
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Andrew Patt
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Chloe Chen
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Ruili Huang
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Yanji Xu
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Ewy A. Mathé
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Qian Zhu
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
5
Callaghan J, Xu CH, Xin J, Cano MA, Riutta A, Zhou E, Juneja R, Yao Y, Narayan M, Hanspers K, Agrawal A, Pico AR, Wu C, Su AI. BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs. Bioinformatics 2023; 39:7273783. [PMID: 37707514 PMCID: PMC11015316 DOI: 10.1093/bioinformatics/btad570] [Received: 04/18/2023] [Revised: 08/18/2023] [Accepted: 09/12/2023] [Indexed: 09/15/2023]
Abstract
SUMMARY Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. AVAILABILITY AND IMPLEMENTATION More information can be found at https://explorer.biothings.io and code is available at https://github.com/biothings/biothings_explorer.
Affiliation(s)
- Jackson Callaghan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Colleen H Xu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Jiwen Xin
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Marco Alvarado Cano
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Anders Riutta
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Eric Zhou
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Rohan Juneja
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Yao Yao
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Madhumita Narayan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Kristina Hanspers
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Ayushi Agrawal
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Alexander R Pico
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
6
Charlet J, Cui L. Knowledge Representation and Management 2022: Findings in Ontology Development and Applications. Yearb Med Inform 2023; 32:225-229. [PMID: 38147864 PMCID: PMC10751114 DOI: 10.1055/s-0043-1768747] [Indexed: 12/28/2023]
Abstract
OBJECTIVES To select, present, and summarize the best papers in 2022 for the Knowledge Representation and Management (KRM) section of the International Medical Informatics Association (IMIA) Yearbook. METHODS We conducted PubMed queries and followed the IMIA Yearbook guidelines for performing biomedical informatics literature review to select the best papers in KRM published in 2022. RESULTS We retrieved 1,847 publications from PubMed. We nominated 15 candidate best papers, and two of them were finally selected as the best papers in the KRM section. The topics covered by the candidate papers include ontology and knowledge graph creation, ontology applications, ontology quality assurance, ontology mapping standard, and conceptual model. CONCLUSIONS In the KRM best paper selection for 2022, the candidate best papers encompassed a broad range of topics, with ontology and knowledge graph creation remaining a considerable research focus.
Affiliation(s)
- Jean Charlet
- Sorbonne Université, INSERM, Univ Sorbonne Paris Nord, LIMICS, Paris, France
- AP-HP, DRCI, Paris, France
- Licong Cui
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
7
Liu F, Patt A, Chen C, Huang R, Xu Y, Mathé EA, Zhu Q. Exploring NCATS In-House Biomedical Data for Evidence-based Drug Repurposing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.21.550045. [PMID: 37546930 PMCID: PMC10401966 DOI: 10.1101/2023.07.21.550045] [Indexed: 08/08/2023]
Abstract
Drug repurposing is a strategy for identifying new uses of approved or investigational drugs that are outside the scope of the original medical indication. Even though many repurposed drugs have been found serendipitously in the past, the increasing availability of large volumes of biomedical data has enabled more systematic, data-driven approaches for drug candidate identification. At the National Center for Advancing Translational Sciences (NCATS), we develop new methods to generate new data and make information publicly available to spur innovation and scientific discovery. In this study, we aimed to explore and demonstrate the use of biomedical data generated and collected via two NCATS research programs, the Toxicology in the 21st Century program (Tox21) and the Biomedical Data Translator (Translator), for drug repurposing. These two programs provide complementary types of biomedical data: bioassay screening data from Tox21 help uncover underlying biological mechanisms and support chemical clustering, and scientific evidence mined from the Translator enriches the clustered chemicals toward drug repurposing. In total, 129 chemical clusters were generated, and three of them were investigated further for drug repurposing candidate identification, as detailed in the case studies.
Affiliation(s)
- Fang Liu
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD
- Andrew Patt
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
- Chloe Chen
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD
- Ruili Huang
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
- Yanji Xu
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD
- Ewy A Mathé
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
- Qian Zhu
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
8
Caufield JH, Putman T, Schaper K, Unni DR, Hegde H, Callahan TJ, Cappelletti L, Moxon SAT, Ravanmehr V, Carbon S, Chan LE, Cortes K, Shefchek KA, Elsarboukh G, Balhoff J, Fontana T, Matentzoglu N, Bruskiewich RM, Thessen AE, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ, Reese JT. KG-Hub-building and exchanging biological knowledge graphs. Bioinformatics 2023; 39:btad418. [PMID: 37389415 PMCID: PMC10336030 DOI: 10.1093/bioinformatics/btad418] [Received: 01/30/2023] [Revised: 05/09/2023] [Accepted: 06/29/2023] [Indexed: 07/01/2023]
Abstract
MOTIVATION Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION https://kghub.org.
Affiliation(s)
- J Harry Caufield
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Kevin Schaper
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Deepak R Unni
- SIB Swiss Institute of Bioinformatics, Basel 1015, Switzerland
- Harshad Hegde
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, United States
- Luca Cappelletti
- Department of Computer Science, University of Milano, Milan 20126, Italy
- Sierra A T Moxon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Vida Ravanmehr
- Department of Lymphoma-Myeloma, MD Anderson Cancer Center, Houston, TX 77030, United States
- Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, United States
- Katherina Cortes
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Kent A Shefchek
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Glass Elsarboukh
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Jim Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, United States
- Tommaso Fontana
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan 20133, Italy
- Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Nomi L Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Melissa A Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Marcin P Joachimiak
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
9
Callaghan J, Xu CH, Xin J, Cano MA, Riutta A, Zhou E, Juneja R, Yao Y, Narayan M, Hanspers K, Agrawal A, Pico AR, Wu C, Su AI. BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs. ARXIV 2023:arXiv:2304.09344v1. [PMID: 37131885 PMCID: PMC10153288] [Indexed: 05/04/2023]
Abstract
Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. More information can be found at https://explorer.biothings.io, and code is available at https://github.com/biothings/biothings_explorer.
Affiliation(s)
- Jackson Callaghan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Colleen H Xu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Jiwen Xin
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Marco Alvarado Cano
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Anders Riutta
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Eric Zhou
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Rohan Juneja
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Yao Yao
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Madhumita Narayan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Kristina Hanspers
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Ayushi Agrawal
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Alexander R Pico
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
10
Glen AK, Ma C, Mendoza L, Womack F, Wood EC, Sinha M, Acevedo L, Kvarfordt LG, Peene RC, Liu S, Hoffman AS, Roach JC, Deutsch EW, Ramsey SA, Koslicki D. ARAX: a graph-based modular reasoning tool for translational biomedicine. Bioinformatics 2023; 39:7031241. [PMID: 36752514 PMCID: PMC10027432 DOI: 10.1093/bioinformatics/btad082] [Received: 08/15/2022] [Revised: 12/17/2022] [Accepted: 02/07/2023] [Indexed: 04/12/2023]
Abstract
MOTIVATION With the rapidly growing volume of knowledge and data in biomedical databases, improved methods for knowledge-graph-based computational reasoning are needed in order to answer translational questions. Previous efforts to solve such challenging computational reasoning problems have contributed tools and approaches, but progress has been hindered by the lack of an expressive analysis workflow language for translational reasoning and by the lack of a reasoning engine, supporting that language, that federates semantically integrated knowledge bases. RESULTS We introduce ARAX, a new reasoning system for translational biomedicine that provides a web browser user interface and an application programming interface (API). ARAX enables users to encode translational biomedical questions and to integrate knowledge across sources to answer the user's query and facilitate exploration of results. For ARAX, we developed new approaches to query planning, knowledge gathering, reasoning, and result ranking, and we dynamically integrate knowledge providers for answering biomedical questions. To illustrate ARAX's application and utility in specific disease contexts, we present several use-case examples. AVAILABILITY AND IMPLEMENTATION The source code and technical documentation for building the ARAX server-side software and its built-in knowledge database are freely available online (https://github.com/RTXteam/RTX). We provide a hosted ARAX service with a web browser interface at arax.rtx.ai and a web API endpoint at arax.rtx.ai/api/arax/v1.3/ui/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
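To make "encoding a translational question" concrete, the sketch below builds a one-hop query graph asking which chemical entities treat a given disease. It assumes that the hosted service accepts query graphs in the Translator Reasoner API (TRAPI) 1.3 style used across the NCATS Biomedical Data Translator, with Biolink Model categories and predicates. The `v1.3` in the endpoint URL above suggests this but does not confirm it. The helper name `one_hop_query` is hypothetical. Submitting the query is not shown, because the abstract gives only the UI path and not a query route.

```python
import json


def one_hop_query(disease_curie, predicate="biolink:treats"):
    """Build a TRAPI-style one-hop query graph (hypothetical helper).

    Asks: which chemical entities are related to this disease by `predicate`?
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # n0 is left unpinned: the reasoner is asked to fill it in.
                    "n0": {"categories": ["biolink:ChemicalEntity"]},
                    # n1 is pinned to one specific disease identifier.
                    "n1": {"ids": [disease_curie], "categories": ["biolink:Disease"]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": [predicate]},
                },
            }
        }
    }


# MONDO:0005148 is the MONDO identifier for type 2 diabetes mellitus.
query = one_hop_query("MONDO:0005148")
print(json.dumps(query, indent=2))
```

Leaving one node unpinned and pinning the other is the basic pattern. Multi-hop questions would add further nodes and edges to the same structure.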
Affiliation(s)
- Luis Mendoza
- Institute for Systems Biology, Seattle, WA 98109, USA
- Finn Womack
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, USA
- E C Wood
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Meghamala Sinha
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Liliana Acevedo
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Lindsey G Kvarfordt
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Ross C Peene
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Shaopeng Liu
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, USA
- Andrew S Hoffman
- Interdisciplinary Hub for Digitalization and Society, Radboud University, Nijmegen 6500GL, The Netherlands
- Jared C Roach
- Institute for Systems Biology, Seattle, WA 98109, USA
11
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how two biomedical entities are related. bioRxiv: The Preprint Server for Biology 2023:2023.01.05.522941. [PMID: 36711546 PMCID: PMC9882000 DOI: 10.1101/2023.01.05.522941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Hetnets, short for "heterogeneous networks", contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
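The core idea of connectivity search is that a path type (metapath) matters when it connects two nodes more often than node degree alone would predict. This can be illustrated with a permutation test. The sketch below is simplified and rests on several assumptions. It handles a single two-edge metapath (for example Compound-binds-Gene-associates-Disease) stored as two bipartite edge sets. It uses raw path counts and an XSwap-style, degree-preserving edge-swap null model. The published method instead uses degree-weighted path counts and the precomputation optimizations implemented in hetmatpy, whose API is not reproduced here. All function names are hypothetical.

```python
import random


def count_paths(source, target, edges_ab, edges_bc):
    """Count length-2 paths source -> b -> target along one metapath."""
    middles = {b for a, b in edges_ab if a == source}
    return sum(1 for b, c in edges_bc if c == target and b in middles)


def xswap(edges, n_swaps, rng):
    """Permute a bipartite edge set by repeated edge swaps.

    Swapping (a1, b1), (a2, b2) -> (a1, b2), (a2, b1) leaves every node's degree unchanged.
    """
    edges = sorted(edges)  # sort so results depend only on the rng seed
    edge_set = set(edges)
    if len(edges) < 2:
        return edge_set
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a1, b1), (a2, b2) = edges[i], edges[j]
        new1, new2 = (a1, b2), (a2, b1)
        # Reject swaps that would duplicate an existing edge (this also rejects no-op swaps).
        if new1 in edge_set or new2 in edge_set:
            continue
        edge_set.discard(edges[i])
        edge_set.discard(edges[j])
        edge_set.update((new1, new2))
        edges[i], edges[j] = new1, new2
    return edge_set


def connectivity_pvalue(source, target, edges_ab, edges_bc, n_perms=200, seed=0):
    """Compare the observed path count with counts on degree-preserving permutations."""
    rng = random.Random(seed)
    observed = count_paths(source, target, edges_ab, edges_bc)
    n_at_least = 0
    for _ in range(n_perms):
        perm_ab = xswap(edges_ab, 10 * len(edges_ab), rng)
        perm_bc = xswap(edges_bc, 10 * len(edges_bc), rng)
        if count_paths(source, target, perm_ab, perm_bc) >= observed:
            n_at_least += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (n_at_least + 1) / (n_perms + 1)
```

A small p-value means that the degrees of the source and target nodes alone rarely produce as many paths of this type. In this sketch, that is the signal that the metapath is informative for the pair. Permuting once per query, as done here, is only for brevity. Precomputing permuted networks is what makes this practical at knowledge-graph scale.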
Affiliation(s)
- Daniel S. Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Related Sciences
- Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Kyle Kloster
- Carbon, Inc.; Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
- Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania
- Faisal Alquaddoomi
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Dongbo Hu
- Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
- David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
- Yun Hao
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA, USA
- Michael W. Nagle
- Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, Massachusetts, United States of America; Neurogenomics, Translational Sciences, Neurology Business Group, Eisai Inc, Cambridge, Massachusetts, United States of America
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
12
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how biomedical entities are related. Gigascience 2022; 12:giad047. [PMID: 37503959 PMCID: PMC10375517 DOI: 10.1093/gigascience/giad047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2023] [Revised: 04/14/2023] [Accepted: 06/06/2023] [Indexed: 07/29/2023]
Abstract
BACKGROUND Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer but also how a given gene might be involved in insomnia. FINDINGS We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any 2 nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. CONCLUSION We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
Affiliation(s)
- Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Related Sciences, Denver, CO 80202, USA
- Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Kyle Kloster
- Carbon, Inc., Redwood City, CA 94063, USA
- Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
- Benjamin J Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Faisal Alquaddoomi
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Dongbo Hu
- Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia, PA 19104, USA
- David N Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yun Hao
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Blair D Sullivan
- School of Computing, University of Utah, Salt Lake City, UT 84112, USA
- Michael W Nagle
- Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, MA 02139, USA
- Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
13
Ma C, Zhou Z, Liu H, Koslicki D. KGML-xDTD: a knowledge graph-based machine learning framework for drug treatment prediction and mechanism description. Gigascience 2022; 12:giad057. [PMID: 37602759 PMCID: PMC10441000 DOI: 10.1093/gigascience/giad057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 05/05/2023] [Accepted: 07/04/2023] [Indexed: 08/22/2023]
Abstract
BACKGROUND Computational drug repurposing is a cost- and time-efficient approach that aims to identify new therapeutic targets or diseases (indications) for existing drugs/compounds. It is especially critical for emerging and/or orphan diseases because it requires less investment and a shorter research cycle than traditional wet-lab drug discovery approaches. However, the underlying mechanisms of action (MOAs) between repurposed drugs and their target diseases remain largely unknown, which is still a major obstacle to the wide adoption of computational drug repurposing methods in clinical settings. RESULTS In this work, we propose KGML-xDTD: a Knowledge Graph-based Machine Learning framework for explainably predicting Drugs Treating Diseases. It is a 2-module framework that not only predicts the treatment probabilities between drugs/compounds and diseases but also biologically explains them via testable MOAs based on knowledge graph (KG) paths. We leverage knowledge-and-publication-based information to extract biologically meaningful "demonstration paths" as the intermediate guidance in the Graph-based Reinforcement Learning (GRL) path-finding process. Comprehensive experiments and case study analyses show that the proposed framework can achieve state-of-the-art performance in both drug repurposing prediction and recapitulation of human-curated drug MOA paths. CONCLUSIONS KGML-xDTD is the first model framework that can offer KG path explanations for drug repurposing predictions by leveraging the combination of prediction outcomes and existing biological knowledge and publications. We believe it can effectively reduce "black-box" concerns and increase prediction confidence for drug repurposing based on predicted path-based explanations and further accelerate the process of drug discovery for emerging diseases.
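A KG path-based MOA explanation is a chain of typed edges leading from a drug to a disease. The minimal sketch below enumerates every simple directed path of bounded length between the two nodes. This is the search space that a path-finding agent such as KGML-xDTD's reinforcement-learning module explores. It is not the GRL method, and it does no scoring or ranking. The toy graph, its node and predicate names, and the helper `enumerate_paths` are all hypothetical. The direct `treats` edge is excluded on the assumption that it is the prediction being explained rather than part of the explanation.

```python
from collections import defaultdict


def enumerate_paths(triples, source, target, max_hops=3, exclude_predicates=()):
    """Return every simple directed path from source to target with at most max_hops edges.

    triples: iterable of (subject, predicate, object).
    Each path is returned as [node, predicate, node, ..., node].
    """
    adjacency = defaultdict(list)
    for subj, pred, obj in triples:
        if pred not in exclude_predicates:
            adjacency[subj].append((pred, obj))

    paths = []

    def dfs(node, path, visited):
        if node == target:
            paths.append(list(path))
            return
        hops = (len(path) - 1) // 2  # path alternates node, predicate, node, ...
        if hops >= max_hops:
            return
        for pred, nxt in adjacency[node]:
            if nxt in visited:
                continue  # keep paths simple: no repeated nodes
            visited.add(nxt)
            path.extend([pred, nxt])
            dfs(nxt, path, visited)
            del path[-2:]
            visited.remove(nxt)

    dfs(source, [source], {source})
    return paths


# Hypothetical toy knowledge graph.
TOY_KG = [
    ("DrugA", "inhibits", "ProteinB"),
    ("ProteinB", "participates_in", "PathwayC"),
    ("PathwayC", "dysregulated_in", "DiseaseD"),
    ("ProteinB", "associated_with", "DiseaseD"),
    ("DrugA", "treats", "DiseaseD"),
]

for p in enumerate_paths(TOY_KG, "DrugA", "DiseaseD", exclude_predicates={"treats"}):
    print(" -> ".join(p))
```

Exhaustive enumeration grows rapidly with `max_hops` on a real biomedical KG. That growth is one reason a learned agent guided by demonstration paths is attractive there.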
Affiliation(s)
- Chunyu Ma
- Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
- Zhihan Zhou
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Han Liu
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- David Koslicki
- Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
- Department of Computer Science and Engineering, Pennsylvania State University, State College, PA 16801, USA
- Department of Biology, Pennsylvania State University, State College, PA 16801, USA