1
Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024; 11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024] Open
Abstract
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA.
- Ignacio J Tripodi
  - Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jonathan C Silverstein
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Charles Tapley Hoyt
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Scott A Malec
  - Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
  - Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Emanuele Cavalleri
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
- Marco Mesiti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Lucas A Gillenwater
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brook Santangelo
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tellen D Bennett
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael Bada
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- William A Baumgartner
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
2
Verma G, Rebholz-Schuhmann D, Madden MG. Enabling personalised disease diagnosis by combining a patient's time-specific gene expression profile with a biomedical knowledge base. BMC Bioinformatics 2024; 25:62. [PMID: 38326757 PMCID: PMC10848462 DOI: 10.1186/s12859-024-05674-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 01/25/2024] [Indexed: 02/09/2024] Open
Abstract
BACKGROUND: Recent developments in the domain of biomedical knowledge bases (KBs) open up new ways to exploit biomedical knowledge that is available in the form of KBs. Significant work has been done on biomedical KB creation and KB completion, specifically for KBs containing gene-disease associations and other related entities. However, the use of such biomedical KBs in combination with patients' temporal clinical data remains largely unexplored, although it has the potential to greatly benefit medical diagnostic decision support systems. RESULTS: We propose two new algorithms, LOADDx and SCADDx, to combine a patient's gene expression data with gene-disease association and other related information available in the form of a KB, to assist personalized disease diagnosis. We tested both algorithms on two KBs and on four real-world gene expression datasets of respiratory viral infection caused by influenza-like viruses of 19 subtypes. We also compare the performance of the proposed algorithms with that of five existing state-of-the-art machine learning algorithms (k-NN, Random Forest, XGBoost, Linear SVM, and SVM with RBF Kernel) using two validation approaches: LOOCV and a single internal validation set. Both SCADDx and LOADDx outperform the existing algorithms when evaluated with both validation approaches. SCADDx is able to detect infections with up to 100% accuracy in the cases of Datasets 2 and 3. Overall, SCADDx and LOADDx are able to detect an infection within 72 h of infection with 91.38% and 92.66% average accuracy, respectively, across all four datasets, whereas XGBoost, which performed best among the existing machine learning algorithms, detects the infection with only 86.43% accuracy on average. CONCLUSIONS: We demonstrate how our novel idea of using the most and least differentially expressed genes in combination with a KB can enable identification of the diseases that a patient is most likely to have at a particular time, from a KB with thousands of diseases. Moreover, the proposed algorithms can provide a short ranked list of the most likely diseases for each patient along with their most affected genes, and other entities linked with them in the KB, which can support health care professionals in their decision-making.
Affiliation(s)
- Ghanshyam Verma
  - Insight Centre for Data Analytics, School of Computer Science, University of Galway, Galway, Ireland.
  - School of Computer Science, University of Galway, Galway, Ireland.
- Michael G Madden
  - Insight Centre for Data Analytics, School of Computer Science, University of Galway, Galway, Ireland
  - School of Computer Science, University of Galway, Galway, Ireland
3
Arsène S, Parès Y, Tixier E, Granjeon-Noriot S, Martin B, Bruezière L, Couty C, Courcelles E, Kahoul R, Pitrat J, Go N, Monteiro C, Kleine-Schultjann J, Jemai S, Pham E, Boissel JP, Kulesza A. In Silico Clinical Trials: Is It Possible? Methods Mol Biol 2024; 2716:51-99. [PMID: 37702936 DOI: 10.1007/978-1-0716-3449-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Modeling and simulation (M&S), including in silico (clinical) trials, helps accelerate drug research and development and reduce costs, and has given rise to the term "model-informed drug development (MIDD)." Data-driven, inferential approaches are now increasingly complemented by emerging complex physiologically based and knowledge-based disease (and drug) models, but the two differ in setup, bottlenecks, data requirements, and applications (reflecting the different scientific communities they arose from). At the same time, and within the MIDD landscape, regulators and drug developers are starting to embrace in silico trials as a potential tool to refine, reduce, and ultimately replace clinical trials. In effect, silos between the historically distinct modeling approaches are starting to break down. Widespread adoption of in silico trials still needs more collaboration between different stakeholders and established precedent use cases in key applications, which is currently impeded by a fragmented collection of tools and practices. To address these key challenges, efforts to establish best-practice workflows need to be undertaken and new collaborative M&S tools devised; this chapter attempts to provide a coherent set of solutions. First, a dedicated workflow for the in silico clinical trial (development) life cycle is provided, which takes up general ideas from the systems biology and quantitative systems pharmacology space and implements specific steps toward regulatory qualification. Then, key characteristics of an in silico trial software platform implementation are given, using the example of jinkō.ai (nova's end-to-end in silico clinical trial platform). Considering these enabling scientific and technological advances, future applications of in silico trials to refine, reduce, and replace clinical research are indicated, ranging from synthetic control strategies to digital twins, which overall show promise to begin a new era of more efficient drug development.
4
Lotz JC, Ropella G, Anderson P, Yang Q, Hedderich MA, Bailey J, Hunt CA. An exploration of knowledge-organizing technologies to advance transdisciplinary back pain research. JOR Spine 2023; 6:e1300. [PMID: 38156063 PMCID: PMC10751978 DOI: 10.1002/jsp2.1300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 10/02/2023] [Accepted: 10/29/2023] [Indexed: 12/30/2023] Open
Abstract
Chronic low back pain (LBP) is influenced by a broad spectrum of patient-specific factors as codified in domains of the biopsychosocial model (BSM). Operationalizing the BSM into research and clinical care is challenging because most investigators work in silos that concentrate on only one or two BSM domains. Furthermore, the expanding, multidisciplinary nature of BSM research creates practical limitations as to how individual investigators integrate current data into their processes of generating impactful hypotheses. The rapidly advancing field of artificial intelligence (AI) is providing new tools for organizing knowledge, but the practical aspects of how AI may advance LBP research and clinical care are only beginning to be explored. The goals of the work presented here are to (1) explore the current capabilities of knowledge integration technologies (large language models (LLMs), similarity graphs (SGs), and knowledge graphs (KGs)) to synthesize biomedical literature and depict multimodal relationships reflected in the BSM, and (2) highlight limitations, implementation details, and future areas of research to improve performance. We demonstrate preliminary evidence that LLMs, like GPT-3, may be useful in helping scientists analyze and distinguish cLBP publications across multiple BSM domains and determine the degree to which the literature supports or contradicts emergent hypotheses. We show that SG representations and KGs enable exploring the LBP literature in novel ways, possibly providing trans-disciplinary perspectives or insights that are currently difficult, if not infeasible, to achieve. The SG approach is automated, simple, and inexpensive to execute, and thereby may be useful for early-phase literature and narrative explorations beyond one's areas of expertise. Likewise, we show that KGs can be constructed using automated pipelines, queried to provide semantic information, and analyzed to explore trans-domain linkages. The examples presented support the feasibility of LBP-tailored AI protocols to organize knowledge and support developing and refining trans-domain hypotheses.
Affiliation(s)
- Jeffrey C. Lotz
  - Department of Orthopaedic Surgery, University of California at San Francisco, San Francisco, California, USA
- Paul Anderson
  - Department of Computer Science & Software Engineering, California Polytechnic State University, San Luis Obispo, California, USA
- Qian Yang
  - Department of Information Science, Cornell University, Ithaca, New York, USA
- Jeannie Bailey
  - Department of Orthopaedic Surgery, University of California at San Francisco, San Francisco, California, USA
- C. Anthony Hunt
  - Department of Bioengineering & Therapeutic Sciences, University of California at San Francisco, San Francisco, California, USA
5
Sun Z, Lin M, Zhu Q, Xie Q, Wang F, Lu Z, Peng Y. A scoping review on multimodal deep learning in biomedical images and texts. J Biomed Inform 2023; 146:104482. [PMID: 37652343 PMCID: PMC10591890 DOI: 10.1016/j.jbi.2023.104482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/18/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
OBJECTIVE: Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it has only recently caught researchers' attention. There is therefore a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions. METHODS: In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps, with a focus on joint learning from biomedical images and texts, mainly because these two were the most commonly available data types in MDL research. RESULTS: This study reviewed the current uses of multimodal deep learning on five tasks: (1) report generation, (2) visual question answering, (3) cross-modal retrieval, (4) computer-aided diagnosis, and (5) semantic segmentation. CONCLUSION: Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate the collaboration of natural language processing (NLP) and medical imaging communities and support the next generation of decision-making and computer-assisted diagnostic system development.
Affiliation(s)
- Zhaoyi Sun
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
- Mingquan Lin
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
- Qingqing Zhu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.
- Qianqian Xie
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
- Fei Wang
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.
- Yifan Peng
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
6
Lobentanzer S, Aloy P, Baumbach J, Bohar B, Carey VJ, Charoentong P, Danhauser K, Doğan T, Dreo J, Dunham I, Farr E, Fernandez-Torras A, Gyori BM, Hartung M, Hoyt CT, Klein C, Korcsmaros T, Maier A, Mann M, Ochoa D, Pareja-Lorente E, Popp F, Preusse M, Probul N, Schwikowski B, Sen B, Strauss MT, Turei D, Ulusoy E, Waltemath D, Wodke JAH, Saez-Rodriguez J. Democratizing knowledge representation with BioCypher. Nat Biotechnol 2023; 41:1056-1059. [PMID: 37337100 DOI: 10.1038/s41587-023-01848-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2023]
Affiliation(s)
- Sebastian Lobentanzer
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
- Patrick Aloy
  - Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Balazs Bohar
  - Earlham Institute, Norwich, UK
  - Biological Research Centre, Szeged, Hungary
- Vincent J Carey
  - Channing Division of Network Medicine, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Pornpimol Charoentong
  - Centre for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Heidelberg, Germany
  - Department of Medical Oncology, National Centre for Tumour Diseases (NCT), Heidelberg University Hospital (UKHD), Heidelberg, Germany
- Katharina Danhauser
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany
- Tunca Doğan
  - Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- Johann Dreo
  - Computational Systems Biomedicine Lab, Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
  - Bioinformatics and Biostatistics Hub, Institut Pasteur, Université Paris Cité, Paris, France
- Ian Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, UK
- Elias Farr
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Adrià Fernandez-Torras
  - Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
- Benjamin M Gyori
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Michael Hartung
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Christoph Klein
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany
- Tamas Korcsmaros
  - Earlham Institute, Norwich, UK
  - Imperial College London, London, UK
  - Quadram Institute Bioscience, Norwich, UK
- Andreas Maier
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Matthias Mann
  - Proteomics Program, Novo Nordisk Foundation Centre for Protein Research, University of Copenhagen, Copenhagen, Denmark
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- David Ochoa
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, UK
- Elena Pareja-Lorente
  - Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
- Ferdinand Popp
  - Applied Tumour Immunity Clinical Cooperation Unit, National Centre for Tumour Diseases (NCT), German Cancer Research Centre (DKFZ), Heidelberg, Germany
- Martin Preusse
  - German Centre for Diabetes Research (DZD), Neuherberg, Germany
- Niklas Probul
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Benno Schwikowski
  - Computational Systems Biomedicine Lab, Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Bünyamin Sen
  - Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- Maximilian T Strauss
  - Proteomics Program, Novo Nordisk Foundation Centre for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Denes Turei
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Erva Ulusoy
  - Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- Dagmar Waltemath
  - Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, Germany
- Judith A H Wodke
  - Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, Germany
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
7
Morgan JP, Paiement A, Klinke C. Domain-informed graph neural networks: A quantum chemistry case study. Neural Netw 2023; 165:938-952. [PMID: 37453397 DOI: 10.1016/j.neunet.2023.06.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 05/05/2023] [Accepted: 06/24/2023] [Indexed: 07/18/2023]
Abstract
We explore different strategies to integrate prior domain knowledge into the design of graph neural networks (GNN). Our study is supported by a use-case of estimating the potential energy of chemical systems (molecules and crystals) represented as graphs. We integrate two elements of domain knowledge into the design of the GNN to constrain and regularise its learning, towards higher accuracy and generalisation. First, knowledge on the existence of different types of relations/graph edges (e.g. chemical bonds in our case study) between nodes of the graph is used to modulate their interactions. We formulate and compare two strategies, namely specialised message production and specialised update of internal states. Second, knowledge of the relevance of some physical quantities is used to constrain the learnt features towards a higher physical relevance using a simple multi-task learning (MTL) paradigm. We explore the potential of MTL to better capture the underlying mechanisms behind the studied phenomenon. We demonstrate the general applicability of our two knowledge integrations by applying them to three architectures that rely on different mechanisms to propagate information between nodes and to update node states. Our implementations are made publicly available. To support these experiments, we release three new datasets of out-of-equilibrium molecules and crystals of various complexities.
Affiliation(s)
- Jay Paul Morgan
  - Université de Toulon, Aix Marseille Univ, CNRS, LIS, Marseille, France; Department of Computer Science, Swansea University, Swansea, SA2 8PP, United Kingdom.
- Adeline Paiement
  - Université de Toulon, Aix Marseille Univ, CNRS, LIS, Marseille, France; Department of Computer Science, Swansea University, Swansea, SA2 8PP, United Kingdom.
- Christian Klinke
  - Institute of Physics, University of Rostock, Rostock, 18059, Germany; Department "Life, Light & Matter", University of Rostock, Rostock, 18059, Germany; Department of Chemistry, Swansea University, Swansea, SA2 8PP, United Kingdom.
8
Hou Y, Yeung J, Xu H, Su C, Wang F, Zhang R. From Answers to Insights: Unveiling the Strengths and Limitations of ChatGPT and Biomedical Knowledge Graphs. RESEARCH SQUARE 2023:rs.3.rs-3185632. [PMID: 37577545 PMCID: PMC10418534 DOI: 10.21203/rs.3.rs-3185632/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Purpose: Large Language Models (LLMs) have shown exceptional performance in various natural language processing tasks, benefiting from their language generation capabilities and ability to acquire knowledge from unstructured text. However, in the biomedical domain, LLMs face limitations that lead to inaccurate and inconsistent answers. Knowledge Graphs (KGs) have emerged as valuable resources for organizing structured information. Biomedical Knowledge Graphs (BKGs) have gained significant attention for managing diverse and large-scale biomedical knowledge. The objective of this study is to assess and compare the capabilities of ChatGPT and existing BKGs in question-answering, biomedical knowledge discovery, and reasoning tasks within the biomedical domain. Methods: We conducted a series of experiments to assess the performance of ChatGPT and the BKGs in various aspects of querying existing biomedical knowledge, knowledge discovery, and knowledge reasoning. Firstly, we tasked ChatGPT with answering questions sourced from the "Alternative Medicine" sub-category of Yahoo! Answers and recorded the responses. Additionally, we queried BKG to retrieve the relevant knowledge records corresponding to the questions and assessed them manually. In another experiment, we formulated a prediction scenario to assess ChatGPT's ability to suggest potential drug/dietary supplement repurposing candidates. Simultaneously, we utilized BKG to perform link prediction for the same task. The outcomes of ChatGPT and BKG were compared and analyzed. Furthermore, we evaluated ChatGPT and BKG's capabilities in establishing associations between pairs of proposed entities. This evaluation aimed to assess their reasoning abilities and the extent to which they can infer connections within the knowledge domain. Results: The results indicate that ChatGPT with GPT-4.0 outperforms both GPT-3.5 and BKGs in providing existing information. However, BKGs demonstrate higher reliability in terms of information accuracy. ChatGPT exhibits limitations in performing novel discoveries and reasoning, particularly in establishing structured links between entities compared to BKGs. Conclusions: To address the limitations observed, future research should focus on integrating LLMs and BKGs to leverage the strengths of both approaches. Such integration would optimize task performance and mitigate potential risks, leading to advancements in knowledge within the biomedical field and contributing to the overall well-being of individuals.
9
Boguslav MR, Salem NM, White EK, Sullivan KJ, Bada M, Hernandez TL, Leach SM, Hunter LE. Creating an ignorance-base: Exploring known unknowns in the scientific literature. J Biomed Inform 2023; 143:104405. [PMID: 37270143 PMCID: PMC10528083 DOI: 10.1016/j.jbi.2023.104405] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/08/2022] [Revised: 05/18/2023] [Accepted: 05/21/2023] [Indexed: 06/05/2023]
Abstract
BACKGROUND: Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge-bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition. RESULTS: We present the first ignorance-base, a knowledge-base created by combining classifiers to recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge-base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). The researchers could look to the field of neuroscience for potential answers to the ignorance statements. CONCLUSION: Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of and focus on the known unknowns and their respective goals for scientific knowledge.
Affiliation(s)
- Mayla R Boguslav
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Nourah M Salem
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Elizabeth K White
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA; Center for Genes, Environment and Health, National Jewish Health, Jackson Street, Denver, 80206, CO, USA
- Katherine J Sullivan
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Michael Bada
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Teri L Hernandez
  - College of Nursing, Department of Medicine/Division of Endocrinology, Metabolism, & Diabetes, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Sonia M Leach
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA; Center for Genes, Environment and Health, National Jewish Health, Jackson Street, Denver, 80206, CO, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
|
10
|
Hou Y, Yeung J, Xu H, Su C, Wang F, Zhang R. From Answers to Insights: Unveiling the Strengths and Limitations of ChatGPT and Biomedical Knowledge Graphs. medRxiv 2023:2023.06.09.23291208. [PMID: 37398259 PMCID: PMC10312889 DOI: 10.1101/2023.06.09.23291208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Large Language Models (LLMs) have demonstrated exceptional performance in various natural language processing tasks, utilizing their language generation capabilities and knowledge acquisition potential from unstructured text. However, when applied to the biomedical domain, LLMs encounter limitations, resulting in erroneous and inconsistent answers. Knowledge Graphs (KGs) have emerged as valuable resources for structured information representation and organization. Specifically, Biomedical Knowledge Graphs (BKGs) have attracted significant interest in managing large-scale and heterogeneous biomedical knowledge. This study evaluates the capabilities of ChatGPT and existing BKGs in question answering, knowledge discovery, and reasoning. Results indicate that while ChatGPT with GPT-4.0 surpasses both GPT-3.5 and BKGs in providing existing information, BKGs demonstrate superior information reliability. Additionally, ChatGPT exhibits limitations in performing novel discoveries and reasoning, particularly in establishing structured links between entities compared to BKGs. To overcome these limitations, future research should focus on integrating LLMs and BKGs to leverage their respective strengths. Such an integrated approach would optimize task performance and mitigate potential risks, thereby advancing knowledge in the biomedical field and contributing to overall well-being.
Affiliation(s)
- Yu Hou
  - Department of Surgery, University of Minnesota, Minneapolis, MN, USA
- Jeremy Yeung
  - Department of Surgery, University of Minnesota, Minneapolis, MN, USA
- Hua Xu
  - Section of Biomedical Informatics and Data Science, Yale University, New Haven, Connecticut, USA
- Chang Su
  - Department of Health Service Administration and Policy, Temple University, Philadelphia, PA, USA
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Rui Zhang
  - Department of Surgery, University of Minnesota, Minneapolis, MN, USA
|
11
|
Cenikj G, Strojnik L, Angelski R, Ogrinc N, Koroušić Seljak B, Eftimov T. From language models to large-scale food and biomedical knowledge graphs. Sci Rep 2023; 13:7815. [PMID: 37188766 DOI: 10.1038/s41598-023-34981-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 05/10/2023] [Indexed: 05/17/2023] Open
Abstract
Knowledge about the interactions between dietary and biomedical factors is scattered throughout countless research articles in an unstructured form (e.g., text and images) and requires automatic structuring so that it can be provided to medical professionals in a suitable format. Various biomedical knowledge graphs exist; however, they require further extension with relations between food and biomedical entities. In this study, we evaluate the performance of three state-of-the-art relation-mining pipelines (FooDis, FoodChem and ChemDis) which extract relations between food, chemical and disease entities from textual data. We perform two case studies, where relations were automatically extracted by the pipelines and validated by domain experts. The results show that the pipelines can extract relations with an average precision around 70%, making new discoveries available to domain experts with reduced human effort, since the domain experts need only evaluate the results instead of finding and reading all new scientific papers.
Affiliation(s)
- Gjorgjina Cenikj
  - Jožef Stefan Institute, Ljubljana, 1000, Slovenia
  - Jožef Stefan International Postgraduate School, Ljubljana, 1000, Slovenia
- Nives Ogrinc
  - Jožef Stefan Institute, Ljubljana, 1000, Slovenia
- Tome Eftimov
  - Jožef Stefan Institute, Ljubljana, 1000, Slovenia
|
12
|
Su C, Hou Y, Zhou M, Rajendran S, Maasch JRA, Abedi Z, Zhang H, Bai Z, Cuturrufo A, Guo W, Chaudhry FF, Ghahramani G, Tang J, Cheng F, Li Y, Zhang R, DeKosky ST, Bian J, Wang F. Biomedical discovery through the integrative biomedical knowledge hub (iBKH). iScience 2023; 26:106460. [PMID: 37020958 PMCID: PMC10068563 DOI: 10.1016/j.isci.2023.106460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 09/20/2022] [Accepted: 03/16/2023] [Indexed: 04/01/2023] Open
Abstract
The abundance of biomedical knowledge gained from biological experiments and clinical practices is an invaluable resource for biomedicine. The emerging biomedical knowledge graphs (BKGs) provide an efficient and effective way to manage the abundant knowledge in biomedical and life science. In this study, we created a comprehensive BKG called the integrative Biomedical Knowledge Hub (iBKH) by harmonizing and integrating information from diverse biomedical resources. To make iBKH easily accessible for biomedical research, we developed a web-based, user-friendly graphical portal that allows fast and interactive knowledge retrieval. We also implemented an efficient and scalable graph learning pipeline for discovering novel biomedical knowledge in iBKH. As a proof of concept, we applied our iBKH-based method to in-silico drug repurposing for Alzheimer's disease. The iBKH is publicly available.
Affiliation(s)
- Chang Su
  - Department of Health Service Administration and Policy, College of Public Health, Temple University, Philadelphia, PA 19122, USA
- Yu Hou
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
- Manqi Zhou
  - Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
- Suraj Rajendran
  - Tri-Institutional Computational Biology & Medicine Program, Cornell University, New York, NY 10065, USA
- Zehra Abedi
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
- Haotan Zhang
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- Zilong Bai
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
- Winston Guo
  - Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Fayzan F. Chaudhry
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- Gregory Ghahramani
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- Jian Tang
  - Mila-Quebec AI Institute and HEC Montreal, Montreal, QC H2S 3H1, Canada
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Yue Li
  - School of Computer Science, McGill University, Montreal, QC H3A 0C6, Canada
- Rui Zhang
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
- Steven T. DeKosky
  - Department of Neurology, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Jiang Bian
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
|
13
|
Moris D, Henao R, Hensman H, Stempora L, Chasse S, Schobel S, Dente CJ, Kirk AD, Elster E. Multidimensional machine learning models predicting outcomes after trauma. Surgery 2022; 172:1851-1859. [PMID: 36116976 DOI: 10.1016/j.surg.2022.08.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 08/01/2022] [Accepted: 08/04/2022] [Indexed: 01/07/2023]
Abstract
BACKGROUND
An emerging body of literature supports the role of individualized prognostic tools to guide the management of patients after trauma. The aim of this study was to develop advanced modeling tools from multidimensional data sources, including immunological analytes and clinical and administrative data, to predict outcomes in trauma patients.
METHODS
This was a prospective study of trauma patients at Level 1 centers from 2015 to 2019. Clinical, flow cytometry, and serum cytokine data were collected within 48 hours of admission. Sparse logistic regression models were developed, jointly selecting predictors and estimating the risk of ventilator-associated pneumonia, acute kidney injury, complicated disposition (death, rehabilitation, or nursing facility), and return to the operating room. Model parameters (regularization controlling model sparsity) and performance estimation were obtained via nested leave-one-out cross-validation.
RESULTS
A total of 179 patients were included. The incidences of ventilator-associated pneumonia, acute kidney injury, complicated disposition, and return to the operating room were 17.7%, 28.8%, 22.5%, and 12.3%, respectively. Regarding extensive resource use, 30.7% of patients had prolonged intensive care unit stay, 73.2% had prolonged length of stay, and 23.5% had need for prolonged ventilatory support. The models were developed and cross-validated for ventilator-associated pneumonia, acute kidney injury, complicated dispositions, and return to the operating room, yielding predictive areas under the curve from 0.70 to 0.91. Each model derived its optimal predictive value by combining clinical, administrative, and immunological analyte data.
CONCLUSION
Clinical, immunological, and administrative data can be combined to predict post-traumatic outcomes and resource use. Multidimensional machine learning modeling can identify trauma patients with complicated clinical trajectories and high resource needs.
Affiliation(s)
- Hannah Hensman
  - DecisionQ, Arlington, VA; Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences, Bethesda, MD
- Linda Stempora
  - Medical Center, Duke University, Durham, NC; Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences, Bethesda, MD
- Scott Chasse
  - Medical Center, Duke University, Durham, NC; Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences, Bethesda, MD
- Seth Schobel
  - Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences, Bethesda, MD; Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc, Bethesda, MD
- Allan D Kirk
  - Medical Center, Duke University, Durham, NC; Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences, Bethesda, MD
- Eric Elster
  - Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences, Bethesda, MD; Walter Reed National Military Medical Center, Bethesda, MD
|
14
|
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Affiliation(s)
- Michelle M Li
  - Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kexin Huang
  - Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
|
15
|
Turki H, Rasberry L, Ali Hadj Taieb M, Mietchen D, Ben Aouicha M, Pouris A, Bousrih Y. Letter to the Editor: FHIR RDF - Why the world needs structured electronic health records. J Biomed Inform 2022; 136:104253. [DOI: 10.1016/j.jbi.2022.104253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 11/21/2022]
|
16
|
DeBellis M, Dutta B. From ontology to knowledge graph with agile methods: the case of COVID-19 CODO knowledge graph. International Journal of Web Information Systems 2022. [DOI: 10.1108/ijwis-03-2022-0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose
The purpose of this paper is to describe the CODO ontology (COviD-19 Ontology) that captures epidemiological data about the COVID-19 pandemic in a knowledge graph that follows the FAIR principles. This study took information from spreadsheets and integrated it into a knowledge graph that could be queried with SPARQL and visualized with the Gruff tool in AllegroGraph.
Design/methodology/approach
The knowledge graph was designed with the Web Ontology Language. The methodology was a hybrid approach integrating the YAMO methodology for ontology design and Agile methods to define iterations and approach to requirements, testing and implementation.
Findings
The hybrid approach demonstrated that Agile can bring the same benefits to knowledge graph projects as it has to other projects. The two-person team went from an ontology to a large knowledge graph with approximately 5 M triples in a few months. The authors gathered useful real-world experience on how to most effectively transform “from strings to things.”
Originality/value
This study is the only FAIR model (to the best of the authors’ knowledge) to address epidemiology data for the COVID-19 pandemic. It also brought to light several practical issues that generalize to other studies wishing to go from an ontology to a large knowledge graph. This study is one of the first studies to document how the Agile approach can be used for knowledge graph development.
|
17
|
Fry M. Question-driven stepwise experimental discoveries in biochemistry: two case studies. History and Philosophy of the Life Sciences 2022; 44:12. [PMID: 35320436 DOI: 10.1007/s40656-022-00491-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2021] [Accepted: 02/09/2022] [Indexed: 06/14/2023]
Abstract
Philosophers of science diverge on the question of what drives the growth of scientific knowledge. Most of the twentieth century was dominated by the notion that theories propel that growth whereas experiments play secondary roles of operating within the theoretical framework or testing theoretical predictions. New experimentalism, a school of thought pioneered by Ian Hacking in the early 1980s, challenged this view by arguing that theory-free exploratory experimentation may in many cases effectively probe nature and potentially spawn higher evidence-based theories. Because theories are often powerless to envisage workings of complex biological systems, theory-independent experimentation is common in the life sciences. Some such experiments are triggered by compelling observation, others are prompted by innovative techniques or instruments, whereas different investigations query big data to identify regularities and underlying organizing principles. A distinct fourth type of experiment is motivated by a major question. Here I describe two question-guided experimental discoveries in biochemistry: the cyclic adenosine monophosphate mediator of hormone action and the ubiquitin-mediated system of protein degradation. Lacking underlying theories, antecedent databases, or new techniques, the sole guides of the two discoveries were respective substantial questions. Both research projects were similarly instigated by theory-free exploratory experimentation and continued in alternating phases of results-based interim working hypotheses, their examination by experiment, provisional hypotheses again, and so on. These two cases designate theory-free, question-guided, stepwise biochemical investigations as a distinct subtype of the new experimentalism mode of scientific enquiry.
Affiliation(s)
- Michael Fry
  - Department of Biochemistry, Rappaport Faculty of Medicine, Technion - Israel Institute of Technology, POB 9649, 31096, Haifa, Israel
|
18
|
Dedié A, Bleimehl T, Täger J, Preusse M, Hrabě de Angelis M, Jarasch A. DZDconnect: mit vernetzten Daten gegen Diabetes [DZDconnect: fighting diabetes with connected data]. Diabetologe 2021. [DOI: 10.1007/s11428-021-00807-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
|
19
|
Mann M, Kumar C, Zeng WF, Strauss MT. Artificial intelligence for proteomics and biomarker discovery. Cell Syst 2021; 12:759-770. [PMID: 34411543 DOI: 10.1016/j.cels.2021.06.006] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/28/2021] [Indexed: 12/14/2022]
Abstract
There is an avalanche of biomedical data generation and a parallel expansion in computational capabilities to analyze and make sense of these data. Starting with genome sequencing and widely employed deep sequencing technologies, these trends have now taken hold in all omics disciplines and increasingly call for multi-omics integration as well as data interpretation by artificial intelligence technologies. Here, we focus on mass spectrometry (MS)-based proteomics and describe how machine learning and, in particular, deep learning now predicts experimental peptide measurements from amino acid sequences alone. This will dramatically improve the quality and reliability of analytical workflows because experimental results should agree with predictions in a multi-dimensional data landscape. Machine learning has also become central to biomarker discovery from proteomics data, which now starts to outperform existing best-in-class assays. Finally, we discuss model transparency and explainability and data privacy that are required to deploy MS-based biomarkers in clinical settings.
Affiliation(s)
- Matthias Mann
  - Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Chanchal Kumar
  - Translational Science & Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Wen-Feng Zeng
  - Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
|
20
|
Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021; 22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open
Abstract
Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are employed in almost every major biological database. Ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
Affiliation(s)
- Xin Gao
  - Computational Bioscience Research Center and lead of the Structural and Functional Bioinformatics Group at King Abdullah University of Science and Technology
|