1
Wang C, Yang Y, Song J, Nan X. Research Progresses and Applications of Knowledge Graph Embedding Technique in Chemistry. J Chem Inf Model 2024. [PMID: 39302256 DOI: 10.1021/acs.jcim.4c00791] [Indexed: 09/22/2024]
Abstract
A knowledge graph (KG) is a technique for modeling entities and their interrelations. Knowledge graph embedding (KGE) translates these entities and relationships into a continuous vector space to facilitate dense and efficient representations. In the domain of chemistry, applying KG and KGE techniques integrates heterogeneous chemical information into a coherent and user-friendly framework, enhances the representation of chemical data features, and is beneficial for downstream tasks, such as chemical property prediction. This paper begins with a comprehensive review of classical and contemporary KGE methodologies, including distance-based models, semantic matching models, and neural network-based approaches. We then catalogue the primary databases employed in chemistry and biochemistry that furnish the KGs with essential chemical data. Subsequently, we explore the latest applications of KG and KGE in chemistry, focusing on risk assessment, property prediction, and drug discovery. Finally, we discuss the current challenges to KG and KGE techniques and provide a perspective on their potential future developments.
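The distance-based family mentioned in the abstract can be illustrated with a TransE-style score, in which a triple (head, relation, tail) is considered plausible when head + relation lies close to tail in the embedding space. The sketch below is only illustrative and is not taken from the paper: the entity names, the relation name, and the three-dimensional vectors are all invented for the example.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style score: the negative L2 norm of (head + relation - tail).

    Scores closer to zero indicate a more plausible triple.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hypothetical embeddings, invented purely for illustration.
ethanol = [0.2, 0.5, -0.1]
has_functional_group = [0.3, -0.2, 0.4]
hydroxyl = [0.5, 0.3, 0.3]
benzene_ring = [-0.6, 0.9, 0.0]

# A triple the toy embeddings were constructed to satisfy.
plausible = transe_score(ethanol, has_functional_group, hydroxyl)
# A corrupted triple with the tail entity swapped out.
implausible = transe_score(ethanol, has_functional_group, benzene_ring)

print(plausible > implausible)
```

In a trained model the vectors would be learned so that observed triples score higher than corrupted ones; here they are fixed by hand only to show how the score is computed.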
Affiliation(s)
- Chuanghui Wang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Yunqing Yang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Jinshuai Song
  - Green Catalysis Center, College of Chemistry, Zhengzhou University, Zhengzhou 450001, China
- Xiaofei Nan
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
2
Piras A, Chenghao S, Sebek M, Ispirova G, Menichetti G. CPIExtract: A software package to collect and harmonize small molecule and protein interactions. bioRxiv 2024:2024.07.03.601957. [PMID: 39005430 PMCID: PMC11245042 DOI: 10.1101/2024.07.03.601957] [Indexed: 07/16/2024]
Abstract
The binding interactions between small molecules and proteins are the basis of cellular functions. Yet, experimental data available regarding compound-protein interaction is not harmonized into a single entity but rather scattered across multiple institutions, each maintaining databases with different formats. Extracting information from these multiple sources remains challenging due to data heterogeneity. Here, we present CPIExtract (Compound-Protein Interaction Extract), a tool to interactively extract experimental binding interaction data from multiple databases, perform filtering, and harmonize the resulting information, thus providing a gain of compound-protein interaction data. When compared to a single source, DrugBank, we show that it can collect more than 10 times the amount of annotations. The end-user can apply custom filtering to the aggregated output data and save it in any generic tabular file suitable for further downstream tasks such as network medicine analyses for drug repurposing and cross-validation of deep learning models.
Affiliation(s)
- Andrea Piras
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133, Milan, Italy
- Shi Chenghao
  - Network Science Institute, Northeastern University, 360 Huntington Ave, 02115, MA, USA
- Michael Sebek
  - Network Science Institute, Northeastern University, 360 Huntington Ave, 02115, MA, USA
- Gordana Ispirova
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Ave, 02115, MA, USA
- Giulia Menichetti
  - Network Science Institute, Northeastern University, 360 Huntington Ave, 02115, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Ave, 02115, MA, USA
  - Harvard Data Science Initiative, Harvard University, 114 Western Avenue, 02134, MA, USA
3
Rihm SD, Tan YR, Ang W, Hofmeister M, Deng X, Laksana MT, Quek HY, Bai J, Pascazio L, Siong SC, Akroyd J, Mosbach S, Kraft M. The digital lab manager: Automating research support. SLAS Technol 2024; 29:100135. [PMID: 38703999 DOI: 10.1016/j.slast.2024.100135] [Received: 01/27/2024] [Revised: 04/03/2024] [Accepted: 04/22/2024] [Indexed: 05/06/2024]
Abstract
Laboratory management automation is essential for achieving interoperability in the domain of experimental research and accelerating scientific discovery. The integration of resources and the sharing of knowledge across organisations enable scientific discoveries to be accelerated by increasing the productivity of laboratories, optimising funding efficiency, and addressing emerging global challenges. This paper presents a novel framework for digitalising and automating the administration of research laboratories through The World Avatar, an all-encompassing dynamic knowledge graph. This Digital Laboratory Framework serves as a flexible tool, enabling users to efficiently leverage data from diverse systems and formats without being confined to a specific software or protocol. Establishing dedicated ontologies and agents and combining them with technologies such as QR codes, RFID tags, and mobile apps enabled us to develop modular applications that tackle some key challenges related to lab management. Here, we showcase an automated tracking and intervention system for explosive chemicals as well as an easy-to-use mobile application for asset management and information retrieval. By implementing these, we have achieved semantic linking of BIM and BMS data with laboratory inventory and chemical knowledge. Our approach can capture the crucial data points and reduce inventory processing time. All data provenance is recorded following the FAIR principles, ensuring its accessibility and interoperability.
Affiliation(s)
- Simon D Rihm
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore; Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, United Kingdom; Department of Chemical & Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore
- Yong Ren Tan
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore
- Wilson Ang
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore
- Markus Hofmeister
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore; Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, United Kingdom; Department of Chemical & Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore
- Xinhong Deng
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore
- Michael Teguh Laksana
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore
- Hou Yee Quek
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore
- Jiaru Bai
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, United Kingdom
- Laura Pascazio
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore
- Sim Chun Siong
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore
- Jethro Akroyd
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore; Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, United Kingdom; CMCL Innovations, Sheraton House, Cambridge, CB3 0AX, United Kingdom
- Sebastian Mosbach
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore; Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, United Kingdom; CMCL Innovations, Sheraton House, Cambridge, CB3 0AX, United Kingdom
- Markus Kraft
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 CREATE Way, CREATE Tower, #05-05, 138602, Singapore; Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, United Kingdom; CMCL Innovations, Sheraton House, Cambridge, CB3 0AX, United Kingdom; School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459, Singapore; The Alan Turing Institute, 2QR, John Dodson House, 96 Euston Rd, London, NW1 2DB, United Kingdom.
4
da Silva RGL. The advancement of artificial intelligence in biomedical research and health innovation: challenges and opportunities in emerging economies. Global Health 2024; 20:44. [PMID: 38773458 PMCID: PMC11107016 DOI: 10.1186/s12992-024-01049-5] [Received: 11/10/2023] [Accepted: 04/30/2024] [Indexed: 05/23/2024] Open
Abstract
The advancement of artificial intelligence (AI), algorithm optimization and high-throughput experiments has enabled scientists to accelerate the discovery of new chemicals and materials with unprecedented efficiency, resilience and precision. In recent years, so-called autonomous experimentation (AE) systems have featured as a key AI innovation to enhance and accelerate research and development (R&D). Also known as self-driving laboratories or materials acceleration platforms, AE systems are digital platforms capable of running a large number of experiments autonomously. Those systems are rapidly impacting biomedical research and clinical innovation, in areas such as drug discovery, nanomedicine, precision oncology, and others. As it is expected that AE will impact healthcare innovation from local to global levels, its implications for science and technology in emerging economies should be examined. By examining the increasing relevance of AE in contemporary R&D activities, this article aims to explore the advancement of artificial intelligence in biomedical research and health innovation, highlighting its implications, challenges and opportunities in emerging economies. AE presents an opportunity for stakeholders from emerging economies to co-produce the global knowledge landscape of AI in health. However, asymmetries in R&D capabilities should be acknowledged, since emerging economies suffer from inadequacies and discontinuities in resources and funding. The establishment of decentralized AE infrastructures could help stakeholders overcome local restrictions and open avenues for more culturally diverse, equitable, and trustworthy development of AI in health-related R&D through meaningful partnerships and engagement.
Collaborations with innovators from emerging economies could facilitate anticipation of fiscal pressures in science and technology policies, obsolescence of knowledge infrastructures, ethical and regulatory policy lag, and other issues present in the Global South. Also, improving the cultural and geographical representativeness of AE helps foster the diffusion and acceptance of AI in health-related R&D worldwide. Institutional preparedness is critical and could enable stakeholders to navigate opportunities of AI in biomedical research and health innovation in the coming years.
Affiliation(s)
- Renan Gonçalves Leonel da Silva
  - Health Ethics and Policy Lab, Department of Health Sciences and Technology, ETH Zurich, Hottingerstrasse 10, HOA 17, Zurich, 8092, Switzerland.
5
Tran D, Pascazio L, Akroyd J, Mosbach S, Kraft M. Leveraging Text-to-Text Pretrained Language Models for Question Answering in Chemistry. ACS Omega 2024; 9:13883-13896. [PMID: 38559914 PMCID: PMC10976360 DOI: 10.1021/acsomega.3c08842] [Received: 11/07/2023] [Revised: 02/06/2024] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
In this study, we present a question answering (QA) system for chemistry, named Marie, with the use of a text-to-text pretrained language model to attain accurate data retrieval. The underlying data store is "The World Avatar" (TWA), a general world model consisting of a knowledge graph that evolves over time. TWA includes information about chemical species such as their chemical and physical properties, applications, and chemical classifications. Building upon our previous work on KGQA for chemistry, this advanced version of Marie leverages a fine-tuned Flan-T5 model to seamlessly translate natural language questions into SPARQL queries with no separate components for entity and relation linking. The developed QA system demonstrates competence in providing accurate results for complex queries that involve many relation hops as well as showcasing the ability to balance correctness and speed for real-world usage. This new approach offers significant advantages over the prior implementation that relied on knowledge graph embedding. Specifically, the updated system boasts high accuracy and great flexibility in accommodating changes and evolution of the data stored in the knowledge graph without necessitating retraining. Our evaluation results underscore the efficacy of the improved system, highlighting its superior accuracy and the ability in answering complex questions compared to its predecessor.
Affiliation(s)
- Dan Tran
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, Singapore 138602, Singapore
- Laura Pascazio
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, Singapore 138602, Singapore
- Jethro Akroyd
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, Singapore 138602, Singapore
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
  - CMCL Innovations, Sheraton House, Castle Park, Cambridge CB3 0AX, U.K.
- Sebastian Mosbach
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, Singapore 138602, Singapore
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
  - CMCL Innovations, Sheraton House, Castle Park, Cambridge CB3 0AX, U.K.
- Markus Kraft
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, Singapore 138602, Singapore
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
  - CMCL Innovations, Sheraton House, Castle Park, Cambridge CB3 0AX, U.K.
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, Singapore 637459, Singapore
  - The Alan Turing Institute, 96 Euston Rd., London NW1 2DB, U.K.
6
Montoya ID, Volkow ND. IUPHAR Review: New strategies for medications to treat substance use disorders. Pharmacol Res 2024; 200:107078. [PMID: 38246477 PMCID: PMC10922847 DOI: 10.1016/j.phrs.2024.107078] [Received: 12/13/2023] [Revised: 01/11/2024] [Accepted: 01/15/2024] [Indexed: 01/23/2024]
Abstract
Substance use disorders (SUDs) and drug overdose are a public health emergency and safe and effective treatments are urgently needed. Developing new medications to treat them is expensive, time-consuming, and the probability of a compound progressing to clinical trials and obtaining FDA-approval is low. The small number of FDA-approved medications for SUDs reflects the low interest of pharmaceutical companies to invest in this area due to market forces, characteristics of the population (e.g., stigma, and socio-economic and legal disadvantages), and the high bar regulatory agencies set for new medication approval. In consequence, most research on medications is funded by government agencies, such as the National Institute on Drug Abuse (NIDA). Multiple scientific opportunities are emerging that can accelerate the discovery and development of new medications for SUDs. These include fast and efficient tools to screen new molecules, discover new medication targets, use of big data to explore large clinical data sets and artificial intelligence (AI) applications to make predictions, and precision medicine tools to individualize and optimize treatments. This review provides a general description of these new research strategies for the development of medications to treat SUDs with emphasis on the gaps and scientific opportunities. It includes a brief overview of the rising public health toll of SUDs; the justification, challenges, and opportunities to develop new medications; and a discussion of medications and treatment endpoints that are being evaluated with support from NIDA.
Affiliation(s)
- Ivan D Montoya
  - Division of Therapeutics and Medical Consequences, National Institute on Drug Abuse, 3 White Flint North, North Bethesda, MD 20852, United States.
- Nora D Volkow
  - National Institute on Drug Abuse, 3 White Flint North, North Bethesda, MD 20852, United States
7
Bai J, Mosbach S, Taylor CJ, Karan D, Lee KF, Rihm SD, Akroyd J, Lapkin AA, Kraft M. A dynamic knowledge graph approach to distributed self-driving laboratories. Nat Commun 2024; 15:462. [PMID: 38263405 PMCID: PMC10805810 DOI: 10.1038/s41467-023-44599-9] [Received: 07/14/2023] [Accepted: 12/21/2023] [Indexed: 01/25/2024] Open
Abstract
The ability to integrate resources and share knowledge across organisations empowers scientists to expedite the scientific discovery process. This is especially crucial in addressing emerging global challenges that require global solutions. In this work, we develop an architecture for distributed self-driving laboratories within The World Avatar project, which seeks to create an all-encompassing digital twin based on a dynamic knowledge graph. We employ ontologies to capture data and material flows in design-make-test-analyse cycles, utilising autonomous agents as executable knowledge components to carry out the experimentation workflow. Data provenance is recorded to ensure its findability, accessibility, interoperability, and reusability. We demonstrate the practical application of our framework by linking two robots in Cambridge and Singapore for a collaborative closed-loop optimisation for a pharmaceutically-relevant aldol condensation reaction in real-time. The knowledge graph autonomously evolves toward the scientist's research goals, with the two robots effectively generating a Pareto front for cost-yield optimisation in three days.
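For readers unfamiliar with the term, a Pareto front for cost-yield optimisation is the set of experiments for which no other experiment is both cheaper and higher-yielding. The following minimal sketch shows how such a front could be extracted from a list of results. It is not the authors' implementation, and the (cost, yield) values are invented for illustration.

```python
def pareto_front(points):
    """Return the (cost, yield) points not dominated by any other point.

    A point is dominated when another point has cost no higher and yield no
    lower, and is strictly better in at least one of the two objectives.
    """
    front = []
    for cost, yld in points:
        dominated = any(
            (c <= cost and y >= yld) and (c < cost or y > yld)
            for c, y in points
        )
        if not dominated:
            front.append((cost, yld))
    return sorted(front)


# Hypothetical experiment outcomes as (cost, yield in percent).
experiments = [(1.0, 40.0), (2.0, 65.0), (2.5, 60.0), (3.0, 80.0), (3.5, 78.0)]
front = pareto_front(experiments)
print(front)
```

The quadratic scan is adequate for the small number of experiments a closed-loop campaign typically produces; a sort-based sweep would be preferable for much larger result sets.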
Affiliation(s)
- Jiaru Bai
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
- Sebastian Mosbach
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Connor J Taylor
  - Astex Pharmaceuticals, 436 Cambridge Science Park Milton Road, Cambridge, CB4 0QA, UK
  - Innovation Centre in Digital Molecular Technologies, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
  - Faculty of Engineering, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Dogancan Karan
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Kok Foong Lee
  - CMCL Innovations, Sheraton House, Cambridge, CB3 0AX, UK
- Simon D Rihm
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Jethro Akroyd
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Alexei A Lapkin
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
  - Innovation Centre in Digital Molecular Technologies, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Markus Kraft
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK.
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore.
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459, Singapore, Singapore.
  - The Alan Turing Institute, London, NW1 2DB, UK.