1
Dong C, Li D, Liu J. Glass Transition Temperature Prediction of Polymers via Graph Reinforcement Learning. Langmuir 2024; 40:18568-18580. [PMID: 39166275 DOI: 10.1021/acs.langmuir.4c01906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2024]
Abstract
An expansive array of graph-based models has been utilized for accurate prediction of the structure-property relation of polymers. However, these approaches notably underutilize unsupervised structural information. Concentrating on the domain of heterocyclic polymers, particularly polyimides, this study delves into the glass transition temperature (Tg) prediction, aiming to fully exploit the potential within both the global and local structures of molecules. To achieve this, a graph reinforcement learning framework termed Molecular Structural Regularized Graph Convolutional Network with Reinforcement Learning (MSRGCN-RL) is proposed. Experimental results highlight the crucial role of both global and local structural regularization in precise Tg prediction. Concurrently, optimization of MSRGCN training through RL proves essential. This research leads the way in integrating Graph Neural Networks (GNNs) with reinforcement learning methodologies for the property prediction of polymers.
Affiliation(s)
- Caibo Dong
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Dazi Li
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Jun Liu
  - Key Laboratory of Beijing City on Preparation and Processing of Novel Polymer Materials, Beijing University of Chemical Technology, Beijing 100029, China
  - State Key Laboratory of Organic-Inorganic Composites, College of Materials Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
2
Singh S, Kaur N, Gehlot A. Application of artificial intelligence in drug design: A review. Comput Biol Med 2024; 179:108810. [PMID: 38991316 DOI: 10.1016/j.compbiomed.2024.108810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/31/2024] [Accepted: 06/24/2024] [Indexed: 07/13/2024]
Abstract
Artificial intelligence (AI) is a field of computer science that involves acquiring information, developing rule bases, and mimicking human behaviour. The fundamental concept behind AI is to create intelligent computer systems that can operate with minimal human intervention or without any intervention at all. These rule-based systems are developed using various machine learning and deep learning models, enabling them to solve complex problems. AI is integrated with these models to learn, understand, and analyse provided data. The rapid advancement of AI is reshaping numerous industries, with the pharmaceutical sector experiencing a notable transformation. AI is increasingly being employed to automate, optimize, and personalize various facets of the pharmaceutical industry, particularly in pharmacological research. Traditional drug development methods are known for being time-consuming, expensive, and inefficient, often taking around a decade and costing billions of dollars. The integration of AI techniques addresses these challenges by enabling the examination of compounds with desired properties from a vast pool of input drugs. Furthermore, it plays a crucial role in drug screening by predicting toxicity, bioactivity, ADME properties (absorption, distribution, metabolism, and excretion), physicochemical properties, and more. AI enhances the drug design process by improving the efficiency and accuracy of predicting drug behaviour, interactions, and properties. These approaches further improve the precision of drug discovery processes and decrease clinical trial costs, leading to the development of more effective drugs.
Affiliation(s)
- Simrandeep Singh
  - Department of Electronics & Communication Engineering, UCRD, Chandigarh University, Gharuan, Punjab, India
- Navjot Kaur
  - Department of Pharmacognosy, Amar Shaheed Baba Ajit Singh Jujhar Singh Memorial College of Pharmacy, Bela, Ropar, India
- Anita Gehlot
  - Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India
3
Snow O, Kazemi A, Bhanshali F, Nasiri A, Rozek A, Ester M. Identifying Synergistic Components of Botanical Fungicide Formulations Using Interpretable Graph Neural Networks. J Chem Inf Model 2024; 64:5786-5795. [PMID: 39031079 DOI: 10.1021/acs.jcim.4c00128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/22/2024]
Abstract
Botanical formulations are promising candidates for developing new biopesticides that can protect crops from pests and diseases while reducing harm to the environment. These biopesticides can be combined with permeation enhancer compounds to boost their efficacy against pests and fungal diseases. However, finding synergistic combinations of these compounds is challenging due to the large and complex chemical space. In this paper, we propose a novel deep learning method that can predict the synergy of botanical products and permeation enhancers based on in vitro assay data. Our method uses a weighted combination of component feature vectors to represent the input mixtures, which enables the model to handle a variable number of components and to interpret the contribution of each component to the synergy. We also employ an ensemble of interpretation methods to provide insights into the underlying mechanisms of synergy. We validate our method by testing the predicted synergistic combinations in wet-lab experiments and show that our method can discover novel and effective biopesticides that would otherwise be difficult to find. Our method is generalizable and applicable to other domains, where predicting mixtures of chemical compounds is important.
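The weighted-combination idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mixture_representation`, the use of relative proportions as weights, and the normalisation step are all assumptions made for illustration. It shows only why a weighted sum yields a fixed-size vector for any number of components and why the weights expose each component's contribution.

```python
import numpy as np

def mixture_representation(component_features, weights):
    """Represent a mixture as a weighted sum of its component feature vectors.

    component_features: array of shape (n_components, n_features)
    weights: array of shape (n_components,), e.g. relative proportions
             (hypothetical choice; the abstract does not say what the weights are)
    """
    feats = np.asarray(component_features, dtype=float)
    w = np.asarray(weights, dtype=float)
    if feats.ndim != 2 or w.shape != (feats.shape[0],):
        raise ValueError("expected one weight per component feature vector")
    w = w / w.sum()  # normalise so the weights sum to 1 (an assumption)
    return w @ feats  # shape (n_features,), independent of n_components

# Mixtures with two and three components map to vectors of the same size.
two = mixture_representation([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
three = mixture_representation(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 1.0, 2.0]
)
```

Because the sum is order-independent, listing the components in a different order gives the same representation, which is a convenient property for unordered mixtures.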
Affiliation(s)
- Oliver Snow
  - Terramera, Vancouver, British Columbia V5Y 1K3, Canada
- Amirreza Kazemi
  - Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Alyas Nasiri
  - Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Annett Rozek
  - Terramera, Vancouver, British Columbia V5Y 1K3, Canada
- Martin Ester
  - Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
4
Luo Y, Shan W, Peng L, Luo L, Ding P, Liang W. A Computational Framework for Predicting Novel Drug Indications Using Graph Convolutional Network With Contrastive Learning. IEEE J Biomed Health Inform 2024; 28:4503-4511. [PMID: 38607707 DOI: 10.1109/jbhi.2024.3387937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2024]
Abstract
Inferring potential drug indications plays a vital role in the drug discovery process. It can be time-consuming and costly to discover novel drug indications through biological experiments. Recently, graph learning-based methods have gained popularity for this task. These methods typically treat the prediction task as a binary classification problem, focusing on modeling associations between drugs and diseases within a graph. However, labeled data for drug indication prediction is often limited and expensive to acquire. Contrastive learning addresses this challenge by aligning similar drug-disease pairs and separating dissimilar pairs in the embedding space. Thus, we developed a model called DrIGCL for drug indication prediction, which utilizes graph convolutional networks and contrastive learning. DrIGCL incorporates drug structure, disease comorbidities, and known drug indications to extract representations of drugs and diseases. By combining contrastive and classification losses, DrIGCL predicts drug indications effectively. In multiple runs of hold-out validation experiments, DrIGCL consistently outperformed existing computational methods for drug indication prediction, particularly in terms of top-k. Furthermore, our ablation study has demonstrated a significant improvement in the predictive capabilities of our model when utilizing contrastive learning. Finally, we validated the practical usefulness of DrIGCL by examining the predicted novel indications of Aspirin.
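As a rough illustration of combining a contrastive loss with a classification loss, the sketch below adds a classic pairwise margin-based contrastive term to binary cross-entropy. The abstract does not give DrIGCL's actual loss, so the specific contrastive form, the `margin` and `alpha` parameters, and all function names here are assumptions.

```python
import numpy as np

def bce_loss(probs, labels, eps=1e-12):
    """Binary cross-entropy over predicted association probabilities."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def contrastive_loss(drug_emb, disease_emb, labels, margin=1.0):
    """Pairwise margin contrastive loss (a stand-in, not the paper's loss).

    Positive pairs (label 1) are pulled together; negative pairs (label 0)
    are pushed at least `margin` apart in the embedding space.
    """
    diff = np.asarray(drug_emb, dtype=float) - np.asarray(disease_emb, dtype=float)
    dist = np.linalg.norm(diff, axis=1)
    y = np.asarray(labels, dtype=float)
    pos = y * dist ** 2
    neg = (1.0 - y) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pos + neg))

def combined_loss(probs, drug_emb, disease_emb, labels, alpha=0.5):
    """Classification loss plus a weighted contrastive term (alpha is hypothetical)."""
    return bce_loss(probs, labels) + alpha * contrastive_loss(
        drug_emb, disease_emb, labels
    )
```

In this formulation the contrastive term shapes the embedding space even for pairs the classifier already scores correctly, which is one plausible reason such a term could help when labels are scarce.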
5
Wang K, Zheng F, Cheng L, Dai HN, Dou Q, Qin J. Breast Cancer Classification From Digital Pathology Images via Connectivity-Aware Graph Transformer. IEEE Trans Med Imaging 2024; 43:2854-2865. [PMID: 38526888 DOI: 10.1109/tmi.2024.3381239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Automated classification of breast cancer subtypes from digital pathology images has been an extremely challenging task due to the complicated spatial patterns of cells in the tissue micro-environment. While newly proposed graph transformers are able to capture more long-range dependencies to enhance accuracy, they largely ignore the topological connectivity between graph nodes, which is nevertheless critical to extract more representative features to address this difficult task. In this paper, we propose a novel connectivity-aware graph transformer (CGT) for phenotyping the topology connectivity of the tissue graph constructed from digital pathology images for breast cancer classification. Our CGT seamlessly integrates connectivity embedding to node feature at every graph transformer layer by using local connectivity aggregation, in order to yield more comprehensive graph representations to distinguish different breast cancer subtypes. In light of the realistic intercellular communication mode, we then encode the spatial distance between two arbitrary nodes as connectivity bias in self-attention calculation, thereby allowing the CGT to distinctively harness the connectivity embedding based on the distance of two nodes. We extensively evaluate the proposed CGT on a large cohort of breast carcinoma digital pathology images stained by Haematoxylin & Eosin. Experimental results demonstrate the effectiveness of our CGT, which outperforms state-of-the-art methods by a large margin. Codes are released on https://github.com/wang-kang-6/CGT.
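The idea of using spatial distance as a bias in self-attention can be sketched as follows. This is a simplified illustration rather than the released CGT code: it uses a single head, takes queries, keys, and values to be the raw node features, and applies a fixed linear distance penalty controlled by a hypothetical `scale` parameter, whereas the paper's bias may well be learned.

```python
import numpy as np

def attention_with_distance_bias(x, coords, scale=1.0):
    """Single-head self-attention with an additive spatial-distance bias.

    x:      node features, shape (n_nodes, d_model)
    coords: node positions in the image, shape (n_nodes, 2)
    scale:  strength of the distance penalty (hypothetical parameter)
    """
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    d_model = x.shape[1]

    # Scaled dot-product scores; queries = keys = values = x for simplicity.
    scores = x @ x.T / np.sqrt(d_model)

    # Pairwise Euclidean distances between nodes.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Additive bias: more distant node pairs receive lower attention scores.
    scores = scores - scale * dist

    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights
```

With identical node features, the attention weights depend only on distance, so nearer nodes contribute more to each node's updated representation.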
6
Abbas MKG, Rassam A, Karamshahi F, Abunora R, Abouseada M. The Role of AI in Drug Discovery. Chembiochem 2024; 25:e202300816. [PMID: 38735845 DOI: 10.1002/cbic.202300816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/14/2024]
Abstract
The emergence of Artificial Intelligence (AI) in drug discovery marks a pivotal shift in pharmaceutical research, blending sophisticated computational techniques with conventional scientific exploration to break through enduring obstacles. This review paper elucidates the multifaceted applications of AI across various stages of drug development, highlighting significant advancements and methodologies. It delves into AI's instrumental role in drug design, polypharmacology, chemical synthesis, drug repurposing, and the prediction of drug properties such as toxicity, bioactivity, and physicochemical characteristics. Despite AI's promising advancements, the paper also addresses the challenges and limitations encountered in the field, including data quality, generalizability, computational demands, and ethical considerations. By offering a comprehensive overview of AI's role in drug discovery, this paper underscores the technology's potential to significantly enhance drug development, while also acknowledging the hurdles that must be overcome to fully realize its benefits.
Affiliation(s)
- M K G Abbas
  - Center for Advanced Materials, Qatar University, P.O. Box 2713, Doha, Qatar
- Abrar Rassam
  - Secondary Education, Educational Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
- Fatima Karamshahi
  - Department of Chemistry and Earth Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
- Rehab Abunora
  - Faculty of Medicine, General Medicine and Surgery, Helwan University, Cairo, Egypt
- Maha Abouseada
  - Department of Chemistry and Earth Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
7
Hu X, Sun Z, Nian Y, Wang Y, Dang Y, Li F, Feng J, Yu E, Tao C. Self-Explainable Graph Neural Network for Alzheimer Disease and Related Dementias Risk Prediction: Algorithm Development and Validation Study. JMIR Aging 2024; 7:e54748. [PMID: 38976869 PMCID: PMC11263893 DOI: 10.2196/54748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 03/31/2024] [Accepted: 06/02/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND: Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes.
OBJECTIVE: The study aims to use graph neural networks (GNNs) with claims data for ADRD risk prediction. Addressing the lack of human-interpretable reasons behind these predictions, we introduce an innovative, self-explainable method to evaluate relationship importance and its influence on ADRD risk prediction.
METHODS: We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method for estimating ADRD likelihood. This self-explainable method can provide a feature-importance explanation in the context of ADRD risk prediction, leveraging relational information within a graph. Three scenarios with 1-year, 2-year, and 3-year prediction windows were created to assess the model's efficiency. Random forest (RF) and light gradient boost machine (LGBM) were used as baselines. By using this method, we further clarify the key relationships for ADRD risk prediction.
RESULTS: In scenario 1, the VGNN model showed area under the receiver operating characteristic curve (AUROC) scores of 0.7272 and 0.7480 for the small subset and the matched cohort data set, respectively. It outperformed RF and LGBM by 10.6% and 9.1%, respectively, on average. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5% and 8.9%, respectively. Similarly, in scenario 3, AUROC scores of 0.7001 and 0.7187 were obtained, exceeding the baseline models by 10.1% and 8.5%, respectively. These results clearly demonstrate the significant superiority of the graph-based approach over the tree-based models (RF and LGBM) in predicting ADRD. Furthermore, the integration of the VGNN model and our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression.
CONCLUSIONS: Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.
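For readers unfamiliar with the AUROC metric reported above, the following minimal sketch computes it from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative only and have no connection to the study's data.

```python
import numpy as np

def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    y = np.asarray(labels).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Toy example: three of the four positive-negative pairs are ranked correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; production code would normally use a sorted-rank implementation.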
Affiliation(s)
- Xinyue Hu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Zenan Sun
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yi Nian
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yichen Wang
  - Division of Hospital Medicine at Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, United States
- Yifang Dang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Fang Li
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Jingna Feng
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Evan Yu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
8
Nguyen ATN, Nguyen DTN, Koh HY, Toskov J, MacLean W, Xu A, Zhang D, Webb GI, May LT, Halls ML. The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery. Br J Pharmacol 2024; 181:2371-2384. [PMID: 37161878 DOI: 10.1111/bph.16140] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/26/2022] [Revised: 04/14/2023] [Accepted: 04/27/2023] [Indexed: 05/11/2023] Open
Abstract
The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process "faster, smarter and cheaper," we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery. LINKED ARTICLES: This article is part of a themed issue Therapeutic Targeting of G Protein-Coupled Receptors: hot topics from the Australasian Society of Clinical and Experimental Pharmacologists and Toxicologists 2021 Virtual Annual Scientific Meeting. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v181.14/issuetoc.
Affiliation(s)
- Anh T N Nguyen
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Diep T N Nguyen
  - Department of Information Technology, Faculty of Engineering and Technology, Vietnam National University, Cau Giay, Hanoi, Vietnam
- Huan Yee Koh
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Jason Toskov
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- William MacLean
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Andrew Xu
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Daokun Zhang
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Geoffrey I Webb
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Lauren T May
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Michelle L Halls
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
9
Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M. A Comprehensive Survey on Deep Graph Representation Learning. Neural Netw 2024; 173:106207. [PMID: 38442651 DOI: 10.1016/j.neunet.2024.106207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 01/23/2024] [Accepted: 02/21/2024] [Indexed: 03/07/2024]
Abstract
Graph representation learning aims to effectively encode high-dimensional sparse graph-structured data into low-dimensional dense vectors, which is a fundamental task that has been widely studied in a range of fields, including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes in the graph should remain relatively close, thereby preserving the structural information between the nodes in the graph. However, this is sub-optimal because: (i) traditional methods have limited model capacity, which limits learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; (iii) representation learning and downstream tasks depend on each other and should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, and a large number of deep graph representation learning techniques, especially graph neural networks, have been proposed in the past decade. In this survey, we review current deep graph representation learning algorithms by proposing a new taxonomy of existing state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by graph neural network architecture and by the most recent advanced learning paradigms. Moreover, this survey also presents practical and promising applications of deep graph representation learning. Last but not least, we state new perspectives and suggest challenging directions that deserve further investigation in the future.
Affiliation(s)
- Wei Ju
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zheng Fang
  - School of Intelligence Science and Technology, Peking University, Beijing, 100871, China
- Yiyang Gu
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zequn Liu
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Qingqing Long
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100086, China
- Ziyue Qiao
  - Artificial Intelligence Thrust, The Hong Kong University of Science and Technology, Guangzhou, 511453, China
- Yifang Qin
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jianhao Shen
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Fang Sun
  - Department of Computer Science, University of California, Los Angeles, 90095, USA
- Zhiping Xiao
  - Department of Computer Science, University of California, Los Angeles, 90095, USA
- Junwei Yang
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jingyang Yuan
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yusheng Zhao
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yifan Wang
  - School of Information Technology & Management, University of International Business and Economics, Beijing, 100029, China
- Xiao Luo
  - Department of Computer Science, University of California, Los Angeles, 90095, USA
- Ming Zhang
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
10
Singh S, Zeh G, Freiherr J, Bauer T, Türkmen I, Grasskamp AT. Classification of substances by health hazard using deep neural networks and molecular electron densities. J Cheminform 2024; 16:45. [PMID: 38627862 PMCID: PMC11302296 DOI: 10.1186/s13321-024-00835-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 03/23/2024] [Indexed: 08/09/2024] Open
Abstract
In this paper we present a method that allows leveraging 3D electron density information to train a deep neural network pipeline to segment regions of high, medium and low electronegativity and classify substances as health hazardous or non-hazardous. We show that this can be used for use-cases such as cosmetics and food products. For this purpose, we first generate 3D electron density cubes using semiempirical molecular calculations for a custom European Chemicals Agency (ECHA) subset consisting of substances labelled as hazardous and non-hazardous for cosmetic usage. Together with their 3-class electronegativity maps we train a modified 3D-UNet with electron density cubes to segment reactive sites in molecules and classify substances with an accuracy of 78.1%. We perform the same process on a custom food dataset (CompFood) consisting of hazardous and non-hazardous substances compiled from European Food Safety Authority (EFSA) OpenFoodTox, Food and Drug Administration (FDA) Generally Recognized as Safe (GRAS) and FooDB datasets to achieve a classification accuracy of 64.1%. Our results show that 3D electron densities and particularly masked electron densities, calculated by taking a product of original electron densities and regions of high and low electronegativity can be used to classify molecules for different use-cases and thus serve not only to guide safe-by-design product development but also aid in regulatory decisions. SCIENTIFIC CONTRIBUTION: We aim to contribute to the diverse 3D molecular representations used for training machine learning algorithms by showing that a deep learning network can be trained on 3D electron density representation of molecules. This approach has previously not been used to train machine learning models and it allows utilization of the true spatial domain of the molecule for prediction of properties such as their suitability for usage in cosmetics and food products and in future, to other molecular properties. 
The data and code used for training are accessible at https://github.com/s-singh-ivv/eDen-Substances.
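The masked electron density described in this abstract (the product of the original density and the regions of high and low electronegativity) can be sketched in a few lines. This is an illustration under stated assumptions, not code from the linked repository: the class encoding (0 = low, 1 = medium, 2 = high), the function name `masked_density`, and the toy 2×2×2 cube are all hypothetical.

```python
import numpy as np

def masked_density(density, electronegativity_class, keep_classes=(0, 2)):
    """Zero out voxels outside the selected electronegativity classes.

    density:                 3D electron density cube
    electronegativity_class: integer cube of the same shape with the assumed
                             encoding 0 = low, 1 = medium, 2 = high
    keep_classes:            classes to retain (low and high by default)
    """
    density = np.asarray(density, dtype=float)
    classes = np.asarray(electronegativity_class)
    if density.shape != classes.shape:
        raise ValueError("density and class map must have the same shape")
    # Binary mask that is 1 in the retained regions and 0 elsewhere.
    mask = np.isin(classes, keep_classes).astype(float)
    return density * mask  # element-wise product

# Toy cube: voxel values 0..7 with a hand-written class map.
cube = np.arange(8, dtype=float).reshape(2, 2, 2)
class_map = np.array([[[0, 1], [2, 1]], [[1, 1], [0, 2]]])
masked = masked_density(cube, class_map)
```

Voxels labelled as medium electronegativity become zero while the rest keep their original density values, so the masked cube has the same shape as the input and can be fed to the same network.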
Affiliation(s)
- Satnam Singh
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
  - Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany
- Gina Zeh
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Jessica Freiherr
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
  - Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany
- Thilo Bauer
  - Computer Chemistry Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nägelsbachstr. 25, 91052, Erlangen, Germany
- Isik Türkmen
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Andreas T Grasskamp
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
11
Kengkanna A, Ohue M. Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX. Commun Chem 2024; 7:74. [PMID: 38580841 PMCID: PMC10997661 DOI: 10.1038/s42004-024-01155-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/07/2024] Open
Abstract
Graph Neural Networks (GNNs) excel in compound property and activity prediction, but the choice of molecular graph representations significantly influences model learning and interpretation. While atom-level molecular graphs resemble natural topology, they overlook key substructures or functional groups, and their interpretation only partially aligns with chemical intuition. Recent research suggests alternative representations using reduced molecular graphs to integrate higher-level chemical information and leverages both representations in modeling. However, there is a lack of studies on the applicability and impact of different molecular graphs on model learning and interpretation. Here, we introduce MMGX (Multiple Molecular Graph eXplainable discovery), investigating the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation from various perspectives. Our findings indicate that multiple graphs improve model performance, but to varying degrees depending on the dataset. Interpretation from multiple graphs in different views provides more comprehensive features and potential substructures consistent with background knowledge. These results help to understand model decisions and offer valuable insights for subsequent tasks. The concept of multiple molecular graph representations and diverse interpretation perspectives has broad applicability across tasks, architectures, and explanation techniques, enhancing model learning and interpretation for relevant applications in drug discovery.
Affiliation(s)
- Apakorn Kengkanna
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan
| | - Masahito Ohue
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan.
|
12
|
Nandi S, Bhaduri S, Das D, Ghosh P, Mandal M, Mitra P. Deciphering the Lexicon of Protein Targets: A Review on Multifaceted Drug Discovery in the Era of Artificial Intelligence. Mol Pharm 2024; 21:1563-1590. [PMID: 38466810 DOI: 10.1021/acs.molpharmaceut.3c01161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Understanding protein sequence and structure is essential for understanding protein-protein interactions (PPIs), which are essential for many biological processes and diseases. Targeting protein binding hot spots, which regulate signaling and growth, with rational drug design is promising. Rational drug design uses structural data and computational tools to study protein binding sites and protein interfaces to design inhibitors that can change these interactions, thereby potentially leading to therapeutic approaches. Artificial intelligence (AI), such as machine learning (ML) and deep learning (DL), has advanced drug discovery and design by providing computational resources and methods. Quantum chemistry is essential for drug reactivity, toxicology, drug screening, and quantitative structure-activity relationship (QSAR) properties. This review discusses the methodologies and challenges of identifying and characterizing hot spots and binding sites. It also explores the strategies and applications of artificial-intelligence-based rational drug design technologies that target proteins and PPI binding hot spots. It provides valuable insights for drug design with therapeutic implications. We have also demonstrated the pathological conditions of heat shock protein 27 (HSP27) and matrix metalloproteinases (MMP2 and MMP9) and designed inhibitors of these proteins using the drug discovery paradigm in a case study on the discovery of drug molecules for cancer treatment. Additionally, the implications of benzothiazole derivatives for anticancer drug design and discovery are discussed.
Affiliation(s)
- Suvendu Nandi
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Soumyadeep Bhaduri
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Debraj Das
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Priya Ghosh
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Mahitosh Mandal
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
| | - Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
|
13
|
Wei J, Xiao J, Chen S, Zong L, Gao X, Li Y. ProNet DB: a proteome-wise database for protein surface property representations and RNA-binding profiles. Database (Oxford) 2024; 2024:baae012. [PMID: 38557634 PMCID: PMC10984565 DOI: 10.1093/database/baae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 01/08/2024] [Accepted: 02/17/2024] [Indexed: 04/04/2024]
Abstract
The rapid growth in the number of experimental and predicted protein structures and more complicated protein structures poses a significant challenge for computational biology in leveraging structural information and accurate representation of protein surface properties. Recently, AlphaFold2 released the comprehensive proteomes of various species, and protein surface property representation plays a crucial role in protein-molecule interaction predictions, including those involving proteins, nucleic acids and compounds. Here, we proposed the first extensive database, namely ProNet DB, that integrates multiple protein surface representations and RNA-binding landscape for 326 175 protein structures. This collection encompasses the 16 model organism proteomes from the AlphaFold Protein Structure Database and experimentally validated structures from the Protein Data Bank. For each protein, ProNet DB provides access to the original protein structures along with the detailed surface property representations encompassing hydrophobicity, charge distribution and hydrogen bonding potential as well as interactive features such as the interacting face and RNA-binding sites and preferences. To facilitate an intuitive interpretation of these properties and the RNA-binding landscape, ProNet DB incorporates visualization tools like Mol* and an Online 3D Viewer, allowing for the direct observation and analysis of these representations on protein surfaces. The availability of pre-computed features enables instantaneous access for users, significantly advancing computational biology research in areas such as molecular mechanism elucidation, geometry-based drug discovery and the development of novel therapeutic approaches. Database URL: https://proj.cse.cuhk.edu.hk/aihlab/pronet/.
Affiliation(s)
- Junkang Wei
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
| | - Jin Xiao
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
| | - Siyuan Chen
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal 23955, Kingdom of Saudi Arabia
| | - Licheng Zong
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal 23955, Kingdom of Saudi Arabia
| | - Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
- The CUHK Shenzhen Research Institute, 4 Gaoxin Ave Nanshan, Shenzhen 518057, China
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 45 Carleton Street, Cambridge, MA 02142, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 201 Brookline Avenue, Boston, MA 02215, USA
- Broad Institute of MIT and Harvard, Merkin Building, 415 Main Street, Cambridge, MA 02142, USA
|
14
|
Taylor-King JP, Bronstein M, Roblin D. The Future of Machine Learning Within Target Identification: Causality, Reversibility, and Druggability. Clin Pharmacol Ther 2024; 115:655-657. [PMID: 38169071 DOI: 10.1002/cpt.3158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024]
Affiliation(s)
| | - Michael Bronstein
- Relation Therapeutics, London, UK
- Department of Computer Science, University of Oxford, Oxford, UK
| | - David Roblin
- Relation Therapeutics, London, UK
- The Francis Crick Institute, London, UK
|
15
|
Tayebi J, BabaAli B. EKGDR: An End-to-End Knowledge Graph-Based Method for Computational Drug Repurposing. J Chem Inf Model 2024; 64:1868-1881. [PMID: 38483449 DOI: 10.1021/acs.jcim.3c01925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
The lengthy and expensive process of developing new drugs from scratch, coupled with a high failure rate, has prompted the emergence of drug repurposing/repositioning as a more efficient and cost-effective approach. This approach involves identifying new therapeutic applications for existing approved drugs, leveraging the extensive drug-related data already gathered. However, the diversity and heterogeneity of data, along with the limited availability of known drug-disease interactions, pose significant challenges to computational drug design. To address these challenges, this study introduces EKGDR, an end-to-end knowledge graph-based approach for computational drug repurposing. EKGDR utilizes the power of a drug knowledge graph, a comprehensive repository of drug-related information that encompasses known drug interactions and various categorization information, as well as structural molecular descriptors of drugs. EKGDR employs graph neural networks, a cutting-edge graph representation learning technique, to embed the drug knowledge graph (nodes and relations) in an end-to-end manner. By doing so, EKGDR can effectively learn the underlying causes (intents) behind drug-disease interactions and recursively aggregate and combine relational messages between nodes along different multihop neighborhood paths (relational paths). This process generates representations of disease and drug nodes, enabling EKGDR to predict the interaction probability for each drug-disease pair in an end-to-end manner. The obtained results demonstrate that EKGDR outperforms previous models in all three evaluation metrics: area under the receiver operating characteristic curve (AUROC = 0.9475), area under the precision-recall curve (AUPRC = 0.9490), and recall at the top-200 recommendations (Recall@200 = 0.8315). To further validate EKGDR's effectiveness, we evaluated the top-20 candidate drugs suggested for each of Alzheimer's and Parkinson's diseases.
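As an illustration of the Recall@200 metric reported in this abstract, here is a minimal sketch. It assumes recall at K means the fraction of known positive items that appear among the K highest-scored candidates. The function name `recall_at_k` and the toy scores are hypothetical and are not taken from the EKGDR paper, whose exact evaluation protocol (for example, per-disease averaging) may differ.

```python
def recall_at_k(scores, positives, k):
    """Fraction of known positives ranked within the top-k candidates.

    scores:    dict mapping candidate id -> predicted interaction score
    positives: set of candidate ids known to be true interactions
    k:         number of top-ranked candidates to inspect
    """
    if not positives:
        raise ValueError("positives must be non-empty")
    # Rank candidates from highest to lowest predicted score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & set(positives)) / len(positives)


# Hypothetical toy data: four candidate drugs scored for one disease.
toy_scores = {"drug_a": 0.91, "drug_b": 0.15, "drug_c": 0.78, "drug_d": 0.40}
toy_positives = {"drug_a", "drug_d"}

print(recall_at_k(toy_scores, toy_positives, k=2))
```

With K = 2 only one of the two known positives is retrieved; raising K to cover all candidates retrieves both.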
Affiliation(s)
- Javad Tayebi
- School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran 141556455, Iran
| | - Bagher BabaAli
- School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran 141556455, Iran
|
16
|
Metzcar J, Jutzeler CR, Macklin P, Köhn-Luque A, Brüningk SC. A review of mechanistic learning in mathematical oncology. Front Immunol 2024; 15:1363144. [PMID: 38533513 PMCID: PMC10963621 DOI: 10.3389/fimmu.2024.1363144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 02/20/2024] [Indexed: 03/28/2024] Open
Abstract
Mechanistic learning refers to the synergistic combination of mechanistic mathematical modeling and data-driven machine or deep learning. This emerging field finds increasing applications in (mathematical) oncology. This review aims to capture the current state of the field and provides a perspective on how mechanistic learning may progress in the oncology domain. We highlight the synergistic potential of mechanistic learning and point out similarities and differences between purely data-driven and mechanistic approaches concerning model complexity, data requirements, outputs generated, and interpretability of the algorithms and their results. Four categories of mechanistic learning (sequential, parallel, extrinsic, intrinsic) are presented with specific examples. We discuss a range of techniques including physics-informed neural networks, surrogate model learning, and digital twins. Example applications address complex problems predominantly from the domain of oncology research such as longitudinal tumor response predictions or time-to-event modeling. As the field of mechanistic learning advances, we aim for this review and proposed categorization framework to foster additional collaboration between the data- and knowledge-driven modeling fields. Further collaboration will help address difficult issues in oncology such as limited data availability, requirements of model transparency, and complex input data which are embraced in a mechanistic learning framework.
Affiliation(s)
- John Metzcar
- Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Bloomington, IN, United States
- Informatics, Luddy School of Informatics, Computing, and Engineering, Bloomington, IN, United States
| | - Catherine R. Jutzeler
- Department of Health Sciences and Technology (D-HEST), Eidgenössische Technische Hochschule Zürich (ETH), Zürich, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Paul Macklin
- Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Bloomington, IN, United States
| | - Alvaro Köhn-Luque
- Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology, Research Support Services, Oslo University Hospital, Oslo, Norway
| | - Sarah C. Brüningk
- Department of Health Sciences and Technology (D-HEST), Eidgenössische Technische Hochschule Zürich (ETH), Zürich, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
|
17
|
Goshisht MK. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega 2024; 9:9921-9945. [PMID: 38463314 PMCID: PMC10918679 DOI: 10.1021/acsomega.3c05913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/19/2024] [Accepted: 01/30/2024] [Indexed: 03/12/2024]
Abstract
Machine learning (ML), particularly deep learning (DL), has made rapid and substantial progress in synthetic biology in recent years. Biotechnological applications of biosystems, including pathways, enzymes, and whole cells, are being probed frequently with time. The intricacy and interconnectedness of biosystems make it challenging to design them with the desired properties. ML and DL have a synergy with synthetic biology. Synthetic biology can be employed to produce large data sets for training models (for instance, by utilizing DNA synthesis), and ML/DL models can be employed to inform design (for example, by generating new parts or advising unrivaled experiments to perform). This potential has recently been brought to light by research at the intersection of engineering biology and ML/DL through achievements like the design of novel biological components, best experimental design, automated analysis of microscopy data, protein structure prediction, and biomolecular implementations of ANNs (Artificial Neural Networks). I have divided this review into three sections. In the first section, I describe predictive potential and basics of ML along with myriad applications in synthetic biology, especially in engineering cells, activity of proteins, and metabolic pathways. In the second section, I describe fundamental DL architectures and their applications in synthetic biology. Finally, I describe different challenges causing hurdles in the progress of ML/DL and synthetic biology along with their solutions.
Affiliation(s)
- Manoj Kumar Goshisht
- Department of Chemistry, Natural and Applied Sciences, University of Wisconsin-Green Bay, Green Bay, Wisconsin 54311-7001, United States
|
18
|
Mercha EM, Benbrahim H, Erradi M. Heterogeneous text graph for comprehensive multilingual sentiment analysis: capturing short- and long-distance semantics. PeerJ Comput Sci 2024; 10:e1876. [PMID: 38435589 PMCID: PMC10909185 DOI: 10.7717/peerj-cs.1876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 01/22/2024] [Indexed: 03/05/2024]
Abstract
Multilingual sentiment analysis (MSA) involves the task of comprehending people's opinions, sentiments, and emotions in multilingual written texts. This task has garnered considerable attention due to its importance in extracting insights for decision-making across diverse fields such as marketing, finance, and politics. Several studies have explored MSA using deep learning methods. Nonetheless, a majority of these studies depend on sequential-based approaches, which focus on capturing short-distance semantics within adjacent word sequences but overlook long-distance semantics, which can provide more profound insights for analysis. In this work, we propose an approach for multilingual sentiment analysis, namely MSA-GCN, leveraging a graph convolutional network to effectively capture both short- and long-distance semantics. MSA-GCN involves the comprehensive modeling of the multilingual sentiment analysis corpus through a unified heterogeneous text graph. Subsequently, a slightly deep graph convolutional network is employed to acquire predictive representations for all nodes by encouraging transfer learning across languages. Extensive experiments are carried out on various language combinations using different benchmark datasets to assess the efficiency of the proposed approach. These datasets include Multilingual Amazon Reviews Corpus (MARC), Internet Movie Database (IMDB), Allociné, and Muchocine. The achieved results reveal that MSA-GCN significantly outperformed all baseline models in almost all datasets with a p-value < 0.05 based on Student's t-test. In addition, this approach shows prominent results in a variety of language combinations, revealing the robustness of the approach against language variation.
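To make the graph-convolution step mentioned in this abstract concrete, the following is a minimal sketch of one graph convolutional layer. It assumes the widely used propagation rule with self-loops and symmetric degree normalization, followed by a linear map and ReLU. The abstract does not specify the exact layer used in MSA-GCN, so this is an illustration of the general technique only; the function name `gcn_layer` and the two-node toy graph are hypothetical.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 @ X @ W).

    adj:    n x n adjacency matrix (list of lists, 0/1 entries)
    feats:  n x f node feature matrix
    weight: f x o weight matrix
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbor features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Hypothetical toy graph: two connected nodes with one feature each.
toy_adj = [[0, 1], [1, 0]]
toy_feats = [[1.0], [3.0]]
toy_weight = [[1.0]]

print(gcn_layer(toy_adj, toy_feats, toy_weight))
```

Stacking several such layers lets information travel across multiple hops, which is one common way a graph model can relate nodes that are far apart in the original text.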
Affiliation(s)
- El Mahdi Mercha
- ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
- HENCEFORTH, Rabat, Morocco
|
19
|
Deng Z, Xu J, Feng Y, Dong L, Zhang Y. MAVGAE: a multimodal framework for predicting asymmetric drug-drug interactions based on variational graph autoencoder. Comput Methods Biomech Biomed Engin 2024:1-13. [PMID: 38314513 DOI: 10.1080/10255842.2024.2311315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 01/21/2024] [Indexed: 02/06/2024]
Abstract
Drug-drug interactions (DDIs) refer to the phenomena wherein the potency, duration, or effectiveness of one or multiple drugs undergo alterations of varying degrees as a result of their concurrent or sequential usage. The accurate identification of potential drug interactions plays a pivotal role in mitigating the risks associated with drug administration in patients; it also helps in minimizing the likelihood of hazardous situations arising during a patient's course of treatment. However, researchers have found that drug interactions can be asymmetric, where one drug may affect another but not vice versa. This adds to the difficulty of prediction: in polypharmacy, the order of drug administration is critical to efficacy and safety, yet few current studies predict asymmetric DDIs. To address these problems, we propose a framework based on multimodal data and a variational graph autoencoder, named MAVGAE, for predicting asymmetric drug interactions. The framework initially encodes multimodal data into low-dimensional representations and then utilizes a variational graph autoencoder for encoding and decoding. During the model training process, supervised learning is employed for the classification task with the incorporation of heterogeneity information, ensuring accurate prediction of drug interactions. Experimental validation on a large-scale drug dataset demonstrates the framework's high accuracy and reliability in predicting asymmetric drug interactions, offering effective support and guidance for drug research.
Affiliation(s)
- Zengqian Deng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
| | - Jie Xu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| | - Yinfei Feng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
| | - Liangcheng Dong
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
| | - Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
|
20
|
Liu Y, Shao Y, Hao Z, Lei X, Liang P, Chang Q, Wang X. Cuproptosis gene-related, neural network-based prognosis prediction and drug-target prediction for KIRC. Cancer Med 2024; 13:e6763. [PMID: 38131663 PMCID: PMC10807644 DOI: 10.1002/cam4.6763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 10/23/2023] [Accepted: 11/16/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND: Kidney renal clear cell carcinoma (KIRC), a common type of renal cell carcinoma (RCC), carries a risk of postoperative recurrence; its prognosis is therefore poor, and its prognostic markers are usually based on imaging methods, which have low specificity. In addition, cuproptosis, a novel mode of cell death, has been used as a biomarker to predict disease in many cancers in recent years, which also provides an important basis for prognostic prediction in KIRC. For postoperative patients with KIRC, an important means of preventing disease recurrence is pharmacological treatment, so matching the appropriate drug to the specific patient's target is also particularly important. With the development of neural networks, their predictive performance in the field of medical big data has surpassed that of traditional methods, and this also applies to prognosis prediction and drug-target prediction.
OBJECTIVE: The purpose of this study is to screen for cuproptosis genes related to the prognosis of KIRC and to establish a deep neural network (DNN) model for patient risk prediction, while also developing a personalized nomogram model for predicting patient survival. In addition, drugs to which KIRC is sensitive were screened, and a graph neural network (GNN) model was established to predict the targets of the drugs, in order to discover potential drug action sites and provide new treatment ideas for KIRC.
METHODS: We used the Cancer Genome Atlas (TCGA) database, International Cancer Genome Consortium (ICGC) database, and DrugBank database for our study. Differentially expressed genes (DEGs) were screened using TCGA data, and then a DNN-based risk prediction model was built and validated using ICGC data. Subsequently, the differences between high- and low-risk groups were analyzed and KIRC-sensitive drugs were screened, and finally a GNN model was trained using DrugBank data to predict the relevant targets of these drugs.
RESULTS: A prognostic model was built by screening 10 significantly different cuproptosis-related genes. The model had an AUC of 0.739 on the training set (TCGA data) and an AUC of 0.707 on the validation set (ICGC data), which demonstrated good predictive performance. Based on the prognostic model in this paper, patients were also classified into high- and low-risk groups, and functional analyses were performed. In addition, 251 drugs were screened for sensitivity, and four drugs were ultimately found to have high sensitivity, with 5-Fluorouracil having the best inhibitory effect. Their corresponding targets were then predicted by GraphSAGE, with the most prominent targets including Cytochrome P450 2D6, UDP-glucuronosyltransferase 1A, and Proto-oncogene tyrosine-protein kinase receptor Ret. Notably, the average accuracy of GraphSAGE was 0.817 ± 0.013, which was higher than that of GAT and GTN.
CONCLUSION: Our KIRC risk prediction model, constructed using 10 cuproptosis-related genes, had good independent prognostic ability. In addition, we screened four highly sensitive drugs and predicted relevant targets for these four drugs that might treat KIRC. Finally, literature research revealed that four drug-target interactions have been demonstrated in previous studies and the remaining targets are potential sites of drug action for future research.
Affiliation(s)
- Yixin Liu
- Department of Surgery, Shanghai Key Laboratory of Gastric Neoplasms, Shanghai Institute of Digestive Surgery, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Yuan Shao
- Department of Urology, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Zezhou Hao
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Xuanzi Lei
- Graduate School, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Pengchen Liang
- School of Microelectronics, Shanghai University, Shanghai, China
| | - Qing Chang
- Department of Surgery, Shanghai Key Laboratory of Gastric Neoplasms, Shanghai Institute of Digestive Surgery, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Xianjin Wang
- Department of Urology, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
|
21
|
Gao R, Liu Z, Jiang C, Wang Y, Wang S, Wang P. BI-FedGNN: Federated graph neural networks framework based on Bayesian inference. Neural Netw 2024; 169:143-153. [PMID: 37890364 DOI: 10.1016/j.neunet.2023.10.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 10/01/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023]
Abstract
The development of the Industrial Internet of Things (IIoT) in recent years has resulted in an increase in the amount of data generated by connected devices, creating new opportunities to enhance the quality of service for machine learning in the IIoT through data sharing. Graph neural networks (GNNs) are the most popular technique in machine learning at the moment because they can learn extremely precise node representations from graph-structured data. Due to client privacy issues and legal restrictions in the IIoT, it is not permissible to directly centralize vast real-world graph-structured datasets for training GNNs. To resolve these difficulties, this paper proposes a federated graph learning framework based on Bayesian inference (BI-FedGNN) that performs effectively in the presence of noisy graph structure information or missing strong relational edges. BI-FedGNN extends Bayesian Inference (BI) to the process of Federated Graph Learning (FGL), adding random samples with weights and biases to the client-side local model training process. This improves the accuracy and generalization ability of FGL by rendering the graph structure data involved in GNN training more similar to the graph structure data existing in the real world. Extensive experimental tests show that BI-FedGNN has about 0.5%-5.0% accuracy improvement over other federated graph learning baselines. To expand the applicability of BI-FedGNN, experiments are also carried out on heterogeneous graph datasets, and the results indicate that BI-FedGNN achieves at least a 1.4% improvement in classification accuracy.
Affiliation(s)
- Rufei Gao
- School of Computer Science and Engineering, Yantai University, Shandong, China.
| | - Zhaowei Liu
- School of Computer Science and Engineering, Yantai University, Shandong, China.
| | - Chenxi Jiang
- School of Computer Science and Engineering, Yantai University, Shandong, China.
| | - Yingjie Wang
- School of Computer Science and Engineering, Yantai University, Shandong, China.
| | - Shenqiang Wang
- Institute of Network Technology (Yantai), Shandong, China.
| | - Pengda Wang
- University of Science and Technology of China, Anhui, China.
|
22
|
Bertin P, Rector-Brooks J, Sharma D, Gaudelet T, Anighoro A, Gross T, Martínez-Peña F, Tang EL, Suraj MS, Regep C, Hayter JBR, Korablyov M, Valiante N, van der Sloot A, Tyers M, Roberts CES, Bronstein MM, Lairson LL, Taylor-King JP, Bengio Y. RECOVER identifies synergistic drug combinations in vitro through sequential model optimization. Cell Rep Methods 2023; 3:100599. [PMID: 37797618 PMCID: PMC10626197 DOI: 10.1016/j.crmeth.2023.100599] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/03/2022] [Revised: 08/30/2023] [Accepted: 09/06/2023] [Indexed: 10/07/2023]
Abstract
For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state-of-the-art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased toward synergistic agents and results do not generalize out of distribution. During 5 rounds of experimentation, we employ sequential model optimization with a deep learning model to select drug combinations increasingly enriched for synergism and active against a cancer cell line, evaluating only ∼5% of the total search space. Moreover, we find that learned drug embeddings (using structural information) begin to reflect biological mechanisms. In silico benchmarking suggests search queries are ∼5-10× enriched for highly synergistic drug combinations by using sequential rounds of evaluation when compared with random selection or ∼3× when using a pretrained model.
Affiliation(s)
- Paul Bertin
- Mila, the Quebec AI Institute, Montreal, QC, Canada
| | | | | | | | | | | | | | - Eileen L Tang
- Department of Chemistry, The Scripps Research Institute, La Jolla, CA, USA
| | | | | | | | | | | | - Almer van der Sloot
- IRIC, Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, QC, Canada
| | - Mike Tyers
- Program in Molecular Medicine, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Toronto, ON M5G 0A4, Canada
| | | | - Michael M Bronstein
- Relation Therapeutics, London, UK; Department of Computer Science, University of Oxford, Oxford, UK
| | - Luke L Lairson
- Department of Chemistry, The Scripps Research Institute, La Jolla, CA, USA
|
23
|
Du H, Yao MMS, Liu S, Chen L, Chan WP, Feng M. Automatic Calcification Morphology and Distribution Classification for Breast Mammograms With Multi-Task Graph Convolutional Neural Network. IEEE J Biomed Health Inform 2023; 27:3782-3793. [PMID: 37027577 DOI: 10.1109/jbhi.2023.3249404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2023]
Abstract
The morphology and distribution of microcalcifications are the most important descriptors for radiologists to diagnose breast cancer based on mammograms. However, it is very challenging and time-consuming for radiologists to characterize these descriptors manually, and effective automatic solutions to this problem are lacking. We observed that the distribution and morphology descriptors are determined by the radiologists based on the spatial and visual relationships among calcifications. Thus, we hypothesize that this information can be effectively modelled by learning a relationship-aware representation using graph convolutional networks (GCNs). In this study, we propose a multi-task deep GCN method for automatic characterization of both the morphology and distribution of microcalcifications in mammograms. Our proposed method transforms morphology and distribution characterization into node and graph classification problems and learns the representations concurrently. We trained and validated the proposed method on an in-house dataset and the public DDSM dataset with 195 and 583 cases, respectively. The proposed method reaches good and stable results, with distribution AUC at 0.812 ± 0.043 and 0.873 ± 0.019 and morphology AUC at 0.663 ± 0.016 and 0.700 ± 0.044 for the in-house and public datasets, respectively. In both datasets, our proposed method demonstrates statistically significant improvements compared to the baseline models. The performance improvements brought by our proposed multi-task mechanism can be attributed to the association between the distribution and morphology of calcifications in mammograms, which is interpretable using graphical visualizations and consistent with the definitions of descriptors in the standard BI-RADS guideline. In short, we explore, for the first time, the application of GCNs in microcalcification characterization, which suggests the potential of using graph learning for more robust understanding of medical images.
24
Azabou M, Ganesh V, Thakoor S, Lin CH, Sathidevi L, Liu R, Valko M, Veličković P, Dyer EL. Half-Hop: A graph upsampling approach for slowing down message passing. Proceedings of Machine Learning Research 2023; 202:1341-1360. [PMID: 37810517 PMCID: PMC10559225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding "slow nodes" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.
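The edge-upsampling idea in this abstract can be sketched as a plain graph transformation. The sketch below is an assumption-laden illustration, not the published method: the function name `half_hop_upsample`, the choice to simply subdivide each directed edge, and the per-edge probability `p` are hypothetical, and the abstract does not specify how slow-node features or edge directions are handled.

```python
import random


def half_hop_upsample(num_nodes, edges, p=1.0, seed=0):
    """Insert a "slow node" on each directed edge with probability p.

    num_nodes : number of nodes in the original graph (ids 0..num_nodes-1)
    edges     : list of directed edges (source, target)
    Returns (new_num_nodes, new_edges, slow_of), where slow_of maps each
    slow-node id to the original edge it mediates.
    """
    rng = random.Random(seed)
    new_edges = []
    slow_of = {}
    next_id = num_nodes

    for source, target in edges:
        if rng.random() < p:
            slow = next_id
            next_id += 1
            slow_of[slow] = (source, target)
            # The message now needs two hops: source -> slow -> target.
            new_edges.append((source, slow))
            new_edges.append((slow, target))
        else:
            # Edge left untouched.
            new_edges.append((source, target))

    return next_id, new_edges, slow_of


if __name__ == "__main__":
    n, e, slow = half_hop_upsample(3, [(0, 1), (1, 2)])
    print(n, e, slow)
```

Because only the input graph changes, the output can be fed to any existing message passing model; calling the function with `p` below 1 and different seeds yields the randomly upsampled views that the abstract mentions for self-supervised augmentation.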
25
Qureshi R, Irfan M, Gondal TM, Khan S, Wu J, Hadi MU, Heymach J, Le X, Yan H, Alam T. AI in drug discovery and its clinical relevance. Heliyon 2023; 9:e17575. [PMID: 37396052 PMCID: PMC10302550 DOI: 10.1016/j.heliyon.2023.e17575] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 01/10/2023] [Revised: 06/17/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023] Open
Abstract
The COVID-19 pandemic has emphasized the need for novel drug discovery processes. However, the journey from conceptualizing a drug to its eventual implementation in clinical settings is a long, complex, and expensive process, with many potential points of failure. Over the past decade, a vast growth in medical information has coincided with advances in computational hardware (cloud computing, GPUs, and TPUs) and the rise of deep learning. Medical data generated from large molecular screening profiles, personal health or pathology records, and public health organizations could benefit from analysis by Artificial Intelligence (AI) approaches to speed up and prevent failures in the drug discovery pipeline. We present applications of AI at various stages of drug discovery pipelines, including the inherently computational approaches of de novo design and prediction of a drug's likely properties. Open-source databases and AI-based software tools that facilitate drug design are discussed along with their associated problems of molecule representation, data collection, complexity, labeling, and disparities among labels. How contemporary AI methods, such as graph neural networks, reinforcement learning, and generative models, along with structure-based methods (i.e., molecular dynamics simulations and molecular docking), can contribute to drug discovery applications and analysis of drug responses is also explored. Finally, recent developments and investments in AI-based start-up companies for biotechnology and drug design, along with their current progress, hopes and promotions, are discussed in this article.
Affiliation(s)
- Rizwan Qureshi
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
  - Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- Muhammad Irfan
  - Faculty of Electrical Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Swabi, Pakistan
- Sheheryar Khan
  - School of Professional Education & Executive Development, The Hong Kong Polytechnic University, Hong Kong
- Jia Wu
  - Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- John Heymach
  - Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
- Xiuning Le
  - Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
- Hong Yan
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Tanvir Alam
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
26
Sun J, Xu M, Ru J, James-Bott A, Xiong D, Wang X, Cribbs AP. Small molecule-mediated targeting of microRNAs for drug discovery: Experiments, computational techniques, and disease implications. Eur J Med Chem 2023; 257:115500. [PMID: 37262996 DOI: 10.1016/j.ejmech.2023.115500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Small molecules have been providing medical breakthroughs for human diseases for more than a century. Recently, identifying small molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant efforts required for medicinal chemistry optimization. Numerous experimentally verified cases have demonstrated the potential of miRNA-targeted small molecule inhibitors for disease treatment. This new approach is grounded in the posttranscriptional regulation by miRNAs of the expression of disease-associated genes. Reversing dysregulated gene expression using this mechanism may help control dysfunctional pathways. Furthermore, the ongoing improvement of algorithms has allowed for the integration of computational strategies built on top of laboratory-based data, facilitating a more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based on only molecular sequences. Moreover, various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques have accumulated in established studies. However, there are a limited number of comprehensive reviews covering both computational and experimental drug discovery processes. In this review, we present a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, utilizing drug-target interaction (DTI) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.
Affiliation(s)
- Jianfeng Sun
  - Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
- Miaoer Xu
  - Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Jinlong Ru
  - Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 85354, Germany
- Anna James-Bott
  - Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Xia Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China.
- Adam P Cribbs
  - Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
27
Pati SK, Gupta MK, Banerjee A, Shai R, Shivakumara P. Drug discovery through Covid-19 genome sequencing with siamese graph convolutional neural network. Multimedia Tools and Applications 2023:1-35. [PMID: 37362739 PMCID: PMC10170456 DOI: 10.1007/s11042-023-15270-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Revised: 09/23/2022] [Accepted: 04/06/2023] [Indexed: 06/28/2023]
Abstract
Several waves of COVID-19 led to a massive loss of human life worldwide due to the changes in its variants and its explosive spread. Several researchers proposed neural network-based drug discovery techniques to fight against the pandemic; however, utilizing neural networks has limitations (exponential time complexity, non-convergence, mode collapse, and diminished gradient). To overcome those difficulties, this paper proposed a hybrid architecture that will help to repurpose the most appropriate medicines for the treatment of COVID-19. A brief investigation of the sequences has been made to discover the gene density and noncoding proportion through the next gene sequencing. The paper tracks the exceptional locales in the virus DNA sequence as a Drug Target Region (DTR). Then the variable DNA neighborhood search is applied to this DTR to obtain the DNA interaction network to show how the genes are correlated. A drug database has been obtained based on the ontological property of the genomes with advanced D3Similarity so that all the chemical components of the drug database have been identified. Other methods obtained hydroxychloroquine as an effective drug, which was rejected by WHO. However, the experimental results show that Remdesivir and Dexamethasone are the most effective drugs, with 97.41 and 97.93%, respectively.
Affiliation(s)
- Soumen Kumar Pati
  - Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Haringhata, West Bengal 741249 India
- Manan Kumar Gupta
  - Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Haringhata, West Bengal 741249 India
- Ayan Banerjee
  - Department of Computer Science & Engineering, Jalpaiguri Government Engineering College, Jalpaiguri, West Bengal 735102 India
- Rinita Shai
  - Department of Mathematics, Behala College, Calcutta University, Kolkata, West Bengal 700060 India
28
Yang Y, Hsieh CY, Kang Y, Hou T, Liu H, Yao X. Deep Generation Model Guided by the Docking Score for Active Molecular Design. J Chem Inf Model 2023; 63:2983-2991. [PMID: 37163364 DOI: 10.1021/acs.jcim.3c00572] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/12/2023]
Abstract
A deep generation model, as a novel drug design and discovery tool, shows obvious advantages in generating compounds with novel backbones and has been applied successfully in the field of drug discovery. However, it is still a challenge to generate molecules with expected properties, especially high activity. Here, to obtain compounds both with novelty and high activity to a target, we proposed a conditional molecular generation model COMG by considering the docking score and 3D pharmacophore matching during molecular generation. The proposed model was based on the conditional variational autoencoder architecture constrained by the pharmacophore matching score. During Bayesian optimization, the docking score was applied to enhance the target relevance of generated compounds. Furthermore, to overcome the problem of high structural similarity caused by Bayesian optimization, the idea of the scaffold memory unit was also introduced. The evaluation results of COMG show that our model not only can improve the structural diversity of generated molecules but also can effectively improve the proportion of target-related drug-active molecules. The obtained results indicate that our proposed model COMG is a useful drug design tool.
Affiliation(s)
- Yuwei Yang
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao (SAR) 999078, P. R. China
  - School of Pharmacy, Lanzhou University, Lanzhou 730000, Gansu, P. R. China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao (SAR) 999078, P. R. China
- Xiaojun Yao
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, 999078 Macau (SAR), P. R. China
29
Mateeva A, Kondeva-Burdina M, Nedialkov P, Peikova L, Georgieva M. Development of Hyphenated Techniques and Network Identification Approaches for Biotransformational Evaluation of Promising Antitubercular N-pyrrolyl hydrazide-hydrazone in Isolated Rat Hepatocytes. Chromatographia 2023; 86:497-505. [PMID: 37255951 PMCID: PMC10157554 DOI: 10.1007/s10337-023-04260-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 03/22/2023] [Accepted: 04/22/2023] [Indexed: 06/01/2023]
Abstract
A novel, rapid and precise RP-HPLC-DAD method was developed, validated and successfully applied for determination of metabolic changes of ethyl 5-(4-bromophenyl)-1-(3-(2-(2-hydroxybenzylidene)hydrazinyl)-3-oxopropyl)-2-methyl-1H-pyrrole-3-carboxylate (12b) in isolated rat hepatocytes. The analytes were detected by a simple DAD detector at 279 nm wavelength. A single-step extraction method was implemented to enable fast purification and extraction from cellular culture, resulting in a complete recovery. Thereafter, the method was adequately transferred to an LC-MS system for identification of unknown products. Additionally, network metabolism evaluation was performed to predict the structures of major metabolites with their isotope mass through BioTransformer 3.0. The data from the LC-MS analysis and the online server were compared for comprehensive identification. The results indicated formation of four metabolic products, obtained through processes of hydrolysis (12 and b), hydroxylation in the structure 12b (M1) and O-dealkylation (M2).
Affiliation(s)
- Alexandrina Mateeva
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Medical University - Sofia, 2 Dunav Str, 1000 Sofia, Bulgaria
- Magdalena Kondeva-Burdina
  - Department of Pharmacology, Toxicology and Pharmacotherapy, Faculty of Pharmacy, Medical University - Sofia, 2 Dunav Str, 1000 Sofia, Bulgaria
- Paraskev Nedialkov
  - Department of Pharmacognosy, Faculty of Pharmacy, Medical University - Sofia, 2 Dunav Str, 1000 Sofia, Bulgaria
- Lily Peikova
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Medical University - Sofia, 2 Dunav Str, 1000 Sofia, Bulgaria
- Maya Georgieva
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Medical University - Sofia, 2 Dunav Str, 1000 Sofia, Bulgaria
30
Mangione W, Falls Z, Samudrala R. Effective holistic characterization of small molecule effects using heterogeneous biological networks. Front Pharmacol 2023; 14:1113007. [PMID: 37180722 PMCID: PMC10169664 DOI: 10.3389/fphar.2023.1113007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 04/11/2023] [Indexed: 05/16/2023] Open
Abstract
The two most common reasons for attrition in therapeutic clinical trials are efficacy and safety. We integrated heterogeneous data to create a human interactome network to comprehensively describe drug behavior in biological systems, with the goal of accurate therapeutic candidate generation. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multiscale therapeutic discovery, repurposing, and design was enhanced by integrating drug side effects, protein pathways, protein-protein interactions, protein-disease associations, and the Gene Ontology, and complemented with its existing drug/compound, protein, and indication libraries. These integrated networks were reduced to a "multiscale interactomic signature" for each compound that describes its functional behavior as a vector of real values. These signatures are then used for relating compounds to each other with the hypothesis that similar signatures yield similar behavior. Our results indicated that there is significant biological information captured within our networks (particularly via side effects) which enhances the performance of our platform, as evaluated by performing all-against-all leave-one-out drug-indication association benchmarking as well as generating novel drug candidates for colon cancer and migraine disorders corroborated via literature search. Further, drug impacts on pathways derived from computed compound-protein interaction scores served as the features for a random forest machine learning model trained to predict drug-indication associations, with applications to mental disorders and cancer metastasis highlighted. This interactomic pipeline highlights the ability of CANDO to accurately relate drugs in a multitarget and multiscale context, particularly for generating putative drug candidates using the information gleaned from indirect data such as side effect profiles and protein pathway information.
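The signature-comparison step in this abstract (relating compounds through real-valued vectors, on the hypothesis that similar signatures yield similar behavior) can be illustrated with a short sketch. The abstract does not state which similarity metric the platform uses, so cosine similarity here is an assumption, as are the function names `cosine` and `rank_similar` and the toy signatures.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Treat an all-zero signature as dissimilar to everything.
    return dot / (norm_u * norm_v)


def rank_similar(query, signatures):
    """Rank all other compounds by similarity to the query compound.

    signatures : dict mapping compound name -> signature vector (hypothetical)
    Returns a list of (name, similarity) sorted from most to least similar.
    """
    q = signatures[query]
    scored = [
        (name, cosine(q, sig))
        for name, sig in signatures.items()
        if name != query
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = {
        "compound_A": [1.0, 0.0, 1.0],
        "compound_B": [2.0, 0.0, 2.0],
        "compound_C": [0.0, 1.0, 0.0],
    }
    for name, score in rank_similar("compound_A", toy):
        print(name, round(score, 3))
```

In a leave-one-out benchmark of the kind the abstract describes, a drug approved for an indication would be held out and the ranking checked for other drugs sharing that indication near the top.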
Affiliation(s)
- Ram Samudrala
  - Jacobs School of Medicine and Biomedical Sciences, Department of Biomedical Informatics, University at Buffalo, Buffalo, NY, United States
31
Sadybekov AV, Katritch V. Computational approaches streamlining drug discovery. Nature 2023; 616:673-685. [PMID: 37100941 DOI: 10.1038/s41586-023-05905-z] [Citation(s) in RCA: 184] [Impact Index Per Article: 184.0] [Received: 04/21/2022] [Accepted: 03/01/2023] [Indexed: 04/28/2023]
Abstract
Computer-aided drug discovery has been around for decades, although the past few years have seen a tectonic shift towards embracing computational technologies in both academia and pharma. This shift is largely defined by the flood of data on ligand properties and binding to therapeutic targets and their 3D structures, abundant computing capacities and the advent of on-demand virtual libraries of drug-like small molecules in their billions. Taking full advantage of these resources requires fast computational methods for effective ligand screening. This includes structure-based virtual screening of gigascale chemical spaces, further facilitated by fast iterative screening approaches. Highly synergistic are developments in deep learning predictions of ligand properties and target activities in lieu of receptor structure. Here we review recent advances in ligand discovery technologies, their potential for reshaping the whole process of drug discovery and development, as well as the challenges they encounter. We also discuss how the rapid identification of highly diverse, potent, target-selective and drug-like ligands to protein targets can democratize the drug discovery process, presenting new opportunities for the cost-effective development of safer and more effective small-molecule treatments.
Affiliation(s)
- Anastasiia V Sadybekov
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
- Vsevolod Katritch
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
  - Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA.
  - Department of Chemistry, University of Southern California, Los Angeles, CA, USA.
32
Li J, Yanagisawa K, Sugita M, Fujie T, Ohue M, Akiyama Y. CycPeptMPDB: A Comprehensive Database of Membrane Permeability of Cyclic Peptides. J Chem Inf Model 2023; 63:2240-2250. [PMID: 36930969 PMCID: PMC10091415 DOI: 10.1021/acs.jcim.2c01573] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 03/19/2023]
Abstract
Recently, cyclic peptides have been considered breakthrough drugs because they can interact with "undruggable" targets such as intracellular protein-protein interactions. Membrane permeability is an essential indicator of oral bioavailability and intracellular targeting, and the development of membrane-permeable peptides is a bottleneck in cyclic peptide drug discovery. Although many experimental data on membrane permeability of cyclic peptides have been reported, a comprehensive database is not yet available. A comprehensive membrane permeability database is essential for developing computational methods for cyclic peptide drug design. In this study, we constructed CycPeptMPDB, the first web-accessible database of cyclic peptide membrane permeability. We collected information on a total of 7334 cyclic peptides, including the structure and experimentally measured membrane permeability, from 45 published papers and 2 patents from pharmaceutical companies. To unambiguously represent cyclic peptides larger than small molecules, we used the hierarchical editing language for macromolecules notation to generate a uniform sequence representation of peptides. In addition to data storage, CycPeptMPDB provides several supporting functions such as online data visualization, data analysis, and downloading. CycPeptMPDB is expected to be a valuable platform to support membrane permeability research on cyclic peptides. CycPeptMPDB can be freely accessed at http://cycpeptmpdb.com.
Affiliation(s)
- Jianan Li
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Keisuke Yanagisawa
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan; Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Masatake Sugita
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan; Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Takuya Fujie
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan; Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Masahito Ohue
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan; Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Yutaka Akiyama
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan; Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
33
Song T, Ren Y, Wang S, Han P, Wang L, Li X, Rodriguez-Patón A. DNMG: Deep molecular generative model by fusion of 3D information for de novo drug design. Methods 2023; 211:10-22. [PMID: 36764588 DOI: 10.1016/j.ymeth.2023.02.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/28/2022] [Revised: 01/18/2023] [Accepted: 02/01/2023] [Indexed: 02/11/2023] Open
Abstract
Deep learning is improving and changing the process of de novo molecular design at a rapid pace. In recent years, great progress has been made in drug discovery and development by using deep generative models for de novo molecular design. However, most of the existing methods are string-based or graph-based and are limited by the lack of some very important properties, such as the three-dimensional information of molecules. We propose DNMG, a deep generative adversarial network (GAN) combined with transfer learning. Specifically, we use a Wasserstein-variant GAN-based network architecture that considers the 3D grid spatial information of the ligand with atomic physicochemical properties to generate a representation of the molecule, which is then parsed into SMILES strings using an improved captioning network. Comprehensive experiments demonstrate the ability of DNMG to generate valid and novel drug-like ligands. The DNMG model is used to design inhibitors for three targets, MK14, FNTA, and CDK2. The computational results show that the molecules generated by DNMG have better binding ability to the target proteins and better physicochemical properties. Overall, our deep generative model has excellent potential to generate molecules with high binding affinity for targets and explore the space of drug-like chemistry.
Affiliation(s)
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China; Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Campus de Montegancedo, Boadilla del Monte 28660, Madrid, Spain.
- Yongqi Ren
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Shuang Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China.
- Peifu Han
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Lulu Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Xue Li
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Alfonso Rodriguez-Patón
  - Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Campus de Montegancedo, Boadilla del Monte 28660, Madrid, Spain
34
Zhan H, Zhu X, Qiao Z, Hu J. Graph Neural Tree: A novel and interpretable deep learning-based framework for accurate molecular property predictions. Anal Chim Acta 2023; 1244:340558. [PMID: 36737143 DOI: 10.1016/j.aca.2022.340558] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/10/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022]
Abstract
Determining various properties of molecules is a critical step in drug discovery. Recently, with the improvement of large heterogeneous datasets and the development of deep learning approaches, more and more scientists have turned their attention to neural network-based virtual preliminary screening to reduce the time and monetary cost of drug discovery. However, the poor interpretability of deep learning masks causality, so models' conclusions are often beyond the comprehension of human users, which reduces the credibility of the model and makes it difficult for chemists to further narrow the huge chemical space based on models' results. Thus, this study develops a novel framework consisting of Graph Neural Networks for feature extraction, Curriculum-Based Learning Strategies for optimization, and a Learning Binary Neural Tree (LBNT) for prediction, to improve the performance of neural networks and reveal their decision-making process to chemists. The framework encodes molecular graph data with graph neural networks (GNNs), then retrains the encoder with curriculum-based learning strategies to reduce uncertainty and improve accuracy, and finally uses LBNT as the predictor, which is first trained independently and then jointly retrained with the encoder, for prediction and visualization. The framework is validated on public datasets and compared to single GNNs with normal training strategies as well as GNN encoders with common machine learning predictors instead of the LBNT predictor. The results reveal that the proposed framework enhances the point prediction accuracy of the completely trained GNN and reduces its uncertainty through curriculum-based learning, and further improves the accuracy by combining it with LBNT. Besides, compared with common machine learning tools, the LBNT predictor generally has the best performance because of joint retraining with the GNN encoder. The decision-making process of LBNT is also easier to explain than that of other models.
Affiliation(s)
- Haolin Zhan
  - Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou, China; College of Economics and Statistics, Guangzhou University, Guangzhou, China
- Xin Zhu
  - Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou, China.
- Zhiwei Qiao
  - Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou, China; Joint Institute of Guangzhou University & Institute of Corrosion Science and Technology, Guangzhou University, Guangzhou, 510006, China.
- Jianming Hu
  - College of Economics and Statistics, Guangzhou University, Guangzhou, China.
35
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023]
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
  - Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
36
|
Hsieh KL, Plascencia-Villa G, Lin KH, Perry G, Jiang X, Kim Y. Synthesize heterogeneous biological knowledge via representation learning for Alzheimer's disease drug repurposing. iScience 2023; 26:105678. [PMID: 36594024 PMCID: PMC9804117 DOI: 10.1016/j.isci.2022.105678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 11/04/2022] [Accepted: 11/23/2022] [Indexed: 11/27/2022]
Abstract
Developing drugs for treating Alzheimer's disease (AD) has been extremely challenging and costly due to limited knowledge of underlying mechanisms and therapeutic targets. To address the challenge in AD drug development, we developed a multi-task deep learning pipeline that learns biological interactions and AD risk genes, then utilizes multi-level evidence on drug efficacy to identify repurposable drug candidates. Using the embedding derived from the model, we ranked drug candidates based on evidence from post-treatment transcriptomic patterns, efficacy in preclinical models, population-based treatment effects, and clinical trials. We mechanistically validated the top-ranked candidates in neuronal cells, identifying drug combinations with efficacy in reducing oxidative stress and safety in maintaining neuronal viability and morphology. Our neuronal response experiments confirmed several biologically efficacious drug combinations. This pipeline showed that harmonizing heterogeneous and complementary data and knowledge, including the human interactome, transcriptome patterns, experimental efficacy, and real-world patient data, sheds light on the drug development of complex diseases.
Affiliation(s)
- Kang-Lin Hsieh
  - Center for Secure Artificial Intelligence for Healthcare, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- German Plascencia-Villa
  - Department of Neuroscience, Developmental and Regenerative Biology, University of Texas at San Antonio, San Antonio, TX 78729, USA
- Ko-Hong Lin
  - Center for Secure Artificial Intelligence for Healthcare, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- George Perry
  - Department of Neuroscience, Developmental and Regenerative Biology, University of Texas at San Antonio, San Antonio, TX 78729, USA
- Xiaoqian Jiang
  - Center for Secure Artificial Intelligence for Healthcare, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yejin Kim
  - Center for Secure Artificial Intelligence for Healthcare, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
37
|
Abate C, Decherchi S, Cavalli A. Graph neural networks for conditional de novo drug design. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2023. [DOI: 10.1002/wcms.1651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Affiliation(s)
- Carlo Abate
  - Fondazione Istituto Italiano di Tecnologia, Genoa, Italy
  - Università degli Studi di Bologna, Bologna, Italy
- Andrea Cavalli
  - Fondazione Istituto Italiano di Tecnologia, Genoa, Italy
  - Università degli Studi di Bologna, Bologna, Italy
|
38
|
Ayati M, Yilmaz S, Blasco Tavares Pereira Lopes F, Chance M, Koyuturk M. Prediction of Kinase-Substrate Associations Using The Functional Landscape of Kinases and Phosphorylation Sites. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2023; 28:73-84. [PMID: 36540966 PMCID: PMC9782723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
Abstract
Protein phosphorylation is a key post-translational modification that plays a central role in many cellular processes. With recent advances in biotechnology, thousands of phosphorylated sites can be identified and quantified in a given sample, enabling proteome-wide screening of cellular signaling. However, for most (> 90%) of the phosphorylation sites that are identified in these experiments, the kinase(s) that target these sites are unknown. To broadly utilize available structural, functional, evolutionary, and contextual information in predicting kinase-substrate associations (KSAs), we develop a network-based machine learning framework. Our framework integrates a multitude of data sources to characterize the landscape of functional relationships and associations among phosphosites and kinases. To construct a phosphosite-phosphosite association network, we use sequence similarity, shared biological pathways, co-evolution, co-occurrence, and co-phosphorylation of phosphosites across different biological states. To construct a kinase-kinase association network, we integrate protein-protein interactions, shared biological pathways, and membership in common kinase families. We use node embeddings computed from these heterogeneous networks to train machine learning models for predicting kinase-substrate associations. Our systematic computational experiments using the PhosphositePLUS database show that the resulting algorithm, NetKSA, outperforms two state-of-the-art algorithms, KinomeXplorer and LinkPhinder, in overall KSA prediction. By stratifying the ranking of kinases, NetKSA also enables annotation of phosphosites that are targeted by relatively less-studied kinases. Availability: The code and data are available at compbio.case.edu/NetKSA/.
Affiliation(s)
- Marzieh Ayati
  - Department of Computer Science, University of Texas Rio Grande Valley, Edinburg, TX, USA
|
39
|
Recent advances and challenges in experiment-oriented polymer informatics. Polym J 2022. [DOI: 10.1038/s41428-022-00734-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
|
40
|
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Affiliation(s)
- Michelle M Li
  - Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kexin Huang
  - Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
|
41
|
Ngara TR, Zeng P, Zhang H. mibPOPdb: An online database for microbial biodegradation of persistent organic pollutants. IMETA 2022; 1:e45. [PMID: 38867901 PMCID: PMC10989864 DOI: 10.1002/imt2.45] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/27/2022] [Revised: 07/04/2022] [Accepted: 07/11/2022] [Indexed: 06/14/2024]
Abstract
Microbial biodegradation of persistent organic pollutants (POPs) is an attractive, ecofriendly, and cost-efficient clean-up technique for reclaiming POP-contaminated environments. In the last few decades, the number of publications documenting POP-degrading microbes, enzymes, and experimental data sets has continuously increased, necessitating the development of a dedicated web resource that catalogs consolidated information on POP-degrading microbes and tools to facilitate integrative analysis of POP degradation data sets. To address this knowledge gap, we developed the Microbial Biodegradation of Persistent Organic Pollutants Database (mibPOPdb) by accumulating microbial POP degradation information from the public domain and manually curating published scientific literature. Currently, in mibPOPdb, there are 9215 microbial strain entries, including 184 gene (sub)families, 100 enzymes, 48 biodegradation pathways, and 593 intermediate compounds identified in POP-biodegradation processes, and information on 32 toxic compounds listed under the Stockholm Convention environmental treaty. Besides the standard database functionalities, which include data searching, browsing, and retrieval of database entries, we provide a suite of bioinformatics services to facilitate comparative analysis of users' own data sets against mibPOPdb entries. Additionally, we built a Graph Neural Network-based prediction model for the biodegradability classification of chemicals. The predictive model exhibited a good biodegradability classification performance and high prediction accuracy. mibPOPdb is a free data-sharing platform designated to promote research in microbial-based biodegradation of POPs and fills a long-standing gap in environmental protection research. Database URL: http://mibpop.genome-mining.cn/.
Affiliation(s)
- Tanyaradzwa R. Ngara
  - Department of Biotechnology, College of Life Science and Technology, MOE KEY Laboratory of Molecular Biophysics, Huazhong University of Science and Technology, Wuhan, China
- Peiji Zeng
  - Department of Biotechnology, College of Life Science and Technology, MOE KEY Laboratory of Molecular Biophysics, Huazhong University of Science and Technology, Wuhan, China
- Houjin Zhang
  - Department of Biotechnology, College of Life Science and Technology, MOE KEY Laboratory of Molecular Biophysics, Huazhong University of Science and Technology, Wuhan, China
|
42
|
Bonner S, Barrett IP, Ye C, Swiers R, Engkvist O, Bender A, Hoyt CT, Hamilton WL. A review of biomedical datasets relating to drug discovery: a knowledge graph perspective. Brief Bioinform 2022; 23:6712301. [PMID: 36151740 DOI: 10.1093/bib/bbac404] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 05/12/2022] [Revised: 07/14/2022] [Accepted: 08/20/2022] [Indexed: 12/14/2022]
Abstract
Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KGs) have promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritization. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, while relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data are required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorized according to the primary type of information contained within and are considered based upon what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. Additionally, we raise numerous and unique challenges and issues associated with the domain and its datasets, while also highlighting key future research directions. We hope this review will motivate the use of KGs in solving key and emerging questions in the drug discovery domain.
Affiliation(s)
- Stephen Bonner
  - Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Ian P Barrett
  - Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Cheng Ye
  - Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Rowan Swiers
  - Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, UK
- William L Hamilton
  - School of Computer Science, McGill University, Canada
  - Mila-Quebec AI Institute, Montreal, Canada
|
43
|
Granata I, Manipur I, Giordano M, Maddalena L, Guarracino MR. TumorMet: A repository of tumor metabolic networks derived from context-specific Genome-Scale Metabolic Models. Sci Data 2022; 9:607. [PMID: 36207341 PMCID: PMC9547001 DOI: 10.1038/s41597-022-01702-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/07/2022] [Accepted: 09/15/2022] [Indexed: 11/25/2022]
Abstract
Studies about the metabolic alterations during tumorigenesis have increased our knowledge of the underlying mechanisms and consequences, which are important for diagnostic and therapeutic investigations. In this scenario and in the era of systems biology, metabolic networks have become a powerful tool to unravel the complexity of the cancer metabolic machinery and the heterogeneity of this disease. Here, we present TumorMet, a repository of tumor metabolic networks extracted from context-specific Genome-Scale Metabolic Models, as a benchmark for graph machine learning algorithms and network analyses. This repository has an extended scope for use in graph classification, clustering, community detection, and graph embedding studies. Along with the data, we developed and provided Met2Graph, an R package for creating three different types of metabolic graphs, depending on the desired nodes and edges: Metabolites-, Enzymes-, and Reactions-based graphs. This package allows the easy generation of datasets for downstream analysis.
- Measurement(s): gene expression, metabolic relationships
- Technology Type(s): Genome Scale Metabolic Models; Computational network biology
- Sample Characteristic - Organism: Homo sapiens
|
44
|
Beardall WA, Stan GB, Dunlop MJ. Deep Learning Concepts and Applications for Synthetic Biology. GEN BIOTECHNOLOGY 2022; 1:360-371. [PMID: 36061221 PMCID: PMC9428732 DOI: 10.1089/genbio.2022.0017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2022] [Accepted: 07/14/2022] [Indexed: 12/24/2022]
Abstract
Synthetic biology has a natural synergy with deep learning. It can be used to generate large data sets to train models, for example by using DNA synthesis, and deep learning models can be used to inform design, such as by generating novel parts or suggesting optimal experiments to conduct. Recently, research at the interface of engineering biology and deep learning has highlighted this potential through successes including the design of novel biological parts, protein structure prediction, automated analysis of microscopy data, optimal experimental design, and biomolecular implementations of artificial neural networks. In this review, we present an overview of synthetic biology-relevant classes of data and deep learning architectures. We also highlight emerging studies in synthetic biology that capitalize on deep learning to enable novel understanding and design, and discuss challenges and future opportunities in this space.
Affiliation(s)
- William A.V. Beardall
  - Department of Bioengineering, Imperial College London, London, United Kingdom
  - Imperial College Centre of Excellence in Synthetic Biology, Imperial College London, London, United Kingdom
- Guy-Bart Stan
  - Department of Bioengineering, Imperial College London, London, United Kingdom
  - Imperial College Centre of Excellence in Synthetic Biology, Imperial College London, London, United Kingdom
- Mary J. Dunlop
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
  - Biological Design Center, Boston University, Boston, Massachusetts, USA
|
45
|
Bonner S, Kirik U, Engkvist O, Tang J, Barrett IP. Implications of topological imbalance for representation learning on biomedical knowledge graphs. Brief Bioinform 2022; 23:6649936. [PMID: 35880623 DOI: 10.1093/bib/bbac279] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/15/2022] [Revised: 05/18/2022] [Accepted: 06/14/2022] [Indexed: 11/13/2022]
Abstract
Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, is promising, as it provides a more intuitive representation and is suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modelling choices in the construction of the graph, one consequence of which is that certain entities can become topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, which result in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models, and predictive tasks. Further, we present various graph perturbation experiments that lend further support to the observation that KGE models can be more influenced by the frequency of entities than by any biological information encoded within the relations. Our results highlight the importance of data modelling choices and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.
Affiliation(s)
- Stephen Bonner
  - Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Ufuk Kirik
  - Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jian Tang
  - HEC Montreal, Canada
  - Canadian Institute for Advanced Research (CIFAR), Canada
  - Mila - Quebec AI Institute, Montreal, Canada
- Ian P Barrett
  - Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
|
46
|
Guedj M, Swindle J, Hamon A, Hubert S, Desvaux E, Laplume J, Xuereb L, Lefebvre C, Haudry Y, Gabarroca C, Aussy A, Laigle L, Dupin-Roger I, Moingeon P. Industrializing AI-powered drug discovery: lessons learned from the Patrimony computing platform. Expert Opin Drug Discov 2022; 17:815-824. [PMID: 35786124 DOI: 10.1080/17460441.2022.2095368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
INTRODUCTION: As a mid-size international pharmaceutical company, we initiated four years ago the launch of a dedicated high-throughput computing platform supporting drug discovery. The platform, named "Patrimony", was built on the initial premise of capitalizing on our proprietary data while leveraging public data sources, in order to foster a Computational Precision Medicine approach with the power of Artificial Intelligence. AREAS COVERED: Specifically, Patrimony is designed to identify novel therapeutic target candidates. With several successful use cases in immuno-inflammatory diseases, and an ongoing extension to applications in oncology and neurology, we document how this industrial computational platform has had a transformational impact on our R&D, making it more competitive as well as more time- and cost-effective through a model-based, educated selection of therapeutic targets and drug candidates. EXPERT OPINION: We report our achievements, but also our challenges in implementing data access and governance processes, building up hardware and user interfaces, and acculturating scientists to use predictive models to inform decisions.
Affiliation(s)
- Mickaël Guedj
  - Servier, Research & Development, Suresnes Cedex, France
- Jack Swindle
  - Lincoln, Research & Development, Boulogne-Billancourt Cedex, France
- Antoine Hamon
  - Lincoln, Research & Development, Boulogne-Billancourt Cedex, France
- Sandra Hubert
  - Servier, Research & Development, Suresnes Cedex, France
- Emiko Desvaux
  - Servier, Research & Development, Suresnes Cedex, France
- Laura Xuereb
  - Servier, Research & Development, Suresnes Cedex, France
- Audrey Aussy
  - Servier, Research & Development, Suresnes Cedex, France
|
47
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. FRONTIERS IN BIOINFORMATICS 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023]
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
48
|
Construction of Knowledge Graph of 3D Clothing Design Resources Based on Multimodal Clustering Network. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1168012. [PMID: 35694580 PMCID: PMC9184191 DOI: 10.1155/2022/1168012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 04/24/2022] [Accepted: 05/17/2022] [Indexed: 11/18/2022]
Abstract
The construction of 3D design model is a hotspot of applied research in the fields of clothing functional design system teaching and display. The simple 3D clothing visualization postprocessing lacks interactive functions, which is a hot issue that needs to be solved urgently at present. Based on analyzing the existing clothing modeling technology, template technology, and fusion technology, and based on the multimodal clustering network theory, this paper proposes a 3D clothing design resource knowledge graph modeling method with multiple fusion of features and templates. The position of each joint point is converted into the coordinate system centered on the torso point in advance and normalized to avoid the problem that the relative position of the camera and the collector cannot be determined, and the shape of different collectors is different. The paper provides a multimodal clustering network intelligence method, illustrates the interoperability of users switching between different design networks in the seamless connection movement, and combines the hybrid intelligence algorithm with the fuzzy logic interpretation algorithm to solve the problems in the field of 3D clothing design service quality. During the simulation process, the research scheme builds a logical multimodal clustering network framework, which integrates compatibility access and global access partition fusion of style templates to achieve information extraction of clothing parts. The experimental results show that the realistic 3D clothing modeling can be achieved by layering the 3D clothing map, contour features, clothing size features, and color texture features with the modeling template. The developed ActiveX control is mounted on MSN, and the system is compatible. The performance and integration rate reached 77.1% and 89.7%, respectively, which effectively strengthened the practical role of the 3D clothing design system.
|
49
|
|
50
|
Du BX, Qin Y, Jiang YF, Xu Y, Yiu SM, Yu H, Shi JY. Compound–protein interaction prediction by deep learning: Databases, descriptors and models. Drug Discov Today 2022; 27:1350-1366. [DOI: 10.1016/j.drudis.2022.02.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 11/19/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022]
|