1
Chan TH, Yin G, Bae K, Yu L. Multi-task heterogeneous graph learning on electronic health records. Neural Netw 2024; 180:106644. [PMID: 39180906 DOI: 10.1016/j.neunet.2024.106644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 05/28/2024] [Accepted: 08/14/2024] [Indexed: 08/27/2024]
Abstract
Learning electronic health records (EHRs) has received emerging attention because of its capability to facilitate accurate medical diagnosis. Since the EHRs contain enriched information specifying complex interactions between entities, modeling EHRs with graphs is shown to be effective in practice. The EHRs, however, present a great degree of heterogeneity, sparsity, and complexity, which hamper the performance of most of the models applied to them. Moreover, existing approaches modeling EHRs often focus on learning the representations for a single task, overlooking the multi-task nature of EHR analysis problems and resulting in limited generalizability across different tasks. In view of these limitations, we propose a novel framework for EHR modeling, namely MulT-EHR (Multi-Task EHR), which leverages a heterogeneous graph to mine the complex relations and model the heterogeneity in the EHRs. To mitigate the large degree of noise, we introduce a denoising module based on the causal inference framework to adjust for severe confounding effects and reduce noise in the EHR data. Additionally, since our model adopts a single graph neural network for simultaneous multi-task prediction, we design a multi-task learning module to leverage the inter-task knowledge to regularize the training process. Extensive empirical studies on MIMIC-III and MIMIC-IV datasets validate that the proposed method consistently outperforms the state-of-the-art designs in four popular EHR analysis tasks - drug recommendation, and predictions of the length of stay, mortality, and readmission. Thorough ablation studies demonstrate the robustness of our method upon variations to key components and hyperparameters.
Affiliation(s)
- Tsai Hor Chan
- Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong Special Administrative Region of China
- Guosheng Yin
- Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong Special Administrative Region of China
- Kyongtae Bae
- Department of Diagnostic Radiology, The University of Hong Kong, Pokfulam Road, Hong Kong Special Administrative Region of China
- Lequan Yu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong Special Administrative Region of China
2
Liu Y, Zhang Z, Zhang H, Wang X, Wang K, Yang R, Han P, Luan K, Zhou Y. Clinical prediction of microvascular invasion in hepatocellular carcinoma using an MRI-based graph convolutional network model integrated with nomogram. Br J Radiol 2024; 97:938-946. [PMID: 38552308 DOI: 10.1093/bjr/tqae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 02/07/2024] [Accepted: 03/06/2024] [Indexed: 05/09/2024]
Abstract
OBJECTIVES Based on enhanced MRI, a prediction model of microvascular invasion (MVI) for hepatocellular carcinoma (HCC) was developed using a graph convolutional network (GCN) combined with a nomogram. METHODS We retrospectively collected 182 histopathologically confirmed HCC patients, all of whom underwent enhanced MRI before surgery. The patients were randomly divided into training and validation groups. Radiomics features were extracted from the arterial phase (AP), portal venous phase (PVP), and delayed phase (DP), respectively. After removing redundant features, the graph structure was built by constructing a distance matrix from the feature matrix. The best-performing phase was then screened to obtain the GCN Score (GS). Finally, clinical and radiological characteristics were combined with the GS to establish the predictive nomogram. RESULTS Of the 182 patients, 50 (27.5%) were MVI-positive. In radiological analysis, intratumoural artery (P = 0.007) was an independent predictor of MVI. The GCN model with grey-level co-occurrence matrix and grey-level run length matrix features achieved areas under the curve of 0.532, 0.690, and 0.885 in the training group and 0.583, 0.580, and 0.854 in the validation group for AP, PVP, and DP, respectively. DP was selected to develop the final model and obtain the GS. A nomogram combining GS with diameter, corona enhancement, mosaic architecture, and intratumoural artery showed a C-index of 0.884 (95% CI: 0.829-0.927). CONCLUSIONS The GCN model based on DP has a high predictive ability. A nomogram combining GS, clinical and radiological characteristics can be a simple and effective guiding tool for selecting HCC treatment options. ADVANCES IN KNOWLEDGE GCN based on MRI could predict MVI in HCC.
Affiliation(s)
- Yang Liu
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin 150010, Heilongjiang, China
- Ziqian Zhang
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin 150010, Heilongjiang, China
- Hongxia Zhang
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin 150010, Heilongjiang, China
- Xinxin Wang
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin 150010, Heilongjiang, China
- Kun Wang
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Rui Yang
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin 150081, Heilongjiang Province, China
- Peng Han
- Department of Surgical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin 150081, Heilongjiang Province, China
- Kuan Luan
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Yang Zhou
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin 150010, Heilongjiang, China
3
Chen C, Zhang Z, Tang P, Liu X, Huang B. Edge-relational window-attentional graph neural network for gene expression prediction in spatial transcriptomics analysis. Comput Biol Med 2024; 174:108449. [PMID: 38626512 DOI: 10.1016/j.compbiomed.2024.108449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 01/27/2024] [Accepted: 04/07/2024] [Indexed: 04/18/2024]
Abstract
Spatial transcriptomics (ST), containing gene expression with fine-grained (i.e., different windows) spatial location within tissue samples, has become vital in developing innovative treatments. Traditional ST technology, however, relies on costly specialized commercial equipment. Addressing this, our article aims to create a cost-effective, virtual ST approach using standard tissue images for gene expression prediction, eliminating the need for expensive equipment. Conventional approaches in this field often overlook the long-distance spatial dependencies between different sample windows or need prior gene expression data. To overcome these limitations, we propose the Edge-Relational Window-Attentional Network (ErwaNet), enhancing gene prediction by capturing both local interactions and global structural information from tissue images, without prior gene expression data. ErwaNet innovatively constructs heterogeneous graphs to model local window interactions and incorporates an attention mechanism for global information analysis. This dual framework not only provides a cost-effective solution for gene expression predictions but also obviates the necessity of prior gene expression information, a significant advantage in the field of cancer research where it enables a more efficient and accessible analytical paradigm. ErwaNet stands out as a prior-free and easy-to-implement Graph Convolution Network (GCN) method for predicting gene expression from tissue images. Evaluation on two public breast cancer datasets shows that ErwaNet, without additional information, outperforms the state-of-the-art (SOTA) methods. Code is available at https://github.com/biyecc/ErwaNet.
Affiliation(s)
- Cui Chen
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Zuping Zhang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Panrui Tang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xin Liu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Bo Huang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
4
Amaya-Rodriguez CA, Carvajal-Zamorano K, Bustos D, Alegría-Arcos M, Castillo K. A journey from molecule to physiology and in silico tools for drug discovery targeting the transient receptor potential vanilloid type 1 (TRPV1) channel. Front Pharmacol 2024; 14:1251061. [PMID: 38328578 PMCID: PMC10847257 DOI: 10.3389/fphar.2023.1251061] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/30/2023] [Accepted: 12/14/2023] [Indexed: 02/09/2024]
Abstract
The heat and capsaicin receptor TRPV1 channel is widely expressed in nerve terminals of dorsal root ganglia (DRGs) and trigeminal ganglia innervating the body and face, respectively, as well as in other tissues and organs including the central nervous system. The TRPV1 channel is a versatile receptor that detects harmful heat, pain, and various internal and external ligands. Hence, it operates as a polymodal sensory channel. Many pathological conditions, including neuroinflammation, cancer, psychiatric disorders, and pathological pain, are linked to the abnormal functioning of TRPV1 in peripheral tissues. Intense biomedical research is underway to discover compounds that can modulate the channel and provide pain relief. The molecular mechanisms underlying temperature sensing remain largely unknown, although they are closely linked to pain transduction. Prolonged exposure to capsaicin generates analgesia; hence, numerous capsaicin analogs have been developed to discover efficient analgesics for pain relief. The emergence of in silico tools offered significant techniques for molecular modeling and machine learning algorithms to identify druggable sites in the channel and for repositioning of current drugs aimed at TRPV1. Here we recapitulate the physiological and pathophysiological functions of the TRPV1 channel, including structural models obtained through cryo-EM, pharmacological compounds tested on TRPV1, and the in silico tools for drug discovery and repositioning.
Affiliation(s)
- Cesar A. Amaya-Rodriguez
- Centro Interdisciplinario de Neurociencia de Valparaíso, Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile
- Departamento de Fisiología y Comportamiento Animal, Facultad de Ciencias Naturales, Exactas y Tecnología, Universidad de Panamá, Ciudad de Panamá, Panamá
- Karina Carvajal-Zamorano
- Centro Interdisciplinario de Neurociencia de Valparaíso, Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile
- Daniel Bustos
- Centro de Investigación de Estudios Avanzados del Maule (CIEAM), Vicerrectoría de Investigación y Postgrado Universidad Católica del Maule, Talca, Chile
- Laboratorio de Bioinformática y Química Computacional, Departamento de Medicina Traslacional, Facultad de Medicina, Universidad Católica del Maule, Talca, Chile
- Melissa Alegría-Arcos
- Núcleo de Investigación en Data Science, Facultad de Ingeniería y Negocios, Universidad de las Américas, Santiago, Chile
- Karen Castillo
- Centro Interdisciplinario de Neurociencia de Valparaíso, Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile
- Centro de Investigación de Estudios Avanzados del Maule (CIEAM), Vicerrectoría de Investigación y Postgrado Universidad Católica del Maule, Talca, Chile
5
Liu X, Yang Z, Cheng J. Music recommendation algorithms based on knowledge graph and multi-task feature learning. Sci Rep 2024; 14:2055. [PMID: 38267571 PMCID: PMC10808181 DOI: 10.1038/s41598-024-52463-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 01/18/2024] [Indexed: 01/26/2024]
Abstract
During music recommendation scenarios, sparsity and cold start problems are inevitable. Auxiliary information has been utilized in music recommendation algorithms to provide users with more accurate music recommendation results. This study proposes an end-to-end framework, MMSS_MKR, that uses a knowledge graph as a source of auxiliary information to serve the information obtained from it to the recommendation module. The framework exploits Cross & Compression Units to bridge the knowledge graph embedding task with recommendation task modules. We can obtain more realistic triple information and exclude false triple information as much as possible, because our model obtains triple information through the music knowledge graph, and the information obtained through the recommendation module is used to determine the truth of the triple information; thus, the knowledge graph embedding task is used to perform the recommendation task. In the recommendation module, multiple predictions are adopted to predict the recommendation accuracy. In the knowledge graph embedding module, multiple calculations are used to calculate the score. Finally, the loss function of the model is improved to help us to obtain more useful information for music recommendations. The MMSS_MKR model achieved significant improvements in music recommendations compared with many existing recommendation models.
Affiliation(s)
- Xinqiao Liu
- School of Music, Qufu Normal University, Rizhao, 276826, China
- Zhisheng Yang
- Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China
- Jinyong Cheng
- Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China
6
Igarashi Y, Kojima R, Matsumoto S, Iwata H, Okuno Y, Yamada H. Developing a GNN-based AI model to predict mitochondrial toxicity using the bagging method. J Toxicol Sci 2024; 49:117-126. [PMID: 38432954 DOI: 10.2131/jts.49.117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
Mitochondrial toxicity has been implicated in the development of various toxicities, including hepatotoxicity. Therefore, mitochondrial toxicity has become a major screening factor in the early discovery phase of drug development. Several models have been developed to predict mitochondrial toxicity based on chemical structures. However, they only provide a binary classification of positive or negative results and do not provide the substructures that contribute to a positive decision. Therefore, we developed an artificial intelligence (AI) model to predict mitochondrial toxicity and visualize structural alerts. To construct the model, we used the open-source software library kMoL, which employs a graph neural network approach that allows learning from chemical structure data. We also utilized the integrated gradient method, which enables the visualization of substructures that contribute to positive results. The dataset used to construct the AI model exhibited a significant imbalance, with significantly more negative than positive data. To address this, we employed the bagging method, which resulted in a model with high predictive performance, as evidenced by an F1 score of 0.839. This model can also be used to visualize substructures that contribute to mitochondrial toxicity using the integrated gradient method. Our AI model predicts mitochondrial toxicity based on chemical structures and may contribute to screening mitochondrial toxicity in the early stages of drug discovery.
Affiliation(s)
- Yoshinobu Igarashi
- Toxicogenomics Informatics Project, National Institutes of Biomedical Innovation, Health and Nutrition
- Ryosuke Kojima
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University
- Shigeyuki Matsumoto
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University
- Hiroaki Iwata
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University
- Yasushi Okuno
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University
- Hiroshi Yamada
- Toxicogenomics Informatics Project, National Institutes of Biomedical Innovation, Health and Nutrition
7
Iwata H, Nakai T, Koyama T, Matsumoto S, Kojima R, Okuno Y. VGAE-MCTS: A New Molecular Generative Model Combining the Variational Graph Auto-Encoder and Monte Carlo Tree Search. J Chem Inf Model 2023; 63:7392-7400. [PMID: 37993764 PMCID: PMC10716893 DOI: 10.1021/acs.jcim.3c01220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 11/03/2023] [Accepted: 11/03/2023] [Indexed: 11/24/2023]
Abstract
Molecular generation is crucial for advancing drug discovery, materials science, and chemical exploration. It expedites the search for new drug candidates, facilitates tailored material creation, and enhances our understanding of molecular diversity. By employing artificial intelligence techniques such as molecular generative models based on molecular graphs, researchers have tackled the challenge of identifying efficient molecules with desired properties. Here, we propose a new molecular generative model combining a graph-based deep neural network and a reinforcement learning technique. We evaluated the validity, novelty, and optimized physicochemical properties of the generated molecules. Importantly, the model explored uncharted regions of chemical space, allowing for the efficient discovery and design of new molecules. This innovative approach has considerable potential to revolutionize drug discovery, materials science, and chemical research for accelerating scientific innovation. By leveraging advanced techniques and exploring previously unexplored chemical spaces, this study offers promising prospects for the efficient discovery and design of new molecules in the field of drug development.
Affiliation(s)
- Hiroaki Iwata
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Taichi Nakai
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Takuto Koyama
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Shigeyuki Matsumoto
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Ryosuke Kojima
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- Yasushi Okuno
- Graduate School of Medicine, Kyoto University, 53 Shogoin-kawaharacho, Sakyo-ku, Kyoto-shi, Kyoto 606-8507, Japan
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, Kobe-shi, Hyogo 650-0047, Japan
8
Adachi A, Yamashita T, Kanaya S, Kosugi Y. Ensemble Machine Learning Approaches Based on Molecular Descriptors and Graph Convolutional Networks for Predicting the Efflux Activities of MDR1 and BCRP Transporters. AAPS J 2023; 25:88. [PMID: 37700207 DOI: 10.1208/s12248-023-00853-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 08/19/2023] [Indexed: 09/14/2023]
Abstract
Multidrug resistance (MDR1) and breast cancer resistance protein (BCRP) play important roles in drug absorption and distribution. Computational prediction of substrates for both transporters can help reduce time in drug discovery. This study aimed to predict the efflux activity of MDR1 and BCRP using multiple machine learning approaches with molecular descriptors and graph convolutional networks (GCNs). In vitro efflux activity was determined using MDR1- and BCRP-expressing cells. Predictive performance was assessed using an in-house dataset with a chronological split and an external dataset. CatBoost and support vector regression showed the best predictive performance for MDR1 and BCRP efflux activities, respectively, of the 25 descriptor-based machine learning methods based on the coefficient of determination (R²). The single-task GCN showed a slightly lower performance than descriptor-based prediction in the in-house dataset. In both approaches, the percentage of compounds predicted within twofold of the observed values in the external dataset was lower than that in the in-house dataset. Multi-task GCN did not show any improvements, whereas multimodal GCN increased the predictive performance of BCRP efflux activity compared with single-task GCN. Furthermore, the ensemble approach of descriptor-based machine learning and GCN achieved the highest predictive performance with R² values of 0.706 and 0.587 in MDR1 and BCRP, respectively, in time-split test sets. This result suggests that two different approaches to represent molecular structures complement each other in terms of molecular characteristics. Our study demonstrated that predictive models using advanced machine learning approaches are beneficial for identifying potential substrate liability of both MDR1 and BCRP.
Affiliation(s)
- Asahi Adachi
- Global DMPK, Takeda Pharmaceutical Company Limited, 26-1 Muraoka-Higashi, 2-Chome, Fujisawa, Kanagawa, 251-8555, Japan
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, Nara, 630-0101, Japan
- Tomoki Yamashita
- Global DMPK, Takeda Pharmaceutical Company Limited, 26-1 Muraoka-Higashi, 2-Chome, Fujisawa, Kanagawa, 251-8555, Japan
- Shigehiko Kanaya
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, Nara, 630-0101, Japan
- Yohei Kosugi
- Global DMPK, Takeda Pharmaceutical Company Limited, 26-1 Muraoka-Higashi, 2-Chome, Fujisawa, Kanagawa, 251-8555, Japan
9
Liu S, Kosugi Y. Human Brain Penetration Prediction Using Scaling Approach from Animal Machine Learning Models. AAPS J 2023; 25:86. [PMID: 37667061 DOI: 10.1208/s12248-023-00850-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2023] [Accepted: 08/14/2023] [Indexed: 09/06/2023]
Abstract
Machine learning (ML) approaches have been applied to predicting drug pharmacokinetic properties. Previously, we predicted rat unbound brain-to-plasma ratio (Kpuu,brain) by ML models. In this study, we aimed to predict human Kpuu,brain through animal ML models. First, we re-evaluated ML models for rat Kpuu,brain prediction by using trendy open-source packages. We then developed ML models for monkey Kpuu,brain prediction. Leave-one-out cross validation was utilized to rationally build models using a relatively small dataset. After establishing the monkey and rat ML models, human Kpuu,brain prediction was achieved by implementing the animal models considering appropriate scaling methods. Mechanistic NeuroPK models for the identical monkey and human dataset were treated as the criteria for comparison. Results showed that rat Kpuu,brain predictivity was successfully replicated. The optimal ML model for monkey Kpuu,brain prediction was superior to the NeuroPK model, where accuracy within 2-fold error was 78% (R² = 0.76). For human Kpuu,brain prediction, rat model using relative expression factor (REF), scaled transporter efflux ratios (ERs), and monkey model using in vitro ERs can provide comparable predictivity to the NeuroPK model, where accuracy within 2-fold error was 71% and 64% (R² = 0.30 and 0.52), respectively. We demonstrated that ML models can deliver promising Kpuu,brain prediction with several advantages: (1) predict reasonable animal Kpuu,brain; (2) prospectively predict human Kpuu,brain from animal models; and (3) can skip expensive monkey studies for human prediction by using the rat model. As a result, ML models can be a powerful tool for drug Kpuu,brain prediction in the discovery stage.
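The "accuracy within 2-fold error" metric quoted in this abstract is commonly computed as the fraction of compounds whose predicted-to-observed ratio lies between 1/2 and 2. The sketch below illustrates that common definition only; the function name `within_fold_accuracy` and the sample values are hypothetical and are not taken from the paper, whose exact computation is not given in the abstract.

```python
def within_fold_accuracy(observed, predicted, fold=2.0):
    """Return the fraction of predictions within `fold`-fold of the observed value.

    A prediction counts as a hit when 1/fold <= predicted/observed <= fold.
    Assumes strictly positive values (ratios such as Kpuu,brain are positive).
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    hits = 0
    for obs, pred in zip(observed, predicted):
        if obs <= 0 or pred <= 0:
            raise ValueError("values must be strictly positive")
        ratio = pred / obs
        if 1.0 / fold <= ratio <= fold:
            hits += 1
    return hits / len(observed)


# Hypothetical illustrative values, not data from the study.
observed = [0.10, 0.50, 1.00, 0.20]
predicted = [0.15, 0.20, 1.90, 0.45]
accuracy = within_fold_accuracy(observed, predicted)
print(f"Accuracy within 2-fold error: {accuracy:.0%}")
```

With these made-up values, two of the four ratios (1.5 and 1.9) fall inside the 2-fold band and two (0.4 and 2.25) fall outside it. The same function covers the "within twofold of the observed values" percentage mentioned elsewhere in this list when called with the default `fold`.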
Affiliation(s)
- Siyu Liu
- Drug Metabolism & Pharmacokinetics Research Laboratories, Preclinical & Translational Sciences, Research, Takeda Pharmaceutical Company Limited, Shonan Health Innovation Park, 26-1, Muraoka-Higashi 2-Chome, Fujisawa, Kanagawa, 251-8555, Japan
- Yohei Kosugi
- Drug Metabolism & Pharmacokinetics Research Laboratories, Preclinical & Translational Sciences, Research, Takeda Pharmaceutical Company Limited, Shonan Health Innovation Park, 26-1, Muraoka-Higashi 2-Chome, Fujisawa, Kanagawa, 251-8555, Japan
10
Koyama T, Matsumoto S, Iwata H, Kojima R, Okuno Y. Improving Compound-Protein Interaction Prediction by Self-Training with Augmenting Negative Samples. J Chem Inf Model 2023; 63:4552-4559. [PMID: 37460105 PMCID: PMC10428206 DOI: 10.1021/acs.jcim.3c00269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Indexed: 08/15/2023]
Abstract
Identifying compound-protein interactions (CPIs) is crucial for drug discovery. Since experimentally validating CPIs is often time-consuming and costly, computational approaches are expected to facilitate the process. Rapid growth of available CPI databases has accelerated the development of many machine-learning methods for CPI predictions. However, their performance, particularly their generalizability against external data, often suffers from a data imbalance attributed to the lack of experimentally validated inactive (negative) samples. In this study, we developed a self-training method for augmenting both credible and informative negative samples to improve the performance of models impaired by data imbalances. The constructed model demonstrated higher performance than those constructed with other conventional methods for solving data imbalances, and the improvement was prominent for external datasets. Moreover, examination of the prediction score thresholds for pseudo-labeling during self-training revealed that augmenting the samples with ambiguous prediction scores is beneficial for constructing a model with high generalizability. The present study provides guidelines for improving CPI predictions on real-world data, thus facilitating drug discovery.
Affiliation(s)
- Takuto Koyama
- Graduate School of Medicine, Kyoto University, Sakyo-ku 606-8507 Kyoto, Japan
- Shigeyuki Matsumoto
- Graduate School of Medicine, Kyoto University, Sakyo-ku 606-8507 Kyoto, Japan
- Hiroaki Iwata
- Graduate School of Medicine, Kyoto University, Sakyo-ku 606-8507 Kyoto, Japan
- Ryosuke Kojima
- Graduate School of Medicine, Kyoto University, Sakyo-ku 606-8507 Kyoto, Japan
- Yasushi Okuno
- Graduate School of Medicine, Kyoto University, Sakyo-ku 606-8507 Kyoto, Japan
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, Kobe 650-0047, Hyogo, Japan
11
Chaka M, Geffe CA, Rodriguez A, Seriani N, Wu Q, Mekonnen YS. High-Throughput Screening of Promising Redox-Active Molecules with MolGAT. ACS Omega 2023; 8:24268-24278. [PMID: 37457475 PMCID: PMC10339396 DOI: 10.1021/acsomega.3c01295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023]
Abstract
Redox flow batteries (RFBs) have emerged as a promising option for large-scale energy storage, owing to their high energy density, low cost, and environmental benefits. However, the identification of organic compounds with high redox activity, aqueous solubility, stability, and fast redox kinetics is a crucial and challenging step in developing an RFB technology. Density functional theory-based computational materials prediction and screening is a time-consuming and computationally expensive technique, yet it has a high success rate. To speed up the discovery of new materials with desired properties, machine-learning-based models can be trained on large data sets. Graph neural networks (GNNs) are particularly well-suited for non-Euclidean data and can model complex relationships, making them ideal for accelerating the discovery of novel materials. In this study, a GNN-based model called MolGAT was developed to predict the redox potential of organic molecules using molecular structures, atomic properties, and bond attributes. The model was trained on a data set of over 15,000 compounds with redox potentials ranging from -4.11 to 2.56. MolGAT outperformed other GNN variants, such as the Graph Attention Network, Graph Convolution Network, and AttentiveFP models. The trained model was used to screen a vast chemical data set comprising 581,014 molecules, namely OMDB, QM9, ZINC, CHEMBL, and DELANEY, and identified 23,467 potential redox-active compounds for use in redox flow batteries. Of those, 20,716 molecules were identified as potential catholytes with predicted redox potentials up to 2.87 V, while 2,751 molecules were deemed potential anolytes with predicted redox potentials as low as -2.88 V. This work demonstrates the capabilities of graph neural networks in condensed matter physics and materials science to screen promising redox-active species for further electronic structure calculations and experimental testing.
Collapse
Affiliation(s)
- Mesfin Diro Chaka
- Department of Physics, College of Natural and Computational Sciences, Addis Ababa University, P.O. Box 1176, Addis Ababa 1176, Ethiopia
- Computational Data Science, College of Natural and Computational Sciences, Addis Ababa University, P.O. Box 1176, Addis Ababa 1176, Ethiopia
| | - Chernet Amente Geffe
- Department of Physics, College of Natural and Computational Sciences, Addis Ababa University, P.O. Box 1176, Addis Ababa 1176, Ethiopia
| | - Alex Rodriguez
- The Abdus Salam International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
| | - Nicola Seriani
- The Abdus Salam International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
| | - Qin Wu
- Brookhaven National Laboratory, Center for Functional Nanomaterials, Upton, New York 11973, United States
| | - Yedilfana Setarge Mekonnen
- Center for Environmental Science, College of Natural and Computational Sciences, Addis Ababa University, P.O. Box 1176, Addis Ababa 1176, Ethiopia
| |
|
12
|
AbdulHameed MDM, Liu R, Wallqvist A. Using a Graph Convolutional Neural Network Model to Identify Bile Salt Export Pump Inhibitors. ACS OMEGA 2023; 8:21853-21861. [PMID: 37360478 PMCID: PMC10286257 DOI: 10.1021/acsomega.3c01583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/08/2023] [Accepted: 05/19/2023] [Indexed: 06/28/2023]
Abstract
The bile salt export pump (BSEP) is a key transporter involved in the efflux of bile salts from hepatocytes to bile canaliculi. Inhibition of BSEP leads to the accumulation of bile salts within the hepatocytes, leading to possible cholestasis and drug-induced liver injury. Screening for and identification of chemicals that inhibit this transporter aid in understanding the safety liabilities of these chemicals. Moreover, computational approaches to identify BSEP inhibitors provide an alternative to the more resource-intensive, gold standard experimental approaches. Here, we used publicly available data to develop predictive machine learning models for the identification of potential BSEP inhibitors. Specifically, we analyzed the utility of a graph convolutional neural network (GCNN)-based approach in combination with multitask learning to identify BSEP inhibitors. Our analyses showed that the developed GCNN model performed better than the variable-nearest neighbor and Bayesian machine learning approaches, with a cross-validation receiver operating characteristic area under the curve of 0.86. In addition, we compared GCNN-based single-task and multitask models and evaluated their utility in addressing data limitation challenges commonly observed in bioactivity modeling. We found that multitask models performed better than single-task models and can be utilized to identify active molecules for targets with limited data availability. Overall, our developed multitask GCNN-based BSEP model provides a useful tool for prioritizing hits during early drug discovery and in risk assessment of chemicals.
Affiliation(s)
- Mohamed Diwan M. AbdulHameed
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda 20817, Maryland, United States
| | - Ruifeng Liu
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda 20817, Maryland, United States
| | - Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
| |
|
13
|
Zhang H, Saravanan KM, Zhang JZH. DeepBindGCN: Integrating Molecular Vector Representation with Graph Convolutional Neural Networks for Protein-Ligand Interaction Prediction. Molecules 2023; 28:4691. [PMID: 37375246 PMCID: PMC10301867 DOI: 10.3390/molecules28124691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 06/08/2023] [Accepted: 06/09/2023] [Indexed: 06/29/2023] Open
Abstract
The core of large-scale drug virtual screening is to select the binders accurately and efficiently with high affinity from large libraries of small molecules in which non-binders are usually dominant. The binding affinity is significantly influenced by the protein pocket, ligand spatial information, and residue types/atom types. Here, we used the pocket residues or ligand atoms as the nodes and constructed edges with the neighboring information to comprehensively represent the protein pocket or ligand information. Moreover, the model with pre-trained molecular vectors performed better than the one-hot representation. The main advantage of DeepBindGCN is that it is independent of docking conformation, and concisely keeps the spatial information and physical-chemical features. Using TIPE3 and PD-L1 dimer as proof-of-concept examples, we proposed a screening pipeline integrating DeepBindGCN and other methods to identify strong-binding-affinity compounds. This is the first time a non-complex-dependent model has achieved a root mean square error (RMSE) of 1.4190 and a Pearson r of 0.7584 on the PDBbind v.2016 core set, showing prediction power comparable to that of the state-of-the-art affinity prediction models that rely upon the 3D complex. DeepBindGCN provides a powerful tool to predict the protein-ligand interaction and can be used in many important large-scale virtual screening application scenarios.
Affiliation(s)
- Haiping Zhang
- Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India
| | - John Z. H. Zhang
- Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
14
|
Tan G, Shi Y, Wang J, Li H, Chen Z, Wang X. Guided node graph convolutional networks for repository recommendation. INTELL DATA ANAL 2023. [DOI: 10.3233/ida-216250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
Knowledge graphs (KGs) have been widely used in the field of recommender systems. Some nodes in a KG guide the occurrence of interaction behaviors; we call them guided nodes. However, current applications do not take the guided nodes in a KG into account. We explore the utility of guided nodes in a KG and apply it to repository recommendation. In this paper, we propose an end-to-end framework, namely Guided Node Graph Convolutional Network (GNGCN), which effectively captures the connections between entities by mining the influence of related nodes. We extract samples of each entity in the KG as their guided nodes and then combine the information and bias of the guided nodes when computing the representation of a given entity. The guided nodes can be extended to multiple hops. We evaluate our model on a real-world GitHub dataset named Github-SKG and a music recommendation dataset, and the experimental results show that the method outperforms the recommendation baselines and that our model is much lighter than others.
Affiliation(s)
- Guoqiang Tan
- School of Software, Shandong University, Jinan, Shandong, China
| | - Yuliang Shi
- School of Software, Shandong University, Jinan, Shandong, China
- Dareway Software Co., Ltd, Jinan, Shandong, China
| | - Jihu Wang
- School of Software, Shandong University, Jinan, Shandong, China
| | - Hui Li
- School of Software, Shandong University, Jinan, Shandong, China
| | - Zhiyong Chen
- School of Software, Shandong University, Jinan, Shandong, China
| | - Xinjun Wang
- School of Software, Shandong University, Jinan, Shandong, China
| |
|
15
|
Ogawa K, Sakamoto D, Hosoki R. Computer Science Technology in Natural Products Research: A Review of Its Applications and Implications. Chem Pharm Bull (Tokyo) 2023; 71:486-494. [PMID: 37394596 DOI: 10.1248/cpb.c23-00039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/04/2023]
Abstract
Computational approaches to drug development are rapidly growing in popularity and have been used to produce significant results. Recent developments in information science have expanded databases and chemical informatics knowledge relating to natural products. Natural products have long been well-studied, and a large number of unique structures and remarkable active substances have been reported. Analyzing accumulated natural product knowledge using emerging computational science techniques is expected to yield more new discoveries. In this article, we discuss the current state of natural product research using machine learning. The basic concepts and frameworks of machine learning are summarized. Natural product research that utilizes machine learning is described in terms of the exploration of active compounds, automatic compound design, and application to spectral data. In addition, efforts to develop drugs for intractable diseases will be addressed. Lastly, we discuss key considerations for applying machine learning in this field. This paper aims to promote progress in natural product research by presenting the current state of computational science and chemoinformatics approaches in terms of its applications, strengths, limitations, and implications for the field.
Affiliation(s)
- Keiko Ogawa
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
| | - Daiki Sakamoto
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
| | - Rumiko Hosoki
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
| |
|
16
|
Ju W, Gu Y, Luo X, Wang Y, Yuan H, Zhong H, Zhang M. Unsupervised graph-level representation learning with hierarchical contrasts. Neural Netw 2023; 158:359-368. [PMID: 36516542 DOI: 10.1016/j.neunet.2022.11.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2022] [Revised: 11/08/2022] [Accepted: 11/13/2022] [Indexed: 11/27/2022]
Abstract
Unsupervised graph-level representation learning has recently shown great potential in a variety of domains, ranging from bioinformatics to social networks. Many graph contrastive learning methods have recently been proposed to generate discriminative graph-level representations. They typically design multiple types of graph augmentations and enforce a graph to have consistent representations under different views. However, these techniques mostly neglect the intrinsic hierarchical structure of the graph, resulting in a limited exploration of semantic information for graph representation. Moreover, they often rely on a large number of negative samples to prevent collapsing into trivial solutions, while a great need for negative samples may lead to memory issues during optimization in graph domains. To address the two issues, this paper develops an unsupervised graph-level representation learning framework named Hierarchical Graph Contrastive Learning (HGCL), which investigates the hierarchical structural semantics of a graph at both node and graph levels. Specifically, our HGCL consists of three parts, i.e., node-level contrastive learning, graph-level contrastive learning, and mutual contrastive learning to capture graph semantics hierarchically. Furthermore, a Siamese network and momentum update are used to relieve the demand for excessive negative samples. Finally, the experimental results on both benchmark datasets for graph classification and large-scale OGB datasets for transfer learning demonstrate that our proposed HGCL significantly outperforms a broad range of state-of-the-art baselines.
Affiliation(s)
- Wei Ju
- School of Computer Science, Peking University, Beijing, 100871, China
| | - Yiyang Gu
- School of Computer Science, Peking University, Beijing, 100871, China
| | - Xiao Luo
- Department of Computer Science, University of California, Los Angeles, 90095, USA.
| | - Yifan Wang
- School of Computer Science, Peking University, Beijing, 100871, China
| | - Haochen Yuan
- School of Computer Science, Peking University, Beijing, 100871, China
| | | | - Ming Zhang
- School of Computer Science, Peking University, Beijing, 100871, China.
| |
|
17
|
Raush E, Abagyan R, Totrov M. Graph-Convolutional Neural Net Model of the Statistical Torsion Profiles for Small Organic Molecules. J Chem Inf Model 2022; 62:5896-5906. [PMID: 36456533 DOI: 10.1021/acs.jcim.2c00790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
Abstract
We present a graph-convolutional neural network (GCNN)-based method for learning and prediction of statistical torsional profiles (STP) in small organic molecules based on the experimental X-ray structure data. A specialized GCNN torsion profile model is trained using the structures in the Crystallography Open Database (COD). The GCNN-STP model captures torsional preferences over a wide range of torsion rotor chemotypes and correctly predicts a variety of effects from the vicinal atoms and moieties. GCNN-STP statistical profiles also show good agreement with quantum chemically (DFT) calculated torsion energy profiles. Furthermore, we demonstrate the application of the GCNN-STP statistical profiles for conformer generation. A web server that allows interactive profile prediction and viewing is made freely available at https://www.molsoft.com/tortool.html.
Affiliation(s)
- Eugene Raush
- Molsoft L.L.C., 11199 Sorrento Valley Road, S209, San Diego, California 92121, United States
| | - Ruben Abagyan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Maxim Totrov
- Molsoft L.L.C., 11199 Sorrento Valley Road, S209, San Diego, California 92121, United States
| |
|
18
|
Jiang J, Ma X, Ouyang D, Williams RO. Emerging Artificial Intelligence (AI) Technologies Used in the Development of Solid Dosage Forms. Pharmaceutics 2022; 14:2257. [PMID: 36365076 PMCID: PMC9694557 DOI: 10.3390/pharmaceutics14112257] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 10/11/2022] [Accepted: 10/17/2022] [Indexed: 07/30/2023] Open
Abstract
Artificial Intelligence (AI)-based formulation development is a promising approach for facilitating the drug product development process. AI is a versatile tool that contains multiple algorithms that can be applied in various circumstances. Solid dosage forms, represented by tablets, capsules, powder, granules, etc., are among the most widely used administration methods. During the product development process, multiple factors including critical material attributes (CMAs) and processing parameters can affect product properties, such as dissolution rates, physical and chemical stabilities, particle size distribution, and the aerosol performance of the dry powder. However, the conventional trial-and-error approach for product development is inefficient, laborious, and time-consuming. AI has been recently recognized as an emerging and cutting-edge tool for pharmaceutical formulation development which has gained much attention. This review provides the following insights: (1) a general introduction of AI in the pharmaceutical sciences and principal guidance from the regulatory agencies, (2) approaches to generating a database for solid dosage formulations, (3) insight on data preparation and processing, (4) a brief introduction to and comparisons of AI algorithms, and (5) information on applications and case studies of AI as applied to solid dosage forms. In addition, the powerful technique known as deep learning-based image analytics will be discussed along with its pharmaceutical applications. By applying emerging AI technology, scientists and researchers can better understand and predict the properties of drug formulations to facilitate more efficient drug product development processes.
Affiliation(s)
- Junhuang Jiang
- Division of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, Austin, TX 78712, USA
| | - Xiangyu Ma
- Global Investment Research, Goldman Sachs, New York, NY 10282, USA
| | - Defang Ouyang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau 999078, China
| | - Robert O. Williams
- Division of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, Austin, TX 78712, USA
| |
|
19
|
Cao W, Liu Y, Cao G, He Z. Implicit user relationships across sessions enhanced graph for session-based recommendation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
20
|
Veríssimo GC, Serafim MSM, Kronenberger T, Ferreira RS, Honorio KM, Maltarollo VG. Designing drugs when there is low data availability: one-shot learning and other approaches to face the issues of a long-term concern. Expert Opin Drug Discov 2022; 17:929-947. [PMID: 35983695 DOI: 10.1080/17460441.2022.2114451] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
INTRODUCTION: Modern drug discovery generally relies on useful information from existing large databases or on uncovering novel data. The lack of biological and/or chemical data tends to slow the development of scientific research and innovation. Here, approaches from the last five years that may help to generate or obtain enough relevant data, or to improve or accelerate existing methods, are reviewed. AREAS COVERED: One-shot learning (OSL) approaches, structural modeling, molecular docking, scoring function space (SFS), molecular dynamics (MD), and quantum mechanics (QM) may be used to amplify the amount of data available to drug design and discovery campaigns; this review presents the methods, their perspectives, and discussions of how they may be employed in the near future. EXPERT OPINION: Recent works have successfully used these techniques to solve a range of issues in the face of data scarcity, including complex problems such as the challenging scenario of drug design aimed at intrinsically disordered proteins and the evaluation of potential adverse effects in a clinical scenario. These examples show that it is possible to improve and kickstart research from scarce available data to design and discover new potential drugs.
Affiliation(s)
- Gabriel C Veríssimo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Mateus Sá M Serafim
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Thales Kronenberger
- Department of Medical Oncology and Pneumology, Internal Medicine VIII, University Hospital of Tübingen, Tübingen, Germany; School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio, Finland
| | - Rafaela S Ferreira
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Kathia M Honorio
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil; Centro de Ciências Naturais e Humanas, Universidade Federal do ABC (UFABC), Santo André, Brazil
| | - Vinícius G Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| |
|
21
|
Kim C, Moon H, Hwang HJ. NEAR: Neighborhood Edge AggregatoR for Graph Classification. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3506714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Learning graph-structured data with graph neural networks (GNNs) has been recently emerging as an important field because of its wide applicability in bioinformatics, chemoinformatics, social network analysis, and data mining. Recent GNN algorithms are based on neural message passing, which enables GNNs to integrate local structures and node features recursively. However, past GNN algorithms based on 1-hop neighborhood neural message passing are exposed to a risk of loss of information on local structures and relationships. In this article, we propose Neighborhood Edge AggregatoR (NEAR), a framework that aggregates relations between the nodes in the neighborhood via edges. NEAR, which can be orthogonally combined with Graph Isomorphism Network (GIN), gives integrated information that describes which nodes in the neighborhood are connected. Therefore, NEAR can reflect additional information of a local structure of each node beyond the nodes themselves in 1-hop neighborhood. Experimental results on multiple graph classification tasks show that our algorithm makes a good improvement over other existing 1-hop based GNN-based algorithms.
Affiliation(s)
- Cheolhyeong Kim
- Pohang University of Science and Technology, Pohang, Gyeongbuk, Republic of Korea
| | - Haeseong Moon
- University of California San Diego, San Diego, La Jolla, CA, USA
| | - Hyung Ju Hwang
- Pohang University of Science and Technology, Pohang, Gyeongbuk, Republic of Korea
| |
|
22
|
Mongia M, Guler M, Mohimani H. An interpretable machine learning approach to identify mechanism of action of antibiotics. Sci Rep 2022; 12:10342. [PMID: 35725893 PMCID: PMC9209520 DOI: 10.1038/s41598-022-14229-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2022] [Accepted: 06/02/2022] [Indexed: 11/19/2022] Open
Abstract
As antibiotic resistance is becoming a major public health problem worldwide, one of the approaches for novel antibiotic discovery is re-purposing drugs available on the market for treating antibiotic resistant bacteria. The main economic advantage of this approach is that since these drugs have already passed all the safety tests, it vastly reduces the overall cost of clinical trials. Recently, several machine learning approaches have been developed for predicting promising antibiotics by training on bioactivity data collected on a set of small molecules. However, these methods report hundreds/thousands of bioactive molecules, and it remains unclear which of these molecules possess a novel mechanism of action. While the cost of high-throughput bioactivity testing has dropped dramatically in recent years, determining the mechanism of action of small molecules remains a costly and time-consuming step, and therefore computational methods for prioritizing molecules with novel mechanisms of action are needed. The existing approaches for predicting bioactivity of small molecules are based on uninterpretable machine learning, and therefore are not capable of determining the known mechanism of action of small molecules and prioritizing novel mechanisms. We introduce InterPred, an interpretable technique for predicting bioactivity of small molecules and their mechanism of action. InterPred has the same accuracy as the state of the art in bioactivity prediction, and it enables assigning chemical moieties that are responsible for bioactivity. After analyzing bioactivity data of several thousand molecules against bacterial and fungal pathogens available from Community for Open Antimicrobial Drug Discovery and a US Food and Drug Administration-approved drug library, InterPred identified five known links between moieties and mechanism of action.
Affiliation(s)
- Mihir Mongia
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
| | - Mustafa Guler
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
| | - Hosein Mohimani
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA.
| |
|
23
|
Luo X, Ju W, Qu M, Gu Y, Chen C, Deng M, Hua XS, Zhang M. CLEAR: Cluster-Enhanced Contrast for Self-Supervised Graph Representation Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; PP:899-912. [PMID: 35675236 DOI: 10.1109/tnnls.2022.3177775] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
This article studies self-supervised graph representation learning, which is critical to various tasks, such as protein property prediction. Existing methods typically aggregate representations of each individual node as graph representations, but fail to comprehensively explore local substructures (i.e., motifs and subgraphs), which also play important roles in many graph mining tasks. In this article, we propose a self-supervised graph representation learning framework named Cluster-Enhanced Contrast (CLEAR) that models the structural semantics of a graph from graph-level and substructure-level granularities, i.e., global semantics and local semantics, respectively. Specifically, we use graph-level augmentation strategies followed by a graph neural network-based encoder to explore global semantics. As for local semantics, we first use graph clustering techniques to partition each whole graph into several subgraphs while preserving as much semantic information as possible. We further employ a self-attention interaction module to aggregate the semantics of all subgraphs into a local-view graph representation. Moreover, we integrate both global semantics and local semantics into a multiview graph contrastive learning framework, enhancing the semantic-discriminative ability of graph representations. Extensive experiments on various real-world benchmarks demonstrate the efficacy of the proposed CLEAR over current graph self-supervised representation learning approaches on both graph classification and transfer learning tasks.
|
24
|
Ishida S, Terayama K, Kojima R, Takasu K, Okuno Y. AI-Driven Synthetic Route Design Incorporated with Retrosynthesis Knowledge. J Chem Inf Model 2022; 62:1357-1367. [PMID: 35258953 PMCID: PMC8965881 DOI: 10.1021/acs.jcim.1c01074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Computer-aided synthesis planning (CASP) aims to assist chemists in performing retrosynthetic analysis for which they utilize their experiments, intuition, and knowledge. Recent breakthroughs in machine learning (ML) techniques, including deep neural networks, have significantly improved data-driven synthetic route designs without human intervention. However, learning chemical knowledge by ML for practical synthesis planning has not yet been adequately achieved and remains a challenging problem. In this study, we developed a data-driven CASP application integrated with various portions of retrosynthesis knowledge called "ReTReK" that introduces the knowledge as adjustable parameters into the evaluation of promising search directions. The experimental results showed that ReTReK successfully searched synthetic routes based on the specified retrosynthesis knowledge, indicating that the synthetic routes searched with the knowledge were preferred to those without the knowledge. The concept of integrating retrosynthesis knowledge as adjustable parameters into a data-driven CASP application is expected to enhance the performance of both existing data-driven CASP applications and those under development.
Affiliation(s)
- Shoichi Ishida
- Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshidashimo-Adachicho, Sakyo-ku 606-8501, Kyoto, Japan
| | - Kei Terayama
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Kanagawa, Japan; Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan
| | - Ryosuke Kojima
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan
| | - Kiyosei Takasu
- Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshidashimo-Adachicho, Sakyo-ku 606-8501, Kyoto, Japan
| | - Yasushi Okuno
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan; HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 7-1-26, Minatojima-minami-machi, Chuo-ku, Kobe 650-0047, Hyogo, Japan
| |
|
25
|
Ju W, Luo X, Ma Z, Yang J, Deng M, Zhang M. GHNN: Graph Harmonic Neural Networks for semi-supervised graph-level classification. Neural Netw 2022; 151:70-79. [DOI: 10.1016/j.neunet.2022.03.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2021] [Revised: 01/19/2022] [Accepted: 03/10/2022] [Indexed: 11/27/2022]
|
26
|
Zhang M, Yu X, Rong J, Ou L. Graph pruning for model compression. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02802-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
27
|
Ikeda K, Doi T, Ikeda M, Tomii K. PreBINDS: An Interactive Web Tool to Create Appropriate Datasets for Predicting Compound-Protein Interactions. Front Mol Biosci 2021; 8:758480. [PMID: 34938773 PMCID: PMC8685504 DOI: 10.3389/fmolb.2021.758480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022] Open
Abstract
Given the abundant computational resources and the huge amount of data of compound-protein interactions (CPIs), constructing appropriate datasets for learning and evaluating prediction models for CPIs is not always easy. For this study, we have developed a web server to facilitate the development and evaluation of prediction models by providing an appropriate dataset according to the task. Our web server provides an environment and dataset that aid model developers and evaluators in obtaining a suitable dataset for both proteins and compounds, in addition to attributes necessary for deep learning. With the web server interface, users can customize the CPI dataset derived from ChEMBL by setting positive and negative thresholds to be adjusted according to the user's definitions. We have also implemented a function for graphic display of the distribution of activity values in the dataset as a histogram to set appropriate thresholds for positive and negative examples. These functions enable effective development and evaluation of models. Furthermore, users can prepare their task-specific datasets by selecting a set of target proteins based on various criteria such as Pfam families, ChEMBL's classification, and sequence similarities. The accuracy and efficiency of in silico screening and drug design using machine learning including deep learning can therefore be improved by facilitating access to an appropriate dataset prepared using our web server (https://binds.lifematics.work/).
Affiliation(s)
- Kazuyoshi Ikeda
- Medicinal Chemistry Applied AI Unit, HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, Yokohama, Japan; Division of Physics for Life Functions, Keio University Faculty of Pharmacy, Tokyo, Japan
- Masami Ikeda
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Kentaro Tomii
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan; AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
|
28
|
Kentour M, Lu J. An investigation into the deep learning approach in sentimental analysis using graph-based theories. PLoS One 2021; 16:e0260761. [PMID: 34855856 PMCID: PMC8638889 DOI: 10.1371/journal.pone.0260761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Accepted: 11/16/2021] [Indexed: 11/24/2022] Open
Abstract
Sentiment analysis is a branch of natural language analytics that aims to correlate what is expressed, which normally comes in an unstructured format, with what is believed and learnt. Several approaches have tried to address this gap (e.g., Naive Bayes, RNN, LSTM, word embedding). Even though deep learning models have achieved high performance, their generative process remains a "black box" that is not fully disclosed, owing to the high-dimensional features and the non-deterministic weight assignment. Meanwhile, graphs are becoming more popular for modeling complex systems while remaining traceable and understandable. Here, we reveal that a good trade-off between transparency and efficiency can be achieved with a Deep Neural Network by exploring the Credit Assignment Paths theory. To this end, we propose a novel algorithm that alleviates the feature extraction mechanism and attributes an importance level to selected neurons by applying deterministic edge/node embeddings with attention scores on the input unit and backward path, respectively. We experiment on the Twitter Health News dataset, where the model has been extended to approach different approximations (tweet/aspect and tweets' source levels, frequency, polarity/subjectivity); it was also transparent and traceable. Moreover, a comparison with four recent models on the same data corpus for tweet analysis showed rapid convergence, with an overall accuracy of ≈83% and 94% of true positive sentiments correctly identified. Therefore, weights can be ideally assigned to specific active features by following the proposed method. In contrast to the compared works, the inferred features are conditioned through the users' preferences (i.e., frequency degree) and via the activation's derivatives (i.e., reject feature if not scored). Future work will address the inductive aspect of graph embeddings to include dynamic graph structures and expand the model's resilience by considering other datasets such as SemEval task7 and COVID-19 tweets.
Affiliation(s)
- Mohamed Kentour
- School of Computing and Engineering, University of Huddersfield, Huddersfield, West Yorkshire, United Kingdom
- Joan Lu
- School of Computing and Engineering, University of Huddersfield, Huddersfield, West Yorkshire, United Kingdom
|
29
|
Peng SP, Yang XY, Zhao Y. Molecular Conditional Generation and Property Analysis of Non-Fullerene Acceptors with Deep Learning. Int J Mol Sci 2021; 22:9099. [PMID: 34445805 PMCID: PMC8396663 DOI: 10.3390/ijms22169099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2021] [Revised: 08/19/2021] [Accepted: 08/20/2021] [Indexed: 11/29/2022] Open
Abstract
The proposal of non-fullerene acceptors (NFAs) in organic solar cells has brought great progress in raising power conversion efficiency, and it also broadens the ways of searching for and designing new acceptor molecules. In this work, the design of novel NFAs with required properties is performed with a conditional generative model constructed from a convolutional neural network (CNN). The temporal CNN is first trained to be a good string-based molecular conditional generative model to directly generate the desired molecules. The reliability of generated molecular properties is then demonstrated by a graph-based prediction model and evaluated with quantum chemical calculations. Specifically, the global attention mechanism is incorporated in the prediction model to pool the extracted information of molecular structures and provide interpretability. By combining the generative and prediction models, thousands of NFAs with required frontier molecular orbital energies are generated. The newly generated molecules essentially explore the chemical space and enrich the database of transformation rules for molecular design. The conditional generation model can also be trained to generate the molecules from molecular fragments, and the contribution of molecular fragments to the properties is subsequently predicted by the prediction model.
Affiliation(s)
- Yi Zhao
- State Key Laboratory for Physical Chemistry of Solid Surfaces, Fujian Provincial Key Lab of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China; (S.-P.P.); (X.-Y.Y.)
|
30
|
Artificial intelligence in drug design: algorithms, applications, challenges and ethics. FUTURE DRUG DISCOVERY 2021. [DOI: 10.4155/fdd-2020-0028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022] Open
Abstract
The discovery paradigm of drugs is rapidly growing due to advances in machine learning (ML) and artificial intelligence (AI). This review covers myriad faces of AI and ML in drug design. There is a plethora of AI algorithms, the most common of which are summarized in this review. In addition, AI is fraught with challenges that are highlighted along with plausible solutions to them. Examples are provided to illustrate the use of AI and ML in drug discovery and in predicting drug properties such as binding affinities and interactions, solubility, toxicology, blood–brain barrier permeability and chemical properties. The review also includes examples depicting the implementation of AI and ML in tackling intractable diseases such as COVID-19, cancer and Alzheimer’s disease. Ethical considerations and future perspectives of AI are also covered in this review.
|
31
|
A Comprehensive Survey of Knowledge Graph-Based Recommender Systems: Technologies, Development, and Contributions. INFORMATION 2021. [DOI: 10.3390/info12060232] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/16/2022] Open
Abstract
In recent years, the use of recommender systems has become popular on the web. To improve recommendation performance, usage, and scalability, the research has evolved by producing several generations of recommender systems. There is much literature about it, although most proposals focus on traditional methods’ theories and applications. Recently, knowledge graph-based recommendations have attracted attention in academia and the industry because they can alleviate information sparsity and performance problems. We found only two studies that analyze the recommendation system’s role over graphs, but they focus on specific recommendation methods. This survey attempts to cover a broader analysis from a set of selected papers. In summary, the contributions of this paper are as follows: (1) we explore traditional and more recent developments of filtering methods for a recommender system, (2) we identify and analyze proposals related to knowledge graph-based recommender systems, (3) we present the most relevant contributions using an application domain, and (4) we outline future directions of research in the domain of recommender systems. As the main survey result, we found that the use of knowledge graphs for recommendations is an efficient way to leverage and connect a user’s and an item’s knowledge, thus providing more precise results for users.
|
32
|
Serafim MSM, Dos Santos Júnior VS, Gertrudes JC, Maltarollo VG, Honorio KM. Machine learning techniques applied to the drug design and discovery of new antivirals: a brief look over the past decade. Expert Opin Drug Discov 2021; 16:961-975. [PMID: 33957833 DOI: 10.1080/17460441.2021.1918098] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Introduction: Drug design and discovery of new antivirals will always be extremely important in medicinal chemistry, taking into account known viral diseases and new ones that are yet to come. Although machine learning (ML) has been shown to improve predictions on the biological potential of chemicals and accelerate the discovery of drugs over the past decade, new methods and their combinations have improved their performance and established promising perspectives regarding ML in the search for new antivirals. Areas covered: The authors consider some interesting areas that deal with different ML techniques applied to antivirals. Recent innovative studies on ML and antivirals were selected and analyzed in detail. Also, the authors provide a brief look from the past to the present to detect advances and bottlenecks in the area. Expert opinion: From classical ML techniques, it was possible to boost the searches for antivirals. However, from the emergence of new algorithms and the improvement in old approaches, promising results will be achieved every day, as we have observed in the case of SARS-CoV-2. Recent experience has shown that it is possible to use ML to discover new antiviral candidates from virtual screening and drug repurposing.
Affiliation(s)
- Mateus Sá Magalhães Serafim
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Jadson Castro Gertrudes
- Departamento de Computação, Instituto de Ciências Exatas e Biológicas, Universidade Federal de Ouro Preto (UFOP), Ouro Preto, Brazil
- Vinícius Gonçalves Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Kathia Maria Honorio
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil; Centro de Ciências Naturais e Humanas, Universidade Federal do ABC (UFABC), Santo André, Brazil
|
33
|
Koutsoukos S, Philippi F, Malaret F, Welton T. A review on machine learning algorithms for the ionic liquid chemical space. Chem Sci 2021; 12:6820-6843. [PMID: 34123314 PMCID: PMC8153233 DOI: 10.1039/d1sc01000j] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 02/19/2021] [Accepted: 04/28/2021] [Indexed: 01/05/2023] Open
Abstract
There are thousands of papers published every year investigating the properties and possible applications of ionic liquids. Industrial use of these exceptional fluids requires adequate understanding of their physical properties, in order to create the ionic liquid that will optimally suit the application. Computational property prediction arose from the urgent need to minimise the time and cost that would be required to experimentally test different combinations of ions. This review discusses the use of machine learning algorithms as property prediction tools for ionic liquids (either as standalone methods or in conjunction with molecular dynamics simulations), presents common problems of training datasets and proposes ways that could lead to more accurate and efficient models.
Affiliation(s)
- Spyridon Koutsoukos
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London White City Campus London W12 0BZ UK
- Frederik Philippi
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London White City Campus London W12 0BZ UK
- Francisco Malaret
- Department of Chemical Engineering, Imperial College London South Kensington Campus London SW7 2AZ UK
- Tom Welton
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London White City Campus London W12 0BZ UK
|
34
|
Abstract
Toxicity analysis is a major challenge in drug design and discovery. Recently significant progress has been made through machine learning due to its accuracy, efficiency, and lower cost. US Toxicology in the 21st Century (Tox21) screened a large library of compounds, including approximately 12 000 environmental chemicals and drugs, for different mechanisms responsible for eliciting toxic effects. The Tox21 Data Challenge offered a platform to evaluate different computational methods for toxicity predictions. Inspired by the success of multiscale weighted colored graph (MWCG) theory in protein-ligand binding affinity predictions, we consider MWCG theory for toxicity analysis. In the present work, we develop a geometric graph learning toxicity (GGL-Tox) model by integrating MWCG features and the gradient boosting decision tree (GBDT) algorithm. The benchmark tests of the Tox21 Data Challenge are employed to demonstrate the utility and usefulness of the proposed GGL-Tox model. An extensive comparison with other state-of-the-art models indicates that GGL-Tox is an accurate and efficient model for toxicity analysis and prediction.
Affiliation(s)
- Jian Jiang
- Research Center of Nonlinear Science, College of Mathematics and Computer Science, Engineering Research Center of Hubei Province for Clothing Information, Wuhan Textile University, Wuhan 430200, P. R. China
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
|
35
|
Kumar Das J, Tradigo G, Veltri P, H Guzzi P, Roy S. Data science in unveiling COVID-19 pathogenesis and diagnosis: evolutionary origin to drug repurposing. Brief Bioinform 2021; 22:855-872. [PMID: 33592108 PMCID: PMC7929414 DOI: 10.1093/bib/bbaa420] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/18/2020] [Revised: 11/09/2020] [Accepted: 12/19/2020] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION: The outbreak of novel severe acute respiratory syndrome coronavirus (SARS-CoV-2, also known as COVID-19) in Wuhan has attracted worldwide attention. SARS-CoV-2 causes severe inflammation, which can be fatal. Consequently, there has been a massive and rapid growth in research aimed at throwing light on the mechanisms of infection and the progression of the disease. In this regard, data science is playing a pivotal role in in silico analysis to gain insights into SARS-CoV-2 and the outbreak of COVID-19 in order to forecast, diagnose and come up with a drug to tackle the virus. The availability of large multiomics, radiological, bio-molecular and medical datasets requires the development of novel exploratory and predictive models, or the customisation of existing ones in order to fit the current problem. The high number of approaches generates the need for surveys to guide data scientists and medical practitioners in selecting the right tools to manage their clinical data. RESULTS: Focusing on data science methodologies, we conduct a detailed study of state-of-the-art works tackling the current pandemic scenario. We consider various current COVID-19 data analytic domains such as phylogenetic analysis, SARS-CoV-2 genome identification, protein structure prediction, host-viral protein interactomics, clinical imaging, epidemiological research and drug discovery. We highlight data types and instances, their generation pipelines and the data science models currently in use. The current study should give a detailed sketch of the road map towards handling COVID-19-like situations by leveraging data science experts in choosing the right tools. We also summarise our review focusing on prime challenges and possible future research directions. CONTACT: hguzzi@unicz.it, sroy01@cus.ac.in.
Affiliation(s)
- Jayanta Kumar Das
- Department of Pediatrics, School of Medicine, Johns Hopkins University, Maryland, USA
- Giuseppe Tradigo
- eCampus University, Via Isimbardi 10, 22060 Novedrate, CO, Italy
- Pierangelo Veltri
- Department of Surgical and Medical Sciences, Magna Graecia University, Catanzaro, 88100, Italy
- Pietro H Guzzi
- Department of Surgical and Medical Sciences, Magna Graecia University, Catanzaro, 88100, Italy
- Swarup Roy
- Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, Gangtok, India
|
36
|
Alafif T, Tehame AM, Bajaba S, Barnawi A, Zia S. Machine and Deep Learning towards COVID-19 Diagnosis and Treatment: Survey, Challenges, and Future Directions. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:1117. [PMID: 33513984 PMCID: PMC7908539 DOI: 10.3390/ijerph18031117] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 12/12/2020] [Revised: 01/16/2021] [Accepted: 01/17/2021] [Indexed: 12/13/2022]
Abstract
With many successful stories, machine learning (ML) and deep learning (DL) have been widely used in our everyday lives in a number of ways. They have also been instrumental in tackling the outbreak of Coronavirus (COVID-19), which has been happening around the world. The SARS-CoV-2 virus-induced COVID-19 epidemic has spread rapidly across the world, leading to international outbreaks. The COVID-19 fight to curb the spread of the disease involves most states, companies, and scientific research institutions. In this research, we look at the Artificial Intelligence (AI)-based ML and DL methods for COVID-19 diagnosis and treatment. Furthermore, in the battle against COVID-19, we summarize the AI-based ML and DL methods and the available datasets, tools, and performance. This survey offers a detailed overview of the existing state-of-the-art methodologies for ML and DL researchers and the wider health community with descriptions of how ML and DL and data can improve the status of COVID-19, and more studies in order to avoid the outbreak of COVID-19. Details of challenges and future directions are also provided.
Affiliation(s)
- Tarik Alafif
- Computer Science Department, Jamoum University College, Umm Al-Qura University, Jamoum 25375, Saudi Arabia
- Abdul Muneeim Tehame
- Department of Software Engineering, Sir Syed University of Engineering and Technology, Karachi 75300, Pakistan
- Saleh Bajaba
- Business Administration Department, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Ahmed Barnawi
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Saad Zia
- IT Department, Jeddah Cable Company, Jeddah 31248, Saudi Arabia
|
37
|
Matsumura K. Skin sensitizer classification using dual-input machine learning model. CHEM-BIO INFORMATICS JOURNAL 2020. [DOI: 10.1273/cbij.20.54] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
|