1
Xuan P, Wu S, Cui H, Li P, Nakaguchi T, Zhang T. Interactive multi-hypergraph inferring and channel-enhanced and attribute-enhanced learning for drug-related side effect prediction. Comput Biol Med 2025; 184:109321. [PMID: 39522133 DOI: 10.1016/j.compbiomed.2024.109321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 10/15/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
Identifying the potential side effects for the interested drugs can help reduce harm to patients caused by drugs in clinical use and decrease the risk of drug development failure. Multiple functionally similar drugs often have multiple similar side effects, resulting in the closed relationships among these nodes. However, most of previous methods did not completely encode the features from the biological perspective to mine the complex associations between the drugs and side effects. A prediction model based on interactive multi-hypergraph inferring and channel-enhanced and attribute-enhanced learning, ICAL, was proposed to fuse the global correlations reflected by multiple hypergraphs and to learn the attributes of a pair of drug and side effect nodes enhanced by the channels and attributes. First, we designed a hypergraph architecture where a hyperedge reflects the complex correlations between a single drug (side effect) and all the drugs and side effects, and the entire hypergraph composed of the hyperedges reveals the global correlations of all the drugs and side effects. Two hypergraphs were established based on two types of drug similarities, and each hypergraph implies its specific complex relationships among multiple drugs and side effects. Second, we proposed an interactive hypergraph neural network to enable the learning of global correlation features of drugs and side effects from the two hypergraphs. It propagated the node features across multiple hypergraphs and encoded the context relationships within these hypergraphs. Besides, the attentions at the channel level and at the attribute level were proposed to integrate the semantic correlations among multiple channels and to encode the long-distance dependence within the attributes of a pair of drug and side effect. 
The experimental results based on cross-validation showed that our new model outperformed seven advanced prediction methods in terms of AUC, AUPR, and recall rates for the top-ranked candidates. The ablation studies showed the effectiveness of global correlation learning, node feature propagation across multiple hypergraphs, and channel and attribute enhanced pairwise attribute learning. The case studies on the candidate side effects related to five drugs further demonstrated ICAL's effective application in discovering the reliable candidates.
Affiliation(s)
- Ping Xuan
  - Department of Computer Science and Technology, Shantou University, Shantou, China; School of Cyberspace Security, Hainan University, Haikou, China
- Shien Wu
  - Department of Computer Science and Technology, Shantou University, Shantou, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Australia; Australian Centre for AI in Medical Innovation, La Trobe University, Australia
- Peiru Li
  - School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Tiangang Zhang
  - School of Cyberspace Security, Hainan University, Haikou, China
2
Xu K, Wang M, Zou X, Liu J, Wei A, Chen J, Tang C. HSTrans: Homogeneous substructures transformer for predicting frequencies of drug-side effects. Neural Netw 2025; 181:106779. [PMID: 39488108 DOI: 10.1016/j.neunet.2024.106779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 08/29/2024] [Accepted: 10/01/2024] [Indexed: 11/04/2024]
Abstract
Identifying the frequencies of drug-side effects is crucial for assessing drug risk-benefit. However, accurately determining these frequencies remains challenging due to the limitations of time and scale in clinical randomized controlled trials. As a result, several computational methods have been proposed to address these issues. Nonetheless, two primary problems still persist. Firstly, most of these methods face challenges in generating accurate predictions for novel drugs, as they heavily depend on the interaction graph between drugs and side effects (SEs) within their modeling framework. Secondly, some previous methods often simply concatenate the features of drugs and SEs, which fails to effectively capture their underlying association. In this work, we present HSTrans, a novel approach that treats drugs and SEs as sets of substructures, leveraging a transformer encoder for unified substructure embedding and incorporating an interaction module for association capture. Specifically, HSTrans extracts drug substructures through a specialized algorithm and identifies effective substructures for each SE by employing an indicator that measures the importance of each substructure and SE. Additionally, HSTrans applies convolutional neural network (CNN) in the interaction module to capture complex relationships between drugs and SEs. Experimental results on datasets from Galeano et al.'s study demonstrate that the proposed method outperforms other state-of-the-art approaches. The demo codes for HSTrans are available at https://github.com/Dtdtxuky/HSTrans/tree/master.
Affiliation(s)
- Kaiyi Xu
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Minhui Wang
  - Department of Pharmacy, Lianshui People's Hospital Affiliated to Kangda College of Nanjing Medical University, Huai'an 223300, China
- Xin Zou
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Jingjing Liu
  - Department of Cardiac Surgery, Tianjin Chest Hospital, Tianjin 300222, China
- Ao Wei
  - Department of Cardiology, Tianjin Chest Hospital, Tianjin 300222, China
- Jiajia Chen
  - Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, Huai'an 223002, China
- Chang Tang
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
3
Li G, Li S, Liang C, Xiao Q, Luo J. Drug repositioning based on residual attention network and free multiscale adversarial training. BMC Bioinformatics 2024; 25:261. [PMID: 39118000 PMCID: PMC11308596 DOI: 10.1186/s12859-024-05893-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 08/06/2024] [Indexed: 08/10/2024] Open
Abstract
BACKGROUND: Conducting traditional wet experiments to guide drug development is an expensive, time-consuming and risky process. Analyzing drug function and repositioning plays a key role in identifying new therapeutic potential of approved drugs and discovering therapeutic approaches for untreated diseases. Exploring drug-disease associations has far-reaching implications for identifying disease pathogenesis and treatment. However, reliable detection of drug-disease relationships via traditional methods is costly and slow. Therefore, investigations into computational methods for predicting drug-disease associations are currently needed. RESULTS: This paper presents a novel drug-disease association prediction method, RAFGAE. First, RAFGAE integrates known associations between diseases and drugs into a bipartite network. Second, RAFGAE designs the Re_GAT framework, which includes multilayer graph attention networks (GATs) and two residual networks. The multilayer GATs are utilized for learning the node embeddings, which is achieved by aggregating information from multihop neighbors. The two residual networks are used to alleviate the deep network oversmoothing problem, and an attention mechanism is introduced to combine the node embeddings from different attention layers. Third, two graph autoencoders (GAEs) with collaborative training are constructed to simulate label propagation to predict potential associations. On this basis, free multiscale adversarial training (FMAT) is introduced. FMAT enhances node feature quality through small gradient adversarial perturbation iterations, improving the prediction performance. Finally, tenfold cross-validations on two benchmark datasets show that RAFGAE outperforms current methods. In addition, case studies have confirmed that RAFGAE can detect novel drug-disease associations. CONCLUSIONS: The comprehensive experimental results validate the utility and accuracy of RAFGAE. We believe that this method may serve as an excellent predictor for identifying unobserved disease-drug associations.
Affiliation(s)
- Guanghui Li
  - School of Information Engineering, East China Jiaotong University, Nanchang, China
- Shuwen Li
  - School of Information Engineering, East China Jiaotong University, Nanchang, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Qiu Xiao
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
4
Toni E, Ayatollahi H, Abbaszadeh R, Fotuhi Siahpirani A. Machine Learning Techniques for Predicting Drug-Related Side Effects: A Scoping Review. Pharmaceuticals (Basel) 2024; 17:795. [PMID: 38931462 PMCID: PMC11206653 DOI: 10.3390/ph17060795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 06/11/2024] [Accepted: 06/12/2024] [Indexed: 06/28/2024] Open
Abstract
BACKGROUND: Drug safety relies on advanced methods for timely and accurate prediction of side effects. To tackle this requirement, this scoping review examines machine-learning approaches for predicting drug-related side effects with a particular focus on chemical, biological, and phenotypical features. METHODS: This was a scoping review in which a comprehensive search was conducted in various databases from 1 January 2013 to 31 December 2023. RESULTS: The results showed the widespread use of Random Forest, k-nearest neighbor, and support vector machine algorithms. Ensemble methods, particularly random forest, emphasized the significance of integrating chemical and biological features in predicting drug-related side effects. CONCLUSIONS: This review article emphasized the significance of considering a variety of features, datasets, and machine learning algorithms for predicting drug-related side effects. Ensemble methods and Random Forest showed the best performance and combining chemical and biological features improved prediction. The results suggested that machine learning techniques have some potential to improve drug development and trials. Future work should focus on specific feature types, selection techniques, and graph-based methods for even better prediction.
Affiliation(s)
- Esmaeel Toni
  - Medical Informatics, Student Research Committee, Iran University of Medical Sciences, Tehran, Iran 14496-14535
- Haleh Ayatollahi
  - Medical Informatics, Health Management and Economics Research Center, Health Management Research Institute, Iran University of Medical Sciences, Tehran, Iran 1996-713883
- Reza Abbaszadeh
  - Pediatric Cardiology, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran 19956-14331
- Alireza Fotuhi Siahpirani
  - Systems Biology and Bioinformatics, Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran 14176-14411
5
Li S, Zhang L, Wang L, Ji J, He J, Zheng X, Cao L, Li K. BiMPADR: A Deep Learning Framework for Predicting Adverse Drug Reactions in New Drugs. Molecules 2024; 29:1784. [PMID: 38675604 PMCID: PMC11051887 DOI: 10.3390/molecules29081784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024] Open
Abstract
Detecting the unintended adverse reactions of drugs (ADRs) is a crucial concern in pharmacological research. The experimental validation of drug-ADR associations often entails expensive and time-consuming investigations. Thus, a computational model to predict ADRs from known associations is essential for enhanced efficiency and cost-effectiveness. Here, we propose BiMPADR, a novel model that integrates drug gene expression into adverse reaction features using a message passing neural network on a bipartite graph of drugs and adverse reactions, leveraging publicly available data. By combining the computed adverse reaction features with the structural fingerprints of drugs, we predict the association between drugs and adverse reactions. Our models obtained high AUC (area under the receiver operating characteristic curve) values ranging from 0.861 to 0.907 in an external drug validation dataset under differential experiment conditions. The case study on multiple BET inhibitors also demonstrated the high accuracy of our predictions, and our model's exploration of potential adverse reactions for HWD-870 has contributed to its research and development for market approval. In summary, our method would provide a promising tool for ADR prediction and drug safety assessment in drug discovery and development.
Affiliation(s)
- Lei Cao
  - Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin 150081, China; (S.L.); (L.Z.); (L.W.); (J.J.); (J.H.); (X.Z.)
- Kang Li
  - Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin 150081, China; (S.L.); (L.Z.); (L.W.); (J.J.); (J.H.); (X.Z.)
6
Yu L, Xu Z, Qiu W, Xiao X. MSDSE: Predicting drug-side effects based on multi-scale features and deep multi-structure neural network. Comput Biol Med 2024; 169:107812. [PMID: 38091725 DOI: 10.1016/j.compbiomed.2023.107812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/10/2023] [Accepted: 12/03/2023] [Indexed: 02/08/2024]
Abstract
Unexpected side effects may accompany the research stage and post-marketing of drugs. These accidents lead to drug development failure and even endanger patients' health. Thus, it is essential to recognize the unknown drug-side effects. Most existing methods in silico find the answer from the association network or similarity network of drugs while ignoring the drug-intrinsic attributes. The limitation is that they can only handle drugs in the maturation stage. To be suitable for early drug-side effect screening, we conceive a multi-structural deep learning framework, MSDSE, which synthetically considers the multi-scale features derived from the drug. MSDSE can jointly learn SMILES sequence-based word embedding, substructure-based molecular fingerprint, and chemical structure-based graph embedding. In the preprocessing stage of MSDSE, we project all features to the abstract space with the same dimension. MSDSE builds a bi-level channel strategy, including a convolutional neural network module with an Inception structure and a multi-head Self-Attention module, to learn and integrate multi-modal features from local to global perspectives. Finally, MSDSE regards the prediction of drug-side effects as pair-wise learning and outputs the pair-wise probability of drug-side effects through the inner product operation. MSDSE is evaluated and analyzed on benchmark datasets and performs optimally compared to other baseline models. We also set up the ablation study to explain the rationality of the feature approach and model structure. Moreover, we select model partial prediction results for the case study to reveal actual capability. The original data are available at http://github.com/yuliyi/MSDSE.
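The final step described in the abstract above, scoring a drug-side effect pair through an inner product, can be illustrated with a minimal sketch. The function name `pair_probability`, the toy embeddings, and the use of a sigmoid to map the inner product to a probability are assumptions made for illustration; the abstract does not state how MSDSE normalizes the score.

```python
import math


def pair_probability(drug_vec, side_effect_vec):
    """Score one drug/side-effect pair as sigmoid(inner product).

    The sigmoid is an assumption; the abstract only mentions an
    inner product operation on the two learned representations.
    """
    if len(drug_vec) != len(side_effect_vec):
        raise ValueError("both embeddings must have the same dimension")
    score = sum(d * s for d, s in zip(drug_vec, side_effect_vec))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical 3-dimensional embeddings after projection to a shared space.
drug = [0.5, -1.0, 2.0]
side_effect = [1.0, 0.5, 0.25]
print(pair_probability(drug, side_effect))
```

Because both embeddings are projected to the same dimension beforehand, the inner product is well defined for every drug-side effect pair.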
Affiliation(s)
- Liyi Yu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Zhaochun Xu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Wangren Qiu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Xuan Xiao
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
7
Li D, Xiao Z, Sun H, Jiang X, Zhao W, Shen X. Prediction of Drug-Disease Associations Based on Multi-Kernel Deep Learning Method in Heterogeneous Graph Embedding. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:120-128. [PMID: 38051617 DOI: 10.1109/tcbb.2023.3339189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Computational drug repositioning can identify potential associations between drugs and diseases. This technology has been shown to be effective in accelerating drug development and reducing experimental costs. Although there has been plenty of research for this task, existing methods are deficient in utilizing complex relationships among biological entities, which may not be conducive to subsequent simulation of drug treatment processes. In this article, we propose a heterogeneous graph embedding method called HMLKGAT to infer novel potential drugs for diseases. More specifically, we first construct a heterogeneous information network by combining drug-disease, drug-protein and disease-protein biological networks. Then, a multi-layer graph attention model is utilized to capture the complex associations in the network to derive representations for drugs and diseases. Finally, to maintain the relationship of nodes in different feature spaces, we propose a multi-kernel learning method to transform and combine the representations. Experimental results demonstrate that HMLKGAT outperforms six state-of-the-art methods in drug-related disease prediction, and case studies of five classical drugs further demonstrate the effectiveness of HMLKGAT.
8
Ding Y, Zhou H, Zou Q, Yuan L. Identification of drug-side effect association via correntropy-loss based matrix factorization with neural tangent kernel. Methods 2023; 219:73-81. [PMID: 37783242 DOI: 10.1016/j.ymeth.2023.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023] Open
Abstract
Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. The monitoring of drug side effects is an important support for post marketing safety supervision of drugs, and an important basis for revising drug instructions. Its purpose is to timely detect and control drug safety risks. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine learning based method, called correntropy-loss based matrix factorization with neural tangent kernel (CLMF-NTK), to solve the prediction of drug side effects. Our method and other computational methods are tested on three benchmark datasets, and the results show that our method achieves the best predictive performance.
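For readers unfamiliar with the term, a correntropy-induced loss is usually written with a Gaussian kernel, which bounds the penalty on large residuals. The sketch below shows one common form of that loss; the function name, the kernel width `sigma`, and the exact scaling are assumptions, since the abstract does not give the formulation used in CLMF-NTK.

```python
import math


def correntropy_loss(error, sigma=1.0):
    """One common correntropy-induced loss: 1 - exp(-e^2 / (2 * sigma^2)).

    The loss is 0 for a perfect fit and saturates at 1 for large errors,
    unlike the squared error, which grows without bound.
    """
    return 1.0 - math.exp(-(error ** 2) / (2.0 * sigma ** 2))


# Compare against squared error on a small and a very large residual.
for e in (0.5, 100.0):
    print(e, correntropy_loss(e), e ** 2)
```

The saturation is what makes this family of losses attractive for noisy association matrices: a single mislabeled entry cannot dominate the objective.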
Affiliation(s)
- Yijie Ding
  - Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Hongmei Zhou
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Lei Yuan
  - Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100# Minjiang Main Road, Quzhou 324000, China
9
Krix S, DeLong LN, Madan S, Domingo-Fernández D, Ahmad A, Gul S, Zaliani A, Fröhlich H. MultiGML: Multimodal graph machine learning for prediction of adverse drug events. Heliyon 2023; 9:e19441. [PMID: 37681175 PMCID: PMC10481305 DOI: 10.1016/j.heliyon.2023.e19441] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2022] [Revised: 08/22/2023] [Accepted: 08/23/2023] [Indexed: 09/09/2023] Open
Abstract
Adverse drug events constitute a major challenge for the success of clinical trials. Several computational strategies have been suggested to estimate the risk of adverse drug events in preclinical drug development. While these approaches have demonstrated high utility in practice, they are at the same time limited to specific information sources. Thus, many current computational approaches neglect a wealth of information which results from the integration of different data sources, such as biological protein function, gene expression, chemical compound structure, cell-based imaging and others. In this work we propose an integrative and explainable multi-modal Graph Machine Learning approach (MultiGML), which fuses knowledge graphs with multiple further data modalities to predict drug related adverse events and general drug target-phenotype associations. MultiGML demonstrates excellent prediction performance compared to alternative algorithms, including various traditional knowledge graph embedding techniques. MultiGML distinguishes itself from alternative techniques by providing in-depth explanations of model predictions, which point towards biological mechanisms associated with predictions of an adverse drug event. Hence, MultiGML could be a versatile tool to support decision making in preclinical drug development.
Affiliation(s)
- Sophia Krix
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115, Bonn, Germany
  - Fraunhofer Center for Machine Learning, Germany
- Lauren Nicole DeLong
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, UK
- Sumit Madan
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Department of Computer Science, University of Bonn, 53115, Bonn, Germany
- Daniel Domingo-Fernández
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Fraunhofer Center for Machine Learning, Germany
  - Enveda Biosciences, Boulder, CO, 80301, USA
- Ashar Ahmad
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115, Bonn, Germany
  - Grunenthal GmbH, 52099, Aachen, Germany
- Sheraz Gul
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Schnackenburgallee 114, 22525, Hamburg, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases CIMD, Schnackenburgallee 114, 22525, Hamburg, Germany
- Andrea Zaliani
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Schnackenburgallee 114, 22525, Hamburg, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases CIMD, Schnackenburgallee 114, 22525, Hamburg, Germany
- Holger Fröhlich
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115, Bonn, Germany
10
Qu J, Song Z, Cheng X, Jiang Z, Zhou J. Neighborhood-based inference and restricted Boltzmann machine for small molecule-miRNA associations prediction. PeerJ 2023; 11:e15889. [PMID: 37641598 PMCID: PMC10460564 DOI: 10.7717/peerj.15889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 07/21/2023] [Indexed: 08/31/2023] Open
Abstract
Background: A growing number of experiments have shown that microRNAs (miRNAs) can be used as target of small molecules (SMs) to regulate gene expression for treating diseases. Therefore, identifying SM-related miRNAs is helpful for the treatment of diseases in the domain of medical investigation. Methods: This article presents a new computational model, called NIRBMSMMA (neighborhood-based inference (NI) and restricted Boltzmann machine (RBM)), which we developed to identify potential small molecule-miRNA associations (NIRBMSMMA). First, grounded on known SM-miRNAs associations, SM similarity and miRNA similarity, NI was used to predict score of an unknown SM-miRNA pair by reckoning the sum of known associations between neighbors of the SM (miRNA) and the miRNA (SM). Second, utilizing a two-layered generative stochastic artificial neural network, RBM was used to predict SM-miRNA association by learning potential probability distribution from known SM-miRNA associations. At last, an ensemble learning model was conducted to combine NI and RBM for identifying potential SM-miRNA associations. Results: Furthermore, we conducted global leave one out cross validation (LOOCV), miRNA-fixed LOOCV, SM-fixed LOOCV and five-fold cross validation to assess performance of NIRBMSMMA based on three datasets. Results showed that NIRBMSMMA obtained areas under the curve (AUC) of 0.9912, 0.9875, 0.8376 and 0.9898 ± 0.0009 under global LOOCV, miRNA-fixed LOOCV, SM-fixed LOOCV and five-fold cross validation based on dataset 1, respectively. For dataset 2, the AUCs are 0.8645, 0.8720, 0.7066 and 0.8547 ± 0.0046 in turn. For dataset 3, the AUCs are 0.9884, 0.9802, 0.8239 and 0.9870 ± 0.0015 in turn. Also, we conducted case studies to further assess the predictive performance of NIRBMSMMA. These results illustrated the proposed model is a useful tool in predicting potential SM-miRNA associations.
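The neighborhood-based inference (NI) step in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: neighbors are taken to be the top-`k` most similar items, the score is an unweighted count, and the names `top_k_neighbors` and `ni_score` as well as the toy matrices are hypothetical. The paper's actual neighbor definition and weighting are not given in the abstract.

```python
def top_k_neighbors(similarity_row, self_index, k):
    """Indices of the k most similar items, excluding the item itself."""
    others = [i for i in range(len(similarity_row)) if i != self_index]
    others.sort(key=lambda i: similarity_row[i], reverse=True)
    return others[:k]


def ni_score(assoc, sm_sim, mirna_sim, sm, mirna, k=2):
    """Sum known associations between neighbors of the SM and the miRNA,
    plus those between the SM and neighbors of the miRNA."""
    sm_neighbors = top_k_neighbors(sm_sim[sm], sm, k)
    mirna_neighbors = top_k_neighbors(mirna_sim[mirna], mirna, k)
    via_sm = sum(assoc[n][mirna] for n in sm_neighbors)
    via_mirna = sum(assoc[sm][n] for n in mirna_neighbors)
    return via_sm + via_mirna


# Toy data: rows are SMs, columns are miRNAs; 1 marks a known association.
assoc = [[1, 0, 1],
         [1, 1, 0],
         [0, 0, 0]]
sm_sim = [[1.0, 0.8, 0.1],
          [0.8, 1.0, 0.3],
          [0.1, 0.3, 1.0]]
mirna_sim = [[1.0, 0.2, 0.9],
             [0.2, 1.0, 0.4],
             [0.9, 0.4, 1.0]]

print(ni_score(assoc, sm_sim, mirna_sim, sm=0, mirna=1, k=1))
```

A higher score means more of the pair's neighbors already have known associations, which is the evidence NI relies on for unknown pairs.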
Affiliation(s)
- Jia Qu
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Zihao Song
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Xiaolong Cheng
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Zhibin Jiang
  - Department of Computer Science and Engineering, Shaoxing University, Shaoxing, Zhejiang, China
- Jie Zhou
  - Department of Computer Science and Engineering, Shaoxing University, Shaoxing, Zhejiang, China
11
Medvedeva A, Teimouri H, Kolomeisky AB. Predicting Antimicrobial Activity for Untested Peptide-Based Drugs Using Collaborative Filtering and Link Prediction. J Chem Inf Model 2023. [PMID: 37307501 DOI: 10.1021/acs.jcim.3c00137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The increase of bacterial resistance to currently available antibiotics has underlined the urgent need to develop new antibiotic drugs. Antimicrobial peptides (AMPs), alone or in combination with other peptides and/or existing antibiotics, have emerged as promising candidates for this task. However, given that there are thousands of known AMPs and an even larger number can be synthesized, it is impossible to comprehensively test all of them using standard wet lab experimental methods. These observations stimulated an application of machine-learning methods to identify promising AMPs. Currently, machine learning studies combine very different bacteria without considering bacteria-specific features or interactions with AMPs. In addition, the sparsity of current AMP data sets disqualifies the application of traditional machine-learning methods or makes the results unreliable. Here, we present a new approach, featuring neighborhood-based collaborative filtering, to predict with high accuracy a given bacteria's response to untested AMPs based on similarities between bacterial responses. Furthermore, we also developed a complementary bacteria-specific link prediction approach that can be used to visualize networks of AMP-antibiotic combinations, enabling us to propose new combinations that are likely to be effective.
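Neighborhood-based collaborative filtering, as mentioned in the abstract above, predicts a missing entry from the most similar rows that do have that entry. The sketch below is a generic version under stated assumptions: rows are bacteria, columns are AMPs, `None` marks an untested pair, similarity is cosine over jointly tested AMPs, and the prediction is a similarity-weighted mean. All names and numbers are hypothetical and not taken from the paper.

```python
import math


def cosine_on_shared(a, b):
    """Cosine similarity over the AMPs that both bacteria were tested against."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    dot = sum(x * y for x, y in shared)
    norm_a = math.sqrt(sum(x * x for x, _ in shared))
    norm_b = math.sqrt(sum(y * y for _, y in shared))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_response(responses, bacterium, amp, k=2):
    """Similarity-weighted mean over the k most similar bacteria tested on amp."""
    candidates = []
    for other, row in enumerate(responses):
        if other == bacterium or row[amp] is None:
            continue
        sim = cosine_on_shared(responses[bacterium], row)
        if sim > 0.0:
            candidates.append((sim, row[amp]))
    candidates.sort(reverse=True)
    top = candidates[:k]
    if not top:
        return None  # no usable neighbor, so no prediction
    return sum(s * v for s, v in top) / sum(s for s, _ in top)


# Toy response matrix: bacterium 0 has not been tested against AMP 2.
responses = [
    [1.0, 2.0, None],
    [1.0, 2.0, 4.0],
    [2.0, 1.0, 1.0],
]
print(predict_response(responses, bacterium=0, amp=2, k=2))
```

Computing similarity only on jointly observed entries is one simple way to cope with the sparsity the abstract points out; other choices are possible.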
Affiliation(s)
- Angela Medvedeva
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Hamid Teimouri
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Anatoly B Kolomeisky
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
  - Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
12
Venkatachala Appa Swamy M, Periyasamy J, Thangavel M, Khan SB, Almusharraf A, Santhanam P, Ramaraj V, Elsisi M. Design and Development of IoT and Deep Ensemble Learning Based Model for Disease Monitoring and Prediction. Diagnostics (Basel) 2023; 13:1942. [PMID: 37296794 DOI: 10.3390/diagnostics13111942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 05/04/2023] [Accepted: 05/11/2023] [Indexed: 06/12/2023] Open
Abstract
With the rapidly increasing reliance on advances in IoT, we persist towards pushing technology to new heights. From ordering food online to gene editing-based personalized healthcare, disruptive technologies like ML and AI continue to grow beyond our wildest dreams. Early detection and treatment through AI-assisted diagnostic models have outperformed human intelligence. In many cases, these tools can act upon the structured data containing probable symptoms, offer medication schedules based on the appropriate code related to diagnosis conventions, and predict adverse drug effects, if any, in accordance with medications. Utilizing AI and IoT in healthcare has facilitated innumerable benefits like minimizing cost, reducing hospital-obtained infections, decreasing mortality and morbidity etc. DL algorithms have opened up several frontiers by contributing towards healthcare opportunities through their ability to understand and learn from different levels of demonstration and generalization, which is significant in data analysis and interpretation. In contrast to ML which relies more on structured, labeled data and domain expertise to facilitate feature extractions, DL employs human-like cognitive abilities to extract hidden relationships and patterns from uncategorized data. Through the efficient application of DL techniques on the medical dataset, precise prediction, and classification of infectious/rare diseases, avoiding surgeries that can be preventable, minimization of over-dosage of harmful contrast agents for scans and biopsies can be reduced to a greater extent in future. Our study is focused on deploying ensemble deep learning algorithms and IoT devices to design and develop a diagnostic model that can effectively analyze medical Big Data and diagnose diseases by identifying abnormalities in early stages through medical images provided as input. 
This AI-assisted diagnostic model based on Ensemble Deep learning aims to be a valuable tool for healthcare systems and patients through its ability to diagnose diseases in the initial stages and present valuable insights to facilitate personalized treatment by aggregating the prediction of each base model and generating a final prediction.
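The aggregation step mentioned above (combining the prediction of each base model into a final prediction) can be illustrated with a minimal soft-voting sketch. The abstract does not say which aggregation rule the authors use, so plain averaging of class probabilities is an assumption here, and the function names and probability values are hypothetical.

```python
from typing import List


def soft_vote(prob_rows: List[List[float]]) -> List[float]:
    """Average the class-probability vectors produced by several base models."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    return [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]


def predict_label(prob_rows: List[List[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = soft_vote(prob_rows)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs of three base models for one image: [P(healthy), P(disease)]
base_outputs = [
    [0.70, 0.30],
    [0.40, 0.60],
    [0.80, 0.20],
]

print(soft_vote(base_outputs))
print(predict_label(base_outputs))
```

Averaging lets a confident majority outvote a single dissenting model; weighted voting or a learned meta-classifier would be alternative choices.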
Affiliation(s)
- Jayalakshmi Periyasamy
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Muthamilselvan Thangavel
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Surbhi B Khan
- Department of Electrical and Computer Engineering, Lebanese American University, Byblos 13-5053, Lebanon
- Department of Data Science, School of Science, Engineering and Environment, University of Salford, Manchester M5 4WT, UK
- Ahlam Almusharraf
- Department of Business Administration, College of Business and Administration, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Prasanna Santhanam
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Vijayan Ramaraj
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Mahmoud Elsisi
- Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung City 807618, Taiwan
- Department of Electrical Engineering, Faculty of Engineering (Shoubra), Benha University, 108 Shoubra St., Cairo P.O. Box 11241, Egypt
13
Hu J, Yu W, Pang C, Jin J, Pham NT, Manavalan B, Wei L. DrugormerDTI: Drug Graphormer for drug-target interaction prediction. Comput Biol Med 2023; 161:106946. [PMID: 37244151 DOI: 10.1016/j.compbiomed.2023.106946] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Revised: 03/29/2023] [Accepted: 04/15/2023] [Indexed: 05/29/2023]
Abstract
Drug-target interactions (DTI) prediction is a crucial task in drug discovery. Existing computational methods accelerate the drug discovery in this respect. However, most of them suffer from low feature representation ability, significantly affecting the predictive performance. To address the problem, we propose a novel neural network architecture named DrugormerDTI, which uses Graph Transformer to learn both sequential and topological information through the input molecule graph and Resudual2vec to learn the underlying relation between residues from proteins. By conducting ablation experiments, we verify the importance of each part of the DrugormerDTI. We also demonstrate the good feature extraction and expression capabilities of our model via comparing the mapping results of the attention layer and molecular docking results. Experimental results show that our proposed model performs better than baseline methods on four benchmarks. We demonstrate that the introduction of Graph Transformer and the design of residue are appropriate for drug-target prediction.
Affiliation(s)
- Jiayue Hu
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Wang Yu
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Chao Pang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Junru Jin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Nhat Truong Pham
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, South Korea
- Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, South Korea.
- Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
14
Shen Y, Zhu J, Deng Z, Lu W, Wang H. EnsDeepDP: An Ensemble Deep Learning Approach for Disease Prediction Through Metagenomics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:986-998. [PMID: 36001521 DOI: 10.1109/tcbb.2022.3201295] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
A growing number of studies show that the human microbiome plays a vital role in human health and can be a crucial factor in predicting certain human diseases. However, microbiome data are often characterized by limited samples and high-dimensional features, which pose a great challenge for machine learning methods. Therefore, this paper proposes a novel ensemble deep learning disease prediction method that combines unsupervised and supervised learning paradigms. First, unsupervised deep learning methods are used to learn the potential representation of each sample. Afterwards, a disease scoring strategy is developed based on the deep representations to provide informative features for ensemble analysis. To ensure the optimal ensemble, a score selection mechanism is constructed, and performance-boosting features are combined with the original sample. Finally, the composite features are used to train a gradient boosting classifier for health status decisions. As a case study, the ensemble deep learning workflow is demonstrated on six public datasets extracted from human microbiome profiling. The results show that, compared with existing algorithms, our framework achieves better performance on disease prediction.
15
Das P, Mazumder DH. An extensive survey on the use of supervised machine learning techniques in the past two decades for prediction of drug side effects. Artif Intell Rev 2023; 56:1-28. [PMID: 36819660 PMCID: PMC9930028 DOI: 10.1007/s10462-023-10413-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/01/2023] [Indexed: 02/19/2023]
Abstract
Drugs approved for sale must be effective and safe, implying that the drug's advantages outweigh its known harmful side effects. Side effects (SE) of drugs are one of the common reasons for drug failure that may halt the whole drug discovery pipeline. Side effects might vary from minor concerns like a runny nose to potentially life-threatening issues like liver damage, heart attack, and death. Therefore, predicting the side effects of a drug is vital in drug development, discovery, and design. The supervised machine learning-based side effect prediction task has recently received much attention since it reduces time, chemical waste, design complexity, risk of failure, and cost. Supervised learning approaches for predicting side effects have emerged as essential computational tools. Supervised machine learning techniques provide early information on drug side effects, based on drug properties, to help develop an effective drug. Still, there are several challenges to predicting drug side effects. Thus, a near-exhaustive survey is carried out in this paper on the supervised machine learning approaches employed in drug side effect prediction tasks in the past two decades. In addition, this paper summarizes the drug descriptors required for the side effect prediction task, commonly utilized sources of drug properties, computational models, and their performance. Finally, the research gaps, open problems, and challenges for future supervised learning-based side effect prediction are discussed.
Affiliation(s)
- Pranab Das
- Department of Computer Science and Engineering, National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103 India
- Dilwar Hussain Mazumder
- Department of Computer Science and Engineering, National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103 India
16
Liu J, Lei X, Zhang Y, Pan Y. The prediction of molecular toxicity based on BiGRU and GraphSAGE. Comput Biol Med 2023; 153:106524. [PMID: 36623439 DOI: 10.1016/j.compbiomed.2022.106524] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/21/2022] [Revised: 12/10/2022] [Accepted: 12/31/2022] [Indexed: 01/04/2023]
Abstract
The prediction of molecular toxicity properties plays a crucial role in drug discovery, since it can swiftly screen out the expected drug molecules. The conventional method for predicting toxicity is to use in vivo or in vitro biological experiments in the laboratory, which can easily lead to significant time and financial waste and even raise ethical issues. Therefore, using computational approaches to predict molecular toxicity has become a common strategy in modern drug discovery. In this article, we propose a novel model named MTBG, which primarily makes use of both SMILES (Simplified Molecular Input Line Entry System) strings and graph structures of molecules to extract drug molecular features for drug molecular toxicity prediction. To verify the performance of the MTBG model, we select the Tox21 dataset and several widely used baseline models. Experimental results demonstrate that our model can perform better than these baseline models.
Affiliation(s)
- Jianping Liu
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
- Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
17
Mohammed A, Kora R. A Comprehensive Review on Ensemble Deep Learning: Opportunities and Challenges. Journal of King Saud University - Computer and Information Sciences 2023. [DOI: 10.1016/j.jksuci.2023.01.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
18
Chen YH, Shih YT, Chien CS, Tsai CS. Predicting adverse drug effects: A heterogeneous graph convolution network with a multi-layer perceptron approach. PLoS One 2022; 17:e0266435. [PMID: 36516131 PMCID: PMC9750037 DOI: 10.1371/journal.pone.0266435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2022] [Accepted: 11/19/2022] [Indexed: 12/15/2022] Open
Abstract
We apply a heterogeneous graph convolution network (GCN) combined with a multi-layer perceptron (MLP) denoted by GCNMLP to explore the potential side effects of drugs. Here the SIDER, OFFSIDERS, and FAERS are used as the datasets. We integrate the drug information with similar characteristics from the datasets of known drugs and side effect networks. The heterogeneous graph networks explore the potential side effects of drugs by inferring the relationship between similar drugs and related side effects. This novel in silico method will shorten the time spent in uncovering the unseen side effects within routine drug prescriptions while highlighting the relevance of exploring drug mechanisms from well-documented drugs. In our experiments, we inquire about the drugs Vancomycin, Amlodipine, Cisplatin, and Glimepiride from a trained model, where the parameters are acquired from the dataset SIDER after training. Our results show that the performance of the GCNMLP on these three datasets is superior to the non-negative matrix factorization method (NMF) and some well-known machine learning methods with respect to various evaluation scales. Moreover, new side effects of drugs can be obtained using the GCNMLP.
Affiliation(s)
- Y.-H. Chen
- Dept. of Nephrology, Taichung Tzu Chi Hospital, Taichung, Taiwan
- School of Medicine, Tzu Chi University, Hualien, Taiwan
- Y.-T. Shih
- Dept. of Applied Mathematics, National Chung Hsing University, Taichung, Taiwan
- C.-S. Chien
- Dept. of Applied Mathematics, National Chung Hsing University, Taichung, Taiwan
- C.-S. Tsai
- Dept. of Management of Information Systems, National Chung Hsing University, Taichung, Taiwan
19
Qian Y, Ding Y, Zou Q, Guo F. Identification of drug-side effect association via restricted Boltzmann machines with penalized term. Brief Bioinform 2022; 23:6762741. [PMID: 36259601 DOI: 10.1093/bib/bbac458] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/13/2022] [Revised: 09/09/2022] [Accepted: 09/25/2022] [Indexed: 12/14/2022] Open
Abstract
In the entire life cycle of drug development, side effects are one of the major failure factors. Severe side effects of drugs that go undetected until the post-marketing stage lead to around two million patient morbidities every year in the United States. Therefore, there is an urgent need for a method to predict side effects of approved drugs and new drugs. Following this need, we present a new predictor for finding side effects of drugs. Firstly, multiple similarity matrices are constructed based on the association profile feature and drug chemical structure information. Secondly, these similarity matrices are integrated by the Centered Kernel Alignment-based Multiple Kernel Learning algorithm. Then, Weighted K nearest known neighbors is utilized to complement the adjacency matrix. Next, we construct restricted Boltzmann machines (RBMs) in drug space and side effect space, respectively, and apply a penalized maximum likelihood approach to train the model. Finally, the average decision rule is adopted to integrate predictions from the RBMs. Comparison results and case studies on four benchmark datasets demonstrate that our method gives more accurate and reliable predictions.
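The step that complements the adjacency matrix with Weighted K nearest known neighbors can be sketched roughly as follows. This is a simplified, row-wise illustration under stated assumptions: the abstract gives no formula, so the decay factor `eta`, the choice of `K`, the neighbour selection rule, and the toy matrices are all hypothetical, and the authors' exact procedure may differ.

```python
def wknkn_rows(Y, S, K=2, eta=0.5):
    """Estimate missing entries of each row of Y from its K most similar known rows.

    Y: binary association matrix (rows = drugs, columns = side effects).
    S: drug-drug similarity matrix.
    A row counts as "known" if it has at least one recorded association.
    """
    n = len(Y)
    completed = []
    for i in range(n):
        known = [j for j in range(n) if j != i and any(Y[j])]
        nearest = sorted(known, key=lambda j: S[i][j], reverse=True)[:K]
        norm = sum(S[i][j] for j in nearest)
        if norm == 0:
            completed.append(list(Y[i]))
            continue
        estimate = [0.0] * len(Y[i])
        for rank, j in enumerate(nearest):
            weight = (eta ** rank) * S[i][j]  # closer neighbours count more
            for c in range(len(estimate)):
                estimate[c] += weight * Y[j][c]
        estimate = [v / norm for v in estimate]
        # Never overwrite a recorded association with a smaller estimate
        completed.append([max(a, b) for a, b in zip(Y[i], estimate)])
    return completed


# Toy data: drug 1 has no recorded side effects
Y = [[1, 0], [0, 0], [0, 1]]
S = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.4], [0.1, 0.4, 1.0]]
filled = wknkn_rows(Y, S)
print(filled[1])
```

In this toy case the empty row for drug 1 receives soft scores that lean towards the profile of its most similar neighbour, drug 0.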
Affiliation(s)
- Yuqing Qian
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, PR China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha 410083, PR China
20
Cheng X, Qu J, Song S, Bian Z. Neighborhood-based inference and restricted Boltzmann machine for microbe and drug associations prediction. PeerJ 2022; 10:e13848. [PMID: 35990901 PMCID: PMC9387521 DOI: 10.7717/peerj.13848] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/19/2022] [Accepted: 07/14/2022] [Indexed: 01/18/2023] Open
Abstract
Background Efficient identification of microbe-drug associations is critical for drug development and for solving the problem of antimicrobial resistance. Traditional wet-lab methods require a lot of money and labor to identify potential microbe-drug associations. With the development of machine learning and the publication of large amounts of biological data, computational methods have become feasible. Methods In this article, we proposed a computational model of neighborhood-based inference (NI) and restricted Boltzmann machine (RBM) to predict potential microbe-drug associations (NIRBMMDA) by using integrated microbe similarity, integrated drug similarity and known microbe-drug associations. First, NI was used to obtain a score matrix of potential microbe-drug associations by using different thresholds to find similar neighbors for a drug or microbe. Second, RBM was employed to obtain another score matrix of potential microbe-drug associations based on the contrastive divergence algorithm and the sigmoid function. Because the generalization ability of an individual method is poor, we used ensemble learning to integrate the two score matrices for predicting potential microbe-drug associations more accurately. In particular, NI can fully utilize similar (neighbor) information of a drug or microbe, and RBM can learn the potential probability distribution hidden in known microbe-drug associations. Moreover, ensemble learning was used to integrate the individual predictors to obtain a stronger predictor. Results In global leave-one-out cross validation (LOOCV), NIRBMMDA gained an area under the receiver operating characteristic curve (AUC) of 0.8666, 0.9413 and 0.9557 for the datasets of DrugVirus, MDAD and aBiofilm, respectively. In local LOOCV, AUCs of 0.8512, 0.9204 and 0.9414 were obtained for NIRBMMDA based on the datasets of DrugVirus, MDAD and aBiofilm, respectively. 
For five-fold cross validation, NIRBMMDA acquired AUC and standard deviation of 0.8569 ± 0.0027, 0.9248 ± 0.0014 and 0.9369 ± 0.0020 on the basis of datasets of DrugVirus, MDAD and aBiofilm, respectively. Moreover, a case study for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) showed that 13 out of the top 20 predicted drugs were verified by searching the literature. The other two case studies indicated that 17 and 17 out of the top 20 predicted microbes for the drugs ciprofloxacin and minocycline, respectively, were confirmed by published literature.
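The reported "AUC ± standard deviation" figures come from scoring each cross-validation fold and summarizing across folds. A minimal sketch of that bookkeeping is shown below; the pairwise AUC definition is standard, but the scores, labels, and per-fold AUC values are hypothetical and are not taken from the paper.

```python
import statistics


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# One toy fold: predicted association scores and true labels
fold_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(fold_auc)

# Hypothetical AUCs from five folds, summarized as mean and sample standard deviation
fold_aucs = [0.85, 0.86, 0.84, 0.87, 0.86]
summary = (statistics.mean(fold_aucs), statistics.stdev(fold_aucs))
print(f"{summary[0]:.4f} +/- {summary[1]:.4f}")
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; rank-based implementations are preferable for large score matrices.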
Affiliation(s)
- Xiaolong Cheng
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Jia Qu
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Shuangbao Song
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Zekang Bian
- School of AI & Computer Science, Jiangnan University, Wuxi, Jiangsu, China
21
Zheng J, Xiao X, Qiu WR. DTI-BERT: Identifying Drug-Target Interactions in Cellular Networking Based on BERT and Deep Learning Method. Front Genet 2022; 13:859188. [PMID: 35754843 PMCID: PMC9213727 DOI: 10.3389/fgene.2022.859188] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/21/2022] [Accepted: 04/25/2022] [Indexed: 11/20/2022] Open
Abstract
Drug–target interactions (DTIs) are regarded as an essential part of genomic drug discovery, and computational prediction of DTIs can accelerate the search for a lead drug for a given target, which can compensate for the limitations of time-consuming and expensive wet-lab techniques. Currently, many computational methods predict DTIs based on sequential composition or physicochemical properties of drug and target, but further efforts are needed to improve them. In this article, we propose a new sequence-based method for accurately identifying DTIs. For target proteins, we explore using pre-trained Bidirectional Encoder Representations from Transformers (BERT) to extract sequence features, which can provide unique and valuable pattern information. For drug molecules, the Discrete Wavelet Transform (DWT) is employed to generate information from drug molecular fingerprints. We then concatenate the feature vectors of the DTIs and input them into a feature extraction module consisting of a BRL block (a batch-norm layer, a rectified linear activation layer and a linear layer) and a Convolutional Neural Network module to extract DTI features further. Subsequently, a BRL block is used as the prediction engine. After optimizing the model based on contrastive loss and cross-entropy loss, it gave prediction accuracies for the target families of G protein-coupled receptors, ion channels, enzymes, and nuclear receptors of up to 90.1, 94.7, 94.9, and 89%, which indicates that the proposed method can outperform the existing predictors. To make it as convenient as possible for researchers, the web server for the new predictor is freely accessible at: https://bioinfo.jcu.edu.cn/dtibert or http://121.36.221.79/dtibert/. The proposed method may also be a potential option for other DTIs.
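To make the DWT step on molecular fingerprints more concrete, here is a minimal single-level transform on a bit vector. The abstract does not name the wavelet family or decomposition depth, so the Haar wavelet, the single level, the odd-length padding rule, and the example fingerprint are all assumptions made for illustration.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficient lists."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length input by repeating the last value
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


# Hypothetical 8-bit fragment of a molecular fingerprint
fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]
approx, detail = haar_dwt(fingerprint)
print(approx)
print(detail)
```

The approximation coefficients summarize neighbouring bits while the detail coefficients record where adjacent bits differ; together they preserve the energy of the input, which is a quick sanity check for an orthonormal transform.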
Affiliation(s)
- Jie Zheng
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Wang-Ren Qiu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
22
Zhao Y, Yu Y, Wang H, Li Y, Deng Y, Jiang G, Luo Y. Machine Learning in Causal Inference: Application in Pharmacovigilance. Drug Saf 2022; 45:459-476. [PMID: 35579811 PMCID: PMC9114053 DOI: 10.1007/s40264-022-01155-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 02/09/2022] [Indexed: 01/28/2023]
Abstract
Monitoring adverse drug events or pharmacovigilance has been promoted by the World Health Organization to assure the safety of medicines through a timely and reliable information exchange regarding drug safety issues. We aim to discuss the application of machine learning methods as well as causal inference paradigms in pharmacovigilance. We first reviewed data sources for pharmacovigilance. Then, we examined traditional causal inference paradigms, their applications in pharmacovigilance, and how machine learning methods and causal inference paradigms were integrated to enhance the performance of traditional causal inference paradigms. Finally, we summarized issues with currently mainstream correlation-based machine learning models and how the machine learning community has tried to address these issues by incorporating causal inference paradigms. Our literature search revealed that most existing data sources and tasks for pharmacovigilance were not designed for causal inference. Additionally, pharmacovigilance was lagging in adopting machine learning-causal inference integrated models. We highlight several currently trending directions or gaps to integrate causal inference with machine learning in pharmacovigilance research. Finally, our literature search revealed that the adoption of causal paradigms can mitigate known issues with machine learning models. We foresee that the pharmacovigilance domain can benefit from the progress in the machine learning field.
Affiliation(s)
- Yiqing Zhao
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
- Yue Yu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, 55902, USA
- Hanyin Wang
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
- Yikuan Li
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
- Yu Deng
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
- Guoqian Jiang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, 55902, USA
- Yuan Luo
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA.
23
Yu L, Qiu W, Lin W, Cheng X, Xiao X, Dai J. HGDTI: predicting drug-target interaction by using information aggregation based on heterogeneous graph neural network. BMC Bioinformatics 2022; 23:126. [PMID: 35413800 PMCID: PMC9004085 DOI: 10.1186/s12859-022-04655-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/01/2021] [Accepted: 03/28/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In research on new drug discovery, traditional wet-lab experiments take a long time. Predicting drug-target interaction (DTI) in silico can greatly narrow the search for candidate medications. An excellent algorithmic model may be more effective in revealing the potential connection between drug and target in the bioinformatics network composed of drugs, proteins and other related data. RESULTS In this work, we have developed a heterogeneous graph neural network model, named HGDTI, which includes a learning phase of network node embedding and a training phase of DTI classification. This method first obtains the molecular fingerprint information of drugs and the pseudo amino acid composition information of proteins, then extracts the initial features of nodes through Bi-LSTM, and uses the attention mechanism to aggregate heterogeneous neighbors. In several comparative experiments, the overall performance of HGDTI significantly outperforms other state-of-the-art DTI prediction models, and negative sampling is employed to further optimize the predictive power of the model. In addition, we have demonstrated the robustness of HGDTI through heterogeneous network content reduction tests, and demonstrated the rationality of HGDTI through other comparative experiments. These results indicate that HGDTI can utilize heterogeneous information to capture the embedding of drugs and targets, and provide assistance for drug development. CONCLUSIONS HGDTI, based on a heterogeneous graph neural network model, can utilize heterogeneous information to capture the embedding of drugs and targets, and provide assistance for drug development. For the convenience of related researchers, a user-friendly web server has been established at http://bioinfo.jcu.edu.cn/hgdti.
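The idea of using attention to aggregate neighbors can be sketched in a few lines. This is only an illustration of the general mechanism: the abstract does not describe how HGDTI scores neighbors, so the dot-product scoring, the softmax normalization, and the toy feature vectors below are assumptions rather than the authors' formulation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, neighbors):
    """Weight each neighbor by its dot product with the query and sum the results."""
    scores = [sum(q * k for q, k in zip(query, nb)) for nb in neighbors]
    weights = softmax(scores)
    aggregated = [
        sum(w * nb[d] for w, nb in zip(weights, neighbors))
        for d in range(len(query))
    ]
    return weights, aggregated


# Hypothetical 2-d features: one drug node and two neighboring nodes
drug = [1.0, 0.0]
neighbor_features = [[1.0, 0.0], [0.0, 1.0]]
weights, aggregated = attend(drug, neighbor_features)
print(weights)
print(aggregated)
```

The neighbor whose features align with the query receives the larger weight, so the aggregated vector leans towards it instead of being a plain average.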
Affiliation(s)
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xiang Cheng
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China.
- Jiexia Dai
- School of Foreign Languages, Jingdezhen University, Jingdezhen, China
24
Granda Morales LF, Valdiviezo-Diaz P, Reátegui R, Barba-Guaman L. Drug Recommendation System for Diabetes using Collaborative Filtering and Clustering Techniques (Preprint). J Med Internet Res 2022; 24:e37233. [PMID: 35838763 PMCID: PMC9338420 DOI: 10.2196/37233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Revised: 04/18/2022] [Accepted: 05/29/2022] [Indexed: 11/13/2022] Open
Affiliation(s)
- Priscila Valdiviezo-Diaz
- Departamento de Ciencias de la Computación y Electrónica, Universidad Técnica Particular de Loja, Loja, Ecuador
- Ruth Reátegui
- Departamento de Ciencias de la Computación y Electrónica, Universidad Técnica Particular de Loja, Loja, Ecuador
- Luis Barba-Guaman
- Departamento de Ciencias de la Computación y Electrónica, Universidad Técnica Particular de Loja, Loja, Ecuador
25
Yu L, Xue L, Liu F, Li Y, Jing R, Luo J. The applications of deep learning algorithms on in silico druggable proteins identification. J Adv Res 2022; 41:219-231. [PMID: 36328750 PMCID: PMC9637576 DOI: 10.1016/j.jare.2022.01.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/26/2021] [Revised: 12/21/2021] [Accepted: 01/18/2022] [Indexed: 11/20/2022] Open
Abstract
We developed the first deep learning-based druggable protein classifier for fast and accurate identification of potential druggable proteins. Experimental results on a standard dataset demonstrate that the prediction performance of deep learning model is comparable to those of existing methods. We visualized the representations of druggable proteins learned by deep learning models, which helps us understand how they work. Our analysis reconfirms that the attention mechanism is especially useful for explaining deep learning models.
Introduction The top priority in drug development is to identify novel and effective drug targets. In vitro assays are frequently used for this purpose; however, traditional experimental approaches are insufficient for large-scale exploration of novel drug targets, as they are expensive, time-consuming and laborious. Therefore, computational methods have emerged in recent decades as an alternative to aid experimental drug discovery studies by developing sophisticated predictive models to estimate unknown drugs/compounds and their targets. The recent success of deep learning (DL) techniques in machine learning and artificial intelligence has further attracted a great deal of attention in the biomedicine field, including computational drug discovery. Objectives This study focuses on the practical applications of deep learning algorithms for predicting druggable proteins and proposes a powerful predictor for fast and accurate identification of potential drug targets. Methods Using a gold-standard dataset, we explored several typical protein features and different deep learning algorithms and evaluated their performance in a comprehensive way. We provide an overview of the entire experimental process, including protein features and descriptors, neural network architectures, libraries and toolkits for deep learning modelling, performance evaluation metrics, model interpretation and visualization. Results Experimental results show that the hybrid model (architecture: CNN-RNN (BiLSTM) + DNN; feature: dictionary encoding + DC_TC_CTD) performed better than the other models on the benchmark dataset. This hybrid model was able to achieve 90.0% accuracy and 0.800 MCC on the test dataset and 84.8% and 0.703 on a nonredundant independent test dataset, which is comparable to those of existing methods. Conclusion We developed the first deep learning-based classifier for fast and accurate identification of potential druggable proteins. 
We hope that this study will be helpful for future researchers who would like to use deep learning techniques to develop relevant predictive models.
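The two metrics quoted in the results, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below shows the standard formulas; the confusion-matrix counts are hypothetical and chosen only for illustration, since the abstract does not report the underlying counts.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical balanced test set: 50 druggable and 50 non-druggable proteins
counts = dict(tp=45, tn=45, fp=5, fn=5)
print(accuracy(**counts))
print(mcc(**counts))
```

Unlike accuracy, MCC stays informative on imbalanced datasets because it uses all four cells of the confusion matrix, which is one reason it is commonly reported alongside accuracy.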
26
Zhao BW, Hu L, You ZH, Wang L, Su XR. HINGRL: predicting drug-disease associations with graph representation learning on heterogeneous information networks. Brief Bioinform 2021; 23:6456295. [PMID: 34891172 DOI: 10.1093/bib/bbab515] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 09/09/2021] [Revised: 11/08/2021] [Accepted: 11/09/2021] [Indexed: 12/20/2022] Open
Abstract
Identifying new indications for drugs plays an essential role at many phases of drug research and development. Computational methods are regarded as an effective way to associate drugs with new indications. However, most of them complete their tasks by constructing a variety of heterogeneous networks without considering the biological knowledge of drugs and diseases, which is believed to be useful for improving the accuracy of drug repositioning. To this end, a novel heterogeneous information network (HIN) based model, namely HINGRL, is proposed to precisely identify new indications for drugs based on graph representation learning techniques. More specifically, HINGRL first constructs a HIN by integrating drug-disease, drug-protein and protein-disease biological networks with the biological knowledge of drugs and diseases. Then, different representation strategies are applied to learn the features of nodes in the HIN from the topological and biological perspectives. Finally, HINGRL adopts a Random Forest classifier to predict unknown drug-disease associations based on the integrated features of drugs and diseases obtained in the previous step. Experimental results demonstrate that HINGRL achieves the best performance on two real datasets when compared with state-of-the-art models. In addition, our case studies indicate that the simultaneous consideration of network topology and biological knowledge of drugs and diseases allows HINGRL to precisely predict drug-disease associations from a more comprehensive perspective. The promising performance of HINGRL also reveals that the utilization of rich heterogeneous information provides an alternative view for HINGRL to identify novel drug-disease associations, especially for new diseases.
Affiliation(s)
- Bo-Wei Zhao
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Lun Hu
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Lei Wang
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning 530007, China
- Xiao-Rui Su
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
|
27
|
Weber JM, Guo Z, Zhang C, Schweidtmann AM, Lapkin AA. Chemical data intelligence for sustainable chemistry. Chem Soc Rev 2021; 50:12013-12036. [PMID: 34520507 DOI: 10.1039/d1cs00477h] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 02/05/2023]
Abstract
This study highlights new opportunities for optimal reaction route selection from large chemical databases brought about by the rapid digitalisation of chemical data. The chemical industry requires a transformation towards more sustainable practices, eliminating its dependencies on fossil fuels and limiting its impact on the environment. However, identifying more sustainable process alternatives is, at present, a cumbersome, manual, iterative process, based on chemical intuition and modelling. We give a perspective on methods for automated discovery and assessment of competitive sustainable reaction routes based on renewable or waste feedstocks. Three key areas of transition are outlined and reviewed based on their state-of-the-art as well as bottlenecks: (i) data, (ii) evaluation metrics, and (iii) decision-making. We elucidate their synergies and interfaces since only together these areas can bring about the most benefit. The field of chemical data intelligence offers the opportunity to identify the inherently more sustainable reaction pathways and to identify opportunities for a circular chemical economy. Our review shows that at present the field of data brings about most bottlenecks, such as data completion and data linkage, but also offers the principal opportunity for advancement.
Affiliation(s)
- Jana M Weber
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore
- Zhen Guo
- Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore; Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd. 1 CREATE Way, CREATE Tower #05-05, 138602, Singapore
- Chonghuan Zhang
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- Artur M Schweidtmann
- Department of Chemical Engineering, Delft University of Technology, Van der Maasweg 9, Delft 2629 HZ, The Netherlands
- Alexei A Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore; Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd. 1 CREATE Way, CREATE Tower #05-05, 138602, Singapore
|
28
|
Identification of drug-target interactions via multi-view graph regularized link propagation model. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.100] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 02/06/2023]
|
29
|
Melo MCR, Maasch JRMA, de la Fuente-Nunez C. Accelerating antibiotic discovery through artificial intelligence. Commun Biol 2021; 4:1050. [PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023] Open
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotics from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Affiliation(s)
- Marcelo C R Melo
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Jacqueline R M A Maasch
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
|
30
|
Gupta G, Katarya R. EnPSO: An AutoML Technique for Generating Ensemble Recommender System. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-021-05670-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
|
31
|
Abstract
Nowadays, a vast amount of clinical data scattered across different sites on the Internet hinders users from finding helpful information for improving their well-being. In addition, the overload of medical information (e.g., on drugs, medical tests, and treatment suggestions) has brought many difficulties to medical professionals in making patient-oriented decisions. These issues raise the need to apply recommender systems in the healthcare domain to help both end-users and medical professionals make more efficient and accurate health-related decisions. In this article, we provide a systematic overview of existing research on healthcare recommender systems. Different from existing related overview papers, our article provides insights into recommendation scenarios and recommendation approaches. Examples thereof are food recommendation, drug recommendation, health status prediction, healthcare service recommendation, and healthcare professional recommendation. Additionally, we develop working examples to give a deep understanding of recommendation algorithms. Finally, we discuss challenges concerning the development of healthcare recommender systems in the future.
|
32
|
Piroozmand F, Mohammadipanah F, Sajedi H. Spectrum of deep learning algorithms in drug discovery. Chem Biol Drug Des 2021; 96:886-901. [PMID: 33058458 DOI: 10.1111/cbdd.13674] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/31/2019] [Revised: 02/11/2020] [Accepted: 02/19/2020] [Indexed: 12/16/2022]
Abstract
Deep learning (DL) algorithms are a subset of machine learning algorithms that aim to model complex mappings between a set of elements and their classes. In parallel with advances in revealing the molecular bases of diseases, notable innovation has been undertaken to apply DL in data/library management, reaction optimization, differentiating uncertainties, molecule construction, creating metrics from qualitative results, and prediction of structures or interactions. From source identification to lead discovery and medicinal chemistry of the drug candidate, drug delivery, and modification, the challenges can be subjected to artificial intelligence algorithms to aid in the generation and interpretation of data. Both discovery and design approaches demand automation, large-scale data management, and data fusion as high-throughput methods advance. The application of DL can accelerate the exploration of drug mechanisms, finding novel indications for existing drugs (drug repositioning), drug development, and preclinical and clinical studies. The impact of DL on the workflow of drug discovery and design, and on their complementary tools, is highlighted in this review. Additionally, the types of DL algorithms used for this purpose, their pros and cons, and the dominant directions of future research are presented.
Affiliation(s)
- Firoozeh Piroozmand
- Pharmaceutical Biotechnology Lab, Department of Microbiology, School of Biology and Center of Excellence in Phylogeny of Living Organisms, College of Science, University of Tehran, Tehran, Iran
- Fatemeh Mohammadipanah
- Pharmaceutical Biotechnology Lab, Department of Microbiology, School of Biology and Center of Excellence in Phylogeny of Living Organisms, College of Science, University of Tehran, Tehran, Iran
- Hedieh Sajedi
- Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
|
33
|
Zhao H, Zheng K, Li Y, Wang J. A novel graph attention model for predicting frequencies of drug-side effects from multi-view data. Brief Bioinform 2021; 22:6312959. [PMID: 34213525 DOI: 10.1093/bib/bbab239] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/13/2021] [Revised: 05/30/2021] [Accepted: 06/04/2021] [Indexed: 12/15/2022] Open
Abstract
Identifying the frequencies of drug-side effects is a very important issue in pharmacological studies and drug risk-benefit assessment. However, designing clinical trials to determine the frequencies is usually time-consuming and expensive, and most existing methods can only predict the existence of drug-side effect associations, not their frequencies. Inspired by the recent progress of graph neural networks in recommender systems, we develop a novel prediction model for drug-side effect frequencies, using a graph attention network to integrate three different types of features: similarity information, known drug-side effect frequency information and word embeddings. In comparison, the few available studies focusing on frequency prediction use only the known drug-side effect frequency scores. One novel approach used in this work first decomposes the feature types in the drug-side effect graph to extract different view representation vectors based on the three feature types, and then recombines these latent view vectors automatically to obtain unified embeddings for prediction. The proposed method demonstrates high effectiveness in 10-fold cross-validation. The computational results show that the proposed method achieves the best performance on the benchmark dataset, outperforming the state-of-the-art matrix decomposition model. In addition, ablation experiments and visual analyses are supplied to illustrate the usefulness of our method for the prediction of drug-side effect frequencies. The code of MGPred is available at https://github.com/zhc940702/MGPred and https://zenodo.org/record/4449613.
Affiliation(s)
- Haochen Zhao
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Kai Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529-0001, United States
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
|
34
|
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res 2021; 49:D1388-D1395. [PMID: 33151290 PMCID: PMC7778930 DOI: 10.1093/nar/gkaa971] [Citation(s) in RCA: 1892] [Impact Index Per Article: 630.7] [Received: 09/14/2020] [Revised: 10/06/2020] [Accepted: 10/11/2020] [Indexed: 02/06/2023] Open
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves the scientific community as well as the general public, with millions of unique users per month. In the past two years, PubChem made substantial improvements. Data from more than 100 new data sources were added to PubChem, including chemical-literature links from Thieme Chemistry, chemical and physical property links from SpringerMaterials, and patent links from the World Intellectual Property Organization (WIPO). PubChem's homepage and individual record pages were updated to help users find desired information faster. This update involved a data model change for the data objects used by these pages as well as by programmatic users. Several new services were introduced, including the PubChem Periodic Table and Element pages, Pathway pages, and Knowledge panels. Additionally, in response to the coronavirus disease 2019 (COVID-19) outbreak, PubChem created a special data collection that contains PubChem data related to COVID-19 and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
|
35
|
Ding Y, Tang J, Guo F. The Computational Models of Drug-target Interaction Prediction. Protein Pept Lett 2020; 27:348-358. [PMID: 30968771 DOI: 10.2174/0929866526666190410124110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/04/2019] [Revised: 02/22/2019] [Accepted: 04/02/2019] [Indexed: 12/19/2022]
Abstract
The identification of Drug-Target Interactions (DTIs) is an important process in drug discovery and medical research. However, traditional experimental methods for DTI identification are still time-consuming, extremely expensive and challenging. In the past ten years, various computational methods have been developed to identify potential DTIs. In this paper, the identification methods of DTIs are summarized. In addition, several state-of-the-art computational methods are introduced, covering network-based and machine learning-based methods. In particular, machine learning-based methods, including supervised and semi-supervised models, differ essentially in how they treat negative samples. Although these computational models have achieved significant improvements in the identification of DTIs, network-based and machine learning-based methods each have their own disadvantages. These computational methods are evaluated on four benchmark data sets via the Area Under the Precision-Recall curve (AUPR).
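The AUPR metric named in this abstract can be illustrated with a short Python sketch. It uses average precision, one common step-wise estimator of the area under the precision-recall curve; the reviewed methods may use a different interpolation, and the labels and scores below are hypothetical. Tied scores are ranked in input order here, which is a simplification.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for a known interaction (positive), 0 otherwise.
    scores: predicted interaction scores, higher means more confident.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank candidate pairs from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            total += true_positives / rank
    return total / n_pos


# Hypothetical scores for four drug-target pairs (illustrative only).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(labels, scores))
```

Because DTI data sets contain far more unlabeled pairs than known interactions, a precision-recall summary of this kind is generally more informative than accuracy on such data.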
Affiliation(s)
- Yijie Ding
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States; School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
36
|
Tran TNT, Felfernig A, Trattner C, Holzinger A. Recommender systems in the healthcare domain: state-of-the-art and research issues. J Intell Inf Syst 2020. [DOI: 10.1007/s10844-020-00633-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/25/2023]
Abstract
Nowadays, a vast amount of clinical data scattered across different sites on the Internet hinders users from finding helpful information for improving their well-being. In addition, the overload of medical information (e.g., on drugs, medical tests, and treatment suggestions) has brought many difficulties to medical professionals in making patient-oriented decisions. These issues raise the need to apply recommender systems in the healthcare domain to help both end-users and medical professionals make more efficient and accurate health-related decisions. In this article, we provide a systematic overview of existing research on healthcare recommender systems. Different from existing related overview papers, our article provides insights into recommendation scenarios and recommendation approaches. Examples thereof are food recommendation, drug recommendation, health status prediction, healthcare service recommendation, and healthcare professional recommendation. Additionally, we develop working examples to give a deep understanding of recommendation algorithms. Finally, we discuss challenges concerning the development of healthcare recommender systems in the future.
|
37
|
NDDSA: A network- and domain-based method for predicting drug-side effect associations. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102357] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
|
38
|
Identification of Drug–Target Interactions via Dual Laplacian Regularized Least Squares with Multiple Kernel Fusion. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106254] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Indexed: 12/29/2022]
|
39
|
Palanivinayagam A, Sasikumar D. Drug recommendation with minimal side effects based on direct and temporal symptoms. Neural Comput Appl 2020. [DOI: 10.1007/s00521-018-3794-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
|
40
|
Sosnina EA, Sosnin S, Nikitina AA, Nazarov I, Osolodkin DI, Fedorov MV. Recommender Systems in Antiviral Drug Discovery. ACS Omega 2020; 5:15039-15051. [PMID: 32632398 PMCID: PMC7315437 DOI: 10.1021/acsomega.0c00857] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/26/2020] [Accepted: 06/03/2020] [Indexed: 06/11/2023]
Abstract
Recommender systems (RSs), which underwent rapid development and had an enormous impact on e-commerce, have the potential to become useful tools for drug discovery. In this paper, we applied RS methods for the prediction of the antiviral activity class (active/inactive) for compounds extracted from ChEMBL. Two main RS approaches were applied: collaborative filtering (Surprise implementation) and content-based filtering (sparse-group inductive matrix completion (SGIMC) method). The effectiveness of RS approaches was investigated for prediction of antiviral activity classes ("interactions") for compounds and viruses, for which some of their interactions with other viruses or compounds are known, and for prediction of interaction profiles for new compounds. Both approaches achieved relatively good prediction quality for binary classification of individual interactions and compound profiles, as quantified by cross-validation and external validation receiver operating characteristic (ROC) score >0.9. Thus, even simple recommender systems may serve as an effective tool in antiviral drug discovery.
Affiliation(s)
- Ekaterina A. Sosnina
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Institute of Physiologically Active Compounds, RAS, Severniy pr. 1, Chernogolovka 142432, Russia
- Sergey Sosnin
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
- Anastasia A. Nikitina
- Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1 bd. 3, Moscow 119991, Russia
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Ivan Nazarov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Dmitry I. Osolodkin
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya Ulitsa 8, Moscow 119991, Russia
- Maxim V. Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
- Physics John Anderson Building, University of Strathclyde, 107 Rottenrow East, Glasgow G4 0NG, U.K.
|
41
|
A Novel Triple Matrix Factorization Method for Detecting Drug-Side Effect Association Based on Kernel Target Alignment. BioMed Research International 2020; 2020:4675395. [PMID: 32596314 PMCID: PMC7275954 DOI: 10.1155/2020/4675395] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/15/2020] [Accepted: 04/08/2020] [Indexed: 01/01/2023]
Abstract
All drugs usually have side effects, which endanger the health of patients. To identify potential side effects of drugs, biological and pharmacological experiments are done but are expensive and time-consuming. So, computation-based methods have been developed to accurately and quickly predict side effects. To predict potential associations between drugs and side effects, we propose a novel method called the Triple Matrix Factorization- (TMF-) based model. TMF is built by the biprojection matrix and latent feature of kernels, which is based on Low Rank Approximation (LRA). LRA could construct a lower rank matrix to approximate the original matrix, which not only retains the characteristics of the original matrix but also reduces the storage space and computational complexity of the data. To fuse multivariate information, multiple kernel matrices are constructed and integrated via Kernel Target Alignment-based Multiple Kernel Learning (KTA-MKL) in drug and side effect space, respectively. Compared with other methods, our model achieves better performance on three benchmark datasets. The values of the Area Under the Precision-Recall curve (AUPR) are 0.677, 0.685, and 0.680 on three datasets, respectively.
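The low-rank approximation idea described in this abstract can be illustrated with a generic Python sketch. This is a rank-1 approximation by power iteration, not the TMF or KTA-MKL method itself; the function names and the toy matrix are assumptions made for illustration. It shows how an m×n matrix can be stored as two vectors of total length m + n while still being reconstructed.

```python
import math


def rank1_approximation(A, iters=50):
    """Return (u, v) such that the outer product u v^T approximates A."""
    m, n = len(A), len(A[0])
    v = [1.0] * n  # arbitrary non-zero start vector
    u = [0.0] * m
    for _ in range(iters):
        # u <- A v, normalised to unit length
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            raise ValueError("A v is zero; choose a different start vector")
        u = [x / norm for x in u]
        # v <- A^T u, which absorbs the leading singular value
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
    return u, v


def outer(u, v):
    """Rebuild the full matrix from the two factors."""
    return [[ui * vj for vj in v] for ui in u]


# Toy 3x2 matrix that is exactly rank 1 (illustrative only).
A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
u, v = rank1_approximation(A)
approx = outer(u, v)
max_error = max(abs(A[i][j] - approx[i][j]) for i in range(3) for j in range(2))
print(max_error)
```

Here the factors hold 5 numbers instead of 6; for a large drug-by-side-effect matrix of rank k, the saving grows from m·n stored values to k·(m + n), which is the storage and computation benefit the abstract refers to.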
|
42
|
Zhou H, Cao H, Matyunina L, Shelby M, Cassels L, McDonald JF, Skolnick J. MEDICASCY: A Machine Learning Approach for Predicting Small-Molecule Drug Side Effects, Indications, Efficacy, and Modes of Action. Mol Pharm 2020; 17:1558-1574. [PMID: 32237745 PMCID: PMC7319183 DOI: 10.1021/acs.molpharmaceut.9b01248] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/27/2022]
Abstract
To improve the drug discovery yield, a method is needed that can be applied at the beginning of drug discovery and that accurately predicts drug side effects, indications, efficacy, and mode of action based solely on the drug's chemical structure. In contrast, extant predictive methods do not comprehensively address these aspects of drug discovery and rely on features derived from extensive experimental information that is often unavailable for novel molecules. To address these issues, we developed MEDICASCY, a multilabel-based boosted random forest machine learning method that only requires the small molecule's chemical structure for the drug side effect, indication, efficacy, and probable mode of action target predictions; however, it has comparable or even significantly better performance than existing approaches requiring far more information. In retrospective benchmarking on high confidence predictions, MEDICASCY shows about 78% precision and recall for predicting at least one severe side effect and 72% precision for drug efficacy. Experimental validation of MEDICASCY's efficacy predictions on novel molecules shows close to 80% precision for the inhibition of growth in ovarian, breast, and prostate cancer cell lines. Thus, MEDICASCY should improve the success rate for new drug approval. A web service for academic users is available at http://pwp.gatech.edu/cssb/MEDICASCY.
Affiliation(s)
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA 30332
- Hongnan Cao
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA 30332
- Lilya Matyunina
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, 30332-0230, USA
- Madelyn Shelby
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, 30332-0230, USA
- Lauren Cassels
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, 30332-0230, USA
- John F. McDonald
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, 30332-0230, USA
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, N.W., Atlanta, GA 30332
|
43
|
Sachdev K, Gupta MK. A comprehensive review of computational techniques for the prediction of drug side effects. Drug Dev Res 2020; 81:650-670. [DOI: 10.1002/ddr.21669] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/08/2020] [Revised: 03/18/2020] [Accepted: 03/30/2020] [Indexed: 12/28/2022]
Affiliation(s)
- Kanica Sachdev
  - School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
- Manoj K. Gupta
  - School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
|
44
|
Huang F, Qiu Y, Li Q, Liu S, Ni F. Predicting Drug-Disease Associations via Multi-Task Learning Based on Collective Matrix Factorization. Front Bioeng Biotechnol 2020; 8:218. [PMID: 32373595 PMCID: PMC7179666 DOI: 10.3389/fbioe.2020.00218] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/20/2019] [Accepted: 03/04/2020] [Indexed: 12/30/2022] Open
Abstract
Identifying drug-disease associations is integral to drug development. Computationally prioritizing candidate drug-disease associations has attracted growing attention because it reduces the cost of laboratory screening. Drug-disease associations involve different association types, such as drug indications and drug side effects. However, existing models for predicting drug-disease associations concentrate on independent tasks: recommending novel indications to benefit drug repositioning, predicting potential side effects to prevent drug-induced risk, or only determining whether a drug-disease association exists. They ignore crucial prior knowledge of the correlations between different association types. Since the Comparative Toxicogenomics Database (CTD) annotates drug-disease associations as therapeutic or marker/mechanism, we consider predicting these two types of association. To this end, we propose a collective matrix factorization-based multi-task learning method (CMFMTL). CMFMTL treats the problem as multi-task learning in which each task predicts one type of association, and the two tasks complement and improve each other by capturing the relatedness between them. First, drug-disease associations are represented as a bipartite network with two types of links representing therapeutic effects and non-therapeutic effects. Then, CMFMTL approximates the association matrix for each link type by matrix tri-factorization and shares the low-dimensional latent representations of drugs and diseases between the two related tasks for collective learning. Finally, CMFMTL puts the two tasks into a unified framework, and an efficient algorithm is developed to solve the resulting optimization problem. In computational experiments, CMFMTL outperforms several state-of-the-art methods in both tasks. Moreover, case studies show that CMFMTL helps identify novel drug-disease associations that are not included in CTD and simultaneously predicts their association types.
Affiliation(s)
- Feng Huang
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Yang Qiu
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Qiaojun Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - School of Electronic and Information Engineering, Henan Polytechnic Institute, Henan Nanyang, China
- Shichao Liu
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan, China
- Fuchuan Ni
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan, China
|
45
|
Sivaramakrishnan N, Subramaniyaswamy V, Viloria A, Vijayakumar V, Senthilselvan N. A deep learning-based hybrid model for recommendation generation and ranking. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-04844-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
|
46
|
Dere S, Ayvaz S. Prediction of Drug-Drug Interactions by Using Profile Fingerprint Vectors and Protein Similarities. Healthc Inform Res 2020; 26:42-49. [PMID: 32082699 PMCID: PMC7010946 DOI: 10.4258/hir.2020.26.1.42] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/18/2019] [Revised: 12/24/2019] [Accepted: 12/25/2019] [Indexed: 12/21/2022] Open
Abstract
Objectives: Drug-drug interaction (DDI) is a vital problem that threatens people's health. However, predicting DDIs through in-vivo experiments is not only extremely costly but also difficult, as many serious side effects are hard to detect in in-vivo and in-vitro settings. The aim of this study was to assess the effectiveness of similarity-based in-silico computational DDI prediction approaches and to provide a cost-effective and scalable solution for predicting potential DDIs. Methods: In this study, widely known similarity-based computational DDI prediction methods were used to discover novel potential DDIs. More specifically, known interactions, drug targets, adverse effects, and protein similarities of drug pairs were used to construct drug fingerprints for the prediction of DDIs. Results: Using the drug interaction profile, our approach achieved an area under the curve (AUC) of 0.975 in the prediction of potential DDIs. The drug adverse effect profile and protein profile similarity-based methods resulted in AUC values of 0.685 and 0.895, respectively. Conclusions: In this study, we developed a computational approach to the prediction of potential drug interactions. The performance of the similarity-based computational methods was comparatively evaluated using a comprehensive real-world DDI dataset. The evaluations showed that drug interaction profile information is a better predictor of DDIs than drug adverse effects and protein similarities among DDI pairs.
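One common way to turn interaction profiles into DDI candidates is sketched below. It is not necessarily the procedure used in the study above. The assumptions are: each drug's fingerprint is its row of a binary known-interaction matrix, fingerprints are compared with Tanimoto similarity, and a pair (i, j) is scored by the most similar drug k that is already known to interact with j. The function names and the toy matrix `known` are hypothetical.

```python
import numpy as np


def tanimoto_matrix(fingerprints: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto (Jaccard) similarity between binary row vectors."""
    fp = fingerprints.astype(float)
    shared = fp @ fp.T                                   # size of intersection
    counts = fp.sum(axis=1)
    union = counts[:, None] + counts[None, :] - shared   # size of union
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, shared / union, 0.0)


def score_candidate_ddis(known: np.ndarray) -> np.ndarray:
    """Score drug pairs from a symmetric binary matrix of known interactions."""
    sim = tanimoto_matrix(known)
    np.fill_diagonal(sim, 0.0)  # a drug must not vote for itself
    # scores[i, j] = max over k of sim[i, k] * known[k, j]
    scores = (sim[:, :, None] * known[None, :, :]).max(axis=1)
    scores = np.maximum(scores, scores.T)  # interactions are undirected
    np.fill_diagonal(scores, 0.0)
    return scores


# Hypothetical toy data: known interactions are (0, 2), (0, 3), and (1, 2).
known = np.array(
    [
        [0, 0, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
    ]
)

scores = score_candidate_ddis(known)
print(scores)
```

In the toy data, drugs 0 and 1 share an interaction partner, so the unobserved pair (1, 3) inherits a non-zero score from the known interaction (0, 3).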
Affiliation(s)
- Selma Dere
  - Department of Computer Engineering, Bahcesehir University, Besiktas, Istanbul, Turkey
- Serkan Ayvaz
  - Department of Software Engineering, Bahcesehir University, Besiktas, Istanbul, Turkey
|
47
|
Spiro A, Fernández García J, Yanover C. Inferring new relations between medical entities using literature curated term co-occurrences. JAMIA Open 2020; 2:378-385. [PMID: 31984370 PMCID: PMC6951958 DOI: 10.1093/jamiaopen/ooz022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2019] [Revised: 06/05/2019] [Accepted: 06/08/2019] [Indexed: 11/17/2022] Open
Abstract
Objectives: Identifying new relations between medical entities, such as drugs, diseases, and side effects, is typically a resource-intensive task, involving experimentation and clinical trials. The increased availability of related data and curated knowledge enables a computational approach to this task, notably by training models to predict likely relations. Such models rely on meaningful representations of the medical entities being studied. We propose a generic feature vector representation that leverages co-occurrences of medical terms, linked with PubMed citations. Materials and Methods: We demonstrate the usefulness of the proposed representation by inferring two types of relations: a drug causes a side effect and a drug treats an indication. To predict these relations and assess their effectiveness, we applied two modeling approaches: multi-task modeling using neural networks and single-task modeling based on gradient boosting machines and logistic regression. Results: These trained models, which predict either side effects or indications, obtained significantly better results than baseline models that use a single direct co-occurrence feature. The results demonstrate the advantage of a comprehensive representation. Discussion: Selecting the appropriate representation has an immense impact on the predictive performance of machine learning models. Our proposed representation is powerful, as it spans multiple medical domains and can be used to predict a wide range of relation types. Conclusion: The discovery of new relations between various medical entities can be translated into meaningful insights, for example, related to drug development or disease understanding. Our representation of medical entities can be used to train models that predict such relations, thus accelerating healthcare-related discoveries.
Affiliation(s)
- Adam Spiro
  - Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
- Jonatan Fernández García
  - Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
- Chen Yanover
  - Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel
|
48
|
Tan C, Wang T, Yang W, Deng L. PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction. Molecules 2019; 25:98. [PMID: 31888057 PMCID: PMC6982935 DOI: 10.3390/molecules25010098] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/07/2019] [Revised: 12/20/2019] [Accepted: 12/21/2019] [Indexed: 11/16/2022] Open
Abstract
Interactions between proteins and DNA play essential roles in many biological processes. DNA-binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a series of cell functions such as gene expression and regulation. Single-stranded DNA-binding proteins (SSBs) bind to single-stranded DNA and are necessary for DNA replication, recombination, and repair. Therefore, the effective classification of DNA-binding proteins is helpful for the functional annotation of proteins. In this work, we propose PredPSD, a computational method based on sequence information that accurately predicts SSBs and DSBs. It introduces three novel feature extraction algorithms. In particular, we use the auto-cross covariance (ACC) transformation to transform feature matrices into fixed-length vectors. Then, we feed the optimal feature subset obtained by the minimal-redundancy-maximal-relevance (mRMR) feature selection algorithm into gradient tree boosting (GTB). In 10-fold cross-validation on a benchmark dataset, PredPSD achieves promising performance, with an AUC score of 0.956 and an accuracy of 0.912, which are better than those of existing methods. Moreover, our method significantly improves the prediction accuracy in independent testing. The experimental results show that PredPSD can recognize binding specificity and distinguish DSBs from SSBs.
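The auto-cross covariance step named in the abstract above can be sketched as follows, using the standard ACC definition: for each lag, the covariance between feature columns at positions j and j + lag, averaged over the sequence. The abstract does not specify PredPSD's feature matrices or its maximum lag, so the random matrices and `max_lag=3` below are stand-in assumptions, and `acc_transform` is a hypothetical name.

```python
import numpy as np


def acc_transform(features: np.ndarray, max_lag: int) -> np.ndarray:
    """Auto-cross covariance of an (L, N) per-residue feature matrix.

    Returns a vector of length N * N * max_lag, independent of the
    sequence length L. For each lag, the diagonal terms are the
    auto-covariances and the off-diagonal terms the cross-covariances.
    """
    length, _ = features.shape
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    centered = features - features.mean(axis=0)
    blocks = []
    for lag in range(1, max_lag + 1):
        # cov[a, b] = mean over j of centered[j, a] * centered[j + lag, b]
        cov = centered[:-lag].T @ centered[lag:] / (length - lag)
        blocks.append(cov.ravel())
    return np.concatenate(blocks)


# Stand-in feature matrices for two proteins of different lengths.
rng = np.random.default_rng(0)
short_protein = rng.normal(size=(30, 4))   # 30 residues, 4 features
long_protein = rng.normal(size=(75, 4))    # 75 residues, 4 features

vec_short = acc_transform(short_protein, max_lag=3)
vec_long = acc_transform(long_protein, max_lag=3)
print(vec_short.shape, vec_long.shape)
```

Both proteins map to vectors of the same length, which is what allows variable-length sequences to be fed to a fixed-input classifier.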
Affiliation(s)
- Changgeng Tan
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China; (C.T.); (T.W.); (W.Y.)
- Tong Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China; (C.T.); (T.W.); (W.Y.)
- Wenyi Yang
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China; (C.T.); (T.W.); (W.Y.)
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China; (C.T.); (T.W.); (W.Y.)
  - School of Software, Xinjiang University, Urumqi 830008, China
  - Correspondence: ; Tel.: +86-731-82539736
|
49
|
Ding Y, Tang J, Guo F. Identification of Drug-Side Effect Association via Semisupervised Model and Multiple Kernel Learning. IEEE J Biomed Health Inform 2019; 23:2619-2632. [DOI: 10.1109/jbhi.2018.2883834] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 11/09/2022]
|
50
|
Onay A, Onay M. A Drug Decision Support System for Developing a Successful Drug Candidate Using Machine Learning Techniques. Curr Comput Aided Drug Des 2019; 16:407-419. [PMID: 31438830 DOI: 10.2174/1573409915666190716143601] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/02/2019] [Revised: 04/24/2019] [Accepted: 05/06/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Virtual screening of candidate drug molecules using machine learning techniques plays a key role in the pharmaceutical industry in the design and discovery of new drugs. Computational classification methods can determine drug types according to disease groups and distinguish approved drugs from withdrawn ones. INTRODUCTION: The classification models developed in this study can be used as a simple filter in drug modelling to eliminate potentially inappropriate molecules in the early stages. In this work, we developed a Drug Decision Support System (DDSS) to classify each drug candidate molecule as a potential drug or non-drug and to predict its disease group. METHODS: Molecular descriptors were identified to determine a number of rules in drug molecules. They were derived using the ADRIANA.Code program and Lipinski's rule of five. We used an Artificial Neural Network (ANN) to classify drug molecules correctly according to the types of diseases. Closed frequent molecular structures in the form of subgraph fragments were also obtained with the Gaston algorithm included in the ParMol package to find common molecular fragments for withdrawn drugs. RESULTS: We observed that TPSA, XlogP, Natoms, HDon_O and TPSA are the most distinctive features in the pool of molecular descriptors. We evaluated the performance of the classifiers on all datasets and found that classification accuracies are very high on all of them. Neural network models achieved 84.6% and 83.3% accuracies on test sets including cardiac therapy, anti-epileptic and anti-Parkinson drugs with approved and withdrawn drugs for drug classification problems. CONCLUSION: The experimental evaluation shows that the system is promising for identifying potential drug molecules and classifying them correctly according to the types of diseases.
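The rule-of-five filter named in the abstract above can be sketched as a simple violation count. The thresholds are the standard Lipinski criteria; the `Descriptors` class, the one-violation tolerance, and the two example molecules with their values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for the four rule-of-five descriptors."""
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria a molecule violates."""
    return sum(
        [
            d.molecular_weight > 500,
            d.log_p > 5,
            d.h_bond_donors > 5,
            d.h_bond_acceptors > 10,
        ]
    )


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most one violation."""
    return rule_of_five_violations(d) <= max_violations


# Illustrative, made-up descriptor values.
small_candidate = Descriptors(180.2, 1.2, 1, 4)
large_candidate = Descriptors(1202.6, 2.9, 5, 12)

for name, mol in [("small", small_candidate), ("large", large_candidate)]:
    print(name, rule_of_five_violations(mol), passes_rule_of_five(mol))
```

A filter of this kind only screens out unlikely candidates early; a classifier, such as the neural network the abstract describes, would still be needed to assign disease groups.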
Affiliation(s)
- Aytun Onay
  - Department of Computer Engineering, Faculty of Engineering & Architecture, Kafkas University, Kars, 36100, Turkey
- Melih Onay
  - Department of Environmental Engineering, Computational & Experimental Biochemistry Lab, Faculty of Engineering, Van Yuzuncu Yil University, 65100, Van, Turkey
|