51
Kang H, Goo S, Lee H, Chae JW, Yun HY, Jung S. Fine-tuning of BERT Model to Accurately Predict Drug–Target Interactions. Pharmaceutics 2022; 14:pharmaceutics14081710. [PMID: 36015336 PMCID: PMC9414546 DOI: 10.3390/pharmaceutics14081710] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/22/2022] [Revised: 08/09/2022] [Accepted: 08/11/2022] [Indexed: 11/16/2022]
Abstract
The identification of optimal drug candidates is very important in drug discovery. Researchers in biology and computational sciences have sought to use machine learning (ML) to efficiently predict drug–target interactions (DTIs). In recent years, given the emerging usefulness of pretrained models in natural language processing (NLP), pretrained models are being developed for chemical compounds and target proteins. This study sought to improve DTI predictive models using a Bidirectional Encoder Representations from Transformers (BERT)-pretrained model, ChemBERTa, for chemical compounds; its pretraining uses the simplified molecular-input line-entry system (SMILES). We also employ the pretrained ProtBert for target proteins (pretraining employed amino acid sequences). The BIOSNAP, DAVIS, and BindingDB databases (DBs) were used (alone or together) for learning. The final model, built on both ChemBERTa and ProtBert and trained on the integrated DBs, afforded the best DTI predictive performance to date compared with previous models, based on the area under the receiver operating characteristic curve (AUC) and precision-recall AUC values. The performance of the final model was verified using a specific case study on 13 pairs of substrates and the metabolic enzyme cytochrome P450 (CYP). The final model afforded excellent DTI prediction. As the real-world interactions between drugs and target proteins are expected to exhibit specific patterns, pretraining with ChemBERTa and ProtBert could teach such patterns. Learning the patterns of such interactions would enhance DTI accuracy if learning employs large, well-balanced datasets that cover all relationships between drugs and target proteins.
Affiliation(s)
- Hyeunseok Kang
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- Sungwoo Goo
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- Hyunjung Lee
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- Jung-woo Chae
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
  - College of Pharmacy, Chungnam National University, Daejeon 34134, Korea
  - Correspondence: (J.-w.C.); (H.-y.Y.); (S.J.)
- Hwi-yeol Yun
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
  - College of Pharmacy, Chungnam National University, Daejeon 34134, Korea
  - Correspondence: (J.-w.C.); (H.-y.Y.); (S.J.)
- Sangkeun Jung
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
  - Department of Computer Convergence, Chungnam National University, Daejeon 34134, Korea
  - Correspondence: (J.-w.C.); (H.-y.Y.); (S.J.)
52
Reciprocal perspective as a super learner improves drug-target interaction prediction (MUSDTI). Sci Rep 2022; 12:13237. [PMID: 35918366 PMCID: PMC9344797 DOI: 10.1038/s41598-022-16493-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 07/11/2022] [Indexed: 11/08/2022]
Abstract
The identification of novel drug-target interactions (DTI) is critical to drug discovery and drug repurposing to address contemporary medical and public health challenges presented by emergent diseases. Historically, computational methods have framed DTI prediction as a binary classification problem (indicating whether or not a drug physically interacts with a given protein target); however, framing the problem instead as a regression-based prediction of the physicochemical binding affinity is more meaningful. With growing databases of experimentally derived drug-target interactions (e.g. Davis, Binding-DB, and Kiba), deep learning-based DTI predictors can be effectively leveraged to achieve state-of-the-art (SOTA) performance. In this work, we formulated a DTI competition as part of the coursework for a senior undergraduate machine learning course and challenged students to generate component DTI models that might surpass SOTA models and to effectively combine these component models as part of a meta-model using the Reciprocal Perspective (RP) multi-view learning framework. Following 6 weeks of concerted effort, 28 student-produced component deep-learning DTI models were leveraged in this work to produce a new SOTA RP-DTI model, denoted the Meta Undergraduate Student DTI (MUSDTI) model. Through a series of experiments we demonstrate that (1) RP can considerably improve SOTA DTI prediction, (2) our new double-cold experimental design is more appropriate for emergent DTI challenges, (3) our novel MUSDTI meta-model outperforms SOTA models, (4) RP can improve upon individual models as an ensembling method, and finally, (5) RP can be utilized for low-computation transfer learning. This work introduces a number of important revelations for the field of DTI prediction and sequence-based, pairwise prediction in general.
53
Zheng PF, Chen LZ, Liu P, Liu ZY, Pan HW. Integrative identification of immune-related key genes in atrial fibrillation using weighted gene coexpression network analysis and machine learning. Front Cardiovasc Med 2022; 9:922523. [PMID: 35966550 PMCID: PMC9363882 DOI: 10.3389/fcvm.2022.922523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 07/11/2022] [Indexed: 11/13/2022]
Abstract
Background: The immune system significantly participates in the pathologic process of atrial fibrillation (AF). However, the molecular mechanisms underlying this participation are not completely explained. The current research aimed to identify critical genes and immune cells that participate in the pathologic process of AF. Methods: CIBERSORT was utilized to reveal the immune cell infiltration pattern in AF patients. Meanwhile, weighted gene coexpression network analysis (WGCNA) was utilized to identify meaningful modules that were significantly correlated with AF. The characteristic genes correlated with AF were identified by the least absolute shrinkage and selection operator (LASSO) logistic regression and support vector machine recursive feature elimination (SVM-RFE) algorithm. Results: In comparison to sinus rhythm (SR) individuals, we observed that fewer activated mast cells and regulatory T cells (Tregs), as well as more gamma delta T cells, resting mast cells, and M2 macrophages, were infiltrated in AF patients. Three significant modules (pink, red, and magenta) were identified to be significantly associated with AF. Gene enrichment analysis showed that all 717 genes were associated with immunity- or inflammation-related pathways and biological processes. Four hub genes (GALNT16, HTR2B, BEX2, and RAB8A) were revealed to be significantly correlated with AF by the SVM-RFE algorithm and LASSO logistic regression. qRT–PCR results suggested that compared to the SR subjects, AF patients exhibited significantly reduced BEX2 and GALNT16 expression, as well as dramatically elevated HTR2B expression. The AUC measurement showed that the diagnostic efficiency of BEX2, HTR2B, and GALNT16 in the training set was 0.836, 0.883, and 0.893, respectively, and 0.858, 0.861, and 0.915, respectively, in the validation set. Conclusions: Three novel genes, BEX2, HTR2B, and GALNT16, were identified by WGCNA combined with machine learning, which provides potential new therapeutic targets for the early diagnosis and prevention of AF.
Affiliation(s)
- Peng-Fei Zheng
  - Department of Cardiology, Hunan Provincial People's Hospital, Changsha, China
  - Clinical Research Center for Heart Failure in Hunan Province, Changsha, China
  - Hunan Provincial People's Hospital, Institute of Cardiovascular Epidemiology, Changsha, China
- Lu-Zhu Chen
  - Department of Cardiology, The Central Hospital of ShaoYang, Shaoyang, China
- Peng Liu
  - Department of Cardiology, The Central Hospital of ShaoYang, Shaoyang, China
- Zheng-Yu Liu
  - Department of Cardiology, Hunan Provincial People's Hospital, Changsha, China
  - Clinical Research Center for Heart Failure in Hunan Province, Changsha, China
  - Hunan Provincial People's Hospital, Institute of Cardiovascular Epidemiology, Changsha, China
  - *Correspondence: Zheng-Yu Liu
- Hong Wei Pan
  - Department of Cardiology, Hunan Provincial People's Hospital, Changsha, China
  - Clinical Research Center for Heart Failure in Hunan Province, Changsha, China
  - Hunan Provincial People's Hospital, Institute of Cardiovascular Epidemiology, Changsha, China
  - Hong Wei Pan
54
Matrix factorization with denoising autoencoders for prediction of drug–target interactions. Mol Divers 2022:10.1007/s11030-022-10492-8. [DOI: 10.1007/s11030-022-10492-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 07/01/2022] [Indexed: 11/25/2022]
55
Luo H, Xiang Y, Fang X, Lin W, Wang F, Wu H, Wang H. BatchDTA: implicit batch alignment enhances deep learning-based drug-target affinity estimation. Brief Bioinform 2022; 23:6632927. [PMID: 35794723 DOI: 10.1093/bib/bbac260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/14/2022] [Revised: 05/23/2022] [Accepted: 06/03/2022] [Indexed: 11/14/2022]
Abstract
Candidate compounds with high binding affinities toward a target protein are likely to be developed as drugs. Deep neural networks (DNNs) have attracted increasing attention for drug-target affinity (DTA) estimation owing to their efficiency. However, the negative impact of batch effects caused by measurement metrics, system technologies and other assay information is seldom discussed when training a DNN model for DTA. Because of the data deviation caused by batch effects, DNN models can only be trained on a small amount of 'clean' data. Thus, it is challenging for them to provide precise and consistent estimations. We design a batch-sensitive training framework, namely BatchDTA, to train the DNN models. BatchDTA implicitly aligns multiple batches toward the same protein by learning the orders of candidate compounds with respect to the batches, alleviating the impact of the batch effects on the DNN models. Extensive experiments demonstrate that BatchDTA enables four mainstream DNN models to improve their performance and robustness on multiple DTA datasets (BindingDB, Davis and KIBA). The average concordance index of the DNN models achieves a relative improvement of 4.0%. The case study reveals that BatchDTA can successfully learn the ranking orders of the compounds from multiple batches. In addition, BatchDTA can also be applied to fused data collected from multiple sources to achieve further improvement.
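The concordance index mentioned in this abstract measures how often a model ranks a pair of compounds in the same order as their measured affinities. The sketch below follows the common pairwise definition and is only an illustration: the function name `concordance_index` and the tie handling (a tied prediction counts as half) are assumptions, not details taken from BatchDTA.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the measured order.

    Illustrative helper (hypothetical name); pairs with equal measured
    affinity are skipped, and tied predictions count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # equal measured affinities give no ordering to check
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # assumption: a tied prediction earns half credit
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Three compounds; the model swaps the order of the last two.
    print(concordance_index([1.0, 2.0, 3.0], [0.1, 0.3, 0.2]))
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 is what a random ranking would score on average.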
Affiliation(s)
- Hongyu Luo
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Yingfei Xiang
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Xiaomin Fang
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Wei Lin
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Fan Wang
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Hua Wu
  - Baidu Inc., 100000, Beijing, China
56
Zheng J, Xiao X, Qiu WR. DTI-BERT: Identifying Drug-Target Interactions in Cellular Networking Based on BERT and Deep Learning Method. Front Genet 2022; 13:859188. [PMID: 35754843 PMCID: PMC9213727 DOI: 10.3389/fgene.2022.859188] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/21/2022] [Accepted: 04/25/2022] [Indexed: 11/20/2022]
Abstract
Drug–target interactions (DTIs) are regarded as an essential part of genomic drug discovery, and computational prediction of DTIs can accelerate the search for a lead drug for a target, which can compensate for the time-consuming and expensive nature of wet-lab techniques. Currently, many computational methods predict DTIs based on the sequential composition or physicochemical properties of drug and target, but further efforts are needed to improve them. In this article, we propose a new sequence-based method for accurately identifying DTIs. For target proteins, we explore using pre-trained Bidirectional Encoder Representations from Transformers (BERT) to extract sequence features, which can provide unique and valuable pattern information. For drug molecules, the Discrete Wavelet Transform (DWT) is employed to generate information from drug molecular fingerprints. We then concatenate the feature vectors of the DTIs and input them into a feature extraction module, consisting of a block made up of a batch-norm layer, a rectified linear activation layer and a linear layer (called a BRL block) and a Convolutional Neural Network module, to extract DTI features further. Subsequently, a BRL block is used as the prediction engine. After optimizing the model based on contrastive loss and cross-entropy loss, it gave prediction accuracies for the target families of G protein-coupled receptors, ion channels, enzymes, and nuclear receptors of up to 90.1, 94.7, 94.9, and 89%, respectively, which indicates that the proposed method can outperform the existing predictors. To make it as convenient as possible for researchers, the web server for the new predictor is freely accessible at: https://bioinfo.jcu.edu.cn/dtibert or http://121.36.221.79/dtibert/. The proposed method may also be a potential option for other DTIs.
Affiliation(s)
- Jie Zheng
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Xuan Xiao
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Wang-Ren Qiu
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
57
Monteiro NRC, Simões CJV, Ávila HV, Abbasi M, Oliveira JL, Arrais JP. Explainable deep drug-target representations for binding affinity prediction. BMC Bioinformatics 2022; 23:237. [PMID: 35715734 PMCID: PMC9204982 DOI: 10.1186/s12859-022-04767-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/02/2022] [Accepted: 05/25/2022] [Indexed: 11/10/2022]
Abstract
Background: Several computational advances have been achieved in the drug discovery field, promoting the identification of novel drug–target interactions and new leads. However, most of these methodologies have been overlooking the importance of providing explanations to the decision-making process of deep learning architectures. In this research study, we explore the reliability of convolutional neural networks (CNNs) at identifying relevant regions for binding, specifically binding sites and motifs, and the significance of the deep representations extracted by providing explanations to the model’s decisions based on the identification of the input regions that contributed the most to the prediction. We make use of an end-to-end deep learning architecture to predict binding affinity, where CNNs are exploited in their capacity to automatically identify and extract discriminating deep representations from 1D sequential and structural data. Results: The results demonstrate the effectiveness of the deep representations extracted from CNNs in the prediction of drug–target interactions. CNNs were found to identify and extract features from regions relevant for the interaction, where the weight associated with these spots was in the range of those with the highest positive influence given by the CNNs in the prediction. The end-to-end deep learning model achieved the highest performance both in the prediction of the binding affinity and on the ability to correctly distinguish the interaction strength rank order when compared to baseline approaches. Conclusions: This research study validates the potential applicability of an end-to-end deep learning architecture in the context of drug discovery beyond the confined space of proteins and ligands with determined 3D structure. Furthermore, it shows the reliability of the deep representations extracted from the CNNs by providing explainability to the decision-making process.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04767-y.
Affiliation(s)
- Nelson R C Monteiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Henrique V Ávila
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Maryam Abbasi
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- José L Oliveira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Joel P Arrais
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
58
Zhu S, Bai Q, Li L, Xu T. Drug repositioning in drug discovery of T2DM and repositioning potential of antidiabetic agents. Comput Struct Biotechnol J 2022; 20:2839-2847. [PMID: 35765655 PMCID: PMC9189996 DOI: 10.1016/j.csbj.2022.05.057] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/14/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 12/19/2022]
Abstract
Repositioned or repurposed drugs account for a substantial share of the drugs entering the approval pipeline, which indicates that drug repositioning has huge market potential and value. Computational technologies such as machine learning methods have accelerated the process of drug repositioning in the last few decades. The repositioning potential of type 2 diabetes mellitus (T2DM) drugs for various diseases such as cancer, neurodegenerative diseases, and cardiovascular diseases has been widely studied. Hence, a summary of work on repurposing antidiabetic drugs is of great significance. In this review, we focus on machine learning methods for the development of new T2DM drugs and give an overview of the repurposing potential of existing antidiabetic agents.
Affiliation(s)
- Sha Zhu
  - Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, PR China
- Qifeng Bai
  - Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, PR China
  - Corresponding author.
59
Gabur I, Simioniuc DP, Snowdon RJ, Cristea D. Machine Learning Applied to the Search for Nonlinear Features in Breeding Populations. Front Artif Intell 2022; 5:876578. [PMID: 35669178 PMCID: PMC9164111 DOI: 10.3389/frai.2022.876578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 04/19/2022] [Indexed: 11/13/2022]
Abstract
Large plant breeding populations are traditionally a source of novel allelic diversity and are at the core of selection efforts for elite material. Finding rare diversity requires a deep understanding of biological interactions between the genetic makeup of one genotype and its environmental conditions. Most modern breeding programs still rely on linear regression models to solve this problem, generalizing the complex genotype by phenotype interactions through manually constructed linear features. However, the identification of positive alleles vs. background can be addressed using deep learning approaches that have the capacity to learn complex nonlinear functions for the inputs. Machine learning (ML) is an artificial intelligence (AI) approach involving a range of algorithms to learn from input data sets and predict outcomes in other related samples. This paper describes a variety of techniques that include supervised and unsupervised ML algorithms to improve our understanding of nonlinear interactions from plant breeding data sets. Feature selection (FS) methods are combined with linear and nonlinear predictors and compared to traditional prediction methods used in plant breeding. Recent advances in ML allowed the construction of complex models that have the capacity to better differentiate between positive alleles and the genetic background. Using real plant breeding program data, we show that ML methods have the ability to outperform current approaches, increase prediction accuracies, decrease the computing time drastically, and improve the detection of important alleles involved in qualitative or quantitative traits.
Affiliation(s)
- Iulian Gabur
  - Department of Plant Breeding, Justus-Liebig-University, Giessen, Germany
  - Department of Plant Sciences, Iasi University of Life Sciences, Iasi, Romania
  - *Correspondence: Iulian Gabur
- Rod J. Snowdon
  - Department of Plant Breeding, Justus-Liebig-University, Giessen, Germany
- Dan Cristea
  - Institute of Computer Science, Romanian Academy, Iasi Branch, Iasi, Romania
60
DTIP-TC2A: An analytical framework for drug-target interactions prediction methods. Comput Biol Chem 2022; 99:107707. [DOI: 10.1016/j.compbiolchem.2022.107707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 05/01/2022] [Accepted: 05/26/2022] [Indexed: 11/18/2022]
61
Nag S, Baidya ATK, Mandal A, Mathew AT, Das B, Devi B, Kumar R. Deep learning tools for advancing drug discovery and development. 3 Biotech 2022; 12:110. [PMID: 35433167 PMCID: PMC8994527 DOI: 10.1007/s13205-022-03165-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/06/2021] [Accepted: 03/18/2022] [Indexed: 12/26/2022]
Abstract
A few decades ago, drug discovery and development were limited to a handful of medicinal chemists working in a lab with an enormous amount of testing, validation, and synthetic procedures, all contributing to considerable investments of time and money to get one drug into the clinic. Advancements in computational techniques, combined with a boom in multi-omics data, led to the development of various bioinformatics/pharmacoinformatics/cheminformatics tools that have helped speed up the drug development process. With the advent of artificial intelligence (AI), machine learning (ML) and deep learning (DL), the conventional drug discovery process has been further rationalized. Extensive biological data, in the form of big data present in various databases across the globe, act as the raw material for ML/DL-based approaches and help in the accurate identification of patterns and models that can be used to identify therapeutically active molecules with much lower investments of time, workforce and money. In this review, we begin by introducing the general concepts in the drug discovery pipeline, followed by an outline of the fields in the drug discovery process where ML/DL can be utilized. We also introduce ML and DL along with their applications, various learning methods, and training models used to develop ML/DL-based algorithms. Furthermore, we summarize various DL-based tools existing in the public domain and their application in the drug discovery paradigm, including DL tools for the identification of drug targets and drug–target interactions such as DeepCPI, DeepDTA, WideDTA, PADME, DeepAffinity, and DeepPocket. Additionally, we discuss various DL-based models used in protein structure prediction, de novo design of new chemical scaffolds, virtual screening of chemical libraries for hit identification, absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction, metabolite prediction, clinical trial design, and oral bioavailability prediction. Finally, we highlight some of the successful ML/DL-based models used in the drug discovery and development pipeline while also discussing the current challenges and prospects of applying DL tools in drug discovery and development. We believe that this review will be useful for medicinal and computational chemists searching for DL tools for use in their drug discovery projects.
Affiliation(s)
- Sagorika Nag
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Anurag T. K. Baidya
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Abhimanyu Mandal
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Alen T. Mathew
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Bhanuranjan Das
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Bharti Devi
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Rajnish Kumar
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
62
Yu L, Qiu W, Lin W, Cheng X, Xiao X, Dai J. HGDTI: predicting drug-target interaction by using information aggregation based on heterogeneous graph neural network. BMC Bioinformatics 2022; 23:126. [PMID: 35413800 PMCID: PMC9004085 DOI: 10.1186/s12859-022-04655-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/01/2021] [Accepted: 03/28/2022] [Indexed: 11/10/2022]
Abstract
Background: In new drug discovery research, traditional wet-lab experiments take a long time. Predicting drug-target interactions (DTIs) in silico can greatly narrow the search for candidate medications. A well-designed algorithmic model may be more effective in revealing the potential connections between drugs and targets in a bioinformatics network composed of drugs, proteins and other related data. Results: In this work, we have developed a heterogeneous graph neural network model, named HGDTI, which includes a learning phase for network node embedding and a training phase for DTI classification. The method first obtains the molecular fingerprint information of drugs and the pseudo amino acid composition information of proteins, then extracts the initial features of nodes through a Bi-LSTM, and uses an attention mechanism to aggregate heterogeneous neighbors. In several comparative experiments, the overall performance of HGDTI significantly exceeds that of other state-of-the-art DTI prediction models, and negative sampling is employed to further improve the predictive power of the model. In addition, we have demonstrated the robustness of HGDTI through heterogeneous network content reduction tests, and the soundness of HGDTI through other comparative experiments. These results indicate that HGDTI can utilize heterogeneous information to capture the embedding of drugs and targets, and provide assistance for drug development. Conclusions: HGDTI, based on a heterogeneous graph neural network model, can utilize heterogeneous information to capture the embedding of drugs and targets, and provide assistance for drug development. For the convenience of related researchers, a user-friendly web server has been established at http://bioinfo.jcu.edu.cn/hgdti.
Affiliation(s)
- Liyi Yu
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Wangren Qiu
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Weizhong Lin
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xiang Cheng
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
  - School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Jiexia Dai
  - School of Foreign Languages, Jingdezhen University, Jingdezhen, China
63
Shao K, Zhang Y, Wen Y, Zhang Z, He S, Bo X. DTI-HETA: prediction of drug-target interactions based on GCN and GAT on heterogeneous graph. Brief Bioinform 2022; 23:6563180. [PMID: 35380622 DOI: 10.1093/bib/bbac109] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/27/2021] [Revised: 02/14/2022] [Accepted: 03/03/2022] [Indexed: 12/19/2022]
Abstract
Drug-target interaction (DTI) prediction plays an important role in drug repositioning, drug discovery and drug design. However, due to the large size of the chemical and genomic spaces and the complex interactions between drugs and targets, experimental identification of DTIs is costly and time-consuming. In recent years, the emerging graph neural network (GNN) has been applied to DTI prediction because DTIs can be represented effectively using graphs. However, some of these methods are only based on homogeneous graphs, and some consist of two decoupled steps that cannot be trained jointly. To further explore GNN-based DTI prediction by integrating heterogeneous graph information, this study regards DTI prediction as a link prediction problem and proposes an end-to-end model based on HETerogeneous graph with Attention mechanism (DTI-HETA). In this model, a heterogeneous graph is first constructed based on the drug-drug and target-target similarity matrices and the DTI matrix. Then, the graph convolutional neural network is utilized to obtain the embedded representation of the drugs and targets. To highlight the contribution of different neighborhood nodes to the central node in aggregating the graph convolution information, a graph attention mechanism is introduced into the node embedding process. Afterward, an inner product decoder is applied to predict DTIs. To evaluate the performance of DTI-HETA, experiments are conducted on two datasets. The experimental results show that our model is superior to the state-of-the-art methods. Also, the identification of novel DTIs indicates that DTI-HETA can serve as a powerful tool for integrating heterogeneous graph information to predict DTIs.
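The final decoding step described in this abstract can be illustrated with a generic inner-product decoder of the kind commonly used for link prediction. This is a minimal sketch under stated assumptions, not the DTI-HETA implementation: the function names, the toy embeddings, and the sigmoid squashing of the scores are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def inner_product_decoder(drug_emb, target_emb):
    """Score every drug-target pair as sigmoid(<drug embedding, target embedding>).

    drug_emb and target_emb are lists of equal-length vectors (hypothetical
    inputs, e.g. node embeddings produced by a graph encoder). Returns a
    matrix with one row per drug and one column per target.
    """
    return [
        [sigmoid(sum(d * t for d, t in zip(drug, target))) for target in target_emb]
        for drug in drug_emb
    ]


if __name__ == "__main__":
    drugs = [[1.0, 0.0], [0.0, 0.0]]   # toy 2-dimensional embeddings
    targets = [[2.0, 0.0]]
    for row in inner_product_decoder(drugs, targets):
        print(row)
```

Embeddings that point in similar directions give a large inner product and a score near 1, while a zero inner product gives 0.5, so the scores can be read as link probabilities and thresholded or ranked.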
Affiliation(s)
- Yuqi Wen
- Beijing Institute of Radiation Medicine, Beijing, China
- Song He
- Beijing Institute of Radiation Medicine, Beijing, China
- Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing, China

64
Kang Q, Meng J, Luan Y. RNAI-FRID: novel feature representation method with information enhancement and dimension reduction for RNA-RNA interaction. Brief Bioinform 2022; 23:6555402. [PMID: 35352114 DOI: 10.1093/bib/bbac107] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2022] [Revised: 02/22/2022] [Accepted: 03/02/2022] [Indexed: 11/12/2022] Open
Abstract
Different ribonucleic acids (RNAs) can interact to form regulatory networks that play an important role in many life activities. Molecular biology experiments can confirm RNA-RNA interactions to facilitate the exploration of their biological functions, but they are expensive and time-consuming. Machine learning models can predict potential RNA-RNA interactions, providing candidates for molecular biology experiments and saving considerable time and cost. Using a set of suitable features to represent the sample is crucial for training powerful models, but there is a lack of effective feature representation for RNA-RNA interaction. This study proposes a novel feature representation method with information enhancement and dimension reduction for RNA-RNA interaction (named RNAI-FRID). Diverse base features are first extracted from RNA data to capture more sample information. Then, the extracted base features are used to construct complex features through an arithmetic-level method, which greatly reduces the feature dimension while preserving the relationship between molecule features. Since dimension reduction may cause information loss, the arithmetic mean strategy is adopted during complex feature construction to further enhance the sample information. Finally, three feature ranking methods are integrated for feature selection on the constructed complex features, adaptively retaining important features and removing redundant ones. Extensive experimental results show that RNAI-FRID provides a reliable feature representation for RNA-RNA interaction with higher efficiency, and that the model trained with the generated features obtains better performance than other deep neural network predictors.
Affiliation(s)
- Qiang Kang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116024, China
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116024, China
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, Liaoning, 116024, China

65
Identification of Potential Diagnostic Biomarkers and Biological Pathways in Hypertrophic Cardiomyopathy Based on Bioinformatics Analysis. Genes (Basel) 2022; 13:genes13030530. [PMID: 35328083 PMCID: PMC8951232 DOI: 10.3390/genes13030530] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/07/2022] [Revised: 03/10/2022] [Accepted: 03/14/2022] [Indexed: 12/13/2022] Open
Abstract
Hypertrophic cardiomyopathy (HCM) is a genetically heterogeneous disorder and the main cause of sudden cardiac death in adolescents and young adults. This study aimed to identify potential diagnostic biomarkers and biological pathways to help diagnose and treat HCM through bioinformatics analysis. We selected the GSE36961 dataset from the Gene Expression Omnibus (GEO) database and identified 893 differentially expressed genes (DEGs). Subsequently, 12 modules were generated through weighted gene coexpression network analysis (WGCNA), and the turquoise module showed the highest negative correlation with HCM (cor = −0.9, p-value = 4 × 10^−52). Using the filtering criteria gene significance (GS) < −0.7 and module membership (MM) > 0.9, 19 genes were then selected to establish the least absolute shrinkage and selection operator (LASSO) model, and LYVE1, MAFB, and MT1M were finally identified as key genes. The expression levels of these genes were additionally verified in the GSE130036 dataset. Gene Set Enrichment Analysis (GSEA) showed that oxidative phosphorylation, tumor necrosis factor alpha-nuclear factor-κB (TNFα-NFκB), interferon-gamma (IFNγ) response, and inflammatory response were four pathways possibly related to HCM. In conclusion, LYVE1, MAFB, and MT1M are potential biomarkers of HCM, and oxidative stress, immune response and inflammatory response are likely to be associated with the pathogenesis of HCM.
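The GS and MM filter described in this abstract amounts to a simple threshold test per gene. The sketch below only illustrates that step; the gene names and values are hypothetical placeholders, not data from the study, and the real analysis would take GS and MM from the WGCNA output.

```python
def select_hub_genes(genes, gs_max=-0.7, mm_min=0.9):
    """Keep genes with gene significance below gs_max and module membership above mm_min."""
    return [name for name, gs, mm in genes if gs < gs_max and mm > mm_min]


# (name, gene significance, module membership); hypothetical values.
genes = [
    ("GENE_A", -0.85, 0.95),
    ("GENE_B", -0.60, 0.97),  # fails the GS threshold
    ("GENE_C", -0.75, 0.80),  # fails the MM threshold
    ("GENE_D", -0.72, 0.91),
]
selected = select_hub_genes(genes)
```

In the study, the genes passing this filter were the candidates handed to the LASSO model.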

66
Sun J, Lu Y, Cui L, Fu Q, Wu H, Chen J. A Method of Optimizing Weight Allocation in Data Integration Based on Q-Learning for Drug-Target Interaction Prediction. Front Cell Dev Biol 2022; 10:794413. [PMID: 35356288 PMCID: PMC8959213 DOI: 10.3389/fcell.2022.794413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Accepted: 02/14/2022] [Indexed: 11/26/2022] Open
Abstract
Calculating and predicting drug-target interactions (DTIs) is a crucial step in novel drug discovery. Many models have improved DTI prediction performance by fusing heterogeneous information, such as drug chemical structure and target protein sequence. However, allocating the weights of heterogeneous information reasonably during fusion is a major challenge. In this paper, we propose a model based on the Q-learning algorithm and Neighborhood Regularized Logistic Matrix Factorization (QLNRLMF) to predict DTIs. First, we obtain three different drug-drug similarity matrices and three different target-target similarity matrices by using different similarity calculation methods based on heterogeneous data, including drug chemical structure, target protein sequence and drug-target interactions. Then, we initialize a set of weights for the drug-drug similarity matrices and the target-target similarity matrices, respectively, and optimize them through the Q-learning algorithm. Once the optimal weights are obtained, a new drug-drug similarity matrix and a new target-target similarity matrix are obtained by linear combination. Finally, the drug-target interaction matrix, the new drug-drug similarity matrix and the new target-target similarity matrix are used as inputs to the Neighborhood Regularized Logistic Matrix Factorization (NRLMF) model for DTI prediction. Compared with six existing methods (NetLapRLS, BLM-NII, WNN-GIP, KBMF2K, CMF, and NRLMF), our proposed method achieved better results on the four benchmark datasets: enzymes (E), nuclear receptors (NR), ion channels (IC) and G protein coupled receptors (GPCR).
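The linear combination step in this abstract can be sketched as a weighted element-wise sum of similarity matrices. The weights below are fixed, hypothetical values chosen for illustration; in the paper they are optimized by Q-learning, which this sketch does not implement.

```python
def combine_similarities(matrices, weights):
    """Return the element-wise weighted sum of equally sized square matrices."""
    if len(matrices) != len(weights):
        raise ValueError("need one weight per similarity matrix")
    n = len(matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for matrix, weight in zip(matrices, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * matrix[i][j]
    return combined


# Three toy drug-drug similarity matrices for two drugs (hypothetical values).
s_structure = [[1.0, 0.2], [0.2, 1.0]]
s_sequence = [[1.0, 0.6], [0.6, 1.0]]
s_interaction = [[1.0, 0.4], [0.4, 1.0]]

# Hypothetical weights that sum to 1; the paper learns these instead.
weights = [0.5, 0.25, 0.25]
fused = combine_similarities([s_structure, s_sequence, s_interaction], weights)
```

The same function would apply unchanged to the three target-target similarity matrices with their own weight set.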
Affiliation(s)
- Jiacheng Sun
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, China
- Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, China
- You Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, China
- Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, China
- *Correspondence: You Lu; Jianping Chen
- Linqian Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, China
- Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, China
- Qiming Fu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, China
- Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, China
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jianping Chen
- Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, China
- School of Architecture and Urban Planning, Suzhou University of Science and Technology, Suzhou, China
- *Correspondence: You Lu; Jianping Chen

67
Li J, Wang J, Lv H, Zhang Z, Wang Z. IMCHGAN: Inductive Matrix Completion With Heterogeneous Graph Attention Networks for Drug-Target Interactions Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:655-665. [PMID: 34115592 DOI: 10.1109/tcbb.2021.3088614] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 06/12/2023]
Abstract
Identification of targets among known drugs plays an important role in drug repurposing and discovery. Computational approaches for prediction of drug-target interactions (DTIs) are highly desirable compared with traditional biological experiments because they are fast and inexpensive. Moreover, recent advances in systems biology approaches have generated large-scale heterogeneous biological information network data, which offer opportunities for machine learning-based identification of DTIs. We present a novel Inductive Matrix Completion with Heterogeneous Graph Attention Network approach (IMCHGAN) for predicting DTIs. IMCHGAN first adopts a two-level neural attention mechanism to learn drug and target latent feature representations from the DTI heterogeneous network, respectively. Then, the learned latent features are fed into the Inductive Matrix Completion (IMC) prediction score model, which computes the best projection from drug space onto target space and outputs a DTI score via the inner product of the projected drug and target feature representations. IMCHGAN is an end-to-end neural network learning framework in which the parameters of both the prediction score model and the feature representation learning model are simultaneously optimized via backpropagation under the supervision of the observed known drug-target interaction data. We compare IMCHGAN with other state-of-the-art baselines on two real DTI experimental datasets. The results show that our method is superior to existing methods in terms of AUC and AUPR. Moreover, IMCHGAN also shows strong predictive power for novel (unknown) DTIs. All datasets and code can be obtained from https://github.com/ljatynu/IMCHGAN/.
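The IMC scoring step described in this abstract (project the drug features into target space, then take an inner product with the target features) can be sketched as a bilinear form. This is an illustration under assumptions, not the authors' implementation: the projection matrix is fixed here rather than learned, and all names and values are hypothetical.

```python
def imc_score(drug_vec, projection, target_vec):
    """Project drug_vec into target space with `projection`, then dot with target_vec."""
    n_target_dims = len(target_vec)
    projected = [
        sum(drug_vec[i] * projection[i][j] for i in range(len(drug_vec)))
        for j in range(n_target_dims)
    ]
    return sum(p * t for p, t in zip(projected, target_vec))


# Hypothetical latent features: 2-dimensional drug, 3-dimensional target.
drug_vec = [1.0, 2.0]
target_vec = [0.5, 1.0, -1.0]
# Hypothetical 2x3 projection; in IMCHGAN this would be trained by backpropagation.
projection = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
score = imc_score(drug_vec, projection, target_vec)
```

Because the score is differentiable in both the projection and the input features, it can be trained jointly with the feature-learning layers, which matches the end-to-end setup the abstract describes.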

68
Drug-target interaction prediction via an ensemble of weighted nearest neighbors with interaction recovery. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02495-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]

69
Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022] Open
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) as well as more developing methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation - and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
- Layla Hosseini-Gerami
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
- Andreas Bender
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK

70
Mabonga L, Masamba P, Kappo AP. Inhibitory potential of a benzoxazole derivative, 4FI against SNRPG∼RING finger domain protein complex as a lead compound in the discovery of anti-cancer drugs: A molecular dynamics simulation approach. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.100993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022] Open

71
Sinha K, Ghosh J, Sil PC. Machine Learning in Drug Metabolism Study. Curr Drug Metab 2022; 23:1012-1026. [PMID: 36578255 DOI: 10.2174/1389200224666221227094144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 10/27/2022] [Accepted: 11/01/2022] [Indexed: 12/30/2022]
Abstract
Metabolic reactions in the body transform the administered drug into metabolites. These metabolites exhibit diverse biological activities. Drug metabolism is the major underlying cause of drug overdose-related toxicity, adverse drug effects and reduced drug efficacy. Though metabolic reactions deactivate a drug, drug metabolites are often considered pivotal agents for off-target effects or toxicity. On the other hand, in combination drug therapy, one drug may influence another drug's metabolism and clearance, which is considered one of the primary causes of drug-drug interactions. Today, with the advancement of machine learning, the metabolic fate of a drug candidate can be comprehensively studied throughout the drug development procedure. Naïve Bayes, Logistic Regression, k-Nearest Neighbours, Decision Trees, different Boosting and Ensemble methods, Support Vector Machines and Artificial Neural Network boosted Deep Learning are some of the machine learning algorithms that are extensively used in such studies. Such tools cover several attributes of drug metabolism, with an emphasis on the prediction of drug-drug interactions, drug-target interactions, clinical drug responses, metabolite predictions, sites of metabolism, etc. These reports are crucial for evaluating metabolic stability and predicting prospective drug-drug interactions, and can help pharmaceutical companies accelerate the drug development process in a less resource-demanding manner than in vitro studies. They could also help medical practitioners use combinatorial drug therapy more resourcefully. Also, with the enormous growth of deep learning, traditional fields of computational drug development such as molecular interaction fields, molecular docking, quantitative structure-activity relationship (QSAR) studies and quantum mechanical simulations are producing results that were unimaginable a couple of years ago.
This review provides a glimpse of a few contextually relevant machine learning algorithms and then focuses on their outcomes in different studies.
Affiliation(s)
- Krishnendu Sinha
- Department of Zoology, Jhargram Raj College, Jhargram-721507, India
- Jyotirmoy Ghosh
- Department of Chemistry, Banwarilal Bhalotia College, Asansol-713303, India
- Parames Chandra Sil
- Department of Division of Molecular Medicine, Bose Institute, Kolkata-700054, India

72
Alakus TB, Turkoglu I. A Comparative Study of Amino Acid Encoding Methods for Predicting Drug-Target Interactions in COVID-19 Disease. MODELING, CONTROL AND DRUG DEVELOPMENT FOR COVID-19 OUTBREAK PREVENTION 2022:619-643. [DOI: 10.1007/978-3-030-72834-2_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/01/2023]

73
Lin HH, Zhang QR, Kong X, Zhang L, Zhang Y, Tang Y, Xu H. Machine learning prediction of antiviral-HPV protein interactions for anti-HPV pharmacotherapy. Sci Rep 2021; 11:24367. [PMID: 34934067 PMCID: PMC8692573 DOI: 10.1038/s41598-021-03000-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/05/2021] [Accepted: 11/22/2021] [Indexed: 02/05/2023] Open
Abstract
Persistent infection with high-risk types of human papillomavirus (HPV) can cause diseases including cervical and oropharyngeal cancers. Nonetheless, there is so far no effective pharmacotherapy for treating infection with high-risk HPV types, and hence it remains a severe threat to women's health. Based on a drug repositioning strategy, we trained and benchmarked multiple machine learning models in this work to predict potentially effective antiviral drugs for HPV infection. By optimizing the models, measuring their predictive performance on a dataset of 182 pairs of antiviral-target interactions, all of which were approved by the United States Food and Drug Administration, and benchmarking the different models against each other, we identified the optimized Support Vector Machine and K-Nearest Neighbor classifiers as the two best predictors, with high precision scores (0.80 and 0.85, respectively), among Support Vector Machine, Random Forest, AdaBoost, Naïve Bayes, K-Nearest Neighbors, and Logistic Regression classifiers. We applied these two predictors together and predicted 57 pairs of antiviral-HPV protein interactions from 864 pairs of antiviral-HPV protein associations. Our work provides good drug candidates for anti-HPV drug discovery. To the best of our knowledge, this is the first HPV-oriented computational drug repositioning study.
Affiliation(s)
- Hui-Heng Lin
- Yuebei People's Hospital, Shantou University Medical College, No. 133 of Huimin South road, Wujiang District, Shaoguan City, 512025, China.
- Qian-Ru Zhang
- Key Lab of the Basic Pharmacology of the Ministry of Education, School of Pharmacy, Zunyi Medical University, Guizhou Province, 6 West Xue-Fu Road, Zunyi City, 563000, China
- Xiangjun Kong
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau Avenida de Universidade, Macau, 999078, Macau, China
- Liuping Zhang
- Department of Gynecology, Panyu Central Hospital, No. 8 of Fuyu East Road, Panyu District, Guangzhou, 511400, China
- Yong Zhang
- Interdisciplinary Research Center for Agriculture Green Development in Yangtze River Basin, Southwest University, Beibei District, No.1-2-1 Tiansheng Road, Chongqing, 400715, China
- Yanyan Tang
- Department of Neurology, The First Affiliated Hospital of Guangxi Medical University, No.6 Shuangyong Road, Nanning, 530021, Guangxi, China
- Hongyan Xu
- Yuebei People's Hospital, Shantou University Medical College, No. 133 of Huimin South road, Wujiang District, Shaoguan City, 512025, China.
- Department of Gynecology, Yuebei People's Hospital, Shantou University Medical College, No. 133 of Huimin South road, Wujiang District, Shaoguan City, 512025, China.

74
Wu X, Zeng W, Lin F, Zhou X. NeuRank: learning to rank with neural networks for drug-target interaction prediction. BMC Bioinformatics 2021; 22:567. [PMID: 34836495 PMCID: PMC8620576 DOI: 10.1186/s12859-021-04476-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/18/2021] [Accepted: 11/08/2021] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Experimental verification of a drug discovery process is expensive and time-consuming. Therefore, recently, the demand to more efficiently and effectively identify drug-target interactions (DTIs) has intensified. RESULTS We treat the prediction of DTIs as a ranking problem and propose a neural network architecture, NeuRank, to address it. Also, we assume that similar drug compounds are likely to interact with similar target proteins. Thus, in our model, we add drug and target similarities, which are very effective at improving the prediction of DTIs. Then, we develop NeuRank from a point-wise to a pair-wise, and further to list-wise model. CONCLUSION Finally, results from extensive experiments on five public data sets (DrugBank, Enzymes, Ion Channels, G-Protein-Coupled Receptors, and Nuclear Receptors) show that, in identifying DTIs, our models achieve better performance than other state-of-the-art methods.
Affiliation(s)
- Xiujin Wu
- School of Informatics, Xiamen University, Xiamen, China
- Wenhua Zeng
- School of Informatics, Xiamen University, Xiamen, China
- Fan Lin
- School of Informatics, Xiamen University, Xiamen, China
- Xiuze Zhou
- Shuye Technology Co., Ltd., Hangzhou, China

75
Jung YS, Kim Y, Cho YR. Comparative analysis of network-based approaches and machine learning algorithms for predicting drug-target interactions. Methods 2021; 198:19-31. [PMID: 34737033 DOI: 10.1016/j.ymeth.2021.10.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/31/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 01/06/2023] Open
Abstract
Computational prediction of drug-target interactions (DTIs) is of particular importance in the process of drug repositioning because of its efficiency in selecting potential candidates for DTIs. A variety of computational methods for predicting DTIs have been proposed over the past decade. Our interest is which methods or techniques are the most advantageous for increasing prediction accuracy. This article provides a comprehensive overview of network-based, machine learning, and integrated DTI prediction methods. The network-based methods handle a DTI network along with drug and target similarities in a matrix form and apply graph-theoretic algorithms to identify new DTIs. Machine learning methods use known DTIs and the features of drugs and target proteins as training data to build a predictive model. Integrated methods combine these two techniques. We assessed the prediction performance of the selected state-of-the-art methods using two different benchmark datasets. Our experimental results demonstrate that the integrated methods outperform the others in general. Some previous methods showed low accuracy on predicting interactions of unknown drugs which do not exist in the training dataset. Combining similarity matrices from multiple features by data fusion was not beneficial in increasing prediction accuracy. Finally, we analyzed future directions for further improvements in DTI predictions.
Affiliation(s)
- Yi-Sue Jung
- Division of Software, Yonsei University - Mirae Campus, Republic of Korea
- Yoonbee Kim
- Division of Software, Yonsei University - Mirae Campus, Republic of Korea
- Young-Rae Cho
- Division of Software, Yonsei University - Mirae Campus, Republic of Korea; Division of Digital Healthcare, Yonsei University - Mirae Campus, Republic of Korea.

76
Monteiro NRC, Ribeiro B, Arrais JP. Drug-Target Interaction Prediction: End-to-End Deep Learning Approach. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2364-2374. [PMID: 32142454 DOI: 10.1109/tcbb.2020.2977335] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 06/10/2023]
Abstract
The discovery of potential Drug-Target Interactions (DTIs) is a determining step in the drug discovery and repositioning process, as the effectiveness of the currently available antibiotic treatment is declining. Despite the effort put into traditional in vivo and in vitro methods, pharmaceutical financial investment has been reduced over the years. Therefore, establishing effective computational methods is decisive for finding new leads in a reasonable amount of time. Successful approaches have been presented to solve this problem, but protein sequences and structured data are seldom used together. In this paper, we present a deep learning architecture that exploits the particular ability of Convolutional Neural Networks (CNNs) to obtain 1D representations from protein sequences (amino acid sequences) and compound SMILES (Simplified Molecular Input Line Entry System) strings. These representations can be interpreted as features that express local dependencies or patterns, which can then be used in a Fully Connected Neural Network (FCNN) acting as a binary classifier. The results demonstrate that using CNNs to obtain representations of the data, instead of the traditional descriptors, leads to improved performance. The proposed end-to-end deep learning method outperformed traditional machine learning approaches in the correct classification of both positive and negative interactions.

77
Zheng J, Xiao X, Qiu WR. iCDI-W2vCom: Identifying the Ion Channel-Drug Interaction in Cellular Networking Based on word2vec and node2vec. Front Genet 2021; 12:738274. [PMID: 34567088 PMCID: PMC8458815 DOI: 10.3389/fgene.2021.738274] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/08/2021] [Accepted: 08/02/2021] [Indexed: 12/04/2022] Open
Abstract
Ion channels are the second largest drug target family. Ion channel dysfunction may lead to a number of diseases such as Alzheimer’s disease, epilepsy, cephalagra, and type II diabetes. For predicting ion channel–drug interactions, computational approaches are effective and efficient compared with costly, labor-intensive, and time-consuming experimental methods. Most existing methods can only deal with ion channels whose 3D structures are known; however, the 3D structures of most ion channels are still unknown. Many predictors based on protein sequence have been developed to address this challenge, but most of their results need to be improved, or their prediction web servers are missing. In this paper, a sequence-based classifier, called “iCDI-W2vCom,” was developed to identify the interactions between ion channels and drugs. In the predictor, the drug compound was formulated by SMILES-word2vec, FP2-word2vec, SMILES-node2vec, and ECFPs as a 1184D vector, the ion channel was represented by word2vec as a 64D vector, and the prediction engine was operated by the LightGBM classifier. The accuracy and AUC achieved by iCDI-W2vCom in fivefold cross-validation were 91.95% and 0.9703, respectively, which outperformed other existing predictors in this area. A user-friendly web server for iCDI-W2vCom was established at http://www.jci-bioinfo.cn/icdiw2v. The proposed method may also be a potential method for predicting target–drug interactions.
Affiliation(s)
- Jie Zheng
- Department of Computer Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- Department of Computer Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Wang-Ren Qiu
- Department of Computer Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China

78
Hoxha M, Kamberaj H. Automation of some macromolecular properties using a machine learning approach. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abe7b6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
Abstract
In this study, we employed a newly developed method to predict macromolecular properties using a swarm artificial neural network (ANN) method as a machine learning approach. In this method, the molecular structures are represented by feature description vectors used as training input data for a neural network. This study aims to develop an efficient approach for training an ANN using either experimental or quantum mechanics data. We aim to introduce an error model controlling the reliability of the prediction confidence interval using a bootstrapping swarm approach. We created different datasets of selected experimental or quantum mechanics results. Using this optimized ANN, we hope to predict properties and their statistical errors for new molecules. Four datasets are used in this study: a dataset of 642 small organic molecules with known experimental hydration free energies, a dataset of 1475 experimental pKa values of ionizable groups in 192 proteins, a dataset of 2693 mutants in 14 proteins with given experimental values of changes in the Gibbs free energy, and a dataset of 7101 quantum mechanics heat of formation calculations. All the data are prepared and optimized using the AMBER force field in the CHARMM macromolecular computer simulation program. The bootstrapping swarm ANN code for performing the optimization and prediction is written in the Python programming language. The descriptor vectors of the small molecules are based on the Coulomb matrix and sum over bond properties. For the macromolecular systems, they consider the chemical-physical fingerprints of the region in the vicinity of each amino acid.

79
Carracedo-Reboredo P, Liñares-Blanco J, Rodríguez-Fernández N, Cedrón F, Novoa FJ, Carballal A, Maojo V, Pazos A, Fernandez-Lozano C. A review on machine learning approaches and trends in drug discovery. Comput Struct Biotechnol J 2021; 19:4538-4558. [PMID: 34471498 PMCID: PMC8387781 DOI: 10.1016/j.csbj.2021.08.011] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 03/01/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022] Open
Abstract
Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has taken on a strong computer science component, driven by the rapid growth and democratization of machine learning techniques. With the objectives set by the Precision Medicine initiative and the new challenges they generate, it is necessary to establish robust, standard and reproducible computational methodologies to achieve them. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies. This stage drastically reduces costs and research times in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field will give us an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. This review will focus mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.
Key Words
- ADMET, Absorption, distribution, metabolism, elimination and toxicity
- ADR, Adverse Drug Reaction
- AI, Artificial Intelligence
- ANN, Artificial Neural Networks
- APFP, Atom Pairs 2d FingerPrint
- AUC, Area under the Curve
- BBB, Blood–Brain barrier
- CDK, Chemistry Development Kit
- CNN, Convolutional Neural Networks
- CNS, Central Nervous System
- CPI, Compound-protein interaction
- CV, Cross Validation
- Cheminformatics
- DL, Deep Learning
- DNA, Deoxyribonucleic acid
- Deep Learning
- Drug Discovery
- ECFP, Extended Connectivity Fingerprints
- FDA, Food and Drug Administration
- FNN, Fully Connected Neural Networks
- FP, Fingerprints
- FS, Feature Selection
- GCN, Graph Convolutional Networks
- GEO, Gene Expression Omnibus
- GNN, Graph Neural Networks
- GO, Gene Ontology
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- MACCS, Molecular ACCess System
- MCC, Matthews correlation coefficient
- MD, Molecular Descriptors
- MKL, Multiple Kernel Learning
- ML, Machine Learning
- Machine Learning
- Molecular Descriptors
- NB, Naive Bayes
- OOB, Out of Bag
- PCA, Principal Component Analysis
- QSAR
- QSAR, Quantitative structure–activity relationship
- RF, Random Forest
- RNA, Ribonucleic Acid
- SMILES, simplified molecular-input line-entry system
- SVM, Support Vector Machines
- TCGA, The Cancer Genome Atlas
- WHO, World Health Organization
- t-SNE, t-Distributed Stochastic Neighbor Embedding
Affiliation(s)
- Paula Carracedo-Reboredo
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Jose Liñares-Blanco
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
- Nereida Rodríguez-Fernández
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco Cedrón
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco J. Novoa
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Adrian Carballal
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Victor Maojo
  - Biomedical Informatics Group, Artificial Intelligence Department, Polytechnic University of Madrid, Calle de los Ciruelos, Boadilla del Monte, Madrid 28660, Spain
- Alejandro Pazos
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
|
80
|
Liu G, Singha M, Pu L, Neupane P, Feinstein J, Wu HC, Ramanujam J, Brylinski M. GraphDTI: A robust deep learning predictor of drug-target interactions from multiple heterogeneous data. J Cheminform 2021; 13:58. [PMID: 34380569 PMCID: PMC8356453 DOI: 10.1186/s13321-021-00540-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/08/2021] [Accepted: 07/31/2021] [Indexed: 12/22/2022] Open
Abstract
Traditional techniques to identify macromolecular targets for drugs utilize solely the information on a query drug and a putative target. Nonetheless, the mechanisms of action of many drugs depend not only on their binding affinity toward a single protein, but also on the signal transduction through cascades of molecular interactions leading to certain phenotypes. Although using protein-protein interaction networks and drug-perturbed gene expression profiles can facilitate system-level investigations of drug-target interactions, utilizing such large and heterogeneous data poses notable challenges. To improve the state-of-the-art in drug target identification, we developed GraphDTI, a robust machine learning framework integrating the molecular-level information on drugs, proteins, and binding sites with the system-level information on gene expression and protein-protein interactions. In order to properly evaluate the performance of GraphDTI, we compiled a high-quality benchmarking dataset and devised a new cluster-based cross-validation protocol. Encouragingly, GraphDTI not only yields an AUC of 0.996 against the validation dataset, but it also generalizes well to unseen data with an AUC of 0.939, significantly outperforming other predictors. Finally, selected examples of identified drug-target interactions are validated against the biomedical literature. Numerous applications of GraphDTI include the investigation of drug polypharmacological effects, side effects through off-target binding, and repositioning opportunities.
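The idea behind a cluster-based cross-validation protocol can be sketched as follows: whole clusters of similar samples are assigned to the same fold, so that no cluster is split between training and validation data. This is a generic illustration only. The abstract does not describe how GraphDTI forms its clusters or folds, and the function name, the greedy balancing rule, and the toy cluster labels are all assumptions.

```python
from collections import defaultdict


def cluster_kfold(cluster_ids, n_folds):
    """Assign sample indices to folds without splitting any cluster.

    Greedy balancing (an assumption, not the published protocol):
    largest clusters first, each placed into the currently smallest fold.
    """
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    folds = [[] for _ in range(n_folds)]
    # Sort by descending cluster size; break ties by cluster name.
    for cluster in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest_fold = min(folds, key=len)
        smallest_fold.extend(members[cluster])
    return folds


# Hypothetical cluster label for each of eight samples.
cluster_ids = ["a", "a", "b", "c", "c", "c", "d", "b"]
folds = cluster_kfold(cluster_ids, 3)
```

Each fold then serves once as the held-out set. Because similar samples stay together, the held-out score is a stricter estimate of generalization than one from a random split.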
Affiliation(s)
- Guannan Liu
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- Manali Singha
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA
- Limeng Pu
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
- Prasanga Neupane
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- Joseph Feinstein
  - Department of Computer Science, Brown University, Providence, RI, 02902, USA
- Hsiao-Chun Wu
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- J Ramanujam
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
- Michal Brylinski
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
|
81
|
Visani GM, Hughes MC, Hassoun S. Enzyme Promiscuity Prediction Using Hierarchy-Informed Multi-Label Classification. Bioinformatics 2021; 37:2017–2024. [PMID: 33515234 PMCID: PMC8337005 DOI: 10.1093/bioinformatics/btab054] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/31/2020] [Revised: 12/30/2020] [Accepted: 01/22/2021] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consists of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme's natural substrates. The majority of the interactions however involve non-natural substrates, thus reflecting promiscuous enzymatic activities. RESULTS We frame this "enzyme promiscuity prediction" problem as a multi-label classification task. We maximally utilize inhibitor and unlabelled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbours similarity-based and other machine learning models. We show that inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split when compared to a random data split, and when evaluating performance on non-natural substrates compared to natural substrates. AVAILABILITY AND IMPLEMENTATION We provide Python code for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
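One way to expose the hierarchical relationships between enzyme classes to a multi-label classifier is to expand each EC number into all of its ancestor classes, so that a molecule labelled with a leaf class is also labelled with every parent level. The sketch below shows only this label expansion. How EPP-HMCNF actually encodes the hierarchy is not given in the abstract, and the function names are hypothetical.

```python
def ec_ancestors(ec_number):
    """Expand an EC number into itself plus all ancestor classes.

    Example: '1.2.3.4' -> ['1', '1.2', '1.2.3', '1.2.3.4']
    """
    parts = ec_number.split(".")
    return [".".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


def hierarchical_labels(ec_numbers):
    """Return the sorted set of all hierarchy nodes implied by the given EC numbers."""
    labels = set()
    for ec_number in ec_numbers:
        labels.update(ec_ancestors(ec_number))
    return sorted(labels)


# Hypothetical query molecule annotated with two enzymes.
labels = hierarchical_labels(["1.1.1.1", "1.1.3.4"])
```

With labels expanded this way, a prediction for a leaf class that contradicts its parent class can be detected or penalized, which is the kind of constraint a hierarchy-informed model can exploit.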
Affiliation(s)
- Gian Marco Visani
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Michael C Hughes
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Soha Hassoun
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA
  - Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA
|
82
|
Targeting RNA structures in diseases with small molecules. Essays Biochem 2021; 64:955-966. [PMID: 33078198 PMCID: PMC7724634 DOI: 10.1042/ebc20200011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 08/14/2020] [Revised: 09/16/2020] [Accepted: 09/30/2020] [Indexed: 01/08/2023]
Abstract
RNA is crucial for gene expression and regulation. Recent advances in understanding of RNA biochemistry, structure and molecular biology have revealed the importance of RNA structure in cellular processes and diseases. Various approaches to discovering drug-like small molecules that target RNA structure have been developed. This review provides a brief introduction to RNA structural biology and how RNA structures function as disease regulators. We summarize approaches to targeting RNA with small molecules and highlight their advantages, shortcomings and therapeutic potential.
|
83
|
Kashyap K, Siddiqi MI. Recent trends in artificial intelligence-driven identification and development of anti-neurodegenerative therapeutic agents. Mol Divers 2021; 25:1517-1539. [PMID: 34282519 DOI: 10.1007/s11030-021-10274-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/28/2021] [Accepted: 07/05/2021] [Indexed: 12/12/2022]
Abstract
Neurological disorders affect various aspects of life. Finding drugs for the central nervous system is a very challenging and complex task due to the involvement of the blood-brain barrier, P-glycoprotein, and the drug's high attrition rates. The availability of big data present in online databases and resources has enabled the emergence of artificial intelligence techniques including machine learning to analyze, process the data, and predict the unknown data with high efficiency. The use of these modern techniques has revolutionized the whole drug development paradigm, with an unprecedented acceleration in the central nervous system drug discovery programs. Also, the new deep learning architectures proposed in many recent works have given a better understanding of how artificial intelligence can tackle big complex problems that arose due to central nervous system disorders. Therefore, the present review provides comprehensive and up-to-date information on machine learning/artificial intelligence-triggered effort in the brain care domain. In addition, a brief overview is presented on machine learning algorithms and their uses in structure-based drug design, ligand-based drug design, ADMET prediction, de novo drug design, and drug repurposing. Lastly, we conclude by discussing the major challenges and limitations posed and how they can be tackled in the future by using these modern machine learning/artificial intelligence approaches.
Affiliation(s)
- Kushagra Kashyap
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India
  - Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Mohammad Imran Siddiqi
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India
  - Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
|
84
|
El-Behery H, Attia AF, El-Feshawy N, Torkey H. Efficient machine learning model for predicting drug-target interactions with case study for Covid-19. Comput Biol Chem 2021; 93:107536. [PMID: 34271420 PMCID: PMC8256690 DOI: 10.1016/j.compbiolchem.2021.107536] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/13/2020] [Revised: 06/23/2021] [Accepted: 06/24/2021] [Indexed: 11/30/2022]
Abstract
Background: Discovering possible Drug Target Interactions (DTIs) is a decisive step in the detection of the effects of drugs as well as drug repositioning. There is a strong incentive to develop computational methods that can effectively predict potential DTIs, as traditional DTI laboratory experiments are expensive, time-consuming, and labor-intensive. Some technologies have been developed for this purpose; however, large numbers of interactions have not yet been detected, the accuracy of their prediction is still low, and protein sequences and structured data are rarely used together in the prediction process. Methods: This paper presents a DTI prediction model that takes advantage of the special capacity of the structured form of proteins and drugs. Our model obtains features from protein amino-acid sequences using physical and chemical properties, and from drug SMILES (Simplified Molecular Input Line Entry System) strings using encoding techniques. Comparing the proposed model with different existing methods under K-fold cross validation, empirical results show that our model based on ensemble learning algorithms for DTI prediction provides more accurate results from both structure and feature data. Results: The proposed model is applied to two datasets: Benchmark (feature only) datasets and DrugBank (structure data) datasets. Experimental results obtained by Light-Boost and ExtraTree using structure and feature data give 98 % accuracy and a 0.97 f-score, compared to 94 % and 0.92 achieved by the existing methods. Moreover, our model can successfully predict more yet undiscovered interactions, and hence can be used as a practical tool for drug repositioning. A case study is performed in which our prediction model is applied to the proteins known to be affected by coronaviruses, in order to predict the possible interactions between these proteins and existing drugs. Our model is also applied to Covid-19 related drugs announced on DrugBank. The results show that some drugs like DB00691 and DB05203 are predicted with 100 % accuracy to interact with ACE2 protein. This protein is a self-membrane protein that enables Covid-19 infection. Hence, our model can be used as an effective tool in drug repositioning to predict possible drug treatments for Covid-19.
Affiliation(s)
- Heba El-Behery
  - Department of Computer Science and Engineering, Faculty of Engineering, Kafrelsheikh University, Kafr_El_Sheikh, Egypt
- Abdel-Fattah Attia
  - Department of Computer Science and Engineering, Faculty of Engineering, Kafrelsheikh University, Kafr_El_Sheikh, Egypt
- Nawal El-Feshawy
  - Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
- Hanaa Torkey
  - Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
|
85
|
Binding affinity prediction for binary drug-target interactions using semi-supervised transfer learning. J Comput Aided Mol Des 2021; 35:883-900. [PMID: 34189637 DOI: 10.1007/s10822-021-00404-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/22/2021] [Accepted: 06/18/2021] [Indexed: 10/21/2022]
Abstract
In the field of drug-target interaction prediction, the majority of approaches have formulated the problem as a simple binary classification task. These methods used binary drug-target interaction datasets to train their models. The prediction of drug-target interactions is inherently a regression problem and these interactions would be identified according to the binding affinity between drugs and targets. This paper deals with binary drug-target interactions and tries to identify them based on the binding strength of a drug and its target. To this end, we propose a semi-supervised transfer learning approach to predict the binding affinity in a continuous spectrum for binary interactions. Due to the lack of training data with continuous binding affinity in the target domain, the proposed method makes use of the information available in other domains (i.e. source domain), via the transfer learning approach. The general framework of our algorithm is based on an objective function, which considers the performance in both source and target domains as well as the unlabeled data in the target domain via a regularization term. To optimize this objective function, we make use of a gradient boosting machine which constructs the final model. To assess the performance of the proposed method, we have used some benchmark datasets with binary interactions for four classes of human proteins. Our algorithm identifies interactions in a more realistic situation. According to the experimental results, our regression model performs better than the state-of-the-art methods in some procedures.
|
86
|
Applications of artificial intelligence to drug design and discovery in the big data era: a comprehensive review. Mol Divers 2021; 25:1643-1664. [PMID: 34110579 DOI: 10.1007/s11030-021-10237-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/18/2021] [Accepted: 05/26/2021] [Indexed: 10/21/2022]
Abstract
Artificial intelligence (AI) renders cutting-edge applications in diverse sectors of society. Due to substantial progress in high-performance computing, the development of superior algorithms, and the accumulation of huge biological and chemical data, computer-assisted drug design technology is playing a key role in drug discovery with its advantages of high efficiency, fast speed, and low cost. Over recent years, due to continuous progress in machine learning (ML) algorithms, AI has been extensively employed in various drug discovery stages. Very recently, drug design and discovery have entered the big data era. ML algorithms have progressively developed into a deep learning technique with potent generalization capability and more effectual big data handling, which further promotes the integration of AI technology and computer-assisted drug discovery technology, hence accelerating the design and discovery of the newest drugs. This review mainly summarizes the application progression of AI technology in the drug discovery process, and explores and compares its advantages over conventional methods. The challenges and limitations of AI in drug design and discovery have also been discussed.
|
87
|
Multiple-Molecule Drug Design Based on Systems Biology Approaches and Deep Neural Network to Mitigate Human Skin Aging. Molecules 2021; 26:3178. [PMID: 34073305 PMCID: PMC8197996 DOI: 10.3390/molecules26113178] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/30/2021] [Revised: 05/20/2021] [Accepted: 05/24/2021] [Indexed: 01/23/2023] Open
Abstract
Human skin aging is affected by various biological signaling pathways, microenvironment factors and epigenetic regulations. With the demand for cosmetics and pharmaceuticals to prevent or reverse skin aging increasing year by year, designing multiple-molecule drugs for mitigating skin aging is indispensable. In this study, we developed strategies for systems medicine design based on systems biology methods and deep neural networks. We constructed the candidate genome-wide genetic and epigenetic network (GWGEN) via big database mining. After doing systems modeling and applying system identification, system order detection and principal network projection methods with real time-profile microarray data, we could obtain core signaling pathways and identify essential biomarkers based on the skin aging molecular progression mechanisms. Afterwards, we trained a deep neural network of drug–target interaction in advance and applied it to predict the potential candidate drugs based on our identified biomarkers. To narrow down the candidate drugs, we designed two filters considering drug regulation ability and drug sensitivity. With the proposed systems medicine design procedure, we not only shed light on the skin aging molecular progression mechanisms but also suggested two multiple-molecule drugs for mitigating human skin aging from young adulthood to middle age and middle age to old age, respectively.
|
88
|
Li P, Wang J, Qiao Y, Chen H, Yu Y, Yao X, Gao P, Xie G, Song S. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief Bioinform 2021; 22:6262238. [PMID: 33940598 DOI: 10.1093/bib/bbab109] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 02/10/2021] [Revised: 03/06/2021] [Accepted: 03/12/2021] [Indexed: 11/13/2022] Open
Abstract
How to produce expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural network (GNN) has emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel molecular pre-training graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we proposed a powerful GNN for modelling molecular graph named MolGNet, and designed an effective self-supervised strategy for pre-training the model at both the node and graph-level. After pre-training on 11 million unlabeled molecules, we revealed that MolGNet can capture valuable chemical insights to produce interpretable representation. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular properties prediction, drug-drug interaction and drug-target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
Affiliation(s)
- Pengyong Li
  - Department of Biomedical Engineering at Tsinghua University, China
- Jun Wang
  - Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Yixuan Qiao
  - Operations Research and Cybernetics at Beijing University of Technology, China
- Hao Chen
  - Cybernetics at Beijing University of Technology, China
- Yihuan Yu
  - Beijing University of Biomedical Engineering, China
- Xiaojun Yao
  - Analytical Chemistry and Chemoinformatics at Lanzhou University, China
- Peng Gao
  - Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Guotong Xie
  - Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Sen Song
  - Tsinghua Laboratory of Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Haidian, 100084 Beijing, China
|
89
|
Yuan T, Werman JM, Sampson NS. The pursuit of mechanism of action: uncovering drug complexity in TB drug discovery. RSC Chem Biol 2021; 2:423-440. [PMID: 33928253 PMCID: PMC8081351 DOI: 10.1039/d0cb00226g] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/05/2020] [Accepted: 12/23/2020] [Indexed: 12/21/2022] Open
Abstract
Whole cell-based phenotypic screens have become the primary mode of hit generation in tuberculosis (TB) drug discovery during the last two decades. Different drug screening models have been developed to mirror the complexity of TB disease in the laboratory. As these culture conditions are becoming more and more sophisticated, unraveling the drug target and the identification of the mechanism of action (MOA) of compounds of interest have additionally become more challenging. A good understanding of MOA is essential for the successful delivery of drug candidates for TB treatment due to the high level of complexity in the interactions between Mycobacterium tuberculosis (Mtb) and the TB drug used to treat the disease. There is no single "standard" protocol to follow and no single approach that is sufficient to fully investigate how a drug restrains Mtb. However, with the recent advancements in -omics technologies, there are multiple strategies that have been developed generally in the field of drug discovery that have been adapted to comprehensively characterize the MOAs of TB drugs in the laboratory. These approaches have led to the successful development of preclinical TB drug candidates, and to a better understanding of the pathogenesis of Mtb infection. In this review, we describe a plethora of efforts based upon genetic, metabolomic, biochemical, and computational approaches to investigate TB drug MOAs. We assess these different platforms for their strengths and limitations in TB drug MOA elucidation in the context of Mtb pathogenesis. With an emphasis on the essentiality of MOA identification, we outline the unmet needs in delivering TB drug candidates and provide direction for further TB drug discovery.
Affiliation(s)
- Tianao Yuan
  - Department of Chemistry, Stony Brook University, Stony Brook, NY 11794-3400, USA; +1-631-632-5738, +1-631-632-7952
- Joshua M. Werman
  - Department of Chemistry, Stony Brook University, Stony Brook, NY 11794-3400, USA; +1-631-632-5738, +1-631-632-7952
- Nicole S. Sampson
  - Department of Chemistry, Stony Brook University, Stony Brook, NY 11794-3400, USA; +1-631-632-5738, +1-631-632-7952
|
90
|
Yang Y, Xu X. Identification of key genes in coronary artery disease: an integrative approach based on weighted gene co-expression network analysis and their correlation with immune infiltration. Aging (Albany NY) 2021; 13:8306-8319. [PMID: 33686958 PMCID: PMC8034924 DOI: 10.18632/aging.202638] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/21/2020] [Accepted: 01/29/2021] [Indexed: 12/04/2022]
Abstract
This study aimed to identify key genes related to coronary artery disease (CAD) and their association with immune cell infiltration. GSE20680 and GSE20681 were downloaded from GEO. We identified red and pink modules in WGCNA analysis and found 104 genes in these two modules. Next, least absolute shrinkage and selection operator (LASSO) logistic regression was used to screen and verify the diagnostic markers of CAD. We identified ASCC2, LRRC18, and SLC25A37 as the key genes in CAD diagnosis. We further studied immune cell infiltration in CAD patients with CIBERSORT, and the correlation between key genes and infiltrating immune cells was analyzed. We also found that immune cells, including macrophages M0, mast cells resting and T cells CD8, were associated with ASCC2, LRRC18 and SLC25A37. Gene enrichment analysis indicated that these genes were mainly enriched in the apoptotic signaling pathway in the biological pathway analysis and in riboflavin metabolism in the KEGG analysis. The diagnostic efficiency of these key genes measured by AUC in the training set, testing set and validation cohort was 0.92, 0.96 and 0.83, respectively. In conclusion, ASCC2, LRRC18 and SLC25A37 can be used as diagnostic markers of CAD, and immune cell infiltration plays an important role in the onset and development of CAD.
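The AUC values reported above measure how well a score separates cases from controls. A minimal sketch of the computation uses the standard rank interpretation of ROC AUC: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive, e.g. CAD)
    scores: iterable of predicted scores, higher meaning "more positive"
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: two positives and two negatives.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Here three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of this size.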
Affiliation(s)
- Yang Yang
  - Fourth Affiliated Hospital of China Medical University, Huanggu, Shenyang 110032, Liaoning, China
- Xiangshan Xu
  - Fourth Affiliated Hospital of China Medical University, Huanggu, Shenyang 110032, Liaoning, China
|
91
|
Tanoli Z, Vähä-Koskela M, Aittokallio T. Artificial intelligence, machine learning, and drug repurposing in cancer. Expert Opin Drug Discov 2021; 16:977-989. [PMID: 33543671 DOI: 10.1080/17460441.2021.1883585] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Indexed: 02/07/2023]
Abstract
Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, hence further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are widely applicable also to other indications including COVID-19 treatment. A particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches.
Affiliation(s)
- Ziaurrehman Tanoli
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
- Markus Vähä-Koskela
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
- Tero Aittokallio
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
  - Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
  - Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
|
92
|
Drug-Target Interaction Prediction Based on Adversarial Bayesian Personalized Ranking. BioMed Research International 2021; 2021:6690154. [PMID: 33628808 PMCID: PMC7889346 DOI: 10.1155/2021/6690154] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/23/2020] [Revised: 01/17/2021] [Accepted: 01/23/2021] [Indexed: 12/13/2022]
Abstract
The prediction of drug-target interaction (DTI) is a key step in drug repositioning. In recent years, many studies have tried to use matrix factorization to predict DTI, but they only use known DTIs and ignore the features of drug and target expression profiles, resulting in limited prediction performance. In this study, we propose a new DTI prediction model named AdvB-DTI. Within this model, the features of drug and target expression profiles are associated with Adversarial Bayesian Personalized Ranking through matrix factorization. First, according to the known drug-target relationships, a set of ternary partial order relationships is generated. Next, these partial order relationships are used to train the latent factor matrices of drugs and targets using the Adversarial Bayesian Personalized Ranking method, and the matrix factorization is improved by the features of drug and target expression profiles. Finally, the scores of drug-target pairs are obtained as the inner product of latent factors, and the DTI prediction is performed based on the score ranking. The proposed model takes advantage of the idea of learning to rank to overcome the problem of data sparsity, and perturbation factors are introduced to make the model more robust. Experimental results show that our model could achieve a better DTI prediction performance.
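The pipeline this abstract outlines (triples from known interactions, latent factors trained on those triples, scoring by inner product) can be sketched roughly as below. This is a minimal illustration of plain Bayesian Personalized Ranking only: it omits the adversarial perturbation and the expression-profile features that AdvB-DTI adds, and the toy interaction matrix, the function names (`bpr_triples`, `train_bpr`, `rank_targets`) and all hyperparameters are assumptions made for this example, not taken from the paper.

```python
import numpy as np


def bpr_triples(interactions):
    """Build (drug, known target, unknown target) triples from a 0/1 matrix."""
    triples = []
    for d, row in enumerate(interactions):
        pos = np.flatnonzero(row == 1)
        neg = np.flatnonzero(row == 0)
        triples.extend((d, int(i), int(j)) for i in pos for j in neg)
    return triples


def train_bpr(interactions, n_factors=4, lr=0.05, reg=0.01, epochs=300, seed=0):
    """Fit drug factors P and target factors Q by SGD on the BPR objective."""
    rng = np.random.default_rng(seed)
    n_drugs, n_targets = interactions.shape
    P = 0.1 * rng.standard_normal((n_drugs, n_factors))
    Q = 0.1 * rng.standard_normal((n_targets, n_factors))
    triples = bpr_triples(interactions)
    for _ in range(epochs):
        for idx in rng.permutation(len(triples)):
            d, i, j = triples[idx]
            p, qi, qj = P[d].copy(), Q[i].copy(), Q[j].copy()
            margin = p @ (qi - qj)           # score(d, i) - score(d, j)
            g = 1.0 / (1.0 + np.exp(margin)) # gradient of log-sigmoid(margin)
            P[d] += lr * (g * (qi - qj) - reg * p)
            Q[i] += lr * (g * p - reg * qi)
            Q[j] += lr * (-g * p - reg * qj)
    return P, Q


def rank_targets(P, Q, drug):
    """Score every target for one drug by inner product; return best-first order."""
    scores = Q @ P[drug]
    return np.argsort(-scores), scores


# Hypothetical toy data: rows are drugs, columns are targets, 1 = known interaction.
Y = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
])
P, Q = train_bpr(Y)
order, scores = rank_targets(P, Q, drug=0)
print("targets for drug 0, best first:", order.tolist())
```

Training on pairwise orderings rather than on the 0/1 entries themselves is what lets this family of methods treat unknown pairs as "less preferred" rather than as confirmed negatives, which is the usual motivation for ranking-based approaches on sparse interaction data.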
|
93
|
Peng J, Wang Y, Guan J, Li J, Han R, Hao J, Wei Z, Shang X. An end-to-end heterogeneous graph representation learning-based framework for drug-target interaction prediction. Brief Bioinform 2021; 22:6124914. [PMID: 33517357 DOI: 10.1093/bib/bbaa430] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 10/21/2020] [Revised: 12/01/2020] [Accepted: 12/23/2020] [Indexed: 12/28/2022] Open
Abstract
Accurately identifying potential drug-target interactions (DTIs) is a key step in drug discovery. Although many related experimental studies have been carried out for identifying DTIs in the past few decades, biological experiment-based DTI identification is still time-consuming and expensive. Therefore, it is of great significance to develop effective computational methods for identifying DTIs. In this paper, we develop a novel end-to-end learning-based framework based on heterogeneous graph convolutional networks for DTI prediction called end-to-end graph (EEG)-DTI. Given a heterogeneous network containing multiple types of biological entities (i.e. drug, protein, disease, side-effect), EEG-DTI learns the low-dimensional feature representation of drugs and targets using a graph convolutional network-based model and predicts DTIs based on the learned features. During the training process, EEG-DTI learns the feature representation of nodes in an end-to-end mode. The evaluation test shows that EEG-DTI performs better than existing state-of-the-art methods. The data and source code are available at: https://github.com/MedicineBiology-AI/EEG-DTI.
Affiliation(s)
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
- Yuxian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710072, China
- Jiaojiao Guan
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710072, China
- Jingyi Li
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710072, China
- Ruijiang Han
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
- Jianye Hao
  - College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Zhongyu Wei
  - School of Data Science, Fudan University, Shanghai 200433, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an 710072, China
|
94
|
Shu Z, Pang P, Wu X, Cui S, Xu Y, Zhang M. An Integrative Nomogram for Identifying Early-Stage Parkinson's Disease Using Non-motor Symptoms and White Matter-Based Radiomics Biomarkers From Whole-Brain MRI. Front Aging Neurosci 2021; 12:548616. [PMID: 33390927 PMCID: PMC7773758 DOI: 10.3389/fnagi.2020.548616] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/03/2020] [Accepted: 11/23/2020] [Indexed: 12/12/2022] Open
Abstract
Purpose: To develop and validate an integrative nomogram based on white matter (WM) radiomics biomarkers and nonmotor symptoms for the identification of early-stage Parkinson's disease (PD). Methods: The brain magnetic resonance imaging (MRI) and clinical characteristics of 336 subjects, including 168 patients with PD, were collected from the Parkinson's Progression Markers Initiative (PPMI) database. All subjects were randomly divided into training and test sets. According to the baseline MRI scans of patients in the training set, the WM was segmented to extract the radiomic features of each patient and develop radiomics biomarkers, which were then combined with nonmotor symptoms to build an integrative nomogram using machine learning. Finally, the diagnostic accuracy and reliability of the nomogram were evaluated using a receiver operating characteristic curve and test data, respectively. In addition, we investigated 58 patients with atypical PD who had imaging scans without evidence of dopaminergic deficit (SWEDD) to verify whether the nomogram was able to distinguish patients with typical PD from patients with SWEDD. A decision curve analysis was also performed to validate the clinical practicality of the nomogram. Results: The area under the curve values of the integrative nomogram for the training, testing and verification sets were 0.937, 0.922, and 0.836, respectively; the specificity values were 83.8, 88.2, and 91.38%, respectively; and the sensitivity values were 84.6, 82.4, and 70.69%, respectively. A significant difference in the number of patients with PD was observed between the high-risk group and the low-risk group based on the nomogram (P < 0.05). Conclusion: This integrative nomogram is a new potential method to identify patients with early-stage PD.
Affiliation(s)
- Zhenyu Shu
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou, China
- Xiao Wu
  - Department of Radiology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Sijia Cui
  - Second Clinical College, Zhejiang Chinese Medical University, Hangzhou, China
- Yuyun Xu
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou, China
- Minming Zhang
  - Department of Radiology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
|
95
|
Lu J, Hou Y, Ge S, Wang X, Wang J, Hu T, Lv Y, He H, Wang C. Screened antipsychotic drugs inhibit SARS-CoV-2 binding with ACE2 in vitro. Life Sci 2020; 266:118889. [PMID: 33310043 PMCID: PMC7834886 DOI: 10.1016/j.lfs.2020.118889] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/11/2020] [Revised: 11/25/2020] [Accepted: 12/05/2020] [Indexed: 01/06/2023]
Abstract
Aim: The coronavirus disease 2019 (COVID-19) pandemic has swept the globe and no specific effective drug has been identified. Drug repurposing is a well-known method to address the crisis in a time-critical fashion. Antipsychotic drugs (APDs) have been reported to inhibit DNA replication of hepatitis B virus, measles virus germination, and HIV infection, along with replication of SARS-CoV and MERS-CoV, both of which interact with host cells as SARS-CoV-2 does. Methods: Nineteen APDs were screened using ACE2-HEK293T cell membrane chromatography (ACE2-HEK293T/CMC). Cytotoxicity assay, coronavirus spike pseudotype virus entry assay, surface plasmon resonance, and virtual molecular docking were applied to detect the affinity between ACE2 protein and the drugs and any potential antiviral property of the screened compounds. Key findings: After the CMC screening, 8 of the 19 APDs were well retained on the ACE2-HEK293T/CMC column and showed significant antiviral activities in vitro. Three quarters of them are phenothiazines and could significantly inhibit the entry of coronavirus into ACE2-HEK293T cells. Another two drugs, aripiprazole and tiapride, exhibited weaker inhibition. We selected five of the drugs for subsequent evaluation. All five showed similar affinity to ACE2, and virtual molecular docking demonstrated that they bound to different amino acids in the region of ACE2 that SARS-CoV-2 binds to. Significance: Eight APDs were screened for binding with ACE2, five of which demonstrated potential protective effects against SARS-CoV-2 through acting on ACE2. Although the five drugs have a weak ability to block SARS-CoV-2 with a single binding site, they may provide a synergistic effect in adjuvant therapy of COVID-19 infection.
Affiliation(s)
- Jiayu Lu
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
- Yajing Hou
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
- Shuai Ge
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
- Xiangjun Wang
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
- Jue Wang
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
- Tian Hu
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
- Yuexin Lv
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
- Huaizhen He
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
- Cheng Wang
  - School of Pharmacy, Xi'an Jiaotong University, Xi'an 710061, China
|
96
|
Patel L, Shukla T, Huang X, Ussery DW, Wang S. Machine Learning Methods in Drug Discovery. Molecules 2020; 25:E5277. [PMID: 33198233 PMCID: PMC7696134 DOI: 10.3390/molecules25225277] [Citation(s) in RCA: 118] [Impact Index Per Article: 29.5] [Received: 10/15/2020] [Revised: 11/04/2020] [Accepted: 11/09/2020] [Indexed: 12/30/2022] Open
Abstract
The advancements of information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In the fields of drug discovery and development, machine learning techniques have been used for the development of novel drug candidates. The methods for designing drug targets and novel drug discovery now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high-throughput computational analysis of databases used for both lead and target discovery, has increased the reliability of techniques that incorporate machine learning and deep learning. The use of virtual screening and of comprehensive online information has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms utilized in drug discovery and associated techniques will be discussed. The applications that produce promising results and methods will be reviewed.
Affiliation(s)
- Lauv Patel
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA; (L.P.); (T.S.)
- Tripti Shukla
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA; (L.P.); (T.S.)
- Xiuzhen Huang
  - Department of Computer Science, Arkansas State University, Jonesboro, AR 72467, USA
- David W. Ussery
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Shanzhi Wang
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA; (L.P.); (T.S.)
|
97
|
Li P, Li Y, Hsieh CY, Zhang S, Liu X, Liu H, Song S, Yao X. TrimNet: learning molecular representation from triplet messages for biomedicine. Brief Bioinform 2020; 22:5955940. [PMID: 33147620 DOI: 10.1093/bib/bbaa266] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 08/04/2020] [Revised: 09/11/2020] [Accepted: 09/14/2020] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION: Computational methods accelerate drug discovery and play an important role in biomedicine, for example in molecular property prediction and compound-protein interaction (CPI) identification. A key challenge is to learn useful molecular representations. In the early years, molecular properties were mainly calculated by quantum mechanics or predicted by traditional machine learning methods, which required expert knowledge and was often labor-intensive. Nowadays, graph neural networks have received significant attention because of their powerful ability to learn representations from graph data. Nevertheless, current graph-based methods have some limitations that need to be addressed, such as large numbers of parameters and insufficient extraction of bond information. RESULTS: In this study, we proposed a graph-based approach, named triplet message networks (TrimNet), that employs a novel triplet message mechanism to learn molecular representation efficiently. We show that TrimNet can accurately complete multiple molecular representation learning tasks with significant parameter reduction, including quantum properties, bioactivity, physiology and CPI prediction. In the experiments, TrimNet outperforms the previous state-of-the-art method by a significant margin on various datasets. Besides having few parameters and high prediction accuracy, TrimNet could focus on the atoms essential to the target properties, providing a clear interpretation of the prediction tasks. These advantages have established TrimNet as a powerful and useful computational tool in solving the challenging problem of molecular representation learning. AVAILABILITY: The quantum and drug datasets are available on the website of MoleculeNet: http://moleculenet.ai. The source code is available on GitHub: https://github.com/yvquanli/trimnet. CONTACT: xjyao@lzu.edu.cn, songsen@tsinghua.edu.cn.
Affiliation(s)
- Pengyong Li
  - Department of Biomedical Engineering at Tsinghua University
- Yuquan Li
  - College of Chemistry and Chemical Engineering at Lanzhou University
|
98
|
Fu L, Li Y, Cheng A, Pang P, Shu Z. A Novel Machine Learning-derived Radiomic Signature of the Whole Lung Differentiates Stable From Progressive COVID-19 Infection: A Retrospective Cohort Study. J Thorac Imaging 2020; 35:361-368. [PMID: 32555006 PMCID: PMC7682797 DOI: 10.1097/rti.0000000000000544] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/19/2022]
Abstract
OBJECTIVE This study aimed to use the radiomics signatures of a machine learning-based tool to evaluate the prognosis of patients with coronavirus disease 2019 (COVID-19) infection. METHODS The clinical and imaging data of 64 patients with confirmed diagnoses of COVID-19 were retrospectively selected and divided into a stable group and a progressive group according to the data obtained from the ongoing treatment process. Imaging features from whole-lung images from baseline computed tomography (CT) scans were extracted and dimensionality reduction was performed. Support vector machines were used to construct radiomics signatures and to compare differences between the 2 groups. We also compared the differences of signature scores in the clinical, laboratory, and CT image feature subgroups and finally analyzed the correlation between the radiomics features of the constructed signature and the other features including clinical, laboratory, and CT imaging features. RESULTS The signature has a good classification effect for the stable group and the progressive group, with area under curve, sensitivity, and specificity of 0.833, 80.95%, and 74.42%, respectively. Signature score differences in laboratory and CT imaging features between subgroups were not statistically significant (P>0.05); cough was negatively correlated with GLCM Entropy_angle 90_offset4 (r=-0.578), but was positively correlated with ShortRunEmphhasis_AllDirect_offset4_SD (r=0.454); C-reactive protein was positively correlated with Cluster Prominence_ AllDirect_offset 4_ SD (r=0.47). CONCLUSION The radiomics signature of the whole lung based on machine learning may reveal the changes of lung microstructure in the early stage and help to indicate the progression of the disease.
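As a reading aid for the sensitivity and specificity figures above, the short sketch below shows how such percentages follow from confusion-matrix counts. The counts used (17 of 21 and 32 of 43) are an assumption: they are one split of the 64 patients that happens to reproduce the reported 80.95% and 74.42%, but the abstract does not report the counts or state which group was treated as the positive class.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of positive cases correctly flagged
    specificity = tn / (tn + fp)  # share of negative cases correctly cleared
    return sensitivity, specificity


# Hypothetical counts, chosen only because they match the reported percentages
# and sum to the 64 patients in the cohort (21 positive + 43 negative).
sens, spec = sensitivity_specificity(tp=17, fn=4, tn=32, fp=11)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
# prints: sensitivity = 80.95%, specificity = 74.42%
```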
Affiliation(s)
- Yongchou Li
  - Department of Radiology, The Third Affiliated Hospital of Wenzhou Medical University, Ruian, Zhejiang Province
- Zhenyu Shu
  - Radiology, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Hangzhou
|
99
|
Parvizi P, Azuaje F, Theodoratou E, Luz S. A Network-Based Embedding Method for Drug-Target Interaction Prediction. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2020; 2020:5304-5307. [PMID: 33019181 DOI: 10.1109/embc44109.2020.9176165] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
Integration of multi-omics and pharmacological data can help researchers understand the impact of drugs on dynamic biological systems. Network-based approaches to such integration explore the interaction of different cellular components and drugs. However, with ever-increasing amounts of data, processing these high-dimensional biological networks requires powerful tools. We investigate whether network embeddings can address this problem by providing an effective method for dimensionality reduction in drug-related networks. A neural network-based embedding method is employed to encode protein-protein, protein-disease, drug-drug and drug-disease networks for the prediction of novel drug-target interactions. We found that drug-target interaction prediction using embeddings of heterogeneous networks as input features performs comparably to state-of-the-art methods, exhibiting an area under the ROC curve of 84%, outperforming methods such as BLM-NII and NetLapRLS, and coming very close to the best performing network methods such as HNM, CMF and DTINet. These encouraging results suggest that further investigation of this approach is warranted.
|
100
|
Agamah FE, Mazandu GK, Hassan R, Bope CD, Thomford NE, Ghansah A, Chimusa ER. Computational/in silico methods in drug target and lead prediction. Brief Bioinform 2020; 21:1663-1675. [PMID: 31711157 PMCID: PMC7673338 DOI: 10.1093/bib/bbz103] [Citation(s) in RCA: 91] [Impact Index Per Article: 22.8] [Received: 05/21/2019] [Revised: 07/17/2019] [Accepted: 07/18/2019] [Indexed: 01/10/2023] Open
Abstract
Drug-like compounds are frequently denied approval and use owing to unexpected clinical side effects and cross-reactivity observed during clinical trials. These unexpected outcomes, which significantly increase the attrition rate, center on the selected drug targets. These targets may be disease candidate proteins or genes, biological pathways, disease-associated microRNAs, disease-related biomarkers, abnormal molecular phenotypes, crucial nodes of biological networks or molecular functions. The failures are generally linked to several factors, including incomplete knowledge of the drug targets and unpredicted pharmacokinetic expressions upon target interaction or off-target effects. Identifying targets, especially for polygenic diseases, is essential and constitutes a major bottleneck in drug development, the fundamental stage being the identification and validation of drug targets of interest for further downstream processes. Thus, various computational methods have been developed to complement experimental approaches in drug discovery. Here, we present an overview of various computational methods and tools applied in predicting or validating drug targets and drug-like molecules. We provide an overview of their advantages and compare these methods to identify effective ones that are likely to lead to optimal results. We also explore major sources of drug failure, considering the challenges and opportunities involved. This review might guide researchers in selecting the most efficient approach or technique during the computational drug discovery process.
Affiliation(s)
- Francis E Agamah
  - Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
- Gaston K Mazandu
  - Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
  - African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa
- Radia Hassan
  - Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
- Christian D Bope
  - Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
  - Faculty of Sciences, University of Kinshasa, Kinshasa, Democratic Republic of Congo
- Nicholas E Thomford
  - Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
  - School of Medical Sciences, University of Cape Coast, PMB, Cape Coast, Ghana
- Anita Ghansah
  - Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, PO Box LG 581, Legon, Ghana
- Emile R Chimusa
  - Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
|