1
Picard M, Leclercq M, Bodein A, Scott-Boyer MP, Perin O, Droit A. Improving drug repositioning with negative data labeling using large language models. J Cheminform 2025; 17:16. [PMID: 39905466 DOI: 10.1186/s13321-025-00962-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Accepted: 01/20/2025] [Indexed: 02/06/2025]
Abstract
INTRODUCTION Drug repositioning offers numerous advantages, such as faster development timelines, reduced costs, and lower failure rates in drug development. Supervised machine learning is commonly used to score drug candidates but is hindered by the lack of reliable negative data (drugs that fail due to inefficacy or toxicity), which are difficult to access; this lowers prediction accuracy and generalization. Positive-Unlabeled (PU) learning has been used to overcome this issue by either randomly sampling unlabeled drugs or identifying probable negatives, but it still suffers from misclassification or oversimplified decision boundaries. RESULTS We proposed a novel strategy using Large Language Models (GPT-4) to analyze all clinical trials on prostate cancer and systematically identify true negatives. This approach showed remarkable improvement in predictive accuracy on independent test sets with a Matthews Correlation Coefficient of 0.76 (± 0.33) compared to 0.55 (± 0.15) and 0.48 (± 0.18) for two commonly used PU learning approaches. Using our labeling strategy, we created a training set of 26 positive and 54 experimentally validated negative drugs. We then applied a machine learning ensemble to this new dataset to assess the repurposing potential of the remaining 11,043 drugs in the DrugBank database. This analysis identified 980 potential candidates for prostate cancer. A detailed review of the top 30 revealed 9 promising drugs targeting various mechanisms such as genomic instability, p53 regulation, or TMPRSS2-ERG fusion. CONCLUSION By expanding our negative data labeling approach to all diseases within the ClinicalTrials.gov database, our method could greatly advance supervised drug repositioning, offering a more accurate and data-driven path for discovering new treatments.
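The Matthews Correlation Coefficient quoted in the abstract can be computed directly from a binary confusion matrix. The following is a minimal sketch, not code from the paper: the function name `matthews_corrcoef_from_counts`, the convention of returning 0.0 when the denominator vanishes, and the counts in the usage line are all assumptions made for illustration.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Hypothetical helper for illustration. Returns 0.0 when any marginal
    total is zero (an assumed convention for the undefined case).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Made-up counts, not results from the paper.
    print(matthews_corrcoef_from_counts(tp=20, tn=45, fp=9, fn=6))
```

Unlike accuracy, the coefficient uses all four cells of the confusion matrix, which is why it is often preferred when positives and negatives are imbalanced, as in the 26 positive versus 54 negative training set described above.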
Affiliation(s)
- Milan Picard
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Mickael Leclercq
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Perin
  - Digital Transformation and Innovation Department, L'Oréal Advanced Research, Aulnay-Sous-Bois, France
- Arnaud Droit
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
2
Zhapa-Camacho F, Tang Z, Kulmanov M, Hoehndorf R. Predicting protein functions using positive-unlabeled ranking with ontology-based priors. Bioinformatics 2024; 40:i401-i409. [PMID: 38940168 PMCID: PMC11211813 DOI: 10.1093/bioinformatics/btae237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Automated protein function prediction is a crucial and widely studied problem in bioinformatics. Computationally, protein function is a multilabel classification problem where only positive samples are defined and there is a large number of unlabeled annotations. Most existing methods rely on the assumption that the unlabeled set of protein function annotations are negatives, inducing the false negative issue, where potential positive samples are trained as negatives. We introduce a novel approach named PU-GO, wherein we address function prediction as a positive-unlabeled ranking problem. We apply empirical risk minimization, i.e. we minimize the classification risk of a classifier where class priors are obtained from the Gene Ontology hierarchical structure. We show that our approach is more robust than other state-of-the-art methods on similarity-based and time-based benchmark datasets. AVAILABILITY AND IMPLEMENTATION Data and code are available at https://github.com/bio-ontology-research-group/PU-GO.
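To make the idea of PU risk minimization with a class prior concrete, here is a minimal sketch of a non-negative PU risk estimate using a sigmoid loss. It is a generic illustration and not the PU-GO ranking loss: the function names, the choice of sigmoid loss, and the single scalar `prior` argument are assumptions, since the abstract only states that class priors are derived from the Gene Ontology hierarchy.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid loss for a label in {+1, -1}; near 0 when the score agrees."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def non_negative_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate (hypothetical helper).

    pos_scores: classifier scores on labeled positives.
    unl_scores: classifier scores on unlabeled samples.
    prior: assumed fraction of positives among the unlabeled samples.
    """
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # The unlabeled set mixes positives and negatives, so the share of the
    # "treated as negative" risk attributable to hidden positives is removed.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    # Clamping at zero keeps the estimate from going negative on small samples.
    return prior * risk_pos_as_pos + max(0.0, negative_part)


if __name__ == "__main__":
    # Made-up scores and prior, for illustration only.
    print(non_negative_pu_risk([2.0, 1.0], [-1.0, 0.5, -2.0], prior=0.3))
```

The point of the correction term is that unlabeled samples are never treated as plain negatives, which is the false negative issue the abstract describes.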
Affiliation(s)
- Fernando Zhapa-Camacho
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Zhenwei Tang
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 1A1, Canada
- Maxat Kulmanov
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
3
Pillai M, Wu D. Validation approaches for computational drug repurposing: a review. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024; 2023:559-568. [PMID: 38222367 PMCID: PMC10785886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Affiliation(s)
- Malvika Pillai
  - Stanford University, Stanford, CA
  - University of North Carolina, Chapel Hill, NC
- Di Wu
  - University of North Carolina, Chapel Hill, NC
4
Huang Y, Huang HY, Chen Y, Lin YCD, Yao L, Lin T, Leng J, Chang Y, Zhang Y, Zhu Z, Ma K, Cheng YN, Lee TY, Huang HD. A Robust Drug-Target Interaction Prediction Framework with Capsule Network and Transfer Learning. Int J Mol Sci 2023; 24:14061. [PMID: 37762364 PMCID: PMC10531393 DOI: 10.3390/ijms241814061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 08/27/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023]
Abstract
Drug-target interactions (DTIs) are considered a crucial component of drug design and drug discovery. To date, many computational methods were developed for drug-target interactions, but they are insufficiently informative for accurately predicting DTIs due to the lack of experimentally verified negative datasets, inaccurate molecular feature representation, and ineffective DTI classifiers. Therefore, we address the limitations of randomly selecting negative DTI data from unknown drug-target pairs by establishing two experimentally validated datasets and propose a capsule network-based framework called CapBM-DTI to capture hierarchical relationships of drugs and targets, which adopts pre-trained bidirectional encoder representations from transformers (BERT) for contextual sequence feature extraction from target proteins through transfer learning and the message-passing neural network (MPNN) for the 2-D graph feature extraction of compounds to accurately and robustly identify drug-target interactions. We compared the performance of CapBM-DTI with state-of-the-art methods using four experimentally validated DTI datasets of different sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets (new compounds, new proteins, and new pairs). Our results demonstrate that the proposed model achieved robust performance and powerful generalization ability in all experiments. The case study on treating COVID-19 demonstrates the applicability of the model in virtual screening.
Affiliation(s)
- Yixian Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Hsi-Yuan Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yigang Chen
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yang-Chi-Dung Lin
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Lantian Yao
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Tianxiu Lin
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Junlin Leng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yuan Chang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yuntian Zhang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Zihao Zhu
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Kun Ma
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yeong-Nan Cheng
  - Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Hsien-Da Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
5
Ye Q, Zhang X, Lin X. Drug-Target Interaction Prediction via Graph Auto-Encoder and Multi-Subspace Deep Neural Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2647-2658. [PMID: 36107905 DOI: 10.1109/tcbb.2022.3206907] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Computational prediction of drug-target interaction (DTI) is important for the new drug discovery. Currently, the deep neural network (DNN) has been widely used in DTI prediction. However, parameters of the DNN could be insufficiently trained and features of the data could be insufficiently utilized, because the DTI data is limited and its dimension is very high. To deal with the above problems, in this paper, a graph auto-encoder and multi-subspace deep neural network (GAEMSDNN) is designed. GAEMSDNN enhances its learning ability with a graph auto-encoder, a subspace layer and an ensemble layer. The graph auto-encoder can preserve the reconstruction information. The subspace layer can obtain different strong feature subsets. The ensemble layer in the GAEMSDNN can comprehensively utilize these strong feature subsets in a unified optimization framework. As a result, more features can be extracted from the network input and the DNN network can be better trained. In experiments, the results of GAEMSDNN are significantly improved compared to the previous methods, which validates the effectiveness of our strategies.
6
Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: taxonomy, trends and challenges of computational models. Brief Bioinform 2022; 23:6686738. [PMID: 36056743 DOI: 10.1093/bib/bbac358] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 06/20/2022] [Revised: 07/24/2022] [Accepted: 07/30/2022] [Indexed: 12/12/2022]
Abstract
Since the problem was proposed in the late 2000s, microRNA-disease association (MDA) predictions have been implemented based on the data fusion paradigm. Integrating diverse data sources yields a more comprehensive research perspective, but it also challenges algorithm design to generate accurate, concise and consistent representations of the fused data. After more than a decade of research progress, a relatively simple algorithm like the score function or a single computation layer may no longer be sufficient for further improving predictive performance. Advanced model design has become more frequent in recent years, particularly in the form of reasonably combining multiple algorithms, a process known as model fusion. In the current review, we present 29 state-of-the-art models and introduce the taxonomy of computational models for MDA prediction based on model fusion and non-fusion. The new taxonomy exhibits notable changes in the algorithmic architecture of models, compared with that of earlier ones in the 2017 review by Chen et al. Moreover, we discuss the progress that has been made towards overcoming the obstacles to effective MDA prediction since 2017 and elaborate on how future models can be designed according to a set of new schemas. Lastly, we analyse the strengths and weaknesses of each model category in the proposed taxonomy and propose future research directions from diverse perspectives for enhancing model performance.
Affiliation(s)
- Li Huang
  - Academy of Arts and Design, Tsinghua University, Beijing, 10084, China
  - The Future Laboratory, Tsinghua University, Beijing, 10084, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
  - Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
7
Monteiro NR, Oliveira JL, Arrais JP. DTITR: End-to-end drug–target binding affinity prediction with transformers. Comput Biol Med 2022; 147:105772. [DOI: 10.1016/j.compbiomed.2022.105772] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/07/2022] [Revised: 06/07/2022] [Accepted: 06/19/2022] [Indexed: 11/03/2022]
8
Drug-target interaction prediction via an ensemble of weighted nearest neighbors with interaction recovery. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02495-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
9
Muslu O, Hoyt CT, Lacerda M, Hofmann-Apitius M, Frohlich H. GuiltyTargets: Prioritization of Novel Therapeutic Targets With Network Representation Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:491-500. [PMID: 32750869 DOI: 10.1109/tcbb.2020.3003830] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 05/15/2023]
Abstract
The majority of clinical trials fail due to low efficacy of investigated drugs, often resulting from a poor choice of target protein. Existing computational approaches aim to support target selection either via genetic evidence or by putting potential targets into the context of a disease specific network reconstruction. The purpose of this work was to investigate whether network representation learning techniques could be used to allow for a machine learning based prioritization of putative targets. We propose a novel target prioritization approach, GuiltyTargets, which relies on attributed network representation learning of a genome-wide protein-protein interaction network annotated with disease-specific differential gene expression and uses positive-unlabeled (PU) machine learning for candidate ranking. We evaluated our approach on 12 datasets from six diseases of different type (cancer, metabolic, neurodegenerative) within a 10 times repeated 5-fold stratified cross-validation and achieved AUROC values between 0.92 - 0.97, significantly outperforming previous approaches that relied on manually engineered topological features. Moreover, we showed that GuiltyTargets allows for target repositioning across related disease areas. An application of GuiltyTargets to Alzheimer's disease resulted in a number of highly ranked candidates that are currently discussed as targets in the literature. Interestingly, one (COMT) is also the target of an approved drug (Tolcapone) for Parkinson's disease, highlighting the potential for target repositioning with our method. The GuiltyTargets Python package is available on PyPI and all code used for analysis can be found under the MIT License at https://github.com/GuiltyTargets. Attributed network representation learning techniques provide an interesting approach to effectively leverage the existing knowledge about the molecular mechanisms in different diseases. 
In this work, the combination with positive-unlabeled learning for target prioritization demonstrated a clear superiority compared to classical feature engineering approaches. Our work highlights the potential of attributed network representation learning for target prioritization. Given the overarching relevance of networks in computational biology we believe that attributed network representation learning techniques could have a broader impact in the future.
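AUROC, the metric reported above, equals the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The sketch below computes it by direct pair counting. It is an illustration only, not the GuiltyTargets evaluation code; the function name `auroc` and the toy inputs are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    scores: predicted scores, higher meaning "more likely positive".
    labels: 1 for positive, 0 for negative (same length as scores).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy scores and labels, not data from the study.
    print(auroc([0.9, 0.4, 0.7, 0.2, 0.4], [1, 1, 0, 0, 0]))
```

Pair counting is quadratic in the number of samples, which is acceptable for a sketch; rank-based formulations give the same value more efficiently on large candidate lists.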
10
Drug-Target Interaction Prediction Based on Multisource Information Weighted Fusion. CONTRAST MEDIA & MOLECULAR IMAGING 2021; 2021:6044256. [PMID: 34908912 PMCID: PMC8635946 DOI: 10.1155/2021/6044256] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/12/2021] [Accepted: 10/22/2021] [Indexed: 01/08/2023]
Abstract
Recently, in most existing studies, it is assumed that there are no interaction relationships between drugs and targets with unknown interactions. However, unknown interactions mean the relationships between drugs and targets have just not been confirmed. In this paper, samples for which the relationship between drugs and targets has not been determined are considered unlabeled. A weighted fusion method of multisource information is proposed to screen drug-target interactions. Firstly, some drug-target pairs which may have interactions are selected. Secondly, the selected drug-target pairs are added to the positive samples, which are regarded as known to have interaction relationships, and the original interaction relationship matrix is revised. Finally, the revised datasets are used to predict the interaction derived from the bipartite local model with neighbor-based interaction profile inferring (BLM-NII). Experiments demonstrate that the proposed method has greatly improved specificity, sensitivity, precision, and accuracy compared with the BLM-NII method. In addition, compared with several state-of-the-art methods, the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) of the proposed method are excellent.
11
Zhang S, Wang J, Lin Z, Liang Y. Application of Machine Learning Techniques in Drug-target Interactions Prediction. Curr Pharm Des 2021; 27:2076-2087. [PMID: 33238865 DOI: 10.2174/1381612826666201125105730] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/09/2020] [Accepted: 08/06/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Drug-target interactions (DTIs) are vital for drug design and drug repositioning. However, traditional lab experiments are both expensive and time-consuming. Various computational methods that apply machine learning techniques have performed efficiently and effectively in the field. RESULTS Machine learning methods can be broadly divided into three categories: supervised, semi-supervised and unsupervised methods. We reviewed recent representative methods applying machine learning techniques of each category to DTIs and summarized a brief list of databases frequently used in drug discovery. In addition, we compared the advantages and limitations of the methods in each category. CONCLUSION Every prediction model has both strengths and weaknesses and should be adopted in proper ways. Three major problems in DTI prediction should be seriously considered: the lack of data sets of nonreactive drug-target pairs, overoptimistic results due to biases, and the use of regression models for DTI prediction.
Affiliation(s)
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Jiesheng Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Zhenhui Lin
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Yunyun Liang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
12
Ning Z, Yu S, Zhao Y, Sun X, Wu H, Yu X. Identification of miRNA-Mediated Subpathways as Prostate Cancer Biomarkers Based on Topological Inference in a Machine Learning Process Using Integrated Gene and miRNA Expression Data. Front Genet 2021; 12:656526. [PMID: 33841512 PMCID: PMC8024646 DOI: 10.3389/fgene.2021.656526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Accepted: 03/02/2021] [Indexed: 11/25/2022]
Abstract
Accurately identifying classification biomarkers for distinguishing between normal and cancer samples is challenging. Additionally, the reproducibility of single-molecule biomarkers is limited by the existence of heterogeneous patient subgroups and differences in the sequencing techniques used to collect patient data. In this study, we developed a method to identify robust biomarkers (i.e., miRNA-mediated subpathways) associated with prostate cancer based on normal prostate samples and cancer samples from a dataset from The Cancer Genome Atlas (TCGA; n = 546) and datasets from the Gene Expression Omnibus (GEO) database (n = 139 and n = 90, with the latter being a cell line dataset). We also obtained 10 other cancer datasets to evaluate the performance of the method. We propose a multi-omics data integration strategy for identifying classification biomarkers using a machine learning method that involves reassigning topological weights to the genes using a directed random walk (DRW)-based method. A global directed pathway network (GDPN) was constructed based on the significantly differentially expressed target genes of the significantly differentially expressed miRNAs, which allowed us to identify the robust biomarkers in the form of miRNA-mediated subpathways (miRNAs). The activity value of each miRNA-mediated subpathway was calculated by integrating multiple types of data, which included the expression of the miRNA and the miRNAs’ target genes and GDPN topological information. Finally, we identified the high-frequency miRNA-mediated subpathways involved in prostate cancer using a support vector machine (SVM) model. The results demonstrated that we obtained robust biomarkers of prostate cancer, which could classify prostate cancer and normal samples. Our method outperformed seven other methods, and many of the identified biomarkers were associated with known clinical treatments.
Affiliation(s)
- Ziyu Ning
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Shuang Yu
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
- Yanqiao Zhao
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
- Xiaoming Sun
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
- Haibin Wu
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
- Xiaoyang Yu
  - The Higher Educational Key Laboratory for Measuring and Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, China
13
Chen Y, Sun H, Sun M, Shi C, Sun H, Shi X, Ji B, Cui J. Finding Colon Cancer- and Colorectal Cancer-Related Microbes Based on Microbe-Disease Association Prediction. Front Microbiol 2021; 12:650056. [PMID: 33796094 PMCID: PMC8007907 DOI: 10.3389/fmicb.2021.650056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2021] [Accepted: 02/09/2021] [Indexed: 12/02/2022]
Abstract
Microbes are closely associated with the formation and development of diseases. The identification of the potential associations between microbes and diseases can boost the understanding of various complex diseases. Wet experiments applied to microbe-disease association (MDA) identification are costly and time-consuming. In this manuscript, we developed a novel computational model, NLLMDA, to find unobserved MDAs, especially for colon cancer and colorectal carcinoma. NLLMDA integrated negative MDA selection, linear neighborhood similarity, label propagation, information integration, and known biological data. First, the Gaussian association profile (GAP) similarity of microbes and the GAP similarity and symptom similarity of diseases were computed. Second, the linear neighborhood method was applied to these similarity matrices to obtain more stable performance. Third, negative MDA samples were selected, and the label propagation algorithm was used to score microbe-disease pairs. The final association probabilities were computed with the information integration method. NLLMDA was compared with five other classical MDA methods and obtained the highest area under the curve (AUC) values of 0.9031 and 0.9335 on cross-validations of diseases and microbe-disease pairs. The results suggest that NLLMDA is an effective prediction method. More importantly, we found that Acidobacteriaceae may have a close link with colon cancer and Tannerella may be closely associated with colorectal carcinoma.
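A Gaussian association profile similarity is commonly defined as a Gaussian kernel over the binary association vectors of two entities. The sketch below follows that common definition, with the kernel width normalized by the mean squared profile norm. The exact normalization used by NLLMDA is not given in the abstract, so the formula, the function name, the `bandwidth` parameter, and the toy matrix should all be read as assumptions.

```python
import math


def gaussian_profile_similarity(profiles, bandwidth=1.0):
    """Pairwise Gaussian kernel similarity between association profiles.

    profiles: list of equal-length 0/1 vectors, one per entity (for example
    one row per microbe, one column per disease). Hypothetical helper.
    """
    count = len(profiles)
    squared_norms = [sum(v * v for v in row) for row in profiles]
    mean_squared_norm = sum(squared_norms) / count
    # Assumed normalization: scale the kernel width by the average profile size.
    gamma = bandwidth / mean_squared_norm if mean_squared_norm > 0 else 0.0
    similarity = [[0.0] * count for _ in range(count)]
    for i in range(count):
        for j in range(count):
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(profiles[i], profiles[j])
            )
            similarity[i][j] = math.exp(-gamma * squared_distance)
    return similarity


if __name__ == "__main__":
    # Toy association matrix: 3 entities x 3 partners, made up for illustration.
    toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
    for row in gaussian_profile_similarity(toy_profiles):
        print(row)
```

Entities with identical association profiles get similarity 1, and the similarity decays as the profiles disagree on more partners.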
Affiliation(s)
- Yu Chen
  - The Cancer Hospital of Jia Mu Si, Jiamusi, China
- Hongjian Sun
  - Oncological Surgery, The Central Hospital of Jia Mu Si, Jiamusi, China
- Mengzhe Sun
  - Oncological Surgery, The Central Hospital of Jia Mu Si, Jiamusi, China
- Changguo Shi
  - Department of Thoracic Surgery, The Cancer Hospital of Jia Mu Si, Jiamusi, China
- Hongmei Sun
  - Medical Oncology, The Cancer Hospital of Jia Mu Si, Jiamusi, China
- Xiaoli Shi
  - Geneis Beijing Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Binbin Ji
  - Geneis Beijing Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Jinpeng Cui
  - Department of Laboratory Medicine, Yantaishan Hospital of Yantai City, Yantai, China
14
Wang J, Yang Z, Chen C, Xu Y, Wang H, Liu B, Zhang W, Jiang Y. Comprehensive circRNA Expression Profile and Construction of circRNAs-Related ceRNA Network in a Mouse Model of Autism. Front Genet 2021; 11:623584. [PMID: 33679870 PMCID: PMC7928284 DOI: 10.3389/fgene.2020.623584] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/30/2020] [Accepted: 12/23/2020] [Indexed: 12/27/2022]
Abstract
Autism is a common disease that seriously affects the quality of life. The role of circular RNAs (circRNAs) in autism remains largely unexplored. We aimed to detect the circRNA expression profile and construct a circRNA-based competing endogenous RNA (ceRNA) network in autism. Valproic acid was used to establish an in vivo model of autism in mice. A total of 1,059 differentially expressed circRNAs (477 upregulated and 582 downregulated) were identified in the autism group by RNA sequencing. The expression of novel_circ_015779 and novel_circ_035247 was detected by real-time PCR. A ceRNA network based on altered circRNAs was established, with 9,715 nodes and 150,408 edges. Module analysis was conducted followed by GO and KEGG pathway enrichment analysis. The top three modules were all correlated with autism-related pathways involving “TGF-beta signaling pathway,” “Notch signaling pathway,” “MAPK signaling pathway,” “long term depression,” “thyroid hormone signaling pathway,” etc. The present study reveals novel circRNA-involved mechanisms in the pathogenesis of autism.
Collapse
Affiliation(s)
- Ji Wang
- Yangzhou Maternal and Child Health Hospital, Yangzhou, China; Harbin Children's Hospital, Harbin, China
| | - Zhongxiu Yang
- Xuzhou Children's Hospital, Xuzhou Medical University, Xuzhou, China
| | - Canming Chen
- Yangzhou Maternal and Child Health Hospital, Yangzhou, China
| | - Yang Xu
- Yangzhou Maternal and Child Health Hospital, Yangzhou, China
| | - Hongguang Wang
- School of Civil Engineering, Northeast Forestry University, Harbin, China
| | - Bing Liu
- Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
| | - Wei Zhang
- Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
| | - Yanan Jiang
- Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China; Department of Pharmacology (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, China
|
15
|
Wang C, Kurgan L. Survey of Similarity-Based Prediction of Drug-Protein Interactions. Curr Med Chem 2021; 27:5856-5886. [PMID: 31393241 DOI: 10.2174/0929867326666190808154841] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/07/2017] [Revised: 04/16/2018] [Accepted: 10/23/2018] [Indexed: 12/20/2022]
Abstract
The therapeutic activity of a significant majority of drugs is determined by their interactions with proteins. Databases of drug-protein interactions (DPIs) primarily focus on the therapeutic protein targets, while the knowledge of the off-targets is fragmented and partial. One way to bridge this knowledge gap is to employ computational methods to predict protein targets for a given drug molecule, or interacting drugs for given protein targets. We survey a comprehensive set of 35 methods that were published in high-impact venues and that predict DPIs based on similarity between drugs and similarity between protein targets. We analyze the internal databases of known DPIs that these methods utilize to compute similarities, and investigate how they are linked to the 12 publicly available source databases. We discuss the contents, impact and relationships between these internal and source databases, as well as the timeline of their releases and publications. The 35 predictors exploit and often combine three types of similarities that consider drug structures, drug profiles, and target sequences. We review the predictive architectures of these methods and their impact, and we explain how their internal DPI databases are linked to the source databases. We also include a detailed timeline of the development of these predictors and discuss the underlying limitations of the current resources and predictive tools. Finally, we provide several recommendations concerning the future development of the related databases and methods.
Affiliation(s)
- Chen Wang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
|
16
|
Peng L, Tian X, Shen L, Kuang M, Li T, Tian G, Yang J, Zhou L. Identifying Effective Antiviral Drugs Against SARS-CoV-2 by Drug Repositioning Through Virus-Drug Association Prediction. Front Genet 2020; 11:577387. [PMID: 33193695 PMCID: PMC7525008 DOI: 10.3389/fgene.2020.577387] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/29/2020] [Accepted: 08/18/2020] [Indexed: 12/12/2022] Open
Abstract
A new coronavirus called SARS-CoV-2 is rapidly spreading around the world. More than 16,558,289 infected cases and 656,093 deaths had been reported by July 29, 2020, and it is urgent to identify effective antiviral treatments. In this study, potential antiviral drugs against SARS-CoV-2 were identified by drug repositioning through Virus-Drug Association (VDA) prediction. A total of 96 VDAs between 11 types of viruses similar to SARS-CoV-2 and 78 small molecular drugs were extracted, and a novel VDA identification model (VDA-RLSBN) was developed to find potential VDAs related to SARS-CoV-2. The model integrated the complete genome sequences of the viruses, the chemical structures of drugs, a regularized least squares (RLS) classifier, a bipartite local model, and the neighbor association information. Compared with five state-of-the-art association prediction methods, VDA-RLSBN obtained the best AUC of 0.9085 and AUPR of 0.6630. Ribavirin was predicted to be the best small molecular drug, with a higher molecular binding energy of -6.39 kcal/mol with human angiotensin-converting enzyme 2 (ACE2), followed by remdesivir (-7.4 kcal/mol), mycophenolic acid (-5.35 kcal/mol), and chloroquine (-6.29 kcal/mol). Ribavirin, remdesivir, and chloroquine have been under clinical trials or supported by recent works. In addition, for the first time, our results suggested that several antiviral drugs, such as FK506, with molecular binding energies of -11.06 and -10.1 kcal/mol with ACE2 and the spike protein, respectively, could potentially be used to prevent SARS-CoV-2 infection; these predictions remain to be further validated. Drug repositioning through virus-drug association prediction can effectively find potential antiviral drugs against SARS-CoV-2.
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Xiongfei Tian
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Ling Shen
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Ming Kuang
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Tianbao Li
- Geneis (Beijing) Co., Ltd., Beijing, China
| | - Geng Tian
- Geneis (Beijing) Co., Ltd., Beijing, China
| | | | - Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
|
17
|
Luo H, Li M, Yang M, Wu FX, Li Y, Wang J. Biomedical data and computational models for drug repositioning: a comprehensive review. Brief Bioinform 2020; 22:1604-1619. [PMID: 32043521 DOI: 10.1093/bib/bbz176] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 10/23/2019] [Revised: 12/07/2019] [Accepted: 12/26/2019] [Indexed: 12/16/2022] Open
Abstract
Drug repositioning can drastically decrease the cost and duration of traditional drug research and development while avoiding the occurrence of unforeseen adverse events. With the rapid advancement of high-throughput technologies and the explosion of various biological and medical data, computational drug repositioning methods have become appealing and powerful techniques to systematically identify potential drug-target interactions and drug-disease interactions. In this review, we first summarize the available biomedical data and public databases related to drugs, diseases and targets. Then, we discuss existing drug repositioning approaches and group them based on their underlying computational models, consisting of classical machine learning, network propagation, matrix factorization and completion, and deep learning based models. We also comprehensively analyze common standard data sets and evaluation metrics used in drug repositioning, and give a brief comparison of various prediction methods on the gold standard data sets. Finally, we conclude our review with a brief discussion of challenges in computational drug repositioning, which include the problem of reducing the noise and incompleteness of biomedical data, the ensemble of various computational drug repositioning methods, the importance of designing reliable negative sample selection methods, new techniques dealing with the data sparseness problem, the construction of large-scale and comprehensive benchmark data sets, and the analysis and explanation of the underlying mechanisms of predicted interactions.
Affiliation(s)
- Huimin Luo
- School of Computer Science and Engineering at Central South University
| | - Min Li
- School of Computer Science and Engineering at Central South University
| | - Mengyun Yang
- School of Computer Science and Engineering at Central South University
| | - Fang-Xiang Wu
- College of Engineering and the Department of Computer Science at University of Saskatchewan, Saskatoon, Canada
| | - Yaohang Li
- Department of Computer Science at Old Dominion University, Norfolk, USA
| | - Jianxin Wang
- School of Computer Science and Engineering at Central South University
|
18
|
Zong N, Wong RSN, Yu Y, Wen A, Huang M, Li N. Drug-target prediction utilizing heterogeneous bio-linked network embeddings. Brief Bioinform 2019; 22:568-580. [PMID: 31885036 DOI: 10.1093/bib/bbz147] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/27/2019] [Revised: 10/11/2019] [Accepted: 10/29/2019] [Indexed: 11/12/2022] Open
Abstract
To enable modularization for network-based prediction, we conducted a review of known methods conducting the various subtasks corresponding to the creation of a drug-target prediction framework and associated benchmarking to determine the highest-performing approaches. Accordingly, our contributions are as follows: (i) from a network perspective, we benchmarked the association-mining performance of 32 distinct subnetwork permutations, arranged based on a comprehensive heterogeneous biomedical network derived from 12 repositories; (ii) from a methodological perspective, we identified the best prediction strategy based on a review of combinations of the components with off-the-shelf classification, inference methods and graph embedding methods. Our benchmarking strategy consisted of two series of experiments, totaling six distinct tasks from the two perspectives, to determine the best prediction. We demonstrated that the proposed method outperformed the existing network-based methods, as well as how combinatorial networks and methodologies can influence the prediction. In addition, we conducted disease-specific prediction tasks for 20 distinct diseases and showed the reliability of the strategy in predicting 75 novel drug-target associations, as shown by a validation utilizing DrugBank 5.1.0. In particular, we revealed a connection of the network topology with the biological explanations for predicting the diseases 'Asthma', 'Hypertension', and 'Dementia'. The results of our benchmarking produced knowledge on a network-based prediction framework with the modularization of the feature selection and association prediction, which can be easily adapted and extended to other feature sources or machine learning algorithms, as well as a performed baseline to comprehensively evaluate the utility of incorporating varying data sources.
Affiliation(s)
- Nansu Zong
- Department of Health Sciences Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
| | - Rachael Sze Nga Wong
- Department of Bioengineering, UC San Diego, 9500 Gilman Drive, San Diego, CA 92093-0412, USA
| | - Yue Yu
- Department of Health Sciences Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
| | - Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
| | - Ming Huang
- Department of Health Sciences Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
| | - Ning Li
- Scripps Research Institute, 10550 North Torrey Pines Road, San Diego, CA, 92037, USA
|
19
|
Liu N, Chen CB, Kumara S. Semi-Supervised Learning Algorithm for Identifying High-Priority Drug-Drug Interactions Through Adverse Event Reports. IEEE J Biomed Health Inform 2019; 24:57-68. [PMID: 31395567 DOI: 10.1109/jbhi.2019.2932740] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/07/2022]
Abstract
Identifying drug-drug interactions (DDIs) is a critical enabler for reducing adverse drug events and improving patient safety. Generating proper DDI alerts during the prescribing workflow has the potential to prevent DDI-related adverse events. However, implementing DDI alerting systems remains a challenge because users experience alert overload, which causes alert fatigue. One strategy to optimize the current system is to establish a list of high-priority DDIs for alerting purposes, though this is a resource-intensive task. In this study, we propose a machine learning framework to extract useful features from the FDA adverse event reports and then identify potential high-priority DDIs using an autoencoder-based semi-supervised learning algorithm. The experimental results demonstrate the effectiveness of using adverse event feature representations in differentiating high- and low-priority DDIs. Additionally, the proposed algorithm utilizes stacked autoencoders and a weighted support vector machine for boosting classification performance, and it outperforms other competing methods in terms of F-measure and AUC score. This framework integrates multiple information sources, leverages domain knowledge and clinical evidence, and provides a practical approach for pre-screening high-priority DDI candidates for medication alerts.
|
20
|
Lin C, Ni S, Liang Y, Zeng X, Liu X. Learning to Predict Drug Target Interaction From Missing Not at Random Labels. IEEE Trans Nanobioscience 2019; 18:353-359. [PMID: 30969929 DOI: 10.1109/tnb.2019.2909293] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/03/2025]
Abstract
The prediction of Drug-Target Interaction (DTI) is an important research direction in bioinformatics as it greatly shortens the development cycle of new drugs. State-of-the-art computational methods for DTI prediction adopt a binary classification framework. The supervision is incomplete, i.e. only a small number of DTIs are known and treated as positive instances, while the rest are unknown and treated as negative. Two severe problems occur in such a framework: (1) the number of negative samples is overwhelming and (2) a negative label cannot rule out the possibility of a positive drug-target interaction. In this paper, we address the problem of learning from incomplete labels in DTI prediction. The key assumption here is that labels are missing not at random. For example, negative DTI labels are more likely to be missing because biomedical researchers prioritize studying DTIs that are more likely to be positive. We introduce a novel probabilistic model, factorization with non-random missing labels (FNML). It models the generative process for the DTI labels (i.e. whether the labels are positive or negative) and responses (i.e. whether the labels are observed or missing). In particular, the probability of observing or missing a label is associated with the sign of the label. In order to further reduce prediction variance and improve prediction accuracy on highly imbalanced DTI datasets, we present FNML-EN, an ensemble scheme designed specifically for the FNML model. We conduct comprehensive experiments on the latest DTI database, demonstrating the superior and robust performance of the proposed models.
|
21
|
Sharma R, Raicar G, Tsunoda T, Patil A, Sharma A. OPAL: prediction of MoRF regions in intrinsically disordered protein sequences. Bioinformatics 2019; 34:1850-1858. [PMID: 29360926 DOI: 10.1093/bioinformatics/bty032] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 10/26/2017] [Accepted: 01/17/2018] [Indexed: 12/15/2022] Open
Abstract
Motivation: Intrinsically disordered proteins lack a stable 3-dimensional structure and play a crucial role in performing various biological functions. Key to their biological function are the molecular recognition features (MoRFs) located within long disordered regions. Computationally identifying these MoRFs from disordered protein sequences is a challenging task. In this study, we present a new MoRF predictor, OPAL, to identify MoRFs in disordered protein sequences. OPAL utilizes two independent sources of information computed using different component predictors. The scores are processed and combined using a common averaging method. The first score is computed using a component MoRF predictor which utilizes composition and sequence similarity of MoRF and non-MoRF regions to detect MoRFs. The second score is calculated using half-sphere exposure (HSE), solvent accessible surface area (ASA) and backbone angle information of the disordered protein sequence, using information from the amino acid properties of flanks surrounding the MoRFs to distinguish MoRF and non-MoRF residues. Results: OPAL is evaluated using test sets that were previously used to evaluate the MoRF predictors MoRFpred, MoRFchibi and MoRFchibi-web. The results demonstrate that OPAL outperforms all the available MoRF predictors and is the most accurate predictor available for MoRF prediction. It is available at http://www.alok-ai-lab.com/tools/opal/. Contact: ashwini@hgc.jp or alok.sharma@griffith.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
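As a rough illustration of the score-combination step mentioned in this abstract, the Python sketch below averages two per-residue scores and then flags residues above a cutoff. The function names, the example scores, and the 0.5 cutoff are hypothetical and are not taken from OPAL; the abstract states only that the two component scores are combined by averaging.

```python
def combine_scores(score_a, score_b):
    """Average two per-residue score lists of equal length."""
    if len(score_a) != len(score_b):
        raise ValueError("score lists must cover the same residues")
    return [(a + b) / 2.0 for a, b in zip(score_a, score_b)]


def call_morf_residues(combined, threshold=0.5):
    """Return indices of residues whose combined score reaches the cutoff.

    The cutoff is an illustrative assumption, not a value from the paper.
    """
    return [i for i, s in enumerate(combined) if s >= threshold]


# Hypothetical per-residue scores from two component predictors.
sequence_scores = [0.10, 0.80, 0.90, 0.30]
structure_scores = [0.30, 0.60, 0.70, 0.10]

combined = combine_scores(sequence_scores, structure_scores)
flagged = call_morf_residues(combined)
```

Plain averaging weights both information sources equally; a weighted mean would be the natural variation if one component were known to be more reliable.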
Affiliation(s)
- Ronesh Sharma
- School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
| | - Gaurav Raicar
- School of Engineering and Physics, The University of the South Pacific, Suva, Fiji
| | - Tatsuhiko Tsunoda
- Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan; CREST, JST, Tokyo 113-8510, Japan
| | - Ashwini Patil
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
| | - Alok Sharma
- School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan; CREST, JST, Tokyo 113-8510, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, Brisbane, QLD, Australia
|
22
|
Zhou L, Li Z, Yang J, Tian G, Liu F, Wen H, Peng L, Chen M, Xiang J, Peng L. Revealing Drug-Target Interactions with Computational Models and Algorithms. Molecules 2019; 24:1714. [PMID: 31052598 PMCID: PMC6540161 DOI: 10.3390/molecules24091714] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 04/08/2019] [Revised: 04/24/2019] [Accepted: 04/26/2019] [Indexed: 12/02/2022] Open
Abstract
Background: Identifying possible drug-target interactions (DTIs) has become an important task in drug research and development. Although high-throughput screening is becoming available, experimental methods narrow down the validation space because of extremely high cost, low success rate, and time consumption. Therefore, various computational models have been exploited to infer DTI candidates. Methods: We introduced relevant databases and packages and provided a comprehensive review of computational models for DTI identification, including network-based algorithms and machine learning-based methods. Specifically, machine learning-based methods mainly include the bipartite local model, matrix factorization, regularized least squares, and deep learning. Results: Although computational methods have significantly improved DTI prediction, these models have their limitations. We discussed potential avenues for boosting DTI prediction accuracy as well as further directions.
Affiliation(s)
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou 412007, Hunan, China.
| | - Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang 421002, Hunan, China.
| | | | - Geng Tian
- Geneis (Beijing) Co. Ltd., Beijing 100102, China.
| | - Fuxing Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou 412007, Hunan, China.
| | - Hong Wen
- School of Computer Science, Hunan University of Technology, Zhuzhou 412007, Hunan, China.
| | - Li Peng
- School of Computer Science, University of Science and Technology of Hunan, Xiangtan 411201, Hunan, China.
| | - Min Chen
- School of Computer Science, Hunan Institute of Technology, Hengyang 421002, Hunan, China.
| | - Ju Xiang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
- Neuroscience Research Center, Department of Basic Medical Sciences, Changsha Medical University, Changsha 410219, Hunan, China.
| | - Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou 412007, Hunan, China.
|
23
|
Frey NC, Wang J, Vega Bellido GI, Anasori B, Gogotsi Y, Shenoy VB. Prediction of Synthesis of 2D Metal Carbides and Nitrides (MXenes) and Their Precursors with Positive and Unlabeled Machine Learning. ACS Nano 2019; 13:3031-3041. [PMID: 30830760 DOI: 10.1021/acsnano.8b08014] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Indexed: 05/06/2023]
Abstract
Growing interest in the potential applications of two-dimensional (2D) materials has fueled advancement in the identification of 2D systems with exotic properties. Increasingly, the bottleneck in this field is the synthesis of these materials. Although theoretical calculations have predicted a myriad of promising 2D materials, only a few dozen have been experimentally realized since the initial discovery of graphene. Here, we adapt the state-of-the-art positive and unlabeled (PU) machine learning framework to predict which theoretically proposed 2D materials have the highest likelihood of being successfully synthesized. Using elemental information and data from high-throughput density functional theory calculations, we apply the PU learning method to the MXene family of 2D transition metal carbides, carbonitrides, and nitrides, and their layered precursor MAX phases, and identify 18 MXene compounds that are highly promising candidates for synthesis. By considering both the MXenes and their precursors, we further propose 20 synthesizable MAX phases that can be chemically exfoliated to produce MXenes.
Affiliation(s)
- Nathan C Frey
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
| | - Jin Wang
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
| | - Gabriel Iván Vega Bellido
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Department of Chemical Engineering, University of Puerto Rico at Mayagüez, Mayagüez 00681, Puerto Rico
| | - Babak Anasori
- Department of Materials Science and Engineering and A.J. Drexel Nanomaterials Institute, Drexel University, Philadelphia, Pennsylvania 19104, United States
| | - Yury Gogotsi
- Department of Materials Science and Engineering and A.J. Drexel Nanomaterials Institute, Drexel University, Philadelphia, Pennsylvania 19104, United States
| | - Vivek B Shenoy
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
|
24
|
Wang CC, Chen X, Qu J, Sun YZ, Li JQ. RFSMMA: A New Computational Model to Identify and Prioritize Potential Small Molecule-MiRNA Associations. J Chem Inf Model 2019; 59:1668-1679. [PMID: 30840454 DOI: 10.1021/acs.jcim.9b00129] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/23/2022]
Abstract
A growing number of studies have found that many complex human diseases are accompanied by aberrant expression of microRNAs (miRNAs). Small molecule (SM) drugs have been utilized to treat complex human diseases by affecting the expression of miRNAs. Several computational methods have been proposed to infer underlying associations between SMs and miRNAs. In our study, we proposed a new computational model, random forest based small molecule-miRNA association prediction (RFSMMA), which was based on the known SM-miRNA associations in the SM2miR database. RFSMMA utilized the similarity of SMs and miRNAs as features to represent SM-miRNA pairs and then applied the random forest machine learning algorithm to the training samples to obtain a prediction model. In RFSMMA, integrating multiple kinds of similarity can avoid the bias of a single similarity, and choosing more reliable features from the original features can represent SM-miRNA pairs more accurately. We carried out cross validations to assess the predictive accuracy of RFSMMA. As a result, RFSMMA acquired AUCs of 0.9854, 0.9839, 0.7052, and 0.9917 ± 0.0008 under global leave-one-out cross validation (LOOCV), miRNA-fixed local LOOCV, SM-fixed local LOOCV, and 5-fold cross validation, respectively, on data set 1. On data set 2, RFSMMA obtained AUCs of 0.8456, 0.8463, 0.6653, and 0.8389 ± 0.0033 under the four cross validations in the order mentioned above. In addition, we implemented a case study on three common SMs, namely, 5-fluorouracil, 17β-estradiol, and 5-aza-2'-deoxycytidine. Among the top 50 associated miRNAs of these three SMs predicted by RFSMMA, 31, 32, and 28 miRNAs were verified, respectively. Therefore, RFSMMA is shown to be an effective and reliable tool for identifying underlying SM-miRNA associations.
Affiliation(s)
- Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Jia Qu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Ya-Zhou Sun
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
| | - Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
|
25
|
Wang CC, Chen X, Yin J, Qu J. An integrated framework for the identification of potential miRNA-disease association based on novel negative samples extraction strategy. RNA Biol 2019; 16:257-269. [PMID: 30646823 DOI: 10.1080/15476286.2019.1568820] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 12/11/2022] Open
Abstract
MicroRNAs (miRNAs) play an important role in the prevention, diagnosis and treatment of complex human diseases. Predicting potential miRNA-disease associations could provide important prior information for medical researchers. Therefore, reliable computational models are expected to be an effective supplement for inferring associations between miRNAs and diseases. In this study, we developed a novel computational model named Negative Samples Extraction based MiRNA-Disease Association prediction (NSEMDA). NSEMDA filtered reliable negative samples using two positive-unlabeled learning models, namely, the Spy and Rocchio techniques, and calculated similarity weights for ambiguous samples. The positive samples, reliable negative samples and ambiguous samples with similarity weights were used to construct a Support Vector Machine-Similarity Weight model to predict miRNA-disease associations. NSEMDA improved the credibility of negative samples and reduced the impact of noisy samples by introducing ambiguous samples with similarity weights to train the prediction model. As a result, NSEMDA achieved an AUC of 0.8899 in global leave-one-out cross validation (LOOCV) and an AUC of 0.8353 under local LOOCV. In 100 runs of 5-fold cross validation, NSEMDA obtained an average AUC of 0.8878 with a standard deviation of 0.0014. These AUCs are higher than those of many classical models. In addition, we carried out three kinds of case studies to evaluate the performance of NSEMDA. Among the top 50 potential related miRNAs of esophageal neoplasms, lung neoplasms and hepatocellular carcinoma predicted by NSEMDA, 46, 50 and 45 miRNAs, respectively, were verified by experimental evidence to be associated with the investigated disease. Therefore, NSEMDA would be a reliable computational model for inferring miRNA-disease associations.
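To make the idea of extracting reliable negatives from unlabeled data more concrete, here is a minimal Python sketch of the Rocchio technique as it is commonly described in the positive-unlabeled learning literature. It is not NSEMDA's implementation: the abstract does not give the features, parameters, or how the Spy and Rocchio results are merged, so the toy vectors, the function names, and the weights alpha=16 and beta=4 (conventional defaults, assumed here) are all illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rocchio_reliable_negatives(positives, unlabeled, alpha=16.0, beta=4.0):
    """Return indices of unlabeled samples treated as reliable negatives.

    The unlabeled set is provisionally treated as negative to build two
    prototypes; an unlabeled sample is kept as a reliable negative when it is
    at least as similar to the negative prototype as to the positive one.
    alpha and beta are assumed defaults, not values from the paper.
    """
    pos_mean = mean_vector([normalize(v) for v in positives])
    unl_mean = mean_vector([normalize(v) for v in unlabeled])
    proto_pos = [alpha * p - beta * u for p, u in zip(pos_mean, unl_mean)]
    proto_neg = [alpha * u - beta * p for p, u in zip(pos_mean, unl_mean)]
    return [
        i
        for i, v in enumerate(unlabeled)
        if cosine(v, proto_neg) >= cosine(v, proto_pos)
    ]


# Hypothetical two-dimensional feature vectors.
positives = [[1.0, 0.1], [0.9, 0.2]]
unlabeled = [[0.95, 0.15], [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]

reliable = rocchio_reliable_negatives(positives, unlabeled)
```

In this toy data the first unlabeled vector resembles the positives and is therefore left out of the reliable-negative set, which is the behavior that motivates such filtering over random negative sampling.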
Affiliation(s)
- Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Jun Yin
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Jia Qu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
|
26
|
Ni S, Lin C, Zeng X, Liang Y. Drug Target Interaction Prediction with Non-random Missing Labels. 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2018:496-501. [DOI: 10.1109/bibm.2018.8621514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
27
|
Wang C, Kurgan L. Review and comparative assessment of similarity-based methods for prediction of drug–protein interactions in the druggable human proteome. Brief Bioinform 2018; 20:2066-2087. [DOI: 10.1093/bib/bby069] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/07/2018] [Revised: 06/26/2018] [Accepted: 07/10/2018] [Indexed: 12/18/2022] Open
Abstract
Drug–protein interactions (DPIs) underlie the desired therapeutic actions and the adverse side effects of a significant majority of drugs. Computational prediction of DPIs facilitates research in drug discovery, characterization and repurposing. Similarity-based methods that do not require knowledge of protein structures are particularly suitable for druggable genome-wide predictions of DPIs. We review 35 high-impact similarity-based predictors that were published in the past decade. We group them based on three types of similarities and their combinations that they use. We discuss and compare key aspects of these methods including source databases, internal databases and their predictive models. Using our novel benchmark database, we perform comparative empirical analysis of predictive performance of seven types of representative predictors that utilize each type of similarity individually and all possible combinations of similarities. We assess predictive quality at the database-wide DPI level and we are the first to also include evaluation over individual drugs. Our comprehensive analysis shows that predictors that use more similarity types outperform methods that employ fewer similarities, and that the model combining all three types of similarities secures area under the receiver operating characteristic curve of 0.93. We offer a comprehensive analysis of sensitivity of predictive performance to intrinsic and extrinsic characteristics of the considered predictors. We find that predictive performance is sensitive to low levels of similarities between sequences of the drug targets and several extrinsic properties of the input drug structures, drug profiles and drug targets. The benchmark database and a webserver for the seven predictors are freely available at http://biomine.cs.vcu.edu/servers/CONNECTOR/.
Affiliation(s)
- Chen Wang
- Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Lukasz Kurgan
- Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284, USA
|