1
Ansari M, White AD. Learning peptide properties with positive examples only. DIGITAL DISCOVERY 2024; 3:977-986. [PMID: 38756224 PMCID: PMC11094695 DOI: 10.1039/d3dd00218g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 03/30/2024] [Indexed: 05/18/2024]
Abstract
Deep learning can create accurate predictive models by exploiting existing large-scale experimental data, and guide the design of molecules. However, a major barrier is the requirement of both positive and negative examples in the classical supervised learning frameworks. Notably, most peptide databases come with missing information and a low number of observations on negative examples, as such sequences are hard to obtain using high-throughput screening methods. To address this challenge, we solely exploit the limited known positive examples in a semi-supervised setting, and discover peptide sequences that are likely to map to certain antimicrobial properties via positive-unlabeled (PU) learning. In particular, we use the two learning strategies of adapting the base classifier and reliable negative identification to build deep learning models for inferring solubility, hemolysis, binding against SHP-2, and non-fouling activity of peptides, given their sequence. We evaluate the predictive performance of our PU learning method and show that by only using the positive data, it can achieve competitive performance when compared with the classical positive-negative (PN) classification approach, where there is access to both positive and negative examples.
Affiliation(s)
- Mehrad Ansari: Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
- Andrew D White: Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
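As an illustration of the "adapting base classifier" strategy named in the abstract of entry 1, the sketch below applies the Elkan-Noto correction, one well-known member of that family. Whether the authors use this exact estimator is an assumption. A classifier is first trained to separate labeled from unlabeled examples; the label frequency c = p(s=1 | y=1) is then estimated as the mean score on held-out known positives, and every score is divided by c. The scores are made-up numbers, and `estimate_label_frequency` and `adapt_scores` are hypothetical helper names.

```python
from statistics import mean


def estimate_label_frequency(scores_on_known_positives):
    """Estimate c = p(s=1 | y=1) as the mean score g(x) over held-out labeled positives."""
    return mean(scores_on_known_positives)


def adapt_scores(scores, c):
    """Turn labeled-vs-unlabeled scores g(x) into estimates of p(y=1 | x) = g(x) / c, clipped at 1."""
    return [min(1.0, g / c) for g in scores]


# Hypothetical outputs of a classifier trained on labeled (s=1) vs. unlabeled (s=0) peptides.
held_out_positive_scores = [0.5, 0.25, 0.75]
unlabeled_scores = [0.05, 0.25, 0.4, 0.6]

c = estimate_label_frequency(held_out_positive_scores)
posterior = adapt_scores(unlabeled_scores, c)
```

Dividing by c rescales the scores because only a fraction c of true positives carry a label; clipping guards against estimates above 1 when c is underestimated.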
2
Zhao Y, Yang H, Chen Y, Du M, Gu W, Zhao W. Synthesis of environmentally friendly neonicotinoid insecticide with proper functional properties by theoretical methodologies. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2023; 268:115708. [PMID: 37979357 DOI: 10.1016/j.ecoenv.2023.115708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/08/2023] [Accepted: 11/14/2023] [Indexed: 11/20/2023]
Abstract
Modern insecticide substitutes using nicotinic acetylcholine receptors (nAChRs) as biochemical targets, such as neonicotinoid insecticides (NNIs), have been extensively researched. Only 12 compounds have been experimentally realized since the initial discovery of imidacloprid. Increasingly, the bottleneck in this field is to rapidly determine the synthesizability of NNI substitutes. Here, we designed a coupled evaluation system for synthesis prediction and validation, including the synthesis probability, reaction path difficulty, and electron transfer characteristics of NNIs and their substitutes. Firstly, a total of 1475 eigenvalues were generated and 52 critical eigenvalues were screened out through Pearson's correlation coefficient. A positive and unlabeled (PU) machine learning model was constructed using the critical eigenvalues of NNIs, including 12 experimentally synthesized NNIs (positive samples) and 73 unsynthesized NNI substitutes (unlabeled samples). Results identified 3 NNI substitutes that were highly promising candidates for synthesis (synthesis probability > 0.5). The results of density functional theory demonstrated the ranking of their reaction ease was UN-1 (31.4 kcal/mol) > UN-2 (81.6 kcal/mol) > UN-3 (3.35 × 10^3 kcal/mol). Time-dependent density functional theory revealed that changes in the electron distribution and electron excitation type were critical factors affecting their synthesizability, and the local excitation type was more favorable for the synthesizability of NNI substituents. The findings provide significant guidance for NNIs synthesis, reducing the possible space of unlabeled samples to 95.89% of their original size, while also minimizing the cost of research on subsequent NNI substitutes.
Affiliation(s)
- Yuanyuan Zhao: College of New Energy and Environment, Jilin University, Changchun 130012, China; College of Environmental Science and Engineering, North China Electric Power University, Beijing 102206, China; MOE Key Laboratory of Resources and Environmental Systems Optimization, North China Electric Power University, Beijing 102206, China
- Hao Yang: College of Environmental Science and Engineering, North China Electric Power University, Beijing 102206, China; MOE Key Laboratory of Resources and Environmental Systems Optimization, North China Electric Power University, Beijing 102206, China
- Yanbing Chen: College of Environmental Science and Engineering, North China Electric Power University, Beijing 102206, China; MOE Key Laboratory of Resources and Environmental Systems Optimization, North China Electric Power University, Beijing 102206, China
- Meijin Du: College of Environmental Science and Engineering, North China Electric Power University, Beijing 102206, China; MOE Key Laboratory of Resources and Environmental Systems Optimization, North China Electric Power University, Beijing 102206, China
- Wenwen Gu: College of Environmental Science and Engineering, North China Electric Power University, Beijing 102206, China; MOE Key Laboratory of Resources and Environmental Systems Optimization, North China Electric Power University, Beijing 102206, China
- Wenjin Zhao: College of New Energy and Environment, Jilin University, Changchun 130012, China
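The abstract of entry 2 says that 52 of 1475 descriptors were retained using Pearson's correlation coefficient but does not state the selection rule. The sketch below shows one common way such a screen can work: a greedy filter that drops a descriptor when it is highly correlated with one already kept. The 0.9 threshold, the greedy rule, the toy data, and the names `pearson` and `screen_descriptors` are all illustrative assumptions, not the authors' procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(vx * vy)


def screen_descriptors(columns, threshold=0.9):
    """Greedily keep descriptors whose |r| with every already-kept descriptor is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy descriptor table: d2 is an exact multiple of d1, d3 is uncorrelated with d1.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.0, 6.0, 8.0],
    "d3": [1.0, -1.0, -1.0, 1.0],
}
selected = screen_descriptors(descriptors)
```

Here `d2` is removed because it duplicates the information in `d1`, while `d3` survives. A screen against the target label instead of between descriptors would be an equally plausible reading of the abstract.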
3
Zhao Y, Yin J, Zhang L, Zhang Y, Chen X. Drug-drug interaction prediction: databases, web servers and computational models. Brief Bioinform 2023; 25:bbad445. [PMID: 38113076 PMCID: PMC10782925 DOI: 10.1093/bib/bbad445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/26/2023] [Accepted: 11/14/2023] [Indexed: 12/21/2023]
Abstract
In clinical treatment, two or more drugs (i.e. drug combination) are simultaneously or successively used for therapy with the purpose of primarily enhancing the therapeutic efficacy or reducing drug side effects. However, inappropriate drug combination may not only fail to improve efficacy, but even lead to adverse reactions. Therefore, according to the basic principle of improving the efficacy and/or reducing adverse reactions, we should study drug-drug interactions (DDIs) comprehensively and thoroughly so as to reasonably use drug combination. In this review, we first introduced the basic concepts and classification of DDIs. Further, some important publicly available databases and web servers about experimentally verified or predicted DDIs were briefly described. As an effective auxiliary tool, computational models for predicting DDIs can not only save the cost of biological experiments, but also provide relevant guidance for combination therapy to some extent. Therefore, we summarized three types of prediction models (including traditional machine learning-based models, deep learning-based models and score function-based models) proposed during recent years and discussed their advantages as well as limitations. Besides, we pointed out the problems that need to be solved in the future research of DDI prediction and provided corresponding suggestions.
Affiliation(s)
- Yan Zhao: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jun Yin: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Li Zhang: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Yong Zhang: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xing Chen: School of Science, Jiangnan University, Wuxi 214122, China
4
Pomykala KL, Hadaschik BA, Sartor O, Gillessen S, Sweeney CJ, Maughan T, Hofman MS, Herrmann K. Next generation radiotheranostics promoting precision medicine. Ann Oncol 2023; 34:507-519. [PMID: 36924989 DOI: 10.1016/j.annonc.2023.03.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 12/02/2022] [Accepted: 03/03/2023] [Indexed: 03/17/2023]
Abstract
Radiotheranostics is a field of rapid growth with some approved treatments including 131I for thyroid cancer, 223Ra for osseous metastases, 177Lu-DOTATATE for neuroendocrine tumors, and 177Lu-PSMA (prostate-specific membrane antigen) for prostate cancer, and several more under investigation. In this review, we will cover the fundamentals of radiotheranostics, the key clinical studies that have led to current success, future developments with new targets, radionuclides and platforms, challenges with logistics and reimbursement and, lastly, forthcoming considerations regarding dosimetry, identifying the right line of therapy, artificial intelligence and more.
Affiliation(s)
- K L Pomykala: Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany
- B A Hadaschik: Department of Urology, University Hospital Essen, Essen, Germany
- O Sartor: School of Medicine, Tulane University, New Orleans, USA
- S Gillessen: Oncology Institute of Southern Switzerland, Bellinzona, Switzerland; Università della Svizzera Italiana, Lugano, Switzerland; Division of Cancer Sciences, University of Manchester, Manchester, UK
- C J Sweeney: Dana-Farber Cancer Institute, Boston, USA; Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- T Maughan: Oxford Institute for Radiation Oncology, University of Oxford, Oxford, UK
- M S Hofman: Prostate Cancer Theranostics and Imaging Centre of Excellence (ProsTIC), Cancer Imaging, Peter MacCallum Cancer Centre, Melbourne, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Australia
- K Herrmann: Department of Nuclear Medicine, University of Duisburg-Essen and German Cancer Consortium (DKTK)-University Hospital Essen, Essen, Germany
5
Sharifabad MM, Sheikhpour R, Gharaghani S. Drug-target interaction prediction using reliable negative samples and effective feature selection methods. J Pharmacol Toxicol Methods 2022; 116:107191. [PMID: 35738316 DOI: 10.1016/j.vascn.2022.107191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Revised: 06/04/2022] [Accepted: 06/14/2022] [Indexed: 11/28/2022]
Abstract
Machine learning-based approaches in the field of drug discovery have dramatically reduced the time and cost of the laboratory process of detecting potential drug-target interactions (DTIs). Standard binary classifiers require both positive and negative samples in the training and validation phases. One of the major challenges in the DTI context is the lack of access to non-interacting pairs as negative samples in the learning process. Many recent studies in this field have randomly selected negative samples from unlabeled drug-target pairs. Therefore, because unknown positive samples may be present in the set treated as negative, the model results may be affected and show a high rate of false positives. In this study, an algorithm called Reliable Non-Interacting Drug-Target Pairs (RNIDTP) is proposed to select reliable negative samples, together with an efficient algorithm to select relevant features for drug-target interaction prediction. To validate the performance of the proposed RNIDTP algorithm in the selection of negative samples, a benchmark drug-target interactions dataset is used. The results demonstrate the superiority of the proposed algorithm compared with other algorithms in most cases. The results also indicate that by using an appropriate algorithm for the selection of negative samples, the performance of the learning process is significantly increased compared to random selection.
Affiliation(s)
- Mohammad Morovvati Sharifabad: Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Razieh Sheikhpour: Department of Computer Engineering, Faculty of Engineering, Ardakan University, P.O. Box 184, Ardakan, Iran
- Sajjad Gharaghani: Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
6
Muslu O, Hoyt CT, Lacerda M, Hofmann-Apitius M, Frohlich H. GuiltyTargets: Prioritization of Novel Therapeutic Targets With Network Representation Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:491-500. [PMID: 32750869 DOI: 10.1109/tcbb.2020.3003830] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/15/2023]
Abstract
The majority of clinical trials fail due to low efficacy of investigated drugs, often resulting from a poor choice of target protein. Existing computational approaches aim to support target selection either via genetic evidence or by putting potential targets into the context of a disease specific network reconstruction. The purpose of this work was to investigate whether network representation learning techniques could be used to allow for a machine learning based prioritization of putative targets. We propose a novel target prioritization approach, GuiltyTargets, which relies on attributed network representation learning of a genome-wide protein-protein interaction network annotated with disease-specific differential gene expression and uses positive-unlabeled (PU) machine learning for candidate ranking. We evaluated our approach on 12 datasets from six diseases of different types (cancer, metabolic, neurodegenerative) within a 10 times repeated 5-fold stratified cross-validation and achieved AUROC values between 0.92 and 0.97, significantly outperforming previous approaches that relied on manually engineered topological features. Moreover, we showed that GuiltyTargets allows for target repositioning across related disease areas. An application of GuiltyTargets to Alzheimer's disease resulted in a number of highly ranked candidates that are currently discussed as targets in the literature. Interestingly, one (COMT) is also the target of an approved drug (Tolcapone) for Parkinson's disease, highlighting the potential for target repositioning with our method. The GuiltyTargets Python package is available on PyPI and all code used for analysis can be found under the MIT License at https://github.com/GuiltyTargets. Attributed network representation learning techniques provide an interesting approach to effectively leverage the existing knowledge about the molecular mechanisms in different diseases.
In this work, the combination with positive-unlabeled learning for target prioritization demonstrated a clear superiority compared to classical feature engineering approaches. Our work highlights the potential of attributed network representation learning for target prioritization. Given the overarching relevance of networks in computational biology we believe that attributed network representation learning techniques could have a broader impact in the future.
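The AUROC values quoted in the abstract of entry 6 measure how well a ranking places known targets above the remaining candidates. The sketch below computes AUROC directly from its probabilistic definition: the chance that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. Treating unlabeled candidates as negatives for evaluation is an assumption made here for illustration, and the scores and labels are invented.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ranking scores; label 1 marks a known target, 0 an unlabeled candidate.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 0]
value = auroc(scores, labels)
```

In this toy ranking the two known targets win 7 of the 8 positive-negative comparisons. The pairwise loop is quadratic, which is fine for a sketch but not for genome-wide candidate lists.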
7
Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021; 23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022]
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
Affiliation(s)
- Fuyi Li: Monash University, Australia
- André Leier: Department of Genetics, UAB School of Medicine, USA
- Meiya Han: Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jing Xu: Computer Science and Technology from Nankai University, China
- Xiaoyu Wang: Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
- Shirui Pan: University of Technology Sydney (UTS), Ultimo, NSW, Australia
- Cangzhi Jia: College of Science, Dalian Maritime University, Australia
- Yang Zhang: Northwestern Polytechnical University, China
- Geoffrey I Webb: Faculty of Information Technology at Monash University, Australia
- Lachlan J M Coin: Department of Clinical Pathology, University of Melbourne, Australia
- Chen Li: Biomedicine Discovery Institute and Department of Biochemistry of Molecular Biology, Monash University, Australia
- Jiangning Song: Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
8
Pérez Santín E, Rodríguez Solana R, González García M, García Suárez MDM, Blanco Díaz GD, Cima Cabal MD, Moreno Rojas JM, López Sánchez JI. Toxicity prediction based on artificial intelligence: A multidisciplinary overview. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1516] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022]
Affiliation(s)
- Efrén Pérez Santín: Escuela Superior de Ingeniería y Tecnología (ESIT), Universidad Internacional de La Rioja (UNIR), Logroño, Spain
- Raquel Rodríguez Solana: Department of Food Science and Health, Andalusian Institute of Agricultural and Fisheries Research and Training (IFAPA), Alameda del Obispo Avda Córdoba, Andalucía, Spain
- Mariano González García: Escuela Superior de Ingeniería y Tecnología (ESIT), Universidad Internacional de La Rioja (UNIR), Logroño, Spain
- María Del Mar García Suárez: Escuela Superior de Ingeniería y Tecnología (ESIT), Universidad Internacional de La Rioja (UNIR), Logroño, Spain
- Gerardo David Blanco Díaz: Escuela Superior de Ingeniería y Tecnología (ESIT), Universidad Internacional de La Rioja (UNIR), Logroño, Spain
- María Dolores Cima Cabal: Escuela Superior de Ingeniería y Tecnología (ESIT), Universidad Internacional de La Rioja (UNIR), Logroño, Spain
- José Manuel Moreno Rojas: Department of Food Science and Health, Andalusian Institute of Agricultural and Fisheries Research and Training (IFAPA), Alameda del Obispo Avda Córdoba, Andalucía, Spain
- José Ignacio López Sánchez: Escuela Superior de Ingeniería y Tecnología (ESIT), Universidad Internacional de La Rioja (UNIR), Logroño, Spain
9
Modelling drugs interaction in treatment-experienced patients on antiretroviral therapy. Soft comput 2020. [DOI: 10.1007/s00500-020-05024-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
10
Zhang Y, Qiu Y, Cui Y, Liu S, Zhang W. Predicting drug-drug interactions using multi-modal deep auto-encoders based network embedding and positive-unlabeled learning. Methods 2020; 179:37-46. [DOI: 10.1016/j.ymeth.2020.05.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/09/2020] [Revised: 05/06/2020] [Accepted: 05/13/2020] [Indexed: 12/21/2022]
11
Lan C, Chandrasekaran SN, Huan J. On the Unreported-Profile-is-Negative Assumption for Predictive Cheminformatics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1352-1363. [PMID: 31056508 DOI: 10.1109/tcbb.2019.2913855] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
In cheminformatics, compound-target binding profiles have been a main source of data for research. For data repositories that only provide positive profiles, a popular assumption is that unreported profiles are all negative. In this paper, we caution the audience not to take this assumption for granted, and present empirical evidence of its ineffectiveness from a machine learning perspective. Our examination is based on a setting where binding profiles are used as features to train predictive models; we show (1) prediction performance degrades when the assumption fails and (2) explicit recovery of unreported profiles improves prediction performance. In particular, we propose a framework that jointly recovers profiles and learns a predictive model, and show it achieves further performance improvement. The presented study not only suggests applying matrix recovery methods to recover unreported profiles, but also initiates a new missing feature problem which we call Learning with Positive and Unknown Features.
12
Harada S, Akita H, Tsubaki M, Baba Y, Takigawa I, Yamanishi Y, Kashima H. Dual graph convolutional neural network for predicting chemical networks. BMC Bioinformatics 2020; 21:94. [PMID: 32321421 PMCID: PMC7178944 DOI: 10.1186/s12859-020-3378-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/21/2019] [Accepted: 01/20/2020] [Indexed: 12/17/2022]
Abstract
Background: Predicting interactions between chemical compounds is one of the fundamental tasks in bioinformatics and chemoinformatics, because it contributes to various applications in metabolic engineering and drug discovery. The recent rapid growth of the amount of available data has enabled applications of computational approaches such as statistical modeling and machine learning methods. Both a set of chemical interactions and chemical compound structures are represented as graphs, and various graph-based approaches including graph convolutional neural networks have been successfully applied to chemical network prediction. However, there has been no efficient method that can consider the two different types of graphs in an end-to-end manner. Results: We give a new formulation of the chemical network prediction problem as a link prediction problem in a graph of graphs (GoG) which can represent the hierarchical structure consisting of compound graphs and an inter-compound graph. We propose a new graph convolutional neural network architecture called dual graph convolutional network that learns compound representations from both the compound graphs and the inter-compound network in an end-to-end manner. Conclusions: Experiments using four chemical networks with different sparsity levels and degree distributions show that our dual graph convolution approach achieves high prediction performance in relatively dense networks, while the performance becomes inferior on extremely sparse networks.
Affiliation(s)
- Masashi Tsubaki: National Institute of Advanced Industrial Science and Technology, Tokyo, 1350064, Japan
- Ichigaku Takigawa: Hokkaido University, Hokkaido, 0600808, Japan; Riken AIP, Tokyo, 1030027, Japan
- Hisashi Kashima: Kyoto University, Kyoto, 6068501, Japan; Riken AIP, Tokyo, 1030027, Japan
13
Abstract
Currently, the development of medicines for complex diseases requires the development of combination drug therapies. It is necessary because in many cases, one drug cannot target all necessary points of intervention. For example, in cancer therapy, a physician often meets a patient having a genomic profile including more than five molecular aberrations. Drug combination therapy has been an area of interest for a while; for example, the classical work of Loewe devoted to the synergism of drugs was published in 1928, and it is still used in calculations for optimal drug combinations. More recently, over the past several years, there has been an explosion in the available information related to the properties of drugs and the biomedical parameters of patients. For the drugs, hundreds of 2D and 3D molecular descriptors are now available, while for patients, large data sets related to genetic/proteomic and metabolomic profiles are now available, as well as the more traditional data relating to the histology, history of treatments, pretreatment state of the organism, etc. Moreover, during disease progression, the genetic profile can change. Thus, the ability to optimize drug combinations for each patient is rapidly moving beyond the comprehension and capabilities of an individual physician. This is why biomedical informatics methods have been developed, and one of the more promising directions in this field is the application of artificial intelligence (AI). In this review, we discuss several AI methods that have been successfully implemented in several instances of combination drug therapy, ranging from HIV, hypertension and infectious diseases to cancer. The data clearly show that the combination of rule-based expert systems with machine learning algorithms may be a promising direction in this field.
14
Zheng Y, Peng H, Zhang X, Zhao Z, Gao X, Li J. DDI-PULearn: a positive-unlabeled learning method for large-scale prediction of drug-drug interactions. BMC Bioinformatics 2019; 20:661. [PMID: 31870276 PMCID: PMC6929327 DOI: 10.1186/s12859-019-3214-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/08/2019] [Accepted: 11/12/2019] [Indexed: 11/10/2022]
Abstract
Background: Drug-drug interactions (DDIs) are a major concern in patients’ medication. It is infeasible to identify all potential DDIs using experimental methods, which are time-consuming and expensive. Computational methods provide an effective strategy but face challenges due to the lack of experimentally verified negative samples. Results: To address this problem, we propose a novel positive-unlabeled learning method named DDI-PULearn for large-scale drug-drug-interaction predictions. DDI-PULearn first generates seeds of reliable negatives via OCSVM (one-class support vector machine) under a high-recall constraint and via the cosine-similarity based KNN (k-nearest neighbors) as well. Then, trained with all the labeled positives (i.e., the validated DDIs) and the generated seed negatives, DDI-PULearn employs an iterative SVM to identify the entire set of reliable negatives from the unlabeled samples (i.e., the unobserved DDIs). Following that, DDI-PULearn represents all the labeled positives and the identified negatives as vectors of abundant drug properties by a similarity-based method. Finally, DDI-PULearn transforms these vectors into a lower-dimensional space via PCA (principal component analysis) and utilizes the compressed vectors as input for binary classifications. The performance of DDI-PULearn is evaluated on simulated prediction for 149,878 possible interactions between 548 drugs, in comparison with two baseline methods and five state-of-the-art methods. Related experiment results show that the proposed method for the representation of DDIs characterizes them accurately. DDI-PULearn achieves superior performance owing to the identified reliable negatives, outperforming all other methods significantly. In addition, the predicted novel DDIs suggest that DDI-PULearn is capable of identifying novel DDIs. Conclusions: The results demonstrate that positive-unlabeled learning paves a new way to tackle the problem caused by the lack of experimentally verified negatives in the computational prediction of DDIs.
Affiliation(s)
- Yi Zheng: Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007, Australia
- Hui Peng: Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007, Australia
- Xiaocai Zhang: Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007, Australia
- Zhixun Zhao: Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007, Australia
- Xiaoying Gao: School of Engineering and Computer Science, Victoria University of Wellington, Cotton Building, Kelburn Campus, Wellington, 6140, New Zealand
- Jinyan Li: Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway Ultimo, Sydney, 2007, Australia
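Two points in the abstract of entry 14 lend themselves to a small worked example. First, 149,878 is exactly the number of unordered pairs among 548 drugs. Second, the idea of seeding reliable negatives by similarity can be sketched as picking the unlabeled vectors that are least similar to every labeled positive. This is a simplified stand-in for the similarity-based part of the seed step only; it does not reproduce the OCSVM, the iterative SVM, or the PCA stages, and the function names and toy vectors are hypothetical.

```python
from math import comb, sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def seed_reliable_negatives(positives, unlabeled, n_seeds):
    """Return indices of the n_seeds unlabeled vectors least similar to any labeled positive."""
    nearest = [max(cosine(u, p) for p in positives) for u in unlabeled]
    order = sorted(range(len(unlabeled)), key=lambda i: nearest[i])
    return order[:n_seeds]


# Number of unordered pairs among 548 drugs, as quoted in the abstract.
n_pairs = comb(548, 2)

# Toy feature vectors for labeled positives and unlabeled candidate pairs.
positives = [[1.0, 0.0], [0.9, 0.1]]
unlabeled = [[1.0, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.2]]
seeds = seed_reliable_negatives(positives, unlabeled, n_seeds=2)
```

The two selected seeds are the candidates pointing away from both positives. In a full two-step PU pipeline these seeds would then be used to train a classifier that iteratively enlarges the negative set.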
15
Celebi R, Uyar H, Yasar E, Gumus O, Dikenelli O, Dumontier M. Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction in realistic settings. BMC Bioinformatics 2019; 20:726. [PMID: 31852427 PMCID: PMC6921491 DOI: 10.1186/s12859-019-3284-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 02/02/2019] [Accepted: 11/19/2019] [Indexed: 01/23/2023]
Abstract
BACKGROUND Current approaches to identifying drug-drug interactions (DDIs), which include safety studies during drug development and post-marketing surveillance after approval, offer important opportunities to identify potential safety issues but are unable to provide a complete set of all possible DDIs. Thus, drug discovery researchers and healthcare professionals might not be fully aware of potentially dangerous DDIs. Predicting potential drug-drug interactions helps reduce unanticipated drug interactions and drug development costs and optimizes the drug design process. Methods for predicting DDIs tend to report high accuracy but still have little impact on translational research because of systematic biases induced by networked/paired data. In this work, we aimed to present realistic evaluation settings for predicting DDIs using knowledge graph embeddings. We propose a simple disjoint cross-validation scheme to evaluate drug-drug interaction predictions for scenarios where the drugs have no known DDIs. RESULTS We designed different evaluation settings to accurately assess the performance for predicting DDIs. The settings for disjoint cross-validation produced lower performance scores, as expected, but still were good at predicting the drug interactions. We applied Logistic Regression, Naive Bayes and Random Forest to the DrugBank knowledge graph with traditional 10-fold cross-validation using RDF2Vec, TransE and TransD. RDF2Vec with Skip-Gram generally surpasses the other embedding methods. We also tested RDF2Vec on various drug knowledge graphs such as DrugBank, PharmGKB and KEGG to predict unknown drug-drug interactions. The performance was not enhanced significantly when an integrated knowledge graph including these three datasets was used. CONCLUSION We showed that the knowledge embeddings are powerful predictors and comparable to current state-of-the-art methods for inferring new DDIs. We addressed the evaluation biases by introducing drug-wise and pairwise disjoint test classes. Although the performance scores for drug-wise and pairwise disjoint seem to be low, the results can be considered to be realistic in predicting the interactions for drugs with limited interaction information.
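The disjoint evaluation idea in this abstract can be illustrated with a small sketch. It is not the authors' code: the function name `disjoint_folds` and the exact split rule are assumptions, reading "drug-wise disjoint" as test pairs in which one drug is absent from training and "pairwise disjoint" as test pairs in which both drugs are absent.

```python
import random


def disjoint_folds(pairs, k=3, seed=0):
    """Yield (train, one_unseen, both_unseen) splits of drug pairs.

    Hypothetical helper: drugs (not pairs) are partitioned into k folds.
    For each fold of held-out drugs:
      train       -> pairs where neither drug is held out
      one_unseen  -> pairs with exactly one held-out drug (assumed "drug-wise disjoint")
      both_unseen -> pairs where both drugs are held out (assumed "pairwise disjoint")
    """
    drugs = sorted({d for pair in pairs for d in pair})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    # Round-robin assignment of shuffled drugs to k folds.
    folds = [set(drugs[i::k]) for i in range(k)]
    for held_out in folds:
        train = [p for p in pairs if p[0] not in held_out and p[1] not in held_out]
        one_unseen = [p for p in pairs if (p[0] in held_out) != (p[1] in held_out)]
        both_unseen = [p for p in pairs if p[0] in held_out and p[1] in held_out]
        yield train, one_unseen, both_unseen
```

Because the split is made over drugs rather than over pairs, no held-out drug ever appears in a training pair, which is the source of the lower but more realistic scores the abstract reports.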
Affiliation(s)
- Remzi Celebi
- Institute of Data Science, Maastricht University, Maastricht, 6200, Netherlands.
| | - Huseyin Uyar
- Computer Engineering Department, Ege University, Izmir, 35100, Turkey
| | - Erkan Yasar
- Computer Engineering Department, Ege University, Izmir, 35100, Turkey
| | - Ozgur Gumus
- Computer Engineering Department, Ege University, Izmir, 35100, Turkey
| | - Oguz Dikenelli
- Computer Engineering Department, Ege University, Izmir, 35100, Turkey
| | - Michel Dumontier
- Institute of Data Science, Maastricht University, Maastricht, 6200, Netherlands
|
16
|
Li F, Wang Y, Li C, Marquez-Lago TT, Leier A, Rawlings ND, Haffari G, Revote J, Akutsu T, Chou KC, Purcell AW, Pike RN, Webb GI, Ian Smith A, Lithgow T, Daly RJ, Whisstock JC, Song J. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform 2019; 20:2150-2166. [PMID: 30184176 PMCID: PMC6954447 DOI: 10.1093/bib/bby077] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 06/14/2018] [Revised: 07/26/2018] [Accepted: 08/01/2018] [Indexed: 01/06/2023]
Abstract
The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. 
We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.
Affiliation(s)
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Yanan Wang
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich 8093, Switzerland
| | - Tatiana T Marquez-Lago
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - André Leier
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Gholamreza Haffari
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Jerico Revote
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Anthony W Purcell
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Robert N Pike
- La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - A Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Trevor Lithgow
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria 3800, Australia
| | - Roger J Daly
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - James C Whisstock
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
|
17
|
Zhang W, Jing K, Huang F, Chen Y, Li B, Li J, Gong J. SFLLN: A sparse feature learning ensemble method with linear neighborhood regularization for predicting drug–drug interactions. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.05.017] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/16/2022]
|
18
|
Foong R, Ang KK, Zhang Z, Quek C. An iterative cross-subject negative-unlabeled learning algorithm for quantifying passive fatigue. J Neural Eng 2019; 16:056013. [PMID: 31141797 DOI: 10.1088/1741-2552/ab255d] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/07/2023]
Abstract
OBJECTIVE This paper proposes an iterative negative-unlabeled (NU) learning algorithm for cross-subject detection of passive fatigue from labelled alert (negative) and unlabeled driving EEG data. APPROACH Unlike other studies which used manual labeling of the fatigue state, the proposed algorithm (PA) first iteratively uses 29 subjects' alert data and unlabeled driving data to identify the most fatigued block of EEG data in each subject in a cross-subject manner. Subsequently, the PA computes subjects' driving fatigue score. Repeated measures correlations of the score to EEG band powers are then performed. MAIN RESULTS The PA yields an averaged accuracy of 93.77% ± 8.15% across subjects in detecting fatigue, which is significantly better than the various baselines. The fatigue scores obtained are also significantly positively correlated with theta band power and negatively correlated with beta band power that are known to respectively increase and decrease in presence of passive fatigue. There is a strong negative correlation with alpha band power as well. SIGNIFICANCE The proposed iterative NU learning algorithm is capable of labelling and quantifying the most fatigued block in a cross-subject manner despite the lack of ground truth in the fatigue levels of unlabeled driving EEG data. Together with the significant correlations with theta, alpha and beta band power, the results show promise in the application of the proposed algorithm to detect fatigue from EEG.
Affiliation(s)
- Ruyi Foong
- Neural and Biomedical Technology, Institute for Infocomm Research, Singapore. School of Computer Science and Engineering, Nanyang Technological University, Singapore
|
19
|
Liu N, Chen CB, Kumara S. Semi-Supervised Learning Algorithm for Identifying High-Priority Drug-Drug Interactions Through Adverse Event Reports. IEEE J Biomed Health Inform 2019; 24:57-68. [PMID: 31395567 DOI: 10.1109/jbhi.2019.2932740] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/07/2022]
Abstract
Identifying drug-drug interactions (DDIs) is a critical enabler for reducing adverse drug events and improving patient safety. Generating proper DDI alerts during prescribing workflow has the potential to prevent DDI-related adverse events. However, the implementation of DDI alerting system remains a challenge as users are experiencing alert overload which causes alert fatigue. One strategy to optimize the current system is to establish a list of high-priority DDIs for alerting purposes, though it is a resource-intensive task. In this study, we propose a machine learning framework to extract useful features from the FDA adverse event reports and then identify potential high-priority DDIs using an autoencoder-based semi-supervised learning algorithm. The experimental results demonstrate the effectiveness of using adverse event feature representations in differentiating high- and low-priority DDIs. Additionally, the proposed algorithm utilizes stacked autoencoders and weighted support vector machine for boosting classification performance, which outperforms other competing methods in terms of F-measure and AUC score. This framework integrates multiple information sources, leverages domain knowledge and clinical evidence, and provides a practical approach for pre-screening high-priority DDI candidates for medication alerts.
|
20
|
Qian S, Liang S, Yu H. Leveraging genetic interactions for adverse drug-drug interaction prediction. PLoS Comput Biol 2019; 15:e1007068. [PMID: 31125330 PMCID: PMC6553795 DOI: 10.1371/journal.pcbi.1007068] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/04/2019] [Revised: 06/06/2019] [Accepted: 05/03/2019] [Indexed: 12/20/2022]
Abstract
In light of increased co-prescription of multiple drugs, the ability to discern and predict drug-drug interactions (DDI) has become crucial to guarantee the safety of patients undergoing treatment with multiple drugs. However, information on DDI profiles is incomplete and the experimental determination of DDIs is labor-intensive and time-consuming. Although previous studies have explored various feature spaces for in silico screening of interacting drug pairs, their use of conventional cross-validation prevents them from achieving generalizable performance on drug pairs where neither drug is seen during training. Here we demonstrate for the first time that targets of adversely interacting drug pairs are significantly more likely to have synergistic genetic interactions than those of non-interacting drug pairs. Leveraging genetic interaction features and a novel training scheme, we construct a gradient boosting-based classifier that achieves robust DDI prediction even for drugs whose interaction profiles are completely unseen during training. We demonstrate that in addition to classification power (including the prediction of 432 novel DDIs), our genetic interaction approach offers interpretability by providing plausible mechanistic insights into the mode of action of DDIs.
Adverse drug-drug interactions are adverse side effects caused by taking two or more drugs together. As co-prescription of multiple drugs becomes an increasingly prevalent practice, affecting 42.2% of Americans over 65 years old, adverse drug-drug interactions have become a serious safety concern, accounting for over 74,000 emergency room visits and 195,000 hospitalizations each year in the United States alone. Since experimental determination of adverse drug-drug interactions is labor-intensive and time-consuming, various machine learning-based computational approaches have been developed for predicting drug-drug interactions. Considering that drugs take effect by binding to and modulating the function of their targets, we have explored whether drug-drug interactions can be predicted from the genetic interaction between the gene targets of two drugs, which characterizes the unexpected fitness effect when two genes are simultaneously knocked out. Furthermore, we have built a fast and robust classifier that achieves accurate prediction of adverse drug-drug interactions by incorporating genetic interaction and several other types of widely used features. Our analyses suggest that genetic interaction is an important feature for our prediction model, and that it provides mechanistic insight into the mode of action of drugs leading to drug-drug interactions.
Affiliation(s)
- Sheng Qian
- Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, United States of America
| | - Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, United States of America
| | - Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, United States of America
|
21
|
Frey NC, Wang J, Vega Bellido GI, Anasori B, Gogotsi Y, Shenoy VB. Prediction of Synthesis of 2D Metal Carbides and Nitrides (MXenes) and Their Precursors with Positive and Unlabeled Machine Learning. ACS NANO 2019; 13:3031-3041. [PMID: 30830760 DOI: 10.1021/acsnano.8b08014] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Indexed: 05/06/2023]
Abstract
Growing interest in the potential applications of two-dimensional (2D) materials has fueled advancement in the identification of 2D systems with exotic properties. Increasingly, the bottleneck in this field is the synthesis of these materials. Although theoretical calculations have predicted a myriad of promising 2D materials, only a few dozen have been experimentally realized since the initial discovery of graphene. Here, we adapt the state-of-the-art positive and unlabeled (PU) machine learning framework to predict which theoretically proposed 2D materials have the highest likelihood of being successfully synthesized. Using elemental information and data from high-throughput density functional theory calculations, we apply the PU learning method to the MXene family of 2D transition metal carbides, carbonitrides, and nitrides, and their layered precursor MAX phases, and identify 18 MXene compounds that are highly promising candidates for synthesis. By considering both the MXenes and their precursors, we further propose 20 synthesizable MAX phases that can be chemically exfoliated to produce MXenes.
Affiliation(s)
- Nathan C Frey
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
| | - Jin Wang
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
| | - Gabriel Iván Vega Bellido
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Department of Chemical Engineering, University of Puerto Rico at Mayagüez, Mayagüez 00681, Puerto Rico
| | - Babak Anasori
- Department of Materials Science and Engineering and A.J. Drexel Nanomaterials Institute, Drexel University, Philadelphia, Pennsylvania 19104, United States
| | - Yury Gogotsi
- Department of Materials Science and Engineering and A.J. Drexel Nanomaterials Institute, Drexel University, Philadelphia, Pennsylvania 19104, United States
| | - Vivek B Shenoy
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
|
22
|
Li F, Zhang Y, Purcell AW, Webb GI, Chou KC, Lithgow T, Li C, Song J. Positive-unlabelled learning of glycosylation sites in the human proteome. BMC Bioinformatics 2019; 20:112. [PMID: 30841845 PMCID: PMC6404354 DOI: 10.1186/s12859-019-2700-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 07/05/2018] [Accepted: 02/22/2019] [Indexed: 11/30/2022]
Abstract
BACKGROUND As an important type of post-translational modification (PTM), protein glycosylation plays a crucial role in protein stability and protein function. The abundance and ubiquity of protein glycosylation across three domains of life involving Eukarya, Bacteria and Archaea demonstrate its roles in regulating a variety of signalling and metabolic pathways. Mutations on and in the proximity of glycosylation sites are highly associated with human diseases. Accordingly, accurate prediction of glycosylation can complement laboratory-based methods and greatly benefit experimental efforts for characterization and understanding of functional roles of glycosylation. For this purpose, a number of supervised-learning approaches have been proposed to identify glycosylation sites, demonstrating a promising predictive performance. To train a conventional supervised-learning model, both reliable positive and negative samples are required. However, in practice, a large portion of negative samples (i.e. non-glycosylation sites) are mislabelled due to the limitation of current experimental technologies. Moreover, supervised algorithms often fail to take advantage of large volumes of unlabelled data, which can aid in model learning in conjunction with positive samples (i.e. experimentally verified glycosylation sites). RESULTS In this study, we propose a positive unlabelled (PU) learning-based method, PA2DE (V2.0), based on the AlphaMax algorithm for protein glycosylation site prediction. The predictive performance of this proposed method was evaluated by a range of glycosylation data collected over a ten-year period based on an interval of three years. Experiments using both benchmarking and independent tests show that our method outperformed the representative supervised-learning algorithms (including support vector machines and random forests) and one-class learners, as well as currently available prediction methods in terms of F1 score, accuracy and AUC measures. 
In addition, we developed an online web server as an implementation of the optimized model (available at http://glycomine.erc.monash.edu/Lab/GlycoMine_PU/ ) to facilitate community-wide efforts for accurate prediction of protein glycosylation sites. CONCLUSION The proposed PU learning approach achieved a competitive predictive performance compared with currently available methods. This PU learning schema may also be effectively employed and applied to address the prediction problems of other important types of protein PTM site and functional sites.
Affiliation(s)
- Fuyi Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
| | - Yang Zhang
- College of Information Engineering, Northwest A and F University, Yangling, 712100 Shaanxi China
| | - Anthony W. Purcell
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
| | - Geoffrey I. Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478 USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054 China
| | - Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800 Australia
| | - Chen Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
| | - Jiangning Song
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
|
23
|
Duran‐Frigola M, Fernández‐Torras A, Bertoni M, Aloy P. Formatting biological big data for modern machine learning in drug discovery. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2018. [DOI: 10.1002/wcms.1408] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Affiliation(s)
- Miquel Duran‐Frigola
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Adrià Fernández‐Torras
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Martino Bertoni
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Patrick Aloy
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) Barcelona Spain
|
24
|
Song D, Chen Y, Min Q, Sun Q, Ye K, Zhou C, Yuan S, Sun Z, Liao J. Similarity-based machine learning support vector machine predictor of drug-drug interactions with improved accuracies. J Clin Pharm Ther 2018; 44:268-275. [PMID: 30565313 DOI: 10.1111/jcpt.12786] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/29/2018] [Revised: 10/29/2018] [Accepted: 11/18/2018] [Indexed: 12/12/2022]
Affiliation(s)
- Dalong Song
- Guizhou University; Guiyang China
- Department of Urology; GuiZhou Provincial People’s Hospital; Guiyang China
| | - Yao Chen
- School of Science; China Pharmaceutical University; Nanjing China
| | - Qian Min
- School of Science; China Pharmaceutical University; Nanjing China
| | - Qingrong Sun
- School of Science; China Pharmaceutical University; Nanjing China
| | - Kai Ye
- MandalaT Software Corporation, F5; Wuxi China
| | - Changjiang Zhou
- School of Science; China Pharmaceutical University; Nanjing China
| | - Shengyue Yuan
- School of Science; China Pharmaceutical University; Nanjing China
| | - Zhaolin Sun
- Department of Urology; GuiZhou Provincial People’s Hospital; Guiyang China
| | - Jun Liao
- School of Science; China Pharmaceutical University; Nanjing China
- Key Laboratory of Drug Quality Control and Pharmacovigilance (China Pharmaceutical University); Ministry of Education; Nanjing China
|
25
|
Kastrin A, Ferk P, Leskošek B. Predicting potential drug-drug interactions on topological and semantic similarity features using statistical learning. PLoS One 2018; 13:e0196865. [PMID: 29738537 PMCID: PMC5940181 DOI: 10.1371/journal.pone.0196865] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 10/06/2017] [Accepted: 04/20/2018] [Indexed: 01/03/2023]
Abstract
A drug-drug interaction (DDI) is a change in the effect of a drug when a patient takes another drug. Characterizing DDIs is extremely important to avoid potential adverse drug reactions. We represent DDIs as a complex network in which nodes refer to drugs and links refer to their potential interactions. Recently, the problem of link prediction has attracted much attention in the scientific community. We represent the process of link prediction as a binary classification task on networks of potential DDIs. We use link prediction techniques for predicting unknown interactions between drugs in five arbitrarily chosen large-scale DDI databases, namely DrugBank, KEGG, NDF-RT, SemMedDB, and Twosides. We estimated the performance of link prediction using a series of experiments on DDI networks. We performed link prediction using unsupervised and supervised approaches, the latter including classification tree, k-nearest neighbors, support vector machine, random forest, and gradient boosting machine classifiers based on topological and semantic similarity features. The supervised approach clearly outperforms the unsupervised approach. The Twosides network achieved the best prediction performance in terms of the area under the precision-recall curve (0.93 for both random forests and gradient boosting machine). The applied methodology can be used as a tool to help researchers identify potential DDIs. The supervised link prediction approach proved to be promising for predicting potential DDIs and may facilitate their identification in clinical research.
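To make the link prediction framing concrete, the following is a minimal sketch of an unsupervised, topology-only scorer. The abstract does not name the specific similarity features used, so the Jaccard coefficient over shared neighbors is an assumed stand-in, and the helper names (`build_adjacency`, `jaccard_score`, `rank_candidate_links`) are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {drug: set(neighbor drugs)} from known DDIs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def jaccard_score(adj, u, v):
    """Jaccard coefficient of the neighbor sets of u and v (0.0 if both are empty)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def rank_candidate_links(edges):
    """Rank all currently unlinked drug pairs by Jaccard score, highest first."""
    adj = build_adjacency(edges)
    nodes = sorted(adj)
    candidates = [
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if v not in adj[u]
    ]
    return sorted(candidates, key=lambda p: jaccard_score(adj, *p), reverse=True)
```

In a supervised variant, scores such as this one would become input features for a classifier trained on known and absent links, rather than being used directly as a ranking.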
Affiliation(s)
- Andrej Kastrin
- Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
| | - Polonca Ferk
- Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
| | - Brane Leskošek
- Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
|
26
|
Zhu Y, Elemento O, Pathak J, Wang F. Drug knowledge bases and their applications in biomedical informatics research. Brief Bioinform 2018; 20:1308-1321. [DOI: 10.1093/bib/bbx169] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/08/2017] [Revised: 11/15/2017] [Indexed: 11/14/2022]
Abstract
Recent advances in biomedical research have generated a large volume of drug-related data. To effectively handle this flood of data, many initiatives have been taken to help researchers make good use of them. As the results of these initiatives, many drug knowledge bases have been constructed. They range from simple ones with specific focuses to comprehensive ones that contain information on almost every aspect of a drug. These curated drug knowledge bases have made significant contributions to the development of efficient and effective health information technologies for better health-care service delivery. Understanding and comparing existing drug knowledge bases and how they are applied in various biomedical studies will help us recognize the state of the art and design better knowledge bases in the future. In addition, researchers can get insights on novel applications of the drug knowledge bases through a review of successful use cases. In this study, we provide a review of existing popular drug knowledge bases and their applications in drug-related studies. We discuss challenges in constructing and using drug knowledge bases as well as future research directions toward a better ecosystem of drug knowledge bases.
|