1
Picard M, Scott-Boyer MP, Bodein A, Leclercq M, Prunier J, Périn O, Droit A. Target repositioning using multi-layer networks and machine learning: The case of prostate cancer. Comput Struct Biotechnol J 2024; 24:464-475. [PMID: 38983753 PMCID: PMC11231507 DOI: 10.1016/j.csbj.2024.06.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 06/10/2024] [Accepted: 06/12/2024] [Indexed: 07/11/2024] Open
Abstract
The discovery of novel therapeutic targets, defined as proteins which drugs can interact with to induce therapeutic benefits, typically represents the first and most important step of drug discovery. One solution for target discovery is target repositioning, a strategy which relies on the repurposing of known targets for new diseases, leading to new treatments, fewer side effects and potential drug synergies. Biological networks have emerged as powerful tools for integrating heterogeneous data and facilitating the prediction of biological or therapeutic properties. Consequently, they are widely employed to predict new therapeutic targets by characterizing potential candidates, often based on their interactions within a Protein-Protein Interaction (PPI) network, and their proximity to genes associated with the disease. However, over-reliance on PPI networks and the assumption that potential targets are necessarily near known genes can introduce biases that may limit the effectiveness of these methods. This study addresses these limitations in two ways. First, it exploits a multi-layer network which incorporates additional information such as gene regulation, metabolite interactions, metabolic pathways, and several disease signatures such as Differentially Expressed Genes, mutated genes, Copy Number Alteration, and structural variants. Second, it extracts relevant features from the network using several approaches including proximity to disease-associated genes, but also unbiased approaches such as propagation-based methods, topological metrics, and module detection algorithms. Using prostate cancer as a case study, the best features were identified and utilized to train machine learning algorithms to predict 5 novel promising therapeutic targets for prostate cancer: IGF2R, C5AR, RAB7, SETD2 and NPBWR1.
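The propagation-based features mentioned in this abstract can be illustrated with a random walk with restart, one common network-propagation technique. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the toy graph, the node names, the restart probability of 0.3 and the function name `random_walk_with_restart` are all hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping each node to a list of its neighbours (undirected).
    seeds: set of nodes the walker restarts from (e.g. disease-associated genes).
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: send its remaining mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            # Otherwise the walker moves to a uniformly chosen neighbour.
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path-shaped network A - B - C - D with a single seed gene "A".
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_graph, seeds={"A"})
```

In this toy example the score decreases with distance from the seed, so it can serve as one per-gene feature alongside topological metrics; a real multi-layer network would need a rule for moving between layers, which is not shown here.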
Affiliation(s)
- Milan Picard
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Mickaël Leclercq
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Julien Prunier
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
  - Digital Transformation and Innovation Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
2
Liu JX, Zhang X, Huang YQ, Hao GF, Yang GF. Multi-level bioinformatics resources support drug target discovery of protein-protein interactions. Drug Discov Today 2024; 29:103979. [PMID: 38608830 DOI: 10.1016/j.drudis.2024.103979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/14/2024] [Accepted: 04/05/2024] [Indexed: 04/14/2024]
Abstract
Drug discovery often begins with a new target. Protein-protein interactions (PPIs) are crucial to multitudinous cellular processes and offer a promising avenue for drug-target discovery. PPIs are characterized by multi-level complexity: at the protein level, interaction networks can be used to identify potential targets, whereas at the residue level, the details of the interactions of individual PPIs can be used to examine a target's druggability. Considerable progress has been made in target discovery through multi-level PPI-related computational approaches, but these resources have not been fully discussed. Here, we systematically survey bioinformatics tools for identifying and assessing potential drug targets, examining their characteristics, limitations and applications. This work will aid the integration of the broader protein-to-network context with the analysis of detailed binding mechanisms to support the discovery of drug targets.
Affiliation(s)
- Jia-Xin Liu
  - National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Xiao Zhang
  - State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Yuan-Qin Huang
  - State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Ge-Fei Hao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
  - State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Guang-Fu Yang
  - National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
3
Cunningham M, Pins D, Dezső Z, Torrent M, Vasanthakumar A, Pandey A. PINNED: identifying characteristics of druggable human proteins using an interpretable neural network. J Cheminform 2023; 15:64. [PMID: 37468968 PMCID: PMC10354961 DOI: 10.1186/s13321-023-00735-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 07/10/2023] [Indexed: 07/21/2023] Open
Abstract
The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features which distinguish between "druggable" and "undruggable" proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein-protein interaction network are all important discriminant factors. However, many prior efforts to automate the assessment of protein druggability suffer from low performance or poor interpretability. We developed a neural network-based machine learning model capable of generating druggability sub-scores based on each of four distinct categories, combining them to form an overall druggability score. The model achieves an excellent performance in separating drugged and undrugged proteins in the human proteome, with an area under the receiver operating characteristic curve (AUC) of 0.95. Our use of multiple sub-scores allows the assessment of potential protein targets of interest based on distinct contributors to druggability, leading to a more interpretable and holistic model to identify novel targets.
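The AUC reported in this abstract (and in several other entries of this list) can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition; the labels and scores are made-up illustration values, not data from any of the cited studies, and the function name `roc_auc` is an assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 1 (e.g. drugged protein) or 0 (undrugged protein).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for three positives followed by three negatives.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(example_labels, example_scores)  # 8 of 9 pairs are ordered correctly
```

The quadratic pairwise loop is chosen for clarity; for large data sets a rank-based computation gives the same value more efficiently.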
Affiliation(s)
- Michael Cunningham
  - Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Danielle Pins
  - Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Zoltán Dezső
  - Genomics Research Center, AbbVie Inc., 1000 Gateway Boulevard, South San Francisco, CA, 94080, USA
- Maricel Torrent
  - Small Molecule Therapeutics and Platform Technologies, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Aparna Vasanthakumar
  - Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Abhishek Pandey
  - Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
4
Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023] Open
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data unveil the proteins involved in these functions. Also, properties that favor the existence of binding between drug and target are deducible from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, the proteins’ amino-acid sequence BERT embeddings, and the integrated features to train and test the DL classifiers separately. The models achieved high prediction performances in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. Also, OncoRTT outperformed the state-of-the-art method using their data in five out of seven cancer types commonly assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
Affiliation(s)
- Maha A. Thafar
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
- Somayah Albaradei
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Mahmut Uludag
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Mona Alshahrani
  - National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Takashi Gojobori
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Magbubah Essack
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - *Correspondence: Xin Gao; Magbubah Essack
- Xin Gao
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - *Correspondence: Xin Gao; Magbubah Essack
5
Liu Z, Li H, Jin Z, Li Y, Guo F, He Y, Liu X, Qi Y, Yuan L, He F, Li D. Exploration of Target Spaces in the Human Genome for Protein and Peptide Drugs. Genomics Proteomics Bioinformatics 2022; 20:780-794. [PMID: 35338014 PMCID: PMC9881050 DOI: 10.1016/j.gpb.2021.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2020] [Revised: 10/20/2021] [Accepted: 11/01/2021] [Indexed: 01/31/2023]
Abstract
After decades of development, protein and peptide drugs have now grown into a major drug class in the marketplace. Target identification and validation are crucial for the discovery of protein and peptide drugs, and bioinformatics prediction of targets based on the characteristics of known target proteins will help improve the efficiency and success rate of target selection. However, owing to the developmental history in the pharmaceutical industry, previous systematic exploration of the target spaces has mainly focused on traditional small-molecule drugs, while studies related to protein and peptide drugs are lacking. Here, we systematically explore the target spaces in the human genome specifically for protein and peptide drugs. Compared with other proteins, both successful protein and peptide drug targets have many special characteristics, and are also significantly different from those of small-molecule drugs in many aspects. Based on these features, we develop separate effective genome-wide target prediction models for protein and peptide drugs. Finally, a user-friendly web server, Predictor Of Protein and PeptIde drugs' therapeutic Targets (POPPIT) (http://poppit.ncpsb.org.cn/), is established, which provides not only target prediction specifically for protein and peptide drugs but also abundant annotations for predicted targets.
Affiliation(s)
- Zhongyang Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
  - School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
  - College of Chemistry and Environmental Science, Hebei University, Baoding 071002, China
  - Corresponding author
- Honglei Li
  - Suzhou Geneworks Technology Co., Ltd., Suzhou 215028, China
- Zhaoyu Jin
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Yang Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Feifei Guo
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Yangzhige He
  - Department of Medical Research Center, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, Beijing 100730, China
- Xinyue Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Yaning Qi
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
  - College of Life Sciences, Hebei University, Baoding 071002, China
- Liying Yuan
  - College of Life Sciences, Hebei University, Baoding 071002, China
- Fuchu He
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
  - Corresponding author
- Dong Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
  - School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
  - Corresponding author
6
Prediction of Drug Targets for Specific Diseases Leveraging Gene Perturbation Data: A Machine Learning Approach. Pharmaceutics 2022; 14:234. [PMID: 35213968 PMCID: PMC8878225 DOI: 10.3390/pharmaceutics14020234] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/01/2021] [Revised: 01/08/2022] [Accepted: 01/14/2022] [Indexed: 12/15/2022] Open
Abstract
Identification of the correct targets is a key element for successful drug development. However, there are limited approaches for predicting drug targets for specific diseases using omics data, and few have leveraged expression profiles from gene perturbations. We present a novel computational approach for drug target discovery based on machine learning (ML) models. ML models are first trained on drug-induced expression profiles with outcomes defined as whether the drug treats the studied disease. The goal is to “learn” the expression patterns associated with treatment. Then, the fitted ML models were applied to expression profiles from gene perturbations (overexpression (OE)/knockdown (KD)). We prioritized targets based on predicted probabilities from the ML model, which reflects treatment potential. The methodology was applied to predict targets for hypertension, diabetes mellitus (DM), rheumatoid arthritis (RA), and schizophrenia (SCZ). We validated our approach by evaluating whether the identified targets may ‘re-discover’ known drug targets from an external database (OpenTargets). Indeed, we found evidence of significant enrichment across all diseases under study. A further literature search revealed that many candidates were supported by previous studies. For example, we predicted PSMB8 inhibition to be associated with the treatment of RA, which was supported by a study showing that PSMB8 inhibitors (PR-957) ameliorated experimental RA in mice. In conclusion, we propose a new ML approach to integrate the expression profiles from drugs and gene perturbations and validated the framework. Our approach is flexible and may provide an independent source of information when prioritizing drug targets.
7
Yu L, Xue L, Liu F, Li Y, Jing R, Luo J. The applications of deep learning algorithms on in silico druggable proteins identification. J Adv Res 2022; 41:219-231. [PMID: 36328750 PMCID: PMC9637576 DOI: 10.1016/j.jare.2022.01.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/26/2021] [Revised: 12/21/2021] [Accepted: 01/18/2022] [Indexed: 11/20/2022] Open
Abstract
We developed the first deep learning-based druggable protein classifier for fast and accurate identification of potential druggable proteins. Experimental results on a standard dataset demonstrate that the prediction performance of deep learning model is comparable to those of existing methods. We visualized the representations of druggable proteins learned by deep learning models, which helps us understand how they work. Our analysis reconfirms that the attention mechanism is especially useful for explaining deep learning models.
Introduction:
The top priority in drug development is to identify novel and effective drug targets. In vitro assays are frequently used for this purpose; however, traditional experimental approaches are insufficient for large-scale exploration of novel drug targets, as they are expensive, time-consuming and laborious. Therefore, computational methods have emerged in recent decades as an alternative to aid experimental drug discovery studies by developing sophisticated predictive models to estimate unknown drugs/compounds and their targets. The recent success of deep learning (DL) techniques in machine learning and artificial intelligence has further attracted a great deal of attention in the biomedicine field, including computational drug discovery.
Objectives:
This study focuses on the practical applications of deep learning algorithms for predicting druggable proteins and proposes a powerful predictor for fast and accurate identification of potential drug targets.
Methods:
Using a gold-standard dataset, we explored several typical protein features and different deep learning algorithms and evaluated their performance in a comprehensive way. We provide an overview of the entire experimental process, including protein features and descriptors, neural network architectures, libraries and toolkits for deep learning modelling, performance evaluation metrics, model interpretation and visualization.
Results:
Experimental results show that the hybrid model (architecture: CNN-RNN (BiLSTM) + DNN; feature: dictionary encoding + DC_TC_CTD) performed better than the other models on the benchmark dataset. This hybrid model was able to achieve 90.0% accuracy and 0.800 MCC on the test dataset and 84.8% and 0.703 on a nonredundant independent test dataset, which is comparable to those of existing methods.
Conclusion:
We developed the first deep learning-based classifier for fast and accurate identification of potential druggable proteins. We hope that this study will be helpful for future researchers who would like to use deep learning techniques to develop relevant predictive models.
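The abstract reports both accuracy and the Matthews correlation coefficient (MCC). As a reminder of how the two relate, the sketch below computes both from a confusion matrix. The counts are hypothetical and were chosen only to show that, on a balanced test set with symmetric errors, 90% accuracy corresponds to an MCC of 0.8; they are not the confusion matrix of the cited study, and the function name is an assumption.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical balanced test set: 100 druggable and 100 non-druggable proteins,
# with 10 errors in each class.
tp, fn = 90, 10
tn, fp = 90, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so the two metrics diverge once the classes are imbalanced or the errors are asymmetric.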
8
Abedi M, Marateb HR, Mohebian MR, Aghaee-Bakhtiari SH, Nassiri SM, Gheisari Y. Systems biology and machine learning approaches identify drug targets in diabetic nephropathy. Sci Rep 2021; 11:23452. [PMID: 34873190 PMCID: PMC8648918 DOI: 10.1038/s41598-021-02282-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2021] [Accepted: 11/12/2021] [Indexed: 11/15/2022] Open
Abstract
Diabetic nephropathy (DN), the leading cause of end-stage renal disease, has become a massive global health burden. Despite considerable efforts, the underlying mechanisms have not yet been comprehensively understood. In this study, a systematic approach was utilized to identify the microRNA signature in DN and to introduce novel drug targets (DTs) in DN. Using microarray profiling followed by qPCR confirmation, 13 and 6 differentially expressed (DE) microRNAs were identified in the kidney cortex and medulla, respectively. The microRNA-target interaction networks for each anatomical compartment were constructed and central nodes were identified. Moreover, enrichment analysis was performed to identify key signaling pathways. To develop a strategy for DT prediction, the human proteome was annotated with 65 biochemical characteristics and 23 network topology parameters. Furthermore, all proteins targeted by at least one FDA-approved drug were identified. Next, mGMDH-AFS, a high-performance machine learning algorithm capable of tolerating massively imbalanced class sizes, was developed to classify DT and non-DT proteins. The sensitivity, specificity, accuracy, and precision of the proposed method were 90%, 86%, 88%, and 89%, respectively. Moreover, it significantly outperformed the state-of-the-art (P-value ≤ 0.05) and showed very good diagnostic accuracy and high agreement between predicted and observed class labels. The cortex and medulla networks were then analyzed with this validated machine to identify potential DTs. Among the high-ranking DT candidates are Egfr, Prkce, Clic5, Kit, and Agtr1a, the last of which is a well-known current target in DN. In conclusion, a combination of experimental and computational approaches was exploited to provide a holistic insight into the disorder for introducing novel therapeutic targets.
Affiliation(s)
- Maryam Abedi
  - Regenerative Medicine Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
- Hamid Reza Marateb
  - Biomedical Engineering Department, Engineering Faculty, University of Isfahan, Isfahan, Iran
  - Department of Automatic Control, Biomedical Engineering Research Center, Universitat Politècnica de Catalunya, BarcelonaTech (UPC), Barcelona, Spain
- Mohammad Reza Mohebian
  - Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, Canada
- Seyed Hamid Aghaee-Bakhtiari
  - Bioinformatics Research Group, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Medical Biotechnology and Nanotechnology, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Seyed Mahdi Nassiri
  - Department of Clinical Pathology, Faculty of Veterinary Medicine, University of Tehran, Tehran, Iran
- Yousof Gheisari
  - Regenerative Medicine Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
  - Department of Genetics and Molecular Biology, Isfahan University of Medical Sciences, Isfahan, Iran
9
Singh N, Bhatnagar S. Machine Learning for Prediction of Drug Targets in Microbe Associated Cardiovascular Diseases by Incorporating Host-pathogen Interaction Network Parameters. Mol Inform 2021; 41:e2100115. [PMID: 34676983 DOI: 10.1002/minf.202100115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/17/2021] [Accepted: 10/01/2021] [Indexed: 12/20/2022]
Abstract
Host-pathogen interactions play a crucial role in invasion, infection, and induction of immune response in humans. In this work, four machine learning algorithms, namely Logistic regression, K-nearest neighbor, Support Vector Machine, and Random Forest were implemented for the classification of drug targets. The algorithms were trained using 3400 host and 3800 pathogen drug and non-drug target proteins as learning instances. For each protein, 68 pathogen and 73 host features were computed that included sequence, structure, biological and host-pathogen network centrality characteristics. The Random Forest classifier model achieved the best accuracy after 10-fold cross-validation: 99% accuracy, with a ROC-AUC score of 0.99±0.01, for both pathogen and host training sets. The Eigenvector Centrality of host-pathogen interactions and host-host interactions was the top feature in performing classification of pathogen and host targets respectively. Other features important for classification were the presence of catalytic and binding sites, low instability/aliphatic index, and cellular location. The Random Forest classifier was then used for prediction of drug targets involved in Microbe Associated Cardiovascular Diseases. 331 host and 743 pathogen proteins were predicted as drug targets by the random forest model and can be validated experimentally for therapeutic intervention in Microbe Associated Cardiovascular Diseases.
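Eigenvector centrality, named in this abstract as the top-ranked feature, scores a node highly when its neighbours are themselves highly scored. A minimal power-iteration sketch is given below. It is not the authors' pipeline: the toy host-pathogen network, the node names (`P1`, `H1`, ...) and the function name are hypothetical, and iterating on the adjacency matrix plus the identity is a choice made here so that the iteration also converges on bipartite graphs.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-12):
    """Eigenvector centrality of an undirected, connected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a dict of centralities normalised to unit Euclidean length.
    """
    nodes = sorted(adjacency)
    x = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Multiply by (A + I): same leading eigenvector as A, but no
        # oscillation on bipartite graphs such as host-pathogen networks.
        nxt = {n: x[n] + sum(x[m] for m in adjacency[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in nxt.values()))
        nxt = {n: v / norm for n, v in nxt.items()}
        delta = sum(abs(nxt[n] - x[n]) for n in nodes)
        x = nxt
        if delta < tol:
            break
    return x


# Hypothetical network: pathogen protein P1 binds hosts H1, H2, H3; P2 binds H3.
toy_network = {
    "P1": ["H1", "H2", "H3"],
    "P2": ["H3"],
    "H1": ["P1"],
    "H2": ["P1"],
    "H3": ["P1", "P2"],
}
centrality = eigenvector_centrality(toy_network)
```

In this toy network the hub P1 receives the highest score, and H3 outranks H1 and H2 because it has an additional interaction partner. Such per-protein values could then be fed, together with sequence and structure features, to a classifier such as a random forest.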
Affiliation(s)
- Nirupma Singh
  - Department of Biotechnology, Netaji Subhas Institute of Technology, Dwarka, New Delhi, 110078, India
- Sonika Bhatnagar
  - Department of Biotechnology, Netaji Subhas Institute of Technology, Dwarka, New Delhi, 110078, India
  - Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology Dwarka, New Delhi, 110078, India
10
Tosadori G, Di Silvestre D, Spoto F, Mauri P, Laudanna C, Scardoni G. Analysing omics data sets with weighted nodes networks (WNNets). Sci Rep 2021; 11:14447. [PMID: 34262093 PMCID: PMC8280138 DOI: 10.1038/s41598-021-93699-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2021] [Accepted: 06/16/2021] [Indexed: 11/30/2022] Open
Abstract
Current trends in biomedical research indicate data integration as a fundamental step towards precision medicine. In this context, network models allow representing and analysing complex biological processes. However, although effective in unveiling network properties, these models fail to consider the individual, biochemical variations occurring at molecular level. As a consequence, the analysis of these models partially loses its predictive power. To overcome these limitations, Weighted Nodes Networks (WNNets) were developed. WNNets allow nodes to be weighted easily and effectively using experimental information from multiple conditions. In this study, the characteristics of WNNets were described and a proteomics data set was modelled and analysed. Results suggested that degree, an established centrality index, may offer a novel perspective about the functional role of nodes in WNNets. Indeed, degree allowed retrieving significant differences between experimental conditions, highlighting relevant proteins, and provided a novel interpretation for degree itself, opening new perspectives in experimental data modelling and analysis. Overall, WNNets may be used to model any high-throughput experimental data set requiring weighted nodes. Finally, improving the power of the analysis by using centralities such as betweenness may provide further biological insights and unveil novel, interesting characteristics of WNNets.
Affiliation(s)
- Gabriele Tosadori
  - Center for BioMedical Computing (CBMC), University of Verona, Strada le Grazie 8, 37134, Verona, Italy
  - Section of General Pathology, Department of Medicine, University of Verona, 37134, Verona, Italy
- Dario Di Silvestre
  - Institute for Biomedical Technologies, National Research Council (ITB-CNR), via F.lli Cervi 93, Segrate, 20090, Milan, Italy
- Fausto Spoto
  - Department of Computer Science, University of Verona, Strada le Grazie 15, 37134, Verona, Italy
- Pierluigi Mauri
  - Institute for Biomedical Technologies, National Research Council (ITB-CNR), via F.lli Cervi 93, Segrate, 20090, Milan, Italy
- Carlo Laudanna
  - Section of General Pathology, Department of Medicine, University of Verona, 37134, Verona, Italy
- Giovanni Scardoni
  - Center for BioMedical Computing (CBMC), University of Verona, Strada le Grazie 8, 37134, Verona, Italy
11
Abstract
Background:
At present, using computer methods to predict drug-target interactions (DTIs) is a very important step in the discovery of new drugs and in drug repositioning. The potential DTIs identified by machine learning methods can provide guidance in biochemical or clinical experiments.
Objective:
The goal of this article is to combine the latest network representation learning methods for drug-target prediction research, improve model prediction capabilities, and promote new drug development.
Methods:
We use the large-scale information network embedding (LINE) method to extract network topology features of drugs, targets, diseases, etc., integrate features obtained from heterogeneous networks, construct binary classification samples, and use the random forest (RF) method to predict DTIs.
Results:
The experiments in this paper compare the common classifiers RF, LR, and SVM, as well as the typical network representation learning methods LINE, Node2Vec, and DeepWalk. The combined method LINE-RF achieves the best results, reaching an AUC of 0.9349 and an AUPR of 0.9016.
Conclusion:
The LINE-based learning method can effectively learn hidden features of drugs, targets, diseases and other entities from the network topology. Combining features learned from multiple networks can enhance the expressive power of the representation. RF is an effective supervised learning method. Therefore, the LINE-RF combination is a widely applicable method.
Affiliation(s)
- Jihong Wang
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Yue Shi
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Xiaodan Wang
  - School of Pharmaceutical Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Zhongshan, Guangdong, China
- Huiyou Chang
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, Guangdong, China

12
Wang L, You ZH, Li LP, Yan X, Zhang W, Song KJ, Song CD. Identification of potential drug-targets by combining evolutionary information extracted from frequency profiles and molecular topological structures. Chem Biol Drug Des 2020; 96:758-767. [PMID: 31393672 DOI: 10.1111/cbdd.13599] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/14/2019] [Revised: 07/29/2019] [Accepted: 08/03/2019] [Indexed: 01/09/2023]
Abstract
Identifying interactions among drug compounds and target proteins is the basis of drug research and plays a crucial role in drug discovery. However, determining drug-target interactions (DTIs) and potential protein-compound interactions by biological experiment-based method alone is a very complicated, expensive, and time-consuming process. Hence, there is an intense motivation to design in silico prediction methods to overcome these obstacles. In this work, we designed a novel in silico strategy to predict proteome-scale DTIs based on the assumption that DTI pairs can be expressed through the evolutionary information derived from frequency profiles and drugs' structural properties. To achieve this, drug molecules are encoded into the substructure fingerprints to represent certain fragments; target proteins are first converted into position-specific scoring matrix (PSSM) and then encoded as 2-dimensional principal component analysis (2DPCA) descriptors. In the prediction phase, the feature weighted rotation forest (RF) classifier is used to estimate whether drug and target interact with each other on four benchmark datasets, including Enzymes, Ion Channels, GPCRs, and Nuclear Receptors. The prediction accuracy of cross-validation on the four datasets is 95.40%, 88.82%, 85.67%, and 82.22%, respectively. In order to have a clearer assessment of the proposed approach, we compared it with the discrete cosine transform (DCT) descriptor model, support vector machine (SVM) classifier model, and existing excellent approaches, including DBSI, NetCBP, KBMF2K, SIMCOMP, and RFDT. The excellent results of the experiment indicated that the proposed approach can effectively improve the DTI prediction accuracy and can be used as a practical tool for the research and design of new drugs.
Affiliation(s)
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Zhu-Hong You
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Li-Ping Li
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Xin Yan
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, China
- Wei Zhang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Ke-Jian Song
  - School of information engineering, JiangXi University of Science and Technology, Ganzhou, China
- Chuan-Dong Song
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China

13
Dezső Z, Ceccarelli M. Machine learning prediction of oncology drug targets based on protein and network properties. BMC Bioinformatics 2020; 21:104. [PMID: 32171238 PMCID: PMC7071582 DOI: 10.1186/s12859-020-3442-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/24/2019] [Accepted: 03/04/2020] [Indexed: 01/12/2023] Open
Abstract
Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing number of large-scale human genomics and proteomics data to make in-silico target identification, reducing the cost and the time needed. Results: We developed a machine learning approach to score proteins to generate a druggability score of novel targets. In our model we incorporated 70 protein features which included properties derived from the sequence, features characterizing protein functions as well as network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased and even less studied proteins with limited information about their function can score well as most of the features are independent of the accumulated literature. We build models on a training set which consists of targets with approved drugs and a negative set of non-drug targets. The machine learning techniques help to identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving a high accuracy characterized by an Area Under the Curve (AUC) of 0.89. Our most predictive features included biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the method proposed is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs compared to the scores of targets of trial drugs for other indications. Our approach can be used to make indication specific drug-target prediction by combining generic druggability features with indication specific biological functions.
Affiliation(s)
- Zoltán Dezső
  - Computational Biology-Genomic Research Center, ABBVIE, Redwood City, CA, USA.
- Michele Ceccarelli
  - Computational Biology-Genomic Research Center, ABBVIE, Redwood City, CA, USA.
  - Department of Electrical Engineering and Information Technology (DIETI), University of Naples "Federico II", 80128, Naples, Italy.
  - Istituto di Ricerche Genetiche "G. Salvatore", Biogem s.c.ar.l, 83031, Ariano Irpino, Italy.

14
Abstract
Background:
Identifying Drug-Target Interactions (DTIs) is a major challenge for current drug discovery and drug repositioning. Compared to traditional experimental approaches, in silico methods are fast and inexpensive. With the increase in open-access experimental data, numerous computational methods have been applied to predict DTIs.
Methods:
In this study, we propose an end-to-end learning model of Factorization Machine and Deep Neural Network (FM-DNN), which emphasizes both low-order (first or second order) and high-order (higher than second order) feature interactions without any feature engineering other than raw features. This approach combines the power of FM and DNN for feature learning in a new neural network architecture.
Results:
The experimental DTI basic features include drug characteristics (609) and target characteristics (1819), plus drug ID and target ID, for a total of 2430. We compare 8 models, such as SVM, GBDT and WIDE-DEEP; the FM-DNN model obtains the best results, with an AUC of 0.8866 and an AUPR of 0.8281.
Conclusion:
Feature engineering requires expert knowledge, and it is often difficult and time-consuming to achieve good results. FM-DNN can automatically learn a lower-order expression by FM and a high-order expression by DNN. The FM-DNN model has outstanding advantages over other commonly used models.
Affiliation(s)
- Jihong Wang
  - School of Data and Computer Science, Sun Yat-Sen University, No.132 Waihuan East Road, 510000 Guangzhou, China
- Hao Wang
  - School of Data and Computer Science, Sun Yat-Sen University, No.132 Waihuan East Road, 510000 Guangzhou, China
- Xiaodan Wang
  - School of Pharmaceutical Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, No. 9-13 Wuguishan Avenue of Life Street, 528458, Zhongshan, China
- Huiyou Chang
  - School of Data and Computer Science, Sun Yat-Sen University, No.132 Waihuan East Road, 510000 Guangzhou, China

15
Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019; 20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
Background: From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and in studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular issue. Methods: We systematically review the recent work on identifying drug targets from the view of data and method. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories, biological experiments and machine learning, each of which is subdivided into different subclasses and described in detail. Results: Machine learning algorithms make up the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. Conclusion: The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.
Affiliation(s)
- Yang Hu
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zhao
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ningyi Zhang
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China

16
Rouillard AD, Hurle MR, Agarwal P. Systematic interrogation of diverse Omic data reveals interpretable, robust, and generalizable transcriptomic features of clinically successful therapeutic targets. PLoS Comput Biol 2018; 14:e1006142. [PMID: 29782487 PMCID: PMC5983857 DOI: 10.1371/journal.pcbi.1006142] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/21/2017] [Revised: 06/01/2018] [Accepted: 04/13/2018] [Indexed: 11/19/2022] Open
Abstract
Target selection is the first and pivotal step in drug discovery. An incorrect choice may not manifest itself for many years after hundreds of millions of research dollars have been spent. We collected a set of 332 targets that succeeded or failed in phase III clinical trials, and explored whether Omic features describing the target genes could predict clinical success. We obtained features from the recently published comprehensive resource: Harmonizome. Nineteen features appeared to be significantly correlated with phase III clinical trial outcomes, but only 4 passed validation schemes that used bootstrapping or modified permutation tests to assess feature robustness and generalizability while accounting for target class selection bias. We also used classifiers to perform multivariate feature selection and found that classifiers with a single feature performed as well in cross-validation as classifiers with more features (AUROC = 0.57 and AUPR = 0.81). The two predominantly selected features were mean mRNA expression across tissues and standard deviation of expression across tissues, where successful targets tended to have lower mean expression and higher expression variance than failed targets. This finding supports the conventional wisdom that it is favorable for a target to be present in the tissue(s) affected by a disease and absent from other tissues. Overall, our results suggest that it is feasible to construct a model integrating interpretable target features to inform target selection. We anticipate deeper insights and better models in the future, as researchers can reuse the data we have provided to improve methods for handling sample biases and learn more informative features. Code, documentation, and data for this study have been deposited on GitHub at https://github.com/arouillard/omic-features-successful-targets.
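The two features highlighted in this abstract, mean expression across tissues and standard deviation of expression across tissues, can be illustrated with a minimal sketch. This is not the authors' code (their repository is linked in the abstract); the function name `expression_features`, the gene and tissue labels, the expression values, and the choice of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def expression_features(expression_by_tissue):
    """Return {gene: (mean, std)} of expression levels across tissues.

    `expression_by_tissue` maps a gene name to a {tissue: level} dict.
    The population standard deviation is an assumption of this sketch.
    """
    features = {}
    for gene, levels_by_tissue in expression_by_tissue.items():
        levels = list(levels_by_tissue.values())
        features[gene] = (mean(levels), pstdev(levels))
    return features


# Hypothetical, made-up expression levels (not data from the study).
toy_expression = {
    "GENE_A": {"liver": 1.0, "brain": 1.0, "lung": 7.0},  # tissue-specific
    "GENE_B": {"liver": 5.0, "brain": 5.0, "lung": 5.0},  # uniformly expressed
}

features = expression_features(toy_expression)
for gene, (avg, sd) in features.items():
    print(f"{gene}: mean={avg:.2f}, std={sd:.2f}")
```

In this toy input, GENE_A has the lower mean and the higher variability across tissues, which is the profile the abstract associates with successful targets.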
Affiliation(s)
- Mark R. Hurle
  - Computational Biology, GSK, Collegeville, PA, United States of America
- Pankaj Agarwal
  - Computational Biology, GSK, Collegeville, PA, United States of America

17
Brown KK, Hann MM, Lakdawala AS, Santos R, Thomas PJ, Todd K. Approaches to target tractability assessment - a practical perspective. MedChemComm 2018; 9:606-613. [PMID: 30108951 DOI: 10.1039/c7md00633k] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 12/15/2017] [Accepted: 02/13/2018] [Indexed: 01/21/2023]
Abstract
The assessment of the suitability of novel targets to intervention by different modalities, e.g. small molecules or antibodies, is increasingly seen as important in helping to select the most progressable targets at the outset of a drug discovery project. This perspective considers differing aspects of tractability and how it can be assessed using in silico and experimental approaches. We also share some of our experiences in using these approaches.
Affiliation(s)
- Kristin K Brown
  - Computational and Modelling Sciences, Platform Technology and Sciences, GlaxoSmithKline, 1250 S. Collegeville Road, Collegeville, Pennsylvania 19426, USA
- Michael M Hann
  - NCE Molecular Discovery, Platform Technology and Sciences, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Ami S Lakdawala
  - In vitro/In vivo Translation Sciences, Platform Technology and Sciences, GlaxoSmithKline, 1250 S. Collegeville Road, Collegeville, Pennsylvania 19426, USA
- Rita Santos
  - Target Sciences Computational Biology, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Pamela J Thomas
  - Computational and Modelling Sciences, Platform Technology and Sciences, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Kieran Todd
  - Computational and Modelling Sciences, Platform Technology and Sciences, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK

18
Zhao X, Han Q, Lv Y, Sun L, Gang X, Wang G. Biomarkers for cognitive decline in patients with diabetes mellitus: evidence from clinical studies. Oncotarget 2017; 9:7710-7726. [PMID: 29484146 PMCID: PMC5800938 DOI: 10.18632/oncotarget.23284] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/27/2017] [Accepted: 10/30/2017] [Indexed: 12/26/2022] Open
Abstract
Diabetes mellitus is considered as an important factor for cognitive decline and dementia in recent years. However, cognitive impairment in diabetic patients is often underestimated and kept undiagnosed, leading to thousands of diabetic patients suffering from worsening memory. Available reviews in this field were limited and not comprehensive enough. Thus, the present review aimed to summarize all available clinical studies on diabetic patients with cognitive decline, and to find valuable biomarkers that might be applied as diagnostic and therapeutic targets of cognitive impairment in diabetes. The biomarkers or risk factors of cognitive decline in diabetic patients could be classified into the following three aspects: serum molecules or relevant complications, functional or metabolic changes by neuroimaging tools, and genetic variants. Specifically, factors related to poor glucose metabolism, insulin resistance, inflammation, comorbid depression, micro-/macrovascular complications, adipokines, neurotrophic molecules and Tau protein presented significant changes in diabetic patients with cognitive decline. Besides, neuroimaging platform could provide more clues on the structural, functional and metabolic changes during the cognitive decline progression of diabetic patients. Genetic factors related to cognitive decline showed inconsistency based on the limited studies. Future studies might apply above biomarkers as diagnostic and treatment targets in a large population, and regulation of these parameters might shed light on a more valuable, sensitive and specific strategy for the diagnosis and treatment of cognitive decline in diabetic patients.
Affiliation(s)
- Xue Zhao
  - Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, 130021, Jilin Province, China
- Qing Han
  - Hospital of Orthopedics, The Second Hospital of Jilin University, Changchun, 130021, Jilin Province, China
- You Lv
  - Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, 130021, Jilin Province, China
- Lin Sun
  - Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, 130021, Jilin Province, China
- Xiaokun Gang
  - Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, 130021, Jilin Province, China
- Guixia Wang
  - Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, 130021, Jilin Province, China

19
Peng X, Wang J, Peng W, Wu FX, Pan Y. Protein-protein interactions: detection, reliability assessment and applications. Brief Bioinform 2017; 18:798-819. [PMID: 27444371 DOI: 10.1093/bib/bbw066] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 03/22/2016] [Indexed: 01/06/2023] Open
Abstract
Protein-protein interactions (PPIs) participate in all important biological processes in living organisms, such as catalyzing metabolic reactions, DNA replication, DNA transcription, responding to stimuli and transporting molecules from one location to another. To reveal the function mechanisms in cells, it is important to identify PPIs that take place in the living organism. A large number of PPIs have been discovered by high-throughput experiments and computational methods. However, false-positive PPIs have been introduced too. Therefore, to obtain reliable PPIs, many computational methods have been proposed. Generally, these methods can be classified into two categories. One category includes the methods that are designed to determine new reliable PPIs. The other one is designed to assess the reliability of existing PPIs and filter out the unreliable ones. In this article, we review the two kinds of methods for detecting reliable PPIs, and then focus on evaluating the performance of some of these typical methods. Later on, we also enumerate several PPI network-based applications with taking a reliability assessment of the PPI data into consideration. Finally, we will discuss the challenges for obtaining reliable PPIs and future directions of the construction of reliable PPI networks. Our research will provide readers some guidance for choosing appropriate methods and features for obtaining reliable PPIs.

20
Ferrero E, Dunham I, Sanseau P. In silico prediction of novel therapeutic targets using gene-disease association data. J Transl Med 2017; 15:182. [PMID: 28851378 PMCID: PMC5576250 DOI: 10.1186/s12967-017-1285-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 07/07/2017] [Accepted: 08/22/2017] [Indexed: 12/15/2022] Open
Abstract
Background: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. Methods: To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. Results: We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. Conclusions: Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.
Affiliation(s)
- Enrico Ferrero
  - Computational Biology and Stats, Target Sciences, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY UK
- Ian Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Philippe Sanseau
  - Computational Biology and Stats, Target Sciences, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY UK
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK

21
Global vision of druggability issues: applications and perspectives. Drug Discov Today 2016; 22:404-415. [PMID: 27939283 DOI: 10.1016/j.drudis.2016.11.021] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 03/31/2016] [Revised: 10/10/2016] [Accepted: 11/25/2016] [Indexed: 02/04/2023]
Abstract
During the preliminary stage of a drug discovery project, the lack of druggability information and poor target selection are the main causes of frequent failures. Elaborating on accurate computational druggability prediction methods is a requirement for prioritizing target selection, designing new drugs and avoiding side effects. In this review, we describe a survey of recently reported druggability prediction methods mainly based on networks, statistical pocket druggability predictions and virtual screening. An application for a frequent mutation of p53 tumor suppressor is presented, illustrating the complementarity of druggability prediction approaches, the remaining challenges and potential new drug development perspectives.

22
Kandoi G, Acencio ML, Lemke N. Prediction of Druggable Proteins Using Machine Learning and Systems Biology: A Mini-Review. Front Physiol 2015; 6:366. [PMID: 26696900 PMCID: PMC4672042 DOI: 10.3389/fphys.2015.00366] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 08/23/2015] [Accepted: 11/17/2015] [Indexed: 12/11/2022] Open
Abstract
The emergence of -omics technologies has allowed the collection of vast amounts of data on biological systems. Although, the pace of such collection has been exponential, the impact of these data remains small on many critical biomedical applications such as drug development. Limited resources, high costs, and low hit-to-lead ratio have led researchers to search for more cost effective methodologies. A possible alternative is to incorporate computational methods of potential drug target prediction early during drug discovery workflow. Computational methods based on systems approaches have the advantage of taking into account the global properties of a molecule not limited to its sequence, structure or function. Machine learning techniques are powerful tools that can extract relevant information from massive and noisy data sets. In recent years the scientific community has explored the combined power of these fields to propose increasingly accurate and low cost methods to propose interesting drug targets. In this mini-review, we describe promising approaches based on the simultaneous use of systems biology and machine learning to access gene and protein druggability. Moreover, we discuss the state-of-the-art of this emerging and interdisciplinary field, discussing data sources, algorithms and the performance of the different methodologies. Finally, we indicate interesting avenues of research and some remaining open challenges.
Affiliation(s)
- Gaurav Kandoi
  - Department of Electrical and Computer Engineering, Iowa State University Ames, IA, USA
- Marcio L Acencio
  - Department of Physics and Biophysics, Institute of Biosciences of Botucatu, UNESP - São Paulo State University Botucatu, Brazil
- Ney Lemke
  - Department of Physics and Biophysics, Institute of Biosciences of Botucatu, UNESP - São Paulo State University Botucatu, Brazil

23
Mehla K, Ramana J. Novel Drug Targets for Food-Borne Pathogen Campylobacter jejuni: An Integrated Subtractive Genomics and Comparative Metabolic Pathway Study. OMICS: A Journal of Integrative Biology 2015; 19:393-406. [PMID: 26061459 DOI: 10.1089/omi.2015.0046] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
Campylobacters are a major global health burden and a cause of food-borne diarrheal illness and economic loss worldwide. In developing countries, Campylobacter infections are frequent in children under age two and may be associated with mortality. In developed countries, they are a common cause of bacterial diarrhea in early adulthood. In the United States, antibiotic resistance against Campylobacter is notably increased from 13% in 1997 to nearly 25% in 2011. Novel drug targets are urgently needed but remain a daunting task to accomplish. We suggest that omics-guided drug discovery is timely and worth considering in this context. The present study employed an integrated subtractive genomics and comparative metabolic pathway analysis approach. We identified 16 unique pathways from Campylobacter when compared against H. sapiens with 326 non-redundant proteins; 115 of these were found to be essential in the Database of Essential Genes. Sixty-six proteins among these were non-homologous to the human proteome. Six membrane proteins, of which four are transporters, have been proposed as potential vaccine candidates. Screening of 66 essential non-homologous proteins against DrugBank resulted in identification of 34 proteins with drug-ability potential, many of which play critical roles in bacterial growth and survival. Out of these, eight proteins had approved drug targets available in DrugBank, the majority serving crucial roles in cell wall synthesis and energy metabolism and therefore having the potential to be utilized as drug targets. We conclude by underscoring that screening against these proteins with inhibitors may aid in future discovery of novel therapeutics against campylobacteriosis in ways that will be pathogen specific, and thus have minimal toxic effect on host. Omics-guided drug discovery and bioinformatics analyses offer the broad potential for veritable advances in global health relevant novel therapeutics.
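The successive filters described in this abstract (pathway-unique proteins, then essential, then non-homologous to human, then druggable) amount to a chain of set operations. The sketch below illustrates that idea only; the function name `subtractive_filter`, the protein identifiers, and the toy sets are hypothetical, and the real study derived each set from sequence searches and database lookups rather than from ready-made sets.

```python
def subtractive_filter(candidates, essential, human_homologs, druggable):
    """Apply the three filtering steps in order and return each stage.

    All arguments are sets of protein identifiers (hypothetical here).
    """
    # Step 1: keep candidates listed as essential.
    essential_hits = candidates & essential
    # Step 2: drop proteins with a homolog in the human proteome.
    non_homologous = essential_hits - human_homologs
    # Step 3: keep proteins with a match among known drug targets.
    drug_candidates = non_homologous & druggable
    return essential_hits, non_homologous, drug_candidates


# Made-up identifiers for illustration only.
candidates = {"p1", "p2", "p3", "p4", "p5"}
essential = {"p1", "p2", "p3", "p9"}
human_homologs = {"p3"}
druggable = {"p1", "p7"}

essential_hits, non_homologous, drug_candidates = subtractive_filter(
    candidates, essential, human_homologs, druggable
)
print(len(candidates), len(essential_hits), len(non_homologous), len(drug_candidates))
```

Each stage can only shrink the candidate set, which matches the narrowing counts reported in the abstract (326, 115, 66, 34).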
Affiliation(s)
- Kusum Mehla
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan, Himachal Pradesh, India
- Jayashree Ramana
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan, Himachal Pradesh, India