1. Talevi A. Computer-Aided Drug Discovery and Design: Recent Advances and Future Prospects. Methods Mol Biol 2024; 2714:1-20. [PMID: 37676590 DOI: 10.1007/978-1-0716-3441-7_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Computer-aided drug discovery and design involve the use of information technologies to identify and develop, on a rational ground, chemical compounds that align with a set of desired physicochemical and biological properties. In its most common form, it involves the identification and/or modification of an active scaffold (or the combination of known active scaffolds), although de novo drug design from scratch is also possible. Traditionally, the drug discovery and design processes have focused on the molecular determinants of the interactions between drug candidates and their known or intended pharmacological target(s). Nevertheless, in modern times, drug discovery and design are conceived as a particularly complex multiparameter optimization task, due to the complicated, often conflicting, property requirements. This chapter provides an updated overview of in silico approaches for identifying active scaffolds and guiding the subsequent optimization process. Recent groundbreaking advances in the field are also analyzed, including the integration of state-of-the-art machine learning approaches in every step of the drug discovery process (from prediction of target structure to customized molecular docking scoring functions), the integration of multilevel omics data, and the use of a diversity of computational approaches to assist target validation and assess plausible binding pockets.
Affiliation(s)
- Alan Talevi
- Laboratory of Bioactive Compound Research and Development (LIDeB), Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata, Argentina.
- Argentinean National Council of Scientific and Technical Research (CONICET), La Plata, Argentina.
2. Alghushairy O, Ali F, Alghamdi W, Khalid M, Alsini R, Asiry O. Machine learning-based model for accurate identification of druggable proteins using light extreme gradient boosting. J Biomol Struct Dyn 2023:1-12. [PMID: 37850427 DOI: 10.1080/07391102.2023.2269280] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/09/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The identification of druggable proteins (DPs) is significant for the development of new drugs, personalized medicine, understanding of disease mechanisms, drug repurposing, and economic benefits. By identifying new druggable targets, researchers can develop new therapies for a range of diseases, leading to better patient outcomes. Identification of DPs by machine learning strategies is more efficient and cost-effective than conventional methods. In this study, a computational predictor, namely Drug-LXGB, is introduced to enhance the identification of DPs. Features are extracted using composition, transition, and distribution (CTD), composition of K-spaced amino acid pair (CKSAAP), pseudo-position-specific scoring matrix (PsePSSM), and a novel descriptor called multi-block pseudo amino acid composition (MB-PseAAC). The CTD, CKSAAP, PsePSSM, and MB-PseAAC feature vectors are integrated, and sequential forward selection is used as the feature selection algorithm. The selected features are then supplied to random forest, extreme gradient boosting, and light eXtreme gradient boosting (LXGB) classifiers. The predictive performance of these learning methods is measured via 10-fold cross-validation. The LXGB-based model achieves better results than other existing predictors. Our novel protocol can play an active role in designing novel drugs and may be useful for exploring potential targets. This study will help to capture a more universal view of a potential target. Communicated by Ramaswamy H. Sarma.
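As an illustration of one of the descriptors named in this abstract, the following is a minimal sketch of CKSAAP under its usual definition: for a gap size k, count every residue pair separated by k positions and normalize by the number of such windows. The function name, the example sequence, and the choice to skip non-standard residues are assumptions made for this sketch and are not taken from the Drug-LXGB paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(sequence, k):
    """Return the frequency of each residue pair separated by k positions.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
    counts = dict.fromkeys(pairs, 0)
    n_windows = len(sequence) - k - 1  # number of (i, i + k + 1) positions
    if n_windows <= 0:
        raise ValueError("sequence is too short for this gap size")
    for i in range(n_windows):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # assumption: ignore windows with non-standard residues
            counts[pair] += 1
    return {pair: count / n_windows for pair, count in counts.items()}


# Toy sequence (not real data): with k = 1 the pairs are A.D, C.A, D.C, ...
features = cksaap("ACDACDACD", k=1)
```

Each gap size yields a 400-dimensional vector, so concatenating several values of k quickly produces a wide feature matrix, which is one reason a feature selection step such as sequential forward selection is typically applied afterwards.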
Affiliation(s)
- Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Farman Ali
- Department of Software Engineering, Sarhad University of Science and Information Technology Peshawar Mardan Campus, Peshawar, Pakistan
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Othman Asiry
- Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
3. Cunningham M, Pins D, Dezső Z, Torrent M, Vasanthakumar A, Pandey A. PINNED: identifying characteristics of druggable human proteins using an interpretable neural network. J Cheminform 2023; 15:64. [PMID: 37468968 PMCID: PMC10354961 DOI: 10.1186/s13321-023-00735-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 07/10/2023] [Indexed: 07/21/2023] Open
Abstract
The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features which distinguish between "druggable" and "undruggable" proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein-protein interaction network are all important discriminant factors. However, many prior efforts to automate the assessment of protein druggability suffer from low performance or poor interpretability. We developed a neural network-based machine learning model capable of generating druggability sub-scores based on each of four distinct categories, combining them to form an overall druggability score. The model achieves an excellent performance in separating drugged and undrugged proteins in the human proteome, with an area under the receiver operating characteristic curve (AUC) of 0.95. Our use of multiple sub-scores allows the assessment of potential protein targets of interest based on distinct contributors to druggability, leading to a more interpretable and holistic model to identify novel targets.
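Several abstracts in this list report performance as an AUC. As a reminder of what that number measures, the sketch below computes the AUC directly from its rank interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name and the toy scores are assumptions for illustration only; they are not data from any of the cited studies.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy druggability scores (hypothetical): 1 = drugged protein, 0 = undrugged
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for a demonstration; a sort-based computation would be preferable at proteome scale.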
Affiliation(s)
- Michael Cunningham
- Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA.
- Danielle Pins
- Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Zoltán Dezső
- Genomics Research Center, AbbVie Inc., 1000 Gateway Boulevard, South San Francisco, CA, 94080, USA
- Maricel Torrent
- Small Molecule Therapeutics and Platform Technologies, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Aparna Vasanthakumar
- Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Abhishek Pandey
- Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
4. Peña-Guerrero J, Fernández-Rubio C, García-Sosa AT, Nguewa PA. BRCT Domains: Structure, Functions, and Implications in Disease-New Therapeutic Targets for Innovative Drug Discovery against Infections. Pharmaceutics 2023; 15:1839. [PMID: 37514027 PMCID: PMC10386641 DOI: 10.3390/pharmaceutics15071839] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/18/2023] [Revised: 06/12/2023] [Accepted: 06/22/2023] [Indexed: 07/30/2023] Open
Abstract
The search for new therapeutic targets and their implications in drug development remains an emerging scientific topic. BRCT-bearing proteins are found in Archaea, Bacteria, Eukarya, and viruses. They are traditionally involved in DNA repair, recombination, and cell cycle control. To carry out these functions, BRCT domains are able to interact with DNA and proteins. Moreover, such domains are also implicated in several pathogenic processes and malignancies including breast, ovarian, and lung cancer. Although these domains exhibit moderately conserved folding, their sequences show very low conservation. Interestingly, sequence variations among species are considered positive traits in the search for suitable therapeutic targets, since non-specific drug interactions might be reduced. These main characteristics of BRCT, as well as its critical implications in key biological processes in the cell, have prompted the study of these domains as therapeutic targets. This review explores the possible roles of BRCT domains as therapeutic targets for drug discovery. We describe their common structural features and relevant interactions and pathways, as well as their implications in pathologic processes. Drugs commonly used to target these domains are also presented. Finally, based on their structures, we describe new drug design possibilities using modern and innovative techniques.
Affiliation(s)
- José Peña-Guerrero
- ISTUN Institute of Tropical Health, Department of Microbiology and Parasitology, University of Navarra, IdiSNA (Navarra Institute for Health Research), E-31008 Pamplona, Navarra, Spain
- Celia Fernández-Rubio
- ISTUN Institute of Tropical Health, Department of Microbiology and Parasitology, University of Navarra, IdiSNA (Navarra Institute for Health Research), E-31008 Pamplona, Navarra, Spain
- Alfonso T García-Sosa
- Chair of Molecular Technology, Institute of Chemistry, University of Tartu, Ravila 14a, 50411 Tartu, Estonia
- Paul A Nguewa
- ISTUN Institute of Tropical Health, Department of Microbiology and Parasitology, University of Navarra, IdiSNA (Navarra Institute for Health Research), E-31008 Pamplona, Navarra, Spain
5. Ma’ruf M, Fadli JC, Mahendra MR, Irham LM, Sulistyani N, Adikusuma W, Chong R, Septama AW. A bioinformatic approach to identify pathogenic variants for Stevens-Johnson syndrome. Genomics Inform 2023; 21:e26. [PMID: 37704211 PMCID: PMC10326529 DOI: 10.5808/gi.23010] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/09/2023] [Revised: 04/09/2023] [Accepted: 04/12/2023] [Indexed: 07/08/2023] Open
Abstract
Stevens-Johnson syndrome (SJS) produces a severe hypersensitivity reaction caused by Herpes simplex virus or mycoplasma infection, vaccination, systemic disease, or other agents. Several studies have investigated the genetic susceptibility involved in SJS. To provide further genetic insights into the pathogenesis of SJS, this study prioritized high-impact, SJS-associated pathogenic variants through integrating bioinformatic and population genetic data. First, we identified SJS-associated single nucleotide polymorphisms from the genome-wide association studies catalog, followed by genome annotation with HaploReg and variant validation with Ensembl. Subsequently, expression quantitative trait locus (eQTL) from GTEx identified human genetic variants with differential gene expression across human tissues. Our results indicate that two variants, namely rs2074494 and rs5010528, which are encoded by the HLA-C (human leukocyte antigen C) gene, were found to be differentially expressed in skin. The allele frequencies for rs2074494 and rs5010528 also appear to significantly differ across continents. We highlight the utility of these population-specific HLA-C genetic variants for genetic association studies and their potential to aid in early prognosis and disease treatment of SJS.
Affiliation(s)
- Muhammad Ma’ruf
- Faculty of Pharmacy, Universitas Ahmad Dahlan, Yogyakarta 55164, Indonesia
- Lalu Muhammad Irham
- Faculty of Pharmacy, Universitas Ahmad Dahlan, Yogyakarta 55164, Indonesia
- Center for Vaccine and Drugs, Research Organization for Health, National Research and Innovation Agency (BRIN), South Tangerang 15314, Indonesia
- Nanik Sulistyani
- Faculty of Pharmacy, Universitas Ahmad Dahlan, Yogyakarta 55164, Indonesia
- Wirawan Adikusuma
- Department of Pharmacy, University of Muhammadiyah Mataram, Mataram 83127, Indonesia
- Center for Vaccine and Drugs, Research Organization for Health, National Research and Innovation Agency (BRIN), South Tangerang 15314, Indonesia
- Rockie Chong
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095, USA
- Abdi Wira Septama
- Department of Pharmacy, University of Muhammadiyah Mataram, Mataram 83127, Indonesia
6. Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023] Open
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data unveil the proteins involved in these functions. Also, properties that favor the existence of binding between drug and target are deducible from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, the proteins’ amino-acid sequence BERT embeddings, and the integrated features to train and test the DL classifiers separately. The models achieved high prediction performances in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. Also, OncoRTT outperformed the state-of-the-art method using their data in five out of seven cancer types commonly assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
Affiliation(s)
- Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
- Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
7. Omeershffudin UNM, Kumar S. Antibiotic resistance in Neisseria gonorrhoeae: broad-spectrum drug target identification using subtractive genomics. Genomics Inform 2023; 21:e5. [PMID: 37037463 PMCID: PMC10085745 DOI: 10.5808/gi.22066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 01/03/2023] [Accepted: 01/06/2023] [Indexed: 04/03/2023] Open
Abstract
Neisseria gonorrhoeae is a Gram-negative aerobic diplococcus bacterium that primarily causes sexually transmitted infections through direct human sexual contact. It is a major public health threat due to its impact on reproductive health, the widespread presence of antimicrobial resistance, and the lack of a vaccine. In this study, we used a bioinformatics approach and performed subtractive genomic methods to identify potential drug targets against the core proteome of N. gonorrhoeae (12 strains). In total, 12,300 protein sequences were retrieved, and paralogous proteins were removed using CD-HIT. The remaining sequences were analyzed for non-homology against the human proteome and gut microbiota, and screened for broad-spectrum analysis, druggability, and anti-target analysis. The proteins were also characterized for unique interactions between the host and pathogen through metabolic pathway analysis. Based on the subtractive genomic approach and subcellular localization, we identified one cytoplasmic protein, 2Fe-2S iron-sulfur cluster binding domain-containing protein (NGFG RS03485), as a potential drug target. This protein could be further exploited for drug development to create new medications and therapeutic agents for the treatment of N. gonorrhoeae infections.
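The subtractive logic described in this abstract can be sketched as plain set operations: take the proteins shared by every strain, then subtract those with homologs in the human proteome, in gut microbiota, or among anti-targets. The sketch assumes the homology calls were already made by a separate sequence-similarity search, treats the core proteome as the intersection across strains, and uses made-up protein identifiers; none of these details are taken from the paper's actual pipeline.

```python
def subtractive_filter(strain_proteomes, human_homologs, gut_flora_homologs, anti_targets):
    """Return candidate targets left after subtracting unwanted proteins.

    strain_proteomes: list of sets of protein IDs, one set per strain.
    The other arguments are sets of protein IDs flagged by an upstream
    homology search (assumed to exist; not implemented here).
    """
    # Core proteome: proteins present in every strain (assumed definition)
    core = set.intersection(*strain_proteomes)
    # Remove proteins that resemble host, commensal, or anti-target proteins
    candidates = core - human_homologs - gut_flora_homologs - anti_targets
    return sorted(candidates)


# Hypothetical identifiers, for illustration only
strains = [
    {"P1", "P2", "P3", "P4"},
    {"P1", "P2", "P3", "P5"},
    {"P1", "P2", "P3", "P4"},
]
targets = subtractive_filter(
    strains,
    human_homologs={"P1"},
    gut_flora_homologs={"P2"},
    anti_targets=set(),
)
```

In a real workflow each subtraction step would carry its own similarity thresholds, and further filters such as druggability and subcellular localization would follow.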
Affiliation(s)
- Suresh Kumar
- Faculty of Health and Life Sciences, Management and Science University, Seksyen 13, 40100, Shah Alam, Selangor, Malaysia
8. Yu L, Xue L, Liu F, Li Y, Jing R, Luo J. The applications of deep learning algorithms on in silico druggable proteins identification. J Adv Res 2022; 41:219-231. [PMID: 36328750 PMCID: PMC9637576 DOI: 10.1016/j.jare.2022.01.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/26/2021] [Revised: 12/21/2021] [Accepted: 01/18/2022] [Indexed: 11/20/2022] Open
Abstract
We developed the first deep learning-based druggable protein classifier for fast and accurate identification of potential druggable proteins. Experimental results on a standard dataset demonstrate that the prediction performance of the deep learning model is comparable to that of existing methods. We visualized the representations of druggable proteins learned by deep learning models, which helps us understand how they work. Our analysis reconfirms that the attention mechanism is especially useful for explaining deep learning models.
Introduction: The top priority in drug development is to identify novel and effective drug targets. In vitro assays are frequently used for this purpose; however, traditional experimental approaches are insufficient for large-scale exploration of novel drug targets, as they are expensive, time-consuming and laborious. Therefore, computational methods have emerged in recent decades as an alternative to aid experimental drug discovery studies by developing sophisticated predictive models to estimate unknown drugs/compounds and their targets. The recent success of deep learning (DL) techniques in machine learning and artificial intelligence has further attracted a great deal of attention in the biomedicine field, including computational drug discovery.
Objectives: This study focuses on the practical applications of deep learning algorithms for predicting druggable proteins and proposes a powerful predictor for fast and accurate identification of potential drug targets.
Methods: Using a gold-standard dataset, we explored several typical protein features and different deep learning algorithms and evaluated their performance in a comprehensive way. We provide an overview of the entire experimental process, including protein features and descriptors, neural network architectures, libraries and toolkits for deep learning modelling, performance evaluation metrics, model interpretation and visualization.
Results: Experimental results show that the hybrid model (architecture: CNN-RNN (BiLSTM) + DNN; feature: dictionary encoding + DC_TC_CTD) performed better than the other models on the benchmark dataset. This hybrid model achieved 90.0% accuracy and 0.800 MCC on the test dataset and 84.8% accuracy and 0.703 MCC on a nonredundant independent test dataset, which is comparable to the performance of existing methods.
Conclusion: We developed the first deep learning-based classifier for fast and accurate identification of potential druggable proteins. We hope that this study will be helpful for future researchers who would like to use deep learning techniques to develop relevant predictive models.
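The two metrics quoted in the Results, accuracy and the Matthews correlation coefficient (MCC), are both functions of the confusion matrix. The sketch below uses the standard definitions; the confusion-matrix counts are hypothetical and were chosen only to show that a balanced test set with symmetric errors gives 90% accuracy together with an MCC of 0.8. They are not the counts from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical balanced test set: 50 druggable and 50 non-druggable proteins
acc = accuracy(tp=45, tn=45, fp=5, fn=5)
score = mcc(tp=45, tn=45, fp=5, fn=5)
```

Unlike accuracy, MCC stays informative when the classes are imbalanced, which is why the two are often reported together for druggability classifiers.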
9. Singh N, Bhatnagar S. Machine Learning for Prediction of Drug Targets in Microbe Associated Cardiovascular Diseases by Incorporating Host-pathogen Interaction Network Parameters. Mol Inform 2021; 41:e2100115. [PMID: 34676983 DOI: 10.1002/minf.202100115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/17/2021] [Accepted: 10/01/2021] [Indexed: 12/20/2022]
Abstract
Host-pathogen interactions play a crucial role in invasion, infection, and induction of immune response in humans. In this work, four machine learning algorithms, namely Logistic Regression, K-Nearest Neighbor, Support Vector Machine, and Random Forest, were implemented for the classification of drug targets. The algorithms were trained using 3400 host and 3800 pathogen drug-target and non-drug-target proteins as learning instances. For each protein, 68 pathogen and 73 host features were computed that included sequence, structure, biological and host-pathogen network centrality characteristics. The Random Forest classifier model achieved the best accuracy after 10-fold cross-validation. An accuracy of 99% was achieved with a ROC-AUC score of 0.99±0.01 for both pathogen and host training sets. The Eigenvector Centrality of host-pathogen interactions and host-host interactions was the top feature in performing classification of pathogen and host targets, respectively. Other features important for classification were the presence of catalytic and binding sites, low instability/aliphatic index, and cellular location. The Random Forest classifier was then used for prediction of drug targets involved in Microbe Associated Cardiovascular Diseases. The Random Forest model predicted 331 host and 743 pathogen proteins as drug targets, which can be validated experimentally for therapeutic intervention in Microbe Associated Cardiovascular Diseases.
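Eigenvector centrality, named above as the top-ranked feature, scores a node highly when its neighbors are themselves highly scored. A minimal sketch using power iteration on an undirected graph follows; the shifted update (adding each node's own score before summing over neighbors) is a common trick to avoid oscillation on bipartite graphs. The function name and the four-node toy network are assumptions for illustration and do not come from the study's interaction data.

```python
import math


def eigenvector_centrality(adjacency, iterations=200, tol=1e-10):
    """Approximate eigenvector centrality by power iteration.

    adjacency: dict mapping each node to a list of its neighbors
    (undirected graph, so every edge is listed in both directions).
    """
    nodes = list(adjacency)
    x = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Shifted update (A + I)x: keeps the same leading eigenvector as A
        new = {n: x[n] + sum(x[m] for m in adjacency[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    return x


# Hypothetical interaction network: a triangle A-B-C with D attached to C
network = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
centrality = eigenvector_centrality(network)
```

In this toy network C ends up with the highest score and D with the lowest, even though D is attached to the best-connected node, which illustrates how the measure weighs both the number and the importance of neighbors.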
Affiliation(s)
- Nirupma Singh
- Department of Biotechnology, Netaji Subhas Institute of Technology, Dwarka, New Delhi, 110078, India
- Sonika Bhatnagar
- Department of Biotechnology, Netaji Subhas Institute of Technology, Dwarka, New Delhi, 110078, India; Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology Dwarka, New Delhi, 110078, India
10. de Falco A, Dezso Z, Ceccarelli F, Cerulo L, Ciaramella A, Ceccarelli M. Adaptive one-class Gaussian processes allow accurate prioritization of oncology drug targets. Bioinformatics 2021; 37:1420-1427. [PMID: 33165571 DOI: 10.1093/bioinformatics/btaa968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2020] [Revised: 10/26/2020] [Accepted: 11/03/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The cost of drug development has dramatically increased in the last decades, with the number of new drugs approved per billion US dollars spent on R&D halving every year or less. The selection and prioritization of targets is one of the most influential decisions in drug discovery. Here we present a Gaussian Process model for the prioritization of drug targets cast as a problem of learning with only positive and unlabeled examples.
RESULTS: Since the absence of negative samples does not allow standard methods for automatic selection of hyperparameters, we propose a novel approach for hyperparameter selection of the kernel in One Class Gaussian Processes. We compare our method with state-of-the-art approaches on benchmark datasets and then show its application to druggability prediction of oncology drugs. Our score reaches an AUC of 0.90 on a set of clinical trial targets starting from a small training set of 102 validated oncology targets. Our score recovers the majority of known drug targets and can be used to identify a novel set of proteins as drug target candidates.
AVAILABILITY AND IMPLEMENTATION: The matrix of features for each protein is available at: https://bit.ly/3iLgZTa. Source code implemented in Python is freely available for download at https://github.com/AntonioDeFalco/Adaptive-OCGP.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Antonio de Falco
- BIOGEM Istituto di Ricerche Genetiche "G. Salvatore", 83031 Ariano Irpino, Italy
- Zoltan Dezso
- ABBVIE Biotherapeutics, Redwood City, CA 94063, USA
- Francesco Ceccarelli
- Donald Bren School of Information and Computer Sciences (ICS), Irvine, CA 92697, USA
- Luigi Cerulo
- BIOGEM Istituto di Ricerche Genetiche "G. Salvatore", 83031 Ariano Irpino, Italy; Department of Science and Technologies, University of Sannio, 82100 Benevento, Italy
- Angelo Ciaramella
- Department of Science and Technology, University of Naples Parthenope, 80133 Naples, Italy
- Michele Ceccarelli
- BIOGEM Istituto di Ricerche Genetiche "G. Salvatore", 83031 Ariano Irpino, Italy; Department of Electrical Engineering and Information Technology (DIETI), University of Naples "Federico II", 80128 Naples, Italy
11. Adeowo FY, Lawal MM, Kumalo HM. Design and Development of Cholinesterase Dual Inhibitors towards Alzheimer's Disease Treatment: A Focus on Recent Contributions from Computational and Theoretical Perspective. ChemistrySelect 2020. [DOI: 10.1002/slct.202003573] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022]
Affiliation(s)
- Fatima Y. Adeowo
- Discipline of Medical Biochemistry School of Laboratory Medicine and Medical Science University of KwaZulu-Natal Durban 4001 South Africa
- Monsurat M. Lawal
- Discipline of Medical Biochemistry School of Laboratory Medicine and Medical Science University of KwaZulu-Natal Durban 4001 South Africa
- Hezekiel M. Kumalo
- Discipline of Medical Biochemistry School of Laboratory Medicine and Medical Science University of KwaZulu-Natal Durban 4001 South Africa
12. Xu YY, Zhou H, Murphy RF, Shen HB. Consistency and variation of protein subcellular location annotations. Proteins 2020; 89:242-250. [PMID: 32935893 DOI: 10.1002/prot.26010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/16/2020] [Revised: 07/09/2020] [Accepted: 09/13/2020] [Indexed: 11/09/2022]
Abstract
A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.
Affiliation(s)
- Ying-Ying Xu
- School of Biomedical Engineering, Southern Medical University, Guangzhou, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Hang Zhou
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- Robert F Murphy
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
13. Dezső Z, Ceccarelli M. Machine learning prediction of oncology drug targets based on protein and network properties. BMC Bioinformatics 2020; 21:104. [PMID: 32171238 PMCID: PMC7071582 DOI: 10.1186/s12859-020-3442-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/24/2019] [Accepted: 03/04/2020] [Indexed: 01/12/2023] Open
Abstract
Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing volume of large-scale human genomics and proteomics data to perform in silico target identification, reducing the cost and the time needed. Results: We developed a machine learning approach that scores proteins to generate a druggability score for novel targets. Our model incorporates 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited functional information can score well, because most of the features are independent of the accumulated literature. We built models on a training set consisting of targets with approved drugs and a negative set of non-drug targets. The machine learning techniques help to identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy, with an Area Under the Curve (AUC) of 0.89. Our most predictive features included biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.
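Editorial illustration (not part of the cited abstract): the AUC reported above can be read as the probability that a randomly chosen true target receives a higher score than a randomly chosen non-target. The minimal Python sketch below computes that quantity directly from two lists of scores. The helper name `roc_auc` and all score values are hypothetical and invented for illustration; they are not taken from the paper, and the authors' actual model and evaluation code may differ.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical druggability scores, invented for illustration only.
trial_targets = [0.9, 0.8, 0.6, 0.4]  # proteins assumed to be real drug targets
non_targets = [0.7, 0.3, 0.2, 0.1]    # proteins assumed not to be targets

auc = roc_auc(trial_targets, non_targets)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of proteins, which is fine for a sketch; for large validation sets a rank-based implementation from an established library would usually be preferable.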
Affiliation(s)
- Zoltán Dezső
- Computational Biology-Genomic Research Center, ABBVIE, Redwood City, CA, USA
- Michele Ceccarelli
- Computational Biology-Genomic Research Center, ABBVIE, Redwood City, CA, USA; Department of Electrical Engineering and Information Technology (DIETI), University of Naples "Federico II", 80128, Naples, Italy; Istituto di Ricerche Genetiche "G. Salvatore", Biogem s.c.ar.l, 83031, Ariano Irpino, Italy
|
14
|
Paananen J, Fortino V. An omics perspective on drug target discovery platforms. Brief Bioinform 2019; 21:1937-1953. [PMID: 31774113 PMCID: PMC7711264 DOI: 10.1093/bib/bbz122] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 05/07/2019] [Revised: 07/23/2019] [Accepted: 07/27/2019] [Indexed: 01/28/2023] Open
Abstract
The drug discovery process starts with identification of a disease-modifying target. This critical step traditionally begins with manual investigation of scientific literature and biomedical databases to gather evidence linking a molecular target to disease, and to evaluate the efficacy, safety and commercial potential of the target. The high throughput and affordability of current omics technologies, which allow quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task. Therefore, computational platforms that identify and rank disease-relevant targets from existing biomedical data sources, including omics databases, are needed. To date, more than 30 drug target discovery (DTD) platforms exist. They provide information-rich databases and graphical user interfaces to help scientists identify putative targets and pre-evaluate their therapeutic efficacy and potential side effects. Here we survey and compare a set of popular DTD platforms that utilize multiple data sources and omics-driven knowledge bases (either directly or indirectly) for identifying drug targets. We also provide a description of omics technologies and related data repositories which are important for DTD tasks.
Affiliation(s)
- Jussi Paananen
- Institute of Biomedicine, University of Eastern Finland, Finland; Blueprint Genetics Ltd, Finland
- Vittorio Fortino
- Institute of Biomedicine, University of Eastern Finland, Finland
|
15
|
Bigaeva E, Gore E, Simon E, Zwick M, Oldenburger A, de Jong KP, Hofker HS, Schlepütz M, Nicklin P, Boersema M, Rippmann JF, Olinga P. Transcriptomic characterization of culture-associated changes in murine and human precision-cut tissue slices. Arch Toxicol 2019; 93:3549-3583. [PMID: 31754732 DOI: 10.1007/s00204-019-02611-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/24/2019] [Accepted: 11/05/2019] [Indexed: 12/14/2022]
Abstract
Our knowledge of complex pathological mechanisms underlying organ fibrosis is predominantly derived from animal studies. However, relevance of animal models for human disease is limited; therefore, an ex vivo model of human precision-cut tissue slices (PCTS) might become an indispensable tool in fibrosis research and drug development by bridging the animal-human translational gap. This study, presented as two parts, provides comprehensive characterization of the dynamic transcriptional changes in PCTS during culture by RNA sequencing. Part I investigates the differences in culture-induced responses in murine and human PCTS derived from healthy liver, kidney and gut. Part II delineates the molecular processes in cultured human PCTS generated from diseased liver, kidney and ileum. We demonstrated that culture was associated with extensive transcriptional changes and impacted PCTS in a universal way across the organs and two species by triggering an inflammatory response and fibrosis-related extracellular matrix (ECM) remodelling. All PCTS shared mRNA upregulation of IL-11 and ECM-degrading enzymes MMP3 and MMP10. Slice preparation and culturing activated numerous pathways across all PCTS, especially those involved in inflammation (IL-6, IL-8 and HMGB1 signalling) and tissue remodelling (osteoarthritis pathway and integrin signalling). Despite the converging effects of culture, PCTS display species-, organ- and pathology-specific differences in the regulation of genes and canonical pathways. The underlying pathology in human diseased PCTS endures and influences biological processes like cytokine release. Our study reinforces the use of PCTS as an ex vivo fibrosis model and supports future studies towards its validation as a preclinical tool for drug development.
Affiliation(s)
- Emilia Bigaeva
- Department of Pharmaceutical Technology and Biopharmacy, University of Groningen, Antonius Deusinglaan 1, Groningen, 9713AV, The Netherlands
- Emilia Gore
- Department of Pharmaceutical Technology and Biopharmacy, University of Groningen, Antonius Deusinglaan 1, Groningen, 9713AV, The Netherlands
- Eric Simon
- Computational Biology, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Matthias Zwick
- Computational Biology, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Anouk Oldenburger
- Cardiometabolic Disease Research, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Koert P de Jong
- Department of Hepato-Pancreato-Biliary Surgery and Liver Transplantation, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713 GZ, Groningen, The Netherlands
- Hendrik S Hofker
- Department of Surgery, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713 GZ, Groningen, The Netherlands
- Marco Schlepütz
- Respiratory Diseases, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Paul Nicklin
- Research Beyond Borders, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Miriam Boersema
- Department of Pharmaceutical Technology and Biopharmacy, University of Groningen, Antonius Deusinglaan 1, Groningen, 9713AV, The Netherlands
- Jörg F Rippmann
- Cardiometabolic Disease Research, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Peter Olinga
- Department of Pharmaceutical Technology and Biopharmacy, University of Groningen, Antonius Deusinglaan 1, Groningen, 9713AV, The Netherlands
|
16
|
Ghadermarzi S, Li X, Li M, Kurgan L. Sequence-Derived Markers of Drug Targets and Potentially Druggable Human Proteins. Front Genet 2019; 10:1075. [PMID: 31803227 PMCID: PMC6872670 DOI: 10.3389/fgene.2019.01075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/30/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022] Open
Abstract
Recent research shows that the majority of the druggable human proteome is yet to be annotated and explored. Accurate identification of these unexplored druggable proteins would facilitate development, screening, repurposing, and repositioning of drugs, as well as prediction of new drug–protein interactions. We contrast the current drug targets against datasets of non-druggable and possibly druggable proteins to formulate markers that could be used to identify druggable proteins. We focus on markers that can be extracted from protein sequences or names/identifiers to ensure that they can be applied across the entire human proteome. These markers quantify key features covered in past works (topological features of PPIs, cellular functions, and subcellular locations) and several novel factors (intrinsic disorder, residue-level conservation, alternative splicing isoforms, domains, and sequence-derived solvent accessibility). We find that the possibly druggable proteins have a significantly higher abundance of alternative splicing isoforms, a relatively large number of domains, a higher degree of centrality in the protein-protein interaction networks, and lower numbers of conserved and surface residues, when compared with the non-druggable proteins. We show that the current drug targets and possibly druggable proteins share involvement in catalytic and signaling functions. However, unlike the drug targets, the possibly druggable proteins participate in metabolic and biosynthesis processes, are enriched in intrinsic disorder, interact with proteins and nucleic acids, and are localized across the cell. To sum up, we formulate several markers that can help with finding novel druggable human proteins and provide interesting insights into the cellular functions and subcellular locations of the current drug targets and potentially druggable proteins.
Affiliation(s)
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
17
|
Ryaboshapkina M, Hammar M. Tissue-specific genes as an underutilized resource in drug discovery. Sci Rep 2019; 9:7233. [PMID: 31076736 PMCID: PMC6510781 DOI: 10.1038/s41598-019-43829-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/18/2018] [Accepted: 05/01/2019] [Indexed: 12/25/2022] Open
Abstract
Tissue-specific genes are believed to be good drug targets due to improved safety. Here we show that this intuitive notion is not reflected in phase 1 and 2 clinical trials, despite the historic success of tissue-specific targets and their 2.3-fold overrepresentation among targets of marketed non-oncology drugs. We compare properties of tissue-specific genes and drug targets. We show that tissue-specificity of the target may also be related to efficacy of the drug. The relationship may be indirect (enrichment in Mendelian disease and PTVesc genes) or direct (elevated betweenness centrality scores for tissue-specifically produced enzymes and secreted proteins). Reduced evolutionary conservation of tissue-specific genes may represent a bottleneck for drug projects, prompting development of novel models with smaller evolutionary gap to humans. We show that the opportunities to identify tissue-specific drug targets are not exhausted and discuss potential use cases for tissue-specific genes in drug research.
Affiliation(s)
- Maria Ryaboshapkina
- Translational Science, Cardiovascular, Renal and Metabolism, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden
- Mårten Hammar
- Translational Science, Cardiovascular, Renal and Metabolism, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden
|
18
|
Ji X, Rajpal DK, Freudenberg JM. The essentiality of drug targets: an analysis of current literature and genomic databases. Drug Discov Today 2019; 24:544-550. [DOI: 10.1016/j.drudis.2018.11.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/15/2018] [Revised: 09/18/2018] [Accepted: 11/05/2018] [Indexed: 12/14/2022]
|
19
|
The benefits of in silico modeling to identify possible small-molecule drugs and their off-target interactions. Future Med Chem 2018; 10:423-432. [PMID: 29380627 DOI: 10.4155/fmc-2017-0151] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/31/2022] Open
Abstract
Research into the use of small molecules as drugs continues to be a key driver in the development of molecular databases, computer-aided drug design software and collaborative platforms. The evolution of computational approaches is driven by the essential criteria that a drug molecule has to fulfill, from affinity for its targets to minimal side effects, while having adequate absorption, distribution, metabolism, and excretion (ADME) properties. A combination of ligand- and structure-based drug development approaches is already used to obtain consensus predictions of small-molecule activities and their off-target interactions. Further integration of these methods into easy-to-use workflows informed by systems biology could realize the full potential of available data in drug discovery and reduce the attrition of drug candidates.
|