1
Kim D, Jeong J, Choi J. Identification of Optimal Machine Learning Algorithms and Molecular Fingerprints for Explainable Toxicity Prediction Models Using ToxCast/Tox21 Bioassay Data. ACS OMEGA 2024; 9:37934-37941. [PMID: 39281924 PMCID: PMC11391437 DOI: 10.1021/acsomega.4c04474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/18/2024]
Abstract
Recent studies have primarily focused on introducing novel frameworks to enhance the predictive power of toxicity prediction models by refining molecular representation methods and algorithms. However, these methods are inherently complex and often difficult to understand and explain, which creates barriers to their regulatory adoption and validation. Therefore, it is necessary to select the optimal model by considering not only model performance but also interpretability. This study aimed to identify the optimal combination of molecular fingerprints (pattern-based versus algorithm-based) and machine learning algorithms (simple versus complex) for developing explainable toxicity prediction models through a comprehensive investigation of the ToxCast/Tox21 bioassay data set. For 1092 ToxCast/Tox21 assays, five molecular fingerprints (MACCS, Morgan, RDKit, Layered, and Patterned) and six algorithms (MLP, GBT, Random Forest, kNN, Logistic Regression, and Naïve Bayes) were used to train the models. Results showed that 35 models achieved acceptable performance (F1 score or accuracy of 0.8 or higher). Among the combinations, either MACCS or Morgan, paired with Random Forest, demonstrated robust performance compared with other molecular fingerprints and algorithms. MACCS and Random Forest remain valuable even when interpretability is prioritized. Consequently, MACCS-Random Forest models for four assays, targeting G protein-coupled receptors and kinases, were identified; they can be used to discern specific structural features or patterns in chemical compounds, offering explainable insights into toxicity-related chemical structures. This study indicates the importance of not disregarding simple models when assessing both predictivity and interpretability within the context of chemical feature-based Tox21 data analysis.
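The acceptance criterion quoted in the abstract (an F1 score or accuracy of at least 0.8) can be illustrated with a minimal Python sketch. The function names, the toy labels, and the exact way the threshold is applied are assumptions made for illustration; they are not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def is_acceptable(y_true, y_pred, threshold=0.8):
    """Assumed reading of the criterion: either metric reaches the threshold."""
    return (f1_score(y_true, y_pred) >= threshold
            or accuracy(y_true, y_pred) >= threshold)


# Hypothetical assay outcomes (1 = active, 0 = inactive), not real ToxCast data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

print(f1_score(y_true, y_pred), accuracy(y_true, y_pred),
      is_acceptable(y_true, y_pred))
```

On imbalanced assays the two metrics can disagree, as in this toy case, so which one is reported affects how many models count as acceptable.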
Affiliation(s)
- Donghyeon Kim
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Jaeseong Jeong
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Jinhee Choi
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
2
Nguyen ATN, Nguyen DTN, Koh HY, Toskov J, MacLean W, Xu A, Zhang D, Webb GI, May LT, Halls ML. The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery. Br J Pharmacol 2024; 181:2371-2384. [PMID: 37161878 DOI: 10.1111/bph.16140] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/26/2022] [Revised: 04/14/2023] [Accepted: 04/27/2023] [Indexed: 05/11/2023]
Abstract
The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process "faster, smarter and cheaper," we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery. LINKED ARTICLES: This article is part of a themed issue Therapeutic Targeting of G Protein-Coupled Receptors: hot topics from the Australasian Society of Clinical and Experimental Pharmacologists and Toxicologists 2021 Virtual Annual Scientific Meeting. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v181.14/issuetoc.
Affiliation(s)
- Anh T N Nguyen
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Diep T N Nguyen
  - Department of Information Technology, Faculty of Engineering and Technology, Vietnam National University, Cau Giay, Hanoi, Vietnam
- Huan Yee Koh
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Jason Toskov
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- William MacLean
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Andrew Xu
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Daokun Zhang
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Geoffrey I Webb
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Lauren T May
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Michelle L Halls
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
3
Velloso JPL, Kovacs AS, Pires DEV, Ascher DB. AI-driven GPCR analysis, engineering, and targeting. Curr Opin Pharmacol 2024; 74:102427. [PMID: 38219398 DOI: 10.1016/j.coph.2023.102427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/16/2024]
Abstract
This article investigates the role of recent advances in Artificial Intelligence (AI) to revolutionise the study of G protein-coupled receptors (GPCRs). AI has been applied to many areas of GPCR research, including the application of machine learning (ML) in GPCR classification, prediction of GPCR activation levels, modelling GPCR 3D structures and interactions, understanding G-protein selectivity, aiding elucidation of GPCRs structures, and drug design. Despite progress, challenges in predicting GPCR structures and addressing the complex nature of GPCRs remain, providing avenues for future research and development.
Affiliation(s)
- João P L Velloso
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Aaron S Kovacs
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Douglas E V Pires
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
4
Raza A, Chohan TA, Buabeid M, Arafa ESA, Chohan TA, Fatima B, Sultana K, Ullah MS, Murtaza G. Deep learning in drug discovery: a futuristic modality to materialize the large datasets for cheminformatics. J Biomol Struct Dyn 2023; 41:9177-9192. [PMID: 36305195 DOI: 10.1080/07391102.2022.2136244] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/24/2022] [Accepted: 10/08/2022] [Indexed: 10/31/2022]
Abstract
Artificial intelligence (AI) development imitates the workings of the human brain to comprehend modern problems. Traditional approaches such as high throughput screening (HTS) and combinatorial chemistry are lengthy and expensive for the pharmaceutical industry, as they can only handle smaller datasets. Deep learning (DL) is a sophisticated AI method that uses a thorough comprehension of particular systems. The pharmaceutical industry is now adopting DL techniques to enhance the research and development process. Multi-oriented algorithms play a crucial role in the processing of QSAR analysis, de novo drug design, ADME evaluation, physicochemical analysis, preclinical development, followed by clinical trial data precision. In this study, we investigated the performance of several algorithms, including deep neural networks (DNN), convolutional neural networks (CNN) and multi-task learning (MTL), with the aim of generating high-quality, interpretable big and diverse databases for drug design and development. Studies have demonstrated that CNN, recurrent neural network and deep belief network are compatible, accurate and effective for the molecular description of pharmacodynamic properties. In Covid-19, existing pharmacological compounds have also been repurposed using DL models. In the absence of the Covid-19 vaccine, remdesivir and oseltamivir have been widely employed to treat severe SARS-CoV-2 infections. In conclusion, the results indicate the potential benefits of employing DL strategies in the drug discovery process. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ali Raza
  - Department of pharmaceutical chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
  - Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
- Talha Ali Chohan
  - Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
  - Institute of Pharmaceutical Science, UVAS, Lahore, Pakistan
- Manal Buabeid
  - Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
- El-Shaima A Arafa
  - Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
  - Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
- Batool Fatima
  - Department of biochemistry, Bahauddin Zakariya University, Multan, Pakistan
- Kishwar Sultana
  - Department of pharmaceutical chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
- Malik Saad Ullah
  - Department of Pharmacy, Government College University, Faisalabad, Pakistan
- Ghulam Murtaza
  - Department of Pharmacy, COMSATS University Islamabad, Lahore Campus, Pakistan
5
Gu Y, Li J, Kang H, Zhang B, Zheng S. Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning. Molecules 2023; 28:5982. [PMID: 37630234 PMCID: PMC10459669 DOI: 10.3390/molecules28165982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 07/27/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023]
Abstract
Ligand-based virtual screening (LBVS) is a promising approach for rapid and low-cost screening of potentially bioactive molecules in the early stage of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract high-order molecule structure representations from molecular fingerprints or structures. However, the 3D conformation of a molecule largely influences its bioactivity and physical properties, and has rarely been considered in previous deep learning-based LBVS methods. Moreover, a relevant bioactivity benchmark dataset is still lacking. To address these issues, we introduce a novel end-to-end deep learning architecture trained from molecular conformers for LBVS. We first extracted molecule conformers from multiple public molecular bioactivity data sources and consolidated them into a large-scale bioactivity benchmark dataset, which in total includes millions of endpoints and molecules corresponding to 954 targets. Then, we devised a deep learning-based LBVS method called EquiVS to learn molecule representations from conformers for bioactivity prediction. Specifically, a graph convolutional network (GCN) and an equivariant graph neural network (EGNN) are sequentially stacked to learn high-order molecule-level and conformer-level representations, followed by attention-based deep multiple-instance learning (MIL) to aggregate these representations and then predict the potential bioactivity of the query molecule on a given target. We conducted various experiments to validate the data quality of our benchmark dataset, and confirmed that EquiVS achieved better performance than 10 traditional machine learning or deep learning-based LBVS methods. Further ablation studies demonstrate the significant contribution of molecular conformation to bioactivity prediction, as well as the reasonability and non-redundancy of the deep learning architecture in EquiVS. Finally, a model interpretation case study on CDK2 shows the potential of EquiVS in optimal conformer discovery. The overall study shows that our proposed benchmark dataset and EquiVS method have promising prospects in virtual screening applications.
Affiliation(s)
- Yaowen Gu
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Department of Chemistry, New York University, New York, NY 10027, USA
- Jiao Li
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Hongyu Kang
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing 100081, China
- Bowen Zhang
  - Beijing StoneWise Technology Co., Ltd., Beijing 100080, China
- Si Zheng
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing 100084, China
6
El-Atawneh S, Goldblum A. Activity Models of Key GPCR Families in the Central Nervous System: A Tool for Many Purposes. J Chem Inf Model 2023. [PMID: 37257045 DOI: 10.1021/acs.jcim.2c01531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
G protein-coupled receptors (GPCRs) are targets of many drugs, of which ∼25% are indicated for central nervous system (CNS) disorders. Drug promiscuity affects their efficacy and safety profiles. Predicting the polypharmacology profile of compounds against GPCRs can thus provide a basis for producing more precise therapeutics by considering the targets and the anti-targets in that family of closely related proteins. We provide a tool for predicting the polypharmacology of compounds within prominent GPCR families in the CNS: serotonin, dopamine, histamine, muscarinic, opioid, and cannabinoid receptors. Our in-house algorithm, "iterative stochastic elimination" (ISE), produces high-quality ligand-based models for agonism and antagonism at 31 GPCRs. The ISE models correctly predict 68% of CNS drug-GPCR interactions, while the "similarity ensemble approach" predicts only 33%. The activity models correctly predict 56% of reported activities of DrugBank molecules for these CNS receptors. We conclude that the combination of interactions and activity profiles generated by screening through our models form the basis for subsequent designing and discovering novel therapeutics, either single, multitargeting, or repurposed.
Affiliation(s)
- Shayma El-Atawneh
  - Molecular Modelling and Drug Design Lab, Institute for Drug Research and Fraunhofer Project Center for Drug Discovery and Delivery, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 91905, Israel
- Amiram Goldblum
  - Molecular Modelling and Drug Design Lab, Institute for Drug Research and Fraunhofer Project Center for Drug Discovery and Delivery, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 91905, Israel
7
Huang S, Zheng S, Chen R. Multi-source transfer learning with Graph Neural Network for excellent modelling the bioactivities of ligands targeting orphan G protein-coupled receptors. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:2588-2608. [PMID: 36899548 DOI: 10.3934/mbe.2023121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
G protein-coupled receptors (GPCRs) have been the targets for more than 40% of the currently approved drugs. Although neural networks can effectively improve the accuracy of biological activity prediction, the results are unsatisfactory on the limited orphan GPCR (oGPCR) datasets. To bridge this gap, we proposed Multi-source Transfer Learning with Graph Neural Network, called MSTL-GNN. Firstly, there are three ideal sources of data for transfer learning: oGPCRs, experimentally validated GPCRs, and invalidated GPCRs similar to the former one. Secondly, the SMILES-format GPCR data are converted to graphs, which serve as the input to a Graph Neural Network (GNN) and ensemble learning to improve prediction accuracy. Finally, our experiments show that MSTL-GNN remarkably improves the prediction of GPCR ligand activity values compared with previous studies. On average, the two evaluation indexes we adopted, R2 and root-mean-square error (RMSE), improved by up to 67.13% and 17.22%, respectively, compared with the state-of-the-art work. The effectiveness of MSTL-GNN in the field of GPCR drug discovery with limited data also paves the way for other similar application scenarios.
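The two evaluation indexes named in this abstract, R2 and RMSE, can be computed as in the following minimal Python sketch. It assumes that R2 denotes the coefficient of determination (some studies report the squared Pearson correlation instead); the function names and the toy activity values are hypothetical and do not come from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted activities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical activity values (for example on a pIC50-like scale).
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]

print(round(r_squared(measured, predicted), 3))
print(round(rmse(measured, predicted), 3))
```

A higher R2 and a lower RMSE both indicate a better fit, so an "improvement" in RMSE corresponds to a decrease in its value.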
Affiliation(s)
- Shizhen Huang
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
- ShaoDong Zheng
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
  - VeriMake Innovation Lab, Nanjing Renmian Integrated Circuit Co., Ltd., Nanjing 210088, China
- Ruiqi Chen
  - VeriMake Innovation Lab, Nanjing Renmian Integrated Circuit Co., Ltd., Nanjing 210088, China
8
Hasanzadeh A, Hamblin MR, Kiani J, Noori H, Hardie JM, Karimi M, Shafiee H. Could artificial intelligence revolutionize the development of nanovectors for gene therapy and mRNA vaccines? NANO TODAY 2022; 47:101665. [PMID: 37034382 PMCID: PMC10081506 DOI: 10.1016/j.nantod.2022.101665] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/19/2023]
Abstract
Gene therapy enables the introduction of nucleic acids like DNA and RNA into host cells, and is expected to revolutionize the treatment of a wide range of diseases. This growth has been further accelerated by the discovery of CRISPR/Cas technology, which allows accurate genomic editing in a broad range of cells and organisms in vitro and in vivo. Despite many advances in gene delivery and the development of various viral and non-viral gene delivery vectors, the lack of highly efficient non-viral systems with low cellular toxicity remains a challenge. The application of cutting-edge technologies such as artificial intelligence (AI) has great potential to find new paradigms to solve this issue. Herein, we review AI and its major subfields including machine learning (ML), neural networks (NNs), expert systems, deep learning (DL), computer vision and robotics. We discuss the potential of AI-based models and algorithms in the design of targeted gene delivery vehicles capable of crossing extracellular and intracellular barriers by viral mimicry strategies. We finally discuss the role of AI in improving the function of CRISPR/Cas systems, developing novel nanobots, and mRNA vaccine carriers.
Affiliation(s)
- Akbar Hasanzadeh
  - Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Michael R Hamblin
  - Laser Research Centre, Faculty of Health Science, University of Johannesburg, Doornfontein 2028, South Africa
  - Radiation Biology Research Center, Iran University of Medical Sciences, Tehran, Iran
- Jafar Kiani
  - Oncopathology Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Department of Molecular Medicine, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran, Iran
- Hamid Noori
  - Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Joseph M. Hardie
  - Division of Engineering in Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02139 USA
- Mahdi Karimi
  - Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Oncopathology Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
  - Research Center for Science and Technology in Medicine, Tehran University of Medical Sciences, Tehran 141556559, Iran
  - Applied Biotechnology Research Centre, Tehran Medical Science, Islamic Azad University, Tehran 1584743311, Iran
- Hadi Shafiee
  - Division of Engineering in Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02139 USA
9
Yin Y, Hu H, Yang Z, Jiang F, Huang Y, Wu J. AFSE: towards improving model generalization of deep graph learning of ligand bioactivities targeting GPCR proteins. Brief Bioinform 2022; 23:6554127. [PMID: 35348582 DOI: 10.1093/bib/bbac077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Revised: 02/12/2022] [Accepted: 02/14/2022] [Indexed: 11/14/2022]
Abstract
Ligand molecules naturally constitute a graph structure. Recently, many excellent deep graph learning (DGL) methods have been proposed and used to model ligand bioactivities, which is critical for the virtual screening of drug hits from compound databases of interest. However, pharmacists often find that these well-trained DGL models struggle to achieve satisfactory performance in real scenarios for virtual screening of drug candidates. The main challenges are that the datasets for training models are small and biased, and that inner activity cliff cases worsen model performance. These challenges cause predictors to overfit the training data and generalize poorly in real virtual screening scenarios. Thus, we proposed a novel algorithm named adversarial feature subspace enhancement (AFSE). AFSE dynamically generates abundant representations in a new feature subspace via bi-directional adversarial learning, and then minimizes the maximum loss of molecular divergence and bioactivity to ensure local smoothness of model outputs and significantly enhance the generalization of DGL models in predicting ligand bioactivities. Benchmark tests were implemented on seven state-of-the-art open-source DGL models with the potential of modeling ligand bioactivities, and precisely evaluated by multiple criteria. The results indicate that, on almost all 33 GPCR datasets and seven DGL models, AFSE greatly improved their enhancement factor (top-10%, 20% and 30%), which is the most important evaluation in virtual screening of hits from compound databases, while ensuring superior performance on RMSE and $r^2$. The web server of AFSE is freely available at http://noveldelta.com/AFSE for academic purposes.
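The top-x% "enhancement factor" mentioned in this abstract can be illustrated with a short Python sketch. It assumes the metric corresponds to the enrichment factor commonly used in virtual screening, that is, the hit rate among the top-ranked fraction divided by the hit rate in the whole library. The abstract does not give the formula, so this definition, the function name, the rounding rule, and the toy data are all assumptions.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment factor for the top `fraction` of a ranked compound list.

    scores: predicted bioactivity, higher means more likely active
    labels: 1 for a known active, 0 for an inactive
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        return 0.0
    # Size of the selected subset, rounded to the nearest integer (at least 1).
    n_top = max(1, round(fraction * n))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    # Hit rate in the top subset relative to the hit rate in the full library.
    return (hits / n_top) / (total_actives / n)


# Hypothetical screening library of 10 compounds containing 3 actives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for frac in (0.1, 0.2, 0.3):
    print(frac, round(enrichment_factor(scores, labels, frac), 3))
```

Under this definition a value of 1 means the ranking is no better than random selection, and larger values mean the actives are concentrated near the top of the list.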
Affiliation(s)
- Yueming Yin
  - School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Haifeng Hu
  - School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Zhen Yang
  - School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  - National Engineering Research Center of Communications and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Feihu Jiang
  - School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Yihe Huang
  - School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Jiansheng Wu
  - School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  - Smart Health Big Data Analysis and Location Services Engineering Research Center of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
10
Wu J, Lan C, Mei Z, Chen X, Zhu Y, Hu H, Diao Y. Transfer learning with molecular graph convolutional networks for accurate modelling and representation of bioactivities of ligands targeting GPCRs without sufficient data. Comput Biol Chem 2022; 98:107664. [DOI: 10.1016/j.compbiolchem.2022.107664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Revised: 02/23/2022] [Accepted: 03/06/2022] [Indexed: 11/29/2022]
11
Velloso JPL, Ascher DB, Pires DEV. pdCSM-GPCR: predicting potent GPCR ligands with graph-based signatures. BIOINFORMATICS ADVANCES 2021; 1:vbab031. [PMID: 34901870 PMCID: PMC8651072 DOI: 10.1093/bioadv/vbab031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/27/2021] [Revised: 09/30/2021] [Accepted: 11/02/2021] [Indexed: 01/26/2023]
Abstract
MOTIVATION: G protein-coupled receptors (GPCRs) can selectively bind to many types of ligands, ranging from light-sensitive compounds, ions, hormones, pheromones and neurotransmitters, modulating cell physiology. Considering their role in many essential cellular processes, they are one of the most targeted protein families, with over a third of all approved drugs modulating GPCR signalling. Despite this, the large diversity of receptors and their multipass transmembrane architectures make the identification and development of novel, specific, and safe GPCR ligands a challenge. While computational approaches have the potential to assist GPCR drug development, they have presented limited performance and generalization capabilities. Here, we explored the use of graph-based signatures to develop pdCSM-GPCR, a method capable of rapidly and accurately screening potential GPCR ligands.
RESULTS: Bioactivity data (IC50, EC50, Ki and Kd) for individual GPCRs were curated. After curation, we used the data for developing predictive models for 36 major GPCR targets, across 4 classes (A, B, C and F). Our models compose the most comprehensive computational resource for GPCR bioactivity prediction to date. Across stratified 10-fold cross-validation and blind tests, our approach achieved Pearson's correlations of up to 0.89, significantly outperforming previous methods. Interpreting our results, we identified common important features of potent GPCR ligands, which tend to have bicyclic rings, leading to higher levels of aromaticity. We believe pdCSM-GPCR will be an invaluable tool to assist screening efforts, enriching compound libraries and ranking candidates for further experimental validation.
AVAILABILITY AND IMPLEMENTATION: pdCSM-GPCR predictive models and datasets used have been made available via a freely accessible and easy-to-use web server at http://biosig.unimelb.edu.au/pdcsm_gpcr/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- João Paulo L Velloso
  - Fundação Oswaldo Cruz, Instituto René Rachou, Belo Horizonte 30190-009, Brazil
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
  - Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
  - Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Melbourne 3052, Australia
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Douglas E V Pires
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne 3053, Australia
12
|
Jabeen A, de March CA, Matsunami H, Ranganathan S. Machine Learning Assisted Approach for Finding Novel High Activity Agonists of Human Ectopic Olfactory Receptors. Int J Mol Sci 2021; 22:ijms222111546. [PMID: 34768977 PMCID: PMC8583936 DOI: 10.3390/ijms222111546] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/30/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 12/29/2022]
Abstract
Olfactory receptors (ORs) constitute the largest superfamily of G protein-coupled receptors (GPCRs). ORs are involved in sensing odorants as well as in other ectopic roles in non-nasal tissues. Matching the enormous olfactory stimulation repertoire to its counterpart ORs through machine learning (ML) will enable understanding of the olfactory system, receptor characterization, and exploitation of the receptors' therapeutic potential. In the current study, we have selected two broadly tuned ectopic human OR proteins, OR1A1 and OR2W1, for expanding their known chemical space by using molecular descriptors. We present a scheme for selecting the optimal features required to train an ML-based model, based on which we selected the random forest (RF) as the best performer. High activity agonist prediction involved screening five databases comprising ~23 M compounds, using the trained RF classifier. To evaluate the effectiveness of the machine learning based virtual screening and check receptor binding site compatibility, we docked the top target ligands to carefully developed receptor model structures. Finally, experimental validation of selected compounds with significant docking scores through in vitro assays revealed two high activity novel agonists for OR1A1 and one for OR2W1.
Affiliation(s)
- Amara Jabeen
  - Applied BioSciences, Macquarie University, Sydney, NSW 2109, Australia
- Claire A. de March
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Hiroaki Matsunami
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Neurobiology, Duke Institute for Brain Sciences, Duke University, Durham, NC 27710, USA
  - Correspondence: (H.M.); (S.R.)
- Shoba Ranganathan
  - Applied BioSciences, Macquarie University, Sydney, NSW 2109, Australia
  - Correspondence: (H.M.); (S.R.)
13
Yin Y, Hu H, Yang Z, Xu H, Wu J. RealVS: Toward Enhancing the Precision of Top Hits in Ligand-Based Virtual Screening of Drug Leads from Large Compound Databases. J Chem Inf Model 2021; 61:4924-4939. [PMID: 34619030 DOI: 10.1021/acs.jcim.1c01021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/19/2022]
Abstract
Accurate modeling of compound bioactivities is essential for the virtual screening of drug leads. In real-world scenarios, pharmacists tend to choose from the top-k hit compounds ranked by predicted bioactivities from a large database with interest to continue wet experiments for drug discovery. Significant improvement of the precision of the top hits in ligand-based virtual screening of drug leads is more valuable than conventional schemes for accurately predicting the bioactivities of all compounds from a large database. Here, we proposed a new method, RealVS, to significantly improve the top hits' precision and learn interpretable key substructures associated with compound bioactivities. The features of RealVS involve the following points. (1) Abundant transferable information from the source domain was introduced for alleviating the insufficiency of inactive ligands associated with drug targets. (2) The adversarial domain alignment was adopted to fit the distribution of generated features of compounds from the training data set and that from the screening database for greater model generalization ability. (3) A novel objective function was proposed to simultaneously optimize the classification loss, regression loss, and adversarial loss, where most inactive ligands tend to be screened out before activity regression prediction. (4) Graph attention networks were adopted for learning key substructures associated with ligand bioactivities for better model interpretability. The results on a large number of benchmark data sets show that our method has significantly improved the precision of top hits under various k values in ligand-based virtual screening of drug leads from large compound databases, which is of great value in real-world scenarios. The web server of RealVS is freely available at noveldelta.com/RealVS for academic purposes, where virtual screening of hits from large compound databases is accessible.
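The top-k precision that this abstract centres on can be illustrated with a small sketch. It is not taken from RealVS: the function name `precision_at_k` and the toy scores and activity labels are hypothetical, chosen only to show how the metric is computed from a ranked screening list.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], is_active: Sequence[bool], k: int) -> float:
    """Fraction of true actives among the k compounds with the highest predicted score."""
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of compounds")
    # Rank compound indices from highest to lowest predicted bioactivity.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:k] if is_active[i])
    return hits / k


# Hypothetical screening library: predicted scores and assay outcomes.
scores = [0.91, 0.15, 0.78, 0.66, 0.42, 0.88]
labels = [True, False, False, True, False, True]

for k in (2, 3, 6):
    print(k, precision_at_k(scores, labels, k))
```

A model can rank the first few compounds well while predicting the rest of the database poorly, which is why the metric is evaluated at several values of k.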
Affiliation(s)
- Yueming Yin
  - College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Haifeng Hu
  - College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Zhen Yang
  - National Engineering Research Center of Communications and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Huajian Xu
  - College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Jiansheng Wu
  - School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
14
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 286] [Impact Index Per Article: 95.3] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug design and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost pose hurdles and challenges that impact drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physicochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development.
Artificial intelligence is referred to as a superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of the drug design and development process, such as from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive compounds, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physicochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Devesh Srivastava
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Mehar Sahu
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Swati Tiwari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
15
Raschka S, Kaufman B. Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition. Methods 2020; 180:89-110. [PMID: 32645448 PMCID: PMC8457393 DOI: 10.1016/j.ymeth.2020.06.016] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 01/17/2020] [Revised: 06/23/2020] [Accepted: 06/23/2020] [Indexed: 02/06/2023]
Abstract
In the last decade, machine learning and artificial intelligence applications have received a significant boost in performance and attention in both academic research and industry. The success behind most of the recent state-of-the-art methods can be attributed to the latest developments in deep learning. When applied to various scientific domains that are concerned with the processing of non-tabular data, for example, image or text, deep learning has been shown to outperform not only conventional machine learning but also highly specialized tools developed by domain experts. This review aims to summarize AI-based research for GPCR bioactive ligand discovery with a particular focus on the most recent achievements and research trends. To make this article accessible to a broad audience of computational scientists, we provide instructive explanations of the underlying methodology, including overviews of the most commonly used deep learning architectures and feature representations of molecular data. We highlight the latest AI-based research that has led to the successful discovery of GPCR bioactive ligands. However, an equal focus of this review is on the discussion of machine learning-based technology that has been applied to ligand discovery in general and has the potential to pave the way for successful GPCR bioactive ligand discovery in the future. This review concludes with a brief outlook highlighting the recent research trends in deep learning, such as active learning and semi-supervised learning, which have great potential for advancing bioactive ligand discovery.
Affiliation(s)
- Sebastian Raschka
  - University of Wisconsin-Madison, Department of Statistics, United States
- Benjamin Kaufman
  - University of Wisconsin-Madison, Department of Biostatistics and Medical Informatics, United States
16
Wu J, Sun Y, Chan WKB, Zhu Y, Zhu W, Huang W, Hu H, Yan S, Pang T, Ke X, Li F. Homologous G Protein-Coupled Receptors Boost the Modeling and Interpretation of Bioactivities of Ligand Molecules. J Chem Inf Model 2020; 60:1865-1875. [PMID: 32040913 DOI: 10.1021/acs.jcim.9b01000] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/25/2023]
Abstract
G protein-coupled receptors (GPCRs) are one of the most important drug targets, accounting for ∼34% of drugs on the market. For drug discovery, accurate modeling and explanation of bioactivities of ligands is critical for the screening and optimization of hit compounds. Homologous GPCRs are more likely to interact with chemically similar ligands, and they tend to share common binding modes with ligand molecules. The inclusion of homologous GPCRs in learning bioactivities of ligands potentially enhances the accuracy and interpretability of models, owing to the increased training sample size and the existence of common ligand substructures that control bioactivities. Accurate modeling and interpretation of bioactivities of ligands by combining homologous GPCRs can be formulated as a multitask learning problem with joint feature learning, which is naturally matched with the group lasso learning algorithm. Thus, we proposed a multitask regression learning with group lasso (MTR-GL) implemented by ℓ2,1-norm regularization to model bioactivities of ligand molecules and then tested the algorithm on a series of thirty-five representative GPCR datasets that cover nine subfamilies of human GPCRs. The results show that MTR-GL is overall superior to single-task learning methods and classic multitask learning with joint feature learning methods. Moreover, MTR-GL achieves better performance than state-of-the-art deep multitask learning based methods of predicting ligand bioactivities on most datasets (31/35), where MTR-GL obtained an average improvement of 38% on correlation coefficient (r²) and 29% on root-mean-square error over the DeepNeuralNet-QSAR predictors.
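For readers unfamiliar with group lasso in a multitask setting, the generic textbook form of an ℓ2,1-regularized multitask regression is sketched below. The abstract does not give the exact MTR-GL objective, so the symbols are assumptions: T homologous GPCR tasks, X_t and y_t as the fingerprint matrix and bioactivity vector of task t, W as the d × T weight matrix with columns w_t, and λ as the regularization strength.

```latex
\min_{W \in \mathbb{R}^{d \times T}} \;
  \sum_{t=1}^{T} \left\lVert X_t w_t - y_t \right\rVert_2^2
  \;+\; \lambda \left\lVert W \right\rVert_{2,1},
\qquad
\left\lVert W \right\rVert_{2,1}
  = \sum_{j=1}^{d} \sqrt{\sum_{t=1}^{T} W_{jt}^{2}}
```

Because the penalty sums the Euclidean norms of the rows of W, a substructure feature j is either kept for all tasks or dropped for all of them, which is what allows shared ligand substructures across homologous receptors to be read off the model.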
Affiliation(s)
- Jiansheng Wu
  - School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  - Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Yi Sun
  - School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Wallace K B Chan
  - Department of Pharmacology, University of Michigan, Ann Arbor, Michigan 48109, United States
- Yanxiang Zhu
  - Verimake Research, Nanjing Qujike Info-tech Co., Ltd., Nanjing 210088, China
- Wenyong Zhu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Wanqing Huang
  - School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Haifeng Hu
  - School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Shancheng Yan
  - School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  - Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Tao Pang
  - Jiangsu Key Laboratory of Drug Screening, China Pharmaceutical University, Nanjing 210009, China
- Xiaoyan Ke
  - Child Mental Health Research Center, Nanjing Brain Hospital, Nanjing Medical University, Nanjing 210029, China
- Fei Li
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing 210009, China
17
Singh N, Chaput L, Villoutreix BO. Virtual screening web servers: designing chemical probes and drug candidates in the cyberspace. Brief Bioinform 2020; 22:1790-1818. [PMID: 32187356 PMCID: PMC7986591 DOI: 10.1093/bib/bbaa034] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Indexed: 12/13/2022]
Abstract
The interplay between life sciences and advancing technology drives a continuous cycle of chemical data growth; these data are most often stored in open or partially open databases. In parallel, many different types of algorithms are being developed to manipulate these chemical objects and associated bioactivity data. Virtual screening methods are among the most popular computational approaches in pharmaceutical research. Today, user-friendly web-based tools are available to help scientists perform virtual screening experiments. This article provides an overview of internet resources enabling and supporting chemical biology and early drug discovery with a main emphasis on web servers dedicated to virtual ligand screening and small-molecule docking. This survey first introduces some key concepts and then presents recent and easily accessible virtual screening and related target-fishing tools as well as briefly discusses case studies enabled by some of these web services. Notwithstanding further improvements, already available web-based tools not only contribute to the design of bioactive molecules and assist drug repositioning but also help to generate new ideas and explore different hypotheses in a timely fashion while contributing to teaching in the field of drug development.
Affiliation(s)
- Natesh Singh
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Ludovic Chaput
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Bruno O Villoutreix
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
18
Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020; 26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]
Abstract
The number of human deaths caused by malaria is increasing day by day. The mitochondrial proteins of the malaria parasite play vital roles in the organism. For developing effective drugs and vaccines against infection, it is necessary to accurately identify mitochondrial proteins of the malaria parasite. Although precise details for the mitochondrial proteins can be provided by biochemical experiments, these are expensive and time-consuming. In this review, we summarized the machine learning-based methods for mitochondrial protein identification in the malaria parasite and compared the construction strategies of these computational methods. Finally, we also discussed the future development of algorithm-based mitochondrial protein recognition.
Affiliation(s)
- Ting Liu
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
- Hua Tang
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
19
Network Embedding the Protein-Protein Interaction Network for Human Essential Genes Identification. Genes (Basel) 2020; 11:genes11020153. [PMID: 32023848 PMCID: PMC7074227 DOI: 10.3390/genes11020153] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/17/2019] [Revised: 01/27/2020] [Accepted: 01/29/2020] [Indexed: 11/18/2022]
Abstract
Essential genes are a group of genes that are indispensable for cell survival and cell fertility. Studying human essential genes not only helps scientists reveal the underlying biological mechanisms of a human cell but also guides disease treatment. Recently, the publication of human essential gene data has made it possible for researchers to train a machine-learning classifier by using some features of the known human essential genes and to use the classifier to predict new human essential genes. Previous studies have found that the essentiality of genes closely relates to their properties in the protein–protein interaction (PPI) network. In this work, we propose a novel supervised method to predict human essential genes by network embedding the PPI network. Our approach implements a biased random walk on the network to get the node network context. Then, the node pairs are input into an artificial neural network to learn representation vectors that maximally preserve the network structure and the properties of the nodes in the network. Finally, the features are put into an SVM classifier to predict human essential genes. The prediction results on two human PPI networks show that our method achieves better performance than those that use either genes’ sequence information or genes’ centrality properties in the network as input features. Moreover, it also outperforms methods that represent the PPI network by other previous approaches.
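The first stage of this pipeline, sampling node context with a biased random walk, can be sketched as follows. The abstract does not specify the bias scheme, so this sketch assumes a node2vec-style second-order walk with a return parameter `p` and an in-out parameter `q`; the toy graph, node names, and helper names (`biased_walk`, `context_pairs`) are all hypothetical. The embedding network and the SVM classifier are not shown.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected adjacency sets


def biased_walk(graph: Graph, start: str, length: int,
                p: float, q: float, rng: random.Random) -> List[str]:
    """Sample one walk; p penalises returning, q penalises moving outward."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducibility
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # explore further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def context_pairs(walk: List[str], window: int) -> List[Tuple[str, str]]:
    """(center, context) node pairs within a sliding window over the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical four-protein interaction network.
toy_ppi: Graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
walk = biased_walk(toy_ppi, "A", length=10, p=1.0, q=0.5, rng=random.Random(0))
print(walk)
print(context_pairs(walk, window=1)[:4])
```

The resulting node pairs are the kind of input a skip-gram style network would consume to learn one vector per protein.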
20
Wang H, Qiu J, Liu H, Xu Y, Jia Y, Zhao Y. HKPocket: human kinase pocket database for drug design. BMC Bioinformatics 2019; 20:617. [PMID: 31783725 PMCID: PMC6884818 DOI: 10.1186/s12859-019-3254-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/04/2019] [Accepted: 11/15/2019] [Indexed: 01/06/2023]
Abstract
Background: Kinase pocket structural information is important for drug discovery targeting cancer and other diseases. Although some kinase sequence, structure or drug databases have been developed, they cannot be used directly in kinase drug studies. Therefore, a comprehensive database of human kinase protein pockets is urgently needed.
Results: Here, we have developed HKPocket, a comprehensive Human Kinase Pocket database. This database provides sequence, structure, hydrophilic-hydrophobic, critical interaction, and druggability information for 1717 pockets from 255 kinases. We further divided these pockets into 91 pocket clusters using structural and position features in each kinase group. The pocket structural information would be useful for preliminary drug screening. Then, the potential drugs can be further selected and optimized by analyzing the sequence conservation, critical interactions, and hydrophobicity of identified drug pockets. HKPocket also provides online visualization and pse files of all identified pockets.
Conclusion: The HKPocket database would be helpful for drug screening and optimization. Besides, drugs targeting the non-catalytic pockets would cause fewer side effects. HKPocket is available at http://zhaoserver.com.cn/HKPocket/HKPocket.html.
Affiliation(s)
- Huiwen Wang
  - Department of Physics, Central China Normal University, Wuhan, 430079, China
- Jiadi Qiu
  - Department of Physics, Central China Normal University, Wuhan, 430079, China
- Haoquan Liu
  - Department of Physics, Central China Normal University, Wuhan, 430079, China
- Ying Xu
  - Department of Physics, Central China Normal University, Wuhan, 430079, China
- Ya Jia
  - Department of Physics, Central China Normal University, Wuhan, 430079, China
- Yunjie Zhao
  - Department of Physics, Central China Normal University, Wuhan, 430079, China
21
Wu J, Liu B, Chan WKB, Wu W, Pang T, Hu H, Yan S, Ke X, Zhang Y. Precise modelling and interpretation of bioactivities of ligands targeting G protein-coupled receptors. Bioinformatics 2019; 35:i324-i332. [PMID: 31510691 PMCID: PMC6612825 DOI: 10.1093/bioinformatics/btz336] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
MOTIVATION: Accurate prediction and interpretation of ligand bioactivities are essential for virtual screening and drug discovery. Unfortunately, many important drug targets lack experimental data about the ligand bioactivities; this is particularly true for G protein-coupled receptors (GPCRs), which account for the targets of about a third of drugs currently on the market. Computational approaches capable of precisely assessing ligand bioactivities and identifying the key substructural features that determine them are needed to address this issue.
RESULTS: A new method, SED, was proposed to predict ligand bioactivities and to recognize key substructures associated with GPCRs through the coupling of screening for Lasso of long extended-connectivity fingerprints (ECFPs) with deep neural network training. The SED pipeline contains three successive steps: (i) representation of long ECFPs for ligand molecules, (ii) feature selection by screening for Lasso of ECFPs and (iii) bioactivity prediction through a deep neural network regression model. The method was examined on a set of 16 representative GPCRs that cover most subfamilies of human GPCRs, where each has 300-5000 ligand associations. The results show that SED achieves excellent performance in modelling ligand bioactivities, especially for those in the GPCR datasets without sufficient ligand associations, where SED improved the baseline predictors by 12% in correlation coefficient (r²) and 19% in root mean square error. Detailed data analyses suggest that the major advantage of SED lies in its ability to detect substructures from long ECFPs, which significantly improves the predictive performance.
AVAILABILITY AND IMPLEMENTATION: The source code and datasets of SED are freely available at https://zhanglab.ccmb.med.umich.edu/SED/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiansheng Wu
  - School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing, China
  - Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing, China
- Ben Liu
  - School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
- Wallace K B Chan
  - Department of Pharmacology, University of Michigan, Ann Arbor, MI, USA
- Weijian Wu
  - College of Computer and Information, Hohai University, Nanjing, China
- Tao Pang
  - Jiangsu Key Laboratory of Drug Screening, China Pharmaceutical University, Nanjing, China
- Haifeng Hu
  - School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
- Shancheng Yan
  - School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing, China
  - Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing, China
- Xiaoyan Ke
  - Child Mental Health Research Center, Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
22
Exploring the Potential of Spherical Harmonics and PCVM for Compounds Activity Prediction. Int J Mol Sci 2019; 20:ijms20092175. [PMID: 31052500 PMCID: PMC6539940 DOI: 10.3390/ijms20092175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/21/2019] [Revised: 04/14/2019] [Accepted: 04/29/2019] [Indexed: 01/11/2023]
Abstract
Biologically active chemical compounds may provide remedies for several diseases. Meanwhile, Machine Learning techniques applied to Drug Discovery, which are cheaper and faster than wet-lab experiments, have the capability to more effectively identify molecules with the expected pharmacological activity. Therefore, it is urgent and essential to develop more representative descriptors and reliable classification methods to accurately predict molecular activity. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics fed into a Probabilistic Classification Vector Machines classifier, namely SHPCVM, for the compound activity prediction task. We make use of representation learning to acquire the features which describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests are performed on twenty-one G protein-coupled receptors (GPCRs). Experimental outcomes (accuracy of 0.86), assessed by classification accuracy, precision, recall, Matthews’ correlation coefficient and Cohen’s kappa, reveal that our relatively short Spherical Harmonics-based representation combined with Probabilistic Classification Vector Machines can achieve very satisfactory performance for GPCRs.
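The five evaluation measures named in this abstract can all be derived from one binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not results from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, MCC and Cohen's kappa from confusion counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc, "kappa": kappa}


# Hypothetical counts for one receptor: 100 compounds, 45 true actives.
print(binary_metrics(tp=40, fp=10, fn=5, tn=45))
```

Unlike plain accuracy, MCC and kappa stay informative when actives and inactives are imbalanced, which is common in bioactivity datasets.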
23
Jabeen A, Ranganathan S. Applications of machine learning in GPCR bioactive ligand discovery. Curr Opin Struct Biol 2019; 55:66-76. [PMID: 31005679 DOI: 10.1016/j.sbi.2019.03.022] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/05/2019] [Revised: 03/14/2019] [Accepted: 03/14/2019] [Indexed: 12/17/2022]
Abstract
GPCRs constitute the largest druggable family, with targets for 475 Food and Drug Administration (FDA)-approved drugs. As GPCRs are of great interest to the pharmaceutical industry, enormous efforts are being expended to find relevant and potent GPCR ligands as lead compounds. There are tens of millions of compounds present in different chemical databases. To scan this immense chemical space, computational methods, especially machine learning (ML) methods, are essential components of GPCR drug discovery pipelines. ML approaches have applications in both ligand-based and structure-based virtual screening. We present here a cheminformatics overview of ML applications to different stages of GPCR drug discovery. Focusing on olfactory receptors, which are the largest family of GPCRs, a case study for predicting agonists for an ectopic olfactory receptor, OR1G1, compares four classical ML methods.
Affiliation(s)
- Amara Jabeen
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
| | - Shoba Ranganathan
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
| |
|
24
|
Kensert A, Alvarsson J, Norinder U, Spjuth O. Evaluating parameters for ligand-based modeling with random forest on sparse data sets. J Cheminform 2018; 10:49. [PMID: 30306349 PMCID: PMC6755600 DOI: 10.1186/s13321-018-0304-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 05/17/2018] [Accepted: 10/03/2018] [Indexed: 11/10/2022] Open
Abstract
Ligand-based predictive modeling is widely used to generate predictive models that aid decision making in, e.g., drug discovery projects. With growing data sets and requirements on low modeling time comes the necessity to analyze data sets efficiently to support rapid and robust modeling. In this study we analyzed four data sets and studied the efficiency of machine learning methods on sparse data structures, utilizing Morgan fingerprints of different radii and hash sizes, and compared them with the molecular signatures descriptor at different heights. We specifically evaluated the effect these parameters had on modeling time, predictive performance, and memory requirements using two implementations of random forest: Scikit-learn and FEST. We also compared with a support vector machine implementation. Our results showed that unhashed fingerprints yield significantly better accuracy than hashed fingerprints ([Formula: see text]), with no pronounced deterioration in modeling time and memory usage. Furthermore, the fast execution and low memory usage of the FEST algorithm suggest that it is a good alternative for large, high-dimensional sparse data. Support vector machines and random forest performed equally well, but results indicate that the support vector machine was better at using the extra information from larger values of the Morgan fingerprint's radius.
Affiliation(s)
- Alexander Kensert
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Jonathan Alvarsson
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Ulf Norinder
- Unit of Toxicology Sciences, Karolinska Institutet, Swetox, Forskargatan 20, SE-15136, Södertälje, Sweden; Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07, Kista, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| |
|
25
|
Bushdid C, de March CA, Fiorucci S, Matsunami H, Golebiowski J. Agonists of G-Protein-Coupled Odorant Receptors Are Predicted from Chemical Features. J Phys Chem Lett 2018; 9:2235-2240. [PMID: 29648835 PMCID: PMC7294703 DOI: 10.1021/acs.jpclett.8b00633] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 05/08/2023]
Abstract
Predicting the activity of chemicals for a given odorant receptor is a longstanding challenge. Here the activity of 258 chemicals on the human G-protein-coupled odorant receptor (OR)51E1, also known as prostate-specific G-protein-coupled receptor 2 (PSGR2), was virtually screened by machine learning using 4884 chemical descriptors as input. A systematic control by functional in vitro assays revealed that a support vector machine algorithm accurately predicted the activity of a screened library. It allowed us to identify two novel agonists in vitro for OR51E1. The transferability of the protocol was assessed on the OR1A1, OR2W1, and MOR256-3 odorant receptors, and, in each case, novel agonists were identified with a hit rate of 39-50%. We further show how ligands' efficacy is encoded into residues within the OR51E1 cavity using a molecular modeling protocol. Our approach allows widening the chemical spaces associated with odorant receptors. This machine-learning protocol based on chemical features thus represents an efficient tool for screening ligands for G-protein-coupled odorant receptors that modulate non-olfactory functions or, upon combinatorial activation, give rise to our sense of smell.
Affiliation(s)
- C. Bushdid
- Institute of Chemistry of Nice, UMR CNRS 7272, Université Côte d’Azur, Nice, France
| | - C. A. de March
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, United States
| | - S. Fiorucci
- Institute of Chemistry of Nice, UMR CNRS 7272, Université Côte d’Azur, Nice, France
| | - H. Matsunami
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, United States
- Department of Neurobiology and Duke Institute for Brain Sciences, Duke University, Durham, North Carolina 27710, United States
- Corresponding Authors: J.G., H.M.
| | - J. Golebiowski
- Institute of Chemistry of Nice, UMR CNRS 7272, Université Côte d’Azur, Nice, France
- Department of Brain & Cognitive Sciences, DGIST, Daegu, Republic of Korea
- Corresponding Authors: J.G., H.M.
| |
|