1. Lin X, Quan Z, Wang ZJ, Guo Y, Zeng X, Yu PS. Effectively Identifying Compound-Protein Interaction Using Graph Neural Representation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:932-943. [PMID: 35951570 DOI: 10.1109/tcbb.2022.3198003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Effectively identifying compound-protein interactions (CPIs) is crucial for new drug design, which is an important step in in silico drug discovery. Current machine learning methods for CPI prediction mainly use one-dimensional (1D) compound/protein strings and/or specific descriptors. However, they often ignore the fact that molecules are essentially modeled by the molecular graph. We observe that in real-world scenarios, the topological structure information of the molecular graph usually provides an overview of how the atoms are connected, and the local chemical context reveals the functionality of the protein sequence in CPI. These two types of information are complementary to each other and they are both significant for modeling compound-protein pairs. Motivated by this, we propose an end-to-end deep learning framework named GraphCPI, which captures the structural information of compounds and leverages the chemical context of protein sequences for solving the CPI prediction task. Our framework can integrate any popular graph neural network for learning compounds, and it combines this with a convolutional neural network for embedding sequences. To compare our method with classic and state-of-the-art deep learning methods, we conduct extensive experiments based on several widely-used CPI datasets. The experimental results show the feasibility and competitiveness of our proposed method.
2. Patel D, Ono SK, Bassit L, Verma K, Amblard F, Schinazi RF. Assessment of a Computational Approach to Predict Drug Resistance Mutations for HIV, HBV and SARS-CoV-2. Molecules 2022; 27:5413. [PMID: 36080181 PMCID: PMC9457688 DOI: 10.3390/molecules27175413] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/28/2022]
Abstract
Viral resistance is a worldwide problem mitigating the effectiveness of antiviral drugs. Mutations in the drug-targeting proteins are the primary mechanism for the emergence of drug resistance. It is essential to identify the drug resistance mutations to elucidate the mechanism of resistance and to suggest promising treatment strategies to counter the drug resistance. However, experimental identification of drug resistance mutations is challenging, laborious and time-consuming. Hence, effective and time-saving computational structure-based approaches for predicting drug resistance mutations are essential and are of high interest in drug discovery research. However, these approaches are dependent on accurate estimation of binding free energies which indirectly correlate to the computational cost. Towards this goal, we developed a computational workflow to predict drug resistance mutations for any viral proteins where the structure is known. This approach can qualitatively predict the change in binding free energies due to mutations through residue scanning and Prime MM-GBSA calculations. To test the approach, we predicted resistance mutations in HIV-RT selected by (-)-FTC and demonstrated accurate identification of the clinical mutations. Furthermore, we predicted resistance mutations in HBV core protein for GLP-26 and in SARS-CoV-2 3CLpro for nirmatrelvir. Mutagenesis experiments were performed on two predicted resistance and three predicted sensitivity mutations in HBV core protein for GLP-26, corroborating the accuracy of the predictions.
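As a rough illustration of the residue-scanning idea in this abstract, the change in binding free energy can be written as ΔΔG = ΔG(mutant) − ΔG(wild type) and used to label each mutation. The sketch below is not the authors' workflow: the function name, the zero cutoff, the sign convention (positive ΔΔG means weaker drug binding) and the energy values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def rank_by_ddg(
    dg_wild_type: float,
    dg_mutants: Dict[str, float],
    cutoff: float = 0.0,  # assumed decision boundary, not from the paper
) -> List[Tuple[str, float, str]]:
    """Return (mutation, ddG, label) tuples, largest ddG first."""
    rows = []
    for mutation, dg_mut in dg_mutants.items():
        # Positive ddG: the drug binds the mutant less favorably.
        ddg = dg_mut - dg_wild_type
        label = "resistance" if ddg > cutoff else "sensitivity"
        rows.append((mutation, ddg, label))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical binding energies in kcal/mol; "mutA" and "mutB" are placeholders.
example = rank_by_ddg(-60.0, {"mutA": -52.5, "mutB": -61.0})
for mutation, ddg, label in example:
    print(f"{mutation}: ddG = {ddg:+.1f} kcal/mol -> predicted {label}")
```

Because the abstract describes the predictions as qualitative, only the sign and ranking of ΔΔG are used here, not its magnitude.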
Affiliation(s)
- Dharmeshkumar Patel
  - Center for ViroScience and Cure, Laboratory of Biochemical Pharmacology, Department of Pediatrics, Emory University School of Medicine and Children’s Healthcare of Atlanta, 1760 Haygood Dr., Atlanta, GA 30322, USA
- Suzane K. Ono
  - Center for ViroScience and Cure, Laboratory of Biochemical Pharmacology, Department of Pediatrics, Emory University School of Medicine and Children’s Healthcare of Atlanta, 1760 Haygood Dr., Atlanta, GA 30322, USA
  - Department of Gastroenterology, University of São Paulo School of Medicine, Av. Dr. Arnaldo, 455, São Paulo 05403-000, SP, Brazil
- Leda Bassit
  - Center for ViroScience and Cure, Laboratory of Biochemical Pharmacology, Department of Pediatrics, Emory University School of Medicine and Children’s Healthcare of Atlanta, 1760 Haygood Dr., Atlanta, GA 30322, USA
- Kiran Verma
  - Center for ViroScience and Cure, Laboratory of Biochemical Pharmacology, Department of Pediatrics, Emory University School of Medicine and Children’s Healthcare of Atlanta, 1760 Haygood Dr., Atlanta, GA 30322, USA
- Franck Amblard
  - Center for ViroScience and Cure, Laboratory of Biochemical Pharmacology, Department of Pediatrics, Emory University School of Medicine and Children’s Healthcare of Atlanta, 1760 Haygood Dr., Atlanta, GA 30322, USA
- Raymond F. Schinazi
  - Center for ViroScience and Cure, Laboratory of Biochemical Pharmacology, Department of Pediatrics, Emory University School of Medicine and Children’s Healthcare of Atlanta, 1760 Haygood Dr., Atlanta, GA 30322, USA
3. Pikalyova K, Orlov A, Lin A, Tarasova O, Marcou M, Horvath D, Poroikov V, Varnek A. HIV-1 drug resistance profiling using amino acid sequence space cartography. Bioinformatics 2022; 38:2307-2314. [PMID: 35157024 DOI: 10.1093/bioinformatics/btac090] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/13/2021] [Revised: 01/03/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Human immunodeficiency virus (HIV) drug resistance is a global healthcare issue. The emergence of drug resistance influenced the efficacy of treatment regimens, thus stressing the importance of treatment adaptation. Computational methods predicting the drug resistance profile from genomic data of HIV isolates are advantageous for monitoring drug resistance in patients. However, existing computational methods for drug resistance prediction are either not suitable for emerging HIV strains with complex mutational patterns or lack interpretability, which is of paramount importance in clinical practice. The approach reported here overcomes these limitations and combines high accuracy of predictions and interpretability of the models. RESULTS In this work, a new methodology based on generative topographic mapping (GTM) for biological sequence space representation and quantitative genotype-phenotype relationships prediction purposes was introduced. The GTM-based resistance landscapes allowed us to predict the resistance of HIV strains based on sequencing and drug resistance data for three viral proteins [integrase (IN), protease (PR) and reverse transcriptase (RT)] from Stanford HIV drug resistance database. The average balanced accuracy for PR inhibitors was 0.89 ± 0.01, for IN inhibitors 0.85 ± 0.01, for non-nucleoside RT inhibitors 0.73 ± 0.01 and for nucleoside RT inhibitors 0.84 ± 0.01. We have demonstrated in several case studies that GTM-based resistance landscapes are useful for visualization and analysis of sequence space as well as for treatment optimization purposes. Here, GTMs were applied for the in-depth analysis of the relationships between mutation pattern and drug resistance using mutation landscapes. This allowed us to predict retrospectively the importance of the presence of particular mutations (e.g. V32I, L10F and L33F in HIV PR) for the resistance development. 
This study highlights some perspectives of GTM applications in clinical informatics and particularly in the field of sequence space exploration. AVAILABILITY AND IMPLEMENTATION https://github.com/karinapikalyova/ISIDASeq. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
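The balanced accuracy values quoted in this abstract are the mean of sensitivity and specificity. A minimal sketch of that metric follows, assuming binary labels with 1 for resistant and 0 for susceptible; the label encoding and the toy data are assumptions, not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = resistant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on resistant strains
    specificity = tn / (tn + fp)  # recall on susceptible strains
    return (sensitivity + specificity) / 2


# Toy labels: 4 resistant and 2 susceptible isolates.
score = balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
print(score)  # 0.625 (sensitivity 0.75, specificity 0.5)
```

Unlike plain accuracy, this metric is not inflated when one class dominates, which is common in resistance datasets.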
Affiliation(s)
- Karina Pikalyova
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, Strasbourg 67000, France
- Alexey Orlov
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, Strasbourg 67000, France
- Arkadii Lin
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, Strasbourg 67000, France
- Olga Tarasova
  - Institute of Biomedical Chemistry, Moscow 119121, Russia
- Gilles Marcou
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, Strasbourg 67000, France
- Dragos Horvath
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, Strasbourg 67000, France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, Strasbourg 67000, France
4. He S, Leanse LG, Feng Y. Artificial intelligence and machine learning assisted drug delivery for effective treatment of infectious diseases. Adv Drug Deliv Rev 2021; 178:113922. [PMID: 34461198 DOI: 10.1016/j.addr.2021.113922] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/01/2021] [Revised: 07/14/2021] [Accepted: 08/09/2021] [Indexed: 12/23/2022]
Abstract
In the era of antimicrobial resistance, the prevalence of multidrug-resistant microorganisms that resist conventional antibiotic treatment has steadily increased. Thus, it is now unquestionable that infectious diseases are significant global burdens that urgently require innovative treatment strategies. Emerging studies have demonstrated that artificial intelligence (AI) can transform drug delivery to promote effective treatment of infectious diseases. In this review, we propose to evaluate the significance, essential principles, and popular tools of AI in drug delivery for infectious disease treatment. Specifically, we will focus on the achievements and key findings of current research, as well as the applications of AI on drug delivery throughout the whole antimicrobial treatment process, with an emphasis on drug development, treatment regimen optimization, drug delivery system and administration route design, and drug delivery outcome prediction. To that end, the challenges of AI in drug delivery for infectious disease treatments and their current solutions and future perspective will be presented and discussed.
Affiliation(s)
- Sheng He
  - Boston Children's Hospital, Harvard Medical School, Harvard University, Boston, MA, USA
- Leon G Leanse
  - Massachusetts General Hospital, Harvard Medical School, Harvard University, Boston, MA, USA
- Yanfang Feng
  - Massachusetts General Hospital, Harvard Medical School, Harvard University, Boston, MA, USA
5. Cai Q, Yuan R, He J, Li M, Guo Y. Predicting HIV drug resistance using weighted machine learning method at target protein sequence-level. Mol Divers 2021; 25:1541-1551. [PMID: 34241771 DOI: 10.1007/s11030-021-10262-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/11/2021] [Accepted: 06/19/2021] [Indexed: 11/29/2022]
Abstract
Acquired immune deficiency syndrome (AIDS) is a fatal disease caused by human immunodeficiency virus (HIV). Although 23 different drugs are available, the treatment of AIDS remains challenging because the virus mutates very quickly, which can lead to drug resistance. Therefore, predicting drug resistance before treatment is crucial for individual treatments. Here, based on HIV target protein sequence information, we analyzed resistance to 21 drugs caused by mutated residues using machine learning (ML) methods. To transform target sequences into numeric vectors, seven physicochemical properties were used, which can well represent the interacting characteristics of target proteins. Then, principal component analysis (PCA) was adopted to reduce the feature dimensionality. Random forest (RF) and support vector machine (SVM) models based on three different kernel functions, namely linear, polynomial and radial basis function (RBF), were all employed. By comparison, we found that the RBF-based SVM method performs comparably to the RF model. Further, we added weight information to the RBF-based SVM method using four different weight evaluation methods: RF, eXtreme Gradient Boosting (XGB), CfsSubsetEval and ReliefFAttributeEval. Results show that the RF-weighted RBF-based SVM yields the best performance: 13 of the 21 drug models achieve correlation coefficients (R²) above 0.8, and 3 of them exceed 0.9. Finally, position-specific importance analysis indicates that most of the mutated residues with high RF weight scores are closely related to drug resistance, as revealed in previous reports. Overall, we expect that this method can be a supplementary tool for predicting HIV drug resistance for newly discovered mutations.
Here, based on HIV target protein sequence information, we analyzed 21-drug resistance caused by mutated residues using machine learning (ML) methods by fusing the weight information of different mutation positions.
Affiliation(s)
- Qihang Cai
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Rongao Yuan
  - College of Computer Science, Sichuan University, Chengdu, 610064, China
- Jian He
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
6. Tarasova O, Poroikov V. Machine Learning in Discovery of New Antivirals and Optimization of Viral Infections Therapy. Curr Med Chem 2021; 28:7840-7861. [PMID: 33949929 DOI: 10.2174/0929867328666210504114351] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Revised: 02/13/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]
Abstract
Nowadays, computational approaches play an important role in the design of new drug-like compounds and optimization of pharmacotherapeutic treatment of diseases. The emerging growth of viral infections, including those caused by the Human Immunodeficiency Virus (HIV), Ebola virus, recently detected coronavirus, and some others, leads to many newly infected people with a high risk of death or severe complications. A huge amount of chemical, biological, clinical data is at the disposal of the researchers. Therefore, there are many opportunities to find the relationships between the particular features of chemical data and the antiviral activity of biologically active compounds based on machine learning approaches. Biological and clinical data can also be used for building models to predict relationships between viral genotype and drug resistance, which might help determine the clinical outcome of treatment. In the current study, we consider machine-learning approaches in the antiviral research carried out during the past decade. We overview in detail the application of machine-learning methods for the design of new potential antiviral agents and vaccines, drug resistance prediction, and analysis of virus-host interactions. Our review also covers the perspectives of using the machine-learning approaches for antiviral research, including Dengue, Ebola viruses, Influenza A, Human Immunodeficiency Virus, coronaviruses, and some others.
Affiliation(s)
- Olga Tarasova
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
- Vladimir Poroikov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
7. Alves NG, Mata AI, Luís JP, Brito RMM, Simões CJV. An Innovative Sequence-to-Structure-Based Approach to Drug Resistance Interpretation and Prediction: The Use of Molecular Interaction Fields to Detect HIV-1 Protease Binding-Site Dissimilarities. Front Chem 2020; 8:243. [PMID: 32411655 PMCID: PMC7202381 DOI: 10.3389/fchem.2020.00243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/06/2019] [Accepted: 03/13/2020] [Indexed: 12/15/2022]
Abstract
In silico methodologies have opened new avenues of research to understanding and predicting drug resistance, a pressing health issue that keeps rising at alarming pace. Sequence-based interpretation systems are routinely applied in clinical context in an attempt to predict mutation-based drug resistance and thus aid the choice of the most adequate antibiotic and antiviral therapy. An important limitation of approaches based on genotypic data exclusively is that mutations are not considered in the context of the three-dimensional (3D) structure of the target. Structure-based in silico methodologies are inherently more suitable to interpreting and predicting the impact of mutations on target-drug interactions, at the cost of higher computational and time demands when compared with sequence-based approaches. Herein, we present a fast, computationally inexpensive, sequence-to-structure-based approach to drug resistance prediction, which makes use of 3D protein structures encoded by input target sequences to draw binding-site comparisons with susceptible templates. Rather than performing atom-by-atom comparisons between input target and template structures, our workflow generates and compares Molecular Interaction Fields (MIFs) that map the areas of energetically favorable interactions between several chemical probe types and the target binding site. Quantitative, pairwise dissimilarity measurements between the target and the template binding sites are thus produced. The method is particularly suited to understanding changes to the 3D structure and the physicochemical environment introduced by mutations into the target binding site. Furthermore, the workflow relies exclusively on freeware, making it accessible to anyone. Using four datasets of known HIV-1 protease sequences as a case-study, we show that our approach is capable of correctly classifying resistant and susceptible sequences given as input. 
Guided by ROC curve analyses, we fine-tuned a dissimilarity threshold of classification that results in remarkable discriminatory performance (accuracy ≈ ROC AUC ≈ 0.99), illustrating the high potential of sequence-to-structure-, MIF-based approaches in the context of drug resistance prediction. We discuss the complementarity of the proposed methodology to existing prediction algorithms based on genotypic data. The present work represents a new step toward a more comprehensive and structurally-informed interpretation of the impact of genetic variability on the response to HIV-1 therapies.
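To make the thresholding step in this abstract concrete, the sketch below computes a ROC AUC from dissimilarity scores and picks a classification threshold. It is an illustration only: the abstract does not say which criterion guided the threshold, so the use of Youden's J (TPR minus FPR) is an assumption, as are the function names and the toy scores.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (resistant, susceptible) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(scores, labels):
    """Threshold maximizing Youden's J; a sequence is called resistant if score >= t."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg  # TPR - FPR at this threshold
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Hypothetical binding-site dissimilarities (higher = less like the susceptible template).
scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0]  # 1 = resistant, 0 = susceptible
auc = roc_auc(scores, labels)
threshold = best_threshold(scores, labels)
print(f"AUC = {auc:.3f}, threshold = {threshold}")
```

With these toy scores one resistant sequence falls below a susceptible one, so the AUC is 11/12 rather than 1.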
Affiliation(s)
- Nuno G Alves
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
- Ana I Mata
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
- João P Luís
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
- Rui M M Brito
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
  - BSIM Therapeutics, Instituto Pedro Nunes, Coimbra, Portugal
- Carlos J V Simões
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
  - BSIM Therapeutics, Instituto Pedro Nunes, Coimbra, Portugal
8. Lin X, Quan Z, Wang ZJ, Huang H, Zeng X. A novel molecular representation with BiGRU neural networks for learning atom. Brief Bioinform 2019; 21:2099-2111. [DOI: 10.1093/bib/bbz125] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 06/21/2019] [Revised: 08/15/2019] [Accepted: 08/31/2019] [Indexed: 12/20/2022]
Abstract
Molecular representations play critical roles in research on drug design and drug properties, and effective methods help with computation on molecules and with solving related problems in drug discovery. In previous years, most traditional molecular representations were based on hand-crafted features and relied heavily on biological experimentation, which is often costly and time consuming. However, recent research has achieved promising results using machine learning in various domains. In this article, we present a novel method named Smi2Vec-BiGRU that is designed for learning atoms and solving the single- and multitask binary classification problems in the field of drug discovery, which are the basic and also key problems in this field. Specifically, our approach transforms the molecule data in the SMILES format into a set of sample vectors and then feeds them into bidirectional gated recurrent unit neural networks for training, which learn low-dimensional vector representations for drug molecules. We conduct extensive experiments on several widely used benchmarks including Tox21, SIDER and ClinTox. The experimental results show that our approach can achieve state-of-the-art performance on these benchmarking datasets, demonstrating the feasibility and competitiveness of our proposed approach.
Affiliation(s)
- Xuan Lin
  - College of Computer Science and Technology, Hunan University, Changsha, 410082, China
- Zhe Quan
  - College of Computer Science and Technology, Hunan University, Changsha, 410082, China
- Zhi-Jie Wang
  - College of Computer Science and Technology, Hunan University, Changsha, 410082, China
- Huang Huang
  - College of Computer, National University of Defense Technology, Changsha, 410073, China
- Xiangxiang Zeng
  - College of Computer Science and Technology, Hunan University, Changsha, 410082, China
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510275, China
9. Ramon E, Belanche-Muñoz L, Pérez-Enciso M. HIV drug resistance prediction with weighted categorical kernel functions. BMC Bioinformatics 2019; 20:410. [PMID: 31362714 PMCID: PMC6668108 DOI: 10.1186/s12859-019-2991-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/28/2019] [Accepted: 07/11/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to the drug treatment. Predicting drug resistance to previously unobserved variants is therefore very important for an optimum medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and are able to take into account HIV data particularities, such as allele mixtures, and to weigh the different importance of each protein residue, as it is known that not all positions contribute equally to the resistance. RESULTS We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known noncategorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, where the weights were obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, either in its weighted or unweighted form, for 20 out of the 21 drugs. CONCLUSIONS Results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently result in the best prediction model. The advantage of including weights depended on the protein targeted by the drug. In the case of reverse transcriptase, weights based on the relative importance of each position clearly increased the prediction performance, while the improvement in the protease was much smaller. This seems to be related to the distribution of weights, as measured by the Gini index.
All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies_ramon/catkern.
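The two categorical kernels named in this abstract can be sketched in a few lines. This is a simplified illustration, not the code from the linked repository: the exact formulation in the paper may differ (for example, it may apply a further transform to these similarities), and the function names, the set-per-position encoding of allele mixtures and the example sequences are assumptions.

```python
def overlap_kernel(x, y):
    """Fraction of positions where two equal-length residue strings agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def jaccard_kernel(x, y, weights=None):
    """Weighted mean per-position Jaccard index.

    x and y are equal-length lists of sets; each set holds the alleles
    observed at that position, so a mixture is a set with several residues.
    weights are assumed to sum to 1 (uniform if omitted).
    """
    if weights is None:
        weights = [1 / len(x)] * len(x)
    return sum(w * len(a & b) / len(a | b) for w, a, b in zip(weights, x, y))


# Hypothetical four-position sequences; position 2 of x and position 4 of y are mixtures.
x = [{"L"}, {"K", "R"}, {"V"}, {"M"}]
y = [{"L"}, {"K"}, {"I"}, {"M", "V"}]

print(overlap_kernel("LKVM", "LKIM"))                      # 0.75
print(jaccard_kernel(x, y))                                # 0.5
print(jaccard_kernel(x, y, weights=[0.5, 0.5, 0.0, 0.0]))  # 0.75
```

The weighted call shows how position weights, which the abstract says were derived from the RF decrease in node impurity, shift the similarity toward the positions judged most important.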
Affiliation(s)
- Elies Ramon
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB Consortium, Campus UAB, 08193 Bellaterra, Barcelona, Spain
- Lluís Belanche-Muñoz
  - Computer Science Department, Technical University of Catalonia, Carrer de Jordi Girona 1-3, 08034, Barcelona, Spain
- Miguel Pérez-Enciso
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB Consortium, Campus UAB, 08193 Bellaterra, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, 08010, Barcelona, Spain