1. Weckbecker M, Anžel A, Yang Z, Hattab G. Interpretable molecular encodings and representations for machine learning tasks. Comput Struct Biotechnol J 2024; 23:2326-2336. [PMID: 38867722 PMCID: PMC11167246 DOI: 10.1016/j.csbj.2024.05.035] [Received: 03/27/2024] [Revised: 05/13/2024] [Accepted: 05/19/2024]
Abstract
Molecular encodings and their usage in machine learning models have demonstrated significant breakthroughs in biomedical applications, particularly in the classification of peptides and proteins. To this end, we propose a new encoding method: Interpretable Carbon-based Array of Neighborhoods (iCAN). Designed to address machine learning models' need for more structured and less flexible input, it captures the neighborhoods of carbon atoms in a counting array and improves the utility of the resulting encodings for machine learning models. The iCAN method provides interpretable molecular encodings and representations, enabling the comparison of molecular neighborhoods, identification of repeating patterns, and visualization of relevance heat maps for a given data set. When reproducing a large biomedical peptide classification study, it outperforms its predecessor encoding. When extended to proteins, it outperforms a lead structure-based encoding on 71% of the data sets. Our method offers interpretable encodings that can be applied to all organic molecules, including exotic amino acids, cyclic peptides, and larger proteins, making it highly versatile across various domains and data sets. This work establishes a promising new direction for machine learning in peptide and protein classification in biomedicine and healthcare, potentially accelerating advances in drug discovery and disease diagnosis.
Affiliation(s)
- Moritz Weckbecker: Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Aleksandar Anžel: Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Zewen Yang: Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Georges Hattab: Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany; Department of Mathematics and Computer Science, Freie Universität, Arnimallee 14, 14195 Berlin, Germany
2. Regueiro-Ren A, Sit SY, Chen Y, Chen J, Swidorski JJ, Liu Z, Venables BL, Sin N, Hartz RA, Protack T, Lin Z, Zhang S, Li Z, Wu DR, Li P, Kempson J, Hou X, Gupta A, Rampulla R, Mathur A, Park H, Sarjeant A, Benitex Y, Rahematpura S, Parker D, Phillips T, Haskell R, Jenkins S, Santone KS, Cockett M, Hanumegowda U, Dicker I, Meanwell NA, Krystal M. The Discovery of GSK3640254, a Next-Generation Inhibitor of HIV-1 Maturation. J Med Chem 2022; 65:11927-11948. [PMID: 36044257 DOI: 10.1021/acs.jmedchem.2c00879]
Abstract
GSK3640254 is an HIV-1 maturation inhibitor (MI) that exhibits significantly improved antiviral activity toward a range of clinically relevant polymorphic variants with reduced sensitivity toward the second-generation MI GSK3532795 (BMS-955176). The key structural difference between GSK3640254 and its predecessor is the replacement of the para-substituted benzoic acid moiety attached at the C-3 position of the triterpenoid core with a cyclohex-3-ene-1-carboxylic acid substituted with a CH2F moiety at the carbon atom α- to the pharmacophoric carboxylic acid. This structural element provided a new vector with which to explore structure-activity relationships (SARs) and led to compounds with improved polymorphic coverage while preserving pharmacokinetic (PK) properties. The approach to the design of GSK3640254, the development of a synthetic route and its preclinical profile are discussed. GSK3640254 is currently in phase IIb clinical trials after demonstrating a dose-related reduction in HIV-1 viral load over 7-10 days of dosing to HIV-1-infected subjects.
Affiliation(s)
- Alicia Regueiro-Ren: Small Molecule Drug Discovery, Bristol Myers Squibb Research and Early Development, Princeton, New Jersey 08543, United States
- Sing-Yuen Sit: Department of Discovery Chemistry, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Yan Chen: Department of Discovery Chemistry, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Jie Chen: Department of Discovery Chemistry, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Jacob J Swidorski: Department of Discovery Chemistry, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Zheng Liu: Department of Discovery Chemistry, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Brian L Venables: Department of Discovery Chemistry, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Ny Sin: Department of Discovery Chemistry, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Richard A Hartz: Department of Discovery Chemistry, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Tricia Protack: Department of Virology, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Zeyu Lin: Department of Virology, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Sharon Zhang: Department of Virology, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Zhufang Li: Department of Virology, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Dauh-Rurng Wu: Department of Discovery Synthesis, Bristol Myers Squibb Research and Early Development, PO Box 4000, Princeton, New Jersey 08543, United States
- Peng Li: Department of Discovery Synthesis, Bristol Myers Squibb Research and Early Development, PO Box 4000, Princeton, New Jersey 08543, United States
- James Kempson: Department of Discovery Synthesis, Bristol Myers Squibb Research and Early Development, PO Box 4000, Princeton, New Jersey 08543, United States
- Xiaoping Hou: Department of Discovery Synthesis, Bristol Myers Squibb Research and Early Development, PO Box 4000, Princeton, New Jersey 08543, United States
- Anuradha Gupta: Department of Discovery Synthesis, Bristol Myers Squibb Research and Early Development, Bangalore 560099, India
- Richard Rampulla: Department of Discovery Synthesis, Bristol Myers Squibb Research and Early Development, PO Box 4000, Princeton, New Jersey 08543, United States
- Arvind Mathur: Department of Discovery Synthesis, Bristol Myers Squibb Research and Early Development, PO Box 4000, Princeton, New Jersey 08543, United States
- Hyunsoo Park: Bristol Myers Squibb Chemical and Synthetic Development, New Brunswick, New Jersey 08901, United States
- Amy Sarjeant: Bristol Myers Squibb Chemical and Synthetic Development, New Brunswick, New Jersey 08901, United States
- Yulia Benitex: Department of Pharmaceutical Candidate Optimization, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Sandhya Rahematpura: Department of Pharmaceutical Candidate Optimization, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Dawn Parker: Department of Pharmaceutical Candidate Optimization, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Thomas Phillips: Department of Pharmaceutical Candidate Optimization, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Roy Haskell: Department of Pharmaceutical Candidate Optimization, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Susan Jenkins: Department of Pharmaceutical Candidate Optimization, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Kenneth S Santone: Department of Pharmaceutical Candidate Optimization, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Mark Cockett: Department of Virology, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Umesh Hanumegowda: Department of Pharmaceutical Candidate Optimization, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Ira Dicker: Department of Virology, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
- Nicholas A Meanwell: Small Molecule Drug Discovery, Bristol Myers Squibb Research and Early Development, Princeton, New Jersey 08543, United States
- Mark Krystal: Department of Virology, Bristol Myers Squibb Research and Early Development, 5 Research Parkway, Wallingford, Connecticut 06492, United States
3. Brand L, Yang X, Liu K, Elbeleidy S, Wang H, Zhang H, Nie F. Learning Robust Multilabel Sample Specific Distances for Identifying HIV-1 Drug Resistance. J Comput Biol 2019; 27:655-672. [PMID: 31725323 DOI: 10.1089/cmb.2019.0329]
Abstract
AIDS is a syndrome caused by HIV. During the progression of AIDS, a patient's immune system is weakened, which increases the patient's susceptibility to infections and diseases. Although antiretroviral drugs can effectively suppress HIV, the virus mutates very quickly and can become resistant to treatment. In addition, through mutations the virus can become resistant to other treatments not currently being used, which is known in the clinical research community as cross-resistance. Since a single HIV strain can be resistant to multiple drugs, this problem is naturally represented as a multilabel classification problem. Given this multilabel relationship, traditional single-label classification methods often fail to effectively identify the drug resistances that may develop after a particular virus mutation. In this work, we propose a novel multilabel Robust Sample Specific Distance (RSSD) method to identify multiclass HIV drug resistance. Our method is novel in that it can illustrate the relative strength of the drug resistance of a reverse transcriptase (RT) sequence against a given nucleoside analog drug and learn the distance metrics for all the drug resistances. To learn the proposed RSSDs, we formulate a learning objective that maximizes the ratio of the summations of a number of ℓ1-norm distances, which is difficult to solve in general. To solve this optimization problem, we derive an efficient, nongreedy iterative algorithm with rigorously proved convergence. Our new method has been verified on a public HIV type 1 drug resistance data set with over 600 RT sequences and five nucleoside analogs. We compared our method against several state-of-the-art multilabel classification methods, and the experimental results have demonstrated the effectiveness of our proposed method.
Affiliation(s)
- Lodewijk Brand: Department of Computer Science, Colorado School of Mines, Golden, Colorado
- Xue Yang: Department of Computer Science, Colorado School of Mines, Golden, Colorado
- Kai Liu: Department of Computer Science, Colorado School of Mines, Golden, Colorado
- Saad Elbeleidy: Department of Computer Science, Colorado School of Mines, Golden, Colorado
- Hua Wang: Department of Computer Science, Colorado School of Mines, Golden, Colorado
- Hao Zhang: Department of Computer Science, Colorado School of Mines, Golden, Colorado
- Feiping Nie: School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an, P.R. China
4. Tarasova OA, Filimonov DA, Poroikov VV. [Computational prediction of human immunodeficiency resistance to reverse transcriptase inhibitors]. Biomeditsinskaia Khimiia 2019; 63:457-460. [PMID: 29080881 DOI: 10.18097/pbmc20176305457]
Abstract
Human immunodeficiency virus (HIV) causes acquired immunodeficiency syndrome (AIDS) and leads to over one million deaths annually. Highly active antiretroviral treatment (HAART) is the gold standard in HIV/AIDS therapy. Nucleoside and non-nucleoside inhibitors of HIV reverse transcriptase (RT) are important components of HAART, but their effect depends on HIV susceptibility or resistance. HIV resistance mainly occurs due to mutations leading to conformational changes in the three-dimensional structure of HIV RT. The aim of our work was to develop and test a computational method for predicting HIV resistance associated with mutations in HIV RT. Earlier we developed a method for predicting HIV type 1 (HIV-1) resistance based on position-specific descriptors. These descriptors are generated from a particular amino acid residue and its position, where the position of a residue is determined in a multiple alignment. The training set consisted of more than 1900 HIV RT sequences from the Stanford HIV Drug Resistance database; for these HIV RT variants, experimental data on their resistance to ten inhibitors are available. Balanced accuracy of prediction varies from 80% to 99% depending on the classification method (support vector machine, Naive Bayes, random forest, convolutional neural networks) and the drug for which resistance is predicted. Maximal balanced accuracy was obtained for prediction of resistance to zidovudine, stavudine, didanosine and efavirenz by the random forest classifier. The average prediction accuracy is 89%.
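The position-specific descriptors mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each descriptor is simply a binary indicator for one (alignment position, residue) pair; the function name `position_descriptors`, the gap symbol `-`, and the toy alignment are hypothetical and are not taken from the paper, whose actual descriptor definition may differ.

```python
from typing import List, Tuple


def position_descriptors(
    aligned_seqs: List[str],
) -> Tuple[List[Tuple[int, str]], List[List[int]]]:
    """Encode aligned sequences as binary (position, residue) indicators.

    Assumption for illustration: one feature per (1-based alignment
    position, residue) pair observed in the data; gaps ('-') are skipped.
    """
    # Collect every (position, residue) pair seen in the alignment.
    features = sorted(
        {
            (pos, aa)
            for seq in aligned_seqs
            for pos, aa in enumerate(seq, start=1)
            if aa != "-"
        }
    )
    column = {feature: j for j, feature in enumerate(features)}

    # One row per sequence, one column per (position, residue) pair.
    matrix = []
    for seq in aligned_seqs:
        row = [0] * len(features)
        for pos, aa in enumerate(seq, start=1):
            if aa != "-":
                row[column[(pos, aa)]] = 1
        matrix.append(row)
    return features, matrix


# Hypothetical three-column alignment, not real RT data.
toy_alignment = ["MKV", "MRV", "M-V"]
features, matrix = position_descriptors(toy_alignment)
```

The resulting rows could be passed to any standard classifier; which classifier and which descriptor weighting the authors used is not specified beyond what the abstract states.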
Affiliation(s)
- O A Tarasova: Institute of Biomedical Chemistry, Moscow, Russia
- V V Poroikov: Institute of Biomedical Chemistry, Moscow, Russia
5. Spänig S, Heider D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min 2019; 12:7. [PMID: 30867681 PMCID: PMC6399931 DOI: 10.1186/s13040-019-0196-x] [Received: 12/21/2018] [Accepted: 02/24/2019]
Abstract
Antimicrobial peptides (AMPs) are part of the inherent immune system. In fact, they occur in almost all organisms including, e.g., plants, animals, and humans. Remarkably, they are also effective against multi-resistant pathogens with high selectivity. This is especially crucial at a time when society is faced with the major threat of an ever-increasing number of antibiotic-resistant microbes. In addition, AMPs can also exhibit antitumor and antiviral effects, thus a variety of scientific studies dealt with the prediction of active peptides in recent years. Due to their potential, even the pharmaceutical industry is keen on discovering and developing novel AMPs. However, AMPs are difficult to verify in vitro, hence researchers conduct sequence similarity experiments against known, active peptides. Unfortunately, this approach is very time-consuming and limits potential candidates to sequences with a high similarity to known AMPs. Machine learning methods offer the opportunity to explore the huge space of sequence variations in a timely manner. These algorithms have, in principle, paved the way for an automated discovery of AMPs. However, machine learning models require a numerical input, thus an informative encoding is very important. Unfortunately, developing an appropriate encoding is a major challenge, which has not been entirely solved so far. For this reason, the development of novel amino acid encodings is established as a stand-alone research branch. The present review introduces state-of-the-art encodings of amino acids as well as their properties in sequence and structure based aggregation. Moreover, although a well-chosen encoding is essential, performant classifiers are required, which is reflected by a tendency towards specifically designed models in the literature. Furthermore, we introduce these models with a particular focus on encodings derived from support vector machines and deep learning approaches. Although a strong focus has been set on AMP predictions, not all of the mentioned encodings have been elaborated as part of antimicrobial research studies, but rather as general protein or peptide representations.
Affiliation(s)
- Sebastian Spänig: Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
- Dominik Heider: Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
6. Tarasova O, Filimonov D, Poroikov V. PASS-based approach to predict HIV-1 reverse transcriptase resistance. J Bioinform Comput Biol 2016; 15:1650040. [PMID: 28033735 DOI: 10.1142/s0219720016500402]
Abstract
HIV reverse transcriptase (RT) inhibitors targeting the early stages of virus-host interactions are of great interest to scientists. Acquired HIV RT resistance happens due to mutations in a particular region of the pol gene encoding the HIV RT amino acid sequence. We propose an application of the previously developed PASS algorithm for prediction of amino acid substitutions potentially involved in the resistance of HIV-1 based on open data. In our work, we used more than 3200 HIV-1 RT variants from the publicly available Stanford HIV RT and protease sequence database already tested for 10 anti-HIV drugs including both nucleoside and non-nucleoside RT inhibitors. We used a particular amino acid residue and its position to describe primary structure-resistance relationships. The average balanced accuracy of the prediction obtained in 20-fold cross-validation for the Phenosense dataset was about 88% and for the Antivirogram dataset was about 79%. Thus, the PASS-based algorithm may be used for prediction of the amino acid substitutions associated with the resistance of HIV-1 based on open data. The computational approach for the prediction of HIV-1 associated resistance can be useful for the selection of RT inhibitors for the treatment of HIV infected patients in the clinical practice. Prediction of the HIV-1 RT associated resistance can be useful for the development of new anti-HIV drugs active against the resistant variants of RT. Therefore, we propose that this study can be potentially useful for anti-HIV drug development.
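The balanced accuracy reported in this abstract is, in the binary case, the mean of sensitivity and specificity. The sketch below shows that standard definition only; the function name, the label convention (1 = resistant, 0 = susceptible), and the toy labels are assumptions made for illustration and do not reproduce the authors' evaluation code or data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels.

    Assumed convention for this sketch: 1 = resistant, 0 = susceptible.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of resistant variants recovered
    specificity = tn / (tn + fp)  # fraction of susceptible variants recovered
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced toy labels: 2 resistant, 6 susceptible.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
ba = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On imbalanced data such as this toy case, plain accuracy is pulled toward the majority class, which is the usual reason for preferring the balanced variant when resistant and susceptible variants are unevenly represented.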
Affiliation(s)
- Olga Tarasova: Department for Bioinformatics, Institute of Biomedical Chemistry, 10 building 8, Pogodinskaya street, 119121, Moscow, Russia
- Dmitry Filimonov: Department for Bioinformatics, Institute of Biomedical Chemistry, 10 building 8, Pogodinskaya street, 119121, Moscow, Russia
- Vladimir Poroikov: Department for Bioinformatics, Institute of Biomedical Chemistry, 10 building 8, Pogodinskaya street, 119121, Moscow, Russia
7. Riemenschneider M, Hummel T, Heider D. SHIVA - a web application for drug resistance and tropism testing in HIV. BMC Bioinformatics 2016; 17:314. [PMID: 27549230 PMCID: PMC4994198 DOI: 10.1186/s12859-016-1179-2] [Received: 10/28/2015] [Accepted: 08/11/2016]
Abstract
BACKGROUND: Drug resistance testing is mandatory in antiretroviral therapy in human immunodeficiency virus (HIV) infected patients for successful treatment. The emergence of resistances against antiretroviral agents remains the major obstacle in inhibition of viral replication and thus to control infection. Due to the high mutation rate the virus is able to adapt rapidly under drug pressure leading to the evolution of resistant variants and finally to therapy failure. RESULTS: We developed a web service for drug resistance prediction of commonly used drugs in antiretroviral therapy, i.e., protease inhibitors (PIs), reverse transcriptase inhibitors (NRTIs and NNRTIs), and integrase inhibitors (INIs), but also for the novel drug class of maturation inhibitors. Furthermore, co-receptor tropism (CCR5 or CXCR4) can be predicted as well, which is essential for treatment with entry inhibitors, such as Maraviroc. Currently, SHIVA provides 24 prediction models for several drug classes. SHIVA can be used with single RNA/DNA or amino acid sequences, but also with large amounts of next-generation sequencing data and allows prediction of a user specified selection of drugs simultaneously. Prediction results are provided as clinical reports which are sent via email to the user. CONCLUSIONS: SHIVA represents a novel high performing alternative for hitherto developed drug resistance testing approaches able to process data derived from next-generation sequencing technologies. SHIVA is publicly available via a user-friendly web interface.
Affiliation(s)
- Mona Riemenschneider: Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, 94315 Straubing, Germany; University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, 85354 Freising, Germany
- Thomas Hummel: Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, 94315 Straubing, Germany; University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, 85354 Freising, Germany
- Dominik Heider: Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, 94315 Straubing, Germany; University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, 85354 Freising, Germany; Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, 85354 Freising, Germany
8. Intelligent Soft Computing on Forex: Exchange Rates Forecasting with Hybrid Radial Basis Neural Network. ScientificWorldJournal 2016; 2016:3460293. [PMID: 26977450 PMCID: PMC4761754 DOI: 10.1155/2016/3460293] [Received: 05/15/2015] [Accepted: 09/20/2015]
Abstract
This paper deals with the application of quantitative soft computing prediction models in the financial area, as reliable and accurate prediction models can be very helpful in the management decision-making process. The authors suggest a new hybrid neural network which is a combination of the standard RBF neural network, a genetic algorithm, and a moving average. The moving average is supposed to enhance the outputs of the network using the error part of the original neural network. The authors test the suggested model on high-frequency time series data of USD/CAD and examine the ability to forecast exchange rate values for the horizon of one day. To determine the forecasting efficiency, they perform a comparative statistical out-of-sample analysis of the tested model with autoregressive models and the standard neural network. They also incorporate a genetic algorithm as an optimizing technique for adapting the parameters of the ANN, which is then compared with standard backpropagation and backpropagation combined with the K-means clustering algorithm. Finally, the authors find that their suggested hybrid neural network is able to produce more accurate forecasts than the standard models and can be helpful in reducing the risk of making a bad decision in the decision-making process.
9. Riemenschneider M, Senge R, Neumann U, Hüllermeier E, Heider D. Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification. BioData Min 2016; 9:10. [PMID: 26933450 PMCID: PMC4772363 DOI: 10.1186/s13040-016-0089-1] [Received: 11/30/2015] [Accepted: 02/20/2016]
Abstract
BACKGROUND: Antiretroviral therapy is essential for human immunodeficiency virus (HIV) infected patients to inhibit viral replication and thereby to slow progression of disease and prolong a patient's life. However, the high mutation rate of HIV can lead to a fast adaptation of the virus under drug pressure and thereby to the evolution of resistant variants. In turn, these variants will lead to the failure of antiretroviral treatment. Moreover, these mutations can lead not only to resistance against single drugs, but also to cross-resistance, i.e., resistance against drugs that have not yet been applied. METHODS: 662 protease sequences and 715 reverse transcriptase sequences with complete resistance profiles were analyzed using machine learning techniques, namely binary relevance classifiers, classifier chains, and ensembles of classifier chains. RESULTS: In our study, we applied multi-label classification models incorporating cross-resistance information to predict drug resistance for two of the major drug classes used in antiretroviral therapy for HIV-1, namely protease inhibitors (PIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs). By means of multi-label learning, namely classifier chains (CCs) and ensembles of classifier chains (ECCs), we were able to improve overall prediction accuracy for all drugs compared to hitherto applied binary classification models. CONCLUSIONS: The development of fast and precise models to predict drug resistance in HIV-1 is highly important to enable a highly effective personalized therapy. Cross-resistance information can be exploited to improve prediction accuracy of computational drug resistance models.
Affiliation(s)
- Mona Riemenschneider: Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, 94315 Straubing, Germany; University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, 85354 Freising, Germany
- Robin Senge: Department of Computer Science, University of Paderborn, Pohlweg 47, 33098 Paderborn, Germany
- Ursula Neumann: Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, 94315 Straubing, Germany; Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, 85354 Freising, Germany; University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, 85354 Freising, Germany
- Eyke Hüllermeier: Department of Computer Science, University of Paderborn, Pohlweg 47, 33098 Paderborn, Germany
- Dominik Heider: Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, 94315 Straubing, Germany; Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, 85354 Freising, Germany; University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, 85354 Freising, Germany
10. Heider D, Dybowski JN, Wilms C, Hoffmann D. A simple structure-based model for the prediction of HIV-1 co-receptor tropism. BioData Min 2014; 7:14. [PMID: 25120583 PMCID: PMC4124776 DOI: 10.1186/1756-0381-7-14] [Received: 03/10/2014] [Accepted: 07/28/2014]
Abstract
BACKGROUND: Human Immunodeficiency Virus 1 enters host cells through interaction of its V3 loop (which is part of the gp120 protein) with the host cell receptor CD4 and one of two co-receptors, namely CCR5 or CXCR4. Entry inhibitors binding the CCR5 co-receptor can prevent viral entry. As these drugs are only available for CCR5-using viruses, accurate prediction of this so-called co-receptor tropism is important in order to ensure an effective personalized therapy. With the development of next-generation sequencing technologies, it is now possible to sequence representative subpopulations of the viral quasispecies. RESULTS: Here we present T-CUP 2.0, a model for predicting co-receptor tropism. Based on our recently published T-CUP model, we developed a more accurate and even faster solution. Similarly to its predecessor, T-CUP 2.0 models co-receptor tropism using information of the electrostatic potential and hydrophobicity of V3-loops. However, extracting this information from a simplified structural vacuum-model leads to more accurate and faster predictions. The area-under-the-ROC-curve (AUC) achieved with T-CUP 2.0 on the training set is 0.968±0.005 in a leave-one-patient-out cross-validation. When applied to an independent dataset, T-CUP 2.0 has an improved prediction accuracy of around 3% when compared to the original T-CUP. CONCLUSIONS: We found that it is possible to model co-receptor tropism in HIV-1 based on a simplified structure-based model of the V3 loop. In this way, genotypic prediction of co-receptor tropism is very accurate, fast and can be applied to large datasets derived from next-generation sequencing technologies. The reduced complexity of the electrostatic modeling makes T-CUP 2.0 independent from third-party software, making it easy to install and use.
Affiliation(s)
- Dominik Heider: Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Jan Nikolaj Dybowski: Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Christoph Wilms: Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Daniel Hoffmann: Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
11
|
Pan W, Liu K, Guan Y, Tan GT, Hung NV, Cuong NM, Soejarto DD, Pezzuto JM, Fong HHS, Zhang H. Bioactive compounds from Vitex leptobotrys. J Nat Prod 2014; 77:663-7. [PMID: 24404757 PMCID: PMC4068261 DOI: 10.1021/np400779v] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 05/05/2023]
Abstract
A new lignan, vitexkarinol (1), as well as a known lignan, neopaulownin (2), a known chalcone, 3-(4-hydroxyphenyl)-1-(2,4,6-trimethoxyphenyl)-2-propen-1-one (3), two known dehydroflavones, tsugafolin (4) and alpinetin (5), two known dipeptides, aurantiamide and aurantiamide acetate, a known sesquiterpene, vemopolyanthofuran, and five known carotenoid metabolites, vomifoliol, dihydrovomifoliol, dehydrovomifoliol, loliolide, and isololiolide, were isolated from the leaves and twigs of Vitex leptobotrys through bioassay-guided fractionation. The chalcone (3) was found to inhibit HIV-1 replication by 77% at 15.9 μM, and the two dehydroflavones (4 and 5) showed weak anti-HIV activity with IC50 values of 118 and 130 μM, respectively, while being devoid of cytotoxicity at 150 μM. A chlorophyll-enriched fraction of V. leptobotrys, containing pheophorbide a, was found to inhibit the replication of HIV-1 by 80% at a concentration of 10 μg/mL. Compounds 1 and 3 were further selected to be evaluated against 21 viral targets available at NIAID (National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA).
Affiliation(s)
- Wenhui Pan
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, People's Republic of China
|
12
|
Heider D, Senge R, Cheng W, Hüllermeier E. Multilabel classification for exploiting cross-resistance information in HIV-1 drug resistance prediction. Bioinformatics 2013; 29:1946-52. [PMID: 23793752 DOI: 10.1093/bioinformatics/btt331] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION Antiretroviral treatment regimens can sufficiently suppress viral replication in human immunodeficiency virus (HIV)-infected patients and prevent the progression of the disease. However, one of the factors contributing to the progression of the disease despite ongoing antiretroviral treatment is the emergence of drug resistance. The high mutation rate of HIV can lead to a fast adaptation of the virus under drug pressure, and thus to failure of antiretroviral treatment due to the evolution of drug-resistant variants. Moreover, cross-resistance phenomena have been frequently found in HIV-1, leading to resistance not only against a drug from the current treatment, but also against other drugs that have not yet been applied. Automatic classification and prediction of drug resistance is increasingly important in HIV research as well as in clinical settings, and to this end, machine learning techniques have been widely applied. Nevertheless, cross-resistance information has not yet been taken explicitly into account. RESULTS In our study, we demonstrated the use of cross-resistance information to predict drug resistance in HIV-1. We tested a set of more than 600 reverse transcriptase sequences and corresponding resistance information for six nucleoside analogues. Based on multilabel classification models and cross-resistance information, we were able to significantly improve overall prediction accuracy for all drugs, compared with single binary classifiers without any additional information. Moreover, we identified drug-specific patterns within the reverse transcriptase sequences that can be used to determine an optimal order of the classifiers within the classifier chains. These patterns are in good agreement with known resistance mutations and support the use of cross-resistance information in such prediction models. CONTACT dominik.heider@uni-due.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
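The classifier chains mentioned in this abstract can be sketched generically as follows. This is an illustration of the general technique, not the authors' implementation: the 1-nearest-neighbour base learner and the names `nearest_neighbor_fit`, `fit_chain`, and `predict_chain` are assumptions chosen to keep the example self-contained.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]


def nearest_neighbor_fit(X: List[List[float]], y: List[int]) -> Predictor:
    """Hypothetical base learner: a 1-nearest-neighbour classifier."""
    def predict(x: Sequence[float]) -> int:
        nearest = min(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
        )
        return y[nearest]
    return predict


def fit_chain(X, Y, order, fit=nearest_neighbor_fit) -> List[Predictor]:
    """Train one binary classifier per label, following `order`.

    Each classifier sees the original features plus the true values of all
    labels that come earlier in the chain, which is how information about
    one drug can inform the prediction for another.
    """
    models = []
    for pos, label in enumerate(order):
        X_aug = [list(x) + [Y[i][prev] for prev in order[:pos]]
                 for i, x in enumerate(X)]
        models.append(fit(X_aug, [row[label] for row in Y]))
    return models


def predict_chain(models, order, x) -> List[int]:
    """Predict all labels for one sample, feeding each prediction forward."""
    features = list(x)
    preds = {}
    for model, label in zip(models, order):
        preds[label] = model(features)
        features.append(preds[label])
    # `order` is assumed to be a permutation of 0..n_labels-1
    return [preds[j] for j in range(len(order))]
```

The `order` argument controls which label is predicted first, which is the chain ordering that the abstract says can be informed by drug-specific sequence patterns.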
Affiliation(s)
- Dominik Heider
- Department of Bioinformatics, University of Duisburg-Essen, Essen, Germany
|
13
|
Touw WG, Bayjanov JR, Overmars L, Backus L, Boekhorst J, Wels M, van Hijum SAFT. Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Brief Bioinform 2012; 14:315-26. [PMID: 22786785 PMCID: PMC3659301 DOI: 10.1093/bib/bbs034] [Citation(s) in RCA: 204] [Impact Index Per Article: 17.0] [Indexed: 12/13/2022] Open
Abstract
In the Life Sciences, 'omics' data is increasingly generated by different high-throughput technologies. Often only the integration of these data allows uncovering biological insights that can be experimentally validated or mechanistically modelled, i.e. sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow training a model based on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited for the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have a high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important for a subset of samples of the same class. For example, within a class of cancer patients, certain SNP combinations may be important for a subset of patients that have a specific subtype of cancer, but not important for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF, as they are implicitly taken into account by the algorithm during the creation of the classification model. This review details some RF properties that, to the best of our knowledge, are rarely or never used and that allow maximizing the biological insights that can be extracted from complex omics data sets using RF.
|
14
|
Safi M, Lilien RH. Efficient a Priori Identification of Drug Resistant Mutations Using Dead-End Elimination and MM-PBSA. J Chem Inf Model 2012; 52:1529-41. [DOI: 10.1021/ci200626m] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/15/2023]
Affiliation(s)
- Maria Safi
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada
- Ryan H. Lilien
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada
|
15
|
Dias DA, Urban S, Roessner U. A historical overview of natural products in drug discovery. Metabolites 2012; 2:303-36. [PMID: 24957513 PMCID: PMC3901206 DOI: 10.3390/metabo2020303] [Citation(s) in RCA: 877] [Impact Index Per Article: 73.1] [Received: 03/01/2012] [Revised: 03/31/2012] [Accepted: 03/31/2012] [Indexed: 12/25/2022] Open
Abstract
Natural products have been used since ancient times and in folklore for the treatment of many diseases and illnesses. Classical natural product chemistry methodologies enabled a vast array of bioactive secondary metabolites from terrestrial and marine sources to be discovered. Many of these natural products have gone on to become current drug candidates. This brief review aims to highlight historically significant bioactive marine and terrestrial natural products, their use in folklore, and dereplication techniques to rapidly facilitate their discovery. Furthermore, it discusses how natural product chemistry has resulted in the identification of many drug candidates, the application of advanced hyphenated spectroscopic techniques to aid in their discovery, the future of natural product chemistry, and finally the adoption of metabolomic profiling and dereplication approaches for the comprehensive study of natural product extracts.
Affiliation(s)
- Daniel A Dias
- Metabolomics Australia, School of Botany, The University of Melbourne, Parkville, Victoria 3010, Australia
- Sylvia Urban
- School of Applied Sciences (Discipline of Applied Chemistry), Health Innovations Research Institute (HIRi) RMIT University, G.P.O. Box 2476V, Melbourne, Victoria 3001, Australia
- Ute Roessner
- Metabolomics Australia, School of Botany, The University of Melbourne, Parkville, Victoria 3010, Australia
|
16
|
Mutational patterns in the frameshift-regulating site of HIV-1 selected by protease inhibitors. Med Microbiol Immunol 2011; 201:213-8. [PMID: 22200908 DOI: 10.1007/s00430-011-0224-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/07/2011] [Indexed: 12/29/2022]
Abstract
Sustained suppression of viral replication in HIV-1 infected patients is especially hampered by the emergence of HIV-1 drug resistance. The mechanisms of drug resistance mainly involve mutations directly altering the interaction of viral enzymes and inhibitors. However, protease inhibitors do not only select for mutations in the protease but also for mutations in the precursor Gag and Pol proteins. In this study, we analysed the frameshift-regulating site of HIV-1 subtype B isolates, which also encodes for Gag and Pol proteins, classified as either treatment-naïve (TN) or protease inhibitor resistant (PI-R). HIV-1 Gag cleavage site mutations (G435E, K436N, I437V, L449F/V) especially correlated with protease inhibitor resistance mutations, but also Pol cleavage site mutations (D05G, D05S) could be assigned to specific protease resistance profiles. Additionally, two Gag non-cleavage site mutations (S440F, H441P) were observed more often in HIV-1 isolates carrying protease resistance mutations. However, in dual luciferase assays, the frameshift efficiencies of specific clones did not reveal any effect from these mutations. Nevertheless, two patterns of mutations modestly increased the frameshift rates in vitro, but were not specifically accumulating in PI-resistant HIV-1 isolates. In summary, HIV-1 Gag cleavage site mutations were dominantly selected in PI-resistant HIV-1 isolates but also Pol cleavage site mutations influenced resistance profiles in the protease. Additionally, Gag non-cleavage site mutations accumulated in PI-resistant HIV-1 isolates, but were not related to an increased frameshift efficiency.
|
17
|
Dybowski JN, Riemenschneider M, Hauke S, Pyka M, Verheyen J, Hoffmann D, Heider D. Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers. BioData Min 2011; 4:26. [PMID: 22082002 PMCID: PMC3248369 DOI: 10.1186/1756-0381-4-26] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/15/2011] [Accepted: 11/14/2011] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functionally active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, there exist mutations in this region leading to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients who might benefit from the usage of these new drugs. RESULTS We tested several methods to improve Bevirimat resistance prediction in HIV-1. Combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. Moreover, we were able to computationally identify the regions most crucial for Bevirimat resistance, which are in line with experimental results from other studies. CONCLUSIONS Our analysis demonstrated the use of machine learning techniques to predict HIV-1 resistance against maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might enlarge the arsenal of antiretroviral drugs in the future. Thus, accurate prediction tools are very useful to enable a personalized therapy.
Affiliation(s)
- J Nikolaj Dybowski
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
|
18
|
Mishra BB, Tiwari VK. Natural products: An evolving role in future drug discovery. Eur J Med Chem 2011; 46:4769-807. [DOI: 10.1016/j.ejmech.2011.07.057] [Citation(s) in RCA: 565] [Impact Index Per Article: 43.5] [Received: 06/07/2011] [Revised: 07/29/2011] [Accepted: 07/30/2011] [Indexed: 11/16/2022]
|
19
|
Computational Design of a DNA- and Fc-Binding Fusion Protein. Adv Bioinformatics 2011; 2011:457578. [PMID: 21941539 PMCID: PMC3173724 DOI: 10.1155/2011/457578] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/29/2011] [Revised: 06/16/2011] [Accepted: 06/22/2011] [Indexed: 12/23/2022] Open
Abstract
Computational design of novel proteins with well-defined functions is an ongoing topic in computational biology. In this work, we generated and optimized a new synthetic fusion protein using an evolutionary approach. The optimization was guided by directed evolution based on hydrophobicity scores, molecular weight, and secondary structure predictions. Several methods were used to refine the models built from the resulting sequences. We have successfully combined two unrelated naturally occurring binding sites, the immunoglobulin Fc-binding site of the Z domain and the DNA-binding motif of MyoD bHLH, into a novel stable protein.
|
20
|
Dorr CR, Yemets S, Kolomitsyna O, Krasutsky P, Mansky LM. Triterpene derivatives that inhibit human immunodeficiency virus type 1 replication. Bioorg Med Chem Lett 2011; 21:542-5. [PMID: 21084190 DOI: 10.1016/j.bmcl.2010.10.078] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 08/16/2010] [Revised: 10/14/2010] [Accepted: 10/15/2010] [Indexed: 12/25/2022]
Abstract
Triterpene derivatives were analyzed for anti-HIV-1 activity and for cellular toxicity. Betulinic aldehyde, betulinic nitrile, and morolic acid derivatives were identified to have anti-HIV-1 activity. These derivatives inhibit a late step in virus replication, likely virus maturation.
Affiliation(s)
- Casey R Dorr
- University of Minnesota, Institute for Molecular Virology, 18-242 Moos Tower, 525 Delaware St SE, Minneapolis, MN 55455, United States
|
21
|
Heider D, Verheyen J, Hoffmann D. Machine learning on normalized protein sequences. BMC Res Notes 2011; 4:94. [PMID: 21453485 PMCID: PMC3079662 DOI: 10.1186/1756-0500-4-94] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 12/07/2010] [Accepted: 03/31/2011] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Machine learning techniques have been widely applied to biological sequences, e.g. to predict drug resistance in HIV-1 from sequences of drug target proteins and protein functional classes. As deletions and insertions are frequent in biological sequences, a major limitation of current methods is the inability to handle varying sequence lengths. FINDINGS We propose to normalize sequences to uniform length. To this end, we tested one linear and four different non-linear interpolation methods for the normalization of sequence lengths of 19 classification datasets. Classification tasks included prediction of HIV-1 drug resistance from drug target sequences and sequence-based prediction of protein function. We applied random forests to the classification of sequences into "positive" and "negative" samples. Statistical tests showed that the linear interpolation outperforms the non-linear interpolation methods in most of the analyzed datasets, while in a few cases non-linear methods had a small but significant advantage. Compared to other published methods, our prediction scheme leads to an improvement in prediction accuracy by up to 14%. CONCLUSIONS We found that machine learning on sequences normalized by simple linear interpolation gave better or at least competitive results compared to state-of-the-art procedures, and thus, is a promising alternative to existing methods, especially for protein sequences of variable length.
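The idea of normalizing sequences to a uniform length by linear interpolation can be sketched as below. This is a minimal illustration rather than the authors' code: it assumes each sequence has already been encoded as a list of numbers (for example one descriptor value per residue, which the abstract does not specify), and the function name `normalize_length` is hypothetical.

```python
from typing import List, Sequence


def normalize_length(values: Sequence[float], target_len: int) -> List[float]:
    """Resample a numeric sequence to `target_len` points by linear interpolation.

    The first and last values are preserved; intermediate points are placed
    at evenly spaced positions along the original sequence.
    """
    if target_len < 1 or len(values) == 0:
        raise ValueError("need a non-empty sequence and target_len >= 1")
    n = len(values)
    if n == 1:
        return [float(values[0])] * target_len
    if target_len == 1:
        return [float(values[0])]
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional index into `values`
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out
```

Applied to every encoded sequence in a data set with the same `target_len`, this yields fixed-length feature vectors that a classifier such as a random forest can consume directly, whether the original sequence was shorter or longer than the target.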
Affiliation(s)
- Dominik Heider
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Jens Verheyen
- Institute of Virology, University of Cologne, Fuerst-Pueckler-Str. 56, 50935 Cologne, Germany
- Daniel Hoffmann
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
|