1
Paremskaia AI, Rudik AV, Filimonov DA, Lagunin AA, Poroikov VV, Tarasova OA. Web Service for HIV Drug Resistance Prediction Based on Analysis of Amino Acid Substitutions in Main Drug Targets. Viruses 2023; 15:2245. [PMID: 38005921 PMCID: PMC10674809 DOI: 10.3390/v15112245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 10/30/2023] [Accepted: 11/06/2023] [Indexed: 11/26/2023]
Abstract
Predicting viral drug resistance is a significant medical concern. The importance of this problem stimulates the continuous development of experimental and new computational approaches. The use of computational approaches allows researchers to increase therapy effectiveness and reduce the time and expenses involved when the prescribed antiretroviral therapy is ineffective in the treatment of infection caused by the human immunodeficiency virus type 1 (HIV-1). We propose two machine learning methods and the appropriate models for predicting HIV drug resistance related to amino acid substitutions in HIV targets: (i) k-mers utilizing the random forest and the support vector machine algorithms of the scikit-learn library, and (ii) multi-n-grams using the Bayesian approach implemented in MultiPASSR software. Both multi-n-grams and k-mers were computed based on the amino acid sequences of HIV enzymes: reverse transcriptase and protease. The performance of the models was estimated by five-fold cross-validation. The resulting classification models have a relatively high reliability (minimum accuracy for the drugs is 0.82, maximum: 0.94) and were used to create a web application, HVR (HIV drug Resistance), for the prediction of HIV drug resistance to protease inhibitors and nucleoside and non-nucleoside reverse transcriptase inhibitors based on the analysis of the amino acid sequences of the appropriate HIV proteins from clinical samples.
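As a rough illustration of the k-mer descriptors mentioned in this abstract, the following sketch turns an amino acid sequence into a fixed-length count vector over all possible k-mers. The function names, the choice of k = 2, and the example string are assumptions made for illustration; the abstract does not specify the authors' exact feature construction, and the string is made up rather than taken from a clinical sample.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k=2):
    """Count overlapping k-mers in an amino acid sequence."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def kmer_vector(sequence, k=2):
    """Return a fixed-length count vector over all 20**k possible k-mers."""
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = kmer_counts(sequence, k)
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Made-up sequence, for illustration only.
example = "MKVLAAGIVKVL"
vector = kmer_vector(example, k=2)
print(len(vector), sum(vector))
```

A matrix built from such vectors, one row per sequence, is the kind of input that a random forest or support vector machine classifier could be trained on.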
Affiliation(s)
- Anastasiia Iu. Paremskaia
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Ostrovitianov Str. 1, Moscow 117997, Russia
  - Live Sciences Research Center, Moscow Institute of Physics and Technology, National Research University, Institutsky Lane 9, Dolgoprudny 141700, Russia
- Anastassia V. Rudik
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow 119121, Russia
- Dmitry A. Filimonov
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow 119121, Russia
- Alexey A. Lagunin
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Ostrovitianov Str. 1, Moscow 117997, Russia
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow 119121, Russia
- Vladimir V. Poroikov
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow 119121, Russia
- Olga A. Tarasova
  - Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 bldg. 8, Pogodinskaya Str., Moscow 119121, Russia
2
Huang S, Ding Y. Identification of Anticancer and Anti-inflammatory Drugs from Drug-target Interaction Descriptors by Machine Learning. Lett Drug Des Discov 2022. [DOI: 10.2174/1570180819666220114114752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Drug repositioning is an important subject in drug-disease research. In the past, most studies simply used drug descriptors as the feature vector to classify drugs or targets, or used qualitative data about drug-target or drug-disease to predict drug-target interactions. These data provide limited information for drug repositioning.
Objective:
Constructing quantitative drug-target interaction descriptors that account for both drugs and targets, and using them to characterize drugs, is of great significance to the study of drug repositioning.
Methods:
Taking anticancer and anti-inflammatory drugs as research objects, the interaction sites between drugs and targets were determined by molecular docking. Sixty-seven drug-target interaction descriptors were calculated to describe the drug-target interactions, and 22 important descriptors were screened for drug classification by SVM, LightGBM and MLP.
Results:
The accuracy of SVM, LightGBM and MLP reached 93.29%, 92.68% and 94.51%, their Matthews correlation coefficients reached 0.852, 0.840 and 0.882, and their areas under the ROC curve reached 0.977, 0.969 and 0.968, respectively.
Conclusion:
Using drug-target interaction descriptors to build machine learning models can obtain better results for drug classification. Number of atom pairs, force field, hydrophobic interactions and bSASA are the four types of key features for the classification of anticancer and anti-inflammatory drugs.
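The accuracy and Matthews correlation coefficient reported in the Results can both be derived from a binary confusion matrix. The sketch below shows the standard formulas; the counts used in the example are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 45, 40, 10, 5
print(accuracy(tp, tn, fp, fn))
print(matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, the Matthews coefficient uses all four cells of the confusion matrix, so it stays informative when the two drug classes are unevenly represented.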
Affiliation(s)
- Songtao Huang
  - School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
  - Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
- Yanrui Ding
  - School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
  - Key Laboratory of Industrial Biotechnology, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
3
Tarasova O, Biziukova N, Kireev D, Lagunin A, Ivanov S, Filimonov D, Poroikov V. A Computational Approach for the Prediction of Treatment History and the Effectiveness or Failure of Antiretroviral Therapy. Int J Mol Sci 2020; 21:ijms21030748. [PMID: 31979356 PMCID: PMC7037494 DOI: 10.3390/ijms21030748] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/30/2019] [Revised: 01/20/2020] [Accepted: 01/21/2020] [Indexed: 02/01/2023]
Abstract
Human Immunodeficiency Virus Type 1 (HIV-1) infection is associated with high mortality if no therapy is provided. Currently, the treatment of an HIV-1 positive patient requires that several drugs should be taken simultaneously. The resistance of the virus to an antiretroviral drug may lead to treatment failure. Our approach focuses on predicting the exposure of a particular viral variant to an antiretroviral drug or drug combination. It also aims at the prediction of drug treatment success or failure. We utilized nucleotide sequences of HIV-1 encoding protease and reverse transcriptase to perform such types of prediction. The PASS (Prediction of Activity Spectra for Substances) algorithm based on the naive Bayesian classifier was used to make a prediction. We calculated the probability of whether a sequence belonged (P1) or did not belong (P0) to the class associated with exposure of the viral sequence to the set of drugs that can be associated with resistance to the set of drugs. The accuracy calculated as the average Area Under the ROC (Receiver Operating Characteristic) Curve (AUC/ROC) for classifying exposure of the sequence to the HIV-1 protease inhibitors was 0.81 (±0.07), and for HIV-1 reverse transcriptase, it was 0.83 (±0.07). To predict cases of treatment effectiveness or failure, we used P1 and P0 values, obtained in PASS, along with the binary vector constructed based on short nucleotide descriptors and the applied random forest classifier. Average AUC/ROC prediction accuracy for the prediction of treatment effectiveness or failure for the combinations of HIV-1 protease inhibitors was 0.82 (±0.06) and of HIV-1 reverse transcriptase was 0.76 (±0.09).
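The AUC values quoted in this abstract measure how well a score ranks positive cases above negative ones. A minimal sketch of that computation follows, using the pairwise (Mann-Whitney) definition. How the authors derive a single score from P1 and P0 is not stated here, so the scores below are hypothetical placeholders.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = exposed, 0 = not exposed) and scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(labels, scores))
```

The quadratic loop is fine for small illustrative sets; a rank-based implementation would be preferable for large ones.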
Affiliation(s)
- Olga Tarasova
  - Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia
- Nadezhda Biziukova
  - Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia
- Dmitry Kireev
  - Central Research Institute of Epidemiology, 111123 Moscow, Russia
- Alexey Lagunin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Sergey Ivanov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Dmitry Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia
- Vladimir Poroikov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia
4
Deep learning on chaos game representation for proteins. Bioinformatics 2019; 36:272-279. [DOI: 10.1093/bioinformatics/btz493] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/13/2019] [Revised: 05/29/2019] [Accepted: 06/14/2019] [Indexed: 11/14/2022]
Abstract
Motivation:
Classification of protein sequences is one big task in bioinformatics and has many applications. Different machine learning methods exist and are applied on these problems, such as support vector machines (SVM), random forests (RF) and neural networks (NN). All of these methods have in common that protein sequences have to be made machine-readable and comparable in the first step, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, due to the outstanding performance of deep neural networks (DNN) on image recognition, we used frequency matrix chaos game representation (FCGR) for encoding of protein sequences into images. In this study, we compare the performance of SVMs, RFs and DNNs, trained on FCGR encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to work also for protein sequences, resulting in n-flakes representation, an image with several icosagons.
Results:
We could show that all applied machine learning techniques (RF, SVM and DNN) show promising results compared to the state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods and that FCGR is a promising new encoding method for protein sequences.
Availability and implementation:
https://cran.r-project.org/.
Supplementary information:
Supplementary data are available at Bioinformatics online.
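To make the encoding idea concrete, the sketch below plays a chaos game on a regular 20-gon (one vertex per amino acid) and bins the visited points into a frequency matrix. It is an illustration under stated assumptions and not the authors' implementation: the vertex order, the starting point at the centre, the grid resolution, and the use of the standard non-overlapping n-flake contraction ratio are all choices made here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # vertex order is an assumption


def nflake_ratio(n):
    """Contraction ratio for which the n copies of an n-flake just touch."""
    total = sum(math.cos(2 * math.pi * k / n) for k in range(1, n // 4 + 1))
    return 1.0 / (2.0 * (1.0 + total))


def fcgr(sequence, resolution=16):
    """Frequency matrix of chaos game points on a regular 20-gon."""
    n = len(AMINO_ACIDS)
    ratio = nflake_ratio(n)
    vertices = {
        aa: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, aa in enumerate(AMINO_ACIDS)
    }
    grid = [[0] * resolution for _ in range(resolution)]
    x, y = 0.0, 0.0  # start at the centre of the polygon
    for aa in sequence:
        vx, vy = vertices[aa]  # raises KeyError for non-standard residues
        # Contract the current point towards the vertex of this residue.
        x = vx + ratio * (x - vx)
        y = vy + ratio * (y - vy)
        # Map coordinates from [-1, 1] to grid indices.
        col = min(int((x + 1) / 2 * resolution), resolution - 1)
        row = min(int((y + 1) / 2 * resolution), resolution - 1)
        grid[row][col] += 1
    return grid


# Made-up sequence, for illustration only.
matrix = fcgr("MKVLAAGIVKVL", resolution=16)
print(sum(sum(row) for row in matrix))
```

The resulting matrix has the same shape for every input sequence, which is what allows it to be treated as an image by a neural network.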
5
Spänig S, Heider D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min 2019; 12:7. [PMID: 30867681 PMCID: PMC6399931 DOI: 10.1186/s13040-019-0196-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 12/21/2018] [Accepted: 02/24/2019] [Indexed: 01/10/2023]
Abstract
Antimicrobial peptides (AMPs) are part of the inherent immune system. In fact, they occur in almost all organisms including, e.g., plants, animals, and humans. Remarkably, they are also effective against multi-resistant pathogens, with high selectivity. This is especially crucial at a time when society is faced with the major threat of an ever-increasing number of antibiotic-resistant microbes. In addition, AMPs can also exhibit antitumor and antiviral effects, thus a variety of scientific studies dealt with the prediction of active peptides in recent years. Due to their potential, even the pharmaceutical industry is keen on discovering and developing novel AMPs. However, AMPs are difficult to verify in vitro, hence researchers conduct sequence similarity experiments against known, active peptides. Unfortunately, this approach is very time-consuming and limits potential candidates to sequences with a high similarity to known AMPs. Machine learning methods offer the opportunity to explore the huge space of sequence variations in a timely manner. These algorithms have, in principle, paved the way for an automated discovery of AMPs. However, machine learning models require a numerical input, thus an informative encoding is very important. Unfortunately, developing an appropriate encoding is a major challenge, which has not been entirely solved so far. For this reason, the development of novel amino acid encodings is established as a stand-alone research branch. The present review introduces state-of-the-art encodings of amino acids as well as their properties in sequence and structure based aggregation. Moreover, although a well-chosen encoding is essential, performant classifiers are required, which is reflected by a tendency towards specifically designed models in the literature. Furthermore, we introduce these models with a particular focus on encodings derived from support vector machines and deep learning approaches.
Albeit a strong focus has been set on AMP predictions, not all of the mentioned encodings have been elaborated as part of antimicrobial research studies, but rather as general protein or peptide representations.
Affiliation(s)
- Sebastian Spänig
  - Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
- Dominik Heider
  - Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
6
Tarasova O, Biziukova N, Filimonov D, Poroikov V. A Computational Approach for the Prediction of HIV Resistance Based on Amino Acid and Nucleotide Descriptors. Molecules 2018; 23:E2751. [PMID: 30355996 PMCID: PMC6278491 DOI: 10.3390/molecules23112751] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/12/2018] [Revised: 10/07/2018] [Accepted: 10/16/2018] [Indexed: 12/25/2022]
Abstract
The high variability of the human immunodeficiency virus (HIV) is an important cause of HIV resistance to reverse transcriptase and protease inhibitors. There are many variants of HIV type 1 (HIV-1) that can be used to model sequence-resistance relationships. Machine learning methods are widely and successfully used in new drug discovery. An emerging body of data regarding the interactions of small drug-like molecules with their protein targets provides the possibility of building models on "structure-property" relationships and analyzing the performance of various machine-learning techniques. In our research, we analyze several different types of descriptors in order to predict the resistance of HIV reverse transcriptase and protease to the marketed antiretroviral drugs using the Random Forest approach. First, we represented amino acid sequences as a set of short peptide fragments, which included several amino acid residues. Second, we represented nucleotide sequences as a set of fragments, which included several nucleotides. We compared these two approaches using open data from the Stanford HIV Drug Resistance Database. We have determined the factors that modulate the performance of prediction: in particular, we observed that the prediction performance depended more on the particular drug than on the type of descriptor used.
Affiliation(s)
- Olga Tarasova
- Institute of Biomedical Chemistry, Moscow 119121, Russia.
7
Shen C, Yu X, Harrison RW, Weber IT. Automated prediction of HIV drug resistance from genotype data. BMC Bioinformatics 2016; 17 Suppl 8:278. [PMID: 27586700 PMCID: PMC5009519 DOI: 10.1186/s12859-016-1114-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/24/2022]
Abstract
Background:
HIV/AIDS is a serious threat to public health. The emergence of drug resistance mutations diminishes the effectiveness of drug therapy for HIV/AIDS. Developing a computational prediction of drug resistance phenotype will enable efficient and timely selection of the best treatment regimens.
Results:
A unified encoding of protein sequence and structure was used as the feature vector for predicting phenotypic resistance from genotype data. Two machine learning algorithms, Random Forest and K-nearest neighbor, were used. The prediction accuracies were examined by five-fold cross-validation on the genotype-phenotype datasets. A supervised machine learning approach for automatic prediction of drug resistance was developed to handle genotype-phenotype datasets of HIV protease (PR) and reverse transcriptase (RT). It predicts the drug resistance phenotype and its relative severity from a query sequence. The accuracy of the classification was higher than 0.973 for eight PR inhibitors and 0.986 for ten RT inhibitors, respectively. The overall cross-validated regression R²-values for the severity of drug resistance were 0.772–0.953 for 8 PR inhibitors and 0.773–0.995 for 10 RT inhibitors.
Conclusions:
Machine learning using a unified encoding of sequence and protein structure as a feature vector provides an accurate prediction of drug resistance from genotype data. A practical webserver for clinicians has been implemented.
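Five-fold cross-validation, as used in this study, partitions the dataset into five disjoint folds and holds each one out in turn. The following sketch shows one simple way to generate such splits; the function name, the shuffling seed, and the sample count are assumptions, and the abstract does not say whether the authors stratified their folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 23 genotype-phenotype pairs.
for train, test in k_fold_indices(23, k=5):
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy gives an estimate that uses every sample for evaluation once.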
Affiliation(s)
- ChenHsiang Shen
  - Department of Biology, Georgia State University, Atlanta, GA, 30303, USA
- Xiaxia Yu
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
- Robert W Harrison
  - Department of Biology, Georgia State University, Atlanta, GA, 30303, USA
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
- Irene T Weber
  - Department of Biology, Georgia State University, Atlanta, GA, 30303, USA
8
Riemenschneider M, Hummel T, Heider D. SHIVA - a web application for drug resistance and tropism testing in HIV. BMC Bioinformatics 2016; 17:314. [PMID: 27549230 PMCID: PMC4994198 DOI: 10.1186/s12859-016-1179-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 10/28/2015] [Accepted: 08/11/2016] [Indexed: 01/09/2023]
Abstract
Background:
Drug resistance testing is mandatory in antiretroviral therapy in human immunodeficiency virus (HIV) infected patients for successful treatment. The emergence of resistances against antiretroviral agents remains the major obstacle in inhibition of viral replication and thus to control infection. Due to the high mutation rate the virus is able to adapt rapidly under drug pressure leading to the evolution of resistant variants and finally to therapy failure.
Results:
We developed a web service for drug resistance prediction of commonly used drugs in antiretroviral therapy, i.e., protease inhibitors (PIs), reverse transcriptase inhibitors (NRTIs and NNRTIs), and integrase inhibitors (INIs), but also for the novel drug class of maturation inhibitors. Furthermore, co-receptor tropism (CCR5 or CXCR4) can be predicted as well, which is essential for treatment with entry inhibitors, such as Maraviroc. Currently, SHIVA provides 24 prediction models for several drug classes. SHIVA can be used with single RNA/DNA or amino acid sequences, but also with large amounts of next-generation sequencing data and allows prediction of a user specified selection of drugs simultaneously. Prediction results are provided as clinical reports which are sent via email to the user.
Conclusions:
SHIVA represents a novel high performing alternative for hitherto developed drug resistance testing approaches able to process data derived from next-generation sequencing technologies. SHIVA is publicly available via a user-friendly web interface.
Affiliation(s)
- Mona Riemenschneider
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315, Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354, Germany
- Thomas Hummel
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315, Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354, Germany
- Dominik Heider
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315, Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354, Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, Freising, 85354, Germany
9
Heider D, Dybowski JN, Wilms C, Hoffmann D. A simple structure-based model for the prediction of HIV-1 co-receptor tropism. BioData Min 2014; 7:14. [PMID: 25120583 PMCID: PMC4124776 DOI: 10.1186/1756-0381-7-14] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/10/2014] [Accepted: 07/28/2014] [Indexed: 12/16/2022]
Abstract
Background:
Human Immunodeficiency Virus 1 enters host cells through interaction of its V3 loop (which is part of the gp120 protein) with the host cell receptor CD4 and one of two co-receptors, namely CCR5 or CXCR4. Entry inhibitors binding the CCR5 co-receptor can prevent viral entry. As these drugs are only available for CCR5-using viruses, accurate prediction of this so-called co-receptor tropism is important in order to ensure an effective personalized therapy. With the development of next-generation sequencing technologies, it is now possible to sequence representative subpopulations of the viral quasispecies.
Results:
Here we present T-CUP 2.0, a model for predicting co-receptor tropism. Based on our recently published T-CUP model, we developed a more accurate and even faster solution. Similarly to its predecessor, T-CUP 2.0 models co-receptor tropism using information of the electrostatic potential and hydrophobicity of V3-loops. However, extracting this information from a simplified structural vacuum-model leads to more accurate and faster predictions. The area-under-the-ROC-curve (AUC) achieved with T-CUP 2.0 on the training set is 0.968±0.005 in a leave-one-patient-out cross-validation. When applied to an independent dataset, T-CUP 2.0 has an improved prediction accuracy of around 3% when compared to the original T-CUP.
Conclusions:
We found that it is possible to model co-receptor tropism in HIV-1 based on a simplified structure-based model of the V3 loop. In this way, genotypic prediction of co-receptor tropism is very accurate, fast and can be applied to large datasets derived from next-generation sequencing technologies. The reduced complexity of the electrostatic modeling makes T-CUP 2.0 independent from third-party software, making it easy to install and use.
Affiliation(s)
- Dominik Heider
  - Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Jan Nikolaj Dybowski
  - Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Christoph Wilms
  - Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Daniel Hoffmann
  - Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
10
Doherty KM, Nakka P, King BM, Rhee SY, Holmes SP, Shafer RW, Radhakrishnan ML. A multifaceted analysis of HIV-1 protease multidrug resistance phenotypes. BMC Bioinformatics 2011; 12:477. [PMID: 22172090 PMCID: PMC3305535 DOI: 10.1186/1471-2105-12-477] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/27/2011] [Accepted: 12/15/2011] [Indexed: 12/19/2022]
Abstract
Background:
Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to arise within the population. Understanding the phenotypic and genotypic patterns responsible for multi-PI resistance is necessary for developing PIs that are active against clinically-relevant PI-resistant HIV-1 variants.
Results:
In this work, we use globally optimal integer programming-based clustering techniques to elucidate multi-PI phenotypic resistance patterns using a data set of 398 HIV-1 protease sequences that have each been phenotyped for susceptibility toward the nine clinically-approved HIV-1 PIs. We validate the information content of the clusters by evaluating their ability to predict the level of decreased susceptibility to each of the available PIs using a cross validation procedure. We demonstrate the finding that as a result of phenotypic cross resistance, the considered clinical HIV-1 protease isolates are confined to ~6% or less of the clinically-relevant phenotypic space. Clustering and feature selection methods are used to find representative sequences and mutations for major resistance phenotypes to elucidate their genotypic signatures. We show that phenotypic similarity does not imply genotypic similarity, that different PI-resistance mutation patterns can give rise to HIV-1 isolates with similar phenotypic profiles.
Conclusion:
Rather than characterizing HIV-1 susceptibility toward each PI individually, our study offers a unique perspective on the phenomenon of PI class resistance by uncovering major multidrug-resistant phenotypic patterns and their often diverse genotypic determinants, providing a methodology that can be applied to understand clinically-relevant phenotypic patterns to aid in the design of novel inhibitors that target other rapidly evolving molecular targets as well.
11
Dybowski JN, Riemenschneider M, Hauke S, Pyka M, Verheyen J, Hoffmann D, Heider D. Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers. BioData Min 2011; 4:26. [PMID: 22082002 PMCID: PMC3248369 DOI: 10.1186/1756-0381-4-26] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/15/2011] [Accepted: 11/14/2011] [Indexed: 01/07/2023]
Abstract
Background:
Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functional active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, there exist mutations in this region leading to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients, who might benefit from the usage of these new drugs.
Results:
We tested several methods to improve Bevirimat resistance prediction in HIV-1. It turned out that combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. Moreover, we were able to identify the most crucial regions for Bevirimat resistance computationally, which are in line with experimental results from other studies.
Conclusions:
Our analysis demonstrated the use of machine learning techniques to predict HIV-1 resistance against maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might enlarge the arsenal of antiretroviral drugs in the future. Thus, accurate prediction tools are very useful to enable a personalized therapy.
Affiliation(s)
- J Nikolaj Dybowski
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
12
Interpol: An R package for preprocessing of protein sequences. BioData Min 2011; 4:16. [PMID: 21682849 PMCID: PMC3138420 DOI: 10.1186/1756-0381-4-16] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 03/27/2011] [Accepted: 06/17/2011] [Indexed: 11/10/2022]
Abstract
Background:
Most machine learning techniques currently applied in the literature need a fixed dimensionality of input data. However, this requirement is frequently violated by real input data, such as DNA and protein sequences, that often differ in length due to insertions and deletions. It is also notable that performance in classification and regression is often improved by numerical encoding of amino acids, compared to the commonly used sparse encoding.
Results:
The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex), and normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed with open source as platform independent R-package. It is typically used for preprocessing of amino acid sequences for classification or regression.
Conclusions:
The functionality of Interpol widens the spectrum of machine learning methods that can be applied to biological sequences, and it will in many cases improve their performance in classification and regression.
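The two steps described in this abstract, numerical encoding followed by length normalization, can be sketched as follows. This is a Python illustration of the general idea and not the Interpol R package's API: the function names are hypothetical, the Kyte-Doolittle hydropathy scale stands in for one of the package's many descriptors, and only linear interpolation is shown.

```python
# Kyte-Doolittle hydropathy index, used here as one example descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Replace each residue by its descriptor value."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence]


def interpolate_linear(values, target_length):
    """Resample a numeric profile to target_length points."""
    if not values or target_length < 1:
        raise ValueError("need a non-empty profile and a positive length")
    if len(values) == 1 or target_length == 1:
        return [values[0]] * target_length
    step = (len(values) - 1) / (target_length - 1)
    result = []
    for i in range(target_length):
        position = i * step
        lower = min(int(position), len(values) - 1)
        upper = min(lower + 1, len(values) - 1)
        fraction = position - lower
        result.append(values[lower] * (1 - fraction) + values[upper] * fraction)
    return result


# Made-up peptides of different lengths mapped to the same dimensionality.
for peptide in ("ACDK", "MKVLAAGIVKVL"):
    print(len(interpolate_linear(encode(peptide), 10)))
```

After this step every sequence is represented by a vector of the same length, which is the fixed dimensionality that most classifiers expect.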