1
|
Onah E, Uzor PF, Ugwoke IC, Eze JU, Ugwuanyi ST, Chukwudi IR, Ibezim A. Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors. BMC Bioinformatics 2022; 23:466. [DOI: 10.1186/s12859-022-05017-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022] Open
Abstract
Abstract
Background
In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) still remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the drug discovery process of novel HIV-1 protease inhibitors. In this work, we built and compared the performance of selected machine learning models for the prediction of HIV-1 protease cleavage site utilizing a hybrid of octapeptide sequence information comprising bond composition, amino acid binary profile (AABP), and physicochemical properties as numerical descriptors serving as input variables for some selected machine learning algorithms. Our work differs from antecedent studies exploring the same subject in the combination of octapeptide descriptors and method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models.
Results
Among the 8 models evaluated in the “stratified” 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier with AUC, F-score, and B. Acc. scores in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, have the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80 and B. Acc. ~ 80.0%). Whereas, the perceptron classifier and the K-nearest neighbors had statistically lower performance (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%) at p < 0.05. On the other hand, logistic regression, and multi-layer perceptron classifier (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%) had the best performance on further evaluation on the testing set, though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier equally performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%).
Conclusions
Logistic regression and multi-layer perceptron classifiers have comparable predictive performances to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition and standard physicochemical properties are used as input variables. In our future work, we hope to develop a standalone software for HIV-1 protease cleavage site prediction utilizing the linear regression algorithm and the aforementioned octapeptide sequence descriptors.
Collapse
|
2
|
Hu L, Hu P, Luo X, Yuan X, You ZH. Incorporating the Coevolving Information of Substrates in Predicting HIV-1 Protease Cleavage Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:2017-2028. [PMID: 31056514 DOI: 10.1109/tcbb.2019.2914208] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Human immunodeficiency virus 1 (HIV-1) protease (PR) plays a crucial role in the maturation of the virus. The study of substrate specificity of HIV-1 PR as a new endeavor strives to increase our ability to understand how HIV-1 PR recognizes its various cleavage sites. To predict HIV-1 PR cleavage sites, most of the existing approaches have been developed solely based on the homogeneity of substrate sequence information with supervised classification techniques. Although efficient, these approaches are found to be restricted to the ability of explaining their results and probably provide few insights into the mechanisms by which HIV-1 PR cleaves the substrates in a site-specific manner. In this work, a coevolutionary pattern-based prediction model for HIV-1 PR cleavage sites, namely EvoCleave, is proposed by integrating the coevolving information obtained from substrate sequences with a linear SVM classifier. The experiment results showed that EvoCleave yielded a very promising performance in terms of ROC analysis and f-measure. We also prospectively assessed the biological significance of coevolutionary patterns by applying them to study three fundamental issues of HIV-1 PR cleavage site. The analysis results demonstrated that the coevolutionary patterns offered valuable insights into the understanding of substrate specificity of HIV-1 PR.
Collapse
|
3
|
Singh D, Sisodia DS, Singh P. Multiobjective evolutionary-based multi-kernel learner for realizing transfer learning in the prediction of HIV-1 protease cleavage sites. Soft comput 2020. [DOI: 10.1007/s00500-019-04487-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
4
|
Singh D, Sisodia DS, Singh P. Compositional framework for multitask learning in the identification of cleavage sites of HIV-1 protease. J Biomed Inform 2020; 102:103376. [PMID: 31935461 DOI: 10.1016/j.jbi.2020.103376] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2019] [Revised: 12/19/2019] [Accepted: 01/08/2020] [Indexed: 11/18/2022]
Abstract
Inadequate patient samples and costly annotated data generations result into the smaller dataset in the biomedical domain. Due to which the predictions with a trained model that usually reveal a single small dataset association are fail to derive robust insights. To cope with the data sparsity, a promising strategy of combining data from the different related tasks is exercised in various application. Motivated by, successful work in the various bioinformatics application, we propose a multitask learning model based on multi-kernel that exploits the dependencies among various related tasks. This work aims to combine the knowledge from experimental studies of the different dataset to build stronger predictive models for HIV-1 protease cleavage sites prediction. In this study, a set of peptide data from one source is referred as 'task' and to integrate interactions from multiple tasks; our method exploits the common features and parameters sharing across the data source. The proposed framework uses feature integration, feature selection, multi-kernel and multifactorial evolutionary algorithm to model multitask learning. The framework considered seven different feature descriptors and four different kernel variants of support vector machines to form the optimal multi-kernel learning model. To validate the effectiveness of the model, the performance parameters such as average accuracy, and area under curve have been evaluated on the suggested model. We also carried out Friedman and post hoc statistical test to substantiate the significant improvement achieved by the proposed framework. The result obtained following the extensive experiment confirms the belief that multitask learning in cleavage site identification can improve the performance.
Collapse
Affiliation(s)
- Deepak Singh
- Department of Computer Science and Engineering, National Institute of Technology, Raipur, C.G, India.
| | - Dilip Singh Sisodia
- Department of Computer Science and Engineering, National Institute of Technology, Raipur, C.G, India.
| | - Pradeep Singh
- Department of Computer Science and Engineering, National Institute of Technology, Raipur, C.G, India.
| |
Collapse
|
5
|
Cognitive Framework for HIV-1 Protease Cleavage Site Classification Using Evolutionary Algorithm. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2019. [DOI: 10.1007/s13369-019-03871-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
6
|
Rentzsch R, Deneke C, Nitsche A, Renard BY. Predicting bacterial virulence factors - evaluation of machine learning and negative data strategies. Brief Bioinform 2019; 21:1596-1608. [PMID: 32978619 DOI: 10.1093/bib/bbz076] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Revised: 05/17/2019] [Accepted: 06/01/2019] [Indexed: 11/12/2022] Open
Abstract
Bacterial proteins dubbed virulence factors (VFs) are a highly diverse group of sequences, whose only obvious commonality is the very property of being, more or less directly, involved in virulence. It is therefore tempting to speculate whether their prediction, based on direct sequence similarity (seqsim) to known VFs, could be enhanced or even replaced by using machine-learning methods. Specifically, when trained on a large and diverse set of VFs, such may be able to detect putative, non-trivial characteristics shared by otherwise unrelated VF families and therefore better predict novel VFs with insignificant similarity to each individual family. We therefore first reassess the performance of dimer-based Support Vector Machines, as used in the widely used MP3 method, in light of seqsim-only and seqsim/dimer-hybrid classifiers. We then repeat the analysis with a novel, considerably more diverse data set, also addressing the important problem of negative data selection. Finally, we move on to the real-world use case of proteome-wide VF prediction, outlining different approaches to estimating specificity in this scenario. We find that direct seqsim is of unparalleled importance and therefore should always be exploited. Further, we observe strikingly low correlations between different feature and classifier types when ranking proteins by VF likeness. We therefore propose a 'best of each world' approach to prioritize proteins for experimental testing, focussing on the top predictions of each classifier. Further, classifiers for individual VF families should be developed.
Collapse
Affiliation(s)
- Robert Rentzsch
- Bioinformatics Unit (MF 1), Robert Koch Institute, Berlin.,Institute for Innovation and Technology (IIT), Steinplatz 1, Berlin
| | - Carlus Deneke
- Bioinformatics Unit (MF 1), Robert Koch Institute, Berlin.,Molecular Microbiology and Genome Analysis Unit, German Federal Institute for Risk Assessment, Berlin
| | - Andreas Nitsche
- Centre for Biological Threats and Special Pathogens: Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, Berlin
| | | |
Collapse
|
7
|
Evolutionary based ensemble framework for realizing transfer learning in HIV-1 Protease cleavage sites prediction. APPL INTELL 2018. [DOI: 10.1007/s10489-018-1323-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|
8
|
Singh O, Su ECY. Prediction of HIV-1 protease cleavage site using a combination of sequence, structural, and physicochemical features. BMC Bioinformatics 2016; 17:478. [PMID: 28155640 PMCID: PMC5259813 DOI: 10.1186/s12859-016-1337-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open
Abstract
Background The human immunodeficiency virus type 1 (HIV-1) aspartic protease is an important enzyme owing to its imperative part in viral development and a causative agent of deadliest disease known as acquired immune deficiency syndrome (AIDS). Development of HIV-1 protease inhibitors can help understand the specificity of substrates which can restrain the replication of HIV-1, thus antagonize AIDS. However, experimental methods in identification of HIV-1 protease cleavage sites are generally time-consuming and labor-intensive. Therefore, using computational methods to predict cleavage sites has become highly desirable. Results In this study, we propose a prediction method in which sequence, structural, and physicochemical features are incorporated in various machine learning algorithms. Then, a bidirectional stepwise selection algorithm is incorporated in feature selection to identify discriminative features. Further, only the selected features are calculated by various encoding schemes and used as input for decision trees, logistic regression, and artificial neural networks. Moreover, a more rigorous three-way data split procedure is applied to evaluate the objective performance of cleavage site prediction. Four benchmark datasets collected from previous studies are used to evaluate the predictive performance. Conclusions Experiment results showed that combinations of sequence, structure, and physicochemical features performed better than single feature type for identification of HIV-1 protease cleavage sites. In addition, incorporation of stepwise feature selection is effective to identify interpretable biological features to depict specificity of the substrates. Moreover, artificial neural networks perform significantly better than the other two classifiers. Finally, the proposed method achieved 80.0% ~ 97.4% in accuracy and 0.815 ~ 0.995 evaluated by independent test sets in a three-way data split procedure. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1337-6) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Onkar Singh
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Emily Chia-Yu Su
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan.
| |
Collapse
|
9
|
Koçak Y, Özyer T, Alhajj R. Utilizing maximal frequent itemsets and social network analysis for HIV data analysis. J Cheminform 2016. [PMCID: PMC5395515 DOI: 10.1186/s13321-016-0184-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
Abstract
Acquired immune deficiency syndrome is a deadly disease which is caused by human immunodeficiency virus (HIV). This virus attacks patients immune system and effects its ability to fight against diseases. Developing effective medicine requires understanding the life cycle and replication ability of the virus. HIV-1 protease enzyme is used to cleave an octamer peptide into peptides which are used to create proteins by the virus. In this paper, a novel feature extraction method is proposed for understanding important patterns in octamer’s cleavability. This feature extraction method is based on data mining techniques which are used to find important relations inside a dataset by comprehensively analyzing the given data. As demonstrated in this paper, using the extracted information in the classification process yields important results which may be taken into consideration when developing a new medicine. We have used 746 and 1625, Impens and schilling data instances from the 746-dataset. Besides, we have performed social network analysis as a complementary alternative method.
Collapse
|
10
|
Manning T, Walsh P. The importance of physicochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity. Bioengineered 2016; 7:65-78. [PMID: 27212259 PMCID: PMC4879986 DOI: 10.1080/21655979.2016.1149271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2015] [Revised: 01/25/2016] [Accepted: 01/26/2016] [Indexed: 10/21/2022] Open
Abstract
This paper reviews recent research relating to the application of bioinformatics approaches to determining HIV-1 protease specificity, outlines outstanding issues, and presents a new approach to addressing these issues. Leading machine learning theory for the problem currently suggests that the direct encoding of the physicochemical properties of the amino acid substrates is not required for optimal performance. A number of amino acid encoding approaches which incorporate potentially relevant physicochemical properties of the substrate are identified, and are evaluated using a nonlinear task decomposition based neuroevolution algorithm. The results are evaluated, and compared against a recent benchmark set on a nonlinear classifier using only amino acid sequence and identity information. Ensembles of these nonlinear classifiers using the physicochemical properties of the substrate are demonstrated to consistently outperform the recently published state-of-the-art linear support vector machine based approach in out-of-sample evaluations.
Collapse
Affiliation(s)
- Timmy Manning
- Department of Computer Science, Cork Institute of Technology, Cork, Ireland
| | - Paul Walsh
- Department of Computer Science, Cork Institute of Technology, Cork, Ireland
- NSilico Ltd, Rubicon Innovation Center, Cork, Ireland
| |
Collapse
|
11
|
Rögnvaldsson T, You L, Garwicz D. State of the art prediction of HIV-1 protease cleavage sites. Bioinformatics 2014; 31:1204-10. [PMID: 25504647 DOI: 10.1093/bioinformatics/btu810] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2014] [Accepted: 12/04/2014] [Indexed: 02/01/2023] Open
Abstract
MOTIVATION Understanding the substrate specificity of human immunodeficiency virus (HIV)-1 protease is important when designing effective HIV-1 protease inhibitors. Furthermore, characterizing and predicting the cleavage profile of HIV-1 protease is essential to generate and test hypotheses of how HIV-1 affects proteins of the human host. Currently available tools for predicting cleavage by HIV-1 protease can be improved. RESULTS The linear support vector machine with orthogonal encoding is shown to be the best predictor for HIV-1 protease cleavage. It is considerably better than current publicly available predictor services. It is also found that schemes using physicochemical properties do not improve over the standard orthogonal encoding scheme. Some issues with the currently available data are discussed. AVAILABILITY AND IMPLEMENTATION The datasets used, which are the most important part, are available at the UCI Machine Learning Repository. The tools used are all standard and easily available. CONTACT thorsteinn.rognvaldsson@hh.se.
Collapse
Affiliation(s)
- Thorsteinn Rögnvaldsson
- CAISR, School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden and Division of Clinical Chemistry and Pharmacology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Liwen You
- CAISR, School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden and Division of Clinical Chemistry and Pharmacology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Daniel Garwicz
- CAISR, School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden and Division of Clinical Chemistry and Pharmacology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| |
Collapse
|
12
|
Dybowski JN, Riemenschneider M, Hauke S, Pyka M, Verheyen J, Hoffmann D, Heider D. Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers. BioData Min 2011; 4:26. [PMID: 22082002 PMCID: PMC3248369 DOI: 10.1186/1756-0381-4-26] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2011] [Accepted: 11/14/2011] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functional active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, there exist mutations in this region leading to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients, who might benefit from the usage of these new drugs. RESULTS We tested several methods to improve Bevirimat resistance prediction in HIV-1. It turned out that combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. Moreover, we were able to identify the most crucial regions for Bevirimat resistance computationally, which are in line with experimental results from other studies. CONCLUSIONS Our analysis demonstrated the use of machine learning techniques to predict HIV-1 resistance against maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might enlarge the arsenal of antiretroviral drugs in the future. Thus, accurate prediction tools are very useful to enable a personalized therapy.
Collapse
Affiliation(s)
- J Nikolaj Dybowski
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr, 2, 45117 Essen, Germany.
| | | | | | | | | | | | | |
Collapse
|
13
|
Fuzzy clustering of physicochemical and biochemical properties of amino acids. Amino Acids 2011; 43:583-94. [PMID: 21993537 PMCID: PMC3397137 DOI: 10.1007/s00726-011-1106-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2011] [Accepted: 09/23/2011] [Indexed: 12/03/2022]
Abstract
In this article, we categorize presently available experimental and theoretical knowledge of various physicochemical and biochemical features of amino acids, as collected in the AAindex database of known 544 amino acid (AA) indices. Previously reported 402 indices were categorized into six groups using hierarchical clustering technique and 142 were left unclustered. However, due to the increasing diversity of the database these indices are overlapping, therefore crisp clustering method may not provide optimal results. Moreover, in various large-scale bioinformatics analyses of whole proteomes, the proper selection of amino acid indices representing their biological significance is crucial for efficient and error-prone encoding of the short functional sequence motifs. In most cases, researchers perform exhaustive manual selection of the most informative indices. These two facts motivated us to analyse the widely used AA indices. The main goal of this article is twofold. First, we present a novel method of partitioning the bioinformatics data using consensus fuzzy clustering, where the recently proposed fuzzy clustering techniques are exploited. Second, we prepare three high quality subsets of all available indices. Superiority of the consensus fuzzy clustering method is demonstrated quantitatively, visually and statistically by comparing it with the previously proposed hierarchical clustered results. The processed AAindex1 database, supplementary material and the software are available at http://sysbio.icm.edu.pl/aaindex/.
Collapse
|
14
|
Heider D, Verheyen J, Hoffmann D. Machine learning on normalized protein sequences. BMC Res Notes 2011; 4:94. [PMID: 21453485 PMCID: PMC3079662 DOI: 10.1186/1756-0500-4-94] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2010] [Accepted: 03/31/2011] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Machine learning techniques have been widely applied to biological sequences, e.g. to predict drug resistance in HIV-1 from sequences of drug target proteins and protein functional classes. As deletions and insertions are frequent in biological sequences, a major limitation of current methods is the inability to handle varying sequence lengths. FINDINGS We propose to normalize sequences to uniform length. To this end, we tested one linear and four different non-linear interpolation methods for the normalization of sequence lengths of 19 classification datasets. Classification tasks included prediction of HIV-1 drug resistance from drug target sequences and sequence-based prediction of protein function. We applied random forests to the classification of sequences into "positive" and "negative" samples. Statistical tests showed that the linear interpolation outperforms the non-linear interpolation methods in most of the analyzed datasets, while in a few cases non-linear methods had a small but significant advantage. Compared to other published methods, our prediction scheme leads to an improvement in prediction accuracy by up to 14%. CONCLUSIONS We found that machine learning on sequences normalized by simple linear interpolation gave better or at least competitive results compared to state-of-the-art procedures, and thus, is a promising alternative to existing methods, especially for protein sequences of variable length.
Collapse
Affiliation(s)
- Dominik Heider
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
| | - Jens Verheyen
- Institute of Virology, University of Cologne, Fuerst-Pueckler-Str. 56, 50935 Cologne, Germany
| | - Daniel Hoffmann
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
| |
Collapse
|
15
|
Artificial intelligence systems based on texture descriptors for vaccine development. Amino Acids 2010; 40:443-51. [DOI: 10.1007/s00726-010-0654-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2009] [Accepted: 06/03/2010] [Indexed: 10/19/2022]
|
16
|
Prediction of co-receptor usage of HIV-1 from genotype. PLoS Comput Biol 2010; 6:e1000743. [PMID: 20419152 PMCID: PMC2855328 DOI: 10.1371/journal.pcbi.1000743] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2009] [Accepted: 03/16/2010] [Indexed: 11/27/2022] Open
Abstract
Human Immunodeficiency Virus 1 uses for entry into host cells a receptor (CD4) and one of two co-receptors (CCR5 or CXCR4). Recently, a new class of antiretroviral drugs has entered clinical practice that specifically bind to the co-receptor CCR5, and thus inhibit virus entry. Accurate prediction of the co-receptor used by the virus in the patient is important as it allows for personalized selection of effective drugs and prognosis of disease progression. We have investigated whether it is possible to predict co-receptor usage accurately by analyzing the amino acid sequence of the main determinant of co-receptor usage, i.e., the third variable loop V3 of the gp120 protein. We developed a two-level machine learning approach that in the first level considers two different properties important for protein-protein binding derived from structural models of V3 and V3 sequences. The second level combines the two predictions of the first level. The two-level method predicts usage of CXCR4 co-receptor for new V3 sequences within seconds, with an area under the ROC curve of 0.937±0.004. Moreover, it is relatively robust against insertions and deletions, which frequently occur in V3. The approach could help clinicians to find optimal personalized treatments, and it offers new insights into the molecular basis of co-receptor usage. For instance, it quantifies the importance for co-receptor usage of a pocket that probably is responsible for binding sulfated tyrosine. Human Immunodeficiency Virus is the pathogen causing the disease AIDS. A precondition for virus entry into human cells is the contact of its glycoprotein gp120 with two cellular proteins, a receptor and a co-receptor. Depending on the viral strain, one specific co-receptor is used. The type of co-receptor used is crucial for the aggressiveness of the viral strain and the available treatment options. Hence, it is important to identify which co-receptor is used by the virus in an individual patient. Since the genome of the virus in the patient can be readily sequenced, and thus the composition of the viral proteins be determined, it could be possible to predict co-receptor usage from the viral genome sequences. To this end, we developed a method that is motivated by the insight that physical properties of gp120 will determine its specificity for a co-receptor. The method learns a computational model from structures and sequences of a crucial part of gp120, and the corresponding experimentally measured co-receptor usage. It then employs the model to predict co-receptor usage for new sequences. The high accuracy of the method could make it helpful for diagnosis and suggests that the model captures the determinants of co-receptor usage.
Collapse
|
17
|
Heider D, Verheyen J, Hoffmann D. Predicting Bevirimat resistance of HIV-1 from genotype. BMC Bioinformatics 2010; 11:37. [PMID: 20089140 PMCID: PMC3224585 DOI: 10.1186/1471-2105-11-37] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2009] [Accepted: 01/20/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Maturation inhibitors are a new class of antiretroviral drugs. Bevirimat (BVM) was the first substance in this class of inhibitors entering clinical trials. While the inhibitory function of BVM is well established, the molecular mechanisms of action and resistance are not well understood. It is known that mutations in the regions CS p24/p2 and p2 can cause phenotypic resistance to BVM. We have investigated a set of p24/p2 sequences of HIV-1 of known phenotypic resistance to BVM to test whether BVM resistance can be predicted from sequence, and to identify possible molecular mechanisms of BVM resistance in HIV-1. RESULTS We used artificial neural networks and random forests with different descriptors for the prediction of BVM resistance. Random forests with hydrophobicity as descriptor performed best and classified the sequences with an area under the Receiver Operating Characteristics (ROC) curve of 0.93 +/- 0.001. For the collected data we find that p2 sequence positions 369 to 376 have the highest impact on resistance, with positions 370 and 372 being particularly important. These findings are in partial agreement with other recent studies. Apart from the complex machine learning models we derived a number of simple rules that predict BVM resistance from sequence with surprising accuracy. According to computational predictions based on the data set used, cleavage sites are usually not shifted by resistance mutations. However, we found that resistance mutations could shorten and weaken the alpha-helix in p2, which hints at a possible resistance mechanism. CONCLUSIONS We found that BVM resistance of HIV-1 can be predicted well from the sequence of the p2 peptide, which may prove useful for personalized therapy if maturation inhibitors reach clinical practice. Results of secondary structure analysis are compatible with a possible route to BVM resistance in which mutations weaken a six-helix bundle discovered in recent experiments, and thus ease Gag cleavage by the retroviral protease.
Collapse
Affiliation(s)
- Dominik Heider
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
| | - Jens Verheyen
- Institute of Virology, University of Cologne, Fuerst-Pueckler-Str. 56, 50935 Cologne, Germany
| | - Daniel Hoffmann
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
| |
Collapse
|
18
|
Nanni L, Lumini A. Coding of amino acids by texture descriptors. Artif Intell Med 2010; 48:43-50. [PMID: 19892537 DOI: 10.1016/j.artmed.2009.10.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2008] [Revised: 09/24/2009] [Accepted: 10/03/2009] [Indexed: 11/26/2022]
|
19
|
Variable context Markov chains for HIV protease cleavage site prediction. Biosystems 2009; 96:246-50. [DOI: 10.1016/j.biosystems.2009.03.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2009] [Revised: 03/03/2009] [Accepted: 03/12/2009] [Indexed: 11/19/2022]
|