1
Weckbecker M, Anžel A, Yang Z, Hattab G. Interpretable molecular encodings and representations for machine learning tasks. Comput Struct Biotechnol J 2024; 23:2326-2336. [PMID: 38867722 PMCID: PMC11167246 DOI: 10.1016/j.csbj.2024.05.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 05/13/2024] [Accepted: 05/19/2024] [Indexed: 06/14/2024]
Abstract
Molecular encodings and their usage in machine learning models have demonstrated significant breakthroughs in biomedical applications, particularly in the classification of peptides and proteins. To this end, we propose a new encoding method: Interpretable Carbon-based Array of Neighborhoods (iCAN). Designed to address machine learning models' need for more structured and less flexible input, it captures the neighborhoods of carbon atoms in a counting array and improves the utility of the resulting encodings for machine learning models. The iCAN method provides interpretable molecular encodings and representations, enabling the comparison of molecular neighborhoods, identification of repeating patterns, and visualization of relevance heat maps for a given data set. When reproducing a large biomedical peptide classification study, it outperforms its predecessor encoding. When extended to proteins, it outperforms a lead structure-based encoding on 71% of the data sets. Our method offers interpretable encodings that can be applied to all organic molecules, including exotic amino acids, cyclic peptides, and larger proteins, making it highly versatile across various domains and data sets. This work establishes a promising new direction for machine learning in peptide and protein classification in biomedicine and healthcare, potentially accelerating advances in drug discovery and disease diagnosis.
Affiliation(s)
- Moritz Weckbecker
  - Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Aleksandar Anžel
  - Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Zewen Yang
  - Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Georges Hattab
  - Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität, Arnimallee 14, Berlin, 14195, Berlin, Germany
2
Borkenhagen LK, Allen MW, Runstadler JA. Influenza virus genotype to phenotype predictions through machine learning: a systematic review. Emerg Microbes Infect 2021; 10:1896-1907. [PMID: 34498543 PMCID: PMC8462836 DOI: 10.1080/22221751.2021.1978824] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Background: There is great interest in understanding the viral genomic predictors of phenotypic traits that allow influenza A viruses to adapt to or become more virulent in different hosts. Machine learning techniques have demonstrated promise in addressing this critical need for other pathogens because the underlying algorithms are especially well equipped to uncover complex patterns in large datasets and produce generalizable predictions for new data. As the body of research where these techniques are applied for influenza A virus phenotype prediction continues to grow, it is useful to consider the strengths and weaknesses of these approaches to understand what has prevented these models from seeing widespread use by surveillance laboratories and to identify gaps that are underexplored with this technology. Methods and Results: We present a systematic review of English literature published through 15 April 2021 of studies employing machine learning methods to generate predictions of influenza A virus phenotypes from genomic or proteomic input. Forty-nine studies were included in this review, spanning the topics of host discrimination, human adaptability, subtype and clade assignment, pandemic lineage assignment, characteristics of infection, and antiviral drug resistance. Conclusions: Our findings suggest that biases in model design and a dearth of wet laboratory follow-up may explain why these models often go underused. We, therefore, offer guidance to overcome these limitations, aid in improving predictive models of previously studied influenza A virus phenotypes, and extend those models to unexplored phenotypes in the ultimate pursuit of tools to enable the characterization of virus isolates across surveillance laboratories.
Affiliation(s)
- Laura K Borkenhagen
  - Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
- Martin W Allen
  - Department of Computer Science, School of Engineering, Tufts University, Medford, MA, USA
- Jonathan A Runstadler
  - Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
3
Comparative analyses of error handling strategies for next-generation sequencing in precision medicine. Sci Rep 2020; 10:5750. [PMID: 32238883 PMCID: PMC7113248 DOI: 10.1038/s41598-020-62675-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/27/2019] [Accepted: 03/18/2020] [Indexed: 11/21/2022]
Abstract
Next-generation sequencing (NGS) offers the opportunity to sequence millions and billions of DNA sequences in a short period, leading to novel applications in personalized medicine, such as cancer diagnostics or antiviral therapy. Nevertheless, sequencing technologies have different error rates, which occur during the sequencing process. If the NGS data is used for diagnostics, these sequences with errors are typically neglected or a worst-case scenario is assumed. In the current study, we focused on the impact of ambiguous bases on therapy recommendations for Human Immunodeficiency Virus 1 (HIV-1) patients. Concretely, we analyzed the treatment recommendation with entry blockers based on prediction models for co-receptor tropism. We compared three different error handling strategies that have been used in the literature, namely (i) neglection, (ii) worst-case assumption, and (iii) deconvolution with a majority vote. We could show that for two or more ambiguous positions per sequence a reliable prediction is generally no longer possible. Moreover, also the position of ambiguity plays a crucial role. Thus, we analyzed the error probability distributions of existing sequencing technologies, e.g., Illumina MiSeq or PacBio, with respect to the aforementioned error handling strategies and it turned out that neglection outperforms the other strategies in the case where no systematic errors are present. In other cases, the deconvolution strategy with the majority vote should be preferred.
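The three error handling strategies named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: `toy_predict` is a hypothetical stand-in for a co-receptor tropism model, and treating "X4" as the worst-case label is an assumption made for the example. Only the IUPAC ambiguity table is standard.

```python
from collections import Counter
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(seq):
    """Return every unambiguous sequence consistent with seq."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in seq))]


def toy_predict(seq):
    """Hypothetical classifier: calls 'X4' when at least two G's occur."""
    return "X4" if seq.count("G") >= 2 else "R5"


def neglect(seq, predict):
    """Strategy (i): discard sequences that contain ambiguous bases."""
    if any(len(IUPAC[b]) > 1 for b in seq):
        return None
    return predict(seq)


def worst_case(seq, predict, worst="X4"):
    """Strategy (ii): report the worst label if any expansion yields it."""
    labels = {predict(s) for s in expand(seq)}
    if worst in labels:
        return worst
    return labels.pop()


def majority_vote(seq, predict):
    """Strategy (iii): predict every expansion, keep the most common label.

    Ties resolve to the label encountered first.
    """
    votes = Counter(predict(s) for s in expand(seq))
    return votes.most_common(1)[0][0]
```

For the sequence "GNA", the four expansions give three "R5" calls and one "X4" call, so the strategies disagree: neglection returns nothing, the worst-case rule returns "X4", and the majority vote returns "R5". The number of expansions grows multiplicatively with each ambiguous position, which is one practical reason to cap the number of ambiguities handled this way.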
4
Löchel HF, Riemenschneider M, Frishman D, Heider D. SCOTCH: subtype A coreceptor tropism classification in HIV-1. Bioinformatics 2019; 34:2575-2580. [PMID: 29554213 DOI: 10.1093/bioinformatics/bty170] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/22/2017] [Accepted: 03/14/2018] [Indexed: 01/25/2023]
Abstract
Motivation: The V3 loop of the gp120 glycoprotein of the Human Immunodeficiency Virus 1 (HIV-1) is considered to be responsible for viral coreceptor tropism. gp120 interacts with the CD4 receptor of the host cell and subsequently V3 binds either CCR5 or CXCR4. Because the CCR5 coreceptor is targeted by entry inhibitors, a reliable prediction of the coreceptor usage of HIV-1 is of great interest for antiretroviral therapy. Although several methods for the prediction of coreceptor tropism are available, almost all of them have been developed based on only subtype B sequences, and it has been shown in several studies that the prediction of non-B sequences, in particular subtype A sequences, is less reliable. Thus, the aim of the current study was to develop a reliable prediction model for subtype A viruses. Results: Our new model SCOTCH is based on a stacking approach of classifier ensembles and shows a significantly better performance for subtype A sequences compared to other available models. In particular for low false positive rates (between 0.05 and 0.2, i.e. recommendation in the German and European Guidelines for tropism prediction), SCOTCH shows significantly better prediction performances in terms of partial area under the curves and diagnostic odds ratios compared to existing tools, and thus can be used to reliably predict coreceptor tropism for subtype A sequences. Availability and implementation: SCOTCH can be downloaded/accessed at http://www.heiderlab.de.
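The partial area under the ROC curve restricted to a false positive rate window, such as the 0.05 to 0.2 range mentioned above, can be computed with the trapezoidal rule. The sketch below is an illustration only: the function name, the linear interpolation between ROC points, and the requirement that the FPR values be strictly increasing are assumptions, and the source does not show how SCOTCH itself was evaluated.

```python
def partial_auc(fpr, tpr, lo, hi):
    """Trapezoidal area under a ROC curve for lo <= FPR <= hi.

    Assumes fpr is strictly increasing and covers the interval [lo, hi].
    """
    def tpr_at(x):
        # Linearly interpolate the true positive rate at FPR = x.
        for i in range(len(fpr) - 1):
            x0, x1 = fpr[i], fpr[i + 1]
            if x0 <= x <= x1:
                y0, y1 = tpr[i], tpr[i + 1]
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("x lies outside the FPR range of the curve")

    # Window boundaries plus every ROC point strictly inside the window.
    xs = [lo] + [x for x in fpr if lo < x < hi] + [hi]
    ys = [tpr_at(x) for x in xs]
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
        for i in range(len(xs) - 1)
    )
```

A random classifier (the diagonal ROC curve) scores 0.01875 on the 0.05 to 0.2 window and a perfect one scores 0.15, so raw partial AUC values are only comparable between tools evaluated on the same window.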
Affiliation(s)
- Hannah F Löchel
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
- Dmitrij Frishman
  - Department of Genome-Oriented Bioinformatics, Technical University of Munich, Freising, Germany
  - Laboratory of Bioinformatics, St. Petersburg State Polytechnic University, St. Petersburg, Russia
- Dominik Heider
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
5
Spänig S, Heider D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min 2019; 12:7. [PMID: 30867681 PMCID: PMC6399931 DOI: 10.1186/s13040-019-0196-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 12/21/2018] [Accepted: 02/24/2019] [Indexed: 01/10/2023]
Abstract
Antimicrobial peptides (AMPs) are part of the inherent immune system. In fact, they occur in almost all organisms including, e.g., plants, animals, and humans. Remarkably, they are also effective against multi-resistant pathogens, with a high selectivity. This is especially crucial at a time when society is faced with the major threat of an ever-increasing number of antibiotic-resistant microbes. In addition, AMPs can also exhibit antitumor and antiviral effects, thus a variety of scientific studies dealt with the prediction of active peptides in recent years. Due to their potential, even the pharmaceutical industry is keen on discovering and developing novel AMPs. However, AMPs are difficult to verify in vitro, hence researchers conduct sequence similarity experiments against known, active peptides. Unfortunately, this approach is very time-consuming and limits potential candidates to sequences with a high similarity to known AMPs. Machine learning methods offer the opportunity to explore the huge space of sequence variations in a timely manner. These algorithms have, in principle, paved the way for an automated discovery of AMPs. However, machine learning models require a numerical input, thus an informative encoding is very important. Unfortunately, developing an appropriate encoding is a major challenge, which has not been entirely solved so far. For this reason, the development of novel amino acid encodings is established as a stand-alone research branch. The present review introduces state-of-the-art encodings of amino acids as well as their properties in sequence- and structure-based aggregation. Moreover, although a well-chosen encoding is essential, performant classifiers are required, which is reflected by a tendency towards specifically designed models in the literature. Furthermore, we introduce these models with a particular focus on encodings derived from support vector machines and deep learning approaches. Although a strong focus has been set on AMP predictions, not all of the mentioned encodings have been elaborated as part of antimicrobial research studies, but rather as general protein or peptide representations.
Affiliation(s)
- Sebastian Spänig
  - Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
- Dominik Heider
  - Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
6
Pan Y, Liu H, Metsch LR, Feaster DJ. Factors Associated with HIV Testing Among Participants from Substance Use Disorder Treatment Programs in the US: A Machine Learning Approach. AIDS Behav 2017; 21:534-546. [PMID: 27933461 PMCID: PMC5583728 DOI: 10.1007/s10461-016-1628-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Abstract
HIV testing is the foundation for consolidated HIV treatment and prevention. In this study, we aim to discover the most relevant variables for predicting HIV testing uptake among substance users in substance use disorder treatment programs by applying random forest (RF), a robust multivariate statistical learning method. We also provide a descriptive introduction to this method for those who are unfamiliar with it. We used data from the National Institute on Drug Abuse Clinical Trials Network HIV testing and counseling study (CTN-0032). A total of 1281 HIV-negative or status unknown participants from 12 US community-based substance use disorder treatment programs were included and were randomized into three HIV testing and counseling treatment groups. The a priori primary outcome was self-reported receipt of HIV test results. Classification accuracy of RF was compared to logistic regression, a standard statistical approach for binary outcomes. Variable importance measures for the RF model were used to select the most relevant variables. RF based models produced much higher classification accuracy than those based on logistic regression. Treatment group is the most important predictor among all covariates, with a variable importance index of 12.9%. RF variable importance revealed that several types of condomless sex behaviors, condom use self-efficacy and attitudes towards condom use, and level of depression are the most important predictors of receipt of HIV testing results. There is a non-linear negative relationship between count of condomless sex acts and the receipt of HIV testing. In conclusion, RF seems promising in discovering important factors related to HIV testing uptake among large numbers of predictors and should be encouraged in future HIV prevention and treatment research and intervention program evaluations.
Affiliation(s)
- Yue Pan
  - Division of Epidemiology, Department of Public Health Sciences, University of Miami Miller School of Medicine, 1120 N.W. 14th ST, Miami, FL, 33136, USA
- Hongmei Liu
  - Division of Biostatistics, Department of Public Health Sciences, University of Miami Miller School of Medicine, 1120 N.W. 14th ST, Miami, FL, 33136, USA
- Lisa R Metsch
  - Department of Sociomedical Sciences, Mailman School of Public Health, Columbia University, 722 W 168th ST, New York, NY, 10032, USA
- Daniel J Feaster
  - Division of Biostatistics, Department of Public Health Sciences, University of Miami Miller School of Medicine, 1120 N.W. 14th ST, Miami, FL, 33136, USA
7
Riemenschneider M, Hummel T, Heider D. SHIVA - a web application for drug resistance and tropism testing in HIV. BMC Bioinformatics 2016; 17:314. [PMID: 27549230 PMCID: PMC4994198 DOI: 10.1186/s12859-016-1179-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 10/28/2015] [Accepted: 08/11/2016] [Indexed: 01/09/2023]
Abstract
Background: Drug resistance testing is mandatory in antiretroviral therapy in human immunodeficiency virus (HIV) infected patients for successful treatment. The emergence of resistances against antiretroviral agents remains the major obstacle in inhibition of viral replication and thus to control infection. Due to the high mutation rate the virus is able to adapt rapidly under drug pressure leading to the evolution of resistant variants and finally to therapy failure. Results: We developed a web service for drug resistance prediction of commonly used drugs in antiretroviral therapy, i.e., protease inhibitors (PIs), reverse transcriptase inhibitors (NRTIs and NNRTIs), and integrase inhibitors (INIs), but also for the novel drug class of maturation inhibitors. Furthermore, co-receptor tropism (CCR5 or CXCR4) can be predicted as well, which is essential for treatment with entry inhibitors, such as Maraviroc. Currently, SHIVA provides 24 prediction models for several drug classes. SHIVA can be used with single RNA/DNA or amino acid sequences, but also with large amounts of next-generation sequencing data and allows prediction of a user specified selection of drugs simultaneously. Prediction results are provided as clinical reports which are sent via email to the user. Conclusions: SHIVA represents a novel high performing alternative for hitherto developed drug resistance testing approaches able to process data derived from next-generation sequencing technologies. SHIVA is publicly available via a user-friendly web interface.
Affiliation(s)
- Mona Riemenschneider
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315, Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354, Germany
- Thomas Hummel
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315, Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354, Germany
- Dominik Heider
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315, Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354, Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, Freising, 85354, Germany
8
Pandey SS, Cherian S, Thakar M, Paranjape RS. Short Communication: Phylogenetic and Molecular Characterization of Six Full-Length HIV-1 Genomes from India Reveals a Monophyletic Lineage of Indian Sub-Subtype A1. AIDS Res Hum Retroviruses 2016; 32:489-502. [PMID: 26756665 DOI: 10.1089/aid.2015.0207] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Although the HIV-1 epidemic in India is mainly driven by subtype C, subtype A has been reported for over two decades. This is the first comprehensive analysis of sequences of HIV-1 subtype A from India, based on the near full-length genome sequences of six different HIV-1 subtype A Indian isolates along with available partial gene sequences from India and global sequences. The phylogenetic analyses revealed the convergence of all Indian whole-genome sequences and the majority of the partial gene sequences to a single node, with the sequences most closely related to African sub-subtype A1. Signature motifs consistent with those observed in subtype A and CTL epitopes characterized specifically for subtype A1 were observed among the study sequences. Deletion of LY amino acid of LYPXnL motif of p6gag and one amino acid in V3 loop have been observed among the study isolates, which have also been observed in a few sequences from East Africa. Overall, the results are indicative of a monophyletic lineage or founder effect of the Indian epidemic due to sub-subtype A1 and supportive of a possible migration of subtype A1 into India from East Africa.
Affiliation(s)
- Sarah Cherian
  - Bioinformatics Group, National Institute of Virology (ICMR), Pune, India
- Madhuri Thakar
  - Department of Immunology, National AIDS Research Institute (ICMR), Pune, India
- Ramesh S. Paranjape
  - Department of Immunology, National AIDS Research Institute (ICMR), Pune, India
9
Genotypic Prediction of Co-receptor Tropism of HIV-1 Subtypes A and C. Sci Rep 2016; 6:24883. [PMID: 27126912 PMCID: PMC4850382 DOI: 10.1038/srep24883] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 01/12/2016] [Accepted: 04/07/2016] [Indexed: 02/06/2023]
Abstract
Antiretroviral treatment of Human Immunodeficiency Virus type-1 (HIV-1) infections with CCR5-antagonists requires the co-receptor usage prediction of viral strains. Currently available tools are mostly designed based on subtype B strains and thus are in general not applicable to non-B subtypes. However, HIV-1 infections caused by subtype B only account for approximately 11% of infections worldwide. We evaluated the performance of several sequence-based algorithms for co-receptor usage prediction employed on subtype A V3 sequences including circulating recombinant forms (CRFs) and subtype C strains. We further analysed sequence profiles of gp120 regions of subtype A, B and C to explore functional relationships to entry phenotypes. Our analyses clearly demonstrate that state-of-the-art algorithms are not useful for predicting co-receptor tropism of subtype A and its CRFs. Sequence profile analysis of gp120 revealed molecular variability in subtype A viruses. Especially, the V2 loop region could be associated with co-receptor tropism, which might indicate a unique pattern that determines co-receptor tropism in subtype A strains compared to subtype B and C strains. Thus, our study demonstrates that there is a need for the development of novel algorithms facilitating tropism prediction of HIV-1 subtype A to improve effective antiretroviral treatment in patients.
10
Riemenschneider M, Senge R, Neumann U, Hüllermeier E, Heider D. Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification. BioData Min 2016; 9:10. [PMID: 26933450 PMCID: PMC4772363 DOI: 10.1186/s13040-016-0089-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 11/30/2015] [Accepted: 02/20/2016] [Indexed: 12/15/2022]
Abstract
Background: Antiretroviral therapy is essential for human immunodeficiency virus (HIV) infected patients to inhibit viral replication and therewith to slow progression of disease and prolong a patient's life. However, the high mutation rate of HIV can lead to a fast adaptation of the virus under drug pressure and thereby to the evolution of resistant variants. In turn, these variants will lead to the failure of antiretroviral treatment. Moreover, these mutations can lead not only to resistance against single drugs, but also to cross-resistance, i.e., resistance against drugs that have not yet been applied. Methods: 662 protease sequences and 715 reverse transcriptase sequences with complete resistance profiles were analyzed using machine learning techniques, namely binary relevance classifiers, classifier chains, and ensembles of classifier chains. Results: In our study, we applied multi-label classification models incorporating cross-resistance information to predict drug resistance for two of the major drug classes used in antiretroviral therapy for HIV-1, namely protease inhibitors (PIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs). By means of multi-label learning, namely classifier chains (CCs) and ensembles of classifier chains (ECCs), we were able to improve overall prediction accuracy for all drugs compared to hitherto applied binary classification models. Conclusions: The development of fast and precise models to predict drug resistance in HIV-1 is highly important to enable a highly effective personalized therapy. Cross-resistance information can be exploited to improve prediction accuracy of computational drug resistance models.
Affiliation(s)
- Mona Riemenschneider
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315 Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354 Germany
- Robin Senge
  - Department of Computer Science, University of Paderborn, Pohlweg 47, Paderborn, 33098 Germany
- Ursula Neumann
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315 Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, Freising, 85354 Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354 Germany
- Eyke Hüllermeier
  - Department of Computer Science, University of Paderborn, Pohlweg 47, Paderborn, 33098 Germany
- Dominik Heider
  - Department of Bioinformatics, Straubing Center of Science, Petersgasse 18, Straubing, 94315 Germany
  - Wissenschaftszentrum Weihenstephan, Technische Universität München, Alte Akademie 8, Freising, 85354 Germany
  - University of Applied Science Weihenstephan-Triesdorf, Am Hofgarten 4, Freising, 85354 Germany
11
Shen HS, Yin J, Leng F, Teng RF, Xu C, Xia XY, Pan XM. HIV coreceptor tropism determination and mutational pattern identification. Sci Rep 2016; 6:21280. [PMID: 26883082 PMCID: PMC4756667 DOI: 10.1038/srep21280] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/04/2015] [Accepted: 01/20/2016] [Indexed: 12/20/2022]
Abstract
In the early stages of infection, Human Immunodeficiency Virus Type 1 (HIV-1) generally selects CCR5 as the primary coreceptor for entering the host cell. As infection progresses, the virus evolves and may exhibit a coreceptor-switch to CXCR4. Accurate determination of coreceptor usage and identification of key mutational patterns associated with tropism switch are essential for selection of appropriate therapies and understanding the mechanism of coreceptor change. We developed a classifier composed of two coreceptor-specific weight matrices (CMs) based on a full-scale dataset. For this classifier, we found an AUC of 0.97, an accuracy of 95.21% and an MCC of 0.885 (sensitivity 92.92%; specificity 95.54%) in a ten-fold cross-validation, outperforming all other methods on an independent dataset (13% higher MCC value than geno2pheno and 15% higher MCC value than PSSM). A web server (http://spg.med.tsinghua.edu.cn/CM.html) based on our classifier was provided. Patterns of genetic mutations that occur along with coreceptor transitions were further identified based on the score of each sequence. Six pairs of one-AA mutational patterns and three pairs of two-AA mutational patterns were identified as associated with increasing propensity for X4 tropism. These mutational patterns offered new insights into the mechanism of coreceptor switch and aided in monitoring coreceptor switch.
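The performance figures quoted in this abstract (accuracy, sensitivity, specificity, and the Matthews correlation coefficient, MCC) all derive from the four cells of a binary confusion matrix. The following sketch shows the standard formulas. The function name and the example counts are illustrative assumptions and are not taken from the study.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts, for illustration only.
example = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside sensitivity and specificity when the two classes (here, R5 and X4 tropism) are imbalanced.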
Affiliation(s)
- Hui-Shuang Shen
  - The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, China
- Jason Yin
  - Department of Biostatistics, Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Fei Leng
  - The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, China
- Rui-Fang Teng
  - The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, China
- Chao Xu
  - The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, China
- Xia-Yu Xia
  - The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, China
- Xian-Ming Pan
  - The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, China
12
Kieslich CA, Tamamis P, Guzman YA, Onel M, Floudas CA. Highly Accurate Structure-Based Prediction of HIV-1 Coreceptor Usage Suggests Intermolecular Interactions Driving Tropism. PLoS One 2016; 11:e0148974. [PMID: 26859389 PMCID: PMC4747591 DOI: 10.1371/journal.pone.0148974] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/14/2015] [Accepted: 01/26/2016] [Indexed: 01/21/2023]
Abstract
HIV-1 entry into host cells is mediated by interactions between the V3-loop of viral glycoprotein gp120 and chemokine receptor CCR5 or CXCR4, collectively known as HIV-1 coreceptors. Accurate genotypic prediction of coreceptor usage is of significant clinical interest and determination of the factors driving tropism has been the focus of extensive study. We have developed a method based on nonlinear support vector machines to elucidate the interacting residue pairs driving coreceptor usage and provide highly accurate coreceptor usage predictions. Our models utilize centroid-centroid interaction energies from computationally derived structures of the V3-loop:coreceptor complexes as primary features, while additional features based on established rules regarding V3-loop sequences are also investigated. We tested our method on 2455 V3-loop sequences of various lengths and subtypes, and produce a median area under the receiver operator curve of 0.977 based on 500 runs of 10-fold cross validation. Our study is the first to elucidate a small set of specific interacting residue pairs between the V3-loop and coreceptors capable of predicting coreceptor usage with high accuracy across major HIV-1 subtypes. The developed method has been implemented as a web tool named CRUSH, CoReceptor USage prediction for HIV-1, which is available at http://ares.tamu.edu/CRUSH/.
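The evaluation protocol mentioned above, many repetitions of k-fold cross-validation summarized by a median score, can be sketched as follows. This is an assumption-laden outline rather than the authors' pipeline: `score_fold` is a hypothetical callback that would train a model on the training indices and return a score (for example an AUC) on the held-out fold, and averaging the folds within each run before taking the median across runs is a choice made for this example.

```python
import random
from statistics import median


def kfold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k nearly equal test folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv(n_samples, k, runs, score_fold, seed=0):
    """Median over runs of the mean per-fold score.

    score_fold(train_idx, test_idx) is a user-supplied (hypothetical)
    function that fits on train_idx and scores on test_idx.
    """
    rng = random.Random(seed)
    run_scores = []
    for _ in range(runs):
        fold_scores = []
        for test_idx in kfold_indices(n_samples, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            fold_scores.append(score_fold(train_idx, test_idx))
        run_scores.append(sum(fold_scores) / k)
    return median(run_scores)
```

With `k=10` and `runs=500` this mirrors the shape of the protocol described in the abstract. Reshuffling before each run is what makes the repetitions informative, since every run then partitions the same data differently.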
Affiliation(s)
- Chris A Kieslich
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Phanourios Tamamis
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Yannis A Guzman
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, United States of America
- Melis Onel
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Christodoulos A Floudas
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
| |
Collapse
|
13
|
Budeus B, Timm J, Hoffmann D. SeqFeatR for the Discovery of Feature-Sequence Associations. PLoS One 2016; 11:e0146409. [PMID: 26731669 PMCID: PMC4701496 DOI: 10.1371/journal.pone.0146409] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/19/2015] [Accepted: 12/15/2015] [Indexed: 12/20/2022] Open
Abstract
Specific selection pressures often lead to specifically mutated genomes. The open source software SeqFeatR has been developed to identify associations between mutation patterns in biological sequences and specific selection pressures ("features"). For instance, SeqFeatR has been used to discover new T cell epitopes in viral protein sequences for hosts of given HLA types. SeqFeatR supports frequentist and Bayesian methods for the discovery of statistical sequence-feature associations. Moreover, it offers novel ways to visualize results of the statistical analyses and to relate them to further properties. In this article we demonstrate various functions of SeqFeatR with real data. The most frequently used set of functions is also provided by a web server. SeqFeatR is implemented as an R package and is freely available from the R archive CRAN (http://cran.r-project.org/web/packages/SeqFeatR/index.html). The package includes a tutorial vignette. The software is distributed under the GNU General Public License (version 3 or later). The web server URL is https://seqfeatr.zmb.uni-due.de.
Affiliation(s)
- Bettina Budeus
- Research Group Bioinformatics, Faculty of Biology, University of Duisburg-Essen, Essen, NRW, Germany
- Jörg Timm
- Institute for Virology, University Hospital Düsseldorf, Düsseldorf, NRW, Germany
- Daniel Hoffmann
- Research Group Bioinformatics, Faculty of Biology, University of Duisburg-Essen, Essen, NRW, Germany
14
Sierra S, Dybowski JN, Pironti A, Heider D, Güney L, Thielen A, Reuter S, Esser S, Fätkenheuer G, Lengauer T, Hoffmann D, Pfister H, Jensen B, Kaiser R. Parameters Influencing Baseline HIV-1 Genotypic Tropism Testing Related to Clinical Outcome in Patients on Maraviroc. PLoS One 2015; 10:e0125502. [PMID: 25970632 PMCID: PMC4430318 DOI: 10.1371/journal.pone.0125502] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/09/2014] [Accepted: 03/18/2015] [Indexed: 12/25/2022] Open
Abstract
OBJECTIVES We analysed the impact of different parameters on genotypic tropism testing related to clinical outcome prediction in 108 patients on maraviroc (MVC) treatment. METHODS 87 RNA and 60 DNA samples were used. The viral tropism was predicted using the geno2pheno[coreceptor] and T-CUP tools with FPR cut-offs ranging from 1% to 20%. Additionally, 27 RNA and 28 DNA samples were analysed in triplicate, 43 samples with the ESTA assay and 45 with next-generation sequencing (NGS). The influence of the genotypic susceptibility score (GSS) and 16 MVC-resistance mutations on clinical outcome was also studied. RESULTS Concordance of single-amplification testing with ESTA and with NGS was on the order of 80%. Concordance with NGS was higher at lower FPR cut-offs. Detection of baseline R5 viruses in RNA and DNA samples by all methods significantly correlated with treatment success, even with FPR cut-offs of 3.75%-7.5%. Triple amplification did not improve the prediction value but reduced the number of patients eligible for MVC. An influence on the clinical outcome was detected for adherence to treatment, but not for the GSS or MVC-resistance mutations. CONCLUSIONS Proviral DNA is valid to select candidates for MVC treatment. FPR cut-offs of 5%-7.5% and single amplification from RNA or DNA would assure a safe administration of MVC without excluding many patients who could benefit from this drug. In addition, the new prediction system T-CUP produced reliable results.
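To make the role of the FPR cut-off and the concordance figures concrete, here is a minimal sketch. It assumes the usual convention for genotypic tools of this kind, namely that a sequence is called X4-capable when its reported false-positive rate (FPR) is at or below the chosen cut-off. The function names and the FPR values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def call_tropism(fpr_percent: float, cutoff_percent: float) -> str:
    """Call a sequence 'X4' if its FPR is at or below the cut-off, else 'R5'.

    Assumed convention: a low FPR means the X4 call is unlikely to be
    a false positive.
    """
    return "X4" if fpr_percent <= cutoff_percent else "R5"


def concordance(calls_a: Sequence[str], calls_b: Sequence[str]) -> float:
    """Fraction of samples on which two sets of calls agree."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(calls_a, calls_b))
    return agreements / len(calls_a)


# Made-up FPR values (in percent) for four samples.
fprs = [2.0, 6.0, 12.0, 40.0]
calls_at_5 = [call_tropism(f, 5.0) for f in fprs]
calls_at_10 = [call_tropism(f, 10.0) for f in fprs]
agreement = concordance(calls_at_5, calls_at_10)
```

Raising the cut-off can only move samples from R5 to X4, which is why a higher cut-off is the more conservative choice when selecting patients for a CCR5 antagonist and why it reduces the number of eligible patients.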
Affiliation(s)
- Saleta Sierra
- Institute of Virology, University of Cologne, Cologne, Germany
- J Nikolai Dybowski
- Department for Bioinformatics, University of Duisburg-Essen, Essen, Germany
- Alejandro Pironti
- Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany
- Dominik Heider
- Department for Bioinformatics, University of Duisburg-Essen, Essen, Germany
- Lisa Güney
- Institute of Virology, University of Cologne, Cologne, Germany
- Alex Thielen
- Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany
- Stefan Reuter
- Department of Gastroenterology, Hepatology and Infectiology, University Hospital of Düsseldorf, Düsseldorf, Germany
- Stefan Esser
- Department of Dermatology, University of Duisburg-Essen, Essen, Germany
- Gerd Fätkenheuer
- First Department of Internal Medicine, University of Cologne, Cologne, Germany
- Thomas Lengauer
- Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany
- Daniel Hoffmann
- Department for Bioinformatics, University of Duisburg-Essen, Essen, Germany
- Herbert Pfister
- Institute of Virology, University of Cologne, Cologne, Germany
- Björn Jensen
- Department of Gastroenterology, Hepatology and Infectiology, University Hospital of Düsseldorf, Düsseldorf, Germany
- Rolf Kaiser
- Institute of Virology, University of Cologne, Cologne, Germany
15
Schwalbe B, Schreiber M. Effect of lysine to arginine mutagenesis in the V3 loop of HIV-1 gp120 on viral entry efficiency and neutralization. PLoS One 2015; 10:e0119879. [PMID: 25785610 PMCID: PMC4364900 DOI: 10.1371/journal.pone.0119879] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/20/2014] [Accepted: 02/03/2015] [Indexed: 12/02/2022] Open
Abstract
HIV-1 infection is characterized by an ongoing replication leading to T-lymphocyte decline which is paralleled by the switch from CCR5 to CXCR4 coreceptor usage. To predict coreceptor usage, several computer algorithms using gp120 V3 loop sequence data have been developed. In these algorithms an occupation of the V3 positions 11 and 25, by one of the amino acids lysine (K) or arginine (R), is an indicator for CXCR4 usage. Amino acids R and K dominate at these two positions, but can also be identified at positions 9 and 10. Generally, CXCR4-viruses possess V3 sequences, with an overall positive charge higher than the V3 sequences of R5-viruses. The net charge is calculated by subtracting the number of negatively charged amino acids (D, aspartic acid and E, glutamic acid) from the number of positively charged ones (K and R). In contrast to D and E, which are very similar in their polar and acidic properties, the characteristics of the R guanidinium group differ significantly from the K ammonium group. However, in coreceptor predictive computer algorithms R and K are both equally rated. The study was conducted to analyze differences in infectivity and coreceptor usage because of R-to-K mutations at the V3 positions 9, 10 and 11. V3 loop mutants with all possible RRR-to-KKK triplets were constructed and analyzed for coreceptor usage, infectivity and neutralization by SDF-1α and RANTES. Virus mutants R9R10R11 showed the highest infectivity rates, and were inhibited more efficiently in contrast to the K9K10K11 viruses. They also showed higher efficiency in a virus-gp120 paired infection assay. Especially V3 loop position 9 was relevant for a switch to higher infectivity when occupied by R. Thus, K-to-R exchanges play a role for enhanced viral entry efficiency and should therefore be considered when the viral phenotype is predicted based on V3 sequence data.
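The two sequence-based indicators described in this abstract, the V3 net charge and the occupation of positions 11 and 25 by K or R, can be sketched as follows. Several points are assumptions made for illustration: positions are counted 1-based on an ungapped V3 sequence, a basic residue at either position is treated as sufficient (the common reading of the rule), and the example sequence is only an illustrative 35-residue V3-like string, not data from the study.

```python
POSITIVE = frozenset("KR")  # lysine, arginine
NEGATIVE = frozenset("DE")  # aspartic acid, glutamic acid


def v3_net_charge(seq: str) -> int:
    """Number of K and R residues minus number of D and E residues."""
    seq = seq.upper()
    positives = sum(1 for aa in seq if aa in POSITIVE)
    negatives = sum(1 for aa in seq if aa in NEGATIVE)
    return positives - negatives


def basic_at_11_or_25(seq: str) -> bool:
    """True if position 11 or 25 (1-based) holds K or R.

    Assumes an ungapped sequence; an aligned sequence would need its
    gap characters handled before indexing.
    """
    seq = seq.upper()
    return any(len(seq) >= pos and seq[pos - 1] in POSITIVE for pos in (11, 25))


# Illustrative V3-like sequence (hypothetical input for this example).
example = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
# Same sequence with position 11 changed from S to R.
mutant = example[:10] + "R" + example[11:]

charge_example = v3_net_charge(example)
charge_mutant = v3_net_charge(mutant)
flag_example = basic_at_11_or_25(example)
flag_mutant = basic_at_11_or_25(mutant)
```

Note that both functions treat K and R identically, which is exactly the simplification the study questions: an R-to-K exchange leaves the net charge and the position flags unchanged even though the abstract reports differences in infectivity.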
Affiliation(s)
- Birco Schwalbe
- Department Virology, Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany
- Michael Schreiber
- Department Virology, Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany
16
Gupta S, Neogi U, Srinivasa H, Shet A. Performance of Genotypic Tools for Prediction of Tropism in HIV-1 Subtype C V3 Loop Sequences. Intervirology 2015; 58:1-5. [DOI: 10.1159/000369017] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/24/2014] [Accepted: 10/09/2014] [Indexed: 11/19/2022] Open
Abstract
Currently, there is no consensus on the genotypic tools to be used for tropism analysis in HIV-1 subtype C strains. Thus, the aim of the study was to evaluate the performance of the different V3 loop-based genotypic algorithms available. We compiled a dataset of 645 HIV-1 subtype C V3 loop sequences of known coreceptor phenotypes (531 R5-tropic/non-syncytium-inducing and 114 X4-tropic/R5X4-tropic/syncytium-inducing sequences) from the Los Alamos database (http://www.hiv.lanl.gov/) and previously published literature. Coreceptor usage was predicted based on this dataset using different software-based machine-learning algorithms as well as simple classical rules. All the sophisticated machine-learning methods showed a good concordance of above 85%. Geno2Pheno (false-positive rate cutoff of 5-15%) and CoRSeqV3-C were found to have a high predicting capability in determining both HIV-1 subtype C X4-tropic and R5-tropic strains. The current sophisticated genotypic tropism tools based on V3 loop perform well for tropism prediction in HIV-1 subtype C strains and can be used in clinical settings.
17
Arruda LB, Araújo MLD, Martinez ML, Gonsalez CR, Duarte AJDS, Coakley E, Lie Y, Casseb J. Determination of viral tropism by genotyping and phenotyping assays in Brazilian HIV-1-infected patients. Rev Inst Med Trop Sao Paulo 2014; 56:287-90. [PMID: 25076427 PMCID: PMC4131812 DOI: 10.1590/s0036-46652014000400003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/28/2013] [Accepted: 01/30/2014] [Indexed: 12/03/2022] Open
Abstract
The clinical application of CCR5 antagonists involves first determining the coreceptor usage by the infecting viral strain. Bioinformatics programs that predict coreceptor usage could provide an alternative method to screen candidates for treatment with CCR5 antagonists, particularly in countries with limited financial resources. Thus, the present study aims to identify the best approach using bioinformatics tools for determining HIV-1 coreceptor usage in clinical practice. Proviral DNA sequences and Trofile results from 99 HIV-1-infected subjects under clinical monitoring were analyzed in this study. Based on the Trofile results, the viral variants present were 81.1% R5, 21.4% R5X4 and 1.8% X4. Determination of tropism using a Geno2pheno[coreceptor] analysis with a false positive rate of 10% gave the most suitable performance in this sampling: the R5 and X4 strains were found at frequencies of 78.5% and 28.4%, respectively, and there was 78.6% concordance between the phenotypic and genotypic results. Further studies are needed to clarify how genetic diversity amongst virus strains affects bioinformatics-driven approaches for determining tropism. Although this strategy could be useful for screening patients in developing countries, some limitations remain that restrict the wider application of coreceptor usage tests in clinical practice.
Affiliation(s)
- Liã Bárbara Arruda
- Institute of Tropical Medicine of São Paulo, University of São Paulo, São Paulo, SP, Brazil
- Marilia Ladeira de Araújo
- Laboratory of Investigation in Dermatology and Immunodeficiencies, Department of Dermatology School of Medicine at University of São Paulo, University of São Paulo, São Paulo, SP, Brazil
- Maira Luccia Martinez
- Laboratory of Investigation in Dermatology and Immunodeficiencies, Department of Dermatology School of Medicine at University of São Paulo, University of São Paulo, São Paulo, SP, Brazil
- Claudio Roberto Gonsalez
- HIV Out-clinic, Ambulatory of Secondary Immunodeficiencies, ADEE3002, Department of Dermatology, Hospital of Clinics at School of Medicine, University of São Paulo
- Alberto José da Silva Duarte
- Laboratory of Investigation in Dermatology and Immunodeficiencies, Department of Dermatology School of Medicine at University of São Paulo, University of São Paulo, São Paulo, SP, Brazil
- Eoin Coakley
- Monogram Biosciences, Inc., South San Francisco, CA, USA
- Yolanda Lie
- Monogram Biosciences, Inc., South San Francisco, CA, USA
- Jorge Casseb
- Institute of Tropical Medicine of São Paulo, University of São Paulo, São Paulo, SP, Brazil
18
IDEPI: rapid prediction of HIV-1 antibody epitopes and other phenotypic features from sequence data using a flexible machine learning platform. PLoS Comput Biol 2014; 10:e1003842. [PMID: 25254639 PMCID: PMC4177671 DOI: 10.1371/journal.pcbi.1003842] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2014] [Accepted: 08/01/2014] [Indexed: 11/19/2022] Open
Abstract
Since its identification in 1983, HIV-1 has been the focus of a research effort unprecedented in scope and difficulty, whose ultimate goals--a cure and a vaccine--remain elusive. One of the fundamental challenges in accomplishing these goals is the tremendous genetic variability of the virus, with some genes differing at as many as 40% of nucleotide positions among circulating strains. Because of this, the genetic bases of many viral phenotypes, most notably the susceptibility to neutralization by a particular antibody, are difficult to identify computationally. Drawing upon open-source general-purpose machine learning algorithms and libraries, we have developed a software package IDEPI (IDentify EPItopes) for learning genotype-to-phenotype predictive models from sequences with known phenotypes. IDEPI can apply learned models to classify sequences of unknown phenotypes, and also identify specific sequence features which contribute to a particular phenotype. We demonstrate that IDEPI achieves performance similar to or better than that of previously published approaches on four well-studied problems: finding the epitopes of broadly neutralizing antibodies (bNab), determining coreceptor tropism of the virus, identifying compartment-specific genetic signatures of the virus, and deducing drug-resistance associated mutations. The cross-platform Python source code (released under the GPL 3.0 license), documentation, issue tracking, and a pre-configured virtual machine for IDEPI can be found at https://github.com/veg/idepi.
19
Olejnik M, Steuwer M, Gorlatch S, Heider D. gCUP: rapid GPU-based HIV-1 co-receptor usage prediction for next-generation sequencing. Bioinformatics 2014; 30:3272-3. [PMID: 25123901 DOI: 10.1093/bioinformatics/btu535] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
SUMMARY Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in recent years. However, although highly accurate, these models lack the computational efficiency needed to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model named gCUP, parallelized and optimized for GPU, is highly accurate and can classify >175 000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. AVAILABILITY AND IMPLEMENTATION The source code can be downloaded at http://www.heiderlab.de. CONTACT d.heider@wz-straubing.de.
Affiliation(s)
- Michael Olejnik
- Institute of Computer Science, University of Muenster, 48149 Muenster and Department of Bioinformatics, University of Applied Sciences Weihenstephan-Triesdorf, 94315 Straubing, Germany
- Michel Steuwer
- Institute of Computer Science, University of Muenster, 48149 Muenster and Department of Bioinformatics, University of Applied Sciences Weihenstephan-Triesdorf, 94315 Straubing, Germany
- Sergei Gorlatch
- Institute of Computer Science, University of Muenster, 48149 Muenster and Department of Bioinformatics, University of Applied Sciences Weihenstephan-Triesdorf, 94315 Straubing, Germany
- Dominik Heider
- Institute of Computer Science, University of Muenster, 48149 Muenster and Department of Bioinformatics, University of Applied Sciences Weihenstephan-Triesdorf, 94315 Straubing, Germany
20
Heider D, Dybowski JN, Wilms C, Hoffmann D. A simple structure-based model for the prediction of HIV-1 co-receptor tropism. BioData Min 2014; 7:14. [PMID: 25120583 PMCID: PMC4124776 DOI: 10.1186/1756-0381-7-14] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/10/2014] [Accepted: 07/28/2014] [Indexed: 12/16/2022] Open
Abstract
Background Human Immunodeficiency Virus 1 enters host cells through interaction of its V3 loop (which is part of the gp120 protein) with the host cell receptor CD4 and one of two co-receptors, namely CCR5 or CXCR4. Entry inhibitors binding the CCR5 co-receptor can prevent viral entry. As these drugs are only available for CCR5-using viruses, accurate prediction of this so-called co-receptor tropism is important in order to ensure an effective personalized therapy. With the development of next-generation sequencing technologies, it is now possible to sequence representative subpopulations of the viral quasispecies. Results Here we present T-CUP 2.0, a model for predicting co-receptor tropism. Based on our recently published T-CUP model, we developed a more accurate and even faster solution. Similarly to its predecessor, T-CUP 2.0 models co-receptor tropism using information of the electrostatic potential and hydrophobicity of V3-loops. However, extracting this information from a simplified structural vacuum-model leads to more accurate and faster predictions. The area-under-the-ROC-curve (AUC) achieved with T-CUP 2.0 on the training set is 0.968±0.005 in a leave-one-patient-out cross-validation. When applied to an independent dataset, T-CUP 2.0 has an improved prediction accuracy of around 3% when compared to the original T-CUP. Conclusions We found that it is possible to model co-receptor tropism in HIV-1 based on a simplified structure-based model of the V3 loop. In this way, genotypic prediction of co-receptor tropism is very accurate, fast and can be applied to large datasets derived from next-generation sequencing technologies. The reduced complexity of the electrostatic modeling makes T-CUP 2.0 independent from third-party software, making it easy to install and use.
Affiliation(s)
- Dominik Heider
- Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Jan Nikolaj Dybowski
- Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Christoph Wilms
- Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Daniel Hoffmann
- Research Group Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
21
Sede MM, Moretti FA, Laufer NL, Jones LR, Quarleri JF. HIV-1 tropism dynamics and phylogenetic analysis from longitudinal ultra-deep sequencing data of CCR5- and CXCR4-using variants. PLoS One 2014; 9:e102857. [PMID: 25032817 PMCID: PMC4102574 DOI: 10.1371/journal.pone.0102857] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/16/2013] [Accepted: 06/25/2014] [Indexed: 11/25/2022] Open
Abstract
OBJECTIVE Coreceptor switch from CCR5 to CXCR4 is associated with HIV disease progression. The molecular and evolutionary mechanisms underlying the CCR5 to CXCR4 switch are the focus of intense recent research. We studied the HIV-1 tropism dynamics in relation to coreceptor usage, the nature of quasispecies from ultra deep sequencing (UDPS) data and their phylogenetic relationships. METHODS Here, we characterized C2-V3-C3 sequences of HIV obtained from 19 patients followed up for 54 to 114 months using UDPS, with further genotyping and phylogenetic analysis for coreceptor usage. HIV quasispecies diversity and variability as well as HIV plasma viral load were measured longitudinally and their relationship with the HIV coreceptor usage was analyzed. The longitudinal UDPS data were submitted to phylogenetic analysis and sampling times and coreceptor usage were mapped onto the trees obtained. RESULTS Although a temporal viral genetic structuring was evident, the persistence of several viral lineages evolving independently along the infection was statistically supported, indicating a complex scenario for the evolution of viral quasispecies. HIV X4-using variants were present in most of our patients, exhibiting a dissimilar inter- and intra-patient predominance as the component of quasispecies even on antiretroviral therapy. The viral populations from some of the patients studied displayed evidence of the evolution of X4 variants through fitness valleys, whereas for other patients the data favored a gradual mode of emergence. CONCLUSIONS CXCR4 usage can emerge independently, in multiple lineages, along the course of HIV infection. The mode of emergence, i.e. gradual or through fitness valleys, seems to depend on both virus and patient factors. Furthermore, our analyses suggest that, besides becoming dominant after population-level switches, minor proportions of X4 viruses might exist along the infection, perhaps even at early stages of it. The fate of these minor variants might depend on both viral and host factors.
Affiliation(s)
- Mariano M. Sede
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Franco A. Moretti
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Natalia L. Laufer
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Leandro R. Jones
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales, sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, Chubut, Argentina
- Jorge F. Quarleri
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
22
Chandramouli B, Chillemi G, Desideri A. Structural dynamics of V3 loop in a trimeric ambiance, a molecular dynamics study on gp120–CD4 trimeric mimic. J Struct Biol 2014; 186:132-40. [DOI: 10.1016/j.jsb.2014.02.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/15/2013] [Revised: 01/03/2014] [Accepted: 02/20/2014] [Indexed: 11/24/2022]
23
HIV-1 tropism testing and clinical management of CCR5 antagonists: Quebec review and recommendations. CANADIAN JOURNAL OF INFECTIOUS DISEASES & MEDICAL MICROBIOLOGY 2014; 24:202-8. [PMID: 24489562 DOI: 10.1155/2013/982759] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/21/2022]
Abstract
HIV-1 tropism assays play a crucial role in determining the response to CCR5 receptor antagonists. Initially, phenotypic tests were used, but limited access to these tests prompted the development of alternative strategies. Recently, genotyping tropism has been validated using a Canadian technology in clinical trials investigating the use of maraviroc in both experienced and treatment-naive patients. The present guidelines review the evidence supporting the use of genotypic assays and provide recommendations regarding tropism testing in daily clinical management.
24
Lengauer T, Pfeifer N, Kaiser R. Personalized HIV therapy to control drug resistance. DRUG DISCOVERY TODAY. TECHNOLOGIES 2014; 11:57-64. [PMID: 24847654 DOI: 10.1016/j.ddtec.2014.02.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 06/03/2023]
Abstract
The therapy of HIV patients is characterized by both the high genomic diversity of the virus population harbored by the patient and a substantial volume of therapy options. The virus population is unique for each patient and time point. The large number of therapy options makes it difficult to select an optimal or near optimal therapy, especially with therapy-experienced patients. In the past decade, computer-based support for therapy selection, which assesses the level of viral resistance against drugs has become a mainstay for HIV patients. We discuss the properties of available systems and the perspectives of the field.
25
Aiamkitsumrit B, Dampier W, Antell G, Rivera N, Martin-Garcia J, Pirrone V, Nonnemacher MR, Wigdahl B. Bioinformatic analysis of HIV-1 entry and pathogenesis. Curr HIV Res 2014; 12:132-61. [PMID: 24862329 PMCID: PMC4382797 DOI: 10.2174/1570162x12666140526121746] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 09/24/2013] [Revised: 03/18/2014] [Accepted: 05/06/2014] [Indexed: 02/07/2023]
Abstract
The evolution of human immunodeficiency virus type 1 (HIV-1) with respect to co-receptor utilization has been shown to be relevant to HIV-1 pathogenesis and disease. The CCR5-utilizing (R5) virus has been shown to be important in the very early stages of transmission and highly prevalent during asymptomatic infection and chronic disease. In addition, the R5 virus has been proposed to be involved in neuroinvasion and central nervous system (CNS) disease. In contrast, the CXCR4-utilizing (X4) virus is more prevalent during the course of disease progression and concurrent with the loss of CD4(+) T cells. The dual-tropic virus is able to utilize both co-receptors (CXCR4 and CCR5) and has been thought to represent an intermediate transitional virus that possesses properties of both X4 and R5 viruses that can be encountered at many stages of disease. The use of computational tools and bioinformatic approaches in the prediction of HIV-1 co-receptor usage has been growing in importance with respect to understanding HIV-1 pathogenesis and disease, developing diagnostic tools, and improving the efficacy of therapeutic strategies focused on blocking viral entry. Current strategies have enhanced the sensitivity, specificity, and reproducibility relative to the prediction of co-receptor use; however, these technologies need to be improved with respect to their efficient and accurate use across the HIV-1 subtypes. The most effective approach may center on the combined use of different algorithms involving sequences within and outside of the env-V3 loop. This review focuses on the HIV-1 entry process and on co-receptor utilization, including bioinformatic tools utilized in the prediction of co-receptor usage. It also provides novel preliminary analyses for enabling identification of linkages between amino acids in V3 with other components of the HIV-1 genome and demonstrates that these linkages are different between X4 and R5 viruses.
Affiliation(s)
- Brian Wigdahl
- Department of Microbiology and Immunology, Drexel University College of Medicine, 245 N. 15th Street, Philadelphia, PA 19102.
26
Heider D, Senge R, Cheng W, Hüllermeier E. Multilabel classification for exploiting cross-resistance information in HIV-1 drug resistance prediction. Bioinformatics 2013; 29:1946-52. [PMID: 23793752 DOI: 10.1093/bioinformatics/btt331] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION Antiretroviral treatment regimens can sufficiently suppress viral replication in human immunodeficiency virus (HIV)-infected patients and prevent the progression of the disease. However, one of the factors contributing to the progression of the disease despite ongoing antiretroviral treatment is the emergence of drug resistance. The high mutation rate of HIV can lead to a fast adaptation of the virus under drug pressure, and thus to failure of antiretroviral treatment due to the evolution of drug-resistant variants. Moreover, cross-resistance phenomena have been frequently found in HIV-1, leading to resistance not only against a drug from the current treatment, but also against other drugs not yet applied. Automatic classification and prediction of drug resistance is increasingly important in HIV research as well as in clinical settings, and to this end, machine learning techniques have been widely applied. Nevertheless, cross-resistance information has not yet been taken explicitly into account. RESULTS In our study, we demonstrated the use of cross-resistance information to predict drug resistance in HIV-1. We tested a set of more than 600 reverse transcriptase sequences and corresponding resistance information for six nucleoside analogues. Based on multilabel classification models and cross-resistance information, we were able to significantly improve overall prediction accuracy for all drugs, compared with single binary classifiers without any additional information. Moreover, we identified drug-specific patterns within the reverse transcriptase sequences that can be used to determine an optimal order of the classifiers within the classifier chains. These patterns are in good agreement with known resistance mutations and support the use of cross-resistance information in such prediction models. CONTACT dominik.heider@uni-due.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dominik Heider
- Department of Bioinformatics, University of Duisburg-Essen, Essen, Germany
|
27
|
Pfeifer N, Lengauer T. Improving HIV coreceptor usage prediction in the clinic using hints from next-generation sequencing data. Bioinformatics 2013; 28:i589-i595. [PMID: 22962486 PMCID: PMC3436800 DOI: 10.1093/bioinformatics/bts373] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
Abstract
MOTIVATION: Due to the high mutation rate of human immunodeficiency virus (HIV), drug-resistant variants emerge frequently. Therefore, researchers are constantly searching for new ways to attack the virus. One new class of anti-HIV drugs is the class of coreceptor antagonists that block cell entry by occupying a coreceptor on CD4 cells. This type of drug only has an effect on the subset of HIV variants that use the inhibited coreceptor. A good prediction of whether the viral population inside a patient is susceptible to the treatment is hence very important for therapy decisions and a prerequisite for administering the respective drug. The first prediction models were based on data from Sanger sequencing of the V3 loop of HIV. Recently, a method based on next-generation sequencing (NGS) data was introduced that predicts labels for each read separately and decides on the patient label through a percentage threshold for the resistant viral minority. RESULTS: We model the prediction problem on the patient level, taking the information of all reads from NGS data jointly into account. This enables us to improve prediction performance for NGS data, but we can also use the trained model to improve predictions based on Sanger sequencing data. Therefore, laboratories without NGS capabilities can also benefit from the improvements. Furthermore, we show which amino acids at which positions are important for prediction success, giving clues on how the interaction mechanism between the V3 loop and the particular coreceptors might be influenced. AVAILABILITY: A webserver is available at http://coreceptor.bioinf.mpi-inf.mpg.de. CONTACT: nico.pfeifer@mpi-inf.mpg.de.
Affiliation(s)
- Nico Pfeifer
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Campus E1 4, 66123 Saarbrücken, Germany.
|
28
|
Hybrid approach for predicting coreceptor used by HIV-1 from its V3 loop amino acid sequence. PLoS One 2013; 8:e61437. [PMID: 23596523 PMCID: PMC3626595 DOI: 10.1371/journal.pone.0061437] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 04/04/2012] [Accepted: 03/13/2013] [Indexed: 12/18/2022]
Abstract
Background: HIV-1 infects the host cell by interacting with the primary receptor CD4 and a coreceptor, CCR5 or CXCR4. Maraviroc, a CCR5 antagonist, binds to the CCR5 receptor. Thus, it is important to identify the coreceptor used by the HIV strains dominating in the patient. In the past, a number of experimental assays and in-silico techniques have been developed for predicting coreceptor tropism. The prediction accuracy of these methods is excellent when predicting CCR5 (R5)-tropic sequences but is relatively poor for CXCR4 (X4)-tropic sequences. Therefore, any new method for accurate determination of coreceptor usage would be of paramount importance to the successful management of HIV-infected individuals. Results: The dataset used in this study comprised 1799 R5-tropic and 598 X4-tropic third variable (V3) sequences of HIV-1. We compared the amino acid composition of both types of V3 sequences and observed that certain types of residues, e.g., Asparagine and Isoleucine, were preferred in R5-tropic sequences whereas residues like Lysine, Arginine, and Tryptophan were preferred in X4-tropic sequences. Initially, Support Vector Machine-based models were developed using amino acid composition, dipeptide composition, and split amino acid composition, which achieved accuracy up to 90%. We used BLAST to discriminate R5- and X4-tropic sequences and correctly predicted 93.16% of R5- and 75.75% of X4-tropic sequences. In order to improve the prediction accuracy, a Hybrid model was developed that achieved 91.66% sensitivity, 81.77% specificity, 89.19% accuracy and a Matthews Correlation Coefficient of 0.72. The performance of our models was also evaluated on an independent dataset (256 R5- and 81 X4-tropic sequences), where they achieved a maximum accuracy of 84.87% with a Matthews Correlation Coefficient of 0.63. Conclusion: This study describes a highly efficient method for predicting HIV-1 coreceptor usage from V3 sequences. In order to provide a service to the scientific community, a webserver, HIVcoPred, was developed (http://www.imtech.res.in/raghava/hivcopred/) for predicting the coreceptor usage.
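The four performance figures quoted for the Hybrid model are all functions of one 2x2 confusion matrix, which the following Python sketch makes explicit. The counts below are not taken from the paper: they are hypothetical values back-calculated from the reported percentages and the 1799/598 class sizes, under the assumption that R5-tropic sequences are treated as the positive class. The function name `confusion_metrics` is likewise illustrative.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) for a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)                 # fraction of positives recovered
    specificity = tn / (tn + fp)                 # fraction of negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all calls that are correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; return 0.0 then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts (assumption): R5 = positive class, X4 = negative class.
TP, FN = 1649, 150   # split of the 1799 R5-tropic sequences
TN, FP = 489, 109    # split of the 598 X4-tropic sequences

sens, spec, acc, mcc = confusion_metrics(TP, FN, TN, FP)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.4f} MCC={mcc:.2f}")
```

Because the data set is imbalanced (roughly three R5 sequences per X4 sequence), accuracy is dominated by the majority class, which is one reason the abstract reports the Matthews Correlation Coefficient alongside it.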
|
29
|
Evans MC, Paquet AC, Huang W, Napolitano L, Frantzell A, Toma J, Stawiski EW, Goetz MB, Petropoulos CJ, Whitcomb J, Coakley E, Haddad M. A case-based reasoning system for genotypic prediction of HIV-1 co-receptor tropism. J Bioinform Comput Biol 2013; 11:1350006. [PMID: 23859270 DOI: 10.1142/s0219720013500066] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Accurate co-receptor tropism (CRT) determination is critical for making treatment decisions in HIV management. We created a genotypic tropism prediction tool by utilizing the case-based reasoning (CBR) technique that attempts to solve new problems through applying the solution from similar past problems. V3 loop sequences from 732 clinical samples with diverse characteristics were used to build a case library. Additional sequence and molecular properties of the V3 loop were examined and used for similarity assessment. A similarity metric was defined based on each attribute's frequency in the CXCR4-using viruses. We implemented three other genotype-based tropism predictors, support vector machines (SVM), position specific scoring matrices (PSSM), and the 11/25 rule, and evaluated their performance as the ability to predict CRT compared to Monogram's enhanced sensitivity Trofile(®) assay (ESTA). Overall concordance of the CBR based tropism prediction algorithm was 81%, as compared to ESTA. Sensitivity to detect CXCR4 usage was 90% and specificity was at 73%. In comparison, sensitivity of the SVM, PSSM, and the 11/25 rule were 85%, 81%, and 36% respectively while achieving a specificity of 90% by SVM, 75% by PSSM, and 97% by the 11/25 rule. When we evaluated these predictors in an unseen dataset, higher sensitivity was achieved by the CBR algorithm (87%), compared to SVM (82%), PSSM (76%), and the 11/25 rule (33%), while maintaining similar level of specificity. Overall this study suggests that CBR can be utilized as a genotypic tropism prediction tool, and can achieve improved performance in independent datasets compared to model or rule based methods.
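Of the baseline predictors named above, the 11/25 rule is simple enough to write down directly. The Python sketch below uses the commonly stated form of the rule: a virus is called CXCR4-using (X4) if V3 position 11 or 25 carries a positively charged residue (lysine or arginine). It assumes an ungapped 35-residue V3 loop so that string indices match the canonical numbering; real pipelines usually align to a reference first. The function name and both example sequences are illustrative and are not taken from the study.

```python
BASIC_RESIDUES = {"K", "R"}  # positively charged amino acids considered by the rule


def rule_11_25(v3_sequence):
    """Predict tropism ("X4" or "R5") from a 35-residue V3 loop using the 11/25 rule."""
    if len(v3_sequence) != 35:
        # Assumption: no insertions or deletions, so positions map 1:1 to indices.
        raise ValueError("expected an ungapped 35-residue V3 sequence")
    pos11 = v3_sequence[10]  # position 11 in 1-based numbering
    pos25 = v3_sequence[24]  # position 25 in 1-based numbering
    return "X4" if pos11 in BASIC_RESIDUES or pos25 in BASIC_RESIDUES else "R5"


# Illustrative V3-like sequences (assumption, not data from the paper).
r5_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"  # S at position 11, E at position 25
x4_like = "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"  # R at position 11

print(rule_11_25(r5_like), rule_11_25(x4_like))
```

A rule that fires only on two positions explains the pattern reported in the abstract: very high specificity (few false X4 calls) at the cost of low sensitivity.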
Affiliation(s)
- Mark C Evans
- Bioinformatics/Biostatistics, Monogram Biosciences Inc., South San Francisco, CA 94080, USA
|
30
|
Bozek K, Lengauer T, Sierra S, Kaiser R, Domingues FS. Analysis of physicochemical and structural properties determining HIV-1 coreceptor usage. PLoS Comput Biol 2013; 9:e1002977. [PMID: 23555214 PMCID: PMC3605109 DOI: 10.1371/journal.pcbi.1002977] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 03/05/2012] [Accepted: 01/23/2013] [Indexed: 11/18/2022]
Abstract
The relationship of HIV tropism with disease progression and the recent development of CCR5-blocking drugs underscore the importance of monitoring virus coreceptor usage. As an alternative to costly phenotypic assays, computational methods aim at predicting virus tropism based on the sequence and structure of the V3 loop of the virus gp120 protein. Here we present a numerical descriptor of the V3 loop encoding its physicochemical and structural properties. The descriptor allows for structure-based prediction of HIV tropism and identification of properties of the V3 loop that are crucial for coreceptor usage. Use of the proposed descriptor for prediction results in a statistically significant improvement over the prediction based solely on V3 sequence with 3 percentage points improvement in AUC and 7 percentage points in sensitivity at the specificity of the 11/25 rule (95%). We additionally assessed the predictive power of the new method on clinically derived ‘bulk’ sequence data and obtained a statistically significant improvement in AUC of 3 percentage points over sequence-based prediction. Furthermore, we demonstrated the capacity of our method to predict therapy outcome by applying it to 53 samples from patients undergoing Maraviroc therapy. The analysis of structural features of the loop informative of tropism indicates the importance of two loop regions and their physicochemical properties. The regions are located on opposite strands of the loop stem and the respective features are predominantly charge-, hydrophobicity- and structure-related. These regions are in close proximity in the bound conformation of the loop potentially forming a site determinant for the coreceptor binding. The method is available via server under http://structure.bioinf.mpi-inf.mpg.de/. Human Immunodeficiency Virus (HIV) requires one of the chemokine coreceptors CCR5 or CXCR4 for entry into the host cell. 
The capacity of the virus to use one or both of these coreceptors is termed tropism. Monitoring HIV tropism is of high importance due to the relationship of the emergence of CXCR4-tropic virus with the progression of immunodeficiency and for patient treatment with the recently developed CCR5 antagonists. Computational methods for predicting HIV tropism are based on sequence and on structure of the third variable region (V3 loop) of the viral gp120 protein — the major determinant of the HIV tropism. Limitations of the existing methods include the limited insights they provide into the biochemical determinants of coreceptor usage, high computational load of the structure-based methods and low prediction accuracy on clinically derived patient samples. Here we propose a numerical descriptor of the V3 loop encoding the physicochemical and structural properties of the loop. The new descriptor allows for server-based prediction of viral tropism with accuracy comparable to that of established sequence-based methods both on clonal and clinically derived patient data as well as for the interpretation of the properties of the loop relevant for tropism. The server is available under http://structure.bioinf.mpi-inf.mpg.de/.
Affiliation(s)
- Katarzyna Bozek
- Max Planck Institute for Computer Science, Saarbrucken, Germany.
|
31
|
Wang Y, Rawi R, Wilms C, Heider D, Yang R, Hoffmann D. A small set of succinct signature patterns distinguishes Chinese and non-Chinese HIV-1 genomes. PLoS One 2013; 8:e58804. [PMID: 23527028 PMCID: PMC3602349 DOI: 10.1371/journal.pone.0058804] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/13/2012] [Accepted: 02/06/2013] [Indexed: 11/18/2022]
Abstract
The epidemiology of HIV-1 in China has unique features that may have led to unique viral strains. We therefore tested the hypothesis that it is possible to find distinctive patterns in HIV-1 genomes sampled in China. Using a rule inference algorithm we could indeed extract from sequences of the third variable loop (V3) of HIV-1 gp120 a set of 14 signature patterns that with 89% accuracy distinguished Chinese from non-Chinese sequences. These patterns were found to be specific to HIV-1 subtype, i.e. sequences complying with pattern 1 were of subtype B, pattern 2 almost exclusively covered sequences of subtype 01_AE, etc. We then analyzed the first of these signature patterns in depth, namely that L and W at two V3 positions are specifically occurring in Chinese sequences of subtype B/B' (3% false positives). This pattern was found to be in agreement with the phylogeny of HIV-1 of subtype B inside and outside of China. We could neither reject nor convincingly confirm that the pattern is stabilized by immune escape. For further interpretation of the signature pattern we used the recently developed measure of Direct Information, and in this way discovered evidence for physical interactions between V2 and V3. We conclude by a discussion of limitations of signature patterns, and the applicability of the approach to other genomic regions and other countries.
Affiliation(s)
- Yan Wang
- Research Group Bioinformatics, Center for Medical Biology, University of Duisburg-Essen, Essen, Germany
- AIDS and HIV Research Group, State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, P. R. China
- Reda Rawi
- Research Group Bioinformatics, Center for Medical Biology, University of Duisburg-Essen, Essen, Germany
- Christoph Wilms
- Research Group Bioinformatics, Center for Medical Biology, University of Duisburg-Essen, Essen, Germany
- Dominik Heider
- Research Group Bioinformatics, Center for Medical Biology, University of Duisburg-Essen, Essen, Germany
- Rongge Yang
- AIDS and HIV Research Group, State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, P. R. China
- * E-mail: (RY); (DH)
- Daniel Hoffmann
- Research Group Bioinformatics, Center for Medical Biology, University of Duisburg-Essen, Essen, Germany
- * E-mail: (RY); (DH)
|
32
|
Chandramouli B, Chillemi G, Giombini E, Capobianchi MR, Rozera G, Desideri A. Structural dynamics of V3 loop with different electrostatics: implications on co-receptor recognition: a molecular dynamics study of HIV gp120. J Biomol Struct Dyn 2012; 31:403-13. [PMID: 22876913 DOI: 10.1080/07391102.2012.703068] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Abstract
The HIV's envelope glycoprotein gp120 plays a major role in the entry of the virus into the host cell, through its successive interactions with the cell surface CD4 receptor and a co-receptor (CCR5 or CXCR4). The choice of a specific co-receptor by gp120 has an important consequence on HIV infection and pathogenesis. The third variable region within gp120, the V3 loop, is the principal determinant of the co-receptor usage by gp120. Here, we report the long time molecular dynamics simulations of four gp120 structures, having a V3 loop charge of +3 and +5, from both R5 and X4 specific strains of HIV. The results of the study highlight the properties of the V3 loop that can be critical for dictating the co-receptor recognition and selection in structural context. In detail, we observe that the structural orientation of the V3 loop in the 3D space is modulated by its net charge, whilst its co-receptor choice is likely dictated by a combined effect of both the electrostatics of the loop and its conformational variability at the level of its central crown region.
|
33
|
Bozek K, Eckhardt M, Sierra S, Anders M, Kaiser R, Kräusslich HG, Müller B, Lengauer T. An expanded model of HIV cell entry phenotype based on multi-parameter single-cell data. Retrovirology 2012; 9:60. [PMID: 22830600 PMCID: PMC3464718 DOI: 10.1186/1742-4690-9-60] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 01/09/2012] [Accepted: 06/07/2012] [Indexed: 11/29/2022]
Abstract
Background Entry of human immunodeficiency virus type 1 (HIV-1) into the host cell involves interactions between the viral envelope glycoproteins (Env) and the cellular receptor CD4 as well as a coreceptor molecule (most importantly CCR5 or CXCR4). Viral preference for a specific coreceptor (tropism) is in particular determined by the third variable loop (V3) of the Env glycoprotein gp120. The approval and use of a coreceptor antagonist for antiretroviral therapy make detailed understanding of tropism and its accurate prediction from patient derived virus isolates essential. The aim of the present study is the development of an extended description of the HIV entry phenotype reflecting its co-dependence on several key determinants as the basis for a more accurate prediction of HIV-1 entry phenotype from genotypic data. Results Here, we established a new protocol of quantitation and computational analysis of the dependence of HIV entry efficiency on receptor and coreceptor cell surface levels as well as viral V3 loop sequence and the presence of two prototypic coreceptor antagonists in varying concentrations. Based on data collected at the single-cell level, we constructed regression models of the HIV-1 entry phenotype integrating the measured determinants. We developed a multivariate phenotype descriptor, termed phenotype vector, which facilitates a more detailed characterization of HIV entry phenotypes than currently used binary tropism classifications. For some of the tested virus variants, the multivariant phenotype vector revealed substantial divergences from existing tropism predictions. We also developed methods for computational prediction of the entry phenotypes based on the V3 sequence and performed an extrapolating calculation of the effectiveness of this computational procedure. 
Conclusions Our study of the HIV cell entry phenotype and the novel multivariate representation developed here contributes to a more detailed understanding of this phenotype and offers potential for future application in the effective administration of entry inhibitors in antiretroviral therapies.
Affiliation(s)
- Katarzyna Bozek
- Department of Computational Biology and Applied Algorithmics, Max Planck for Computer Sciences, Campus E1 4 66123, Saarbrücken, Germany
|
34
|
Touw WG, Bayjanov JR, Overmars L, Backus L, Boekhorst J, Wels M, van Hijum SAFT. Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Brief Bioinform 2012; 14:315-26. [PMID: 22786785 PMCID: PMC3659301 DOI: 10.1093/bib/bbs034] [Citation(s) in RCA: 213] [Impact Index Per Article: 17.8] [Indexed: 12/13/2022]
Abstract
In the Life Sciences, 'omics' data are increasingly generated by different high-throughput technologies. Often only the integration of these data allows uncovering biological insights that can be experimentally validated or mechanistically modelled, i.e. sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow training a model based on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited for the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have a high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important for a subset of samples of the same class. For example, within a class of cancer patients, certain SNP combinations may be important for a subset of patients that have a specific subtype of cancer, but not important for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF, as they are implicitly taken into account by the algorithm during the creation of the classification model. This review details some RF properties that, to the best of our knowledge, are rarely or never used and that allow maximizing the biological insights that can be extracted from complex omics data sets using RF.
|
35
|
Balasubramanian C, Chillemi G, Abbate I, Capobianchi MR, Rozera G, Desideri A. Importance of V3 Loop Flexibility and Net Charge in the Context of Co-Receptor Recognition. A Molecular Dynamics Study on HIV gp120. J Biomol Struct Dyn 2012; 29:879-91. [DOI: 10.1080/07391102.2012.10507416] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 10/28/2022]
|
36
|
López de Victoria A, Kieslich CA, Rizos AK, Krambovitis E, Morikis D. Clustering of HIV-1 Subtypes Based on gp120 V3 Loop electrostatic properties. BMC Biophysics 2012; 5:3. [PMID: 22313935 PMCID: PMC3295656 DOI: 10.1186/2046-1682-5-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 07/13/2011] [Accepted: 02/07/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND The V3 loop of the glycoprotein gp120 of HIV-1 plays an important role in viral entry into cells by utilizing as coreceptor CCR5 or CXCR4, and is implicated in the phenotypic tropisms of HIV viruses. It has been hypothesized that the interaction between the V3 loop and CCR5 or CXCR4 is mediated by electrostatics. We have performed hierarchical clustering analysis of the spatial distributions of electrostatic potentials and charges of V3 loop structures containing consensus sequences of HIV-1 subtypes. RESULTS Although the majority of consensus sequences have a net charge of +3, the spatial distribution of their electrostatic potentials and charges may be a discriminating factor for binding and infectivity. This is demonstrated by the formation of several small subclusters, within major clusters, which indicates common origin but distinct spatial details of electrostatic properties. Some of this information may be present, in a coarse manner, in clustering of sequences, but the spatial details are largely lost. We show the effect of ionic strength on clustering of electrostatic potentials, information that is not present in clustering of charges or sequences. We also make correlations between clustering of electrostatic potentials and net charge, coreceptor selectivity, global prevalence, and geographic distribution. Finally, we interpret coreceptor selectivity based on the N6X7T8|S8X9 sequence glycosylation motif, the specific positive charge location according to the 11/24/25 rule, and the overall charge and electrostatic potential distribution. CONCLUSIONS We propose that in addition to the sequence and the net charge of the V3 loop of each subtype, the spatial distributions of electrostatic potentials and charges may also be important factors for receptor recognition and binding and subsequent viral entry into cells. 
This implies that the overall electrostatic potential is responsible for long-range recognition of the V3 loop with coreceptors CCR5/CXCR4, whereas the charge distribution contributes to the specific short-range interactions responsible for the formation of the bound complex. We also propose a scheme for coreceptor selectivity based on the sequence glycosylation motif, the 11/24/25 rule, and net charge.
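The coarsest electrostatic quantity discussed above, the net charge of the V3 loop, can be estimated from sequence alone. The Python sketch below uses a simple counting convention that is an assumption here, not the paper's protocol: lysine and arginine count as +1, aspartate and glutamate as -1, and histidine and the termini are ignored. The example sequence is illustrative. The spatial distributions of potential and charge that the study clusters require 3D structures and an electrostatics calculation, which this sketch does not attempt.

```python
POSITIVE = {"K", "R"}  # counted as +1 each (assumption: histidine treated as neutral)
NEGATIVE = {"D", "E"}  # counted as -1 each


def net_charge(sequence):
    """Estimate the net charge of a peptide by counting charged side chains."""
    positives = sum(1 for residue in sequence if residue in POSITIVE)
    negatives = sum(1 for residue in sequence if residue in NEGATIVE)
    return positives - negatives


# Illustrative 35-residue V3-like sequence (assumption, not data from the paper).
v3_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
print(net_charge(v3_like))
```

Two sequences with the same count can still place their charges at different positions, which is the information that is lost when only the net charge is compared.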
Affiliation(s)
- Chris A Kieslich
- Department of Bioengineering, University of California, Riverside 92521, USA
- Apostolos K Rizos
- Department of Chemistry, University of Crete and Foundation for Research and Technology-Hellas, FORTH-IESL, GR-71003, Heraklion, Crete, Greece
- Elias Krambovitis
- Department of Veterinary Medicine, University of Thessaly, Karditsa, Greece
- Dimitrios Morikis
- Department of Bioengineering, University of California, Riverside 92521, USA
|
37
|
Dybowski JN, Riemenschneider M, Hauke S, Pyka M, Verheyen J, Hoffmann D, Heider D. Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers. BioData Min 2011; 4:26. [PMID: 22082002 PMCID: PMC3248369 DOI: 10.1186/1756-0381-4-26] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/15/2011] [Accepted: 11/14/2011] [Indexed: 01/07/2023]
Abstract
BACKGROUND Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functional active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, there exist mutations in this region leading to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients, who might benefit from the usage of these new drugs. RESULTS We tested several methods to improve Bevirimat resistance prediction in HIV-1. It turned out that combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. Moreover, we were able to identify the most crucial regions for Bevirimat resistance computationally, which are in line with experimental results from other studies. CONCLUSIONS Our analysis demonstrated the use of machine learning techniques to predict HIV-1 resistance against maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might enlarge the arsenal of antiretroviral drugs in the future. Thus, accurate prediction tools are very useful to enable a personalized therapy.
Affiliation(s)
- J Nikolaj Dybowski
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr, 2, 45117 Essen, Germany.
|
38
|
Computational Design of a DNA- and Fc-Binding Fusion Protein. Adv Bioinformatics 2011; 2011:457578. [PMID: 21941539 PMCID: PMC3173724 DOI: 10.1155/2011/457578] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/29/2011] [Revised: 06/16/2011] [Accepted: 06/22/2011] [Indexed: 12/23/2022]
Abstract
Computational design of novel proteins with well-defined functions is an ongoing topic in computational biology. In this work, we generated and optimized a new synthetic fusion protein using an evolutionary approach. The optimization was guided by directed evolution based on hydrophobicity scores, molecular weight, and secondary structure predictions. Several methods were used to refine the models built from the resulting sequences. We have successfully combined two unrelated naturally occurring binding sites, the immunoglobin Fc-binding site of the Z domain and the DNA-binding motif of MyoD bHLH, into a novel stable protein.
|
39
|
Interpol: An R package for preprocessing of protein sequences. BioData Min 2011; 4:16. [PMID: 21682849 PMCID: PMC3138420 DOI: 10.1186/1756-0381-4-16] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 03/27/2011] [Accepted: 06/17/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Most machine learning techniques currently applied in the literature need a fixed dimensionality of input data. However, this requirement is frequently violated by real input data, such as DNA and protein sequences, that often differ in length due to insertions and deletions. It is also notable that performance in classification and regression is often improved by numerical encoding of amino acids, compared to the commonly used sparse encoding. RESULTS The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex), and normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed with open source as platform independent R-package. It is typically used for preprocessing of amino acid sequences for classification or regression. CONCLUSIONS The functionality of Interpol widens the spectrum of machine learning methods that can be applied to biological sequences, and it will in many cases improve their performance in classification and regression.
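The two preprocessing steps described above, numerical encoding followed by length normalization, can be sketched in a few lines of Python. This is not the Interpol API (Interpol is an R package): the function names are hypothetical, the Kyte-Doolittle hydropathy scale stands in for whichever AAindex descriptor one would actually select, and only the linear interpolation variant is shown.

```python
# Kyte-Doolittle hydropathy values, used here as a stand-in descriptor (assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=KYTE_DOOLITTLE):
    """Map each amino acid to its numerical descriptor value."""
    return [scale[residue] for residue in sequence]


def linear_resample(values, target_len):
    """Stretch or squeeze a numeric vector to target_len by linear interpolation."""
    if not values or target_len < 1:
        raise ValueError("need a non-empty vector and a positive target length")
    source_len = len(values)
    if source_len == 1 or target_len == 1:
        # Degenerate cases: nothing to interpolate between, repeat the first value.
        return [values[0]] * target_len
    resampled = []
    for i in range(target_len):
        x = i * (source_len - 1) / (target_len - 1)  # fractional index into the source
        lo = int(x)
        hi = min(lo + 1, source_len - 1)
        frac = x - lo
        resampled.append(values[lo] * (1 - frac) + values[hi] * frac)
    return resampled


# Two illustrative peptides of different length end up as vectors of equal length.
short_vec = linear_resample(encode("ACDEFGHIK"), 12)
long_vec = linear_resample(encode("ACDEFGHIKLMNPQRSTVWY"), 12)
print(len(short_vec), len(long_vec))
```

After this step every sequence is a fixed-length numeric vector, which is the input shape that classifiers requiring fixed dimensionality expect.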
|
40
|
Heider D, Verheyen J, Hoffmann D. Machine learning on normalized protein sequences. BMC Res Notes 2011; 4:94. [PMID: 21453485 PMCID: PMC3079662 DOI: 10.1186/1756-0500-4-94] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 12/07/2010] [Accepted: 03/31/2011] [Indexed: 12/23/2022]
Abstract
BACKGROUND Machine learning techniques have been widely applied to biological sequences, e.g. to predict drug resistance in HIV-1 from sequences of drug target proteins and protein functional classes. As deletions and insertions are frequent in biological sequences, a major limitation of current methods is the inability to handle varying sequence lengths. FINDINGS We propose to normalize sequences to uniform length. To this end, we tested one linear and four different non-linear interpolation methods for the normalization of sequence lengths of 19 classification datasets. Classification tasks included prediction of HIV-1 drug resistance from drug target sequences and sequence-based prediction of protein function. We applied random forests to the classification of sequences into "positive" and "negative" samples. Statistical tests showed that the linear interpolation outperforms the non-linear interpolation methods in most of the analyzed datasets, while in a few cases non-linear methods had a small but significant advantage. Compared to other published methods, our prediction scheme leads to an improvement in prediction accuracy by up to 14%. CONCLUSIONS We found that machine learning on sequences normalized by simple linear interpolation gave better or at least competitive results compared to state-of-the-art procedures, and thus, is a promising alternative to existing methods, especially for protein sequences of variable length.
Affiliation(s)
- Dominik Heider
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Jens Verheyen
- Institute of Virology, University of Cologne, Fuerst-Pueckler-Str. 56, 50935 Cologne, Germany
- Daniel Hoffmann
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
|
41
|
Dybowski N, Heider D, Hoffmann D. Structure of HIV-1 quasi-species as early indicator for switches of co-receptor tropism. AIDS Res Ther 2010; 7:41. [PMID: 21118549 PMCID: PMC3009693 DOI: 10.1186/1742-6405-7-41] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 09/22/2010] [Accepted: 11/30/2010] [Indexed: 01/12/2023]
Abstract
Deep sequencing is able to generate a complete picture of the retroviral quasi-species in a patient. We demonstrate that the unprecedented power of deep sequencing in conjunction with computational data analysis has great potential for clinical diagnostics and basic research. Specifically, we analyzed longitudinal deep sequencing data from patients in a study with Vicriviroc, a drug that blocks the HIV-1 co-receptor CCR5. Sequences covered the V3-loop of gp120, known to be the main determinant of co-receptor tropism. First, we evaluated this data with a computational model for the interpretation of V3-sequences with respect to tropism, and we found complete agreement with results from phenotypic assays. Thus, the method could be applied in cases where phenotypic assays fail. Second, computational analysis led to the discovery of a characteristic pattern in the quasi-species that foreshadows switches of co-receptor tropism. This analysis could help to unravel the mechanism of tropism switches, and to predict these switches weeks to months before they can be detected by a phenotypic assay.
|