1
Kayondo HW, Ssekagiri A, Nabakooza G, Bbosa N, Ssemwanga D, Kaleebu P, Mwalili S, Mango JM, Leigh Brown AJ, Saenz RA, Galiwango R, Kitayimbwa JM. Employing phylogenetic tree shape statistics to resolve the underlying host population structure. BMC Bioinformatics 2021; 22:546. [PMID: 34758743 PMCID: PMC8579572 DOI: 10.1186/s12859-021-04465-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2020] [Accepted: 10/29/2021] [Indexed: 12/24/2022] [Open access]
Abstract
Background Host population structure is a key determinant of pathogen and infectious disease transmission patterns. Pathogen phylogenetic trees are useful tools to reveal the population structure underlying an epidemic. Determining whether a population is structured or not is useful in informing the type of phylogenetic methods to be used in a given study. We employ tree statistics derived from phylogenetic trees and machine learning classification techniques to reveal an underlying population structure. Results In this paper, we simulate phylogenetic trees from both structured and non-structured host populations. We compute eight statistics for the simulated trees, which are: the number of cherries; Sackin, Colless and total cophenetic indices; ladder length; maximum depth; maximum width, and width-to-depth ratio. Based on the estimated tree statistics, we classify the simulated trees as from either a non-structured or a structured population using the decision tree (DT), K-nearest neighbor (KNN) and support vector machine (SVM). We incorporate the basic reproductive number ($R_0$) in our tree simulation procedure. Sensitivity analysis is done to investigate whether the classifiers are robust to different choice of model parameters and to size of trees. Cross-validated results for area under the curve (AUC) for receiver operating characteristic (ROC) curves yield mean values of over 0.9 for most of the classification models. Conclusions Our classification procedure distinguishes well between trees from structured and non-structured populations using the classifiers, the two-sample Kolmogorov-Smirnov, Cucconi and Podgor-Gastwirth tests and the box plots. SVM models were more robust to changes in model parameters and tree size compared to KNN and DT classifiers. Our classification procedure was applied to real-world data and the structured population was revealed with high accuracy of 92.3% using SVM-polynomial classifier.
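As a rough illustration of three of the eight statistics named in this abstract (number of cherries, Sackin index, Colless index), the sketch below computes them for a rooted binary tree. The nested-tuple tree representation, the function name `tree_stats`, and the two toy trees are assumptions made for this example; they are not taken from the paper, which does not describe its implementation here.

```python
# Assumed representation (not from the paper): a leaf is a string,
# an internal node is a (left, right) tuple.

def tree_stats(tree):
    """Return (n_leaves, cherries, sackin, colless) for a rooted binary tree."""
    if isinstance(tree, str):
        return 1, 0, 0, 0
    left, right = tree
    n_left, ch_left, sa_left, co_left = tree_stats(left)
    n_right, ch_right, sa_right, co_right = tree_stats(right)
    n = n_left + n_right
    # A cherry is an internal node whose two children are both leaves.
    cherries = ch_left + ch_right + (1 if n_left == 1 and n_right == 1 else 0)
    # Sackin index: sum of leaf depths. Every leaf below this node sits
    # one edge deeper than it did inside its own subtree.
    sackin = sa_left + sa_right + n
    # Colless index: sum over internal nodes of |leaves left - leaves right|.
    colless = co_left + co_right + abs(n_left - n_right)
    return n, cherries, sackin, colless

balanced = (("a", "b"), ("c", "d"))   # hypothetical symmetric tree
ladder = ((("a", "b"), "c"), "d")     # hypothetical caterpillar tree

print(tree_stats(balanced))  # (4, 2, 8, 0)
print(tree_stats(ladder))    # (4, 1, 9, 3)
```

The ladder-like tree has fewer cherries and larger Sackin and Colless values than the balanced tree with the same number of tips, which is the kind of contrast a classifier trained on such features could pick up.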
Affiliation(s)
- Hassan W Kayondo
  - Institute of Basic Sciences, Technology and Innovation (PAUSTI), Pan African University, Nairobi, Kenya; Department of Mathematics, Makerere University, Kampala, Uganda
- Alfred Ssekagiri
  - Uganda Virus Research Institute (UVRI), Entebbe, Uganda; Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda
- Grace Nabakooza
  - Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda; UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Entebbe, Uganda; Centre for Computational Biology, Uganda Christian University, Mukono, Uganda
- Nicholas Bbosa
  - Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
- Deogratius Ssemwanga
  - Uganda Virus Research Institute (UVRI), Entebbe, Uganda; Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
- Pontiano Kaleebu
  - Uganda Virus Research Institute (UVRI), Entebbe, Uganda; Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
- Samuel Mwalili
  - Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- John M Mango
  - Department of Mathematics, Makerere University, Kampala, Uganda
- Ronald Galiwango
  - Centre for Computational Biology, Uganda Christian University, Mukono, Uganda
- John M Kitayimbwa
  - Centre for Computational Biology, Uganda Christian University, Mukono, Uganda
2
Tarasova O, Biziukova N, Filimonov D, Poroikov V. A Computational Approach for the Prediction of HIV Resistance Based on Amino Acid and Nucleotide Descriptors. Molecules 2018; 23:E2751. [PMID: 30355996 PMCID: PMC6278491 DOI: 10.3390/molecules23112751] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/12/2018] [Revised: 10/07/2018] [Accepted: 10/16/2018] [Indexed: 12/25/2022] [Open access]
Abstract
The high variability of the human immunodeficiency virus (HIV) is an important cause of HIV resistance to reverse transcriptase and protease inhibitors. There are many variants of HIV type 1 (HIV-1) that can be used to model sequence-resistance relationships. Machine learning methods are widely and successfully used in new drug discovery. An emerging body of data regarding the interactions of small drug-like molecules with their protein targets provides the possibility of building models on "structure-property" relationships and analyzing the performance of various machine-learning techniques. In our research, we analyze several different types of descriptors in order to predict the resistance of HIV reverse transcriptase and protease to the marketed antiretroviral drugs using the Random Forest approach. First, we represented amino acid sequences as a set of short peptide fragments, which included several amino acid residues. Second, we represented nucleotide sequences as a set of fragments, which included several nucleotides. We compared these two approaches using open data from the Stanford HIV Drug Resistance Database. We have determined the factors that modulate the performance of prediction: in particular, we observed that the prediction performance was more sensitive to certain drugs than a type of the descriptor used.
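The fragment-based descriptors described in this abstract can be pictured with a short sketch that counts overlapping fixed-length fragments of a sequence. The fragment length `k`, the function name `fragment_counts`, and the toy sequences are assumptions chosen for illustration; the abstract only says that fragments contain "several" residues or nucleotides, and the authors' actual descriptor scheme may differ.

```python
from collections import Counter

def fragment_counts(sequence, k=3):
    """Count overlapping length-k fragments of a sequence.

    k is an assumed fragment length, not a value given in the abstract.
    """
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# Hypothetical toy inputs, used for amino acids and nucleotides alike.
print(fragment_counts("PQITLW", k=3))
print(fragment_counts("ATATA", k=2))
```

The resulting counts form a sparse feature vector per sequence; vectors of this kind could then be passed to a classifier such as a Random Forest, which is the approach the abstract names.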
Affiliation(s)
- Olga Tarasova
- Institute of Biomedical Chemistry, Moscow 119121, Russia.
3
The emergence of drug resistant HIV variants at virological failure of HAART combinations containing efavirenz, tenofovir and lamivudine or emtricitabine within the UK Collaborative HIV Cohort. J Infect 2013; 68:77-84. [PMID: 24055802 DOI: 10.1016/j.jinf.2013.09.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 06/17/2013] [Revised: 08/23/2013] [Accepted: 09/07/2013] [Indexed: 11/23/2022]
Abstract
BACKGROUND Lamivudine (3TC) and emtricitabine (FTC) are guideline choices for combination highly active antiretroviral therapy (HAART). 3TC has a shorter intracellular half life than FTC and may be more likely to lead to the development of drug resistant HIV variants. METHODS In this study we analysed linked data from the observational UK Collaborative HIV Cohort (CHIC) Study and UK HIV Drug Resistance Database (HDRD) to investigate the rate of development of K65R or M184V resistance mutations in patients failing on combinations containing tenofovir (TDF) and efavirenz (EFV) with either 3TC or FTC. Virological failure was defined as 1 viral load >400 copies/ml. Rates were stratified by demographic variables, baseline viral load, current CD4 count, current viral load and year of starting regimen. Significant associations were identified using Poisson regression models and multivariable analyses were performed adjusting for the variables above. Logistic regression was used to determine whether there were any significant associations between type of regimen and detection of resistance mutation. RESULTS 5455 patients received either (or both) 3TC, TDF and EFV or FTC, TDF and EFV contributing 6465 treatment episodes over 9962 person-years follow up. 47 of these episodes were preceded by resistance tests showing development of K65R or M184V mutation and were hence excluded. The majority of treatment episodes consisted of FTC- (n = 5190) rather than 3TC- (n = 1228) based regimens. 21 cases of K65R were detected over the course of follow up, giving an overall event rate of 0.21 (95% CI: 0.12-0.31)/100 person years follow up (PYFU). The overall event rate for detection of M184V was 0.38 (95% CI: 0.26-0.5)/100 PYFU. 201 patients receiving either regimen for the first time experienced virological failure. Of those receiving 3TC (n = 53), 7 (13.2%), 12 (22.6%) and 15 (28.3%) developed K65R, M184V and either K65R or M184V respectively. Of those receiving FTC (n = 148), 13 (8.8%), 20 (13.5%) and 26 (17.6%) developed K65R, M184V and either K65R or M184V respectively. Although patients on 3TC were more likely to develop resistance, this was not statistically significant in univariable (OR 1.85 (95% CI: 0.89-3.85, p = 0.09)) or multivariable analyses (OR 1.89 (95% CI: 0.89-4.01, p = 0.1)). CONCLUSIONS We have not found evidence of an increased risk of development of M184V and K65R in patients exposed to 3TC.
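Two of the figures reported in this abstract can be reproduced with simple arithmetic. The sketch below recomputes the K65R event rate per 100 person-years and a crude (unadjusted) odds ratio from the counts quoted above. The function names are made up for this example, and pairing the "either K65R or M184V" counts with the reported univariable OR of 1.85 is an assumption that happens to match; the paper's Poisson and logistic regression models are not reproduced here.

```python
def rate_per_100_person_years(events, person_years):
    """Crude event rate per 100 person-years of follow-up."""
    return 100.0 * events / person_years

def crude_odds_ratio(events_a, total_a, events_b, total_b):
    """Unadjusted odds ratio of group A versus group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b

# Counts taken from the abstract: 21 K65R cases over 9962 person-years.
k65r_rate = rate_per_100_person_years(21, 9962)
print(round(k65r_rate, 2))  # 0.21

# 3TC: 15 of 53 with either mutation; FTC: 26 of 148 with either mutation.
either_or = crude_odds_ratio(15, 53, 26, 148)
print(round(either_or, 2))  # 1.85
```

The confidence intervals quoted in the abstract depend on the fitted models and are not derived in this sketch.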
4
Betancor G, Garriga C, Puertas MC, Nevot M, Anta L, Blanco JL, Pérez-Elías MJ, de Mendoza C, Martínez MA, Martinez-Picado J, Menéndez-Arias L, Iribarren JA, Caballero E, Ribera E, Llibre JM, Clotet B, Jaén A, Dalmau D, Gatel JM, Peraire J, Vidal F, Vidal C, Riera M, Córdoba J, López Aldeguer J, Galindo MJ, Gutiérrez F, Álvarez M, García F, Pérez-Romero P, Viciana P, Leal M, Palomares JC, Pineda JA, Viciana I, Santos J, Rodríguez P, Gómez Sirvent JL, Gutiérrez C, Moreno S, Pérez-Olmeda M, Alcamí J, Rodríguez C, del Romero J, Cañizares A, Pedreira J, Miralles C, Ocampo A, Morano L, Aguilera A, Garrido C, Manuzza G, Poveda E, Soriano V. Clinical, virological and biochemical evidence supporting the association of HIV-1 reverse transcriptase polymorphism R284K and thymidine analogue resistance mutations M41L, L210W and T215Y in patients failing tenofovir/emtricitabine therapy. Retrovirology 2012; 9:68. [PMID: 22889300 PMCID: PMC3468358 DOI: 10.1186/1742-4690-9-68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/23/2012] [Accepted: 07/26/2012] [Indexed: 11/10/2022] [Open access]
Abstract
Background Thymidine analogue resistance mutations (TAMs) selected under treatment with nucleoside analogues generate two distinct genotypic profiles in the HIV-1 reverse transcriptase (RT): (i) TAM1: M41L, L210W and T215Y, and (ii) TAM2: D67N, K70R and K219E/Q, and sometimes T215F. Secondary mutations, including thumb subdomain polymorphisms (e.g. R284K) have been identified in association with TAMs. We have identified mutational clusters associated with virological failure during salvage therapy with tenofovir/emtricitabine-based regimens. In this context, we have studied the role of R284K as a secondary mutation associated with mutations of the TAM1 complex. Results The cross-sectional study carried out with >200 HIV-1 genotypes showed that virological failure to tenofovir/emtricitabine was strongly associated with the presence of M184V ($P < 10^{-10}$) and TAMs ($P < 10^{-3}$), while K65R was relatively uncommon in previously-treated patients failing antiretroviral therapy. Clusters of mutations were identified, and among them, the TAM1 complex showed the highest correlation coefficients. Covariation of TAM1 mutations and V118I, V179I, M184V and R284K was observed. Virological studies showed that the combination of R284K with TAM1 mutations confers a fitness advantage in the presence of zidovudine or tenofovir. Studies with recombinant HIV-1 RTs showed that when associated with TAM1 mutations, R284K had a minimal impact on zidovudine or tenofovir inhibition, and in their ability to excise the inhibitors from blocked DNA primers. However, the mutant RT M41L/L210W/T215Y/R284K showed an increased catalytic rate for nucleotide incorporation and a higher RNase H activity in comparison with WT and mutant M41L/L210W/T215Y RTs. These effects were consistent with its enhanced chain-terminated primer rescue on DNA/DNA template-primers, but not on RNA/DNA complexes, and can explain the higher fitness of HIV-1 having TAM1/R284K mutations. Conclusions Our study shows the association of R284K and TAM1 mutations in individuals failing therapy with tenofovir/emtricitabine, and unveils a novel mechanism by which secondary mutations are selected in the context of drug-resistance mutations.
Affiliation(s)
- Gilberto Betancor
- Centro de Biología Molecular "Severo Ochoa", Consejo Superior de Investigaciones Científicas & Universidad Autónoma de Madrid, Madrid, Spain
5
Dybowski JN, Riemenschneider M, Hauke S, Pyka M, Verheyen J, Hoffmann D, Heider D. Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers. BioData Min 2011; 4:26. [PMID: 22082002 PMCID: PMC3248369 DOI: 10.1186/1756-0381-4-26] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/15/2011] [Accepted: 11/14/2011] [Indexed: 01/07/2023] [Open access]
Abstract
BACKGROUND Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functional active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, there exist mutations in this region leading to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients, who might benefit from the usage of these new drugs. RESULTS We tested several methods to improve Bevirimat resistance prediction in HIV-1. It turned out that combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. Moreover, we were able to identify the most crucial regions for Bevirimat resistance computationally, which are in line with experimental results from other studies. CONCLUSIONS Our analysis demonstrated the use of machine learning techniques to predict HIV-1 resistance against maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might enlarge the arsenal of antiretroviral drugs in the future. Thus, accurate prediction tools are very useful to enable a personalized therapy.
Affiliation(s)
- J Nikolaj Dybowski
- Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany.
6
Emerging mutations at virological failure of HAART combinations containing tenofovir and lamivudine or emtricitabine. AIDS 2010; 24:1013-8. [PMID: 20124969 DOI: 10.1097/qad.0b013e328336e962] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/27/2022]
Abstract
OBJECTIVE To compare the emergence of drug-resistant HIV variants at failure of lamivudine (3TC)/tenofovir (TDF)-containing or emtricitabine (FTC)/TDF-containing HAART as a consequence of the different 3TC and FTC intracellular half-lives. DESIGN Retrospective evaluation of 859 patients selected from an Italian HIV resistance database (Antiretroviral Resistance Cohort Analysis). METHODS Patients were selected for analysis if treated with a HAART whose nucleoside/nucleotide reverse transcriptase inhibitor backbone was either 3TC/TDF or FTC/TDF; if they experienced a virological failure after at least 6 months of plasma HIV-RNA undetectability; and if HIV genotypes before treatment and at failure were available. Univariate and multivariate logistic regression analyses were done to detect predictors of resistance mutations emerging at failure. RESULTS Of 714 patients failing with 3TC/TDF and 145 with FTC/TDF, 35.8 and 21.1% were in Centers for Disease Control and Prevention stage C, and 8.8 and 15.2% were on first-line HAART, respectively. At multivariate analysis, the emergence of K70R (P = 0.002), M184V (P = 0.031), T215F (P = 0.020) and Y181C (P = 0.005) was significantly more common in 3TC-treated than in FTC-treated patients, with an odds ratio of 4, 1.56, 1.89 and 3.84, respectively. CONCLUSION Despite their close structural similarity, 3TC and FTC are associated with a significantly different rate of drug resistance at treatment failure when combined with TDF in HAART regimens independently of the third drug used.
7
Heider D, Verheyen J, Hoffmann D. Predicting Bevirimat resistance of HIV-1 from genotype. BMC Bioinformatics 2010; 11:37. [PMID: 20089140 PMCID: PMC3224585 DOI: 10.1186/1471-2105-11-37] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 08/24/2009] [Accepted: 01/20/2010] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Maturation inhibitors are a new class of antiretroviral drugs. Bevirimat (BVM) was the first substance in this class of inhibitors entering clinical trials. While the inhibitory function of BVM is well established, the molecular mechanisms of action and resistance are not well understood. It is known that mutations in the regions CS p24/p2 and p2 can cause phenotypic resistance to BVM. We have investigated a set of p24/p2 sequences of HIV-1 of known phenotypic resistance to BVM to test whether BVM resistance can be predicted from sequence, and to identify possible molecular mechanisms of BVM resistance in HIV-1. RESULTS We used artificial neural networks and random forests with different descriptors for the prediction of BVM resistance. Random forests with hydrophobicity as descriptor performed best and classified the sequences with an area under the Receiver Operating Characteristics (ROC) curve of 0.93 ± 0.001. For the collected data we find that p2 sequence positions 369 to 376 have the highest impact on resistance, with positions 370 and 372 being particularly important. These findings are in partial agreement with other recent studies. Apart from the complex machine learning models we derived a number of simple rules that predict BVM resistance from sequence with surprising accuracy. According to computational predictions based on the data set used, cleavage sites are usually not shifted by resistance mutations. However, we found that resistance mutations could shorten and weaken the alpha-helix in p2, which hints at a possible resistance mechanism. CONCLUSIONS We found that BVM resistance of HIV-1 can be predicted well from the sequence of the p2 peptide, which may prove useful for personalized therapy if maturation inhibitors reach clinical practice. Results of secondary structure analysis are compatible with a possible route to BVM resistance in which mutations weaken a six-helix bundle discovered in recent experiments, and thus ease Gag cleavage by the retroviral protease.
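To make the "hydrophobicity as descriptor" idea in this abstract concrete, the sketch below maps each residue of a peptide to a hydropathy value. The Kyte-Doolittle scale is used here as an assumption, since the abstract does not name the scale; the function name and the example peptide are likewise hypothetical and are not real p2 sequence data.

```python
# Kyte-Doolittle hydropathy values. Which scale the study used is not
# stated in the abstract, so this choice is an assumption for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobicity_descriptor(peptide):
    """Map each residue of a peptide to its hydropathy value.

    Raises KeyError for characters outside the 20 standard amino acids.
    """
    return [KYTE_DOOLITTLE[residue] for residue in peptide.upper()]

example = "AVIL"  # hypothetical peptide, not taken from the study
print(hydrophobicity_descriptor(example))  # [1.8, 4.2, 4.5, 3.8]
```

For aligned sequences of equal length, each position becomes one numeric feature, so a per-position vector like this could serve as input to a random forest or neural network.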
Affiliation(s)
- Dominik Heider
  - Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany
- Jens Verheyen
  - Institute of Virology, University of Cologne, Fuerst-Pueckler-Str. 56, 50935 Cologne, Germany
- Daniel Hoffmann
  - Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany