1
Co-evolution of drug resistance and broadened substrate recognition in HIV protease variants isolated from an Escherichia coli genetic selection system. Biochem J 2022; 479:479-501. [PMID: 35089310 DOI: 10.1042/bcj20210767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 01/07/2022] [Accepted: 01/28/2022] [Indexed: 11/17/2022]
Abstract
A genetic selection system for HIV protease activity is described. It is based on a synthetic substrate, constructed as a modified AraC regulatory protein, which, when cleaved, stimulates L-arabinose metabolism in an Escherichia coli araC strain. Growth stimulation on selective plates was shown to depend on active HIV protease and on the scissile bond in the substrate. In addition, cell growth correlated well with the established cleavage efficiency of the sites in the viral polyprotein Gag when these sites were individually introduced into the synthetic substrate of the selection system. Plasmids encoding protease variants selected either for stimulation of cell growth in the presence of saquinavir or for cleavage of a site not cleaved by wild-type protease were indistinguishable with respect to both phenotypes. Moreover, both groups of selected plasmids encoded side-chain substitutions known from clinical isolates, or different side-chain substitutions at identical positions. One highly frequent substitution, E34V, which is not regarded as a major drug-resistance substitution, was found in variants obtained under both selective conditions and is suggested to improve protease processing of the synthetic substrate. This substitution lies outside the substrate-binding cavity and, together with other substitutions in the selected reading frames, supports the previous suggestion that the substrate-binding site extends beyond the active-site binding pocket itself.
2
Koçak Y, Özyer T, Alhajj R. Utilizing maximal frequent itemsets and social network analysis for HIV data analysis. J Cheminform 2016. [PMCID: PMC5395515 DOI: 10.1186/s13321-016-0184-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/13/2023]
Abstract
Acquired immune deficiency syndrome (AIDS) is a deadly disease caused by the human immunodeficiency virus (HIV). The virus attacks the patient's immune system and impairs its ability to fight disease. Developing effective medicines requires understanding the life cycle and replication ability of the virus. The HIV-1 protease enzyme cleaves octamer peptides into shorter peptides, which the virus uses to build its proteins. In this paper, a novel feature extraction method is proposed for identifying important patterns in octamer cleavability. The method is based on data mining techniques that find important relations inside a dataset by comprehensively analyzing the given data. As demonstrated in this paper, using the extracted information in the classification process yields important results that may be taken into consideration when developing new medicines. We used the 746, 1625, Impens and Schilling datasets. In addition, we performed social network analysis as a complementary alternative method.
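The abstract does not say how octamers are turned into transactions or how the itemsets are mined. A minimal sketch, assuming each cleaved octamer becomes a transaction of hypothetical position=residue items (P4 to P4'), mines frequent itemsets with a level-wise Apriori-style search and keeps only the maximal ones (those with no frequent proper superset). The encoding and the function names are illustrative and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical item encoding: one item per (substrate position, residue) pair.
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

def octamer_items(octamer: str) -> frozenset:
    """Encode an 8-residue peptide as a set of 'position=residue' items."""
    if len(octamer) != 8:
        raise ValueError("expected an octamer")
    return frozenset(f"{pos}={aa}" for pos, aa in zip(POSITIONS, octamer))

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        # Join step: unite pairs of frequent (k-1)-itemsets that give exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        previous = set(level)
        level = [c for c in candidates
                 if all(frozenset(s) in previous for s in combinations(c, k - 1))
                 and support(c) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent

def maximal(itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]
```

For example, with the toy octamers "SQNYPIVQ", "ARVLAEAM" and "SQNYPVVQ" and a minimum support of 0.5, the only maximal itemset is the seven items shared by the first and third peptides. The brute-force support count is adequate for small data but scales poorly.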
3
Ain QU, Méndez-Lucio O, Ciriano IC, Malliavin T, van Westen GJP, Bender A. Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features. Integr Biol (Camb) 2015; 6:1023-33. [PMID: 25255469 DOI: 10.1039/c4ib00175c] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Serine proteases, which are implicated in important physiological functions, show high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches for designing pharmacological agents that can discriminate successfully between related binding sites. In this study, we have quantified the relationship between 12,625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20,213 data points) in an integrative manner, using proteochemometric modelling (PCM). Benchmarking of 21 different target descriptors motivated the use of specific binding-pocket amino acid descriptors, which helped identify active-site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models performed better than the alternative approach employed for comparison (QSAR models trained exclusively on compound descriptors using all available data), with R²/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10 (FA10)), in agreement with the literature. For instance, absence of a tertiary sulphonamide was identified as responsible for decreased selective activity (by on average 0.27 ± 0.65 pChEMBL units) on FA10. Among the binding-pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed to be key contributors to selective affinity on these three targets.
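The abstract compares models by R² and RMSE in log units. As a reminder of what these two numbers measure, here is a minimal Python sketch for one set of observed and predicted activities. It assumes R² is the coefficient of determination (1 - SS_res/SS_tot); some studies instead report the squared Pearson correlation, and the abstract does not say which was used. Aggregation over several models (the ± values) is not shown.

```python
import math
from typing import Sequence, Tuple

def r2_and_rmse(observed: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (R^2, RMSE), with R^2 = 1 - SS_res / SS_tot (coefficient of determination)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual sum of squares and total sum of squares around the observed mean.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)
```

For instance, `r2_and_rmse([5.0, 6.0, 7.0, 8.0], [5.5, 6.0, 6.5, 8.0])` gives R² = 0.9 and an RMSE of about 0.354 log units.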
Affiliation(s)
- Qurrat U Ain
- Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW, University of Cambridge, UK.
4
Dimitrov I, Doytchinova I. Peptide Binding Prediction to Five Most Frequent HLA-DQ Proteins - a Proteochemometric Approach. Mol Inform 2015; 34:467-76. [PMID: 27490390 DOI: 10.1002/minf.201400150] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/28/2014] [Accepted: 03/04/2015] [Indexed: 12/24/2022]
Abstract
Major histocompatibility complex (MHC) class II proteins are glycoproteins that bind, within the cell, short peptides of foreign origin, called epitopes, and present them at the cell surface for inspection by T-cells. Apart from presenting foreign antigens, they can also present common self-antigens and trigger autoimmune diseases such as coeliac disease and type 1 diabetes mellitus. MHC proteins are extremely polymorphic, and the polymorphism is located mainly in the peptide-binding site. In the present study, we apply a proteochemometric approach to derive a model for predicting peptide binding to human MHC class II proteins from the HLA-DQ locus. Proteochemometrics was applied to 2624 peptides binding to the five most frequent HLA-DQ proteins. The sequences of peptides and proteins were described by three z-descriptors relating to the hydrophobicity, steric properties and polarity of amino acids. Cross-terms accounting for protein-peptide interactions were also included. The derived model was validated on an external test set of 660 peptides and showed r²pred = 0.808, AUC = 0.965, 92.5 % accuracy at a threshold of pIC50 = 5.3, and an average sensitivity of 83 % among the top 10 % best-predicted nonamers. The model is implemented in the MHC binding prediction server EpiTOP and is freely available at http://www.ddg-pharmfac.net/epitop.
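The model describes each amino acid by three z-descriptors and adds protein-peptide cross-terms. A minimal sketch of assembling one such feature row, assuming the cross-terms are all pairwise products of peptide and protein descriptor values (a common proteochemometric construction; the abstract does not give the exact definition). `pcm_features` is a hypothetical helper, and the numbers in the checks are placeholders rather than published z-scale values.

```python
import numpy as np

def pcm_features(peptide_z, protein_z) -> np.ndarray:
    """Build one PCM row: peptide descriptors, protein descriptors, then cross-terms.

    peptide_z: (n_residues, 3) array-like of z-descriptors for the peptide.
    protein_z: (n_positions, 3) array-like of z-descriptors for protein residues.
    """
    pep = np.asarray(peptide_z, dtype=float).ravel()
    prot = np.asarray(protein_z, dtype=float).ravel()
    # Cross-terms: every peptide value multiplied by every protein value.
    cross = np.outer(pep, prot).ravel()
    return np.concatenate([pep, prot, cross])
```

For a nonamer (9 × 3 = 27 values) and 10 protein positions (30 values), each row holds 27 + 30 + 27 × 30 = 867 features. The cross-terms dominate the count, which is one reason latent-variable regression is usually applied to such matrices.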
Affiliation(s)
- Ivan Dimitrov
- Faculty of Pharmacy, Medical University of Sofia, 2 Dunav str, 1000 Sofia, Bulgaria; tel: +359 2 9236506
- Irini Doytchinova
- Faculty of Pharmacy, Medical University of Sofia, 2 Dunav str, 1000 Sofia, Bulgaria; tel: +359 2 9236506
5
Computational chemogenomics: is it more than inductive transfer? J Comput Aided Mol Des 2014; 28:597-618. [PMID: 24771144 DOI: 10.1007/s10822-014-9743-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/20/2014] [Accepted: 04/11/2014] [Indexed: 10/25/2022]
Abstract
High-throughput assays challenge us to extract knowledge from multi-ligand, multi-target activity data. In QSAR, weights are statically fitted to each ligand descriptor with respect to a single endpoint or target. However, computational chemogenomics (CG) has demonstrated the benefits of learning from entire grids of data at once, rather than building target-specific QSARs. A possible reason for this is the emergence of inductive knowledge transfer (IT) between targets, which provides statistical robustness to the model without any assumption about the structure of the targets. Relevant protein descriptors in CG should allow one to learn how to dynamically adjust ligand attribute weights with respect to protein structure. Hence, models built through explicit learning (EL) by including protein information, while benefitting from IT enhancement, should provide additional predictive capability, notably for protein deorphanization. This interplay between IT and EL in CG modeling has not been sufficiently studied. While IT is likely to occur irrespective of the injected target information, it is not clear whether and when boosting due to EL may occur. EL is only possible if the protein description is appropriate to the target set under investigation. The key issue here is the search for evidence of genuine EL exceeding expectations based on pure IT. We explore the problem in the context of Support Vector Regression, using more than 9,400 pKi values of 31 GPCRs, where compound-protein interactions are represented by the concatenation of vectorial descriptions of compounds and proteins. This provides a unified framework to generate both IT-enhanced and potentially EL-enabled models, where the difference is toggled by the supplied protein information. For EL-enabled models, protein information includes genuine protein descriptors, such as typical sequence-based terms, but also experimentally determined affinity cross-correlation fingerprints.
The latter benchmark the expected behavior of a quasi-ideal descriptor that captures the actual functional protein-protein relatedness, and are therefore thought to be the most likely to enable EL. EL- and IT-based methods were benchmarked alongside classical QSAR with respect to cross-validation and deorphanization challenges. A rational method for projecting the benchmarked methodologies into a strategy space is given, with the aim that the projection will indicate the types of molecule design possible with a given methodology. While EL-enabled strategies outperform classical QSARs and compare favorably to similar published results, they are, in all respects evaluated herein, not strongly distinguished from IT-enhanced models. Moreover, EL-enabled strategies failed to prove superior in deorphanization challenges. Therefore, this paper cautions that, contrary to common belief and intuitive expectation, the benefits of chemogenomics models over classical QSAR are quite possibly due less to the injection of protein-related information and more to inductive transfer arising from simultaneous learning from all of the modeled endpoints. These results show that protein descriptor research needs further improvement to truly realize the expected benefit of EL.
6
Chen F, Li Z, Chen YPP. Determining common insertion sites based on retroviral insertion distribution across tumors. Comput Biol Chem 2014; 51:83-92. [PMID: 24675070 DOI: 10.1016/j.compbiolchem.2014.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2013] [Revised: 02/24/2014] [Accepted: 03/03/2014] [Indexed: 10/25/2022]
Abstract
A CIS (common insertion site) is a genome region that is hit by retroviral insertions more frequently than expected by chance. Such regions are strongly associated with cancer gene loci, which makes them useful for detecting cancer genes. An algorithm for detecting CISs should satisfy the following requirements: (1) it does not require any prior knowledge of the underlying insertion distribution; (2) it can resolve the insertion biases caused by hotspots; (3) it can detect CISs of any biological width; (4) it can identify noise resulting from statistical errors and non-CIS insertions; and (5) it can identify the widths of CISs as accurately as possible. We develop a method that addresses these difficulties. We assess a region's significance from two perspectives: distribution width and distribution depth. The former indicates how many insertions fall in a region, while the latter evaluates how the insertions in a region are distributed across tumors. We compare our method with kernel density estimation and sliding-window approaches on simulated data, showing that our method not only identifies cancer-related insertions effectively but also filters noise correctly. Experiments on real data show that taking the insertion distribution into account can highlight significant CISs. We detect 53 novel CISs, some of which have been confirmed by the biological literature.
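The abstract names distribution width (how many insertions fall in a region) and distribution depth (how those insertions spread across tumors) but gives no formulas. The sketch below is not the authors' method: it pairs a plain sliding-window scan, one of the baselines they compare against, with two simple proxies chosen here for illustration (insertion count for width, number of distinct tumors for depth). Window size, step and both thresholds are arbitrary assumptions.

```python
from typing import List, Sequence, Tuple

Insertion = Tuple[int, str]  # (genomic position, tumor identifier)

def window_stats(insertions: Sequence[Insertion], start: int, size: int) -> Tuple[int, int]:
    """Return (width, depth) for the half-open window [start, start + size).

    width: number of insertions in the window (illustrative proxy).
    depth: number of distinct tumors contributing them (illustrative proxy).
    """
    hits = [(pos, tumor) for pos, tumor in insertions if start <= pos < start + size]
    return len(hits), len({tumor for _, tumor in hits})

def sliding_window_cis(insertions: Sequence[Insertion], size: int, step: int,
                       min_width: int, min_depth: int) -> List[Tuple[int, int, int, int]]:
    """Report (start, end, width, depth) for windows passing both thresholds."""
    if not insertions:
        return []
    lo = min(pos for pos, _ in insertions)
    hi = max(pos for pos, _ in insertions)
    found = []
    start = lo
    while start <= hi:
        width, depth = window_stats(insertions, start, size)
        if width >= min_width and depth >= min_depth:
            found.append((start, start + size, width, depth))
        start += step
    return found
```

On toy data with three insertions near position 100 from three tumors and three near position 5000 from a single tumor, `sliding_window_cis(data, size=100, step=50, min_width=3, min_depth=2)` reports only the window starting at 100: the depth requirement rejects the single-tumor cluster. Overlapping hits are not merged here, and no significance test is applied.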
Affiliation(s)
- Feng Chen
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou City, Henan Province 450001, China; Faculty of Science, Technology and Engineering, La Trobe University, Melbourne, Victoria 3086, Australia
- Zhoufang Li
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou City, Henan Province 450001, China
- Yi-Ping Phoebe Chen
- Faculty of Science, Technology and Engineering, La Trobe University, Melbourne, Victoria 3086, Australia.
7
Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets. J Cheminform 2013; 5:42. [PMID: 24059743 PMCID: PMC4015169 DOI: 10.1186/1758-2946-5-42] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
Background While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM and a novel protein descriptor set termed ProtFP (4 variants); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks, which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. Results The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than using individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-scales (3) combined with an average Z-scale value for each target, while ProtFP (PCA8), ST-scales, and ProtFP (Feature) rank last. Conclusions While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar.
Still, combining sets that describe complementary information leads to a small but consistent improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, from both the ligand and the protein side.
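The classification results above are reported as the Matthews correlation coefficient (MCC). For reference, a short Python sketch computing MCC from binary confusion-matrix counts. Returning 0 when any marginal total is zero is a common convention assumed here, not something the abstract specifies, and the abstract does not state how bioactivities were binarized.

```python
import math

def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for degenerate confusion matrices
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction). For example, 40 true positives, 30 true negatives, 10 false positives and 20 false negatives give an MCC of about 0.41.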
8
van Westen GJ, Swier RF, Wegner JK, Ijzerman AP, van Vlijmen HW, Bender A. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets. J Cheminform 2013; 5:41. [PMID: 24059694 PMCID: PMC3848949 DOI: 10.1186/1758-2946-5-41] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
Background While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI and BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to what extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA). Results In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-Scales (Binned), and BLOSUM descriptor sets show behavior that is distinct from one another as well as from both of the clusters above. Generally, the use of more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though the later principal components each capture less of the variation in the original input data. Conclusion In this work a comparison is provided of how similarly (and differently) currently available amino acid descriptor sets behave when converting structure to property space. The results obtained enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. those showing complementary behavior.
Affiliation(s)
- Gerard Jp van Westen
- Division of Medicinal Chemistry, Leiden / Amsterdam Center for Drug Research, Einsteinweg 55, Leiden 2333 CC, The Netherlands.
9
Wright DW, Coveney PV. Resolution of discordant HIV-1 protease resistance rankings using molecular dynamics simulations. J Chem Inf Model 2011; 51:2636-49. [PMID: 21902276 DOI: 10.1021/ci200308r] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
The emergence of drug resistance is a major challenge for the effective treatment of HIV. In this article, we explore the application of atomistic molecular dynamics simulations to quantify the level of resistance of a patient-derived HIV-1 protease sequence to the inhibitor lopinavir. A comparative drug ranking methodology was developed to compare the drug resistance rankings produced by the Stanford HIVdb, ANRS, and RegaDB clinical decision support systems. The methodology was used to identify a patient sequence for which the three rival online tools produced differing resistance rankings. Mutations at only three positions (L10I, A71IV, and L90M) influenced the resistance level assigned to the sequence. We use ensemble molecular dynamics simulations to elucidate the origin of these discrepancies and the mechanism of resistance. By simulating not only the full patient sequence but also systems containing the constituent mutations, we gain insight into why resistance estimates vary and into the interactions between the various mutations. In the same way, we also gain valuable knowledge of the mechanistic causes of resistance. In particular, we identify changes in the relative conformation of the two beta sheets that form the protease dimer interface, which suggest an explanation for the relative frequency of different amino acids observed in patients at residue 71.
Affiliation(s)
- David W Wright
- Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, UK
10
Hastings J, Chepelev L, Willighagen E, Adams N, Steinbeck C, Dumontier M. The chemical information ontology: provenance and disambiguation for chemical data on the biological semantic web. PLoS One 2011; 6:e25513. [PMID: 21991315 PMCID: PMC3184996 DOI: 10.1371/journal.pone.0025513] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 01/18/2011] [Accepted: 09/07/2011] [Indexed: 11/19/2022]
Abstract
Cheminformatics is the application of informatics techniques to solve chemical problems in silico. There are many areas in biology where cheminformatics plays an important role in computational research, including metabolism, proteomics, and systems biology. One critical aspect of applying cheminformatics in these fields is the accurate exchange of data, which is increasingly accomplished through the use of ontologies. Ontologies are formal representations of objects and their properties using a logic-based ontology language. Many such ontologies are currently being developed to represent objects across all domains of science. Ontologies enable the definition and classification of objects in a particular domain and support querying them, so that intelligent computer applications can be built to support the work of scientists both within the domain of interest and across interrelated neighbouring domains. Modern chemical research relies on computational techniques to filter and organise data to maximise research productivity. The objects that are manipulated in these algorithms and procedures, as well as the algorithms and procedures themselves, enjoy a kind of virtual life within computers. We call these information entities. Here, we describe our work in developing an ontology of chemical information entities, with a primary focus on data-driven research and the integration of calculated properties (descriptors) of chemical entities within a semantic web context. Our ontology distinguishes algorithmic, or procedural, information from declarative, or factual, information, and places particular importance on annotating calculated data with its provenance. The Chemical Information Ontology is being developed as an open collaborative project. More details, together with a downloadable OWL file, are available at http://code.google.com/p/semanticchemistry/ (license: CC-BY-SA).
Affiliation(s)
- Janna Hastings
- Chemoinformatics and Metabolism, European Bioinformatics Institute, Hinxton, United Kingdom.
11
Li Q, Li X, Li C, Chen L, Song J, Tang Y, Xu X. A network-based multi-target computational estimation scheme for anticoagulant activities of compounds. PLoS One 2011; 6:e14774. [PMID: 21445339 PMCID: PMC3062543 DOI: 10.1371/journal.pone.0014774] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 11/18/2009] [Accepted: 02/19/2011] [Indexed: 12/26/2022]
Abstract
Background Traditional virtual screening methods focus on the predicted binding affinity between a drug molecule and a target related to a particular disease rather than on phenotypic data of the drug molecule against the disease system, and are therefore often less effective at discovering drugs for complex diseases. Virtual screening against a complex disease by general network estimation has become feasible with the development of network biology and systems biology. More effective methods for computationally estimating the overall efficacy of a compound in a complex disease system are needed, given that different targets carry different weights in a biological process and that partial inhibition of several targets can be more efficient than complete inhibition of a single target. Methodology We developed a novel approach that integrates affinity predictions from multi-target docking studies with biological network efficiency analysis to estimate the anticoagulant activities of compounds. From network efficiency calculations for the human clotting cascade, factor Xa and thrombin were identified as the two most fragile enzymes, while the catalytic reaction mediated by the complex IXa:VIIIa and the formation of the complex VIIIa:IXa were recognized as the two most fragile biological processes in the human clotting cascade system. Furthermore, the method combining network efficiency with molecular docking scores was applied to estimate the anticoagulant activities of a series of argatroban intermediates and of eight natural products. The better correlation (r = 0.671) between the experimental data and the decrease in network deficiency suggests that the approach could be a promising computational systems biology tool to aid identification of anticoagulant activities of compounds in drug discovery.
Conclusions This article proposes a network-based multi-target computational estimation method for anticoagulant activities of compounds by combining network efficiency analysis with scoring function from molecular docking.
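The abstract ranks clotting-cascade components by how fragile the network is to their loss but does not define network efficiency. A minimal sketch, assuming the Latora and Marchiori global efficiency (the mean of 1/d over ordered node pairs, where d is the shortest directed path length and unreachable pairs contribute 0) and scoring a node's fragility as the relative drop in efficiency when all of its edges are removed. The graph is a toy chain, not the clotting cascade, and the step that combines efficiency with docking scores is not shown.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]  # directed adjacency list

def all_nodes(graph: Graph) -> Set[str]:
    return set(graph) | {v for targets in graph.values() for v in targets}

def path_lengths(graph: Graph, source: str) -> Dict[str, int]:
    """Unweighted shortest directed path lengths from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def global_efficiency(graph: Graph, nodes: Iterable[str]) -> float:
    """Mean of 1/d(i, j) over ordered pairs i != j; unreachable pairs contribute 0."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for s in nodes:
        for t, d in path_lengths(graph, s).items():
            if t != s and t in nodes:
                total += 1.0 / d
    return total / (n * (n - 1))

def isolate(graph: Graph, node: str) -> Graph:
    """Copy of the graph with every edge into or out of `node` removed."""
    return {u: ([] if u == node else [v for v in vs if v != node]) for u, vs in graph.items()}

def fragility(graph: Graph) -> Dict[str, float]:
    """Relative efficiency loss when each node is isolated (node set kept fixed)."""
    nodes = all_nodes(graph)
    base = global_efficiency(graph, nodes)
    if base == 0:
        return {x: 0.0 for x in nodes}
    return {x: (base - global_efficiency(isolate(graph, x), nodes)) / base for x in nodes}
```

For the chain A → B → C, isolating the middle node B removes every path (fragility 1.0), while isolating A or C leaves one path (fragility 0.6). Keeping the node count fixed, rather than deleting the node, keeps the scores comparable across nodes.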
Affiliation(s)
- Qian Li
- Beijing National Laboratory for Molecular Sciences, State Key Lab of Rare Earth Material Chemistry and Applications, College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
- Beijing National Laboratory for Molecular Sciences, Center for Molecular Sciences, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry Chinese Academy of Sciences, Beijing, People's Republic of China
- Graduate University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Xudong Li
- Beijing National Laboratory for Molecular Sciences, State Key Lab of Rare Earth Material Chemistry and Applications, College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
- Canghai Li
- Experimental Research Center, China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
- Lirong Chen
- Beijing National Laboratory for Molecular Sciences, State Key Lab of Rare Earth Material Chemistry and Applications, College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
- Jun Song
- Experimental Research Center, China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
- Yalin Tang
- Beijing National Laboratory for Molecular Sciences, Center for Molecular Sciences, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry Chinese Academy of Sciences, Beijing, People's Republic of China
- Xiaojie Xu
- Beijing National Laboratory for Molecular Sciences, State Key Lab of Rare Earth Material Chemistry and Applications, College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
12
A comparative study of HIV-1 and HTLV-I protease structure and dynamics reveals a conserved residue interaction network. J Mol Model 2011; 17:2693-705. [PMID: 21279524 DOI: 10.1007/s00894-011-0971-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/17/2010] [Accepted: 01/11/2011] [Indexed: 12/14/2022]
Abstract
The two retroviruses human T-lymphotropic virus type I (HTLV-I) and human immunodeficiency virus type 1 (HIV-1) are the causative agents of severe and fatal diseases, including adult T-cell leukemia and acquired immune deficiency syndrome (AIDS). Both viruses encode a protease that is essential for replication and therefore represents a key target for drugs interfering with viral infection. The retroviral proteases from HIV-1 and HTLV-I share 31% sequence identity and high structural similarity, yet their substrate specificities and inhibition profiles differ substantially. In this study, we performed all-atom molecular dynamics (MD) simulations of both enzymes in their ligand-free states and in complex with model substrates in order to compare their dynamic behaviors and enhance our understanding of the correlation between sequence, structure, and dynamics in this protein family. We found extensive similarities in both local and overall protein dynamics, as well as in the energetics of their interactions with model substrates. Interestingly, residues that are important for strong ligand binding are frequently not conserved in sequence, which offers an explanation for the differences in binding specificity. Moreover, we identified an interaction network of contacts between conserved residues that interconnects secondary structure elements and serves as a scaffold for the protein fold. This interaction network is conformationally stable over time and may explain the highly similar dynamic behavior of the two retroviral proteases, despite their rather low overall sequence identity.
13
Proteochemometric modeling of the susceptibility of mutated variants of the HIV-1 virus to reverse transcriptase inhibitors. PLoS One 2010; 5:e14353. [PMID: 21179544 PMCID: PMC3002298 DOI: 10.1371/journal.pone.0014353] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 04/27/2010] [Accepted: 11/10/2010] [Indexed: 12/16/2022]
Abstract
Background Reverse transcriptase is a major drug target in highly active antiretroviral therapy (HAART) against HIV, which typically comprises two nucleoside/nucleotide analog reverse transcriptase (RT) inhibitors (NRTIs) in combination with a non-nucleoside RT inhibitor or a protease inhibitor. Unfortunately, HIV is capable of escaping the therapy by mutating into drug-resistant variants. Computational models that correlate HIV drug susceptibilities to the virus genotype and to drug molecular properties might facilitate selection of improved combination treatment regimens. Methodology/Principal Findings We applied our earlier developed proteochemometric modeling technology to analyze HIV mutant susceptibility to the eight clinically approved NRTIs. The data set used covered 728 virus variants genotyped for 240 sequence residues of the DNA polymerase domain of the RT; 165 of these residues contained mutations; totally the data-set covered susceptibility data for 4,495 inhibitor-RT combinations. Inhibitors and RT sequences were represented numerically by 3D-structural and physicochemical property descriptors, respectively. The two sets of descriptors and their derived cross-terms were correlated to the susceptibility data by partial least-squares projections to latent structures. The model identified more than ten frequently occurring mutations, each conferring more than two-fold loss of susceptibility for one or several NRTIs. The most deleterious mutations were K65R, Q151M, M184V/I, and T215Y/F, each of them decreasing susceptibility to most of the NRTIs. The predictive ability of the model was estimated by cross-validation and by external predictions for new HIV variants; both procedures showed very high correlation between the predicted and actual susceptibility values (Q2 = 0.89 and Q2ext = 0.86). 
The model is available at www.hivdrc.org as a free web service for predicting the susceptibility of any HIV-1 mutant variant to any of the clinically used NRTIs. Conclusions/Significance Our results give directions on how to develop approaches for selecting genome-based optimum combination therapy for patients harboring mutated HIV variants.
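The abstract states that inhibitor and RT descriptors, together with "their derived cross-terms", were correlated with susceptibility by PLS, but it does not define the cross-terms. In proteochemometrics they are commonly formed as products of every ligand descriptor with every protein descriptor. The sketch below assumes that construction; the function name and toy descriptor values are hypothetical, and the PLS fitting step is not shown.

```python
from itertools import product
from typing import List, Sequence


def proteochemometric_row(ligand: Sequence[float],
                          protein: Sequence[float]) -> List[float]:
    """Build one descriptor row for a ligand-protein pair.

    Assumed (hypothetical) layout: ligand block, then protein block, then
    one cross-term ligand_i * protein_j for every pair of descriptors.
    """
    cross_terms = [l * p for l, p in product(ligand, protein)]
    return list(ligand) + list(protein) + cross_terms


# Hypothetical toy descriptors: 2 inhibitor properties, 3 RT properties.
row = proteochemometric_row([1.0, 2.0], [0.5, -1.0, 3.0])
print(len(row))   # 2 + 3 + 2*3 = 11
print(row[5:])    # the six cross-terms
```

The number of cross-terms grows as the product of the two block sizes, which yields many correlated columns; that is one reason a latent-variable method such as PLS, rather than ordinary least squares, suits this kind of descriptor matrix.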
14
Ode H, Yokoyama M, Kanda T, Sato H. Identification of folding preferences of cleavage junctions of HIV-1 precursor proteins for regulation of cleavability. J Mol Model 2010; 17:391-9. [PMID: 20480379 DOI: 10.1007/s00894-010-0739-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 03/10/2010] [Accepted: 04/30/2010] [Indexed: 11/30/2022]
Abstract
Human immunodeficiency virus type 1 protease (HIV-1 PR) cleaves two viral precursor proteins, Gag and Gag-Pol, at multiple sites. Although processing proceeds in a defined order that ensures effective viral replication, the molecular mechanisms regulating this order are not fully understood. In this study, we used bioinformatics approaches to examine whether the folding preferences of the cleavage junctions influence their cleavability by HIV-1 PR. The folding of the eight-amino-acid peptides corresponding to the seven cleavage junctions of the HIV-1(HXB2) Gag and Gag-Pol precursors was simulated in the PR-free and PR-bound states with molecular dynamics and homology modeling methods, and the relationships between the folding parameters and the reported kinetic parameters of the HIV-1(HXB2) peptides were analyzed. We found that a folding preference for a Cβ(P1)-Cα(P1)-Cα(P1')-Cβ(P1') dihedral angle in the range of 150 to 180 degrees in the PR-free state was positively correlated with 1/Km (R = 0.95, P = 0.0008), and that the main-chain O(P2)-C(P2)-C(P1)-O(P1) dihedral angle in the PR-bound state was negatively correlated with kcat (R = 0.94, P = 0.001). We further found that these two folding properties influenced the overall cleavability of the precursor protein when the side chains at the P1 site were of similar size. These data suggest that the dihedral angles at specific positions around the cleavage junctions, both before and after binding to PR, are critical for regulating the cleavability of precursor proteins by HIV-1 PR.
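Both folding parameters in this abstract are dihedral angles defined by four atoms. Below is a minimal sketch of the standard atan2-based dihedral formula, assuming the atom coordinates have already been extracted from a trajectory or model; the function names and toy coordinates are hypothetical. It returns a signed angle in degrees. The abstract does not say whether the 150 to 180 degree window refers to the signed or the absolute value.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates standing in for Cb(P1), Ca(P1), Ca(P1'), Cb(P1').
trans = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
print(round(abs(trans)), round(cis))
```

Using atan2 rather than an arccos of the normalized dot product keeps the sign of the rotation and stays numerically stable near 0 and 180 degrees.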
Affiliation(s)
- Hirotaka Ode
- Pathogen Genomics Center, National Institute of Infectious Diseases, Tokyo, Japan.
15
Identification of structural mechanisms of HIV-1 protease specificity using computational peptide docking: implications for drug resistance. Structure 2010; 17:1636-1648. [PMID: 20004167 DOI: 10.1016/j.str.2009.10.008] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 08/20/2009] [Revised: 10/01/2009] [Accepted: 10/04/2009] [Indexed: 11/23/2022]
Abstract
Drug-resistance mutations (DRMs) in HIV-1 protease are a major challenge to antiretroviral therapy. Protease-substrate interactions found to be critical for native selectivity could serve as robust drug-design targets that are immune to DRMs. To identify the structural mechanisms of selectivity, we developed a peptide-docking algorithm to predict the atomic structure of protease-substrate complexes and applied it to a large and diverse set of cleavable and noncleavable peptides. Cleavable peptides showed significantly lower interaction energies than noncleavable peptides, with six protease active-site residues playing the most significant role in discrimination. Surprisingly, all six residues correspond to sequence positions associated with drug-resistance mutations. This demonstrates that the very residues responsible for native substrate specificity in HIV-1 protease are altered during its evolution to drug resistance, and suggests that drug resistance and substrate selectivity may share common mechanisms.
16
Dimitrov I, Garnev P, Flower DR, Doytchinova I. Peptide binding to the HLA-DRB1 supertype: a proteochemometrics analysis. Eur J Med Chem 2009; 45:236-43. [PMID: 19896246 DOI: 10.1016/j.ejmech.2009.09.049] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 06/20/2009] [Revised: 09/04/2009] [Accepted: 09/29/2009] [Indexed: 11/19/2022]
Abstract
A proteochemometrics approach was applied to a set of 2666 peptides binding to 12 HLA-DRB1 proteins. Sequences of both peptide and protein were described using three z-descriptors. Cross terms accounting for adjacent positions and for every second position in the peptides were included in the models, as well as cross terms for peptide/protein interactions. Models were derived based on combinations of different blocks of variables. These models had moderate goodness of fit, as expressed by r2, which ranged from 0.685 to 0.732; and good cross-validated predictive ability, as expressed by q2, which varied from 0.678 to 0.719. The external predictive ability was tested using a set of 356 HLA-DRB1 binders, which showed an r2(pred) in the range 0.364-0.530. Peptide and protein positions involved in the interactions were analyzed in terms of hydrophobicity, steric bulk and polarity.
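This abstract separates goodness of fit (r2) from cross-validated predictive ability (q2). A common definition is q2 = 1 - PRESS/SS: PRESS sums the squared errors of predictions made for each observation while it was held out, and SS is the total sum of squares of the observed values about their mean. The sketch below uses that definition. The function name and numbers are hypothetical, and individual studies may differ in detail, for example in which mean is used for an external set.

```python
from typing import Sequence


def q_squared(observed: Sequence[float], cv_predicted: Sequence[float]) -> float:
    """q2 = 1 - PRESS / SS (assumed standard definition).

    cv_predicted[i] must come from a model that did not see observation i,
    e.g. the held-out fold in cross-validation.
    """
    if len(observed) != len(cv_predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with >= 2 values")
    mean = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, cv_predicted))
    ss = sum((y - mean) ** 2 for y in observed)
    return 1.0 - press / ss


# Hypothetical binding values and their cross-validated predictions.
print(q_squared([5.0, 6.0, 7.0, 8.0], [5.5, 6.0, 6.5, 8.0]))
```

Replacing the cross-validated predictions with fitted values from a model trained on all data turns the same formula into the ordinary r2, which is why r2 is normally the larger of the two.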
Affiliation(s)
- Ivan Dimitrov
- Faculty of Pharmacy, Medical University of Sofia, 2 Dunav st, 1000 Sofia, Bulgaria
17
Kontijevskis A, Petrovska R, Yahorava S, Komorowski J, Wikberg JES. Proteochemometrics mapping of the interaction space for retroviral proteases and their substrates. Bioorg Med Chem 2009; 17:5229-37. [PMID: 19539482 DOI: 10.1016/j.bmc.2009.05.045] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 11/23/2008] [Revised: 04/01/2009] [Accepted: 05/17/2009] [Indexed: 10/20/2022]
Abstract
Understanding the complex interactions of retroviral proteases with their ligands is an important scientific challenge in efforts to control retroviral infections. The development of drug resistance, driven by high mutation rates and extensive polymorphism, causes major problems in treating the deadly diseases these viruses cause and prompts efforts to identify new strategies. Here we report a comprehensive analysis of the interactions of 63 retroviral proteases from nine viral species with their substrates and inhibitors, based on publicly available data from the past 17 years of retroviral research. By correlating physicochemical descriptions of retroviral proteases and substrates with their biological activities, we constructed a statistically highly valid 'proteochemometric' model of the retroviral protease interactome. Analysis of the model identified the amino acid positions in retroviral proteases with the greatest influence on ligand activity and revealed general physicochemical properties essential for tight substrate binding across multiple retroviral proteases. Hexapeptide inhibitors designed from these general properties effectively inhibited HIV-1 proteases in vitro, and some showed uniformly high inhibitory activity against all HIV-1 protease mutants evaluated. A generalized proteochemometric model of the retroviral protease interactome has been created and analysed in this study. Our results demonstrate the feasibility of using this general strategy to design inhibitory peptides that could serve as templates for HIV retardants with improved resilience to drug resistance.
Affiliation(s)
- Aleksejs Kontijevskis
- Department of Pharmaceutical Biosciences, Uppsala University, Husargatan 3, SE-75124, Uppsala, Sweden
18
Strömbergsson H, Daniluk P, Kryshtafovych A, Fidelis K, Wikberg JES, Kleywegt GJ, Hvidsten TR. Interaction model based on local protein substructures generalizes to the entire structural enzyme-ligand space. J Chem Inf Model 2008; 48:2278-88. [PMID: 18937438 DOI: 10.1021/ci800200e] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Chemogenomics is a new strategy in in silico drug discovery, where the ultimate goal is to understand molecular recognition for all molecules interacting with all proteins in the proteome. To study such cross-interactions, methods are needed that can generalize over proteins that vary greatly in sequence, structure, and function. We present a general quantitative approach to protein-ligand binding affinity prediction that spans the entire structural enzyme-ligand space. The model was trained on a data set composed of all available enzymes cocrystallized with druglike ligands for which a crystal structure is available, taken from four publicly available interaction databases. Each enzyme was characterized by a set of local descriptors of protein structure that describe the binding site of the cocrystallized ligand. The ligands in the training set were described by traditional QSAR descriptors. To evaluate the model, a comprehensive test set of enzyme structures and ligands was manually curated. The test set contained enzyme-ligand complexes for which no crystal structures were available, so their binding modes were unknown. The test set enzymes were therefore characterized by matching their entire structures to the local descriptor library constructed from the training set. Both the training and the test set contained enzyme-ligand complexes from all major enzyme classes, and the enzymes spanned a large range of sequences and folds. The experimental binding affinities (pKi) ranged from 0.5 to 11.9 (0.7 to 11.0 in the test set). The induced model predicted the binding affinities of the external test set enzyme-ligand complexes with an r2 of 0.53 and an RMSEP of 1.5. This demonstrates that local descriptors make it possible to create rough predictive models that generalize over a wide range of protein targets.
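The external test set is summarized by r2 and RMSEP (root-mean-square error of prediction). The sketch below computes both statistics. It assumes r2 is the squared Pearson correlation between predicted and experimental pKi, which is one common convention that the abstract does not confirm. The function names and values are hypothetical.

```python
import math
from typing import Sequence


def rmsep(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error of prediction on an external test set."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


def pearson_r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed meaning of the reported r2)."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((y - mo) * (p - mp) for y, p in zip(observed, predicted))
    vo = sum((y - mo) ** 2 for y in observed)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (vo * vp)


# Hypothetical experimental and predicted pKi values.
obs = [4.0, 6.0, 8.0, 10.0]
pred = [5.0, 6.0, 7.0, 11.0]
print(round(rmsep(obs, pred), 3), round(pearson_r2(obs, pred), 3))
```

RMSEP is in the units of the response (here pKi units), so an RMSEP of 1.5 corresponds to a typical error of about 1.5 log units in Ki.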
Affiliation(s)
- Helena Strömbergsson
- The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden, Department of Biophysics, Faculty of Physics, University of Warsaw, Warsaw, Poland
19
Prusis P, Lapins M, Yahorava S, Petrovska R, Niyomrattanakit P, Katzenmeier G, Wikberg JE. Proteochemometrics analysis of substrate interactions with dengue virus NS3 proteases. Bioorg Med Chem 2008; 16:9369-77. [DOI: 10.1016/j.bmc.2008.08.081] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 03/15/2008] [Revised: 08/07/2008] [Accepted: 08/20/2008] [Indexed: 11/25/2022]
20
Eklund M, Spjuth O, Wikberg JE. The C1C2: a framework for simultaneous model selection and assessment. BMC Bioinformatics 2008; 9:360. [PMID: 18761753 PMCID: PMC2556350 DOI: 10.1186/1471-2105-9-360] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/07/2008] [Accepted: 09/02/2008] [Indexed: 11/12/2022]
Abstract
Background There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework partitions the data so that model choice and model assessment use separate data. Since the number of conceivable models is in general vast, we also investigated two automatic search methods for model choice, a genetic algorithm and a brute-force method. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, reducing the model choice problem to a matter of variable selection and choice of the penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared with those obtained by using repeated K-fold cross-validation for choosing and assessing a model. Results The C1C2 framework performed well at finding the true model, in terms of both choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates.
Using the genetic algorithm worsened the model choice, but not the generalization error estimates, compared with using the brute-force method. Repeated K-fold cross-validation gave results similar to those of the C1C2 in terms of model choice; however, its generalization error estimates were less accurate. Conclusion The C1C2 framework was shown to work well for finding the true model within a penalized linear model class and for accurately assessing its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and deviations from model assumptions. Completely separating the data used for model choice from the data used for model assessment improves the estimates of the generalization error.
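The central idea summarized in this abstract is that the data used to choose a model are never reused to assess it. The sketch below illustrates only that separation, not the C1C2 procedure itself, whose exact partitioning scheme the abstract does not give. It sets aside an assessment partition, chooses a penalizing parameter by K-fold cross-validation on the remaining data, and scores the chosen model once on the held-out partition. The one-variable ridge model, the function names, the fractions and the data are hypothetical stand-ins for the penalized linear models in the study.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (x, y)


def fit_ridge_1d(rows: Sequence[Row], lam: float) -> float:
    """Hypothetical stand-in model: slope of y ~ b*x with L2 penalty lam on b."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)


def mse(b: float, rows: Sequence[Row]) -> float:
    return sum((y - b * x) ** 2 for x, y in rows) / len(rows)


def choose_then_assess(rows: Sequence[Row], lambdas: Sequence[float],
                       k: int = 5, assess_fraction: float = 0.25,
                       seed: int = 0) -> Tuple[float, float]:
    """Return (chosen lambda, error on data never used for the choice).

    Illustrative nested split only; not the C1C2 algorithm.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_assess = max(1, int(len(shuffled) * assess_fraction))
    assess, choice = shuffled[:n_assess], shuffled[n_assess:]

    # Model choice: K-fold cross-validation restricted to the choice partition.
    folds: List[List[Row]] = [choice[i::k] for i in range(k)]

    def cv_error(lam: float) -> float:
        total = 0.0
        for i, held_out in enumerate(folds):
            train = [r for j, f in enumerate(folds) if j != i for r in f]
            total += mse(fit_ridge_1d(train, lam), held_out)
        return total / k

    best = min(lambdas, key=cv_error)

    # Model assessment: refit on the whole choice partition, score once.
    return best, mse(fit_ridge_1d(choice, best), assess)


# Hypothetical noise-free data with true slope 2.
data = [(float(x), 2.0 * x) for x in range(1, 21)]
print(choose_then_assess(data, [0.0, 10.0, 100.0]))
```

Because the assessment partition plays no part in choosing lambda, its error is not biased downward by the search over candidates, which is the effect the abstract attributes to complete separation.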
Affiliation(s)
- Martin Eklund
- Department of Pharmaceutical Pharmacology, Uppsala University, Box 591, BMC, SE-751 24 Uppsala, Sweden.
21
Kontijevskis A, Komorowski J, Wikberg JES. Generalized Proteochemometric Model of Multiple Cytochrome P450 Enzymes and Their Inhibitors. J Chem Inf Model 2008; 48:1840-50. [DOI: 10.1021/ci8000953] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 02/01/2023]
Affiliation(s)
- Aleksejs Kontijevskis
- Department of Pharmaceutical Biosciences and Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
- Jan Komorowski
- Department of Pharmaceutical Biosciences and Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
- Jarl E. S. Wikberg
- Department of Pharmaceutical Biosciences and Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
22
Lapins M, Eklund M, Spjuth O, Prusis P, Wikberg JES. Proteochemometric modeling of HIV protease susceptibility. BMC Bioinformatics 2008; 9:181. [PMID: 18402661 PMCID: PMC2375133 DOI: 10.1186/1471-2105-9-181] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 12/21/2007] [Accepted: 04/10/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND A major obstacle in treating HIV is the ability of the virus to mutate rapidly into drug-resistant variants. A method for predicting the susceptibility of mutated HIV strains to antiviral agents would provide substantial clinical benefit and facilitate the development of new candidate drugs. We therefore used proteochemometrics to model the susceptibility of HIV to protease inhibitors in current use, using descriptions of the physicochemical properties of mutated HIV proteases and 3D structural property descriptions of the protease inhibitors. The descriptions were correlated with the susceptibility data of 828 unique HIV protease variants for seven protease inhibitors in current use; the data set comprised 4792 protease-inhibitor combinations. RESULTS The model provided excellent predictability (R2 = 0.92, Q2 = 0.87) and identified general and specific features of drug resistance. The model's predictive ability was verified by external prediction, in which the susceptibilities to each of the seven inhibitors were omitted from the data set one inhibitor at a time and the data for the six remaining compounds were used to create new models. This analysis showed an overall predictive ability for the omitted inhibitors of Q2inhibitors = 0.72. CONCLUSION Our results show that a proteochemometric approach can provide generalized susceptibility predictions for new inhibitors. Our proteochemometric model can directly analyze inhibitor-protease interactions and facilitate treatment selection based on viral genotype. The model is available for public use at the HIV Drug Research Centre.
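The external validation described here omits all data for one inhibitor at a time and predicts them from a model built on the remaining six. Below is a minimal sketch of how such grouped splits can be generated. The record layout (inhibitor name, protease variant identifier, susceptibility value) and the example values are hypothetical, and model fitting is left out.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (inhibitor, protease variant, susceptibility).
Record = Tuple[str, str, float]


def leave_one_inhibitor_out(
        records: List[Record]) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held-out inhibitor, training records, test records)."""
    groups: Dict[str, List[Record]] = {}  # dicts keep insertion order
    for rec in records:
        groups.setdefault(rec[0], []).append(rec)
    for inhibitor, test in groups.items():
        # Every record for the held-out inhibitor is excluded from training.
        train = [r for r in records if r[0] != inhibitor]
        yield inhibitor, train, test


# Hypothetical toy data.
records = [("SQV", "v1", 0.1), ("SQV", "v2", 1.2),
           ("IDV", "v1", 0.3), ("NFV", "v2", 1.8)]
for name, train, test in leave_one_inhibitor_out(records):
    print(name, len(train), len(test))
```

Because each test inhibitor is absent from its training fold, a proteochemometric model can predict it only through its inhibitor descriptors. This is what makes the split a test of generalization to new inhibitors rather than to new protease variants.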
Affiliation(s)
- Maris Lapins
- Department of Pharmaceutical Pharmacology, Uppsala University, SE-751 24, Sweden.