1
|
Fathi A, Sadeghi R. A genetic programming method for feature mapping to improve prediction of HIV-1 protease cleavage site. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.06.045] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/28/2022]
|
2
|
Koçak Y, Özyer T, Alhajj R. Utilizing maximal frequent itemsets and social network analysis for HIV data analysis. J Cheminform 2016. [PMCID: PMC5395515 DOI: 10.1186/s13321-016-0184-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/13/2023]
Abstract
Acquired immune deficiency syndrome is a deadly disease caused by human immunodeficiency virus (HIV). The virus attacks the patient's immune system and affects its ability to fight disease. Developing effective medicine requires understanding the life cycle and replication ability of the virus. The HIV-1 protease enzyme cleaves an octamer peptide into peptides that the virus uses to create proteins. In this paper, a novel feature extraction method is proposed for understanding important patterns in octamer cleavability. The method is based on data mining techniques that find important relations inside a dataset by comprehensively analyzing the given data. As demonstrated in this paper, using the extracted information in the classification process yields important results which may be taken into consideration when developing a new medicine. We have used 746 and 1625, Impens and Schilling data instances from the 746-dataset. In addition, we have performed social network analysis as a complementary alternative method.
|
3
|
Manning T, Walsh P. The importance of physicochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity. Bioengineered 2016; 7:65-78. [PMID: 27212259 PMCID: PMC4879986 DOI: 10.1080/21655979.2016.1149271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/04/2015] [Revised: 01/25/2016] [Accepted: 01/26/2016] [Indexed: 10/21/2022]
Abstract
This paper reviews recent research relating to the application of bioinformatics approaches to determining HIV-1 protease specificity, outlines outstanding issues, and presents a new approach to addressing these issues. Leading machine learning theory for the problem currently suggests that the direct encoding of the physicochemical properties of the amino acid substrates is not required for optimal performance. A number of amino acid encoding approaches which incorporate potentially relevant physicochemical properties of the substrate are identified, and are evaluated using a nonlinear task decomposition based neuroevolution algorithm. The results are evaluated, and compared against a recent benchmark set on a nonlinear classifier using only amino acid sequence and identity information. Ensembles of these nonlinear classifiers using the physicochemical properties of the substrate are demonstrated to consistently outperform the recently published state-of-the-art linear support vector machine based approach in out-of-sample evaluations.
Affiliation(s)
- Timmy Manning: Department of Computer Science, Cork Institute of Technology, Cork, Ireland
- Paul Walsh: Department of Computer Science, Cork Institute of Technology, Cork, Ireland; NSilico Ltd, Rubicon Innovation Center, Cork, Ireland
|
4
|
Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction. Biomed Res Int 2015; 2015:263586. [PMID: 25961009 PMCID: PMC4413510 DOI: 10.1155/2015/263586] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/29/2014] [Accepted: 01/07/2015] [Indexed: 11/17/2022]
Abstract
Understanding the specificity of HIV-1 protease is crucial for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and find the positions in an octapeptide that are most important in determining its cleavability. Two kinds of newly proposed features based on the Amino Acid Index database, plus traditional orthogonal encoding features, are used in this paper, taking both physicochemical and sequence information into consideration. The feature selection results indicate that p2, p1, p1′, and p2′ are the most important positions. Two feature fusion methods are used in this paper, combination fusion and decision fusion, aiming to obtain a comprehensive feature representation and improve prediction performance. Decision fusion of the subsets obtained after feature selection achieves excellent prediction performance, which indicates that feature selection combined with decision fusion is an effective and useful method for HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance and help in designing HIV-1 protease inhibitors in the future.
|
5
|
Rögnvaldsson T, You L, Garwicz D. State of the art prediction of HIV-1 protease cleavage sites. Bioinformatics 2014; 31:1204-10. [PMID: 25504647 DOI: 10.1093/bioinformatics/btu810] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 09/30/2014] [Accepted: 12/04/2014] [Indexed: 02/01/2023]
Abstract
MOTIVATION: Understanding the substrate specificity of human immunodeficiency virus (HIV)-1 protease is important when designing effective HIV-1 protease inhibitors. Furthermore, characterizing and predicting the cleavage profile of HIV-1 protease is essential to generate and test hypotheses of how HIV-1 affects proteins of the human host. Currently available tools for predicting cleavage by HIV-1 protease can be improved. RESULTS: The linear support vector machine with orthogonal encoding is shown to be the best predictor for HIV-1 protease cleavage. It is considerably better than current publicly available predictor services. It is also found that schemes using physicochemical properties do not improve over the standard orthogonal encoding scheme. Some issues with the currently available data are discussed. AVAILABILITY AND IMPLEMENTATION: The datasets used, which are the most important part, are available at the UCI Machine Learning Repository. The tools used are all standard and easily available. CONTACT: thorsteinn.rognvaldsson@hh.se.
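The orthogonal encoding mentioned in this abstract is commonly understood as a one-hot vector per residue position. The following is a minimal Python sketch under that assumption; the function name `orthogonal_encode`, the residue ordering in `AMINO_ACIDS`, and the example octamer string are illustrative choices, not taken from the paper.

```python
# Assumed ordering: the 20 standard residues, alphabetical by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_encode(octamer):
    """One-hot encode an 8-residue peptide into a flat 8 * 20 = 160-element vector."""
    if len(octamer) != 8:
        raise ValueError("expected an octamer (8 residues)")
    vector = []
    for residue in octamer:
        one_hot = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for a non-standard residue letter.
        one_hot[AMINO_ACIDS.index(residue)] = 1
        vector.extend(one_hot)
    return vector


# Illustrative octamer only; no claim is made here about its cleavability.
encoded = orthogonal_encode("SQNYPIVQ")
print(len(encoded), sum(encoded))  # prints "160 8"
```

A vector of this shape could then be passed to any linear classifier; the choice of classifier and its settings are outside this sketch.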
Affiliation(s)
- Thorsteinn Rögnvaldsson: CAISR, School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden and Division of Clinical Chemistry and Pharmacology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Liwen You: CAISR, School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden and Division of Clinical Chemistry and Pharmacology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Daniel Garwicz: CAISR, School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden and Division of Clinical Chemistry and Pharmacology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
|
6
|
Matusali G, Tchidjou HK, Pontrelli G, Bernardi S, D'Ettorre G, Vullo V, Buonomini AR, Andreoni M, Santoni A, Cerboni C, Doria M. Soluble ligands for the NKG2D receptor are released during HIV-1 infection and impair NKG2D expression and cytotoxicity of NK cells. FASEB J 2013; 27:2440-50. [PMID: 23395909 DOI: 10.1096/fj.12-223057] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 01/28/2023]
Abstract
In humans, the interaction of the natural killer group 2 member D (NKG2D)-activating receptor on natural killer (NK) and CD8(+) T cells with its major histocompatibility complex class I-related chain (MIC) and UL16 binding protein (ULBP) ligands (NKG2DLs) promotes recognition and elimination of stressed cells, such as tumor or infected cells. Here, we investigated the capacity of HIV-1 to modulate NKG2DL expression and escape NKG2D-mediated immunosurveillance. In CD4(+) T lymphocytes, both cell surface expression and release of MICA, MICB, and ULBP2 were up-regulated >2-fold by HIV-1 infection. In HIV-infected CD4(+) T lymphocytes or Jurkat T-cell lines, increased shedding of soluble NKG2DLs (sNKG2DLs) was impaired by a matrix metalloproteinase inhibitor (MMPI). Moreover, naive HIV(+) patients displayed increased plasma sMICA and sULBP2 levels and reduced NKG2D expression on NK and CD8(+) T cells compared to patients receiving highly active antiretroviral therapy (HAART) or healthy donors. In individual patients, HAART uptake resulted in a drop in sNKG2DL levels and recovery of NKG2D expression. Finally, sNKG2DLs in patients' plasma down-regulated NKG2D on NK and CD8(+) T cells and impaired NKG2D-mediated cytotoxicity of NK cells. Thus, NKG2D detuning by sNKG2DLs may promote HIV-1 immune evasion and compromise host resistance to opportunistic infections, but HAART and MMPI have the potential to avoid such immune dysfunction.
Affiliation(s)
- Giulia Matusali
- Laboratory of Immunoinfectivology, Bambino Gesù Children's Hospital, Istituto di Ricovero e Cura a Carattere Scientifico, Rome, Italy
|
7
|
PMeS: prediction of methylation sites based on enhanced feature encoding scheme. PLoS One 2012; 7:e38772. [PMID: 22719939 PMCID: PMC3376144 DOI: 10.1371/journal.pone.0038772] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 11/23/2011] [Accepted: 05/14/2012] [Indexed: 01/16/2023]
Abstract
Protein methylation is predominantly found on lysine and arginine residues, and carries many important biological functions, including gene regulation and signal transduction. Given their important involvement in gene expression, protein methylation and its regulatory enzymes are implicated in a variety of human disease states such as cancer, coronary heart disease and neurodegenerative disorders. Thus, identification of methylation sites can be very helpful for drug design targeting various related diseases. In this study, we developed a method called PMeS to improve the prediction of protein methylation sites based on an enhanced feature encoding scheme and a support vector machine. The enhanced feature encoding scheme was composed of the sparse property coding, normalized van der Waals volume, position weight amino acid composition and accessible surface area. PMeS achieved a promising performance with a sensitivity of 92.45%, a specificity of 93.18%, an accuracy of 92.82% and a Matthews correlation coefficient of 85.69% for arginine, as well as a sensitivity of 84.38%, a specificity of 93.94%, an accuracy of 89.16% and a Matthews correlation coefficient of 78.68% for lysine, in 10-fold cross validation. Compared with other existing methods, PMeS provides better predictive performance and greater robustness. It can be anticipated that PMeS might be useful to guide future experiments needed to identify potential methylation sites in proteins of interest. The online service is available at http://bioinfo.ncu.edu.cn/inquiries_PMeS.aspx.
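The four performance measures quoted in this abstract have standard definitions in terms of a binary confusion matrix. The sketch below computes them in Python from hypothetical counts; the function name `binary_metrics` and the example counts are assumptions for illustration and do not reproduce the PMeS evaluation.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0.0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The Matthews correlation coefficient ranges from -1 to 1; the abstract reports it as a percentage, which corresponds to multiplying the value above by 100.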
|
8
|
Artificial intelligence systems based on texture descriptors for vaccine development. Amino Acids 2010; 40:443-51. [DOI: 10.1007/s00726-010-0654-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/27/2009] [Accepted: 06/03/2010] [Indexed: 10/19/2022]
|
9
|
Rögnvaldsson T, Etchells TA, You L, Garwicz D, Jarman I, Lisboa PJG. How to find simple and accurate rules for viral protease cleavage specificities. BMC Bioinformatics 2009; 10:149. [PMID: 19445713 PMCID: PMC2698905 DOI: 10.1186/1471-2105-10-149] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 02/02/2009] [Accepted: 05/16/2009] [Indexed: 01/02/2023]
Abstract
Background Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way. Results A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods. Conclusion A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. 
The approach yields rules that are easy to use and useful for interpreting experimental data.
|
10
|
Study of Inhibitors Against SARS Coronavirus by Computational Approaches. Viral Proteases and Antiviral Protease Inhibitor Therapy 2009. [PMCID: PMC7122585 DOI: 10.1007/978-90-481-2348-3_1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
|
11
|
Shen HB, Chou KC. Identification of proteases and their types. Anal Biochem 2008; 385:153-60. [PMID: 19007742 DOI: 10.1016/j.ab.2008.10.020] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 09/09/2009] [Revised: 10/13/2008] [Accepted: 10/14/2008] [Indexed: 10/21/2022]
Abstract
Called by many biology's version of Swiss army knives, proteases cut long sequences of amino acids into fragments and regulate most physiological processes. They are vitally important in the life cycle. Different types of proteases have different action mechanisms and biological processes. With the avalanche of protein sequences generated during the postgenomic age, it is highly desirable for both basic research and drug design to develop a fast and reliable method for identifying the types of proteases according to their sequences, or even just for determining whether they are proteases or not. In this article, three recently developed identification methods in this regard are discussed: (i) FunD-PseAAC, (ii) GO-PseAAC, and (iii) FunD-PsePSSM. The first two were established by hybridizing the FunD (functional domain) approach and the GO (gene ontology) approach, respectively, with the PseAAC (pseudo amino acid composition) approach. The third method was established by fusing the FunD approach with the PsePSSM (pseudo position-specific scoring matrix) approach. Of these three methods, only FunD-PsePSSM has provided a server, called ProtIdent (protease identifier), which is freely accessible to the public via the website at http://www.csbio.sjtu.edu.cn/bioinf/Protease. For the convenience of users, a step-by-step guide on how to use ProtIdent is illustrated. Meanwhile, the caveat in using ProtIdent and how to understand the success expectancy rate of a statistical predictor are discussed. Finally, the essence of why ProtIdent can yield a high success rate in identifying proteases and their types is elucidated.
Affiliation(s)
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200240, China.
|
12
|
Chou KC, Shen HB. ProtIdent: a web server for identifying proteases and their types by fusing functional domain and sequential evolution information. Biochem Biophys Res Commun 2008; 376:321-5. [PMID: 18774775 DOI: 10.1016/j.bbrc.2008.08.125] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Received: 08/20/2008] [Accepted: 08/26/2008] [Indexed: 10/21/2022]
Abstract
Proteases are vitally important to life cycles and have become a main target in drug development. According to their action mechanisms, proteases are classified into six types: (1) aspartic, (2) cysteine, (3) glutamic, (4) metallo, (5) serine, and (6) threonine. Given the sequence of an uncharacterized protein, can we identify whether it is a protease or non-protease? If it is, what type does it belong to? To address these problems, a 2-layer predictor, called "ProtIdent", is developed by fusing the functional domain and sequential evolution information: the first layer is for identifying the query protein as protease or non-protease; if it is a protease, the process will automatically go to the second layer to further identify it among the six types. The overall success rates in both cases by rigorous cross-validation tests were higher than 92%. ProtIdent is freely accessible to the public as a web server at http://www.csbio.sjtu.edu.cn/bioinf/Protease.
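The two-layer control flow described in this abstract can be sketched as a simple cascade. The Python below is only a structural illustration: `two_layer_predict` and both stub classifiers are hypothetical, the stub rules carry no biological meaning, and nothing here reflects how ProtIdent itself is implemented.

```python
# The six protease types listed in the abstract.
PROTEASE_TYPES = ("aspartic", "cysteine", "glutamic", "metallo", "serine", "threonine")


def two_layer_predict(sequence, is_protease, protease_type):
    """Layer 1 decides protease vs non-protease; layer 2 runs only for proteases."""
    if not is_protease(sequence):
        return "non-protease"
    predicted = protease_type(sequence)
    if predicted not in PROTEASE_TYPES:
        raise ValueError(f"unknown protease type: {predicted!r}")
    return predicted


# Placeholder classifiers standing in for trained models (assumptions, not real rules).
def stub_is_protease(sequence):
    return len(sequence) > 5


def stub_protease_type(sequence):
    return PROTEASE_TYPES[len(sequence) % len(PROTEASE_TYPES)]


print(two_layer_predict("MKT", stub_is_protease, stub_protease_type))
print(two_layer_predict("MKTLLVAG", stub_is_protease, stub_protease_type))
```

Passing the two classifiers in as callables keeps the cascade logic separate from whatever models are used in each layer.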
Affiliation(s)
- Kuo-Chen Chou
- Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai, 200240, China.
|
13
|
Using ensemble of classifiers for predicting HIV protease cleavage sites in proteins. Amino Acids 2008; 36:409-16. [DOI: 10.1007/s00726-008-0076-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/19/2008] [Accepted: 03/27/2008] [Indexed: 10/22/2022]
|
14
|
Nanni L, Lumini A. A genetic approach for building different alphabets for peptide and protein classification. BMC Bioinformatics 2008; 9:45. [PMID: 18218100 PMCID: PMC2246158 DOI: 10.1186/1471-2105-9-45] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 08/20/2007] [Accepted: 01/24/2008] [Indexed: 11/10/2022]
Abstract
Background: In this paper, an optimization approach is proposed for producing reduced alphabets for peptide classification, using a Genetic Algorithm. The classification task is performed by a multi-classifier system where each classifier (Linear or Radial Basis function Support Vector Machines) is trained using features extracted by different reduced alphabets. Each alphabet is constructed by a Genetic Algorithm whose objective function is the maximization of the area under the ROC curve obtained in several classification problems. Results: The new approach has been tested in three peptide classification problems: HIV-protease, recognition of T-cell epitopes and prediction of peptides that bind human leukocyte antigens. The tests demonstrate that the idea of training a pool of classifiers by reduced alphabets, created using a Genetic Algorithm, allows an improvement over other state-of-the-art feature extraction methods. Conclusion: The validity of the novel strategy for creating reduced alphabets is demonstrated by the performance improvement obtained by the proposed approach with respect to other reduced alphabet-based methods in the tested problems.
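A reduced alphabet groups the 20 amino acids into a smaller number of classes before feature extraction. The Python sketch below shows how such a grouping can be applied to a peptide; the six groups in `EXAMPLE_GROUPS` are a hand-picked illustration, not an alphabet produced by the Genetic Algorithm in the paper, and the function names are assumptions.

```python
# Hypothetical six-letter reduced alphabet covering all 20 standard residues.
EXAMPLE_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(groups):
    """Map each residue letter to the index of the group that contains it."""
    mapping = {}
    for index, group in enumerate(groups):
        for residue in group:
            mapping[residue] = index
    return mapping


def reduce_peptide(peptide, groups):
    """Rewrite a peptide as a list of group indices."""
    mapping = build_mapping(groups)
    return [mapping[residue] for residue in peptide]


def encode_reduced(peptide, groups):
    """One-hot encode each position over the reduced alphabet."""
    vector = []
    for group_index in reduce_peptide(peptide, groups):
        one_hot = [0] * len(groups)
        one_hot[group_index] = 1
        vector.extend(one_hot)
    return vector


reduced = reduce_peptide("SQNYPIVQ", EXAMPLE_GROUPS)
features = encode_reduced("SQNYPIVQ", EXAMPLE_GROUPS)
print(reduced)        # prints "[2, 2, 2, 1, 5, 0, 0, 2]"
print(len(features))  # prints "48"
```

With six groups an octamer needs 48 features instead of the 160 required by a full 20-letter one-hot encoding; a search procedure such as a Genetic Algorithm would vary the contents of the groups and score each candidate alphabet by classifier performance.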
Affiliation(s)
- Loris Nanni
- DEIS, Università di Bologna, Via Venezia 52, 47023 Cesena (FC), Italy.
|