1. Pande A, Patiyal S, Lathwal A, Arora C, Kaur D, Dhall A, Mishra G, Kaur H, Sharma N, Jain S, Usmani SS, Agrawal P, Kumar R, Kumar V, Raghava GPS. Pfeature: A Tool for Computing Wide Range of Protein Features and Building Prediction Models. J Comput Biol 2023; 30:204-222. [PMID: 36251780 DOI: 10.1089/cmb.2022.0241] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Indexed: 01/24/2023]
Abstract
In the last three decades, a wide range of protein features have been discovered to annotate a protein. Numerous attempts have been made to integrate these features in a software package/platform so that the user may compute a wide range of features from a single source. To complement the existing methods, we developed a method, Pfeature, for computing a wide range of protein features. Pfeature can compute more than 200,000 features required for predicting the overall function of a protein, residue-level annotation of a protein, and function of chemically modified peptides. It has six major modules, namely, composition, binary profiles, evolutionary information, structural features, patterns, and model building. The composition module computes most of the existing compositional features, plus novel features. The binary profile of an amino acid sequence makes it possible to compute the fraction of each type of residue as well as its position. The evolutionary information module computes evolutionary information of a protein in the form of a position-specific scoring matrix profile generated using Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), which is suitable for annotation of a protein and its residues. A structural module was developed for computing structural features/descriptors from the tertiary structure of a protein. These features are suitable for predicting the therapeutic potential of a protein containing non-natural or chemically modified residues. The model-building module implements various machine learning techniques for developing classification and regression models as well as for feature selection. Pfeature also allows the generation of overlapping patterns and features from a protein. Pfeature is available as a user-friendly web server, Python library, and stand-alone package.
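As an illustration of the two simplest feature families named in this abstract, the following is a minimal sketch of amino acid composition and a binary (one-hot) profile. It is not the Pfeature API: the function names `amino_acid_composition` and `binary_profile`, the residue ordering, and the choice to report composition as percentages are all assumptions made for this example.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not taken from the Pfeature package.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by one-letter code


def amino_acid_composition(seq):
    """Return the percentage of each standard residue in seq."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:  # non-standard symbols are ignored
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def binary_profile(seq):
    """Return one 20-element 0/1 vector per position of seq.

    Each row has a single 1 in the column of the residue found at that
    position, so the profile keeps both residue type and position.
    Non-standard symbols produce an all-zero row.
    """
    return [[1 if residue == aa else 0 for aa in AMINO_ACIDS]
            for residue in seq.upper()]
```

For a toy peptide such as `"AAGC"`, the composition assigns 50% to alanine and 25% each to glycine and cysteine, while the binary profile has four rows of length 20.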
Affiliation(s)
- Akshara Pande: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Lathwal: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Dilraj Kaur: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Dhall: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gaurav Mishra: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Department of Electrical Engineering, Shiv Nadar University, Greater Noida, India
- Harpreet Kaur: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Neelam Sharma: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Shipra Jain: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Salman Sadullah Usmani: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Piyush Agrawal: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Rajesh Kumar: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Vinod Kumar: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Gajendra P S Raghava: Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
2. Oldfield CJ, Chen K, Kurgan L. Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences. Methods Mol Biol 2019; 1958:73-100. [PMID: 30945214 DOI: 10.1007/978-1-4939-9161-7_4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
Many new methods for the sequence-based prediction of the secondary and supersecondary structures have been developed over the last several years. These and older sequence-based predictors are widely applied for the characterization and prediction of protein structure and function. These efforts have produced countless accurate predictors, many of which rely on state-of-the-art machine learning models and evolutionary information generated from multiple sequence alignments. We describe and motivate both types of predictions. We introduce concepts related to the annotation and computational prediction of the three-state and eight-state secondary structure as well as several types of supersecondary structures, such as β hairpins, coiled coils, and α-turn-α motifs. We review 34 predictors focusing on recent tools and provide detailed information for a selected set of 14 secondary structure and 3 supersecondary structure predictors. We conclude with several practical notes for the end users of these predictive methods.
Affiliation(s)
- Christopher J Oldfield: Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Ke Chen: School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin, People's Republic of China
- Lukasz Kurgan: Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA, USA
3. MacCarthy E, Perry D, Kc DB. Advances in Protein Super-Secondary Structure Prediction and Application to Protein Structure Prediction. Methods Mol Biol 2019; 1958:15-45. [PMID: 30945212 DOI: 10.1007/978-1-4939-9161-7_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/09/2023]
Abstract
Due to the advancement in various sequencing technologies, the gap between the number of protein sequences and the number of experimental protein structures is ever increasing. Community-wide initiatives like CASP have resulted in considerable efforts in the development of computational methods to accurately model protein structures from sequences. Sequence-based prediction of super-secondary structure has direct application in protein structure prediction, and there have been significant efforts in the prediction of super-secondary structure in the last decade. In this chapter, we first introduce the protein structure prediction problem and highlight some of the important progress in the field of protein structure prediction. Next, we discuss recent methods for the prediction of super-secondary structures. We then discuss applications of super-secondary structure prediction in structure prediction/analysis of proteins. We also discuss prediction of protein structures that are composed of simple super-secondary structure repeats and protein structures that are composed of complex super-secondary structure repeats. Finally, we discuss the recent trends in the field.
Affiliation(s)
- Elijah MacCarthy: Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Derrick Perry: Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Dukka B Kc: Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
4. Design and structural characterisation of monomeric water-soluble α-helix and β-hairpin peptides: State-of-the-art. Arch Biochem Biophys 2019; 661:149-167. [DOI: 10.1016/j.abb.2018.11.014] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/24/2018] [Revised: 11/06/2018] [Accepted: 11/14/2018] [Indexed: 02/06/2023]
5. Agrawal P, Patiyal S, Kumar R, Kumar V, Singh H, Raghav PK, Raghava GPS. ccPDB 2.0: an updated version of datasets created and compiled from Protein Data Bank. Database: The Journal of Biological Databases and Curation 2019; 2019:5298333. [PMID: 30689843 PMCID: PMC6343045 DOI: 10.1093/database/bay142] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/31/2018] [Accepted: 12/09/2018] [Indexed: 12/20/2022]
Abstract
ccPDB 2.0 (http://webs.iiitd.edu.in/raghava/ccpdb) is an updated version of the manually curated database ccPDB that maintains datasets required for developing methods to predict the structure and function of proteins. The number of datasets compiled from literature increased from 45 to 141 in ccPDB 2.0. Similarly, the number of protein structures used for creating datasets increased from ~74 000 to ~137 000 (PDB March 2018 release). ccPDB 2.0 provides the same web services and flexible tools that were present in the previous version of the database. In the updated version, links to a number of methods developed in the past few years have also been incorporated. This updated resource is built on responsive templates that are compatible with smartphones and tablets (mobile, iPhone, iPad, etc.) as well as large-screen devices. In summary, ccPDB 2.0 is a user-friendly web-based platform that provides comprehensive and updated information about datasets.
Affiliation(s)
- Piyush Agrawal: Bioinformatics Center, CSIR-Institute of Microbial Technology, India; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Sumeet Patiyal: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Rajesh Kumar: Bioinformatics Center, CSIR-Institute of Microbial Technology, India; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Vinod Kumar: Bioinformatics Center, CSIR-Institute of Microbial Technology, India; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Harinder Singh: J. Craig Venter Institute, 9605 Medical Center Drive, Suite 150, Rockville, MD, USA
- Pawan Kumar Raghav: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Gajendra P S Raghava: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
6. Li D, Hu X, Liu X, Feng Z, Ding C. Using feature optimization-based support vector machine method to recognize the β-hairpin motifs in enzymes. Saudi J Biol Sci 2016; 24:1361-1369. [PMID: 28855832 PMCID: PMC5562482 DOI: 10.1016/j.sjbs.2016.11.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/05/2016] [Revised: 11/16/2016] [Accepted: 11/17/2016] [Indexed: 11/28/2022]
Abstract
β-Hairpins in enzymes, a special kind of protein with catalytic functions, contain many binding sites that are essential for enzyme function. With the increasing number of observed enzyme protein sequences, it is especially important to use bioinformatics techniques to quickly and accurately identify β-hairpins in enzyme proteins for further annotation of enzyme structure and function. In this work, the proposed method was trained and tested on a non-redundant enzyme β-hairpin database containing 2818 β-hairpins and 1098 non-β-hairpins. With 5-fold cross-validation on the training dataset, an overall accuracy of 90.08% and a Matthews correlation coefficient (MCC) of 0.74 were obtained, while on the independent test dataset, an overall accuracy of 88.93% and an MCC of 0.76 were achieved. Furthermore, the method was validated on 845 β-hairpins with ligand binding sites. With 5-fold cross-validation on the training dataset and an independent test on the test dataset, the overall accuracies were 85.82% (MCC of 0.71) and 84.78% (MCC of 0.70), respectively. By integrating mRMR feature selection with the SVM algorithm, a reasonably high accuracy was achieved, indicating that the method is an effective tool for further studies of β-hairpins in enzyme structures. Additionally, as a novelty for function prediction of enzymes, β-hairpins with ligand binding sites were predicted. Based on this work, a web server was constructed to predict β-hairpin motifs in enzymes (http://202.207.29.251:8080/).
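Several abstracts in this list report overall accuracy together with the Matthews correlation coefficient. As a reminder of how the two are related to a binary confusion matrix, here is a minimal sketch using the standard definitions. The function names are chosen for this example, and the counts in the usage note are hypothetical, not taken from any of the cited papers.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

With hypothetical counts `tp=90, tn=80, fp=20, fn=10`, `accuracy` gives 0.85. Unlike accuracy, the MCC stays near zero for a classifier that only predicts the majority class, which is one reason it is often reported alongside accuracy on unbalanced datasets such as the one described above.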
Affiliation(s)
- Dongmei Li: College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Xiuzhen Hu: College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Xingxing Liu: College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Zhenxing Feng: College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Changjiang Ding: College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
7. Sun L, Hu X, Li S, Jiang Z, Li K. Prediction of complex super-secondary structure βαβ motifs based on combined features. Saudi J Biol Sci 2015; 23:66-71. [PMID: 26858540 PMCID: PMC4705255 DOI: 10.1016/j.sjbs.2015.10.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/11/2015] [Revised: 10/08/2015] [Accepted: 10/12/2015] [Indexed: 11/17/2022]
Abstract
Prediction of complex super-secondary structures is a key step in the study of the tertiary structures of proteins. The strand-loop-helix-loop-strand (βαβ) motif is an important complex super-secondary structure in proteins. Many functional sites and active sites occur in polypeptides of βαβ motifs. Therefore, the accurate prediction of βαβ motifs is very important for recognizing protein tertiary structure and for studying protein function. In this study, the βαβ motif dataset was first constructed using the DSSP package. A statistical analysis was then performed on βαβ motifs and non-βαβ motifs. The target motif was selected, with the length of the loop-α-loop varying from 10 to 26 amino acids. The ideal fixed-length pattern comprised 32 amino acids. A Support Vector Machine algorithm was developed for predicting βαβ motifs, using sequence information together with predicted structure and function information to express the sequence features. The overall predictive accuracies of 5-fold cross-validation and the independent test were 81.7% and 76.7%, respectively. The Matthews correlation coefficients of the 5-fold cross-validation and the independent test were 0.63 and 0.53, respectively. The results demonstrate that the proposed method is an effective approach for predicting βαβ motifs and can be used for structure and function studies of proteins.
Affiliation(s)
- Xiuzhen Hu: Corresponding author. Tel.: +86 471 6576281; fax: +86 471 6575863.
8. YongE F, GaoShan K. Identify Beta-Hairpin Motifs with Quadratic Discriminant Algorithm Based on the Chemical Shifts. PLoS One 2015; 10:e0139280. [PMID: 26422468 PMCID: PMC4589334 DOI: 10.1371/journal.pone.0139280] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/12/2015] [Accepted: 09/09/2015] [Indexed: 01/13/2023]
Abstract
Successful prediction of the beta-hairpin motif will be helpful for understanding fold recognition. Some algorithms have been proposed for the prediction of beta-hairpin motifs. However, the parameters used by these methods were primarily based on amino acid sequences. Here, we propose a novel model for predicting beta-hairpin structure based on the chemical shift. First, we analyzed the statistical distribution of the chemical shifts of six nuclei in non-beta-hairpin and beta-hairpin motifs. Second, we used these chemical shifts as features combined with three algorithms to predict beta-hairpin structure. Finally, we achieved the best prediction, namely a sensitivity of 92% and a specificity of 94% with a Matthews correlation coefficient of 0.85, using the quadratic discriminant analysis algorithm, which is clearly superior to the same method for the prediction of beta-hairpin structure from 20 amino acid compositions in three-fold cross-validation. Our findings show that the chemical shift is an effective parameter for beta-hairpin prediction and suggest that quadratic discriminant analysis is a powerful algorithm for the prediction of beta-hairpins.
Affiliation(s)
- Feng YongE (corresponding author): College of Science, Inner Mongolia Agriculture University, Hohhot, PR China
- Kou GaoShan: College of Science, Inner Mongolia Agriculture University, Hohhot, PR China
9. Kou G, Feng Y. Identify five kinds of simple super-secondary structures with quadratic discriminant algorithm based on the chemical shifts. J Theor Biol 2015; 380:392-8. [DOI: 10.1016/j.jtbi.2015.06.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/09/2015] [Revised: 06/02/2015] [Accepted: 06/04/2015] [Indexed: 10/23/2022]
10. Prediction of four kinds of simple supersecondary structures in protein by using chemical shifts. ScientificWorldJournal 2014; 2014:978503. [PMID: 25050407 PMCID: PMC4090465 DOI: 10.1155/2014/978503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/07/2014] [Revised: 06/03/2014] [Accepted: 06/04/2014] [Indexed: 12/23/2022]
Abstract
Knowledge of supersecondary structures can provide important information about the spatial structure of a protein. Some approaches have been developed for the prediction of protein supersecondary structure. However, the features used by these approaches are primarily based on amino acid sequences. In this study, a novel model is presented to predict protein supersecondary structure using chemical shift (CS) information derived from nuclear magnetic resonance (NMR) spectroscopy. Using these CSs as inputs to quadratic discriminant analysis (QD), we achieve an overall prediction accuracy of 77.3%, which is competitive with the same method for predicting supersecondary structures from amino acid compositions in threefold cross-validation. Moreover, our findings suggest that the combined use of different chemical shifts influences the accuracy of prediction.
11.
Abstract
Since the first report in 1993 (JACS 115, 5887-5888) of a peptide able to form a monomeric β-hairpin structure in aqueous solution, the design of peptides forming either β-hairpins (two-stranded antiparallel β-sheets) or three-stranded antiparallel β-sheets has become a field of growing interest and activity. These studies have yielded great insights into the principles governing the stability and folding of β-hairpins and antiparallel β-sheets. This chapter provides an overview of the reported β-hairpin/β-sheet peptides focussed on the applied design criteria, briefly reviews the factors contributing to β-hairpin/β-sheet stability, and describes a protocol for the de novo design of β-sheet-forming peptides based on them. Guidelines to select appropriate turn and strand residues and to avoid self-association are provided. The methods employed to check the success of newly designed peptides are also summarized. Since NMR is the best technique to that end, NOEs and chemical shifts characteristic of β-hairpins and three-stranded antiparallel β-sheets are given.
Affiliation(s)
- M Angeles Jiménez: Consejo Superior de Investigaciones Científicas (CSIC), Instituto de Química Física Rocasolano (IQFR), Serrano 119, 28006, Madrid, Spain
12. Pybus LP, James DC, Dean G, Slidel T, Hardman C, Smith A, Daramola O, Field R. Predicting the expression of recombinant monoclonal antibodies in Chinese hamster ovary cells based on sequence features of the CDR3 domain. Biotechnol Prog 2013; 30:188-97. [DOI: 10.1002/btpr.1839] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/09/2013] [Revised: 10/27/2013] [Indexed: 11/11/2022]
Affiliation(s)
- Leon P. Pybus: ChELSI Institute; Dept. of Chemical and Biological Engineering; University of Sheffield; Mappin Street, Sheffield S1 3JD, UK
- David C. James: ChELSI Institute; Dept. of Chemical and Biological Engineering; University of Sheffield; Mappin Street, Sheffield S1 3JD, UK
- Greg Dean: MedImmune Ltd.; Granta Park, Cambridge CB21 6GH, UK
- Tim Slidel: MedImmune Ltd.; Granta Park, Cambridge CB21 6GH, UK
- Andrew Smith: MedImmune Ltd.; Granta Park, Cambridge CB21 6GH, UK
- Ray Field: MedImmune Ltd.; Granta Park, Cambridge CB21 6GH, UK
13. Guilloux A, Caudron B, Jestin JL. A method to predict edge strands in beta-sheets from protein sequences. Comput Struct Biotechnol J 2013; 7:e201305001. [PMID: 24688737 PMCID: PMC3962219 DOI: 10.5936/csbj.201305001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/22/2013] [Revised: 05/27/2013] [Accepted: 05/30/2013] [Indexed: 12/15/2022]
Abstract
There is a need for rules allowing three-dimensional structure information to be derived from protein sequences. In this work, consideration of an elementary protein folding step allows protein sub-sequences which optimize folding to be derived for any given protein sequence. Classical mechanics applied to this system and the energy conservation law during the elementary folding step yields an equation whose solutions are taken over the field of rational numbers. This formalism is applied to beta-sheets containing two edge strands and at least two central strands. The number of protein sub-sequences optimized for folding per amino acid in beta-strands is shown in particular to predict edge strands from protein sequences. Topological information on beta-strands and loops connecting them is derived for protein sequences with a prediction accuracy of 75%. The statistical significance of the finding is given. Applications in protein structure prediction are envisioned such as for the quality assessment of protein structure models.
Affiliation(s)
- Antonin Guilloux: Analyse algébrique, Institut de Mathématiques de Jussieu, Université Pierre et Marie Curie, Paris VI, France
- Bernard Caudron: Centre d'Informatique pour la Biologie, Institut Pasteur, Paris, France
14. Ho HK, Zhang L, Ramamohanarao K, Martin S. A survey of machine learning methods for secondary and supersecondary protein structure prediction. Methods Mol Biol 2013; 932:87-106. [PMID: 22987348 DOI: 10.1007/978-1-62703-065-6_6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/01/2023]
Abstract
In this chapter we provide a survey of protein secondary and supersecondary structure prediction using methods from machine learning. Our focus is on machine learning methods applicable to β-hairpin and β-sheet prediction, but we also discuss methods for more general supersecondary structure prediction. We provide background on the secondary and supersecondary structures that we discuss, the features used to describe them, and the basic theory behind the machine learning methods used. We survey the machine learning methods available for secondary and supersecondary structure prediction and compare them where possible.
Affiliation(s)
- Hui Kian Ho: Department of Computer Science and Software Engineering, University of Melbourne, National ICT Australia, Parkville, VIC, Australia
15. Using Homology Information From PDB to Improve The Accuracy of Protein β-turn Prediction by NetTurnP. Prog Biochem Biophys 2012. [DOI: 10.3724/sp.j.1206.2011.00370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
16. Chen K, Kurgan L. Computational prediction of secondary and supersecondary structures. Methods Mol Biol 2012; 932:63-86. [PMID: 22987347 DOI: 10.1007/978-1-62703-065-6_5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 12/21/2022]
Abstract
The sequence-based prediction of the secondary and supersecondary structures enjoys strong interest and finds applications in numerous areas related to the characterization and prediction of protein structure and function. Substantial efforts in these areas over the last three decades resulted in the development of accurate predictors, which take advantage of modern machine learning models and availability of evolutionary information extracted from multiple sequence alignment. In this chapter, we first introduce and motivate both prediction areas and introduce basic concepts related to the annotation and prediction of the secondary and supersecondary structures, focusing on the β hairpin, coiled coil, and α-turn-α motifs. Next, we overview state-of-the-art prediction methods, and we provide details for 12 modern secondary structure predictors and 4 representative supersecondary structure predictors. Finally, we provide several practical notes for the users of these prediction tools.
Affiliation(s)
- Ke Chen: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada
17. Zou D, He Z, He J, Xia Y. Supersecondary structure prediction using Chou's pseudo amino acid composition. J Comput Chem 2010; 32:271-8. [DOI: 10.1002/jcc.21616] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.7] [Indexed: 11/10/2022]
18. Kountouris P, Hirst JD. Predicting beta-turns and their types using predicted backbone dihedral angles and secondary structures. BMC Bioinformatics 2010; 11:407. [PMID: 20673368 PMCID: PMC2920885 DOI: 10.1186/1471-2105-11-407] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 05/12/2010] [Accepted: 07/31/2010] [Indexed: 11/29/2022]
Abstract
Background: β-turns are secondary structure elements usually classified as coil. Their prediction is important, because of their role in protein folding and their frequent occurrence in protein chains. Results: We have developed a novel method that predicts β-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49, when predicting the location of β-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of β-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our results are more accurate than other methods. Conclusions: We have created an accurate predictor of β-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/.
Affiliation(s)
- Petros Kountouris: School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK
19. Zou D, He Z, He J. Beta-hairpin prediction with quadratic discriminant analysis using diversity measure. J Comput Chem 2009; 30:2277-84. [PMID: 19263434 DOI: 10.1002/jcc.21229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
On the basis of the features of protein sequential patterns, we used the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to predict beta-hairpin motifs in protein sequences. Three rules are used to extract raw beta-beta motif sequential patterns of fixed length. Amino acid basic compositions, dipeptide components, and amino acid composition distribution are combined to represent the compositional features. Eighteen feature variables on a sequential pattern to be predicted are defined in terms of ID. They are integrated in a single formal framework given by IDQD. The method is trained and tested on the ArchDB40 dataset containing 3088 proteins. The overall accuracy of prediction and the Matthews correlation coefficient for the independent testing dataset are 81.7% and 0.60, respectively. In addition, a higher accuracy of 84.5% and a Matthews correlation coefficient of 0.68 for the independent testing dataset are obtained on a dataset previously used by Kumar et al. (Nucleic Acids Res 2005, 33, 154), which contains 2088 proteins. For a fair assessment of our method, the performance is also evaluated on all 63 proteins used in CASP6. The overall accuracy of prediction is 74.2% for the independent testing dataset.
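The "dipeptide components" mentioned in this abstract are commonly computed as the relative frequency of each of the 400 ordered residue pairs. The sketch below shows that generic calculation only; it is not the authors' IDQD feature definition, and the function name `dipeptide_composition` and the handling of non-standard residues are assumptions made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dipeptide_composition(seq):
    """Return the relative frequency of each of the 400 ordered residue pairs.

    Overlapping adjacent pairs are counted; pairs that contain a
    non-standard symbol are skipped (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return {pair: n / total for pair, n in counts.items()}
```

For the toy sequence `"ACAC"` the overlapping pairs are AC, CA and AC, so the feature vector has AC at 2/3, CA at 1/3 and the remaining 398 entries at zero.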
Affiliation(s)
- Dongsheng Zou: College of Computer Science, Chongqing University, Chongqing 400044, China
20. Zhang N, Ruan J, Duan G, Gao S, Zhang T. The interstrand amino acid pairs play a significant role in determining the parallel or antiparallel orientation of beta-strands. Biochem Biophys Res Commun 2009; 386:537-43. [PMID: 19540200 DOI: 10.1016/j.bbrc.2009.06.072] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 06/10/2009] [Accepted: 06/16/2009] [Indexed: 12/12/2022]
Abstract
It is widely considered inappropriate to treat beta-pairs in isolation, since other secondary structural elements (such as helices and coils), protein topology, and protein tertiary structure would limit beta-strand pairing. However, to understand the underlying mechanisms of beta-sheet formation, studies ought to be performed separately on more concrete aspects. In this study, we focus on the parallel or antiparallel orientation of beta-strands. First, statistical analysis was performed on the relative frequencies of the interstrand amino acid pairs within parallel and antiparallel beta-strands. Features were then extracted by singular value decomposition from the statistical results. By using the support vector machine to distinguish the features extracted from the two types of beta-strands, high accuracy was achieved (up to 99.4%). This suggests that the interstrand amino acid pairs play a significant role in determining the parallel or antiparallel orientation of beta-strands. These results may provide useful information for developing other algorithms to examine beta-strand folding pathways, and could eventually lead to protein structure predictions.
Affiliation(s)
- Ning Zhang
- Key Laboratory of Bioactive Materials, Ministry of Education and College of Life Science, Nankai University, Tianjin 300071, PR China
|
21
|
Recognition of β-hairpin motifs in proteins by using the composite vector. Amino Acids 2009; 38:915-21. [DOI: 10.1007/s00726-009-0299-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/10/2008] [Accepted: 04/20/2009] [Indexed: 10/20/2022]
|
22
|
Emekli U, Gunasekaran K, Nussinov R, Haliloglu T. What can we learn from highly connected beta-rich structures for structural interface design? Methods in Molecular Biology (Clifton, N.J.) 2008; 474:235-53. [PMID: 19031068 DOI: 10.1007/978-1-59745-480-3_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
Most hubs' binding sites are able to transiently interact with numerous proteins. We focus on beta-rich hubs with the goal of inferring features toward design. Since they are able to interact with many partners and association of beta-conformations may lead to amyloid fibrils, we ask whether there is some property that distinguishes them from low-connectivity beta-rich proteins, which may be more interaction specific. Identification of such features should be useful as they can be incorporated in interface design while avoiding polymerization into fibrils. We classify the proteins in the yeast interaction map according to the types of their secondary structures. The small number of the obtained beta-rich protein structures in the Protein Data Bank likely reflects their low occurrence in the proteome. Analysis of the obtained structures indicates that highly connected beta-rich proteins tend to have clusters of conserved residues in their cores, unlike beta-rich structures with low connectivity, suggesting that the highly packed conserved cores are important to the stability of proteins, which have residue composition and sequence prone to beta-structure and amyloid formation. The enhanced stability may hinder partial unfolding, which, depending on the conditions, is more likely to lead to polymerization of these sequences.
Affiliation(s)
- Ugur Emekli
- Polymer Research Center and Chemical Engineering Department, Bogaziçi University, Istanbul, Turkey
|
23
|
Marcelino AMC, Gierasch LM. Roles of beta-turns in protein folding: from peptide models to protein engineering. Biopolymers 2008; 89:380-91. [PMID: 18275088 PMCID: PMC2904567 DOI: 10.1002/bip.20960] [Citation(s) in RCA: 172] [Impact Index Per Article: 10.8] [Indexed: 11/07/2022]
Abstract
Reverse turns are a major class of protein secondary structure; they represent sites of chain reversal and thus sites where the globular character of a protein is created. It has been speculated for many years that turns may nucleate the formation of structure in protein folding, as their propensity to occur will favor the approximation of their flanking regions and their general tendency to be hydrophilic will favor their disposition at the solvent-accessible surface. Reverse turns are local features, and it is therefore not surprising that their structural properties have been extensively studied using peptide models. In this article, we review research on peptide models of turns to test the hypothesis that the propensities of turns to form in short peptides will relate to the roles of corresponding sequences in protein folding. Turns with significant stability as isolated entities should actively promote the folding of a protein, and by contrast, turn sequences that merely allow the chain to adopt conformations required for chain reversal are predicted to be passive in the folding mechanism. We discuss results of protein engineering studies of the roles of turn residues in folding mechanisms. Factors that correlate with the importance of turns in folding indeed include their intrinsic stability, as well as their topological context and their participation in hydrophobic networks within the protein's structure.
|
24
|
|
25
|
Tancredi T, Guerrini R, Marzola E, Trapella C, Calo G, Regoli D, Reinscheid RK, Camarda V, Salvadori S, Temussi PA. Conformation-activity relationship of neuropeptide S and some structural mutants: helicity affects their interaction with the receptor. J Med Chem 2007; 50:4501-8. [PMID: 17696420 DOI: 10.1021/jm0706822] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
Abstract
Neuropeptide S (NPS) is the endogenous ligand of the previously orphan G-protein coupled receptor now named NPSR. The NPS-NPSR receptor system regulates important biological functions such as sleep/waking, locomotion, anxiety and food intake. Recently, exhaustive Ala scan and d-amino acid scan studies, together with systematic N- and C-terminal truncation, led to the identification of key residues for biological activity. Because conformational preferences might also play an important role, we undertook a detailed conformational analysis of NPS and several analogues in solution. We show that helicity induced by substitution of three flexible residues in the 5-13 regulatory region abolishes biological activity. A parallel pharmacological and conformational study of single and multiple substitutions of glycines 5, 7, and 9 showed that helicity can be tolerated in the C-terminal part of the peptide but not around Gly7. The identification of hNPSR partial agonists heralds the possibility of designing pure NPS receptor antagonists.
Affiliation(s)
- Teodorico Tancredi
- Istituto di Chimica Biomolecolare, CNR, Via Campi Flegrei 34, I-80078 Pozzuoli, Italy
|
26
|
Zong C, Papoian GA, Ulander J, Wolynes PG. Role of Topology, Nonadditivity, and Water-Mediated Interactions in Predicting the Structures of α/β Proteins. J Am Chem Soc 2006; 128:5168-76. [PMID: 16608353 DOI: 10.1021/ja058589v] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
The folding of alpha/beta proteins involves most of the commonly known structural and dynamic complexities of the protein energy landscapes. Thus, the interplay among different structural components, taking into account the cooperative interactions, is important in determining the success of protein structure prediction. In this work we present further developments of our knowledge-based force field for alpha/beta proteins, introducing more realistic modeling of many-body interactions governing the folding of beta-sheets. The model's innovations highlight both specific topological characteristics of secondary structures and the generic nonadditive interactions that are mediated by water. We also investigate how a coarse biasing of the protein morphology can be used to understand the role of heterogeneity in protein collapse. Analysis of the simulation results for three test alpha/beta proteins indicates that the addition of the topological and many-body ingredients to the model helps to greatly reduce the roughness in the energy landscape. Consequently, high quality candidate structures for alpha/beta proteins can be generated from simulated annealing runs, using very modest amounts of computer time.
Affiliation(s)
- Chenghang Zong
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0371, USA.
|
27
|
Jeong J, Berman P, Przytycka T. Fold classification based on secondary structure--how much is gained by including loop topology? BMC Structural Biology 2006; 6:3. [PMID: 16524467 PMCID: PMC1434743 DOI: 10.1186/1472-6807-6-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 09/02/2005] [Accepted: 03/08/2006] [Indexed: 11/18/2022]
Abstract
Background: It has been proposed that secondary structure information can be used to classify (to some extent) protein folds. Since this method utilizes very limited information about the protein structure, it is not surprising that it has a higher error rate than the approaches that use a full 3D fold description. On the other hand, comparing 3D protein structures is computationally intensive. This raises the question of to what extent the error rate can be decreased with each new source of information, especially if the new information can still be used with simple alignment algorithms. We consider the question of whether information about closed loops can improve the accuracy of this approach. While the answer appears to be obvious, we had to overcome two challenges. First, how to code and compare topological information in such a way that local alignment of strings will properly identify similar structures. Second, how to properly measure the effect of new information in a large data sample. We investigate alternative ways of computing and presenting this information.
Results: We used the set of beta proteins with at most 30% pairwise identity to test the approach; local alignment scores were used to build a tree of clusters, which was evaluated using a new log-odds cluster scoring function. In particular, we derive a closed formula for the probability of obtaining a given score by chance. Parameters of the local alignment function were optimized using a genetic algorithm. Of 81 folds that had more than one representative in our data set, log-odds scores registered significantly better clustering in 27 cases, significantly worse clustering in 6 cases, and small differences in the remaining cases. Various notions of significant change or average change were considered and tried, and the results all pointed in the same direction.
Conclusion: We found that, on average, properly presented information about the loop topology noticeably improves the accuracy of the method, but the benefits vary between fold families as measured by the log-odds cluster score.
Affiliation(s)
- Jieun Jeong
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, USA
- Piotr Berman
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, USA
- Teresa Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
|
28
|
Kumar M, Bhasin M, Natt NK, Raghava GPS. BhairPred: prediction of beta-hairpins in a protein from multiple alignment information using ANN and SVM techniques. Nucleic Acids Res 2005; 33:W154-9. [PMID: 15988830 PMCID: PMC1160264 DOI: 10.1093/nar/gki588] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Indexed: 11/13/2022] Open
Abstract
This paper describes a method for predicting a supersecondary structural motif, β-hairpins, in a protein sequence. The method was trained and tested on a set of 5102 hairpins and 5131 non-hairpins, obtained from a non-redundant dataset of 2880 proteins using the DSSP and PROMOTIF programs. Two machine-learning techniques, an artificial neural network (ANN) and a support vector machine (SVM), were used to predict β-hairpins. An accuracy of 65.5% was achieved using ANN when an amino acid sequence was used as the input. The accuracy improved from 65.5 to 69.1% when evolutionary information (PSI-BLAST profile), observed secondary structure and surface accessibility were used as the inputs. The accuracy of the method further improved from 69.1 to 79.2% when the SVM was used for classification instead of the ANN. The performances of the methods developed were assessed in a test case, where predicted secondary structure and surface accessibility were used instead of the observed structure. The highest accuracy achieved by the SVM-based method in the test case was 77.9%. A maximum accuracy of 71.1% with a Matthews correlation coefficient of 0.41 in the test case was obtained on a dataset previously used by X. Cruz, E. G. Hutchinson, A. Shephard and J. M. Thornton (2002) Proc. Natl Acad. Sci. USA, 99, 11157–11162. The performance of the method was also evaluated on proteins used in the ‘6th community-wide experiment on the critical assessment of techniques for protein structure prediction (CASP6)’. Based on the algorithm described, a web server, BhairPred, has been developed, which can be used to predict β-hairpins in a protein using the SVM approach.
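The two performance measures quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: hairpins are the positive class.
acc = accuracy(tp=80, tn=70, fp=30, fn=20)
mcc = matthews_corrcoef(tp=80, tn=70, fp=30, fn=20)
print(f"accuracy = {acc:.3f}, MCC = {mcc:.3f}")
```

Unlike accuracy, the MCC stays near zero for a classifier that predicts only the majority class, which is one reason both numbers are usually reported together for datasets such as the hairpin and non-hairpin sets described here.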
Affiliation(s)
- G. P. S. Raghava
- To whom correspondence should be addressed. Tel: +91 172 2690557/2690225; Fax: +91 172 2690632/2690585;
|
29
|
Kuhn M, Meiler J, Baker D. Strand-loop-strand motifs: Prediction of hairpins and diverging turns in proteins. Proteins 2003; 54:282-8. [PMID: 14696190 DOI: 10.1002/prot.10589] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.9] [Indexed: 11/05/2022]
Abstract
Beta-sheet proteins have been particularly challenging for de novo structure prediction methods, which tend to pair adjacent beta-strands into beta-hairpins and produce overly local topologies. To remedy this problem and facilitate de novo prediction of beta-sheet protein structures, we have developed a neural network that classifies strand-loop-strand motifs into local hairpins and nonlocal diverging turns by using the amino acid sequence as input. The neural network is trained with a representative subset of the Protein Data Bank and achieves a prediction accuracy of 75.9 ± 4.4% compared to a baseline prediction rate of 59.1%. Hairpins are predicted with an accuracy of 77.3 ± 6.1%, diverging turns with an accuracy of 73.9 ± 6.0%. Incorporation of the beta-hairpin/diverging turn classification into the ROSETTA de novo structure prediction method led to higher contact order models and somewhat improved tertiary structure predictions for a test set of 11 all-beta proteins and 3 alpha/beta proteins. The beta-hairpin/diverging turn classification from amino acid sequences is available online for academic use (Meiler and Kuhn, 2003; www.jens-meiler.de/turnpred.html).
Affiliation(s)
- Michael Kuhn
- California Institute of Technology, Pasadena, USA
|
30
|
Meiler J, Baker D. Coupled prediction of protein secondary and tertiary structure. Proc Natl Acad Sci U S A 2003; 100:12105-10. [PMID: 14528006 PMCID: PMC218720 DOI: 10.1073/pnas.1831973100] [Citation(s) in RCA: 147] [Impact Index Per Article: 7.0] [Received: 04/04/2003] [Indexed: 11/18/2022] Open
Abstract
The strong coupling between secondary and tertiary structure formation in protein folding is neglected in most structure prediction methods. In this work we investigate the extent to which nonlocal interactions in predicted tertiary structures can be used to improve secondary structure prediction. The architecture of a neural network for secondary structure prediction that utilizes multiple sequence alignments was extended to accept low-resolution nonlocal tertiary structure information as an additional input. By using this modified network, together with tertiary structure information from native structures, the Q3-prediction accuracy is increased by 7-10% on average and by up to 35% in individual cases for independent test data. By using tertiary structure information from models generated with the ROSETTA de novo tertiary structure prediction method, the Q3-prediction accuracy is improved by 4-5% on average for small and medium-sized single-domain proteins. Analysis of proteins with particularly large improvements in secondary structure prediction using tertiary structure information provides insight into the feedback from tertiary to secondary structure.
Affiliation(s)
- Jens Meiler
- Department of Biochemistry, University of Washington, Box 357350, Seattle, WA 98195-7350, USA
|
31
|
Mollah AKMM, Stennis RL, Mossing MC. Stability of monomeric Cro variants: Isoenergetic transformation of a type I' to a type II' beta-hairpin by single amino acid replacements. Protein Sci 2003; 12:1126-30. [PMID: 12717034 PMCID: PMC2323882 DOI: 10.1110/ps.0239003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
Abstract
The thermodynamic stabilities of three monomeric variants of the bacteriophage lambda Cro repressor that differ only in the sequence of two amino acids at the apex of an engineered beta-hairpin have been determined. The sequences of the turns are EVK-XX-EVK, where the two central residues are DG, GG, and GT, respectively. Standard-state unfolding free energies, determined from circular dichroism measurements as a function of urea concentration, range from 2.4 to 2.7 kcal/mole, while those determined from guanidine hydrochloride range from 2.8 to 3.3 kcal/mole for the three proteins. Thermal denaturation yields van't Hoff unfolding enthalpies of 36 to 40 kcal/mole at midpoint temperatures in the range of 53 to 58 degrees C. Extrapolation of the thermal denaturation free energies with heat capacities of 400 to 600 cal/(mole·deg) gives good agreement with the parameters determined in denaturant titrations. As predicted from statistical surveys of amino acid replacements in beta-hairpins, energetic barriers to transformation from a type I' turn (DG) to a type II' turn (GT) can be quite small.
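The extrapolation of thermal denaturation data mentioned in this abstract is conventionally done with the Gibbs-Helmholtz relation, ΔG(T) = ΔH_m(1 − T/T_m) + ΔC_p[(T − T_m) − T ln(T/T_m)]. The sketch below evaluates it for mid-range values taken from the abstract; the function name, the choice of 25 degrees C as the reference temperature, and the assumption of a temperature-independent heat capacity are illustrative choices, not details from the paper.

```python
import math


def unfolding_free_energy(temp_k, tm_k, dh_m, dcp):
    """Gibbs-Helmholtz unfolding free energy at temp_k (kcal/mole).

    tm_k : midpoint temperature of thermal unfolding (K)
    dh_m : van't Hoff unfolding enthalpy at tm_k (kcal/mole)
    dcp  : unfolding heat capacity change, assumed constant (kcal/(mole*K))
    """
    return (dh_m * (1 - temp_k / tm_k)
            + dcp * ((temp_k - tm_k) - temp_k * math.log(temp_k / tm_k)))


# Mid-range values from the abstract: 38 kcal/mole, 55 degrees C, 500 cal/(mole*deg).
# The 25 degrees C reference temperature is an assumption.
dg_25 = unfolding_free_energy(temp_k=298.15, tm_k=328.15, dh_m=38.0, dcp=0.5)
print(f"Extrapolated unfolding free energy at 25 C: {dg_25:.2f} kcal/mole")
```

With these inputs the result comes out near 2.8 kcal/mole, which falls within the 2.4 to 3.3 kcal/mole range reported from the denaturant titrations.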
Affiliation(s)
- A K M M Mollah
- Department of Biology, Yeshiva University, New York, New York 10033, USA
|