51
|
周 鹏. A Predictor of Protein Secondary Structure Based on a Continuously Updated Templet Library. Hans Journal of Computational Biology 2017. [DOI: 10.12677/hjcb.2017.72002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
|
52
|
Abstract
Transmembrane beta-barrels (TMBBs) constitute an important structural class of membrane proteins located in the outer membrane of gram-negative bacteria, and in the outer membrane of chloroplasts and mitochondria. They are involved in a wide variety of cellular functions and the prediction of their transmembrane topology, as well as their discrimination in newly sequenced genomes is of great importance as they are promising targets for antimicrobial drugs and vaccines. Several methods have been applied for the prediction of the transmembrane segments and the topology of beta barrel transmembrane proteins utilizing different algorithmic techniques. Hidden Markov Models (HMMs) have been efficiently used in the development of several computational methods used for this task. In this chapter we give a brief review of different available prediction methods for beta barrel transmembrane proteins pointing out sequence and structural features that should be incorporated in a prediction method. We then describe the procedure of the design and development of a Hidden Markov Model capable of predicting the transmembrane beta strands of TMBBs and discriminating them from globular proteins.
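As an illustrative aside on how an HMM assigns a label to each residue, the following minimal sketch runs Viterbi decoding over a toy two-state model. The state names, the coarse hydrophobic/polar alphabet, and every probability are hypothetical placeholders chosen for illustration; they are not the architecture or parameters of the model described in the chapter.

```python
import math

# Hypothetical two-state toy model: "M" = transmembrane strand, "L" = loop.
STATES = ("M", "L")
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.8, "L": 0.2}, "L": {"M": 0.2, "L": 0.8}}
# Emissions over a coarse alphabet: "h" = hydrophobic, "p" = polar residue.
EMIT = {"M": {"h": 0.7, "p": 0.3}, "L": {"h": 0.3, "p": 0.7}}


def viterbi(observations):
    """Return the most probable state path for a string of 'h'/'p' symbols."""
    if not observations:
        return ""
    # Work in log space to avoid numerical underflow on long sequences.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for symbol in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this position.
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = best_prev
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][symbol])
            )
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("hhhhpppp"))
```

A real topology predictor would use many more states (for example, separate inward and outward loop states and strand-length constraints) and emissions over the 20 amino acids; the decoding step itself has the same shape.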
Affiliation(s)
- Georgios N Tsaousis
  - Department of Cell Biology and Biophysics, Faculty of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens, 15701, Greece
- Stavros J Hamodrakas
  - Department of Cell Biology and Biophysics, Faculty of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens, 15701, Greece
- Pantelis G Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Papasiopoulou 2-4, Lamia, 35100, Greece.
|
53
|
Abstract
More than two decades of research have enabled dihedral angle predictions at an accuracy that makes them an interesting alternative or supplement to secondary structure prediction that provides detailed local structure information for every residue of a protein. The evolution of dihedral angle prediction methods is closely linked to advancements in machine learning and other relevant technologies. Consequently recent improvements in large-scale training of deep neural networks have led to the best method currently available, which achieves a mean absolute error of 19° for phi, and 30° for psi. This performance opens interesting perspectives for the application of dihedral angle prediction in the comparison, prediction, and design of protein structures.
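Because phi and psi are periodic, an error of this kind has to be computed on the circle rather than by plain subtraction. The sketch below shows one way to do that; the function names are hypothetical, and the exact evaluation protocol behind the 19° and 30° figures is not stated in the abstract.

```python
def angular_difference(predicted_deg, observed_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(predicted_deg - observed_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_error(predicted, observed):
    """Mean absolute angular error over paired lists of angles in degrees."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(angular_difference(p, o) for p, o in zip(predicted, observed))
    return total / len(predicted)


if __name__ == "__main__":
    # 170 and -170 degrees are only 20 degrees apart on the circle.
    print(mean_absolute_error([-60.0, 170.0], [-70.0, -170.0]))
```

Without the wrap-around step, a prediction of 179° for an observed -179° would count as a 358° error instead of 2°.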
Affiliation(s)
- Olav Zimmermann
  - Jülich Supercomputing Centre (JSC), Institute for Advanced Simulation (IAS), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany.
|
54
|
Rashid S, Saraswathi S, Kloczkowski A, Sundaram S, Kolinski A. Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach. BMC Bioinformatics 2016; 17:362. [PMID: 27618812 PMCID: PMC5020447 DOI: 10.1186/s12859-016-1209-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/07/2015] [Accepted: 08/25/2016] [Indexed: 11/17/2022]
Abstract
BACKGROUND Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods conducted on large datasets, the estimated upper limit accuracy is yet to be reached. Since the predictions of SSP methods are applied as input to higher-level structure prediction pipelines, even small errors may have large perturbations in final models. Previous works relied on cross validation as an estimate of classifier accuracy. However, training on large numbers of protein chains compromises the classifier ability to generalize to new sequences. This prompts a novel approach to training and an investigation into the possible structural factors that lead to poor predictions. Here, a small group of 55 proteins termed the compact model is selected from the CB513 dataset using a heuristics-based approach. In a prior work, all sequences were represented as probability matrices of residues adopting each of Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with CABS force-field and residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins. RESULTS The performance of the compact model is compared with traditional cross-validated accuracies and blind-tested on a dataset of G Switch proteins, obtaining accuracies of ∼81 %. The model demonstrates better results when compared to several techniques in the literature. A comparative case study of the worst performing chain identifies hydrogen bond contacts that lead to Coil ⇔ Sheet misclassifications. Overall, mispredicted Coil residues have a higher propensity to participate in backbone hydrogen bonding than correctly predicted Coils. 
CONCLUSIONS The implications of these findings are: (i) the choice of training proteins is important in preserving the generalization of a classifier to predict new sequences accurately and (ii) SSP techniques sensitive in distinguishing between backbone hydrogen bonding and side-chain or water-mediated hydrogen bonding might be needed in the reduction of Coil ⇔ Sheet misclassifications.
Affiliation(s)
- Shamima Rashid
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore, 639798 Singapore
- Saras Saraswathi
  - Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, USA
  - Sidra Medical and Research Center, Al Dafna, Doha, Qatar
- Andrzej Kloczkowski
  - Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, USA
  - Department of Paediatrics, College of Medicine, The Ohio State University, 370 W. 9th Avenue, Columbus, USA
- Suresh Sundaram
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore, 639798 Singapore
- Andrzej Kolinski
  - Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, Warsaw, 02-093 Poland
|
55
|
Movahedi M, Zare-Mirakabad F, Arab SS. Evaluating the accuracy of protein design using native secondary sub-structures. BMC Bioinformatics 2016; 17:353. [PMID: 27597167 PMCID: PMC5011913 DOI: 10.1186/s12859-016-1199-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/23/2016] [Accepted: 08/24/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND According to structure-dependent function of proteins, two main challenging problems called Protein Structure Prediction (PSP) and Inverse Protein Folding (IPF) are investigated. In spite of IPF essential applications, it has not been investigated as much as PSP problem. In fact, the ultimate goal of IPF problem or protein design is to create proteins with enhanced properties or even novel functions. One of the major computational challenges in protein design is its large sequence space, namely searching through all plausible sequences is impossible. Inasmuch as, protein secondary structure represents an appropriate primary scaffold of the protein conformation, undoubtedly studying the Protein Secondary Structure Inverse Folding (PSSIF) problem is a quantum leap forward in protein design, as it can reduce the search space. In this paper, a novel genetic algorithm which uses native secondary sub-structures is proposed to solve PSSIF problem. In essence, evolutionary information can lead the algorithm to design appropriate amino acid sequences respective to the target secondary structures. Furthermore, they can be folded to tertiary structures almost similar to their reference 3D structures. RESULTS The proposed algorithm called GAPSSIF benefits from evolutionary information obtained by solved proteins in the PDB. Therefore, we construct a repository of protein secondary sub-structures to accelerate convergence of the algorithm. The secondary structure of designed sequences by GAPSSIF is comparable with those obtained by Evolver and EvoDesign. Although we do not explicitly consider tertiary structure features through the algorithm, the structural similarity of native and designed sequences declares acceptable values. CONCLUSIONS Using the evolutionary information of native structures can significantly improve the quality of designed sequences. 
In fact, the combination of this information and effective features such as solvent accessibility and torsion angles leads IPF problem to an efficient solution. GAPSSIF can be downloaded at http://bioinformatics.aut.ac.ir/GAPSSIF/ .
Affiliation(s)
- Marziyeh Movahedi
  - Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Fatemeh Zare-Mirakabad
  - Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Seyed Shahriar Arab
  - Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University (TMU), Tehran, Iran
|
56
|
Kryshtafovych A, Monastyrskyy B, Fidelis K. CASP11 statistics and the prediction center evaluation system. Proteins 2016; 84 Suppl 1:15-9. [PMID: 26857434 PMCID: PMC5479680 DOI: 10.1002/prot.25005] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 11/19/2015] [Revised: 01/18/2016] [Accepted: 02/04/2016] [Indexed: 01/10/2023]
Abstract
We outline the role of the Protein Structure Prediction Center (predictioncenter.org) in conducting the CASP11 and CASP ROLL experiments, discuss the experiment statistics, and provide an overview of the present CASP infrastructure. The biggest changes compared to the previous CASPs are the implementation of the evaluation system incorporating practically all evaluation measures, statistical tests, and visualization tools historically used by the CASP assessors, the expansion of the infrastructure to incorporate new categories of contact-assisted and multimeric predictions, and the redesign of the assessors' web-workspace enabling assessments based on multiple measures for different group categories and target sets.
Affiliation(s)
- Andriy Kryshtafovych
  - Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, 95616
- Bohdan Monastyrskyy
  - Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, 95616
- Krzysztof Fidelis
  - Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, 95616.
|
57
|
Tsirigos KD, Elofsson A, Bagos PG. PRED-TMBB2: improved topology prediction and detection of beta-barrel outer membrane proteins. Bioinformatics 2016; 32:i665-i671. [DOI: 10.1093/bioinformatics/btw444] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Indexed: 01/13/2023]
|
58
|
Kieslich CA, Smadbeck J, Khoury GA, Floudas CA. conSSert: Consensus SVM Model for Accurate Prediction of Ordered Secondary Structure. J Chem Inf Model 2016; 56:455-61. [DOI: 10.1021/acs.jcim.5b00566] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/28/2022]
Affiliation(s)
- James Smadbeck
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- George A. Khoury
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
|
59
|
A New Secondary Structure Assignment Algorithm Using Cα Backbone Fragments. Int J Mol Sci 2016; 17:333. [PMID: 26978354 PMCID: PMC4813195 DOI: 10.3390/ijms17030333] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/27/2016] [Revised: 02/26/2016] [Accepted: 02/29/2016] [Indexed: 11/17/2022]
Abstract
The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on Cα fragments), for secondary structure element (SSE) assignment based on the alignment of Cα backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps: First, the outlier fragments on known SSEs are detected. Next, the remaining fragments are clustered to obtain the central fragments for each cluster. Finally, the central fragments are used as a template to make assignments. Following a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS are found to have similar agreement with DSSP, while PCASSO agrees with DSSP best. SACF and PCASSO show preference to reducing residues in N and C cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues to the terminals when DSSP assignment is taken as standard. Moreover, our algorithm is able to assign subtle helices (3₁₀-helix, π-helix and left-handed helix) and make uniform assignments, as well as to detect rare SSEs in β-sheets or long helices as outlier fragments from other programs. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure-function relationship.
|
60
|
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields. Sci Rep 2016; 6:18962. [PMID: 26752681 PMCID: PMC4707437 DOI: 10.1038/srep18962] [Citation(s) in RCA: 255] [Impact Index Per Article: 31.9] [Received: 06/28/2015] [Accepted: 11/26/2015] [Indexed: 12/29/2022]
Abstract
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
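For readers unfamiliar with the metrics quoted above, Q3 and Q8 are per-residue accuracies over 3-state and 8-state labels respectively. The sketch below computes both from label strings. The 8-to-3 mapping shown is one commonly used reduction of DSSP codes and is an assumption here; the abstract does not say which reduction was used, and other reductions give different Q3 values.

```python
# One common reduction of DSSP 8-state codes to 3 states (an assumption;
# alternative reductions exist and change the reported Q3).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helix types -> H
    "E": "E", "B": "E",             # strand and bridge -> E
    "T": "C", "S": "C", "-": "C",   # turn, bend, other -> coil
}


def reduce_to_three_state(ss8):
    """Map an 8-state DSSP string to a 3-state string (H, E, C)."""
    return "".join(EIGHT_TO_THREE[code] for code in ss8)


def q_accuracy(predicted, observed):
    """Percentage of residues whose predicted label matches the observed one.

    Gives Q8 on 8-state strings and Q3 on 3-state strings.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    observed8 = "HHHGEEBT"
    predicted8 = "HHHHEEET"
    print(q_accuracy(predicted8, observed8))  # Q8
    print(q_accuracy(reduce_to_three_state(predicted8),
                     reduce_to_three_state(observed8)))  # Q3
```

Q3 is never lower than Q8 for the same prediction under a fixed reduction, since two labels that agree in 8 states also agree after merging.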
|
61
|
Secondary and Tertiary Structure Prediction of Proteins: A Bioinformatic Approach. Complex System Modelling and Control Through Intelligent Soft Computations 2015. [DOI: 10.1007/978-3-319-12883-2_19] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/27/2022]
|
62
|
Spencer M, Eickholt J, Cheng J. A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:103-12. [PMID: 25750595 PMCID: PMC4348072 DOI: 10.1109/tcbb.2014.2343960] [Citation(s) in RCA: 138] [Impact Index Per Article: 15.3] [Indexed: 05/12/2023]
Abstract
Ab initio protein secondary structure (SS) predictions are utilized to generate tertiary structure predictions, which are increasingly demanded due to the rapid discovery of proteins. Although recent developments have slightly exceeded previous methods of SS prediction, accuracy has stagnated around 80 percent and many wonder if prediction cannot be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures, which we call DNSS. Graphical processing units and CUDA software optimize the deep network architecture and efficiently train the deep networks. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed in order to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test dataset of 198 proteins, achieving a Q3 accuracy of 80.7 percent and a Sov accuracy of 74.2 percent.
Affiliation(s)
- Matt Spencer
  - Informatics Institute, University of Missouri, Columbia, MO 65211.
- Jesse Eickholt
  - Department of Computer Science, Central Michigan University, Mount Pleasant, MI 48859.
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, Columbia, MO 65211.
|
63
|
Sormanni P, Camilloni C, Fariselli P, Vendruscolo M. The s2D method: simultaneous sequence-based prediction of the statistical populations of ordered and disordered regions in proteins. J Mol Biol 2014; 427:982-996. [PMID: 25534081 DOI: 10.1016/j.jmb.2014.12.007] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 09/23/2014] [Revised: 12/10/2014] [Accepted: 12/12/2014] [Indexed: 11/18/2022]
Abstract
Extensive amounts of information about protein sequences are becoming available, as demonstrated by the over 79 million entries in the UniProt database. Yet, it is still challenging to obtain proteome-wide experimental information on the structural properties associated with these sequences. Fast computational predictors of secondary structure and of intrinsic disorder of proteins have been developed in order to bridge this gap. These two types of predictions, however, have remained largely separated, often preventing a clear characterization of the structure and dynamics of proteins. Here, we introduce a computational method to predict secondary-structure populations from amino acid sequences, which simultaneously characterizes structure and disorder in a unified statistical mechanics framework. To develop this method, called s2D, we exploited recent advances made in the analysis of NMR chemical shifts that provide quantitative information about the probability distributions of secondary-structure elements in disordered states. The results that we discuss show that the s2D method predicts secondary-structure populations with an average error of about 14%. A validation on three datasets of mostly disordered, mostly structured and partly structured proteins, respectively, shows that its performance is comparable to or better than that of existing predictors of intrinsic disorder and of secondary structure. These results indicate that it is possible to perform rapid and quantitative sequence-based characterizations of the structure and dynamics of proteins through the predictions of the statistical distributions of their ordered and disordered regions.
Affiliation(s)
- Pietro Sormanni
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
- Carlo Camilloni
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
- Piero Fariselli
  - Department of Computer Science, University of Bologna, 40127 Bologna, Italy
|
64
|
Zhang Y, Sagui C. Secondary structure assignment for conformationally irregular peptides: comparison between DSSP, STRIDE and KAKSI. J Mol Graph Model 2014; 55:72-84. [PMID: 25424660 DOI: 10.1016/j.jmgm.2014.10.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 08/07/2014] [Accepted: 10/08/2014] [Indexed: 11/25/2022]
Abstract
Secondary structure assignment codes were built to explore the regularities associated with the periodic motifs of proteins, such as those in backbone dihedral angles or in hydrogen bonds between backbone atoms. Precise structure assignment is challenging because real-life secondary structures are susceptible to bending, twist, fraying and other deformations that can distance them from their geometrical prototypes. Although results from codes such as DSSP and STRIDE converge in well-ordered structures, the agreement between the secondary structure assignments is known to deteriorate as the conformations become more distorted. Conformationally irregular peptides therefore offer a great opportunity to explore the differences between these codes. This is especially important for unfolded proteins and intrinsically disordered proteins, which are known to exhibit residual and/or transient secondary structure whose characterization is challenging. In this work, we have carried out Molecular Dynamics simulations of (relatively) disordered peptides, specifically gp41₆₅₉₋₆₇₁ (ELLELDKWASLWN), the homopeptide polyasparagine (N₁₈), and polyasparagine dimers. We have analyzed the resulting conformations with DSSP and STRIDE, based on hydrogen-bond patterns (and dihedral angles for STRIDE), and KAKSI, based on α-Carbon distances; and carefully characterized the differences in structural assignments. The full-sequence Segment Overlap (SOV) scores, that quantify the agreement between two secondary structure assignments, vary from 70% for gp41₆₅₉₋₆₇₁ (STRIDE as reference) to 49% for N₁₈ (DSSP as reference). Major differences are observed in turns, in the distinction between α and 3₁₀ helices, and in short parallel-sheet segments.
Affiliation(s)
- Yuan Zhang
  - Department of Physics, North Carolina State University, Raleigh, NC 27695, United States; Center for High Performance Simulations (CHiPS), North Carolina State University, Raleigh, NC 27695, United States
- Celeste Sagui
  - Department of Physics, North Carolina State University, Raleigh, NC 27695, United States; Center for High Performance Simulations (CHiPS), North Carolina State University, Raleigh, NC 27695, United States.
|
65
|
Hafsa NE, Wishart DS. CSI 2.0: a significantly improved version of the Chemical Shift Index. Journal of Biomolecular NMR 2014; 60:131-146. [PMID: 25273503 DOI: 10.1007/s10858-014-9863-x] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 06/27/2014] [Accepted: 09/17/2014] [Indexed: 06/03/2023]
Abstract
Protein chemical shifts have long been used by NMR spectroscopists to assist with secondary structure assignment and to provide useful distance and torsion angle constraint data for structure determination. One of the most widely used methods for secondary structure identification is called the Chemical Shift Index (CSI). The CSI method uses a simple digital chemical shift filter to locate secondary structures along the protein chain using backbone ¹³C and ¹H chemical shifts. While the CSI method is simple to use and easy to implement, it is only about 75-80% accurate. Here we describe a significantly improved version of the CSI (2.0) that uses machine-learning techniques to combine all six backbone chemical shifts (¹³Cα, ¹³Cβ, ¹³C, ¹⁵N, ¹HN, ¹Hα) with sequence-derived features to perform far more accurate secondary structure identification. Our tests indicate that CSI 2.0 achieved an average identification accuracy (Q3) of 90.56% for a training set of 181 proteins in a repeated tenfold cross-validation and 89.35% for a test set of 59 proteins. This represents a significant improvement over other state-of-the-art chemical shift-based methods. In particular, the level of performance of CSI 2.0 is equal to that of standard methods, such as DSSP and STRIDE, used to identify secondary structures via 3D coordinate data. This suggests that CSI 2.0 could be used both in providing accurate NMR constraint data in the early stages of protein structure determination as well as in defining secondary structure locations in the final protein model(s). A CSI 2.0 web server (http://csi.wishartlab.com) is available for submitting the input queries for secondary structure identification.
Affiliation(s)
- Noor E Hafsa
  - Department of Computing Science, University of Alberta, Edmonton, Canada
|
66
|
Hoffmann F, Vancea I, Kamat SG, Strodel B. Protein structure prediction: assembly of secondary structure elements by basin-hopping. Chemphyschem 2014; 15:3378-90. [PMID: 25056272 DOI: 10.1002/cphc.201402247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/15/2014] [Indexed: 12/30/2022]
Abstract
The prediction of protein tertiary structure from primary structure remains a challenging task. One possible approach to this problem is the application of basin-hopping global optimization combined with an all-atom force field. In this work, the efficiency of basin-hopping is improved by introducing an approach that derives tertiary structures from the secondary structure assignments of individual residues. This approach is termed secondary-to-tertiary basin-hopping and benchmarked for three miniproteins: trpzip, trp-cage and ER-10. For each of the three miniproteins, the secondary-to-tertiary basin-hopping approach successfully and reliably predicts their three-dimensional structure. When it is applied to larger proteins, correctly folded structures are obtained. It can be concluded that the assembly of secondary structure elements using basin-hopping is a promising tool for de novo protein structure prediction.
Affiliation(s)
- Falk Hoffmann
  - Institute of Complex Systems: Structural Biochemistry, Forschungszentrum Jülich, 52425 Jülich (Germany)
|
67
|
Walsh I, Giollo M, Di Domenico T, Ferrari C, Zimmermann O, Tosatto SCE. Comprehensive large-scale assessment of intrinsic protein disorder. Bioinformatics 2014; 31:201-8. [PMID: 25246432 DOI: 10.1093/bioinformatics/btu625] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Indexed: 12/23/2022]
Abstract
MOTIVATION Intrinsically disordered regions are key for the function of numerous proteins. Due to the difficulties in experimental disorder characterization, many computational predictors have been developed with various disorder flavors. Their performance is generally measured on small sets mainly from experimentally solved structures, e.g. Protein Data Bank (PDB) chains. MobiDB has only recently started to collect disorder annotations from multiple experimental structures. RESULTS MobiDB annotates disorder for UniProt sequences, allowing us to conduct the first large-scale assessment of fast disorder predictors on 25 833 different sequences with X-ray crystallographic structures. In addition to a comprehensive ranking of predictors, this analysis produced the following interesting observations. (i) The predictors cluster according to their disorder definition, with a consensus giving more confidence. (ii) Previous assessments appear over-reliant on data annotated at the PDB chain level and performance is lower on entire UniProt sequences. (iii) Long disordered regions are harder to predict. (iv) Depending on the structural and functional types of the proteins, differences in prediction performance of up to 10% are observed. AVAILABILITY The datasets are available from Web site at URL: http://mobidb.bio.unipd.it/lsd. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ian Walsh
  - Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Manuel Giollo
  - Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Tomás Di Domenico
  - Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Carlo Ferrari
  - Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Olav Zimmermann
  - Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
- Silvio C E Tosatto
  - Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
|
68
|
|
69
|
Yaseen A, Li Y. Template-based C8-SCORPION: a protein 8-state secondary structure prediction method using structural information and context-based features. BMC Bioinformatics 2014; 15 Suppl 8:S3. [PMID: 25080939 PMCID: PMC4120151 DOI: 10.1186/1471-2105-15-s8-s3] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022]
Abstract
Background: Secondary structure prediction of proteins is important to many protein structure modeling applications. Correct prediction of secondary structures can significantly reduce the degrees of freedom in protein tertiary structure modeling and therefore reduces the difficulty of obtaining high resolution 3D models. Methods: In this work, we investigate a template-based approach to enhance 8-state secondary structure prediction accuracy. We construct structural templates from known protein structures with certain sequence similarity. The structural templates are then incorporated as features with sequence and evolutionary information to train two-stage neural networks. When structural templates are absent, heuristic structural information is incorporated instead. Results: After applying the template-based 8-state secondary structure prediction method, the 7-fold cross-validated Q8 accuracy is 78.85%. Even templates from structures with only 20-30% sequence similarity can help improve the 8-state prediction accuracy. More importantly, when good templates are available, the prediction accuracy of less frequent secondary structures, such as 3-10 helices, turns, and bends, is highly improved, which is useful for practical applications. Conclusions: Our computational results show that the templates containing structural information are effective features to enhance 8-state secondary structure predictions. Our prediction algorithm is implemented on a web server named "C8-SCORPION" available at: http://hpcr.cs.odu.edu/c8scorpion.
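The Q8 figure reported in this abstract, and its three-state counterpart Q3, are per-residue accuracies. As a minimal sketch (not the C8-SCORPION code), the Python below computes both for a pair of made-up strings. The 8-to-3 state reduction is one common convention and is an assumption here, since the abstract does not say which reduction the authors use; the function names and example strings are hypothetical.

```python
# One common reduction of DSSP 8-state codes to 3 states (an assumption):
# H, G, I -> helix (H); E, B -> strand (E); everything else -> coil (C).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
}


def reduce_to_three(states):
    """Map an 8-state string to a 3-state string; unknown codes become coil."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)


def accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical example strings, 16 residues each.
observed = "CHHHHGGGTTEEEESC"
predicted = "CHHHHHGGTSEEEECC"

q8 = accuracy(predicted, observed)
q3 = accuracy(reduce_to_three(predicted), reduce_to_three(observed))
print(f"Q8 = {q8:.2f}%, Q3 = {q3:.2f}%")
```

In this toy case the three mismatches are all within the same three-state class, so Q3 is higher than Q8, which is the usual direction of the gap between the two measures.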
|
70
|
Bonella S, Raimondo D, Milanetti E, Tramontano A, Ciccotti G. Mapping the hydropathy of amino acids based on their local solvation structure. J Phys Chem B 2014; 118:6604-13. [PMID: 24845543 DOI: 10.1021/jp500980x]
Abstract
In spite of its relevant biological role, no general consensus exists on the quantitative characterization of amino acid's hydropathy. In particular, many hydrophobicity scales exist, often producing quite different rankings for the amino acids. To make progress toward a systematic classification, we analyze amino acids' hydropathy based on the orientation of water molecules at a given distance from them as computed from molecular dynamics simulations. In contrast with what is usually done, we argue that assigning a single number is not enough to characterize the properties of an amino acid, in particular when both hydrophobic and hydrophilic regions are present in a residue. Instead we show that appropriately defined conditional probability densities can be used to map the hydrophilic and hydrophobic groups on the amino acids with greater detail than possible with other available methods. Three indicators are then defined based on the features of these probabilities to quantify the specific hydrophobicity and hydrophilicity of each amino acid. The characterization that we propose can be used to understand some of the ambiguities in the ranking of amino acids in the current scales. The quantitative indicators can also be used in combination with standard bioinformatics tools to predict the location of transmembrane regions of proteins. The method is sensitive to the specific environment of the amino acids and can be applied to unnatural and modified amino acids, as well as to other small organic molecules.
Affiliation(s)
- S Bonella
- Department of Physics, Sapienza University of Rome, Ple A. Moro 5, 00185 Rome, Italy
|
71
|
Are proposed early genetic codes capable of encoding viable proteins? J Mol Evol 2014; 78:263-74. [PMID: 24826911 DOI: 10.1007/s00239-014-9622-3] [Received: 10/09/2013] [Accepted: 04/28/2014]
Abstract
Proteins are elaborate biopolymers balancing between contradicting intrinsic propensities to fold, aggregate, or remain disordered. Assessing their primary structural preferences observable without evolutionary optimization has been reinforced by the recent identification of de novo proteins that have emerged from previously non-coding sequences. In this paper we investigate structural preferences of hypothetical proteins translated from random DNA segments using the standard genetic code and three of its proposed evolutionary predecessor models encoding 10, 6, and 4 amino acids, respectively. Our main assumption is that the disorder, aggregation, and transmembrane helix predictions used are able to reflect the differences in the trends of the protein sets investigated. We found that the 10-residue code encodes proteins that resemble modern proteins in their predicted structural properties. All of the investigated early genetic codes give rise to proteins with enhanced disorder and diminished aggregation propensities. Our results suggest that an ancestral genetic code similar to the proposed 10-residue one is capable of encoding functionally diverse proteins but these might have existed under conditions different from today's common physiological ones. The existence of a protein functional repertoire for the investigated earlier stages that is quite distinct from today's can be deduced from the presented results.
|
72
|
Cao R, Wang Z, Wang Y, Cheng J. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinformatics 2014; 15:120. [PMID: 24776231 PMCID: PMC4013430 DOI: 10.1186/1471-2105-15-120] [Received: 12/17/2013] [Accepted: 04/15/2014]
Abstract
Background: It is important to predict the quality of a protein structural model before its native structure is known. Methods that can predict the absolute local quality of individual residues in a single protein model are rare, yet particularly needed for using, ranking and refining protein models. Results: We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained an SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them improves the performance only when real deviations between native and model are higher than 5 Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637 Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion: SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, Informatics Institute, Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA.
|
73
|
Yaseen A, Li Y. Context-based features enhance protein secondary structure prediction accuracy. J Chem Inf Model 2014; 54:992-1002. [PMID: 24571803 DOI: 10.1021/ci400647u]
Abstract
We report a new approach of using statistical context-based scores as encoded features to train neural networks to achieve secondary structure prediction accuracy improvement. The context-based scores are pseudo-potentials derived by evaluating statistical, high-order inter-residue interactions, which estimate the favorability of a residue adopting certain secondary structure conformation within its amino acid environment. Encoding these context-based scores as important training and prediction features provides a way to address a long-standing difficulty in neural network-based secondary structure predictions of taking interdependency among secondary structures of neighboring residues into account. Our computational results have shown that the context-based scores are effective features to enhance the prediction accuracy of secondary structure predictions. An overall 7-fold cross-validated Q3 accuracy of 82.74% and Segment Overlap (SOV) accuracy of 86.25% are achieved on a set of more than 7987 protein chains with, at most, 25% sequence identity. The Q3 prediction accuracy on benchmarks of CB513, Manesh215, Carugo338, as well as CASP9 protein chains is higher than popularly used secondary structure prediction servers, including Psipred, Profphd, Jpred, Porter (ab initio), and Netsurf. More significant improvement is observed in the SOV accuracy, where more than 4% enhancement is observed, compared to the server with the best SOV accuracy. A Q8 accuracy of >70% (71.5%) is also found in eight-state secondary structure prediction. The majority of the Q3 accuracy improvement is contributed from correctly identifying β-sheets and α-helices. When the context-based scores are incorporated, there are 15.5% more residues predicted with >90% confidence. These high-confidence predictions usually have a rather high accuracy (~95% on average). The three- and eight-state prediction servers (SCORPION) implementing our methods are available online.
Affiliation(s)
- Ashraf Yaseen
- Department of Computer Science, Old Dominion University, Norfolk, Virginia 23529, United States
|
74
|
Mao W, Cong P, Wang Z, Lu L, Zhu Z, Li T. NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data. PLoS One 2013; 8:e83532. [PMID: 24376713 PMCID: PMC3871590 DOI: 10.1371/journal.pone.0083532] [Received: 07/27/2013] [Accepted: 11/04/2013]
Abstract
A shape string is a structural sequence and an extremely important representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
Affiliation(s)
- Wusong Mao
- Department of Chemistry, Tongji University, Shanghai, China
- Peisheng Cong
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (PC); (TL)
- Zhiheng Wang
- Department of Chemistry, Tongji University, Shanghai, China
- Longjian Lu
- Department of Chemistry, Tongji University, Shanghai, China
- Zhongliang Zhu
- Department of Chemistry, Tongji University, Shanghai, China
- Tonghua Li
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (PC); (TL)
|
75
|
HMMpTM: improving transmembrane protein topology prediction using phosphorylation and glycosylation site prediction. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:316-22. [PMID: 24225132 DOI: 10.1016/j.bbapap.2013.11.001] [Received: 09/17/2013] [Revised: 11/02/2013] [Accepted: 11/04/2013]
Abstract
During the last two decades a large number of computational methods have been developed for predicting transmembrane protein topology. Current predictors rely on topogenic signals in the protein sequence, such as the distribution of positively charged residues in extra-membrane loops and the existence of N-terminal signals. However, phosphorylation and glycosylation are post-translational modifications (PTMs) that occur in a compartment-specific manner and therefore the presence of a phosphorylation or glycosylation site in a transmembrane protein provides topological information. We examine the combination of phosphorylation and glycosylation site prediction with transmembrane protein topology prediction. We report the development of a Hidden Markov Model based method, capable of predicting the topology of transmembrane proteins and the existence of kinase specific phosphorylation and N/O-linked glycosylation sites along the protein sequence. Our method integrates a novel feature in transmembrane protein topology prediction, which results in improved performance for topology prediction and reliable prediction of phosphorylation and glycosylation sites. The method is freely available at http://bioinformatics.biol.uoa.gr/HMMpTM.
|
76
|
Ghanty P, Pal NR, Mudi RK. Prediction of protein secondary structure using probability based features and a hybrid system. J Bioinform Comput Biol 2013; 11:1350012. [DOI: 10.1142/s0219720013500121]
Abstract
In this paper, we propose some co-occurrence probability-based features for prediction of protein secondary structure. The features are extracted using occurrence/nonoccurrence of secondary structures in the protein sequences. We explore two types of features: position-specific (based on position of amino acid on fragments of protein sequences) as well as position-independent (independent of amino acid position on fragments of protein sequences). We use a hybrid system, NEUROSVM, consisting of neural networks and support vector machines for classification of secondary structures. We propose two schemes NSVMps and NSVM for protein secondary structure prediction. The NSVMps uses position-specific probability-based features and NEUROSVM classifier whereas NSVM uses the same classifier with position-independent probability-based features. The proposed method falls in the single-sequence category of methods because it does not use any sequence profile information such as position specific scoring matrices (PSSM) derived from PSI-BLAST. Two widely used datasets RS126 and CB513 are used in the experiments. The results obtained using the proposed features and NEUROSVM classifier are better than most of the existing single-sequence prediction methods. Most importantly, the results using NSVMps that are obtained using lower dimensional features, are comparable to those by other existing methods. The NSVMps and NSVM are finally tested on target proteins of the critical assessment of protein structure prediction experiment-9 (CASP9). A larger dataset is used to compare the performance of the proposed methods with that of two recent single-sequence prediction methods. We also investigate the impact of presence of different amino acid residues (in protein sequences) that are responsible for the formation of different secondary structures.
Affiliation(s)
- Pradip Ghanty
- Praxis Softek Solutions Private Limited, Module 616, SDF Building, Sector V, Saltlake, Kolkata, India
- Nikhil R. Pal
- Electronics and Communication Sciences Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta 700108, India
- Rajani K. Mudi
- Department of Instrumentation and Electronics Engineering, Jadavpur University, Saltlake Campus, Kolkata, India
|
77
|
Cong P, Li D, Wang Z, Tang S, Li T. SPSSM8: an accurate approach for predicting eight-state secondary structures of proteins. Biochimie 2013; 95:2460-4. [PMID: 24056076 DOI: 10.1016/j.biochi.2013.09.007] [Received: 06/08/2013] [Accepted: 09/09/2013]
Abstract
Protein eight-state secondary structure prediction is challenging, but is necessary to determine protein structure and function. Here, we report the development of a novel approach, SPSSM8, to predict eight-state secondary structures of proteins accurately from sequences based on the structural position-specific scoring matrix (SPSSM). The SPSSM has been successfully utilized to predict three-state secondary structures. Now we employ an eight-state SPSSM as a feature that is obtained from sequence structure alignment against a large database of 9 million sequences with putative structural information. The SPSSM8 uses a low sequence identity dataset (9062 entries) as a training set and conditional random field for the classification algorithm. The SPSSM8 achieved an average eight-state secondary structure accuracy (Q8) of 71.7% (Q3, 81.6%) for an independent testing set (463 entries), which had an improved accuracy of 10.1% and 4.6% compared with SSPro8 and CNF, respectively, and significantly improved the accuracy of eight-state secondary structure prediction. For CASP 9 dataset (92 entries) the SPSSM8 achieved a Q8 accuracy of 80.1% (Q3, 83.0%). The SPSSM8 was confirmed as an outstanding predictor for eight-state secondary structures of proteins. SPSSM8 is freely available at http://cal.tongji.edu.cn/SPSSM8.
Affiliation(s)
- Peisheng Cong
- Department of Chemistry, Tongji University, Shanghai, PR China.
|
78
|
Circular-dichroism and synchrotron-radiation circular-dichroism spectroscopy as tools to monitor protein structure in a lipid environment. Methods Mol Biol 2013; 974:151-76. [PMID: 23404276 DOI: 10.1007/978-1-62703-275-9_8]
Abstract
Circular-dichroism (CD) spectroscopy is a powerful tool for the secondary-structure analysis of proteins. The structural information obtained by CD does not have atomic-level resolution (unlike X-ray crystallography and NMR spectroscopy), but it has the great advantage of being applicable to both nonnative and native proteins in a wide range of solution conditions containing lipids and detergents. The development of synchrotron-radiation CD (SRCD) instruments has greatly expanded the utility of this method by extending the spectra to the vacuum-ultraviolet region below 190 nm and producing information that is unobtainable by conventional CD instruments. Combining SRCD data with bioinformatics provides new insight into the conformational changes of proteins in a membrane environment.
|
79
|
Saraswathi S, Fernández-Martínez JL, Koliński A, Jernigan RL, Kloczkowski A. Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure. J Mol Model 2013; 19:4337-48. [PMID: 23907551 DOI: 10.1007/s00894-013-1911-z] [Received: 03/20/2013] [Accepted: 06/05/2013]
Abstract
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data to obtain better and faster convergence to more accurate secondary structure predicted results. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements: α-helix, β-strand and coil but rarely have the results been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in the secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.
Affiliation(s)
- S Saraswathi
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, 700 Children's Drive, Columbus, OH, USA
|
80
|
Extracting physicochemical features to predict protein secondary structure. ScientificWorldJournal 2013; 2013:347106. [PMID: 23766688 PMCID: PMC3666292 DOI: 10.1155/2013/347106] [Received: 01/29/2013] [Accepted: 04/23/2013]
Abstract
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features: conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This confirms that including these physicochemical features improves protein secondary structure prediction.
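Window-based SVM predictors of this kind usually describe each residue by the feature rows of its sequence neighbours. The Python sketch below shows one plausible way to build such input vectors, assuming each residue already has a numeric row (for example 20 PSSM scores plus the physicochemical values). The window size of 13, the zero padding at the termini, and the function name are illustrative assumptions, not details taken from the abstract.

```python
def window_features(profile, window=13):
    """Build one flattened feature vector per residue.

    profile: list of equal-length numeric rows, one row per residue.
    window:  odd number of residues covered by each vector (assumed value).
    Positions beyond the sequence termini are filled with zeros.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if not profile:
        return []
    half = window // 2
    pad = [0.0] * len(profile[0])
    vectors = []
    for i in range(len(profile)):
        vec = []
        for j in range(i - half, i + half + 1):
            row = profile[j] if 0 <= j < len(profile) else pad
            vec.extend(row)
        vectors.append(vec)
    return vectors


# Toy profile: 3 residues, 2 features each (hypothetical numbers).
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_vectors = window_features(toy, window=3)
print(toy_vectors[0])  # central residue 0, left neighbour zero-padded
```

Each returned vector has `window * len(profile[0])` entries and could be passed to any classifier that accepts fixed-length numeric input.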
|
81
|
Saravanan KM, Selvaraj S. Performance of secondary structure prediction methods on proteins containing structurally ambivalent sequence fragments. Biopolymers 2013; 100:148-53. [DOI: 10.1002/bip.22178] [Received: 07/03/2012] [Revised: 09/04/2012] [Accepted: 09/23/2012]
Affiliation(s)
- K. Mani Saravanan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620024, Tamil Nadu, India
- Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620024, Tamil Nadu, India
|
82
|
Rath EM, Tessier D, Campbell AA, Lee HC, Werner T, Salam NK, Lee LK, Church WB. A benchmark server using high resolution protein structure data, and benchmark results for membrane helix predictions. BMC Bioinformatics 2013; 14:111. [PMID: 23530628 PMCID: PMC3620685 DOI: 10.1186/1471-2105-14-111] [Received: 06/09/2012] [Accepted: 03/19/2013]
Abstract
Background: Helical membrane proteins are vital for the interaction of cells with their environment. Predicting the location of membrane helices in protein amino acid sequences provides substantial understanding of their structure and function and identifies membrane proteins in sequenced genomes. Currently there is no comprehensive benchmark tool for evaluating prediction methods, and there is no publication comparing all available prediction tools. Current benchmark literature is outdated, as recently determined membrane protein structures are not included. Current literature is also limited to global assessments, as specialised benchmarks for predicting specific classes of membrane proteins were not previously carried out. Description: We present a benchmark server at http://sydney.edu.au/pharmacy/sbio/software/TMH_benchmark.shtml that uses recent high resolution protein structural data to provide a comprehensive assessment of the accuracy of existing membrane helix prediction methods. The server further allows a user to compare uploaded predictions generated by novel methods, permitting the comparison of these novel methods against all existing methods compared by the server. Benchmark metrics include sensitivity and specificity of predictions for membrane helix location and orientation, and many others. The server allows for customised evaluations such as assessing prediction method performances for specific helical membrane protein subtypes. We report results for custom benchmarks which illustrate how the server may be used for specialised benchmarks. Which prediction method is the best performing method depends on which measure is being benchmarked. The OCTOPUS membrane helix prediction method is consistently one of the highest performing methods across all measures in the benchmarks that we performed. Conclusions: The benchmark server allows general and specialised assessment of existing and novel membrane helix prediction methods. Users can employ this benchmark server to determine the most suitable method for the type of prediction the user needs to perform, be it general whole-genome annotation or the prediction of specific types of helical membrane protein. Creators of novel prediction methods can use this benchmark server to evaluate the performance of their new methods. The benchmark server will be a valuable tool for researchers seeking to extract more sophisticated information from the large and growing protein sequence databases.
Affiliation(s)
- Emma M Rath
- Group in Biomolecular Structure and Informatics, Faculty of Pharmacy, The University of Sydney, Darlinghurst, Sydney NSW 2006, Australia
|
83
|
Mechelke M, Habeck M. A probabilistic model for secondary structure prediction from protein chemical shifts. Proteins 2013; 81:984-93. [DOI: 10.1002/prot.24249] [Received: 08/06/2012] [Revised: 11/07/2012] [Accepted: 12/18/2012]
|
84
|
Yan J, Marcus M, Kurgan L. Comprehensively designed consensus of standalone secondary structure predictors improves Q3 by over 3%. J Biomol Struct Dyn 2013; 32:36-51. [PMID: 23298369 DOI: 10.1080/07391102.2012.746945]
Abstract
Protein fold is defined by a spatial arrangement of three types of secondary structures (SSs) including helices, sheets, and coils/loops. Current methods that predict SS from sequences rely on complex machine learning-derived models and provide the three-state accuracy (Q3) at about 82%. Further improvements in predictive quality could be obtained with a consensus-based approach, which so far received limited attention. We perform first-of-its-kind comprehensive design of a SS consensus predictor (SScon), in which we consider 12 modern standalone SS predictors and utilize Support Vector Machine (SVM) to combine their predictions. Using a large benchmark data-set with 10 random training-test splits, we show that a simple, voting-based consensus of carefully selected base methods improves Q3 by 1.9% when compared to the best single predictor. Use of SVM provides additional 1.4% improvement with the overall Q3 at 85.6% and segment overlap (SOV3) at 83.7%, when compared to 82.3 and 80.9%, respectively, obtained by the best individual methods. We also show strong improvements when the consensus is based on ab-initio methods, with Q3 = 82.3% and SOV3 = 80.7% that match the results from the best template-based approaches. Our consensus reduces the number of significant errors where helix is confused with a strand, provides particularly good results for short helices and strands, and gives the most accurate estimates of the content of individual SSs in the chain. Case studies are used to visualize the improvements offered by the consensus at the residue level. A web-server and a standalone implementation of SScon are available at http://biomine.ece.ualberta.ca/SSCon/ .
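The simple voting-based consensus mentioned in this abstract can be illustrated with a per-residue majority vote over the outputs of several predictors. The Python sketch below is only that baseline idea: it does not reproduce SScon's selection of base methods or its SVM combiner, and the tie-breaking rule (prefer the state given by the earliest predictor in the list) and the example strings are assumptions made for illustration.

```python
from collections import Counter


def vote_consensus(predictions):
    """Per-residue majority vote over equal-length 3-state strings.

    Ties are broken in favour of the earliest predictor in the list
    (an assumed rule; the abstract does not describe tie handling).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        # First state in predictor order that reaches the top vote count.
        consensus.append(next(s for s in column if counts[s] == top))
    return "".join(consensus)


# Hypothetical outputs of three predictors for the same 8-residue chain.
example = ["HHHCCEEE", "HHCCCEEC", "HHHCEEEE"]
print(vote_consensus(example))
```

Ordering the input list from most to least trusted predictor makes the tie-break favour the stronger method.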
Affiliation(s)
- Jing Yan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
|
85
|
Saraswathi S, Fernández-Martínez JL, Kolinski A, Jernigan RL, Kloczkowski A. Fast learning optimized prediction methodology (FLOPRED) for protein secondary structure prediction. J Mol Model 2012; 18:4275-89. [PMID: 22562230 PMCID: PMC3694724 DOI: 10.1007/s00894-012-1410-7] [Received: 02/14/2012] [Accepted: 03/19/2012]
Abstract
Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computational methods to predict structures and identify their functions from the sequence. Developing methods that will satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, including drug development and discovery of biomarkers. A novel method called fast learning optimized prediction methodology (FLOPRED) is proposed for predicting protein secondary structure, using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data that yield better and faster convergence to produce more accurate results. Protein secondary structures are predicted reliably, more efficiently and more accurately using FLOPRED. These techniques yield superior classification of secondary structure elements, with a training accuracy ranging between 83 % and 87 % over a wide range of hidden neurons and a cross-validated testing accuracy ranging between 81 % and 84 % and a segment overlap (SOV) score of 78 % that are obtained with different sets of proteins. These results are comparable to other recently published studies, but are obtained with greater efficiencies, in terms of time and cost.
Affiliation(s)
- Saras Saraswathi
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH, USA
- Andrzej Kolinski
- Department of Mathematics, Laboratory of Theory of Biopolymers, Faculty of Chemistry, Warsaw University, Pasteura 1, 02-093 Warsaw
- Robert L. Jernigan
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH, USA, Tel.: +1-614-722-3880 Fax: +1-614-355-2728
|
86
|
Kountouris P, Agathocleous M, Promponas VJ, Christodoulou G, Hadjicostas S, Vassiliades V, Christodoulou C. A comparative study on filtering protein secondary structure prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:731-739. [PMID: 22291162 DOI: 10.1109/tcbb.2012.22] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]
Abstract
Filtering of Protein Secondary Structure Prediction (PSSP) aims to provide physicochemically realistic results, while it usually improves the predictive performance. We performed a comparative study on this challenging problem, utilizing both machine learning techniques and empirical rules and we found that combinations of the two lead to the highest improvement.
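To illustrate what an empirical filtering rule can look like, here is a minimal sketch in Python. It is not taken from the study: the function name `filter_short_segments` and both length thresholds are hypothetical, chosen only to show the idea of replacing physically implausible short helix or strand segments with coil.

```python
import re


def filter_short_segments(ss: str, min_helix: int = 3, min_strand: int = 2) -> str:
    """Relabel helix (H) or strand (E) runs shorter than a threshold as coil (C).

    The thresholds are illustrative assumptions, not values from the paper.
    """
    def relabel(match: re.Match) -> str:
        segment = match.group(0)
        limit = min_helix if segment[0] == "H" else min_strand
        # Runs that are too short become coil; longer runs are kept unchanged.
        return "C" * len(segment) if len(segment) < limit else segment

    # Each maximal run of H or of E is examined independently.
    return re.sub(r"H+|E+", relabel, ss)


raw_prediction = "CHHCECHHHC"
print(filter_short_segments(raw_prediction))
```

In this sketch the two-residue helix and the single-residue strand are turned into coil, while the three-residue helix survives.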
Affiliation(s)
- Petros Kountouris
- Department of Computer Science, University of Cyprus, 75 Kallipoleos Avenue, PO Box 20537, 1678 Nicosia, Cyprus.
|
87
|
Armano G, Ledda F. Exploiting intrastructure information for secondary structure prediction with multifaceted pipelines. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:799-808. [PMID: 22201070 DOI: 10.1109/tcbb.2011.159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Predicting the secondary structure of proteins is still a typical step in several bioinformatic tasks, in particular, for tertiary structure prediction. Notwithstanding the impressive results obtained so far, mostly due to the advent of sequence encoding schemes based on multiple alignment, in our view the problem should be studied from a novel perspective, in which understanding how available information sources are dealt with plays a central role. After revisiting a well-known secondary structure predictor viewed from this perspective (with the goal of identifying which sources of information have been considered and which have not), we propose a generic software architecture designed to account for all relevant information sources. To demonstrate the validity of the approach, a predictor compliant with the proposed generic architecture has been implemented and compared with several state-of-the-art secondary structure predictors. Experiments have been carried out on standard data sets, and the corresponding results confirm the validity of the approach. The predictor is available at http://iasc.diee.unica.it/ssp2/ through the corresponding web application or as downloadable stand-alone portable unpack-and-run bundle.
Affiliation(s)
- Giuliano Armano
- Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, Cagliari 09123, Italy.
|
88
|
Faraggi E, Zhang T, Yang Y, Kurgan L, Zhou Y. SPINE X: improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles. J Comput Chem 2012; 33:259-67. [PMID: 22045506 PMCID: PMC3240697 DOI: 10.1002/jcc.21968] [Citation(s) in RCA: 187] [Impact Index Per Article: 15.6] [Received: 05/10/2011] [Revised: 09/16/2011] [Accepted: 09/18/2011] [Indexed: 11/11/2022]
Abstract
Accurate prediction of protein secondary structure is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. The accuracy of ab initio secondary structure prediction from sequence, however, has only increased from around 77 to 80% over the past decade. Here, we developed a multistep neural-network algorithm by coupling secondary structure prediction with prediction of solvent accessibility and backbone torsion angles in an iterative manner. Our method, called SPINE X, was applied to a dataset of 2640 proteins (25% sequence identity cutoff) previously built for the first version of SPINE and achieved an 82.0% accuracy (Q3) based on 10-fold cross validation. That SPINE X surpasses 81% accuracy is further confirmed by employing an independently built test dataset of 1833 protein chains, a recently built dataset of 1975 proteins and 117 CASP 9 targets (critical assessment of structure prediction techniques), with accuracies of 81.3%, 82.3% and 81.8%, respectively. The prediction accuracy is further improved to 83.8% for the dataset of 2640 proteins if the DSSP assignment used above is replaced by a more consistent consensus secondary structure assignment method. Comparison to the popular PSIPRED and CASP-winning structure-prediction techniques is made. SPINE X predicts the number of helices and sheets correctly for 21.0% of 1833 proteins, compared to 17.6% by PSIPRED. It is further shown that SPINE X consistently makes more accurate predictions in helical residues (6%) without overprediction, while PSIPRED makes more accurate predictions in coil residues (3-5%) and overpredicts them by 7%. The SPINE X server and its training/test datasets are available at http://sparks.informatics.iupui.edu/
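The Q3 figure quoted above is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The short Python sketch below shows that calculation under this standard definition; the function name `q3_accuracy` and the example strings are illustrative and are not taken from SPINE X.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues with a correctly predicted H/E/C label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the predicted state matches the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Made-up 17-residue example with two mismatched positions.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
print(f"Q3 = {q3_accuracy(predicted, observed):.3f}")  # prints "Q3 = 0.882"
```

Q3 treats every residue equally, so it does not reward getting segment boundaries or segment counts right; that is why segment-level measures are often reported alongside it.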
Affiliation(s)
- Eshel Faraggi
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, Indiana 46202, USA
- Tuo Zhang
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, Indiana 46202, USA
- Yuedong Yang
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, Indiana 46202, USA
- Lukasz Kurgan
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, Indiana 46202, USA
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada
- Yaoqi Zhou
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, Indiana 46202, USA
|
89
|
Errami M, Geourjon C, Deléage G. Conservation of Amino Acids into Multiple Alignments Involved in Pairwise Interactions in Three-Dimensional Protein Structures. J Bioinform Comput Biol 2012; 1:505-520. [PMID: 15307241 DOI: 10.1142/s0219720003000228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
We present an original strategy, involving a bioinformatic software structure, for performing an exhaustive and objective statistical analysis of three-dimensional structures of proteins. We establish the relationship between multiple sequence alignments and various structural features of proteins. We show that amino acids involved in disulfide bonds, salt bridges and hydrophobic interactions are particularly conserved. Effects of identity, global similarity within alignments, and accessibility of interactions have been studied. Furthermore, we point out that the more variable the sequences within a multiple alignment, the more informative the multiple alignment. The results support the usefulness of multiple alignments for predictions of structural features.
Affiliation(s)
- Mounir Errami
- Pôle Bioinfomatique Lyonnais-Institute de Biologie et Chimie des Protéines, Laboratoire de Bioinformatique et RMN structurales, Lyon cedex, France.
|
90
|
Gopalakrishnan V. Computer Aided Knowledge Discovery in Biomedicine. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
This chapter provides a perspective on three important collaborative areas in systems biology research. These areas represent biological problems of clinical significance. The first area deals with macromolecular crystallization, which is a crucial step in protein structure determination. The second area deals with proteomic biomarker discovery from high-throughput mass spectral technologies, while the third area is protein structure prediction and complex fold recognition from sequence and prior knowledge of structure properties. For each area, successful case studies are revisited from the perspective of computer-aided knowledge discovery using machine learning and statistical methods. Information about protein sequence, structure, and function is slowly accumulating in standardized forms within databases. Methods are needed to maximize the use of this prior information for prediction and analysis purposes. This chapter provides insights into such methods by which available information in existing databases can be processed and combined with systems biology expertise to expedite biomedical discoveries.
|
91
|
Gubbi J, Lai DTH, Palaniswami M, Parker M. Protein secondary structure prediction using support vector machines and a new feature representation. International Journal of Computational Intelligence and Applications 2011. [DOI: 10.1142/s1469026806002076] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Knowledge of the secondary structure and solvent accessibility of a protein plays a vital role in the prediction of fold, and eventually the tertiary structure of the protein. A challenging issue of predicting protein secondary structure from sequence alone is addressed. Support vector machines (SVM) are employed for the classification and the SVM outputs are converted to posterior probabilities for multi-class classification. The effect of using Chou–Fasman parameters and physico-chemical parameters along with evolutionary information in the form of position specific scoring matrix (PSSM) is analyzed. These proposed methods are tested on the RS126 and CB513 datasets. A new dataset is curated (PSS504) using recent release of CATH. On the CB513 dataset, sevenfold cross-validation accuracy of 77.9% was obtained using the proposed encoding method. A new method of calculating the reliability index based on the number of votes and the Support Vector Machine decision value is also proposed. A blind test on the EVA dataset gives an average Q3 accuracy of 74.5% and ranks in top five protein structure prediction methods. Supplementary material including datasets are available on .
Affiliation(s)
- Jayavardhana Gubbi
- Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria 3010, Australia
- Daniel T. H. Lai
- Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria 3010, Australia
- Marimuthu Palaniswami
- Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria 3010, Australia
- Michael Parker
- St. Vincent's Institute of Medical Research, 9 Princes Street, Fitzroy, Victoria 3065, Australia
|
92
|
Wei Y, Thompson J, Floudas CA. CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization. Proc Math Phys Eng Sci 2011. [DOI: 10.1098/rspa.2011.0514] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022] Open
Abstract
Most of the protein structure prediction methods use a multi-step process, which often includes secondary structure prediction, contact prediction, fragment generation, clustering, etc. For many years, secondary structure prediction has been the workhorse for numerous methods aimed at predicting protein structure and function. This paper presents a new mixed integer linear optimization (MILP)-based consensus method: a Consensus scheme based On a mixed integer liNear optimization method for seCOndary stRucture preDiction (CONCORD). Based on seven secondary structure prediction methods, SSpro, DSC, PROF, PROFphd, PSIPRED, Predator and GorIV, the MILP-based consensus method combines the strengths of different methods, maximizes the number of correctly predicted amino acids and achieves a better prediction accuracy. The method is shown to perform well compared with the seven individual methods when tested on the PDBselect25 training protein set using sixfold cross validation. It also performs well compared with another set of 10 online secondary structure prediction servers (including several recent ones) when tested on the CASP9 targets (http://predictioncenter.org/casp9/). The average Q3 prediction accuracy is 83.04 per cent for the sixfold cross validation of the PDBselect25 set and 82.3 per cent for the CASP9 targets. We have developed a MILP-based consensus method for protein secondary structure prediction. A web server, CONCORD, is available to the scientific community at http://helios.princeton.edu/CONCORD.
Affiliation(s)
- Y. Wei
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- J. Thompson
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
|
93
|
Li D, Li T, Cong P, Xiong W, Sun J. A novel structural position-specific scoring matrix for the prediction of protein secondary structures. Bioinformatics 2011; 28:32-9. [DOI: 10.1093/bioinformatics/btr611] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
|
94
|
Efficient Traversal of Beta-Sheet Protein Folding Pathways Using Ensemble Models. J Comput Biol 2011; 18:1635-47. [DOI: 10.1089/cmb.2011.0176] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022] Open
|
95
|
O'Donnell CW, Waldispühl J, Lis M, Halfmann R, Devadas S, Lindquist S, Berger B. A method for probing the mutational landscape of amyloid structure. Bioinformatics 2011; 27:i34-42. [PMID: 21685090 PMCID: PMC3117379 DOI: 10.1093/bioinformatics/btr238] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]
Abstract
Motivation: Proteins of all kinds can self-assemble into highly ordered β-sheet aggregates known as amyloid fibrils, important both biologically and clinically. However, the specific molecular structure of a fibril can vary dramatically depending on sequence and environmental conditions, and mutations can drastically alter amyloid function and pathogenicity. Experimental structure determination has proven extremely difficult with only a handful of NMR-based models proposed, suggesting a need for computational methods. Results: We present AmyloidMutants, a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability. Tested on non-mutant, full-length amyloid structures with known chemical shift data, AmyloidMutants offers roughly 2-fold improvement in prediction accuracy over existing tools. Moreover, AmyloidMutants is the only method to predict complete super-secondary structures, enabling accurate discrimination of topologically dissimilar amyloid conformations that correspond to the same sequence locations. Applied to mutant prediction, AmyloidMutants identifies a global conformational switch between Aβ and its highly-toxic ‘Iowa’ mutant in agreement with a recent experimental model based on partial chemical shift data. Predictions on mutant, yeast-toxic strains of HET-s suggest similar alternate folds. When applied to HET-s and a HET-s mutant with core asparagines replaced by glutamines (both highly amyloidogenic, chemically similar residues abundant in many amyloids), AmyloidMutants surprisingly predicts a greatly reduced capacity of the glutamine mutant to form amyloid. We confirm this finding by conducting mutagenesis experiments. Availability: Our tool is publicly available on the web at http://amyloid.csail.mit.edu/. Contact: lindquist_admin@wi.mit.edu; bab@csail.mit.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles W O'Donnell
- Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, USA
|
96
|
Kinch L, Yong Shi S, Cong Q, Cheng H, Liao Y, Grishin NV. CASP9 assessment of free modeling target predictions. Proteins 2011; 79 Suppl 10:59-73. [PMID: 21997521 DOI: 10.1002/prot.23181] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 04/11/2011] [Revised: 08/26/2011] [Accepted: 09/04/2011] [Indexed: 11/06/2022]
Abstract
We present an overview of the ninth round of Critical Assessment of Protein Structure Prediction (CASP9) "Template free modeling" category (FM). Prediction models were evaluated using a combination of established structural and sequence comparison measures and a novel automated method designed to mimic manual inspection by capturing both global and local structural features. These scores were compared to those assigned manually over a diverse subset of target domains. Scores were combined to compare overall performance of participating groups and to estimate rank significance. Moreover, we discuss a few examples of free modeling targets to highlight the progress and bottlenecks of current prediction methods. Notably, a server prediction model for a single target (T0581) improved significantly over the closest structure template (44% GDT increase). This accomplishment represents the "winner" of the CASP9 FM category. A number of human expert groups submitted slight variations of this model, highlighting a trend for human experts to act as "meta predictors" by correctly selecting among models produced by the top-performing automated servers. The details of evaluation are available at http://prodata.swmed.edu/CASP9/ .
Affiliation(s)
- Lisa Kinch
- Howard Hughes Medical Institute, University of Texas, Southwestern Medical Center, Dallas, TX 75390-9050, USA.
|
97
|
Cong Q, Kinch LN, Pei J, Shi S, Grishin VN, Li W, Grishin NV. An automatic method for CASP9 free modeling structure prediction assessment. Bioinformatics 2011; 27:3371-8. [PMID: 21994223 DOI: 10.1093/bioinformatics/btr572] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
Motivation: Manual inspection has been applied to and is well accepted for assessing critical assessment of protein structure prediction (CASP) free modeling (FM) category predictions over the years. Such manual assessment requires expertise and significant time investment, yet has the problems of being subjective and unable to differentiate models of similar quality. It is beneficial to incorporate the ideas behind manual inspection into an automatic score system, which could provide objective and reproducible assessment of structure models. Results: Inspired by our experience in CASP9 FM category assessment, we developed an automatic superimposition-independent method named Quality Control Score (QCS) for structure prediction assessment. QCS captures both global and local structural features, with emphasis on global topology. We applied this method to all FM targets from CASP9, and overall the results showed the best agreement with Manual Inspection Scores among automatic prediction assessment methods previously applied in CASPs, such as Global Distance Test Total Score (GDT_TS) and Contact Score (CS). As one of the important components to guide our assessment of CASP9 FM category predictions, this method correlates well with other scoring methods and yet is able to reveal good-quality models that are missed by GDT_TS. Availability: The script for QCS calculation is available at http://prodata.swmed.edu/QCS/. Contact: grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics online.
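For readers unfamiliar with GDT_TS, the score that QCS is compared against, it is commonly defined as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atoms lie within that cutoff of the reference structure. The Python sketch below is a simplification under a stated assumption: it takes per-residue distances from a single, already computed superposition, whereas the full procedure searches for the best superposition at each cutoff. The function name `gdt_ts` and the sample distances are illustrative only, and this is not the QCS script.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average percentage of residues within each distance cutoff (in Å).

    Assumes `distances` holds model-to-reference C-alpha distances from one
    fixed superposition, which is a simplification of the real GDT search.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff, then averaged over cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances for a five-residue model.
example = [0.5, 1.5, 3.0, 6.0, 10.0]
print(round(gdt_ts(example), 1))
```

Because the score depends only on how many residues fall under each cutoff, two models with different global topologies can receive similar values, which is the kind of limitation the abstract attributes to GDT_TS.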
Affiliation(s)
- Qian Cong
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA
|
98
|
Bouziane H, Messabih B, Chouarfia A. Profiles and majority voting-based ensemble method for protein secondary structure prediction. Evol Bioinform Online 2011; 7:171-89. [PMID: 22058650 PMCID: PMC3204938 DOI: 10.4137/ebo.s7931] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022] Open
Abstract
Machine learning techniques have been widely applied to solve the problem of predicting protein secondary structure from the amino acid sequence. They have gained substantial success in this research area. Many methods have been used including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which have attracted attention recently. Today, the main goal remains to improve the prediction quality of the secondary structure elements. The prediction accuracy has been continuously improved over the years, especially by using hybrid or ensemble methods and incorporating evolutionary information in the form of profiles extracted from alignments of multiple homologous sequences. In this paper, we investigate how best to combine k-NNs, ANNs and Multi-class SVMs (M-SVMs) to improve secondary structure prediction of globular proteins. An ensemble method which combines the outputs of two feed-forward ANNs, k-NN and three M-SVM classifiers has been applied. Ensemble members are combined using two variants of the majority voting rule. A heuristic-based filter has also been applied to refine the prediction. To investigate how much improvement the general ensemble method can give over the individual classifiers that make up the ensemble, we have experimented with the proposed system on the two widely used benchmark datasets RS126 and CB513 using cross-validation tests, including PSI-BLAST position-specific scoring matrix (PSSM) profiles as inputs. The experimental results reveal that the proposed system yields significant performance gains when compared with the best individual classifier.
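The combination step described above can be pictured with a plain per-residue majority vote. The Python sketch below is an illustration only: the abstract does not specify its two voting variants, so the function name `majority_vote` and the tie-breaking choice (fall back to coil, `C`) are assumptions made here.

```python
from collections import Counter


def majority_vote(predictions: list[str], tie_break: str = "C") -> str:
    """Combine equal-length H/E/C strings by per-residue plurality vote.

    Ties are resolved with `tie_break`, an assumption of this sketch.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        top_count = ranked[0][1]
        # Collect every label that shares the highest vote count.
        winners = [label for label, count in ranked if count == top_count]
        consensus.append(winners[0] if len(winners) == 1 else tie_break)
    return "".join(consensus)


# Three hypothetical classifier outputs for a six-residue fragment.
members = ["HHHECC", "HHEECC", "HCEEHC"]
print(majority_vote(members))  # prints "HHEECC"
```

A weighted variant would replace the raw counts with per-classifier weights, for example each member's validation accuracy, before picking the top label.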
Affiliation(s)
- Hafida Bouziane
- Department of Computer Science, USTO-MB University, BP 1505 El Mnaouer, Oran, Algeria
|
99
|
Qu W, Sui H, Yang B, Qian W. Improving protein secondary structure prediction using a multi-modal BP method. Comput Biol Med 2011; 41:946-59. [DOI: 10.1016/j.compbiomed.2011.08.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 03/16/2011] [Revised: 07/18/2011] [Accepted: 08/01/2011] [Indexed: 11/26/2022]
|
100
|
Qu W, Yang B, Jiang W, Wang L. HYBP_PSSP: a hybrid back propagation method for predicting protein secondary structure. Neural Comput Appl 2011. [DOI: 10.1007/s00521-011-0739-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
|