151
|
Dehzangi A, Paliwal K, Lyons J, Sharma A, Sattar A. A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:510-519. [PMID: 26356019 DOI: 10.1109/tcbb.2013.2296317] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Protein fold recognition (PFR) is considered as an important step towards the protein structure prediction problem. Despite all the efforts that have been made so far, finding an accurate and fast computational approach to solve the PFR still remains a challenging problem for bioinformatics and computational biology. In this study, we propose the concept of segmented-based feature extraction technique to provide local evolutionary information embedded in position specific scoring matrix (PSSM) and structural information embedded in the predicted secondary structure of proteins using SPINE-X. We also employ the concept of occurrence feature to extract global discriminatory information from PSSM and SPINE-X. By applying a support vector machine (SVM) to our extracted features, we enhance the protein fold prediction accuracy for 7.4 percent over the best results reported in the literature. We also report 73.8 percent prediction accuracy for a data set consisting of proteins with less than 25 percent sequence similarity rates and 80.7 percent prediction accuracy for a data set with proteins belonging to 110 folds with less than 40 percent sequence similarity rates. We also investigate the relation between the number of folds and the number of features being used and show that the number of features should be increased to get better protein fold prediction results when the number of folds is relatively large.
Collapse
|
152
|
Zheng W, Zhang C, Hanlon M, Ruan J, Gao J. An ensemble method for prediction of conformational B-cell epitopes from antigen sequences. Comput Biol Chem 2014; 49:51-8. [DOI: 10.1016/j.compbiolchem.2014.02.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2013] [Revised: 01/26/2014] [Accepted: 02/10/2014] [Indexed: 12/12/2022]
|
153
|
Feng Y, Lin H, Luo L. Prediction of protein secondary structure using feature selection and analysis approach. Acta Biotheor 2014; 62:1-14. [PMID: 24052343 DOI: 10.1007/s10441-013-9203-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2012] [Accepted: 08/24/2013] [Indexed: 01/09/2023]
Abstract
The prediction of the secondary structure of a protein from its amino acid sequence is an important step towards the prediction of its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is about 80% currently, which is still far from satisfactory. In this study, we proposed a novel method that uses binomial distribution to optimize tetrapeptide structural words and increment of diversity with quadratic discriminant to perform prediction for protein three-state secondary structure. A benchmark dataset including 2,640 proteins with sequence identity of less than 25% was used to train and test the proposed method. The results indicate that overall accuracy of 87.8% was achieved in secondary structure prediction by using ten-fold cross-validation. Moreover, the accuracy of predicted secondary structures ranges from 84 to 89% at the level of residue. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words which affect the accuracy of predicted secondary structures.
Collapse
|
154
|
Ahmed MH, Kellogg GE, Selley DE, Safo MK, Zhang Y. Predicting the molecular interactions of CRIP1a-cannabinoid 1 receptor with integrated molecular modeling approaches. Bioorg Med Chem Lett 2014; 24:1158-65. [PMID: 24461351 PMCID: PMC4353595 DOI: 10.1016/j.bmcl.2013.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2013] [Revised: 12/26/2013] [Accepted: 12/29/2013] [Indexed: 12/14/2022]
Abstract
Cannabinoid receptors are a family of G-protein coupled receptors that are involved in a wide variety of physiological processes and diseases. One of the key regulators that are unique to cannabinoid receptors is the cannabinoid receptor interacting proteins (CRIPs). Among them CRIP1a was found to decrease the constitutive activity of the cannabinoid type-1 receptor (CB1R). The aim of this study is to gain an understanding of the interaction between CRIP1a and CB1R through using different computational techniques. The generated model demonstrated several key putative interactions between CRIP1a and CB1R, including the critical involvement of Lys130 in CRIP1a.
Collapse
Affiliation(s)
- Mostafa H Ahmed
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Glen E Kellogg
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA; Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Dana E Selley
- Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Martin K Safo
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Yan Zhang
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA.
| |
Collapse
|
155
|
Maurice KJ. SSThread: Template-free protein structure prediction by threading pairs of contacting secondary structures followed by assembly of overlapping pairs. J Comput Chem 2014; 35:644-56. [PMID: 24523210 DOI: 10.1002/jcc.23543] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2013] [Revised: 11/15/2013] [Accepted: 01/05/2014] [Indexed: 11/12/2022]
Abstract
Acquiring the three-dimensional structure of a protein from its amino acid sequence alone, despite a great deal of work and significant progress on the subject, is still an unsolved problem. SSThread, a new template-free algorithm is described here that consists of making several predictions of contacting pairs of α-helices and β-strands derived from a database of experimental structures using a knowledge-based potential, secondary structure prediction, and contact map prediction followed by assembly of overlapping pair predictions to create an ensemble of core structure predictions whose loops are then predicted. In a set of seven CASP10 targets SSThread outperformed the two leading methods for two targets each. The targets were all β-strand containing structures and most of them have a high relative contact order which demonstrates the advantages of SSThread. The primary bottlenecks based on sets of 74 and 21 test cases are the pair prediction and loop prediction stages.
Collapse
|
156
|
Dehzangi A, Paliwal K, Lyons J, Sharma A, Sattar A. Proposing a highly accurate protein structural class predictor using segmentation-based features. BMC Genomics 2014; 15 Suppl 1:S2. [PMID: 24564476 PMCID: PMC4046757 DOI: 10.1186/1471-2164-15-s1-s2] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prediction of the structural classes of proteins can provide important information about their functionalities as well as their major tertiary structures. It is also considered as an important step towards protein structure prediction problem. Despite all the efforts have been made so far, finding a fast and accurate computational approach to solve protein structural class prediction problem still remains a challenging problem in bioinformatics and computational biology. RESULTS In this study we propose segmented distribution and segmented auto covariance feature extraction methods to capture local and global discriminatory information from evolutionary profiles and predicted secondary structure of the proteins. By applying SVM to our extracted features, for the first time we enhance the protein structural class prediction accuracy to over 90% and 85% for two popular low-homology benchmarks that have been widely used in the literature. We report 92.2% and 86.3% prediction accuracies for 25PDB and 1189 benchmarks which are respectively up to 7.9% and 2.8% better than previously reported results for these two benchmarks. CONCLUSION By proposing segmented distribution and segmented auto covariance feature extraction methods to capture local and global discriminatory information from evolutionary profiles and predicted secondary structure of the proteins, we are able to enhance the protein structural class prediction performance significantly.
Collapse
|
157
|
Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes. PLoS One 2014; 9:e86703. [PMID: 24475169 PMCID: PMC3901691 DOI: 10.1371/journal.pone.0086703] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2013] [Accepted: 12/10/2013] [Indexed: 11/22/2022] Open
Abstract
Developing an efficient method for determination of the DNA-binding proteins, due to their vital roles in gene regulation, is becoming highly desired since it would be invaluable to advance our understanding of protein functions. In this study, we proposed a new method for the prediction of the DNA-binding proteins, by performing the feature rank using random forest and the wrapper-based feature selection using forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 by the proposed model trained on the entire PDB594 dataset and by other five existing methods (including iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader) were performed, resulting in that the proposed DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. The independent tests performed by the proposed DBPPred on completely a large non-DNA binding protein dataset and two RNA binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that majority of the selected features by the proposed method are statistically significantly different between the mean feature values of the DNA-binding and the non DNA-binding proteins. All of the experimental results indicate that the proposed DBPPred can be an alternative perspective predictor for large-scale determination of DNA-binding proteins.
Collapse
|
158
|
Folkman L, Stantic B, Sattar A. Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins. BMC Genomics 2014; 15 Suppl 1:S4. [PMID: 24564514 PMCID: PMC4046685 DOI: 10.1186/1471-2164-15-s1-s4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023] Open
Abstract
Background Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. Prediction performance of these methods is evaluated on mutations unseen during training. Nevertheless, different mutations of the same protein, and even the same residue, as encountered during training are commonly used for evaluation. We argue that a faithful evaluation can be achieved only when a method is tested on previously unseen proteins with low sequence similarity to the training set. Results We provided experimental evidence of the limitations of the evaluation commonly used for assessing the prediction performance. Furthermore, we demonstrated that the prediction of stability changes in previously unseen non-homologous proteins is a challenging task for currently available methods. To improve the prediction performance of our previously proposed method, we identified features which led to over-fitting and further extended the model with new features. The new method employs Evolutionary And Structural Encodings with Amino Acid parameters (EASE-AA). Evaluated with an independent test set of more than 600 mutations, EASE-AA yielded a Matthews correlation coefficient of 0.36 and was able to classify correctly 66% of the stabilising and 74% of the destabilising mutations. For real-value prediction, EASE-AA achieved the correlation of predicted and experimentally measured stability changes of 0.51. Conclusions Commonly adopted evaluation with mutations in the same protein, and even the same residue, randomly divided between the training and test sets lead to an overestimation of prediction performance. Therefore, stability changes prediction methods should be evaluated only on mutations in previously unseen non-homologous proteins. Under such an evaluation, EASE-AA predicts stability changes more reliably than currently available methods. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-S1-S4) contains supplementary material, which is available to authorized users.
Collapse
|
159
|
Yang Y, Zhao H, Wang J, Zhou Y. SPOT-Seq-RNA: predicting protein-RNA complex structure and RNA-binding function by fold recognition and binding affinity prediction. Methods Mol Biol 2014; 1137:119-30. [PMID: 24573478 PMCID: PMC3937850 DOI: 10.1007/978-1-4939-0366-5_9] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
RNA-binding proteins (RBPs) play key roles in RNA metabolism and post-transcriptional regulation. Computational methods have been developed separately for prediction of RBPs and RNA-binding residues by machine-learning techniques and prediction of protein-RNA complex structures by rigid or semiflexible structure-to-structure docking. Here, we describe a template-based technique called SPOT-Seq-RNA that integrates prediction of RBPs, RNA-binding residues, and protein-RNA complex structures into a single package. This integration is achieved by combining template-based structure-prediction software, SPARKS X, with binding affinity prediction software, DRNA. This tool yields reasonable sensitivity (46 %) and high precision (84 %) for an independent test set of 215 RBPs and 5,766 non-RBPs. SPOT-Seq-RNA is computationally efficient for genome-scale prediction of RBPs and protein-RNA complex structures. Its application to human genome study has revealed a similar sensitivity and ability to uncover hundreds of novel RBPs beyond simple homology. The online server and downloadable version of SPOT-Seq-RNA are available at http://sparks-lab.org/server/SPOT-Seq-RNA/.
Collapse
Affiliation(s)
- Yuedong Yang
- School of Informatics, Indiana University Purdue University, Indianapolis, IN, USA
| | | | | | | |
Collapse
|
160
|
Belushkin AA, Vinogradov DV, Gelfand MS, Osterman AL, Cieplak P, Kazanov MD. Sequence-derived structural features driving proteolytic processing. Proteomics 2013; 14:42-50. [PMID: 24227478 DOI: 10.1002/pmic.201300416] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2013] [Revised: 10/22/2013] [Accepted: 10/28/2013] [Indexed: 12/11/2022]
Abstract
Proteolytic signaling, or regulated proteolysis, is an essential part of many important pathways such as Notch, Wnt, and Hedgehog. How the structure of the cleaved substrate regions influences the efficacy of proteolytic processing remains underexplored. Here, we analyzed the relative importance in proteolysis of various structural features derived from substrate sequences using a dataset of more than 5000 experimentally verified proteolytic events captured in CutDB. Accessibility to the solvent was recognized as an essential property of a proteolytically processed polypeptide chain. Proteolytic events were found nearly uniformly distributed among three types of secondary structure, although with some enrichment in loops. Cleavages in α-helices were found to be relatively abundant in regions apparently prone to unfolding, while cleavages in β-structures tended to be located at the periphery of β-sheets. Application of the same statistical procedures to proteolytic events divided into separate sets according to the catalytic classes of proteases proved consistency of the results and confirmed that the structural mechanisms of proteolysis are universal. The estimated prediction power of sequence-derived structural features, which turned out to be sufficiently high, presents a rationale for their use in bioinformatic prediction of proteolytic events.
Collapse
Affiliation(s)
- Alexander A Belushkin
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
| | | | | | | | | | | |
Collapse
|
161
|
Minhas FUAA, Geiss BJ, Ben-Hur A. PAIRpred: partner-specific prediction of interacting residues from sequence and structure. Proteins 2013; 82:1142-55. [PMID: 24243399 DOI: 10.1002/prot.24479] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2013] [Revised: 11/04/2013] [Accepted: 11/09/2013] [Indexed: 11/10/2022]
Abstract
We present a novel partner-specific protein-protein interaction site prediction method called PAIRpred. Unlike most existing machine learning binding site prediction methods, PAIRpred uses information from both proteins in a protein complex to predict pairs of interacting residues from the two proteins. PAIRpred captures sequence and structure information about residue pairs through pairwise kernels that are used for training a support vector machine classifier. As a result, PAIRpred presents a more detailed model of protein binding, and offers state of the art accuracy in predicting binding sites at the protein level as well as inter-protein residue contacts at the complex level. We demonstrate PAIRpred's performance on Docking Benchmark 4.0 and recent CAPRI targets. We present a detailed performance analysis outlining the contribution of different sequence and structure features, together with a comparison to a variety of existing interface prediction techniques. We have also studied the impact of binding-associated conformational change on prediction accuracy and found PAIRpred to be more robust to such structural changes than existing schemes. As an illustration of the potential applications of PAIRpred, we provide a case study in which PAIRpred is used to analyze the nature and specificity of the interface in the interaction of human ISG15 protein with NS1 protein from influenza A virus. Python code for PAIRpred is available at http://combi.cs.colostate.edu/supplements/pairpred/.
Collapse
|
162
|
Mitra P, Shultis D, Brender JR, Czajka J, Marsh D, Gray F, Cierpicki T, Zhang Y. An evolution-based approach to De Novo protein design and case study on Mycobacterium tuberculosis. PLoS Comput Biol 2013; 9:e1003298. [PMID: 24204234 PMCID: PMC3812052 DOI: 10.1371/journal.pcbi.1003298] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2012] [Accepted: 09/09/2013] [Indexed: 01/31/2023] Open
Abstract
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. The goal of computational protein design is to create new protein sequences of desirable structure and biological function. Most protein design methods are developed to search for sequences with the lowest free-energy based on physics-based force fields following Anfinsen's thermodynamic hypothesis. A major obstacle of such approaches is the inaccuracy of the force-field design, which cannot accurately describe atomic interactions or correctly recognize protein folds. We propose a novel method which uses evolutionary information, in the form of sequence profiles from structure families, to guide the sequence design. Since sequence profiles are generally more accurate than physics-based potentials in protein fold recognition, a unique advantage lies on that it targets the design procedure to a family of protein sequence profiles to enhance the robustness of designed sequences. The method was tested on 87 proteins and the designed sequences can be folded by I-TASSER to models with an average RMSD 2.1 Å. As a case study of large-scale application, the method is extended to redesign all structurally resolved proteins in the human pathogenic bacteria, Mycobacterium tuberculosis. Five sequences varying in fold and sizes were characterized by circular dichroism and NMR spectroscopy experiments and three were shown to have ordered tertiary structure.
Collapse
Affiliation(s)
- Pralay Mitra
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - David Shultis
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Jeffrey R. Brender
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Jeff Czajka
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - David Marsh
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Felicia Gray
- Department of Pathology, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Tomasz Cierpicki
- Department of Pathology, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
- * E-mail:
| |
Collapse
|
163
|
Zhao H, Yang Y, Zhou Y. Prediction of RNA binding proteins comes of age from low resolution to high resolution. MOLECULAR BIOSYSTEMS 2013; 9:2417-25. [PMID: 23872922 PMCID: PMC3870025 DOI: 10.1039/c3mb70167k] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Networks of protein-RNA interactions is likely to be larger than protein-protein and protein-DNA interaction networks because RNA transcripts are encoded tens of times more than proteins (e.g. only 3% of human genome coded for proteins), have diverse function and localization, and are controlled by proteins from birth (transcription) to death (degradation). This massive network is evidenced by several recent experimental discoveries of large numbers of previously unknown RNA-binding proteins (RBPs). Meanwhile, more than 400 non-redundant protein-RNA complex structures (at 25% sequence identity or less) have been deposited into the protein databank. These sequences and structural resources for RBPs provide ample data for the development of computational techniques dedicated to RBP prediction, as experimentally determining RNA-binding functions is time-consuming and expensive. This review compares traditional machine-learning based approaches with emerging template-based methods at several levels of prediction resolution ranging from two-state binding/non-binding prediction, to binding residue prediction and protein-RNA complex structure prediction. The analysis indicates that the two approaches are complementary and their combinations may lead to further improvements.
Collapse
Affiliation(s)
- Huiying Zhao
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA.
| | | | | |
Collapse
|
164
|
Liu R, Hu J. DNABind: A hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches. Proteins 2013; 81:1885-99. [DOI: 10.1002/prot.24330] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2013] [Revised: 05/02/2013] [Accepted: 05/12/2013] [Indexed: 01/10/2023]
Affiliation(s)
- Rong Liu
- Department of Computer Science and Engineering; University of South Carolina; Columbia South Carolina 29208
- Center for Bioinformatics; College of Life Science and Technology; Huazhong Agricultural University; Wuhan 430070 People's Republic of China
| | - Jianjun Hu
- Department of Computer Science and Engineering; University of South Carolina; Columbia South Carolina 29208
| |
Collapse
|
165
|
Novel GUCA1A mutations suggesting possible mechanisms of pathogenesis in cone, cone-rod, and macular dystrophy patients. BIOMED RESEARCH INTERNATIONAL 2013; 2013:517570. [PMID: 24024198 PMCID: PMC3759255 DOI: 10.1155/2013/517570] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/18/2013] [Accepted: 06/19/2013] [Indexed: 01/06/2023]
Abstract
Here, we report two novel GUCA1A (the gene for guanylate cyclase activating protein 1) mutations identified in unrelated Spanish families affected by autosomal dominant retinal degeneration (adRD) with cone and rod involvement. All patients from a three-generation adRD pedigree underwent detailed ophthalmic evaluation. Total genome scan using single-nucleotide polymorphisms and then the linkage analysis were undertaken on the pedigree. Haplotype analysis revealed a 55.37 Mb genomic interval cosegregating with the disease phenotype on chromosome 6p21.31-q15. Mutation screening of positional candidate genes found a heterozygous transition c.250C>T in exon 4 of GUCA1A, corresponding to a novel mutation p.L84F. A second missense mutation, c.320T>C (p.I107T), was detected by screening of the gene in a Spanish patients cohort. Using bioinformatics approach, we predicted that either haploinsufficiency or dominant-negative effect accompanied by creation of a novel function for the mutant protein is a possible mechanism of the disease due to c.250C>T and c.320T>C. Although additional functional studies are required, our data in relation to the c.250C>T mutation open the possibility that transacting factors binding to de novo created recognition site resulting in formation of aberrant splicing variant is a disease model which may be more widespread than previously recognized as a mechanism causing inherited RD.
Collapse
|
166
|
Saraswathi S, Fernández-Martínez JL, Koliński A, Jernigan RL, Kloczkowski A. Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure. J Mol Model 2013; 19:4337-48. [PMID: 23907551 DOI: 10.1007/s00894-013-1911-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2013] [Accepted: 06/05/2013] [Indexed: 11/27/2022]
Abstract
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data to obtain better and faster convergence to more accurate secondary structure predicted results. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements: α-helix, β-strand and coil but rarely have the results been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in the secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.
Collapse
Affiliation(s)
- S Saraswathi
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, 700 Children's Drive, Columbus, OH, USA
| | | | | | | | | |
Collapse
|
167
|
Kubrycht J, Sigler K, Souček P, Hudeček J. Structures composing protein domains. Biochimie 2013; 95:1511-24. [DOI: 10.1016/j.biochi.2013.04.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2013] [Accepted: 04/02/2013] [Indexed: 12/21/2022]
|
168
|
Maksimov MO, Link AJ. Discovery and Characterization of an Isopeptidase That Linearizes Lasso Peptides. J Am Chem Soc 2013; 135:12038-47. [DOI: 10.1021/ja4054256] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Mikhail O. Maksimov
- Departments of Chemical and Biological Engineering and ‡Molecular Biology, Princeton University, Princeton, New Jersey 08544,
United States
| | - A. James Link
- Departments of Chemical and Biological Engineering and ‡Molecular Biology, Princeton University, Princeton, New Jersey 08544,
United States
| |
Collapse
|
169
|
Zhao H, Yang Y, Lin H, Zhang X, Mort M, Cooper DN, Liu Y, Zhou Y. DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels. Genome Biol 2013; 14:R23. [PMID: 23497682 PMCID: PMC4053752 DOI: 10.1186/gb-2013-14-3-r23] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2012] [Accepted: 03/13/2013] [Indexed: 02/07/2023] Open
Abstract
Micro-indels (insertions or deletions shorter than 21 bps) constitute the second most frequent class of human gene mutation after single nucleotide variants. Despite the relative abundance of non-frameshifting indels, their damaging effect on protein structure and function has gone largely unstudied. We have developed a support vector machine-based method named DDIG-in (Detecting disease-causing genetic variations due to indels) to prioritize non-frameshifting indels by comparing disease-associated mutations with putatively neutral mutations from the 1,000 Genomes Project. The final model gives good discrimination for indels and is robust against annotation errors. A webserver implementing DDIG-in is available at http://sparks-lab.org/ddig.
Collapse
|
170
|
Li Z, Yang Y, Zhan J, Dai L, Zhou Y. Energy functions in de novo protein design: current challenges and future prospects. Annu Rev Biophys 2013; 42:315-35. [PMID: 23451890 DOI: 10.1146/annurev-biophys-083012-130315] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
In the past decade, a concerted effort to successfully capture specific tertiary packing interactions produced specific three-dimensional structures for many de novo designed proteins that are validated by nuclear magnetic resonance and/or X-ray crystallographic techniques. However, the success rate of computational design remains low. In this review, we provide an overview of experimentally validated, de novo designed proteins and compare four available programs, RosettaDesign, EGAD, Liang-Grishin, and RosettaDesign-SR, by assessing designed sequences computationally. Computational assessment includes the recovery of native sequences, the calculation of sizes of hydrophobic patches and total solvent-accessible surface area, and the prediction of structural properties such as intrinsic disorder, secondary structures, and three-dimensional structures. This computational assessment, together with a recent community-wide experiment in assessing scoring functions for interface design, suggests that the next-generation protein-design scoring function will come from the right balance of complementary interaction terms. Such balance may be found when more negative experimental data become available as part of a training set.
Collapse
Affiliation(s)
- Zhixiu Li
- School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
| | | | | | | | | |
Collapse
|
171
|
Dehzangi A, Paliwal K, Lyons J, Sharma A, Sattar A. Enhancing Protein Fold Prediction Accuracy Using Evolutionary and Structural Features. PATTERN RECOGNITION IN BIOINFORMATICS 2013. [DOI: 10.1007/978-3-642-39159-0_18] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
|
172
|
Saraswathi S, Fernández-Martínez JL, Kolinski A, Jernigan RL, Kloczkowski A. Fast learning optimized prediction methodology (FLOPRED) for protein secondary structure prediction. J Mol Model 2012; 18:4275-89. [PMID: 22562230 PMCID: PMC3694724 DOI: 10.1007/s00894-012-1410-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2012] [Accepted: 03/19/2012] [Indexed: 10/28/2022]
Abstract
Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computational methods to predict structures and identify their functions from the sequence. Developing methods that will satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, including drug development and discovery of biomarkers. A novel method called fast learning optimized prediction methodology (FLOPRED) is proposed for predicting protein secondary structure, using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data that yield better and faster convergence to produce more accurate results. Protein secondary structures are predicted reliably, more efficiently and more accurately using FLOPRED. These techniques yield superior classification of secondary structure elements, with a training accuracy ranging between 83 % and 87 % over a widerange of hidden neurons and a cross-validated testing accuracy ranging between 81 % and 84 % and a segment overlap (SOV) score of 78 % that are obtained with different sets of proteins. These results are comparable to other recently published studies, but are obtained with greater efficiencies, in terms of time and cost.
Collapse
Affiliation(s)
- Saras Saraswathi
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH, USA
| | | | - Andrzej Kolinski
- Department of Mathematics, Laboratory of Theory of Biopolymers, Faculty of Chemistry, Warsaw University, Pasteura 1, 02-093 Warsaw
| | - Robert L. Jernigan
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
| | - Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH, USA, Tel.: +1-614-722-3880 Fax: +1-614-355-2728
| |
Collapse
|
173
|
Zhang T, Faraggi E, Xue B, Dunker AK, Uversky VN, Zhou Y. SPINE-D: accurate prediction of short and long disordered regions by a single neural-network based method. J Biomol Struct Dyn 2012; 29:799-813. [PMID: 22208280 PMCID: PMC3297974 DOI: 10.1080/073911012010525022] [Citation(s) in RCA: 138] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Abstract
Short and long disordered regions of proteins have different preference for different amino acid residues. Different methods often have to be trained to predict them separately. In this study, we developed a single neural-network-based technique called SPINE-D that makes a three-state prediction first (ordered residues and disordered residues in short and long disordered regions) and reduces it into a two-state prediction afterwards. SPINE-D was tested on various sets composed of different combinations of Disprot annotated proteins and proteins directly from the PDB annotated for disorder by missing coordinates in X-ray determined structures. While disorder annotations are different according to Disprot and X-ray approaches, SPINE-D's prediction accuracy and ability to predict disorder are relatively independent of how the method was trained and what type of annotation was employed but strongly depend on the balance in the relative populations of ordered and disordered residues in short and long disordered regions in the test set. With greater than 85% overall specificity for detecting residues in both short and long disordered regions, the residues in long disordered regions are easier to predict at 81% sensitivity in a balanced test dataset with 56.5% ordered residues but more challenging (at 65% sensitivity) in a test dataset with 90% ordered residues. Compared to eleven other methods, SPINE-D yields the highest area under the curve (AUC), the highest Mathews correlation coefficient for residue-based prediction, and the lowest mean square error in predicting disorder contents of proteins for an independent test set with 329 proteins. In particular, SPINE-D is comparable to a meta predictor in predicting disordered residues in long disordered regions and superior in short disordered regions. SPINE-D participated in CASP 9 blind prediction and is one of the top servers according to the official ranking. In addition, SPINE-D was examined for prediction of functional molecular recognition motifs in several case studies.
Collapse
Affiliation(s)
- Tuo Zhang
- School of Informatics, Indiana University Purdue University, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Eshel Faraggi
- School of Informatics, Indiana University Purdue University, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Bin Xue
- Department of Molecular Medicine, University of South Florida, Tampa, FL 33612, USA
| | - A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Vladimir N. Uversky
- Department of Molecular Medicine, University of South Florida, Tampa, FL 33612, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
| | - Yaoqi Zhou
- School of Informatics, Indiana University Purdue University, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| |
Collapse
|
174
|
Yang Y, Faraggi E, Zhao H, Zhou Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011; 27:2076-82. [PMID: 21666270 DOI: 10.1093/bioinformatics/btr350] [Citation(s) in RCA: 245] [Impact Index Per Article: 18.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION In recent years, development of a single-method fold-recognition server lags behind consensus and multiple template techniques. However, a good consensus prediction relies on the accuracy of individual methods. This article reports our efforts to further improve a single-method fold recognition technique called SPARKS by changing the alignment scoring function and incorporating the SPINE-X techniques that make improved prediction of secondary structure, backbone torsion angle and solvent accessible surface area. RESULTS The new method called SPARKS-X was tested with the SALIGN benchmark for alignment accuracy, Lindahl and SCOP benchmarks for fold recognition, and CASP 9 blind test for structure prediction. The method is compared to several state-of-the-art techniques such as HHPRED and BoostThreader. Results show that SPARKS-X is one of the best single-method fold recognition techniques. We further note that incorporating multiple templates and refinement in model building will likely further improve SPARKS-X. AVAILABILITY The method is available as a SPARKS-X server at http://sparks.informatics.iupui.edu/
Collapse
Affiliation(s)
- Yuedong Yang
- School of Informatics, Indiana University Purdue University, Indianapolis, IN 46202, USA
| | | | | | | |
Collapse
|