Reference Citation Analysis

For: Cheng H, Sen TZ, Kloczkowski A, Margaritis D, Jernigan RL. Prediction of protein secondary structure by mining structural fragment database. POLYMER 2005;46:4314-4321. [PMID: 19081746 DOI: 10.1016/j.polymer.2005.02.040] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022]

Cited by Other Article(s)

Long S, Tian P. Protein secondary structure prediction with context convolutional neural network. RSC Adv 2019;9:38391-38396. [PMID: 35540205 PMCID: PMC9075825 DOI: 10.1039/c9ra05218f] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/09/2019] [Accepted: 11/18/2019] [Indexed: 11/21/2022]

Kandoi G, Leelananda SP, Jernigan RL, Sen TZ. Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information. Methods Mol Biol 2017;1484:35-44. [PMID: 27787818 DOI: 10.1007/978-1-4939-6406-2_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]

Rashid S, Saraswathi S, Kloczkowski A, Sundaram S, Kolinski A. Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach. BMC Bioinformatics 2016;17:362. [PMID: 27618812 PMCID: PMC5020447 DOI: 10.1186/s12859-016-1209-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/07/2015] [Accepted: 08/25/2016] [Indexed: 11/17/2022]

Abstract

BACKGROUND

Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods developed on large datasets, the estimated upper limit of accuracy has yet to be reached. Since the predictions of SSP methods are used as input to higher-level structure prediction pipelines, even small errors may cause large perturbations in the final models. Previous works relied on cross-validation to estimate classifier accuracy. However, training on large numbers of protein chains compromises the classifier's ability to generalize to new sequences. This motivates a novel approach to training and an investigation into the structural factors that lead to poor predictions. Here, a small group of 55 proteins, termed the compact model, is selected from the CB513 dataset using a heuristics-based approach. In prior work, all sequences were represented as probability matrices of residues adopting each of the Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with the CABS force field and the residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins.

RESULTS

The performance of the compact model is compared with traditional cross-validated accuracies and blind-tested on a dataset of G Switch proteins, obtaining accuracies of ∼81%. The model outperforms several techniques in the literature. A comparative case study of the worst-performing chain identifies hydrogen bond contacts that lead to Coil ⇔ Sheet misclassifications. Overall, mispredicted Coil residues have a higher propensity to participate in backbone hydrogen bonding than correctly predicted Coils.

CONCLUSIONS

The implications of these findings are: (i) the choice of training proteins is important for preserving a classifier's ability to generalize and predict new sequences accurately, and (ii) SSP techniques that can distinguish backbone hydrogen bonding from side-chain or water-mediated hydrogen bonding might be needed to reduce Coil ⇔ Sheet misclassifications.
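The abstract contrasts a small, deliberately chosen training set with cross-validation as the usual way of estimating accuracy. For readers unfamiliar with the baseline, a minimal sketch of a k-fold split is shown below, assuming the split is made over whole protein chains so that residues of one chain never appear in both the training and the test fold. The function name, the chain identifiers and the seeded shuffle are hypothetical illustration choices, not the procedure used by Rashid et al.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_splits(items: Sequence[str], k: int = 5,
                  seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Partition items (e.g. hypothetical chain IDs) into k folds.

    Returns one (train, test) pair per fold; every item is tested exactly once.
    """
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    # Fold i takes every k-th item starting at offset i, so sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits
```

Averaging a classifier's accuracy over the k test folds gives the cross-validated estimate; the compact-model approach instead fixes a small training set and reports blind-test accuracy separately.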

Maier K, He Y, Esser PR, Thriene K, Sarca D, Kohlhase J, Dengjel J, Martin L, Has C. Single Amino Acid Deletion in Kindlin-1 Results in Partial Protein Degradation Which Can Be Rescued by Chaperone Treatment. J Invest Dermatol 2016;136:920-929. [DOI: 10.1016/j.jid.2015.12.039] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/31/2015] [Revised: 11/30/2015] [Accepted: 12/19/2015] [Indexed: 10/22/2022]

Pinilla G, Muñoz LC, Salazar LM, Navarrete J, Guevara A. DISEÑO DE PÉPTIDOS BASADO EN LA SECUENCIA ANÁLOGA AL REPRESOR NEGATIVO icaR DE Staphylococcus sp. [Design of peptides based on the sequence analogous to the negative repressor icaR of Staphylococcus sp.] REVISTA COLOMBIANA DE QUÍMICA 2016. [DOI: 10.15446/rev.colomb.quim.v44n2.55213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Shirani A, Shahbazi Mojarrad J, Mussa Farkhani S, Yari Khosroshahi A, Zakeri-Milani P, Samadi N, Sharifi S, Mohammadi S, Valizadeh H. The Relation Between Thermodynamic and Structural Properties and Cellular Uptake of Peptides Containing Tryptophan and Arginine. Adv Pharm Bull 2015;5:161-8. [PMID: 26236653 DOI: 10.15171/apb.2015.023] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/26/2014] [Revised: 11/20/2014] [Accepted: 11/24/2014] [Indexed: 01/31/2023]

Ahmed MH, Kellogg GE, Selley DE, Safo MK, Zhang Y. Predicting the molecular interactions of CRIP1a-cannabinoid 1 receptor with integrated molecular modeling approaches. Bioorg Med Chem Lett 2014;24:1158-65. [PMID: 24461351 PMCID: PMC4353595 DOI: 10.1016/j.bmcl.2013.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/11/2013] [Revised: 12/26/2013] [Accepted: 12/29/2013] [Indexed: 12/14/2022]

Saraswathi S, Fernández-Martínez JL, Koliński A, Jernigan RL, Kloczkowski A. Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure. J Mol Model 2013;19:4337-48. [PMID: 23907551 DOI: 10.1007/s00894-013-1911-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/20/2013] [Accepted: 06/05/2013] [Indexed: 11/27/2022]

Abstract

Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, efficient and fast protein secondary structure prediction methods are essential for a broad understanding of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods initially rely on secondary structure predictions. Recently, we developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are applied to these data to obtain faster convergence and more accurate secondary structure predictions. A five-fold cross-validated testing accuracy of 83.8% and a segment overlap (SOV) score of 78.3% are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements (α-helix, β-strand and coil), but the results have rarely been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to describe in detail the behavior of different amino acid types in secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position-specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. These detailed results suggest several important ways in which secondary structure prediction can be improved, which might in turn lead to improved protein design and engineering.
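The per-amino-acid analysis described above starts from a simple tally of correct predictions broken down by residue type. A minimal sketch follows, assuming observed and predicted states are one-letter strings aligned to the sequence (H for α-helix, E for β-strand, C for coil). The function name and the encoding are hypothetical, and FLOPRED itself is not reproduced here.

```python
from collections import defaultdict
from typing import Dict


def per_residue_accuracy(sequence: str, observed: str, predicted: str) -> Dict[str, float]:
    """Fraction of correctly predicted secondary structure states for each amino acid type.

    All three strings must be position-aligned (one character per residue).
    """
    if not (len(sequence) == len(observed) == len(predicted)):
        raise ValueError("sequence, observed and predicted must have equal length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for aa, obs, pred in zip(sequence, observed, predicted):
        total[aa] += 1
        if obs == pred:
            correct[aa] += 1
    # Keys are returned in alphabetical order of the one-letter amino acid codes.
    return {aa: correct[aa] / total[aa] for aa in sorted(total)}


# Toy example: alanine (A) is always right, glycine (G) never, leucine (L) once out of once.
print(per_residue_accuracy("AGLAG", "HHHCC", "HCHCE"))
```

The same loop can be restricted to one observed state (for example only residues observed in strands) to see which residue types drive errors in that element.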

Yardeni T, Jacobs K, Niethamer TK, Ciccone C, Anikster Y, Kurochkina N, Gahl WA, Huizing M. Murine isoforms of UDP-GlcNAc 2-epimerase/ManNAc kinase: Secondary structures, expression profiles, and response to ManNAc therapy. Glycoconj J 2013;30:609-18. [PMID: 23266873 PMCID: PMC3622838 DOI: 10.1007/s10719-012-9459-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/16/2012] [Revised: 11/27/2012] [Accepted: 11/28/2012] [Indexed: 11/25/2022]

Gribble KE, Mark Welch DB. The mate recognition protein gene mediates reproductive isolation and speciation in the Brachionus plicatilis cryptic species complex. BMC Evol Biol 2012;12:134. [PMID: 22852831 PMCID: PMC3495898 DOI: 10.1186/1471-2148-12-134] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 04/23/2012] [Accepted: 07/23/2012] [Indexed: 12/15/2022]

Abstract

Background

Chemically mediated prezygotic barriers to reproduction likely play an important role in speciation. In facultatively sexual monogonont rotifers from the Brachionus plicatilis cryptic species complex, mate recognition of females by males is mediated by the Mate Recognition Protein (MRP), a globular glycoprotein on the surface of females, encoded by the mmr-b gene family. In this study, we sequenced mmr-b copies from 27 isolates representing 11 phylotypes of the B. plicatilis species complex, examined the mode of evolution and selection of mmr-b, and determined the relationship between mmr-b genetic distance and mate recognition among isolates.

Results

Isolates of the B. plicatilis species complex have 1–4 copies of mmr-b, each composed of 2–9 nearly identical tandem repeats. The repeats within a gene copy are generally more similar to one another than are gene copies among phylotypes, suggesting concerted evolution. Compared to housekeeping genes from the same isolates, mmr-b has accumulated only half as many synonymous differences but twice as many non-synonymous differences. Most of the amino acid differences between repeats appear to occur on the outer face of the protein, and these often result in changes in predicted patterns of phosphorylation. However, we found no evidence of positive selection driving these differences. Isolates with the most divergent copies were unable to mate with other isolates and rarely self-crossed. Overall, the degree of mate recognition was significantly correlated with the genetic distance of mmr-b.

Conclusions

Discrimination of compatible mates in the B. plicatilis species complex is determined by proteins encoded by closely related copies of a single gene, mmr-b. While concerted evolution of the tandem repeats in mmr-b may function to maintain identity, it can also lead to the rapid spread of a mutation through all copies in the genome and thus to reproductive isolation. The mmr-b gene is evolving rapidly, and novel alleles may be maintained and increase in frequency via asexual reproduction. Our analyses indicate that mate recognition, controlled by MMR-B, may drive reproductive isolation and allow saltational sympatric speciation within the B. plicatilis cryptic species complex, and that this process may be largely neutral.

Wei Y, Thompson J, Floudas CA. CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization. Proc Math Phys Eng Sci 2011. [DOI: 10.1098/rspa.2011.0514] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]

Yardeni T, Choekyi T, Jacobs K, Ciccone C, Patzel K, Anikster Y, Gahl WA, Kurochkina N, Huizing M. Identification, tissue distribution, and molecular modeling of novel human isoforms of the key enzyme in sialic acid synthesis, UDP-GlcNAc 2-epimerase/ManNAc kinase. Biochemistry 2011;50:8914-25. [PMID: 21910480 DOI: 10.1021/bi201050u] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 12/23/2022]

Saidemberg DM, Baptista-Saidemberg NB, Palma MS. Chemometric analysis of Hymenoptera toxins and defensins: A model for predicting the biological activity of novel peptides from venoms and hemolymph. Peptides 2011;32:1924-33. [PMID: 21855589 DOI: 10.1016/j.peptides.2011.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/10/2011] [Revised: 07/29/2011] [Accepted: 08/01/2011] [Indexed: 11/22/2022]

Abstract

When searching for prospective novel peptides, it is difficult to determine the biological activity of a peptide based only on its sequence. The "trial and error" approach is generally laborious, expensive and time-consuming because of the large number of different experimental setups required to cover a reasonable number of biological assays. To simulate a virtual model for Hymenoptera insects, 166 peptides were selected from the venoms and hemolymphs of wasps, bees and ants and applied to a mathematical model of multivariate analysis with nine different chemometric components: GRAVY, aliphaticity index, number of disulfide bonds, total residues, net charge, pI value, Boman index, percentage of alpha helix, and flexibility prediction. Principal component analysis (PCA) with the non-linear iterative projections by alternating least-squares (NIPALS) algorithm was performed without including any information about the biological activity of the peptides. This analysis grouped the peptides in a way that correlated strongly with their biological function. Six different groupings were observed, which seemed to correspond to the following groups: chemotactic peptides, mastoparans, tachykinins, kinins, antibiotic peptides, and a group of long peptides with one or two disulfide bonds and with biological activities that are not yet clearly defined. The partial overlap between the mastoparans group and the chemotactic peptides, tachykinins, kinins and antibiotic peptides in the PCA score plot may explain the frequent reports in the literature about the multifunctionality of some of these peptides. The mathematical model used in the present investigation can be used to predict the biological activities of novel peptides in this system, and it may also be easily applied to other biological systems.
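The abstract names principal component analysis computed with the NIPALS algorithm. As an illustration only, a minimal pure-Python sketch of NIPALS follows, assuming a matrix whose rows are peptides and whose columns are descriptors such as GRAVY, net charge or pI. The function name, tolerance and iteration cap are hypothetical choices, not the authors' software. As in the study, no activity labels enter the calculation; descriptors on very different scales would usually be standardized first, a step omitted here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def nipals_pca(X: Matrix, n_components: int = 2, tol: float = 1e-12,
               max_iter: int = 500) -> Tuple[Matrix, Matrix]:
    """Return (scores, loadings) for the leading principal components via NIPALS.

    scores[i][c] is the score of sample i on component c;
    loadings[c] is the unit-length loading vector of component c.
    """
    n, m = len(X), len(X[0])
    # Mean-centre every column so components describe variation around the mean.
    means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - means[j] for j in range(m)] for row in X]
    scores = [[0.0] * n_components for _ in range(n)]
    loadings: Matrix = []
    for c in range(n_components):
        col_ss = [sum(E[i][j] ** 2 for i in range(n)) for j in range(m)]
        j0 = max(range(m), key=col_ss.__getitem__)
        if col_ss[j0] == 0.0:
            break  # no variance left to explain
        # Initial score vector: the column with the largest remaining variance.
        t = [E[i][j0] for i in range(n)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loadings p = E^T t / (t^T t), rescaled to unit length.
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Updated scores t = E p.
            t_new = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflation: remove this component (t p^T) from the residual before the next one.
        for i in range(n):
            for j in range(m):
                E[i][j] -= t[i] * p[j]
            scores[i][c] = t[i]
        loadings.append(p)
    return scores, loadings
```

Each component is found by alternating between scores and loadings until the scores stop changing; plotting the first two score columns against each other gives the kind of PCA score plot in which the peptide groupings above were observed.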

Estimating the acidity of singly and multiply substituted benzoic acids via electrostatic potential at the nucleus. Chem Phys Lett 2011. [DOI: 10.1016/j.cplett.2011.07.038] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]

Burger SK, Liu S, Ayers PW. Practical Calculation of Molecular Acidity with the Aid of a Reference Molecule. J Phys Chem A 2011;115:1293-304. [DOI: 10.1021/jp111148q] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]

Lin HN, Sung TY, Ho SY, Hsu WL. Improving protein secondary structure prediction based on short subsequences with local structure similarity. BMC Genomics 2010;11 Suppl 4:S4. [PMID: 21143813 PMCID: PMC3005913 DOI: 10.1186/1471-2164-11-s4-s4] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]

Abstract

BACKGROUND

When characterizing the structural topology of proteins, protein secondary structure (PSS) plays an important role in analyzing and modeling protein structures because it represents the local conformation of amino acids into regular structures. Although PSS prediction has been studied for decades, prediction accuracy has plateaued at around 80%, and further improvement has proven very difficult.

RESULTS

In this paper, we present an improved dictionary-based PSS prediction method called SymPred, and a meta-predictor called SymPsiPred. We adopt the concept behind natural language processing techniques and propose synonymous words to capture local sequence similarities in a group of similar proteins. A synonymous word is an n-gram pattern of amino acids that reflects the sequence variation in a protein's evolution. We generate a protein-dependent synonymous dictionary from a set of protein sequences for PSS prediction. On a large non-redundant dataset of 8,297 protein chains (DsspNr-25), the average Q3 values of SymPred and SymPsiPred are 81.0% and 83.9%, respectively. On the two latest independent test sets (EVA_Set1 and EVA_Set2), the average Q3 values of SymPred are 78.8% and 79.2%, respectively. SymPred outperforms other existing methods by 1.4% to 5.4%. We study two factors that may affect the performance of SymPred and find that it is very sensitive to the number of proteins of both known and unknown structures. This finding implies that SymPred and SymPsiPred have the potential to achieve higher accuracy as the number of protein sequences in the NCBInr and PDB databases increases.

CONCLUSIONS

Our experimental results show that local similarities in protein sequences typically exhibit conserved structures, which can be used to improve the accuracy of secondary structure prediction. As an application of synonymous words, we present a sequence alignment generated from the distribution of synonymous words shared by a pair of protein sequences. The two sequences, which are very dissimilar at the sequence level but very similar at the structural level, can be aligned nearly perfectly. The SymPred and SymPsiPred prediction servers are available at http://bio-cluster.iis.sinica.edu.tw/SymPred/.
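Q3, the metric quoted in the results above, is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. A minimal sketch follows, assuming H/E/C one-letter labels. The abstract does not say whether its "average Q3" is a mean over chains or pooled over all residues, so the per-chain mean in `average_q3` is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple


def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def average_q3(chains: List[Tuple[str, str]]) -> float:
    """Mean of per-chain Q3 values; chains holds (observed, predicted) pairs."""
    scores = [q3(obs, pred) for obs, pred in chains]
    return sum(scores) / len(scores)
```

The choice matters: a 4-residue chain predicted perfectly and a 2-residue chain with one error give a per-chain mean of 75.0, whereas pooling all six residues gives about 83.3.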

Cheng H, Sen TZ, Jernigan RL, Kloczkowski A. Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: combining GOR V and Fragment Database Mining (FDM). Bioinformatics 2007;23:2628-30. [PMID: 17660202 PMCID: PMC2553684 DOI: 10.1093/bioinformatics/btm379] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]

Bondugula R, Xu D. MUPRED: a tool for bridging the gap between template based methods and sequence profile based methods for protein secondary structure prediction. Proteins 2007;66:664-70. [PMID: 17109407 DOI: 10.1002/prot.21177] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]

Zhang N, Ruan J, Wu J, Zhang T. SHEETSPAIR: A Database of Amino Acid Pairs in Protein Sheet Structures. DATA SCIENCE JOURNAL 2007. [DOI: 10.2481/dsj.6.s589] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]

Sen TZ, Cheng H, Kloczkowski A, Jernigan RL. A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Sci 2006;15:2499-506. [PMID: 17001039 PMCID: PMC2242411 DOI: 10.1110/ps.062125306] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]

Sen TZ, Jernigan RL, Garnier J, Kloczkowski A. GOR V server for protein secondary structure prediction. Bioinformatics 2005;21:2787-8. [PMID: 15797907 PMCID: PMC2553678 DOI: 10.1093/bioinformatics/bti408] [Citation(s) in RCA: 146] [Impact Index Per Article: 7.7] [Indexed: 11/14/2022]