Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bastolla U, Porto M, Roman HE, Vendruscolo M. Principal eigenvector of contact matrices and hydrophobicity profiles in proteins. Proteins 2005;58:22-30. [PMID: 15523667 DOI: 10.1002/prot.20240] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Indexed: 11/08/2022]

Cited by Other Article(s)

Breimann S, Kamp F, Steiner H, Frishman D. AAontology: An Ontology of Amino Acid Scales for Interpretable Machine Learning. J Mol Biol 2024;436:168717. [PMID: 39053689 DOI: 10.1016/j.jmb.2024.168717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/15/2024] [Accepted: 07/19/2024] [Indexed: 07/27/2024]

Boopathi S, Garduño-Juárez R. A Small Molecule Impedes the Aβ_1-42 Tetramer Neurotoxicity by Preserving Membrane Integrity: Microsecond Multiscale Simulations. ACS Chem Neurosci 2024. [PMID: 39292558 DOI: 10.1021/acschemneuro.4c00383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/20/2024]

Abstract

Amyloid-β (Aβ1-42) peptides aggregated into plaques deposited in the brain are the main hallmark of Alzheimer's disease (AD), a social and economic burden worldwide. In this context, insoluble Aβ1-42 fibrils are the main components of plaques. The recent trials that used approved AD drugs show that they can remove the fibrils from AD patients' brains, but they did not halt the course of the disease. Mounting evidence envisages that the soluble Aβ1-42 oligomers' interactions with the neuronal membrane trigger higher cell death than Aβ1-42 fibril interactions. Developing a compound that can alleviate the oligomer's toxicity is one of the most demanding tasks for curing the disease. We performed two molecular dynamics (MD) simulations in an explicit solvent model. In the first case, 55-μs of multiscale all-atom (AA)/coarse-grained (CG) MD simulations were carried out to decipher the impact of a previously described small anti-Aβ molecule, termed M30 (2-octahydroisoquinolin-2(1H)-ylethanamine), on an Aβ1-42 tetramer structure in close contact with a DMPC bilayer. In the second case, 15-μs AA/CG MD simulations were performed to rationalize the dynamics between Aβ1-42 and Aβ1-42-M30 tetramer complexes embedded in DMPC. On the membrane bilayer, we found that the Aβ1-42 tetramer penetrates the bilayer surface due to unrestricted conformational flexibility and many contacts with the membrane phosphate groups. In contrast, no Aβ1-42-M30 tetramer penetration was observed during the entire course of the simulation. In the case of the membrane-embedded Aβ1-42 tetramer, the integrity of the bottom bilayer leaflet was severely affected by the interactions between the negatively charged phosphate groups and the positively charged residues of the Aβ1-42 tetramer, resulting in a deep tetramer penetration into the bilayer hydrophobic region. These contacts were not observed in the case of the membrane-embedded Aβ1-42-M30 tetramer. 
It was noted that M30 molecules bind to the Aβ1-42 tetramer through hydrogen bonds, resulting in a conformationally stable Aβ1-42-M30 complex. The associated complex has reduced conformational changes and an enhanced rigidity that prevents the tetramer dissociation by interfering with the tetramer-membrane contacts. Our findings suggest that the M30 molecules could bind to the Aβ1-42 tetramer, resulting in a rigid structure, and that such complexes do not significantly perturb the membrane bilayer organization. These observations support the in vitro and in vivo experimental evidence that the M30 molecules prevent synaptotoxicity, improving memory in AD-affected mice.

Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024;40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024]

Golinski AW, Schmitz ZD, Nielsen GH, Johnson B, Saha D, Appiah S, Hackel BJ, Martiniani S. Predicting and Interpreting Protein Developability Via Transfer of Convolutional Sequence Representation. ACS Synth Biol 2023;12:2600-2615. [PMID: 37642646 PMCID: PMC10829850 DOI: 10.1021/acssynbio.3c00196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]

Abstract

Engineered proteins have emerged as novel diagnostics, therapeutics, and catalysts. Often, poor protein developability (quantified by expression, solubility, and stability) hinders utility. The ability to predict protein developability from amino acid sequence would reduce the experimental burden when selecting candidates. Recent advances in screening technologies enabled a high-throughput (HT) developability dataset for 10⁵ of 10²⁰ possible variants of protein ligand scaffold Gp2. In this work, we evaluate the ability of neural networks to learn a developability representation from a HT dataset and transfer this knowledge to predict recombinant expression beyond observed sequences. The model convolves learned amino acid properties to predict expression levels 44% closer to the experimental variance compared to a non-embedded control. Analysis of learned amino acid embeddings highlights the uniqueness of cysteine, the importance of hydrophobicity and charge, and the unimportance of aromaticity, when aiming to improve the developability of small proteins. We identify clusters of similar sequences with increased recombinant expression through nonlinear dimensionality reduction and we explore the inferred expression landscape via nested sampling. The analysis enables the first direct visualization of the fitness landscape and highlights the existence of evolutionary bottlenecks in sequence space giving rise to competing subpopulations of sequences with different developability. The work advances applied protein engineering efforts by predicting and interpreting protein scaffold expression from a limited dataset. Furthermore, our statistical mechanical treatment of the problem advances foundational efforts to characterize the structure of the protein fitness landscape and the amino acid characteristics that influence protein developability.

Wilson C, Lewis KA, Fitzkee NC, Hough LE, Whitten ST. ParSe 2.0: A web tool to identify drivers of protein phase separation at the proteome level. Protein Sci 2023;32:e4756. [PMID: 37574757 PMCID: PMC10464302 DOI: 10.1002/pro.4756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/15/2023]

Hollebrands B, Hageman JA, van de Sande JW, Albada B, Janssen HG. Improved LC-MS identification of short homologous peptides using sequence-specific retention time predictors. Anal Bioanal Chem 2023;415:2715-2726. [PMID: 37000211 PMCID: PMC10185643 DOI: 10.1007/s00216-023-04670-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/17/2023] [Accepted: 03/21/2023] [Indexed: 04/01/2023]

Abstract

Peptides are an important group of compounds contributing to the desired, as well as the undesired taste of a food product. Their taste impressions can include aspects of sweetness, bitterness, savoury, umami and many other impressions depending on the amino acids present as well as their sequence. Identification of short peptides in foods is challenging. We developed a method to assign identities to short peptides including homologous structures, i.e. peptides containing the same amino acids with a different sequence order, by accurate prediction of the retention times during reversed phase separation. To train the method, a large set of well-defined short peptides with systematic variations in the amino acid sequence was prepared by a novel synthesis strategy called 'swapped-sequence synthesis'. Additionally, several proteins were enzymatically digested to yield short peptides. Experimental retention times were determined after reversed phase separation and peptide MS2 data was acquired using a high-resolution mass spectrometer operated in data-dependent acquisition mode (DDA). A support vector regression model was trained using a combination of existing sequence-independent peptide descriptors and a newly derived set of selected amino acid index derived sequence-specific peptide (ASP) descriptors. The model was trained and validated using the experimental retention times of the 713 small food-relevant peptides prepared. Whilst selecting the most useful ASP descriptors for our model, special attention was given to predict the retention time differences between homologous peptide structures. Inclusion of ASP descriptors greatly improved the ability to accurately predict retention times, including retention time differences between 157 homologous peptide pairs. The final prediction model had a goodness-of-fit (Q2) of 0.94; moreover for 93% of the short peptides, the elution order was correctly predicted.

Ibrahim AY, Khaodeuanepheng NP, Amarasekara DL, Correia JJ, Lewis KA, Fitzkee NC, Hough LE, Whitten ST. Intrinsically disordered regions that drive phase separation form a robustly distinct protein class. J Biol Chem 2022;299:102801. [PMID: 36528065 PMCID: PMC9860499 DOI: 10.1016/j.jbc.2022.102801] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/08/2022] [Revised: 11/29/2022] [Accepted: 12/09/2022] [Indexed: 12/23/2022]

Boopathi S, Garduño-Juárez R. Calcium inhibits penetration of Alzheimer's Aβ₁₋₄₂ monomers into the membrane. Proteins 2022;90:2124-2143. [PMID: 36321654 PMCID: PMC9804374 DOI: 10.1002/prot.26403] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2022] [Revised: 07/08/2022] [Accepted: 07/25/2022] [Indexed: 01/05/2023]

Abstract

Calcium ion regulation plays a crucial role in maintaining neuronal functions such as neurotransmitter release and synaptic plasticity. Copper (Cu²⁺) coordination to amyloid-β (Aβ) accelerates Aβ_1-42 aggregation, which can trigger calcium dysregulation by enhancing the influx of calcium ions through extensive perturbation of membrane integrity. Aβ_1-42 aggregation, calcium dysregulation, and membrane damage are all implicated in Alzheimer's disease (AD). To examine in detail the role of calcium ions when full-length Aβ_1-42 and Aβ_1-42-Cu²⁺ monomers contact the cellular membrane before aggregating, and thereby to elucidate the neurotoxicity mechanism, we carried out 2.5 μs of molecular dynamics (MD) simulation to explore the interaction of Aβ_1-42 and Aβ_1-42-Cu²⁺ with the dimyristoylphosphatidylcholine (DMPC) bilayer in the presence of calcium ions. The results were compared with those of the same simulations without calcium ions. Surprisingly, the binding energy between Aβ_1-42 and the membrane was robust in the simulations without calcium ions and about two and a half times lower in the simulations with calcium ions. Accordingly, in the absence of calcium ions, the N-terminal residues of Aβ_1-42 penetrate deeply from the surface to the center of the bilayer; in the presence of calcium ions, by contrast, the N- and C-terminal residues are involved only in surface contacts through binding to phosphate moieties. Aβ_1-42-Cu²⁺, on the other hand, actively participates in surface bilayer contacts in the absence of calcium ions. In the presence of calcium ions, these contacts are prevented by the formation of a calcium bridge between Aβ_1-42-Cu²⁺ and the DMPC bilayer. In short, calcium ions allow neither penetration of Aβ_1-42 into the membrane nor contact of Aβ_1-42-Cu²⁺ with the membrane.
These findings imply that calcium ions mediate membrane perturbation via the monomer interactions but do not damage the membrane; they agree with western blot results showing that a higher concentration of calcium ions inhibits membrane pore formation by Aβ peptides.

Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022;214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]

Görmez Y, Sabzekar M, Aydın Z. IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction. Proteins 2021;89:1277-1288. [PMID: 33993559 DOI: 10.1002/prot.26149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2021] [Revised: 04/21/2021] [Accepted: 05/11/2021] [Indexed: 11/10/2022]

Boopathi S, Dinh Quoc Huy P, Gonzalez W, Theodorakis PE, Li MS. Zinc binding promotes greater hydrophobicity in Alzheimer's Aβ42 peptide than copper binding: Molecular dynamics and solvation thermodynamics studies. Proteins 2020;88:1285-1302. [DOI: 10.1002/prot.25901] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/15/2020] [Revised: 05/04/2020] [Accepted: 05/13/2020] [Indexed: 12/29/2022]

De Pierri CR, Voyceik R, Santos de Mattos LGC, Kulik MG, Camargo JO, Repula de Oliveira AM, de Lima Nichio BT, Marchaukoski JN, da Silva Filho AC, Guizelini D, Ortega JM, Pedrosa FO, Raittz RT. SWeeP: representing large biological sequences datasets in compact vectors. Sci Rep 2020;10:91. [PMID: 31919449 PMCID: PMC6952362 DOI: 10.1038/s41598-019-55627-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/21/2019] [Accepted: 12/02/2019] [Indexed: 12/25/2022]

Affiliation(s)

Camilla Reginatto De Pierri Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
Ricardo Voyceik Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil
Letícia Graziela Costa Santos de Mattos Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil
Mariane Gonçalves Kulik Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil
Josué Oliveira Camargo Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
Aryel Marlus Repula de Oliveira Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Genetics, Curitiba, Paraná, Brazil
Bruno Thiago de Lima Nichio Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
Jeroniza Nunes Marchaukoski Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil
Antonio Camilo da Silva Filho Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Pharmaceutical Sciences, Curitiba, Paraná, Brazil
Dieval Guizelini Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil
J Miguel Ortega Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil
Fabio O Pedrosa Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
Roberto Tadeu Raittz Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil.,Federal University of Paraná, Department of Genetics, Curitiba, Paraná, Brazil

Tywoniuk B, Yuan Y, McCartan S, Szydłowska BM, Tofoleanu F, Brooks BR, Buchete NV. Amyloid Fibril Design: Limiting Structural Polymorphism in Alzheimer's Aβ Protofilaments. J Phys Chem B 2018;122:11535-11545. [PMID: 30335383 DOI: 10.1021/acs.jpcb.8b07423] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]

Abstract

Nanoscale fibrils formed by amyloid peptides have a polymorphic character, adopting several types of molecular structures in similar growth conditions. As shown by experimental (e.g., solid-state NMR) and computational studies, amyloid fibril polymorphism hinders both the structural characterization of Alzheimer's Aβ amyloid protofilaments and fibrils at a molecular level, as well as the possible applications (e.g., development of drugs or biomarkers) that rely on similar, controlled molecular arrangements of the Aβ peptides in amyloid fibril structures. We have explored the use of several contact potentials for the efficient identification of minimal sequence mutations that could enhance the stability of specific fibril structures while simultaneously destabilizing competing topologies, controlling thus the amount of structural polymorphism in a rational way. We found that different types of contact potentials, while having only partial accuracy on their own, lead to similar results regarding ranking the compatibility of wild-type (WT) and mutated amyloid sequences with different fibril morphologies. This approach allows exhaustive screening and assessment of possible mutations and the identification of minimal consensus mutations that could stabilize fibrils with the desired topology at the expense of other topology types, a prediction that is further validated using atomistic molecular dynamics with explicit water molecules. We apply this two-step multiscale (i.e., residue and atomistic-level) approach to predict and validate mutations that could bias either parallel or antiparallel packing in the core Alzheimer's Aβ_9-40 amyloid fibril models based on solid-state NMR experiments. 
Besides shedding new light on the molecular origins of structural polymorphism in WT Aβ fibrils, our study could also lead to efficient tools for assisting future experimental approaches for amyloid fibril determination, and for the development of biomarkers or drugs aimed at interfering with the stability of amyloid fibrils, as well as for the future design of amyloid fibrils with a controlled (e.g., reduced) level of structural polymorphism.

Jiménez-Santos MJ, Arenas M, Bastolla U. Influence of mutation bias and hydrophobicity on the substitution rates and sequence entropies of protein evolution. PeerJ 2018;6:e5549. [PMID: 30310736 PMCID: PMC6174885 DOI: 10.7717/peerj.5549] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/13/2018] [Accepted: 08/10/2018] [Indexed: 01/13/2023]

Abstract

The number of amino acids that occupy a given protein site during evolution reflects the selective constraints operating on the site. This evolutionary variability is strongly influenced by the structural properties of the site in the native structure, and it is quantified either through sequence entropy or through substitution rates. However, while the sequence entropy only depends on the equilibrium frequencies of the amino acids, the substitution rate also depends on the exchangeability matrix that describes mutations in the mathematical model of the substitution process. Here we apply two variants of a mathematical model of protein evolution with selection for protein stability, both against unfolding and against misfolding. Exploiting the approximation of independent sites, these models allow computing site-specific substitution processes that satisfy global constraints on folding stability. We find that site-specific substitution rates do not depend only on the selective constraints acting on the site, quantified through its sequence entropy. In fact, polar sites evolve faster than hydrophobic sites even for equal sequence entropy, because polar amino acids are characterized by higher mutational exchangeability than hydrophobic ones. Accordingly, the model predicts that more polar proteins tend to evolve faster. Nevertheless, these results change if we compare proteins that evolve under different mutation biases, such as orthologous proteins in different bacterial genomes. In this case, the substitution rates are faster in genomes that evolve under a mutational bias that favors hydrophobic amino acids by preferentially incorporating the nucleotide thymine, which is more frequent in hydrophobic codons.
This apparently contradictory result arises because buried sites occupied by hydrophobic amino acids are characterized by larger selective factors that largely amplify the substitution rate between hydrophobic amino acids, while the selective factors of exposed sites have a weaker effect. Thus, changes in the mutational bias produce deep effects on the biophysical properties of the protein (hydrophobicity) and on its evolutionary properties (sequence entropy and substitution rate) at the same time. The program Prot_evol that implements the two site-specific substitution processes is freely available at https://ub.cbm.uam.es/prot_fold_evol/prot_fold_evol_soft_main.php#Prot_Evol.
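
As an illustration of the sequence entropy mentioned in the abstract above, the following minimal Python sketch computes the Shannon entropy of a single site from its amino acid frequencies. It is not taken from the paper or from Prot_evol: the function name `site_entropy` and the example columns are hypothetical, and the frequencies are assumed to be estimated by simple counting over one column of a sequence alignment rather than derived from a stability-constrained model.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in nats) of the amino acids observed at one site.

    `column` is a hypothetical alignment column, given as a string or list
    of one-letter amino acid codes. Frequencies are plain relative counts.
    """
    counts = Counter(column)
    total = sum(counts.values())
    # S = -sum_a p_a * ln(p_a), summed over the amino acids actually observed
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Illustrative columns (assumed data): a conserved site and a variable site.
conserved = "LLLLLLLL"
variable = "LIVAKDES"
print(site_entropy(conserved), site_entropy(variable))
```

A fully conserved column gives an entropy of zero, and the value grows as more amino acids share the site more evenly, which is the sense in which entropy measures the selective constraint on a site.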

Jimenez MJ, Arenas M, Bastolla U. Substitution Rates Predicted by Stability-Constrained Models of Protein Evolution Are Not Consistent with Empirical Data. Mol Biol Evol 2017;35:743-755. [PMID: 29294047 DOI: 10.1093/molbev/msx327] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022]

Nojoomi S, Koehl P. A weighted string kernel for protein fold recognition. BMC Bioinformatics 2017;18:378. [PMID: 28841820 PMCID: PMC5574112 DOI: 10.1186/s12859-017-1795-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/07/2017] [Accepted: 08/15/2017] [Indexed: 11/10/2022]

Abstract

Background

Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work, however, needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little similarity. We have recently proposed an alignment-free method based on the concept of string kernels, SeqKernel (Nojoomi and Koehl, BMC Bioinformatics, 2017, 18:137). In this previous study, we showed that while SeqKernel performs better than standard alignment-based methods, its applications are potentially limited because of biases due mostly to sequence length effects.

Methods

In this study, we propose improvements to SeqKernel that follow two directions. First, we developed a weighted version of the kernel, WSeqKernel. Second, we expanded the concept of string kernels into a novel framework for deriving information on amino acids from protein sequences.

Results

Using a dataset that only contains remote homologs, we have shown that WSeqKernel performs remarkably well in fold recognition experiments. We have shown that with the appropriate weighting scheme, we can remove the length effects on the kernel values. WSeqKernel, just like any alignment-based sequence comparison method, depends on a substitution matrix. We have shown that this matrix can be optimized so that sequence similarity scores correlate well with structure similarity scores. Starting from no information on amino acid similarity, we have shown that we can derive a scoring matrix that echoes the physico-chemical properties of amino acids.

Conclusion

We have made progress in characterizing and parametrizing string kernels as alignment-based methods for comparing protein sequences, and we have shown that they provide a framework for extracting sequence information from structure.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1795-5) contains supplementary material, which is available to authorized users.

de la Higuera I, Ferrer-Orta C, de Ávila AI, Perales C, Sierra M, Singh K, Sarafianos SG, Dehouck Y, Bastolla U, Verdaguer N, Domingo E. Molecular and Functional Bases of Selection against a Mutation Bias in an RNA Virus. Genome Biol Evol 2017;9:1212-1228. [PMID: 28460010 PMCID: PMC5433387 DOI: 10.1093/gbe/evx075] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Accepted: 04/25/2017] [Indexed: 12/12/2022]

Affiliation(s)

Ignacio de la Higuera Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
Cristina Ferrer-Orta Institut de Biologia Molecular de Barcelona (CSIC), Parc Científic de Barcelona, Barcelona, Spain
Ana I de Ávila Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
Celia Perales Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain.,Liver Unit, Internal Medicine, Laboratory of Malalties Hepàtiques, Vall d'Hebron Institut de Recerca-Hospital Universitari Vall d'Hebron (VHIR-HUVH), Universitat Autònoma de Barcelona, Barcelona, Spain
Macarena Sierra Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
Kamalendra Singh Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
Stefan G Sarafianos Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
Yves Dehouck Machine Learning Group, Université Libre de Bruxelles (ULB), Brussels, Belgium
Ugo Bastolla Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
Nuria Verdaguer Institut de Biologia Molecular de Barcelona (CSIC), Parc Científic de Barcelona, Barcelona, Spain
Esteban Domingo Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain

Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017;46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Indexed: 12/30/2022]

Huy PDQ, Vuong QV, La Penna G, Faller P, Li MS. Impact of Cu(II) Binding on Structures and Dynamics of Aβ₄₂ Monomer and Dimer: Molecular Dynamics Study. ACS Chem Neurosci 2016;7:1348-1363. [PMID: 27454036 DOI: 10.1021/acschemneuro.6b00109] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Indexed: 12/22/2022]

Livi L, Giuliani A, Rizzi A. Toward a multilevel representation of protein molecules: Comparative approaches to the aggregation/folding propensity problem. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.07.043] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]

Arenas M, Sánchez-Cobos A, Bastolla U. Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability. Mol Biol Evol 2015;32:2195-207. [PMID: 25837579 DOI: 10.1093/molbev/msv085] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/15/2022]

Abstract

Despite intense work, incorporating constraints on protein native structures into the mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter that we fix maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models obtained, taking into account misfolded conformations, yield larger likelihood than models that only consider the native state, because their average hydrophobicity is more realistic, and they produce on the average stable sequences for most proteins. We evaluated the mean-field model with respect to empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented smaller Kullback-Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family. 
We found that the mean-field model performs better than other structurally constrained models with similar or higher complexity. With respect to the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented into the computer program Prot_Evol that is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
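
The Kullback-Leibler comparison mentioned in this abstract can be illustrated with a minimal sketch. The three-letter alphabet and the distributions `observed`, `model_a`, and `model_b` are hypothetical values chosen for illustration only; they are not taken from the paper or from Prot_Evol.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

# Hypothetical site-specific profile over a toy 3-letter alphabet.
observed = [0.6, 0.3, 0.1]
# Two hypothetical model distributions for the same site.
model_a = [0.5, 0.35, 0.15]
model_b = [0.2, 0.3, 0.5]

kl_a = kl_divergence(observed, model_a)
kl_b = kl_divergence(observed, model_b)
print(f"D(observed || model_a) = {kl_a:.4f}")
print(f"D(observed || model_b) = {kl_b:.4f}")
```

A smaller divergence means the model distribution is closer to the observed profile; in this toy case `model_a` is the closer one.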

Detecting selection on protein stability through statistical mechanical models of folding and evolution. Biomolecules 2014;4:291-314. [PMID: 24970217 PMCID: PMC4030984 DOI: 10.3390/biom4010291] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2013] [Revised: 02/13/2014] [Accepted: 02/14/2014] [Indexed: 12/31/2022] Open

Jackson EL, Ollikainen N, Covert AW, Kortemme T, Wilke CO. Amino-acid site variability among natural and designed proteins. PeerJ 2013;1:e211. [PMID: 24255821 PMCID: PMC3828621 DOI: 10.7717/peerj.211] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2013] [Accepted: 10/24/2013] [Indexed: 11/20/2022] Open

Valentin JB, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J, Frellsen J, Mardia KV, Tian P, Hamelryck T. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins 2013;82:288-99. [PMID: 23934827 DOI: 10.1002/prot.24386] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2013] [Revised: 07/02/2013] [Accepted: 07/18/2013] [Indexed: 01/10/2023]

Lin SYH, Cheng CW, Su ECY. Prediction of B-cell epitopes using evolutionary information and propensity scales. BMC Bioinformatics 2013;14 Suppl 2:S10. [PMID: 23484214 PMCID: PMC3549808 DOI: 10.1186/1471-2105-14-s2-s10] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open

Abstract

Background

Development of computational tools that can accurately predict presence and location of B-cell epitopes on pathogenic proteins has a valuable application to the field of vaccinology. Because of the highly variable yet enigmatic nature of B-cell epitopes, their prediction presents a great challenge to computational immunologists.

Methods

We propose a method, BEEPro (B-cell epitope prediction by evolutionary information and propensity scales), which adapts a linear averaging scheme on 16 properties using a support vector machine model to predict both linear and conformational B-cell epitopes. These 16 properties include position specific scoring matrix (PSSM), an amino acid ratio scale, and a set of 14 physicochemical scales obtained via a feature selection process. Finally, a three-way data split procedure is used during the validation process to prevent over-estimation of prediction performance and avoid bias in our experiment results.

Results

In our experiment, we first use a non-redundant linear B-cell epitope dataset curated by Sollner et al. for feature selection and parameter optimization. Evaluated by a three-way data split procedure, BEEPro achieves significant improvement with the area under the receiver operating characteristic curve (AUC) = 0.9987, accuracy = 99.29%, Matthews correlation coefficient (MCC) = 0.9281, sensitivity = 0.9604, specificity = 0.9946, positive predictive value (PPV) = 0.9042 for the Sollner dataset. In addition, when the same parameters are used to evaluate performance on other independent linear B-cell epitope test datasets, BEEPro attains an AUC which ranges from 0.9874 to 0.9950 and an accuracy which ranges from 93.73% to 97.31%. Moreover, five-fold cross-validation on one benchmark conformational B-cell epitope dataset yields an accuracy of 92.14% and AUC of 0.9066.

Conclusions

Compared with other current models, our method achieves a significant improvement with respect to AUC, accuracy, MCC, sensitivity, specificity, and PPV. Thus, we have shown that an appropriate combination of evolutionary information and propensity scales with a support vector machine model can significantly enhance the prediction performance of both linear and conformational B-cell epitopes.

Arenas M, Dos Santos HG, Posada D, Bastolla U. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 2013;29:3020-8. [PMID: 24037213 DOI: 10.1093/bioinformatics/btt530] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Minning J, Porto M, Bastolla U. Detecting selection for negative design in proteins through an improved model of the misfolded state. Proteins 2013;81:1102-12. [PMID: 23280507 DOI: 10.1002/prot.24244] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2012] [Accepted: 12/17/2012] [Indexed: 11/05/2022]

Disfani FM, Hsu WL, Mizianty MJ, Oldfield CJ, Xue B, Dunker AK, Uversky VN, Kurgan L. MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins. Bioinformatics 2013;28:i75-83. [PMID: 22689782 PMCID: PMC3371841 DOI: 10.1093/bioinformatics/bts209] [Citation(s) in RCA: 268] [Impact Index Per Article: 24.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

Abstract

Motivation: Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions that bind to protein partners via disorder-to-order transitions. MoRFs are implicated in important processes including signaling and regulation. However, only a limited number of experimentally validated MoRFs is known, which motivates development of computational methods that predict MoRFs from protein chains.

Results: We introduce a new MoRF predictor, MoRFpred, which identifies all MoRF types (α, β, coil and complex). We develop a comprehensive dataset of annotated MoRFs to build and empirically compare our method. MoRFpred utilizes a novel design in which annotations generated by sequence alignment are fused with predictions generated by a Support Vector Machine (SVM), which uses a custom designed set of sequence-derived features. The features provide information about evolutionary profiles, selected physicochemical properties of amino acids, and predicted disorder, solvent accessibility and B-factors. Empirical evaluation on several datasets shows that MoRFpred outperforms related methods: α-MoRF-Pred, which predicts α-MoRFs, and ANCHOR, which finds disordered regions that become ordered when bound to a globular partner. We show that our predicted (new) MoRF regions have non-random sequence similarity with native MoRFs. We use this observation, along with the fact that predictions with higher probability are more accurate, to identify putative MoRF regions. We also identify a few sequence-derived hallmarks of MoRFs. They are characterized by dips in the disorder predictions and higher hydrophobicity and stability when compared to adjacent (in the chain) residues.

Availability: http://biomine.ece.ualberta.ca/MoRFpred/; http://biomine.ece.ualberta.ca/MoRFpred/Supplement.pdf

Contact: lkurgan@ece.ualberta.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

Olson B, Molloy K, Hendi SF, Shehu A. Guiding probabilistic search of the protein conformational space with structural profiles. J Bioinform Comput Biol 2012;10:1242005. [PMID: 22809381 DOI: 10.1142/s021972001242005x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Bastolla U, Bruscolini P, Velasco JL. Sequence determinants of protein folding rates: Positive correlation between contact energy and contact range indicates selection for fast folding. Proteins 2012;80:2287-304. [DOI: 10.1002/prot.24118] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2011] [Revised: 05/14/2012] [Accepted: 05/17/2012] [Indexed: 11/12/2022]

Wolff K, Vendruscolo M, Porto M. Coarse-grained model for protein folding based on structural profiles. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011;84:041934. [PMID: 22181202 DOI: 10.1103/physreve.84.041934] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/21/2011] [Indexed: 05/31/2023]

Kochańczyk M. Prediction of functionally important residues in globular proteins from unusual central distances of amino acids. BMC STRUCTURAL BIOLOGY 2011;11:34. [PMID: 21923943 PMCID: PMC3188475 DOI: 10.1186/1472-6807-11-34] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/22/2011] [Accepted: 09/18/2011] [Indexed: 12/12/2022]

Abstract

BACKGROUND

Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Beside constructing better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues.

RESULTS

Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in the information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites achieving success rates comparable to several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi.

CONCLUSIONS

Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. Such residues turn out to be often directly involved in binding ligands or interfacing with other proteins.

The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 2011;188:479-88. [PMID: 21467571 DOI: 10.1534/genetics.111.128025] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Teichert F, Minning J, Bastolla U, Porto M. High quality protein sequence alignment by combining structural profile prediction and profile alignment using SABER-TOOTH. BMC Bioinformatics 2010;11:251. [PMID: 20470364 PMCID: PMC2885375 DOI: 10.1186/1471-2105-11-251] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2009] [Accepted: 05/14/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Protein alignments are an essential tool for many bioinformatics analyses. While sequence alignments are accurate for proteins of high sequence similarity, they become unreliable as they approach the so-called 'twilight zone' where sequence similarity becomes indistinguishable from random. For such distant pairs, structure alignment is of much better quality. Nevertheless, sequence alignment is the only choice in the majority of cases where structural data are not available. This situation demands development of methods that extend the applicability of accurate sequence alignment to distantly related proteins.

RESULTS

We develop a sequence alignment method that combines the prediction of a structural profile based on the protein's sequence with the alignment of that profile using our recently published alignment tool SABERTOOTH. In particular, we predict the contact vector of protein structures using an artificial neural network based on position-specific scoring matrices generated by PSI-BLAST and align these predicted contact vectors. The resulting sequence alignments are assessed using two different tests: First, we assess the alignment quality by measuring the derived structural similarity for cases in which structures are available. In a second test, we quantify the ability of the significance score of the alignments to recognize structural and evolutionary relationships. As a benchmark we use a representative set of the SCOP (structural classification of proteins) database, with similarities ranging from closely related proteins at SCOP family level, to very distantly related proteins at SCOP fold level. Comparing these results with some prominent sequence alignment tools, we find that SABERTOOTH produces sequence alignments of better quality than those of Clustal W, T-Coffee, MUSCLE, and PSI-BLAST. HHpred, one of the most sophisticated and computationally expensive tools available, outperforms our alignment algorithm at family and superfamily levels, while the use of SABERTOOTH is advantageous for alignments at fold level. Our alignment scheme will profit from future improvements of structural profiles prediction.

CONCLUSIONS

We present the automatic sequence alignment tool SABERTOOTH that computes pairwise sequence alignments of very high quality. SABERTOOTH is especially advantageous when applied to alignments of remotely related proteins. The source code is available at http://www.fkp.tu-darmstadt.de/sabertooth_project/, free for academic users upon request.

Morra G, Baragli C, Colombo G. Selecting sequences that fold into a defined 3D structure: A new approach for protein design based on molecular dynamics and energetics. Biophys Chem 2009;146:76-84. [PMID: 19926206 DOI: 10.1016/j.bpc.2009.10.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2009] [Revised: 10/07/2009] [Accepted: 10/26/2009] [Indexed: 11/29/2022]

Fornes O, Aragues R, Espadaler J, Marti-Renom MA, Sali A, Oliva B. ModLink+: improving fold recognition by using protein-protein interactions. Bioinformatics 2009;25:1506-12. [PMID: 19357100 DOI: 10.1093/bioinformatics/btp238] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023]

Kloczkowski A, Jernigan RL, Wu Z, Song G, Yang L, Kolinski A, Pokarowski P. Distance matrix-based approach to protein structure prediction. J Struct Funct Genomics 2009;10:67-81. [PMID: 19224393 DOI: 10.1007/s10969-009-9062-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2008] [Accepted: 02/01/2009] [Indexed: 10/21/2022]

Abstract

Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because $r$ is highly correlated with the hydrophobic profile of the sequence. Moreover, $r$ is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. 
We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement (http://predictioncenter.org/caspR).
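
The statement in this abstract that the spectral decomposition of the square distance matrix has at most five nonzero terms can be checked numerically. The sketch below is only an illustration: it uses random 3D points (an assumption, standing in for residue coordinates), and the names `squared_distance_matrix` and `n_nonzero` are hypothetical. The bound follows because each entry is |x_i|^2 + |x_j|^2 - 2 x_i·x_j, a sum of two rank-one terms and one term of rank at most three.

```python
import numpy as np

def squared_distance_matrix(coords):
    """Return the N x N matrix of squared Euclidean distances between points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sum(diff ** 2, axis=-1)

# Random 3D points stand in for residue coordinates (assumption, not real protein data).
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))

D = squared_distance_matrix(coords)

# D is symmetric, so eigvalsh applies; count eigenvalues above a relative tolerance.
eigvals = np.linalg.eigvalsh(D)
tol = 1e-8 * np.abs(eigvals).max()
n_nonzero = int(np.sum(np.abs(eigvals) > tol))
print("numerically nonzero eigenvalues:", n_nonzero)
```

For points in general position the count equals five; degenerate arrangements (for example, all points on a line) give fewer.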

Bastolla U, Ortíz AR, Porto M, Teichert F. Effective connectivity profile: a structural representation that evidences the relationship between protein structures and sequences. Proteins 2008;73:872-88. [PMID: 18536008 DOI: 10.1002/prot.22113] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Abstract

The complexity of protein structures calls for simplified representations of their topology. The simplest possible mathematical description of a protein structure is a one-dimensional profile representing, for instance, buriedness or secondary structure. This kind of representation has been introduced for studying the sequence to structure relationship, with applications to fold recognition. Here we define the effective connectivity profile (EC), a network theoretical profile that self-consistently represents the network structure of the protein contact matrix. The EC profile makes mathematically explicit the relationship between protein structure and protein sequence, because it allows predicting the average hydrophobicity profile (HP) and the distributions of amino acids at each site for families of homologous proteins sharing the same structure. In this sense, the EC provides an analytic solution to the statistical inverse folding problem, which consists in finding the statistical properties of the set of sequences compatible with a given structure. We tested these predictions with simulations of the structurally constrained neutral (SCN) model of protein evolution with structure conservation, for single- and multi-domain proteins, and for a wide range of mutation processes, the latter producing sequences with very different hydrophobicity profiles, finding that the EC-based predictions are accurate even when only one sequence of the family is known. The EC profile is very significantly correlated with the HP for sequence-structure pairs in the PDB as well. The EC profile generalizes the properties of previously introduced structural profiles to modular proteins such as multidomain chains, and its correlation with the sequence profile is substantially improved with respect to the previously defined profiles, particularly for long proteins. 
Furthermore, the EC profile has a dynamic interpretation, since the EC components are strongly inversely related with the temperature factors measured in X-ray experiments, meaning that positions with large EC component are more strongly constrained in their equilibrium dynamics. In addition, the EC profile makes it possible to define a natural measure of modularity that correlates with the number of domains composing the protein, suggesting its application for domain decomposition. Finally, we show that structurally similar proteins have similar EC profiles, so that the similarity between aligned EC profiles can be used as a structure similarity measure, a property that we have recently applied for protein structure alignment. The code for computing the EC profile is available upon request by writing to ubastolla@cbm.uam.es, and the structural profiles discussed in this article can be downloaded from the SLOTH webserver http://www.fkp.tu-darmstadt.de/SLOTH/.

Wolff K, Vendruscolo M, Porto M. Stochastic reconstruction of protein structures from effective connectivity profiles. PMC BIOPHYSICS 2008;1:5. [PMID: 19351427 PMCID: PMC2666633 DOI: 10.1186/1757-5036-1-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/24/2008] [Accepted: 11/26/2008] [Indexed: 11/23/2022]

Morra G, Colombo G. Relationship between energy distribution and fold stability: Insights from molecular dynamics simulations of native and mutant proteins. Proteins 2008;72:660-72. [PMID: 18247351 DOI: 10.1002/prot.21963] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Wolff K, Vendruscolo M, Porto M. A stochastic method for the reconstruction of protein structures from one-dimensional structural profiles. Gene 2008;422:47-51. [PMID: 18577428 DOI: 10.1016/j.gene.2008.06.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Miyazawa S, Kinjo AR. Properties of contact matrices induced by pairwise interactions in proteins. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2008;77:051910. [PMID: 18643105 DOI: 10.1103/physreve.77.051910] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/23/2008] [Indexed: 05/26/2023]

Abstract

The properties of contact matrices (C matrices) needed for native proteins to be the lowest-energy conformations are considered in relation to a contact energy matrix (E matrix). The total conformational energy is assumed to consist of pairwise interaction energies between atoms or residues, each of which is expressed as a product of a conformation-dependent function (an element of the C matrix) and a sequence-dependent energy parameter (an element of the E matrix). Such pairwise interactions in proteins force native C matrices to be in a relationship as if the interactions are a Go-like potential [N. Go, Annu. Rev. Biophys. Bioeng. 12, 183 (1983)] for the native C matrix, because the lowest bound of the total energy function is equal to the total energy of the native conformation interacting in a Go-like pairwise potential. This relationship between C and E matrices corresponds to (a) a parallel relationship between the eigenvectors of the C and E matrices and a linear relationship between their eigenvalues and (b) a parallel relationship between a contact number vector and the principal eigenvectors of the C and E matrices, where the E matrix is expanded in a series of eigenspaces with an additional constant term. The additional constant term in the spectral expansion of the E matrix is indicated by the lowest bound of the total energy function to correspond to a threshold of contact energy that approximately separates native contacts from non-native ones. Inner products between the principal eigenvector of the C matrix, that of the E matrix, and a contact number vector have been examined for 182 proteins, each of which is a representative from each family of the SCOP database [Murzin, J. Mol. Biol. 247, 536 (1995)], and the results indicate the parallel tendencies between those vectors. A statistical contact potential [S. Miyazawa and R. L. Jernigan, Proteins 34, 49 (1999); S. Miyazawa and R. L. Jernigan, Proteins 50, 35 (2003)] estimated from protein crystal structures was used to evaluate pairwise residue-residue interactions in the proteins. In addition, the spectral representation of C and E matrices reveals that pairwise residue-residue interactions, which depend only on the types of interacting amino acids, but not on other residues in a protein, are insufficient and other interactions including residue connectivities and steric hindrance are needed to make native structures unique lowest-energy conformations.
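
The comparison between the principal eigenvector of a contact matrix and the contact number vector described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions: random 3D points replace real residue coordinates, the cutoff of 1.5 (arbitrary units) is arbitrary, self-contacts are excluded, and the function names are hypothetical. It does not reproduce the authors' data set or contact definition.

```python
import numpy as np

def contact_matrix(coords, cutoff):
    """Binary contact matrix: 1 if two distinct points are closer than cutoff, else 0."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    C = (dist < cutoff).astype(float)
    np.fill_diagonal(C, 0.0)  # exclude self-contacts (assumption)
    return C

def principal_eigenvector(C):
    """Eigenvector of the largest eigenvalue of symmetric C, with the sign fixed to be positive."""
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues returned in ascending order
    v = eigvecs[:, -1]
    return v if v.sum() >= 0 else -v

# Random points stand in for residue positions (assumption, not a real structure).
rng = np.random.default_rng(1)
coords = rng.normal(size=(80, 3))

C = contact_matrix(coords, cutoff=1.5)
pe = principal_eigenvector(C)
contact_number = C.sum(axis=1)  # number of contacts per site

# Cosine of the angle between the two vectors; values near 1 mean nearly parallel.
cosine = float(pe @ contact_number /
               (np.linalg.norm(pe) * np.linalg.norm(contact_number)))
print(f"cosine(principal eigenvector, contact number) = {cosine:.3f}")
```

A cosine close to 1 corresponds to the "parallel tendency" the abstract refers to; how close it gets depends on the structure and the cutoff used.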

Kinjo AR, Nakamura H. Nature of protein family signatures: insights from singular value analysis of position-specific scoring matrices. PLoS One 2008;3:e1963. [PMID: 18398479 PMCID: PMC2276316 DOI: 10.1371/journal.pone.0001963] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2008] [Accepted: 03/05/2008] [Indexed: 11/19/2022] Open

Buchete NV, Straub JE, Thirumalai D. Dissecting contact potentials for proteins: relative contributions of individual amino acids. Proteins 2008;70:119-30. [PMID: 17640067 DOI: 10.1002/prot.21538] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Liò P, Bishop M. Modeling sequence evolution. Methods Mol Biol 2008;452:255-285. [PMID: 18566769 DOI: 10.1007/978-1-60327-159-2_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]

Bastolla U, Porto M, Ortíz AR. Local interactions in protein folding determined through an inverse folding model. Proteins 2008;71:278-99. [DOI: 10.1002/prot.21730] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

On the optimal contact potential of proteins. Chem Phys Lett 2008. [DOI: 10.1016/j.cplett.2007.12.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Holladay NB, Kinch LN, Grishin NV. Optimization of linear disorder predictors yields tight association between crystallographic disorder and hydrophobicity. Protein Sci 2007;16:2140-52. [PMID: 17893360 PMCID: PMC2204125 DOI: 10.1110/ps.072980107] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

Teichert F, Bastolla U, Porto M. SABERTOOTH: protein structural alignment based on a vectorial structure representation. BMC Bioinformatics 2007;8:425. [PMID: 17974011 PMCID: PMC2257979 DOI: 10.1186/1471-2105-8-425] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2007] [Accepted: 10/31/2007] [Indexed: 11/22/2022] Open

The Structurally Constrained Neutral Model of Protein Evolution. 2007. [DOI: 10.1007/978-3-540-35306-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]