1
Breimann S, Kamp F, Steiner H, Frishman D. AAontology: An Ontology of Amino Acid Scales for Interpretable Machine Learning. J Mol Biol 2024; 436:168717. [PMID: 39053689 DOI: 10.1016/j.jmb.2024.168717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/15/2024] [Accepted: 07/19/2024] [Indexed: 07/27/2024]
Abstract
Amino acid scales are crucial for protein prediction tasks, many of them being curated in the AAindex database. Despite various clustering attempts to organize them and to better understand their relationships, these approaches lack the fine-grained classification necessary for satisfactory interpretability in many protein prediction problems. To address this issue, we developed AAontology, a two-level classification for 586 amino acid scales (mainly from AAindex) together with an in-depth analysis of their relations, using bag-of-words-based classification, clustering, and manual refinement over multiple iterations. AAontology organizes physicochemical scales into 8 categories and 67 subcategories, enhancing the interpretability of scale-based machine learning methods in protein bioinformatics. It thereby enables researchers to gain deeper biological insight. We anticipate that AAontology will be a building block to link amino acid properties with protein function and dysfunction, as well as to aid informed decision-making in mutation analysis or protein drug design.
Affiliation(s)
- Stephan Breimann
  - Department of Bioinformatics, School of Life Sciences, Technical University of Munich, Freising, Germany; Ludwig-Maximilians-University Munich, Biomedical Center, Division of Metabolic Biochemistry, Munich, Germany; German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
- Frits Kamp
  - Ludwig-Maximilians-University Munich, Biomedical Center, Division of Metabolic Biochemistry, Munich, Germany
- Harald Steiner
  - Ludwig-Maximilians-University Munich, Biomedical Center, Division of Metabolic Biochemistry, Munich, Germany; German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, School of Life Sciences, Technical University of Munich, Freising, Germany

2
Boopathi S, Garduño-Juárez R. A Small Molecule Impedes the Aβ1-42 Tetramer Neurotoxicity by Preserving Membrane Integrity: Microsecond Multiscale Simulations. ACS Chem Neurosci 2024. [PMID: 39292558 DOI: 10.1021/acschemneuro.4c00383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/20/2024] Open
Abstract
Amyloid-β (Aβ1-42) peptides aggregated into plaques deposited in the brain are the main hallmark of Alzheimer's disease (AD), a social and economic burden worldwide. In this context, insoluble Aβ1-42 fibrils are the main components of plaques. The recent trials that used approved AD drugs show that they can remove the fibrils from AD patients' brains, but they did not halt the course of the disease. Mounting evidence envisages that the soluble Aβ1-42 oligomers' interactions with the neuronal membrane trigger higher cell death than Aβ1-42 fibril interactions. Developing a compound that can alleviate the oligomer's toxicity is one of the most demanding tasks for curing the disease. We performed two molecular dynamics (MD) simulations in an explicit solvent model. In the first case, 55-μs of multiscale all-atom (AA)/coarse-grained (CG) MD simulations were carried out to decipher the impact of a previously described small anti-Aβ molecule, termed M30 (2-octahydroisoquinolin-2(1H)-ylethanamine), on an Aβ1-42 tetramer structure in close contact with a DMPC bilayer. In the second case, 15-μs AA/CG MD simulations were performed to rationalize the dynamics between Aβ1-42 and Aβ1-42-M30 tetramer complexes embedded in DMPC. On the membrane bilayer, we found that the Aβ1-42 tetramer penetrates the bilayer surface due to unrestricted conformational flexibility and many contacts with the membrane phosphate groups. In contrast, no Aβ1-42-M30 tetramer penetration was observed during the entire course of the simulation. In the case of the membrane-embedded Aβ1-42 tetramer, the integrity of the bottom bilayer leaflet was severely affected by the interactions between the negatively charged phosphate groups and the positively charged residues of the Aβ1-42 tetramer, resulting in a deep tetramer penetration into the bilayer hydrophobic region. These contacts were not observed in the case of the membrane-embedded Aβ1-42-M30 tetramer. 
It was noted that M30 molecules bind to the Aβ1-42 tetramer through hydrogen bonds, resulting in a conformationally stable Aβ1-42-M30 complex. The associated complex has reduced conformational changes and an enhanced rigidity that prevents the tetramer dissociation by interfering with the tetramer-membrane contacts. Our findings suggest that the M30 molecules could bind to the Aβ1-42 tetramer, resulting in a rigid structure, and that such complexes do not significantly perturb the membrane bilayer organization. These observations support the in vitro and in vivo experimental evidence that the M30 molecules prevent synaptotoxicity, improving memory in AD-affected mice.
Affiliation(s)
- Subramanian Boopathi
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca 62210, México
- Ramón Garduño-Juárez
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca 62210, México

3
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION: The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection.
RESULTS: We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models.
AVAILABILITY AND IMPLEMENTATION: ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
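The general idea of ABC-based model selection mentioned in this abstract can be illustrated with a toy rejection sampler. This is only a sketch under stated assumptions: the two "models" are hypothetical Bernoulli simulators standing in for substitution models, the summary statistic is a simple mean, and none of the names below come from ProteinModelerABC. It shows plain rejection ABC only, without the regression adjustments the abstract mentions.

```python
import random


def simulate_mean(p: float, n: int, rng: random.Random) -> float:
    """Simulate n Bernoulli(p) draws and return their mean (the summary statistic)."""
    return sum(rng.random() < p for _ in range(n)) / n


def abc_model_selection(observed, models, n_obs, n_sims=2000, tol=0.05, seed=0):
    """Rejection ABC: estimate posterior model probabilities from accepted simulations.

    `models` maps a hypothetical model name to its Bernoulli parameter.
    A uniform prior over models is assumed.
    """
    rng = random.Random(seed)
    names = list(models)
    accepted = {name: 0 for name in names}
    for _ in range(n_sims):
        name = rng.choice(names)  # draw a model from the uniform prior
        stat = simulate_mean(models[name], n_obs, rng)
        if abs(stat - observed) <= tol:  # keep simulations close to the data
            accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        raise RuntimeError("no simulations accepted; increase tol or n_sims")
    return {name: accepted[name] / total for name in names}


# Toy data: the observed summary statistic matches model "B" far better than "A".
posterior = abc_model_selection(observed=0.8, models={"A": 0.2, "B": 0.8}, n_obs=50)
print(posterior)
```

The appeal of this scheme for site-dependent models is that it needs only the ability to simulate from each model, not a tractable likelihood function.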
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain

4
Golinski AW, Schmitz ZD, Nielsen GH, Johnson B, Saha D, Appiah S, Hackel BJ, Martiniani S. Predicting and Interpreting Protein Developability Via Transfer of Convolutional Sequence Representation. ACS Synth Biol 2023; 12:2600-2615. [PMID: 37642646 PMCID: PMC10829850 DOI: 10.1021/acssynbio.3c00196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Engineered proteins have emerged as novel diagnostics, therapeutics, and catalysts. Often, poor protein developability (quantified by expression, solubility, and stability) hinders utility. The ability to predict protein developability from amino acid sequence would reduce the experimental burden when selecting candidates. Recent advances in screening technologies enabled a high-throughput (HT) developability dataset for 10⁵ of 10²⁰ possible variants of protein ligand scaffold Gp2. In this work, we evaluate the ability of neural networks to learn a developability representation from an HT dataset and transfer this knowledge to predict recombinant expression beyond observed sequences. The model convolves learned amino acid properties to predict expression levels 44% closer to the experimental variance compared to a non-embedded control. Analysis of learned amino acid embeddings highlights the uniqueness of cysteine, the importance of hydrophobicity and charge, and the unimportance of aromaticity, when aiming to improve the developability of small proteins. We identify clusters of similar sequences with increased recombinant expression through nonlinear dimensionality reduction and we explore the inferred expression landscape via nested sampling. The analysis enables the first direct visualization of the fitness landscape and highlights the existence of evolutionary bottlenecks in sequence space giving rise to competing subpopulations of sequences with different developability. The work advances applied protein engineering efforts by predicting and interpreting protein scaffold expression from a limited dataset. Furthermore, our statistical mechanical treatment of the problem advances foundational efforts to characterize the structure of the protein fitness landscape and the amino acid characteristics that influence protein developability.
Affiliation(s)
- Alexander W. Golinski
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Zachary D. Schmitz
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Gregory H. Nielsen
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Bryce Johnson
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Diya Saha
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Sandhya Appiah
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Benjamin J. Hackel
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Stefano Martiniani
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
  - Center for Soft Matter Research, Department of Physics, New York University, New York, NY 10003
  - Simons Center for Computational Physical Chemistry, Departments of Chemistry, New York University, New York, NY 10003
  - Courant Institute of Mathematical Sciences, New York University, New York, NY 10003

5
Wilson C, Lewis KA, Fitzkee NC, Hough LE, Whitten ST. ParSe 2.0: A web tool to identify drivers of protein phase separation at the proteome level. Protein Sci 2023; 32:e4756. [PMID: 37574757 PMCID: PMC10464302 DOI: 10.1002/pro.4756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/15/2023]
Abstract
We have developed an algorithm, ParSe, which accurately identifies from the primary sequence those protein regions likely to exhibit physiological phase separation behavior. Originally, ParSe was designed to test the hypothesis that, for flexible proteins, phase separation potential is correlated to hydrodynamic size. While our results were consistent with that idea, we also found that many different descriptors could successfully differentiate between three classes of protein regions: folded, intrinsically disordered, and phase-separating intrinsically disordered. Consequently, numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. Built from that finding, ParSe 2.0 uses an optimal set of property scales to predict domain-level organization and compute a sequence-based prediction of phase separation potential. The algorithm is fast enough to scan the whole of the human proteome in minutes on a single computer and is equally or more accurate than other published predictors in identifying proteins and regions within proteins that drive phase separation. Here, we describe a web application for ParSe 2.0 that may be accessed through a browser by visiting https://stevewhitten.github.io/Parse_v2_FASTA to quickly identify phase-separating proteins within large sequence sets, or by visiting https://stevewhitten.github.io/Parse_v2_web to evaluate individual protein sequences.
Affiliation(s)
- Colorado Wilson
  - Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
  - Present address: Department of Pharmacology and Toxicology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
- Karen A. Lewis
  - Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
- Nicholas C. Fitzkee
  - Department of Chemistry, Mississippi State University, Mississippi State, Mississippi, USA
- Loren E. Hough
  - Department of Physics, University of Colorado Boulder, Boulder, Colorado, USA
  - BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado, USA
- Steven T. Whitten
  - Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA

6
Hollebrands B, Hageman JA, van de Sande JW, Albada B, Janssen HG. Improved LC-MS identification of short homologous peptides using sequence-specific retention time predictors. Anal Bioanal Chem 2023; 415:2715-2726. [PMID: 37000211 PMCID: PMC10185643 DOI: 10.1007/s00216-023-04670-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/17/2023] [Accepted: 03/21/2023] [Indexed: 04/01/2023]
Abstract
Peptides are an important group of compounds contributing to the desired, as well as the undesired taste of a food product. Their taste impressions can include aspects of sweetness, bitterness, savoury, umami and many other impressions depending on the amino acids present as well as their sequence. Identification of short peptides in foods is challenging. We developed a method to assign identities to short peptides including homologous structures, i.e. peptides containing the same amino acids with a different sequence order, by accurate prediction of the retention times during reversed phase separation. To train the method, a large set of well-defined short peptides with systematic variations in the amino acid sequence was prepared by a novel synthesis strategy called 'swapped-sequence synthesis'. Additionally, several proteins were enzymatically digested to yield short peptides. Experimental retention times were determined after reversed phase separation and peptide MS2 data was acquired using a high-resolution mass spectrometer operated in data-dependent acquisition mode (DDA). A support vector regression model was trained using a combination of existing sequence-independent peptide descriptors and a newly derived set of selected amino acid index derived sequence-specific peptide (ASP) descriptors. The model was trained and validated using the experimental retention times of the 713 small food-relevant peptides prepared. Whilst selecting the most useful ASP descriptors for our model, special attention was given to predict the retention time differences between homologous peptide structures. Inclusion of ASP descriptors greatly improved the ability to accurately predict retention times, including retention time differences between 157 homologous peptide pairs. The final prediction model had a goodness-of-fit (Q²) of 0.94; moreover for 93% of the short peptides, the elution order was correctly predicted.
Affiliation(s)
- Boudewijn Hollebrands
  - Unilever Foods Innovation Centre - Hive, Bronland 14, 6708 WH, Wageningen, the Netherlands
  - Laboratory of Organic Chemistry, Wageningen University & Research, Stippeneng 4, 6708 WE, Wageningen, the Netherlands
- Jos A Hageman
  - Wageningen University & Research, Biometris, P.O. Box 16, 6700 AA, Wageningen, the Netherlands
- Jasper W van de Sande
  - Laboratory of Organic Chemistry, Wageningen University & Research, Stippeneng 4, 6708 WE, Wageningen, the Netherlands
- Bauke Albada
  - Laboratory of Organic Chemistry, Wageningen University & Research, Stippeneng 4, 6708 WE, Wageningen, the Netherlands
- Hans-Gerd Janssen
  - Unilever Foods Innovation Centre - Hive, Bronland 14, 6708 WH, Wageningen, the Netherlands
  - Laboratory of Organic Chemistry, Wageningen University & Research, Stippeneng 4, 6708 WE, Wageningen, the Netherlands

7
Ibrahim AY, Khaodeuanepheng NP, Amarasekara DL, Correia JJ, Lewis KA, Fitzkee NC, Hough LE, Whitten ST. Intrinsically disordered regions that drive phase separation form a robustly distinct protein class. J Biol Chem 2022; 299:102801. [PMID: 36528065 PMCID: PMC9860499 DOI: 10.1016/j.jbc.2022.102801] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/08/2022] [Revised: 11/29/2022] [Accepted: 12/09/2022] [Indexed: 12/23/2022] Open
Abstract
Protein phase separation is thought to be a primary driving force for the formation of membrane-less organelles, which control a wide range of biological functions from stress response to ribosome biogenesis. Among phase-separating (PS) proteins, many have intrinsically disordered regions (IDRs) that are needed for phase separation to occur. Accurate identification of IDRs that drive phase separation is important for testing the underlying mechanisms of phase separation, identifying biological processes that rely on phase separation, and designing sequences that modulate phase separation. To identify IDRs that drive phase separation, we first curated datasets of folded, ID, and PS ID sequences. We then used these sequence sets to examine how broadly existing amino acid property scales can be used to distinguish between the three classes of protein regions. We found that there are robust property differences between the classes and, consequently, that numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. This result indicates that multiple, redundant mechanisms contribute to the formation of phase-separated droplets from IDRs. The top-performing scales were used to further optimize our previously developed predictor of PS IDRs, ParSe. We then modified ParSe to account for interactions between amino acids and obtained reasonable predictive power for mutations that have been designed to test the role of amino acid interactions in driving protein phase separation. Collectively, our findings provide further insight into the classification of IDRs and the elements involved in protein phase separation.
Affiliation(s)
- Ayyam Y. Ibrahim
  - Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
- John J. Correia
  - Department of Cell and Molecular Biology, University of Mississippi Medical Center, Jackson, Mississippi, USA
- Karen A. Lewis
  - Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
- Loren E. Hough
  - Department of Physics, University of Colorado Boulder, Boulder, Colorado, USA
  - BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado, USA
  - For correspondence: Steven T. Whitten; Loren E. Hough
- Steven T. Whitten
  - Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, USA
  - For correspondence: Steven T. Whitten; Loren E. Hough

8
Boopathi S, Garduño‐Juárez R. Calcium inhibits penetration of Alzheimer's Aβ1-42 monomers into the membrane. Proteins 2022; 90:2124-2143. [PMID: 36321654 PMCID: PMC9804374 DOI: 10.1002/prot.26403] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2022] [Revised: 07/08/2022] [Accepted: 07/25/2022] [Indexed: 01/05/2023]
Abstract
Calcium ion regulation plays a crucial role in maintaining neuronal functions such as neurotransmitter release and synaptic plasticity. Copper (Cu²⁺) coordination to amyloid-β (Aβ) accelerates Aβ1-42 aggregation, which can trigger calcium dysregulation by enhancing the influx of calcium ions through extensive perturbation of membrane integrity. Aβ1-42 aggregation, calcium dysregulation, and membrane damage are all implicated in Alzheimer's disease (AD). To detail the role of calcium ions when full-length Aβ1-42 and Aβ1-42-Cu²⁺ monomers contact the cellular membrane before aggregating, and thereby elucidate the neurotoxicity mechanism, we carried out 2.5 μs of extensive molecular dynamics (MD) simulation to explore the interaction of Aβ1-42 and Aβ1-42-Cu²⁺ with the dimyristoylphosphatidylcholine (DMPC) bilayer in the presence of calcium ions. The results were compared with those of the same simulations without calcium ions. Surprisingly, we noted robust binding energies between Aβ1-42 and the membrane in simulations without calcium ions; the binding energy is two and a half fold lower in the simulation with calcium ions. Accordingly, in the absence of calcium ions, N-terminal residues of Aβ1-42 penetrate deeply from the surface to the center of the bilayer; in contrast, in the presence of calcium ions, the N- and C-terminal residues are involved only in surface contacts through binding to phosphate moieties. On the other hand, Aβ1-42-Cu²⁺ actively participated in surface bilayer contacts in the absence of calcium ions. In the presence of calcium ions, these contacts are prevented by the formation of a calcium bridge between Aβ1-42-Cu²⁺ and the DMPC bilayer. In a nutshell, calcium ions allow neither Aβ1-42 penetration into the membrane nor contact of Aβ1-42-Cu²⁺ with the membrane.
These findings imply that calcium ions mediate membrane perturbation via the monomer interactions but do not damage the membrane; they agree with western blot experimental results showing that a higher concentration of calcium ions inhibits membrane pore formation by Aβ peptides.
Affiliation(s)
- Subramanian Boopathi
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Ramón Garduño‐Juárez
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico

9
Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022; 214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]
10
Görmez Y, Sabzekar M, Aydın Z. IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction. Proteins 2021; 89:1277-1288. [PMID: 33993559 DOI: 10.1002/prot.26149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2021] [Revised: 04/21/2021] [Accepted: 05/11/2021] [Indexed: 11/10/2022]
Abstract
There is a close relationship between the tertiary structure and the function of a protein. One of the important steps to determine the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have obtained promising improvements in several machine learning applications including PSSP. In this article, a novel deep learning model based on a convolutional neural network and a graph convolutional network is proposed. PSIBLAST PSSM, HHMAKE PSSM, and physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model IGPRED obtained 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% Q3 accuracies for CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.
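For readers unfamiliar with the metric reported above, Q3 accuracy is the percentage of residues whose three-state secondary structure label (helix H, strand E, coil C) is predicted correctly. A minimal sketch follows; the function name and the example label strings are illustrative assumptions, not data or code from the paper.

```python
def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Return the percentage of residues with a correctly predicted 3-state label."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have equal length")
    if not true_ss:
        raise ValueError("label strings must not be empty")
    # Count positions where the predicted state matches the reference state.
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


# Hypothetical reference and predicted labels for a 17-residue stretch.
true_labels = "CCHHHHHHCCEEEEECC"
pred_labels = "CCHHHHHCCCEEEECCC"
print(q3_accuracy(true_labels, pred_labels))
```

In this toy case 15 of the 17 positions agree, so the score is 100 × 15/17.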
Affiliation(s)
- Yasin Görmez
  - Faculty of Economics and Administrative Sciences, Management Information Systems, Sivas Cumhuriyet University, Sivas, Turkey
- Mostafa Sabzekar
  - Department of Computer Engineering, Birjand University of Technology, Birjand, Iran
- Zafer Aydın
  - Engineering Faculty, Computer Engineering Department, Abdullah Gül University, Kayseri, Turkey

11
Boopathi S, Dinh Quoc Huy P, Gonzalez W, Theodorakis PE, Li MS. Zinc binding promotes greater hydrophobicity in Alzheimer's Aβ42 peptide than copper binding: Molecular dynamics and solvation thermodynamics studies. Proteins 2020; 88:1285-1302. [DOI: 10.1002/prot.25901] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/15/2020] [Revised: 05/04/2020] [Accepted: 05/13/2020] [Indexed: 12/29/2022]
Affiliation(s)
- Subramanian Boopathi
  - Centro de Bioinformática y Simulación Molecular (CBSM), Facultad de Ingeniería, Universidad de Talca, Talca, Chile
- Wendy Gonzalez
  - Centro de Bioinformática y Simulación Molecular (CBSM), Facultad de Ingeniería, Universidad de Talca, Talca, Chile
  - Millennium Nucleus of Ion Channels‐Associated Diseases (MiNICAD), Universidad de Talca, Talca, Chile
- Mai Suan Li
  - Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
  - Institute for Computational Science and Technology, Quang Trung Software City, Tan Chanh Hiep Ward, Ho Chi Minh City, Vietnam

12
De Pierri CR, Voyceik R, Santos de Mattos LGC, Kulik MG, Camargo JO, Repula de Oliveira AM, de Lima Nichio BT, Marchaukoski JN, da Silva Filho AC, Guizelini D, Ortega JM, Pedrosa FO, Raittz RT. SWeeP: representing large biological sequences datasets in compact vectors. Sci Rep 2020; 10:91. [PMID: 31919449 PMCID: PMC6952362 DOI: 10.1038/s41598-019-55627-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/21/2019] [Accepted: 12/02/2019] [Indexed: 12/25/2022] Open
Abstract
Vectoral and alignment-free approaches to biological sequence representation have been explored in bioinformatics to efficiently handle big data. Even so, most current methods involve sequence comparisons via alignment-based heuristics and fail when applied to the analysis of large data sets. Here, we present “Spaced Words Projection (SWeeP)”, a method for representing biological sequences using relatively small vectors while preserving intersequence comparability. SWeeP uses spaced-words by scanning the sequences and generating indices to create a higher-dimensional vector that is later projected onto a smaller randomly oriented orthonormal base. We constructed phylogenetic trees for all organisms with mitochondrial and bacterial protein data in the NCBI database. SWeeP quickly built complete and accurate trees for these organisms with low computational cost. We compared SWeeP to other alignment-free methods and SWeeP was 10 to 100 times quicker than the other techniques. A tool to build SWeeP vectors is available at https://sourceforge.net/projects/spacedwordsprojection/.
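The two steps described in this abstract (indexing spaced words into a high-dimensional vector, then projecting onto a small random orthonormal basis) can be sketched in plain Python. Everything specific here is an assumption for illustration: the mask "101", the binary presence encoding, the output size of 8, and all function names are hypothetical and are not taken from the SWeeP tool.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def spaced_word_vector(seq, mask="101"):
    """Binary presence vector over all spaced words selected by `mask` ('1' = keep)."""
    keep = [i for i, m in enumerate(mask) if m == "1"]
    vec = [0.0] * (len(AMINO_ACIDS) ** len(keep))
    for start in range(len(seq) - len(mask) + 1):
        window = seq[start:start + len(mask)]
        if any(window[i] not in AA_INDEX for i in keep):
            continue  # skip windows containing non-standard residues
        idx = 0
        for i in keep:  # base-20 index of the spaced word
            idx = idx * len(AMINO_ACIDS) + AA_INDEX[window[i]]
        vec[idx] = 1.0
    return vec


def random_orthonormal_basis(dim, k, seed=0):
    """Build k orthonormal vectors of length dim by Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:  # remove components along earlier basis vectors
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            basis.append([x / norm for x in v])
    return basis


def project(vec, basis):
    """Project a high-dimensional vector onto each basis vector."""
    return [sum(x * y for x, y in zip(vec, b)) for b in basis]


basis = random_orthonormal_basis(dim=20 ** 2, k=8)
compact = project(spaced_word_vector("MKTAYIAKQRQISFVKSHFSRQ"), basis)
print(len(compact))  # 8
```

Because every sequence is projected with the same basis, the compact vectors remain comparable with one another, which is the property the abstract emphasizes.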
Collapse
Affiliation(s)
- Camilla Reginatto De Pierri
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
| | - Ricardo Voyceik
- Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil
| | | | - Mariane Gonçalves Kulik
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil
| | - Josué Oliveira Camargo
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
| | - Aryel Marlus Repula de Oliveira
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Genetics, Curitiba, Paraná, Brazil
| | - Bruno Thiago de Lima Nichio
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
| | | | - Antonio Camilo da Silva Filho
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Pharmaceutical Sciences, Curitiba, Paraná, Brazil
| | - Dieval Guizelini
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil
| | - J Miguel Ortega
- Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil
| | - Fabio O Pedrosa
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
| | - Roberto Tadeu Raittz
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil. .,Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil. .,Federal University of Paraná, Department of Genetics, Curitiba, Paraná, Brazil.
|
13
|
Tywoniuk B, Yuan Y, McCartan S, Szydłowska BM, Tofoleanu F, Brooks BR, Buchete NV. Amyloid Fibril Design: Limiting Structural Polymorphism in Alzheimer's Aβ Protofilaments. J Phys Chem B 2018; 122:11535-11545. [PMID: 30335383 DOI: 10.1021/acs.jpcb.8b07423] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
Nanoscale fibrils formed by amyloid peptides have a polymorphic character, adopting several types of molecular structures in similar growth conditions. As shown by experimental (e.g., solid-state NMR) and computational studies, amyloid fibril polymorphism hinders both the structural characterization of Alzheimer's Aβ amyloid protofilaments and fibrils at a molecular level, as well as the possible applications (e.g., development of drugs or biomarkers) that rely on similar, controlled molecular arrangements of the Aβ peptides in amyloid fibril structures. We have explored the use of several contact potentials for the efficient identification of minimal sequence mutations that could enhance the stability of specific fibril structures while simultaneously destabilizing competing topologies, controlling thus the amount of structural polymorphism in a rational way. We found that different types of contact potentials, while having only partial accuracy on their own, lead to similar results regarding ranking the compatibility of wild-type (WT) and mutated amyloid sequences with different fibril morphologies. This approach allows exhaustive screening and assessment of possible mutations and the identification of minimal consensus mutations that could stabilize fibrils with the desired topology at the expense of other topology types, a prediction that is further validated using atomistic molecular dynamics with explicit water molecules. We apply this two-step multiscale (i.e., residue and atomistic-level) approach to predict and validate mutations that could bias either parallel or antiparallel packing in the core Alzheimer's Aβ9-40 amyloid fibril models based on solid-state NMR experiments. 
Besides shedding new light on the molecular origins of structural polymorphism in WT Aβ fibrils, our study could also lead to efficient tools for assisting future experimental approaches for amyloid fibril determination, and for the development of biomarkers or drugs aimed at interfering with the stability of amyloid fibrils, as well as for the future design of amyloid fibrils with a controlled (e.g., reduced) level of structural polymorphism.
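The residue-level screening step described in this abstract can be illustrated with a toy contact-energy calculation. The sketch below scores a sequence against two alternative contact maps and ranks single-point mutations by how much they favor one map over the other. The two-class hydrophobic/polar potential, the contact maps, and the eight-residue sequence are all made-up placeholders, not the contact potentials or Aβ fibril models used in the study.

```python
from itertools import product

HYDROPHOBIC = set("AVILMFWC")  # toy hydrophobic/polar split (assumption)

def pair_energy(a, b):
    """Toy contact energy in arbitrary units; not a published potential."""
    n_hydrophobic = (a in HYDROPHOBIC) + (b in HYDROPHOBIC)
    return {2: -1.0, 1: 0.3, 0: 0.0}[n_hydrophobic]

def contact_energy(seq, contacts):
    """Sum pair energies over a list of (i, j) residue contacts."""
    return sum(pair_energy(seq[i], seq[j]) for i, j in contacts)

def rank_mutations(seq, target, competitor, alphabet="AVILKEDS"):
    """Rank single-point mutations; most negative score favors `target` most."""
    base_gap = contact_energy(seq, target) - contact_energy(seq, competitor)
    ranked = []
    for pos, aa in product(range(len(seq)), alphabet):
        if aa == seq[pos]:
            continue
        mutant = seq[:pos] + aa + seq[pos + 1:]
        gap = contact_energy(mutant, target) - contact_energy(mutant, competitor)
        ranked.append((gap - base_gap, f"{seq[pos]}{pos + 1}{aa}"))
    return sorted(ranked)

# Placeholder sequence and contact maps (0-based residue indices).
seq = "KLVFFAED"
parallel = [(1, 2), (2, 3), (3, 4)]
antiparallel = [(0, 7), (1, 6), (2, 5)]

ranking = rank_mutations(seq, parallel, antiparallel)
for score, label in ranking[:3]:
    print(f"{label}: {score:+.2f}")
```

In a two-step workflow of the kind the abstract describes, the top-ranked candidates from such a cheap scan would then be passed to atomistic simulation for validation.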
Affiliation(s)
- Bartłomiej Tywoniuk
- School of Physics, University College Dublin, Dublin D04 V1W8, Ireland.,Institute for Discovery, University College Dublin, Dublin D04 V1W8, Ireland
| | - Ye Yuan
- School of Physics, University College Dublin, Dublin D04 V1W8, Ireland.,Institute for Discovery, University College Dublin, Dublin D04 V1W8, Ireland
| | - Sarah McCartan
- School of Physics, University College Dublin, Dublin D04 V1W8, Ireland.,Institute for Discovery, University College Dublin, Dublin D04 V1W8, Ireland
| | - Beata Maria Szydłowska
- Applied Physical Chemistry, Ruprecht-Karls University Heidelberg, Heidelberg 69120, Germany.,Institute of Physics, EIT 2, Universität der Bundeswehr München, Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany
| | - Florentina Tofoleanu
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States.,Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
| | - Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
| | - Nicolae-Viorel Buchete
- School of Physics, University College Dublin, Dublin D04 V1W8, Ireland.,Institute for Discovery, University College Dublin, Dublin D04 V1W8, Ireland
|
14
|
Jiménez-Santos MJ, Arenas M, Bastolla U. Influence of mutation bias and hydrophobicity on the substitution rates and sequence entropies of protein evolution. PeerJ 2018; 6:e5549. [PMID: 30310736 PMCID: PMC6174885 DOI: 10.7717/peerj.5549] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/13/2018] [Accepted: 08/10/2018] [Indexed: 01/13/2023] Open
Abstract
The number of amino acids that occupy a given protein site during evolution reflects the selective constraints operating on the site. This evolutionary variability is strongly influenced by the structural properties of the site in the native structure, and it is quantified either through sequence entropy or through substitution rates. However, while the sequence entropy only depends on the equilibrium frequencies of the amino acids, the substitution rate also depends on the exchangeability matrix that describes mutations in the mathematical model of the substitution process. Here we apply two variants of a mathematical model of protein evolution with selection for protein stability, both against unfolding and against misfolding. Exploiting the approximation of independent sites, these models allow computing site-specific substitution processes that satisfy global constraints on folding stability. We find that site-specific substitution rates do not depend only on the selective constraints acting on the site, quantified through its sequence entropy. In fact, polar sites evolve faster than hydrophobic sites even for equal sequence entropy, because polar amino acids have higher mutational exchangeability than hydrophobic ones. Accordingly, the model predicts that more polar proteins tend to evolve faster. Nevertheless, these results change if we compare proteins that evolve under different mutation biases, such as orthologous proteins in different bacterial genomes. In this case, the substitution rates are faster in genomes that evolve under a mutational bias that favors hydrophobic amino acids by preferentially incorporating the nucleotide thymine, which is more frequent in hydrophobic codons. 
This apparently contradictory result arises because buried sites occupied by hydrophobic amino acids are characterized by larger selective factors that largely amplify the substitution rate between hydrophobic amino acids, while the selective factors of exposed sites have a weaker effect. Thus, changes in the mutational bias produce deep effects on the biophysical properties of the protein (hydrophobicity) and on its evolutionary properties (sequence entropy and substitution rate) at the same time. The program Prot_evol that implements the two site-specific substitution processes is freely available at https://ub.cbm.uam.es/prot_fold_evol/prot_fold_evol_soft_main.php#Prot_Evol.
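The distinction between sequence entropy and substitution rate drawn in this abstract can be made concrete with a small numerical sketch. It assumes a standard reversible substitution process in which the rate from amino acid a to b is the exchangeability E_ab times the equilibrium frequency of b; the three-letter alphabets, frequencies, and exchangeabilities below are invented for illustration and are not taken from Prot_evol.

```python
import math

def sequence_entropy(freqs):
    """Shannon entropy (nats) of the equilibrium frequencies at one site."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)

def substitution_rate(freqs, exch):
    """Mean flux sum_{a != b} pi_a * Q_ab, assuming Q_ab = E_ab * pi_b."""
    return sum(freqs[a] * exch[frozenset((a, b))] * freqs[b]
               for a in freqs for b in freqs if a != b)

# Hypothetical sites with identical frequency profiles (hence equal entropy).
polar_site = {"D": 0.5, "E": 0.3, "N": 0.2}
hydrophobic_site = {"I": 0.5, "L": 0.3, "V": 0.2}

# Hypothetical exchangeabilities: polar pairs exchange twice as readily.
polar_exch = {frozenset("DE"): 2.0, frozenset("DN"): 2.0, frozenset("EN"): 2.0}
hydrophobic_exch = {frozenset("IL"): 1.0, frozenset("IV"): 1.0, frozenset("LV"): 1.0}

entropy_polar = sequence_entropy(polar_site)
entropy_hydrophobic = sequence_entropy(hydrophobic_site)
rate_polar = substitution_rate(polar_site, polar_exch)
rate_hydrophobic = substitution_rate(hydrophobic_site, hydrophobic_exch)

print(f"entropy: polar={entropy_polar:.3f}, hydrophobic={entropy_hydrophobic:.3f}")
print(f"rate:    polar={rate_polar:.3f}, hydrophobic={rate_hydrophobic:.3f}")
```

Both sites have the same entropy because the entropy depends only on the frequencies, while the rates differ by the ratio of the assumed exchangeabilities.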
Affiliation(s)
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Ugo Bastolla
- Bioinformatics Unit, Center for Molecular Biology Severo Ochoa, CSIC-UAM, Madrid, Spain
|
15
|
Jimenez MJ, Arenas M, Bastolla U. Substitution Rates Predicted by Stability-Constrained Models of Protein Evolution Are Not Consistent with Empirical Data. Mol Biol Evol 2017; 35:743-755. [PMID: 29294047 DOI: 10.1093/molbev/msx327] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022] Open
Abstract
Protein structures strongly influence molecular evolution. In particular, the evolutionary rate of a protein site depends on the number of its native contacts. Stability-constrained models of protein evolution consider this influence of protein structure on evolution by predicting the effect of mutations on the stability of the native state, but they currently neglect how mutations affect the protein structure. These models predict that buried protein sites with more native contacts are more constrained by natural selection and less variable, as observed. Nevertheless, previous work did not consider the stability against compact misfolded conformations, although it is known that the negative design that destabilizes these misfolded conformations influences protein evolution significantly. Here, we show that stability-constrained models that consider misfolding predict that site-specific sequence entropy and substitution rate peak at amphiphilic sites with an intermediate number of contacts, as these sites are less constrained than exposed sites with few contacts whose hydrophobicity must be limited. This result holds both for a mean-field model with independent sites and for a pairwise model that takes as a reference the wild-type sequence, but it contrasts with the observations that indicate that the entropy and the substitution rate decrease monotonically with the number of contacts. Our work suggests that stability-constrained models overestimate the tolerance of amphiphilic sites against mutations, either because of the limits of the free energy function or, more importantly in our opinion, because they do not consider how mutations perturb the native protein structure.
Affiliation(s)
- María José Jimenez
- Centro de Biologia Molecular "Severo Ochoa" CSIC-UAM Cantoblanco, Madrid, Spain
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Ugo Bastolla
- Centro de Biologia Molecular "Severo Ochoa" CSIC-UAM Cantoblanco, Madrid, Spain
|
16
|
Nojoomi S, Koehl P. A weighted string kernel for protein fold recognition. BMC Bioinformatics 2017; 18:378. [PMID: 28841820 PMCID: PMC5574112 DOI: 10.1186/s12859-017-1795-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/07/2017] [Accepted: 08/15/2017] [Indexed: 11/10/2022] Open
Abstract
Background: Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work, however, needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little similarity. We have recently proposed an alignment-free method based on the concept of string kernels, SeqKernel (Nojoomi and Koehl, BMC Bioinformatics, 2017, 18:137). In this previous study, we showed that while SeqKernel performs better than standard alignment-based methods, its applications are potentially limited because of biases due mostly to sequence length effects. Methods: In this study, we propose improvements to SeqKernel that follow two directions. First, we developed a weighted version of the kernel, WSeqKernel. Second, we expand the concept of string kernels into a novel framework for deriving information on amino acids from protein sequences. Results: Using a dataset that only contains remote homologs, we have shown that WSeqKernel performs remarkably well in fold recognition experiments. We have shown that with the appropriate weighting scheme, we can remove the length effects on the kernel values. WSeqKernel, just like any alignment-based sequence comparison method, depends on a substitution matrix. We have shown that this matrix can be optimized so that sequence similarity scores correlate well with structure similarity scores. Starting from no information on amino acid similarity, we have shown that we can derive a scoring matrix that echoes the physico-chemical properties of amino acids. Conclusion: We have made progress in characterizing and parametrizing string kernels as alignment-based methods for comparing protein sequences, and we have shown that they provide a framework for extracting sequence information from structure. 
Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1795-5) contains supplementary material, which is available to authorized users.
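As a point of reference for the length effects mentioned in this abstract, the sketch below implements a generic k-mer spectrum kernel with cosine normalization. This is a standard textbook string kernel, used here only as a stand-in: the abstract does not define SeqKernel or the WSeqKernel weighting scheme, so nothing below should be read as their implementation. The sequences are arbitrary placeholders.

```python
import math
from collections import Counter

def kmer_counts(seq, k=3):
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s, t, k=3):
    """Unnormalized spectrum kernel: dot product of k-mer count vectors."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(n * ct[kmer] for kmer, n in cs.items())

def normalized_kernel(s, t, k=3):
    """Cosine-normalized kernel, which removes the raw dependence on length."""
    denom = math.sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom else 0.0

short_seq = "MKVLAAGIVK"      # placeholder sequence
long_seq = short_seq * 5      # same composition, five times longer

print("raw self-similarity:", spectrum_kernel(short_seq, short_seq),
      spectrum_kernel(long_seq, long_seq))
print("normalized self-similarity:", normalized_kernel(short_seq, short_seq),
      normalized_kernel(long_seq, long_seq))
```

The raw self-similarity grows with sequence length even though the composition is unchanged, whereas the normalized value stays at 1 for both sequences; a weighting scheme such as the one the authors propose addresses the same kind of bias by other means.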
|
17
|
de la Higuera I, Ferrer-Orta C, de Ávila AI, Perales C, Sierra M, Singh K, Sarafianos SG, Dehouck Y, Bastolla U, Verdaguer N, Domingo E. Molecular and Functional Bases of Selection against a Mutation Bias in an RNA Virus. Genome Biol Evol 2017; 9:1212-1228. [PMID: 28460010 PMCID: PMC5433387 DOI: 10.1093/gbe/evx075] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Accepted: 04/25/2017] [Indexed: 12/12/2022] Open
Abstract
The selective pressures acting on viruses that replicate under enhanced mutation rates are largely unknown. Here, we describe resistance of foot-and-mouth disease virus to the mutagen 5-fluorouracil (FU) through a single polymerase substitution that prevents an excess of A to G and U to C transitions evoked by FU on the wild-type foot-and-mouth disease virus, while maintaining the same level of mutant spectrum complexity. The polymerase substitution inflicts upon the virus a fitness loss during replication in absence of FU but confers a fitness gain in presence of FU. The compensation of mutational bias was documented by in vitro nucleotide incorporation assays, and it was associated with structural modifications at the N-terminal region and motif B of the viral polymerase. Predictions of the effect of mutations that increase the frequency of G and C in the viral genome and encoded polymerase suggest multiple points in the virus life cycle where the mutational bias in favor of G and C may be detrimental. Application of predictive algorithms suggests adverse effects of the FU-directed mutational bias on protein stability. The results reinforce modulation of nucleotide incorporation as a lethal mutagenesis-escape mechanism (that permits eluding virus extinction despite replication in the presence of a mutagenic agent) and suggest that mutational bias can be a target of selection during virus replication.
Affiliation(s)
- Ignacio de la Higuera
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
| | - Cristina Ferrer-Orta
- Institut de Biologia Molecular de Barcelona (CSIC), Parc Científic de Barcelona, Barcelona, Spain
| | - Ana I de Ávila
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
| | - Celia Perales
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain.,Liver Unit, Internal Medicine, Laboratory of Malalties Hepàtiques, Vall d'Hebron Institut de Recerca-Hospital Universitari Vall d'Hebron (VHIR-HUVH), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Macarena Sierra
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
| | - Kamalendra Singh
- Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
| | - Stefan G Sarafianos
- Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
| | - Yves Dehouck
- Machine Learning Group, Université Libre de Bruxelles (ULB), Brussels, Belgium
| | - Ugo Bastolla
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
| | - Nuria Verdaguer
- Institut de Biologia Molecular de Barcelona (CSIC), Parc Científic de Barcelona, Barcelona, Spain
| | - Esteban Domingo
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain
|
18
|
Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina.,Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Texas 78712
|
19
|
Huy PDQ, Vuong QV, La Penna G, Faller P, Li MS. Impact of Cu(II) Binding on Structures and Dynamics of Aβ 42 Monomer and Dimer: Molecular Dynamics Study. ACS Chem Neurosci 2016; 7:1348-1363. [PMID: 27454036 DOI: 10.1021/acschemneuro.6b00109] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Indexed: 12/22/2022] Open
Abstract
The classical force field, which is compatible with the Amber force field 99SB, has been obtained for the interaction of Cu(II) with monomer and dimers of amyloid-β peptides using the coordination where Cu(II) is bound to His6, His13 (or His14), and Asp1 with distorted planar geometry. The newly developed force field and molecular dynamics simulation were employed to study the impact of Cu(II) binding on structures and dynamics of Aβ42 monomer and dimers. It was shown that in the presence of Cu(II) the β content of monomer is reduced substantially compared with the wild-type Aβ42 suggesting that, in accord with experiments, metal ions facilitate formation of amorphous aggregates rather than amyloid fibrils with cross-β structures. In addition, one possible mechanism for amorphous assembly is that the Asp23-Lys28 salt bridge, which plays a crucial role in β sheet formation, becomes more flexible upon copper ion binding to the Aβ N-terminus. The simulation of dimers was conducted with the Cu(II)/Aβ stoichiometric ratios of 1:1 and 1:2. For the 1:1 ratio Cu(II) delays the Aβ dimerization process as observed in a number of experiments. The mechanism underlying this phenomenon is associated with slow formation of interchain salt bridges in dimer as well as with decreased hydrophobicity of monomer upon Cu-binding.
Affiliation(s)
- Pham Dinh Quoc Huy
- Institute of Physics, Polish Academy of Sciences, Al. Lotnikow 32/46, 02-668 Warsaw, Poland
- Institute for Computational Science and Technology, Quang Trung Software City, Tan Chanh Hiep Ward, District 12, Ho Chi Minh City, Vietnam
| | - Quan Van Vuong
- Institute for Computational Science and Technology, Quang Trung Software City, Tan Chanh Hiep Ward, District 12, Ho Chi Minh City, Vietnam
- Department of Chemistry, Nagoya University, Nagoya 464-8602, Japan
| | - Giovanni La Penna
- National Research Council of Italy CNR, Institute for Chemistry of Organometallic Compounds ICCOM, 50019 Florence, Italy
- Italian Institute for Nuclear Physics INFN, Section of Roma-Tor Vergata, 50019 Florence, Italy
| | - Peter Faller
- Biometals and Biological Chemistry, Institute of Chemistry, University of Strasbourg, 4 rue B. Pascal, 67081 Strasbourg, France
| | - Mai Suan Li
- Institute of Physics, Polish Academy of Sciences, Al. Lotnikow 32/46, 02-668 Warsaw, Poland
|
20
|
Livi L, Giuliani A, Rizzi A. Toward a multilevel representation of protein molecules: Comparative approaches to the aggregation/folding propensity problem. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.07.043] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
|
21
|
Arenas M, Sánchez-Cobos A, Bastolla U. Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability. Mol Biol Evol 2015; 32:2195-207. [PMID: 25837579 DOI: 10.1093/molbev/msv085] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/15/2022] Open
Abstract
Despite intense work, incorporating constraints on protein native structures into the mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter that we fix maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models obtained, taking into account misfolded conformations, yield larger likelihood than models that only consider the native state, because their average hydrophobicity is more realistic, and they produce on the average stable sequences for most proteins. We evaluated the mean-field model with respect to empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented smaller Kullback-Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family. 
We found that the mean-field model performs better than other structurally constrained models with similar or higher complexity. With respect to the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented into the computer program Prot_Evol that is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
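The Kullback-Leibler comparison reported in this abstract can be sketched in a few lines. The example computes the divergence of an observed site profile from two candidate model distributions, one site-specific and one shared by all sites. The three-letter alphabet and all frequencies are hypothetical values chosen for illustration; they do not come from Prot_Evol or from the 12 test data sets.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats over a shared alphabet; q is floored at eps."""
    return sum(pa * math.log(pa / max(q.get(a, 0.0), eps))
               for a, pa in p.items() if pa > 0)

# Hypothetical observed amino acid profile at one buried site.
observed = {"A": 0.10, "L": 0.60, "V": 0.30}

# Hypothetical model predictions for the same site.
site_specific_model = {"A": 0.15, "L": 0.55, "V": 0.30}  # depends on the site
global_model = {"A": 0.40, "L": 0.30, "V": 0.30}         # same at every site

kl_site = kl_divergence(observed, site_specific_model)
kl_global = kl_divergence(observed, global_model)
print(f"KL to site-specific model: {kl_site:.4f}")
print(f"KL to global model:        {kl_global:.4f}")
```

A smaller divergence means the model distribution is closer to the observed profile, which is the sense in which the abstract compares mean-field distributions with an empirical substitution model.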
Affiliation(s)
- Miguel Arenas
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
| | - Agustin Sánchez-Cobos
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
| | - Ugo Bastolla
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
|
22
|
Detecting selection on protein stability through statistical mechanical models of folding and evolution. Biomolecules 2014; 4:291-314. [PMID: 24970217 PMCID: PMC4030984 DOI: 10.3390/biom4010291] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/25/2013] [Revised: 02/13/2014] [Accepted: 02/14/2014] [Indexed: 12/31/2022] Open
Abstract
The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on the statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we will integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We will start by discussing positive design, which strengthens the stability of the folded state against the unfolded state of proteins. Positive design justifies why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. On the contrary, negative design, which consists of destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement conflicts with negative design, and evolution has to trade off between them. Finally, selection can accelerate functional movements by favoring low-frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformation change.
|
23
|
Jackson EL, Ollikainen N, Covert AW, Kortemme T, Wilke CO. Amino-acid site variability among natural and designed proteins. PeerJ 2013; 1:e211. [PMID: 24255821 PMCID: PMC3828621 DOI: 10.7717/peerj.211] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/03/2013] [Accepted: 10/24/2013] [Indexed: 11/20/2022] Open
Abstract
Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
Affiliation(s)
- Eleisha L. Jackson
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Noah Ollikainen
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
| | - Arthur W. Covert
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Tanja Kortemme
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
- California Institute for Quantitative Biosciences (QB3) and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Claus O. Wilke
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
|
24
|
Valentin JB, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J, Frellsen J, Mardia KV, Tian P, Hamelryck T. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins 2013; 82:288-99. [PMID: 23934827 DOI: 10.1002/prot.24386] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/28/2013] [Revised: 07/02/2013] [Accepted: 07/18/2013] [Indexed: 01/10/2023]
Abstract
We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain and side chains, respectively. Conceptually, this constitutes a probabilistic and continuous alternative to the use of discrete fragment and rotamer libraries. The local model is combined with a nonlocal model that involves a small number of energy terms according to a physical force field, and some information on the overall secondary structure content. In this initial study we focus on the formulation of the joint model and the evaluation of the use of an energy vector as a descriptor of a protein's nonlocal structure; hence, we derive the parameters of the nonlocal model from the native structure without loss of generality. The local and nonlocal models are combined using the reference ratio method, which is a well-justified probabilistic construction. For evaluation, we use the resulting joint models to predict the structure of four proteins. The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications.
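The way the reference ratio method combines a local model with a nonlocal one can be shown on a small discrete example. The sketch assumes the usual form of the construction: the local probability of a conformation is multiplied by the ratio of the target probability to the local model's own probability for the coarse-grained descriptor of that conformation. The four conformation labels, the "low"/"high" energy bins, and all probabilities are invented; the paper works with continuous dihedral-angle models and an energy vector rather than a lookup table.

```python
from collections import defaultdict

def marginal(dist, coarse):
    """Distribution of the coarse descriptor implied by a fine-grained model."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[coarse(x)] += p
    return dict(out)

def reference_ratio(local, target_coarse, coarse):
    """p(x) = local(x) * target(y) / local_marginal(y), where y = coarse(x)."""
    reference = marginal(local, coarse)
    return {x: p * target_coarse[coarse(x)] / reference[coarse(x)]
            for x, p in local.items()}

# Hypothetical local model over four conformations.
local_model = {"c1": 0.1, "c2": 0.3, "c3": 0.4, "c4": 0.2}

# Hypothetical coarse descriptor: an energy bin per conformation.
energy_bin = {"c1": "low", "c2": "low", "c3": "high", "c4": "high"}.get

# Hypothetical nonlocal (target) distribution over the descriptor.
target = {"low": 0.8, "high": 0.2}

joint = reference_ratio(local_model, target, energy_bin)
print(joint)
print(marginal(joint, energy_bin))
```

The combined model reproduces the target distribution over the descriptor while keeping the local model's relative preferences within each bin, which is the property that makes the construction attractive for joining local and nonlocal information.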
Affiliation(s)
- Jan B Valentin
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
25
Lin SYH, Cheng CW, Su ECY. Prediction of B-cell epitopes using evolutionary information and propensity scales. BMC Bioinformatics 2013; 14 Suppl 2:S10. [PMID: 23484214 PMCID: PMC3549808 DOI: 10.1186/1471-2105-14-s2-s10] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 11/26/2022] Open
Abstract
Background: Development of computational tools that can accurately predict the presence and location of B-cell epitopes on pathogenic proteins has a valuable application to the field of vaccinology. Because of the highly variable yet enigmatic nature of B-cell epitopes, their prediction presents a great challenge to computational immunologists. Methods: We propose a method, BEEPro (B-cell epitope prediction by evolutionary information and propensity scales), which adapts a linear averaging scheme on 16 properties using a support vector machine model to predict both linear and conformational B-cell epitopes. These 16 properties include a position-specific scoring matrix (PSSM), an amino acid ratio scale, and a set of 14 physicochemical scales obtained via a feature selection process. Finally, a three-way data split procedure is used during the validation process to prevent over-estimation of prediction performance and avoid bias in our experimental results. Results: In our experiment, we first use a non-redundant linear B-cell epitope dataset curated by Sollner et al. for feature selection and parameter optimization. Evaluated by a three-way data split procedure, BEEPro achieves significant improvement with area under the receiver operating characteristic curve (AUC) = 0.9987, accuracy = 99.29%, Matthews correlation coefficient (MCC) = 0.9281, sensitivity = 0.9604, specificity = 0.9946, and positive predictive value (PPV) = 0.9042 for the Sollner dataset. In addition, when the same parameters are used to evaluate performance on other independent linear B-cell epitope test datasets, BEEPro attains an AUC that ranges from 0.9874 to 0.9950 and an accuracy that ranges from 93.73% to 97.31%. Moreover, five-fold cross-validation on one benchmark conformational B-cell epitope dataset yields an accuracy of 92.14% and an AUC of 0.9066. Conclusions: Compared with other current models, our method achieves a significant improvement with respect to AUC, accuracy, MCC, sensitivity, specificity, and PPV. Thus, we have shown that an appropriate combination of evolutionary information and propensity scales with a support vector machine model can significantly enhance the prediction performance of both linear and conformational B-cell epitopes.
Affiliation(s)
- Scott Yi-Heng Lin
- School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
26
Arenas M, Dos Santos HG, Posada D, Bastolla U. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 2013; 29:3020-8. [PMID: 24037213 DOI: 10.1093/bioinformatics/btt530] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Models of molecular evolution aim at describing the evolutionary processes at the molecular level. However, current models rarely incorporate information from protein structure. Conversely, structure-based models of protein evolution have not been commonly applied to simulate sequence evolution in a phylogenetic framework, and they often ignore relevant evolutionary processes such as recombination. A simulation evolutionary framework that integrates substitution models that account for protein structure stability should be able to generate more realistic in silico evolved proteins for a variety of purposes. RESULTS We developed a method to simulate protein evolution that combines models of protein folding stability, such that the fitness depends on the stability of the native state both with respect to unfolding and misfolding, with phylogenetic histories that can be either specified by the user or simulated with the coalescent under complex evolutionary scenarios, including recombination, demographics and migration. We have implemented this framework in a computer program called ProteinEvolver. Remarkably, comparing these models with empirical amino acid replacement models, we found that the former produce amino acid distributions closer to distributions observed in real protein families, and proteins that are predicted to be more stable. Therefore, we conclude that evolutionary models that consider protein stability and realistic evolutionary histories constitute a better approximation of the real evolutionary process.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology 'Severo Ochoa', Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain and Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
27
Minning J, Porto M, Bastolla U. Detecting selection for negative design in proteins through an improved model of the misfolded state. Proteins 2013; 81:1102-12. [PMID: 23280507 DOI: 10.1002/prot.24244] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 11/07/2012] [Accepted: 12/17/2012] [Indexed: 11/05/2022]
Abstract
Proteins that need to be structured in their native state must be stable both against the unfolded ensemble and against incorrectly folded (misfolded) conformations with low free energy. Positive design targets the first type of stability by strengthening native interactions. The second type of stability is achieved by destabilizing interactions that occur frequently in the misfolded ensemble, a strategy called negative design. Here, we investigate negative design adopting a statistical mechanical model of the misfolded ensemble, which improves the usual Gaussian approximation by taking into account the third moment of the energy distribution and contact correlations. Applying this model, we detect and quantify selection for negative design in most natural proteins, and we analytically design protein sequences that are stable both against unfolding and against misfolding.
Affiliation(s)
- Jonas Minning
- Institut für Festkörperphysik, Technische Universität Darmstadt, Darmstadt, Germany
28
Disfani FM, Hsu WL, Mizianty MJ, Oldfield CJ, Xue B, Dunker AK, Uversky VN, Kurgan L. MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins. Bioinformatics 2013; 28:i75-83. [PMID: 22689782 PMCID: PMC3371841 DOI: 10.1093/bioinformatics/bts209] [Citation(s) in RCA: 268] [Impact Index Per Article: 24.4] [Indexed: 11/15/2022]
Abstract
Motivation: Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions that bind to protein partners via disorder-to-order transitions. MoRFs are implicated in important processes including signaling and regulation. However, only a limited number of experimentally validated MoRFs are known, which motivates development of computational methods that predict MoRFs from protein chains. Results: We introduce a new MoRF predictor, MoRFpred, which identifies all MoRF types (α, β, coil and complex). We develop a comprehensive dataset of annotated MoRFs to build and empirically compare our method. MoRFpred utilizes a novel design in which annotations generated by sequence alignment are fused with predictions generated by a Support Vector Machine (SVM), which uses a custom-designed set of sequence-derived features. The features provide information about evolutionary profiles, selected physicochemical properties of amino acids, and predicted disorder, solvent accessibility and B-factors. Empirical evaluation on several datasets shows that MoRFpred outperforms related methods: α-MoRF-Pred, which predicts α-MoRFs, and ANCHOR, which finds disordered regions that become ordered when bound to a globular partner. We show that our predicted (new) MoRF regions have non-random sequence similarity with native MoRFs. We use this observation, along with the fact that predictions with higher probability are more accurate, to identify putative MoRF regions. We also identify a few sequence-derived hallmarks of MoRFs. They are characterized by dips in the disorder predictions and higher hydrophobicity and stability when compared to adjacent (in the chain) residues. Availability: http://biomine.ece.ualberta.ca/MoRFpred/; http://biomine.ece.ualberta.ca/MoRFpred/Supplement.pdf Contact: lkurgan@ece.ualberta.ca Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fatemeh Miri Disfani
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, T6G 2V4, Canada
29
Olson B, Molloy K, Hendi SF, Shehu A. Guiding probabilistic search of the protein conformational space with structural profiles. J Bioinform Comput Biol 2012; 10:1242005. [PMID: 22809381 DOI: 10.1142/s021972001242005x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
The roughness of the protein energy surface poses a significant challenge to search algorithms that seek to obtain a structural characterization of the native state. Recent research seeks to bias search toward near-native conformations through one-dimensional structural profiles of the protein native state. Here we investigate the effectiveness of such profiles in a structure prediction setting for proteins of various sizes and folds. We pursue two directions. We first investigate the contribution of structural profiles in comparison to or in conjunction with physics-based energy functions in providing an effective energy bias. We conduct this investigation in the context of Metropolis Monte Carlo with fragment-based assembly. Second, we explore the effectiveness of structural profiles in providing projection coordinates through which to organize the conformational space. We do so in the context of a robotics-inspired search framework proposed in our lab that employs projections of the conformational space to guide search. Our findings indicate that structural profiles are most effective in obtaining physically realistic near-native conformations when employed in conjunction with physics-based energy functions. Our findings also show that these profiles are very effective when employed instead as projection coordinates to guide probabilistic search toward undersampled regions of the conformational space.
Affiliation(s)
- Brian Olson
- Department of Computer Science, George Mason University, 4400 University Drive Fairfax, VA 22030, USA
30
Bastolla U, Bruscolini P, Velasco JL. Sequence determinants of protein folding rates: Positive correlation between contact energy and contact range indicates selection for fast folding. Proteins 2012; 80:2287-304. [DOI: 10.1002/prot.24118] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/16/2011] [Revised: 05/14/2012] [Accepted: 05/17/2012] [Indexed: 11/12/2022]
31
Wolff K, Vendruscolo M, Porto M. Coarse-grained model for protein folding based on structural profiles. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 84:041934. [PMID: 22181202 DOI: 10.1103/physreve.84.041934] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/21/2011] [Indexed: 05/31/2023]
Abstract
We study a coarse-grained protein model whose primary characteristics are (i) a tubelike geometry to describe the self-avoidance effects of the polypeptide chain and (ii) an energy function based on a one-dimensional structural representation. The latter specifies the connectivity of a sequence in a given conformation, so that the energy function, rather than favoring the formation of specific native pairwise contacts, promotes the establishment of a specific target connectivity for each amino acid. We show that the resulting dynamics is in good agreement with both experimental observations and the results of all-atom simulations. In contrast to the latter, our coarse-grained approach makes it possible to explore longer time scales and thus enables one to access, albeit in less detail, larger regions of the conformational space. We illustrate our approach by applying it to the villin headpiece domain, a three-helix protein, studying its folding behavior and determining heat capacities and free-energy landscapes in various reaction coordinates.
Affiliation(s)
- Katrin Wolff
- School of Physics, University of Edinburgh, JCMB Kings Buildings, Edinburgh EH9 3JZ, United Kingdom
32
Kochańczyk M. Prediction of functionally important residues in globular proteins from unusual central distances of amino acids. BMC Struct Biol 2011; 11:34. [PMID: 21923943 PMCID: PMC3188475 DOI: 10.1186/1472-6807-11-34] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/22/2011] [Accepted: 09/18/2011] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Besides constructing a better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues. RESULTS: Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in an information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested the capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites, achieving success rates comparable to several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi. CONCLUSIONS: Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. 
Such residues turn out to be often directly involved in binding ligands or interfacing with other proteins.
Affiliation(s)
- Marek Kochańczyk
- Faculty of Physics, Jagiellonian University, ul. Reymonta 4, 30-059 Krakow, Poland.
33
The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 2011; 188:479-88. [PMID: 21467571 DOI: 10.1534/genetics.111.128025] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Indexed: 11/18/2022] Open
Abstract
Recent work with Saccharomyces cerevisiae shows a linear relationship between the evolutionary rate of sites and the relative solvent accessibility (RSA) of the corresponding residues in the folded protein. Here, we aim to develop a mathematical model that can reproduce this linear relationship. We first demonstrate that two models that both seem reasonable choices (a simple model in which selection strength correlates with RSA and a more complex model based on RSA-dependent amino acid distributions) fail to reproduce the observed relationship. We then develop a model on the basis of observed site-specific amino acid distributions and show that this model behaves appropriately. We conclude that evolutionary rates are directly linked to the distribution of amino acids at individual sites. Because of this link, any future insight into the biophysical mechanisms that determine amino acid distributions will improve our understanding of evolutionary rates.
34
Teichert F, Minning J, Bastolla U, Porto M. High quality protein sequence alignment by combining structural profile prediction and profile alignment using SABER-TOOTH. BMC Bioinformatics 2010; 11:251. [PMID: 20470364 PMCID: PMC2885375 DOI: 10.1186/1471-2105-11-251] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 08/07/2009] [Accepted: 05/14/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein alignments are an essential tool for many bioinformatics analyses. While sequence alignments are accurate for proteins of high sequence similarity, they become unreliable as they approach the so-called 'twilight zone' where sequence similarity gets indistinguishable from random. For such distant pairs, structure alignment is of much better quality. Nevertheless, sequence alignment is the only choice in the majority of cases where structural data is not available. This situation demands development of methods that extend the applicability of accurate sequence alignment to distantly related proteins. RESULTS We develop a sequence alignment method that combines the prediction of a structural profile based on the protein's sequence with the alignment of that profile using our recently published alignment tool SABERTOOTH. In particular, we predict the contact vector of protein structures using an artificial neural network based on position-specific scoring matrices generated by PSI-BLAST and align these predicted contact vectors. The resulting sequence alignments are assessed using two different tests: First, we assess the alignment quality by measuring the derived structural similarity for cases in which structures are available. In a second test, we quantify the ability of the significance score of the alignments to recognize structural and evolutionary relationships. As a benchmark we use a representative set of the SCOP (structural classification of proteins) database, with similarities ranging from closely related proteins at SCOP family level, to very distantly related proteins at SCOP fold level. Comparing these results with some prominent sequence alignment tools, we find that SABERTOOTH produces sequence alignments of better quality than those of Clustal W, T-Coffee, MUSCLE, and PSI-BLAST. 
HHpred, one of the most sophisticated and computationally expensive tools available, outperforms our alignment algorithm at family and superfamily levels, while the use of SABERTOOTH is advantageous for alignments at fold level. Our alignment scheme will profit from future improvements of structural profiles prediction. CONCLUSIONS We present the automatic sequence alignment tool SABERTOOTH that computes pairwise sequence alignments of very high quality. SABERTOOTH is especially advantageous when applied to alignments of remotely related proteins. The source code is available at http://www.fkp.tu-darmstadt.de/sabertooth_project/, free for academic users upon request.
Affiliation(s)
- Florian Teichert
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr, Darmstadt, Germany
35
Morra G, Baragli C, Colombo G. Selecting sequences that fold into a defined 3D structure: A new approach for protein design based on molecular dynamics and energetics. Biophys Chem 2009; 146:76-84. [PMID: 19926206 DOI: 10.1016/j.bpc.2009.10.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 09/01/2009] [Revised: 10/07/2009] [Accepted: 10/26/2009] [Indexed: 11/29/2022]
Abstract
The problem of finding amino acid sequences able to fold into a defined three-dimensional (3D) structure is at the basis of successful protein design efforts. Herein, we present the results of applying a novel, all-atom molecular dynamics-based energy decomposition approach to the selection of sequences able to fold into a given 3D conformation. First, the energy decomposition approach is applied to natural sequences associated with a well-defined structure to identify the principal energetic coupling interactions necessary to stabilize it, defining the specific energetic signature for the fold. Then, several different sequences are threaded onto the defined 3D structure, and only those sequences whose energetic signature (pattern) is close to that of the natural sequence, according to a similarity criterion, are selected as able to populate the specific fold. Furthermore, it is possible to evaluate the fitness of a certain sequence for a fold by combining the information provided by the energetic signature with that contained in the contact map, which recapitulates the fold topology. The results show that a better fit between the energetic properties of a sequence and the topology corresponds to a better stabilization of the protein fold by that sequence. We applied this approach to a library of natural and artificial WW domain sequences, previously developed by the Ranganathan group, containing sequences that are experimentally known to be able and unable to fold into native structures. The results show that our approach can correctly identify 70% of the sequences known to populate the typical WW domain fold.
Affiliation(s)
- Giulia Morra
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Milano, Italy
36
Fornes O, Aragues R, Espadaler J, Marti-Renom MA, Sali A, Oliva B. ModLink+: improving fold recognition by using protein-protein interactions. Bioinformatics 2009; 25:1506-12. [PMID: 19357100 DOI: 10.1093/bioinformatics/btp238] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 01/16/2023]
Abstract
MOTIVATION: Several strategies have been developed to predict the fold of a target protein sequence, most of which are based on aligning the target sequence to other sequences of known structure. Previously, we demonstrated that the consideration of protein-protein interactions significantly increases the accuracy of fold assignment compared with PSI-BLAST sequence comparisons. A drawback of our method was the low number of proteins to which a fold could be assigned. Here, we present an improved version of the method that addresses this limitation. We also compare our method to other state-of-the-art fold assignment methodologies. RESULTS: Our approach (ModLink+) has been tested on 3716 proteins with domain folds classified in the Structural Classification Of Proteins (SCOP) as well as known interacting partners in the Database of Interacting Proteins (DIP). For this test set, the ratio of success [positive predictive value (PPV)] on fold assignment increases from 75% for PSI-BLAST, 83% for HHSearch and 81% for PRC to >90% for ModLink+ at the e-value cutoff of $10^{-3}$. Under this e-value, ModLink+ can assign a fold to 30-45% of the proteins in the test set, while our previous method could cover <25%. When applied to 6384 proteins with unknown fold in the yeast proteome, ModLink+ combined with PSI-BLAST assigns a fold for domains in 3738 proteins, while PSI-BLAST alone covers only 2122 proteins, HHSearch 2969 and PRC 2826 proteins, using a threshold e-value that would represent a PPV >82% for each method in the test set. AVAILABILITY: The ModLink+ server is freely accessible on the World Wide Web at http://sbi.imim.es/modlink/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Oriol Fornes
- Structural Bioinformatics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona (PRBB), Barcelona, Catalonia, Spain.
37
Kloczkowski A, Jernigan RL, Wu Z, Song G, Yang L, Kolinski A, Pokarowski P. Distance matrix-based approach to protein structure prediction. J Struct Funct Genomics 2009; 10:67-81. [PMID: 19224393 DOI: 10.1007/s10969-009-9062-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 09/23/2008] [Accepted: 02/01/2009] [Indexed: 10/21/2022]
Abstract
Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, that has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\text{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because $r$ is highly correlated with the hydrophobic profile of the sequence. Moreover, $r$ is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. 
We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement (http://predictioncenter.org/caspR).
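The claim that the squared distance matrix has at most five nonzero spectral terms can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it uses random 3D points as a hypothetical stand-in for residue coordinates, counts the eigenvalues of $D$ that are numerically nonzero, and measures how strongly the eigenvector of the largest-magnitude eigenvalue correlates with the squared distance from the center of mass.

```python
import numpy as np

# Hypothetical point set standing in for residue coordinates (not real protein data).
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
coords -= coords.mean(axis=0)  # place the center of mass at the origin

# Squared distance matrix D[i, j] = |x_i - x_j|^2.
diff = coords[:, None, :] - coords[None, :, :]
D = (diff ** 2).sum(axis=-1)

# D is symmetric, so eigh returns real eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(D)

# Count eigenvalues that are nonzero relative to the largest magnitude.
tol = 1e-8 * np.abs(eigvals).max()
n_nonzero = int(np.sum(np.abs(eigvals) > tol))

# Compare the dominant eigenvector with r^2, the squared distance
# of each point from the center of mass (sign of the eigenvector is arbitrary).
dominant = eigvecs[:, np.argmax(np.abs(eigvals))]
r2 = (coords ** 2).sum(axis=1)
corr = abs(np.corrcoef(dominant, r2)[0, 1])

print("nonzero eigenvalues:", n_nonzero)
print("|correlation| of dominant eigenvector with r^2:", round(corr, 3))
```

The bound follows from writing $D_{ij} = r_i^2 + r_j^2 - 2\,x_i \cdot x_j$: the first two terms each contribute rank one, and the Gram matrix of 3D coordinates contributes rank at most three.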
Affiliation(s)
- Andrzej Kloczkowski
- Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, 112 Office and Lab Bldg, Ames, IA 50011-3020, USA.
38
Bastolla U, Ortíz AR, Porto M, Teichert F. Effective connectivity profile: a structural representation that evidences the relationship between protein structures and sequences. Proteins 2008; 73:872-88. [PMID: 18536008 DOI: 10.1002/prot.22113] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
The complexity of protein structures calls for simplified representations of their topology. The simplest possible mathematical description of a protein structure is a one-dimensional profile representing, for instance, buriedness or secondary structure. This kind of representation has been introduced for studying the sequence to structure relationship, with applications to fold recognition. Here we define the effective connectivity profile (EC), a network theoretical profile that self-consistently represents the network structure of the protein contact matrix. The EC profile makes mathematically explicit the relationship between protein structure and protein sequence, because it allows predicting the average hydrophobicity profile (HP) and the distributions of amino acids at each site for families of homologous proteins sharing the same structure. In this sense, the EC provides an analytic solution to the statistical inverse folding problem, which consists in finding the statistical properties of the set of sequences compatible with a given structure. We tested these predictions with simulations of the structurally constrained neutral (SCN) model of protein evolution with structure conservation, for single- and multi-domain proteins, and for a wide range of mutation processes, the latter producing sequences with very different hydrophobicity profiles, finding that the EC-based predictions are accurate even when only one sequence of the family is known. The EC profile is very significantly correlated with the HP for sequence-structure pairs in the PDB as well. The EC profile generalizes the properties of previously introduced structural profiles to modular proteins such as multidomain chains, and its correlation with the sequence profile is substantially improved with respect to the previously defined profiles, particularly for long proteins. 
Furthermore, the EC profile has a dynamic interpretation, since the EC components are strongly inversely related to the temperature factors measured in X-ray experiments, meaning that positions with a large EC component are more strongly constrained in their equilibrium dynamics. Last, the EC profile allows defining a natural measure of modularity that correlates with the number of domains composing the protein, suggesting its application for domain decomposition. Finally, we show that structurally similar proteins have similar EC profiles, so that the similarity between aligned EC profiles can be used as a structure similarity measure, a property that we have recently applied for protein structure alignment. The code for computing the EC profile is available upon request by writing to ubastolla@cbm.uam.es, and the structural profiles discussed in this article can be downloaded from the SLOTH webserver http://www.fkp.tu-darmstadt.de/SLOTH/.
Affiliation(s)
- Ugo Bastolla
- Centro de Biología Molecular Severo Ochoa, (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain.
|
39
|
Wolff K, Vendruscolo M, Porto M. Stochastic reconstruction of protein structures from effective connectivity profiles. PMC Biophys 2008; 1:5. [PMID: 19351427 PMCID: PMC2666633 DOI: 10.1186/1757-5036-1-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 09/24/2008] [Accepted: 11/26/2008] [Indexed: 11/23/2022]
Abstract
We discuss a stochastic approach for reconstructing the native structures of proteins from the knowledge of the "effective connectivity", which is a one-dimensional structural profile constructed as a linear combination of the eigenvectors of the contact map of the target structure. The structural profile is used to bias a search of the conformational space towards the target structure in a Monte Carlo scheme operating on a Cα-chain of uniform, finite thickness. Structure information thus enters the folding dynamics via the effective connectivity, but the interaction is not restricted to pairs of amino acids that form native contacts, resulting in a free energy landscape which does not rely on the assumption of minimal frustration. Moreover, effective connectivity vectors can be predicted more readily from the amino acid sequence of proteins than the corresponding contact maps, thus suggesting that the stochastic protocol presented here could be effectively combined with other current methods for predicting native structures. PACS codes: 87.14.Ee.
Affiliation(s)
- Katrin Wolff
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstrasse 6, 64289 Darmstadt, Germany.
|
40
|
Morra G, Colombo G. Relationship between energy distribution and fold stability: Insights from molecular dynamics simulations of native and mutant proteins. Proteins 2008; 72:660-72. [PMID: 18247351 DOI: 10.1002/prot.21963] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 11/06/2022]
Abstract
Most proteins must fold to a well-defined structure with a minimal stability to perform their function. Here we use a simple, molecular dynamics-based, energy decomposition approach to map the principal energetic interactions in a set of proteins representative of different folds. This work involves the all-atom simulation and analysis of the native structures and mutants of five different proteins representative of an all-alpha (yACPB, Protein A), all-beta (SH3), and a mixed alpha/beta fold (Proteins G and L). Given a certain structure, a native sequence and a set of mutants, we show that our model discriminates the ability of a mutation to yield a more or less stable protein, in agreement with experimental data, capturing the principal energetic determinants of protein stabilization. Our approach identifies the interaction determinants responsible for defining a fold and shows that mutations can either modulate the strength of pair-wise coupling between residues important for folding, or modify the profile of the principal interactions. Furthermore, we address the question of how to evaluate the fitness of a sequence to a given structure by comparing the information contained in the energy map, which recapitulates the chemistry of the sequence, to that contained in the contact map, which recapitulates the fold topology. The results show that a better fit between the energetic properties of the sequence and the fold topology corresponds to a higher stabilization of the protein. We discuss the relevance of these observations to the analysis of protein designability and to the rational evolution of new sequences.
Affiliation(s)
- Giulia Morra
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Via Mario Bianco 9, 20131, Milano, Italy
|
41
|
Wolff K, Vendruscolo M, Porto M. A stochastic method for the reconstruction of protein structures from one-dimensional structural profiles. Gene 2008; 422:47-51. [PMID: 18577428 DOI: 10.1016/j.gene.2008.06.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
We discuss a computational approach for reconstructing the native structures of proteins from the knowledge of a structural profile - the first eigenvector of the contact map of the native structure itself. The procedure consists in carrying out Monte Carlo simulations of a tube model of the protein structure with an energy bias towards the target structural profile. We present the reconstruction of two small proteins and address problems arising in the reconstruction of larger proteins. Our results indicate that an accurate physico-chemical energy function should be used in conjunction with the structural profile bias in order to achieve accurate reconstructions.
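The structural profile named in this abstract, the first eigenvector of the contact map, can be illustrated with a minimal sketch. The 8 Å Cα cutoff, the minimum sequence separation of 3, the function names, and the random-walk toy chain are all assumptions made for illustration; none of them are taken from the paper.

```python
import numpy as np

def contact_map(coords, cutoff=8.0, min_separation=3):
    """Binary contact matrix from an (N, 3) array of C-alpha coordinates.

    cutoff (in angstroms) and min_separation are illustrative defaults,
    not values taken from the cited work.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return ((dist < cutoff) & far_in_sequence).astype(float)

def principal_eigenvector(cmap):
    """Eigenvector belonging to the largest eigenvalue of a symmetric matrix.

    The sign is fixed so that the components sum to a non-negative value.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(cmap)
    pe = eigenvectors[:, -1]  # eigh returns eigenvalues in ascending order
    return pe if pe.sum() >= 0 else -pe

# Toy chain: a random walk standing in for real C-alpha coordinates.
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(scale=2.0, size=(30, 3)), axis=0)
cmap = contact_map(coords)
pe = principal_eigenvector(cmap)
```

Each component of `pe` is one value per residue, which is what makes the eigenvector usable as a one-dimensional profile of the structure.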
Affiliation(s)
- Katrin Wolff
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstrasse 6, 64289 Darmstadt, Germany
|
42
|
Miyazawa S, Kinjo AR. Properties of contact matrices induced by pairwise interactions in proteins. Phys Rev E Stat Nonlin Soft Matter Phys 2008; 77:051910. [PMID: 18643105 DOI: 10.1103/physreve.77.051910] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/23/2008] [Indexed: 05/26/2023]
Abstract
The properties of contact matrices (C matrices) needed for native proteins to be the lowest-energy conformations are considered in relation to a contact energy matrix (E matrix). The total conformational energy is assumed to consist of pairwise interaction energies between atoms or residues, each of which is expressed as a product of a conformation-dependent function (an element of the C matrix) and a sequence-dependent energy parameter (an element of the E matrix). Such pairwise interactions in proteins force native C matrices to be in a relationship as if the interactions are a Go-like potential [N. Go, Annu. Rev. Biophys. Bioeng. 12, 183 (1983)] for the native C matrix, because the lowest bound of the total energy function is equal to the total energy of the native conformation interacting in a Go-like pairwise potential. This relationship between C and E matrices corresponds to (a) a parallel relationship between the eigenvectors of the C and E matrices and a linear relationship between their eigenvalues and (b) a parallel relationship between a contact number vector and the principal eigenvectors of the C and E matrices, where the E matrix is expanded in a series of eigenspaces with an additional constant term. The additional constant term in the spectral expansion of the E matrix is indicated by the lowest bound of the total energy function to correspond to a threshold of contact energy that approximately separates native contacts from non-native ones. Inner products between the principal eigenvector of the C matrix, that of the E matrix, and a contact number vector have been examined for 182 proteins, each of which is a representative from each family of the SCOP database [Murzin, J. Mol. Biol. 247, 536 (1995)], and the results indicate the parallel tendencies between those vectors. A statistical contact potential [S. Miyazawa and R. L. Jernigan, Proteins 34, 49 (1999); S. Miyazawa and R. L. Jernigan, Proteins 50, 35 (2003)] estimated from protein crystal structures was used to evaluate pairwise residue-residue interactions in the proteins. In addition, the spectral representation of C and E matrices reveals that pairwise residue-residue interactions, which depend only on the types of interacting amino acids, but not on other residues in a protein, are insufficient and other interactions including residue connectivities and steric hindrance are needed to make native structures unique lowest-energy conformations.
Affiliation(s)
- Sanzo Miyazawa
- Graduate School of Engineering, Gunma University, Kiryu, Gunma 376-8515, Japan.
|
43
|
Kinjo AR, Nakamura H. Nature of protein family signatures: insights from singular value analysis of position-specific scoring matrices. PLoS One 2008; 3:e1963. [PMID: 18398479 PMCID: PMC2276316 DOI: 10.1371/journal.pone.0001963] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 01/17/2008] [Accepted: 03/05/2008] [Indexed: 11/19/2022] Open
Abstract
Position-specific scoring matrices (PSSMs) are useful for detecting weak homology in protein sequence analysis, and they are thought to contain some essential signatures of the protein families. In order to elucidate what kind of ingredients constitute such family-specific signatures, we apply singular value decomposition to a set of PSSMs and examine the properties of dominant right and left singular vectors. The first right singular vectors were correlated with various amino acid indices including relative mutability, amino acid composition in protein interior, hydropathy, or turn propensity, depending on proteins. A significant correlation between the first left singular vector and a measure of site conservation was observed. It is shown that the contribution of the first singular component to the PSSMs acts to disfavor potentially but falsely functionally important residues at conserved sites. The second right singular vectors were highly correlated with hydrophobicity scales, and the corresponding left singular vectors with contact numbers of protein structures. It is suggested that sequence alignment with a PSSM is essentially equivalent to threading supplemented with functional information. In addition, singular vectors may be useful for analyzing and annotating the characteristics of conserved sites in protein families.
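As a rough illustration of the singular value analysis described in this abstract, the sketch below decomposes a PSSM laid out as sites (rows) by amino acids (columns) and correlates the first right singular vector with an amino acid index. The synthetic PSSM, the random stand-in for an amino acid index, and the function names are hypothetical; a real analysis would use actual PSSMs and published scales.

```python
import numpy as np

def leading_singular_component(pssm):
    """Return (first left vector, first singular value, first right vector).

    Rows of pssm are alignment sites, columns are the 20 amino acids, so the
    left vector has one entry per site and the right vector one per amino acid.
    """
    pssm = np.asarray(pssm, dtype=float)
    u, s, vt = np.linalg.svd(pssm, full_matrices=False)
    return u[:, 0], s[0], vt[0]

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic PSSM: one per-site weight times one amino acid index, plus noise.
rng = np.random.default_rng(1)
site_weight = rng.uniform(0.5, 2.0, size=40)  # hypothetical per-site factor
aa_index = rng.normal(size=20)                # stand-in for a real scale
pssm = np.outer(site_weight, aa_index) + 0.01 * rng.normal(size=(40, 20))

left, sigma, right = leading_singular_component(pssm)
r = abs(pearson(right, aa_index))  # the sign of a singular vector is arbitrary
```

The absolute value is taken because a singular vector is only defined up to sign, so a strong negative correlation carries the same information as a strong positive one.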
Affiliation(s)
- Akira R Kinjo
- Institute for Protein Research, Osaka University, Suita, Osaka, Japan.
|
44
|
Buchete NV, Straub JE, Thirumalai D. Dissecting contact potentials for proteins: relative contributions of individual amino acids. Proteins 2008; 70:119-30. [PMID: 17640067 DOI: 10.1002/prot.21538] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
Knowledge-based contact potentials are routinely used in fold recognition, binding of peptides to proteins, structure prediction, and coarse-grained models to probe protein folding kinetics. The dominant physical forces embodied in the contact potentials are revealed by eigenvalue analysis of the matrices, whose elements describe the strengths of interaction between amino acid side chains. We propose a general method to rank quantitatively the importance of various inter-residue interactions represented in the currently popular pair contact potentials. Eigenvalue analysis and correlation diagrams are used to rank the inter-residue pair interactions with respect to the magnitude of their relative contributions to the contact potentials. The amino acid ranking is shown to be consistent with a mean field approximation that is used to reconstruct the original contact potentials from the most relevant amino acids for several contact potentials. By providing a general, relative ranking score for amino acids, this method permits a detailed, quantitative comparison of various contact interaction schemes. For most contact potentials, between 7 and 9 amino acids of varying chemical character are needed to accurately reconstruct the full matrix. By correlating the identified important amino acid residues in contact potentials and analysis of about 7800 structural domains in the CATH database we predict that it is important to model accurately interactions between small hydrophobic residues. In addition, only potentials that take interactions involving the protein backbone into account can predict dense packing in protein structures.
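The eigenvalue analysis mentioned in this abstract can be sketched in a generic form: keep the eigencomponents of largest magnitude of a symmetric 20 x 20 interaction matrix and measure how well they reconstruct it. This is a plain truncated eigendecomposition, not the mean field approximation or ranking score of the paper, and the random matrix below is a hypothetical stand-in for a published contact potential.

```python
import numpy as np

def spectral_truncation(matrix, k):
    """Rank-k approximation built from the k eigenvalues of largest magnitude."""
    m = np.asarray(matrix, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(m)
    order = np.argsort(-np.abs(eigenvalues))[:k]
    kept_vectors = eigenvectors[:, order]
    return (kept_vectors * eigenvalues[order]) @ kept_vectors.T

def relative_error(matrix, approx):
    """Frobenius norm of the residual relative to the original matrix."""
    return float(np.linalg.norm(matrix - approx) / np.linalg.norm(matrix))

# Hypothetical symmetric "potential"; real tables would be loaded instead.
rng = np.random.default_rng(2)
raw = rng.normal(size=(20, 20))
potential = (raw + raw.T) / 2

errors = [
    relative_error(potential, spectral_truncation(potential, k))
    for k in (1, 5, 20)
]
```

For a real contact potential, the point at which the error curve flattens gives a sense of how many independent components the matrix effectively contains.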
Affiliation(s)
- N-V Buchete
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0520, USA.
|
45
|
Abstract
DNA and amino acid sequences contain information about both the phylogenetic relationships among species and the evolutionary processes that caused the sequences to diverge. Mathematical and statistical methods try to detect this information to determine how and why DNA and protein molecules work the way they do. This chapter describes some of the most widely used models of the evolution of biological sequences. It first focuses on single nucleotide/amino acid replacement rate models. Then it discusses the modelling of evolution at gene and protein module levels. The chapter concludes with speculations about the future use of molecular evolution studies using genomic and proteomic data.
Affiliation(s)
- Pietro Liò
- Computer Laboratory, University of Cambridge, Cambridge, UK
|
46
|
Bastolla U, Porto M, Ortíz AR. Local interactions in protein folding determined through an inverse folding model. Proteins 2008; 71:278-99. [DOI: 10.1002/prot.21730] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
|
47
|
|
48
|
Holladay NB, Kinch LN, Grishin NV. Optimization of linear disorder predictors yields tight association between crystallographic disorder and hydrophobicity. Protein Sci 2007; 16:2140-52. [PMID: 17893360 PMCID: PMC2204125 DOI: 10.1110/ps.072980107] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 10/22/2022]
Abstract
X-ray crystallographic protein structures often contain disordered regions that are observed as missing electron density. Diffraction data may give little or no direct evidence as to the specific nature of disordered regions. We have developed a weighted window-based disorder predictor optimized using crystallographic data. Performance of a predictor is strongly influenced by chain termini. Optimized score adjustment values for amino- and carboxy-terminal positions demonstrate a simple, monotonic relationship between disorder and residue distance from termini. This optimized disorder predictor performs similarly to DISOPRED2 on crystallographically disordered regions. Data-optimized residue disorder propensities show strong linear correlation with experimentally determined amino acid transfer energies between water and hydrogen-bonding organic solvents, which primarily reflect residue hydrophobicity (exemplified by the Nozaki-Tanford hydrophobicity scale). Disorder propensities do not correlate as well with transfer energies between water and apolar solvents, which primarily reflect a different hydropathic property: residue hydrophilicity (also reflected by the Kyte-Doolittle hydropathy scale). Our results suggest that while hydrophobic side-chain interactions are primarily involved in determining stability of the folded conformation, hydrogen bonding, and similar polar interactions are primarily involved in conformational and interaction specificity.
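A weighted window-based predictor of the kind described in this abstract can be sketched as a weighted sliding average over per-residue propensities. The propensity values, the window weights, and the way truncated windows at the chain termini are renormalized are all assumptions made for illustration; the optimized parameters and the terminal score adjustments of the paper are not reproduced here.

```python
def window_scores(sequence, propensity, weights):
    """Weighted sliding-window average of per-residue propensities.

    weights must have odd length and positive entries; the window is centred
    on each residue, and positions falling outside the chain are dropped with
    the remaining weights renormalized (an assumption of this sketch).
    """
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    half = len(weights) // 2
    values = [propensity[aa] for aa in sequence]
    scores = []
    for i in range(len(values)):
        total = 0.0
        norm = 0.0
        for offset, w in enumerate(weights, start=-half):
            j = i + offset
            if 0 <= j < len(values):
                total += w * values[j]
                norm += w
        scores.append(total / norm)
    return scores

# Toy two-letter alphabet with made-up propensities.
toy_propensity = {"A": 0.0, "G": 1.0}
scores = window_scores("AAGAA", toy_propensity, weights=[1, 2, 1])
```

With the toy input, the single high-propensity residue in the middle produces a score that peaks at its own position and falls off symmetrically on either side.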
Affiliation(s)
- Nathan B Holladay
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
|
49
|
Teichert F, Bastolla U, Porto M. SABERTOOTH: protein structural alignment based on a vectorial structure representation. BMC Bioinformatics 2007; 8:425. [PMID: 17974011 PMCID: PMC2257979 DOI: 10.1186/1471-2105-8-425] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 07/31/2007] [Accepted: 10/31/2007] [Indexed: 11/22/2022] Open
Abstract
Background: The task of computing highly accurate structural alignments of proteins in very short computation time is still challenging. This is partly due to the complexity of protein structures. Therefore, instead of manipulating coordinates directly, matrices of inter-atomic distances, sets of vectors between protein backbone atoms, and other reduced representations are used. These decrease the effort of comparing large sets of coordinates, but protein structural alignment still remains computationally expensive. Results: We represent the topology of a protein structure through a structural profile that expresses the global effective connectivity of each residue. We have shown recently that this representation allows explicitly expressing the relationship between protein structure and protein sequence. Based on this very condensed vectorial representation, we develop a structural alignment framework that recognizes structural similarities with accuracy comparable to established alignment tools. Furthermore, our algorithm has favourable scaling of computation time with chain length. Since the algorithm is independent of the details of the structural representation, our framework can be applied to sequence-to-sequence and sequence-to-structure comparison within the same setup, and it is therefore more general than other existing tools.
Affiliation(s)
- Florian Teichert
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 6-8, 64289 Darmstadt, Germany.
|
50
|
The Structurally Constrained Neutral Model of Protein Evolution. ACTA ACUST UNITED AC 2007. [DOI: 10.1007/978-3-540-35306-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|