1
Brown SM, Mayer-Bacon C, Freeland S. Xeno Amino Acids: A Look into Biochemistry as We Do Not Know It. Life (Basel) 2023; 13:2281. [PMID: 38137883 PMCID: PMC10744825 DOI: 10.3390/life13122281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 11/18/2023] [Accepted: 11/20/2023] [Indexed: 12/24/2023] Open
Abstract
Would another origin of life resemble Earth's biochemical use of amino acids? Here, we review current knowledge at three levels: (1) Could other classes of chemical structure serve as building blocks for biopolymer structure and catalysis? Amino acids now seem both readily available to, and a plausible chemical attractor for, life as we do not know it. Amino acids thus remain important and tractable targets for astrobiological research. (2) If amino acids are used, would we expect the same L-alpha-structural subclass used by life? Despite numerous ideas, it is not clear why life favors L-enantiomers. It seems clearer, however, why life on Earth uses the shortest possible (alpha-) amino acid backbone, and why each carries only one side chain. However, assertions that other backbones are physicochemically impossible have relaxed into arguments that they are disadvantageous. (3) Would we expect a similar set of side chains to those within the genetic code? Many plausible alternatives exist. Furthermore, evidence exists for both evolutionary advantage and physicochemical constraint as explanatory factors for those encoded by life. Overall, as focus shifts from amino acids as a chemical class to specific side chains used by post-LUCA biology, the probable role of physicochemical constraint diminishes relative to that of biological evolution. Exciting opportunities now present themselves for laboratory work and computing to explore how changing the amino acid alphabet alters the universe of protein folds. Near-term milestones include: (a) expanding evidence about amino acids as attractors within chemical evolution; (b) extending characterization of other backbones relative to biological proteins; and (c) merging computing and laboratory explorations of structures and functions unlocked by xeno peptides.
2
Beuming T, Martín H, Díaz-Rovira AM, Díaz L, Guallar V, Ray SS. Are Deep Learning Structural Models Sufficiently Accurate for Free-Energy Calculations? Application of FEP+ to AlphaFold2-Predicted Structures. J Chem Inf Model 2022; 62:4351-4360. [PMID: 36099477 DOI: 10.1021/acs.jcim.2c00796] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 11/28/2022]
Abstract
The availability of AlphaFold2 has led to great excitement in the scientific community, particularly among drug hunters, due to the ability of the algorithm to predict protein structures with high accuracy. However, beyond globally accurate protein structure prediction, it remains to be determined whether ligand binding sites are predicted with sufficient accuracy in these structures to be useful in supporting computationally driven drug discovery programs. We explored this question by performing free-energy perturbation (FEP) calculations on a set of well-studied protein-ligand complexes, where AlphaFold2 predictions were performed by removing all templates with >30% identity to the target protein from the training set. We observed that in most cases, the ΔΔG values for ligand transformations calculated with FEP, using these prospective AlphaFold2 structures, were comparable in accuracy to the corresponding calculations previously carried out using crystal structures. We conclude that under the right circumstances, AlphaFold2-modeled structures are accurate enough to be used by physics-based methods such as FEP in typical lead optimization stages of a drug discovery program.
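The comparison described in this abstract (FEP ΔΔG values from AlphaFold2 models versus those from crystal structures) is typically summarized as an error against experimental values. The abstract does not name the metrics used, so the following is only a minimal sketch assuming mean unsigned error and root-mean-square error; the function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def mean_unsigned_error(predicted, experimental):
    """Mean absolute difference between paired values (e.g. ddG in kcal/mol)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def root_mean_square_error(predicted, experimental):
    """Root-mean-square difference between paired values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(squared / len(predicted))


# Hypothetical ddG values (kcal/mol) for three ligand transformations.
experimental = [-1.2, 0.4, 0.9]
from_crystal_structure = [-0.9, 0.1, 1.3]
from_predicted_model = [-0.7, 0.8, 1.1]

for label, values in [("crystal", from_crystal_structure),
                      ("predicted model", from_predicted_model)]:
    mue = mean_unsigned_error(values, experimental)
    rmse = root_mean_square_error(values, experimental)
    print(f"{label}: MUE={mue:.2f} RMSE={rmse:.2f}")
```

Computing both metrics for each structure source on the same set of transformations gives a like-for-like view of whether the predicted models are "comparable in accuracy".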
Affiliation(s)
- Thijs Beuming
  - Latham Biopharm Group, 101 Main Street, Suite 1400, Cambridge, Massachusetts 02142, United States
- Anna M Díaz-Rovira
  - Barcelona Supercomputing Center, Jordi Girona 29, E-08034 Barcelona, Spain
- Lucía Díaz
  - NOSTRUM BIODISCOVERY S.L., E-08029 Barcelona, Spain
- Victor Guallar
  - NOSTRUM BIODISCOVERY S.L., E-08029 Barcelona, Spain
  - Barcelona Supercomputing Center, Jordi Girona 29, E-08034 Barcelona, Spain
  - ICREA, Passeig Lluís Companys 23, E-08010 Barcelona, Spain
- Soumya S Ray
  - RA Capital, 200 Berkeley Street, Boston, Massachusetts 02116, United States
3
Church JR, Amoyal GS, Borin VA, Adam S, Olsen JMH, Schapiro I. Deciphering the Spectral Tuning Mechanism in Proteorhodopsin: The Dominant Role of Electrostatics Instead of Chromophore Geometry. Chemistry 2022; 28:e202200139. [PMID: 35307890 PMCID: PMC9325082 DOI: 10.1002/chem.202200139] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/15/2022] [Indexed: 11/11/2022]
Abstract
Proteorhodopsin (PR) is a photoactive proton pump found in marine bacteria. There are two phenotypes of PR exhibiting an environmental adaptation to the ocean's depth which tunes their maximum absorption: blue‐absorbing proteorhodopsin (BPR) and green‐absorbing proteorhodopsin (GPR). This blue/green color‐shift is controlled by a glutamine to leucine substitution at position 105 which accounts for a 20 nm shift. Typically, spectral tuning in rhodopsins is rationalized by the external point charge model but the Q105L mutation is charge neutral. To study this tuning mechanism, we employed the hybrid QM/MM method with sampling from molecular dynamics. Our results reveal that the positive partial charge of glutamine near the C14−C15 bond of retinal shortens the effective conjugation length of the chromophore compared to the leucine residue. The derived mechanism can be applied to explain the color regulation in other retinal proteins and can serve as a guideline for rational design of spectral shifts.
Affiliation(s)
- Jonathan R Church
  - Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
- Gil S Amoyal
  - Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
- Veniamin A Borin
  - Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
- Suliman Adam
  - Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
- Igor Schapiro
  - Fritz Haber Center for Molecular Dynamics Research, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
4
Thambu K, Glomb V, Hernadez R, Facelli JC. Microproteins: a 3D protein structure prediction analysis. J Biomol Struct Dyn 2021; 40:13738-13746. [PMID: 34705603 PMCID: PMC9489054 DOI: 10.1080/07391102.2021.1993343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Accepted: 10/11/2021] [Indexed: 01/03/2023]
Abstract
Microproteins are a novel and expanding group of small proteins encoded by fewer than 100-150 codons that are translated from small open reading frames (smORFs). It has been shown that smORFs and their corresponding microproteins make up a sizable fraction of the genome and proteome, but very little information on microproteins' structural features exists in the literature. In this paper, we present the results of analyzing the predicted structures of 44 microproteins. The results show that this set of microproteins has a different amino acid composition profile, similar structural characteristics and fewer small-molecule ligand binding sites than regular proteins. Communicated by Ramaswamy H. Sarma.
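An amino acid composition profile of the kind compared in this abstract can be computed directly from sequences. The following is a minimal sketch, assuming one-letter codes for the 20 standard residues; the function name and the example sequence are hypothetical and do not come from the paper's data set.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_profile(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Characters outside the 20 standard one-letter codes are ignored.
    """
    counts = Counter(c for c in sequence.upper() if c in STANDARD_AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in STANDARD_AMINO_ACIDS}


# Hypothetical short sequence, used only to show the output shape.
profile = composition_profile("MKLLAAGG")
for aa, fraction in profile.items():
    if fraction > 0:
        print(aa, round(fraction, 3))
```

Averaging such profiles over a set of microproteins and over a reference set of longer proteins would give the two distributions to compare.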
Affiliation(s)
- Kishan Thambu
  - Department of Biomedical Informatics, The University of Utah, Salt Lake City, Utah
- Victoria Glomb
  - Department of Biomedical Informatics, The University of Utah, Salt Lake City, Utah
- Rolando Hernadez
  - Department of Biomedical Informatics, The University of Utah, Salt Lake City, Utah
- Julio C. Facelli
  - Department of Biomedical Informatics, The University of Utah, Salt Lake City, Utah
  - Center for Clinical and Translational Science, The University of Utah, Salt Lake City, Utah
5
Klebba PE, Newton SMC, Six DA, Kumar A, Yang T, Nairn BL, Munger C, Chakravorty S. Iron Acquisition Systems of Gram-negative Bacterial Pathogens Define TonB-Dependent Pathways to Novel Antibiotics. Chem Rev 2021; 121:5193-5239. [PMID: 33724814 PMCID: PMC8687107 DOI: 10.1021/acs.chemrev.0c01005] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Indexed: 02/06/2023]
Abstract
Iron is an indispensable metabolic cofactor in both pro- and eukaryotes, which engenders a natural competition for the metal between bacterial pathogens and their human or animal hosts. Bacteria secrete siderophores that extract Fe³⁺ from tissues, fluids, cells, and proteins; the ligand-gated porins of the Gram-negative bacterial outer membrane actively acquire the resulting ferric siderophores, as well as other iron-containing molecules like heme. Conversely, eukaryotic hosts combat bacterial iron scavenging by sequestering Fe³⁺ in binding proteins and ferritin. The variety of iron uptake systems in Gram-negative bacterial pathogens illustrates a range of chemical and biochemical mechanisms that facilitate microbial pathogenesis. This document attempts to summarize and understand these processes, to guide discovery of immunological or chemical interventions that may thwart infectious disease.
Affiliation(s)
- Phillip E Klebba
  - Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas 66506, United States
- Salete M C Newton
  - Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas 66506, United States
- David A Six
  - Venatorx Pharmaceuticals, Inc., 30 Spring Mill Drive, Malvern, Pennsylvania 19355, United States
- Ashish Kumar
  - Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas 66506, United States
- Taihao Yang
  - Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas 66506, United States
- Brittany L Nairn
  - Department of Biological Sciences, Bethel University, 3900 Bethel Drive, St. Paul, Minnesota 55112, United States
- Colton Munger
  - Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas 66506, United States
- Somnath Chakravorty
  - Jacobs School of Medicine and Biomedical Sciences, SUNY Buffalo, Buffalo, New York 14203, United States
6
Jia K, Jernigan RL. New amino acid substitution matrix brings sequence alignments into agreement with structure matches. Proteins 2021; 89:671-682. [PMID: 33469973 PMCID: PMC8641535 DOI: 10.1002/prot.26050] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/02/2020] [Revised: 01/08/2021] [Accepted: 01/12/2021] [Indexed: 12/27/2022]
Abstract
Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered for amino acid similarities. At present, sequence matching compares sequences based only upon the similarities of single amino acids, ignoring the fact that in densely packed proteins, there are additional conservative substitutions representing exchanges between two interacting amino acids, such as a small-large pair changing to a large-small pair, substitutions that are not individually so conservative. Here we show that including information for such pairs of substitutions yields improved sequence matches, and that these yield significant gains in the agreements between sequence alignments and structure matches of the same protein pair. The result shows sequence segments matched where structure segments are aligned. There are gains for all 2002 collected cases where the sequence alignments were not previously congruent with the structure matches. Our results also demonstrate a significant gain in detecting homology for “twilight zone” protein sequences. The amino acid substitution metrics derived have many other potential applications, for annotations, protein design, mutagenesis design, and empirical potential derivation.
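The idea of a pair substitution that is conservative even though each single substitution is not can be illustrated with a toy example. This sketch is not the authors' matrix or scoring scheme: the size classes, function names, and residues below are hypothetical and chosen only to show the small-large to large-small case mentioned in the abstract.

```python
# Hypothetical size classes, chosen only for illustration.
SIZE_CLASS = {
    "G": "small", "A": "small", "S": "small",
    "F": "large", "W": "large", "Y": "large",
}


def single_site_conserved(res_a, res_b):
    """True if one aligned position keeps its size class."""
    return SIZE_CLASS[res_a] == SIZE_CLASS[res_b]


def pair_conserved(pair_a, pair_b):
    """True if two interacting positions keep the same combination of size
    classes, regardless of which position holds which class."""
    classes_a = sorted(SIZE_CLASS[r] for r in pair_a)
    classes_b = sorted(SIZE_CLASS[r] for r in pair_b)
    return classes_a == classes_b


# Protein 1 has G and W at two contacting positions; protein 2 has F and A.
print(single_site_conserved("G", "F"))          # False: small -> large
print(single_site_conserved("W", "A"))          # False: large -> small
print(pair_conserved(("G", "W"), ("F", "A")))   # True: small+large in both
```

A single-residue comparison penalizes both positions, while the pair-level comparison recognizes that the contact as a whole is preserved.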
Affiliation(s)
- Kejue Jia
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, USA
- Robert L Jernigan
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, USA
7
Runthala A. Probabilistic divergence of a template-based modelling methodology from the ideal protocol. J Mol Model 2021; 27:25. [PMID: 33411019 DOI: 10.1007/s00894-020-04640-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Protein structural information is essential for the detailed mapping of a functional protein network. For a higher modelling accuracy and quicker implementation, template-based algorithms have been extensively deployed and redefined. The methods only assess the predicted structure against its native state/template and do not estimate the accuracy for each modelling step. A divergence measure is therefore postulated to estimate the modelling accuracy against its theoretical optimal benchmark. By freezing the domain boundaries, the divergence measures are predicted for the most crucial steps of a modelling algorithm. To precisely refine the score using weighting constants, big data analysis could further be deployed.
Affiliation(s)
- Ashish Runthala
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522502, India.
8
Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020; 18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022] Open
Abstract
Homology modeling is a method for building protein 3D structures using protein primary sequence and utilizing prior knowledge gained from structural similarities with other proteins. The homology modeling process is done in sequential steps where sequence/structure alignment is optimized, then a backbone is built and later, side-chains are added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. In the past three decades, a few collective and collaborative initiatives allowed for continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives where the community is sharing computational resources, whereas RosettaCommons is an example of an initiative where a community is sharing a codebase for the development of computational algorithms. Foldit is another initiative where participants compete with each other in a protein folding video game to predict 3D structure. In the past few years, deep machine learning based on contact maps was introduced to the 3D structure prediction process, adding more information and increasing the accuracy of models significantly. In this review, we will take the reader on a journey of exploration from the beginnings to the most recent turnabouts, which have revolutionized the field of homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
Affiliation(s)
- Tareq Hameduh
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Yazan Haddad
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Vojtech Adam
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Zbynek Heger
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
9
Alberto AVP, da Silva Ferreira NC, Soares RF, Alves LA. Molecular Modeling Applied to the Discovery of New Lead Compounds for P2 Receptors Based on Natural Sources. Front Pharmacol 2020; 11:01221. [PMID: 33117147 PMCID: PMC7553047 DOI: 10.3389/fphar.2020.01221] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/17/2020] [Accepted: 07/27/2020] [Indexed: 12/24/2022] Open
Abstract
P2 receptors are a family of transmembrane receptors activated by nucleotides and nucleosides. Two classes have been described in mammals, P2X and P2Y, which are implicated in various diseases. Currently, only P2Y12 has medicines approved for clinical use as antiplatelet agents, and natural products have emerged as a source of new drugs with action on P2 receptors due to the diversity of their chemical structures. In drug discovery, in silico virtual screening (VS) techniques have become popular because they have numerous advantages, which include the evaluation of thousands of molecules against a target, usually a protein, faster and more cheaply than classical high throughput screening (HTS). The number of studies using VS techniques has been growing in recent years and has led to the discovery of new molecules of natural origin with action on different P2X and P2Y receptors. Using different algorithms it is possible to obtain information on absorption, distribution, metabolism, toxicity, as well as predictions on biological activity and the lead-likeness of the selected hits. Selected biomolecules may then be tested by molecular dynamics and, if necessary, rationally designed or modified to improve their interaction with the target. The algorithms of these in silico tools are being improved to permit the precision development of new drugs and, in the future, this process will take the front of drug development against some central nervous system (CNS) disorders. Therefore, this review discusses the methodologies of in silico tools concerning P2 receptors, as well as future perspectives and discoveries, such as the employment of artificial intelligence in drug discovery.
Affiliation(s)
- Anael Viana Pinto Alberto
  - Laboratory of Cellular Communication, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Rafael Ferreira Soares
  - Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Luiz Anastacio Alves
  - Laboratory of Cellular Communication, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
10
Runthala A, Chowdhury S. Refined template selection and combination algorithm significantly improves template-based modeling accuracy. J Bioinform Comput Biol 2020; 17:1950006. [PMID: 31057073 DOI: 10.1142/s0219720019500069] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
In contrast to ab-initio protein modeling methodologies, comparative modeling is considered the most popular and reliable algorithm to model protein structure. However, the selection of the best set of templates is still a major challenge. An effective template-ranking algorithm is developed to efficiently select only the reliable hits for predicting the protein structures. The algorithm employs the pairwise as well as multiple sequence alignments of template hits to rank and select the best possible set of templates. It captures key sequence and structural information of template hits and converts it into scores to effectively rank them. This selected set of templates is used to model a target. Modeling accuracy of the algorithm is tested and evaluated on TBM-HA domain containing CASP8, CASP9 and CASP10 targets. On average, this template ranking and selection algorithm improves GDT-TS, GDT-HA and TM_Score by 3.531, 4.814 and 0.022, respectively. Further, it has been shown that the inclusion of structurally similar templates with ample conformational diversity is crucial for the modeling algorithm to maximally as well as reliably span the target sequence and construct its near-native model. The optimal model sampling also holds the key to predict the best possible target structure.
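The GDT-TS and GDT-HA scores reported in this abstract average the percentage of residues that fall within a set of distance cutoffs of their native positions (1, 2, 4 and 8 Å for GDT-TS; 0.5, 1, 2 and 4 Å for GDT-HA). The following is a simplified sketch that assumes per-residue distances from a single, already computed superposition; the full measure searches over superpositions separately for each cutoff. The function name and example distances are hypothetical.

```python
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt_score(distances, cutoffs=GDT_TS_CUTOFFS):
    """Average, over the cutoffs, of the percentage of residues whose
    model-to-native distance (in angstroms) is within each cutoff.

    Simplification: uses one fixed set of distances rather than the best
    superposition per cutoff.
    """
    if not distances:
        raise ValueError("distances must be non-empty")
    n = len(distances)
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical per-residue C-alpha distances for a four-residue model.
distances = [0.5, 1.5, 3.0, 9.0]
print(gdt_score(distances))                   # 56.25
print(gdt_score(distances, GDT_HA_CUTOFFS))   # 43.75
```

Because GDT-HA uses tighter cutoffs, the same model always scores the same or lower on it than on GDT-TS.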
Affiliation(s)
- Ashish Runthala
  - Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
- Shibasish Chowdhury
  - Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
11
Seffernick J, Harvey SR, Wysocki VH, Lindert S. Predicting Protein Complex Structure from Surface-Induced Dissociation Mass Spectrometry Data. ACS CENTRAL SCIENCE 2019; 5:1330-1341. [PMID: 31482115 PMCID: PMC6716128 DOI: 10.1021/acscentsci.8b00912] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/08/2018] [Indexed: 05/23/2023]
Abstract
Recently, mass spectrometry (MS) has become a viable method for elucidation of protein structure. Surface-induced dissociation (SID), colliding multiply charged protein complexes or other ions with a surface, has been paired with native MS to provide useful structural information such as connectivity and topology for many different protein complexes. We recently showed that SID gives information not only on connectivity and topology but also on relative interface strengths. However, SID has not yet been coupled with computational structure prediction methods that could use the sparse information from SID to improve the prediction of quaternary structures, i.e., how protein subunits interact with each other to form complexes. Protein-protein docking, a computational method to predict the quaternary structure of protein complexes, can be used in combination with subunit structures from X-ray crystallography and NMR in situations where it is difficult to obtain an experimental structure of an entire complex. While de novo structure prediction can be successful, many studies have shown that inclusion of experimental data can greatly increase prediction accuracy. In this study, we show that the appearance energy (AE, defined as 10% fragmentation) extracted from SID can be used in combination with Rosetta to successfully evaluate protein-protein docking poses. We developed an improved model to predict measured SID AEs and incorporated this model into a scoring function that combines the RosettaDock scoring function with a novel SID scoring term, which quantifies agreement between experiments and structures generated from RosettaDock. As a proof of principle, we tested the effectiveness of these restraints on 57 systems using ideal SID AE data (AE determined from crystal structures using the predictive model). When theoretical AEs were used, the RMSD of the selected structure improved or stayed the same in 95% of cases. 
When experimental SID data were incorporated on a different set of systems, the method predicted near-native structures (less than 2 Å root-mean-square deviation, RMSD, from native) for 6/9 tested cases, while unrestrained RosettaDock (without SID data) only predicted 3/9 such cases. Score versus RMSD funnel profiles were also improved when SID data were included. Additionally, we developed a confidence measure to evaluate predicted model quality in the absence of a crystal structure.
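The near-native criterion used in this abstract (less than 2 Å RMSD from the native structure) can be sketched as follows. This is a minimal illustration that assumes the two structures are already superimposed and have matching atom order; it is not the RosettaDock or SID scoring code, and the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) coordinates, with no superposition step."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_near_native(model, native, threshold=2.0):
    """True if the model is within `threshold` angstroms RMSD of the native."""
    return rmsd(model, native) < threshold


# Hypothetical three-atom example: the model is shifted 1 angstrom along x.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(model, native))            # 1.0
print(is_near_native(model, native))  # True
```

Counting how many selected docking poses pass `is_near_native` corresponds to the 6/9 versus 3/9 comparison quoted above.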
12
Methods for the Refinement of Protein Structure 3D Models. Int J Mol Sci 2019; 20:ijms20092301. [PMID: 31075942 PMCID: PMC6539982 DOI: 10.3390/ijms20092301] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/02/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/25/2022] Open
Abstract
The refinement of predicted 3D protein models is crucial in bringing them closer towards experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: the sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. The scoring stage, including energy functions and Model Quality Assessment Programs (MQAPs), is also used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among generated 3D models in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.
13
Zacco E, Graña-Montes R, Martin SR, de Groot NS, Alfano C, Tartaglia GG, Pastore A. RNA as a key factor in driving or preventing self-assembly of the TAR DNA-binding protein 43. J Mol Biol 2019; 431:1671-1688. [PMID: 30742796 PMCID: PMC6461199 DOI: 10.1016/j.jmb.2019.01.028] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 09/17/2018] [Revised: 01/22/2019] [Accepted: 01/24/2019] [Indexed: 12/12/2022]
Abstract
Amyotrophic lateral sclerosis and frontotemporal lobar degeneration are incurable motor neuron diseases associated with muscle weakness, paralysis and respiratory failure. Accumulation of TAR DNA-binding protein 43 (TDP-43) as toxic cytoplasmic inclusions is one of the hallmarks of these pathologies. TDP-43 is an RNA-binding protein responsible for regulating RNA transcription, splicing, transport and translation. Aggregated TDP-43 does not retain its physiological function. Here, we exploit the ability of TDP-43 to bind specific RNA sequences to validate our hypothesis that the native partners of a protein can be used to interfere with its ability to self-assemble into aggregates. We found that binding of the RRM domains of TDP-43 to specific RNA competes with protein aggregation. This study provides a solid proof of concept for the hypothesis that natural interactions can be exploited to increase protein solubility. The concept could be adopted as a more general rationale for protein-specific drug design.
Affiliation(s)
- Elsa Zacco
  - UK Dementia Research Institute at King's College London, London, SE5 9RT, United Kingdom
  - The Wohl Institute at King's College London, London, SE5 9RT, United Kingdom
- Ricardo Graña-Montes
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Natalia Sanchez de Groot
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Gian Gaetano Tartaglia
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), 23 Passeig Lluís Companys, 08010 Barcelona, Spain
  - Department of Biology 'Charles Darwin', Sapienza University of Rome, P.le A. Moro 5, Rome 00185, Italy.
- Annalisa Pastore
  - UK Dementia Research Institute at King's College London, London, SE5 9RT, United Kingdom
  - The Wohl Institute at King's College London, London, SE5 9RT, United Kingdom
  - Scuola Normale Superiore, Piazza dei Cavalieri, Pisa, 56126, Italy.
14
Studer G, Tauriello G, Bienert S, Waterhouse AM, Bertoni M, Bordoli L, Schwede T, Lepore R. Modeling of Protein Tertiary and Quaternary Structures Based on Evolutionary Information. Methods Mol Biol 2019; 1851:301-316. [PMID: 30298405 DOI: 10.1007/978-1-4939-8736-8_17] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Abstract
Proteins are subject to evolutionary forces that shape their three-dimensional structure to meet specific functional demands. The knowledge of the structure of a protein is therefore instrumental to gain information about the molecular basis of its function. However, experimental structure determination is inherently time consuming and expensive, making it impossible to follow the explosion of sequence data deriving from genome-scale projects. As a consequence, computational structural modeling techniques have received much attention and established themselves as a valuable complement to experimental structural biology efforts. Among these, comparative modeling remains the method of choice to model the three-dimensional structure of a protein when homology to a protein of known structure can be detected. The general strategy consists of using experimentally determined structures of proteins as templates for the generation of three-dimensional models of related family members (targets) of which the structure is unknown. This chapter provides a description of the individual steps needed to obtain a comparative model using SWISS-MODEL, one of the most widely used automated servers for protein structure homology modeling.
Affiliation(s)
- Gabriel Studer
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Gerardo Tauriello
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Stefan Bienert
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Andrew Mark Waterhouse
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Martino Bertoni
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Lorenza Bordoli
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Rosalba Lepore
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland.
|
15
|
Pfeiffenberger E, Bates PA. Predicting improved protein conformations with a temporal deep recurrent neural network. PLoS One 2018; 13:e0202652. [PMID: 30180164 PMCID: PMC6122789 DOI: 10.1371/journal.pone.0202652] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/10/2018] [Accepted: 08/07/2018] [Indexed: 02/03/2023] Open
Abstract
Accurate protein structure prediction from amino acid sequence is still an unsolved problem. The most reliable methods centre on template based modelling. However, the accuracy of these models entirely depends on the availability of experimentally resolved homologous template structures. In order to generate more accurate models, extensive physics based molecular dynamics (MD) refinement simulations are performed to sample many different conformations to find improved conformational states. In this study, we propose a deep recurrent network model, called DeepTrajectory, that is able to identify these improved conformational states, with high precision, from a variety of different MD based sampling protocols. The proposed model learns the temporal patterns of features computed from MD trajectory data in order to classify whether each recorded simulation snapshot is an improved quality conformational state, decreased quality conformational state or whether there is no perceivable change in state with respect to the starting conformation. The model was trained and tested on 904 trajectories from 42 different protein systems with a cumulative number of more than 1.7 million snapshots. We show that our model outperforms other state of the art machine-learning algorithms that do not consider temporal dependencies. To our knowledge, DeepTrajectory is the first implementation of a time-dependent deep-learning protocol that is re-trainable and able to adapt to any new MD based sampling procedure, thereby demonstrating how a neural network can be used to learn the latter part of the protein folding funnel.
Affiliation(s)
- Erik Pfeiffenberger
- Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Road, London NW1 1AT, United Kingdom
- Paul A. Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Road, London NW1 1AT, United Kingdom
|
16
|
Sieradzan AK, Golon Ł, Liwo A. Prediction of DNA and RNA structure with the NARES-2P force field and conformational space annealing. Phys Chem Chem Phys 2018; 20:19656-19663. [PMID: 30014063 DOI: 10.1039/c8cp03018a] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
A physics-based method for predicting the structures of nucleic acids has been proposed; it combines the 2-bead NARES-2P model of polynucleotides with the global-optimization Conformational Space Annealing (CSA) algorithm. The target structure is sought as the global-energy-minimum structure, which ignores the entropy component of the free energy but spares the expensive multicanonical simulations necessary to find the conformational ensemble with the lowest free energy. The CSA algorithm has been modified to optimize its performance when treating both single- and multi-chain nucleic acids. It was shown that the method finds the native fold for simple RNA molecules and DNA duplexes and that, with limited distance restraints, which can easily be obtained from secondary-structure-prediction servers, complex RNA folds can be treated using moderate computer resources.
Affiliation(s)
- Adam K Sieradzan
- Faculty of Chemistry, University of Gdańsk, 80-308 Gdańsk, Poland.
|
17
|
Similarity/dissimilarity analysis of protein structures based on Markov random fields. Comput Biol Chem 2018; 75:45-53. [PMID: 29747075 DOI: 10.1016/j.compbiolchem.2018.04.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/06/2017] [Revised: 03/08/2018] [Accepted: 04/23/2018] [Indexed: 11/21/2022]
Abstract
Protein structure similarity plays an important role in the study of the functional properties of proteins and in evolutionary studies. Many efficient methods have been proposed to advance protein structural comparison, but there are still some challenges in the contact strength definitions and similarity measures. In this work, we devised a new method to analyze the similarity/dissimilarity of protein structures based on Markov random fields. We evaluated the proposed method with two experiments and compared it with the competing methods. The results indicate that the proposed method exhibits a strong ability to detect the similarities/dissimilarities among the conformations of different cyclic peptides and protein structures. We also found that the alpha-C, oxygen O and N atoms allow us to extract more conserved structures of the proteins, and that Markov random fields with 2-point cliques (V) and orders 3 and 1 are more efficient in detecting the similarities/dissimilarities among different protein structures. This understanding can be used to design more powerful methods for similarity/dissimilarity analysis of different protein structures.
|
18
|
Kryshtafovych A, Monastyrskyy B, Fidelis K, Moult J, Schwede T, Tramontano A. Evaluation of the template-based modeling in CASP12. Proteins 2017; 86 Suppl 1:321-334. [PMID: 29159950 DOI: 10.1002/prot.25425] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 07/28/2017] [Revised: 10/22/2017] [Accepted: 11/16/2017] [Indexed: 01/29/2023]
Abstract
The article describes results of numerical evaluation of CASP12 models submitted on targets for which structural templates could be identified and for which servers produced models of relatively high accuracy. The emphasis is on analysis of details of models, and how well the models compete with experimental structures. Performance of contributing research groups is measured in terms of backbone accuracy, all-atom local geometry, and the ability to estimate local errors in models. Separate analyses for all participating groups and automatic servers were carried out. Compared with the last CASP, two years ago, there have been significant improvements in a number of areas, particularly the accuracy of protein backbone atoms, accuracy of sequence alignment between models and available structures, increased accuracy over that which can be obtained from simple copying of a closest template, and accuracy of modeling of sub-structures not present in the closest template. These advancements are likely associated with more effective strategies to build non-template regions of the targets ab initio, better algorithms to combine information from multiple templates, enhanced refinement methods, and better methods for estimating model accuracy.
Affiliation(s)
- Andriy Kryshtafovych
- Protein Structure Prediction Center, Genome Center, University of California, Davis, California
- Bohdan Monastyrskyy
- Protein Structure Prediction Center, Genome Center, University of California, Davis, California
- Krzysztof Fidelis
- Protein Structure Prediction Center, Genome Center, University of California, Davis, California
- John Moult
- Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland
- Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Anna Tramontano
- Department of Biochemical Sciences, Sapienza - University of Rome, P. le A. Moro, 5, Rome, 00185
|
19
|
TonB-Dependent Heme/Hemoglobin Utilization by Caulobacter crescentus HutA. J Bacteriol 2017; 199:JB.00723-16. [PMID: 28031282 DOI: 10.1128/jb.00723-16] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/06/2016] [Accepted: 11/18/2016] [Indexed: 11/20/2022] Open
Abstract
Siderophore nutrition tests with Caulobacter crescentus strain NA1000 revealed that it utilized a variety of ferric hydroxamate siderophores, including asperchromes, ferrichromes, ferrichrome A, malonichrome, and ferric aerobactin, as well as hemin and hemoglobin. C. crescentus did not transport ferrioxamine B or ferric catecholates. Because it did not use ferric enterobactin, the catecholate aposiderophore was an effective agent for iron deprivation. We determined the kinetics and thermodynamics of [59Fe]apoferrichrome and 59Fe-citrate binding and transport by NA1000. Its affinity and uptake rate for ferrichrome (equilibrium dissociation constant [Kd], 1 nM; Michaelis-Menten constant [KM], 0.1 nM; Vmax, 19 pMol/10^9 cells/min) were similar to those of Escherichia coli FhuA. Transport properties for 59Fe-citrate were similar to those of E. coli FecA (KM, 5.3 nM; Vmax, 29 pMol/10^9 cells/min). Bioinformatic analyses implicated Fur-regulated loci 00028, 00138, 02277, and 03023 as TonB-dependent transporters (TBDT) that participate in iron acquisition. We resolved TBDT with elevated expression under high- or low-iron conditions by SDS-PAGE of sodium sarcosinate cell envelope extracts, excised bands of interest, and analyzed them by mass spectrometry. These data identified five TBDT: three were overexpressed during iron deficiency (00028, 02277, and 03023), and two were overexpressed during iron repletion (00210 and 01196). CLUSTALW analyses revealed homology of putative TBDT 02277 to Escherichia coli FepA and BtuB. A Δ02277 mutant did not transport hemin or hemoglobin in nutrition tests, leading us to designate the 02277 structural gene as hutA (for heme/hemoglobin utilization). IMPORTANCE The physiological roles of the 62 putative TBDT of C. crescentus are mostly unknown, as are their evolutionary relationships to TBDT of other bacteria. We biochemically studied the iron uptake systems of C. crescentus, identified potential iron transporters, and clarified the phylogenetic relationships among its numerous TBDT. Our findings identified the first outer membrane protein involved in iron acquisition by C. crescentus, its heme/hemoglobin transporter (HutA).
|
20
|
Lipska AG, Seidman SR, Sieradzan AK, Giełdoń A, Liwo A, Scheraga HA. Molecular dynamics of protein A and a WW domain with a united-residue model including hydrodynamic interaction. J Chem Phys 2016; 144:184110. [PMID: 27179474 PMCID: PMC4866947 DOI: 10.1063/1.4948710] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 02/26/2016] [Accepted: 04/25/2016] [Indexed: 01/01/2023] Open
Abstract
The folding of the N-terminal part of the B-domain of staphylococcal protein A (PDB ID: 1BDD, a 46-residue three-α-helix bundle) and the formin-binding protein 28 WW domain (PDB ID: 1E0L, a 37-residue three-stranded anti-parallel β protein) was studied by means of Langevin dynamics with the coarse-grained UNRES force field to assess the influence of hydrodynamic interactions on protein-folding pathways and kinetics. The unfolded, intermediate, and native-like structures were identified by cluster analysis, and multi-exponential functions were fitted to the time dependence of the fractions of native and intermediate structures, respectively, to determine bulk kinetics. It was found that introducing hydrodynamic interactions slows down both the formation of an intermediate state and the transition from the collapsed structures to the final native-like structures by creating multiple kinetic traps. Therefore, introducing hydrodynamic interactions considerably slows the folding, as opposed to the results obtained from earlier studies with the use of Gō-like models.
Affiliation(s)
- Agnieszka G Lipska
- Laboratory of Molecular Modeling, Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Steven R Seidman
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853-1301, USA
- Adam K Sieradzan
- Laboratory of Molecular Modeling, Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Artur Giełdoń
- Laboratory of Molecular Modeling, Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Adam Liwo
- Laboratory of Molecular Modeling, Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Harold A Scheraga
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853-1301, USA
|
21
|
Bhattacharya D, Nowotny J, Cao R, Cheng J. 3Drefine: an interactive web server for efficient protein structure refinement. Nucleic Acids Res 2016; 44:W406-9. [PMID: 27131371 PMCID: PMC4987902 DOI: 10.1093/nar/gkw336] [Citation(s) in RCA: 290] [Impact Index Per Article: 36.3] [Received: 01/30/2016] [Accepted: 04/15/2016] [Indexed: 11/14/2022] Open
Abstract
3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of hydrogen bonding network combined with atomic-level energy minimization on the optimized model using a composite physics and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated on blind CASP experiments as well as on large-scale and diverse benchmark datasets and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through a text or file input submission, email notification, provided example submission and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet that is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/.
Affiliation(s)
- Jackson Nowotny
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Renzhi Cao
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA; Informatics Institute, University of Missouri, Columbia, MO 65211, USA; C. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
|
22
|
Lee MS, Olson MA. Assessment of Detection and Refinement Strategies for de novo Protein Structures Using Force Field and Statistical Potentials. J Chem Theory Comput 2015; 3:312-24. [PMID: 26627174 DOI: 10.1021/ct600195f] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
De novo predictions of protein structures at high resolution are plagued by the problem of detecting the native conformation from false energy minima. In this work, we provide an assessment of various detection and refinement protocols on a small subset of the second-generation all-atom Rosetta decoy set (Tsai et al. Proteins 2003, 53, 76-87) using two potentials: the all-atom CHARMM PARAM22 force field combined with generalized Born/surface-area (GB-SA) implicit solvation and the DFIRE-AA statistical potential. Detection schemes included DFIRE-AA conformational scoring and energy minimization followed by scoring with both GB-SA and DFIRE-AA potentials. Refinement methods included short-time (1-ps) molecular dynamics simulations, temperature-based replica exchange molecular dynamics, and a new computational unfold/refold procedure. Our results indicate that simple detection with only minimization is the best protocol for finding the most nativelike structures in the decoy set. The refinement techniques that we tested are generally unsuccessful in improving detection; however, they provide marginal improvements to some of the decoy structures. Future directions in the development of refinement techniques are discussed in the context of the limitations of the protocols evaluated in this study.
Affiliation(s)
- Michael S Lee
- Computational and Information Sciences Directorate, U.S. Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, and Department of Cell Biology and Biochemistry, U.S. Army Medical Research Institute of Infectious Diseases, Frederick, Maryland 21702
- Mark A Olson
- Computational and Information Sciences Directorate, U.S. Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, and Department of Cell Biology and Biochemistry, U.S. Army Medical Research Institute of Infectious Diseases, Frederick, Maryland 21702
|
23
|
ProTSAV: A protein tertiary structure analysis and validation server. Biochimica et Biophysica Acta - Proteins and Proteomics 2015; 1864:11-9. [PMID: 26478257 DOI: 10.1016/j.bbapap.2015.10.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 07/15/2015] [Revised: 09/26/2015] [Accepted: 10/14/2015] [Indexed: 01/06/2023]
Abstract
Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server, ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in the case of an individual protein structure, along with a graphical representation and ranking in the case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88% and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2 Å. The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp.
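For readers unfamiliar with the two figures quoted in this abstract, the following minimal Python sketch shows how sensitivity and specificity are conventionally derived from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not ProTSAV's code or data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # fraction of actual positives flagged positive
    specificity = tn / (tn + fp)  # fraction of actual negatives flagged negative
    return sensitivity, specificity


# Hypothetical counts, not taken from the ProTSAV benchmark.
sens, spec = sensitivity_specificity(tp=91, fn=9, tn=88, fp=12)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a model-quality setting, a "positive" would typically be a structure judged to be of acceptable quality; which class counts as positive is a choice that has to be stated alongside the numbers.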
|
24
|
Yang J, Zhang W, He B, Walker SE, Zhang H, Govindarajoo B, Virtanen J, Xue Z, Shen HB, Zhang Y. Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade. Proteins 2015; 84 Suppl 1:233-46. [PMID: 26343917 DOI: 10.1002/prot.24918] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 05/27/2015] [Revised: 08/13/2015] [Accepted: 08/31/2015] [Indexed: 01/26/2023]
Abstract
We report the structure prediction results of a new composite pipeline for template-based modeling (TBM) in the 11th CASP experiment. Starting from multiple structure templates identified by LOMETS based meta-threading programs, the QUARK ab initio folding program is extended to generate initial full-length models under strong constraints from template alignments. The final atomic models are then constructed by I-TASSER based fragment reassembly simulations, followed by the fragment-guided molecular dynamic simulation and the MQAP-based model selection. It was found that the inclusion of QUARK-TBM simulations as an intermediate modeling step could help improve the quality of the I-TASSER models for both Easy and Hard TBM targets. Overall, the average TM-score of the first I-TASSER model is 12% higher than that of the best LOMETS templates, with the RMSD in the same threading-aligned regions reduced from 5.8 to 4.7 Å. Nevertheless, there are nearly 18% of TBM domains with the templates deteriorated by the structure assembly pipeline, which may be attributed to the errors of secondary structure and domain orientation predictions that propagate through and degrade the procedures of template identification and final model selections. To examine the record of progress, we made a retrospective report of the I-TASSER pipeline in the last five CASP experiments (CASP7-11). The data show no clear progress of the LOMETS threading programs over PSI-BLAST; but obvious progress on structural improvement relative to threading templates was witnessed in recent CASP experiments, which is probably attributed to the integration of the extended ab initio folding simulation with the threading assembly pipeline and the introduction of atomic-level structure refinements following the reduced modeling simulations. Proteins 2016; 84(Suppl 1):233-246. © 2015 Wiley Periodicals, Inc.
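The two accuracy measures quoted in this abstract, RMSD and TM-score, can be illustrated with a short Python sketch. It assumes the model and reference coordinates are already paired residue by residue and already superimposed; the reference TM-score program additionally searches for the superposition that maximizes the score, so the value computed here is only a lower bound. The function names, the 0.5 Å floor on d0 for short chains, and the toy coordinates are assumptions made for illustration.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation over paired, pre-superimposed 3D points."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(total / len(model))


def tm_score(model, reference, target_length=None):
    """TM-score for one fixed superposition (no optimization over rotations)."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    length = target_length if target_length is not None else len(reference)
    # Length-dependent distance scale; the 0.5 floor for short chains is an
    # assumed convention that also avoids a cube root of a negative number.
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 21 else 0.5
    d0 = max(d0, 0.5)
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(model, reference))
    return total / length


# Toy data: 100 points spaced 3.8 apart, with the model shifted by 3.0 in y.
reference = [(3.8 * i, 0.0, 0.0) for i in range(100)]
model = [(x, y + 3.0, z) for x, y, z in reference]
print(round(rmsd(model, reference), 2))
print(round(tm_score(model, reference), 3))
```

Because each residue's contribution is bounded by 1 and scaled by d0, a few badly placed residues lower the TM-score far less than they inflate the RMSD, which is one reason the two measures are usually reported together.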
Affiliation(s)
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Wenxuan Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Sara Elizabeth Walker
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109.
|
25
|
Manavalan B, Lee J, Lee J. Random forest-based protein model quality assessment (RFMQA) using structural features and potential energy terms. PLoS One 2014; 9:e106542. [PMID: 25222008 PMCID: PMC4164442 DOI: 10.1371/journal.pone.0106542] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 04/30/2014] [Accepted: 08/06/2014] [Indexed: 01/28/2023] Open
Abstract
Recently, predicting a protein's three-dimensional (3D) structure from its sequence information has made significant progress due to advances in computational techniques and the growth of experimental structures. However, selecting good models from a structural model pool is an important and challenging task in protein structure prediction. In this study, we present the first application of random forest based model quality assessment (RFMQA) to rank protein models using their structural features and knowledge-based potential energy terms. The method predicts a relative score of a model by using its secondary structure, solvent accessibility and knowledge-based potential energy terms. We trained and tested the RFMQA method on CASP8 and CASP9 targets using 5-fold cross-validation. The correlation coefficient between the TM-score of the model selected by RFMQA (TM_RF) and the best server model (TM_best) is 0.945. We benchmarked our method on recent CASP10 targets by using CASP8 and 9 server models as a training set. The correlation coefficient and average difference between TM_RF and TM_best over 95 CASP10 targets are 0.984 and 0.0385, respectively. The test results show that our method works better in selecting top models when compared with other top performing methods. RFMQA is available for download from http://lee.kias.re.kr/RFMQA/RFMQA_eval.tar.gz.
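The two summary statistics reported in this abstract, a correlation coefficient and an average difference between the TM-score of the selected model and that of the best available model, can be sketched in plain Python as follows. Reading the correlation coefficient as Pearson's r is an assumption, as are the function names and the per-target scores, which are invented for illustration and are not RFMQA results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_difference(best, selected):
    """Average of (best - selected) over all targets."""
    if len(best) != len(selected) or not best:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(b - s for b, s in zip(best, selected)) / len(best)


# Hypothetical per-target TM-scores: best server model vs. selected model.
tm_best = [0.82, 0.45, 0.67, 0.91, 0.38]
tm_selected = [0.80, 0.41, 0.66, 0.88, 0.30]
print(round(pearson(tm_selected, tm_best), 3))
print(round(mean_difference(tm_best, tm_selected), 3))
```

A high correlation with a small mean difference indicates that the selection method tracks the best available model closely across targets; the correlation alone would not reveal a constant offset between the two.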
Affiliation(s)
- Balachandran Manavalan
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Juyong Lee
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Jooyoung Lee
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
|
26
|
Mirjalili V, Noyes K, Feig M. Physics-based protein structure refinement through multiple molecular dynamics trajectories and structure averaging. Proteins 2014; 82 Suppl 2:196-207. [PMID: 23737254 PMCID: PMC4212311 DOI: 10.1002/prot.24336] [Citation(s) in RCA: 87] [Impact Index Per Article: 8.7] [Received: 03/21/2013] [Revised: 04/30/2013] [Accepted: 05/09/2013] [Indexed: 12/26/2022]
Abstract
We used molecular dynamics (MD) simulations for structure refinement of Critical Assessment of Techniques for Protein Structure Prediction 10 (CASP10) targets. Refinement was achieved by selecting structures from the MD-based ensembles followed by structural averaging. The overall performance of this method in CASP10 is described, and specific aspects are analyzed in detail to provide insight into key components. In particular, the use of different restraint types, sampling from multiple short simulations versus a single long simulation, the success of a quality assessment criterion, the application of scoring versus averaging, and the impact of a final refinement step are discussed in detail.
Affiliation(s)
- Vahid Mirjalili
- Department of Mechanical Engineering, Michigan State University, East Lansing, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Keenan Noyes
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
|
27
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round x. Proteins 2014; 82 Suppl 2:1-6. [PMID: 24344053 PMCID: PMC4394854 DOI: 10.1002/prot.24452] [Citation(s) in RCA: 312] [Impact Index Per Article: 31.2] [Received: 10/16/2013] [Accepted: 10/21/2013] [Indexed: 12/28/2022]
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research, and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland 20850
- Torsten Schwede
- University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Anna Tramontano
- Department of Physics and Istituto Pasteur-Fondazione Cenci Bolognetti, Sapienza University of Rome, 00185 Rome, Italy
|
28
|
Holtby D, Li SC, Li M. LoopWeaver: loop modeling by the weighted scaling of verified proteins. J Comput Biol 2014; 20:212-23. [PMID: 23461572 DOI: 10.1089/cmb.2012.0078] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Modeling loops is a necessary step in protein structure determination, even with experimental nuclear magnetic resonance (NMR) data, and it is widely known to be difficult. Database techniques have the advantage of producing a higher proportion of predictions with subangstrom accuracy when compared with ab initio techniques, but the disadvantage of also producing a higher proportion of clashing or highly inaccurate predictions. We introduce LoopWeaver, a database method that uses multidimensional scaling to achieve better, clash-free placement of loops obtained from a database of protein structures. This allows us to maintain the above-mentioned advantage while avoiding the disadvantage. Test results show that we achieve significantly better results than all other methods, including Modeler, Loopy, SuperLooper, and Rapper, before refinement. With refinement, our results (LoopWeaver and Loopy consensus) are better than ROSETTA, with 0.42 Å RMSD on average for 206 length 6 loops, 0.64 Å local RMSD for 168 length 7 loops, 0.81 Å RMSD for 117 length 8 loops, and 0.98 Å RMSD for length 9 loops, while ROSETTA has 0.55, 0.79, 1.16, and 1.42 Å, respectively, at the same average time limit (3 hours). When we allow ROSETTA to run for over a week, it approaches, but does not surpass, our accuracy.
Affiliation(s)
- Daniel Holtby
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada.
|
29
|
|
30
|
Kryshtafovych A, Fidelis K, Moult J. CASP10 results compared to those of previous CASP experiments. Proteins 2013; 82 Suppl 2:164-74. [PMID: 24150928 DOI: 10.1002/prot.24448] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2013] [Revised: 10/04/2013] [Accepted: 10/04/2013] [Indexed: 11/11/2022]
Abstract
We compare results of the community efforts in modeling protein structures in the tenth CASP experiment, with those in earlier CASPs particularly in CASP5, a decade ago. There is a substantial improvement in template based model accuracy as reflected in more successful modeling of regions of structure not easily derived from a single experimental structure template, most likely reflecting intensive work within the modeling community in developing methods that make use of multiple templates, as well as the increased number of experimental structures available. Deriving structural information not obvious from a template is the most demanding as well as one of the most useful tasks that modeling can perform. Thus this is gratifying progress. By contrast, overall backbone accuracy of models appears little changed in the last decade. This puzzling result is explained by two factors--increased database size in some ways makes it harder to choose the best available templates, and the increased intrinsic difficulty of CASP targets as experimental work has progressed to larger and more unusual structures. There is no detectable recent improvement in template-free modeling, but again, this may reflect the changing nature of CASP targets.
|
31
|
Eggimann BL, Vostrikov VV, Veglia G, Siepmann JI. Modeling helical proteins using residual dipolar couplings, sparse long-range distance constraints and a simple residue-based force field. Theor Chem Acc 2013; 132:1388. [PMID: 24639619 DOI: 10.1007/s00214-013-1388-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
We present a fast and simple protocol to obtain moderate-resolution backbone structures of helical proteins. This approach utilizes a combination of sparse backbone NMR data (residual dipolar couplings and paramagnetic relaxation enhancements) or EPR data with a residue-based force field and Monte Carlo/simulated annealing protocol to explore the folding energy landscape of helical proteins. By using only backbone NMR data, which are relatively easy to collect and analyze, and strategically placed spin relaxation probes, we show that it is possible to obtain protein structures with correct helical topology and backbone RMS deviations well below 4 Å. This approach offers promising alternatives for the structural determination of proteins in which nuclear Overhauser effect data are difficult or impossible to assign and produces initial models that will speed up the high-resolution structure determination by NMR spectroscopy.
Affiliation(s)
- Becky L Eggimann
- Department of Chemistry, Chemical Theory Center, University of Minnesota, 207 Pleasant St. SE, Minneapolis, MN 55455, USA
- Vitaly V Vostrikov
- Molecular Biology and Biophysics, University of Minnesota, 321 Church St. SE, Minneapolis, MN 55455, USA
- Gianluigi Veglia
- Department of Chemistry, Chemical Theory Center, University of Minnesota, 207 Pleasant St. SE, Minneapolis, MN 55455, USA
- J Ilja Siepmann
- Department of Chemistry, Chemical Theory Center, University of Minnesota, 207 Pleasant St. SE, Minneapolis, MN 55455, USA
|
32
|
Palopoli N, Lanzarotti E, Parisi G. BeEP Server: Using evolutionary information for quality assessment of protein structure models. Nucleic Acids Res 2013; 41:W398-405. [PMID: 23729471 PMCID: PMC3692104 DOI: 10.1093/nar/gkt453] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
The BeEP Server (http://www.embnet.qb.fcen.uba.ar/embnet/beep.php) is an online resource aimed to help in the endgame of protein structure prediction. It is able to rank submitted structural models of a protein through an explicit use of evolutionary information, a criterion differing from structural or energetic considerations commonly used in other assessment programs. The idea behind BeEP (Best Evolutionary Pattern) is to benefit from the substitution pattern derived from structural constraints present in a set of homologous proteins adopting a given protein conformation. The BeEP method uses a model of protein evolution that takes into account the structure of a protein to build site-specific substitution matrices. The suitability of these substitution matrices is assessed through maximum likelihood calculations from which position-specific and global scores can be derived. These scores estimate how well the structural constraints derived from each structural model are represented in a sequence alignment of homologous proteins. Our assessment on a subset of proteins from the Critical Assessment of techniques for protein Structure Prediction (CASP) experiment has shown that BeEP is capable of discriminating the models and selecting one or more native-like structures. Moreover, BeEP is not explicitly parameterized to find structural similarities between models and given targets, potentially helping to explore the conformational ensemble of the native state.
Affiliation(s)
- Nicolas Palopoli
- Departamento de Ciencia y Tecnologia, Universidad Nacional de Quilmes, B1876BXD, Bernal, Buenos Aires, Argentina, Centre for Biological Sciences, University of Southampton, SO17 1BJ, Southampton, UK and Departamento de Quimica Biologica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EHA, Buenos Aires, Argentina
- Esteban Lanzarotti
- Departamento de Ciencia y Tecnologia, Universidad Nacional de Quilmes, B1876BXD, Bernal, Buenos Aires, Argentina, Centre for Biological Sciences, University of Southampton, SO17 1BJ, Southampton, UK and Departamento de Quimica Biologica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EHA, Buenos Aires, Argentina
- Gustavo Parisi
- Departamento de Ciencia y Tecnologia, Universidad Nacional de Quilmes, B1876BXD, Bernal, Buenos Aires, Argentina, Centre for Biological Sciences, University of Southampton, SO17 1BJ, Southampton, UK and Departamento de Quimica Biologica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EHA, Buenos Aires, Argentina
- *To whom correspondence should be addressed. Tel: +54 011 43657100 (ext. 4135); Fax: +54 011 437657101;
|
33
|
Capturing native/native like structures with a physico-chemical metric (pcSM) in protein folding. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013; 1834:1520-31. [PMID: 23665455 DOI: 10.1016/j.bbapap.2013.04.023] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/15/2013] [Revised: 04/12/2013] [Accepted: 04/15/2013] [Indexed: 12/15/2022]
Abstract
Specification of the three-dimensional structure of a protein from its amino acid sequence, also called a "Grand Challenge" problem, has eluded a solution for over six decades. A modestly successful strategy has evolved over the last couple of decades based on development of scoring functions (e.g. mimicking free energy) that can capture native or native-like structures from an ensemble of decoys generated as plausible candidates for the native structure. A scoring function must be fast enough in discriminating the native from unfolded/misfolded structures, and requires validation on large data sets to generate sufficient confidence in the score. Here we develop a scoring function called pcSM that detects the true native structure in the top 5 with 93% accuracy from an ensemble of candidate structures. If we eliminate the native from the ensemble of decoys, then pcSM is able to capture a near-native structure (RMSD ≤ 5 Å) in the top 10 with 86% accuracy. The parameters considered in pcSM are a C-alpha Euclidean metric, secondary structural propensity, surface areas and an intramolecular energy function. pcSM has been tested on 415 systems comprising 142,698 decoys (public and CASP; the largest reported hitherto in the literature). The average rank for the native is 2.38, a significant improvement over that existing in the literature. In-silico protein structure prediction requires robust scoring technique(s). Therefore, pcSM is easily amenable to integration into a successful protein structure prediction strategy. The tool is freely available at http://www.scfbio-iitd.res.in/software/pcsm.jsp.
|
34
|
Facelli JC, Hurdle JF, Mitchell JA. Medical Informatics and Bioinformatics. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open
Abstract
In the last 50 years, computational applications have been developed to aid clinicians and researchers alike in the broad field of Biomedicine. Adopted early in the evolution of the field, the term medical informatics has been applied to the various sub-disciplines of computer applications and methods of organizing and using information principles and techniques in both clinical care and biomedical research. This chapter provides a broad survey of the complex discipline of Biomedical Informatics with special emphasis on the key emerging sub-disciplines such as translational informatics, clinical research informatics, consumer health informatics, and the informatics of the “omics” sciences, systems biology, and nanotechnology.
|
35
|
Brylinski M. The utility of artificially evolved sequences in protein threading and fold recognition. J Theor Biol 2013; 328:77-88. [PMID: 23542050 DOI: 10.1016/j.jtbi.2013.03.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2012] [Revised: 01/24/2013] [Accepted: 03/18/2013] [Indexed: 12/23/2022]
Abstract
Template-based protein structure prediction plays an important role in Functional Genomics by providing structural models of gene products, which can be utilized by structure-based approaches to function inference. From a systems level perspective, the high structural coverage of gene products in a given organism is critical. Despite continuous efforts towards the development of more sensitive threading approaches, confident structural models cannot be constructed for a considerable fraction of proteins due to difficulties in recognizing low-sequence identity templates with a similar fold to the target. Here we introduce a new modeling stratagem, which employs a library of synthetic sequences to improve template ranking in fold recognition by sequence profile-based methods. We developed a new method for the optimization of generic protein-like amino acid sequences to stabilize the respective structures using a combined empirical scoring function, which is compatible with those commonly used in protein threading and fold recognition. We show that the artificially evolved sequences, whose average sequence identity to the wild-type sequences is as low as 13.8%, have significant capabilities to recognize the correct structures. Importantly, the quality of the corresponding threading alignments is comparable to those constructed using conventional wild-type approaches (the average TM-score is 0.48 and 0.54, respectively). Fold recognition that uses data fusion to combine ranks calculated for both wild-type and synthetic template libraries systematically improves the detection of structural analogs. Depending on the threading algorithm used, it yields on average 4-16% higher recognition rates than using the wild-type template library alone. Synthetic sequences artificially evolved for the template structures provide an orthogonal source of signal that could be exploited to detect those templates unrecognized by standard modeling techniques. It opens up new directions in the development of more sensitive threading methods with the enhanced capabilities of targeting difficult, midnight zone templates.
Affiliation(s)
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
|
36
|
Cirillo D, Agostini F, Klus P, Marchese D, Rodriguez S, Bolognesi B, Tartaglia GG. Neurodegenerative diseases: quantitative predictions of protein-RNA interactions. RNA (NEW YORK, N.Y.) 2013; 19:129-140. [PMID: 23264567 PMCID: PMC3543085 DOI: 10.1261/rna.034777.112] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/08/2012] [Accepted: 11/16/2012] [Indexed: 06/01/2023]
Abstract
Increasing evidence indicates that RNA plays an active role in a number of neurodegenerative diseases. We recently introduced a theoretical framework, catRAPID, to predict the binding ability of protein and RNA molecules. Here, we use catRAPID to investigate ribonucleoprotein interactions linked to inherited intellectual disability, amyotrophic lateral sclerosis, Creutzfeldt-Jakob, Alzheimer's, and Parkinson's diseases. We specifically focus on (1) RNA interactions with fragile X mental retardation protein FMRP; (2) protein sequestration caused by CGG repeats; (3) noncoding transcripts regulated by TAR DNA-binding protein 43 TDP-43; (4) autogenous regulation of TDP-43 and FMRP; (5) iron-mediated expression of amyloid precursor protein APP and α-synuclein; (6) interactions between prions and RNA aptamers. Our results are in striking agreement with experimental evidence and provide new insights in processes associated with neuronal function and misfunction.
Affiliation(s)
- Davide Cirillo
- Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Federico Agostini
- Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Petr Klus
- Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Domenica Marchese
- Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Silvia Rodriguez
- Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Benedetta Bolognesi
- Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Gian Gaetano Tartaglia
- Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
|
37
|
Bhattacharya D, Cheng J. 3Drefine: consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization. Proteins 2013; 81:119-31. [PMID: 22927229 PMCID: PMC3634918 DOI: 10.1002/prot.24167] [Citation(s) in RCA: 122] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2012] [Revised: 07/26/2012] [Accepted: 08/17/2012] [Indexed: 12/27/2022]
Abstract
One of the major limitations of computational protein structure prediction is the deviation of predicted models from their experimentally derived true, native structures. The limitations often hinder the possibility of applying computational protein structure prediction methods in biochemical assignment and drug design that are very sensitive to structural details. Refinement of these low-resolution predicted models to high-resolution structures close to the native state, however, has proven to be extremely challenging. Thus, protein structure refinement remains a largely unsolved problem. Critical assessment of techniques for protein structure prediction (CASP) specifically indicated that most predictors participating in the refinement category still did not consistently improve model quality. Here, we propose a two-step refinement protocol, called 3Drefine, to consistently bring the initial model closer to the native structure. The first step is based on optimization of the hydrogen bonding (HB) network and the second step applies atomic-level energy minimization on the optimized model using a composite of physics- and knowledge-based force fields. The approach has been evaluated on the CASP benchmark data and it exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine method is also computationally inexpensive, consuming only a few minutes of CPU time to refine a protein of typical length (300 residues). The 3Drefine web server is freely available at http://sysbio.rnet.missouri.edu/3Drefine/.
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Informatics Institute, University of Missouri, Columbia, MO 65211, USA
- Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
|
38
|
Vyas VK, Ukawala RD, Ghate M, Chintha C. Homology modeling a fast tool for drug discovery: current perspectives. Indian J Pharm Sci 2012. [PMID: 23204616 PMCID: PMC3507339 DOI: 10.4103/0250-474x.102537] [Citation(s) in RCA: 139] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
A major goal of structural biology involves the formation of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Therefore, an understanding of protein-ligand interactions is very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With the increase in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains, as well as detecting errors in a model, have contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure and describes current developments in this field, with successful applications at the different stages of drug design and discovery.
Affiliation(s)
- V K Vyas
- Department of Pharmaceutical Chemistry, Institute of Pharmacy, Nirma University, Ahmedabad-382 481, India
|
39
|
|
40
|
Ravna AW, Sylte I. Homology modeling of transporter proteins (carriers and ion channels). Methods Mol Biol 2012; 857:281-99. [PMID: 22323226 DOI: 10.1007/978-1-61779-588-6_12] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
Transporter proteins are divided into channels and carriers and constitute families of membrane proteins of physiological and pharmacological importance. These proteins are targeted by several currently prescribed drugs, and they have a large potential as targets for new drug development. Ion channels and carriers are difficult to express and purify in amounts for X-ray crystallography and nuclear magnetic resonance (NMR) studies, and few carrier and ion channel structures are deposited in the PDB database. The scarcity of atomic resolution 3D structures of carriers and channels is a problem for understanding their molecular mechanisms of action and for designing new compounds with therapeutic potentials. The homology modeling approach is a valuable approach for obtaining structural information about carriers and ion channels when no crystal structure of the protein of interest is available. In this chapter, computational approaches for constructing homology models of carriers and transporters are reviewed.
Affiliation(s)
- Aina Westrheim Ravna
- Medical Pharmacology and Toxicology, Department of Medical Biology, Faculty of Health Sciences, University of Tromsø, Tromsø, Norway
|
41
|
Lange OF, Baker D. Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins 2012; 80:884-95. [PMID: 22423358 PMCID: PMC3310173 DOI: 10.1002/prot.23245] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
Abstract
Recent work has shown that NMR structures can be determined by integrating sparse NMR data with structure prediction methods such as Rosetta. The experimental data serve to guide the search for the lowest energy state towards the deep minimum at the native state which is frequently missed in Rosetta de novo structure calculations. However, as the protein size increases, sampling again becomes limiting; for example, the standard Rosetta protocol involving Monte Carlo fragment insertion starting from an extended chain fails to converge for proteins over 150 amino acids even with guidance from chemical shifts (CS-Rosetta) and other NMR data. The primary limitation of this protocol—that every folding trajectory is completely independent of every other—was recently overcome with the development of a new approach involving resolution-adapted structural recombination (RASREC). Here we describe the RASREC approach in detail and compare it to standard CS-Rosetta. We show that the improved sampling of RASREC is essential in obtaining accurate structures over a benchmark set of 11 proteins in the 15-25 kDa size range using chemical shifts, backbone RDCs and HN-HN NOE data; in a number of cases the improved sampling methodology makes a larger contribution than incorporation of additional experimental data. Experimental data are invaluable for guiding sampling to the vicinity of the global energy minimum, but for larger proteins, the standard Rosetta fold-from-extended-chain protocol does not converge on the native minimum even with experimental data and the more powerful RASREC approach is necessary to converge to accurate solutions.
Affiliation(s)
- Oliver F Lange
- Department Chemie, Biomolecular NMR and Munich Center for Integrated Protein Science, Technische Universität München, Garching, Germany.
|
42
|
Raval A, Piana S, Eastwood MP, Dror RO, Shaw DE. Refinement of protein structure homology models via long, all-atom molecular dynamics simulations. Proteins 2012; 80:2071-9. [PMID: 22513870 DOI: 10.1002/prot.24098] [Citation(s) in RCA: 184] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2012] [Revised: 04/03/2012] [Accepted: 04/11/2012] [Indexed: 11/07/2022]
Abstract
Accurate computational prediction of protein structure represents a longstanding challenge in molecular biology and structure-based drug design. Although homology modeling techniques are widely used to produce low-resolution models, refining these models to high resolution has proven difficult. With long enough simulations and sufficiently accurate force fields, molecular dynamics (MD) simulations should in principle allow such refinement, but efforts to refine homology models using MD have for the most part yielded disappointing results. It has thus far been unclear whether MD-based refinement is limited primarily by accessible simulation timescales, force field accuracy, or both. Here, we examine MD as a technique for homology model refinement using all-atom simulations, each at least 100 μs long (more than 100 times longer than previous refinement simulations), and a physics-based force field that was recently shown to successfully fold a structurally diverse set of fast-folding proteins. In MD simulations of 24 proteins chosen from the refinement category of recent Critical Assessment of Structure Prediction (CASP) experiments, we find that in most cases, simulations initiated from homology models drift away from the native structure. Comparison with simulations initiated from the native structure suggests that force field accuracy is the primary factor limiting MD-based refinement. This problem can be mitigated to some extent by restricting sampling to the neighborhood of the initial model, leading to structural improvement that, while limited, is roughly comparable to the leading alternative methods.
Affiliation(s)
- Alpan Raval
- D E Shaw Research, New York, New York 10036, USA
|
43
|
|
44
|
Haddadian EJ, Gong H, Jha AK, Yang X, Debartolo J, Hinshaw JR, Rice PA, Sosnick TR, Freed KF. Automated real-space refinement of protein structures using a realistic backbone move set. Biophys J 2011; 101:899-909. [PMID: 21843481 DOI: 10.1016/j.bpj.2011.06.063] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2011] [Revised: 06/23/2011] [Accepted: 06/28/2011] [Indexed: 11/26/2022] Open
Abstract
Crystals of many important biological macromolecules diffract to limited resolution, rendering accurate model building and refinement difficult and time-consuming. We present a torsional optimization protocol that is applicable to many such situations and combines Protein Data Bank-based torsional optimization with real-space refinement against the electron density derived from crystallography or cryo-electron microscopy. Our method converts moderate- to low-resolution structures at initial (e.g., backbone trace only) or late stages of refinement to structures with increased numbers of hydrogen bonds, improved crystallographic R-factors, and superior backbone geometry. This automated method is applicable to DNA-binding and membrane proteins of any size and will aid studies of structural biology by improving model quality and saving considerable effort. The method can be extended to improve NMR and other structures. Our backbone score and its sequence profile provide an additional standard tool for evaluating structural quality.
Affiliation(s)
- Esmael J Haddadian
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, USA
|
45
|
Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C. Protein 3D structure computed from evolutionary sequence variation. PLoS One 2011; 6:e28766. [PMID: 22163331 PMCID: PMC3233603 DOI: 10.1371/journal.pone.0028766] [Citation(s) in RCA: 743] [Impact Index Per Article: 57.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2011] [Accepted: 11/14/2011] [Indexed: 11/19/2022] Open
Abstract
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
Affiliation(s)
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America.
|
46
|
PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences. J Struct Funct Genomics 2011; 12:181-9. [DOI: 10.1007/s10969-011-9119-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/15/2011] [Accepted: 11/24/2011] [Indexed: 10/14/2022]
|
47
|
Kufareva I, Rueda M, Katritch V, Stevens RC, Abagyan R. Status of GPCR modeling and docking as reflected by community-wide GPCR Dock 2010 assessment. Structure 2011; 19:1108-26. [PMID: 21827947 DOI: 10.1016/j.str.2011.05.012] [Citation(s) in RCA: 228] [Impact Index Per Article: 17.5] [Received: 03/02/2011] [Revised: 05/24/2011] [Accepted: 05/28/2011] [Indexed: 12/19/2022]
Abstract
The community-wide GPCR Dock assessment is conducted to evaluate the status of molecular modeling and ligand docking for human G protein-coupled receptors. The present round of the assessment was based on the recent structures of dopamine D3 and CXCR4 chemokine receptors bound to small molecule antagonists and CXCR4 with a synthetic cyclopeptide. Thirty-five groups submitted their receptor-ligand complex structure predictions prior to the release of the crystallographic coordinates. With closely related homology modeling templates, as for dopamine D3 receptor, and with incorporation of biochemical and QSAR data, modern computational techniques predicted complex details with accuracy approaching experimental. In contrast, CXCR4 complexes that had less-characterized interactions and only distant homology to the known GPCR structures still remained very challenging. The assessment results provide guidance for modeling and crystallographic communities in method development and target selection for further expansion of the structural coverage of the GPCR universe.
Affiliation(s)
- Irina Kufareva
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA 92039, USA
|
48
|
Jaroszewski L, Li Z, Cai XH, Weber C, Godzik A. FFAS server: novel features and applications. Nucleic Acids Res 2011; 39:W38-44. [PMID: 21715387 PMCID: PMC3125803 DOI: 10.1093/nar/gkr441] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Indexed: 11/28/2022] Open
Abstract
The Fold and Function Assignment System (FFAS) server [Jaroszewski et al. (2005) FFAS03: a server for profile–profile sequence alignments. Nucleic Acids Research, 33, W284–W288] implements the algorithm for protein profile–profile alignment introduced originally in [Rychlewski et al. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science: a Publication of the Protein Society, 9, 232–241]. Here, we present updates, changes and novel functionality added to the server since 2005 and discuss its new applications. The sequence database used to calculate sequence profiles was enriched by adding sets of publicly available metagenomic sequences. The profile of a user’s protein can now be compared with ∼20 additional profile databases, including several complete proteomes, human proteins involved in genetic diseases and a database of microbial virulence factors. A newly developed interface uses a system of tabs, allowing the user to navigate multiple results pages, and also includes novel functionality, such as a dotplot graph viewer, modeling tools, an improved 3D alignment viewer and links to the database of structural similarities. The FFAS server was also optimized for speed: running times were reduced by an order of magnitude. The FFAS server, http://ffas.godziklab.org, has no log-in requirement, albeit there is an option to register and store results in individual, password-protected directories. Source code and Linux executables for the FFAS program are available for download from the FFAS server.
Affiliation(s)
- Lukasz Jaroszewski
- Bioinformatics and Systems Biology Program, Sanford Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
|
49
|
Kryshtafovych A, Moult J, Bartual SG, Bazan JF, Berman H, Casteel DE, Christodoulou E, Everett JK, Hausmann J, Heidebrecht T, Hills T, Hui R, Hunt JF, Seetharaman J, Joachimiak A, Kennedy MA, Kim C, Lingel A, Michalska K, Montelione GT, Otero JM, Perrakis A, Pizarro JC, van Raaij MJ, Ramelot TA, Rousseau F, Tong L, Wernimont AK, Young J, Schwede T. Target highlights in CASP9: Experimental target structures for the critical assessment of techniques for protein structure prediction. Proteins 2011; 79 Suppl 10:6-20. [PMID: 22020785 PMCID: PMC3692002 DOI: 10.1002/prot.23196] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/07/2022]
Abstract
One goal of the CASP community-wide experiment on the critical assessment of techniques for protein structure prediction is to identify the current state of the art in protein structure prediction and modeling. A fundamental principle of CASP is blind prediction on a set of relevant protein targets, that is, the participating computational methods are tested on a common set of experimental target proteins, for which the experimental structures are not known at the time of modeling. Therefore, the CASP experiment would not have been possible without broad support of the experimental protein structural biology community. In this article, several experimental groups discuss the structures of the proteins which they provided as prediction targets for CASP9, highlighting structural and functional peculiarities of these structures: the long tail fiber protein gp37 from bacteriophage T4, the cyclic GMP-dependent protein kinase Iβ dimerization/docking domain, the ectodomain of the JTB (jumping translocation breakpoint) transmembrane receptor, Autotaxin in complex with an inhibitor, the DNA-binding J-binding protein 1 domain essential for biosynthesis and maintenance of DNA base-J (β-D-glucosyl-hydroxymethyluracil) in Trypanosoma and Leishmania, a so far uncharacterized 73 residue domain from Ruminococcus gnavus with a fold typical for PDZ-like domains, a domain from the phycobilisome core-membrane linker phycobiliprotein ApcE from Synechocystis, the heat shock protein 90 activators PFC0360w and PFC0270w from Plasmodium falciparum, and 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae.
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California-Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
|
50
|
Joo H, Chavan AG, Day R, Lennox KP, Sukhanov P, Dahl DB, Vannucci M, Tsai J. Near-native protein loop sampling using nonparametric density estimation accommodating sparcity. PLoS Comput Biol 2011; 7:e1002234. [PMID: 22028638 PMCID: PMC3197639 DOI: 10.1371/journal.pcbi.1002234] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/03/2011] [Accepted: 09/01/2011] [Indexed: 11/29/2022] Open
Abstract
Unlike with the core structural elements of a protein, such as regular secondary structure, template-based modeling (TBM) has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and were enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG-specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling against the Loopy algorithm, our method demonstrates the ability to sample nearer native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near-native loop structure.
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/.
A protein's structure consists of elements of regular secondary structure connected by less regular stretches of loop segments. The irregularity of the loop structure makes loop modeling quite challenging. More accurate sampling of these loop conformations has a direct impact on protein modeling, design, function classification, as well as protein interactions. A method has been developed that extends a more comprehensive knowledge-based approach to producing models of the loop regions of protein structure. Most physical models cannot adequately sample the large conformational space, while the more discrete knowledge-based libraries are conformationally limited. To address both of these problems, we introduce a novel statistical method that produces a continuous yet weighted estimation of loop conformational space from a discrete library of structures by using a Dirichlet process mixture of hidden Markov models (DPM-HMM). Applied to loop structure sampling, the results of a number of tests demonstrate that our approach quickly generates large numbers of candidates with near-native loop conformations. Most significantly, in the cases where the template sampling is sparse and/or far from native conformations, the DPM-HMM method samples close to the native space and produces a population of accurate loop structures.
Affiliation(s)
- Hyun Joo
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Archana G. Chavan
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Ryan Day
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Kristin P. Lennox
- Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Paul Sukhanov
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- David B. Dahl
- Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Marina Vannucci
- Department of Statistics, Rice University, Houston, Texas, United States of America
- Jerry Tsai
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
|