1
Sha CM, Wang J, Dokholyan NV. Predicting 3D RNA structure from the nucleotide sequence using Euclidean neural networks. Biophys J 2024; 123:2671-2681. [PMID: 37838833 PMCID: PMC11393712 DOI: 10.1016/j.bpj.2023.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/19/2023] [Accepted: 10/12/2023] [Indexed: 10/16/2023]
Abstract
Fast and accurate 3D RNA structure prediction remains a major challenge in structural biology, mostly due to the size and flexibility of RNA molecules, as well as the lack of diverse experimentally determined structures of RNA molecules. Unlike DNA structure, RNA structure is far less constrained by basepair hydrogen bonding, resulting in an explosion of potential stable states. Here, we propose a convolutional neural network that predicts all pairwise distances between residues in an RNA, using a recently described smooth parametrization of Euclidean distance matrices. We achieve high-accuracy predictions on RNAs up to 100 nt in length in fractions of a second, a factor of 10^7 faster than existing molecular dynamics-based methods. We also convert our coarse-grained machine learning output into an all-atom model using discrete molecular dynamics with constraints. Our proposed computational pipeline predicts all-atom RNA models solely from the nucleotide sequence. However, this method suffers from the same limitation as nucleic acid molecular dynamics: the scarcity of available RNA crystal structures for training.
Affiliation(s)
- Congzhou M Sha
- Department of Engineering Science and Mechanics, Penn State University, State College, Pennsylvania; Department of Pharmacology, Penn State College of Medicine, Hershey, Pennsylvania
- Jian Wang
- Department of Pharmacology, Penn State College of Medicine, Hershey, Pennsylvania
- Nikolay V Dokholyan
- Department of Engineering Science and Mechanics, Penn State University, State College, Pennsylvania; Department of Pharmacology, Penn State College of Medicine, Hershey, Pennsylvania; Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, Pennsylvania; Department of Chemistry, Penn State University, State College, Pennsylvania; Department of Biomedical Engineering, Penn State University, State College, Pennsylvania.
2
Dokholyan NV. Experimentally-driven protein structure modeling. J Proteomics 2020; 220:103777. [PMID: 32268219 PMCID: PMC7214187 DOI: 10.1016/j.jprot.2020.103777] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/18/2019] [Revised: 03/17/2020] [Accepted: 04/02/2020] [Indexed: 11/25/2022]
Abstract
Revolutions in natural and exact sciences started at the dawn of last century have led to the explosion of theoretical, experimental, and computational approaches to determine structures of molecules, complexes, as well as their rich conformational dynamics. Since different experimental methods produce information that is attributed to specific time and length scales, corresponding computational methods have to be tailored to these scales and experiments. These methods can be then combined and integrated in scales, hence producing a fuller picture of molecular structure and motion from the "puzzle pieces" offered by various experiments. Here, we describe a number of computational approaches to utilize experimental data to glance into structure of proteins and understand their dynamics. We will also discuss the limitations and the resolution of the constraints-based modeling approaches. SIGNIFICANCE: Experimentally-driven computational structure modeling and determination is a rapidly evolving alternative to traditional approaches for molecular structure determination. These new hybrid experimental-computational approaches are proving to be a powerful microscope to glance into the structural features of intrinsically or partially disordered proteins, dynamics of molecules and complexes. In this review, we describe various approaches in the field of experimentally-driven computational structure modeling.
Affiliation(s)
- Nikolay V Dokholyan
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA 17033, USA; Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA; Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA; Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA.
3
Bittrich S, Schroeder M, Labudde D. StructureDistiller: Structural relevance scoring identifies the most informative entries of a contact map. Sci Rep 2019; 9:18517. [PMID: 31811259 PMCID: PMC6898053 DOI: 10.1038/s41598-019-55047-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2019] [Accepted: 11/21/2019] [Indexed: 12/17/2022]
Abstract
Protein folding and structure prediction are two sides of the same coin. Contact maps and the related techniques of constraint-based structure reconstruction can be considered as unifying aspects of both processes. We present the Structural Relevance (SR) score which quantifies the information content of individual contacts and residues in the context of the whole native structure. The physical process of protein folding is commonly characterized with spatial and temporal resolution: some residues are Early Folding while others are Highly Stable with respect to unfolding events. We employ the proposed SR score to demonstrate that folding initiation and structure stabilization are subprocesses realized by distinct sets of residues. The example of cytochrome c is used to demonstrate how StructureDistiller identifies the most important contacts needed for correct protein folding. This shows that entries of a contact map are not equally relevant for structural integrity. The proposed StructureDistiller algorithm identifies contacts with the highest information content; these entries convey unique constraints not captured by other contacts. Identification of the most informative contacts effectively doubles resilience toward contacts which are not observed in the native contact map. Furthermore, this knowledge increases reconstruction fidelity on sparse contact maps significantly by 0.4 Å.
Affiliation(s)
- Sebastian Bittrich
- University of Applied Sciences Mittweida, Mittweida, 09648, Germany; Biotechnology Center (BIOTEC), TU Dresden, Dresden, 01307, Germany; Research Collaboratory for Structural Bioinformatics Protein Data Bank, University of California, San Diego, La Jolla, CA, 92093, USA.
- Dirk Labudde
- University of Applied Sciences Mittweida, Mittweida, 09648, Germany
4
Insight into the Structure of the "Unstructured" Tau Protein. Structure 2019; 27:1710-1715.e4. [PMID: 31628033 DOI: 10.1016/j.str.2019.09.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 03/04/2019] [Revised: 07/02/2019] [Accepted: 09/12/2019] [Indexed: 02/07/2023]
Abstract
Combining structural proteomics experimental data with computational methods is a powerful tool for protein structure prediction. Here, we apply a recently developed approach for de novo protein structure determination based on the incorporation of short-distance crosslinking data as constraints in discrete molecular dynamics simulations (CL-DMD), for the determination of the conformational ensemble of tau protein in solution. The predicted structures were in agreement with surface modification and long-distance crosslinking data. Tau in solution was found as an ensemble of rather compact globular conformations with distinct topology, inter-residue contacts, and a number of transient secondary-structure elements. Regions important for pathological aggregation consistently were found to contain β strands. The determined structures are compatible with the tau protein in solution being a molten globule at near-ground state with persistent residual structural features which we were able to capture by CL-DMD. The predicted structure may facilitate an understanding of the misfolding and oligomerization pathways of the tau protein.
5
Wang J, Williams B, Chirasani VR, Krokhotin A, Das R, Dokholyan NV. Limits in accuracy and a strategy of RNA structure prediction using experimental information. Nucleic Acids Res 2019; 47:5563-5572. [PMID: 31106330 PMCID: PMC6582333 DOI: 10.1093/nar/gkz427] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/13/2019] [Revised: 05/03/2019] [Accepted: 05/08/2019] [Indexed: 01/22/2023]
Abstract
RNA structural complexity and flexibility present a challenge for computational modeling efforts. Experimental information and bioinformatics data can be used as restraints to improve the accuracy of RNA tertiary structure prediction. Regarding utilization of restraints, the fundamental questions are: (i) What is the limit in prediction accuracy that one can achieve with arbitrary number of restraints? (ii) Is there a strategy for selection of the minimal number of restraints that would result in the best structural model? We address the first question by testing the limits in prediction accuracy using native contacts as restraints. To address the second question, we develop an algorithm based on the distance variation allowed by secondary structure (DVASS), which ranks restraints according to their importance to RNA tertiary structure prediction. We find that due to kinetic traps, the greatest improvement in the structure prediction accuracy is achieved when we utilize only 40-60% of the total number of native contacts as restraints. When the restraints are sorted by DVASS algorithm, using only the first 20% ranked restraints can greatly improve the prediction accuracy. Our findings suggest that only a limited number of strategically selected distance restraints can significantly assist in RNA structure modeling.
Affiliation(s)
- Jian Wang
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA 17033, USA
- Benfeard Williams
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Venkata R Chirasani
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA 17033, USA
- Andrey Krokhotin
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Rajeshree Das
- Weinberg College of Arts and Sciences, Northwestern University, Evanston, IL 60208, USA
- Nikolay V Dokholyan
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA 17033, USA
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Department of Biochemistry and Molecular Biology, Penn State University College of Medicine, Hershey, PA 17033, USA
- Department of Chemistry, Penn State University, University Park, PA 16802, USA
- Department of Biomedical Engineering, Penn State University, University Park, PA 16802, USA
6
Brodie NI, Popov KI, Petrotchenko EV, Dokholyan NV, Borchers CH. Conformational ensemble of native α-synuclein in solution as determined by short-distance crosslinking constraint-guided discrete molecular dynamics simulations. PLoS Comput Biol 2019; 15:e1006859. [PMID: 30917118 PMCID: PMC6453469 DOI: 10.1371/journal.pcbi.1006859] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 11/09/2018] [Revised: 04/08/2019] [Accepted: 02/08/2019] [Indexed: 12/01/2022]
Abstract
Combining structural proteomics experimental data with computational methods is a powerful tool for protein structure prediction. Here, we apply a recently-developed approach for de novo protein structure determination based on the incorporation of short-distance crosslinking data as constraints in discrete molecular dynamics simulations (CL-DMD) for the determination of conformational ensemble of the intrinsically disordered protein α-synuclein in the solution. The predicted structures were in agreement with hydrogen-deuterium exchange, circular dichroism, surface modification, and long-distance crosslinking data. We found that α-synuclein is present in solution as an ensemble of rather compact globular conformations with distinct topology and inter-residue contacts, which is well-represented by movements of the large loops and formation of few transient secondary structure elements. Non-amyloid component and C-terminal regions were consistently found to contain β-structure elements and hairpins.
As the population ages, neurodegenerative diseases such as Parkinson’s disease will become an increasing problem in many countries. Aggregation of the protein α-synuclein is the primary cause of Parkinson’s disease, but there is still a dearth of structural information pertaining to the native, non-aggregating form of this protein. A better understanding of the structural state of the native protein may prove useful for the design of new therapeutics to combat this disease. In order to obtain more structural information on this protein, we have recently modelled the native α-synuclein protein. These models were generated using a novel approach which combines protein crosslinking and discrete molecular dynamics simulations. We have found that the α-synuclein protein can adopt several shapes, all with a similar topology, resembling a three fingered closed claw. A region of the protein important for aggregation was found to be protected from the surrounding biological environment in these conformations, and the stabilization of these structures may be a fruitful avenue for future drug research into mitigating the cause and effect of Parkinson’s disease.
Affiliation(s)
- Nicholas I. Brodie
- University of Victoria-Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, Victoria, British Columbia, Canada
- Konstantin I. Popov
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Evgeniy V. Petrotchenko
- University of Victoria-Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, Victoria, British Columbia, Canada
- Segal Cancer Proteomics Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Quebec, Canada
- Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Departments of Pharmacology, and Biochemistry and Molecular Biology, Pennsylvania State College of Medicine, Hershey, Pennsylvania, United States of America
- Christoph H. Borchers
- University of Victoria-Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, Victoria, British Columbia, Canada
- Segal Cancer Proteomics Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Quebec, Canada
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
- Gerald Bronfman Department of Oncology, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
7
Brodie NI, Popov KI, Petrotchenko EV, Dokholyan NV, Borchers CH. Solving protein structures using short-distance cross-linking constraints as a guide for discrete molecular dynamics simulations. Sci Adv 2017; 3:e1700479. [PMID: 28695211 PMCID: PMC5501500 DOI: 10.1126/sciadv.1700479] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Received: 02/13/2017] [Accepted: 05/19/2017] [Indexed: 05/21/2023]
Abstract
We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein (models for α helix-rich and β sheet-rich proteins, respectively) and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures.
Affiliation(s)
- Nicholas I. Brodie
- University of Victoria–Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, #3101-4464 Markham Street, Victoria, British Columbia V8Z7X8, Canada
- Konstantin I. Popov
- Department of Biochemistry and Biophysics, University of North Carolina, Genetic Medicine Building, 120 Mason Farm Road, Chapel Hill, NC 27599, USA
- Evgeniy V. Petrotchenko
- University of Victoria–Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, #3101-4464 Markham Street, Victoria, British Columbia V8Z7X8, Canada
- Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina, Genetic Medicine Building, 120 Mason Farm Road, Chapel Hill, NC 27599, USA
- Christoph H. Borchers
- University of Victoria–Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, #3101-4464 Markham Street, Victoria, British Columbia V8Z7X8, Canada
- Department of Biochemistry and Microbiology, University of Victoria, Room 270d, Petch Building, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
- Gerald Bronfman Department of Oncology, Jewish General Hospital, Suite 720, 5100 de Maisonneuve Boulevard West, Montreal, Quebec H4A 3T2, Canada
- Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, 3755 Côte-Sainte-Catherine Road, Montreal, Quebec H3T 1E2, Canada
8
Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A. Coarse-Grained Protein Models and Their Applications. Chem Rev 2016; 116:7898-936. [DOI: 10.1021/acs.chemrev.6b00163] [Citation(s) in RCA: 555] [Impact Index Per Article: 69.4] [Indexed: 02/07/2023]
Affiliation(s)
- Sebastian Kmiecik
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Dominik Gront
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Michal Kolinski
- Bioinformatics Laboratory, Mossakowski Medical Research Center of the Polish Academy of Sciences, Pawinskiego 5, 02-106 Warsaw, Poland
- Lukasz Wieteska
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Department of Medical Biochemistry, Medical University of Lodz, Mazowiecka 6/8, 92-215 Lodz, Poland
- Andrzej Kolinski
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
9
Proctor EA, Dokholyan NV. Applications of Discrete Molecular Dynamics in biology and medicine. Curr Opin Struct Biol 2015; 37:9-13. [PMID: 26638022 DOI: 10.1016/j.sbi.2015.11.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 09/29/2015] [Revised: 10/28/2015] [Accepted: 11/05/2015] [Indexed: 11/27/2022]
Abstract
Discrete Molecular Dynamics (DMD) is a physics-based simulation method using discrete energetic potentials rather than traditional continuous potentials, allowing microsecond time scale simulations of biomolecular systems to be performed on personal computers rather than supercomputers or specialized hardware. With the ongoing explosion in processing power even in personal computers, applications of DMD have similarly multiplied. In the past two years, researchers have used DMD to model structures of disease-implicated protein folding intermediates, study assembly of protein complexes, predict protein-protein binding conformations, engineer rescue mutations in disease-causative protein mutants, design a protein conformational switch to control cell signaling, and describe the behavior of polymeric dispersants for environmental cleanup of oil spills, among other innovative applications.
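To make the contrast between discrete and continuous potentials concrete, here is a minimal sketch of a square-well pair potential next to a Lennard-Jones potential. The square well is one common form of step-function potential; the function names and all parameter values (`hard_core`, `well_edge`, `depth`, `epsilon`, `sigma`) are illustrative assumptions, not values taken from the review.

```python
import math

def square_well(r, hard_core=3.0, well_edge=6.0, depth=1.0):
    """Step-function pair potential: infinite wall, flat attractive well, then zero.

    All parameter values are hypothetical and chosen only for illustration.
    """
    if r < hard_core:
        return math.inf
    if r < well_edge:
        return -depth
    return 0.0

def lennard_jones(r, epsilon=1.0, sigma=3.0):
    """Continuous 12-6 potential, shown only for comparison."""
    ratio = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio * ratio - ratio)

# Energies at a few hypothetical pair distances (Å).
distances = [2.5, 3.5, 5.0, 7.0]
discrete_energies = [square_well(r) for r in distances]
continuous_energies = [lennard_jones(r) for r in distances]
```

Because the discrete energy changes only at `hard_core` and `well_edge`, a simulation built on it only has to handle the moments when a pair crosses one of those distances, whereas the continuous form varies at every separation.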
Affiliation(s)
- Elizabeth A Proctor
- Department of Biological Engineering, Massachusetts Institute of Technology, United States.
- Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, United States.
10
Fioramonte M, dos Santos AM, McIlwain S, Noble WS, Franchini KG, Gozzo FC. Analysis of secondary structure in proteins by chemical cross-linking coupled to MS. Proteomics 2013; 12:2746-52. [PMID: 22778071 DOI: 10.1002/pmic.201200040] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022]
Abstract
Chemical cross-linking is an attractive technique for the study of the structure of protein complexes due to its low sample consumption and short analysis time. Furthermore, distance constraints obtained from the identification of cross-linked peptides by MS can be used to construct and validate protein models. If a sufficient number of distance constraints are obtained, then determining the secondary structure of a protein can allow inference of the protein's fold. In this work, we show how the distance constraints obtained from cross-linking experiments can identify secondary structures within the protein sequence. Molecular modeling of alpha helices and beta sheets reveals that each secondary structure presents different cross-linking possibilities due to the topological distances between reactive residues. Cross-linking experiments performed with amine reactive cross-linkers with model alpha helix containing proteins corroborated the molecular modeling predictions. The cross-linking patterns established here can be extended to other cross-linkers with known lengths for the determination of secondary structures in proteins.
11
Gaci O. Community structure description in amino acid interaction networks. Interdiscip Sci 2011; 3:50-6. [PMID: 21369888 DOI: 10.1007/s12539-011-0061-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/06/2009] [Revised: 06/22/2009] [Accepted: 07/06/2009] [Indexed: 11/25/2022]
Abstract
In this paper, we represent proteins by amino acid interaction networks. This is a graph whose vertices are the protein's amino acids and whose edges are the interactions between them. We begin by identifying the main topological properties of these interaction networks using graph theory measures. We observe that the amino acids interact specifically, according to their structural role, and depending on whether they participate or not in the secondary structure. Thus, certain amino acids tend to group together to form local clouds. Then, we study the formation of node aggregations through community structure detections. We observe that the composition of organizations confirms a specific aggregation between loops around a core composed of secondary.
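As a rough illustration of the representation described in this abstract, the sketch below stores an amino acid interaction network as an adjacency map and computes each residue's degree, one basic graph-theory measure. The residue labels, the interaction list, and the function name `build_network` are hypothetical; the abstract does not give the paper's own interaction criterion.

```python
from collections import defaultdict

def build_network(interactions):
    """Build an undirected graph: residue -> set of interacting residues."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

# Hypothetical interaction pairs between residues, labeled by name and index.
toy_interactions = [
    ("ALA1", "LEU4"),
    ("LEU4", "VAL7"),
    ("ALA1", "VAL7"),
    ("GLY9", "VAL7"),
]
toy_network = build_network(toy_interactions)

# Degree = number of interaction partners of each residue.
toy_degrees = {residue: len(partners) for residue, partners in toy_network.items()}
```

Community detection of the kind the abstract mentions would take a graph like `toy_network` as its input; it is not implemented here.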
Affiliation(s)
- Omar Gaci
- LITIS Laboratory, 25 rue Philippe Lebon, Le Havre, France.
12
Proctor EA, Ding F, Dokholyan NV. Discrete molecular dynamics. Wiley Interdiscip Rev Comput Mol Sci 2011. [DOI: 10.1002/wcms.4] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Indexed: 11/06/2022]
Affiliation(s)
- Elizabeth A. Proctor
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Feng Ding
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
13
Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure. BioData Min 2011; 4:1. [PMID: 21232136 PMCID: PMC3033854 DOI: 10.1186/1756-0381-4-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 08/26/2010] [Accepted: 01/13/2011] [Indexed: 11/17/2022]
Abstract
Background: The present knowledge of protein structures at atomic level derives from some 60,000 molecules. Yet the exponential ever growing set of hypothetical protein sequences comprises some 10 million chains and this makes the problem of protein structure prediction one of the challenging goals of bioinformatics. In this context, the protein representation with contact maps is an intermediate step of fold recognition and constitutes the input of contact map predictors. However contact map representations require fast and reliable methods to reconstruct the specific folding of the protein backbone. Methods: In this paper, by adopting a GRID technology, our algorithm for 3D reconstruction FT-COMAR is benchmarked on a huge set of non redundant proteins (1716) taking random noise into consideration and this makes our computation the largest ever performed for the task at hand. Results: We can observe the effects of introducing random noise on 3D reconstruction and derive some considerations useful for future implementations. The dimension of the protein set allows also statistical considerations after grouping per SCOP structural classes. Conclusions: All together our data indicate that the quality of 3D reconstruction is unaffected by deleting up to an average 75% of the real contacts while only few percentage of randomly generated contacts in place of non-contacts are sufficient to hamper 3D reconstruction.
14
Petrotchenko EV, Borchers CH. Crosslinking combined with mass spectrometry for structural proteomics. Mass Spectrom Rev 2010; 29:862-76. [PMID: 20730915 DOI: 10.1002/mas.20293] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Indexed: 05/10/2023]
Abstract
The method of crosslinking combined with mass spectrometry is being gradually accepted as a technology enabling detailed structural information on proteins and protein complexes. Intrinsic challenges of the method, which have prevented its widespread use, are being progressively addressed by improvements in mass spectrometry instrumentation capabilities, by the development of new crosslinking reagents, and by the development of specialized software tools for processing of mass spectrometric crosslinking data. This review focuses on recent literature concerning the development of specialized crosslinking reagents and approaches for mass spectrometry-based applications. Critical features of crosslinking reagents for optimum mass spectrometric performance, such as isotopic coding, cleavability, affinity groups, structure of the linkers, and reactive groups, are assessed. Requirements for the design of crosslinking reagents to make them well suited for mass spectrometric detection and analysis are summarized.
Affiliation(s)
- Evgeniy V Petrotchenko
- University of Victoria Proteomics Centre, 3101-4464 Markham Street, Victoria, British Columbia, Canada V8Z7X8
15
Gomes AF, Gozzo FC. Chemical cross-linking with a diazirine photoactivatable cross-linker investigated by MALDI- and ESI-MS/MS. J Mass Spectrom 2010; 45:892-9. [PMID: 20635431 DOI: 10.1002/jms.1776] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 05/21/2023]
Abstract
Crystallography and nuclear magnetic resonance are well-established methods to study protein tertiary structure and interactions. Despite their usefulness, such methods are not applicable to many protein systems. Chemical cross-linking of proteins coupled with mass spectrometry allows low-resolution characterization of proteins and protein complexes based on measuring distance constraints from cross-links. In this work, we have investigated cross-linking by means of a heterobifunctional cross-linker containing a traditional N-hydroxysuccinimide (NHS) ester and a UV photoactivatable diazirine group. Activation of the diazirine group yields a highly reactive carbene species, with potential to increase the number of cross-links compared with homobifunctional, NHS-based cross-linkers. Cross-linking reactions were performed on model systems such as synthetic peptides and equine myoglobin. After reduction of the disulfide bond, the formation of intra- and intermolecular cross-links was identified and the peptides modified with both NHS and diazirine moieties characterized. Fragmentation of these modified peptides reveals the presence of a marker ion for intramolecular cross-links, which facilitates identification.
Affiliation(s)
- Alexandre F Gomes
- Institute of Chemistry, University of Campinas-UNICAMP and Instituto Nacional de Ciencia e Tecnologia de Bioanalitica, CP 6154, 13083-970, Campinas, Sao Paulo, Brazil
16
Duarte JM, Sathyapriya R, Stehr H, Filippis I, Lappe M. Optimal contact definition for reconstruction of contact maps. BMC Bioinformatics 2010; 11:283. [PMID: 20507547 PMCID: PMC3583236 DOI: 10.1186/1471-2105-11-283] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 12/08/2009] [Accepted: 05/27/2010] [Indexed: 11/23/2022]
Abstract
Background: Contact maps have been extensively used as a simplified representation of protein structures. They capture most important features of a protein's fold, being preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity many groups have dedicated a considerable amount of effort towards contact prediction as a proxy for protein structure prediction. However a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure.
Results: We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11 Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2 Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly it is still possible to recover a structure with partial contact information, although wrong contacts can lead to dramatic loss in reconstruction fidelity.
Conclusions: Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap.
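As an illustration of the contact definition discussed in this abstract, the following minimal Python sketch builds a binary contact map by applying a distance cutoff to one representative atom per residue (for example Cβ). The function name `contact_map`, the toy coordinates, and the default cutoff of 9 Å (the lower end of the 9 to 11 Å range reported above) are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def contact_map(coords, cutoff=9.0):
    """Return a symmetric 0/1 contact matrix for a list of (x, y, z) points.

    Two residues are marked as in contact when the distance between their
    representative atoms is at most `cutoff` (in angstroms). The diagonal is
    left at 0, i.e. a residue is not counted as contacting itself.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy coordinates in angstroms (hypothetical, not taken from a real structure).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy, cutoff=9.0)
for row in cmap:
    print(row)
```

Raising `cutoff` toward 11 Å adds more contacts per residue; the abstract reports that cutoffs in this range gave the most accurate reconstructions in the authors' tests.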
Affiliation(s)
- Jose M Duarte
- Max Planck Institute for Molecular Genetics, Ihnestr, Berlin, Germany.
|
17
|
Sathyapriya R, Duarte JM, Stehr H, Filippis I, Lappe M. Defining an essence of structure determining residue contacts in proteins. PLoS Comput Biol 2009; 5:e1000584. [PMID: 19997489 PMCID: PMC2778133 DOI: 10.1371/journal.pcbi.1000584] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 06/19/2009] [Accepted: 10/30/2009] [Indexed: 11/18/2022] Open
Abstract
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is also very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling”, that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking.
A protein structure can be visualized as a network of non-covalent contacts existing between amino acids. But not all such contacts are important structural determinants of a protein. We have attempted to identify a subset of amino acid contacts that are essential for reconstructing protein structures. Initially, we followed random sampling of contacts and tested their efficacy to successfully represent the three-dimensional structure. Further, we also developed an algorithm that selects a subset of amino acid contacts from proteins based on the sequence and network properties. The subsets picked by our algorithm represent protein three-dimensional structure better than random subsets, thereby offering direct evidence for the existence of a structural essence in protein structures. The identification of such structure-defining subsets finds application in experimental and computational protein structure determination.
Affiliation(s)
- R. Sathyapriya
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Jose M. Duarte
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Henning Stehr
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Ioannis Filippis
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Michael Lappe
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- * E-mail:
|
18
|
Lappe M, Bagler G, Filippis I, Stehr H, Duarte JM, Sathyapriya R. Designing evolvable libraries using multi-body potentials. Curr Opin Biotechnol 2009; 20:437-46. [DOI: 10.1016/j.copbio.2009.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/06/2009] [Revised: 07/15/2009] [Accepted: 07/25/2009] [Indexed: 01/13/2023]
|
19
|
Wolff K, Vendruscolo M, Porto M. Stochastic reconstruction of protein structures from effective connectivity profiles. PMC Biophysics 2008; 1:5. [PMID: 19351427 PMCID: PMC2666633 DOI: 10.1186/1757-5036-1-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 09/24/2008] [Accepted: 11/26/2008] [Indexed: 11/23/2022]
Abstract
We discuss a stochastic approach for reconstructing the native structures of proteins from the knowledge of the "effective connectivity", which is a one-dimensional structural profile constructed as a linear combination of the eigenvectors of the contact map of the target structure. The structural profile is used to bias a search of the conformational space towards the target structure in a Monte Carlo scheme operating on a Cα-chain of uniform, finite thickness. Structure information thus enters the folding dynamics via the effective connectivity, but the interaction is not restricted to pairs of amino acids that form native contacts, resulting in a free energy landscape which does not rely on the assumption of minimal frustration. Moreover, effective connectivity vectors can be predicted more readily from the amino acid sequence of proteins than the corresponding contact maps, thus suggesting that the stochastic protocol presented here could be effectively combined with other current methods for predicting native structures. PACS codes: 87.14.Ee.
Affiliation(s)
- Katrin Wolff
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstrasse 6, 64289 Darmstadt, Germany.
|
20
|
Wolff K, Vendruscolo M, Porto M. A stochastic method for the reconstruction of protein structures from one-dimensional structural profiles. Gene 2008; 422:47-51. [PMID: 18577428 DOI: 10.1016/j.gene.2008.06.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
We discuss a computational approach for reconstructing the native structures of proteins from the knowledge of a structural profile - the first eigenvector of the contact map of the native structure itself. The procedure consists in carrying out Monte Carlo simulations of a tube model of the protein structure with an energy bias towards the target structural profile. We present the reconstruction of two small proteins and address problems arising in the reconstruction of larger proteins. Our results indicate that an accurate physico-chemical energy function should be used in conjunction with the structural profile bias in order to achieve accurate reconstructions.
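To make the notion of a structural profile concrete, the sketch below computes the first (principal) eigenvector of a small contact map by power iteration. It is a minimal illustration under stated assumptions, not the authors' procedure: the function name `principal_eigenvector`, the iteration count, and the toy 4-residue chain are hypothetical, and the Monte Carlo tube-model reconstruction itself is not shown. The iteration multiplies by A + I rather than A, which leaves the eigenvectors unchanged but avoids the oscillation that plain power iteration can show on chain-like (bipartite) contact graphs.

```python
import math


def principal_eigenvector(cmap, iterations=500):
    """Approximate the eigenvector of the largest eigenvalue of a symmetric
    0/1 contact matrix using power iteration on (A + I)."""
    n = len(cmap)
    v = [1.0] * n  # positive start vector
    for _ in range(iterations):
        # w = (A + I) v: the identity shift keeps the eigenvectors of A
        # while making the largest eigenvalue strictly dominant.
        w = [v[i] + sum(cmap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy contact map of a 4-residue chain where only sequence neighbours touch.
toy_map = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
profile = principal_eigenvector(toy_map)
print([round(x, 4) for x in profile])
```

In this toy chain the two interior residues receive larger profile values than the two ends, reflecting their higher connectivity; the profile is a one-dimensional summary of the contact map of the kind the abstract describes as the target of the energy bias.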
Affiliation(s)
- Katrin Wolff
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstrasse 6, 64289 Darmstadt, Germany
|
21
|
Jhunjhunwala S, van Zelm MC, Peak MM, Cutchin S, Riblet R, van Dongen JJ, Grosveld FG, Knoch TA, Murre C. The 3D structure of the immunoglobulin heavy-chain locus: implications for long-range genomic interactions. Cell 2008; 133:265-79. [PMID: 18423198 PMCID: PMC2771211 DOI: 10.1016/j.cell.2008.03.024] [Citation(s) in RCA: 226] [Impact Index Per Article: 14.1] [Received: 05/27/2007] [Revised: 01/04/2008] [Accepted: 03/16/2008] [Indexed: 12/23/2022]
Abstract
The immunoglobulin heavy-chain (Igh) locus is organized into distinct regions that contain multiple variable (V(H)), diversity (D(H)), joining (J(H)) and constant (C(H)) coding elements. How the Igh locus is structured in 3D space is unknown. To probe the topography of the Igh locus, spatial distance distributions were determined between 12 genomic markers that span the entire Igh locus. Comparison of the distance distributions to computer simulations of alternative chromatin arrangements predicted that the Igh locus is organized into compartments containing clusters of loops separated by linkers. Trilateration and triple-point angle measurements indicated the mean relative 3D positions of the V(H), D(H), J(H), and C(H) elements, showed compartmentalization and striking conformational changes involving V(H) and D(H)-J(H) elements during early B cell development. In pro-B cells, the entire repertoire of V(H) regions (2 Mbp) appeared to have merged and juxtaposed to the D(H) elements, mechanistically permitting long-range genomic interactions to occur with relatively high frequency.
Affiliation(s)
- Suchit Jhunjhunwala
- Division of Biological Sciences, 0377, University of California, San Diego, La Jolla, CA 92093, USA
- Menno C. van Zelm
- Division of Biological Sciences, 0377, University of California, San Diego, La Jolla, CA 92093, USA
- Mandy M. Peak
- Division of Biological Sciences, 0377, University of California, San Diego, La Jolla, CA 92093, USA
- Steve Cutchin
- San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92037, USA
- Roy Riblet
- Torrey Pines Institute for Molecular Studies, San Diego, CA 92121, USA
- Jacques J.M. van Dongen
- Department of Immunology, Erasmus MC, Dr. Molewaterplein 50, 3015 GE Rotterdam, The Netherlands
- Frank G. Grosveld
- Departments of Biophysical Genomics, Cell Biology and Genetics, Erasmus MC, Dr. Molewaterplein 50, 3015 GE Rotterdam, The Netherlands
- Tobias A. Knoch
- Departments of Biophysical Genomics, Cell Biology and Genetics, Erasmus MC, Dr. Molewaterplein 50, 3015 GE Rotterdam, The Netherlands
- Ruperto-Carola University Heidelberg, Kirchhoff Institute for Physics, Department of Biophysical Genomics, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Cornelis Murre
- Division of Biological Sciences, 0377, University of California, San Diego, La Jolla, CA 92093, USA
|