1
Abstract
Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity with deep learning. For context, we review popular applications of deep learning in electron microscopy. Next, we discuss the hardware and software needed to get started with deep learning and to interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.
2
Chiu W, Schmid MF, Pintilie GD, Lawson CL. Evolution of standardization and dissemination of cryo-EM structures and data jointly by the community, PDB, and EMDB. J Biol Chem 2021; 296:100560. [PMID: 33744287 PMCID: PMC8050867 DOI: 10.1016/j.jbc.2021.100560] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/04/2021] [Revised: 02/08/2021] [Accepted: 03/16/2021] [Indexed: 01/04/2023]
Abstract
Cryogenic electron microscopy (cryo-EM) methods began to be used in the mid-1970s to study thin and periodic arrays of proteins. Following a half-century of development in cryo-specimen preparation, instrumentation, data collection, data processing, and modeling software, cryo-EM has become a routine method for solving structures from large biological assemblies to small biomolecules at near to true atomic resolution. This review explores the critical roles played by the Protein Data Bank (PDB) and Electron Microscopy Data Bank (EMDB) in partnership with the community to develop the necessary infrastructure to archive cryo-EM maps and associated models. Public access to cryo-EM structure data has in turn facilitated better understanding of structure–function relationships and advancement of image processing and modeling tool development. The partnership between the global cryo-EM community and PDB and EMDB leadership has synergistically shaped the standards for metadata, one-stop deposition of maps and models, and validation metrics to assess the quality of cryo-EM structures. The advent of cryo-electron tomography (cryo-ET) for in situ molecular cell structures at a broad resolution range and their correlations with other imaging data introduce new data archival challenges in terms of data size and complexity in the years to come.
Affiliation(s)
- Wah Chiu
- Department of Bioengineering, Stanford University, Stanford, California, USA; Division of CryoEM and Bioimaging, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, California, USA.
- Michael F Schmid
- Division of CryoEM and Bioimaging, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, California, USA
- Grigore D Pintilie
- Department of Bioengineering, Stanford University, Stanford, California, USA
- Catherine L Lawson
- Institute for Quantitative Biomedicine and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
3
Abstract
The Protein Data Bank (PDB) is the single worldwide archive of experimentally determined macromolecular structure data. Established in 1971 as the first open access data resource in biology, the PDB archive is managed by the worldwide Protein Data Bank (wwPDB) consortium, which has four partners: the RCSB Protein Data Bank (RCSB PDB; rcsb.org), the Protein Data Bank Japan (PDBj; pdbj.org), the Protein Data Bank in Europe (PDBe; pdbe.org), and BioMagResBank (BMRB; www.bmrb.wisc.edu). The PDB archive currently includes ~175,000 entries. The wwPDB has established a number of task forces and working groups that bring together experts from the community who provide recommendations on improving data standards and data validation for improving data quality and integrity. The wwPDB members continue to develop the joint deposition, biocuration, and validation system (OneDep) to improve data quality and accommodate new data from emerging techniques such as 3DEM. Each PDB entry contains a coordinate model and associated metadata for all experimentally determined atomic structures, experimental data for the traditional structure determination techniques (X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy), validation reports, and additional information on quaternary structures. The wwPDB partners are committed to following the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles and have implemented a DOI resolution mechanism that provides access to all the relevant files for a given PDB entry. On average, >250 new entries are added to the archive every week and made available by each wwPDB partner via its FTP area. The wwPDB partner sites also develop data access and analysis tools and make these available via their websites. wwPDB continues to work with experts in the community to establish a federation of archives for archiving structures determined using integrative/hybrid methods, where multiple experimental techniques are used.
Affiliation(s)
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences and San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, USA
- Genji Kurisu
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka, Japan
- Jeffrey C Hoch
- BioMagResBank, Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, USA
- John L Markley
- BioMagResBank, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
4
Berman HM, Vallat B, Lawson CL. The data universe of structural biology. IUCrJ 2020; 7:630-638. [PMID: 32695409 PMCID: PMC7340255 DOI: 10.1107/s205225252000562x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/20/2020] [Accepted: 04/21/2020] [Indexed: 05/05/2023]
Abstract
The Protein Data Bank (PDB) has grown from a small data resource for crystallographers to a worldwide resource serving structural biology. The history of the growth of the PDB and the role that the community has played in developing standards and policies are described. This article also illustrates how other biophysics communities are collaborating with the worldwide PDB to create a network of interoperating data resources. This network will expand the capabilities of structural biology and enable the determination and archiving of increasingly complex structures.
Affiliation(s)
- Helen M. Berman
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biological Sciences and Bridge Institute, University of Southern California, Los Angeles, CA 90089, USA
- Brinda Vallat
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Catherine L. Lawson
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
5
Lawson CL, Berman HM, Chiu W. Evolving data standards for cryo-EM structures. Struct Dyn 2020; 7:014701. [PMID: 32002441 PMCID: PMC6980868 DOI: 10.1063/1.5138589] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 11/14/2019] [Accepted: 01/07/2020] [Indexed: 05/04/2023]
Abstract
Electron cryo-microscopy (cryo-EM) is increasingly being used to determine 3D structures of a broad spectrum of biological specimens from molecules to cells. Anticipating this progress in the early 2000s, an international collaboration of scientists with expertise in both cryo-EM and structure data archiving was established (EMDataResource, previously known as EMDataBank). The major goals of the collaboration have been twofold: to develop the necessary infrastructure for archiving cryo-EM-derived density maps and models, and to promote development of cryo-EM structure validation standards. We describe how cryo-EM data archiving and validation have been developed and jointly coordinated for the Electron Microscopy Data Bank and Protein Data Bank archives over the past two decades, as well as the impact of evolving technology on data standards. Just as for X-ray crystallography and nuclear magnetic resonance, engaging the scientific community via workshops and challenging activities has played a central role in developing recommendations and requirements for the cryo-EM structure data archives.
Affiliation(s)
- Catherine L. Lawson
- Institute for Quantitative Biomedicine and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
6
Ortega DR, Oikonomou CM, Ding HJ, Rees-Lee P, Jensen GJ. ETDB-Caltech: A blockchain-based distributed public database for electron tomography. PLoS One 2019; 14:e0215531. [PMID: 30986271 PMCID: PMC6464211 DOI: 10.1371/journal.pone.0215531] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/17/2018] [Accepted: 04/03/2019] [Indexed: 01/12/2023]
Abstract
Three-dimensional electron microscopy techniques like electron tomography provide valuable insights into cellular structures, and present significant challenges for data storage and dissemination. Here we explored a novel method to publicly release more than 11,000 such datasets, more than 30 TB in total, collected by our group. Our method, based on a peer-to-peer file sharing network built around a blockchain ledger, offers a distributed solution to data storage. In addition, we offer a user-friendly browser-based interface, https://etdb.caltech.edu, for anyone interested to explore and download our data. We discuss the relative advantages and disadvantages of this system and provide tools for other groups to mine our data and/or use the same approach to share their own imaging datasets.
Affiliation(s)
- Davi R. Ortega
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Catherine M. Oikonomou
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- H. Jane Ding
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Prudence Rees-Lee
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Grant J. Jensen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Howard Hughes Medical Institute, Pasadena, California, United States of America
7
Afonine PV, Poon BK, Read RJ, Sobolev OV, Terwilliger TC, Urzhumtsev A, Adams PD. Real-space refinement in PHENIX for cryo-EM and crystallography. Acta Crystallogr D Struct Biol 2018; 74:531-544. [PMID: 29872004 PMCID: PMC6096492 DOI: 10.1107/s2059798318006551] [Citation(s) in RCA: 1588] [Impact Index Per Article: 264.7] [Received: 01/10/2018] [Accepted: 04/27/2018] [Indexed: 02/23/2023]
Abstract
This article describes the implementation of real-space refinement in the phenix.real_space_refine program from the PHENIX suite. The use of a simplified refinement target function enables very fast calculation, which in turn makes it possible to identify optimal data-restraint weights as part of routine refinements with little runtime cost. Refinement of atomic models against low-resolution data benefits from the inclusion of as much additional information as is available. In addition to standard restraints on covalent geometry, phenix.real_space_refine makes use of extra information such as secondary-structure and rotamer-specific restraints, as well as restraints or constraints on internal molecular symmetry. The re-refinement of 385 cryo-EM-derived models available in the Protein Data Bank at resolutions of 6 Å or better shows significant improvement of the models and of the fit of these models to the target maps.
Affiliation(s)
- Pavel V. Afonine
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Physics and International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai 200444, People’s Republic of China
- Billy K. Poon
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Randy J. Read
- Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, England
- Oleg V. Sobolev
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Thomas C. Terwilliger
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- New Mexico Consortium, Los Alamos, NM 87545, USA
- Alexandre Urzhumtsev
- Faculté des Sciences et Technologies, Université de Lorraine, BP 239, 54506 Vandoeuvre-les-Nancy, France
- Centre for Integrative Biology, IGBMC, CNRS–INSERM–UdS, 1 Rue Laurent Fries, BP 10142, 67404 Illkirch, France
- Paul D. Adams
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Bioengineering, University of California Berkeley, Berkeley, California, USA
8
Abstract
In this review, we describe how the interplay among science, technology and community interests contributed to the evolution of four structural biology data resources. We present the method by which data deposited by scientists are prepared for worldwide distribution, and argue that data archiving in a trusted repository must be an integral part of any scientific investigation.
Affiliation(s)
- Helen M. Berman
- Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, Department of Chemistry and Chemical Biology, 174 Frelinghuysen Road, Piscataway New Jersey 08854
- Catherine L. Lawson
- Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, Department of Chemistry and Chemical Biology, 174 Frelinghuysen Road, Piscataway New Jersey 08854
- Brinda Vallat
- Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, Department of Chemistry and Chemical Biology, 174 Frelinghuysen Road, Piscataway New Jersey 08854
- Margaret J. Gabanyi
- Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, Department of Chemistry and Chemical Biology, 174 Frelinghuysen Road, Piscataway New Jersey 08854
9
Lawson CL, Patwardhan A, Baker ML, Hryc C, Garcia ES, Hudson BP, Lagerstedt I, Ludtke SJ, Pintilie G, Sala R, Westbrook JD, Berman HM, Kleywegt GJ, Chiu W. EMDataBank unified data resource for 3DEM. Nucleic Acids Res 2015; 44:D396-403. [PMID: 26578576 PMCID: PMC4702818 DOI: 10.1093/nar/gkv1126] [Citation(s) in RCA: 177] [Impact Index Per Article: 19.7] [Received: 10/01/2015] [Accepted: 10/15/2015] [Indexed: 01/10/2023]
Abstract
Three-dimensional Electron Microscopy (3DEM) has become a key experimental method in structural biology for a broad spectrum of biological specimens from molecules to cells. The EMDataBank project provides a unified portal for deposition, retrieval and analysis of 3DEM density maps, atomic models and associated metadata (emdatabank.org). We provide here an overview of the rapidly growing 3DEM structural data archives, which include maps in EM Data Bank and map-derived models in the Protein Data Bank. In addition, we describe progress and approaches toward development of validation protocols and methods, working with the scientific community, in order to create a validation pipeline for 3DEM data.
Affiliation(s)
- Catherine L Lawson
- Department of Chemistry and Chemical Biology and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, 610 Taylor Road Piscataway, NJ 08854, USA
- Ardan Patwardhan
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew L Baker
- Verna and Marrs McLean Department of Biochemistry & Molecular Biology, National Center for Macromolecular Imaging, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 70030, USA
- Corey Hryc
- Verna and Marrs McLean Department of Biochemistry & Molecular Biology, National Center for Macromolecular Imaging, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 70030, USA
- Eduardo Sanz Garcia
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Brian P Hudson
- Department of Chemistry and Chemical Biology and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, 610 Taylor Road Piscataway, NJ 08854, USA
- Ingvar Lagerstedt
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Steven J Ludtke
- Verna and Marrs McLean Department of Biochemistry & Molecular Biology, National Center for Macromolecular Imaging, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 70030, USA
- Grigore Pintilie
- Verna and Marrs McLean Department of Biochemistry & Molecular Biology, National Center for Macromolecular Imaging, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 70030, USA
- Raul Sala
- Department of Chemistry and Chemical Biology and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, 610 Taylor Road Piscataway, NJ 08854, USA
- John D Westbrook
- Department of Chemistry and Chemical Biology and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, 610 Taylor Road Piscataway, NJ 08854, USA
- Helen M Berman
- Department of Chemistry and Chemical Biology and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, 610 Taylor Road Piscataway, NJ 08854, USA
- Gerard J Kleywegt
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wah Chiu
- Verna and Marrs McLean Department of Biochemistry & Molecular Biology, National Center for Macromolecular Imaging, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 70030, USA
10
Ding HJ, Oikonomou CM, Jensen GJ. The Caltech Tomography Database and Automatic Processing Pipeline. J Struct Biol 2015; 192:279-86. [PMID: 26087141 DOI: 10.1016/j.jsb.2015.06.016] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/20/2015] [Revised: 06/11/2015] [Accepted: 06/13/2015] [Indexed: 10/23/2022]
Abstract
Here we describe the Caltech Tomography Database and automatic image processing pipeline, designed to process, store, display, and distribute electron tomographic data including tilt-series, sample information, data collection parameters, 3D reconstructions, correlated light microscope images, snapshots, segmentations, movies, and other associated files. Tilt-series are typically uploaded automatically during collection to a user's "Inbox" and processed automatically, but can also be entered and processed in batches via scripts or file-by-file through an internet interface. As with the video website YouTube, each tilt-series is represented on the browsing page with a link to the full record, a thumbnail image and a video icon that delivers a movie of the tomogram in a pop-out window. Annotation tools allow users to add notes and snapshots. The database is fully searchable, and sets of tilt-series can be selected and re-processed, edited, or downloaded to a personal workstation. The results of further processing and snapshots of key results can be recorded in the database, automatically linked to the appropriate tilt-series. While the database is password-protected for local browsing and searching, datasets can be made public and individual files can be shared with collaborators over the Internet. Together these tools facilitate high-throughput tomography work by both individuals and groups.
Affiliation(s)
- H Jane Ding
- Division of Biology, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125, United States
- Catherine M Oikonomou
- Division of Biology, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125, United States
- Grant J Jensen
- Division of Biology, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125, United States; Howard Hughes Medical Institute, United States.
11
Marabini R, Macias JR, Vargas J, Quintana A, Sorzano COS, Carazo JM. On the development of three new tools for organizing and sharing information in three-dimensional electron microscopy. Acta Crystallogr D Biol Crystallogr 2013; 69:695-700. [DOI: 10.1107/s0907444913007038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/16/2012] [Accepted: 03/13/2013] [Indexed: 11/10/2022]
12
de Vries SJ, Zacharias M. ATTRACT-EM: a new method for the computational assembly of large molecular machines using cryo-EM maps. PLoS One 2012; 7:e49733. [PMID: 23251350 PMCID: PMC3522670 DOI: 10.1371/journal.pone.0049733] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 05/29/2012] [Accepted: 10/17/2012] [Indexed: 11/23/2022]
Abstract
Many of the most important functions in the cell are carried out by proteins organized in large molecular machines. Cryo-electron microscopy (cryo-EM) is increasingly being used to obtain low resolution density maps of these large assemblies. A new method, ATTRACT-EM, for the computational assembly of molecular assemblies from their components has been developed. Based on concepts from the protein-protein docking field, it utilizes cryo-EM density maps to assemble molecular subunits at near atomic detail, starting from millions of initial subunit configurations. The search efficiency was further enhanced by recombining partial solutions, the inclusion of symmetry information, and refinement using a molecular force field. The approach was tested on the GroES-GroEL system, using an experimental cryo-EM map at 23.5 Å resolution, and on several smaller complexes. Inclusion of experimental information on the symmetry of the systems and the application of a new gradient vector matching algorithm allowed the efficient identification of docked assemblies in close agreement with experiment. Application to the GroES-GroEL complex resulted in a top ranked model with a deviation of 4.6 Å (and a 2.8 Å model within the top 10) from the GroES-GroEL crystal structure, a significant improvement over existing methods.
Affiliation(s)
- Sjoerd J de Vries
- Physik-Department T38, Technische Universität München, Garching, Germany.
13
Patwardhan A, Carazo JM, Carragher B, Henderson R, Heymann JB, Hill E, Jensen GJ, Lagerstedt I, Lawson CL, Ludtke SJ, Mastronarde D, Moore WJ, Roseman A, Rosenthal P, Sorzano COS, Sanz-García E, Scheres SHW, Subramaniam S, Westbrook J, Winn M, Swedlow JR, Kleywegt GJ. Data management challenges in three-dimensional EM. Nat Struct Mol Biol 2012; 19:1203-7. [PMID: 23211764 PMCID: PMC4048199 DOI: 10.1038/nsmb.2426] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 05/08/2012] [Accepted: 09/24/2012] [Indexed: 11/09/2022]
Abstract
This report describes the outcomes of the Data Management Challenges in 3D Electron Microscopy workshop. Key topics discussed include data models, validation and raw-data archiving. The meeting participants agreed that the EMDataBank should take the lead in addressing these issues, and concrete action points were agreed upon that will have a substantial impact on the accessibility of three-dimensional EM data in biology and medicine.
Affiliation(s)
- Ardan Patwardhan
- Protein Data Bank in Europe, European Molecular Biology Laboratory-European Bioinformatics Institute, Hinxton, UK
14
Esquivel-Rodríguez J, Kihara D. Fitting multimeric protein complexes into electron microscopy maps using 3D Zernike descriptors. J Phys Chem B 2012; 116:6854-61. [PMID: 22417139 PMCID: PMC3376205 DOI: 10.1021/jp212612t] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 11/28/2022]
Abstract
A novel computational method for fitting high-resolution structures of multiple proteins into a cryoelectron microscopy map is presented. The method named EMLZerD generates a pool of candidate multiple protein docking conformations of component proteins, which are later compared with a provided electron microscopy (EM) density map to select the ones that fit well into the EM map. The comparison of docking conformations and the EM map is performed using the 3D Zernike descriptor (3DZD), a mathematical series expansion of three-dimensional functions. The 3DZD provides a unified representation of the surface shape of multimeric protein complex models and EM maps, which allows a convenient, fast quantitative comparison of the three-dimensional structural data. Out of 19 multimeric complexes tested, near native complex structures with a root-mean-square deviation of less than 2.5 Å were obtained for 14 cases while medium range resolution structures with correct topology were computed for the additional 5 cases.
Affiliation(s)
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Markey Center for Structural Biology, Purdue University, West Lafayette, IN, 47907, USA
15
Frankenstein Z, Sperling J, Sperling R, Eisenstein M. A unique spatial arrangement of the snRNPs within the native spliceosome emerges from in silico studies. Structure 2012; 20:1097-106. [PMID: 22578543 DOI: 10.1016/j.str.2012.03.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/17/2011] [Revised: 02/25/2012] [Accepted: 03/26/2012] [Indexed: 02/05/2023]
Abstract
The spliceosome is a mega-Dalton ribonucleoprotein (RNP) assembly that processes primary RNA transcripts, producing functional mRNA. The electron microscopy structures of the native spliceosome and of several spliceosomal subcomplexes are available; however, the spatial arrangement of the latter within the native spliceosome is not known. We designed a computational procedure to efficiently fit thousands of conformers into the spliceosome envelope. Despite the low resolution limitations, we obtained only one model that complies with the available biochemical data. Our model localizes the five small nuclear RNPs (snRNPs) mostly within the large subunit of the native spliceosome, requiring only minor conformation changes. The remaining free volume presumably accommodates additional spliceosomal components. The constituents of the active core of the spliceosome are juxtaposed, forming a continuous surface deep within the large spliceosomal cavity, which provides a sheltered environment for the splicing reaction.
Affiliation(s)
- Ziv Frankenstein
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
16
Lasker K, Velázquez-Muriel JA, Webb BM, Yang Z, Ferrin TE, Sali A. Macromolecular assembly structures by comparative modeling and electron microscopy. Methods Mol Biol 2012; 857:331-350. [PMID: 22323229 DOI: 10.1007/978-1-61779-588-6_15] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
Advances in electron microscopy allow for structure determination of large biological machines at increasingly higher resolutions. A key step in this process is fitting component structures into the electron microscopy-derived density map of their assembly. Comparative modeling can contribute by providing atomic models of the components, via fold assignment, sequence-structure alignment, model building, and model assessment. All four stages of comparative modeling can also benefit from consideration of the density map. In this chapter, we describe numerous types of modeling problems restrained by a density map and available protocols for finding solutions. In particular, we provide detailed instructions for density map-guided modeling using the Integrative Modeling Platform (IMP), MODELLER, and UCSF Chimera.
Affiliation(s)
- Keren Lasker
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
17
Ahmed A, Whitford PC, Sanbonmatsu KY, Tama F. Consensus among flexible fitting approaches improves the interpretation of cryo-EM data. J Struct Biol 2011; 177:561-70. [PMID: 22019767 DOI: 10.1016/j.jsb.2011.10.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 06/13/2011] [Revised: 10/05/2011] [Accepted: 10/06/2011] [Indexed: 12/31/2022]
Abstract
Cryo-electron microscopy (cryo-EM) can provide important structural information of large macromolecular assemblies in different conformational states. Recent years have seen an increase in structures deposited in the Protein Data Bank (PDB) by fitting a high-resolution structure into its low-resolution cryo-EM map. A commonly used protocol for accommodating the conformational changes between the X-ray structure and the cryo-EM map is rigid body fitting of individual domains. With the emergence of different flexible fitting approaches, there is a need to compare and revise these different protocols for the fitting. We have applied three diverse automated flexible fitting approaches on a protein dataset for which rigid domain fitting (RDF) models have been deposited in the PDB. In general, a consensus is observed in the conformations, which indicates a convergence from these theoretically different approaches to the most probable solution corresponding to the cryo-EM map. However, the result shows that the convergence might not be observed for proteins with complex conformational changes or with missing densities in the cryo-EM map. In contrast, RDF structures deposited in the PDB can represent conformations that not only differ from the consensus obtained by flexible fitting but also from X-ray crystallography. Thus, this study emphasizes that a "consensus" achieved by the use of several automated flexible fitting approaches can provide a higher level of confidence in the modeled configurations. Following this protocol not only increases the confidence level of fitting, but also highlights protein regions with uncertain fitting. Hence, this protocol can lead to better interpretation of cryo-EM data.
Affiliation(s)
- Aqeel Ahmed
- Department of Chemistry and Biochemistry, The University of Arizona, 1041 E. Lowell Street, Tucson, AZ 85721, USA.
|
18
|
Johnson GT, Autin L, Goodsell DS, Sanner MF, Olson AJ. ePMV embeds molecular modeling into professional animation software environments. Structure 2011; 19:293-303. [PMID: 21397181 DOI: 10.1016/j.str.2010.12.023] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 10/02/2010] [Revised: 11/24/2010] [Accepted: 12/04/2010] [Indexed: 11/16/2022]
Abstract
Increasingly complex research has made it more difficult to prepare data for publication, education, and outreach. Many scientists must also wade through black-box code to interface computational algorithms from diverse sources to supplement their bench work. To reduce these barriers we have developed an open-source plug-in, embedded Python Molecular Viewer (ePMV), that runs molecular modeling software directly inside of professional 3D animation applications (hosts) to provide simultaneous access to the capabilities of these newly connected systems. Uniting host and scientific algorithms into a single interface allows users from varied backgrounds to assemble professional quality visuals and to perform computational experiments with relative ease. By enabling easy exchange of algorithms, ePMV can facilitate interdisciplinary research, smooth communication between broadly diverse specialties, and provide a common platform to frame and visualize the increasingly detailed intersection(s) of cellular and molecular biology.
Affiliation(s)
- Graham T Johnson
- Molecular Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
|
19
|
Velankar S, Kleywegt GJ. The Protein Data Bank in Europe (PDBe): bringing structure to biology. Acta Crystallogr D Biol Crystallogr 2011; 67:324-30. [PMID: 21460450 PMCID: PMC3069747 DOI: 10.1107/s090744491004117x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/30/2010] [Accepted: 10/13/2010] [Indexed: 12/30/2022]
Abstract
The Protein Data Bank in Europe (PDBe) is the European partner in the Worldwide PDB and as such handles depositions of X-ray, NMR and EM data and structure models. PDBe also provides advanced bioinformatics services based on data from the PDB and related resources. Some of the challenges facing the PDB and its guardians are discussed, as well as some of the areas on which PDBe activities will focus in the future (advanced services, ligands, integration, validation and experimental data). Finally, some recent developments at PDBe are described.
Affiliation(s)
- Sameer Velankar
- Protein Data Bank in Europe (PDBe), EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, England
|
20
|
Beck M, Topf M, Frazier Z, Tjong H, Xu M, Zhang S, Alber F. Exploring the spatial and temporal organization of a cell's proteome. J Struct Biol 2011; 173:483-96. [PMID: 21094684 PMCID: PMC3784337 DOI: 10.1016/j.jsb.2010.11.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 06/06/2010] [Revised: 11/05/2010] [Accepted: 11/08/2010] [Indexed: 10/18/2022]
Abstract
To increase our current understanding of cellular processes, such as cell signaling and division, knowledge is needed about the spatial and temporal organization of the proteome at different organizational levels. These levels cover a wide range of length and time scales: from the atomic structures of macromolecules for inferring their molecular function, to the quantitative description of their abundance, and spatial distribution in the cell. Emerging new experimental technologies are greatly increasing the availability of such spatial information on the molecular organization in living cells. This review addresses three fields that have significantly contributed to our understanding of the proteome's spatial and temporal organization: first, methods for the structure determination of individual macromolecular assemblies, specifically the fitting of atomic structures into density maps generated from electron microscopy techniques; second, research that visualizes the spatial distributions of these complexes within the cellular context using cryo electron tomography techniques combined with computational image processing; and third, methods for the spatial modeling of the dynamic organization of the proteome, specifically those methods for simulating reaction and diffusion of proteins and complexes in crowded intracellular fluids. The long-term goal is to integrate the varied data about a proteome's organization into a spatially explicit, predictive model of cellular processes.
Affiliation(s)
- Martin Beck
- European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Maya Topf
- Molecular Biology, Crystallography, Department of Biological Sciences, Birkbeck College, University of London, London, UK
- Zachary Frazier
- Program in Molecular and Computational Biology, University of Southern California, 1050 Childs Way, RRI 413E, Los Angeles, CA 90068, USA
- Harianto Tjong
- Program in Molecular and Computational Biology, University of Southern California, 1050 Childs Way, RRI 413E, Los Angeles, CA 90068, USA
- Min Xu
- Program in Molecular and Computational Biology, University of Southern California, 1050 Childs Way, RRI 413E, Los Angeles, CA 90068, USA
- Shihua Zhang
- Program in Molecular and Computational Biology, University of Southern California, 1050 Childs Way, RRI 413E, Los Angeles, CA 90068, USA
- Frank Alber
- Program in Molecular and Computational Biology, University of Southern California, 1050 Childs Way, RRI 413E, Los Angeles, CA 90068, USA
|
21
|
Lasker K, Sali A, Wolfson HJ. Determining macromolecular assembly structures by molecular docking and fitting into an electron density map. Proteins 2011; 78:3205-11. [PMID: 20827723 DOI: 10.1002/prot.22845] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Indexed: 12/12/2022]
Abstract
Structural models of macromolecular assemblies are instrumental for gaining a mechanistic understanding of cellular processes. Determining these structures is a major challenge for experimental techniques, such as X-ray crystallography, NMR spectroscopy and electron microscopy (EM). Thus, computational modeling techniques, including molecular docking, are required. The development of most molecular docking methods has so far been focused on modeling of binary complexes. We have recently introduced the MultiFit method for modeling the structure of a multisubunit complex by simultaneously optimizing the fit of the model into an EM density map of the entire complex and the shape complementarity between interacting subunits. Here, we report algorithmic advances of the MultiFit method that result in an efficient and accurate assembly of the input subunits into their density map. The successful predictions and the increasing number of complexes being characterized by EM suggest that the CAPRI challenge could be extended to include docking-based modeling of macromolecular assemblies guided by EM.
Affiliation(s)
- Keren Lasker
- Raymond and Beverly Sackler Faculty of Exact Sciences, Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
|
22
|
Abstract
Three-dimensional (3D) cryoelectron microscopy reconstruction methods are uniquely able to reveal structures of many important macromolecules and macromolecular complexes. EMDataBank.org, a joint effort of the Protein Databank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), and the National Center for Macromolecular Imaging (NCMI), is a "one-stop shop" resource for global deposition and retrieval of cryo-EM map, model, and associated metadata. The resource unifies public access to the two major EM Structural Data archives: EM Data Bank (EMDB) and Protein Data Bank (PDB), and facilitates use of EM structural data of macromolecules and macromolecular complexes by the wider scientific community.
Affiliation(s)
- Catherine L Lawson
- Department of Chemistry and Chemical Biology and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, USA
|
23
|
Kim DN, Altschuler J, Strong C, McGill G, Bathe M. Conformational dynamics data bank: a database for conformational dynamics of proteins and supramolecular protein assemblies. Nucleic Acids Res 2011; 39:D451-5. [PMID: 21051356 PMCID: PMC3013685 DOI: 10.1093/nar/gkq1088] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/15/2010] [Revised: 10/08/2010] [Accepted: 10/15/2010] [Indexed: 12/29/2022] Open
Abstract
The conformational dynamics data bank (CDDB, http://www.cdyn.org) is a database that aims to provide comprehensive results on the conformational dynamics of high molecular weight proteins and protein assemblies. Analysis is performed using a recently introduced coarse-grained computational approach that is applied to the majority of structures present in the electron microscopy data bank (EMDB). Results include equilibrium thermal fluctuations and elastic strain energy distributions that identify rigid versus flexible protein domains generally, as well as those associated with specific functional transitions, and correlations in molecular motions that identify molecular regions that are highly coupled dynamically, with implications for allosteric mechanisms. A practical web-based search interface enables users to easily collect conformational dynamics data in various formats. The data bank is maintained and updated automatically to include conformational dynamics results for new structural entries as they become available in the EMDB. The CDDB complements static structural information to facilitate the investigation and interpretation of the biological function of proteins and protein assemblies essential to cell function.
Affiliation(s)
- Do-Nyun Kim
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, Digizyme, Inc., Brookline, MA 02446 and Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Josiah Altschuler
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, Digizyme, Inc., Brookline, MA 02446 and Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Campbell Strong
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, Digizyme, Inc., Brookline, MA 02446 and Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Gaël McGill
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, Digizyme, Inc., Brookline, MA 02446 and Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Mark Bathe
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, Digizyme, Inc., Brookline, MA 02446 and Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
|
24
|
Velankar S, Alhroub Y, Alili A, Best C, Boutselakis HC, Caboche S, Conroy MJ, Dana JM, van Ginkel G, Golovin A, Gore SP, Gutmanas A, Haslam P, Hirshberg M, John M, Lagerstedt I, Mir S, Newman LE, Oldfield TJ, Penkett CJ, Pineda-Castillo J, Rinaldi L, Sahni G, Sawka G, Sen S, Slowley R, Sousa da Silva AW, Suarez-Uruena A, Swaminathan GJ, Symmons MF, Vranken WF, Wainwright M, Kleywegt GJ. PDBe: Protein Data Bank in Europe. Nucleic Acids Res 2010; 39:D402-10. [PMID: 21045060 PMCID: PMC3013808 DOI: 10.1093/nar/gkq985] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Indexed: 01/11/2023] Open
Abstract
The Protein Data Bank in Europe (PDBe; pdbe.org) is actively involved in managing the international archive of biomacromolecular structure data as one of the partners in the Worldwide Protein Data Bank (wwPDB; wwpdb.org). PDBe also develops new tools to make structural data more widely and more easily available to the biomedical community. PDBe has developed a browser to access and analyze the structural archive using classification systems that are familiar to chemists and biologists. The PDBe web pages that describe individual PDB entries have been enhanced through the introduction of plain-English summary pages and iconic representations of the contents of an entry (PDBprints). In addition, the information available for structures determined by means of NMR spectroscopy has been expanded. Finally, the entire web site has been redesigned to make it substantially easier to use for expert and novice users alike. PDBe works closely with other teams at the European Bioinformatics Institute (EBI) and in the international scientific community to develop new resources with value-added information. The SIFTS initiative is an example of such a collaboration—it provides extensive mapping data between proteins whose structures are available from the PDB and a host of other biomedical databases. SIFTS is widely used by major bioinformatics resources.
Affiliation(s)
- Sameer Velankar
- Protein Data Bank in Europe, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
25
|
Lawson CL, Baker ML, Best C, Bi C, Dougherty M, Feng P, van Ginkel G, Devkota B, Lagerstedt I, Ludtke SJ, Newman RH, Oldfield TJ, Rees I, Sahni G, Sala R, Velankar S, Warren J, Westbrook JD, Henrick K, Kleywegt GJ, Berman HM, Chiu W. EMDataBank.org: unified data resource for CryoEM. Nucleic Acids Res 2010; 39:D456-64. [PMID: 20935055 PMCID: PMC3013769 DOI: 10.1093/nar/gkq880] [Citation(s) in RCA: 192] [Impact Index Per Article: 13.7] [Indexed: 11/30/2022] Open
Abstract
Cryo-electron microscopy reconstruction methods are uniquely able to reveal structures of many important macromolecules and macromolecular complexes. EMDataBank.org, a joint effort of the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB) and the National Center for Macromolecular Imaging (NCMI), is a global ‘one-stop shop’ resource for deposition and retrieval of cryoEM maps, models and associated metadata. The resource unifies public access to the two major archives containing EM-based structural data: EM Data Bank (EMDB) and Protein Data Bank (PDB), and facilitates use of EM structural data of macromolecules and macromolecular complexes by the wider scientific community.
Affiliation(s)
- Catherine L Lawson
- Department of Chemistry and Chemical Biology and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, 610 Taylor Road Piscataway, NJ 08854, USA.
|
26
|
Lasker K, Phillips JL, Russel D, Velázquez-Muriel J, Schneidman-Duhovny D, Tjioe E, Webb B, Schlessinger A, Sali A. Integrative structure modeling of macromolecular assemblies from proteomics data. Mol Cell Proteomics 2010; 9:1689-702. [PMID: 20507923 DOI: 10.1074/mcp.r110.000067] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Indexed: 11/06/2022] Open
Abstract
Proteomics techniques have been used to generate comprehensive lists of protein interactions in a number of species. However, relatively little is known about how these interactions result in functional multiprotein complexes. This gap can be bridged by combining data from proteomics experiments with data from established structure determination techniques. Correspondingly, integrative computational methods are being developed to provide descriptions of protein complexes at varying levels of accuracy and resolution, ranging from complex compositions to detailed atomic structures.
Affiliation(s)
- Keren Lasker
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California 94158, USA.
|
27
|
Mannige RV, Brooks CL. Periodic table of virus capsids: implications for natural selection and design. PLoS One 2010; 5:e9423. [PMID: 20209096 PMCID: PMC2831995 DOI: 10.1371/journal.pone.0009423] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 10/02/2009] [Accepted: 01/21/2010] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND For survival, most natural viruses depend upon the existence of spherical capsids: protective shells of various sizes composed of protein subunits. So far, general evolutionary pressures shaping capsid design have remained elusive, even though an understanding of such properties may help in rationally impeding the virus life cycle and designing efficient nano-assemblies. PRINCIPAL FINDINGS This report uncovers an unprecedented and species-independent evolutionary pressure on virus capsids, based on the notion that the simplest capsid designs (or those capsids with the lowest "hexamer complexity", C(h)) are the fittest, which was shown to be true for all available virus capsids. The theories result in a physically meaningful periodic table of virus capsids that uncovers strong and overarching evolutionary pressures, while also offering geometric explanations to other capsid properties (rigidity, pleomorphy, auxiliary requirements, etc.) that were previously considered to be unrelatable properties of the individual virus. SIGNIFICANCE Apart from describing a universal rule for virus capsid evolution, our work (especially the periodic table) provides a language with which highly diverse virus capsids, unified only by geometry, may be described and related to each other. Finally, the available virus structure databases and other published data reiterate the predicted geometry-derived rules, reinforcing the role of geometry in the natural selection and design of virus capsids.
Affiliation(s)
- Ranjan V. Mannige
- Department of Chemistry and Biophysics Program, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Theoretical Biological Physics, University of California San Diego, La Jolla, California, United States of America
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, United States of America
- Charles L. Brooks
- Department of Chemistry and Biophysics Program, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Theoretical Biological Physics, University of California San Diego, La Jolla, California, United States of America
|
28
|
Velankar S, Best C, Beuth B, Boutselakis CH, Cobley N, Sousa Da Silva AW, Dimitropoulos D, Golovin A, Hirshberg M, John M, Krissinel EB, Newman R, Oldfield T, Pajon A, Penkett CJ, Pineda-Castillo J, Sahni G, Sen S, Slowley R, Suarez-Uruena A, Swaminathan J, van Ginkel G, Vranken WF, Henrick K, Kleywegt GJ. PDBe: Protein Data Bank in Europe. Nucleic Acids Res 2009; 38:D308-17. [PMID: 19858099 PMCID: PMC2808887 DOI: 10.1093/nar/gkp916] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.7] [Indexed: 11/13/2022] Open
Abstract
The Protein Data Bank in Europe (PDBe) (http://www.ebi.ac.uk/pdbe/) is actively working with its Worldwide Protein Data Bank partners to enhance the quality and consistency of the international archive of bio-macromolecular structure data, the Protein Data Bank (PDB). PDBe also works closely with its collaborators at the European Bioinformatics Institute and the scientific community around the world to enhance its databases and services by adding curated and actively maintained derived data to the existing structural data in the PDB. We have developed a new database infrastructure based on the remediated PDB archive data and a specially designed database for storing information on interactions between proteins and bound molecules. The group has developed new services that allow users to carry out simple textual queries or more complex 3D structure-based queries. The newly designed 'PDBeView Atlas pages' provide an overview of an individual PDB entry in a user-friendly layout and serve as a starting point to further explore the information available in the PDBe database. PDBe's active involvement with the X-ray crystallography, Nuclear Magnetic Resonance spectroscopy and cryo-Electron Microscopy communities has resulted in improved tools for structure deposition and analysis.
Affiliation(s)
- S Velankar
- Protein Databank in Europe, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
29
|
Lasker K, Topf M, Sali A, Wolfson HJ. Inferential optimization for simultaneous fitting of multiple components into a CryoEM map of their assembly. J Mol Biol 2009; 388:180-94. [PMID: 19233204 DOI: 10.1016/j.jmb.2009.02.031] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 09/07/2008] [Revised: 12/29/2008] [Accepted: 02/12/2009] [Indexed: 11/24/2022]
Abstract
Models of macromolecular assemblies are essential for a mechanistic description of cellular processes. Such models are increasingly obtained by fitting atomic-resolution structures of components into a density map of the whole assembly. Yet, current density-fitting techniques are frequently insufficient for an unambiguous determination of the positions and orientations of all components. Here, we describe MultiFit, a method used for simultaneously fitting atomic structures of components into their assembly density map at resolutions as low as 25 Å. The component positions and orientations are optimized with respect to a scoring function that includes the quality-of-fit of components in the map, the protrusion of components from the map envelope, and the shape complementarity between pairs of components. The scoring function is optimized by our exact inference optimizer DOMINO (Discrete Optimization of Multiple INteracting Objects) that efficiently finds the global minimum in a discrete sampling space. MultiFit was benchmarked on seven assemblies of known structure, consisting of up to seven proteins each. The input atomic structures of the components were obtained from the Protein Data Bank, as well as by comparative modeling based on a 16-99% sequence identity to a template structure. A near-native configuration was usually found as the top-scoring model. Therefore, MultiFit can provide initial configurations for further refinement of many multicomponent assembly structures described by electron microscopy.
Affiliation(s)
- Keren Lasker
- Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel-Aviv 69978, Israel.
|
30
|
Data deposition and annotation at the worldwide protein data bank. Mol Biotechnol 2008; 42:1-13. [PMID: 19082769 DOI: 10.1007/s12033-008-9127-7] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.1] [Received: 11/04/2008] [Accepted: 11/06/2008] [Indexed: 10/21/2022]
Abstract
The Protein Data Bank (PDB) is the repository for three-dimensional structures of biological macromolecules, determined by experimental methods. The data in the archive are free and easily available via the Internet from any of the worldwide centers managing this global archive. These data are used by scientists, researchers, bioinformatics specialists, educators, students, and general audiences to understand biological phenomena at a molecular level. Analysis of these structural data also inspires and facilitates new discoveries in science. This chapter describes the tools and methods currently used for deposition, processing, and release of data in the PDB. References to future enhancements are also included.
|
31
|
Dutta S, Burkhardt K, Swaminathan GJ, Kosada T, Henrick K, Nakamura H, Berman HM. Data deposition and annotation at the worldwide protein data bank. Methods Mol Biol 2008; 426:81-101. [PMID: 18542858 DOI: 10.1007/978-1-60327-058-8_5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 03/08/2023]
Abstract
The Protein Data Bank (PDB) is the repository for the three-dimensional structures of biological macromolecules, determined by experimental methods. The data in the archive are free and easily available via the Internet from any of the worldwide centers managing this global archive. These data are used by scientists, researchers, bioinformatics specialists, educators, students, and lay audiences to understand biological phenomena at a molecular level. Analysis of these structural data also inspires and facilitates new discoveries in science. This chapter describes the tools and methods currently used for deposition, processing, and release of data in the PDB. References to future enhancements are also included.
Affiliation(s)
- Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
|
32
|
Multiple subunit fitting into a low-resolution density map of a macromolecular complex using a gaussian mixture model. Biophys J 2008; 95:4643-58. [PMID: 18708469 PMCID: PMC2576401 DOI: 10.1529/biophysj.108.137125] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.1] [Indexed: 11/24/2022] Open
Abstract
Recently, electron microscopy measurement of single particles has enabled us to reconstruct a low-resolution 3D density map of large biomolecular complexes. If structures of the complex subunits can be solved by x-ray crystallography at atomic resolution, fitting these models into the 3D density map can generate an atomic resolution model of the entire large complex. The fitting of multiple subunits, however, generally requires large computational costs; therefore, development of an efficient algorithm is required. We developed a fast fitting program, “gmfit”, which employs a Gaussian mixture model (GMM) to represent approximated shapes of the 3D density map and the atomic models. A GMM is a distribution function composed by adding together several 3D Gaussian density functions. Because our model analytically provides an integral of a product of two distribution functions, it enables us to quickly calculate the fitness of the density map and the atomic models. Using the integral, two types of potential energy function are introduced: the attraction potential energy between a 3D density map and each subunit, and the repulsion potential energy between subunits. The restraint energy for symmetry is also employed to build symmetrical oligomeric complexes. To find the optimal configuration of subunits, we randomly generated initial configurations of subunit models, and performed a steepest-descent method using forces and torques of the three potential energies. Comparison between an original density map and its GMM showed that the required number of Gaussian distribution functions for a given accuracy depended on both resolution and molecular size. We then performed test fitting calculations for simulated low-resolution density maps of atomic models of homodimer, trimer, and hexamer, using different search parameters.
The results indicated that our method was able to rebuild atomic models of a complex even for maps of 30 Å resolution if sufficient numbers (eight or more) of Gaussian distribution functions were employed for each subunit, and the symmetric restraints were assigned for complexes with more than three subunits. As a more realistic test, we tried to build an atomic model of the GroEL/ES complex by fitting 21-subunit atomic models into the 3D density map obtained by cryoelectron microscopy using the C7 symmetric restraints. A model with low root mean-square deviations (14.7 Å) was obtained as the lowest-energy model, showing that our fitting method was reasonably accurate. Inclusion of other restraints from biological and biochemical experiments could further enhance the accuracy.
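The closed-form integral mentioned in the abstract above can be illustrated with a minimal sketch. This is not the gmfit code: the function names `gaussian_overlap` and `gmm_overlap`, the representation of a mixture as a list of (weight, mean, covariance) tuples, and the toy coordinates are all assumptions made for illustration. The sketch relies only on the standard identity that the integral of the product of two normalized Gaussians equals a single Gaussian, with summed covariances, evaluated at the difference of the means.

```python
import numpy as np


def gaussian_overlap(mu_a, cov_a, mu_b, cov_b):
    """Integral over all space of the product of two normalized Gaussians.

    Standard identity: the result is a Gaussian with covariance
    cov_a + cov_b evaluated at the difference of the two means.
    """
    diff = np.asarray(mu_a, dtype=float) - np.asarray(mu_b, dtype=float)
    cov = np.asarray(cov_a, dtype=float) + np.asarray(cov_b, dtype=float)
    dim = diff.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** dim * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)


def gmm_overlap(gmm_a, gmm_b):
    """Overlap of two mixtures, each a list of (weight, mean, covariance).

    The integral is bilinear, so it reduces to a weighted double sum of
    pairwise Gaussian overlaps and needs no grid integration.
    """
    return sum(
        wa * wb * gaussian_overlap(ma, ca, mb, cb)
        for wa, ma, ca in gmm_a
        for wb, mb, cb in gmm_b
    )


if __name__ == "__main__":
    eye = np.eye(3)
    # Hypothetical two-component "density map" and one-component "subunit".
    density_map = [(0.5, [0.0, 0.0, 0.0], eye), (0.5, [4.0, 0.0, 0.0], eye)]
    subunit_near = [(1.0, [0.0, 0.0, 0.0], eye)]
    subunit_far = [(1.0, [20.0, 0.0, 0.0], eye)]
    print("overlap, subunit inside map: ", gmm_overlap(density_map, subunit_near))
    print("overlap, subunit far from map:", gmm_overlap(density_map, subunit_far))
```

Because every term is analytic, a score of this kind costs a number of operations proportional to the product of the two component counts, which is presumably why a mixture representation makes repeated evaluation during a search inexpensive.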
|
33
|
Smith R, Carragher B. Software tools for molecular microscopy. J Struct Biol 2008; 163:224-8. [PMID: 18406627 DOI: 10.1016/j.jsb.2008.03.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/18/2008] [Revised: 03/04/2008] [Accepted: 03/04/2008] [Indexed: 10/22/2022]
Abstract
In our role as the editors of a special edition of the Journal of Structural Biology published in 1996 and devoted to the development of software tools, we offer our view of past developments and future prospects in this area. The astonishing progress in computer hardware over the past decade has fueled a significant increase in computational power available for the solution of macromolecular structures. At the same time the relatively slow growth and development of the accompanying software reflects the difficulties of developing large, complex and very specialized analytical methods.
Affiliation(s)
- Ross Smith
- Department of Cell Biology, New York University School of Medicine, New York, NY, USA
|
34
|
Berman HM. The Protein Data Bank: a historical perspective. Acta Crystallogr A 2007; 64:88-95. [PMID: 18156675 DOI: 10.1107/s0108767307035623] [Citation(s) in RCA: 224] [Impact Index Per Article: 13.2] [Received: 06/01/2007] [Accepted: 07/20/2007] [Indexed: 11/10/2022] Open
Abstract
The Protein Data Bank began as a grassroots effort in 1971. It has grown from a small archive containing a dozen structures to a major international resource for structural biology containing more than 40000 entries. The interplay of science, technology and attitudes about data sharing has played a role in the growth of this resource.
|
35
|
Martone ME, Sargis J, Tran J, Wong WW, Jiles H, Mangir C. Database resources for cellular electron microscopy. Methods Cell Biol 2007; 79:799-822. [PMID: 17327184 DOI: 10.1016/s0091-679x(06)79031-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/14/2023]
Affiliation(s)
- Maryann E Martone
- National Center for Microscopy and Imaging Research, Center for Research in Biological Systems, University of California, San Diego, La Jolla, California 92093, USA
|
36
|
Tagari M, Tate J, Swaminathan GJ, Newman R, Naim A, Vranken W, Kapopoulou A, Hussain A, Fillon J, Henrick K, Velankar S. E-MSD: improving data deposition and structure quality. Nucleic Acids Res 2006; 34:D287-90. [PMID: 16381867 PMCID: PMC1347525 DOI: 10.1093/nar/gkj163] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open
Abstract
The Macromolecular Structure Database (MSD) (http://www.ebi.ac.uk/msd/) [H. Boutselakis, D. Dimitropoulos, J. Fillon, A. Golovin, K. Henrick, A. Hussain, J. Ionides, M. John, P. A. Keller, E. Krissinel et al. (2003) E-MSD: the European Bioinformatics Institute Macromolecular Structure Database. Nucleic Acids Res., 31, 458-462.] group is one of the three partners in the worldwide Protein Data Bank (wwPDB), the consortium entrusted with the collation, maintenance and distribution of the global repository of macromolecular structure data [H. Berman, K. Henrick and H. Nakamura (2003) Announcing the worldwide Protein Data Bank. Nature Struct. Biol., 10, 980.]. Since its inception, the MSD group has worked with partners around the world to improve the quality of PDB data, through a clean-up programme that addresses inconsistencies and inaccuracies in the legacy archive. The improvements in data quality in the legacy archive have been achieved largely through the creation of a unified data archive, in the form of a relational database that stores all of the data in the wwPDB. The three partners are working towards improving the tools and methods for the deposition of new data by the community at large. The implementation of the MSD database, together with the parallel development of improved tools and methodologies for data harvesting, validation and archival, has led to significant improvements in the quality of data that enters the archive. Through this and related projects in the NMR and EM realms the MSD continues to improve the quality of publicly available structural data.
Affiliation(s)
- S. Velankar
- To whom correspondence should be addressed. Tel: +44 1223 494 646; Fax: +44 1223 494 468;
|
37
|
Arzt S, Beteva A, Cipriani F, Delageniere S, Felisaz F, Förstner G, Gordon E, Launer L, Lavault B, Leonard G, Mairs T, McCarthy A, McCarthy J, McSweeney S, Meyer J, Mitchell E, Monaco S, Nurizzo D, Ravelli R, Rey V, Shepard W, Spruce D, Svensson O, Theveneau P. Automation of macromolecular crystallography beamlines. Prog Biophys Mol Biol 2005; 89:124-52. [PMID: 15910915 DOI: 10.1016/j.pbiomolbio.2004.09.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/30/2022]
Abstract
The production of three-dimensional crystallographic structural information of macromolecules can now be thought of as a pipeline which is being streamlined at every stage from protein cloning, expression and purification, through crystallisation to data collection and structure solution. Synchrotron X-ray beamlines are a key section of this pipeline, as it is at these that the X-ray diffraction data that ultimately lead to the elucidation of macromolecular structures are collected. The burgeoning number of macromolecular crystallography (MX) beamlines available worldwide may be enhanced significantly with the automation of both their operation and of the experiments carried out on them. This paper reviews the current situation and provides a glimpse of how an MX beamline may look in the not-too-distant future.
Affiliation(s)
- Steffi Arzt
- European Synchrotron Radiation Facility, 6 rue Jules Horowitz, Zip 38000, BP 220, F-38043 Grenoble Cedex, France
|