1
Abrahams G, Newman J. Data- and diversity-driven development of a Shotgun crystallization screen using the Protein Data Bank. Acta Crystallogr D Struct Biol 2021; 77:1437-1450. [PMID: 34726171 DOI: 10.1107/s2059798321009724] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/26/2021] [Accepted: 09/17/2021] [Indexed: 11/10/2022]
Abstract
Protein crystallization has for decades been a critical and restrictive step in macromolecular structure determination via X-ray diffraction. Crystallization typically involves a multi-stage exploration of the available chemical space, beginning with an initial sampling (screening) followed by iterative refinement (optimization). Effective screening is important for reducing the number of optimization rounds required, reducing the cost and time required to determine a structure. Here, an initial screen (Shotgun II) derived from analysis of the up-to-date Protein Data Bank (PDB) is proposed and compared with the previously derived (2014) Shotgun I screen. In an update to that analysis, it is clarified that the Shotgun approach entails finding the crystallization conditions that cover the most diverse space of proteins by sequence found in the PDB, which can be mapped to the well known maximum coverage problem in computer science. With this realization, it was possible to apply a more effective algorithm for selecting conditions. In-house data demonstrate that compared with alternatives, the Shotgun I screen has been remarkably successful over the seven years that it has been in use, indicating that Shotgun II is also likely to be a highly effective screen.
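The abstract above maps screen selection to the maximum coverage problem: choose a fixed number of conditions so that the largest number of distinct protein sequence clusters is covered. The abstract does not say which algorithm the authors applied, so the following is only a minimal sketch of the standard greedy heuristic for maximum coverage, which is known to reach at least a (1 - 1/e) fraction of the optimal coverage. The function name `greedy_max_coverage`, the condition names and the cluster IDs are hypothetical toy values, not data from the paper.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(coverage: Dict[str, Set[Hashable]], k: int) -> List[str]:
    """Greedily pick up to k conditions, each time taking the one that
    covers the most not-yet-covered sequence clusters."""
    covered: Set[Hashable] = set()
    chosen: List[str] = []
    remaining = dict(coverage)  # shallow copy so the caller's dict is untouched
    for _ in range(k):
        if not remaining:
            break
        # Iterate names in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        if not remaining[best] - covered:
            break  # no remaining condition adds any new cluster
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical toy data: condition -> set of sequence-cluster IDs it crystallized.
toy = {
    "cond_A": {1, 2, 3},
    "cond_B": {3, 4},
    "cond_C": {4, 5, 6, 7},
    "cond_D": {1, 7},
}
selected = greedy_max_coverage(toy, k=2)
print(selected)
```

With this toy input, the sketch first takes the condition covering four clusters and then the one that adds the most new clusters, so two conditions already cover all seven clusters.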
Affiliation(s)
- Gabriel Abrahams
  - Manufacturing (Biomedical), CSIRO, 343 Royal Parade, Parkville, VIC 3052, Australia
- Janet Newman
  - Manufacturing (Biomedical), CSIRO, 343 Royal Parade, Parkville, VIC 3052, Australia

2
Westerman EL, Bowman SEJ, Davidson B, Davis MC, Larson ER, Sanford CPJ. Deploying Big Data to Crack the Genotype to Phenotype Code. Integr Comp Biol 2021; 60:385-396. [PMID: 32492136 DOI: 10.1093/icb/icaa055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022]
Abstract
Mechanistically connecting genotypes to phenotypes is a longstanding and central mission of biology. Deciphering these connections will unite questions and datasets across all scales from molecules to ecosystems. Although high-throughput sequencing has provided a rich platform on which to launch this effort, tools for deciphering mechanisms further along the genome to phenome pipeline remain limited. Machine learning approaches and other emerging computational tools hold the promise of augmenting human efforts to overcome these obstacles. This vision paper is the result of a Reintegrating Biology Workshop, bringing together the perspectives of integrative and comparative biologists to survey challenges and opportunities in cracking the genotype to phenotype code and thereby generating predictive frameworks across biological scales. Key recommendations include promoting the development of minimum "best practices" for the experimental design and collection of data; fostering sustained and long-term data repositories; promoting programs that recruit, train, and retain a diversity of talent; and providing funding to effectively support these highly cross-disciplinary efforts. We follow this discussion by highlighting a few specific transformative research opportunities that will be advanced by these efforts.
Affiliation(s)
- Erica L Westerman
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Sarah E J Bowman
  - High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences at the University at Buffalo, Buffalo, NY 14203, USA
- Bradley Davidson
  - Department of Biology, Swarthmore College, Swarthmore, PA 19081, USA
- Marcus C Davis
  - Department of Biology, James Madison University, Harrisonburg, VA 22807, USA
- Eric R Larson
  - Department of Natural Resources and Environmental Sciences, University of Illinois, Urbana, IL 61801, USA
- Christopher P J Sanford
  - Department of Ecology, Evolution and Organismal Biology, Kennesaw State University, Kennesaw, GA 30144, USA

3
Daniel E, Maksimainen MM, Smith N, Ratas V, Biterova E, Murthy SN, Rahman MT, Kiema TR, Sridhar S, Cordara G, Dalwani S, Venkatesan R, Prilusky J, Dym O, Lehtiö L, Koski MK, Ashton AW, Sussman JL, Wierenga RK. IceBear: an intuitive and versatile web application for research-data tracking from crystallization experiment to PDB deposition. Acta Crystallogr D Struct Biol 2021; 77:151-163. [PMID: 33559605 PMCID: PMC7869904 DOI: 10.1107/s2059798320015223] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/15/2020] [Accepted: 11/15/2020] [Indexed: 12/26/2022]
Abstract
The web-based IceBear software is a versatile tool to monitor the results of crystallization experiments and is designed to facilitate supervisor and student communications. It also records and tracks all relevant information from crystallization setup to PDB deposition in protein crystallography projects. Fully automated data collection is now possible at several synchrotrons, which means that the number of samples tested at the synchrotron is currently increasing rapidly. Therefore, the protein crystallography research communities at the University of Oulu, Weizmann Institute of Science and Diamond Light Source have joined forces to automate the uploading of sample metadata to the synchrotron. In IceBear, each crystal selected for data collection is given a unique sample name and a crystal page is generated. Subsequently, the metadata required for data collection are uploaded directly to the ISPyB synchrotron database by a shipment module, and for each sample a link to the relevant ISPyB page is stored. IceBear allows notes to be made for each sample during cryocooling treatment and during data collection, as well as in later steps of the structure determination. Protocols are also available to aid the recycling of pins, pucks and dewars when the dewar returns from the synchrotron. The IceBear database is organized around projects, and project members can easily access the crystallization and diffraction metadata for each sample, as well as any additional information that has been provided via the notes. The crystal page for each sample connects the crystallization, diffraction and structural information by providing links to the IceBear drop-viewer page and to the ISPyB data-collection page, as well as to the structure deposited in the Protein Data Bank.
Affiliation(s)
- Ed Daniel
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Mirko M. Maksimainen
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Neil Smith
  - Diamond Light Source, Harwell Science and Innovation Campus, Didcot, United Kingdom
- Ville Ratas
  - Biocenter Oulu, University of Oulu, Oulu, Finland
- Ekaterina Biterova
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Sudarshan N. Murthy
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- M. Tanvir Rahman
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Shruthi Sridhar
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Gabriele Cordara
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Subhadra Dalwani
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Rajaram Venkatesan
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Jaime Prilusky
  - Bioinformatics and Biological Computing Unit, Life Science Core Facility, Weizmann Institute of Science, Rehovot 7610001, Israel
- Orly Dym
  - Israel Structural Proteomics Center, Life Science Core Facility, Weizmann Institute of Science, Rehovot 7610001, Israel
- Lari Lehtiö
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland
- Alun W. Ashton
  - Diamond Light Source, Harwell Science and Innovation Campus, Didcot, United Kingdom
- Joel L. Sussman
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Rik K. Wierenga
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland

4
Lynch ML, Dudek MF, Bowman SE. A Searchable Database of Crystallization Cocktails in the PDB: Analyzing the Chemical Condition Space. Patterns (N Y) 2020; 1:100024. [PMID: 32776019 PMCID: PMC7409820 DOI: 10.1016/j.patter.2020.100024] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/31/2019] [Revised: 03/22/2020] [Accepted: 03/30/2020] [Indexed: 10/26/2022]
Abstract
Nearly 90% of structural models in the Protein Data Bank (PDB), the central resource worldwide for three-dimensional structural information, are currently derived from macromolecular crystallography (MX). A major bottleneck in determining MX structures is finding conditions in which a biomolecule will crystallize. Here, we present a searchable database of the chemicals associated with successful crystallization experiments from the PDB. We use these data to examine the relationship between protein secondary structure and average molecular weight of polyethylene glycol and to investigate patterns in crystallization conditions. Our analyses reveal striking patterns of both redundancy of chemical compositions in crystallization experiments and extreme sparsity of specific chemical combinations, underscoring the challenges faced in generating predictive models for de novo optimal crystallization experiments.
Affiliation(s)
- Miranda L. Lynch
  - High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA
- Max F. Dudek
  - University of Pittsburgh, Pittsburgh, PA 15261, USA
- Sarah E.J. Bowman
  - High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences at the University at Buffalo, Buffalo, NY 14203, USA

5
Wilson J, Ristic M, Kirkwood J, Hargreaves D, Newman J. Predicting the Effect of Chemical Factors on the pH of Crystallization Trials. iScience 2020; 23:101219. [PMID: 32540772 PMCID: PMC7298652 DOI: 10.1016/j.isci.2020.101219] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/11/2020] [Revised: 05/14/2020] [Accepted: 05/27/2020] [Indexed: 01/13/2023]
Abstract
In macromolecular crystallization, success is often dependent on the pH of the experiment. However, little is known about the pH of reagents used, and it is generally assumed that the pH of the experiment will closely match that of any buffering chemical in the solution. We use a large dataset of experimentally measured solution pH values to show that this assumption can be very wrong and generate a model that can be used to successfully predict the overall solution pH of a crystallization experiment. Furthermore, we investigate the time dependence of the pH of some polyethylene glycol polymers widely used in protein crystallization under different storage conditions.
The overall pH of crystallization solutions can be modeled. The model was trained and tested on a set of more than 40,000 measured pH values. A pH value can be assigned to a non-buffered crystallization cocktail. A 12-month stability study of polyethylene glycol suggests ways to store PEGs.
Affiliation(s)
- Julie Wilson
  - Department of Mathematics, University of York, York, UK
- Marko Ristic
  - Collaborative Crystallisation Centre, CSIRO, Parkville, VIC, Australia
- David Hargreaves
  - AstraZeneca, Darwin Building, Cambridge Science Park, Cambridge, UK
- Janet Newman
  - Collaborative Crystallisation Centre, CSIRO, Parkville, VIC, Australia

6
Abstract
The process of macromolecular crystallisation almost always begins by setting up crystallisation trials using commercial or other premade screens, followed by cycles of optimisation where the crystallisation cocktails are focused towards a particular small region of chemical space. The screening process is relatively straightforward, but still requires an understanding of the plethora of commercially available screens. Optimisation is complicated by requiring both the design and preparation of the appropriate secondary screens. Software has been developed in the C3 lab to aid the process of choosing initial screens, to analyse the results of the initial trials, and to design and describe how to prepare optimisation screens.
7
Svensson O, Gilski M, Nurizzo D, Bowler MW. A comparative anatomy of protein crystals: lessons from the automatic processing of 56 000 samples. IUCrJ 2019; 6:822-831. [PMID: 31576216 PMCID: PMC6760449 DOI: 10.1107/s2052252519008017] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/06/2019] [Accepted: 06/04/2019] [Indexed: 05/12/2023]
Abstract
The fully automatic processing of crystals of macromolecules has presented a unique opportunity to gather information on the samples that is not usually recorded. This has proved invaluable in improving sample-location, characterization and data-collection algorithms. After operating for four years, MASSIF-1 has now processed over 56 000 samples, gathering information at each stage, from the volume of the crystal to the unit-cell dimensions, the space group, the quality of the data collected and the reasoning behind the decisions made in data collection. This provides an unprecedented opportunity to analyse these data together, providing a detailed landscape of macromolecular crystals, intimate details of their contents and, importantly, how the two are related. The data show that mosaic spread is unrelated to the size or shape of crystals and demonstrate experimentally that diffraction intensities scale in proportion to crystal volume and molecular weight. It is also shown that crystal volume scales inversely with molecular weight. The results set the scene for the development of X-ray crystallography in a changing environment for structural biology.
Affiliation(s)
- Olof Svensson
  - European Synchrotron Radiation Facility, 71 Avenue des Martyrs, CS 40220, F-38043 Grenoble, France
- Maciej Gilski
  - European Molecular Biology Laboratory, Grenoble Outstation, 71 Avenue des Martyrs, CS 90181, F-38042 Grenoble, France
- Didier Nurizzo
  - European Synchrotron Radiation Facility, 71 Avenue des Martyrs, CS 40220, F-38043 Grenoble, France
- Matthew W. Bowler
  - European Molecular Biology Laboratory, Grenoble Outstation, 71 Avenue des Martyrs, CS 90181, F-38042 Grenoble, France

8
van Raaij MJ. Welcoming Janet Newman with a BLAST on crystallization strategy. Acta Crystallogr F Struct Biol Commun 2019; 75:147. [DOI: 10.1107/s2053230x19003091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]