1
Helliwell JR, Hester JR, Kroon-Batenburg LMJ, McMahon B, Storm SLS. The evolution of raw data archiving and the growth of its importance in crystallography. IUCRJ 2024; 11:464-475. [PMID: 38864497 PMCID: PMC11220881 DOI: 10.1107/s205225252400455x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Accepted: 05/15/2024] [Indexed: 06/13/2024]
Abstract
The hardware for data archiving has expanded capacities for digital storage enormously in the past decade or more. The IUCr evaluated the costs and benefits of this within an official working group which advised that raw data archiving would allow ground truth reproducibility in published studies. Consultations of the IUCr's Commissions ensued via a newly constituted standing advisory committee, the Committee on Data. At all stages, the IUCr financed workshops to facilitate community discussions and possible methods of raw data archiving implementation. The recent launch of the IUCrData journal's Raw Data Letters is a milestone in the implementation of raw data archiving beyond the currently published studies: it includes diffraction patterns that have not been fully interpreted, if at all. The IUCr 75th Congress in Melbourne included a workshop on raw data reuse, discussing the successes and ongoing challenges of raw data reuse. This article charts the efforts of the IUCr to facilitate discussions and plans relating to raw data archiving and reuse within the various communities of crystallography, diffraction and scattering.
Affiliation(s)
- John R. Helliwell
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, United Kingdom
- James R. Hester
  - Australian Nuclear Science and Technology Organisation (ANSTO), Locked Bag 2001, Kirrawee DC, New South Wales 2232, Australia
- Loes M. J. Kroon-Batenburg
  - Structural Biochemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Brian McMahon
  - International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, United Kingdom
- Selina L. S. Storm
  - European Molecular Biology Laboratory, c/o DESY, Notkestraße 85, 22607 Hamburg, Germany
2
Heller M, Ott B, Dalbauer V, Felfer P. A MATLAB Toolbox for Findable, Accessible, Interoperable, and Reusable Atom Probe Data Science. MICROSCOPY AND MICROANALYSIS : THE OFFICIAL JOURNAL OF MICROSCOPY SOCIETY OF AMERICA, MICROBEAM ANALYSIS SOCIETY, MICROSCOPICAL SOCIETY OF CANADA 2024:ozae031. [PMID: 38885135 DOI: 10.1093/mam/ozae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 03/07/2024] [Accepted: 03/22/2024] [Indexed: 06/20/2024]
Abstract
Atom probe tomography (APT) data analytics have traditionally been based on manual analytics by researchers. As newer atom probes together with focused ion beam-based specimen preparation have opened APT to many more materials, yielding much more complex mass spectra, building up a systematic understanding of the pathway from raw data to final interpretation has increasingly become important. This demands a system in which the data and treatment can be traced, ideally by any interested party. Such an approach of findable, accessible, interoperable, and reusable (FAIR) data and analysis policies is becoming increasingly important, not just in APT. In this paper, we present a toolbox, written in MATLAB, which allows the user to store the raw and processed data in a standardized FAIR format (hierarchical data format 5) and process the data in a largely scriptable environment to minimize manual user input. This allows for the experiment data to be interchanged without owner explanations and the analysis to be reproduced. We have devised a metadata scheme that is extensible to other experiments in the materials science domain. With this toolbox, collective knowledge can be built up, and a large number of data sets can be analyzed in a fully automated fashion.
Affiliation(s)
- Martina Heller
  - Institute for General Materials Properties, Department of Materials Science, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen 91058, Germany
  - Interdisciplinary Center for Nanostructured Films (IZNF), Erlangen 91058, Germany
- Benedict Ott
  - Institute for General Materials Properties, Department of Materials Science, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen 91058, Germany
- Valentin Dalbauer
  - Institute for General Materials Properties, Department of Materials Science, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen 91058, Germany
- Peter Felfer
  - Institute for General Materials Properties, Department of Materials Science, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen 91058, Germany
3
Kroon-Batenburg LMJ, Lightfoot MP, Johnson NT, Helliwell JR. Raw diffraction data and reproducibility. STRUCTURAL DYNAMICS (MELVILLE, N.Y.) 2024; 11:011301. [PMID: 38361661 PMCID: PMC10869167 DOI: 10.1063/4.0000232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 01/19/2024] [Indexed: 02/17/2024]
Abstract
In recent years, there has been a major expansion in digital storage capability for hosting raw diffraction datasets. Naturally, the question has now arisen as to the benefits and costs for the preservation of such raw, i.e., experimental diffraction datasets. We describe the consultations made of the global structural chemistry, i.e., chemical crystallography community from the points of view of the International Union of Crystallography (IUCr) Committee on Data, of which JRH was the Chair until very recently, and the IUCrData Raw Data Letters initiative, for which LKB is the Main Editor. The monitoring by the CCDC of CSD depositions which cite the digital object identifiers of raw diffraction datasets provides interesting statistics by probe (x-ray, neutron, or electron) and by home lab vs central facility. Clearly, a better understanding of the reproducibility of current analysis procedures is at hand. Policies for publication requiring raw data have been updated in IUCr Journals for macromolecular crystallography, namely, that raw data should be made available for a new crystal structure or a new method as well as the wwPDB deposition. For chemical crystallography, such a step requiring raw data archiving has not yet been recommended by the IUCr Commission on Structural Chemistry.
Affiliation(s)
- Loes M. J. Kroon-Batenburg
  - Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Matthew P. Lightfoot
  - Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, United Kingdom
- Natalie T. Johnson
  - Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, United Kingdom
- John R. Helliwell
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, United Kingdom
4
Rauh D, Blankenburg C, Fischer TG, Jung N, Kuhn S, Schatzschneider U, Schulze T, Neumann S. Data format standards in analytical chemistry. PURE APPL CHEM 2022. [DOI: 10.1515/pac-2021-3101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Research data is an essential part of research and almost every publication in chemistry. The data itself can be valuable for reuse if sustainably deposited, annotated and archived. Thus, it is important to publish data following the FAIR principles, to make it findable, accessible, interoperable and reusable not only for humans but also in machine-readable form. This also improves transparency and reproducibility of research findings and fosters analytical work with scientific data to generate new insights, being only accessible with manifold and diverse datasets. Research data requires complete and informative metadata and use of open data formats to obtain interoperable data. Generic data formats like AnIML and JCAMP-DX have been used for many applications. Special formats for some analytical methods are already accepted, like mzML for mass spectrometry or nmrML and NMReDATA for NMR spectroscopy data. Other methods still lack common standards for data. Only a joint effort of chemists, instrument and software vendors, publishers and infrastructure maintainers can make sure that the analytical data will be of value in the future. In this review, we describe existing data formats in analytical chemistry and introduce guidelines for the development and use of standardized and open data formats.
Affiliation(s)
- David Rauh
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, Weinberg 3, 06120 Halle, Germany
- Claudia Blankenburg
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, Weinberg 3, 06120 Halle, Germany
- Tillmann G. Fischer
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, Weinberg 3, 06120 Halle, Germany
- Nicole Jung
  - Karlsruhe Institute of Technology, Institute for Chemical and Biological Systems (IBCS-FMS), Hermann von Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Stefan Kuhn
  - School of Computer Science and Informatics, De Montfort University, Leicester, UK
- Ulrich Schatzschneider
  - Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Am Hubland, D-97074 Würzburg, Germany
- Tobias Schulze
  - Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research – UFZ, Permoserstr. 15, 04318 Leipzig, Germany
- Steffen Neumann
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, Weinberg 3, 06120 Halle, Germany
5
Helliwell JR. Raw diffraction data are our ground truth from which all subsequent workflows develop. Acta Crystallogr D Struct Biol 2022; 78:683-689. [PMID: 35647915 PMCID: PMC9159283 DOI: 10.1107/s2059798322003795] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/20/2022] [Accepted: 04/06/2022] [Indexed: 11/11/2022]
Abstract
Defining best practice in science is challenging. International consensus is facilitated by the International Science Council via its members such as the International Union of Crystallography (IUCr). The crystallographic community has many decades of tradition linking articles with the underpinning data, and is admired across all sciences accordingly. Crystallography has always been at the forefront of harnessing new technology in the service of consensus. Technology has provided new vast data-archiving opportunities, allowing the preservation of raw diffraction data, along with article and database depositions of a model's coordinates and associated structure factors. The raw diffraction data, which can now be preserved, are the ground truth from which all subsequent workflows develop. Journal editorial boards provide a practical forum for setting the criteria to decide if a study's files are truly the version of record. Within that, reality involves a variance of reasonable workflows. But what is a reasonable variance? Workflows must be detailed carefully by authors in explaining what they have done. There is a great, and increasing, diversity of macromolecular crystallography analyses, and yet an increased constraint on how much can be written in an article about the workflow used. Raw data provide the ultimate reproducibility evidence. A part of reproducibility and replicability is using an agreed vocabulary; the meaning of words such as precision and accuracy and, more recently, the confidence of a protein structure prediction should feature in approaching 'truth'.
6
Helliwell JR. Pre- and Post-publication Verification for Reproducible Data Mining in Macromolecular Crystallography. Methods Mol Biol 2022; 2449:235-261. [PMID: 35507266 DOI: 10.1007/978-1-0716-2095-3_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Like an article narrative is deemed by an editor and referees to be worthy of being a version of record on acceptance as a publication, so must the underpinning data also be scrutinized before passing it as a version of record. Indeed without the underpinning data, a study and its conclusions cannot be reproduced at any stage of evaluation, pre- or post-publication. Likewise, an independent study without its own underpinning data also cannot be reproduced let alone be considered a replicate of the first study. The PDB is a modern marvel of achievement providing an organized open access to depositor and user of the data held there opening numerous applications. Methods for modeling protein structures and for determination of structures are still improving their precision, and artifacts of the method exist. So their accuracy is realized if they are reproduced by other methods. It is on such foundations that reproducible data mining is based. Data rates are expanding considerably be they at synchrotrons, the X-ray free electron lasers (XFELs), electron cryomicroscopes (cryoEM), or at the neutron facilities. The work of a person as a referee or user with a narrative and its underpinning data may well be complemented in future by artificial intelligence with machine learning, the former for specific refereeing and the latter for the more general validation, both ideally before publication. Examples are described involving rhenium theranostics, the anti-cancer platins and the SARS-CoV-2 main protease.
Affiliation(s)
- John R Helliwell
  - Department of Chemistry, University of Manchester, Manchester, UK.
7
Bernstein HJ, Förster A, Bhowmick A, Brewster AS, Brockhauser S, Gelisio L, Hall DR, Leonarski F, Mariani V, Santoni G, Vonrhein C, Winter G. Gold Standard for macromolecular crystallography diffraction data. IUCRJ 2020; 7:784-792. [PMID: 32939270 PMCID: PMC7467160 DOI: 10.1107/s2052252520008672] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/06/2020] [Accepted: 06/26/2020] [Indexed: 06/11/2023]
Abstract
Macromolecular crystallography (MX) is the dominant means of determining the three-dimensional structures of biological macromolecules. Over the last few decades, most MX data have been collected at synchrotron beamlines using a large number of different detectors produced by various manufacturers and taking advantage of various protocols and goniometries. These data came in their own formats: sometimes proprietary, sometimes open. The associated metadata rarely reached the degree of completeness required for data management according to Findability, Accessibility, Interoperability and Reusability (FAIR) principles. Efforts to reuse old data by other investigators or even by the original investigators some time later were often frustrated. In the culmination of an effort dating back more than two decades, a large portion of the research community concerned with high data-rate macromolecular crystallography (HDRMX) has now agreed to an updated specification of data and metadata for diffraction images produced at synchrotron light sources and X-ray free-electron lasers (XFELs). This 'Gold Standard' will facilitate the processing of data sets independent of the facility at which they were collected and enable data archiving according to FAIR principles, with a particular focus on interoperability and reusability. This agreed standard builds on the NeXus/HDF5 NXmx application definition and the International Union of Crystallography (IUCr) imgCIF/CBF dictionary, and it is compatible with major data-processing programs and pipelines. Just as with the IUCr CBF/imgCIF standard from which it arose and to which it is tied, the NeXus/HDF5 NXmx Gold Standard application definition is intended to be applicable to all detectors used for crystallography, and all hardware and software developers in the field are encouraged to adopt and contribute to the standard.
Affiliation(s)
- Herbert J. Bernstein
  - Ronin Institute for Independent Scholarship, c/o NSLS II, Brookhaven National Laboratory, Upton, New York, USA
- Asmit Bhowmick
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Aaron S. Brewster
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Sandor Brockhauser
  - European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
  - Biological Research Centre Szeged (BRC), Temesvári krt. 62, 6726 Szeged, Hungary
  - University of Szeged, Arpad ter 2, 6720 Szeged, Hungary
- Luca Gelisio
  - Center for Free-Electron Laser Science, Notkestrasse 85, 22607 Hamburg, Germany
- David R. Hall
  - Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, United Kingdom
- Filip Leonarski
  - Swiss Light Source, Paul Scherrer Institut, Forschungsstrasse 111, 5232 Villigen PSI, Switzerland
- Valerio Mariani
  - Center for Free-Electron Laser Science, Notkestrasse 85, 22607 Hamburg, Germany
- Gianluca Santoni
  - Structural Biology Group, European Synchrotron Radiation Facility, 71 Avenue des Martyrs, 38000 Grenoble, France
- Clemens Vonrhein
  - Global Phasing Ltd, Sheraton House, Castle Park, Cambridge CB3 0AX, United Kingdom
- Graeme Winter
  - Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, United Kingdom
8
Spek AL. checkCIF validation ALERTS: what they mean and how to respond. Acta Crystallogr E Crystallogr Commun 2020; 76:1-11. [PMID: 31921444 PMCID: PMC6944088 DOI: 10.1107/s2056989019016244] [Citation(s) in RCA: 666] [Impact Index Per Article: 166.5] [Received: 12/02/2019] [Accepted: 12/02/2019] [Indexed: 11/23/2022]
Abstract
Authors of a paper that includes a new crystal-structure determination are expected to not only report the structural results of interest and their interpretation, but are also expected to archive in computer-readable CIF format the experimental data on which the crystal-structure analysis is based. Additionally, an IUCr/checkCIF validation report will be required for the review of a submitted paper. Such a validation report, automatically created from the deposited CIF file, lists as ALERTS not only potential errors or unusual findings, but also suggestions for improvement along with interesting information on the structure at hand. Major ALERTS for issues are expected to have been acted on already before the submission for publication or discussed in the associated paper and/or commented on in the CIF file. In addition, referees, readers and users of the data should be able to make their own judgment and interpretation of the underlying experimental data or perform their own calculations with the archived data. All the above is consistent with the FAIR (findable, accessible, interoperable, and reusable) initiative [Helliwell (2019). Struct. Dyn. 6, 05430]. Validation can also be helpful for less experienced authors in pointing to and avoiding of crystal-structure determination and interpretation pitfalls. The IUCr web-based checkCIF server provides such a validation report, based on data uploaded in CIF format. Alternatively, a locally installable checkCIF version is available to be used iteratively during the structure-determination process. ALERTS come mostly as short single-line messages. There is also a short explanation of the ALERTS available through the IUCr web server or with the locally installed PLATON/checkCIF version. This paper provides additional background information on the checkCIF procedure and additional details for a number of ALERTS along with options for how to act on them.
Affiliation(s)
- Anthony L. Spek
  - Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584CH Utrecht, The Netherlands
9
Grabowski M, Cymborowski M, Porebski PJ, Osinski T, Shabalin IG, Cooper DR, Minor W. The Integrated Resource for Reproducibility in Macromolecular Crystallography: Experiences of the first four years. STRUCTURAL DYNAMICS (MELVILLE, N.Y.) 2019; 6:064301. [PMID: 31768399 PMCID: PMC6874509 DOI: 10.1063/1.5128672] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/24/2019] [Accepted: 11/04/2019] [Indexed: 05/05/2023]
Abstract
It has been increasingly recognized that preservation and public accessibility of primary experimental data are cornerstones necessary for the reproducibility of empirical sciences. In the field of macromolecular crystallography, many journals now recommend that authors of manuscripts presenting a new crystal structure should deposit their primary experimental data (X-ray diffraction images) to one of the dedicated resources created in recent years. Here, we describe our experiences developing the Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) and describe several examples of a crucial role that diffraction data can play in improving previously determined protein structures. In its first four years, several hundred crystallographers have deposited data from over 5200 diffraction experiments performed at over 60 different synchrotron beamlines or home sources all over the world. In addition to improving the resource and curating submitted data, we have been building a pipeline for extraction or, in some cases, reconstruction of the metadata necessary for seamless automated processing. Preliminary analysis indicates that about 95% of the archived data can be automatically reprocessed. A high rate of reprocessing success shows the feasibility of using the automated metadata extraction and automated processing as a validation step for the deposition of raw diffraction images. The IRRMC is guided by the Findable, Accessible, Interoperable, and Reusable data management principles.
Affiliation(s)
- Przemyslaw J. Porebski
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22908, USA
- Tomasz Osinski
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22908, USA
- Wladek Minor
  - Authors to whom correspondence should be addressed: and
10
Helliwell JR. FACT and FAIR with Big Data allows objectivity in science: The view of crystallography. STRUCTURAL DYNAMICS (MELVILLE, N.Y.) 2019; 6:054306. [PMID: 31673568 PMCID: PMC6816445 DOI: 10.1063/1.5124439] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/15/2019] [Accepted: 10/09/2019] [Indexed: 05/14/2023]
Abstract
A publication is an important narrative of the work done and interpretations made by researchers securing a scientific discovery. As The Royal Society neatly states though, "Nullius in verba" ("Take nobody's word for it"), whereby the role of the underpinning data is paramount. Therefore, the objectivity that preserving that data within the article provides is due to readers being able to check the calculation decisions of the authors. But how to achieve full data archiving? This is the raw data archiving challenge, in size and need for correct metadata. Processed diffraction data and final derived molecular coordinates archiving in crystallography have achieved an exemplary state of the art relative to most fields. One can credit IUCr with developing exemplary peer review procedures, of narrative, underpinning structure factors and coordinate data and validation report, through its checkcif development and submission system introduced for Acta Cryst. C and subsequently developed for its other chemistry journals. The crystallographic databases likewise have achieved amazing success and sustainability these last 50 years or so. The wider science data scene is celebrating the FAIR data accord, namely, that data be Findable, Accessible, Interoperable, and Reusable [Wilkinson et al., "Comment: The FAIR guiding principles for scientific data management and stewardship," Sci. Data 3, 160018 (2016)]. Some social scientists also emphasize more than FAIR being needed, the data should be "FACT," which is an acronym meaning Fair, Accurate, Confidential, and Transparent [van der Aalst et al., "Responsible data science," Bus Inf. Syst. Eng. 59(5), 311-313 (2017)], this being the issue of ensuring reproducibility not just reusability. (Confidentiality of data not likely being relevant to our data obviously.) Acta Cryst. B, C, E, and IUCrData are the closest I know to being both FACT and FAIR where I repeat for due emphasis: the narrative, the automatic "general" validation checks, and the underpinning data are checked thoroughly by subject specialists (i.e., the specialist referees). IUCr Journals are also the best that I know of for encouraging and then expediting the citation of the DOI for a raw diffraction dataset in a publication; examples can be found in IUCrJ, Acta Cryst D, and Acta Cryst F. The wish for a checkcif for raw diffraction data has been championed by the IUCr Diffraction Data Deposition Working Group and its successor, the IUCr Committee on Data.
Affiliation(s)
- John R Helliwell
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, United Kingdom
11
Helliwell JR, Minor W, Weiss MS, Garman EF, Read RJ, Newman J, van Raaij MJ, Hajdu J, Baker EN. Findable Accessible Interoperable Re-usable (FAIR) diffraction data are coming to protein crystallography. J Appl Crystallogr 2019; 52:495-497. [PMID: 31236090 PMCID: PMC6557178 DOI: 10.1107/s1600576719005922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/03/2023]
Abstract
The policy of IUCr Journals on diffraction data is defined.
Affiliation(s)
- John R Helliwell
  - School of Chemistry, The University of Manchester, Brunswick Street, Manchester M13 9PL, UK
- Wladek Minor
  - Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue Pinn Hall, Charlottesville, VA 22908-0736, USA
- Manfred S Weiss
  - Macromolecular Crystallography (HZB-MX), Helmholtz-Zentrum Berlin, Albert-Einstein-Str. 15, D-12489 Berlin, Germany
- Elspeth F Garman
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK
- Randy J Read
  - Cambridge Institute for Medical Research, Department of Haematology, University of Cambridge, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, UK
- Janet Newman
  - Collaborative Crystallisation Centre (C3), CSIRO, 343 Royal Parade, Parkville, VIC 3052, Australia
- Mark J van Raaij
  - CSIC, Centro Nacional de Biotecnologia, c/Darwin 3, Madrid, 28049, Spain
- Janos Hajdu
  - Laboratory of Molecular Biophysics, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, Uppsala, 75124, Sweden
  - The European Extreme Light Infrastructure, Institute of Physics, AS CR, Na Slovance 2, Prague 18221 8, Czech Republic
- Edward N Baker
  - School of Biological Sciences, University of Auckland, School of Biological Sciences, Private Bag 92-019, Auckland, New Zealand
12
Henn J. Metrics for crystallographic diffraction- and fit-data: a review of existing ones and the need for new ones. CRYSTALLOGR REV 2019. [DOI: 10.1080/0889311x.2019.1607845] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
Affiliation(s)
- J. Henn
  - Fakultät I, Universität Bayreuth, Bayreuth, Germany
13
Helliwell JR, Minor W, Weiss MS, Garman EF, Read RJ, Newman J, van Raaij MJ, Hajdu J, Baker EN. Findable Accessible Interoperable Re-usable (FAIR) diffraction data are coming to protein crystallography. IUCRJ 2019; 6:341-343. [PMID: 31098014 PMCID: PMC6503929 DOI: 10.1107/s2052252519005918] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 05/22/2023]
Abstract
The policy of IUCr Journals on diffraction data is defined.
Affiliation(s)
- John R Helliwell
  - School of Chemistry, The University of Manchester, Brunswick Street, Manchester M13 9PL, United Kingdom
- Wladek Minor
  - Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue Pinn Hall, Charlottesville, VA 22908-0736, USA
- Manfred S Weiss
  - Macromolecular Crystallography (HZB-MX), Helmholtz-Zentrum Berlin, Albert-Einstein-Str. 15, D-12489 Berlin, Germany
- Elspeth F Garman
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, United Kingdom
- Randy J Read
  - Cambridge Institute for Medical Research, Department of Haematology, University of Cambridge, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
- Janet Newman
  - Collaborative Crystallisation Centre (C3), CSIRO, 343 Royal Parade, Parkville, VIC 3052, Australia
- Mark J van Raaij
  - CSIC, Centro Nacional de Biotecnologia, c/Darwin 3, Madrid, 28049, Spain
- Janos Hajdu
  - Laboratory of Molecular Biophysics, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, Uppsala, 75124, Sweden
  - The European Extreme Light Infrastructure, Institute of Physics, AS CR, Na Slovance 2, Prague 18221 8, Czech Republic
- Edward N Baker
  - School of Biological Sciences, University of Auckland, School of Biological Sciences, Private Bag 92-019, Auckland, New Zealand
14
Helliwell JR, Minor W, Weiss MS, Garman EF, Read RJ, Newman J, van Raaij MJ, Hajdu J, Baker EN. Findable Accessible Interoperable Re-usable (FAIR) diffraction data are coming to protein crystallography. Acta Crystallogr F Struct Biol Commun 2019; 75:321-323. [PMID: 31045560 PMCID: PMC6497101 DOI: 10.1107/s2053230x19005909] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
Abstract
The policy of IUCr Journals on diffraction data is defined.
Collapse
Affiliation(s)
- John R Helliwell
- School of Chemistry, The University of Manchester, Brunswick Street, Manchester M13 9PL, United Kingdom
| | - Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue Pinn Hall, Charlottesville, VA 22908-0736, USA
| | - Manfred S Weiss
- Macromolecular Crystallography (HZB-MX), Helmholtz-Zentrum Berlin, Albert-Einstein-Str. 15, D-12489 Berlin, Germany
| | - Elspeth F Garman
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, United Kingdom
| | - Randy J Read
- Cambridge Institute for Medical Research, Department of Haematology, University of Cambridge, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
| | - Janet Newman
- Collaborative Crystallisation Centre (C3), CSIRO, 343 Royal Parade, Parkville, VIC 3052, Australia
| | - Mark J van Raaij
- CSIC, Centro Nacional de Biotecnologia, c/Darwin 3, Madrid, 28049, Spain
| | - Janos Hajdu
- Laboratory of Molecular Biophysics, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, Uppsala, 75124, Sweden
| | - Edward N Baker
- School of Biological Sciences, University of Auckland, Private Bag 92-019, Auckland, New Zealand
| |
|
15
|
Helliwell JR, Minor W, Weiss MS, Garman EF, Read RJ, Newman J, van Raaij MJ, Hajdu J, Baker EN. Findable Accessible Interoperable Re-usable (FAIR) diffraction data are coming to protein crystallography. Acta Crystallogr D Struct Biol 2019; 75:455-457. [PMID: 31063147 PMCID: PMC6503765 DOI: 10.1107/s2059798319004844] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/29/2022]
Abstract
The policy of IUCr Journals on diffraction data is defined.
Affiliation(s)
- John R Helliwell
- School of Chemistry, The University of Manchester, Brunswick Street, Manchester M13 9PL, United Kingdom
| | - Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue Pinn Hall, Charlottesville, VA 22908-0736, USA
| | - Manfred S Weiss
- Macromolecular Crystallography (HZB-MX), Helmholtz-Zentrum Berlin, Albert-Einstein-Str. 15, D-12489 Berlin, Germany
| | - Elspeth F Garman
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, United Kingdom
| | - Randy J Read
- Cambridge Institute for Medical Research, Department of Haematology, University of Cambridge, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
| | - Janet Newman
- Collaborative Crystallisation Centre (C3), CSIRO, 343 Royal Parade, Parkville, VIC 3052, Australia
| | - Mark J van Raaij
- CSIC, Centro Nacional de Biotecnologia, c/Darwin 3, Madrid, 28049, Spain
| | - Janos Hajdu
- Laboratory of Molecular Biophysics, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, Uppsala, 75124, Sweden
| | - Edward N Baker
- School of Biological Sciences, University of Auckland, Private Bag 92-019, Auckland, New Zealand
| |
|
17
|
Abstract
Scientific data are as important as scientific publications. If this statement holds true, why are we not routinely sharing scientific data? The tools are now out there, for instance Zenodo and related repositories. It could be a lack of motivation of researchers derived from an apparent lack of short-term reward. Here the author will try to show the importance of sharing ready-to-analyse raw powder diffraction data with immediate benefits for authors and for the wider community. Moreover, it is speculated that sharing curated scientific data may have more important medium-term benefits, including credibility and not least reproducibility. Raw data sharing is coming.
|
18
|
Wang C, Steiner U, Sepe A. Synchrotron Big Data Science. Small 2018; 14:e1802291. [PMID: 30222245 DOI: 10.1002/smll.201802291] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/18/2018] [Revised: 07/27/2018] [Indexed: 06/08/2023]
Abstract
The rapid development of synchrotrons has massively increased the speed at which experiments can be performed, while new techniques have increased the amount of raw data collected during each experiment. While this has created enormous new opportunities, it has also created tremendous challenges for national facilities and users. With the huge increase in data volume, the manual analysis of data is no longer possible. As a result, only a fraction of the data collected during the time- and money-expensive synchrotron beam-time is analyzed and used to deliver new science. Additionally, the lack of an appropriate data analysis environment limits the realization of experiments that generate a large amount of data in a very short period of time. The current lack of automated data analysis pipelines prevents the fine-tuning of beam-time experiments, further reducing their potential usage. These effects, collectively known as the "data deluge," affect synchrotrons in several different ways including fast data collection, available local storage, data management systems, and curation of the data. This review highlights the Big Data strategies adopted nowadays at synchrotrons, documenting this novel and promising hybridization between science and technology, which promise a dramatic increase in the number of scientific discoveries.
Affiliation(s)
- Chunpeng Wang
- Big Data Science Center, Shanghai Synchrotron Radiation Facility, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, 201204, Shanghai, China
| | - Ullrich Steiner
- Adolphe Merkle Institute, University of Fribourg, CH-1700, Fribourg, Switzerland
| | - Alessandro Sepe
- Big Data Science Center, Shanghai Synchrotron Radiation Facility, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, 201204, Shanghai, China
- Adolphe Merkle Institute, University of Fribourg, CH-1700, Fribourg, Switzerland
| |
|
20
|
Russo Krauss I, Ferraro G, Pica A, Márquez JA, Helliwell JR, Merlino A. Principles and methods used to grow and optimize crystals of protein-metallodrug adducts, to determine metal binding sites and to assign metal ligands. Metallomics 2017; 9:1534-1547. [PMID: 28967006 DOI: 10.1039/c7mt00219j] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/22/2022]
Abstract
The characterization of the interactions between biological macromolecules (proteins and nucleic acids) and metal-based drugs is a fundamental prerequisite for understanding their mechanisms of action. X-ray crystallography enables the structural analysis of such complexes with atomic level detail. However, this approach requires the preparation of highly diffracting single crystals, the measurement of diffraction patterns and the structural analysis and interpretation of macromolecule-metal interactions from electron density maps. In this review, we describe principles and methods used to grow and optimize crystals of protein-metallodrug adducts, to determine metal binding sites and to assign and validate metal ligands. Examples from the literature and experience in our own laboratory are provided and key challenges are described, notably crystallization and molecular model refinement against the X-ray diffraction data.
Affiliation(s)
- Irene Russo Krauss
- Department of Chemical Sciences, University of Naples Federico II, Complesso Universitario di Monte Sant'Angelo, Via Cintia, I-80126, Napoli, Italy.
| |
|
21
|
Rupp B. Against Method: Table 1-Cui Bono? Structure 2018; 26:919-923. [PMID: 29861344 DOI: 10.1016/j.str.2018.04.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/18/2017] [Revised: 03/20/2018] [Accepted: 04/18/2018] [Indexed: 11/16/2022]
Abstract
The almost universally required "Table 1," summarizing data-collection and data-processing statistics, has in its present form outlived its usefulness in almost all publications of biomolecular crystal structure reports. Information contained in "Table 1" is insufficient to evaluate or repeat the experiment; is redundant with information extractable from deposited diffraction data; and includes data items whose meaning is under increased scrutiny in the crystallographic community. Direct and consistent extraction and analysis of data quality metrics from preferably unmerged intensity data with graphical presentation of reciprocal space features, including impact on map and model features, should replace "Table 1."
Affiliation(s)
- Bernhard Rupp
- k.-k. Hofkristallamt, San Diego, CA 92084, USA; Division of Genetic Epidemiology, Medical University Innsbruck, Schöpfstraße 41, Innsbruck, Tyrol 6020, Austria
| |
|
22
|
Wall ME, Wolff AM, Fraser JS. Bringing diffuse X-ray scattering into focus. Curr Opin Struct Biol 2018; 50:109-116. [PMID: 29455056 PMCID: PMC6078797 DOI: 10.1016/j.sbi.2018.01.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/14/2017] [Revised: 01/12/2018] [Accepted: 01/21/2018] [Indexed: 01/01/2023]
Abstract
X-ray crystallography is experiencing a renaissance as a method for probing the protein conformational ensemble. The inherent limitations of Bragg analysis, however, which only reveals the mean structure, have given way to a surge in interest in diffuse scattering, which is caused by structure variations. Diffuse scattering is present in all macromolecular crystallography experiments. Recent studies are shedding light on the origins of diffuse scattering in protein crystallography, and provide clues for leveraging diffuse scattering to model protein motions with atomic detail.
Affiliation(s)
- Michael E Wall
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Alexander M Wolff
- Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
| | - James S Fraser
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA.
| |
|
23
|
van der Horn JA, Lutz M. Triethanolaminate iron perchlorate revisited: change of space group, chemical composition and oxidation states in [Fe7(tea)3(tea-H)3](ClO4)2 (tea-H3 is triethanolamine). Acta Crystallogr C Struct Chem 2018; 74:125-130. [PMID: 29400325 DOI: 10.1107/s2053229617018460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2017] [Accepted: 12/27/2017] [Indexed: 11/11/2022]
Abstract
The X-ray crystal structure of tris[N-(2-hydroxyethyl)-2,2'-iminodiethanolato]tris(2,2',2''-nitrilotriethanolato)tetrairon(II)triiron(III) bis(perchlorate), [Fe7(C6H12NO3)3(C6H13NO3)3](ClO4)2 or [Fe7(tea)3(tea-H)3](ClO4)2 (tea-H3 is triethanolamine), is known from the literature [Liu et al. (2008). Z. Anorg. Allg. Chem. 634, 778-783] as a heptanuclear coordination cluster. The space group was given as I2₁3 and is reinvestigated in the present study. We find a new space-group symmetry of Pa-3 and could detect O-H hydrogens, which were missing in the original publication. Consequences on the Fe oxidation states are investigated with the bond-valence method, resulting in a mixed-valence core of four Fe(II) and three Fe(III) centres. Symmetry relationships between the two space groups and the average supergroup Ia-3 are discussed in detail.
Affiliation(s)
- Jitschaq A van der Horn
- Bijvoet Center for Biomolecular Research, Crystal and Structural Chemistry, Faculty of Science, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Martin Lutz
- Bijvoet Center for Biomolecular Research, Crystal and Structural Chemistry, Faculty of Science, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| |
|
25
|
Helliwell JR, McMahon B, Guss JM, Kroon-Batenburg LMJ. The science is in the data. IUCrJ 2017; 4:714-722. [PMID: 29123672 PMCID: PMC5668855 DOI: 10.1107/s2052252517013690] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 07/18/2017] [Accepted: 09/24/2017] [Indexed: 05/22/2023]
Abstract
Understanding published research results should be through one's own eyes and include the opportunity to work with raw diffraction data to check the various decisions made in the analyses by the original authors. Today, preserving raw diffraction data is technically and organizationally viable at a growing number of data archives, both centralized and distributed, which are empowered to register data sets and obtain a preservation descriptor, typically a 'digital object identifier'. This introduces an important role of preserving raw data, namely understanding where we fail in or could improve our analyses. Individual science area case studies in crystallography are provided.
Affiliation(s)
- John R. Helliwell
- School of Chemistry, University of Manchester, Manchester M13 9PL, England
| | - Brian McMahon
- International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
| | - J. Mitchell Guss
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
| | - Loes M. J. Kroon-Batenburg
- Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| |
|
27
|
Liu J, Lhermitte J, Tian Y, Zhang Z, Yu D, Yager KG. Healing X-ray scattering images. IUCrJ 2017; 4:455-465. [PMID: 28875032 PMCID: PMC5571808 DOI: 10.1107/s2052252517006212] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/10/2017] [Accepted: 04/24/2017] [Indexed: 05/03/2023]
Abstract
X-ray scattering images contain numerous gaps and defects arising from detector limitations and experimental configuration. We present a method to heal X-ray scattering images, filling gaps in the data and removing defects in a physically meaningful manner. Unlike generic inpainting methods, this method is closely tuned to the expected structure of reciprocal-space data. In particular, we exploit statistical tests and symmetry analysis to identify the structure of an image; we then copy, average and interpolate measured data into gaps in a way that respects the identified structure and symmetry. Importantly, the underlying analysis methods provide useful characterization of structures present in the image, including the identification of diffuse versus sharp features, anisotropy and symmetry. The presented method leverages known characteristics of reciprocal space, enabling physically reasonable reconstruction even with large image gaps. The method will correspondingly fail for images that violate these underlying assumptions. The method assumes point symmetry and is thus applicable to small-angle X-ray scattering (SAXS) data, but only to a subset of wide-angle data. Our method succeeds in filling gaps and healing defects in experimental images, including extending data beyond the original detector borders.
Affiliation(s)
- Jiliang Liu
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
| | - Julien Lhermitte
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
| | - Ye Tian
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
| | - Zheng Zhang
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
| | - Dantong Yu
- Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, USA
- New Jersey Institute of Technology, Newark, New Jersey 07102, USA
| | - Kevin G. Yager
- Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
| |
|
28
|
Helliwell JR. Concerning the measurement of charge density X-ray diffraction data at synchrotron sources: challenges and opportunities. Crystallogr Rev 2017. [DOI: 10.1080/0889311x.2017.1295038] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 10/20/2022]
|
29
|
Abstract
The importance of preserving and making available the original experimental data underlying biological structural models is discussed, both for crystallography, where the raw data images pose particular challenges, and for other structure determination techniques.
Affiliation(s)
- Edward N. Baker
- School of Biological Sciences, University of Auckland, Private Bag 92-019, Auckland, New Zealand
| |
|
30
|
Abstract
Macromolecular Big Data provide numerous challenges and a number of initiatives that are starting to overcome these issues are discussed.
Affiliation(s)
- Marek Grabowski
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
| | - Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
| |
|