1. Mous S, Poitevin F, Hunter MS, Asthagiri DN, Beck TL. Structural biology in the age of X-ray free-electron lasers and exascale computing. Curr Opin Struct Biol 2024; 86:102808. [PMID: 38547555 DOI: 10.1016/j.sbi.2024.102808] [Received: 11/09/2023] [Revised: 02/07/2024] [Accepted: 03/07/2024] [Indexed: 05/19/2024]
Abstract
Serial femtosecond X-ray crystallography has emerged as a powerful method for investigating biomolecular structure and dynamics. With the new generation of X-ray free-electron lasers, which generate ultrabright X-ray pulses at megahertz repetition rates, we can now rapidly probe ultrafast conformational changes and charge movement in biomolecules. Over the last year, another innovation has been the deployment of Frontier, the world's first exascale supercomputer. Synergizing extremely high repetition rate X-ray light sources and exascale computing has the potential to accelerate discovery in biomolecular sciences. Here we outline our perspective on each of these remarkable innovations individually, and the opportunities and challenges in yoking them within an integrated research infrastructure.
Affiliation(s)
- Sandra Mous
- Linac Coherent Light Source, SLAC National Accelerator Laboratory, Menlo Park, 94025, CA, USA
- Frédéric Poitevin
- Linac Coherent Light Source, SLAC National Accelerator Laboratory, Menlo Park, 94025, CA, USA
- Mark S Hunter
- Linac Coherent Light Source, SLAC National Accelerator Laboratory, Menlo Park, 94025, CA, USA
- Dilipkumar N Asthagiri
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, 37830-6012, TN, USA
- Thomas L Beck
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, 37830-6012, TN, USA
2. Rahmani V, Nawaz S, Pennicard D, Graafsma H. Robust image descriptor for machine learning based data reduction in serial crystallography. J Appl Crystallogr 2024; 57:413-430. [PMID: 38596725 PMCID: PMC11001400 DOI: 10.1107/s160057672400147x] [Received: 09/29/2023] [Accepted: 02/13/2024] [Indexed: 04/11/2024] [Open access]
Abstract
Serial crystallography experiments at synchrotron and X-ray free-electron laser (XFEL) sources are producing crystallographic data sets of ever-increasing volume. While these experiments have large data sets and high-frame-rate detectors (around 3520 frames per second), only a small percentage of the data are useful for downstream analysis. Thus, an efficient and real-time data classification pipeline is essential to differentiate reliably between useful and non-useful images, typically known as 'hit' and 'miss', respectively, and keep only hit images on disk for further analysis such as peak finding and indexing. While feature-point extraction is a key component of modern approaches to image classification, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. This paper proposes a pipeline to categorize the data, consisting of a real-time feature extraction algorithm called modified and parallelized FAST (MP-FAST), an image descriptor and a machine learning classifier. To parallelize the primary operations of the proposed pipeline, implementations on central processing units, graphics processing units and field-programmable gate arrays are developed and their performance is compared. Finally, MP-FAST-based image classification is evaluated using a multi-layer perceptron on various data sets, including both synthetic and experimental data. This approach demonstrates superior performance compared with other feature extractors and classifiers.
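MP-FAST builds on the FAST feature detector. The sketch below implements only the standard, serial FAST-n segment test in NumPy, as an illustration of what the underlying feature extraction step checks; the paper's modifications, its parallel CPU/GPU/FPGA implementations and its image descriptor are not reproduced, and the parameter values (`threshold`, `n=9`) are illustrative assumptions.

```python
import numpy as np

# Offsets (dy, dx) of the 16-pixel Bresenham circle of radius 3 used by FAST,
# listed in order around the circle so that "contiguous" is well defined.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]


def has_contiguous_run(flags, n):
    """Return True if the circular sequence `flags` has >= n consecutive Trues."""
    doubled = list(flags) + list(flags)  # unroll the circle to handle wrap-around
    run = 0
    for flag in doubled:
        run = run + 1 if flag else 0
        if run >= n:
            return True
    return False


def fast_keypoints(image, threshold, n=9):
    """Standard FAST-n segment test (not MP-FAST): return (row, col) keypoints.

    A pixel p is a keypoint if at least n contiguous circle pixels are all
    brighter than I(p) + threshold or all darker than I(p) - threshold.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    keypoints = []
    for y in range(3, rows - 3):  # skip a 3-pixel border so the circle fits
        for x in range(3, cols - 3):
            centre = img[y, x]
            ring = [img[y + dy, x + dx] for dy, dx in CIRCLE]
            brighter = [v > centre + threshold for v in ring]
            darker = [v < centre - threshold for v in ring]
            if has_contiguous_run(brighter, n) or has_contiguous_run(darker, n):
                keypoints.append((y, x))
    return keypoints
```

On a frame with a single isolated bright pixel, only that pixel passes the test, because its whole ring is darker than it; a flat background yields no keypoints. How such keypoints are summarized into a descriptor and passed to the multi-layer perceptron is specific to the paper and is not shown here.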
Affiliation(s)
- Vahid Rahmani
- Deutsches Elektronen-Synchrotron (DESY), Notkestraße 85, Hamburg, 22607, Germany
- Shah Nawaz
- Deutsches Elektronen-Synchrotron (DESY), Notkestraße 85, Hamburg, 22607, Germany
- David Pennicard
- Deutsches Elektronen-Synchrotron (DESY), Notkestraße 85, Hamburg, 22607, Germany
- Heinz Graafsma
- Deutsches Elektronen-Synchrotron (DESY), Notkestraße 85, Hamburg, 22607, Germany
- Mid-Sweden University, Sundsvall, Sweden
3. Round A, Jungcheng E, Fortmann-Grote C, Giewekemeyer K, Graceffa R, Kim C, Kirkwood H, Mills G, Round E, Sato T, Pascarelli S, Mancuso A. Characterization of Biological Samples Using Ultra-Short and Ultra-Bright XFEL Pulses. Adv Exp Med Biol 2024; 3234:141-162. [PMID: 38507205 DOI: 10.1007/978-3-031-52193-5_10] [Indexed: 03/22/2024]
Abstract
The advent of X-ray Free Electron Lasers (XFELs) has ushered in a transformative era in the fields of structural biology, materials science, and ultrafast physics. These state-of-the-art facilities generate ultra-bright, femtosecond-long X-ray pulses, allowing researchers to delve into the structure and dynamics of molecular systems with unprecedented temporal and spatial resolution. The unique properties of XFEL pulses have opened new avenues for scientific exploration that were previously considered unattainable. One of the most notable applications of XFELs is in structural biology. Traditional X-ray crystallography, while instrumental in determining the structures of countless biomolecules, often requires large, high-quality crystals and may not capture highly transient states of proteins. XFELs, with their ability to produce diffraction patterns from nanocrystals or even single particles, have provided solutions to these challenges. XFELs have expanded the toolbox of structural biologists by enabling structure determination approaches such as Single Particle Imaging (SPI) and Serial Femtosecond Crystallography (SFX). Despite their remarkable capabilities, XFELs are still at a nascent stage, with ongoing advancements aimed at improving their coherence, pulse duration, and wavelength tunability.
Affiliation(s)
- Chan Kim
- European XFEL, Schenefeld, Germany
4. Nawaz S, Rahmani V, Pennicard D, Setty SPR, Klaudel B, Graafsma H. Explainable machine learning for diffraction patterns. J Appl Crystallogr 2023; 56:1494-1504. [PMID: 37791364 PMCID: PMC10543671 DOI: 10.1107/s1600576723007446] [Received: 12/05/2022] [Accepted: 08/24/2023] [Indexed: 10/05/2023] [Open access]
Abstract
Serial crystallography experiments at X-ray free-electron laser facilities produce massive amounts of data but only a fraction of these data are useful for downstream analysis. Thus, it is essential to differentiate between acceptable and unacceptable data, generally known as 'hit' and 'miss', respectively. Image classification methods from artificial intelligence, or more specifically convolutional neural networks (CNNs), classify the data into hit and miss categories in order to achieve data reduction. The quantitative performance established in previous work indicates that CNNs successfully classify serial crystallography data into desired categories [Ke, Brewster, Yu, Ushizima, Yang & Sauter (2018). J. Synchrotron Rad. 25, 655-670], but no qualitative evidence on the internal workings of these networks has been provided. For example, there are no visualization methods that highlight the features contributing to a specific prediction when classifying data in serial crystallography experiments. Therefore, existing deep learning methods, including CNNs classifying serial crystallography data, are like a 'black box'. To this end, presented here is a qualitative study to unpack the internal workings of CNNs with the aim of visualizing information in the fundamental blocks of a standard network with serial crystallography data. The regions of an image that contribute most to a hit or miss prediction are visualized.
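The abstract does not name the visualization methods used, so the sketch below substitutes occlusion sensitivity, a simple model-agnostic way to see which image regions drive a prediction: mask one patch at a time and record how much the hit score drops. The scoring function `toy_hit_score` is a hypothetical stand-in for a trained CNN, and the patch size and fill value are assumptions.

```python
import numpy as np


def occlusion_map(image, score_fn, patch=4, fill=0.0):
    """Occlusion sensitivity: how much the score drops when each patch is masked.

    `score_fn` maps a 2-D image to a scalar "hit" score (for example a trained
    classifier's hit probability); larger map values mark more influential regions.
    """
    img = np.asarray(image, dtype=float)
    baseline = score_fn(img)
    heat = np.zeros_like(img)
    rows, cols = img.shape
    for y in range(0, rows, patch):
        for x in range(0, cols, patch):
            occluded = img.copy()
            occluded[y:y + patch, x:x + patch] = fill  # hide one patch
            heat[y:y + patch, x:x + patch] = baseline - score_fn(occluded)
    return heat


def toy_hit_score(img):
    """Hypothetical stand-in for a CNN: counts pixels above a fixed level."""
    return float((img > 0.5).sum())
```

For a synthetic frame containing one small bright spot, only the patch covering the spot receives a non-zero value, which is the kind of "where did the network look" evidence the study aims to provide for real CNNs.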
Affiliation(s)
- Shah Nawaz
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Vahid Rahmani
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- David Pennicard
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Heinz Graafsma
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Mid-Sweden University, Sundsvall, Sweden
5. Hadian-Jazi M, Sadri A. A Python package based on robust statistical analysis for serial crystallography data processing. Acta Crystallogr D Struct Biol 2023; 79:820-829. [PMID: 37584428 PMCID: PMC10478633 DOI: 10.1107/s2059798323005855] [Received: 01/26/2023] [Accepted: 07/03/2023] [Indexed: 08/17/2023] [Open access]
Abstract
The term robustness in statistics refers to methods that are generally insensitive to deviations from model assumptions. In other words, robust methods are able to preserve their accuracy even when the data do not perfectly fit the statistical models. Robust statistical analyses are particularly effective when analysing mixtures of probability distributions. Therefore, these methods enable X-ray serial crystallography data to be separated into two probability distributions: a group comprising true data points (for example the background intensities) and another group comprising outliers (for example Bragg peaks or bad pixels on an X-ray detector). These characteristics of robust statistical analysis are beneficial for the ever-increasing volume of serial crystallography (SX) data sets produced at synchrotron and X-ray free-electron laser (XFEL) sources. For some SX data-analysis applications, the key advantage of robust statistics is that they require minimal parameter tuning because they are insensitive to the input parameters. In this paper, a software package called Robust Gaussian Fitting library (RGFlib) is introduced that is based on the concept of robust statistics. Two methods based on robust statistics and RGFlib are presented for two SX data-analysis tasks: (i) a robust peak-finding algorithm and (ii) an automated robust method to detect bad pixels on X-ray pixel detectors.
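The background/outlier split described above can be illustrated with the two most common robust estimators, the median and the median absolute deviation (MAD). The sketch below is not RGFlib's algorithm or API; it only shows the general idea that a few bright pixels barely move these estimates, so pixels far above them can be flagged as Bragg-peak candidates. The cut-off `k=5.0` and the function name are assumptions.

```python
import numpy as np

MAD_TO_SIGMA = 1.4826  # scales the MAD to a standard deviation for Gaussian data


def split_background_and_peaks(frame, k=5.0):
    """Robustly estimate the background and flag bright outliers.

    Returns (background_level, noise_sigma, outlier_mask). The median and the
    MAD are barely affected by the few bright pixels, so they describe the
    background population.
    """
    x = np.asarray(frame, dtype=float)
    level = np.median(x)
    sigma = MAD_TO_SIGMA * np.median(np.abs(x - level))
    # One-sided test: Bragg-peak candidates lie far above the background.
    outliers = x > level + k * sigma
    return level, sigma, outliers
```

A mean and standard deviation computed on the same frame would be pulled upwards by the peaks themselves; the median and MAD are not, which is why this kind of estimate needs little tuning.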
Affiliation(s)
- Marjan Hadian-Jazi
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Melbourne, Victoria 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Melbourne, Victoria 3052, Australia
- Alireza Sadri
- School of Physics and Astronomy, Monash University, Clayton, Victoria 3800, Australia
6. Rahmani V, Nawaz S, Pennicard D, Setty SPR, Graafsma H. Data reduction for X-ray serial crystallography using machine learning. J Appl Crystallogr 2023; 56:200-213. [PMID: 36777143 PMCID: PMC9901916 DOI: 10.1107/s1600576722011748] [Received: 05/12/2022] [Accepted: 12/07/2022] [Indexed: 01/25/2023] [Open access]
Abstract
Serial crystallography experiments produce massive amounts of experimental data. Yet in spite of these large-scale data sets, only a small percentage of the data are useful for downstream analysis. Thus, it is essential to differentiate reliably between acceptable data (hits) and unacceptable data (misses). To this end, a novel pipeline is proposed to categorize the data, which extracts features from the images, summarizes these features with the 'bag of visual words' method and then classifies the images using machine learning. In addition, a novel study of various feature extractors and machine learning classifiers is presented, with the aim of finding the best feature extractor and machine learning classifier for serial crystallography data. The study reveals that the oriented FAST and rotated BRIEF (ORB) feature extractor with a multilayer perceptron classifier gives the best results. Finally, the ORB feature extractor with multilayer perceptron is evaluated on various data sets including both synthetic and experimental data, demonstrating superior performance compared with other feature extractors and classifiers.
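The 'bag of visual words' step summarizes a variable number of local feature descriptors per image as a fixed-length histogram that a classifier can consume. The sketch below is a minimal version under stated assumptions: it clusters real-valued descriptors with plain Euclidean k-means, whereas ORB produces binary descriptors that are normally compared with the Hamming distance, and the codebook size and iteration count are illustrative. The resulting histograms would then be passed to a classifier such as the multilayer perceptron the study found to work best.

```python
import numpy as np


def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centres (visual words)."""
    rng = np.random.default_rng(seed)
    centres = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each descriptor to its nearest centre (squared Euclidean distance).
        d2 = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old centre if a cluster becomes empty
                centres[j] = members.mean(axis=0)
    return centres


def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word counts for one image's descriptors."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum() if hist.sum() > 0 else hist
```

In practice the codebook would be learned once from descriptors pooled over many training images, and every image, hit or miss, is then encoded against that same codebook.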
Affiliation(s)
- Vahid Rahmani
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Shah Nawaz
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- David Pennicard
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Heinz Graafsma
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Mid-Sweden University, Sundsvall, Sweden
7. Sadri A, Hadian-Jazi M, Yefanov O, Galchenkova M, Kirkwood H, Mills G, Sikorski M, Letrun R, de Wijn R, Vakili M, Oberthuer D, Komadina D, Brehm W, Mancuso AP, Carnis J, Gelisio L, Chapman HN. Automatic bad-pixel mask maker for X-ray pixel detectors with application to serial crystallography. J Appl Crystallogr 2022; 55:1549-1561. [PMID: 36570663 PMCID: PMC9721322 DOI: 10.1107/s1600576722009815] [Received: 03/19/2022] [Accepted: 10/06/2022] [Indexed: 11/22/2022] [Open access]
Abstract
X-ray crystallography has developed enormously over the past decade, driven by large increases in the intensity and brightness of X-ray sources and enabled by employing high-frame-rate X-ray detectors. The analysis of large data sets is done via automatic algorithms that are vulnerable to imperfections in the detector and noise inherent in the detection process. By improving the model of the behaviour of the detector, data can be analysed more reliably and data storage costs can be significantly reduced. One major requirement is a software mask that identifies defective pixels in diffraction frames. This paper introduces a methodology and program called robust mask maker (RMM), based on concepts of machine learning and modern robust statistics, for generating bad-pixel masks for large-area X-ray pixel detectors. It is proposed to discriminate normally behaving pixels from abnormal pixels by analysing routine measurements made with and without X-ray illumination. Analysis software typically uses a Bragg peak finder to detect Bragg peaks and an indexing method to detect crystal lattices among those peaks. Without proper masking of the bad pixels, peak finding methods often confuse the abnormal values of bad pixels in a pattern with true Bragg peaks and flag such patterns as useful regardless, leading to the storage of enormous uninformative data sets. Also, it is computationally very expensive for indexing methods to search for crystal lattices among false peaks and the solution may be biased. This paper shows how RMM vastly improves peak finders and prevents them from labelling bad pixels as Bragg peaks, by demonstrating its effectiveness on several serial crystallography data sets.
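The abstract describes flagging abnormal pixels from routine measurements taken with and without X-ray illumination, but does not give RMM's statistics. The sketch below illustrates only the dark-frame half of that idea, using a swapped-in approach: compute each pixel's offset (mean) and noise (standard deviation) over a stack of dark frames, then flag pixels whose values are robust outliers relative to the rest of the detector. The cut-off `k=6.0` and all function names are assumptions.

```python
import numpy as np

MAD_TO_SIGMA = 1.4826  # scales the MAD to a standard deviation for Gaussian data


def robust_z(values):
    """Robust z-scores of an array using the median and the MAD."""
    med = np.median(values)
    sigma = MAD_TO_SIGMA * np.median(np.abs(values - med))
    if sigma == 0:
        sigma = np.finfo(float).eps  # avoid division by zero on perfectly flat data
    return (values - med) / sigma


def bad_pixel_mask(dark_stack, k=6.0):
    """Return True for pixels whose dark-frame offset or noise is anomalous.

    `dark_stack` has shape (n_frames, rows, cols), recorded without X-rays.
    """
    stack = np.asarray(dark_stack, dtype=float)
    offset = stack.mean(axis=0)  # per-pixel dark level
    noise = stack.std(axis=0)    # per-pixel noise
    return (np.abs(robust_z(offset)) > k) | (np.abs(robust_z(noise)) > k)
```

Hot pixels show up through an extreme offset (and often zero noise), while unstable pixels show up through excess noise; a mask like this would be applied before peak finding so that such pixels cannot be mistaken for Bragg peaks.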
Affiliation(s)
- Alireza Sadri
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Marjan Hadian-Jazi
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- ARC Centre of Excellence in Advanced Molecular Imaging, La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Australia
- Australian Nuclear Science and Technology Organisation (ANSTO), Australia
- Oleksandr Yefanov
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Marina Galchenkova
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Henry Kirkwood
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- Grant Mills
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- Marcin Sikorski
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- Romain Letrun
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- Raphael de Wijn
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- Mohammad Vakili
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- Dominik Oberthuer
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Dana Komadina
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Wolfgang Brehm
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Adrian P. Mancuso
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- Department of Chemistry and Physics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria, Australia
- Jerome Carnis
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Luca Gelisio
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Henry N. Chapman
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Department of Physics, Universität Hamburg, Luruper Chaussee 149, 22761 Hamburg, Germany
- The Hamburg Centre for Ultrafast Imaging, Luruper Chaussee 149, 22761 Hamburg, Germany
8. Kirkwood HJ, de Wijn R, Mills G, Letrun R, Kloos M, Vakili M, Karnevskiy M, Ahmed K, Bean RJ, Bielecki J, Dall'Antonia F, Kim Y, Kim C, Koliyadu J, Round A, Sato T, Sikorski M, Vagovič P, Sztuk-Dambietz J, Mancuso AP. A multi-million image Serial Femtosecond Crystallography dataset collected at the European XFEL. Sci Data 2022; 9:161. [PMID: 35414146 PMCID: PMC9005607 DOI: 10.1038/s41597-022-01266-w] [Received: 12/01/2021] [Accepted: 02/22/2022] [Indexed: 11/09/2022] [Open access]
Abstract
Serial femtosecond crystallography is a rapidly developing method for determining the structure of biomolecules for samples which have proven challenging with conventional X-ray crystallography, such as for membrane proteins and microcrystals, or for time-resolved studies. The European XFEL, the first high repetition rate hard X-ray free electron laser, can record diffraction data more than an order of magnitude faster than was previously achievable, putting increased demand on sample delivery and data processing. This work describes a publicly available serial femtosecond crystallography dataset collected at the SPB/SFX instrument at the European XFEL. This dataset contains information suitable for algorithmic development for detector calibration, image classification and structure determination, as well as testing and training for future users of the European XFEL and other XFELs.
Affiliation(s)
- Grant Mills
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- Romain Letrun
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- Marco Kloos
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- Karim Ahmed
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- Yoonhee Kim
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- Chan Kim
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- Adam Round
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- School of Chemical and Physical Sciences, Keele University, Staffordshire, ST5 5AZ, United Kingdom
- Tokushi Sato
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- Adrian P Mancuso
- European XFEL, Holzkoppel 4, 22869, Schenefeld, Germany
- Department of Chemistry and Physics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, 3086, Australia