1
Sorzano COS, Jiménez-Moreno A, Maluenda D, Martínez M, Ramírez-Aportela E, Krieger J, Melero R, Cuervo A, Conesa J, Filipovic J, Conesa P, del Caño L, Fonseca YC, Jiménez-de la Morena J, Losana P, Sánchez-García R, Strelak D, Fernández-Giménez E, de Isidro-Gómez FP, Herreros D, Vilas JL, Marabini R, Carazo JM. On bias, variance, overfitting, gold standard and consensus in single-particle analysis by cryo-electron microscopy. Acta Crystallogr D Struct Biol 2022; 78:410-423. [PMID: 35362465 PMCID: PMC8972802 DOI: 10.1107/s2059798322001978] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/09/2021] [Accepted: 02/18/2022] [Indexed: 12/05/2022] Open
Abstract
Cryo-electron microscopy (cryoEM) has become a well-established technique to elucidate the 3D structures of biological macromolecules. Projection images from thousands of macromolecules that are assumed to be structurally identical are combined into a single 3D map representing the Coulomb potential of the macromolecule under study. This article discusses possible caveats along the image-processing path and how to avoid them to obtain a reliable 3D structure. Some of these problems are very well known in the community. These may be referred to as sample-related (such as specimen denaturation at interfaces or non-uniform projection geometry leading to underrepresented projection directions). The rest are related to the algorithms used. While some have been discussed in depth in the literature, such as the use of an incorrect initial volume, others have received much less attention. However, they are fundamental in any data-analysis approach. Chief among them are instabilities in estimating many of the key parameters required for a correct 3D reconstruction; these occur all along the processing workflow and may significantly affect the reliability of the whole process. In the field, the term overfitting has been coined to refer to some particular kinds of artifacts. It is argued that overfitting is a statistical bias in key parameter-estimation steps in the 3D reconstruction process, including intrinsic algorithmic bias. It is also shown that common tools (Fourier shell correlation) and strategies (gold standard) that are normally used to detect or prevent overfitting do not fully protect against it. Alternatively, it is proposed that detecting the bias that leads to overfitting is much easier when addressed at the level of parameter estimation, rather than detecting it once the particle images have been combined into a 3D map. Comparing the results from multiple algorithms (or at least, independent executions of the same algorithm) can detect parameter bias. These multiple executions could then be averaged to give a lower variance estimate of the underlying parameters.
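The consensus idea in the closing sentences of this abstract can be illustrated with a minimal sketch. It assumes a scalar per-particle parameter (for example an in-plane shift or a defocus value) estimated by several independent runs. The function name `consensus`, the `tolerance` threshold and the sample numbers are hypothetical and are not taken from the paper. Angular parameters would need circular statistics, which this sketch does not handle.

```python
from statistics import mean

def consensus(estimates_per_run, tolerance):
    """Compare per-particle estimates across independent runs.

    estimates_per_run: list of runs, each a list with one value per particle.
    tolerance: largest spread between runs still regarded as stable
               (hypothetical threshold, to be chosen per parameter).
    """
    n_particles = len(estimates_per_run[0])
    results = []
    for i in range(n_particles):
        values = [run[i] for run in estimates_per_run]
        spread = max(values) - min(values)
        results.append({
            "mean": mean(values),           # averaged estimate across runs
            "spread": spread,               # disagreement between runs
            "stable": spread <= tolerance,  # flag particles whose runs disagree
        })
    return results

# Three hypothetical runs, two particles each (made-up values).
runs = [[1.0, 2.0], [1.2, 5.0], [0.8, 2.0]]
for particle in consensus(runs, tolerance=0.5):
    print(particle)
```

For independent, unbiased estimates of equal variance, the mean of N runs has 1/N of the single-run variance, which is the sense in which averaging can lower the variance. Particles flagged as unstable are candidates for exclusion or closer inspection rather than averaging.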
Affiliation(s)
- C. O. S. Sorzano: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- A. Jiménez-Moreno: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- D. Maluenda: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- M. Martínez: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- E. Ramírez-Aportela: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- J. Krieger: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- R. Melero: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- A. Cuervo: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- J. Conesa: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- P. Conesa: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- L. del Caño: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- Y. C. Fonseca: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- J. Jiménez-de la Morena: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- P. Losana: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- R. Sánchez-García: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- D. Strelak: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain; Masaryk University, Brno, Czech Republic
- E. Fernández-Giménez: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- F. P. de Isidro-Gómez: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- D. Herreros: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
- J. L. Vilas: School of Engineering and Applied Science, Yale University, New Haven, CT 06520-829, USA
- R. Marabini: Escuela Politecnica Superior, Universidad Autónoma de Madrid, 28049 Cantoblanco, Madrid, Spain
- J. M. Carazo: Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain
2
Tegunov D, Cramer P. Real-time cryo-electron microscopy data preprocessing with Warp. Nat Methods 2019; 16:1146-1152. [PMID: 31591575 PMCID: PMC6858868 DOI: 10.1038/s41592-019-0580-y] [Citation(s) in RCA: 685] [Impact Index Per Article: 137.0] [Received: 05/27/2018] [Accepted: 08/21/2019] [Indexed: 12/22/2022]
Abstract
The acquisition of cryo-electron microscopy (cryo-EM) data from biological specimens must be tightly coupled to data preprocessing to ensure the best data quality and microscope usage. Here we describe Warp, a software that automates all preprocessing steps of cryo-EM data acquisition and enables real-time evaluation. Warp corrects micrographs for global and local motion, estimates the local defocus and monitors key parameters for each recorded micrograph or tomographic tilt series in real time. The software further includes deep-learning-based models for accurate particle picking and image denoising. The output from Warp can be fed into established programs for particle classification and 3D-map refinement. Our benchmarks show improvement in the nominal resolution, which went from 3.9 Å to 3.2 Å, of a published cryo-EM data set for influenza virus hemagglutinin. Warp is easy to install from http://github.com/cramerlab/warp and computationally inexpensive, and has an intuitive, streamlined user interface.
Affiliation(s)
- Dimitry Tegunov: Max Planck Institute for Biophysical Chemistry, Department of Molecular Biology, Göttingen, Germany
- Patrick Cramer: Max Planck Institute for Biophysical Chemistry, Department of Molecular Biology, Göttingen, Germany
3
Penczek PA, Fang J, Li X, Cheng Y, Loerke J, Spahn CMT. CTER-rapid estimation of CTF parameters with error assessment. Ultramicroscopy 2014; 140:9-19. [PMID: 24562077 DOI: 10.1016/j.ultramic.2014.01.009] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 07/17/2013] [Revised: 01/22/2014] [Accepted: 01/27/2014] [Indexed: 10/25/2022]
Abstract
In structural electron microscopy, the accurate estimation of the Contrast Transfer Function (CTF) parameters, particularly defocus and astigmatism, is of utmost importance for both the initial evaluation of micrograph quality and subsequent structure determination. Due to increases in the rate of data collection on modern microscopes equipped with new-generation cameras, it is also important that the CTF estimation can be done rapidly and with minimal user intervention. Finally, to minimize the need for manual screening of the micrographs by a user, it is necessary to provide an assessment of the errors of the fitted parameter values. In this work we introduce CTER, a CTF parameter estimation method distinguished by its computational efficiency. The efficiency of the method makes it suitable for high-throughput EM data collection, and enables the use of a statistical resampling technique, bootstrap, that yields standard deviations of estimated defocus and astigmatism amplitude and angle, thus facilitating the automation of the process of screening out inferior micrograph data. Furthermore, CTER also outputs the spatial frequency limit imposed by reciprocal space aliasing of the discrete form of the CTF and the finite window size. We demonstrate the efficiency and accuracy of CTER using a data set collected on a 300 kV Tecnai Polara (FEI) using the K2 Summit DED camera in super-resolution counting mode. Using CTER we obtained a structure of the 80S ribosome whose large subunit had a resolution of 4.03 Å without, and 3.85 Å with, inclusion of astigmatism parameters.
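The bootstrap mentioned in this abstract can be sketched in general terms. The example below is not CTER's implementation: it assumes a set of repeated defocus estimates (for instance one per micrograph tile, a hypothetical setup), resamples them with replacement, and reports the standard deviation of the resampled means. The function name `bootstrap_std`, the seed and the sample values are illustrative assumptions.

```python
import random
from statistics import mean, stdev

def bootstrap_std(samples, n_resamples=1000, seed=0):
    """Bootstrap estimate of the standard deviation of the sample mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(samples)
    # Draw n values with replacement, n_resamples times, and average each draw.
    resampled_means = [mean(rng.choices(samples, k=n)) for _ in range(n_resamples)]
    return stdev(resampled_means)

# Hypothetical per-tile defocus estimates in micrometres (made-up values).
defocus_estimates = [1.90, 2.00, 2.10, 2.05, 1.95]
print(bootstrap_std(defocus_estimates))
```

A large bootstrap standard deviation relative to the fitted value would be one possible criterion for screening out a micrograph automatically; the threshold would have to be chosen per data set.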
Affiliation(s)
- Pawel A Penczek: Department of Biochemistry and Molecular Biology, The University of Texas Medical School, 6431 Fannin MSB 6.220, Houston, TX 77054, USA
- Jia Fang: Department of Biochemistry and Molecular Biology, The University of Texas Medical School, 6431 Fannin MSB 6.220, Houston, TX 77054, USA
- Xueming Li: The Keck Advanced Microscopy Laboratory, Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94158, USA
- Yifan Cheng: The Keck Advanced Microscopy Laboratory, Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94158, USA
- Justus Loerke: Institut für Medizinische Physik und Biophysik, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Christian M T Spahn: Institut für Medizinische Physik und Biophysik, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
4
Jeong HS, Park HN, Kim JG, Hyun JK. Critical importance of the correction of contrast transfer function for transmission electron microscopy-mediated structural biology. J Anal Sci Technol 2013. [DOI: 10.1186/2093-3371-4-14] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
Background
Transmission electron microscopy (TEM) is an excellent tool for studying detailed biological structures. High-resolution structure determination is now routinely performed using advanced sample preparation techniques and image processing software. In particular, correction for the contrast transfer function (CTF) is crucial for extracting high-resolution information from TEM images that are convoluted by imperfect imaging conditions. Accurate determination of defocus, one of the major elements constituting the CTF, is mandatory for CTF correction.
Findings
To investigate the effect of correct estimation of image defocus and subsequent CTF correction, we tested arbitrary CTF imposition onto the images of two-dimensional crystals of Rous sarcoma virus capsid protein. The morphology of the crystal in projection maps calculated with an incorrect CTF imposed was utterly distorted in comparison to an appropriately CTF-corrected image.
Conclusion
This result demonstrates the critical importance of CTF correction for producing a true representation of the specimen at high resolution.
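Why a wrong defocus distorts the corrected image can be seen from a one-dimensional CTF model. The sketch below is not from the paper: it uses a commonly quoted textbook form of the CTF (sign conventions differ between packages), and the voltage, spherical aberration, amplitude contrast and defocus values are illustrative assumptions, as are the function names.

```python
import math

def electron_wavelength(voltage_v):
    """Relativistic electron wavelength in angstroms; voltage in volts."""
    return 12.2643 / math.sqrt(voltage_v + 0.97845e-6 * voltage_v ** 2)

def ctf(s, defocus, voltage_v=300e3, cs=2.7e7, amp_contrast=0.07):
    """CTF value at spatial frequency s (1/angstrom).

    defocus and cs are in angstroms; underfocus is taken as positive
    (an assumed convention).
    """
    lam = electron_wavelength(voltage_v)
    # Phase shift from defocus and spherical aberration.
    chi = math.pi * lam * defocus * s ** 2 - 0.5 * math.pi * cs * lam ** 3 * s ** 4
    phase_contrast = math.sqrt(1.0 - amp_contrast ** 2)
    return -(phase_contrast * math.sin(chi) + amp_contrast * math.cos(chi))

# Same frequency (10 angstrom spacing), assumed true and mis-estimated defocus.
s = 0.1
print(ctf(s, defocus=10000.0))  # assumed true defocus: 1.0 micrometre
print(ctf(s, defocus=15000.0))  # assumed wrong defocus: 1.5 micrometres
```

With these illustrative values the two CTF values have opposite signs, so correcting with the wrong defocus would invert the contrast at this frequency instead of restoring it.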
5
Semiautomatic, high-throughput, high-resolution protocol for three-dimensional reconstruction of single particles in electron microscopy. Methods Mol Biol 2013; 950:171-93. [PMID: 23086876 DOI: 10.1007/978-1-62703-137-0_11] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/02/2022]
Abstract
In this chapter we describe the steps needed for reconstructing the three-dimensional structure of a macromolecular complex starting from its projections collected in electron micrographs. The concepts are shown through the use of Xmipp 3.0, a software suite specifically designed for the image processing of biological structures imaged with electron or X-ray microscopy. We illustrate the image processing workflow by applying it to the images of Bovine Papilloma virus published in Wolf et al. (Proc Natl Acad Sci USA 107:6298-6303, 2010). We show that in the case of high-quality, homogeneous datasets with a priori knowledge about the initial volume, we can have a high-resolution 3D reconstruction in less than 1 day using a computer cluster with only 32 processors.
7
Lyumkis D, Moeller A, Cheng A, Herold A, Hou E, Irving C, Jacovetty EL, Lau PW, Mulder AM, Pulokas J, Quispe JD, Voss NR, Potter CS, Carragher B. Automation in single-particle electron microscopy connecting the pieces. Methods Enzymol 2010; 483:291-338. [PMID: 20888480 DOI: 10.1016/s0076-6879(10)83015-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 01/26/2023]
Abstract
Throughout the history of single-particle electron microscopy (EM), automated technologies have seen varying degrees of emphasis and development, usually depending upon the contemporary demands of the field. We are currently faced with increasingly sophisticated devices for specimen preparation, vast increases in the size of collected data sets, comprehensive algorithms for image processing, sophisticated tools for quality assessment, and an influx of interested scientists from outside the field who might lack the skills of experienced microscopists. This situation places automated techniques in high demand. In this chapter, we provide a generic definition of and discuss some of the most important advances in automated approaches to specimen preparation, grid handling, robotic screening, microscope calibrations, data acquisition, image processing, and computational infrastructure. Each section describes the general problem and then provides examples of how that problem has been addressed through automation, highlighting available processing packages, and sometimes describing the particular approach at the National Resource for Automated Molecular Microscopy (NRAMM). We contrast the more familiar manual procedures with automated approaches, emphasizing breakthroughs as well as current limitations. Finally, we speculate on future directions and improvements in automated technologies. Our overall goal is to present automation as more than simply a tool to save time. Rather, we aim to illustrate that automation is a comprehensive and versatile strategy that can deliver biological information on an unprecedented scale beyond the scope available with classical manual approaches.
Affiliation(s)
- Dmitry Lyumkis: National Resource for Automated Molecular Microscopy, Department of Cell Biology, The Scripps Research Institute, La Jolla, California, USA