1
Bandyopadhyay H, Deng Z, Ding L, Liu S, Uddin MR, Zeng X, Behpour S, Xu M. Cryo-shift: reducing domain shift in cryo-electron subtomograms with unsupervised domain adaptation and randomization. Bioinformatics 2022; 38:977-984. [PMID: 34897387 DOI: 10.1093/bioinformatics/btab794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Revised: 10/18/2021] [Accepted: 11/17/2021] [Indexed: 02/05/2023] Open
Abstract
MOTIVATION Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that enables the visualization of subcellular structures in situ at near-atomic resolution. Cellular cryo-ET images help in resolving the structures of macromolecules and determining their spatial relationship in a single cell, which has broad significance in cell and structural biology. Subtomogram classification and recognition constitute a primary step in the systematic recovery of these macromolecular structures. Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification, but suffer from limited applicability due to scarcity of annotated data. While generating simulated data for training supervised models is a potential solution, a sizeable difference in the image intensity distribution in generated data as compared with real experimental data will cause the trained models to perform poorly in predicting classes on real subtomograms. RESULTS In this work, we present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification. We use unsupervised multi-adversarial domain adaptation to reduce the domain shift between features of simulated and experimental data. We develop a network-driven domain randomization procedure with 'warp' modules to alter the simulated data and help the classifier generalize better on experimental data. We do not use any labeled experimental data to train our model, whereas some of the existing alternative approaches require labeled experimental samples for cross-domain classification. Nevertheless, Cryo-Shift outperforms the existing alternative approaches in cross-domain subtomogram classification in extensive evaluation studies demonstrated herein using both simulated and experimental data. AVAILABILITY AND IMPLEMENTATION https://github.com/xulabs/aitom.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hmrishav Bandyopadhyay
  - Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata 700032, India
- Zihao Deng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Leiting Ding
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sinuo Liu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Mostofa Rafid Uddin
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sima Behpour
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2
Desmosome architecture derived from molecular dynamics simulations and cryo-electron tomography. Proc Natl Acad Sci U S A 2020; 117:27132-27140. [PMID: 33067392 PMCID: PMC7959525 DOI: 10.1073/pnas.2004563117] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 11/18/2022] Open
Abstract
The desmosome is a major cell–cell junction connecting cells in tissues under high mechanical load. Currently, while structures of the constituent cadherins are known, the desmosome architecture has remained elusive. The primary reason is the high plasticity of the cadherins. As with many other cellular structures, their high flexibility cannot be easily addressed by conventional structural techniques that rely on averaging many identical structures. For this, we combine high-end cryo-electron tomography with large-scale molecular dynamics simulations to produce a molecular model of the desmosome that integrates new with decades-old observations, accounts for the remarkable biophysical properties, and maps the intermolecular interactions. Desmosomes are cell–cell junctions that link tissue cells experiencing intense mechanical stress. Although the structure of the desmosomal cadherins is known, the desmosome architecture, which is essential for mediating numerous functions, remains elusive. Here, we recorded cryo-electron tomograms (cryo-ET) in which individual cadherins can be discerned; they appear variable in shape, spacing, and tilt with respect to the membrane. The resulting sub-tomogram average reaches a resolution of ~26 Å, limited by the inherent flexibility of desmosomes. To address this challenge typical of dynamic biological assemblies, we combine sub-tomogram averaging with atomistic molecular dynamics (MD) simulations. We generate models of possible cadherin arrangements and perform an in silico screening according to biophysical and structural properties extracted from MD simulation trajectories. We find a truss-like arrangement of cadherins that resembles the characteristic footprint seen in the electron micrograph. The resulting model of the desmosomal architecture explains their unique biophysical properties and strength.
3
Zeng X, Xu M. Gum-Net: Unsupervised Geometric Matching for Fast and Accurate 3D Subtomogram Image Alignment and Averaging. Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2020; 2020:4072-4082. [PMID: 33716478 PMCID: PMC7955792 DOI: 10.1109/cvpr42600.2020.00413] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/02/2023]
Abstract
We propose a Geometric unsupervised matching Network (Gum-Net) for finding the geometric correspondence between two images with application to 3D subtomogram alignment and averaging. Subtomogram alignment is the most important task in cryo-electron tomography (cryo-ET), a revolutionary 3D imaging technique for visualizing the molecular organization of unperturbed cellular landscapes in single cells. However, subtomogram alignment and averaging are very challenging due to severe imaging limits such as noise and missing wedge effects. We introduce an end-to-end trainable architecture with three novel modules specifically designed for preserving feature spatial information and propagating feature matching information. The training is performed in a fully unsupervised fashion to optimize a matching metric. No ground truth transformation information nor category-level or instance-level matching supervision information is needed. After systematic assessments on six real and nine simulated datasets, we demonstrate that Gum-Net reduced the alignment error by 40 to 50% and improved the averaging resolution by 10%. Gum-Net also achieved 70 to 110 times speedup in practice with GPU acceleration compared to state-of-the-art subtomogram alignment methods. Our work is the first 3D unsupervised geometric matching method for images of strong transformation variation and high noise level. The training code, trained model, and datasets are available in our open-source software AITom.
Affiliation(s)
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
4
Zhao Y, Zeng X, Guo Q, Xu M. An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification. Bioinformatics 2019; 34:i227-i236. [PMID: 29949977 PMCID: PMC6022576 DOI: 10.1093/bioinformatics/bty267] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
Motivation Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes subcellular organization of single cells at sub-molecular resolution and in near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, the structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification represent the most critical tasks for such analysis. Existing subtomogram alignment based methods are prone to the missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible for accurate calculation. Results Built on existing works, we propose an integrated method, Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate integrals for maximum-likelihood update of subtomogram averages through expectation–maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects with moderate increases of computation cost. Besides, FAML performs well with significantly fewer input subtomograms when the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT. Availability and implementation http://www.cs.cmu.edu/mxu1
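As a rough illustration of the idea summarized in this abstract, the toy sketch below replaces 3D rigid transformations with circular shifts of 1D signals. It is not the FAML implementation: the function names (`circular_shift`, `top_k_shifts`, `em_update`), the Gaussian noise model, and the parameters `k` and `sigma` are all assumptions made for illustration. A cheap alignment step proposes a few candidate transformations per particle, and the average is then updated as a posterior-weighted mean over those candidates rather than an integral over every possible transformation.

```python
import math


def circular_shift(signal, k):
    """Rotate a 1D list right by k positions (toy stand-in for a rigid transform)."""
    n = len(signal)
    k %= n
    return signal[-k:] + signal[:-k] if k else list(signal)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_shifts(particle, average, k):
    """'Fast alignment' stand-in: the k shifts that best match the current average."""
    shifts = sorted(
        range(len(particle)),
        key=lambda s: sq_dist(circular_shift(particle, s), average),
    )
    return shifts[:k]


def em_update(particles, average, k=3, sigma=1.0):
    """One EM-style update of the average using only k sampled shifts per particle."""
    n = len(average)
    total = [0.0] * n
    for particle in particles:
        aligned = [circular_shift(particle, s) for s in top_k_shifts(particle, average, k)]
        # E-step: posterior weights under an assumed Gaussian noise model.
        log_w = [-sq_dist(a, average) / (2.0 * sigma ** 2) for a in aligned]
        peak = max(log_w)
        weights = [math.exp(lw - peak) for lw in log_w]
        norm = sum(weights)
        # M-step contribution: weighted mean of the candidate alignments.
        for a, w in zip(aligned, weights):
            for i in range(n):
                total[i] += (w / norm) * a[i]
    return [v / len(particles) for v in total]
```

Restricting the sum to a handful of well-matching shifts is what keeps the update cheap in this sketch; with `k` equal to the signal length it reduces to a full sum over all shifts.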
Affiliation(s)
- Yixiu Zhao
  - Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiangrui Zeng
  - Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Qiang Guo
  - Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Min Xu
  - Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
5
Lin R, Zeng X, Kitani K, Xu M. Adversarial domain adaptation for cross data source macromolecule in situ structural classification in cellular electron cryo-tomograms. Bioinformatics 2019; 35:i260-i268. [PMID: 31510673 PMCID: PMC6612867 DOI: 10.1093/bioinformatics/btz364] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION Since 2017, an increasing amount of attention has been paid to the supervised deep learning-based macromolecule in situ structural classification (i.e. subtomogram classification) in cellular electron cryo-tomography (CECT) due to the substantially higher scalability of deep learning. However, the success of such supervised approach relies heavily on the availability of large amounts of labeled training data. For CECT, creating valid training data from the same data source as prediction data is usually laborious and computationally intensive. It would be beneficial to have training data from a separate data source where the annotation is readily available or can be performed in a high-throughput fashion. However, the cross data source prediction is often biased due to the different image intensity distributions (a.k.a. domain shift). RESULTS We adapt a deep learning-based adversarial domain adaptation (3D-ADA) method to timely address the domain shift problem in CECT data analysis. 3D-ADA first uses a source domain feature extractor to extract discriminative features from the training data as the input to a classifier. Then it adversarially trains a target domain feature extractor to reduce the distribution differences of the extracted features between training and prediction data. As a result, the same classifier can be directly applied to the prediction data. We tested 3D-ADA on both experimental and realistically simulated subtomogram datasets under different imaging conditions. 3D-ADA stably improved the cross data source prediction, as well as outperformed two popular domain adaptation methods. Furthermore, we demonstrate that 3D-ADA can improve cross data source recovery of novel macromolecular structures. AVAILABILITY AND IMPLEMENTATION https://github.com/xulabs/projects. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruogu Lin
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Kris Kitani
  - Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
6
Xu M, Chai X, Muthakana H, Liang X, Yang G, Zeev-Ben-Mordehai T, Xing EP. Deep learning-based subdivision approach for large scale macromolecules structure recovery from electron cryo tomograms. Bioinformatics 2018; 33:i13-i22. [PMID: 28881965 PMCID: PMC5946875 DOI: 10.1093/bioinformatics/btx230] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 11/24/2022] Open
Abstract
Motivation Cellular Electron CryoTomography (CECT) enables 3D visualization of cellular organization at near-native state and in sub-molecular resolution, making it a powerful tool for analyzing structures of macromolecular complexes and their spatial organizations inside single cells. However, high degree of structural complexity together with practical imaging limitations makes the systematic de novo discovery of structures within cells challenging. It would likely require averaging and classifying millions of subtomograms potentially containing hundreds of highly heterogeneous structural classes. Although it is no longer difficult to acquire CECT data containing such amount of subtomograms due to advances in data acquisition automation, existing computational approaches have very limited scalability or discrimination ability, making them incapable of processing such amount of data. Results To complement existing approaches, in this article we propose a new approach for subdividing subtomograms into smaller but relatively homogeneous subsets. The structures in these subsets can then be separately recovered using existing computation intensive methods. Our approach is based on supervised structural feature extraction using deep learning, in combination with unsupervised clustering and reference-free classification. Our experiments show that, compared with existing unsupervised rotation invariant feature and pose-normalization based approaches, our new approach achieves significant improvements in both discrimination ability and scalability. More importantly, our new approach is able to discover new structural classes and recover structures that do not exist in training data. Availability and Implementation Source code freely available at http://www.cs.cmu.edu/~mxu1/software. Supplementary information Supplementary data are available at Bioinformatics online.
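The subdivision step described in this abstract can be pictured with a minimal sketch, assuming feature vectors have already been extracted from each subtomogram by some trained network. The abstract does not name the clustering algorithm used; plain k-means is swapped in here as a generic stand-in, and the names `kmeans` and `subdivide` are hypothetical.

```python
def kmeans(features, k, iterations=20):
    """Plain k-means on lists of floats; centroids start at the first k points."""
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(f, centroids[c])),
            )
            for f in features
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [f for f, label in zip(features, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def subdivide(features, k):
    """Group subtomogram indices into k subsets according to their cluster label."""
    labels, _ = kmeans(features, k)
    subsets = {c: [] for c in range(k)}
    for index, label in enumerate(labels):
        subsets[label].append(index)
    return subsets
```

Each resulting subset of indices could then be handed separately to a more expensive averaging or classification method, which is the role the abstract assigns to the subdivision.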
Affiliation(s)
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiaoqi Chai
  - Biomedical Engineering Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Hariank Muthakana
  - Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiaodan Liang
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Ge Yang
  - Biomedical Engineering Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Tzviya Zeev-Ben-Mordehai
  - Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Eric P Xing
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
7
Duyx B, Urlings MJ, Swaen GM, Bouter LM, Zeegers MP. Scientific citations favor positive results: a systematic review and meta-analysis. J Clin Epidemiol 2017; 88:92-101. [DOI: 10.1016/j.jclinepi.2017.06.002] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 11/26/2016] [Revised: 05/15/2017] [Accepted: 06/03/2017] [Indexed: 10/19/2022]
8
Galaz-Montoya JG, Ludtke SJ. The advent of structural biology in situ by single particle cryo-electron tomography. Biophysics Reports 2017; 3:17-35. [PMID: 28781998 PMCID: PMC5516000 DOI: 10.1007/s41048-017-0040-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 01/15/2017] [Accepted: 03/30/2017] [Indexed: 01/06/2023] Open
Abstract
Single particle tomography (SPT), also known as subtomogram averaging, is a powerful technique uniquely poised to address questions in structural biology that are not amenable to more traditional approaches like X-ray crystallography, nuclear magnetic resonance, and conventional cryoEM single particle analysis. Owing to its potential for in situ structural biology at subnanometer resolution, SPT has been gaining enormous momentum in the last five years and is becoming a prominent, widely used technique. This method can be applied to unambiguously determine the structures of macromolecular complexes that exhibit compositional and conformational heterogeneity, both in vitro and in situ. Here we review the development of SPT, highlighting its applications and identifying areas of ongoing development.
Affiliation(s)
- Jesús G Galaz-Montoya
  - National Center for Macromolecular Imaging, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030 USA
- Steven J Ludtke
  - National Center for Macromolecular Imaging, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030 USA
9
Frazier Z, Xu M, Alber F. TomoMiner and TomoMinerCloud: A Software Platform for Large-Scale Subtomogram Structural Analysis. Structure 2017; 25:951-961.e2. [PMID: 28552576 DOI: 10.1016/j.str.2017.04.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/27/2015] [Revised: 12/17/2016] [Accepted: 04/28/2017] [Indexed: 11/19/2022]
Abstract
Cryo-electron tomography (cryo-ET) captures the 3D electron density distribution of macromolecular complexes in close to native state. With the rapid advance of cryo-ET acquisition technologies, it is possible to generate large numbers (>100,000) of subtomograms, each containing a macromolecular complex. Often, these subtomograms represent a heterogeneous sample due to variations in the structure and composition of a complex in situ form or because particles are a mixture of different complexes. In this case subtomograms must be classified. However, classification of large numbers of subtomograms is a time-intensive task and often a limiting bottleneck. This paper introduces an open source software platform, TomoMiner, for large-scale subtomogram classification, template matching, subtomogram averaging, and alignment. Its scalable and robust parallel processing allows efficient classification of tens to hundreds of thousands of subtomograms. In addition, TomoMiner provides a pre-configured TomoMinerCloud computing service permitting users without sufficient computing resources instant access to TomoMiner's high-performance features.
Affiliation(s)
- Zachary Frazier
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
- Frank Alber
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA