1
Wang T, Li B, Zhang J, Zeng X, Uddin MR, Wu W, Xu M. Deep Active Learning for Cryo-Electron Tomography Classification. Proceedings. International Conference on Image Processing 2022; 2022:1611-1615. [PMID: 37021115 PMCID: PMC10072314 DOI: 10.1109/icip46576.2022.9898002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2023]
Abstract
Cryo-Electron Tomography (cryo-ET) is an emerging 3D imaging technique that shows great potential in structural biology research. One of the main challenges is classifying the macromolecules captured by cryo-ET. Recent efforts exploit deep learning to address this challenge. However, training reliable deep models in a supervised fashion usually requires a huge amount of labeled data, and annotating cryo-ET data is arguably very expensive. Deep Active Learning (DAL) can reduce labeling cost without sacrificing much task performance. Nevertheless, most existing methods resort to auxiliary models or complex schemes (e.g., adversarial learning) for uncertainty estimation, the core of DAL. These models need to be highly customized for cryo-ET tasks, which require 3D networks, and extra effort is needed to tune them, making deployment on cryo-ET tasks difficult. To address these challenges, we propose a novel metric for data selection in DAL, which can also be leveraged as a regularizer of the empirical loss, further boosting the task model. We demonstrate the superiority of our method via extensive experiments on both simulated and real cryo-ET datasets. Our source code and appendix can be found at this URL.
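As a generic illustration of uncertainty-based data selection in active learning, the following minimal sketch ranks unlabeled samples by predictive entropy. It is not the selection metric proposed in this paper, which the abstract does not specify; the function names `entropy` and `select_for_labeling` and the toy probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_labeling(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled samples."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]

# Toy softmax outputs of a classifier on four unlabeled samples (assumed values).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # close to uniform, most uncertain
]
picked = select_for_labeling(pool_probs, budget=2)
print(picked)
```

In a full active learning loop, the selected samples would be annotated, added to the labeled set, and the model retrained before the next selection round.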
Affiliation(s)
- Bo Li
  - University of Southern Mississippi
- Wei Wu
  - University of Southern Mississippi
- Min Xu
  - Carnegie Mellon University

2
Gupta T, He X, Uddin MR, Zeng X, Zhou A, Zhang J, Freyberg Z, Xu M. Self-supervised learning for macromolecular structure classification based on cryo-electron tomograms. Front Physiol 2022; 13:957484. [PMID: 36111160 PMCID: PMC9468634 DOI: 10.3389/fphys.2022.957484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 08/02/2022] [Indexed: 11/21/2022]
Abstract
Macromolecular structure classification from cryo-electron tomography (cryo-ET) data is important for understanding macromolecular dynamics. It has a wide range of applications and is essential in enhancing our knowledge of the sub-cellular environment. However, a major limitation has been insufficient labelled cryo-ET data. In this work, we use Contrastive Self-supervised Learning (CSSL) to improve on previous approaches for macromolecular structure classification from cryo-ET data with limited labels. We first pretrain an encoder with unlabelled data using CSSL and then fine-tune the pretrained weights on the downstream classification task. To this end, we design a cryo-ET domain-specific data-augmentation pipeline. The benefit of augmenting cryo-ET datasets is most prominent when the original dataset is limited in size. Overall, extensive experiments performed on real and simulated cryo-ET data in the semi-supervised learning setting demonstrate the effectiveness of our approach in macromolecular labeling and classification.
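To make the contrastive pretraining idea concrete, here is a minimal sketch of a SimCLR-style NT-Xent loss over two augmented views of each sample. The abstract does not state which contrastive objective the authors use, so the choice of NT-Xent, the temperature value, and the names `cosine` and `nt_xent` are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss; view_a[i] and view_b[i] embed two augmentations of sample i."""
    z = view_a + view_b          # all 2N embeddings
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        # Similarities to every other embedding (the positive and all negatives).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)

# Toy 2D embeddings (assumed values): matched views vs. mismatched views.
aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
shuffled = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
print(aligned < shuffled)
```

The loss is lower when the two views of the same sample land close together in embedding space, which is the signal an encoder would be trained on before fine-tuning with the limited labels.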
Affiliation(s)
- Tarun Gupta
  - Department of Computer Science and Engineering, Indian Institute of Technology, Indore, India
- Xuehai He
  - Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA, United States
- Mostofa Rafid Uddin
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Andrew Zhou
  - Irvington High School, Irvington, NY, United States
- Jing Zhang
  - Department of Computer Science, University of California, Irvine, Irvine, CA, United States
- Zachary Freyberg
  - Departments of Psychiatry and Cell Biology, University of Pittsburgh, Pittsburgh, PA, United States
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
  - *Correspondence: Min Xu,

3
Bandyopadhyay H, Deng Z, Ding L, Liu S, Uddin MR, Zeng X, Behpour S, Xu M. Cryo-shift: reducing domain shift in cryo-electron subtomograms with unsupervised domain adaptation and randomization. Bioinformatics 2022; 38:977-984. [PMID: 34897387 DOI: 10.1093/bioinformatics/btab794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Revised: 10/18/2021] [Accepted: 11/17/2021] [Indexed: 02/05/2023]
Abstract
MOTIVATION Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that enables the visualization of subcellular structures in situ at near-atomic resolution. Cellular cryo-ET images help in resolving the structures of macromolecules and determining their spatial relationship in a single cell, which has broad significance in cell and structural biology. Subtomogram classification and recognition constitute a primary step in the systematic recovery of these macromolecular structures. Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification, but suffer from limited applicability due to the scarcity of annotated data. While generating simulated data for training supervised models is a potential solution, a sizeable difference in the image intensity distribution between generated data and real experimental data will cause the trained models to perform poorly in predicting classes on real subtomograms.
RESULTS In this work, we present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification. We use unsupervised multi-adversarial domain adaptation to reduce the domain shift between features of simulated and experimental data. We develop a network-driven domain randomization procedure with 'warp' modules to alter the simulated data and help the classifier generalize better on experimental data. We do not use any labeled experimental data to train our model, whereas some of the existing alternative approaches require labeled experimental samples for cross-domain classification. Nevertheless, Cryo-Shift outperforms the existing alternative approaches in cross-domain subtomogram classification in extensive evaluation studies demonstrated herein using both simulated and experimental data.
AVAILABILITY AND IMPLEMENTATION https://github.com/xulabs/aitom.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hmrishav Bandyopadhyay
  - Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata 700032, India
- Zihao Deng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Leiting Ding
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sinuo Liu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Mostofa Rafid Uddin
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sima Behpour
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA

4
Chharia A, Upadhyay R, Kumar V, Cheng C, Zhang J, Wang T, Xu M. Deep-Precognitive Diagnosis: Preventing Future Pandemics by Novel Disease Detection With Biologically-Inspired Conv-Fuzzy Network. IEEE Access: Practical Innovations, Open Solutions 2022; 10:23167-23185. [PMID: 35360503 PMCID: PMC8967064 DOI: 10.1109/access.2022.3153059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/07/2023]
Abstract
Deep learning-based Computer-Aided Diagnosis (CAD) has gained immense attention in recent years due to its capability to enhance diagnostic performance and elucidate complex clinical tasks. However, conventional supervised deep learning models are incapable of recognizing novel diseases that do not exist in the training dataset. Automated early-stage detection of novel infectious diseases can be vital in controlling their rapid spread. Moreover, the development of a conventional CAD model is only possible after disease outbreaks occur and datasets become available for training (e.g., the COVID-19 outbreak). Since novel diseases are unknown and cannot be included in training data, it is challenging to recognize them through existing supervised deep learning models. Even after data becomes available, recognizing new classes with conventional models requires complete, extensive re-training. The present study is the first to report this problem and propose a novel solution to it. In this study, we propose a new class of CAD models, i.e., Deep-Precognitive Diagnosis, wherein artificial agents are enabled to identify unknown diseases that have the potential to cause a pandemic in the future. A de novo biologically-inspired Conv-Fuzzy network is developed. Experimental results show that the model, trained to classify Chest X-Ray (CXR) scans into normal and bacterial pneumonia, detected a novel disease during testing that was unseen in the training sample and later confirmed to be COVID-19. The model was also tested on SARS-CoV-1 and MERS-CoV samples as unseen diseases and achieved state-of-the-art accuracy. The proposed model eliminates the need for model re-training by creating a new class in real time for the detected novel disease, thus classifying it on all subsequent occurrences. In addition, the model addresses the challenge of limited labeled data availability, which renders most supervised learning techniques ineffective, and establishes that modified fuzzy classifiers can achieve high accuracy on image classification tasks.
Affiliation(s)
- Aviral Chharia
  - Mechanical Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
- Rahul Upadhyay
  - Electronics and Communication Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
- Vinay Kumar
  - Electronics and Communication Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
- Chao Cheng
  - Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Jing Zhang
  - Department of Computer Science, University of California at Irvine, Irvine, CA 92697, USA
- Tianyang Wang
  - Department of Computer Science and Information Technology, Austin Peay State University, Clarksville, TN 37044, USA
- Min Xu
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Computer Vision Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates

5
Abstract
Cryo-electron tomography has stepped fully into the spotlight. Enthusiasm is high. Fortunately for us, this is an exciting time to be a cryotomographer, but there is still a way to go before declaring victory. Despite its potential, cryo-electron tomography possesses many inherent challenges. How do we image through thick cell samples, and possibly even tissue? How do we identify a protein of interest amidst the noisy, crowded environment of the cytoplasm? How do we target specific moments of a dynamic cellular process for tomographic imaging? In this review, we cover the history of cryo-electron tomography and how it came to be, roughly speaking, as well as the many approaches that have been developed to overcome its intrinsic limitations.
Affiliation(s)
- Ryan K. Hylton
  - Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA
- Matthew T. Swulius
  - Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA

6
Masrati G, Landau M, Ben-Tal N, Lupas A, Kosloff M, Kosinski J. Integrative Structural Biology in the Era of Accurate Structure Prediction. J Mol Biol 2021; 433:167127. [PMID: 34224746 DOI: 10.1016/j.jmb.2021.167127] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 05/17/2021] [Revised: 06/28/2021] [Accepted: 06/28/2021] [Indexed: 11/16/2022]
Abstract
Characterizing the three-dimensional structure of macromolecules is central to understanding their function. Traditionally, structures of proteins and their complexes have been determined using experimental techniques such as X-ray crystallography, NMR, or cryo-electron microscopy, applied individually or in an integrative manner. Meanwhile, however, computational methods for protein structure prediction have been improving their accuracy, gradually, then suddenly, with the breakthrough advance by AlphaFold2, whose models of monomeric proteins are often as accurate as experimental structures. This breakthrough foreshadows a new era of computational methods that can build accurate models for most monomeric proteins. Here, we envision how such accurate modeling methods can combine with experimental structural biology techniques, enhancing integrative structural biology. We highlight the challenges that arise when considering multiple structural conformations, protein complexes, and polymorphic assemblies. These challenges will motivate further developments, both in modeling programs and in methods to solve experimental structures, towards better and quicker investigation of structure-function relationships.
Affiliation(s)
- Gal Masrati
  - Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Meytal Landau
  - Department of Biology, Technion-Israel Institute of Technology, Haifa 3200003, Israel; European Molecular Biology Laboratory (EMBL), Hamburg 22607, Germany
- Nir Ben-Tal
  - Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Andrei Lupas
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Mickey Kosloff
  - Department of Human Biology, Faculty of Natural Sciences, University of Haifa, 199 Aba Khoushy Ave., Mt. Carmel, 3498838 Haifa, Israel
- Jan Kosinski
  - European Molecular Biology Laboratory (EMBL), Hamburg 22607, Germany; Centre for Structural Systems Biology (CSSB), Hamburg 22607, Germany; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany