1
Wang X, Zhu H, Terashi G, Taluja M, Kihara D. DiffModeler: large macromolecular structure modeling for cryo-EM maps using a diffusion model. Nat Methods 2024; 21:2307-2317. [PMID: 39433880 DOI: 10.1038/s41592-024-02479-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 09/19/2024] [Indexed: 10/23/2024]
Abstract
Cryogenic electron microscopy (cryo-EM) has now been widely used for determining multichain protein complexes. However, modeling a large complex structure, such as those with more than ten chains, is challenging, particularly when the map resolution decreases. Here we present DiffModeler, a fully automated method for modeling large protein complex structures. DiffModeler employs a diffusion model for backbone tracing and integrates AlphaFold2-predicted single-chain structures for structure fitting. DiffModeler showed an average template modeling score of 0.88 and 0.91 for two datasets of cryo-EM maps of 0-5 Å resolution and 0.92 for intermediate resolution maps (5-10 Å), substantially outperforming existing methodologies. Further benchmarking at low resolutions (10-20 Å) confirms its versatility, demonstrating plausible performance.
Affiliation(s)
- Xiao Wang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Han Zhu
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Manav Taluja
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
2
Dialpuri J, Agirre J, Cowtan K, Bond P. NucleoFind: a deep-learning network for interpreting nucleic acid electron density. Nucleic Acids Res 2024; 52:e84. [PMID: 39162213 PMCID: PMC11417358 DOI: 10.1093/nar/gkae715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 07/31/2024] [Accepted: 08/06/2024] [Indexed: 08/21/2024] Open
Abstract
Nucleic acid electron density interpretation after phasing by molecular replacement or other methods remains a difficult problem for computer programs to deal with. Programs tend to rely on time-consuming and computationally exhaustive searches to recognise characteristic features. We present NucleoFind, a deep-learning-based approach to interpreting and segmenting electron density. Using an electron density map from X-ray crystallography obtained after molecular replacement, the positions of the phosphate group, sugar ring and nitrogenous base group can be predicted with high accuracy. On average, 78% of phosphate atoms, 85% of sugar atoms and 83% of base atoms are positioned in predicted density after giving NucleoFind maps produced following successful molecular replacement. NucleoFind can use the wealth of context these predicted maps provide to build more accurate and complete nucleic acid models automatically.
Affiliation(s)
- Jordan S Dialpuri
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Jon Agirre
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Kathryn D Cowtan
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Paul S Bond
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
3
Song X, Bao L, Feng C, Huang Q, Zhang F, Gao X, Han R. Accurate Prediction of Protein Structural Flexibility by Deep Learning Integrating Intricate Atomic Structures and Cryo-EM Density Information. Nat Commun 2024; 15:5538. [PMID: 38956032 PMCID: PMC11219796 DOI: 10.1038/s41467-024-49858-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Accepted: 06/20/2024] [Indexed: 07/04/2024] Open
Abstract
The dynamics of proteins are crucial for understanding their mechanisms. However, computationally predicting protein dynamic information has proven challenging. Here, we propose a neural network model, RMSF-net, which outperforms previous methods and produces the best results in a large-scale protein dynamics dataset; this model can accurately infer the dynamic information of a protein in only a few seconds. By learning effectively from experimental protein structure data and cryo-electron microscopy (cryo-EM) data integration, our approach is able to accurately identify the interactive bidirectional constraints and supervision between cryo-EM maps and PDB models in maximizing the dynamic prediction efficacy. Rigorous 5-fold cross-validation on the dataset demonstrates that RMSF-net achieves test correlation coefficients of 0.746 ± 0.127 at the voxel level and 0.765 ± 0.109 at the residue level, showcasing its ability to deliver dynamic predictions closely approximating molecular dynamics simulations. Additionally, it offers real-time dynamic inference with minimal storage overhead on the order of megabytes. RMSF-net is a freely accessible tool and is anticipated to play an essential role in the study of protein dynamics.
Affiliation(s)
- Xintao Song
  - Research Center for Mathematics and Interdisciplinary Sciences (Ministry of Education Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao, China
  - BioMap Research, Menlo Park, CA, USA
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
- Lei Bao
  - School of Public Health, Hubei University of Medicine, Shiyan, China
- Chenjie Feng
  - College of Medical Information and Engineering, Ningxia Medical University, Yinchuan, China
- Qiang Huang
  - Research Center for Mathematics and Interdisciplinary Sciences (Ministry of Education Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao, China
- Fa Zhang
  - School of Medical Technology, Beijing Institute of Technology, Beijing, China
- Xin Gao
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
- Renmin Han
  - Research Center for Mathematics and Interdisciplinary Sciences (Ministry of Education Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao, China
  - BioMap Research, Menlo Park, CA, USA
4
Giri N, Wang L, Cheng J. Cryo2StructData: A Large Labeled Cryo-EM Density Map Dataset for AI-based Modeling of Protein Structures. Sci Data 2024; 11:458. [PMID: 38710720 PMCID: PMC11074267 DOI: 10.1038/s41597-024-03299-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/23/2024] [Indexed: 05/08/2024] Open
Abstract
The advent of single-particle cryo-electron microscopy (cryo-EM) has brought forth a new era of structural biology, enabling the routine determination of large biological molecules and their complexes at atomic resolution. The high-resolution structures of biological macromolecules and their complexes significantly expedite biomedical research and drug discovery. However, automatically and accurately building atomic models from high-resolution cryo-EM density maps is still time-consuming and challenging when template-based models are unavailable. Artificial intelligence (AI) methods such as deep learning, when trained on a limited amount of labeled cryo-EM density maps, generate inaccurate atomic models. To address this issue, we created a dataset called Cryo2StructData consisting of 7,600 preprocessed cryo-EM density maps whose voxels are labelled according to their corresponding known atomic structures for training and testing AI methods to build atomic models from cryo-EM density maps. Cryo2StructData is larger than existing, publicly available datasets for training AI methods to build atomic protein structures from cryo-EM density maps. We trained and tested deep learning models on Cryo2StructData to validate its quality, showing that it is ready to be used to train and test AI methods for building atomic models.
Affiliation(s)
- Nabin Giri
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
  - Roy Blunt NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
- Liguo Wang
  - Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY, 11973, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
  - Roy Blunt NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
5
Giri N, Wang L, Cheng J. Cryo2StructData: A Large Labeled Cryo-EM Density Map Dataset for AI-based Modeling of Protein Structures. bioRxiv 2024:2023.06.14.545024. [PMID: 37398020 PMCID: PMC10312718 DOI: 10.1101/2023.06.14.545024] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/04/2023]
Abstract
The advent of single-particle cryo-electron microscopy (cryo-EM) has brought forth a new era of structural biology, enabling the routine determination of large biological molecules and their complexes at atomic resolution. The high-resolution structures of biological macromolecules and their complexes significantly expedite biomedical research and drug discovery. However, automatically and accurately building atomic models from high-resolution cryo-EM density maps is still time-consuming and challenging when template-based models are unavailable. Artificial intelligence (AI) methods such as deep learning, when trained on a limited amount of labeled cryo-EM density maps, generate inaccurate atomic models. To address this issue, we created a dataset called Cryo2StructData consisting of 7,600 preprocessed cryo-EM density maps whose voxels are labelled according to their corresponding known atomic structures for training and testing AI methods to build atomic models from cryo-EM density maps. It is larger and of higher quality than any existing, publicly available dataset. We trained and tested deep learning models on Cryo2StructData to make sure it is ready for the large-scale development of AI methods for building atomic models from cryo-EM density maps.
Affiliation(s)
- Nabin Giri
  - University of Missouri, Electrical Engineering and Computer Science, Columbia, 65211, USA
  - NextGen Precision Health Institute, Columbia, 65211, USA
- Liguo Wang
  - Laboratory for Biological Structure, Brookhaven National Laboratory, Upton, NY, 11973, USA
- Jianlin Cheng
  - University of Missouri, Electrical Engineering and Computer Science, Columbia, 65211, USA
  - NextGen Precision Health Institute, Columbia, 65211, USA
6
Zhang Z, Yan B. Convolution Neural Network-Assisted Smart Fluorescent-Tongue Based on Lanthanide Ion-Induced Forming MOF/HOF Composite for Differentiation of Flavor Compounds and Wine Identification. ACS Sens 2023; 8:3585-3594. [PMID: 37612786 DOI: 10.1021/acssensors.3c01273] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/25/2023]
Abstract
Wine flavor is a vital quality characteristic in wine, influenced by those flavor components with low sensory thresholds. It is crucial to recognize and classify the wine components related to their flavor contribution. The integration of fluorescent sensors and artificial intelligence shows huge potential in flavor recognition by emulation of the gustatory perception system. Meanwhile, achieving information identification of wine based on multiple information barcodes has promising applications in anticounterfeiting. In this study, we present a simple method in which organic linkers are woven into a hydrogen-bonded organic framework (HOF) for the available transformation of a metal-bonded organic framework (MOF) induced by lanthanide ions (Ln³⁺). The fluorescent Ln-MOF/HOF composite exhibits high sensitivity, rapid response, and good recyclability for detecting seven flavor compounds in wine, including tannic acid, ionone, vanillin, anethole, anisaldehyde, hydroxybenzaldehyde, and 4-hydroxy-2-methylacetophenone. Depending on its satisfactory detectability, a novel strategy is provided in which a fluorescent sensor is able to function as a smart fluorescent-tongue (F-tongue) with the aid of a convolutional neural network to differentiate these seven flavor compounds. In addition, the Ln-MOF/HOF composite has been used to prepare multiple information barcodes for wine information identification on the basis of dynamic fluorescence response toward tannic acid. The mimetic gustatory perception system developed in this study may offer a promising strategy for flavor recognition in food and further food anticounterfeiting.
Affiliation(s)
- Zishuo Zhang
  - School of Chemical Science and Engineering, Tongji University, Siping Road 1239, Shanghai 200092, China
- Bing Yan
  - School of Chemical Science and Engineering, Tongji University, Siping Road 1239, Shanghai 200092, China
7
DiIorio MC, Kulczyk AW. Novel Artificial Intelligence-Based Approaches for Ab Initio Structure Determination and Atomic Model Building for Cryo-Electron Microscopy. Micromachines 2023; 14:1674. [PMID: 37763837 PMCID: PMC10534518 DOI: 10.3390/mi14091674] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/02/2023] [Revised: 08/21/2023] [Accepted: 08/25/2023] [Indexed: 09/29/2023]
Abstract
Single particle cryo-electron microscopy (cryo-EM) has emerged as the prevailing method for near-atomic structure determination, shedding light on the important molecular mechanisms of biological macromolecules. However, the inherent dynamics and structural variability of biological complexes coupled with the large number of experimental images generated by a cryo-EM experiment make data processing nontrivial. In particular, ab initio reconstruction and atomic model building remain major bottlenecks that demand substantial computational resources and manual intervention. Approaches utilizing recent innovations in artificial intelligence (AI) technology, particularly deep learning, have the potential to overcome the limitations that cannot be adequately addressed by traditional image processing approaches. Here, we review newly proposed AI-based methods for ab initio volume generation, heterogeneous 3D reconstruction, and atomic model building. We highlight the advancements made by the implementation of AI methods, as well as discuss remaining limitations and areas for future development.
Affiliation(s)
- Megan C. DiIorio
  - Institute for Quantitative Biomedicine, Rutgers University, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
- Arkadiusz W. Kulczyk
  - Institute for Quantitative Biomedicine, Rutgers University, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
  - Department of Biochemistry & Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, USA
8
Giri N, Roy RS, Cheng J. Deep learning for reconstructing protein structures from cryo-EM density maps: Recent advances and future directions. Curr Opin Struct Biol 2023; 79:102536. [PMID: 36773336 PMCID: PMC10023387 DOI: 10.1016/j.sbi.2023.102536] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 09/14/2022] [Revised: 12/20/2022] [Accepted: 01/03/2023] [Indexed: 02/11/2023]
Abstract
Cryo-electron microscopy (cryo-EM) has emerged in recent years as a key technology to determine the structure of proteins, particularly large protein complexes and assemblies. A key challenge in cryo-EM data analysis is to automatically reconstruct accurate protein structures from cryo-EM density maps. In this review, we briefly overview various deep learning methods for building protein structures from cryo-EM density maps, analyze their impact, and discuss the challenges of preparing high-quality data sets for training deep learning models. Looking into the future, more advanced deep learning models that effectively integrate cryo-EM data with other sources of complementary data, such as protein sequences and AlphaFold-predicted structures, need to be developed to further advance the field.
Affiliation(s)
- Nabin Giri
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, 65211, Missouri, USA; NextGen Precision Health, University of Missouri, Columbia, 65211, Missouri, USA. https://twitter.com/@nvngiri
- Raj S Roy
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, 65211, Missouri, USA. https://twitter.com/@rajshekhorroy
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, 65211, Missouri, USA; NextGen Precision Health, University of Missouri, Columbia, 65211, Missouri, USA
9
Nakamura A, Meng H, Zhao M, Wang F, Hou J, Cao R, Si D. Fast and automated protein-DNA/RNA macromolecular complex modeling from cryo-EM maps. Brief Bioinform 2023; 24:bbac632. [PMID: 36682003 PMCID: PMC10399284 DOI: 10.1093/bib/bbac632] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/12/2022] [Revised: 12/15/2022] [Accepted: 12/29/2022] [Indexed: 01/23/2023] Open
Abstract
Cryo-electron microscopy (cryo-EM) allows a macromolecular structure such as protein-DNA/RNA complexes to be reconstructed in a three-dimensional coulomb potential map. The structural information of these macromolecular complexes forms the foundation for understanding the molecular mechanism including many human diseases. However, the model building of large macromolecular complexes is often difficult and time-consuming. We recently developed DeepTracer-2.0, an artificial-intelligence-based pipeline that can build amino acid and nucleic acid backbones from a single cryo-EM map, and even predict the best-fitting residues according to the density of side chains. The experiments showed improved accuracy and efficiency when benchmarking the performance on independent experimental maps of protein-DNA/RNA complexes and demonstrated the promising future of macromolecular modeling from cryo-EM maps. Our method and pipeline could benefit researchers worldwide who work in molecular biomedicine and drug discovery, and substantially increase the throughput of the cryo-EM model building. The pipeline has been integrated into the web portal https://deeptracer.uw.edu/.
Affiliation(s)
- Andrew Nakamura
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, USA
- Hanze Meng
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
- Minglei Zhao
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Fengbin Wang
  - Department of Biochemistry and Molecular Genetics, University of Alabama Birmingham, Heersink School of Medicine, Birmingham, AL 35233, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, Saint Louis, MO 63103, USA
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
- Dong Si
  - Corresponding author: Dong Si, Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, USA. E-mail:
10
Li YL, Langley CA, Azumaya CM, Echeverria I, Chesarino NM, Emerman M, Cheng Y, Gross JD. The structural basis for HIV-1 Vif antagonism of human APOBEC3G. Nature 2023; 615:728-733. [PMID: 36754086 PMCID: PMC10033410 DOI: 10.1038/s41586-023-05779-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 07/04/2022] [Accepted: 02/02/2023] [Indexed: 02/10/2023]
Abstract
The APOBEC3 (A3) proteins are host antiviral cellular proteins that hypermutate the viral genome of diverse viral families. In retroviruses, this process requires A3 packaging into viral particles (refs. 1-4). The lentiviruses encode a protein, Vif, that antagonizes A3 family members by targeting them for degradation. Diversification of A3 allows host escape from Vif whereas adaptations in Vif enable cross-species transmission of primate lentiviruses. How this 'molecular arms race' plays out at the structural level is unknown. Here, we report the cryogenic electron microscopy structure of human APOBEC3G (A3G) bound to HIV-1 Vif, and the hijacked cellular proteins that promote ubiquitin-mediated proteolysis. A small surface explains the molecular arms race, including a cross-species transmission event that led to the birth of HIV-1. Unexpectedly, we find that RNA is a molecular glue for the Vif-A3G interaction, enabling Vif to repress A3G by ubiquitin-dependent and -independent mechanisms. Our results suggest a model in which Vif antagonizes A3G by intercepting it in its most dangerous form for the virus (when bound to RNA and on the pathway to packaging) to prevent viral restriction. By engaging essential surfaces required for restriction, Vif exploits a vulnerability in A3G, suggesting a general mechanism by which RNA binding helps to position key residues necessary for viral antagonism of a host antiviral gene.
Affiliation(s)
- Yen-Li Li
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Caroline A Langley
  - Divisions of Human Biology and Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
  - Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
- Caleigh M Azumaya
  - Fred Hutchinson Cancer Center, Electron Microscopy Shared Resource, Seattle, WA, USA
- Ignacia Echeverria
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
  - Quantitative Bioscience Institute, University of California, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Nicholas M Chesarino
  - Divisions of Human Biology and Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Michael Emerman
  - Divisions of Human Biology and Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Yifan Cheng
  - Quantitative Bioscience Institute, University of California, San Francisco, CA, USA
  - Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
  - Howard Hughes Medical Institute, University of California, San Francisco, CA, USA
- John D Gross
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
  - Quantitative Bioscience Institute, University of California, San Francisco, CA, USA
11
Si D, Chen J, Nakamura A, Chang L, Guan H. Smart de novo Macromolecular Structure Modeling from Cryo-EM Maps. J Mol Biol 2023; 435:167967. [PMID: 36681181 DOI: 10.1016/j.jmb.2023.167967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/04/2023] [Accepted: 01/12/2023] [Indexed: 01/20/2023]
Abstract
The study of macromolecular structures has expanded our understanding of the amazing cell machinery, and such knowledge has changed how the pharmaceutical industry develops new vaccines in recent years. Traditionally, X-ray crystallography has been the main method for structure determination; however, cryogenic electron microscopy (cryo-EM) has become increasingly popular due to recent advancements in hardware and software. The number of cryo-EM maps deposited in the EMDataResource (formerly EMDatabase) since 2002 has been dramatically increasing and it continues to do so. De novo macromolecular complex modeling is a labor-intensive process; therefore, it is highly desirable to develop software that can automate this process. Here we discuss our automated, data-driven, and artificial intelligence approaches including map processing, feature extraction, model building, and target identification. Recently, we have enabled DNA/RNA modeling in our deep learning-based prediction tool, DeepTracer. We have also developed DeepTracer-ID, a tool that can identify proteins solely based on the cryo-EM map. In this paper, we present our accumulated experiences in developing deep learning-based methods surrounding macromolecule modeling applications.
Affiliation(s)
- Dong Si
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
- Jason Chen
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
- Andrew Nakamura
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
- Luca Chang
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
- Haowen Guan
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
12
Dandekar T, Kunz M. We Can Think About Ourselves – The Computer Cannot. Bioinformatics 2023. [DOI: 10.1007/978-3-662-65036-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023] Open
13
Botifoll M, Pinto-Huguet I, Arbiol J. Machine learning in electron microscopy for advanced nanocharacterization: current developments, available tools and future outlook. Nanoscale Horizons 2022; 7:1427-1477. [PMID: 36239693 DOI: 10.1039/d2nh00377e] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 06/16/2023]
Abstract
In the last few years, electron microscopy has experienced a new methodological paradigm aimed to fix the bottlenecks and overcome the challenges of its analytical workflow. Machine learning and artificial intelligence are answering this call providing powerful resources towards automation, exploration, and development. In this review, we evaluate the state-of-the-art of machine learning applied to electron microscopy (and obliquely, to materials and nano-sciences). We start from the traditional imaging techniques to reach the newest higher-dimensionality ones, also covering the recent advances in spectroscopy and tomography. Additionally, the present review provides a practical guide for microscopists, and in general for material scientists, but not necessarily advanced machine learning practitioners, to straightforwardly apply the offered set of tools to their own research. To conclude, we explore the state-of-the-art of other disciplines with a broader experience in applying artificial intelligence methods to their research (e.g., high-energy physics, astronomy, Earth sciences, and even robotics, videogames, or marketing and finances), in order to narrow down the incoming future of electron microscopy, its challenges and outlook.
Affiliation(s)
- Marc Botifoll
  - Catalan Institute of Nanoscience and Nanotechnology (ICN2), CSIC and BIST, Campus UAB, Bellaterra, 08193 Barcelona, Catalonia, Spain
- Ivan Pinto-Huguet
  - Catalan Institute of Nanoscience and Nanotechnology (ICN2), CSIC and BIST, Campus UAB, Bellaterra, 08193 Barcelona, Catalonia, Spain
- Jordi Arbiol
  - Catalan Institute of Nanoscience and Nanotechnology (ICN2), CSIC and BIST, Campus UAB, Bellaterra, 08193 Barcelona, Catalonia, Spain
  - ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Catalonia, Spain
14
Residue-wise local quality estimation for protein models from cryo-EM maps. Nat Methods 2022; 19:1116-1125. [PMID: 35953671 PMCID: PMC10024464 DOI: 10.1038/s41592-022-01574-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/20/2021] [Accepted: 07/11/2022] [Indexed: 01/31/2023]
Abstract
An increasing number of protein structures are being determined by cryogenic electron microscopy (cryo-EM). Although the resolution of determined cryo-EM density maps is improving in general, there are still many cases where amino acids of a protein are assigned with different levels of confidence. Here we developed a method that identifies potential misassignment of residues in the map, including residue shifts along an otherwise correct main-chain trace. The score, named DAQ, computes the likelihood that the local density corresponds to different amino acids, atoms, and secondary structures, estimated via deep learning, and assesses the consistency of the amino acid assignment in the protein structure model with that likelihood. When DAQ was applied to different versions of model structures in the Protein Data Bank that were derived from the same density maps, a clear improvement in the DAQ score was observed in the newer versions of the models. DAQ also found potential misassignment errors in a substantial number of deposited protein structure models built into cryo-EM maps.
15
Wood DM, Dobson RC, Horne CR. Using cryo-EM to uncover mechanisms of bacterial transcriptional regulation. Biochem Soc Trans 2021; 49:2711-2726. [PMID: 34854920 PMCID: PMC8786299 DOI: 10.1042/bst20210674] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/01/2021] [Revised: 11/10/2021] [Accepted: 11/15/2021] [Indexed: 11/17/2022]
Abstract
Transcription is the principal control point for bacterial gene expression, and it enables a global cellular response to an intracellular or environmental trigger. Transcriptional regulation is orchestrated by transcription factors, which activate or repress transcription of target genes by modulating the activity of RNA polymerase. Dissecting the nature and precise choreography of these interactions is essential for developing a molecular understanding of transcriptional regulation. While the contribution of X-ray crystallography has been invaluable, the 'resolution revolution' of cryo-electron microscopy has transformed our structural investigations, enabling large, dynamic and often transient transcription complexes to be resolved that in many cases had resisted crystallisation. In this review, we highlight the impact cryo-electron microscopy has had in gaining a deeper understanding of transcriptional regulation in bacteria. We also provide readers working within the field with an overview of the recent innovations available for cryo-electron microscopy sample preparation and image reconstruction of transcription complexes.
Affiliation(s)
- David M. Wood
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Renwick C.J. Dobson
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Bio21 Molecular Science and Biotechnology Institute, Department of Biochemistry and Pharmacology, University of Melbourne, Parkville, VIC, Australia
| | - Christopher R. Horne
- Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia
- Department of Medical Biology, University of Melbourne, Parkville, VIC 3052, Australia
| |
|
16
|
He J, Huang SY. EMNUSS: a deep learning framework for secondary structure annotation in cryo-EM maps. Brief Bioinform 2021; 22:bbab156. [PMID: 33954706 PMCID: PMC8574626 DOI: 10.1093/bib/bbab156] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/18/2021] [Revised: 03/30/2021] [Accepted: 04/06/2021] [Indexed: 02/06/2023] Open
Abstract
Cryo-electron microscopy (cryo-EM) has become one of the important experimental methods in structure determination. However, despite the rapid growth in the number of deposited cryo-EM maps motivated by advances in microscopy instruments and image processing algorithms, building accurate structure models for cryo-EM maps remains a challenge. Protein secondary structure information, which can be extracted from EM maps, is beneficial for cryo-EM structure modeling. Here, we present a novel secondary structure annotation framework for cryo-EM maps at both intermediate and high resolutions, named EMNUSS. EMNUSS adopts a three-dimensional (3D) nested U-net architecture to assign secondary structures for EM maps. Tested on three diverse datasets including simulated maps, middle-resolution experimental maps, and high-resolution experimental maps, EMNUSS demonstrated its accuracy and robustness in identifying the secondary structures for cryo-EM maps of various resolutions. The EMNUSS program is freely available at http://huanglab.phys.hust.edu.cn/EMNUSS.
Affiliation(s)
- Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| |
|
17
|
Zumbado-Corrales M, Esquivel-Rodríguez J. EvoSeg: Automated Electron Microscopy Segmentation through Random Forests and Evolutionary Optimization. Biomimetics (Basel) 2021; 6:biomimetics6020037. [PMID: 34206006 PMCID: PMC8293153 DOI: 10.3390/biomimetics6020037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Revised: 05/17/2021] [Accepted: 05/28/2021] [Indexed: 11/30/2022] Open
Abstract
Electron microscopy maps are key in the study of biomolecular structures, ranging from borderline atomic level to the sub-cellular range. These maps describe the envelopes that cover a possibly very large number of proteins that form molecular machines within the cell. Within those envelopes, we are interested in finding which regions correspond to specific proteins so that we can understand how they function and design drugs that can enhance or suppress a process that they are involved in, along with other experimental purposes. A classic approach by which we can begin the exploration of map regions is to apply a segmentation algorithm. This yields a mask where each voxel in 3D space is assigned an identifier that maps it to a segment; an ideal segmentation would map each segment to one protein unit, which is rarely the case. In this work, we present a method that uses bio-inspired optimization, through an Evolutionary-Optimized Segmentation algorithm, to iteratively improve upon baseline segments obtained from a classical approach called watershed segmentation. The cost function used by the evolutionary optimization is based on an ideal segmentation classifier trained as part of this development, which uses basic structural information available to scientists, such as the number of expected units, volume and topology. We show that a basic initial segmentation with the additional information allows our evolutionary method to find better segmentation results, compared to the baseline generated by the watershed.
|
18
|
Croll TI, Diederichs K, Fischer F, Fyfe CD, Gao Y, Horrell S, Joseph AP, Kandler L, Kippes O, Kirsten F, Müller K, Nolte K, Payne AM, Reeves M, Richardson JS, Santoni G, Stäb S, Tronrud DE, von Soosten LC, Williams CJ, Thorn A. Making the invisible enemy visible. Nat Struct Mol Biol 2021; 28:404-408. [PMID: 33972785 DOI: 10.1038/s41594-021-00593-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/18/2023]
Affiliation(s)
| | | | - Florens Fischer
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | | | - Yunyun Gao
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | | | | | - Luise Kandler
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | - Oliver Kippes
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | - Ferdinand Kirsten
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | - Konstantin Müller
- Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | - Kristopher Nolte
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | | | - Matthew Reeves
- Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | | | | | - Sabrina Stäb
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | | | - Lea C von Soosten
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | | | - Andrea Thorn
- Institut für Nanostruktur und Festkörperphysik, Universität Hamburg, Hamburg, Germany; Rudolf-Virchow-Zentrum, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| |
|
19
|
He J, Huang SY. Full-length de novo protein structure determination from cryo-EM maps using deep learning. Bioinformatics 2021; 37:3480-3490. [PMID: 33978686 DOI: 10.1093/bioinformatics/btab357] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/22/2020] [Revised: 04/03/2021] [Accepted: 05/08/2021] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION: Advances in microscopy instruments and image processing algorithms have led to an increasing number of cryo-EM maps. However, building accurate models for EM maps at 3-5 Å resolution remains a challenging and time-consuming process. With the rapid growth of deposited EM maps, there is an increasing gap between the maps and reconstructed/modeled 3-dimensional (3D) structures. Therefore, automatic reconstruction of atomic-accuracy full-atom structures from EM maps is pressingly needed. RESULTS: We present a semi-automatic de novo structure determination method using a deep learning-based framework, named DeepMM, which builds atomic-accuracy all-atom models from cryo-EM maps at near-atomic resolution. In our method, the main-chain and Cα positions as well as their amino acid and secondary structure types are predicted in the EM map using Densely Connected Convolutional Networks. DeepMM was extensively validated on 40 simulated maps at 5 Å resolution and 30 experimental maps at 2.6-4.8 Å resolution as well as an EMDB-wide data set of 2931 experimental maps at 2.6-4.9 Å resolution, and compared with state-of-the-art algorithms including RosettaES, MAINMAST, and Phenix. Overall, our DeepMM algorithm obtained a significant improvement over existing methods in terms of both accuracy and coverage in building full-length protein structures on all test sets, demonstrating the efficacy and general applicability of DeepMM. AVAILABILITY: http://huanglab.phys.hust.edu.cn/DeepMM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| |
|
20
|
Wang X, Alnabati E, Aderinwale TW, Maddhuri Venkata Subramaniya SR, Terashi G, Kihara D. Detecting protein and DNA/RNA structures in cryo-EM maps of intermediate resolution using deep learning. Nat Commun 2021; 12:2302. [PMID: 33863902 PMCID: PMC8052361 DOI: 10.1038/s41467-021-22577-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/25/2020] [Accepted: 03/19/2021] [Indexed: 12/21/2022] Open
Abstract
An increasing number of density maps of macromolecular structures, including proteins and DNA/RNA complexes, have been determined by cryo-electron microscopy (cryo-EM). Although maps at near-atomic resolution are now routinely reported, a substantial fraction of maps is still determined at intermediate or low resolution, where extracting structure information is not trivial. Here, we report a new computational method, Emap2sec+, which identifies DNA or RNA as well as the secondary structures of proteins in cryo-EM maps of 5 to 10 Å resolution. Emap2sec+ employs a deep residual convolutional neural network. Emap2sec+ assigns structural labels with associated probabilities at each voxel in a cryo-EM map, which will help structure modeling in an EM map. When tested on simulated and experimental maps, Emap2sec+ showed stable and high assignment accuracy for nucleotides in low-resolution maps and better performance for protein secondary structure assignments than its earlier version.
Affiliation(s)
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Eman Alnabati
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Tunde W Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | | | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA.
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
| |
|
21
|
Kyrilis FL, Belapure J, Kastritis PL. Detecting Protein Communities in Native Cell Extracts by Machine Learning: A Structural Biologist's Perspective. Front Mol Biosci 2021; 8:660542. [PMID: 33937337 PMCID: PMC8082361 DOI: 10.3389/fmolb.2021.660542] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/29/2021] [Accepted: 03/18/2021] [Indexed: 11/13/2022] Open
Abstract
Native cell extracts hold great promise for understanding the molecular structure of ordered biological systems at high resolution. This is because higher-order biomolecular interactions, dubbed protein communities, may be retained in their (near-)native state, in contrast to extensively purifying or artificially overexpressing the proteins of interest. Distinct machine-learning approaches are applied to discover protein-protein interactions within cell extracts, reconstruct dedicated biological networks, and report on protein community members from various organisms. Their validation is also important, e.g., by cross-linking mass spectrometry or cell biology methods. In addition, cell extracts are amenable to structural analysis by cryo-electron microscopy (cryo-EM), but due to their inherent complexity, sorting the structural signatures of protein communities derived by cryo-EM is a formidable task. The application of image-processing workflows inspired by machine-learning techniques would provide improvements in distinguishing structural signatures, correlating proteomic and network data to structural signatures and subsequently reconstructed cryo-EM maps, and, ultimately, characterizing unidentified protein communities at high resolution. In this review article, we summarize recent literature on detecting protein communities from native cell extracts and identify the remaining challenges and opportunities. We argue that the progress in, and the integration of, machine learning, cryo-EM, and complementary structural proteomics approaches would provide the basis for a multi-scale molecular description of protein communities within native cell extracts.
Affiliation(s)
- Fotis L. Kyrilis
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
| | - Jaydeep Belapure
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
| | - Panagiotis L. Kastritis
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Biozentrum, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
| |
|
22
|
Croll T, Diederichs K, Fischer F, Fyfe C, Gao Y, Horrell S, Joseph AP, Kandler L, Kippes O, Kirsten F, Müller K, Nolte K, Payne A, Reeves MG, Richardson J, Santoni G, Stäb S, Tronrud D, von Soosten L, Williams C, Thorn A. Making the invisible enemy visible. bioRxiv 2020:2020.10.07.307546. [PMID: 33052340 PMCID: PMC7553165 DOI: 10.1101/2020.10.07.307546] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/18/2023]
Abstract
During the COVID-19 pandemic, structural biologists rushed to solve the structures of the 28 proteins encoded by the SARS-CoV-2 genome in order to understand the viral life cycle and enable structure-based drug design. In addition to the 204 previously solved structures from SARS-CoV-1, 548 structures covering 16 of the SARS-CoV-2 viral proteins have been released in a span of only 6 months. These structural models serve as the basis for research to understand how the virus hijacks human cells, for structure-based drug design, and to aid in the development of vaccines. However, errors often occur in even the most careful structure determination - and may be even more common among these structures, which were solved quickly and under immense pressure. The Coronavirus Structural Task Force has responded to this challenge by rapidly categorizing, evaluating and reviewing all of these experimental protein structures in order to help downstream users and original authors. In addition, the Task Force provided improved models for key structures online, which have been used by Folding@Home, OpenPandemics, the EU JEDI COVID-19 challenge and others.
|