1
Si D, Chen J, Nakamura A, Chang L, Guan H. Smart de novo Macromolecular Structure Modeling from Cryo-EM Maps. J Mol Biol 2023; 435:167967. [PMID: 36681181 DOI: 10.1016/j.jmb.2023.167967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/04/2023] [Accepted: 01/12/2023] [Indexed: 01/20/2023]
Abstract
The study of macromolecular structures has expanded our understanding of the amazing cell machinery, and such knowledge has changed how the pharmaceutical industry develops new vaccines in recent years. Traditionally, X-ray crystallography has been the main method for structure determination; however, cryogenic electron microscopy (cryo-EM) has become increasingly popular due to recent advancements in hardware and software. The number of cryo-EM maps deposited in the EMDataResource (formerly EMDatabase) has been increasing dramatically since 2002, and it continues to do so. De novo macromolecular complex modeling is a labor-intensive process; therefore, it is highly desirable to develop software that can automate it. Here we discuss our automated, data-driven, and artificial intelligence approaches, including map processing, feature extraction, model building, and target identification. Recently, we have enabled DNA/RNA modeling in our deep learning-based prediction tool, DeepTracer. We have also developed DeepTracer-ID, a tool that can identify proteins solely based on the cryo-EM map. In this paper, we present our accumulated experience in developing deep learning-based methods for macromolecule modeling applications.
Affiliation(s)
- Dong Si
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
- Jason Chen
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
- Andrew Nakamura
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
- Luca Chang
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
- Haowen Guan
  - Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, United States
2
Beton JG, Cragnolini T, Kaleel M, Mulvaney T, Sweeney A, Topf M. Integrating model simulation tools and cryo‐electron microscopy. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- Joseph George Beton
  - Centre for Structural Systems Biology (CSSB) Leibniz‐Institut für Virologie (LIV) Hamburg Germany
- Tristan Cragnolini
  - Institute of Structural and Molecular Biology, Birkbeck and University College London London UK
- Manaz Kaleel
  - Centre for Structural Systems Biology (CSSB) Leibniz‐Institut für Virologie (LIV) Hamburg Germany
- Thomas Mulvaney
  - Centre for Structural Systems Biology (CSSB) Leibniz‐Institut für Virologie (LIV) Hamburg Germany
- Aaron Sweeney
  - Centre for Structural Systems Biology (CSSB) Leibniz‐Institut für Virologie (LIV) Hamburg Germany
- Maya Topf
  - Centre for Structural Systems Biology (CSSB) Leibniz‐Institut für Virologie (LIV) Hamburg Germany
3
Behkamal B, Naghibzadeh M, Saberi MR, Tehranizadeh ZA, Pagnani A, Al Nasr K. Three-Dimensional Graph Matching to Identify Secondary Structure Correspondence of Medium-Resolution Cryo-EM Density Maps. Biomolecules 2021; 11:1773. [PMID: 34944417 PMCID: PMC8698881 DOI: 10.3390/biom11121773] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/30/2021] [Revised: 11/18/2021] [Accepted: 11/20/2021] [Indexed: 01/15/2023]
Abstract
Cryo-electron microscopy (cryo-EM) is a structural technique that has played a significant role in protein structure determination in recent years. Compared to the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM is capable of producing images of much larger protein complexes. However, cryo-EM reconstructions are limited to medium-resolution (~4-10 Å) for some cases. At this resolution range, a cryo-EM density map can hardly be used to directly determine the structure of proteins at atomic level resolutions, or even at their amino acid residue backbones. At such a resolution, only the position and orientation of secondary structure elements (SSEs) such as α-helices and β-sheets are observable. Consequently, finding the mapping of the secondary structures of the modeled structure (SSEs-A) to the cryo-EM map (SSEs-C) is one of the primary concerns in cryo-EM modeling. To address this issue, this study proposes a novel automatic computational method to identify SSEs correspondence in three-dimensional (3D) space. Initially, through a modeling of the target sequence with the aid of extracting highly reliable features from a generated 3D model and map, the SSEs matching problem is formulated as a 3D vector matching problem. Afterward, the 3D vector matching problem is transformed into a 3D graph matching problem. Finally, a similarity-based voting algorithm combined with the principle of least conflict (PLC) concept is developed to obtain the SSEs correspondence. To evaluate the accuracy of the method, a testing set of 25 experimental and simulated maps with a maximum of 65 SSEs is selected. Comparative studies are also conducted to demonstrate the superiority of the proposed method over some state-of-the-art techniques. The results demonstrate that the method is efficient, robust, and works well in the presence of errors in the predicted secondary structures of the cryo-EM images.
Affiliation(s)
- Bahareh Behkamal
  - Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad 9177948974, Iran
- Mahmoud Naghibzadeh
  - Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad 9177948974, Iran
- Mohammad Reza Saberi
  - Medicinal Chemistry Department, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad 9177899191, Iran
  - Bioinformatics Research Group, Mashhad University of Medical Sciences, Mashhad 9177899191, Iran
- Zeinab Amiri Tehranizadeh
  - Medicinal Chemistry Department, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad 9177899191, Iran
- Andrea Pagnani
  - Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo, Italy
  - INFN, Sezione di Torino, I-10125 Torino, Italy
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
4
Alshammari M, He J. Combining Cryo-EM Density Map and Residue Contact for Protein Secondary Structure Topologies. Molecules 2021; 26:7049. [PMID: 34834140 PMCID: PMC8624718 DOI: 10.3390/molecules26227049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Revised: 11/01/2021] [Accepted: 11/15/2021] [Indexed: 11/23/2022]
Abstract
Although atomic structures have been determined directly from cryo-EM density maps with high resolutions, current structure determination methods for medium resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for α-helices and β-strands of a protein. A topology of secondary structures defines the mapping between a set of sequence segments and a set of traces of secondary structures in three-dimensional space. In order to enhance accuracy in ranking secondary structure topologies, we explored a method that combines three sources of information: a set of sequence segments in 1D, a set of amino acid contact pairs in 2D, and a set of traces in 3D at the secondary structure level. A test of fourteen cases shows that the accuracy of predicted secondary structures is critical for deriving topologies. The use of significant long-range contact pairs is most effective at enriching the rank of the maximum-match topology for proteins with a large number of secondary structures, if the secondary structure prediction is fairly accurate. It was observed that the enrichment depends on the quality of initial topology candidates in this approach. We provide detailed analysis in various cases to show the potential and challenge when combining three sources of information.
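The notion of a topology described above can be illustrated with a minimal sketch. It assumes equal numbers of sequence segments and traces, and it ranks candidates with a simple length-mismatch score; that score, and the names `enumerate_topologies` and `length_mismatch`, are hypothetical stand-ins for illustration and are not the contact-based ranking used in the paper.

```python
from itertools import permutations, product


def enumerate_topologies(n):
    """Yield every topology for n sequence segments and n traces.

    A topology is a tuple whose i-th item is (segment_index, direction):
    the segment assigned to trace i and the direction (+1 or -1) in which
    the segment runs along that trace. There are n! * 2**n candidates.
    """
    for order in permutations(range(n)):
        for directions in product((+1, -1), repeat=n):
            yield tuple(zip(order, directions))


def length_mismatch(topology, segment_lengths, trace_lengths):
    """Illustrative score (an assumption): total length disagreement, lower is better."""
    return sum(
        abs(segment_lengths[segment] - trace_lengths[trace])
        for trace, (segment, _direction) in enumerate(topology)
    )


if __name__ == "__main__":
    segment_lengths = [10, 20, 30]   # hypothetical segment lengths (residues)
    trace_lengths = [30, 10, 20]     # hypothetical trace lengths, same units
    best = min(
        enumerate_topologies(3),
        key=lambda t: length_mismatch(t, segment_lengths, trace_lengths),
    )
    print(best)
```

The factorial growth of the candidate set is one reason additional evidence, such as predicted residue contacts, is useful for ranking.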
Affiliation(s)
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
5
He J, Huang SY. EMNUSS: a deep learning framework for secondary structure annotation in cryo-EM maps. Brief Bioinform 2021; 22:bbab156. [PMID: 33954706 PMCID: PMC8574626 DOI: 10.1093/bib/bbab156] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/18/2021] [Revised: 03/30/2021] [Accepted: 04/06/2021] [Indexed: 02/06/2023]
Abstract
Cryo-electron microscopy (cryo-EM) has become one of the important experimental methods in structure determination. However, despite the rapid growth in the number of deposited cryo-EM maps motivated by advances in microscopy instruments and image processing algorithms, building accurate structure models for cryo-EM maps remains a challenge. Protein secondary structure information, which can be extracted from EM maps, is beneficial for cryo-EM structure modeling. Here, we present a novel secondary structure annotation framework for cryo-EM maps at both intermediate and high resolutions, named EMNUSS. EMNUSS adopts a three-dimensional (3D) nested U-net architecture to assign secondary structures for EM maps. Tested on three diverse datasets including simulated maps, middle-resolution experimental maps, and high-resolution experimental maps, EMNUSS demonstrated its accuracy and robustness in identifying the secondary structures for cryo-EM maps of various resolutions. The EMNUSS program is freely available at http://huanglab.phys.hust.edu.cn/EMNUSS.
Affiliation(s)
- Jiahua He
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
6
SSA: Subset sum approach to protein β-sheet structure prediction. Comput Biol Chem 2021; 94:107552. [PMID: 34390958 DOI: 10.1016/j.compbiolchem.2021.107552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2020] [Revised: 07/21/2021] [Accepted: 07/27/2021] [Indexed: 11/22/2022]
Abstract
The three-dimensional structures of proteins determine their functions, and incorrect folding of their β-strands can be the cause of many diseases. There are two major approaches for determining protein structures: computational prediction and experimental methods that employ technologies such as cryo-electron microscopy. Because experimental methods are costly, involve lengthy processes with extended wait times, and can yield incomplete results, computational prediction is an attractive alternative. As the focus of the present paper, β-sheet structure prediction is a major portion of overall protein structure prediction. Prediction of other substructures, such as α-helices, is simpler and has lower computational time complexity. Brute force methods are the most common approach, and dynamic programming is also utilized to generate all possible conformations. The current study introduces the Subset Sum Approach (SSA) for direct search space generation, which is shown to outperform the dynamic programming approach in terms of both time and space. For the first time, the present work has calculated both the state space cardinality of the dynamic programming approach and the search space cardinality of the general brute force approaches. Under a set of pruning rules, SSA has demonstrated higher efficiency with respect to both time and accuracy in comparison to state-of-the-art methods.
7
Alshammari M, He J. Combine Cryo-EM Density Map and Residue Contact for Protein Structure Prediction - A Case Study. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2020; 2020:110. [PMID: 35838376 PMCID: PMC9279007 DOI: 10.1145/3388440.3414708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Cryo-electron microscopy is a major structure determination technique for large molecular machines and membrane-associated complexes. Although atomic structures have been determined directly from cryo-EM density maps with high resolutions, current structure determination methods for medium resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for α-helices and β-strands of a protein. When combined with secondary structure sequence segments predicted from a protein sequence, it is possible to generate a set of likely topologies of α-traces and β-sheet traces. A topology describes the overall folding relationship among secondary structures; it is a critical piece of information for deriving the corresponding atomic structure. We propose a method for protein structure prediction that combines three sources of information: the secondary structure traces detected from the cryo-EM density map, predicted secondary structure sequence segments, and amino acid contact pairs predicted using MULTICOM. A case study shows that using amino acid contact prediction from MULTICOM improves the ranking of the true topology. Our observations convey that using a small set of highly voted secondary structure contact pairs enhances the ranking in all experiments conducted for this case.
8
Deng Y, Mu Y, Sazzed S, Sun J, He J. Using Curriculum Learning in Pattern Recognition of 3-dimensional Cryo-electron Microscopy Density Maps. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2020; 2020:112. [PMID: 35838357 PMCID: PMC9279008 DOI: 10.1145/3388440.3414710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Although cryo-electron microscopy (cryo-EM) has been successfully used to derive atomic structures for many proteins, it is still challenging to derive atomic structure when the resolution of cryo-EM density maps is in the medium range, e.g., 5-10 Å. Studies have attempted to utilize machine learning methods, especially deep neural networks, to build predictive models for the detection of protein secondary structures from cryo-EM images, which ultimately helps to derive the atomic structure of proteins. However, the large variation in data quality makes it challenging to train a deep neural network with high prediction accuracy. Curriculum learning has been shown to be an effective learning paradigm in machine learning. In this paper, we present a study using curriculum learning as a more effective way to utilize cryo-EM density maps of varying quality. We investigated three distinct training curricula that differ in whether and how images used for training in the past are reused while the network is continually trained using new images. A total of 1,382 3-dimensional cryo-EM images were extracted from density maps of the Electron Microscopy Data Bank in our study. Our results indicate that learning with a curriculum significantly improves the performance of the final trained network when the forgetting problem is properly addressed.
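As a rough illustration of how curricula can differ in reusing past training images, the sketch below orders samples by a quality score and splits them into stages. The quality scores, the easiest-first ordering, and the `reuse_past` switch are assumptions made for this example; the abstract does not specify the three curricula in this detail.

```python
def curriculum_stages(samples, n_stages, reuse_past=True):
    """Split (sample_id, quality) pairs into ordered training stages.

    Samples are sorted from highest to lowest quality (assumed to mean
    easiest first). With reuse_past=True each stage also replays every
    sample from earlier stages, one simple way to counter forgetting;
    with reuse_past=False each stage sees only its new samples.
    """
    ordered = sorted(samples, key=lambda s: s[1], reverse=True)
    size = -(-len(ordered) // n_stages)  # ceiling division
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]

    stages, seen = [], []
    for chunk in chunks:
        stage = seen + chunk if reuse_past else list(chunk)
        stages.append([sample_id for sample_id, _quality in stage])
        seen = seen + chunk
    return stages


if __name__ == "__main__":
    # Hypothetical image identifiers with hypothetical quality scores.
    images = [("a", 0.9), ("b", 0.5), ("c", 0.7), ("d", 0.2)]
    print(curriculum_stages(images, 2, reuse_past=True))
    print(curriculum_stages(images, 2, reuse_past=False))
```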
Affiliation(s)
- Yangmei Deng
  - Department of Computer Science, Old Dominion University, Norfolk VA USA
- Yongcheng Mu
  - Department of Computer Science, Old Dominion University, Norfolk VA USA
- Salim Sazzed
  - Department of Computer Science, Old Dominion University, Norfolk VA USA
- Jiangwen Sun
  - Department of Computer Science, Old Dominion University, Norfolk VA USA
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk VA USA
9
Si D, Moritz SA, Pfab J, Hou J, Cao R, Wang L, Wu T, Cheng J. Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps. Sci Rep 2020; 10:4282. [PMID: 32152330 PMCID: PMC7063051 DOI: 10.1038/s41598-020-60598-y] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/25/2019] [Accepted: 02/10/2020] [Indexed: 11/29/2022]
Abstract
Cryo-electron microscopy (cryo-EM) has become a leading technology for determining protein structures. Recent advances in this field have allowed for atomic resolution. However, predicting the backbone trace of a protein has remained a challenge on all but the most pristine density maps (<2.5 Å resolution). Here we introduce a deep learning model that uses a set of cascaded convolutional neural networks (CNNs) to predict Cα atoms along a protein's backbone structure. The cascaded-CNN (C-CNN) is a novel deep learning architecture comprised of multiple CNNs, each predicting a specific aspect of a protein's structure. This model predicts secondary structure elements (SSEs), backbone structure, and Cα atoms, combining the results of each to produce a complete prediction map. The cascaded-CNN is a semantic segmentation image classifier and was trained using thousands of simulated density maps. This method is largely automatic and only requires a recommended threshold value for each protein density map. A specialized tabu-search path walking algorithm was used to produce an initial backbone trace with Cα placements. A helix-refinement algorithm made further improvements to the α-helix SSEs of the backbone trace. Finally, a novel quality assessment-based combinatorial algorithm was used to effectively map protein sequences onto Cα traces to obtain full-atom protein structures. This method was tested on 50 experimental maps between 2.6 Å and 4.4 Å resolution. It outperformed several state-of-the-art prediction methods including Rosetta de-novo, MAINMAST, and a Phenix based method by producing the most complete predicted protein structures, as measured by percentage of found Cα atoms. This method accurately predicted 88.9% (mean) of the Cα atoms within 3 Å of a protein's backbone structure surpassing the 66.8% mark achieved by the leading alternate method (Phenix based fully automatic method) on the same set of density maps. 
The C-CNN also achieved an average root-mean-square deviation (RMSD) of 1.24 Å on a set of 50 experimental density maps which was tested by the Phenix based fully automatic method. The source code and demo of this research has been published at https://github.com/DrDongSi/Ca-Backbone-Prediction.
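The two evaluation figures quoted above, the percentage of Cα atoms found within 3 Å and the RMSD, can be sketched as follows. The nearest-neighbour matching rule and the function name `match_ca_atoms` are assumptions for illustration; the paper's exact matching protocol may differ.

```python
import math


def match_ca_atoms(predicted, native, cutoff=3.0):
    """Return (coverage, rmsd) for predicted versus native Cα coordinates.

    Each native Cα is paired with its nearest predicted Cα and counted as
    found when the distance is at most `cutoff` angstroms. This greedy rule
    is an assumption; it allows one predicted atom to match several natives.
    RMSD is computed over the matched pairs only.
    """
    squared = []
    for atom in native:
        nearest = min(math.dist(atom, p) for p in predicted)
        if nearest <= cutoff:
            squared.append(nearest * nearest)
    coverage = len(squared) / len(native)
    rmsd = math.sqrt(sum(squared) / len(squared)) if squared else float("nan")
    return coverage, rmsd


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms.
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    predicted = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (20.0, 0.0, 0.0)]
    print(match_ca_atoms(predicted, native))
```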
Affiliation(s)
- Dong Si
  - Division of Computing & Software Systems, University of Washington, Bothell, WA, 98011, USA
- Spencer A Moritz
  - Division of Computing & Software Systems, University of Washington, Bothell, WA, 98011, USA
- Jonas Pfab
  - Division of Computing & Software Systems, University of Washington, Bothell, WA, 98011, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, Saint Louis, MO, 63103, USA
  - Program in Bioinformatics & Computational Biology, Saint Louis University, Saint Louis, MO, 63103, USA
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, 98447, USA
- Liguo Wang
  - Department of Biological Structure, University of Washington, Seattle, WA, 98185, USA
- Tianqi Wu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
10
Protein secondary structure detection in intermediate-resolution cryo-EM maps using deep learning. Nat Methods 2019; 16:911-917. [PMID: 31358979 PMCID: PMC6717539 DOI: 10.1038/s41592-019-0500-1] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 12/03/2018] [Accepted: 06/24/2019] [Indexed: 02/05/2023]
Abstract
An increasing number of protein structures have been solved by cryo-electron microscopy (cryo-EM). Although structures determined at near-atomic resolution are now routinely reported, many density maps are still determined at an intermediate resolution, where extracting structure information is still a challenge. We have developed a computational method, Emap2sec, which identifies the secondary structures of proteins (α helices, β sheets, and other structures) in an EM map of 5 to 10 Å resolution. Emap2sec uses a 3D deep convolutional neural network to assign secondary structure to each grid point in an EM map. We tested Emap2sec on 6.0 and 10.0 Å resolution EM maps simulated from 34 structures, as well as on 43 maps determined experimentally at 5.0 to 9.5 Å resolution. Emap2sec was able to clearly identify the secondary structures in many maps tested, and showed substantially better performance than existing methods.
11
Haslam D, Zeng T, Li R, He J. Exploratory Studies Detecting Secondary Structures in Medium Resolution 3D Cryo-EM Images Using Deep Convolutional Neural Networks. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2018; 2018:628-632. [PMID: 35838356 PMCID: PMC9279009 DOI: 10.1145/3233547.3233704] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/15/2023]
Abstract
Cryo-electron microscopy (cryo-EM) is an emerging biophysical technique for structural determination of protein complexes. However, accurate detection of secondary structures is still challenging when cryo-EM density maps are at medium resolutions (5-10 Å). Most existing methods are image processing methods that do not fully utilize available images in the cryo-EM database. In this paper, we present a deep learning approach to segment secondary structure elements as helices and β-sheets from medium-resolution density maps. The proposed 3D convolutional neural network is shown to detect secondary structure locations with an F1 score between 0.79 and 0.88 for six simulated test cases. The architecture was also applied to an experimentally derived cryo-EM density map with good accuracy.
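For readers unfamiliar with the F1 score reported above, a minimal sketch of how it can be computed for one class over per-voxel labels is given below. The label alphabet ("H" for helix, "E" for β-sheet, "-" for background) and the flat-list representation are assumptions for illustration; the paper's evaluation may define matches differently.

```python
def f1_score(predicted, truth, label):
    """F1 score for one class over two equal-length label sequences.

    F1 is the harmonic mean of precision and recall:
    F1 = 2 * precision * recall / (precision + recall).
    """
    pairs = list(zip(predicted, truth))
    true_pos = sum(1 for p, t in pairs if p == label and t == label)
    false_pos = sum(1 for p, t in pairs if p == label and t != label)
    false_neg = sum(1 for p, t in pairs if p != label and t == label)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical per-voxel labels, flattened from a 3D grid.
    predicted = ["H", "H", "E", "-", "H", "E"]
    truth = ["H", "E", "E", "-", "H", "H"]
    print(f1_score(predicted, truth, "H"))
```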
Affiliation(s)
- Devin Haslam
  - Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Tao Zeng
  - Department of Computer Science, Washington State University, Pullman, WA 99164
- Jing He
  - Corresponding author: Jing He
12
Haslam D, Sazzed S, Wriggers W, Kovcas J, Song J, Auer M, He J. A Pattern Recognition Tool for Medium-resolution Cryo-EM Density Maps and Low-resolution Cryo-ET Density maps. BIOINFORMATICS RESEARCH AND APPLICATIONS : 14TH INTERNATIONAL SYMPOSIUM, ISBRA 2018, BEIJING, CHINA, JUNE 8-11, 2018, PROCEEDINGS. ISBRA (CONFERENCE) (14TH : 2018 : BEIJING, CHINA) 2018; 10847:233-238. [PMID: 36383494 PMCID: PMC9645795 DOI: 10.1007/978-3-319-94968-0_22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/16/2023]
Abstract
Cryo-electron microscopy (Cryo-EM) and cryo-electron tomography (cryo-ET) produce 3-D density maps of biological molecules at a range of resolution levels. Pattern recognition tools are important in distinguishing biological components from volumetric maps with the available resolutions. One of the most distinct characters in density maps at medium (5-10 Å) resolution is the visibility of protein secondary structures. Although computational methods have been developed, the accurate detection of helices and β-strands from cryo-EM density maps is still an active research area. We have developed a tool for protein secondary structure detection and evaluation of medium resolution 3-D cryo-EM density maps which combines three computational methods (SSETracer, StrandTwister, and AxisComparison). The program was integrated in UCSF Chimera, a popular visualization software in the cryo-EM community. In related work, we have developed BundleTrac, a computational method to trace filaments in a bundle from lower resolution cryo-ET density maps. It has been applied to actin filament tracing in stereocilia with good accuracy and can be potentially added as a tool in Chimera.
Affiliation(s)
- Devin Haslam
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Salim Sazzed
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Willy Wriggers
  - Department of Mechanical and Aerospace Engineering, Old Dominion University, Norfolk, VA 23529, USA
- Julio Kovcas
  - Department of Mechanical and Aerospace Engineering, Old Dominion University, Norfolk, VA 23529, USA
- Junha Song
  - Cell and Tissue Imaging, Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Manfred Auer
  - Cell and Tissue Imaging, Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
13
Al Nasr K, Yousef F, Jebril R, Jones C. Analytical Approaches to Improve Accuracy in Solving the Protein Topology Problem. Molecules 2018; 23:E28. [PMID: 29360779 PMCID: PMC6017786 DOI: 10.3390/molecules23020028] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/05/2017] [Revised: 01/19/2018] [Accepted: 01/19/2018] [Indexed: 11/17/2022]
Abstract
To take advantage of recent advances in genomics and proteomics, it is critical that the three-dimensional physical structure of biological macromolecules be determined. Cryo-Electron Microscopy (cryo-EM) is a promising and improving method for obtaining this data; however, the resolution is often not sufficient to directly determine the atomic-scale structure. Despite this, information on secondary structure locations is detectable. De novo modeling is a computational approach to modeling these macromolecular structures based on cryo-EM derived data. During de novo modeling, a mapping between detected secondary structures and the underlying amino acid sequence must be identified. DP-TOSS (Dynamic Programming for determining the Topology Of Secondary Structures) is one tool that attempts to automate the creation of this mapping. By treating the correspondence between the detected structures and the structures predicted from sequence data as a constraint graph problem, DP-TOSS achieved good accuracy in its original iteration. In this paper, we propose modifications to the scoring methodology of DP-TOSS to improve its accuracy. Three scoring schemes were applied to DP-TOSS and tested: (i) a skeleton-based scoring function; (ii) a geometry-based analytical function; and (iii) a multi-well potential energy-based function. A test of 25 proteins shows that a combination of these schemes can improve the performance of DP-TOSS to solve the topology determination problem for macromolecule proteins.
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
- Feras Yousef
  - Department of Mathematics, The University of Jordan, Amman 11942, Jordan
- Ruba Jebril
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
- Christopher Jones
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
14
Islam T, Poteat M, He J. Quantification of Twist from the Central Lines of β-Strands. J Comput Biol 2018; 25:114-120. [PMID: 29313736 DOI: 10.1089/cmb.2017.0174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Since the discovery of right-handed twist of a β-strand, many studies have been conducted to understand the twist. Given the atomic structure of a protein, twist angles have been defined using atomic positions of the backbone. However, limited study is available to characterize twist when the atomic positions are not available, but the central lines of β-strands are. Recent studies in cryoelectron microscopy show that it is possible to predict the central lines of β-strands from a medium-resolution density map. Accurate measurement of twist angles is important in identification of β-strands from such density maps. We propose an effective method to quantify twist angles from a set of splines. In a data set of 55 pairs of β-strands from 11 β-sheets of 11 proteins, the spline measurement shows comparable results as measured using the discrete method that uses atomic positions directly, particularly in capturing twist angle change along a pair, different levels of twist among different pairs, and the average of twist angles. The proposed method provides an alternative method to characterize twist using the central lines of a β-sheet.
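One way to picture a twist measurement between two neighboring central lines is as a dihedral angle between their local directions about the axis that joins them. The sketch below assumes each central line is already sampled as a list of paired 3D points; it is a simplified discrete stand-in and not the spline-based method proposed in the paper. The sign of the angle follows the convention fixed by this code, which is an assumption.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def twist_angle(a0, a1, b0, b1):
    """Signed dihedral angle in degrees between directions a0->a1 and b0->b1.

    The rotation is measured about the inter-strand axis a0->b0 after
    removing each direction's component along that axis.
    """
    axis = _sub(b0, a0)
    norm = math.sqrt(_dot(axis, axis))
    axis = tuple(x / norm for x in axis)
    da, db = _sub(a1, a0), _sub(b1, b0)
    va = _sub(da, tuple(_dot(da, axis) * x for x in axis))
    vb = _sub(db, tuple(_dot(db, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, va), vb), _dot(va, vb)))


def twist_profile(line_a, line_b):
    """Twist angle at each consecutive pair of sampled points along two lines."""
    n = min(len(line_a), len(line_b)) - 1
    return [twist_angle(line_a[i], line_a[i + 1], line_b[i], line_b[i + 1])
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical strands 4.8 angstroms apart; the second is tilted by 20 degrees.
    tilt = math.radians(20.0)
    print(twist_angle((0, 0, 0), (0, 0, 1),
                      (4.8, 0, 0), (4.8, math.sin(tilt), math.cos(tilt))))
```

Averaging the values returned by `twist_profile` gives a single summary angle for a strand pair, while the list itself shows how twist changes along the pair.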
Affiliation(s)
- Tunazzina Islam
  - Department of Computer Science, Old Dominion University, Norfolk, Virginia
- Michael Poteat
  - Department of Computer Science, Old Dominion University, Norfolk, Virginia
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk, Virginia
15
Ismer J, Rose AS, Tiemann JKS, Hildebrand PW. A fragment based method for modeling of protein segments into cryo-EM density maps. BMC Bioinformatics 2017; 18:475. [PMID: 29132296 PMCID: PMC5683378 DOI: 10.1186/s12859-017-1904-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/24/2017] [Accepted: 11/01/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Single-particle analysis of electron cryo-microscopy (cryo-EM) is a key technology for elucidation of macromolecular structures. Recent technical advances in hardware and software have significantly enhanced the resolution of cryo-EM density maps and broadened the applicability and the circle of users. To facilitate modeling of macromolecules into cryo-EM density maps, fast and easy-to-use modeling methods are now in demand. RESULTS: Here we investigated and benchmarked the suitability of a classical and well-established fragment-based approach for modeling of segments into cryo-EM density maps (termed FragFit). FragFit uses a hierarchical strategy to select fragments from a pre-calculated set of billions of fragments derived from structures deposited in the Protein Data Bank, based on sequence similarity, fit of stem atoms, and fit to a cryo-EM density map. The user only has to specify the sequence of the segment and the number of N- and C-terminal stem residues in the protein. Using a representative data set of protein structures, we show that protein segments can be accurately modeled into cryo-EM density maps of different resolutions by FragFit. Prediction quality depends on segment length, the type of secondary structure of the segment, and the local quality of the map. CONCLUSION: The fast and automated calculation of FragFit makes it suitable for interactive web applications, e.g., to model missing segments, flexible protein parts, or hinge regions into cryo-EM density maps.
Affiliation(s)
- Jochen Ismer
- Institute of Medical Physics and Biophysics, University Medicine Berlin, Charitéplatz 1, 10117, Berlin, Germany
- Alexander S Rose
- Institute of Medical Physics and Biophysics, University Medicine Berlin, Charitéplatz 1, 10117, Berlin, Germany; RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, CA, 92093-0743, USA
- Johanna K S Tiemann
- Institute of Medical Physics and Biophysics, University Medicine Berlin, Charitéplatz 1, 10117, Berlin, Germany; Institute of Medical Physics and Biophysics, University Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Peter W Hildebrand
- Institute of Medical Physics and Biophysics, University Medicine Berlin, Charitéplatz 1, 10117, Berlin, Germany; Institute of Medical Physics and Biophysics, University Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
|
16
|
Biswas A, Ranjan D, Zubair M, Zeil S, Nasr KA, He J. An Effective Computational Method Incorporating Multiple Secondary Structure Predictions in Topology Determination for Cryo-EM Images. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:578-586. [PMID: 27008671 PMCID: PMC5071113 DOI: 10.1109/tcbb.2016.2543721] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 06/05/2023]
Abstract
A key idea in de novo modeling of a medium-resolution density image obtained from cryo-electron microscopy is to compute the optimal mapping between the secondary structure traces observed in the density image and those predicted on the protein sequence. When secondary structures are not determined precisely, either from the image or from the amino acid sequence of the protein, the computational problem becomes more complex. We present an efficient method that addresses the secondary structure placement problem in the presence of multiple secondary structure predictions and computes the optimal mapping. We tested the method using 12 simulated images from α-proteins and two cryo-EM images of α-β proteins. We observed that the rank of the true topologies is consistently improved by using multiple secondary structure predictions instead of a single prediction. The results show that the algorithm is robust and works well even when errors/misses in the predicted secondary structures are present in the image or the sequence. The results also show that the algorithm is efficient and is able to handle proteins with as many as 33 helices.
Affiliation(s)
- Abhishek Biswas
- Dept. of Computer Science, Old Dominion University, Norfolk, VA 23529
- Desh Ranjan
- Dept. of Computer Science, Old Dominion University, Norfolk, VA 23529
- Mohammad Zubair
- Dept. of Computer Science, Old Dominion University, Norfolk, VA 23529
- Stephanie Zeil
- Dept. of Computer Science, Old Dominion University, Norfolk, VA 23529
- Kamal Al Nasr
- Dept. of Computer Science, Tennessee State University, Nashville, TN 37209
- Jing He
- Dept. of Computer Science, Old Dominion University, Norfolk, VA 23529
|
17
|
Li R, Si D, Zeng T, Ji S, He J. Deep Convolutional Neural Networks for Detecting Secondary Structures in Protein Density Maps from Cryo-Electron Microscopy. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2017; 2016:41-46. [PMID: 29770260 DOI: 10.1109/bibm.2016.7822490] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 11/09/2022]
Abstract
The detection of protein secondary structure from three-dimensional (3D) cryo-electron microscopy (cryo-EM) images is still a challenging task when the spatial resolution of the images is at the medium level (5-10 Å). Prior research focused on local features that may not capture the global information of image objects. In this study, we propose to use deep learning methods to extract highly representative global features and then automatically detect secondary structures of proteins. In particular, we build a convolutional neural network (CNN) classifier that predicts, for every individual voxel in a 3D cryo-EM image, the probability of each secondary structure label, such as α-helix, β-sheet, and background. To effectively incorporate the 3D spatial information in protein structures, we propose to perform 3D convolutions in the convolutional layers of CNNs. We show that the proposed CNN classifier can outperform an existing SVM method in identifying the secondary structure elements of proteins from medium-resolution 3D cryo-EM images.
Affiliation(s)
- Rongjian Li
- Department of Computer Science, Old Dominion University, Norfolk, Virginia 23529
- Dong Si
- Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011
- Tao Zeng
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164
- Shuiwang Ji
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk, Virginia 23529
|
18
|
Si D, He J. Modeling Beta-Traces for Beta-Barrels from Cryo-EM Density Maps. BIOMED RESEARCH INTERNATIONAL 2017; 2017:1793213. [PMID: 28164115 PMCID: PMC5259677 DOI: 10.1155/2017/1793213] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/15/2016] [Accepted: 12/08/2016] [Indexed: 01/09/2023]
Abstract
Cryo-electron microscopy (cryo-EM) has produced density maps of various resolutions. Although α-helices can be detected from density maps at 5-8 Å resolution, β-strands are challenging to detect in such density maps due to the close spacing of β-strands. The variety of shapes of β-sheets adds to the complexity of β-strand detection from density maps. We propose a new approach to model traces of β-strands for β-barrel density regions that are extracted from cryo-EM density maps. In the test containing eight β-barrels extracted from experimental cryo-EM density maps at 5.5 Å-8.25 Å resolution, StrandRoller detected about 74.26% of the amino acids in the β-strands with an overall 2.05 Å 2-way distance between the detected β-traces and the observed ones, if the best of the fifteen detection cases is considered.
Affiliation(s)
- Dong Si
- Division of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, USA
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
|
19
|
Zeil S, Kovacs J, Wriggers W, He J. Comparing an Atomic Model or Structure to a Corresponding Cryo-electron Microscopy Image at the Central Axis of a Helix. J Comput Biol 2017; 24:52-67. [PMID: 27936925 PMCID: PMC5220566 DOI: 10.1089/cmb.2016.0145] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 10/20/2022]
Abstract
Three-dimensional density maps of biological specimens from cryo-electron microscopy (cryo-EM) can be interpreted in the form of atomic models that are modeled into the density, or they can be compared to known atomic structures. When the central axis of a helix is detectable in a cryo-EM density map, it is possible to quantify the agreement between this central axis and a central axis calculated from the atomic model or structure. We propose a novel arc-length association method to compare the two axes reliably. This method was applied to 79 helices in simulated density maps and six case studies using cryo-EM maps at 6.4-7.7 Å resolution. The arc-length association method is then compared to three existing measures that evaluate the separation of two helical axes: a two-way distance between point sets, the length difference between two axes, and the individual amino acid detection accuracy. The results show that our proposed method sensitively distinguishes lateral and longitudinal discrepancies between the two axes, which makes the method particularly suitable for the systematic investigation of cryo-EM map-model pairs.
Affiliation(s)
- Stephanie Zeil
- Department of Computer Science, Old Dominion University, Norfolk, Virginia
- Julio Kovacs
- Department of Mechanical and Aerospace Engineering and Institute of Biomedical Engineering, Old Dominion University, Norfolk, Virginia
- Willy Wriggers
- Department of Mechanical and Aerospace Engineering and Institute of Biomedical Engineering, Old Dominion University, Norfolk, Virginia
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk, Virginia
|
20
|
Haslam D, Zubair M, Ranjan D, Biswas A, He J. CHALLENGES IN MATCHING SECONDARY STRUCTURES IN CRYO-EM: AN EXPLORATION. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2016; 2016:1714-1719. [PMID: 29770261 PMCID: PMC5952047 DOI: 10.1109/bibm.2016.7822776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Cryo-electron microscopy is a fast-emerging biophysical technique for structural determination of large protein complexes. While more atomic structures are being determined using this technique, it is still challenging to derive atomic structures from density maps produced at medium resolution when no suitable templates are available. A critical step in structure determination is determining how a protein chain threads through the 3-dimensional density map. A dynamic programming method was previously developed to generate the K best matches of secondary structures between the density map and its protein sequence using shortest paths in a related weighted graph. We discuss challenges associated with the creation of the weighted graph and explore heuristic methods to solve the problem of matching secondary structures.
Affiliation(s)
- Devin Haslam
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529
- Mohammad Zubair
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529
- Desh Ranjan
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529
|
21
|
Constrained cyclic coordinate descent for cryo-EM images at medium resolutions: beyond the protein loop closure problem. ROBOTICA 2016; 34:1777-1790. [DOI: 10.1017/s0263574716000242] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/05/2022]
Abstract
SUMMARY: The cyclic coordinate descent (CCD) method is a popular loop closure method in protein structure modeling. It is a robotics algorithm originally developed for inverse kinematic applications. We demonstrate an effective method of building the backbone of protein structure models using the principle of CCD and a guiding trace. For medium-resolution 3-dimensional (3D) images derived using cryo-electron microscopy (cryo-EM), it is possible to obtain guiding traces of secondary structures and their skeleton connections. Our new method, constrained cyclic coordinate descent (CCCD), builds α-helices, β-strands, and loops quickly and fairly accurately along predefined traces. We show that it is possible to build the entire backbone of a protein fairly accurately when the guiding traces are accurate. In a test of 10 proteins, the models constructed using CCCD show an average of 3.91 Å of backbone root mean square deviation (RMSD). When the CCCD method is incorporated in a simulated annealing framework to sample possible shift, translation, and rotation freedom, the models built with the true topology were ranked high on the list, with an average backbone RMSD100 of 3.76 Å. CCCD is an effective method for modeling atomic structures after secondary structure traces and skeletons are extracted from 3D cryo-EM images.
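For readers unfamiliar with the backbone RMSD figures quoted in this abstract, the following is a minimal sketch of how a root mean square deviation between a built model and a reference structure can be computed. It is an illustration only: the abstract does not say which backbone atoms are included or how the two structures are superimposed, so the sketch assumes two equally long lists of matched, already aligned coordinates. The function name `rmsd` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points.

    Assumes the two lists are in the same atom order and already superimposed;
    no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in Å, not data from the paper.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(rmsd(model, reference))
```

In practice the two structures would first be optimally superimposed before the deviation is measured; that step is deliberately left out of this sketch.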
|
22
|
He J, Zeil S, Hallak H, McKaig K, Kovacs J, Wriggers W. Comparison of an Atomic Model and Its Cryo-EM Image at the Central Axis of a Helix. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2015; 2015:1253-1259. [PMID: 27280059 PMCID: PMC4894056 DOI: 10.1109/bibm.2015.7359860] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Cryo-electron microscopy (cryo-EM) is an important biophysical technique that produces three-dimensional (3D) density maps at different resolutions. Because more and more models are being produced from cryo-EM density maps, validation of the models is becoming important. We propose a method for measuring local agreement between a model and the density map using the central axis of the helix. This method was tested using 19 helices from cryo-EM density maps between 5.5 Å and 7.2 Å resolution and 94 helices from simulated density maps. This method distinguished most of the well-fitting helices, although challenges exist for shorter helices.
Affiliation(s)
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Stephanie Zeil
- Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Hussam Hallak
- Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Kele McKaig
- Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Julio Kovacs
- Department of Mechanical & Aerospace Engineering, Old Dominion University, Norfolk, VA, 23529
- Willy Wriggers
- Department of Mechanical & Aerospace Engineering, Old Dominion University, Norfolk, VA, 23529
|
23
|
Wriggers W, He J. Numerical geometry of map and model assessment. J Struct Biol 2015; 192:255-61. [PMID: 26416532 DOI: 10.1016/j.jsb.2015.09.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/01/2015] [Revised: 09/18/2015] [Accepted: 09/24/2015] [Indexed: 10/23/2022]
Abstract
We describe best practices and assessment strategies for the atomic interpretation of cryo-electron microscopy (cryo-EM) maps. Multiscale numerical geometry strategies in the Situs package and in secondary structure detection software are currently evolving due to the recent increases in cryo-EM resolution. Criteria that aim to predict the accuracy of fitted atomic models at low (worse than 8 Å) and medium (4-8 Å) resolutions remain challenging. However, a high level of confidence in atomic models can be achieved by combining such criteria. The observed errors are due to map-model discrepancies and due to the effect of imperfect global docking strategies. Extending the earlier motion capture approach developed for flexible fitting, we use simulated fiducials (pseudoatoms) at varying levels of coarse-graining to track the local drift of structural features. We compare three tracking approaches: naïve vector quantization, a smoothly deformable model, and a tessellation of the structure into rigid Voronoi cells, which are fitted using a multi-fragment refinement approach. The lowest error is an upper bound for the (small) discrepancy between the crystal structure and the EM map due to different conditions in their structure determination. When internal features such as secondary structures are visible in medium-resolution EM maps, it is possible to extend the idea of point-based fiducials to more complex geometric representations such as helical axes, strands, and skeletons. We propose quantitative strategies to assess map-model pairs when such secondary structure patterns are prominent.
Affiliation(s)
- Willy Wriggers
- Department of Mechanical & Aerospace Engineering, Old Dominion University, Norfolk, VA 23529, United States.
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, United States.
|
24
|
Biswas A, Ranjan D, Zubair M, He J. A Dynamic Programming Algorithm for Finding the Optimal Placement of a Secondary Structure Topology in Cryo-EM Data. J Comput Biol 2015; 22:837-43. [PMID: 26244416 DOI: 10.1089/cmb.2015.0120] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
The determination of secondary structure topology is a critical step in deriving the atomic structures from the protein density maps obtained from electron cryomicroscopy technique. This step often relies on matching the secondary structure traces detected from the protein density map to the secondary structure sequence segments predicted from the amino acid sequence. Due to inaccuracies in both sources of information, a pool of possible secondary structure positions needs to be sampled. One way to approach the problem is to first derive a small number of possible topologies using existing matching algorithms, and then find the optimal placement for each possible topology. We present a dynamic programming method of $\Theta(Nq^{2}h)$ to find the optimal placement for a secondary structure topology. We show that our algorithm requires significantly less computational time than the brute force method that is in the order of $\Theta(q^{N}h)$.
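To illustrate why a dynamic program can replace exhaustive enumeration in a placement problem of this kind, here is a minimal sketch, not the authors' algorithm. It assumes a simplified model in which N segments in a fixed order each have q candidate positions, each choice has its own cost, and only consecutive segments interact. The names `best_placement`, `brute_force`, `unary`, and `pair_cost` are hypothetical, and the factor h from the abstract is not modeled because the abstract does not define it. Under these assumptions the table-filling loop does on the order of N·q² cost evaluations, while enumeration visits all q^N combinations.

```python
from itertools import product


def best_placement(unary, pair_cost):
    """Chain dynamic program over N segments with q candidate positions each.

    unary[i][a]        : cost of placing segment i at candidate a
    pair_cost(i, a, b) : cost of segment i at a followed by segment i + 1 at b
    Returns (minimum total cost, list of chosen candidates).
    """
    n, q = len(unary), len(unary[0])
    best = list(unary[0])  # best[a]: cheapest cost of a prefix ending at candidate a
    back = []              # back-pointers for reconstructing the optimal choices
    for i in range(1, n):
        new_best, choices = [], []
        for b in range(q):
            # Try every predecessor candidate a: q * q work per segment.
            a_star = min(range(q), key=lambda a: best[a] + pair_cost(i - 1, a, b))
            new_best.append(best[a_star] + pair_cost(i - 1, a_star, b) + unary[i][b])
            choices.append(a_star)
        best = new_best
        back.append(choices)
    end = min(range(q), key=lambda b: best[b])
    path = [end]
    for choices in reversed(back):
        path.append(choices[path[-1]])
    path.reverse()
    return best[end], path


def brute_force(unary, pair_cost):
    """Reference solution that enumerates all q**N placements."""
    n, q = len(unary), len(unary[0])

    def total(p):
        own = sum(unary[i][p[i]] for i in range(n))
        links = sum(pair_cost(i, p[i], p[i + 1]) for i in range(n - 1))
        return own + links

    best = min(product(range(q), repeat=n), key=total)
    return total(best), list(best)


# Hypothetical toy costs: 3 segments, 2 candidate positions each.
toy_unary = [[1, 5], [4, 1], [2, 4]]
toy_pair = lambda i, a, b: 2 * abs(a - b)
print(best_placement(toy_unary, toy_pair))
print(brute_force(toy_unary, toy_pair))
```

On small inputs the two functions can be compared directly, which is a convenient way to check a dynamic program before relying on it for larger N.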
Affiliation(s)
- Abhishek Biswas
- Department of Computer Science, Old Dominion University, Norfolk, Virginia
- Desh Ranjan
- Department of Computer Science, Old Dominion University, Norfolk, Virginia
- Mohammad Zubair
- Department of Computer Science, Old Dominion University, Norfolk, Virginia
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk, Virginia
|