1
Mu Y, Nguyen T, Hawickhorst B, Wriggers W, Sun J, He J. The combined focal loss and dice loss function improves the segmentation of beta-sheets in medium-resolution cryo-electron-microscopy density maps. Bioinformatics Advances 2024; 4:vbae169. [PMID: 39600382 PMCID: PMC11590252 DOI: 10.1093/bioadv/vbae169] [Received: 03/26/2024] [Revised: 08/17/2024] [Accepted: 11/19/2024] [Indexed: 11/29/2024]
Abstract
Summary: Although multiple neural networks have been proposed for detecting secondary structures in medium-resolution (5-10 Å) cryo-electron microscopy (cryo-EM) maps, the loss functions used in existing deep learning networks are primarily based on cross-entropy loss, which is known to be sensitive to class imbalance. We investigated five loss functions: cross-entropy, Focal loss, Dice loss, and two combined loss functions. Using a U-Net architecture in our DeepSSETracer method and a dataset of 1355 box-cropped atomic-structure/density-map pairs, we found that a newly designed loss function combining Focal loss and Dice loss provides the best overall detection accuracy for secondary structures. For β-sheet voxels, which are generally much harder to detect than helix voxels, the combined loss function achieved a significant improvement (an 8.8% increase in the F1 score) over the cross-entropy loss function and a noticeable improvement over the Dice loss function. This study demonstrates the potential of designing more effective loss functions for hard cases in the segmentation of secondary structures. The newly trained model was incorporated into DeepSSETracer 1.1 for the segmentation of protein secondary structures in medium-resolution cryo-EM map components. DeepSSETracer can be integrated into ChimeraX, a popular molecular visualization program.
Availability and implementation: https://www.cs.odu.edu/~bioinfo/B2I_Tools/
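The abstract names the two ingredients of the combined loss but not how they are weighted or parameterized. The sketch below is a minimal, hypothetical illustration in plain Python, assuming per-voxel softmax probabilities over three classes (background, helix, β-sheet), a standard multi-class focal loss with focusing parameter `gamma`, a soft Dice loss averaged over classes, and a combination weight `lam`. None of these names or default values come from the paper, and DeepSSETracer's actual formulation may differ.

```python
import math
from typing import Optional, Sequence

# Hypothetical layout: probs[i][c] is the predicted probability that voxel i
# belongs to class c (e.g. 0 = background, 1 = helix, 2 = beta-sheet), and
# labels[i] is the ground-truth class index of voxel i.


def focal_loss(probs: Sequence[Sequence[float]], labels: Sequence[int],
               gamma: float = 2.0, alpha: Optional[Sequence[float]] = None,
               eps: float = 1e-7) -> float:
    """Mean multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = min(max(p[y], eps), 1.0)  # clamp so log() never sees 0
        a_t = 1.0 if alpha is None else alpha[y]
        # (1 - p_t)**gamma shrinks the loss of voxels that are already easy;
        # with gamma = 0 and alpha = None this reduces to cross-entropy.
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)


def dice_loss(probs: Sequence[Sequence[float]], labels: Sequence[int],
              num_classes: int, smooth: float = 1.0) -> float:
    """Soft Dice loss: 1 - mean_c (2*overlap + s) / (pred_mass + true_mass + s)."""
    dice_sum = 0.0
    for c in range(num_classes):
        overlap = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for y in labels if y == c)
        dice_sum += (2.0 * overlap + smooth) / (pred_mass + true_mass + smooth)
    return 1.0 - dice_sum / num_classes


def combined_loss(probs, labels, num_classes: int,
                  lam: float = 0.5, gamma: float = 2.0) -> float:
    """Hypothetical weighted sum of focal and Dice loss (weighting assumed)."""
    return (lam * focal_loss(probs, labels, gamma=gamma)
            + (1.0 - lam) * dice_loss(probs, labels, num_classes))


# Toy batch of four voxels; the last two are beta-sheet (class 2).
probs = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]]
labels = [0, 1, 2, 2]
print(round(combined_loss(probs, labels, num_classes=3), 4))
```

The focal term concentrates the loss on voxels the network still gets wrong, while the Dice term scores each class by overlap rather than by voxel count, so a small β-sheet class counts as much as the large background class. That is one plausible reading of why such a combination helps hard, under-represented classes.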
Affiliation(s)
- Yongcheng Mu
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, United States
- Thu Nguyen
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, United States
- Bryan Hawickhorst
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, United States
- Willy Wriggers
- Department of Mechanical and Aerospace Engineering, Old Dominion University, Norfolk, VA 23529, United States
- Jiangwen Sun
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, United States
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, United States
2
Sazzed S. Determining Protein Secondary Structures in Heterogeneous Medium-Resolution Cryo-EM Images Using CryoSSESeg. ACS Omega 2024; 9:26409-26416. [PMID: 38911779 PMCID: PMC11191131 DOI: 10.1021/acsomega.4c02608] [Received: 03/17/2024] [Revised: 05/02/2024] [Accepted: 05/09/2024] [Indexed: 06/25/2024]
Abstract
While cryo-electron microscopy (cryo-EM) density maps at near-atomic resolution are becoming more prevalent, a considerable number of density maps are still resolved only at intermediate resolutions (5-10 Å). Because the quality of these medium-resolution density maps varies widely, extracting structural information from them remains a challenging task. This study introduces a convolutional neural network (CNN)-based framework, cryoSSESeg, to determine the organization of protein secondary structure elements in medium-resolution cryo-EM images. CryoSSESeg is trained on approximately 1300 protein chains derived from around 500 experimental cryo-EM density maps of varied quality. It demonstrates strong performance, with average residue-level F1 scores of 0.76 for helix detection and 0.60 for β-sheet detection across a set of test chains. Compared with traditional image processing tools such as SSETracer, which demand significant manual intervention and preprocessing, cryoSSESeg achieves comparable or superior performance. It also performs competitively against another deep learning-based model, Emap2sec. Furthermore, this study underscores the importance of secondary structure quality, particularly adherence to expected shapes, for detection performance, emphasizing the need for careful evaluation of data quality.
3
Mu Y, Sazzed S, Alshammari M, Sun J, He J. A Tool for Segmentation of Secondary Structures in 3D Cryo-EM Density Map Components Using Deep Convolutional Neural Networks. Frontiers in Bioinformatics 2021; 1:710119. [PMID: 36303800 PMCID: PMC9581063 DOI: 10.3389/fbinf.2021.710119] [Received: 05/15/2021] [Accepted: 09/28/2021] [Indexed: 07/20/2023]
Abstract
Although cryo-electron microscopy (cryo-EM) has been successfully used to derive atomic structures for many proteins, it is still challenging to derive atomic structures when the resolution of cryo-EM density maps is in the medium range, such as 5-10 Å. Detecting protein secondary structures, such as helices and β-sheets, in cryo-EM density maps provides constraints for deriving atomic structures from such maps. As more deep learning methodologies are developed for solving various molecular problems, effective tools are needed to give users access to them. We have developed an effective software bundle, DeepSSETracer, for detecting protein secondary structures in medium-resolution cryo-EM component maps. The bundle contains the network architecture and a U-Net model trained with a curriculum and gradient of episodic memory (GEM). The bundle integrates the deep neural network with the visualization capabilities of ChimeraX. Using a Linux server that is remotely accessed by Windows users, the trained deep neural network takes about 6 s on one CPU and one GPU to detect secondary structures in a cryo-EM component map containing 446 amino acids. A test using 28 chain components of cryo-EM maps shows overall residue-level F1 scores of 0.72 and 0.65 for detecting helices and β-sheets, respectively. Although deep learning applications are built on software frameworks such as PyTorch and TensorFlow, our pioneering work shows that integrating deep learning applications with ChimeraX is a promising and effective approach. Our experiments show that the F1 score measured at the residue level is an effective evaluation of secondary structure detection for individual classes. The test using 28 cryo-EM component maps shows that DeepSSETracer detects β-sheets more accurately than Emap2sec+, with weighted average residue-level F1 scores of 0.65 and 0.42, respectively. It also shows that Emap2sec+ detects helices more accurately than DeepSSETracer, with weighted average residue-level F1 scores of 0.77 and 0.72, respectively.
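Residue-level F1 per class can be illustrated with a small sketch. It assumes each residue has already been given a true and a predicted secondary-structure label (single letters here: `H` helix, `E` β-strand/sheet, `C` other), because the abstract does not say how voxel predictions are mapped to residues. It also assumes that the "weighted average" weights each chain by its number of true residues of the class. Both assumptions, and the function names `f1_for_class` and `weighted_f1`, are illustrative rather than taken from the paper.

```python
from typing import Sequence, Tuple


def f1_for_class(true: Sequence[str], pred: Sequence[str], cls: str) -> float:
    """Residue-level F1 for one class, treating that class as 'positive'."""
    pairs = list(zip(true, pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no correctly labeled residues: F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(chains: Sequence[Tuple[Sequence[str], Sequence[str]]],
                cls: str) -> float:
    """Per-chain F1 averaged with weight = true residues of cls (assumed)."""
    weighted_sum, total_weight = 0.0, 0
    for true, pred in chains:
        support = sum(1 for t in true if t == cls)
        weighted_sum += support * f1_for_class(true, pred, cls)
        total_weight += support
    return weighted_sum / total_weight if total_weight else 0.0


# Two toy chains given as (true labels, predicted labels).
chains = [("HHHHCCEE", "HHHCCCEE"), ("EEEECC", "EECCCC")]
print(f"helix F1:  {weighted_f1(chains, 'H'):.3f}")
print(f"strand F1: {weighted_f1(chains, 'E'):.3f}")
```

For these toy chains the helix score is 6/7 ≈ 0.857 (only the first chain has helix residues) and the strand score is (2 × 1 + 4 × 2/3) / 6 = 7/9 ≈ 0.778. Scoring each class separately keeps a weak β-sheet result from being hidden by a majority of correctly labeled residues of other classes.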
Affiliation(s)
- Jing He
- *Correspondence: Jing He; Jiangwen Sun
4
Mori T, Terashi G, Matsuoka D, Kihara D, Sugita Y. Efficient Flexible Fitting Refinement with Automatic Error Fixing for De Novo Structure Modeling from Cryo-EM Density Maps. J Chem Inf Model 2021; 61:3516-3528. [PMID: 34142833 PMCID: PMC9282639 DOI: 10.1021/acs.jcim.1c00230] [Indexed: 11/29/2022]
Abstract
Structural modeling of proteins from cryo-electron microscopy (cryo-EM) density maps is one of the major challenges in structural biology. De novo modeling combined with flexible fitting refinement (FFR) has been widely used to build structures of new proteins. In de novo prediction, artificial conformations containing local structural errors such as chirality errors, cis peptide bonds, and ring penetrations are frequently generated and cannot easily be removed in the subsequent FFR. Moreover, refinement can be significantly suppressed by the low mobility of atoms inside the protein. To overcome these problems, we propose an efficient scheme for FFR in which the local structural errors are fixed first, followed by FFR using an iterative simulated annealing (SA) molecular dynamics protocol with the united atom (UA) model in an implicit solvent model; we call this scheme "SAUA-FFR". The best model is selected from multiple flexible fitting runs with various biasing force constants to reduce overfitting. We apply our scheme to the decoys obtained from MAINMAST and demonstrate an improvement in the best model for eight selected proteins in terms of root-mean-square deviation, MolProbity score, and RWplus score compared with the original MAINMAST scheme. Fixing the local structural errors can enhance the formation of secondary structures, and the UA model enables more progressive refinement than the all-atom model owing to its higher mobility in the implicit solvent. The SAUA-FFR scheme achieves efficient and accurate protein structure modeling from medium-resolution maps with less overfitting.
Affiliation(s)
- Takaharu Mori
- RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
- Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, United States
- Daisuke Matsuoka
- RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, United States; Department of Computer Science, Purdue University, West Lafayette, Indiana 47907, United States
- Yuji Sugita
- RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan; RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan; RIKEN Center for Biosystems Dynamics Research, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
5
Palermo G, Sugita Y, Wriggers W, Amaro RE. Faces of Contemporary CryoEM Information and Modeling. J Chem Inf Model 2021; 60:2407-2409. [PMID: 32452204 DOI: 10.1021/acs.jcim.0c00481] [Indexed: 11/28/2022]
Affiliation(s)
- Giulia Palermo
- Department of Bioengineering, University of California Riverside, Riverside, California 92521, United States
- Yuji Sugita
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Computational Biophysics Research Team, RIKEN Center for Computational Science, 7-1-26 Minatojima-Minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan; Laboratory for Biomolecular Function Simulation, RIKEN Center for Biosystems Dynamics Research, 1-6-5 Minatojima-Minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Willy Wriggers
- Department of Mechanical and Aerospace Engineering, Old Dominion University, Norfolk, Virginia 23529, United States
- Rommie E Amaro
- Department of Chemistry and Biochemistry, University of California San Diego, San Diego, California 92093-0340, United States
6
Deng Y, Mu Y, Sazzed S, Sun J, He J. Using Curriculum Learning in Pattern Recognition of 3-dimensional Cryo-electron Microscopy Density Maps. ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2020; 2020:112. [PMID: 35838357 PMCID: PMC9279008 DOI: 10.1145/3388440.3414710] [Indexed: 06/15/2023]
Abstract
Although cryo-electron microscopy (cryo-EM) has been successfully used to derive atomic structures for many proteins, it is still challenging to derive atomic structures when the resolution of cryo-EM density maps is in the medium range, e.g., 5-10 Å. Studies have attempted to use machine learning methods, especially deep neural networks, to build predictive models for detecting protein secondary structures in cryo-EM images, which ultimately helps to derive the atomic structures of proteins. However, the large variation in data quality makes it challenging to train a deep neural network with high prediction accuracy. Curriculum learning has been shown to be an effective learning paradigm in machine learning. In this paper, we present a study using curriculum learning as a more effective way to utilize cryo-EM density maps of varying quality. We investigated three distinct training curricula that differ in whether and how images used for training in the past are reused while the network is continually trained on new images. A total of 1,382 3-dimensional cryo-EM images were extracted from density maps in the Electron Microscopy Data Bank for our study. Our results indicate that learning with a curriculum significantly improves the performance of the final trained network when the forgetting problem is properly addressed.
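The question of whether and how earlier images are reused can be pictured as a data schedule. The sketch below is a hypothetical illustration, not one of the paper's three curricula: it assumes a per-image `difficulty` score (for example, one derived from map quality), splits the images into stages from easiest to hardest, and uses a `replay_fraction` parameter to control how many earlier images are mixed back in at each stage. A fraction of 0 corresponds to training on new images only, the setting most exposed to forgetting. It builds only the per-stage training sets; network training is out of scope. Simple replay is just one remedy for forgetting; the gradient episodic memory (GEM) approach named in the DeepSSETracer abstract instead constrains gradient updates using stored examples.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(images: Sequence[T], difficulty: Callable[[T], float],
                      num_stages: int) -> List[List[T]]:
    """Sort images from easiest to hardest and cut them into num_stages chunks."""
    ordered = sorted(images, key=difficulty)
    if not ordered:
        return []
    size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def stage_training_sets(stages: Sequence[Sequence[T]], replay_fraction: float,
                        seed: int = 0) -> List[List[T]]:
    """New images of each stage plus a random replay sample of earlier stages.

    replay_fraction = 0.0 -> no reuse of earlier images (prone to forgetting)
    replay_fraction = 1.0 -> every earlier image is reused at each stage
    """
    rng = random.Random(seed)
    seen: List[T] = []
    plans: List[List[T]] = []
    for stage in stages:
        k = round(replay_fraction * len(seen))  # number of earlier images to replay
        plans.append(list(stage) + rng.sample(seen, k))
        seen.extend(stage)
    return plans


# Hypothetical data: ten image IDs whose difficulty is simply the ID itself.
stages = curriculum_stages(list(range(10)), difficulty=lambda x: x, num_stages=3)
for i, plan in enumerate(stage_training_sets(stages, replay_fraction=0.5)):
    print(f"stage {i}: {sorted(plan)}")
```

Varying `replay_fraction` between 0 and 1 gives a simple way to compare "no reuse", "partial reuse", and "full reuse" schedules on the same staged data.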
Affiliation(s)
- Yangmei Deng
- Department of Computer Science, Old Dominion University, Norfolk VA USA
- Yongcheng Mu
- Department of Computer Science, Old Dominion University, Norfolk VA USA
- Salim Sazzed
- Department of Computer Science, Old Dominion University, Norfolk VA USA
- Jiangwen Sun
- Department of Computer Science, Old Dominion University, Norfolk VA USA
- Jing He
- Department of Computer Science, Old Dominion University, Norfolk VA USA