1
Al Nasr K, Yousef F, Jebril R, Jones C. Analytical Approaches to Improve Accuracy in Solving the Protein Topology Problem. Molecules 2018; 23:E28. [PMID: 29360779 PMCID: PMC6017786 DOI: 10.3390/molecules23020028] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/05/2017] [Revised: 01/19/2018] [Accepted: 01/19/2018] [Indexed: 11/17/2022] Open
Abstract
To take advantage of recent advances in genomics and proteomics, it is critical that the three-dimensional physical structure of biological macromolecules be determined. Cryo-electron microscopy (cryo-EM) is a promising and improving method for obtaining these data; however, the resolution is often not sufficient to directly determine the atomic-scale structure. Despite this, the locations of secondary structures are detectable. De novo modeling is a computational approach to modeling these macromolecular structures based on cryo-EM-derived data. During de novo modeling, a mapping between detected secondary structures and the underlying amino acid sequence must be identified. DP-TOSS (Dynamic Programming for determining the Topology Of Secondary Structures) is one tool that attempts to automate the creation of this mapping. By treating the correspondence between the detected structures and the structures predicted from sequence data as a constraint graph problem, DP-TOSS achieved good accuracy in its original iteration. In this paper, we propose modifications to the scoring methodology of DP-TOSS to improve its accuracy. Three scoring schemes were applied to DP-TOSS and tested: (i) a skeleton-based scoring function; (ii) a geometry-based analytical function; and (iii) a multi-well potential energy-based function. A test of 25 proteins shows that a combination of these schemes can improve the performance of DP-TOSS in solving the topology determination problem for macromolecule proteins.
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA.
- Feras Yousef
  - Department of Mathematics, The University of Jordan, Amman 11942, Jordan.
- Ruba Jebril
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA.
- Christopher Jones
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA.
2
Zeil S, Kovacs J, Wriggers W, He J. Comparing an Atomic Model or Structure to a Corresponding Cryo-electron Microscopy Image at the Central Axis of a Helix. J Comput Biol 2017; 24:52-67. [PMID: 27936925 PMCID: PMC5220566 DOI: 10.1089/cmb.2016.0145] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/20/2022] Open
Abstract
Three-dimensional density maps of biological specimens from cryo-electron microscopy (cryo-EM) can be interpreted in the form of atomic models that are modeled into the density, or they can be compared to known atomic structures. When the central axis of a helix is detectable in a cryo-EM density map, it is possible to quantify the agreement between this central axis and a central axis calculated from the atomic model or structure. We propose a novel arc-length association method to compare the two axes reliably. This method was applied to 79 helices in simulated density maps and six case studies using cryo-EM maps at 6.4-7.7 Å resolution. The arc-length association method is then compared to three existing measures that evaluate the separation of two helical axes: a two-way distance between point sets, the length difference between two axes, and the individual amino acid detection accuracy. The results show that our proposed method sensitively distinguishes lateral and longitudinal discrepancies between the two axes, which makes the method particularly suitable for the systematic investigation of cryo-EM map-model pairs.
Affiliation(s)
- Stephanie Zeil
  - Department of Computer Science, Old Dominion University, Norfolk, Virginia
- Julio Kovacs
  - Department of Mechanical and Aerospace Engineering and Institute of Biomedical Engineering, Old Dominion University, Norfolk, Virginia
- Willy Wriggers
  - Department of Mechanical and Aerospace Engineering and Institute of Biomedical Engineering, Old Dominion University, Norfolk, Virginia
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk, Virginia
3
He J, Zeil S, Hallak H, McKaig K, Kovacs J, Wriggers W. Comparison of an Atomic Model and Its Cryo-EM Image at the Central Axis of a Helix. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2015; 2015:1253-1259. [PMID: 27280059 PMCID: PMC4894056 DOI: 10.1109/bibm.2015.7359860] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Cryo-electron microscopy (cryo-EM) is an important biophysical technique that produces three-dimensional (3D) density maps at different resolutions. Because more and more models are being produced from cryo-EM density maps, validation of the models is becoming important. We propose a method for measuring local agreement between a model and the density map using the central axis of the helix. This method was tested using 19 helices from cryo-EM density maps between 5.5 Å and 7.2 Å resolution and 94 helices from simulated density maps. This method distinguished most of the well-fitting helices, although challenges exist for shorter helices.
Affiliation(s)
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Stephanie Zeil
  - Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Hussam Hallak
  - Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Kele McKaig
  - Department of Computer Science, Old Dominion University, Norfolk, VA, 23529
- Julio Kovacs
  - Department of Mechanical & Aerospace Engineering, Old Dominion University, Norfolk, VA, 23529
- Willy Wriggers
  - Department of Mechanical & Aerospace Engineering, Old Dominion University, Norfolk, VA, 23529
4
Si D, He J. Tracing Beta Strands Using StrandTwister from Cryo-EM Density Maps at Medium Resolutions. Structure 2014; 22:1665-76. [DOI: 10.1016/j.str.2014.08.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 02/04/2014] [Revised: 08/07/2014] [Accepted: 08/08/2014] [Indexed: 10/24/2022]
5
Al Nasr K, Ranjan D, Zubair M, Chen L, He J. Solving the Secondary Structure Matching Problem in Cryo-EM De Novo Modeling Using a Constrained K-Shortest Path Graph Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:419-430. [PMID: 26355788 DOI: 10.1109/tcbb.2014.2302803] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/05/2023]
Abstract
Electron cryomicroscopy is becoming a major experimental technique in solving the structures of large molecular assemblies. More and more three-dimensional images have been obtained at medium resolutions between 5 and 10 Å. In this resolution range, major α-helices can be detected as cylindrical sticks and β-sheets can be detected as plane-like regions. A critical question in de novo modeling from cryo-EM images is to determine the match between the secondary structures detected from the image and those on the protein sequence. We formulate this matching problem as a constrained graph problem and present an O(Δ^2 N^2 2^N) algorithm for this NP-hard problem. The algorithm incorporates the dynamic programming approach into a constrained K-shortest path algorithm. Our method, DP-TOSS, has been tested using α-proteins with a maximum of 33 helices and α-β proteins with up to five helices and 12 β-strands. The correct match was ranked within the top 35 for 19 of the 20 α-proteins and all nine α-β proteins tested. The results demonstrate that DP-TOSS improves accuracy, time and memory space in deriving the topologies of the secondary structure elements for proteins with a large number of secondary structures and a complex skeleton.
6
Nasr KA, Liu C, Rwebangira M, Burge L, He J. Intensity-based skeletonization of CryoEM gray-scale images using a true segmentation-free algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1289-98. [PMID: 24384713 PMCID: PMC4104753 DOI: 10.1109/tcbb.2013.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/03/2023]
Abstract
Cryo-electron microscopy is an experimental technique that is able to produce 3D gray-scale images of protein molecules. In contrast to other experimental techniques, cryo-electron microscopy is capable of visualizing large molecular complexes such as viruses and ribosomes. At medium resolution, the positions of the atoms are not visible and the process cannot proceed. The medium-resolution images produced by cryo-electron microscopy are used to derive the atomic structure of the proteins in de novo modeling. The skeletons of the 3D gray-scale images are used to interpret important information that is helpful in de novo modeling. Unfortunately, not all features of the image can be captured using a single segmentation. In this paper, we present a segmentation-free approach to extract the gray-scale curve-like skeletons. The approach relies on a novel representation of the 3D image, where the image is modeled as a graph and a set of volume trees. A test containing 36 synthesized maps and one authentic map shows that our approach can improve the performance of the two tested tools used in de novo modeling. The improvements were 62 and 13 percent for Gorgon and DP-TOSS, respectively.
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, 3500 John Merritt Blvd, McCord Hall, Nashville, TN 37209
- Chunmei Liu
  - Department of Systems and Computer Science, Howard University, 2300 Sixth Street, NW, Washington, DC 20059
- Mugizi Rwebangira
  - Department of Systems and Computer Science, Howard University, 2300 Sixth Street, NW, Washington, DC 20059
- Legand Burge
  - Department of Systems and Computer Science, Howard University, 2300 Sixth Street, NW, Washington, DC 20059
- Jing He
  - Department of Computer Science, Old Dominion University, Engineering & Computer Sciences Bldg., 4700 Elkhorn Ave, Suite 3300, Norfolk, VA 23529
7
Biswas A, Si D, Al Nasr K, Ranjan D, Zubair M, He J. Improved Efficiency in Cryo-EM Secondary Structure Topology Determination from Inaccurate Data. J Bioinform Comput Biol 2012; 10:1242006. [DOI: 10.1142/s0219720012420061] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
The determination of the secondary structure topology is a critical step in deriving the atomic structure from the protein density map obtained by the electron cryo-microscopy technique. This step often relies on the matching of two sources of information. One source comes from the secondary structures detected from the protein density map at medium resolution, such as 5–10 Å. The other source comes from the secondary structures predicted from the amino acid sequence. Due to the inaccuracy in either source of information, a pool of possible secondary structure positions needs to be sampled. This paper studies how to reduce the computation of the mapping when the inaccuracy of the secondary structure predictions is considered. We present a method that combines the concept of a dynamic graph with our previous work on using a constrained shortest path to identify the topology of the secondary structures. We show a reduction of 34.55% in run-time in comparison to the naïve way of handling the inaccuracies. We also show improved accuracy when the potential secondary structure errors are explicitly sampled versus the use of one consensus prediction. Our framework demonstrates the potential of developing computationally effective exact algorithms to identify the optimal topology of the secondary structures when the inaccuracy of the predicted data is considered.
Affiliation(s)
- Abhishek Biswas
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Dong Si
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Kamal Al Nasr
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Desh Ranjan
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Mohammad Zubair
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
8
Al Nasr K, Ranjan D, Zubair M, He J. Ranking Valid Topologies of the Secondary Structure Elements Using a Constraint Graph. J Bioinform Comput Biol 2011; 9:415-30. [DOI: 10.1142/s0219720011005604] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 02/04/2011] [Revised: 04/12/2011] [Accepted: 04/17/2011] [Indexed: 11/18/2022]
Abstract
Electron cryo-microscopy is a fast-advancing biophysical technique for deriving three-dimensional structures of large protein complexes. Using this technique, many density maps have been generated at intermediate resolutions such as 6–10 Å. Although it is challenging to derive the backbone of the protein directly from such density maps, secondary structure elements such as helices and β-sheets can be computationally detected. Our work in this paper provides an approach to enumerate the top-ranked possible topologies instead of enumerating the entire population of topologies. This approach is particularly practical for large proteins. We developed a directed weighted graph, the topology graph, to represent the secondary structure assignment problem. We prove that the problem of finding the valid topology with the minimum cost is NP-hard. We developed an O(N^2 2^N) dynamic programming algorithm to identify the topology with the minimum cost. The test of 15 proteins suggests that our dynamic programming approach can feasibly handle proteins of much larger size than we could before. The largest protein in the test contains 18 helical sticks detected from the density map out of 33 helices in the protein.
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Desh Ranjan
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Mohammad Zubair
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Jing He
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
9
Sun W, He J. From isotropic to anisotropic side chain representations: comparison of three models for residue contact estimation. PLoS One 2011; 6:e19238. [PMID: 21552527 PMCID: PMC3084275 DOI: 10.1371/journal.pone.0019238] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/10/2010] [Accepted: 03/29/2011] [Indexed: 11/19/2022] Open
Abstract
The criterion to determine residue contact is a fundamental problem in deriving knowledge-based mean-force potential energy calculations for protein structures. A frequently used criterion is to require the side chain center-to-center distance or the -to- atom distance to be within a pre-determined cutoff distance. However, the spatially anisotropic nature of the side chain makes it challenging to identify the contact pairs. This study compares three side chain contact models: the Atom Distance Criteria (ADC) model, the Isotropic Sphere Side chain (ISS) model and the Anisotropic Ellipsoid Side chain (AES) model, using 424 high-resolution protein structures in the Protein Data Bank. The results indicate that the ADC model is the most accurate and the ISS model is the least accurate. The AES model eliminates about 95% of the incorrectly counted contact pairs in the ISS model. Algorithm analysis shows that the AES model is the most computationally intensive while the ADC model has moderate computational cost. We derived a dataset of the contact pairs mis-estimated by the AES model. The most misjudged pairs are Arg-Glu, Arg-Asp and Arg-Tyr. Such a dataset can be useful for developing an improved AES model by incorporating pair-specific information for the cutoff distance.
Affiliation(s)
- Weitao Sun
  - Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing, China.