1
|
REDCRAFT: A computational platform using residual dipolar coupling NMR data for determining structures of perdeuterated proteins in solution. PLoS Comput Biol 2021; 17:e1008060. [PMID: 33524015 PMCID: PMC7877757 DOI: 10.1371/journal.pcbi.1008060] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/11/2020] [Revised: 02/11/2021] [Accepted: 01/05/2021] [Indexed: 01/10/2023] Open
Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is one of the three primary experimental means of characterizing macromolecular structures, including protein structures. Structure determination by solution NMR spectroscopy has traditionally relied heavily on distance restraints derived from nuclear Overhauser effect (NOE) measurements. While structure determination of proteins from NOE-based restraints is well understood and broadly used, structure determination from Residual Dipolar Couplings (RDCs) is relatively less well developed. Here, we describe the new features of the protein structure modeling program REDCRAFT and focus on the new Adaptive Decimation (AD) feature. The AD plays a critical role in improving the robustness of REDCRAFT to missing or noisy data, while allowing structure determination of larger proteins from less data. In this report we demonstrate the successful application of REDCRAFT in structure determination of proteins ranging in size from 50 to 145 residues using experimentally collected data, and of larger proteins (145 to 573 residues) using simulated RDC data. In both cases, REDCRAFT uses only RDC data that can be collected from perdeuterated proteins. Finally, we compare the accuracy of structure determination from RDCs alone with traditional NOE-based methods for the structurally novel PF.2048.1 protein. The RDC-based structure of PF.2048.1 exhibited 1.0 Å BB-RMSD with respect to a high-quality NOE-based structure. Although optimal strategies would include using RDC data together with chemical shift, NOE, and other NMR data, these studies provide proof-of-principle for robust structure determination of largely-perdeuterated proteins from RDC data alone using REDCRAFT. Residual Dipolar Couplings have the potential to improve the accuracy and reduce the time needed to characterize protein structures. 
In addition, RDC data have been shown to concurrently elucidate the structure of proteins, provide assignment of resonances, and characterize the internal dynamics of proteins. Despite these advantages, statistics from the Protein Data Bank (PDB) show that, surprisingly, only 124 proteins (out of nearly 150,000) have utilized RDCs as part of their structure determination. An even smaller subset of these proteins (approximately 7) have utilized RDCs as the primary source of data for structure determination. One key factor limiting the use of RDCs is the challenging computational and analytical aspects of this source of data. In this report, we demonstrate the success of the REDCRAFT software package in structure determination of proteins using RDC data that can be collected from small and large proteins in a routine fashion. REDCRAFT accomplishes the challenging task of structure determination from RDCs by introducing a unique search and optimization technique that is both robust and computationally tractable. Structure determination from routinely collectable RDC data using REDCRAFT can complement existing methods to provide faster and more accurate studies of larger and more complex protein structures by NMR spectroscopy in the solution state.
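The backbone RMSD (BB-RMSD) quoted in this abstract compares two structures after optimal rigid superposition. The following is a minimal sketch of that calculation, assuming the standard Kabsch algorithm and NumPy. The function name `backbone_rmsd` and the choice of which backbone atoms to pass in are illustrative assumptions and are not taken from REDCRAFT.

```python
import numpy as np


def backbone_rmsd(coords_a, coords_b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Hypothetical helper: the caller is assumed to supply matching backbone
    atoms (for example N, CA, C) in the same order for both structures.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Remove the translational component by centering both sets.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)

    # Kabsch: SVD of the 3x3 covariance matrix gives the best rotation.
    u, _, vt = np.linalg.svd(a.T @ b)

    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate set A onto set B and measure the residual deviation.
    a_rotated = a @ rotation.T
    return float(np.sqrt(np.mean(np.sum((a_rotated - b) ** 2, axis=1))))
```

Because the reflection guard forces a proper rotation, a mirror-image structure yields a non-zero RMSD rather than a spurious perfect match.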
|
2
|
Cole C, Parks C, Rachele J, Valafar H. Increased usability, algorithmic improvements and incorporation of data mining for structure calculation of proteins with REDCRAFT software package. BMC Bioinformatics 2020; 21:204. [PMID: 33272215 PMCID: PMC7712608 DOI: 10.1186/s12859-020-3522-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/16/2020] [Accepted: 04/29/2020] [Indexed: 02/08/2023] Open
Abstract
Background: Traditional approaches to elucidating protein structures by Nuclear Magnetic Resonance (NMR) spectroscopy rely on distance restraints derived from Nuclear Overhauser effects (NOEs). The use of NOEs as the primary source of data for structure determination by NMR spectroscopy is time consuming and expensive. Residual Dipolar Couplings (RDCs) have become an alternative approach for structure calculation by NMR spectroscopy. In previous works, the software package REDCRAFT has been presented as a means of harnessing the information contained in RDCs for structure calculation of proteins. However, to meet its full potential, several improvements to REDCRAFT must be made.
Results: In this work, we present improvements to REDCRAFT that include increased usability, better interoperability, and a more robust core algorithm. We have demonstrated the impact of the improved core algorithm in the successful folding of the protein 1A1Z with as much as ±4 Hz of added error. The structure computed by REDCRAFT from the highly corrupted data deviated by less than 1.0 Å from the X-ray structure. We have also demonstrated the interoperability of REDCRAFT in a few instances, including with PDBMine, to reduce the amount of data required for successful folding of proteins to unprecedented levels. Here we have demonstrated the successful folding of the protein 1D3Z (to within 2.4 Å of the X-ray structure) using only N-H RDCs from one alignment medium.
Conclusions: The additional GUI features of REDCRAFT, combined with NEF compliance, have significantly increased the flexibility and usability of this software package. The improvements to the core algorithm have substantially improved the robustness of REDCRAFT to experimental data of lower quality and quantity.
Affiliation(s)
- Casey Cole
- Department of Computer Science and Engineering, University of South Carolina, M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC, 29201, USA
- Caleb Parks
- Department of Computer Science and Engineering, University of South Carolina, M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC, 29201, USA
- Julian Rachele
- Department of Computer Science and Engineering, University of South Carolina, M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC, 29201, USA
- Homayoun Valafar
- Department of Computer Science and Engineering, University of South Carolina, M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC, 29201, USA
|
3
|
Simin M, Irausquin S, Cole CA, Valafar H. Improvements to REDCRAFT: a software tool for simultaneous characterization of protein backbone structure and dynamics from residual dipolar couplings. Journal of Biomolecular NMR 2014; 60:241-264. [PMID: 25403759 DOI: 10.1007/s10858-014-9871-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 08/07/2014] [Accepted: 10/30/2014] [Indexed: 06/04/2023]
Abstract
Within the past two decades, there has been an increase in the acquisition of residual dipolar couplings (RDCs) for investigations of biomolecular structures. Their use, however, is still not as widely adopted as the traditional methods of structure determination by NMR, despite their potential for extending the limits of studies that examine both the structure and dynamics of biomolecules. This is in part due to the difficulties associated with the analysis of this information-rich data type. The software analysis tool REDCRAFT was previously introduced to address some of these challenges. Here we describe and evaluate a number of additional features that have been incorporated in order to extend its computational and analytical capabilities. REDCRAFT's more traditional enhancements integrate a modified steric collision term, as well as structural refinement in the rotamer space. Other, non-traditional improvements include the filtering of viable structures based on relative order tensor estimates, decimation of the conformational space based on structural similarity, and forward/reverse folding of proteins. Utilizing REDCRAFT's newest features, we demonstrate de novo folding of proteins 1D3Z and 1P7E to within 1.6 Å of the corresponding X-ray structures, using as many as four RDCs per residue and as few as two RDCs per residue, in two alignment media. We also show the successful folding of a structure to within 1.6 Å of the X-ray structure using {C(i-1)-N(i), N(i)-H(i), and C(i-1)-H(i)} RDCs in one alignment medium, and only {N(i)-H(i)} in the second alignment medium (a set of data that can be collected on deuterated samples). The program is available for download from our website at http://ifestos.cse.sc.edu.
Affiliation(s)
- Mikhail Simin
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29208, USA
|
4
|
Mukhopadhyay R, Irausquin S, Schmidt C, Valafar H. Dynafold: a dynamic programming approach to protein backbone structure determination from minimal sets of Residual Dipolar Couplings. J Bioinform Comput Biol 2014; 12:1450002. [PMID: 24467760 DOI: 10.1142/s0219720014500024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Residual Dipolar Couplings (RDCs) are a source of NMR data that can provide a powerful set of constraints on the orientation of inter-nuclear vectors, and are quickly becoming a larger part of the experimental toolset for molecular biologists. However, few reliable protocols exist for the determination of protein backbone structures from small sets of RDCs. DynaFold is a new dynamic programming algorithm designed specifically for this task, using minimal sets of RDCs collected in multiple alignment media. DynaFold was first tested using synthetic data generated for the N-H, Cα-Hα, and C-N vectors of the proteins 1BRF, 1F53, 110M, and 3LAY, with up to ±1 Hz error in three alignment media, and was able to produce structures within 1.9 Å of the original structures. DynaFold was then tested using experimental data, obtained from the Biological Magnetic Resonance Bank, for the proteins 1P7E and 1D3Z (PDB IDs) using RDC data from two alignment media. This exercise yielded structures within 1.0 Å of their respective published structures in segments with high data density, and within 1.9 Å over the entire protein. The same sets of RDC data were also used in comparisons with traditional methods for analysis of RDCs, which failed to match the accuracy of DynaFold's approach to structure determination.
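The orientational constraint that an RDC places on an inter-nuclear vector can be written as D = D_max · vᵀ S v, where v is the unit vector in the molecular frame and S is the symmetric, traceless order tensor of the alignment medium. The sketch below evaluates that standard expression with NumPy. The function name `predict_rdc` is hypothetical, and `d_max` is left as a caller-supplied parameter because its value depends on the nucleus pair and the convention in use; none of this is taken from DynaFold itself.

```python
import numpy as np


def predict_rdc(vector, order_tensor, d_max):
    """Predicted RDC (in the units of d_max) for one inter-nuclear vector.

    vector       : 3 components of the bond vector in the molecular frame
    order_tensor : 3x3 symmetric, traceless order (Saupe) tensor
    d_max        : maximum dipolar coupling for this nucleus pair (assumed
                   to be supplied by the caller)
    """
    v = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("inter-nuclear vector must be non-zero")
    v = v / norm  # only the orientation matters, not the length

    s = np.asarray(order_tensor, dtype=float)
    if s.shape != (3, 3) or not np.allclose(s, s.T):
        raise ValueError("order tensor must be a symmetric 3x3 matrix")
    if abs(np.trace(s)) > 1e-9:
        raise ValueError("order tensor must be traceless")

    # Quadratic form v^T S v scaled by the maximum coupling.
    return float(d_max * (v @ s @ v))
```

A single RDC fixes only the value of this quadratic form, so many orientations satisfy it; this is why the abstract emphasizes data from multiple alignment media, each contributing a different tensor S.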
Affiliation(s)
- Rishi Mukhopadhyay
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
|
5
|
Protein structure validation and identification from unassigned residual dipolar coupling data using 2D-PDPA. Molecules 2013; 18:10162-88. [PMID: 23973992 PMCID: PMC4090686 DOI: 10.3390/molecules180910162] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/23/2013] [Revised: 08/10/2013] [Accepted: 08/13/2013] [Indexed: 11/22/2022] Open
Abstract
More than 90% of protein structures submitted to the PDB each year are homologous to some previously characterized protein structure. The extensive resources that are required for structural characterization of proteins can be justified for the 10% of structures that are novel, but not for the remaining 90%. This report presents the 2D-PDPA method, which utilizes unassigned residual dipolar coupling (RDC) data to address the economics of structure determination of routine proteins by reducing the data acquisition and processing time. 2D-PDPA has been demonstrated to successfully identify the correct structure of an array of proteins that range from 46 to 445 residues in size from a library of 619 decoy structures by using unassigned simulated RDC data. When using experimental data, 2D-PDPA successfully identified the correct NMR structures from the same library of decoy structures. In addition, the most homologous X-ray structure was also identified as the second best structural candidate. Finally, the success of 2D-PDPA in identifying and evaluating the most appropriate structure from a set of computationally predicted structures has been demonstrated for a previously uncharacterized protein, Pf2048.1. This protein exhibits less than 20% sequence identity to any protein with known structure and therefore presents a compelling and practical application of our proposed work.
|
6
|
Koehler J, Meiler J. Expanding the utility of NMR restraints with paramagnetic compounds: background and practical aspects. Progress in Nuclear Magnetic Resonance Spectroscopy 2011; 59:360-89. [PMID: 22027343 PMCID: PMC3202700 DOI: 10.1016/j.pnmrs.2011.05.001] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.9] [Received: 02/20/2011] [Accepted: 05/06/2011] [Indexed: 05/05/2023]
Affiliation(s)
- Julia Koehler
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN 37232-8725, USA.
|
7
|
Shealy P, Liu Y, Simin M, Valafar H. Backbone resonance assignment and order tensor estimation using residual dipolar couplings. Journal of Biomolecular NMR 2011; 50:357-69. [PMID: 21667298 PMCID: PMC4071608 DOI: 10.1007/s10858-011-9521-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/25/2011] [Accepted: 05/19/2011] [Indexed: 05/23/2023]
Abstract
An NMR investigation of proteins with known X-ray structures is of interest in a number of endeavors. Performing these studies requires the costly step of resonance assignment. The prevalent assignment strategy does not make use of existing structural information and requires uniform isotope labeling. Here we present a rapid and cost-effective method of assigning NMR data to an existing structure, either an X-ray or a computationally modeled structure. The presented method, Exhaustively Permuted Assignment of RDCs (EPAR), utilizes unassigned residual dipolar coupling (RDC) data that can easily be obtained by NMR spectroscopy. The algorithm uses only the backbone N-H RDCs from multiple alignment media along with the amino acid type of the RDCs. It is inspired by previous work from Zweckstetter and provides several extensions. We present results on 13 synthetic and experimental datasets from 8 different structures, including two homodimers. Using just two alignment media, EPAR achieves an average assignment accuracy greater than 80%. With three media, the average accuracy is higher than 94%. The algorithm also outputs a prediction of the assignment accuracy, which has a correlation of 0.77 with the true accuracy. This prediction score can be used to establish the needed confidence in assignment accuracy.
Affiliation(s)
- Paul Shealy
- Department of Computer Science and Engineering, University of South Carolina, 315 Main Street, Columbia, SC 29208, USA
- Yizhou Liu
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30603, USA
- Mikhail Simin
- Department of Computer Science and Engineering, University of South Carolina, 315 Main Street, Columbia, SC 29208, USA
- Homayoun Valafar
- Department of Computer Science and Engineering, University of South Carolina, 315 Main Street, Columbia, SC 29208, USA
|
8
|
Donald BR, Martin J. Automated NMR Assignment and Protein Structure Determination using Sparse Dipolar Coupling Constraints. Progress in Nuclear Magnetic Resonance Spectroscopy 2009; 55:101-127. [PMID: 20160991 PMCID: PMC2755298 DOI: 10.1016/j.pnmrs.2008.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Affiliation(s)
- Bruce R Donald
- Departments of Computer Science and Biochemistry, Duke University
|
9
|
Mukhopadhyay R, Miao X, Shealy P, Valafar H. Efficient and accurate estimation of relative order tensors from lambda-maps. Journal of Magnetic Resonance (San Diego, Calif.: 1997) 2009; 198:236-247. [PMID: 19345125 PMCID: PMC4071621 DOI: 10.1016/j.jmr.2009.02.014] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 12/10/2008] [Revised: 02/17/2009] [Accepted: 02/27/2009] [Indexed: 05/25/2023]
Abstract
The rapid increase in the availability of RDC data from multiple alignment media in recent years has necessitated the development of more sophisticated analyses that extract the RDC data's full information content. This article presents an analysis of the distribution of RDCs from two media (2D-RDC data), using the information obtained from a lambda-map. This article also introduces an efficient algorithm, which leverages these findings to extract the order tensors for each alignment medium using unassigned RDC data in the absence of any structural information. The results of applying this 2D-RDC analysis method to synthetic and experimental data are reported in this article. The relative order tensor estimates obtained from the 2D-RDC analysis are compared to order tensors obtained from the program REDCAT after using assignment and structural information. The final comparisons indicate that the relative order tensors estimated from the unassigned 2D-RDC method very closely match the results from methods that require assignment and structural information. The presented method is successful even in cases with small datasets. The results of analyzing experimental RDC data for the protein 1P7E are presented to demonstrate the potential of the presented work in accurately estimating the principal order parameters from RDC data that incompletely sample the RDC space. In addition to the new algorithm, a discussion of the uniqueness of the solutions is presented; no more than two clusters of distinct solutions have been shown to satisfy each lambda-map.
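For contrast with the unassigned approach described in this abstract, the conventional route estimates an order tensor from assigned RDCs and a known structure by linear least squares over the five independent tensor elements. The sketch below illustrates that general technique with NumPy. It is not the REDCAT implementation, and the function name `fit_order_tensor` and its parameters are assumptions made for illustration.

```python
import numpy as np


def fit_order_tensor(vectors, rdcs, d_max):
    """Least-squares order tensor from assigned RDCs and known bond vectors.

    vectors : (N, 3) inter-nuclear vectors in the molecular frame, N >= 5
    rdcs    : N measured couplings, in the same units as d_max
    d_max   : maximum dipolar coupling (assumed supplied by the caller)
    """
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    x, y, z = v[:, 0], v[:, 1], v[:, 2]

    # With Szz = -(Sxx + Syy), each RDC is linear in the five unknowns
    # (Sxx, Syy, Sxy, Sxz, Syz).
    design = np.column_stack(
        [x * x - z * z, y * y - z * z, 2 * x * y, 2 * x * z, 2 * y * z]
    )
    reduced = np.asarray(rdcs, dtype=float) / d_max
    params, *_ = np.linalg.lstsq(design, reduced, rcond=None)
    sxx, syy, sxy, sxz, syz = params

    # Rebuild the full symmetric, traceless 3x3 tensor.
    return np.array(
        [
            [sxx, sxy, sxz],
            [sxy, syy, syz],
            [sxz, syz, -sxx - syy],
        ]
    )
```

The eigenvalues of the returned matrix (for example via `np.linalg.eigvalsh`) give the principal order parameters, which are the quantities the abstract compares between the two approaches.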
|
10
|
Miao X, Mukhopadhyay R, Valafar H. Estimation of relative order tensors, and reconstruction of vectors in space using unassigned RDC data and its application. Journal of Magnetic Resonance (San Diego, Calif.: 1997) 2008; 194:202-11. [PMID: 18692422 PMCID: PMC2669903 DOI: 10.1016/j.jmr.2008.07.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 05/15/2008] [Revised: 06/27/2008] [Accepted: 07/02/2008] [Indexed: 05/11/2023]
Abstract
Advances in NMR instrumentation and pulse sequence design have resulted in easier acquisition of Residual Dipolar Coupling (RDC) data. However, computational and theoretical analysis of this type of data has continued to challenge the international community of investigators because of its complexity and rich information content. Contemporary use of RDC data has required a priori assignment, which significantly increases the overall cost of structural analysis. This article introduces a novel algorithm that utilizes unassigned RDC data acquired from multiple alignment media (nD-RDC, n ≥ 3) for simultaneous extraction of the relative order tensor matrices and reconstruction of the interacting vectors in space. Estimation of the relative order tensors and reconstruction of the interacting vectors can be invaluable in a number of endeavors. An example application is presented in which the reconstructed vectors are used to quantify the fitness of a template protein structure to the unknown protein structure. This work has other important direct applications, such as verification of the novelty of an unknown protein and validation of the accuracy of an available protein structure model in drug design. More importantly, the presented work has the potential to bridge the gap between experimental and computational methods of structure determination.
Affiliation(s)
- Xijiang Miao
- Computer Science and Engineering, Swearingen Engineering Center, University of South Carolina, Columbia, SC 29308, USA
|