1. Das NR, Chaudhury KN, Pal D. Improved NMR-data-compliant protein structure modeling captures context-dependent variations and expands the scope of functional inference. Proteins 2023; 91:412-435. [PMID: 36287124 DOI: 10.1002/prot.26439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 09/12/2022] [Accepted: 10/20/2022] [Indexed: 11/13/2022]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy can reveal conformational states of a protein under physiological conditions. However, sparsely available NMR data for a protein with large degrees of freedom can introduce structural artifacts in the built models. Current state-of-the-art methods for deriving protein structure and conformation from NMR data deploy molecular dynamics (MD) coupled with simulated annealing to build models. We provide an alternative graph-based modeling approach, in which we first build substructures from NMR-derived distance-geometry constraints and combine them in one shot to form the core structure. The remainder of the molecule, for which data are inadequate, is modeled using a hybrid approach that respects the observed distance-geometry constraints. One-shot structure building is rarely undertaken for large and sparse data systems, but our data-driven bottom-up approach makes this uniquely feasible by suitable partitioning of the problem. A detailed comparison of select models with state-of-the-art methods reveals differences in the secondary structure regions, where the correctness of our models is confirmed by NMR data. Benchmarking on 106 protein folds, covering structures 38-282 residues long, shows minimal experimental-constraint violations while conforming to other structure quality parameters such as proper folding, steric clashes, and torsion angle violations based on Ramachandran plot criteria. Comparative MD studies using select protein models from a state-of-the-art method and ours under identical experimental parameters reveal distinct conformational dynamics that could be attributed to protein structure-function. Our work is thus useful in building enhanced NMR-evidence-based models that encapsulate the contextual secondary and tertiary structure variations present during the experimentation and expand the scope of functional inference.
Affiliation(s)
- Niladri R Das
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore, India; Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
- Kunal N Chaudhury
  - Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
- Debnath Pal
  - Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
2. Wei X, Li ZC, Li SJ, Peng XB, Zhao Q. Protein structure determination using a Riemannian approach. FEBS Lett 2019; 594:1036-1051. [PMID: 31769509 DOI: 10.1002/1873-3468.13688] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/23/2019] [Revised: 10/31/2019] [Accepted: 11/14/2019] [Indexed: 11/05/2022]
Abstract
Protein NMR structure determination is one of the most extensively studied problems. Here, we adopt a novel method based on a matrix completion technique, the Riemannian approach, to rebuild the protein structure from nuclear Overhauser effect distance restraints and dihedral angle restraints. In comparison with the CYANA method, the results generated via the Riemannian approach are more similar to the standard X-ray crystallographic structures as a result of the simple but powerful internal calculation processing function. In addition, our results demonstrate that the Riemannian approach performs comparably to or even better than the CYANA method on other structural assessment metrics, including stereochemical quality and restraint violations. The Riemannian approach software is available at: https://github.com/xubiaopeng/Protein_Recon_MCRiemman.
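Matrix completion methods of this kind rely on the fact that a matrix of squared interatomic distances has low rank. The sketch below is not the Riemannian algorithm from the abstract; it only illustrates that low-rank property on synthetic points, using NumPy. The function names `squared_edm` and `coords_from_squared_edm` and the random test coordinates are assumptions made for illustration.

```python
import numpy as np

def squared_edm(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    G = X @ X.T
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2.0 * G

def coords_from_squared_edm(D, dim=3):
    """Classical multidimensional scaling: coordinates from a complete squared EDM."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    G = -0.5 * J @ D @ J                       # Gram matrix of the centered points
    w, V = np.linalg.eigh(G)
    top = np.argsort(w)[::-1][:dim]            # indices of the dim largest eigenvalues
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

# Synthetic "atoms": 20 random points in 3D (hypothetical data, not a protein).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
D = squared_edm(X)

# For generic points in 3D the squared distance matrix has rank at most 5.
rank_D = np.linalg.matrix_rank(D)

# With every distance known, coordinates are recovered up to a rigid motion
# or reflection, so the recovered points reproduce the same distances.
Y = coords_from_squared_edm(D)
max_err = np.abs(squared_edm(Y) - D).max()
```

With NMR data only a sparse, noisy subset of the entries of `D` is observed, and a completion method has to estimate the remaining entries under the low-rank constraint before a step like `coords_from_squared_edm` can be applied.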
Affiliation(s)
- Xian Wei
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, China; Department of Science, Taiyuan Institute of Technology, China
- Zhi-Cheng Li
  - Department of Physics, Taiyuan Normal University, China
- Shi-Jian Li
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, China
- Xu-Biao Peng
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, China
- Qing Zhao
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, China
3. Malliavin TE, Mucherino A, Lavor C, Liberti L. Systematic Exploration of Protein Conformational Space Using a Distance Geometry Approach. J Chem Inf Model 2019; 59:4486-4503. [PMID: 31442036 DOI: 10.1021/acs.jcim.9b00215] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]
Abstract
The optimization approaches classically used during the determination of protein structure encounter various difficulties, especially when the size of the conformational space is large. Indeed, in such a case, algorithmic convergence criteria are more difficult to set up. Moreover, the size of the search space makes it difficult to achieve a complete exploration. The interval branch-and-prune (iBP) approach, based on a reformulation of the distance geometry problem (DGP), provides a theoretical framework for generating protein conformations by systematically sampling the conformational space. When an appropriate subset of interatomic distances is known exactly, this worst-case exponential-time algorithm is provably complete and fixed-parameter tractable. These guarantees, however, immediately disappear as distance measurement errors are introduced. Here we propose an improvement of this approach: threading-augmented interval branch-and-prune (TAiBP), where the combinatorial explosion of the original iBP approach arising from its exponential complexity is alleviated by partitioning the input instances into consecutive peptide fragments and by using self-organizing maps (SOMs) to obtain clusters of similar solutions. A validation of the TAiBP approach is presented here on a set of proteins of various sizes and structures. The calculation inputs are a uniform covalent geometry extracted from force field covalent terms, the backbone dihedral angles with error intervals, and a few long-range distances. For most of the proteins smaller than 50 residues and interval widths of 20°, the TAiBP approach yielded solutions with RMSD values smaller than 3 Å with respect to the initial protein conformation. For proteins larger than 50 residues, an efficient TAiBP approach will require the use of nonuniform covalent geometry and may benefit from the recent development of residue-specific force fields.
Affiliation(s)
- Thérèse E Malliavin
  - Unité de Bioinformatique Structurale, UMR 3528, CNRS, and Département de Bioinformatique, Biostatistique et Biologie Intégrative, USR 3756, CNRS, Institut Pasteur, 75015 Paris, France
- Carlile Lavor
  - Applied Math Department, IMECC-University of Campinas, Campinas, SP 13083-970, Brazil
- Leo Liberti
  - LIX CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Route de Saclay, 91128 Palaiseau, France
4. Li Z, Li S, Wei X, Peng X, Zhao Q. Recovering the Missing Regions in Crystal Structures from the Nuclear Magnetic Resonance Measurement Data Using Matrix Completion Method. J Comput Biol 2019; 27:709-717. [PMID: 31502861 DOI: 10.1089/cmb.2019.0107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
Based on a matrix completion algorithm, we propose a simple method to recover the missing regions in X-ray crystal structures using the corresponding nuclear magnetic resonance (NMR) measurement data, for proteins with both X-ray and NMR experimental data deposited in the Protein Data Bank (PDB). Using 10 test proteins deposited in the PDB and comparing against standard MODELLER results in terms of root-mean-square deviation and MolProbity scores, we validated that our method, which combines X-ray crystallographic structure data with NMR data, can provide a better protein structure model than the MODELLER algorithm. This method is particularly useful for building initial structures for molecular dynamics when studying the protein folding process.
Affiliation(s)
- Zhicheng Li
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, P.R. China
- Shijian Li
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, P.R. China
- Xian Wei
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, P.R. China
- Xubiao Peng
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, P.R. China
- Qing Zhao
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, P.R. China
5. Li Z, Li S, Wei X, Zhao Q. Scaled Alternating Steepest Descent Algorithm Applied for Protein Structure Determination from Nuclear Magnetic Resonance Data. J Comput Biol 2019; 26:1020-1029. [DOI: 10.1089/cmb.2019.0013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Affiliation(s)
- Zhicheng Li
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, China
- Shijian Li
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, China
- Xian Wei
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, China
- Qing Zhao
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing, China
6. Khoo Y, Singer A, Cowburn D. Integrating NOE and RDC using sum-of-squares relaxation for protein structure determination. J Biomol NMR 2017; 68:163-185. [PMID: 28616711 DOI: 10.1007/s10858-017-0108-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2016] [Accepted: 03/31/2017] [Indexed: 06/07/2023]
Abstract
We revisit the problem of protein structure determination from geometrical restraints from NMR, using convex optimization. It is well known that the NP-hard distance geometry problem of determining atomic positions from pairwise distance restraints can be relaxed into a convex semidefinite program (SDP). However, the NOE distance restraints are often too imprecise and sparse for accurate structure determination. Residual dipolar coupling (RDC) measurements provide additional geometric information on the angles between atom-pair directions and the axes of the principal-axis frame. The optimization problem involving RDC is highly non-convex and requires a good initialization even within the simulated annealing framework. In this paper, we model the protein backbone as an articulated structure composed of rigid units. Determining the rotation of each rigid unit gives the full protein structure. We propose solving the non-convex optimization problems using the sum-of-squares (SOS) hierarchy, a hierarchy of convex relaxations with increasing complexity and approximation power. Unlike classical global optimization approaches, SOS optimization returns a certificate of optimality if the global optimum is found. Based on the SOS method, we propose two algorithms, RDC-SOS and RDC-NOE-SOS, that have polynomial time complexity in the number of amino-acid residues and run efficiently on a standard desktop. In many instances, the proposed methods exactly recover the solution to the original non-convex optimization problem. To the best of our knowledge, this is the first time SOS relaxation has been introduced to solve non-convex optimization problems in structural biology. We further introduce a statistical tool, the Cramér-Rao bound (CRB), to provide an information-theoretic bound on the highest resolution one can hope to achieve when determining protein structure from noisy measurements using any unbiased estimator.
Our simulation results show that when the RDC measurements are corrupted by Gaussian noise of realistic variance, both SOS based algorithms attain the CRB. We successfully apply our method in a divide-and-conquer fashion to determine the structure of ubiquitin from experimental NOE and RDC measurements obtained in two alignment media, achieving more accurate and faster reconstructions compared to the current state of the art.
Affiliation(s)
- Y Khoo
  - Department of Physics, Princeton University, Princeton, NJ, 08540, USA
  - Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
- A Singer
  - Department of Mathematics and PACM, Princeton University, Princeton, NJ, 08544, USA
- D Cowburn
  - Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
7. Protein structure estimation from NMR data by matrix completion. Eur Biophys J 2017; 46:525-532. [DOI: 10.1007/s00249-017-1198-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/10/2016] [Revised: 01/11/2017] [Accepted: 01/20/2017] [Indexed: 10/20/2022]
8. Cassioli A, Bardiaux B, Bouvier G, Mucherino A, Alves R, Liberti L, Nilges M, Lavor C, Malliavin TE. An algorithm to enumerate all possible protein conformations verifying a set of distance constraints. BMC Bioinformatics 2015; 16:23. [PMID: 25627244 PMCID: PMC4384350 DOI: 10.1186/s12859-015-0451-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/09/2014] [Accepted: 01/05/2015] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND The determination of protein structures satisfying distance constraints is an important problem in structural biology. Whereas the most common method currently employed is simulated annealing, other methods have previously been proposed in the literature. Most of them, however, are designed to find only one solution. RESULTS In order to exhaustively explore the feasible conformational space, we propose here an interval Branch-and-Prune algorithm (iBP) to solve the Distance Geometry Problem (DGP) associated with protein structure determination. This algorithm is based on a discretization of the problem obtained by recursively constructing a search space having the structure of a tree, and by verifying whether the generated atomic positions are feasible or not by making use of pruning devices. The pruning devices used here are directly related to features of protein conformations. CONCLUSIONS We describe the new iBP algorithm for generating protein conformations satisfying distance constraints, which could potentially allow a systematic exploration of the conformational space. The iBP algorithm has been applied to three α-helical peptides.
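The tree search described in this abstract can be sketched in a simplified form. The code below is not the published iBP implementation: it assumes exact distances rather than intervals, assumes every atom has known distances to its three predecessors, and uses a plain distance check as the only pruning device. The function names and the random test points are hypothetical.

```python
import numpy as np

def first_three(d01, d12, d02):
    """Place atoms 0, 1, 2 in a canonical frame from their mutual distances."""
    x = (d01**2 + d02**2 - d12**2) / (2.0 * d01)
    y = np.sqrt(max(d02**2 - x**2, 0.0))
    return [np.zeros(3), np.array([d01, 0.0, 0.0]), np.array([x, y, 0.0])]

def sphere_intersection(p1, p2, p3, r1, r2, r3, tol=1e-9):
    """Return the (up to two) points at distances r1, r2, r3 from p1, p2, p3."""
    ex = p2 - p1
    d = np.linalg.norm(ex)
    ex = ex / d
    i = ex @ (p3 - p1)
    ey = p3 - p1 - i * ex
    j = np.linalg.norm(ey)
    ey = ey / j
    ez = np.cross(ex, ey)
    x = (r1**2 - r2**2 + d**2) / (2.0 * d)
    y = (r1**2 - r3**2 + i**2 + j**2 - 2.0 * i * x) / (2.0 * j)
    z2 = r1**2 - x**2 - y**2
    if z2 < -tol:
        return []                      # the three spheres do not intersect
    z = np.sqrt(max(z2, 0.0))
    base = p1 + x * ex + y * ey
    return [base + z * ez, base - z * ez]

def branch_and_prune(n, dist, tol=1e-6):
    """Enumerate all placements of n atoms consistent with dist[(i, j)], i < j."""
    solutions = []

    def extend(coords):
        k = len(coords)
        if k == n:
            solutions.append(np.array(coords))
            return
        # Branch: two candidate positions from the three preceding atoms.
        candidates = sphere_intersection(
            coords[k - 3], coords[k - 2], coords[k - 1],
            dist[k - 3, k], dist[k - 2, k], dist[k - 1, k])
        for c in candidates:
            # Prune: reject candidates violating any other known distance.
            feasible = all(
                abs(np.linalg.norm(c - coords[i]) - dist[i, k]) <= tol
                for i in range(k - 3) if (i, k) in dist)
            if feasible:
                extend(coords + [c])

    extend(first_three(dist[0, 1], dist[1, 2], dist[0, 2]))
    return solutions

# Hypothetical test instance: 7 random points standing in for atoms.
rng = np.random.default_rng(0)
true = rng.normal(size=(7, 3))
full = {(i, j): float(np.linalg.norm(true[i] - true[j]))
        for i in range(7) for j in range(i + 1, 7)}
local = {key: val for key, val in full.items() if key[1] - key[0] <= 3}

sols_full = branch_and_prune(7, full)    # extra distances prune the tree
sols_local = branch_and_prune(7, local)  # no pruning distances available
```

With only the distances to the three preceding atoms, every level of the tree doubles the number of conformations, so 7 atoms give 2^4 = 16 leaves. Adding the remaining distances as pruning information leaves the true structure and its mirror image.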
Affiliation(s)
- Benjamin Bardiaux
  - Institut Pasteur, Structural Bioinformatics Unit, 25, rue du Dr Roux, Paris, 75015, France; CNRS UMR3528, 25, rue du Dr Roux, Paris, 75015, France
- Guillaume Bouvier
  - Institut Pasteur, Structural Bioinformatics Unit, 25, rue du Dr Roux, Paris, 75015, France; CNRS UMR3528, 25, rue du Dr Roux, Paris, 75015, France
- Rafael Alves
  - LIX, Ecole Polytechnique, Palaiseau, 91128, France
- Leo Liberti
  - LIX, Ecole Polytechnique, Palaiseau, 91128, France; IBM TJ Watson Research Center, Yorktown Heights, NY, 10598, USA
- Michael Nilges
  - Institut Pasteur, Structural Bioinformatics Unit, 25, rue du Dr Roux, Paris, 75015, France; CNRS UMR3528, 25, rue du Dr Roux, Paris, 75015, France
- Carlile Lavor
  - University of Campinas (IMECC-UNICAMP), Campinas-SP, 13083-859, Brazil
- Thérèse E Malliavin
  - Institut Pasteur, Structural Bioinformatics Unit, 25, rue du Dr Roux, Paris, 75015, France; CNRS UMR3528, 25, rue du Dr Roux, Paris, 75015, France
9. Li XB, Burkowski F. Generating conformational transitions using the euclidean distance matrix. IEEE Trans Nanobioscience 2015; 14:203-9. [PMID: 25608309 DOI: 10.1109/tnb.2014.2387156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Elastic network interpolation (ENI) is an efficient method for generating intermediate conformations between two end protein conformations. Its current formulation uses interatomic distance. We show how this can be generalized to interatomic distances-squared. This generalization is part of an effort to study protein dynamics on the set of positive semidefinite (PSD) matrices, which has a rich mathematical structure. We use lattice structures to test this interpolation scheme, and discuss some limitations observed. We conclude with some suggestions for future research.
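The link between squared distances and positive semidefinite matrices mentioned in this abstract can be illustrated with a short sketch. This is not elastic network interpolation itself: it is a plain linear blend of two squared-distance matrices, on hypothetical random conformations, meant only to show why working with squared distances keeps intermediate states inside the PSD cone.

```python
import numpy as np

def squared_edm(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    G = X @ X.T
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2.0 * G

def gram_from_squared_edm(D):
    """Centered Gram matrix G = -1/2 J D J, which is linear in D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

# Two hypothetical end conformations of 10 points in 3D.
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 3))
B = rng.normal(size=(10, 3))
DA, DB = squared_edm(A), squared_edm(B)

ts = [0.0, 0.25, 0.5, 0.75, 1.0]
min_eigs, ranks = [], []
for t in ts:
    D_t = (1.0 - t) * DA + t * DB             # blend of squared distances
    eig = np.linalg.eigvalsh(gram_from_squared_edm(D_t))
    min_eigs.append(float(eig.min()))         # stays at zero up to rounding
    ranks.append(int((eig > 1e-9).sum()))     # embedding dimension of D_t
```

Because the Gram matrix depends linearly on the squared distances, each blended matrix is a convex combination of two PSD matrices and is therefore PSD. Its rank can exceed 3 at intermediate values of `t`, however, so an intermediate matrix does not necessarily correspond to a conformation in three-dimensional space without a further projection step.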