1
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China

2
Fowler NJ, Albalwi MF, Lee S, Hounslow AM, Williamson MP. Improved methodology for protein NMR structure calculation using hydrogen bond restraints and ANSURR validation: The SH2 domain of SH2B1. Structure 2023; 31:975-986.e3. [PMID: 37311460 DOI: 10.1016/j.str.2023.05.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 05/02/2023] [Accepted: 05/18/2023] [Indexed: 06/15/2023]
Abstract
Protein structures calculated using NMR data are less accurate and less well-defined than they could be. Here we use the program ANSURR to show that this deficiency is at least in part due to a lack of hydrogen bond restraints. We describe a protocol to introduce hydrogen bond restraints into the structure calculation of the SH2 domain from SH2B1 in a systematic and transparent way and show that the structures generated are more accurate and better defined as a result. We also show that ANSURR can be used as a guide to know when the structure calculation is good enough to stop.
Affiliation(s)
- Nicholas J Fowler
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK
- Marym F Albalwi
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK
- Subin Lee
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK
- Andrea M Hounslow
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK
- Mike P Williamson
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK

3
Migdadi L, Telfah A, Hergenröder R, Wöhler C. Novelty detection for metabolic dynamics established on breast cancer tissue using 2D NMR TOCSY spectra. Comput Struct Biotechnol J 2022; 20:2965-2977. [PMID: 35782733 PMCID: PMC9213235 DOI: 10.1016/j.csbj.2022.05.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Revised: 05/26/2022] [Accepted: 05/26/2022] [Indexed: 11/30/2022]
Abstract
Most metabolic profiling approaches focus only on identifying pre-known metabolites in NMR TOCSY spectra using configured parameters. However, little work has addressed automating the detection of new metabolites that might appear during the dynamic evolution of biological cells. Novelty detection is a category of machine learning used to identify data that emerge during the test phase and were not considered during the training phase. We propose a novelty detection system for detecting novel metabolites in the 2D NMR TOCSY spectrum of a breast cancer tissue sample. We build one- and multi-class recognition systems using different classifiers, such as Kernel Null Foley-Sammon Transform, Kernel Density Estimation, and Support Vector Data Description. The training models were constructed from training data sets of different sizes and used in the novelty detection procedure. Multiple evaluation measures were applied to test the performance of the novelty detection methods. Depending on the training data size, all classifiers were able to achieve a 0% false positive rate and 0% total misclassification error, together with a 100% true positive rate. The median total time for the novelty detection process varies between 1.5 and 20 seconds, depending on the classifier and the amount of training data. The results of our novel metabolic profiling method demonstrate its suitability, robustness and speed in automated metabolic research.
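As a rough illustration of the one-class idea named in this abstract, the Python sketch below implements a minimal Kernel Density Estimation novelty detector with NumPy. It is not the authors' pipeline: the isotropic Gaussian kernel, the `bandwidth` value, and the rule of flagging any point whose log-density falls below a low quantile of the training log-densities are assumptions chosen for clarity, and the class name `KDENoveltyDetector` is hypothetical. Feature vectors extracted from TOCSY cross-peaks would take the place of the 2D toy data used here.

```python
import numpy as np


def gaussian_kde_logpdf(train, query, bandwidth):
    """Log-density of each query point under an isotropic Gaussian KDE fit on `train`."""
    train = np.atleast_2d(np.asarray(train, dtype=float))
    query = np.atleast_2d(np.asarray(query, dtype=float))
    n, d = train.shape
    # Squared Euclidean distance between every query point and every training point.
    sq_dist = np.sum((query[:, None, :] - train[None, :, :]) ** 2, axis=-1)
    log_kernel = -sq_dist / (2.0 * bandwidth ** 2)
    # Gaussian normalisation constant plus the 1/n averaging over training points.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2) - np.log(n)
    # Log-sum-exp over training points for numerical stability.
    m = log_kernel.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) + log_norm


class KDENoveltyDetector:
    """Hypothetical one-class detector: a low estimated density means 'novel'."""

    def __init__(self, bandwidth=0.5, quantile=0.01):
        self.bandwidth = bandwidth
        self.quantile = quantile

    def fit(self, X):
        self.train_ = np.atleast_2d(np.asarray(X, dtype=float))
        # Threshold = low quantile of the training points' own log-densities.
        # Each training point contributes to its own density here; a
        # leave-one-out estimate would give a less optimistic threshold.
        scores = gaussian_kde_logpdf(self.train_, self.train_, self.bandwidth)
        self.threshold_ = np.quantile(scores, self.quantile)
        return self

    def predict(self, X):
        """Boolean array: True where a point is flagged as novel."""
        scores = gaussian_kde_logpdf(self.train_, X, self.bandwidth)
        return scores < self.threshold_


rng = np.random.default_rng(0)
known = rng.normal(0.0, 1.0, size=(200, 2))  # toy features for "known" metabolites
detector = KDENoveltyDetector(bandwidth=0.5, quantile=0.01).fit(known)
# The origin lies in the dense training region; (8, 8) lies far outside it.
print(detector.predict([[0.0, 0.0], [8.0, 8.0]]))
```

The quantile threshold fixes the expected false positive rate on data resembling the training set, which is one simple way to trade sensitivity against false alarms; the paper's own decision rules may differ.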
Key Words
- 2D NMR TOCSY
- ATP, Adenosine Triphosphate
- AUC, Area under Curve
- BMRB, Biological Magnetic Resonance Data Bank
- Breast cancer
- Chemometrics
- Classification
- HMDB, Human Metabolome Database
- KDE, Kernel Density Estimation
- KNFST, Kernel Null Foley–Sammon Transform
- Machine learning
- Metabolic profiling
- Metabolomics
- NMR, Nuclear Magnetic Resonance
- Novelty detection
- ROC, Receiver Operating Characteristic
- SVDD, Support Vector Data Description
- TOCSY, Total Correlation Spectroscopy
Affiliation(s)
- Lubaba Migdadi
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V, 44139 Dortmund, Germany
- Image Analysis Group, TU Dortmund, 44227 Dortmund, Germany
- Ahmad Telfah
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V, 44139 Dortmund, Germany
- Roland Hergenröder
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V, 44139 Dortmund, Germany

4
The accuracy of protein structures in solution determined by AlphaFold and NMR. Structure 2022; 30:925-933.e2. [DOI: 10.1016/j.str.2022.04.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/01/2022] [Revised: 03/18/2022] [Accepted: 04/13/2022] [Indexed: 02/05/2023]

5
Chen D, Wang Z, Guo D, Orekhov V, Qu X. Review and Prospect: Deep Learning in Nuclear Magnetic Resonance Spectroscopy. Chemistry 2020; 26:10391-10401. [PMID: 32251549 DOI: 10.1002/chem.202000246] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 01/15/2020] [Revised: 04/03/2020] [Indexed: 01/08/2023]
Abstract
Since the concept of deep learning (DL) was formally proposed in 2006, it has had a major impact on academic research and industry. Nowadays, DL provides an unprecedented way to analyze and process data with demonstrated great results in computer vision, medical imaging, natural language processing, and so forth. Herein, applications of DL in NMR spectroscopy are summarized, and a perspective for DL as an entirely new approach that is likely to transform NMR spectroscopy into a much more efficient and powerful technique in chemistry and life sciences is outlined.
Affiliation(s)
- Dicheng Chen
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China
- Zi Wang
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China
- Di Guo
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, P.R. China
- Vladislav Orekhov
- Department of Chemistry and Molecular Biology, University of Gothenburg, Box 465, Gothenburg, 40530, Sweden
- Xiaobo Qu
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China

6
Vila JA, Arnautova YA. 13C Chemical Shifts in Proteins: A Rich Source of Encoded Structural Information. Springer Series on Bio- and Neurosystems 2019. [PMCID: PMC7123919 DOI: 10.1007/978-3-319-95843-9_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Despite the formidable progress in Nuclear Magnetic Resonance (NMR) spectroscopy, quality assessment of NMR-derived structures remains an important problem. Validation of protein structures is therefore essential for spectroscopists, since it could enable them to detect structural flaws and potentially guide their efforts in further refinement. Moreover, the availability of accurate and efficient validation tools would help molecular biologists and computational chemists evaluate the quality of available experimental structures and select the protein model most suitable for a given scientific problem. The 13Cα nuclei are ubiquitous in proteins; moreover, their shieldings are easily obtainable from NMR experiments and represent a rich source of encoded structural information, which makes 13Cα chemical shifts an attractive candidate for computational methods aimed at the determination and validation of protein structures. In this chapter, the basis of a novel methodology for computing, at the quantum chemical level of theory, the 13Cα shielding of amino acid residues in proteins is described. We also identify and examine the main factors affecting the 13Cα-shielding computation. Finally, we illustrate how the information encoded in 13C chemical shifts can be used for a number of applications, ranging from protein structure prediction of both α-helical and β-sheet conformations, to determination of the fraction of the tautomeric forms of the imidazole ring of histidine in proteins as a function of pH, to accurate detection of structural flaws, at the residue level, in NMR-determined protein models.
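Computed 13Cα shieldings become comparable with measured chemical shifts only after referencing. A common convention, assumed here rather than taken from the chapter, is to subtract each nuclear shielding from the shielding of a reference compound (for 13C usually tetramethylsilane, TMS) computed at the same level of theory. Starting from δ = (ν − ν_ref)/ν_ref with ν proportional to (1 − σ), the exact relation is δ = (σ_ref − σ)/(1 − σ_ref), which reduces to δ ≈ σ_ref − σ because σ_ref is of order 10⁻⁴. The Python sketch below implements both forms; the function name and the reference shielding used in the example are hypothetical placeholders, not values from the chapter.

```python
def shielding_to_shift(sigma_ppm, sigma_ref_ppm, exact=False):
    """Convert an isotropic shielding (ppm) into a chemical shift (ppm).

    Approximate form: delta = sigma_ref - sigma.
    Exact form (shieldings in ppm, hence the 1e-6 factor):
        delta = (sigma_ref - sigma) / (1 - sigma_ref * 1e-6)
    """
    diff = sigma_ref_ppm - sigma_ppm
    if exact:
        return diff / (1.0 - sigma_ref_ppm * 1e-6)
    return diff


# Hypothetical values: a 13C-alpha shielding of 130.0 ppm and a TMS reference of 186.0 ppm.
print(shielding_to_shift(130.0, 186.0))              # 56.0
print(shielding_to_shift(130.0, 186.0, exact=True))  # about 56.01
```

For 13C the exact and approximate forms differ by roughly 0.01 ppm here, well below typical errors of quantum chemical shielding calculations, which is why the subtraction is normally sufficient.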

7
Ito K, Obuchi Y, Chikayama E, Date Y, Kikuchi J. Exploratory machine-learned theoretical chemical shifts can closely predict metabolic mixture signals. Chem Sci 2018; 9:8213-8220. [PMID: 30542569 PMCID: PMC6240814 DOI: 10.1039/c8sc03628d] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/15/2018] [Accepted: 08/23/2018] [Indexed: 12/30/2022]
Abstract
Various chemical shift predictive methodologies have been studied and developed, but there remains the problem of prediction accuracy. Assigning the NMR signals of metabolic mixtures requires high predictive performance owing to the complexity of the signals. Here we propose a new predictive tool that combines quantum chemistry and machine learning. A scaling factor as the objective variable to correct the errors of 2355 theoretical chemical shifts was optimized by exploring 91 machine learning algorithms and using the partial structure of 150 compounds as explanatory variables. The optimal predictive model gave RMSDs between experimental and predicted chemical shifts of 0.2177 ppm for δ 1H and 3.3261 ppm for δ 13C in the test data; thus, better accuracy was achieved compared with existing empirical and quantum chemical methods. The utility of the predictive model was demonstrated by applying it to assignments of experimental NMR signals of a complex metabolic mixture.
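The RMSD figures quoted above compare predicted and experimental shifts atom by atom. The Python sketch below computes that RMSD and, as a deliberately simple stand-in for the paper's machine-learned correction (which explored 91 algorithms with partial-structure descriptors), fits a single linear rescaling of theoretical shifts by least squares. The function names and toy numbers are hypothetical and do not reproduce the paper's data.

```python
import numpy as np


def rmsd(predicted, observed):
    """Root-mean-square deviation (ppm) between predicted and observed shifts."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def fit_linear_correction(theoretical, observed):
    """Least-squares fit of observed ~= slope * theoretical + intercept."""
    slope, intercept = np.polyfit(np.asarray(theoretical, dtype=float),
                                  np.asarray(observed, dtype=float), deg=1)
    return slope, intercept


def apply_correction(theoretical, slope, intercept):
    """Rescale theoretical shifts with a fitted linear correction."""
    return slope * np.asarray(theoretical, dtype=float) + intercept


# Toy training data: theoretical shifts with a systematic scale and offset error.
theo_train = np.array([1.0, 2.0, 3.5, 5.0, 7.2])
obs_train = 0.95 * theo_train + 0.10
slope, intercept = fit_linear_correction(theo_train, obs_train)

# Evaluate on held-out toy data, as one would on a test split.
theo_test = np.array([1.5, 4.0, 6.0])
obs_test = 0.95 * theo_test + 0.10
print(rmsd(theo_test, obs_test))                                      # before correction
print(rmsd(apply_correction(theo_test, slope, intercept), obs_test))  # close to 0
```

A global linear fit only removes systematic errors shared by all atoms; the point of a descriptor-based model such as the one described in the abstract is to correct errors that depend on local chemical structure.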
Affiliation(s)
- Kengo Ito
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yuka Obuchi
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Eisuke Chikayama
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Department of Information Systems, Niigata University of International and Information Studies, 3-1-1 Mizukino, Nishi-ku, Niigata-shi, Niigata 950-2292, Japan
- Yasuhiro Date
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Jun Kikuchi
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan

8
Jin X, Zhu T, Zhang JZH, He X. Automated Fragmentation QM/MM Calculation of NMR Chemical Shifts for Protein-Ligand Complexes. Front Chem 2018; 6:150. [PMID: 29868556 PMCID: PMC5952040 DOI: 10.3389/fchem.2018.00150] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/30/2018] [Accepted: 04/16/2018] [Indexed: 01/13/2023]
Abstract
In this study, the automated fragmentation quantum mechanics/molecular mechanics (AF-QM/MM) method was applied for NMR chemical shift calculations of protein-ligand complexes. In the AF-QM/MM approach, the protein binding pocket is automatically divided into capped fragments (within ~200 atoms) for density functional theory (DFT) calculations of NMR chemical shifts. Meanwhile, the solvent effect was also included using the Poisson-Boltzmann (PB) model, which properly accounts for the electrostatic polarization effect from the solvent for protein-ligand complexes. The NMR chemical shifts of the neocarzinostatin (NCS)-chromophore binding complex calculated by AF-QM/MM accurately reproduce the results for the full, large-sized system. The 1H chemical shift perturbations (CSP) between apo-NCS and holo-NCS predicted by AF-QM/MM are also in excellent agreement with experimental results. Furthermore, the DFT-calculated chemical shifts of the chromophore and residues in the NCS binding pocket can be utilized as molecular probes to identify the correct ligand binding conformation. By combining the CSP of the atoms in the binding pocket with the Glide scoring function, the new scoring function can accurately distinguish the native ligand pose from decoy structures. Therefore, the AF-QM/MM approach provides an accurate and efficient platform for protein-ligand binding structure prediction based on NMR-derived information.
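The idea of using chemical shift perturbations as a pose filter can be illustrated with a toy calculation. The Python sketch below computes per-atom 1H CSPs (holo minus apo) and ranks candidate ligand poses by the RMSD between their predicted CSPs and the experimental ones. This is a hypothetical simplification: the paper combines CSPs with the Glide scoring function, which is not reproduced here, and the function names, atom labels, and shift values are invented for illustration.

```python
import math


def csp(apo, holo):
    """Per-atom chemical shift perturbation (ppm): holo minus apo, over atoms present in both."""
    return {atom: holo[atom] - apo[atom] for atom in apo.keys() & holo.keys()}


def csp_rmsd(predicted, experimental):
    """RMSD (ppm) between predicted and experimental CSPs over the atoms they share."""
    atoms = predicted.keys() & experimental.keys()
    if not atoms:
        raise ValueError("no atoms in common")
    return math.sqrt(sum((predicted[a] - experimental[a]) ** 2 for a in atoms) / len(atoms))


def rank_poses(pose_csps, experimental):
    """Order pose names from best (lowest CSP RMSD) to worst."""
    return sorted(pose_csps, key=lambda name: csp_rmsd(pose_csps[name], experimental))


# Hypothetical experimental 1H shifts (ppm) for three binding-pocket protons.
apo = {"H1": 7.10, "H2": 8.25, "H3": 6.90}
holo = {"H1": 7.45, "H2": 8.05, "H3": 6.95}
exp_csp = csp(apo, holo)

# Hypothetical CSPs predicted (e.g., by a QM calculation) for two candidate poses.
poses = {
    "pose_native": {"H1": 0.30, "H2": -0.18, "H3": 0.04},
    "pose_decoy": {"H1": -0.10, "H2": 0.25, "H3": 0.40},
}
print(rank_poses(poses, exp_csp))  # ['pose_native', 'pose_decoy']
```

Keeping the sign of each perturbation, rather than only its magnitude, lets the comparison penalise a pose that shifts a proton in the wrong direction.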
Affiliation(s)
- Xinsheng Jin
- State Key Laboratory of Precision Spectroscopy, School of Chemistry and Molecular Engineering, Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, East China Normal University, Shanghai, China
- Tong Zhu
- State Key Laboratory of Precision Spectroscopy, School of Chemistry and Molecular Engineering, Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, East China Normal University, Shanghai, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China
- John Z. H. Zhang
- State Key Laboratory of Precision Spectroscopy, School of Chemistry and Molecular Engineering, Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, East China Normal University, Shanghai, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China
- Department of Chemistry, New York University, New York, NY, United States
- Xiao He
- State Key Laboratory of Precision Spectroscopy, School of Chemistry and Molecular Engineering, Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, East China Normal University, Shanghai, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China
- National Engineering Research Centre for Nanotechnology, Shanghai, China

9
Klukowski P, Augoff M, Zięba M, Drwal M, Gonczarek A, Walczak MJ. NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 2018; 34:2590-2597. [DOI: 10.1093/bioinformatics/bty134] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 08/30/2017] [Accepted: 03/09/2018] [Indexed: 01/13/2023]
Affiliation(s)
- Piotr Klukowski
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Michał Augoff
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Maciej Zięba
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Maciej Drwal
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Adam Gonczarek
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Alphamoon Ltd., ul. Wlodkowica 21/3, Wrocław, Poland
- Michał J Walczak
- Captor Therapeutics Ltd., ul. Dunska 11, Wrocław, Poland
- Alphamoon Ltd., ul. Wlodkowica 21/3, Wrocław, Poland

10
Nielsen JT, Mulder FAA. POTENCI: prediction of temperature, neighbor and pH-corrected chemical shifts for intrinsically disordered proteins. J Biomol NMR 2018; 70:141-165. [PMID: 29399725 DOI: 10.1007/s10858-018-0166-5] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Received: 11/29/2017] [Accepted: 01/25/2018] [Indexed: 05/04/2023]
Abstract
Chemical shifts contain important site-specific information on the structure and dynamics of proteins. Deviations from statistical average values, known as random coil chemical shifts (RCCSs), are extensively used to infer these relationships. Unfortunately, the use of imprecise reference RCCSs leads to biased inference and obstructs the detection of subtle structural features. Here we present a new method, POTENCI, for the prediction of RCCSs that outperforms the currently most authoritative methods. POTENCI is parametrized using a large curated database of chemical shifts for protein segments with validated disorder; it takes pH and temperature explicitly into account, and includes sequence-dependent nearest and next-nearest neighbor corrections as well as second-order corrections. RCCS predictions with POTENCI show root-mean-square values that are lower by 25-78%, with the largest improvements observed for 1Hα and 13C'. It is demonstrated how POTENCI can be applied to analyze subtle deviations from RCCSs to detect small populations of residual structure in intrinsically disordered proteins that were not discernible before. POTENCI source code is available for download, or can be deployed from the URL http://www.protein-nmr.org.
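Secondary chemical shifts are the observed shifts minus the random-coil reference values, so any bias in the reference propagates directly into them. The Python sketch below assumes the random-coil values have already been obtained (for example from a predictor such as POTENCI; no POTENCI interface is shown or implied) and smooths the 13Cα secondary shifts over a short window, because stretches of consecutive positive 13Cα secondary shifts are conventionally read as helical propensity and negative ones as extended structure. The function names and all numbers are hypothetical.

```python
import numpy as np


def secondary_shifts(observed, random_coil):
    """Secondary shifts (ppm): observed minus random-coil reference, residue by residue."""
    observed = np.asarray(observed, dtype=float)
    random_coil = np.asarray(random_coil, dtype=float)
    if observed.shape != random_coil.shape:
        raise ValueError("observed and random-coil arrays must align residue by residue")
    return observed - random_coil


def window_mean(values, width=3):
    """Moving average over `width` consecutive residues (only full windows are returned)."""
    kernel = np.ones(width) / width
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")


# Hypothetical 13C-alpha shifts (ppm) for a five-residue stretch and matching random-coil values.
ca_obs = [58.9, 59.3, 58.7, 57.1, 56.4]
ca_rc = [58.3, 58.5, 58.1, 56.9, 56.6]
delta = secondary_shifts(ca_obs, ca_rc)
print(np.round(delta, 2))               # per-residue secondary shifts
print(np.round(window_mean(delta), 2))  # smoothed over 3 residues
```

Because small populations of residual structure produce secondary shifts of only a fraction of a ppm, an error of the same size in the random-coil reference can create or hide such a signal, which is the motivation the abstract gives for more accurate RCCS prediction.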
Affiliation(s)
- Jakob Toudahl Nielsen
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000 Aarhus C, Denmark
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Frans A A Mulder
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000 Aarhus C, Denmark
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark

11
Yu Z, Li P, Merz KM. Using Ligand-Induced Protein Chemical Shift Perturbations To Determine Protein–Ligand Structures. Biochemistry 2017; 56:2349-2362. [DOI: 10.1021/acs.biochem.7b00170] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 01/28/2023]
Affiliation(s)
- Zhuoqin Yu
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1322, United States
- Pengfei Li
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1322, United States
- Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1322, United States

12
Jee JG. Strategy for Determining the Structures of Large Biomolecules using the Torsion Angle Dynamics of CYANA. Journal of the Korean Magnetic Resonance Society 2016. [DOI: 10.6564/jkmrs.2016.20.4.102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

13
Swails J, Zhu T, He X, Case DA. AFNMR: automated fragmentation quantum mechanical calculation of NMR chemical shifts for biomolecules. J Biomol NMR 2015; 63:125-39. [PMID: 26232926 PMCID: PMC6556433 DOI: 10.1007/s10858-015-9970-3] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 03/23/2015] [Accepted: 07/20/2015] [Indexed: 05/08/2023]
Abstract
We evaluate the performance of the automated fragmentation quantum mechanics/molecular mechanics approach (AF-QM/MM) on the calculation of protein and nucleic acid NMR chemical shifts. The AF-QM/MM approach models solvent effects implicitly through a set of surface charges computed using the Poisson-Boltzmann equation, and it can also be combined with an explicit solvent model through the placement of water molecules in the first solvation shell around the solute; the latter substantially improves the accuracy of chemical shift prediction of protons involved in hydrogen bonding with solvent. We also compare the performance of AF-QM/MM on proteins and nucleic acids with two leading empirical chemical shift prediction programs SHIFTS and SHIFTX2. Although the empirical programs outperform AF-QM/MM in predicting chemical shifts, the differences are in some cases small, and the latter can be applied to chemical shifts on biomolecules which are outside the training set employed by the empirical programs, such as structures containing ligands, metal centers, and non-standard residues. The AF-QM/MM described here is implemented in version 5 of the SHIFTS software, and is fully automated, so that only a structure in PDB format is required as input.
Affiliation(s)
- Jason Swails
- Department of Chemistry and Chemical Biology and BioMaPS Institute, Rutgers University, Piscataway, NJ, 08854, USA
- Tong Zhu
- State Key Laboratory of Precision Spectroscopy, Institute of Theoretical and Computational Science, East China Normal University, Shanghai, 200062, China
- Xiao He
- State Key Laboratory of Precision Spectroscopy, Institute of Theoretical and Computational Science, East China Normal University, Shanghai, 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
- David A Case
- Department of Chemistry and Chemical Biology and BioMaPS Institute, Rutgers University, Piscataway, NJ, 08854, USA

14
Fenwick M, Hoch JC, Ulrich E, Gryk MR. CONNJUR R: an annotation strategy for fostering reproducibility in bio-NMR-protein spectral assignment. J Biomol NMR 2015; 63:141-50. [PMID: 26253947 PMCID: PMC4864978 DOI: 10.1007/s10858-015-9964-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/12/2015] [Accepted: 07/01/2015] [Indexed: 05/21/2023]
Abstract
Reproducibility is a cornerstone of the scientific method, essential for validation of results by independent laboratories and the sine qua non of scientific progress. A key step toward reproducibility of biomolecular NMR studies was the establishment of public data repositories (PDB and BMRB). Nevertheless, bio-NMR studies routinely fall short of the requirement for reproducibility that all the data needed to reproduce the results are published. A key limitation is that considerable metadata goes unpublished, notably manual interventions that are typically applied during the assignment of multidimensional NMR spectra. A general solution to this problem has been elusive, in part because of the wide range of approaches and software packages employed in the analysis of protein NMR spectra. Here we describe an approach for capturing missing metadata during the assignment of protein NMR spectra that can be generalized to arbitrary workflows, different software packages, other biomolecules, or other stages of data analysis in bio-NMR. We also present extensions to the NMR-STAR data dictionary that enable machine archival and retrieval of the "missing" metadata.
Affiliation(s)
- Matthew Fenwick
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, 06030-3305, USA
- Jeffrey C Hoch
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, 06030-3305, USA
- Eldon Ulrich
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Michael R Gryk
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, 06030-3305, USA

15
Abstract
Three-dimensional structures of proteins in solution can be calculated on the basis of conformational restraints derived from NMR measurements. This chapter gives an overview of the computational methods for NMR protein structure analysis highlighting recent automated methods for the assignment of NMR spectra, the collection of conformational restraints, and the structure calculation.

16
Wenrich BR, Sonstrom RE, Gupta RA, Rovnyak D. Enhanced biosynthetically directed fractional carbon-13 enrichment of proteins for backbone NMR assignments. Protein Expr Purif 2015; 115:1-10. [PMID: 26256059 DOI: 10.1016/j.pep.2015.08.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/09/2014] [Revised: 07/29/2015] [Accepted: 08/05/2015] [Indexed: 01/28/2023]
Abstract
Routes to carbon-13 enrichment of bacterially expressed proteins include achieving uniform or positionally selective (e.g., ILV-Me or 13C') enrichment. We consider the potential for biosynthetically directed fractional enrichment (i.e., carbon-13 incorporation in the protein of less than 100%) for performing routine n-dimensional (nD) NMR spectroscopy of proteins. First, we demonstrate an approach to fractional isotope addition in which the initial growth medium containing natural-abundance glucose is replenished at induction with a small amount (e.g., 10% (w/w) u-13C-glucose) of enriched nutrient. The approach considered here is to add 10% (e.g., 200 mg for a 2 g/L culture) u-13C-glucose at the induction time (OD600 = 0.8), resulting in a protein with enhanced 13C incorporation that gives almost the same NMR signal levels as an exact 20% 13C sample. Second, whereas fractional enrichment is used for obtaining stereospecific methyl assignments, we find that 13C incorporation levels no greater than 20% (w/w) yield 13C and 13C-13C spin pair incorporation sufficient to conduct typical 3D bio-NMR backbone experiments on moderate instrumentation (600 MHz, RT probe). Typical 3D bio-NMR experiments on a fractionally enriched protein yielded the expected backbone connectivities and, with one exception, did not show amino acid biases in this work. When 10% u-13C-glucose is added to the expression medium at induction, there is poor preservation of 13Cα-13Cβ spin pairs in the amino acids ILV, leading to the absence of Cβ signals in HNCACB spectra for ILV, a potentially useful editing effect. Enhanced fractional carbon-13 enrichment provides lower-cost routes to high-throughput protein NMR studies and makes modern protein NMR more cost-accessible.
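The dosing arithmetic in this abstract is easy to reproduce. The Python helper below, whose name and arguments are hypothetical, computes the mass of u-13C-glucose to add at induction as a weight fraction of the base glucose concentration. It reproduces the 200 mg figure for a 2 g/L culture under the assumption that the quoted amount refers to 1 L of medium; it does not model how much 13C actually ends up in the protein.

```python
def labeled_glucose_mg(base_glucose_g_per_L, fraction_w_w, volume_L=1.0):
    """Mass (mg) of u-13C-glucose to add, as a w/w fraction of the base glucose load."""
    if not 0.0 <= fraction_w_w <= 1.0:
        raise ValueError("fraction_w_w must lie between 0 and 1")
    return base_glucose_g_per_L * volume_L * fraction_w_w * 1000.0  # g -> mg


print(labeled_glucose_mg(2.0, 0.10))                # 1 L of a 2 g/L culture
print(labeled_glucose_mg(2.0, 0.10, volume_L=0.5))  # half a litre of the same culture
```

Scaling with culture volume is the only adjustment needed for other batch sizes under this assumption.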
Affiliation(s)
- Riju A Gupta
- Bucknell University, Lewisburg, PA 17837, United States
- David Rovnyak
- Bucknell University, Lewisburg, PA 17837, United States.
17
Rosato A, Vranken W, Fogh RH, Ragan TJ, Tejero R, Pederson K, Lee HW, Prestegard JH, Yee A, Wu B, Lemak A, Houliston S, Arrowsmith CH, Kennedy M, Acton TB, Xiao R, Liu G, Montelione GT, Vuister GW. The second round of Critical Assessment of Automated Structure Determination of Proteins by NMR: CASD-NMR-2013. JOURNAL OF BIOMOLECULAR NMR 2015; 62:413-24. [PMID: 26071966 PMCID: PMC4569658 DOI: 10.1007/s10858-015-9953-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/19/2015] [Accepted: 05/28/2015] [Indexed: 05/21/2023]
Abstract
The second round of the community-wide initiative Critical Assessment of automated Structure Determination of Proteins by NMR (CASD-NMR-2013) comprised ten blind target datasets, consisting of unprocessed spectral data, assigned chemical shift lists, and unassigned NOESY peak and RDC lists, which were made available in both curated (i.e. manually refined) and un-curated (i.e. automatically generated) form. Ten structure calculation programs, using fully automated protocols only, generated a total of 164 three-dimensional structures (entries) for the ten targets, sometimes using both curated and un-curated lists to generate multiple entries for a single target. The accuracy of each entry was established by comparing it to the corresponding manually solved structure of the target, which was not available at the time the data were provided. Across the entire data set, 71% of all submitted entries achieved an accuracy relative to the reference NMR structure better than 1.5 Å. Methods based on NOESY peak lists achieved even better results, with up to 100% of the entries within the 1.5 Å threshold for some programs. However, some methods did not converge for some targets when using un-curated NOESY peak lists. Over 90% of the entries achieved an accuracy better than the more relaxed 2.5 Å threshold used in the previous CASD-NMR-2010 round. Comparisons between entries generated with un-curated versus curated peak lists show only marginal improvements for the latter in those cases where both calculations converged.
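Accuracy statements such as "71% of entries better than 1.5 Å" rest on an RMSD computed after optimally superposing each entry on its reference. A minimal sketch of that calculation, using the standard Kabsch superposition, is shown below. It assumes matched (N, 3) coordinate arrays and does not reproduce the atom and residue selection actually used in CASD-NMR; the function names are illustrative.

```python
import numpy as np


def superposed_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD (same units as input, e.g. Å) between matched (N, 3) coordinate sets
    after optimal rigid-body superposition (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)  # SVD of the 3x3 covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # forbid reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def fraction_within(rmsds, threshold: float = 1.5) -> float:
    """Fraction of entries whose RMSD to the reference is below `threshold`."""
    values = np.asarray(rmsds, dtype=float)
    return float((values < threshold).mean())
```

Passing a list of per-entry RMSDs to `fraction_within` with `threshold=1.5` or `2.5` mirrors the two cut-offs quoted in the abstract.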
Affiliation(s)
- Antonio Rosato
- Department of Chemistry and Magnetic Resonance Center, University of Florence, 50019, Sesto Fiorentino, Italy
- Wim Vranken
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- (IB)2 Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Rasmus H Fogh
- Department of Biochemistry, School of Biological Sciences, University of Leicester, Henry Wellcome Building, Lancaster Road, Leicester, LE1 9HN, UK
- Timothy J Ragan
- Department of Biochemistry, School of Biological Sciences, University of Leicester, Henry Wellcome Building, Lancaster Road, Leicester, LE1 9HN, UK
- Roberto Tejero
- Departamento de Química Física, Universidad de Valencia, Avda. Dr. Moliner 50, 46100, Burjassot (Valencia), Spain
- Kari Pederson
- Complex Carbohydrate Research Center and Northeast Structural Genomics Consortium, University of Georgia, Athens, GA, 30602, USA
- Hsiau-Wei Lee
- Complex Carbohydrate Research Center and Northeast Structural Genomics Consortium, University of Georgia, Athens, GA, 30602, USA
- James H Prestegard
- Complex Carbohydrate Research Center and Northeast Structural Genomics Consortium, University of Georgia, Athens, GA, 30602, USA
- Adelinda Yee
- Department of Medical Biophysics, Cancer Genomics and Proteomics, Ontario Cancer Institute, Northeast Structural Genomics Consortium, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Bin Wu
- Department of Medical Biophysics, Cancer Genomics and Proteomics, Ontario Cancer Institute, Northeast Structural Genomics Consortium, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Alexander Lemak
- Department of Medical Biophysics, Cancer Genomics and Proteomics, Ontario Cancer Institute, Northeast Structural Genomics Consortium, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Scott Houliston
- Department of Medical Biophysics, Cancer Genomics and Proteomics, Ontario Cancer Institute, Northeast Structural Genomics Consortium, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Cheryl H Arrowsmith
- Department of Medical Biophysics, Cancer Genomics and Proteomics, Ontario Cancer Institute, Northeast Structural Genomics Consortium, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Michael Kennedy
- Department of Chemistry and Biochemistry, Northeast Structural Genomics Consortium, Miami University, Oxford, OH, 45056, USA
- Thomas B Acton
- Department of Molecular Biology and Biochemistry, Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Piscataway, NJ, 08854, USA
- Rong Xiao
- Department of Molecular Biology and Biochemistry, Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Piscataway, NJ, 08854, USA
- Gaohua Liu
- Department of Molecular Biology and Biochemistry, Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Piscataway, NJ, 08854, USA
- Gaetano T Montelione
- Department of Molecular Biology and Biochemistry, Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA.
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Piscataway, NJ, 08854, USA.
- Geerten W Vuister
- Department of Biochemistry, School of Biological Sciences, University of Leicester, Henry Wellcome Building, Lancaster Road, Leicester, LE1 9HN, UK.
18
Güntert P, Buchner L. Combined automated NOE assignment and structure calculation with CYANA. JOURNAL OF BIOMOLECULAR NMR 2015; 62:453-71. [PMID: 25801209 DOI: 10.1007/s10858-015-9924-9] [Citation(s) in RCA: 264] [Impact Index Per Article: 29.3] [Received: 02/17/2015] [Accepted: 03/17/2015] [Indexed: 05/12/2023]
Abstract
The automated assignment of NOESY cross peaks has become a fundamental technique for NMR protein structure analysis. A widely used algorithm for this purpose is implemented in the program CYANA. It has been used for a large number of structure determinations of proteins in solution but had not previously been described in full detail. In this paper we present a complete description of the CYANA implementation of automated NOESY assignment, which differs extensively from its predecessor CANDID through the use of a consistent probabilistic treatment, and we discuss its performance in the second round of the critical assessment of structure determination by NMR.
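Automated NOESY assignment in the CANDID/CYANA family starts by listing, for every cross peak, all atom pairs whose assigned chemical shifts match the peak position within a tolerance. The probabilistic scoring that CYANA then applies to these candidates is not reproduced here. The sketch below covers only that generic first matching step for a 2D 1H-1H peak. The atom labels, the tolerance value and the function name are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def candidate_assignments(peak: Tuple[float, float],
                          shifts: Dict[str, float],
                          tol: float = 0.03) -> List[Tuple[str, str]]:
    """Return all (atom_1, atom_2) pairs whose assigned 1H shifts match a 2D NOESY peak.

    peak   -- (w1, w2) peak position in ppm
    shifts -- hypothetical mapping from atom label to assigned 1H chemical shift (ppm)
    tol    -- matching tolerance in ppm (illustrative value, not a CYANA default)
    """
    w1, w2 = peak
    near_w1 = [atom for atom, s in shifts.items() if abs(s - w1) <= tol]
    near_w2 = [atom for atom, s in shifts.items() if abs(s - w2) <= tol]
    # Every combination is a possible assignment; self-pairs cannot give a cross peak.
    return [(a, b) for a, b in product(near_w1, near_w2) if a != b]


if __name__ == "__main__":
    shifts = {"G10 H": 8.20, "A11 HA": 4.31, "L25 HA": 4.33, "V30 HG1": 1.00}
    print(candidate_assignments((8.21, 4.32), shifts))
```

With overlapping shifts, a single peak already has two candidates in this example. Resolving such ambiguity is the problem the probabilistic treatment addresses.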
Affiliation(s)
- Peter Güntert
- Center for Biomolecular Magnetic Resonance, Institute of Biophysical Chemistry, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany.
- Laboratory of Physical Chemistry, ETH Zürich, Zurich, Switzerland.
- Graduate School of Science, Tokyo Metropolitan University, Hachioji, Tokyo, Japan.
- Lena Buchner
- Center for Biomolecular Magnetic Resonance, Institute of Biophysical Chemistry, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
19
Mareuil F, Malliavin TE, Nilges M, Bardiaux B. Improved reliability, accuracy and quality in automated NMR structure calculation with ARIA. JOURNAL OF BIOMOLECULAR NMR 2015; 62:425-438. [PMID: 25861734 PMCID: PMC4569677 DOI: 10.1007/s10858-015-9928-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/16/2015] [Accepted: 04/03/2015] [Indexed: 05/30/2023]
Abstract
In biological NMR, assignment of NOE cross-peaks and calculation of atomic conformations are critical steps in the determination of reliable high-resolution structures. ARIA is an automated approach that performs NOE assignment and structure calculation concomitantly in an iterative procedure. The log-harmonic shape for the distance restraint potential and the Bayesian weighting of distance restraints, recently introduced in ARIA, were shown to significantly improve the quality and the accuracy of determined structures. In this paper, we propose two modifications of the ARIA protocol: (1) the softening of the force field together with adapted hydrogen radii, which is meaningful in the context of the log-harmonic potential with Bayesian weighting, and (2) a procedure that automatically adjusts the violation tolerance used in the selection of active restraints, based on the fit of the structure to the input data sets. The new ARIA protocols were fine-tuned on a set of eight protein targets from the CASD-NMR initiative. As a result, the convergence problems previously observed for some targets were resolved and the obtained structures exhibited better quality. In addition, the new ARIA protocols were applied to the structure calculation of ten new CASD-NMR targets in a blind fashion, i.e. without knowing the actual solution. Even though optimisation of parameters and pre-filtering of unrefined NOE peak lists were necessary for half of the targets, ARIA consistently and reliably determined very precise and highly accurate structures in all cases. In the context of integrative structural biology, an increasing number of experimental methods produce distance data for the determination of 3D structures of macromolecules, stressing the importance of methods that successfully make use of ambiguous and noisy distance data.
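A log-harmonic restraint can be read as the negative log-likelihood of a lognormal error model for distances. The sketch below uses that generic form, with a fixed σ, to contrast it with a plain harmonic restraint. It is an assumption-labelled illustration, not ARIA's implementation. In particular, ARIA's Bayesian weighting estimates the restraint weight from the data, whereas here `sigma` is simply a user-supplied constant. The function names are hypothetical.

```python
import math


def log_harmonic_energy(d: float, d_ref: float, sigma: float) -> float:
    """Restraint energy (dimensionless, up to an additive constant) for model distance d.

    Assumes a lognormal error model, log(d_ref) ~ Normal(log(d), sigma**2), so the
    negative log-likelihood as a function of d is log(d / d_ref)**2 / (2 * sigma**2).
    """
    if d <= 0 or d_ref <= 0 or sigma <= 0:
        raise ValueError("distances and sigma must be positive")
    return math.log(d / d_ref) ** 2 / (2.0 * sigma ** 2)


def harmonic_energy(d: float, d_ref: float, k: float = 1.0) -> float:
    """Plain harmonic restraint, for comparison."""
    return 0.5 * k * (d - d_ref) ** 2
```

The log-harmonic form penalises a distance twice as long as the restraint exactly as much as one half as long, while the harmonic form penalises the longer error more. This is one reason the shape suits NOE-derived distances, whose errors are roughly multiplicative.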
Affiliation(s)
- Fabien Mareuil
- Unité de Bioinformatique Structurale, CNRS UMR 3528, Institut Pasteur, 25-28 rue du Dr Roux, 75724, Paris Cedex 15, France
- Cellule d'Informatique pour la Biologie, Institut Pasteur, 25-28 rue du Dr Roux, 75724, Paris Cedex 15, France
- Thérèse E Malliavin
- Unité de Bioinformatique Structurale, CNRS UMR 3528, Institut Pasteur, 25-28 rue du Dr Roux, 75724, Paris Cedex 15, France
- Michael Nilges
- Unité de Bioinformatique Structurale, CNRS UMR 3528, Institut Pasteur, 25-28 rue du Dr Roux, 75724, Paris Cedex 15, France
- Benjamin Bardiaux
- Unité de Bioinformatique Structurale, CNRS UMR 3528, Institut Pasteur, 25-28 rue du Dr Roux, 75724, Paris Cedex 15, France.
20
Guerry P, Duong VD, Herrmann T. CASD-NMR 2: robust and accurate unsupervised analysis of raw NOESY spectra and protein structure determination with UNIO. JOURNAL OF BIOMOLECULAR NMR 2015; 62:473-480. [PMID: 25917899 DOI: 10.1007/s10858-015-9934-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/17/2015] [Accepted: 04/18/2015] [Indexed: 06/04/2023]
Abstract
UNIO is a comprehensive software suite for protein NMR structure determination that enables full automation of all NMR data analysis steps involved, including signal identification in NMR spectra, sequence-specific backbone and side-chain resonance assignment, NOE assignment and structure calculation. Within the framework of the second round of the community-wide stringent blind NMR structure determination challenge (CASD-NMR 2), we participated in two categories, using either raw NMR spectra or unrefined NOE peak lists as input. A total of 15 resulting NMR structure bundles were submitted for 9 out of 10 blind protein targets. All submitted UNIO structures accurately coincided with the corresponding blind targets, as documented by an average backbone root-mean-square deviation to the reference proteins of only 1.2 Å. Also, the precision of the UNIO structure bundles was virtually identical to that of the ensemble of reference structures. By assessing the quality of all UNIO structures submitted to the two categories, we find throughout that only the UNIO-ATNOS/CANDID approach using raw NMR spectra consistently yielded structure bundles of high quality for direct deposition in the Protein Data Bank. In conclusion, the results obtained in CASD-NMR 2 are further proof of robust, accurate and unsupervised NMR data analysis by UNIO for real-world applications.
Affiliation(s)
- Paul Guerry
- Institut des Sciences Analytiques, Centre de RMN à très Hauts Champs, Université de Lyon (UMR 5280 CNRS, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1), 5 rue de la Doua, 69100, Villeurbanne, France
- Viet Dung Duong
- Institut des Sciences Analytiques, Centre de RMN à très Hauts Champs, Université de Lyon (UMR 5280 CNRS, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1), 5 rue de la Doua, 69100, Villeurbanne, France
- Torsten Herrmann
- Institut des Sciences Analytiques, Centre de RMN à très Hauts Champs, Université de Lyon (UMR 5280 CNRS, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1), 5 rue de la Doua, 69100, Villeurbanne, France.
21
Ragan TJ, Fogh RH, Tejero R, Vranken W, Montelione GT, Rosato A, Vuister GW. Analysis of the structural quality of the CASD-NMR 2013 entries. JOURNAL OF BIOMOLECULAR NMR 2015; 62:527-40. [PMID: 26032236 PMCID: PMC4569653 DOI: 10.1007/s10858-015-9949-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2015] [Accepted: 05/20/2015] [Indexed: 05/30/2023]
Abstract
We performed a comprehensive structure validation of both automated and manually generated structures of the 10 targets of the CASD-NMR-2013 effort. We established that automated structure determination protocols are capable of reliably producing structures of comparable accuracy and quality to those generated by a skilled researcher, at least for small, single domain proteins such as the ten targets tested. The most robust results appear to be obtained when NOESY peak lists are used either as the primary input data or to augment chemical shift data without the need to manually filter such lists. A detailed analysis of the long-range NOE restraints generated by the different programs from the same data showed a surprisingly low degree of overlap. Additionally, we found that there was no significant correlation between the extent of the NOE restraint overlap and the accuracy of the structure. This result was surprising given the importance of NOE data in producing good quality structures. We suggest that this could be explained by the information redundancy present in NOEs between atoms contained within a fixed covalent network.
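One simple way to quantify the "overlap" between long-range NOE restraints produced by different programs is a Jaccard index on residue pairs. The sketch below uses that measure as an assumption. The paper may define overlap differently, for example at the atom level. It treats restraints as unordered residue pairs and uses the common convention that long-range means a sequence separation of at least five residues. The function names are illustrative.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def long_range_pairs(restraints: Iterable[Pair], min_separation: int = 5) -> Set[Pair]:
    """Unordered residue pairs whose sequence separation |i - j| is at least min_separation."""
    return {(min(i, j), max(i, j)) for i, j in restraints if abs(i - j) >= min_separation}


def jaccard_overlap(a: Set[Pair], b: Set[Pair]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

A low score between two programs' restraint sets, combined with similar accuracy of the resulting structures, is the pattern the authors attribute to redundancy within the covalent network.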
Affiliation(s)
- Timothy J Ragan
- Department of Biochemistry, School of Biological Sciences, University of Leicester, Henry Wellcome Building, Lancaster Road, Leicester, LE1 9HN, UK
- Rasmus H Fogh
- Department of Biochemistry, School of Biological Sciences, University of Leicester, Henry Wellcome Building, Lancaster Road, Leicester, LE1 9HN, UK
- Roberto Tejero
- Departamento de Química Física, Universidad de Valencia, Avda. Dr. Moliner 50, 46100, Burjassot (Valencia), Spain
- Wim Vranken
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, Brussels, Belgium
- (IB)2 Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Robert Wood Johnson Medical School, Piscataway, NJ, 08854, USA
- Antonio Rosato
- Magnetic Resonance Center, Department of Chemistry, University of Florence, 50019, Sesto Fiorentino, Italy
- Geerten W Vuister
- Department of Biochemistry, School of Biological Sciences, University of Leicester, Henry Wellcome Building, Lancaster Road, Leicester, LE1 9HN, UK.
22
Strickland M, Stephens T, Liu J, Tjandra N. Exploiting image registration for automated resonance assignment in NMR. JOURNAL OF BIOMOLECULAR NMR 2015; 62:143-156. [PMID: 25828257 PMCID: PMC4452424 DOI: 10.1007/s10858-015-9926-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2015] [Accepted: 03/24/2015] [Indexed: 06/04/2023]
Abstract
Analysis of protein NMR data involves the assignment of resonance peaks in a number of multidimensional data sets. To establish resonance assignments, a three-dimensional search is used to match a pair of common variables, such as chemical shifts of the same spin system, in different NMR spectra. We show that the process can be simplified by displaying the variables to be compared in two-dimensional plots. Moreover, by using a fast Fourier transform cross-correlation algorithm, more common in the field of image registration or pattern matching, we can automate this process. Here, we use sequential NMR backbone assignment as an example to show that the combination of correlation plots and segmented pattern matching achieves fast backbone assignment in fifteen proteins of varying sizes. For example, the 265-residue RalBP1 protein was 95.4% correctly assigned in 10 s. The same concept can be applied to any multidimensional NMR data set where analysis comprises the comparison of two variables. This modular and robust approach offers high efficiency with excellent computational scalability and could easily be incorporated into existing assignment software.
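The core operation borrowed from image registration is FFT-based cross-correlation. By the correlation theorem, the inverse FFT of one spectrum's transform times the complex conjugate of the other's gives the circular cross-correlation, and its maximum gives the best offset. The sketch below applies this to arrays of any dimension. It is a generic illustration of the technique, not the authors' segmented pattern-matching procedure. The function name and the test profiles are made up for illustration.

```python
import numpy as np


def best_circular_shift(reference: np.ndarray, moving: np.ndarray) -> tuple:
    """Integer offset s (one entry per axis) such that np.roll(moving, s) best matches
    reference, found from the maximum of the circular cross-correlation via FFTs."""
    ref = np.asarray(reference, dtype=float)
    mov = np.asarray(moving, dtype=float)
    if ref.shape != mov.shape:
        raise ValueError("arrays must have the same shape")
    # Correlation theorem: IFFT(FFT(ref) * conj(FFT(mov))) is the circular cross-correlation.
    corr = np.fft.ifftn(np.fft.fftn(ref) * np.conj(np.fft.fftn(mov))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint correspond to negative offsets.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, corr.shape))


if __name__ == "__main__":
    x = np.arange(64)
    profile = np.exp(-0.5 * ((x - 20) / 2.0) ** 2)
    print(best_circular_shift(np.roll(profile, 5), profile))
```

The FFT route costs O(N log N) instead of the O(N²) of testing every offset directly, which is consistent with the scalability the abstract emphasises.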
Affiliation(s)
- Nico Tjandra
- To whom correspondence should be addressed: Building 50, Room 3503, NHLBI, NIH, Bethesda, MD 20892, Phone: (301) 402-3029, Fax: (301) 402-3405,
23
Zhu T, Zhang JZH, He X. Correction of erroneously packed protein's side chains in the NMR structure based on ab initio chemical shift calculations. Phys Chem Chem Phys 2015; 16:18163-9. [PMID: 25052367 DOI: 10.1039/c4cp02553a] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
In this work, protein side-chain 1H chemical shifts are used as probes to detect and correct side-chain packing errors in protein NMR structures through structural refinement. By applying the automated fragmentation quantum mechanics/molecular mechanics (AF-QM/MM) method for ab initio calculation of chemical shifts, incorrect side-chain packing was detected in the NMR structures of the Pin1 WW domain. The NMR structure was then refined using molecular dynamics simulation and the polarized protein-specific charge (PPC) model. The computationally refined structure of the Pin1 WW domain is in excellent agreement with the corresponding X-ray structure. In particular, the use of the PPC model yields a more accurate structure than the standard (nonpolarizable) force field. For comparison, some of the widely used empirical models for chemical shift calculations are unable to correctly describe the relationship between the particular proton chemical shift and protein structures. The AF-QM/MM method can be used as a powerful tool for protein NMR structure validation and structural flaw detection.
Affiliation(s)
- Tong Zhu
- State Key Laboratory of Precision Spectroscopy, Institute of Theoretical and Computational Science, East China Normal University, Shanghai, 200062, China.
24
Buchner L, Güntert P. Systematic evaluation of combined automated NOE assignment and structure calculation with CYANA. JOURNAL OF BIOMOLECULAR NMR 2015; 62:81-95. [PMID: 25796507 DOI: 10.1007/s10858-015-9921-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/23/2015] [Accepted: 03/16/2015] [Indexed: 05/07/2023]
Abstract
The automated assignment of NOESY cross peaks has become a fundamental technique for NMR protein structure analysis. A widely used algorithm for this purpose is implemented in the program CYANA. It has been used for a large number of structure determinations of proteins in solution, but a systematic evaluation of its performance has not yet been reported. In this paper we systematically analyze the reliability of combined automated NOESY assignment and structure calculation with CYANA under a variety of conditions on the basis of the experimental NMR data sets of ten proteins. To evaluate the robustness of the algorithm, the original high-quality experimental data sets were modified in different ways to simulate the effect of data imperfections, i.e. incomplete or erroneous chemical shift assignments, missing NOESY cross peaks, inaccurate peak positions, inaccurate peak intensities, lower-dimensionality NOESY spectra, and higher tolerances for the matching of chemical shifts and peak positions. The results show that the algorithm is remarkably robust with regard to imperfections of the NOESY peak lists and the chemical shift tolerances but susceptible to missing or erroneous resonance assignments, in particular for nuclei that are involved in many NOESY cross peaks.
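Two of the perturbations listed above, missing cross peaks and inaccurate peak positions, can be simulated along the lines of the sketch below. It assumes a peak list stored as tuples of ppm coordinates, drops each peak independently with a given probability, and adds Gaussian noise to every coordinate. The parameter names and defaults are hypothetical and are not taken from the paper. A fixed seed keeps each perturbed data set reproducible.

```python
import random
from typing import List, Sequence, Tuple

Peak = Tuple[float, ...]


def perturb_peak_list(peaks: Sequence[Peak], drop_fraction: float = 0.1,
                      position_sd_ppm: float = 0.01, seed: int = 0) -> List[Peak]:
    """Simulate an imperfect peak list: drop peaks at random and jitter their positions."""
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # seeded so that each perturbation is reproducible
    kept = [p for p in peaks if rng.random() >= drop_fraction]  # "missing NOESY cross peaks"
    # "inaccurate peak positions": independent Gaussian noise on every coordinate
    return [tuple(c + rng.gauss(0.0, position_sd_ppm) for c in p) for p in kept]
```

Sweeping `drop_fraction` and `position_sd_ppm` over a grid and rerunning the structure calculation on each perturbed list is one way to map robustness curves of the kind the study reports.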
Affiliation(s)
- Lena Buchner
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, and Frankfurt Institute of Advanced Studies, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
25
Cannistraci CV, Abbas A, Gao X. Median Modified Wiener Filter for nonlinear adaptive spatial denoising of protein NMR multidimensional spectra. Sci Rep 2015; 5:8017. [PMID: 25619991 PMCID: PMC4306135 DOI: 10.1038/srep08017] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 09/18/2014] [Accepted: 12/29/2014] [Indexed: 11/21/2022]
Abstract
Denoising multidimensional NMR spectra is a fundamental step in NMR protein structure determination. The state-of-the-art method uses wavelet denoising, which may perform poorly on non-stationary signals affected by Gaussian white noise mixed with strong impulsive artifacts, like those in multidimensional NMR spectra. Unfortunately, the performance of wavelet denoising depends on a combinatorial search of wavelet shapes and parameters, and its multidimensional extension is highly non-trivial, which hampers its application to multidimensional NMR spectra. Here, we endorse a different philosophy of denoising NMR spectra: less is more! We consider spatial filters that have only one parameter to tune: the window size. We propose, for the first time, the 3D extension of the median-modified Wiener filter (MMWF), an adaptive variant of the median filter, and also a novel variation named MMWF*. We test the proposed filters and the Wiener filter, an adaptive variant of the mean filter, on a benchmark set that contains 16 two-dimensional and three-dimensional NMR spectra extracted from eight proteins. Our results demonstrate that the adaptive spatial filters significantly outperform their non-adaptive versions. The performance of the new MMWF* on 2D/3D spectra is even better than that of wavelet denoising. Notably, MMWF* delivers stable, high performance that is almost invariant across window-size settings, a consistent advantage when implementing automatic pipelines for protein NMR spectra analysis.
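As generally described, the adaptive Wiener filter replaces each point a by μ + (σ² − ν²)/σ² · (a − μ), where μ and σ² are the local mean and variance in the window and ν² is a noise-variance estimate. The MMWF substitutes the local median for μ. The sketch below implements both for 2D or 3D arrays under these assumptions. It clips the gain at zero, estimates ν² as the mean local variance, and uses reflect padding. These are common conventions, not necessarily the paper's choices, and the MMWF* variant is not reproduced. The function name is illustrative.

```python
from typing import Optional

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_spatial_filter(spectrum: np.ndarray, size: int = 3, use_median: bool = True,
                            noise_var: Optional[float] = None) -> np.ndarray:
    """Median-modified Wiener filter (use_median=True) or Wiener filter (use_median=False).

    Works on 2D or 3D arrays; `size` (odd window edge length) is the only tuning parameter.
    """
    x = np.asarray(spectrum, dtype=float)
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    pad = size // 2
    # Reflect-pad so every point has a full window, then view all windows without copying.
    windows = sliding_window_view(np.pad(x, pad, mode="reflect"), (size,) * x.ndim)
    axes = tuple(range(x.ndim, 2 * x.ndim))
    local_var = windows.var(axis=axes)
    center = np.median(windows, axis=axes) if use_median else windows.mean(axis=axes)
    if noise_var is None:
        noise_var = float(local_var.mean())  # crude global noise estimate
    # Gain is 0 where the local variance looks like noise (smooth) and tends to 1 near
    # strong features (keep the data).
    num = np.maximum(local_var - noise_var, 0.0)
    den = np.maximum(local_var, noise_var)
    gain = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return center + gain * (x - center)
```

Using the median as the centre is what makes the MMWF resistant to the impulsive artifacts mentioned in the abstract, since a single spike barely moves the local median but shifts the local mean.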
Affiliation(s)
- Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Ahmed Abbas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
26
Vila JA, Arnautova YA. 13C Chemical Shifts in Proteins: A Rich Source of Encoded Structural Information. COMPUTATIONAL METHODS TO STUDY THE STRUCTURE AND DYNAMICS OF BIOMOLECULES AND BIOMOLECULAR PROCESSES 2014. [PMCID: PMC7121069 DOI: 10.1007/978-3-642-28554-7_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Despite the formidable progress in Nuclear Magnetic Resonance (NMR) spectroscopy, quality assessment of NMR-derived structures remains an important problem. Validation of protein structures is therefore essential for spectroscopists, since it can enable them to detect structural flaws and potentially guide further refinement. Moreover, accurate and efficient validation tools would help molecular biologists and computational chemists to evaluate the quality of available experimental structures and to select the protein model most suitable for a given scientific problem. 13Cα nuclei are ubiquitous in proteins; moreover, their shieldings are easily obtainable from NMR experiments and represent a rich source of encoded structural information, which makes 13Cα chemical shifts attractive candidates for computational methods aimed at the determination and validation of protein structures. In this chapter, we describe the basis of a novel methodology for computing, at the quantum chemical level of theory, the 13Cα shielding of amino acid residues in proteins. We also identify and examine the main factors affecting the 13Cα-shielding computation. Finally, we illustrate how the information encoded in 13C chemical shifts can be used for a number of applications, ranging from protein structure prediction of both α-helical and β-sheet conformations, to determination of the fraction of the tautomeric forms of the imidazole ring of histidine in proteins as a function of pH, to accurate detection of structural flaws, at the residue level, in NMR-determined protein models.
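Computed shieldings σ become comparable with measured shifts through the usual approximation δ = σ_ref − σ, where σ_ref is the shielding of a reference compound at the same level of theory. Residue-level flaw detection can then be pictured as flagging residues whose computed and observed 13Cα shifts disagree by more than a tolerance. The sketch below shows only this bookkeeping. The reference shielding and threshold are hypothetical inputs, and the chapter's actual validation criteria are more elaborate than a single per-residue cut-off.

```python
from typing import Dict, List


def shielding_to_shift(sigma_ppm: float, sigma_ref_ppm: float) -> float:
    """Chemical shift (ppm) from an absolute shielding: delta = sigma_ref - sigma."""
    return sigma_ref_ppm - sigma_ppm


def flag_residues(computed: Dict[int, float], observed: Dict[int, float],
                  threshold_ppm: float) -> List[int]:
    """Residues whose computed and observed 13C-alpha shifts differ by more than threshold_ppm."""
    common = computed.keys() & observed.keys()  # only residues present in both sets
    return sorted(r for r in common if abs(computed[r] - observed[r]) > threshold_ppm)
```

For example, with hypothetical values, `flag_residues({1: 55.0, 2: 60.0}, {1: 55.5, 2: 56.0}, 2.0)` singles out residue 2 as a candidate structural flaw.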
27
Schmidt E, Güntert P. Reliability of exclusively NOESY-based automated resonance assignment and structure determination of proteins. JOURNAL OF BIOMOLECULAR NMR 2013; 57:193-204. [PMID: 24036635 DOI: 10.1007/s10858-013-9779-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/09/2013] [Accepted: 09/02/2013] [Indexed: 06/02/2023]
Abstract
Protein structure determination by NMR can in principle be sped up both by reducing the measurement time on the NMR spectrometer and by a more efficient analysis of the spectra. Here we study the reliability of protein structure determination based on a single type of spectrum, namely nuclear Overhauser effect spectroscopy (NOESY), using a fully automated procedure for sequence-specific resonance assignment with the recently introduced FLYA algorithm, followed by combined automated NOE distance restraint assignment and structure calculation with CYANA. This NOESY-FLYA method was applied to eight proteins with 63-160 residues for which resonance assignments and solution structures had previously been determined by the Northeast Structural Genomics Consortium (NESG), and for which unrefined and refined NOESY data sets had been made available for the Critical Assessment of Automated Structure Determination of Proteins by NMR project. Using only peak lists from three-dimensional 13C- or 15N-resolved NOESY spectra as input, the FLYA algorithm yielded 91-98% correct backbone and side-chain assignments for the eight proteins when manually refined peak lists were used, and 64-96% correct assignments based on raw peak lists. Subsequent structure calculations with CYANA then produced structures with root-mean-square deviation (RMSD) values to the manually determined reference structures of 0.8-2.0 Å when refined peak lists were used. With raw peak lists, calculations for four proteins converged, resulting in RMSDs to the reference structure of 0.8-2.8 Å, whereas no convergence was obtained for the other four proteins (two of which did not converge even when the correct manual resonance assignments were given as input).
These results show that, given high-quality experimental NOESY peak lists, the chemical shift assignments can be uncovered, without any recourse to traditional through-bond type assignment experiments, to an extent that is sufficient for calculating accurate three-dimensional structures.
Affiliation(s)
- Elena Schmidt
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, Frankfurt Institute for Advanced Studies, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
28
Williamson MP. Using chemical shift perturbation to characterise ligand binding. PROGRESS IN NUCLEAR MAGNETIC RESONANCE SPECTROSCOPY 2013; 73:1-16. [PMID: 23962882 DOI: 10.1016/j.pnmrs.2013.02.001] [Citation(s) in RCA: 956] [Impact Index Per Article: 86.9] [Received: 01/22/2013] [Revised: 02/12/2013] [Accepted: 02/18/2013] [Indexed: 05/05/2023]
Abstract
Chemical shift perturbation (CSP, chemical shift mapping or complexation-induced changes in chemical shift, CIS) follows changes in the chemical shifts of a protein when a ligand is added, and uses these to determine the location of the binding site, the affinity of the ligand, and/or possibly the structure of the complex. A key factor in determining the appearance of spectra during a titration is the exchange rate between free and bound, or more specifically the off-rate koff. When koff is greater than the chemical shift difference between free and bound, which typically equates to an affinity Kd weaker than about 3 μM, exchange is fast on the chemical shift timescale. Under these circumstances, the observed shift is the population-weighted average of free and bound, which allows Kd to be determined from measurement of peak positions, provided the measurements are made appropriately. 1H shifts are influenced to a large extent by through-space interactions, whereas 13Cα and 13Cβ shifts are influenced more by through-bond effects. 15N and 13C' shifts are influenced both by through-bond and by through-space (hydrogen bonding) interactions. For determining the location of a bound ligand on the basis of shift change, the most appropriate method is therefore usually to measure 15N HSQC spectra, calculate the geometrical distance moved by the peak, weighting 15N shifts by a factor of about 0.14 compared to 1H shifts, and select those residues for which the weighted shift change is larger than the standard deviation of the shift for all residues. Other methods are discussed, in particular the measurement of 13CH3 signals. Slow to intermediate exchange rates lead to line broadening and make Kd values very difficult to obtain. There is no good way to distinguish changes in chemical shift due to direct binding of the ligand from changes in chemical shift due to allosteric change.
Ligand binding at multiple sites can often be characterised by simultaneous fitting of many measured shift changes, or more simply by adding substoichiometric amounts of ligand. The chemical shift changes can be used as restraints for docking the ligand onto the protein. By use of quantitative calculations of ligand-induced chemical shift changes, it is becoming possible to determine not just the position but also the orientation of ligands.
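The shift-mapping and fast-exchange rules summarized in this abstract can be sketched as follows. This is a minimal Python illustration, not code from the paper. The function names and input format are hypothetical. The selection cutoff uses the population standard deviation, which is an assumption because the abstract does not say which estimator is meant. The 1:1 quadratic binding isotherm is the standard textbook model and is not stated in the abstract.

```python
import math
import statistics

# Relative weight of 15N versus 1H shift changes (about 0.14, as stated in the abstract).
N15_WEIGHT = 0.14


def weighted_shift_change(d_h, d_n, alpha=N15_WEIGHT):
    """Geometric distance (in 1H-equivalent ppm) moved by a 15N HSQC peak."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def significant_residues(shifts, alpha=N15_WEIGHT):
    """Residues whose weighted shift change exceeds the standard deviation over all residues.

    `shifts` maps a residue label to (delta_1H, delta_15N) in ppm (hypothetical input format).
    """
    changes = {res: weighted_shift_change(dh, dn, alpha) for res, (dh, dn) in shifts.items()}
    cutoff = statistics.pstdev(list(changes.values()))  # population SD: an assumption
    return sorted(res for res, change in changes.items() if change > cutoff)


def fast_exchange_shift(p_total, l_total, kd, delta_max):
    """Observed shift change for 1:1 binding in fast exchange (assumed binding model).

    The observed shift is the population-weighted average of free and bound,
    i.e. delta_max times the bound fraction of protein. Concentrations and Kd
    must be given in the same units.
    """
    s = p_total + l_total + kd
    bound = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0  # [PL] from the quadratic
    return delta_max * bound / p_total
```

One way to estimate Kd is to fit `fast_exchange_shift` by least squares to peak positions measured across a titration. As the abstract notes, this only holds in fast exchange.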
Affiliation(s)
- Mike P Williamson
- Department of Molecular Biology and Biotechnology, University of Sheffield, Firth Court, Western Bank, Sheffield S10 2TN, UK.
29
Schieborr U, Sreeramulu S, Elshorst B, Maurer M, Saxena K, Stehle T, Kudlinzki D, Gande SL, Schwalbe H. MOTOR: model assisted software for NMR structure determination. Proteins 2013; 81:2007-22. [PMID: 23852655 DOI: 10.1002/prot.24361] [Received: 11/26/2012] [Revised: 06/26/2013] [Accepted: 06/28/2013]
Abstract
Eukaryotic proteins with important biological functions can be partially unstructured, conformationally flexible, or heterogeneous. Crystallization trials often fail for such proteins. In NMR spectroscopy, parts of the polypeptide chain undergoing dynamics in unfavorable time regimes cannot be observed. De novo NMR structure determination is seriously hampered when missing signals lead to an incomplete chemical shift assignment, so that the information content of the NOE data is insufficient to determine the structure ab initio. We developed a new protein structure determination strategy for such cases, based on a novel NOE assignment strategy that uses a number of model structures but no explicit reference structure of the kind used by bootstrapping-like algorithms. The software distinguishes in detail between consistent and mutually exclusive pairs of possible NOE assignments on the basis of the different precision levels of measured chemical shifts, and searches for the largest set of consistent NOE assignments that is compatible with three-dimensional space. Validation of the method using the structure of the low-molecular-weight protein tyrosine phosphatase A (MptpA) showed robust results using protein structures with 30-45% sequence identity and 70% of the chemical shift assignments. About 60% of the resonance assignments are sufficient to identify the structural models with the highest conformational similarity to the real structure. The software was benchmarked by de novo solution structures of fibroblast growth factor 21 (FGF21) and the extracellular fibroblast growth factor receptor domain FGFR4 D2, both of which failed in crystallization trials and in classical NMR structure determination.
Affiliation(s)
- Ulrich Schieborr
- Johann Wolfgang Goethe-University Frankfurt, Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance, Max-von-Laue-Str. 7, 60438, Frankfurt am Main, Germany
30
Buchner L, Schmidt E, Güntert P. Peakmatch: a simple and robust method for peak list matching. J Biomol NMR 2013; 55:267-77. [PMID: 23329391 DOI: 10.1007/s10858-013-9708-z] [Received: 10/18/2012] [Accepted: 01/09/2013]
Abstract
Peak lists are commonly used in NMR as input data for various software tools such as automatic assignment and structure calculation programs. Inconsistencies of chemical shift referencing among different peak lists or between peak and chemical shift lists can cause severe problems during peak assignment. Here we present a simple and robust tool to achieve self-consistency of the chemical shift referencing among a set of peak lists. The Peakmatch algorithm matches a set of peak lists to a specified reference peak list, neither of which has to be assigned. The chemical shift referencing offset between two peak lists is determined by optimizing an assignment-free match score function using either a complete grid search or downhill simplex optimization. It is shown that peak lists from many different types of spectra can be matched reliably as long as they contain at least two corresponding dimensions. Using a simulated peak list, the Peakmatch algorithm can also be used to obtain the optimal agreement between a chemical shift list and experimental peak lists. Combining these features makes Peakmatch a useful tool that can be applied routinely before automatic assignment or structure calculation in order to obtain an optimized input data set.
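The grid-search variant of the offset determination can be illustrated with a small Python sketch. The score function is a hypothetical stand-in: it sums the Gaussian proximity of each shifted peak to its nearest reference peak. It was chosen only because, like the Peakmatch score, it needs no assignments. The actual Peakmatch score and its downhill simplex option are not reproduced. Peaks are tuples of shifts in the shared dimensions. The per-dimension tolerances, search ranges, and step sizes are assumed parameters.

```python
import itertools
import math


def match_score(peaks, reference, offset, tol):
    """Hypothetical assignment-free score: each offset-corrected peak contributes
    a Gaussian of its tolerance-scaled distance to the nearest reference peak."""
    score = 0.0
    for peak in peaks:
        shifted = [c + o for c, o in zip(peak, offset)]
        nearest = min(
            sum(((s - r) / t) ** 2 for s, r, t in zip(shifted, ref, tol))
            for ref in reference
        )
        score += math.exp(-0.5 * nearest)
    return score


def grid_search_offset(peaks, reference, max_offsets, steps, tol):
    """Complete grid search over per-dimension referencing offsets."""
    axes = []
    for max_off, step in zip(max_offsets, steps):
        n = int(round(max_off / step))
        axes.append([i * step for i in range(-n, n + 1)])
    return max(
        itertools.product(*axes),
        key=lambda off: match_score(peaks, reference, off, tol),
    )
```

Take a (¹H, ¹⁵N) peak list that is mis-referenced by -0.1 ppm and +0.5 ppm relative to the reference. A search over ±0.3 ppm in 0.05 ppm steps for ¹H, and ±2 ppm in 0.25 ppm steps for ¹⁵N, should return the correcting offset (+0.1, -0.5). The cost grows with the product of the grid sizes, which is presumably why a simplex option is also offered.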
Affiliation(s)
- Lena Buchner
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, and Frankfurt Institute for Advanced Studies, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany
31
Kim TR, Ji S, Lee S, Chu IS, Shin S, Lee J. A hybrid modeling strategy using Nuclear Overhauser Effect data with contact information. Chem Phys Lett 2012. [DOI: 10.1016/j.cplett.2012.09.074]
32
Alipanahi B, Krislock N, Ghodsi A, Wolkowicz H, Donaldson L, Li M. Determining protein structures from NOESY distance constraints by semidefinite programming. J Comput Biol 2012; 20:296-310. [PMID: 23113706 DOI: 10.1089/cmb.2012.0089]
Abstract
Contemporary practical methods for protein nuclear magnetic resonance (NMR) structure determination use molecular dynamics coupled with a simulated annealing schedule. The objective of these methods is to minimize the error of deviating from the nuclear Overhauser effect (NOE) distance constraints. However, the corresponding objective function is highly nonconvex and, consequently, difficult to optimize. Euclidean distance matrix (EDM) methods based on semidefinite programming (SDP) provide a natural framework for these problems. However, the high complexity of SDP solvers and the often noisy distance constraints provide major challenges to this approach. The main contribution of this article is a new SDP formulation for the EDM approach that overcomes these two difficulties. We model the protein as a set of intersecting two- and three-dimensional cliques. Then, we adapt and extend a technique called semidefinite facial reduction to reduce the SDP problem size to approximately one quarter of the size of the original problem. The reduced SDP problem can be solved approximately 100 times faster, and it is also more resistant to numerical problems from erroneous and inexact distance bounds.
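The sketch below makes the EDM view concrete. It recovers 3D coordinates from a complete, exact distance matrix by classical multidimensional scaling, using NumPy. It is deliberately simpler than the method in the abstract: it is not an SDP and performs no facial reduction. It therefore assumes every pairwise distance is known and noise-free, which NOE data never provide. SDP approaches instead optimize over the Gram matrix, constrained to be positive semidefinite, so that missing and inexact distance bounds can be handled.

```python
import numpy as np


def coordinates_from_edm(dist, dim=3):
    """Classical multidimensional scaling: recover coordinates (up to rotation,
    reflection and translation) from a complete matrix of pairwise distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering           # Gram matrix of centred points
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]              # keep the largest `dim` eigenpairs
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # clip tiny negative values
    return eigvecs[:, top] * scale
```

Because the result is only defined up to a rigid motion and reflection, it should be compared with other structures through inter-point distances or after superposition.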
Affiliation(s)
- Babak Alipanahi
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
33
Kumar D, Gautam A, Hosur RV. A unified NMR strategy for high-throughput determination of backbone fold of small proteins. J Struct Funct Genomics 2012; 13:201-12. [DOI: 10.1007/s10969-012-9144-4] [Received: 06/15/2012] [Accepted: 09/18/2012]
34
Liu Z, Abbas A, Jing BY, Gao X. WaVPeak: picking NMR peaks through wavelet-based smoothing and volume-based filtering. Bioinformatics 2012; 28:914-20. [PMID: 22328784 PMCID: PMC3315717 DOI: 10.1093/bioinformatics/bts078] [Received: 11/04/2011] [Revised: 01/16/2012] [Accepted: 02/08/2012]
Abstract
MOTIVATION Nuclear magnetic resonance (NMR) has been widely used as a powerful tool to determine the 3D structures of proteins in vivo. However, the post-spectra processing stage of NMR structure determination usually involves a tremendous amount of time and expert knowledge, which includes peak picking, chemical shift assignment and structure calculation steps. Detecting accurate peaks from the NMR spectra is a prerequisite for all following steps, and thus remains a key problem in automatic NMR structure determination. RESULTS We introduce WaVPeak, a fully automatic peak detection method. WaVPeak first smooths the given NMR spectrum by wavelets. The peaks are then identified as the local maxima. The false positive peaks are filtered out efficiently by considering the volume of the peaks. WaVPeak has two major advantages over the state-of-the-art peak-picking methods. First, through wavelet-based smoothing, WaVPeak does not eliminate any data point in the spectra. Therefore, WaVPeak is able to detect weak peaks that are embedded in the noise level. NMR spectroscopists need the most help isolating these weak peaks. Second, WaVPeak estimates the volume of the peaks to filter the false positives. This is more reliable than intensity-based filters that are widely used in existing methods. We evaluate the performance of WaVPeak on the benchmark set proposed by PICKY (Alipanahi et al., 2009), one of the most accurate methods in the literature. The dataset comprises 32 2D and 3D spectra from eight different proteins. Experimental results demonstrate that WaVPeak achieves an average of 96%, 91%, 88%, 76% and 85% recall on ¹⁵N-HSQC, HNCO, HNCA, HNCACB and CBCA(CO)NH, respectively. When the same number of peaks is considered, WaVPeak significantly outperforms PICKY. AVAILABILITY WaVPeak is an open source program. The source code and two test spectra of WaVPeak are available at http://faculty.kaust.edu.sa/sites/xingao/Pages/Publications.aspx.
The online server is under construction. CONTACT statliuzhi@xmu.edu.cn; ahmed.abbas@kaust.edu.sa; majing@ust.hk; xin.gao@kaust.edu.sa.
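The three stages described in this abstract (wavelet smoothing, local-maximum detection, and volume-based filtering) can be illustrated on a one-dimensional trace with the Python sketch below. It is a toy built on assumptions, not WaVPeak itself:

- It uses a Haar wavelet with soft thresholding of the detail coefficients.
- It uses a fixed number of decomposition levels.
- It defines a peak's "volume" as the summed smoothed intensity between the flanking minima.

The real program works on 2D and 3D spectra with its own wavelet and volume estimator.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal, levels):
    """Multi-level Haar decomposition; returns (approximation, details finest-first)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward."""
    current = list(approx)
    for det in reversed(details):
        nxt = []
        for a, d in zip(current, det):
            nxt.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        current = nxt
    return current


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t (zero if |c| <= t)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_smooth(signal, threshold, levels=2):
    """Denoise by soft-thresholding detail coefficients; every data point is kept."""
    approx, details = haar_forward(signal, levels)
    details = [[soft_threshold(c, threshold) for c in det] for det in details]
    return haar_inverse(approx, details)


def pick_peaks(signal, threshold, min_volume, levels=2):
    """Local maxima of the smoothed trace whose volume is at least min_volume."""
    s = wavelet_smooth(signal, threshold, levels)
    peaks = []
    for i in range(1, len(s) - 1):
        if not (s[i] > 0 and s[i] > s[i - 1] and s[i] >= s[i + 1]):
            continue
        lo = i  # walk left while the trace keeps falling and stays positive
        while lo > 0 and 0 < s[lo - 1] < s[lo]:
            lo -= 1
        hi = i  # walk right likewise
        while hi < len(s) - 1 and 0 < s[hi + 1] < s[hi]:
            hi += 1
        if sum(s[lo:hi + 1]) >= min_volume:
            peaks.append(i)
    return peaks
```

A positive `threshold` shrinks low-amplitude detail coefficients before maxima are located. The `min_volume` cutoff then discards maxima, such as single-point spikes, that enclose little intensity even when they are tall. Both values are tuning parameters assumed here.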
Affiliation(s)
- Zhi Liu
- The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen 361000, China
35
Tomlinson JH, Williamson MP. Amide temperature coefficients in the protein G B1 domain. J Biomol NMR 2012; 52:57-64. [PMID: 22076570 DOI: 10.1007/s10858-011-9583-4] [Received: 09/13/2011] [Accepted: 10/06/2011]
Abstract
Temperature coefficients have been measured for backbone amide ¹H and ¹⁵N nuclei in the B1 domain of protein G (GB1), using temperatures in the range 283-313 K, and pH values from 2.0 to 9.0. Many nuclei display pH-dependent coefficients, which were fitted to one or two pKa values. ¹H coefficients showed the expected behaviour, in that hydrogen-bonded amides have less negative values, but for those amides involved in strong hydrogen bonds in regular secondary structure there is a negative correlation between strength of hydrogen bond and size of temperature coefficient. The best correlation to temperature coefficient is with secondary shift, indicative of a very approximately uniform thermal expansion. The largest pH-dependent changes in coefficient are for amides in loops adjacent to sidechain hydrogen bonds rather than the amides involved directly in hydrogen bonds, indicating that the biggest determinant of the temperature coefficient is temperature-dependent loss of structure, not hydrogen bonding. Amide ¹⁵N coefficients have no clear relationship with structure.
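A temperature coefficient is the slope of a nucleus's chemical shift against temperature, conventionally quoted in ppb/K. The minimal Python sketch below assumes the shift varies linearly over the measured range (283-313 K in this study). The function name and inputs are hypothetical, and the pH-dependent pKa fitting described in the abstract is not shown.

```python
def temperature_coefficient(temps_k, shifts_ppm):
    """Least-squares slope of chemical shift versus temperature, returned in ppb/K."""
    n = len(temps_k)
    if n < 2 or n != len(shifts_ppm):
        raise ValueError("need at least two paired (temperature, shift) points")
    mean_t = sum(temps_k) / n
    mean_s = sum(shifts_ppm) / n
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(temps_k, shifts_ppm))
    sxx = sum((t - mean_t) ** 2 for t in temps_k)
    return 1000.0 * sxy / sxx  # slope in ppm/K converted to ppb/K
```

An amide proton whose shift moves from 8.50 ppm at 283 K to 8.35 ppm at 313 K gives -5 ppb/K. The abstract says hydrogen-bonded amides tend toward less negative values. It also cautions that temperature-dependent loss of structure, not hydrogen bonding, is the biggest determinant.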
Affiliation(s)
- Jennifer H Tomlinson
- Department of Molecular Biology and Biotechnology, University of Sheffield, Firth Court, Western Bank, Sheffield, S10 2TN, UK
36
Guerry P, Herrmann T. Comprehensive automation for NMR structure determination of proteins. Methods Mol Biol 2012; 831:429-51. [PMID: 22167686 DOI: 10.1007/978-1-61779-480-3_22]
Abstract
This chapter gives an overview of automated protein structure determination by nuclear magnetic resonance (NMR) with the UNIO protocol that enables high to full automation of all NMR data analysis steps involved. Four established algorithms, namely, the MATCH algorithm for sequence-specific resonance assignment, the ASCAN algorithm for side-chain resonance assignment, the CANDID algorithm for NOE assignment, and the ATNOS algorithm for signal identification in NMR spectra, are assembled into three principal UNIO NMR data analysis components (MATCH, ATNOS/ASCAN, and ATNOS/CANDID) that are accessed thanks to a particularly intuitive and flexible, yet powerful graphical user interface (GUI). UNIO is designed to work independently or in association with other NMR software. The principal data analysis components for sequence-specific backbone, side-chain and NOE assignment may be run separately or out of sequence. User-intervention at individual stages is encouraged and facilitated by graphical tools included for the preparation, analysis, validation, and subsequent presentation of the NMR structure.
Affiliation(s)
- Paul Guerry
- Centre Européen de RMN à très Hauts Champs, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, Université Claude, Villeurbanne, France
37
Zhao L, Liu Z, Cao Z, Liu H, Wang J. Determination of thermal intermediate state ensemble of box 5 with restrained molecular dynamics simulations. Comput Theor Chem 2011. [DOI: 10.1016/j.comptc.2011.10.004]
38
Olsson S, Boomsma W, Frellsen J, Bottaro S, Harder T, Ferkinghoff-Borg J, Hamelryck T. Generative probabilistic models extend the scope of inferential structure determination. J Magn Reson 2011; 213:182-186. [PMID: 21993764 DOI: 10.1016/j.jmr.2011.08.039] [Received: 06/20/2011] [Revised: 08/19/2011] [Accepted: 08/30/2011]
Abstract
Conventional methods for protein structure determination from NMR data rely on the ad hoc combination of physical forcefields and experimental data, along with heuristic determination of free parameters such as weight of experimental data relative to a physical forcefield. Recently, a theoretically rigorous approach was developed which treats structure determination as a problem of Bayesian inference. In this case, the forcefields are brought in as a prior distribution in the form of a Boltzmann factor. Due to high computational cost, the approach has been only sparsely applied in practice. Here, we demonstrate that the use of generative probabilistic models instead of physical forcefields in the Bayesian formalism is not only conceptually attractive, but also improves precision and efficiency. Our results open new vistas for the use of sophisticated probabilistic models of biomolecular structure in structure determination from experimental data.
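The Bayesian formulation referred to in this abstract can be written compactly as below. The notation is assumed here: X is the structure, D the experimental data, I background information, E a force-field energy, and k_B T the thermal energy. Nuisance parameters such as the weight of the data, which the full formalism also infers, are omitted for brevity.

```latex
p(X \mid D, I) \;\propto\; p(D \mid X, I)\, p(X \mid I),
\qquad
p(X \mid I) \;\propto\; \exp\!\left(-\frac{E(X)}{k_{\mathrm{B}} T}\right).
```

In the approach the abstract describes, the Boltzmann prior on the right is replaced by a generative probabilistic model of protein structure, p(X | I) = p_model(X). The likelihood term p(D | X, I) plays the same role as before.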
Affiliation(s)
- Simon Olsson
- Bioinformatics Center, University of Copenhagen, Department of Biology, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
39
Jang R, Gao X, Li M. Towards fully automated structure-based NMR resonance assignment of ¹⁵N-labeled proteins from automatically picked peaks. J Comput Biol 2011; 18:347-63. [PMID: 21385039 DOI: 10.1089/cmb.2010.0251]
Abstract
In NMR resonance assignment, an indispensable step in NMR protein studies, manually processed peaks from both ¹⁵N-labeled and ¹³C-labeled spectra are typically used as inputs. However, the use of homologous structures can allow one to use only ¹⁵N-labeled NMR data and avoid the added expense of using ¹³C-labeled data. We propose a novel integer programming framework for structure-based backbone resonance assignment using ¹⁵N-labeled data. The core consists of a pair of integer programming models: one for spin system forming and amino acid typing, and the other for backbone resonance assignment. The goal is to perform the assignment directly from spectra, without any manual intervention, via automatically picked peaks; these are much noisier than manually picked peaks, so methods must be error-tolerant. In the case of semi-automated/manually processed peak data, we compare our system with the Xiong-Pandurangan-Bailey-Kellogg contact replacement (CR) method, which is the most error-tolerant method for structure-based resonance assignment. Our system, on average, reduces the error rate of the CR method fivefold on their data set. In addition, by using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors due to errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for human ubiquitin, where the typing accuracy is 83%, we achieve 91% accuracy, compared to the 59% accuracy obtained without correcting for such errors. In the case of automatically picked peaks, using assignment information from yeast ubiquitin, we achieve a fully automatic assignment with 97% accuracy. To our knowledge, this is the first system that can achieve fully automatic structure-based assignment directly from spectra. This has implications in NMR protein mutant studies, where the assignment step is repeated for each mutant.
Affiliation(s)
- Richard Jang
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
40
Breukels V, Konijnenberg A, Nabuurs SM, Doreleijers JF, Kovalevskaya NV, Vuister GW. Overview on the use of NMR to examine protein structure. Curr Protoc Protein Sci 2011; Chapter 17:Unit 17.5. [PMID: 21488042 DOI: 10.1002/0471140864.ps1705s64]
Abstract
Any protein structure determination process contains several steps, starting from obtaining a suitable sample, then moving on to acquiring data and spectral assignment, and lastly to the final steps of structure determination and validation. This unit describes all of these steps, starting with the basic physical principles behind NMR and some of the most commonly measured and observed phenomena such as chemical shift, scalar and residual coupling, and the nuclear Overhauser effect. Then, in somewhat more detail, the process of spectral assignment and structure elucidation is explained. Furthermore, the use of NMR to study protein-ligand interaction, protein dynamics, or protein folding is described.
Affiliation(s)
- Vincent Breukels
- Protein Biophysics, Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, The Netherlands
41
Abstract
Around half of all protein structures solved nowadays using solution-state nuclear magnetic resonance (NMR) spectroscopy have been determined with the help of automated data analysis. The pervasiveness of computational approaches in general hides, however, a more nuanced view in which the full variety and richness of the field appears. This review is structured around a comparison of methods associated with three NMR observables: classical nuclear Overhauser effect (NOE) constraint gathering, in contrast with more recent chemical shift and residual dipolar coupling (RDC) based protocols. In each case, the emphasis is placed on the latest research, covering mainly the past 5 years. By describing both general concepts and representative programs, the objective is to map out a field in which, through the very profusion of approaches, it is all too easy to lose one's bearings.
42
Ziarek JJ, Peterson FC, Lytle BL, Volkman BF. Binding site identification and structure determination of protein-ligand complexes by NMR: a semiautomated approach. Methods Enzymol 2011; 493:241-75. [PMID: 21371594 PMCID: PMC3635485 DOI: 10.1016/b978-0-12-381274-2.00010-8]
Abstract
Over the last 15 years, the role of NMR spectroscopy in the lead identification and optimization stages of pharmaceutical drug discovery has steadily increased. NMR occupies a unique niche in the biophysical analysis of drug-like compounds because of its ability to identify binding sites, affinities, and ligand poses at the level of individual amino acids without necessarily solving the structure of the protein-ligand complex. However, it can also provide structures of flexible proteins and low-affinity (Kd > 10⁻⁶ M) complexes, which often fail to crystallize. This chapter emphasizes a throughput-focused protocol that aims to identify practical aspects of binding site characterization, automated and semiautomated NMR assignment methods, and structure determination of protein-ligand complexes by NMR.
Affiliation(s)
- Joshua J. Ziarek
- Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin, 53226 USA
- Francis C. Peterson
- Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin, 53226 USA
- Betsy L. Lytle
- Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin, 53226 USA
- Brian F. Volkman
- Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin, 53226 USA
43
44
Crippen GM, Rousaki A, Revington M, Zhang Y, Zuiderweg ERP. SAGA: rapid automatic mainchain NMR assignment for large proteins. J Biomol NMR 2010; 46:281-298. [PMID: 20232231 DOI: 10.1007/s10858-010-9403-2] [Received: 12/07/2009] [Accepted: 02/23/2010]
Abstract
Here we describe a new algorithm for automatically determining the mainchain sequential assignment of NMR spectra for proteins. Using only the customary triple resonance experiments, assignments can be quickly found for not only small proteins having rather complete data, but also for large proteins, even when only half the residues can be assigned. The result of the calculation is not the single best assignment according to some criterion, but rather a large number of satisfactory assignments that are summarized in such a way as to help the user identify portions of the sequence that are assigned with confidence, vs. other portions where the assignment has some correlated alternatives. Thus very imperfect initial data can be used to suggest future experiments.
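A toy Python sketch can illustrate the idea of reporting many satisfactory assignments, rather than one best, and summarizing where they agree. Everything specific here is an assumption. The cost matrix (rows are spin systems, columns are residues) is hypothetical. "Satisfactory" means within a fixed tolerance of the best total cost. Exhaustive enumeration stands in for SAGA's actual search, so the sketch only works for tiny problems.

```python
import itertools
from collections import Counter


def satisfactory_assignments(cost, tolerance):
    """All one-to-one maps of spin systems (rows) to residues (columns) whose total
    cost is within `tolerance` of the best one (exhaustive: toy sizes only)."""
    n_spins, n_res = len(cost), len(cost[0])
    scored = []
    for residues in itertools.permutations(range(n_res), n_spins):
        total = sum(cost[i][r] for i, r in enumerate(residues))
        scored.append((total, residues))
    best = min(total for total, _ in scored)
    return [res for total, res in scored if total <= best + tolerance]


def per_spin_confidence(assignments):
    """For each spin system: its most frequent residue and the fraction of
    satisfactory assignments that agree on it."""
    summary = []
    for i in range(len(assignments[0])):
        counts = Counter(a[i] for a in assignments)
        residue, hits = counts.most_common(1)[0]
        summary.append((residue, hits / len(assignments)))
    return summary
```

A spin system with confidence 1.0 is placed identically in every satisfactory solution. Lower values flag the correlated alternatives that the abstract suggests can guide further experiments.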
Affiliation(s)
- Gordon M Crippen
- College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA.
45
Stratmann D, Guittet E, van Heijenoort C. Robust structure-based resonance assignment for functional protein studies by NMR. J Biomol NMR 2010; 46:157-73. [PMID: 20024602 PMCID: PMC2813526 DOI: 10.1007/s10858-009-9390-3] [Received: 10/26/2009] [Accepted: 11/04/2009]
Abstract
High-throughput functional protein NMR studies, such as studies of protein interactions or dynamics, require an automated approach for the assignment of the protein backbone. With the availability of a growing number of protein 3D structures, a new class of automated approaches, called structure-based assignment, has been developed quite recently. Structure-based approaches primarily use NMR input data that are not based on J-coupling and for which connections between residues are not limited by the efficiency of through-bond magnetization transfer. We present here a robust structure-based assignment approach using mainly HN-HN NOE networks, as well as ¹H-¹⁵N residual dipolar couplings and chemical shifts. The NOEnet complete search algorithm is robust against assignment errors, even for sparse input data. Instead of a unique and partly erroneous assignment solution, NOEnet gives an optimal assignment ensemble with an accuracy equal or close to 100%. We show that even low-precision assignment ensembles give enough information for functional studies, such as modeling of protein complexes. Finally, the combination of NOEnet with a low number of ambiguous J-coupling sequential connectivities yields a high-precision assignment ensemble. NOEnet will be available under: http://www.icsn.cnrs-gif.fr/download/nmr.
Affiliation(s)
- Dirk Stratmann
- NMR, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Eric Guittet
- Centre de Recherche de Gif, Laboratoire de Chimie et Biologie Structurales ICSN-CNRS, 1, av. de la terrasse, 91190 Gif-sur-Yvette, France
- Carine van Heijenoort
- Centre de Recherche de Gif, Laboratoire de Chimie et Biologie Structurales ICSN-CNRS, 1, av. de la terrasse, 91190 Gif-sur-Yvette, France
46
Abstract
The main drawback of protein NMR spectroscopy today is still the extensive amount of time required for solving a single structure. The main bottleneck in this respect is the manual evaluation of the experimental spectra. A clear solution to this challenge is the development of automated methods for this purpose. At the current stage of development, this goal has been nearly, and in a few cases fully, reached for favorable cases such as well-behaved, stably folding smaller proteins below about 25 kDa. For larger and/or more difficult molecules, the input of a human expert is still required. However, even here, automated routines will substantially speed up the structure determination process. In this report, we will summarize recent developments in this field and especially emphasize practical aspects important for a successful automated protein structure determination in solution. An important aspect closely related to structure determination is structure validation. Therefore, we devote a section to automated approaches for this topic.
Affiliation(s)
- Wolfram Gronwald
- Institute for Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
47
Vila JA, Arnautova YA, Martin OA, Scheraga HA. Quantum-mechanics-derived 13Calpha chemical shift server (CheShift) for protein structure validation. Proc Natl Acad Sci U S A 2009; 106:16972-7. [PMID: 19805131 PMCID: PMC2761357 DOI: 10.1073/pnas.0908833106] [Received: 07/16/2009]
Abstract
A server (CheShift) has been developed to predict ¹³Cα chemical shifts of protein structures. It is based on the generation of 696,916 conformations as a function of the φ, ψ, ω, χ1 and χ2 torsional angles for all 20 naturally occurring amino acids. Their ¹³Cα chemical shifts were computed at the DFT level of theory with a small basis set and extrapolated, with an empirically-determined linear regression formula, to reproduce the values obtained with a larger basis set. Analysis of the accuracy and sensitivity of the CheShift predictions, in terms of both the correlation coefficient R and the conformational-averaged rmsd between the observed and predicted ¹³Cα chemical shifts, was carried out for 3 sets of conformations: (i) 36 X-ray-derived protein structures solved at 2.3 Å or better resolution, for which sets of ¹³Cα chemical shifts were available; (ii) 15 pairs of X-ray and NMR-derived sets of protein conformations; and (iii) a set of decoys for 3 proteins showing an rmsd with respect to the X-ray structure from which they were derived of up to 3 Å. Comparative analysis carried out with 4 popular servers, namely SHIFTS, SHIFTX, SPARTA, and PROSHIFT, for these 3 sets of conformations demonstrated that CheShift is the most sensitive server with which to detect subtle differences between protein models and, hence, to validate protein structures determined by either X-ray or NMR methods, if the observed ¹³Cα chemical shifts are available. CheShift is available as a web server.
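Two calculations described in this abstract are easy to sketch in Python. The first is the empirical linear regression that maps small-basis-set DFT shifts onto large-basis-set values. The second is the pair of agreement statistics, the correlation coefficient R and the RMSD, between observed and predicted ¹³Cα shifts. The function names are hypothetical. The conformational averaging that CheShift applies to the RMSD is not reproduced: this computes the statistics for a single set of predictions.

```python
import math


def fit_linear(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(observed, predicted):
    """Pearson correlation coefficient R between paired shifts."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


def rmsd(observed, predicted):
    """Root-mean-square deviation between paired shifts (ppm)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

For calibration, one would call `fit_linear(small_basis, large_basis)` on conformations computed at both levels of theory. The resulting slope and intercept would then be applied to the remaining small-basis values.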
Affiliation(s)
- Jorge A. Vila
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301
- Universidad Nacional de San Luis, Instituto de Matemática Aplicada de San Luis-Consejo Nacional de Investigaciones Científicas y Técnicas, Ejército de Los Andes 950-5700 San Luis, Argentina
- Yelena A. Arnautova
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301
- Osvaldo A. Martin
- Universidad Nacional de San Luis, Instituto de Matemática Aplicada de San Luis-Consejo Nacional de Investigaciones Científicas y Técnicas, Ejército de Los Andes 950-5700 San Luis, Argentina
- Harold A. Scheraga
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301
48
Ikeya T, Takeda M, Yoshida H, Terauchi T, Jee JG, Kainosho M, Güntert P. Automated NMR structure determination of stereo-array isotope labeled ubiquitin from minimal sets of spectra using the SAIL-FLYA system. J Biomol NMR 2009; 44:261-72. [PMID: 19597942 DOI: 10.1007/s10858-009-9339-6] [Received: 03/30/2009] [Accepted: 06/24/2009]
Abstract
Stereo-array isotope labeling (SAIL) has been combined with the fully automated NMR structure determination algorithm FLYA to determine the three-dimensional structure of the protein ubiquitin from different sets of input NMR spectra. SAIL provides a complete stereo- and regio-specific pattern of stable isotopes that results in sharper resonance lines and reduced signal overlap, without information loss. Here we show that as a result of the superior quality of the SAIL NMR spectra, reliable, fully automated analyses of the NMR spectra and structure calculations are possible using fewer input spectra than with conventional uniformly ¹³C/¹⁵N-labeled proteins. FLYA calculations with SAIL ubiquitin, using a single three-dimensional "through-bond" spectrum (and 2D HSQC spectra) in addition to the ¹³C-edited and ¹⁵N-edited NOESY spectra for conformational restraints, yielded structures with an accuracy of 0.83-1.15 Å for the backbone RMSD to the conventionally determined solution structure of SAIL ubiquitin. NMR structures can thus be determined almost exclusively from the NOESY spectra that yield the conformational restraints, without the need to record many spectra only for determining intermediate, auxiliary data of the chemical shift assignments. The FLYA calculations for this report resulted in 252 ubiquitin structure bundles, obtained with different input data but identical structure calculation and refinement methods. These structures cover the entire range from highly accurate structures to seriously, but not trivially, wrong structures, and thus constitute a valuable database for the substantiation of structure validation methods.
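Backbone RMSD values like those quoted in this abstract are normally computed after optimal rigid-body superposition of the two structures. Below is a minimal NumPy sketch of that step using the Kabsch algorithm. It assumes the two coordinate arrays list the same atoms (for example backbone N, Cα and C′) in the same order. Atom selection and the handling of whole structure bundles are left out.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """RMSD between two equally ordered N x 3 coordinate sets after optimal
    rigid-body superposition (Kabsch algorithm)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    a = a - a.mean(axis=0)                   # remove translation
    b = b - b.mean(axis=0)
    h = a.T @ b                              # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rot.T - b                     # rotate a onto b and compare
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

A rotated and translated copy of a structure should give an RMSD of essentially zero, whereas genuine conformational differences survive the superposition.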
Affiliation(s)
- Teppei Ikeya
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany
49
Abstract
MOTIVATION Picking peaks from experimental NMR spectra is a key unsolved problem for automated NMR protein structure determination. Such a process is a prerequisite for resonance assignment, nuclear Overhauser enhancement (NOE) distance restraint assignment, and structure calculation tasks. Manual or semi-automatic peak picking, which is currently the predominant approach in NMR labs, is tedious, time-consuming and costly. RESULTS We introduce new ideas, including noise-level estimation, component forming and sub-division, singular value decomposition (SVD)-based peak picking, and peak pruning and refinement. PICKY is developed as an automated peak picking method. Unlike previous research on peak picking, we provide a systematic study of the proposed method. PICKY is tested on 32 real 2D and 3D spectra of eight target proteins, and achieves an average of 88% recall and 74% precision. PICKY is efficient: it takes on average 15.7 s to process an NMR spectrum. More important than these numbers, PICKY actually works in practice. We feed peak lists generated by PICKY to IPASS for resonance assignment, IPASS assignments to SPARTA for fragment generation, and SPARTA fragments to FALCON for structure calculation. This results in high-resolution structures of several proteins, for example, TM1112, at 1.25 Å. AVAILABILITY PICKY is available upon request. The peak lists of PICKY can be easily loaded by SPARKY to enable a better interactive strategy for rapid peak picking.
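Recall and precision figures like those reported in this abstract depend on how picked peaks are matched to a reference peak list. The Python sketch below assumes a simple, hypothetical criterion rather than PICKY's published protocol:

- A picked peak matches a reference peak if every coordinate lies within a per-dimension tolerance.
- Each picked peak is used at most once.
- Ties are resolved by taking the nearest candidate.

```python
def count_matches(picked, reference, tol):
    """Greedy one-to-one matching of picked peaks to reference peaks within tolerance."""
    used = set()
    matches = 0
    for ref in reference:
        best, best_d = None, None
        for j, peak in enumerate(picked):
            if j in used:
                continue
            if all(abs(p - r) <= t for p, r, t in zip(peak, ref, tol)):
                d = sum(((p - r) / t) ** 2 for p, r, t in zip(peak, ref, tol))
                if best_d is None or d < best_d:
                    best, best_d = j, d
        if best is not None:
            used.add(best)
            matches += 1
    return matches


def recall_precision(picked, reference, tol):
    """Recall = matched / reference peaks; precision = matched / picked peaks."""
    tp = count_matches(picked, reference, tol)
    recall = tp / len(reference) if reference else 0.0
    precision = tp / len(picked) if picked else 0.0
    return recall, precision
```

Greedy matching is not guaranteed to find the maximum one-to-one matching when tolerance windows overlap. A bipartite matching algorithm would be the stricter choice for crowded spectra.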
Affiliation(s)
- Babak Alipanahi
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada