1
Nikita S, Bhattacharya S, Manocha K, Rathore AS. Deep learning framework for peak detection at the intact level of therapeutic proteins. J Sep Sci 2024; 47:e2400051. [PMID: 38819868; DOI: 10.1002/jssc.202400051] [Received: 01/19/2024] [Revised: 05/14/2024] [Accepted: 05/21/2024]
Abstract
Although commercially available software offers automated peak detection, achieving optimal true positive rates frequently requires visual inspection and manual adjustment. In the initial phase of this study, hetero-variants (glycoforms) of a monoclonal antibody were distinguished using liquid chromatography-mass spectrometry, revealing discernible peaks at the intact level. In the subsequent phase, a deep learning approach using convolutional neural networks (CNNs) was employed to comprehensively identify each peak (hetero-variant) in the intact-level analysis. In the current case study, conventional peak identification software detected five peaks using a 0.5 threshold, whereas the CNN model identified seven. On data acquired in real time during experimentation, the model performed strongly, with a probability area under the curve (AUC) of 0.9949, surpassing partial least squares discriminant analysis (PLS-DA; probability AUC 0.8041) and locally weighted regression (LWR; probability AUC 0.6885). The AUC of the receiver operating characteristic curve also illustrated the superior performance of the CNN over PLS-DA and LWR.
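The ROC AUC values above summarize how well per-position peak probabilities separate true peaks from non-peaks. The sketch below computes ROC AUC directly from such scores using the rank-based (Mann-Whitney) formulation, in which AUC is the probability that a randomly chosen true peak scores higher than a randomly chosen non-peak. It is a minimal illustration only. The function name and toy data are hypothetical, and the paper's separate "probability AUC" metric is not reproduced here.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    labels: 1 for a true peak position, 0 otherwise; scores: predicted peak probabilities.
    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: one true peak (0.7) scores below one non-peak (0.8).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is O(P·N), which is fine for illustration. For large datasets, a sort-based implementation such as scikit-learn's `roc_auc_score` computes the same quantity.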
Affiliation(s)
- Saxena Nikita
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, India
- Kriti Manocha
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, India
- Anurag S Rathore
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, India
2
Klukowski P, Damberger FF, Allain FHT, Iwai H, Kadavath H, Ramelot TA, Montelione GT, Riek R, Güntert P. The 100-protein NMR spectra dataset: A resource for biomolecular NMR data analysis. Sci Data 2024; 11:30. [PMID: 38177162; PMCID: PMC10767026; DOI: 10.1038/s41597-023-02879-5] [Received: 10/31/2023] [Accepted: 12/22/2023]
Abstract
Multidimensional NMR spectra are the basis for studying proteins by NMR spectroscopy and are crucial for the development and evaluation of methods for biomolecular NMR data analysis. Nevertheless, in contrast to derived data such as chemical shift assignments in the BMRB and protein structures in the PDB, these primary data are generally not publicly archived. To change this unsatisfactory situation, we present a standardized set of solution NMR data comprising 1329 two- to four-dimensional NMR spectra with associated reference annotations (chemical shift assignments, structures) and derived annotations (peak lists, restraints for structure calculation, etc.). With the 100-protein NMR spectra dataset, which was originally compiled for the development of the ARTINA deep learning-based spectra analysis method, 100 protein structures can be reproduced from their original experimental data. The 100-protein NMR spectra dataset is expected to aid the development of computational methods for NMR spectroscopy, in particular machine learning approaches, and to enable consistent and objective comparisons of these methods.
Affiliation(s)
- Piotr Klukowski
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Fred F Damberger
- Institute of Biochemistry, ETH Zurich, 8093, Zurich, Switzerland
- Hideo Iwai
- Institute of Biotechnology, University of Helsinki, 00100, Helsinki, Finland
- Theresa A Ramelot
- Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Gaetano T Montelione
- Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Roland Riek
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Peter Güntert
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Institute of Biophysical Chemistry, Goethe University, 60438, Frankfurt am Main, Germany.
- Department of Chemistry, Tokyo Metropolitan University, Hachioji, 192-0397, Tokyo, Japan.
3
Chai X, Liu C, Fan X, Huang T, Zhang X, Jiang B, Liu M. Combination of peak-picking and binning for NMR-based untargeted metabonomics study. J Magn Reson 2023; 351:107429. [PMID: 37099854; DOI: 10.1016/j.jmr.2023.107429] [Received: 11/30/2022] [Revised: 03/27/2023] [Accepted: 03/29/2023]
Abstract
In NMR-based untargeted metabolomic studies, 1H NMR spectra are usually divided into equal bins (buckets) to diminish the effects of peak shifts caused by sample status or instrument instability, and to reduce the number of variables used as input for multivariate statistical analysis. It has been noted that peaks near bin boundaries may cause significant changes in the integral values of adjacent bins, and that a weak peak may be obscured if it is allocated to the same bin as an intense peak. Several efforts have been made to improve the performance of binning. Here we propose an alternative method, named P-Bin, that combines the classic peak-picking and binning procedures. The location of each peak defined by peak-picking is used as the center of an individual bin. P-Bin is expected to retain all spectral information associated with the peaks and to significantly reduce the data size, because spectral regions without peaks are not considered. In addition, both peak-picking and binning are routine procedures, which makes P-Bin easy to implement. To verify its performance, two sets of experimental data from human plasma and Ganoderma lucidum (G. lucidum) extracts were processed with the conventional binning method and with the proposed method before principal component analysis (PCA) and orthogonal projection to latent structures discriminant analysis (OPLS-DA). The results indicate that the proposed method improved both the clustering performance of PCA score plots and the interpretability of OPLS-DA loading plots, and that P-Bin could be an improved approach to data preparation for metabonomic studies.
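The P-Bin idea described above uses one bin per picked peak, centered on the peak position. It can be contrasted with equal-width binning in plain Python. This is a minimal sketch under two assumptions: a simple local-maximum peak picker with an intensity threshold, and a fixed, hypothetical bin half-width. The abstract does not give the paper's actual peak-picking settings or bin widths, and the function names are illustrative.

```python
def pick_peaks(intensities, threshold):
    """Indices of local maxima whose intensity exceeds `threshold` (hypothetical picker)."""
    return [
        i
        for i in range(1, len(intensities) - 1)
        if intensities[i] > threshold
        and intensities[i] > intensities[i - 1]
        and intensities[i] >= intensities[i + 1]
    ]


def equal_bins(intensities, bin_width):
    """Conventional binning: sum intensities in consecutive fixed-width buckets."""
    return [sum(intensities[i:i + bin_width]) for i in range(0, len(intensities), bin_width)]


def peak_centered_bins(intensities, threshold, half_width):
    """P-Bin-style features (sketch): one bin per picked peak, centered on its maximum.

    Returns {peak_index: integral over [peak - half_width, peak + half_width]}.
    Regions without picked peaks produce no features at all.
    """
    features = {}
    for i in pick_peaks(intensities, threshold):
        lo = max(0, i - half_width)
        hi = min(len(intensities), i + half_width + 1)
        features[i] = sum(intensities[lo:hi])
    return features


spectrum = [0, 1, 5, 1, 0, 0, 0, 2, 8, 2, 0, 0]  # hypothetical toy intensities
print(equal_bins(spectrum, 4))             # [7, 2, 10]
print(peak_centered_bins(spectrum, 3, 1))  # {2: 7, 8: 12}
```

In the toy spectrum, the peak centered at index 8 straddles the boundary between the second and third equal-width bins, so its integral is split (2 + 10). The peak-centered bin keeps it whole (12), and the empty region between the peaks produces no feature.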
Affiliation(s)
- Xin Chai
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Caixiang Liu
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xinyu Fan
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; State Key Laboratory of Component-based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
- Tao Huang
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xu Zhang
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China; Optics Valley Laboratory, Wuhan 430074, China
- Bin Jiang
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China; Optics Valley Laboratory, Wuhan 430074, China.
- Maili Liu
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China; Optics Valley Laboratory, Wuhan 430074, China.
4
Schmid N, Bruderer S, Paruzzo F, Fischetti G, Toscano G, Graf D, Fey M, Henrici A, Ziebart V, Heitmann B, Grabner H, Wegner JD, Sigel RKO, Wilhelm D. Deconvolution of 1D NMR spectra: A deep learning-based approach. J Magn Reson 2023; 347:107357. [PMID: 36563418; DOI: 10.1016/j.jmr.2022.107357] [Received: 10/21/2022] [Revised: 12/01/2022] [Accepted: 12/04/2022]
Abstract
The analysis of nuclear magnetic resonance (NMR) spectra to detect peaks and characterize their parameters, often referred to as deconvolution, is a crucial step in the quantification, elucidation, and verification of the structure of molecular systems. However, deconvolution of 1D NMR spectra is challenging for both experts and machines. We propose a robust deep learning-based deconvolution algorithm of expert-level quality for 1D experimental NMR spectra. The algorithm is based on a neural network trained on synthetic spectra. Our customized pre-processing and labeling of the synthetic spectra enable the estimation of critical peak parameters. Furthermore, the neural network model transfers well to experimental spectra and achieves low fitting errors and sparse peak lists in challenging scenarios such as crowded regions, high dynamic range, shoulder peaks, and broad peaks. On challenging spectra, we demonstrate that the proposed algorithm outperforms expert analysis.
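The abstract says only that the network is trained on synthetic spectra with customized labeling. The sketch below shows one way such labeled training data could be generated. It assumes Lorentzian line shapes, uniformly sampled peak parameters and Gaussian noise, and all names and parameter ranges are hypothetical rather than the authors' recipe.

```python
import random


def lorentzian(x, center, width, amplitude):
    """Lorentzian line of height `amplitude` at `center`; `width` is the full width at half maximum."""
    half = width / 2.0
    return amplitude * half ** 2 / ((x - center) ** 2 + half ** 2)


def synthetic_spectrum(n_points, n_peaks, noise_sd=0.01, seed=None):
    """Hypothetical generator: random Lorentzians plus Gaussian noise, with ground-truth labels.

    Parameter ranges below are illustrative assumptions, not values from the paper.
    Returns (spectrum, peaks) where peaks lists each peak's center, width, and amplitude.
    """
    rng = random.Random(seed)
    peaks = [
        {
            "center": rng.uniform(0.0, n_points - 1),
            "width": rng.uniform(1.0, 8.0),
            "amplitude": rng.uniform(0.1, 1.0),
        }
        for _ in range(n_peaks)
    ]
    spectrum = [
        sum(lorentzian(x, p["center"], p["width"], p["amplitude"]) for p in peaks)
        + rng.gauss(0.0, noise_sd)
        for x in range(n_points)
    ]
    return spectrum, peaks


spectrum, truth = synthetic_spectrum(n_points=512, n_peaks=5, seed=0)
print(len(spectrum), len(truth))  # 512 5
```

The generator returns the exact peak parameters alongside each spectrum, so every synthetic example comes with its labels at no extra cost. Whether a model trained this way transfers to experimental spectra is the empirical question the paper addresses.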
Affiliation(s)
- N Schmid
- Zurich University of Applied Sciences (ZHAW), Switzerland; University of Zurich (UZH), Switzerland.
- D Graf
- Bruker Switzerland AG, Switzerland
- M Fey
- Bruker Switzerland AG, Switzerland
- A Henrici
- Zurich University of Applied Sciences (ZHAW), Switzerland
- V Ziebart
- Zurich University of Applied Sciences (ZHAW), Switzerland
- H Grabner
- Zurich University of Applied Sciences (ZHAW), Switzerland
- D Wilhelm
- Zurich University of Applied Sciences (ZHAW), Switzerland
5
Li DW, Hansen AL, Bruschweiler-Li L, Yuan C, Brüschweiler R. Fundamental and practical aspects of machine learning for the peak picking of biomolecular NMR spectra. J Biomol NMR 2022; 76:49-57. [PMID: 35389128; PMCID: PMC9246764; DOI: 10.1007/s10858-022-00393-1] [Received: 12/14/2021] [Accepted: 02/28/2022]
Abstract
Rapid progress in machine learning offers new opportunities for the automated analysis of multidimensional NMR spectra, ranging from protein NMR to metabolomics applications. Most recently, deep neural networks (DNNs) designed for spectral peak picking have been shown to deconvolute highly crowded NMR spectra with a proficiency rivaling that of human experts. Superior DNN-based peak picking is one of a series of critical steps during NMR spectral processing, analysis, and interpretation where machine learning is expected to have a major impact. In this perspective, we lay out some of the unique strengths as well as challenges of machine learning approaches in this new era of automated NMR spectral analysis. Such a discussion seems timely and should help the NMR community define common goals, share software tools, standardize protocols, and calibrate expectations. It will also help prepare for an NMR future where machine learning and artificial intelligence tools will be commonplace.
Affiliation(s)
- Da-Wei Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA.
- Alexandar L Hansen
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA
- Lei Bruschweiler-Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA
- Chunhua Yuan
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA
- Rafael Brüschweiler
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA.
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH, 43210, USA.
- Department of Biological Chemistry and Pharmacology, The Ohio State University, Columbus, OH, 43210, USA.
6
Laveglia V, Giachetti A, Cerofolini L, Haubrich K, Fragai M, Ciulli A, Rosato A. Automated Determination of Nuclear Magnetic Resonance Chemical Shift Perturbations in Ligand Screening Experiments: The PICASSO Web Server. J Chem Inf Model 2021; 61:5726-5733. [PMID: 34843238; PMCID: PMC8715503; DOI: 10.1021/acs.jcim.1c00871] [Received: 07/20/2021]
Abstract
Nuclear magnetic resonance (NMR) is an effective, commonly used experimental approach to screen small organic molecules against a protein target. A very popular method consists of monitoring changes in the NMR chemical shifts of the protein nuclei upon addition of the small molecule to the free protein. Multidimensional NMR experiments allow the interacting residues to be mapped along the protein sequence. A significant amount of human effort goes into manually tracking the chemical shift variations, especially when many signals exhibit chemical shift changes and when many ligands are tested. Some computational approaches to automate the procedure exist, but none is available as a web server. Furthermore, some methods require a fairly specific experimental setup, such as recording a series of spectra at increasing small molecule:protein ratios. In this work, we developed a tool that requires only a minimal amount of experimental data from the user, implemented it as an open-source program, and made it available as a web application. Our tool compares two spectra, one of the free protein and one of the small molecule:protein mixture, based on the corresponding peak lists. The performance of the tool in correctly identifying the protein-binding regions was evaluated on different protein targets, using experimental data from interaction studies already available in the literature. For a total of 16 systems, our tool achieved between 79% and 100% correct assignments, properly identifying the protein regions involved in the interaction.
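Comparing a free-protein peak list with a mixture peak list, as described above, can be sketched as follows. The sketch rests on three assumptions: the commonly used weighted combination of 1H and 15N shift changes (with α = 0.14, a widely used but assumed weight), a simple greedy smallest-perturbation-first matching, and a hypothetical cutoff. It is not PICASSO's actual algorithm, and the residue labels and shift values are invented for illustration.

```python
import math


def csp(h_free, n_free, h_bound, n_bound, alpha=0.14):
    """Combined 1H/15N chemical shift perturbation in ppm, with 15N scaled by `alpha`."""
    return math.sqrt((h_bound - h_free) ** 2 + (alpha * (n_bound - n_free)) ** 2)


def greedy_match(free_peaks, bound_peaks, max_csp=0.5, alpha=0.14):
    """Pair each free-protein peak with at most one mixture peak, smallest CSP first.

    free_peaks: {residue_label: (h_ppm, n_ppm)}; bound_peaks: list of (h_ppm, n_ppm).
    Pairs with a CSP above the hypothetical cutoff `max_csp` are never matched.
    Returns {residue_label: (bound_index, csp)}.
    """
    candidates = sorted(
        (csp(h0, n0, h1, n1, alpha), label, j)
        for label, (h0, n0) in free_peaks.items()
        for j, (h1, n1) in enumerate(bound_peaks)
    )
    matched, used = {}, set()
    for d, label, j in candidates:
        if d > max_csp:
            break  # candidates are sorted, so every remaining pair is farther away
        if label not in matched and j not in used:
            matched[label] = (j, d)
            used.add(j)
    return matched


free = {"G10": (8.20, 110.0), "A11": (7.90, 122.0)}  # hypothetical free-protein peaks
mixture = [(7.92, 122.1), (8.30, 110.5)]             # hypothetical mixture peaks
print(greedy_match(free, mixture))
```

A practical tool must also handle peaks that vanish or broaden beyond detection, as well as ambiguous crowded regions. A globally optimal assignment, for example with the Hungarian algorithm, is a common alternative to this greedy pass.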
Affiliation(s)
- Vincenzo Laveglia
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Linda Cerofolini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Kevin Haubrich
- School of Life Sciences, Division of Biological Chemistry and Drug Discovery, The University of Dundee, James Black Centre, Dow Street, DD1 5EH, Dundee, United Kingdom
- Marco Fragai
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Alessio Ciulli
- School of Life Sciences, Division of Biological Chemistry and Drug Discovery, The University of Dundee, James Black Centre, Dow Street, DD1 5EH, Dundee, United Kingdom
- Antonio Rosato
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
7
Li DW, Hansen AL, Yuan C, Bruschweiler-Li L, Brüschweiler R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat Commun 2021; 12:5229. [PMID: 34471142; PMCID: PMC8410766; DOI: 10.1038/s41467-021-25496-5] [Received: 02/13/2021] [Accepted: 08/09/2021]
Abstract
The analysis of nuclear magnetic resonance (NMR) spectra for the comprehensive and unambiguous identification and characterization of peaks is a difficult, but critically important step in all NMR analyses of complex biological molecular systems. Here, we introduce DEEP Picker, a deep neural network (DNN)-based approach for peak picking and spectral deconvolution which semi-automates the analysis of two-dimensional NMR spectra. DEEP Picker includes 8 hidden convolutional layers and was trained on a large number of synthetic spectra of known composition with variable degrees of crowdedness. We show that our method is able to correctly identify overlapping peaks, including ones that are challenging for expert spectroscopists and existing computational methods alike. We demonstrate the utility of DEEP Picker on NMR spectra of folded and intrinsically disordered proteins as well as a complex metabolomics mixture, and show how it provides access to valuable NMR information. DEEP Picker should facilitate the semi-automation and standardization of protocols for better consistency and sharing of results within the scientific community.
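DEEP Picker targets overlapping peaks that defeat conventional pickers. The sketch below illustrates that failure mode, not DEEP Picker itself, by running a naive 2D local-maximum picker on simulated planes. Every detail is an illustrative assumption: the product-of-Lorentzians line shape, the 4-point line width, the 8-neighbour rule and the threshold.

```python
def lorentz2d(x, y, cx, cy, wx, wy):
    """Unit-height 2D line shape: product of two 1D Lorentzians with FWHM wx and wy."""
    return 1.0 / (1.0 + (2.0 * (x - cx) / wx) ** 2) / (1.0 + (2.0 * (y - cy) / wy) ** 2)


def make_plane(size, peaks, wx=4.0, wy=4.0):
    """Noise-free size x size plane; plane[y][x] sums the contributions of all (cx, cy) peaks."""
    return [
        [sum(lorentz2d(x, y, cx, cy, wx, wy) for cx, cy in peaks) for x in range(size)]
        for y in range(size)
    ]


def local_maxima(plane, threshold):
    """Naive picker: interior points above `threshold` that exceed all 8 neighbours."""
    found = []
    for y in range(1, len(plane) - 1):
        for x in range(1, len(plane[0]) - 1):
            v = plane[y][x]
            if v <= threshold:
                continue
            neighbours = (
                plane[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            if all(v > n for n in neighbours):
                found.append((x, y))
    return found


separated = make_plane(30, [(8, 8), (22, 22)])
overlapped = make_plane(30, [(14, 15), (16, 15)])
print(local_maxima(separated, 0.1))   # [(8, 8), (22, 22)]
print(local_maxima(overlapped, 0.1))  # [(15, 15)]: two peaks reported as one
```

Two peaks 14 points apart on each axis are both found. Two peaks only 2 points apart merge into a single maximum at their midpoint, so the naive picker reports one peak where there are two. The threshold also matters: with this line shape, weak spurious maxima (about 0.04 of the peak height here) appear where the ridges of the two well-separated peaks cross.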
Affiliation(s)
- Da-Wei Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA.
- Alexandar L Hansen
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA
- Chunhua Yuan
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA
- Lei Bruschweiler-Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA
- Rafael Brüschweiler
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA.
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH, USA.
- Department of Biological Chemistry and Pharmacology, The Ohio State University, Columbus, OH, USA.
8
Vincenzi M, Mercurio FA, Leone M. NMR Spectroscopy in the Conformational Analysis of Peptides: An Overview. Curr Med Chem 2021; 28:2729-2782. [PMID: 32614739; DOI: 10.2174/0929867327666200702131032] [Received: 04/09/2020] [Revised: 05/21/2020] [Accepted: 05/28/2020]
Abstract
BACKGROUND NMR spectroscopy is one of the most powerful tools to study the structure and interaction properties of peptides and proteins from a dynamic perspective. Knowing the bioactive conformations of peptides is crucial in drug discovery for designing more efficient analogue ligands and inhibitors of protein-protein interactions that target therapeutically relevant systems. OBJECTIVE This review provides a toolkit to investigate peptide conformational properties by NMR. METHODS Articles cited herein, related to NMR studies of peptides and proteins, were mainly found through PubMed and web searches. Both recent and older books on NMR spectroscopy written by eminent scientists in the field were consulted as well. RESULTS The review focuses mainly on NMR tools for determining the 3D structure of small unlabeled peptides. It is application-oriented, as delivering a profound theoretical background is beyond its goal. However, the basic principles of 2D homonuclear and heteronuclear experiments are briefly described. Protocols to obtain isotopically labeled peptides and the principal triple resonance experiments needed to study them are discussed as well. CONCLUSION NMR is a leading technique in the study of the conformational preferences of small flexible peptides, whose structure can often only be described by an ensemble of conformations. Although NMR studies of peptides can be performed easily and quickly with canonical protocols established a few decades ago, more recently we have witnessed tremendous improvements in NMR spectroscopy for investigating large systems and overcoming its molecular weight limit.
Affiliation(s)
- Marian Vincenzi
- Institute of Biostructures and Bioimaging, National Research Council of Italy, Via Mezzocannone 16, 80134, Naples, Italy
- Flavia Anna Mercurio
- Institute of Biostructures and Bioimaging, National Research Council of Italy, Via Mezzocannone 16, 80134, Naples, Italy
- Marilisa Leone
- Institute of Biostructures and Bioimaging, National Research Council of Italy, Via Mezzocannone 16, 80134, Naples, Italy
9
Chen D, Wang Z, Guo D, Orekhov V, Qu X. Review and Prospect: Deep Learning in Nuclear Magnetic Resonance Spectroscopy. Chemistry 2020; 26:10391-10401. [PMID: 32251549; DOI: 10.1002/chem.202000246] [Received: 01/15/2020] [Revised: 04/03/2020]
Abstract
Since the concept of deep learning (DL) was formally proposed in 2006, it has had a major impact on academic research and industry. Today, DL provides an unprecedented way to analyze and process data and has delivered strong results in computer vision, medical imaging, natural language processing, and other fields. Herein, applications of DL in NMR spectroscopy are summarized, and a perspective is outlined for DL as an entirely new approach that is likely to transform NMR spectroscopy into a much more efficient and powerful technique in chemistry and the life sciences.
Affiliation(s)
- Dicheng Chen
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China
- Zi Wang
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China
- Di Guo
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, P.R. China
- Vladislav Orekhov
- Department of Chemistry and Molecular Biology, University of Gothenburg, Box 465, Gothenburg, 40530, Sweden
- Xiaobo Qu
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China
10
Sheberstov KF, Sistaré Guardiola E, Pupier M, Jeannerat D. SAN plot: A graphical representation of the signal, noise, and artifacts content of spectra. Magn Reson Chem 2020; 58:466-472. [PMID: 31058352; DOI: 10.1002/mrc.4882] [Received: 01/21/2019] [Revised: 04/20/2019] [Accepted: 04/30/2019]
Abstract
The signal-to-noise ratio is an important property of NMR spectra. It allows one to compare the sensitivity of experiments, the performance of hardware, and so on. It is usually measured in a rudimentary manner: one manually selects a region of the spectrum containing signal and a separate region containing noise, applies some operation, and obtains the signal-to-noise ratio. We introduce here a simple method based on the analysis of the distribution of point intensities in one- and two-dimensional spectra. Signal/artifact/noise plots (SAN plots) present qualitative and quantitative information about spectra in graphical form. We show that, besides measuring signal and noise levels, SAN plots are also quite useful to visualize and compare artifacts within a series of spectra. Some basic properties of SAN plots are illustrated with simple applications.
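The SAN approach is built from the distribution of point intensities rather than from hand-picked regions. The sketch below prepares data for a SAN-style view by sorting positive and negative point intensities by magnitude. It pairs this with a noise estimate based on the median absolute deviation (MAD), a standard robust estimator that is swapped in here and is not the paper's method. How the actual SAN plot is constructed and scaled is an assumption in this sketch.

```python
import random
import statistics


def san_curves(intensities):
    """Positive and negative point intensities, each sorted by decreasing magnitude.

    One way to view them (assumed here, not taken from the paper) is log(magnitude)
    versus log(rank): for symmetric noise the two curves nearly coincide, while positive
    signals in a phased spectrum lift the positive curve above the negative one at low rank.
    """
    pos = sorted((v for v in intensities if v > 0), reverse=True)
    neg = sorted((-v for v in intensities if v < 0), reverse=True)
    return pos, neg


def robust_noise_sd(intensities):
    """Noise standard deviation from the MAD, scaled by 1.4826 for Gaussian noise."""
    med = statistics.median(intensities)
    mad = statistics.median(abs(v - med) for v in intensities)
    return 1.4826 * mad


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical unit-variance noise
spectrum = noise + [50.0, 40.0, 30.0]               # plus three hypothetical signal points
pos, neg = san_curves(spectrum)
print(pos[:3])  # [50.0, 40.0, 30.0]
print(robust_noise_sd(spectrum))
```

Because the MAD is insensitive to the few large signal points, the noise estimate stays close to the true noise level without any region having to be selected by hand. That is the practical advantage of working from the intensity distribution.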
Affiliation(s)
- Marion Pupier
- Department of Organic Chemistry, University of Geneva, Geneva, Switzerland
- Damien Jeannerat
- Department of Organic Chemistry, University of Geneva, Geneva, Switzerland
11
Computational methods for NMR and MS for structure elucidation III: More advanced approaches. Phys Sci Rev 2019. [DOI: 10.1515/psr-2018-0109]
Abstract
The structural assignment of natural products, even with the very sophisticated one-dimensional and two-dimensional (1D and 2D) spectroscopic methods available today, is still a tedious and time-consuming task. Mass spectrometry (MS) is generally used for molecular mass determination, molecular formula generation, and the analysis of MS/MSn fragmentation patterns of molecules. Meanwhile, nuclear magnetic resonance (NMR) spectroscopy provides spectra (e.g., 1H, 13C, and correlation spectra) whose interpretation allows the structure determination of known or unknown compounds. With the advance of high-throughput studies such as metabolomics, fast and automated identification or annotation of natural products has come into high demand. A growing number of tools that meet this demand apply computational methods for structure elucidation. These methods act on characteristic parameters in the structural determination of small molecules. Here we enumerate and present existing, well-established computational methods for peak picking analysis, resonance assignment, nuclear Overhauser effect (NOE) assignment, combinatorial fragmentation, and structure calculation and prediction. Fully automated programs for structure determination are also mentioned, together with the integrated algorithms they use to elucidate the structure of a metabolite. The use of these automated tools has significantly reduced errors introduced by manual processing and has hence accelerated the structure identification or annotation of compounds.
12
Alazmi M, Abbas A, Guo X, Fan M, Li L, Gao X. A Slice-based 13C-detected NMR Spin System Forming and Resonance Assignment Method. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1999-2008. [PMID: 29994483; DOI: 10.1109/tcbb.2018.2849728]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is attracting more attention in the field of computational structural biology. Until recently, 1H-detected experiments were the dominant NMR technique because of the high sensitivity of 1H nuclei. However, the current availability of high magnetic fields and cryogenically cooled probe heads allows researchers to overcome the low sensitivity of 13C nuclei. Consequently, 13C-detected experiments have become a popular technique in different NMR applications, especially resonance assignment and structure determination of large proteins. In this paper, we propose the first spin system forming method for 13C-detected NMR spectra. Our method accurately forms spin systems from as few as two 13C-detected spectra, CBCACON and CBCANCO. It picks slices from the more trusted spectrum and uses them as feedback to direct slice picking in the less trusted one. This feedback leads to picking accurate slices, which in turn helps to form better spin systems. We tested our method on a real ubiquitin dataset and on a benchmark simulated dataset consisting of 12 proteins. We fed our spin systems as inputs to a genetic algorithm to generate the chemical shift assignment and obtained 92 percent correct chemical shift assignment for ubiquitin. For the simulated dataset, we obtained an average recall of 86 percent and an average precision of 88 percent. Finally, our chemical shift assignment of ubiquitin was given as input to the CS-ROSETTA server, which generated structures close to the experimentally determined structure.
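The recall and precision figures quoted above can be read with the usual definitions. Precision is the fraction of the method's assignments that are correct, and recall is the fraction of reference assignments that the method recovers. The sketch below assumes that assignments are stored as a mapping from residue to spin-system label, which is a hypothetical representation. The abstract does not give the paper's exact counting rules.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted assignments against a reference.

    predicted, reference: dicts mapping a residue key to an assigned spin-system label.
    A prediction is correct when the reference maps the same key to the same label.
    """
    correct = sum(1 for key, label in predicted.items() if reference.get(key) == label)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


reference = {1: "a", 2: "b", 3: "c", 4: "d"}  # hypothetical residue -> spin system
predicted = {1: "a", 2: "b", 3: "x"}          # one wrong, one missing
print(precision_recall(predicted, reference))  # (0.6666666666666666, 0.5)
```

The two numbers penalize different failures. A wrong assignment lowers both precision and recall, whereas a residue left unassigned lowers only recall.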
13
Li D, Hansen AL, Bruschweiler-Li L, Brüschweiler R. Non-Uniform and Absolute Minimal Sampling for High-Throughput Multidimensional NMR Applications. Chemistry 2018; 24:11535-11544. [PMID: 29566285; PMCID: PMC6488043; DOI: 10.1002/chem.201800954] [Received: 02/25/2018]
Abstract
Many biomolecular NMR applications can benefit from faster acquisition of high-resolution multidimensional NMR data and from its automated analysis and interpretation. In recent years, a number of non-uniform sampling (NUS) approaches, such as compressed sensing, have been introduced for the reconstruction of multidimensional NMR spectra, bypassing traditional Fourier-transform processing. Such approaches are applicable to both biomacromolecules and small molecules and their complex mixtures, and they can be combined with homonuclear decoupling (pure shift) and covariance processing. For homonuclear 2D TOCSY experiments, absolute minimal sampling (AMS) drastically shortens measurement times, as required for high-throughput identification and quantification of components in complex biological mixtures in metabolomics. Such TOCSY spectra can be comprehensively represented by graph-theoretical maximal cliques, which identify entire spin systems that can then be queried against NMR databases. Integration of these methods in web servers permits the rapid and reliable identification of mixture components. Recent progress is reviewed in this Minireview.
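Representing TOCSY spectra by maximal cliques treats resonances as nodes and cross-peaks as edges. A spin system in which every resonance correlates with every other then appears as a maximal clique. The sketch below enumerates maximal cliques with the standard Bron-Kerbosch algorithm with pivoting. The graph, its node names and the way it would be built from a spectrum are hypothetical, and this is not the authors' web server implementation.

```python
def maximal_cliques(adjacency):
    """Enumerate all maximal cliques with the Bron-Kerbosch algorithm (with pivoting).

    adjacency: dict mapping each node to the set of its neighbours (undirected graph).
    Returns a list of sets, one per maximal clique.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates that extend r; x: nodes already explored.
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the node with most neighbours in p to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(adjacency[u] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical TOCSY connectivity graph: a-b-c fully correlated, plus c-d and d-e.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(sorted(sorted(c) for c in maximal_cliques(graph)))  # [['a', 'b', 'c'], ['c', 'd'], ['d', 'e']]
```

Pivoting does not change the result. It only prunes branches that could not produce new maximal cliques, which matters for the dense graphs that crowded spectra can generate.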
Affiliation(s)
- Dawei Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, Ohio 43210, U.S.A
- Alexandar L. Hansen
- Campus Chemical Instrument Center, The Ohio State University, Columbus, Ohio 43210, U.S.A
- Lei Bruschweiler-Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, Ohio 43210, U.S.A
- Rafael Brüschweiler
- Campus Chemical Instrument Center, The Ohio State University, Columbus, Ohio 43210, U.S.A
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, U.S.A
- Department of Biological Chemistry and Pharmacology, The Ohio State University, Columbus, Ohio 43210, United States
14
Klukowski P, Augoff M, Zięba M, Drwal M, Gonczarek A, Walczak MJ. NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 2018; 34:2590-2597. [DOI: 10.1093/bioinformatics/bty134] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 08/30/2017] [Accepted: 03/09/2018] [Indexed: 01/13/2023]
Affiliation(s)
- Piotr Klukowski
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Michał Augoff
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Maciej Zięba
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Maciej Drwal
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Adam Gonczarek
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Alphamoon Ltd., ul. Wlodkowica 21/3, Wrocław, Poland
- Michał J Walczak
- Captor Therapeutics Ltd., ul. Dunska 11, Wrocław, Poland
- Alphamoon Ltd., ul. Wlodkowica 21/3, Wrocław, Poland
15
Han R, Zhang F, Gao X. A fast fiducial marker tracking model for fully automatic alignment in electron tomography. Bioinformatics 2018; 34:853-863. [PMID: 29069299 PMCID: PMC6030832 DOI: 10.1093/bioinformatics/btx653] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/02/2017] [Revised: 09/28/2017] [Accepted: 10/20/2017] [Indexed: 11/25/2022]
Abstract
Motivation Automatic alignment, especially fiducial marker-based alignment, has become increasingly important due to the high demand for subtomogram averaging and the rapid development of large-field electron microscopy. Among the alignment steps, fiducial marker tracking is a crucial one that determines the quality of the final alignment. Yet it is still challenging to track fiducial markers accurately and efficiently in a fully automatic manner. Results In this paper, we propose a robust and efficient scheme for fiducial marker tracking. First, we theoretically prove an upper bound on the transformation deviation when the positions of fiducial markers on two micrographs are aligned by an affine transformation. Second, we design an automatic algorithm based on the Gaussian mixture model to accelerate fiducial marker tracking. Third, we propose a divide-and-conquer strategy against lens distortions to ensure the reliability of our scheme. To our knowledge, this is the first attempt to theoretically relate the projection model to the tracking model. The real-world experimental results further support our theoretical bound and demonstrate the effectiveness of our algorithm. This work facilitates fully automatic tracking for datasets with a massive number of fiducial markers. Availability and implementation The C/C++ source code that implements fast fiducial marker tracking is available at https://github.com/icthrm/gmm-marker-tracking. Markerauto version 1.6 or later (also integrated in the AuTom platform at http://ear.ict.ac.cn/) offers a complete implementation for fast alignment, in which fast fiducial marker tracking is available via the '-t' option. Contact xin.gao@kaust.edu.sa. Supplementary information Supplementary data are available at Bioinformatics online.
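The affine alignment of marker positions on two micrographs can be illustrated with an ordinary least-squares fit. This is a sketch under stated assumptions, not the Markerauto code: it assumes the marker correspondences are already known (establishing them is exactly what the paper's Gaussian-mixture tracking does), and it solves the normal equations with a small Gaussian-elimination helper in plain Python. `fit_affine` and `_solve3` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Least-squares affine map from matched marker positions src -> dst.

    Returns ([a, b, c], [d, e, f]) with x' = a*x + b*y + c and y' = d*x + e*y + f.
    Requires at least three non-collinear correspondences.
    """
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix M^T M, rows of M are [x, y, 1]
    atu = [0.0] * 3                      # M^T u for the x' coordinates
    atv = [0.0] * 3                      # M^T v for the y' coordinates
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    return _solve3(ata, atu), _solve3(ata, atv)

# Synthetic check: markers moved by a known linear distortion plus a shift.
src = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
dst = [(x - 0.1 * y + 5.0, 0.1 * x + y - 2.0) for x, y in src]
print(fit_affine(src, dst))  # approximately ([1.0, -0.1, 5.0], [0.1, 1.0, -2.0])
```

With noise-free synthetic correspondences the fit recovers the generating parameters up to rounding; with real, noisy marker positions the residuals of such a fit are what the paper's upper bound on transformation deviation is concerned with.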
Affiliation(s)
- Renmin Han
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
- Fa Zhang
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
16
NMR-based automated protein structure determination. Arch Biochem Biophys 2017; 628:24-32. [PMID: 28263718 DOI: 10.1016/j.abb.2017.02.011] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 01/11/2017] [Revised: 02/18/2017] [Accepted: 02/28/2017] [Indexed: 11/21/2022]
Abstract
NMR spectra analysis for protein structure determination can now in many cases be performed by automated computational methods. This overview of the computational methods for NMR protein structure analysis presents recent automated methods for signal identification in multidimensional NMR spectra, sequence-specific resonance assignment, collection of conformational restraints, and structure calculation, as implemented in the CYANA software package. These algorithms are sufficiently reliable and integrated into one software package to enable the fully automated structure determination of proteins starting from NMR spectra without manual interventions or corrections at intermediate steps, with an accuracy of 1-2 Å backbone RMSD in comparison with manually solved reference structures.
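The accuracy figure quoted above (1-2 Å backbone RMSD) is the root-mean-square deviation between corresponding backbone atoms of two structures. A minimal sketch, assuming the two coordinate sets are already matched atom by atom and optimally superimposed; a real comparison would first apply a least-squares superposition (for example the Kabsch algorithm), which is omitted here. `backbone_rmsd` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def backbone_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD = sqrt(mean squared distance over matched atoms); inputs must be pre-superimposed."""
    if len(model) != len(reference) or not model:
        raise ValueError("need equal, non-empty lists of matched backbone atoms")
    sq = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(sq / len(model))

# Every atom displaced by 1.5 Å along x gives an RMSD of exactly 1.5 Å.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
model = [(x + 1.5, y, z) for x, y, z in ref]
print(backbone_rmsd(model, ref))  # → 1.5
```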
17
Banelli T, Vuano M, Fogolari F, Fusiello A, Esposito G, Corazza A. Automation of peak-tracking analysis of stepwise perturbed NMR spectra. J Biomol NMR 2017; 67:121-134. [PMID: 28213793 DOI: 10.1007/s10858-017-0088-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/14/2016] [Accepted: 01/19/2017] [Indexed: 06/06/2023]
Abstract
We describe a new algorithmic approach that automatically picks and tracks NMR resonances across a large number of 2D NMR spectra acquired during a stepwise variation of a physical parameter. The method has been named Trace in Track (TINT), referring to the idea that a Gaussian decomposition traces peaks within the tracks recognised through 3D mathematical morphology. It can determine the evolution of the chemical shift, intensity and linewidth of each tracked peak. On realistic synthetic spectra, the performance in terms of track reconstruction and correct assignment was well above 90% when a noise level similar to that of experimental data was considered. TINT was applied successfully to several protein systems during a temperature ramp in isotope exchange experiments. A comparison with a state-of-the-art algorithm showed promising results for large numbers of spectra and low signal-to-noise ratios, provided the perturbation is sufficiently gradual. TINT can be applied to different kinds of high-throughput chemical shift mapping experiments with quasi-continuous variations, in which quantitative automated recognition is crucial.
Affiliation(s)
- Tommaso Banelli
- Dipartimento di Area Medica, Università di Udine, P.le Kolbe, 4, 33100, Udine, Italy
- Marco Vuano
- Dipartimento di Area Medica, Università di Udine, P.le Kolbe, 4, 33100, Udine, Italy
- Federico Fogolari
- INBB, Viale Medaglie d'Oro, 306, 00136, Roma, Italy
- Dipartimento di Scienze Matematiche Informatiche e Fisiche, Università di Udine, Via delle Scienze, 206, 33100, Udine, Italy
- Andrea Fusiello
- Dipartimento Politecnico di Ingegneria e Architettura, Università di Udine, Via delle Scienze, 208, 33100, Udine, Italy
- Gennaro Esposito
- INBB, Viale Medaglie d'Oro, 306, 00136, Roma, Italy
- Dipartimento di Scienze Matematiche Informatiche e Fisiche, Università di Udine, Via delle Scienze, 206, 33100, Udine, Italy
- Science & Math Division, New York University Abu Dhabi, Saadiyat Campus, PO Box 129188, Abu Dhabi, UAE
- Alessandra Corazza
- Dipartimento di Area Medica, Università di Udine, P.le Kolbe, 4, 33100, Udine, Italy
- INBB, Viale Medaglie d'Oro, 306, 00136, Roma, Italy
18
Smith AA. INFOS: spectrum fitting software for NMR analysis. J Biomol NMR 2017; 67:77-94. [PMID: 28160196 DOI: 10.1007/s10858-016-0085-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 09/13/2016] [Accepted: 12/23/2016] [Indexed: 06/06/2023]
Abstract
Software for fitting NMR spectra in MATLAB is presented. Spectra are fitted in the frequency domain using Fourier-transformed lineshapes, which are derived from the experimental acquisition and processing parameters. This yields more accurate fits than common fitting methods that use Lorentzian or Gaussian functions. Furthermore, a very time-efficient algorithm for calculating and fitting spectra has been developed. The software also performs initial peak picking, followed by fitting and refinement of the peak list, iteratively adding and removing peaks to improve the overall fit. Errors in the fitted parameters are estimated using a Monte Carlo approach. Many fitting options make the software flexible enough for a wide array of applications, while it remains straightforward to set up with minimal user input.
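Monte Carlo error estimation can be sketched for the simplest case: fitting only the amplitude of a peak whose lineshape is known. This is an illustrative assumption in plain Python, not INFOS (which is MATLAB software and fits lineshapes derived from acquisition parameters): the noise level is taken from the fit residuals, synthetic noise of that size is added to the best-fit spectrum, the fit is repeated, and the spread of the refitted amplitudes is reported as the error. The Lorentzian shape and all function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

def lorentzian(x: float, x0: float, width: float) -> float:
    """Unit-height Lorentzian with half-width at half-maximum `width` (illustrative shape)."""
    return 1.0 / (1.0 + ((x - x0) / width) ** 2)

def fit_amplitude(y: Sequence[float], shape: Sequence[float]) -> float:
    """Linear least squares for y ~ A * shape, i.e. A = sum(y*s) / sum(s*s)."""
    return sum(a * s for a, s in zip(y, shape)) / sum(s * s for s in shape)

def monte_carlo_amplitude_error(
    y: Sequence[float], shape: Sequence[float], n_trials: int = 500, seed: int = 0
) -> Tuple[float, float]:
    """Return (best-fit amplitude, Monte Carlo standard error of the amplitude)."""
    rng = random.Random(seed)
    amp = fit_amplitude(y, shape)
    fit = [amp * s for s in shape]
    # Estimate the noise level from the residuals of the best fit.
    sigma = statistics.pstdev([a - f for a, f in zip(y, fit)])
    refits: List[float] = []
    for _ in range(n_trials):
        noisy = [f + rng.gauss(0.0, sigma) for f in fit]  # synthetic replicate spectrum
        refits.append(fit_amplitude(noisy, shape))
    return amp, statistics.stdev(refits)

# Synthetic spectrum: amplitude 3.0 plus Gaussian noise of standard deviation 0.1.
xs = [i * 0.01 for i in range(-200, 201)]
shape = [lorentzian(x, 0.0, 0.05) for x in xs]
gen = random.Random(1)
spectrum = [3.0 * s + gen.gauss(0.0, 0.1) for s in shape]
amp, err = monte_carlo_amplitude_error(spectrum, shape)
print(f"A = {amp:.3f} +/- {err:.3f}")
```

Because this toy model is linear in the amplitude, the Monte Carlo error should come out close to the analytic value sigma / sqrt(sum(s*s)); the Monte Carlo route is worthwhile precisely when the model is nonlinear (positions, linewidths) and no such closed form exists.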
Affiliation(s)
- Albert A Smith
- Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
19
Perez M. Autonomous driving in NMR. Magn Reson Chem 2017; 55:15-21. [PMID: 27785822 DOI: 10.1002/mrc.4546] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/22/2016] [Revised: 10/20/2016] [Accepted: 10/24/2016] [Indexed: 06/06/2023]
Abstract
The automatic analysis of NMR data has been a much-desired endeavour for the last six decades, as is the case with any other analytical technique. This need for automation has only grown as advances in hardware, pulse sequences and automation have opened new research areas to NMR and increased data throughput. Fully automatic analysis is a worthy, albeit hard, challenge, but in a world of artificial intelligence, instant communication and big data, this particular fight seems to be fought one technique at a time (be it NMR, MS, IR, UV or any other), whereas in reality most laboratories have several types of analytical instrumentation. Data aggregation, verification and elucidation using complementary techniques (e.g. MS and NMR) is a desirable outcome to pursue, although a time-consuming one if performed manually; hence, automation must do the heavy lifting for users to make the approach attractive to scientists. Many of the decisions and workflows that could be automated will depend on two-way communication with databases that understand analytical data, because it is desirable not only to query these databases but also to grow them as automatically as possible. How these databases are designed and set up, and how the data inside them are classified, will determine which workflows can be implemented. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Manuel Perez
- Mestrelab Research, S.L. Feliciano Barrera 9B-Baixo, Santiago de Compostela, Spain
20
Würz JM, Güntert P. Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. J Biomol NMR 2017; 67:63-76. [PMID: 28160195 DOI: 10.1007/s10858-016-0084-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/11/2016] [Accepted: 12/19/2016] [Indexed: 06/06/2023]
Abstract
The automated identification of signals in multidimensional NMR spectra is a challenging task, complicated by signal overlap, noise, and spectral artifacts, for which no universally accepted method is available. Here, we present a new peak picking algorithm, CYPICK, that follows, as far as possible, the manual approach taken by a spectroscopist who analyzes peak patterns in contour plots of the spectrum, but is fully automated. Human visual inspection is replaced by the evaluation of geometric criteria applied to contour lines, such as local extremality, approximate circularity (after appropriate scaling of the spectrum axes), and convexity. The performance of CYPICK was evaluated for a variety of spectra from different proteins by systematic comparison with peak lists obtained by other, manual or automated, peak picking methods, as well as by analyzing the results of automated chemical shift assignment and structure calculation based on input peak lists from CYPICK. The results show that CYPICK yielded peak lists that compare in most cases favorably to those obtained by other automated peak pickers with respect to the criteria of finding a maximal number of real signals, a minimal number of artifact peaks, and maximal correctness of the chemical shift assignments and the three-dimensional structure obtained by fully automated assignment and structure calculation.
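Of CYPICK's geometric criteria, local extremality is the one that is easiest to show in code. The sketch below applies only that criterion to a 2D intensity grid together with a noise threshold; it does not implement CYPICK's circularity or convexity tests on contour lines, and `pick_local_maxima` and its `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

def pick_local_maxima(
    grid: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (row, col) of points above `threshold` that are strictly greater
    than all of their (up to 8) neighbours in the 2D spectrum grid."""
    rows, cols = len(grid), len(grid[0])
    peaks = []
    for i in range(rows):
        for j in range(cols):
            v = grid[i][j]
            if v <= threshold:  # treat everything at or below the threshold as noise
                continue
            neighbours = [
                grid[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            if all(v > n for n in neighbours):
                peaks.append((i, j))
    return peaks

spectrum = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 5.0, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.3, 4.0, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.6],
]
print(pick_local_maxima(spectrum, threshold=1.0))  # → [(1, 1), (3, 3)]
```

The strict comparison means flat-topped maxima are not picked at all, and overlapping peaks that merge into one maximum yield a single pick; these are the situations in which shape criteria such as CYPICK's contour tests are meant to help.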
Affiliation(s)
- Julia M Würz
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
- Peter Güntert
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
- Laboratory of Physical Chemistry, ETH Zürich, Zürich, Switzerland
- Graduate School of Science and Engineering, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
21
Chen P, Hu S, Zhang J, Gao X, Li J, Xia J, Wang B. A Sequence-Based Dynamic Ensemble Learning System for Protein Ligand-Binding Site Prediction. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:901-912. [PMID: 26661785 DOI: 10.1109/tcbb.2015.2505286] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 06/05/2023]
Abstract
BACKGROUND Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most successful protein-ligand binding predictions have been based on known structures. However, structural information is often unavailable in practice because of the huge gap between the number of known protein sequences and the number of experimentally solved structures. RESULTS This paper proposes a dynamic ensemble approach to identify protein-ligand binding residues using sequence information only. To avoid problems resulting from the highly imbalanced numbers of ligand-binding and non-ligand-binding sites, we constructed several balanced data sets and trained a random forest classifier on each of them. We dynamically selected a subset of classifiers according to the similarity between the target protein and the proteins in the training data set. Combining the predictions of this classifier subset for each query protein yielded the final predictions. The ensemble of these classifiers formed a sequence-based predictor to identify protein-ligand binding sites. CONCLUSIONS Experimental results on two Critical Assessment of protein Structure Prediction (CASP) datasets and the ccPDB dataset demonstrated that our proposed method compares favorably with the state of the art. AVAILABILITY http://www2.ahu.edu.cn/pchen/web/LigandDSES.htm.
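The balancing step can be sketched as follows: the majority class (non-binding residues) is shuffled and split into chunks roughly the size of the minority class, and each chunk is paired with all minority samples, giving several balanced training sets, one per base classifier. This is an illustrative reading of the abstract, not the authors' code; `balanced_subsets` is hypothetical, and the random forests and the similarity-based classifier selection are left out.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def balanced_subsets(
    minority: Sequence[T], majority: Sequence[T], seed: int = 0
) -> List[Tuple[List[T], List[T]]]:
    """Split the shuffled majority class into len(majority) // len(minority) chunks
    and pair each chunk with the full minority class.

    Every majority sample is used exactly once, and each chunk holds between
    one and two times as many samples as the minority class.
    """
    if not minority or len(majority) < len(minority):
        raise ValueError("expected a non-empty minority no larger than the majority")
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)
    k = len(shuffled) // len(minority)
    chunks: List[List[T]] = [shuffled[i::k] for i in range(k)]  # round-robin split
    return [(list(minority), chunk) for chunk in chunks]

# Hypothetical residue labels: 5 binding vs. 23 non-binding residues.
binding = [f"B{i}" for i in range(5)]
non_binding = [f"N{i}" for i in range(23)]
for pos, neg in balanced_subsets(binding, non_binding):
    print(len(pos), len(neg))  # 5 6, 5 6, 5 6, 5 5
```

Each (positives, negatives) pair would train one base classifier; undersampling by disjoint chunks keeps all majority data in play across the ensemble rather than discarding it.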
22
Yilmaz EM, Güntert P. NMR structure calculation for all small molecule ligands and non-standard residues from the PDB Chemical Component Dictionary. J Biomol NMR 2015; 63:21-37. [PMID: 26123317 DOI: 10.1007/s10858-015-9959-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 05/28/2015] [Accepted: 06/22/2015] [Indexed: 05/15/2023]
Abstract
An algorithm, CYLIB, is presented for converting molecular topology descriptions from the PDB Chemical Component Dictionary into CYANA residue library entries. The CYANA structure calculation algorithm uses torsion angle molecular dynamics for the efficient computation of three-dimensional structures from NMR-derived restraints. For this, the molecules have to be represented in torsion angle space, with rotations around covalent single bonds as the only degrees of freedom. Each molecule must be given a tree structure of torsion angles connecting rigid units composed of one or several atoms with fixed relative positions. Setting up CYANA residue library entries therefore involves, besides straightforward format conversion, the non-trivial steps of defining a suitable tree structure of torsion angles and re-ordering the atoms in a way that is compatible with this tree structure. This can be done manually for small numbers of ligands, but the process is time-consuming and error-prone. An automated method is necessary to handle the large number of different potential ligand molecules studied in drug design projects. Here, we present an algorithm for this purpose and show that CYANA structure calculations can be performed with almost all small molecule ligands and non-standard amino acid residues in the PDB Chemical Component Dictionary.
Affiliation(s)
- Emel Maden Yilmaz
- Center for Biomolecular Magnetic Resonance, Institute of Biophysical Chemistry, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
- Peter Güntert
- Center for Biomolecular Magnetic Resonance, Institute of Biophysical Chemistry, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
- Laboratory of Physical Chemistry, ETH Zürich, Zurich, Switzerland
- Graduate School of Science, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
23
Akhmedov M, Çatay B, Apaydın MS. Automating unambiguous NOE data usage in NVR for NMR protein structure-based assignments. J Bioinform Comput Biol 2015; 13:1550020. [PMID: 26260854 DOI: 10.1142/s0219720015500201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is an important technique for determining protein structure in solution. An important problem in protein structure determination using NMR spectroscopy is the mapping of peaks to the corresponding amino acids, also known as the assignment problem. Structure-Based Assignment (SBA) is an approach that solves this problem using a template structure homologous to the target. Our previously developed approach, Nuclear Vector Replacement-Binary Integer Programming (NVR-BIP), computed the optimal solution for small proteins but was unable to solve the assignments of large proteins. NVR-Ant Colony Optimization (ACO) extended the applicability of the NVR approach to such proteins. One of the inputs used in these approaches is Nuclear Overhauser Effect (NOE) data. The NOE is an interaction observed between two protons if they are close in space. These protons can be amide protons, protons attached to the alpha-carbon atom in the protein backbone, or side chain protons. NVR only uses backbone protons. In this paper, we reformulate the NVR-BIP model to distinguish the type of proton in the NOE data and use the corresponding proton coordinates in the extended formulation. In addition, the threshold on interproton distances is set in a standard manner for all proteins by extracting the NOE upper-bound distance information from the data. We also convert NOE intensities into distance thresholds. Our new approach thus handles the NOE data correctly and without manually determined parameters. We adapt the NVR-ACO solution methodology accordingly. Computational results show that our approaches obtain optimal solutions for small proteins. For large proteins, our ant colony optimization-based approach obtains promising results.
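A common way to convert NOE intensities into distance thresholds is the isolated spin-pair approximation, in which intensity scales as r^-6 and is calibrated against a reference proton pair of known distance. The sketch below assumes that calibration; the abstract does not say which conversion the authors use, and the function name, reference values and the 6 Å cap are hypothetical.

```python
from typing import Optional

def noe_upper_bound(
    intensity: float,
    ref_intensity: float,
    ref_distance: float,
    max_bound: Optional[float] = 6.0,
) -> float:
    """Distance bound (Å) from an NOE intensity, assuming I is proportional to r^-6:
    r = r_ref * (I_ref / I) ** (1/6), optionally capped at `max_bound`."""
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    r = ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)
    return min(r, max_bound) if max_bound is not None else r

# A peak 64 times weaker than the reference lies twice as far away (2**6 == 64).
print(round(noe_upper_bound(1.0, 64.0, 2.5), 6))  # → 5.0
```

The sixth-root dependence makes the distance estimate forgiving of intensity errors (a factor of 2 in intensity changes the distance by only about 12 %), which is one reason intensities are usually binned into a few upper-bound classes rather than used as exact distances.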
Affiliation(s)
- Murodzhon Akhmedov
- Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno-Lugano, Switzerland
- Bülent Çatay
- Sabanci University, Faculty of Engineering and Natural Sciences, 34956 Orhanlı, Istanbul, Turkey
24
Dashti H, Lee W, Tonelli M, Cornilescu CC, Cornilescu G, Assadi-Porter FM, Westler WM, Eghbalnia HR, Markley JL. NMRFAM-SDF: a protein structure determination framework. J Biomol NMR 2015; 62:481-95. [PMID: 25900069 PMCID: PMC4569665 DOI: 10.1007/s10858-015-9933-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/16/2015] [Accepted: 04/15/2015] [Indexed: 05/21/2023]
Abstract
The computationally demanding nature of automated NMR structure determination necessitates a delicate balancing of factors that include the time complexity of data collection, the computational complexity of chemical shift assignments, and the selection of proper optimization steps. During the past two decades, the computational and algorithmic aspects of several discrete steps of the process have been addressed. Although no single comprehensive solution has emerged, the incorporation of a validation protocol has gained recognition as a necessary step for a robust automated approach. The need for validation becomes even more pronounced for proteins with higher structural complexity, where potentially larger errors generated at each step can propagate and accumulate during structure calculation, thereby significantly degrading the efficacy of any software framework. This paper introduces a complete framework for protein structure determination with NMR, from data acquisition to structure determination. The aim is twofold: to simplify the structure determination process for non-NMR experts whenever feasible, while maintaining flexibility by providing a set of modules that validate each step, and to enable the assessment of error propagation. This framework, called NMRFAM-SDF (NMRFAM-Structure Determination Framework), and its various components are available for download from the NMRFAM website (http://nmrfam.wisc.edu/software.htm).
Affiliation(s)
- Hesam Dashti
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
- Woonghee Lee
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
- Marco Tonelli
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
- Claudia C Cornilescu
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
- Gabriel Cornilescu
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
- Fariba M Assadi-Porter
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
- William M Westler
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
- Hamid R Eghbalnia
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
- John L Markley
- National Magnetic Resonance Facility at Madison, Biochemistry Department, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI, USA
25
Castillo AM, Bernal A, Patiny L, Wist J. Fully automatic assignment of small molecules' NMR spectra without relying on chemical shift predictions. Magn Reson Chem 2015; 53:603-611. [PMID: 26053353 DOI: 10.1002/mrc.4272] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/18/2015] [Revised: 04/30/2015] [Accepted: 05/06/2015] [Indexed: 06/04/2023]
Abstract
We present a method for the automatic assignment of small molecules' NMR spectra. The method includes a novel, automatic, self-consistent peak-picking routine that validates NMR peaks in each spectrum against peaks in the same or other spectra that arise from the same resonances. The auto-assignment routine is based on branch-and-bound optimization and relies predominantly on integration and correlation data; chemical shift information may be included when available to speed up the search and shorten the list of viable assignments, but in most cases tested it is not required to find the correct assignment. This automatic assignment method is implemented as a web-based tool that runs without any user input other than the acquired spectra.
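The branch-and-bound idea can be illustrated on a toy version of the problem: match peaks one-to-one to proton groups so that the measured integrals agree as closely as possible with the number of protons in each group. The cost function here is a hypothetical choice for illustration (the described method also uses correlation data and, optionally, chemical shifts), and the bound is simply the cost accumulated so far, which is valid because every term is non-negative. `assign_by_integrals` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

def assign_by_integrals(
    integrals: Sequence[float], groups: Dict[str, int]
) -> Tuple[Dict[int, str], float]:
    """One-to-one assignment of peaks to proton groups that minimises
    sum(|integral - n_protons|), found by depth-first branch and bound."""
    names = list(groups)
    if len(names) != len(integrals):
        raise ValueError("this toy version needs exactly one peak per group")
    best_cost = math.inf
    best: List[str] = []

    def branch(peak: int, used: frozenset, partial: List[str], cost: float) -> None:
        nonlocal best_cost, best
        if cost >= best_cost:  # bound: costs only grow, so this branch cannot win
            return
        if peak == len(integrals):  # complete assignment that beats the incumbent
            best_cost, best = cost, partial[:]
            return
        for name in names:  # branch: try every unused group for this peak
            if name not in used:
                partial.append(name)
                branch(peak + 1, used | {name}, partial,
                       cost + abs(integrals[peak] - groups[name]))
                partial.pop()

    branch(0, frozenset(), [], 0.0)
    return dict(enumerate(best)), best_cost

# Hypothetical integrals for a molecule with one CH3, one CH2 and one CH group.
assignment, cost = assign_by_integrals([1.02, 2.95, 2.1], {"CH3": 3, "CH2": 2, "CH": 1})
print(assignment)  # → {0: 'CH', 1: 'CH3', 2: 'CH2'}
```

Adding more evidence (for example penalties for assignments that contradict observed correlations) only adds non-negative terms, so the same pruning rule stays valid while the search space shrinks faster.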
Affiliation(s)
- Andrés M Castillo
- Facultad de Ingeniería, Universidad Nacional de Colombia, Bogotá D.C., Colombia
- Chemistry Department, Universidad del Valle, Cali, Valle, A.A. 25360, Colombia
- Andrés Bernal
- Chemistry Department, Universidad del Valle, Cali, Valle, A.A. 25360, Colombia
- Luc Patiny
- Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, CH-1015, Switzerland
- Julien Wist
- Chemistry Department, Universidad del Valle, Cali, Valle, A.A. 25360, Colombia
26
Klukowski P, Walczak MJ, Gonczarek A, Boudet J, Wider G. Computer vision-based automated peak picking applied to protein NMR spectra. Bioinformatics 2015; 31:2981-8. [PMID: 25995228 DOI: 10.1093/bioinformatics/btv318] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/28/2014] [Accepted: 05/18/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION A detailed analysis of multidimensional NMR spectra of macromolecules requires the identification of individual resonances (peaks). This task can be tedious and time-consuming and often requires support from experienced users. Automated peak picking algorithms were introduced more than 25 years ago, but major deficiencies still often prevent complete and error-free peak picking of biological macromolecule spectra. The major challenges for automated peak picking algorithms are distinguishing artifacts from real peaks, particularly from those with irregular shapes, and picking peaks in spectral regions with overlapping resonances, which are very hard to resolve with existing computer algorithms. In both cases, a visual inspection approach could be more effective than a 'blind' algorithm. RESULTS We present a novel approach using computer vision (CV) methodology, which could be better adapted to the problem of peak recognition. After suitable 'training', we successfully applied the CV algorithm to spectra of medium-sized soluble proteins with molecular weights up to 26 kDa and to a 130 kDa complex of a tetrameric membrane protein in detergent micelles. Our CV approach outperforms commonly used programs. With suitable training datasets, the presented method can be extended to automated peak picking in multidimensional spectra of nucleic acids or carbohydrates and adapted to solid-state NMR spectra. AVAILABILITY AND IMPLEMENTATION CV-Peak Picker is available upon request from the authors. CONTACT gsw@mol.biol.ethz.ch; michal.walczak@mol.biol.ethz.ch; adam.gonczarek@pwr.edu.pl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Piotr Klukowski
- Department of Computer Science, Wroclaw University of Technology, Wroclaw, Poland
- Michal J Walczak
- Institute of Molecular Biology and Biophysics, ETH Zurich, 8093 Zurich, Switzerland
- Adam Gonczarek
- Department of Computer Science, Wroclaw University of Technology, Wroclaw, Poland
- Julien Boudet
- Institute of Molecular Biology and Biophysics, ETH Zurich, 8093 Zurich, Switzerland
- Gerhard Wider
- Institute of Molecular Biology and Biophysics, ETH Zurich, 8093 Zurich, Switzerland
27
Cannistraci CV, Abbas A, Gao X. Median Modified Wiener Filter for nonlinear adaptive spatial denoising of protein NMR multidimensional spectra. Sci Rep 2015; 5:8017. [PMID: 25619991 PMCID: PMC4306135 DOI: 10.1038/srep08017] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 09/18/2014] [Accepted: 12/29/2014] [Indexed: 11/21/2022]
Abstract
Denoising multidimensional NMR spectra is a fundamental step in NMR protein structure determination. The state-of-the-art method uses wavelet denoising, which may suffer when applied to non-stationary signals affected by Gaussian white noise mixed with strong impulsive artifacts, like those in multidimensional NMR spectra. Unfortunately, wavelet-denoising performance depends on a combinatorial search over wavelet shapes and parameters, and the multidimensional extension of wavelet denoising is highly non-trivial, which hampers its application to multidimensional NMR spectra. Here, we endorse a different philosophy for denoising NMR spectra: less is more! We consider spatial filters that have only one parameter to tune: the window size. We propose, for the first time, the 3D extension of the median-modified Wiener filter (MMWF), an adaptive variant of the median filter, and also its novel variation named MMWF*. We test the proposed filters and the Wiener filter, an adaptive variant of the mean filter, on a benchmark set that contains 16 two-dimensional and three-dimensional NMR spectra extracted from eight proteins. Our results demonstrate that the adaptive spatial filters significantly outperform their non-adaptive versions. The performance of the new MMWF* on 2D/3D spectra is even better than that of wavelet denoising. Notably, MMWF* produces stable high performance that is almost invariant across window-size settings, which is a consistent advantage when implementing automatic pipelines for protein NMR spectra analysis.
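The median-modified Wiener filter can be sketched in one dimension. In the adaptive Wiener filter, each point is pulled towards its local mean with the gain (σ² − ν²)/σ², where σ² is the local variance and ν² the noise variance (estimated here, as in common Wiener-filter implementations, as the average of the local variances); as we understand the MMWF, it uses the local median in place of the local mean. This 1D, pure-Python version is an illustrative assumption: the paper works with 2D/3D windows, the clipping of the gain is borrowed from standard Wiener implementations, and the MMWF* variant is not shown. `mmwf_1d` is a hypothetical name.

```python
import statistics
from typing import List, Sequence

def mmwf_1d(signal: Sequence[float], window: int = 3) -> List[float]:
    """1D median-modified Wiener filter (illustrative sketch).

    out[i] = med_i + gain_i * (signal[i] - med_i), where
    gain_i = max(var_i - noise, 0) / max(var_i, noise),
    med_i and var_i are the median and variance in a window centred on i,
    and noise is the mean of all local variances.
    """
    half = window // 2
    n = len(signal)
    medians: List[float] = []
    variances: List[float] = []
    for i in range(n):
        w = list(signal[max(0, i - half): min(n, i + half + 1)])  # truncated at the edges
        medians.append(statistics.median(w))
        variances.append(statistics.pvariance(w))
    noise = statistics.mean(variances)
    out: List[float] = []
    for x, med, var in zip(signal, medians, variances):
        denom = max(var, noise)
        gain = max(var - noise, 0.0) / denom if denom > 0 else 0.0
        out.append(med + gain * (x - med))  # low-variance regions collapse to the median
    return out

noisy = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]  # a single impulsive spike
print([round(v, 2) for v in mmwf_1d(noisy, window=3)])  # → [0.0, 0.0, 0.0, 5.71, 0.0, 0.0, 0.0]
```

The window size is the only tuning parameter, matching the "less is more" argument above. Note the adaptive behaviour in the example: a plain 3-point median filter would delete the spike entirely, whereas the Wiener-style gain keeps part of it because its local variance exceeds the estimated noise level.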
Affiliation(s)
- Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Ahmed Abbas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
28
Niklasson M, Ahlner A, Andresen C, Marsh JA, Lundström P. Fast and accurate resonance assignment of small-to-large proteins by combining automated and manual approaches. PLoS Comput Biol 2015; 11:e1004022. [PMID: 25569628 PMCID: PMC4288728 DOI: 10.1371/journal.pcbi.1004022] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/12/2014] [Accepted: 11/03/2014] [Indexed: 11/26/2022]
Abstract
The process of resonance assignment is fundamental to most NMR studies of protein structure and dynamics. Unfortunately, the manual assignment of residues is tedious and time-consuming, and can represent a significant bottleneck for further characterization. Furthermore, while automated approaches have been developed, they are often limited in their accuracy, particularly for larger proteins. Here, we address this by introducing the software COMPASS, which, by combining automated resonance assignment with manual intervention, is able to achieve accuracy approaching that from manual assignments at greatly accelerated speeds. Moreover, by including the option to compensate for isotope shift effects in deuterated proteins, COMPASS is far more accurate for larger proteins than existing automated methods. COMPASS is an open-source project licensed under GNU General Public License and is available for download from http://www.liu.se/forskning/foass/tidigare-foass/patrik-lundstrom/software?l=en. Source code and binaries for Linux, Mac OS X and Microsoft Windows are available.
Affiliation(s)
- Markus Niklasson
- Division of Biomolecular Technology, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
- Alexandra Ahlner
- Division of Biomolecular Technology, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
- Cecilia Andresen
- Division of Biomolecular Technology, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
- Joseph A. Marsh
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Patrik Lundström
- Division of Biomolecular Technology, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
29
Chen P, Huang JZ, Gao X. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone. BMC Bioinformatics 2014; 15 Suppl 15:S4. [PMID: 25474163 PMCID: PMC4271564 DOI: 10.1186/1471-2105-15-s15-s4] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 12/28/2022]
Abstract
Background Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in computational prediction of protein-ligand binding sites, state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. Results In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding and non-ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods.
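The balanced-data-set ensemble described in the Results can be sketched as follows. Positives (binding residues) are assumed to be the minority class. The majority class is split into chunks of roughly minority size, and each chunk is paired with every minority sample. One classifier is trained per balanced subset, and the predictions are combined by majority vote.

To keep the example self-contained, a nearest-centroid classifier stands in for the random forests. This is a deliberate swap, not the LigandRFs base learner. The names `balanced_subsets`, `NearestCentroid`, `train_ensemble` and `ensemble_predict` are hypothetical.

```python
import math
import random


def balanced_subsets(pos, neg, seed=0):
    """Pair every positive with one chunk of negatives (assumes len(pos) <= len(neg), len(pos) > 0)."""
    shuffled = neg[:]
    random.Random(seed).shuffle(shuffled)
    k = max(1, len(neg) // len(pos))                 # number of balanced subsets
    chunks = [shuffled[i::k] for i in range(k)]      # interleaved split of the negatives
    return [(pos + chunk, [1] * len(pos) + [0] * len(chunk)) for chunk in chunks]


class NearestCentroid:
    """Stand-in base learner: predicts the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


def train_ensemble(pos, neg, seed=0):
    return [NearestCentroid().fit(X, y) for X, y in balanced_subsets(pos, neg, seed)]


def ensemble_predict(models, x):
    """Majority vote over the per-subset classifiers (1 = binding residue)."""
    votes = sum(m.predict(x) for m in models)
    return 1 if 2 * votes > len(models) else 0
```

Each subset is balanced, so no single classifier is dominated by the majority class. Every negative sample is still used by exactly one member of the ensemble.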
30
Abbas A, Guo X, Jing BY, Gao X. An automated framework for NMR resonance assignment through simultaneous slice picking and spin system forming. J Biomol NMR 2014; 59:75-86. [PMID: 24748536 DOI: 10.1007/s10858-014-9828-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/03/2014] [Accepted: 04/05/2014] [Indexed: 06/03/2023]
Abstract
Despite significant advances in automated nuclear magnetic resonance-based protein structure determination, the high numbers of false positives and false negatives among the peaks selected by fully automated methods remain a problem. These false positives and negatives impair the performance of resonance assignment methods. One of the main reasons for this problem is that the computational research community often considers peak picking and resonance assignment to be two separate problems, whereas spectroscopists use expert knowledge to pick peaks and assign their resonances at the same time. We propose a novel framework that simultaneously conducts slice picking and spin system forming, an essential step in resonance assignment. Our framework then employs a genetic algorithm, directed by both connectivity information and amino acid typing information from the spin systems, to assign the spin systems to residues. The inputs to our framework can be as few as two commonly used spectra, i.e., CBCA(CO)NH and HNCACB. Unlike existing peak picking and resonance assignment methods, which treat peaks as the units, our method is based on 'slices', which are one-dimensional vectors in three-dimensional spectra that correspond to certain ([Formula: see text]) values. Experimental results on both benchmark simulated data sets and four real protein data sets demonstrate that our method significantly outperforms the state-of-the-art methods while using fewer spectra than those methods. Our method is freely available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
Affiliation(s)
- Ahmed Abbas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
31
Tikole S, Jaravine V, Rogov V, Dötsch V, Güntert P. Peak picking NMR spectral data using non-negative matrix factorization. BMC Bioinformatics 2014; 15:46. [PMID: 24511909 PMCID: PMC3931316 DOI: 10.1186/1471-2105-15-46] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/20/2013] [Accepted: 02/04/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Simple peak-picking algorithms, such as those based on lineshape fitting, perform well when peaks are completely resolved in multidimensional NMR spectra, but often produce wrong intensities and frequencies for overlapping peak clusters. For example, NOESY-type spectra have considerable overlaps leading to significant peak-picking intensity errors, which can result in erroneous structural restraints. Precise frequencies are critical for unambiguous resonance assignments. RESULTS To alleviate this problem, a more sophisticated peak decomposition algorithm, based on non-negative matrix factorization (NMF), was developed. We produce peak shapes from Fourier-transformed NMR spectra. Apart from its main goal of deriving components from spectra and producing peak lists automatically, the NMF approach can also be applied if the positions of some peaks are known a priori, e.g. from consistently referenced spectral dimensions of other experiments. CONCLUSIONS Application of the NMF algorithm to a three-dimensional peak list of the 23 kDa bi-domain section of the RcsD protein (RcsD-ABL-HPt, residues 688-890) as well as to synthetic HSQC data shows that peaks can be picked accurately even in spectral regions with strong overlap.
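The abstract builds on non-negative matrix factorization. The factorization V ≈ WH with W, H ≥ 0 can be sketched with the standard Lee-Seung multiplicative updates for the squared Frobenius loss. This is the generic algorithm only. The paper's peak-shape model, its initialization and its handling of a priori peak positions are not reproduced. The names `nmf`, `matmul`, `transpose` and `frob_error` are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, iters=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for V ~ W H with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H
```

The updates keep every entry non-negative because they only multiply by non-negative ratios. This non-negativity is what makes the factors interpretable as additive components, such as individual peaks, in a spectrum.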
Affiliation(s)
- Peter Güntert
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, and Frankfurt Institute of Advanced Studies, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany.
32
Serban N, Li P. A statistical test for mixture detection with application to component identification in multidimensional biomolecular NMR studies. Can J Stat 2013. [DOI: 10.1002/cjs.11202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Affiliation(s)
- Nicoleta Serban
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30331, USA
- Pengfei Li
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
33
Cheng Y, Gao X, Liang F. Bayesian peak picking for NMR spectra. Genomics Proteomics Bioinformatics 2013; 12:39-47. [PMID: 24184964 PMCID: PMC4411369 DOI: 10.1016/j.gpb.2013.07.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 06/01/2013] [Accepted: 07/29/2013] [Indexed: 11/29/2022]
Abstract
Protein structure determination is an important topic in structural genomics that helps us understand a variety of biological functions, such as protein–protein and protein–DNA interactions. Nuclear magnetic resonance (NMR) is now often used to determine the three-dimensional structures of proteins in vivo. This study aims to automate the peak picking step, the most important and challenging step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using a Bayesian method.
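The modelling idea in this abstract, a mixture of bivariate Gaussians in which peak picking becomes variable selection, can be written in a generic illustrative form. This form is an assumption made for exposition, not the authors' exact likelihood or priors. The symbols are:

- $S$: spectrum intensity at chemical-shift coordinates $\mathbf{z}$.
- $K$: number of candidate peaks, with heights $h_k$.
- $\phi_2$: the bivariate normal density.
- $\varepsilon$: noise.
- $\gamma_k$: indicator for whether candidate $k$ is a true peak.

```latex
S(\mathbf{z}) = \sum_{k=1}^{K} \gamma_k \, h_k \,
  \phi_2\!\left(\mathbf{z};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right) + \varepsilon(\mathbf{z}),
\qquad \gamma_k \in \{0, 1\}, \qquad \mathbf{z} = (\delta_1, \delta_2)^\top .
```

Under this form, selecting peaks amounts to inferring from the posterior which indicators $\gamma_k$ equal 1. A sampler such as stochastic approximation Monte Carlo would explore that posterior.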
Affiliation(s)
- Yichen Cheng
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Faming Liang
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA.
34
Gao X. Recent advances in computational methods for nuclear magnetic resonance data processing. Genomics Proteomics Bioinformatics 2013; 11:29-33. [PMID: 23453016 PMCID: PMC4357661 DOI: 10.1016/j.gpb.2012.12.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/22/2012] [Revised: 12/12/2012] [Accepted: 12/28/2012] [Indexed: 11/28/2022]
Abstract
Although three-dimensional protein structure determination using nuclear magnetic resonance (NMR) spectroscopy is a computationally costly and tedious process that would benefit from advanced computational techniques, it has not garnered much research attention from specialists in bioinformatics and computational biology. In this paper, we review recent advances in computational methods for NMR protein structure determination. We summarize the advantages of and bottlenecks in the existing methods and outline some open problems in the field. We also discuss current trends in NMR technology development and suggest directions for research on future computational methods for NMR.
Affiliation(s)
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.
35
Abbas A, Kong XB, Liu Z, Jing BY, Gao X. Automatic peak selection by a Benjamini-Hochberg-based algorithm. PLoS One 2013; 8:e53112. [PMID: 23308147 PMCID: PMC3538655 DOI: 10.1371/journal.pone.0053112] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 07/27/2012] [Accepted: 11/26/2012] [Indexed: 11/25/2022]
Abstract
A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and can be straightforwardly applied to other prediction selection problems in bioinformatics.
The source code, documentation and example data of the proposed method are available at http://sfb.kaust.edu.sa/pages/software.aspx.
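The Benjamini-Hochberg step-up rule at the heart of this approach can be sketched as follows. The sketch assumes the candidate peaks have already been converted to p-values; the paper's conversion from volumes or intensities is not reproduced. The function name `bh_select` and the default α = 0.05 are illustrative assumptions.

```python
def bh_select(p_values, alpha=0.05):
    """Return indices of candidates selected by the Benjamini-Hochberg procedure.

    Sort the m p-values ascending and find the largest rank k with
    p_(k) <= k * alpha / m; the k smallest p-values are selected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank  # largest rank satisfying the B-H condition so far
    return sorted(order[:k])


# Four candidate peaks, three of which clear their rank-dependent thresholds.
print(bh_select([0.01, 0.02, 0.03, 0.5]))  # [0, 1, 2]
```

Because the rule is step-up, it keeps every candidate up to the largest qualifying rank, even if some intermediate p-value misses its own threshold. This is what lets it return more true peaks than a fixed cut-off without fixing the number of peaks in advance.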
Affiliation(s)
- Ahmed Abbas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Xin-Bing Kong
- Department of Statistics, Fudan University, Shanghai, China
- Zhi Liu
- Department of Mathematics, Faculty of Science and Technology, University of Macau, Taipa, Macau
- Bing-Yi Jing
- Department of Mathematics, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
36
Alipanahi B, Krislock N, Ghodsi A, Wolkowicz H, Donaldson L, Li M. Determining protein structures from NOESY distance constraints by semidefinite programming. J Comput Biol 2012; 20:296-310. [PMID: 23113706 DOI: 10.1089/cmb.2012.0089] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
Contemporary practical methods for protein nuclear magnetic resonance (NMR) structure determination use molecular dynamics coupled with a simulated annealing schedule. The objective of these methods is to minimize the error of deviating from the nuclear Overhauser effect (NOE) distance constraints. However, the corresponding objective function is highly nonconvex and, consequently, difficult to optimize. Euclidean distance matrix (EDM) methods based on semidefinite programming (SDP) provide a natural framework for these problems. However, the high complexity of SDP solvers and the often noisy distance constraints pose major challenges to this approach. The main contribution of this article is a new SDP formulation for the EDM approach that overcomes these two difficulties. We model the protein as a set of intersecting two- and three-dimensional cliques. Then, we adapt and extend a technique called semidefinite facial reduction to reduce the SDP problem size to approximately one quarter of the size of the original problem. The reduced SDP problem can be solved approximately 100 times faster, and it is also more resistant to numerical problems from erroneous and inexact distance bounds.
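For readers new to the EDM view, the standard link between squared distances and a positive semidefinite Gram matrix, on which SDP formulations build, is summarized below. This is the textbook relaxation with distance bounds, not the authors' clique-based facial-reduction formulation. The symbols are:

- $X$: the $n \times 3$ coordinate matrix with rows $\mathbf{x}_i$.
- $G$: the corresponding Gram matrix.
- $\ell_{ij}, u_{ij}$: assumed lower and upper distance bounds derived from NOEs.

```latex
\begin{aligned}
D_{ij} &= \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 = G_{ii} + G_{jj} - 2G_{ij},
\qquad G = X X^\top \succeq 0, \\
\text{find } G \succeq 0 \quad &\text{s.t.} \quad
\ell_{ij}^2 \le G_{ii} + G_{jj} - 2G_{ij} \le u_{ij}^2 \ \ \text{for every NOE-bounded pair } (i, j),
\qquad \mathbf{1}^\top G \, \mathbf{1} = 0 .
\end{aligned}
```

The last constraint centers the structure at the origin. The relaxation drops the requirement that $G$ have rank 3. Coordinates are then read off from the leading three eigenvectors of the solution.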
Affiliation(s)
- Babak Alipanahi
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
37
Liu Z, Abbas A, Jing BY, Gao X. WaVPeak: picking NMR peaks through wavelet-based smoothing and volume-based filtering. Bioinformatics 2012; 28:914-20. [PMID: 22328784 PMCID: PMC3315717 DOI: 10.1093/bioinformatics/bts078] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 11/04/2011] [Revised: 01/16/2012] [Accepted: 02/08/2012] [Indexed: 11/13/2022]
Abstract
MOTIVATION Nuclear magnetic resonance (NMR) has been widely used as a powerful tool to determine the 3D structures of proteins in vivo. However, the post-spectra processing stage of NMR structure determination, which includes peak picking, chemical shift assignment and structure calculation, usually requires a tremendous amount of time and expert knowledge. Detecting accurate peaks from the NMR spectra is a prerequisite for all following steps, and thus remains a key problem in automatic NMR structure determination. RESULTS We introduce WaVPeak, a fully automatic peak detection method. WaVPeak first smooths the given NMR spectrum by wavelets. The peaks are then identified as the local maxima. The false positive peaks are filtered out efficiently by considering the volume of the peaks. WaVPeak has two major advantages over state-of-the-art peak-picking methods. First, through wavelet-based smoothing, WaVPeak does not eliminate any data point in the spectra. Therefore, WaVPeak is able to detect weak peaks that are embedded in the noise. NMR spectroscopists need the most help isolating these weak peaks. Second, WaVPeak estimates the volume of the peaks to filter the false positives. This is more reliable than the intensity-based filters that are widely used in existing methods. We evaluate the performance of WaVPeak on the benchmark set proposed by PICKY (Alipanahi et al., 2009), one of the most accurate methods in the literature. The dataset comprises 32 2D and 3D spectra from eight different proteins. Experimental results demonstrate that WaVPeak achieves an average recall of 96%, 91%, 88%, 76% and 85% on ¹⁵N-HSQC, HNCO, HNCA, HNCACB and CBCA(CO)NH, respectively. When the same number of peaks is considered, WaVPeak significantly outperforms PICKY. AVAILABILITY WaVPeak is an open source program. The source code and two test spectra of WaVPeak are available at http://faculty.kaust.edu.sa/sites/xingao/Pages/Publications.aspx.
The online server is under construction. CONTACT statliuzhi@xmu.edu.cn; ahmed.abbas@kaust.edu.sa; majing@ust.hk; xin.gao@kaust.edu.sa.
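The three stages named in the abstract are wavelet smoothing, local-maximum detection and volume-based filtering. They can be sketched in one dimension, under assumptions that differ from the WaVPeak code:

- **Smoothing:** a single-level Haar transform with soft-thresholded detail coefficients stands in for the paper's wavelet smoothing.
- **Volume:** a peak's "volume" is taken as the summed intensity over the region that descends monotonically away from the maximum.
- **Naming:** the names `haar_smooth`, `local_maxima`, `peak_volume` and `pick_peaks`, and the `threshold` and `min_volume` parameters, are hypothetical.

```python
import math


def haar_smooth(y, threshold):
    """One-level Haar wavelet denoising: soft-threshold the detail coefficients,
    then invert the transform. An odd trailing sample is passed through unchanged."""
    out = list(y)
    s = math.sqrt(2.0)
    for k in range(0, len(y) - 1, 2):
        a = (y[k] + y[k + 1]) / s                            # approximation coefficient
        d = (y[k] - y[k + 1]) / s                            # detail coefficient
        d = math.copysign(max(abs(d) - threshold, 0.0), d)   # soft threshold
        out[k], out[k + 1] = (a + d) / s, (a - d) / s        # inverse Haar step
    return out


def local_maxima(y):
    """Interior indices that rise strictly from the left and do not rise to the right."""
    return [i for i in range(1, len(y) - 1) if y[i] > y[i - 1] and y[i] >= y[i + 1]]


def peak_volume(y, i):
    """Summed intensity over the region that descends monotonically away from index i."""
    lo = i
    while lo > 0 and y[lo - 1] < y[lo]:
        lo -= 1
    hi = i
    while hi < len(y) - 1 and y[hi + 1] < y[hi]:
        hi += 1
    return sum(y[lo:hi + 1])


def pick_peaks(y, threshold, min_volume):
    """Smooth, find local maxima, and keep those whose volume reaches min_volume."""
    smoothed = haar_smooth(y, threshold)
    return [i for i in local_maxima(smoothed) if peak_volume(smoothed, i) >= min_volume]
```

The smoothing step changes values but keeps every sample. The false-positive filter ranks candidates by integrated area rather than height, so a tall but narrow noise spike is not automatically favored.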
Affiliation(s)
- Zhi Liu
- The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen 361000, China
38
Jang R, Gao X, Li M. Combining automated peak tracking in SAR by NMR with structure-based backbone assignment from 15N-NOESY. BMC Bioinformatics 2012; 13 Suppl 3:S4. [PMID: 22536902 PMCID: PMC3402924 DOI: 10.1186/1471-2105-13-s3-s4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
Background Chemical shift mapping is an important technique in NMR-based drug screening for identifying the atoms of a target protein that potentially bind to a drug molecule upon the molecule's introduction in increasing concentrations. The goal is to obtain a mapping of peaks with known residue assignment from the reference spectrum of the unbound protein to peaks with unknown assignment in the target spectrum of the bound protein. Although a series of perturbed spectra helps to trace a path from reference peaks to target peaks, a one-to-one mapping is generally not possible, especially for large proteins, because of errors such as noise peaks, missing peaks, peaks that disappear and then reappear, overlapped peaks, and new peaks not associated with any peak in the reference. Because of these difficulties, the mapping is typically done manually or semi-automatically, which is not efficient for high-throughput drug screening. Results We present PeakWalker, a novel peak walking algorithm for fast-exchange systems that models the errors explicitly and performs many-to-one mapping. On the proteins hBclXL, UbcH5B, and histone H1, it achieves an average accuracy of over 95% with fewer than 1.5 residues predicted per target peak. Given these mappings as input, we present PeakAssigner, a novel combined structure-based backbone resonance and NOE assignment algorithm that uses just 15N-NOESY, while avoiding TOCSY experiments and 13C-labeling, to resolve the ambiguities for a one-to-one mapping. On the three proteins, it achieves an average accuracy of 94% or better. Conclusions Our mathematical programming approach for modeling chemical shift mapping as a graph problem, while modeling the errors directly, is potentially a time- and cost-effective first step for high-throughput drug screening based on limited NMR data and homologous 3D structures.
Affiliation(s)
- Richard Jang
- David R Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
39
Nikel O, Laurencin D, Bonhomme C, Sroga GE, Besdo S, Lorenz A, Vashishth D. Solid state NMR investigation of intact human bone quality: balancing issues and insight into the structure at the organic-mineral interface. J Phys Chem C Nanomater Interfaces 2012; 116:6320-6331. [PMID: 22822414 PMCID: PMC3399594 DOI: 10.1021/jp2125312] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 05/21/2023]
Abstract
Age-related bone fragility fractures present a significant problem for public health. Measures of bone quality are increasingly recognized to complement the conventional bone mineral density (BMD) based assessment of fracture risk. The ability to probe and understand bone quality at the molecular level is desirable in order to unravel how the structure of the organic matrix and its association with mineral contribute to the overall mechanical properties. The ¹³C{³¹P} REDOR MAS NMR (Rotational Echo Double Resonance Magic Angle Spinning Nuclear Magnetic Resonance) technique is uniquely suited for the study of the structure of the organic-mineral interface in bone. For the first time, we have applied it successfully to analyze the structure of intact (non-powdered) human cortical bone samples from young healthy and old osteoporotic donors. Loading problems associated with the rapid rotation of intact bone were solved using a Finite Element Analysis (FEA) approach, and a method allowing osteoporotic samples to be balanced and spun reproducibly is described. REDOR NMR parameters were set to give insight into the arrangement of the amino acids at the mineral interface, and SVD (Singular Value Decomposition) was applied to enhance the signal-to-noise ratio and enable a better analysis of the data. From the REDOR data, it was found that carbon atoms belonging to citrate/glycosaminoglycans (GAGs) are closest to the mineral surface regardless of age or site. In contrast, the arrangement of the collagen backbone at the interface varied with site and age. The relative proximity of two of the main amino acids in bone matrix proteins, hydroxyproline and alanine, with respect to the mineral phase was analyzed in more detail and discussed in view of glycation measurements that were carried out on the tissues.
Overall, this work shows that the ¹³C{³¹P} REDOR NMR approach could be used as a complementary technique to assess a novel aspect of bone quality, the organic-mineral interface structure.
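The abstract uses SVD to enhance the signal-to-noise ratio. Generically, this is a low-rank truncation: keep the dominant singular component(s) and discard the rest, which is assumed to carry mostly noise. The sketch below keeps only the leading component, found by power iteration. The rank-1 choice, the iteration count and the function name `rank1_denoise` are assumptions for illustration, not the authors' processing of the REDOR data.

```python
import math


def rank1_denoise(A, iters=200):
    """Best rank-1 approximation sigma * u v^T of A (list of rows), via power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0] * n  # start vector; assumed not orthogonal to the leading right singular vector
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [[0.0] * n for _ in range(m)]  # A is (numerically) zero
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = sigma * u_hat
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]
```

By the Eckart-Young theorem, the returned matrix is the closest rank-1 matrix to the input in the Frobenius norm. Discarding the smaller singular components therefore removes the part of the data least consistent with a single dominant pattern.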
Affiliation(s)
- Ondrej Nikel
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, New York, USA
- Institut Charles Gerhardt de Montpellier, UMR 5253, CNRS-UM2-ENSCM-UM1, Université Montpellier 2, Montpellier, France
- Danielle Laurencin
- Institut Charles Gerhardt de Montpellier, UMR 5253, CNRS-UM2-ENSCM-UM1, Université Montpellier 2, Montpellier, France
- Christian Bonhomme
- Laboratoire de Chimie de la Matière Condensée de Paris, UMR 7574, UPMC Univ. Paris 06, Paris, France
- Grażyna E. Sroga
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, New York, USA
- Silke Besdo
- Laboratoire de Chimie de la Matière Condensée de Paris, UMR 7574, UPMC Univ. Paris 06, Paris, France
- Anna Lorenz
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, New York, USA
- Deepak Vashishth
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, New York, USA
40
Roberts AG, Yang J, Halpert JR, Nelson SD, Thummel KT, Atkins WM. The structural basis for homotropic and heterotropic cooperativity of midazolam metabolism by human cytochrome P450 3A4. Biochemistry 2011; 50:10804-18. [PMID: 21992114 DOI: 10.1021/bi200924t] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022]
Abstract
Human cytochrome P450 3A4 (CYP3A4) metabolizes a significant portion of clinically relevant drugs and often exhibits complex steady-state kinetics that can involve homotropic and heterotropic cooperativity between bound ligands. In previous studies, the hydroxylation of the sedative midazolam (MDZ) exhibited homotropic cooperativity via a decrease in the ratio of 1'-OH-MDZ to 4-OH-MDZ at higher drug concentrations. In this study, MDZ exhibited heterotropic cooperativity with the antiepileptic drug carbamazepine (CBZ) with characteristic decreases in the 1'-OH-MDZ to 4-OH-MDZ ratios. To unravel the structural basis of MDZ cooperativity, we probed MDZ and CBZ bound to CYP3A4 using longitudinal T₁ nuclear magnetic resonance (NMR) relaxation and molecular docking with AutoDock 4.2. The distances calculated from longitudinal T₁ NMR relaxation were used during simulated annealing to constrain the molecules to the substrate-free X-ray crystal structure of CYP3A4. These simulations revealed that either two MDZ molecules or an MDZ molecule and a CBZ molecule assume a stacked configuration within the CYP3A4 active site. In either case, the proton at position 4 of the MDZ molecule was closer to the heme than the protons of the 1'-CH₃ group. In contrast, molecular docking of a single molecule of MDZ revealed that the molecule was preferentially oriented with the 1'-CH₃ position closer to the heme than position 4. This study provides the first detailed molecular analysis of heterotropic and homotropic cooperativity of a human cytochrome P450 from an NMR-based model. Cooperativity of ligand binding through direct interaction between stacked molecules may represent a common motif for homotropic and heterotropic cooperativity.
Affiliation(s)
- Arthur G Roberts
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, Georgia 30602, United States.
41
Jang R, Gao X, Li M. Towards fully automated structure-based NMR resonance assignment of ¹⁵N-labeled proteins from automatically picked peaks. J Comput Biol 2011; 18:347-63. [PMID: 21385039 DOI: 10.1089/cmb.2010.0251] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
In NMR resonance assignment, an indispensable step in NMR protein studies, manually processed peaks from both ¹⁵N-labeled and ¹³C-labeled spectra are typically used as inputs. However, the use of homologous structures can allow one to use only ¹⁵N-labeled NMR data and avoid the added expense of ¹³C-labeled data. We propose a novel integer programming framework for structure-based backbone resonance assignment using ¹⁵N-labeled data. The core consists of a pair of integer programming models: one for spin system forming and amino acid typing, and the other for backbone resonance assignment. The goal is to perform the assignment directly from spectra, without any manual intervention, using automatically picked peaks; because these are much noisier than manually picked peaks, the methods must be error-tolerant. In the case of semi-automated/manually processed peak data, we compare our system with the Xiong-Pandurangan-Bailey-Kellogg contact replacement (CR) method, which is the most error-tolerant method for structure-based resonance assignment. Our system, on average, reduces the error rate of the CR method five-fold on their data set. In addition, by using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors caused by errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for human ubiquitin, where the typing accuracy is 83%, we achieve 91% accuracy, compared with the 59% accuracy obtained without correcting for such errors. In the case of automatically picked peaks, using assignment information from yeast ubiquitin, we achieve a fully automatic assignment with 97% accuracy. To our knowledge, this is the first system that can achieve fully automatic structure-based assignment directly from spectra. This has implications for NMR studies of protein mutants, where the assignment step is repeated for each mutant.
Affiliation(s)
- Richard Jang
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
42
Alipanahi B, Gao X, Karakoc E, Li SC, Balbach F, Feng G, Donaldson L, Li M. Error tolerant NMR backbone resonance assignment and automated structure generation. J Bioinform Comput Biol 2011; 9:15-41. [PMID: 21328705 DOI: 10.1142/s0219720011005276] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 04/28/2010] [Revised: 09/04/2010] [Accepted: 10/12/2010] [Indexed: 11/18/2022]
Abstract
Error-tolerant backbone resonance assignment is the cornerstone of the NMR structure determination process. Although a variety of assignment approaches have been developed, none works sufficiently well on noisy, fully automatically picked peaks to enable the subsequent automatic structure determination steps. We have designed an integer linear programming (ILP) based assignment system (IPASS) that has enabled fully automatic protein structure determination for four test proteins. IPASS employs probabilistic spin system typing based on chemical shifts and secondary structure predictions. Furthermore, IPASS extracts connectivity information from the inter-residue information and from the (automatically picked) ¹⁵N-edited NOESY peaks; this information is then used to fix reliable fragments. When applied to automatically picked peaks for real proteins, IPASS achieves an average precision and recall of 82% and 63%, respectively. In contrast, the next best method, MARS, achieves an average precision and recall of 77% and 36%, respectively. The assignments generated by IPASS are then fed into our protein structure calculation system, FALCON-NMR, to determine the 3D structures without human intervention. The final models have backbone RMSDs of 1.25 Å, 0.88 Å, 1.49 Å, and 0.67 Å to the reference native structures for proteins TM1112, CASKIN, VRAR, and HACS1, respectively. The web server is publicly available at http://monod.uwaterloo.ca/nmr/ipass.
Affiliation(s)
- Babak Alipanahi
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L3G1, Canada
43
Abstract
Around half of all protein structures solved nowadays by solution-state nuclear magnetic resonance (NMR) spectroscopy owe their determination to automated data analysis. However, the general pervasiveness of computational approaches obscures a more nuanced picture, one that reveals the full variety and richness of the field. This review is structured around a comparison of methods associated with three NMR observables: classical nuclear Overhauser effect (NOE) constraint gathering, in contrast with more recent protocols based on chemical shifts and residual dipolar couplings (RDCs). In each case, the emphasis is placed on the latest research, covering mainly the past 5 years. By describing both general concepts and representative programs, the objective is to map out a field in which, through the very profusion of approaches, it is all too easy to lose one's bearings.
44
Ziarek JJ, Peterson FC, Lytle BL, Volkman BF. Binding site identification and structure determination of protein-ligand complexes by NMR a semiautomated approach. Methods Enzymol 2011; 493:241-75. [PMID: 21371594 PMCID: PMC3635485 DOI: 10.1016/b978-0-12-381274-2.00010-8] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Indexed: 12/12/2022]
Abstract
Over the last 15 years, the role of NMR spectroscopy in the lead identification and optimization stages of pharmaceutical drug discovery has steadily increased. NMR occupies a unique niche in the biophysical analysis of drug-like compounds because of its ability to identify binding sites, affinities, and ligand poses at the level of individual amino acids without necessarily solving the structure of the protein-ligand complex. It can, however, also provide structures of flexible proteins and of low-affinity (Kd > 10⁻⁶ M) complexes, which often fail to crystallize. This chapter emphasizes a throughput-focused protocol and highlights practical aspects of binding site characterization, automated and semiautomated NMR assignment methods, and structure determination of protein-ligand complexes by NMR.
Affiliation(s)
- Joshua J. Ziarek
- Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin, 53226 USA
- Francis C. Peterson
- Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin, 53226 USA
- Betsy L. Lytle
- Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin, 53226 USA
- Brian F. Volkman
- Department of Biochemistry, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin, 53226 USA