1
Klukowski P, Damberger FF, Allain FHT, Iwai H, Kadavath H, Ramelot TA, Montelione GT, Riek R, Güntert P. The 100-protein NMR spectra dataset: A resource for biomolecular NMR data analysis. Sci Data 2024; 11:30. [PMID: 38177162 PMCID: PMC10767026 DOI: 10.1038/s41597-023-02879-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/22/2023] [Indexed: 01/06/2024]
Abstract
Multidimensional NMR spectra are the basis for studying proteins by NMR spectroscopy and are crucial for the development and evaluation of methods for biomolecular NMR data analysis. Nevertheless, in contrast to derived data such as chemical shift assignments in the BMRB database and protein structures in the PDB, this primary data is in general not publicly archived. To change this unsatisfactory situation, we present a standardized set of solution NMR data comprising 1329 two- to four-dimensional NMR spectra, together with associated reference annotations (chemical shift assignments, structures) and derived annotations (peak lists, restraints for structure calculation, etc.). With the 100-protein NMR spectra dataset, which was originally compiled for the development of the ARTINA deep learning-based spectra analysis method, 100 protein structures can be reproduced from their original experimental data. The 100-protein NMR spectra dataset is expected to aid the development of computational methods for NMR spectroscopy, in particular machine learning approaches, and to enable consistent and objective comparisons of these methods.
Affiliation(s)
- Piotr Klukowski
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Fred F Damberger
- Institute of Biochemistry, ETH Zurich, 8093, Zurich, Switzerland
- Hideo Iwai
- Institute of Biotechnology, University of Helsinki, 00100, Helsinki, Finland
- Theresa A Ramelot
- Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Gaetano T Montelione
- Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
- Roland Riek
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Peter Güntert
- Institute of Molecular Physical Science, ETH Zurich, 8093, Zurich, Switzerland.
- Institute of Biophysical Chemistry, Goethe University, 60438, Frankfurt am Main, Germany.
- Department of Chemistry, Tokyo Metropolitan University, Hachioji, 192-0397, Tokyo, Japan.
2
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must apply basic principles to relate the spectral information to the molecular fragments and functional groups that generate it. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty of analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. AI algorithms, from traditional methods to the latest neural networks, have shown great potential both for extracting useful information and for processing massive quantities of data. This Perspective highlights recent innovations across the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and corresponding directions for further research are proposed. Finally, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
3
Chai X, Liu C, Fan X, Huang T, Zhang X, Jiang B, Liu M. Combination of peak-picking and binning for NMR-based untargeted metabonomics study. J Magn Reson 2023; 351:107429. [PMID: 37099854 DOI: 10.1016/j.jmr.2023.107429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 03/27/2023] [Accepted: 03/29/2023] [Indexed: 05/29/2023]
Abstract
In NMR-based untargeted metabolomic studies, 1H NMR spectra are usually divided into equal bins (buckets). This diminishes the effects of peak shifts caused by sample status or instrument instability and reduces the number of variables used as input for multivariate statistical analysis. However, peaks near bin boundaries may cause significant changes in the integral values of adjacent bins, and a weak peak may be obscured if it falls in the same bin as an intense peak. Several efforts have been made to improve the performance of binning. Here we propose an alternative method, named P-Bin, which combines the classic peak-picking and binning procedures. The location of each peak found by peak-picking is used as the center of an individual bin. P-Bin is expected to keep all spectral information associated with the peaks and to significantly reduce the data size, because spectral regions without peaks are not considered. In addition, both peak-picking and binning are routine procedures, which makes P-Bin easy to implement. To verify its performance, two sets of experimental data from human plasma and Ganoderma lucidum (G. lucidum) extracts were processed with both the conventional binning method and the proposed method. The processed data were then subjected to principal component analysis (PCA) and orthogonal projection to latent structures discriminant analysis (OPLS-DA). The results indicate that the proposed method improves both the clustering performance of PCA score plots and the interpretability of OPLS-DA loading plots. P-Bin could therefore be an improved data preparation approach for metabonomic studies.
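To make the contrast between the two procedures concrete, here is a minimal sketch comparing conventional equal-width binning with peak-centred binning in the spirit of P-Bin. It is not the authors' implementation. It assumes a 1D spectrum stored as a plain list of intensities on a uniform grid, a naive local-maximum peak picker with a fixed threshold, and a fixed bin half-width in points. The function names `equal_bins`, `pick_peaks`, and `peak_centred_bins` are hypothetical.

```python
"""Equal-width binning versus peak-centred binning (a P-Bin-style sketch).

Assumptions (hypothetical, not from the paper): the spectrum is a list of
intensities on a uniform grid, peaks are local maxima above a fixed
threshold, and every peak bin spans a fixed number of points on each side.
"""


def equal_bins(intensities, bin_size):
    """Integrate consecutive, non-overlapping windows of bin_size points."""
    return [sum(intensities[i:i + bin_size])
            for i in range(0, len(intensities), bin_size)]


def pick_peaks(intensities, threshold):
    """Naive peak picker: indices above threshold that exceed the left
    neighbour and are not smaller than the right neighbour."""
    return [i for i in range(1, len(intensities) - 1)
            if intensities[i] > threshold
            and intensities[i] > intensities[i - 1]
            and intensities[i] >= intensities[i + 1]]


def peak_centred_bins(intensities, threshold, half_width):
    """Integrate one bin per picked peak, centred on the peak index.

    Returns (peak_index, integral) pairs. Regions without peaks are
    skipped, which shrinks the number of variables. In this sketch,
    bins of closely spaced peaks may overlap.
    """
    bins = []
    for p in pick_peaks(intensities, threshold):
        lo = max(0, p - half_width)
        hi = min(len(intensities), p + half_width + 1)
        bins.append((p, sum(intensities[lo:hi])))
    return bins
```

Take the toy spectrum `[0, 0, 1, 5, 1, 0, 0, 0, 2, 8, 3, 0]`. Here `equal_bins(spectrum, 4)` returns `[6, 1, 13]`, because the shoulder of the first peak leaks into the second bin. That is the boundary effect described in the abstract. By contrast, `peak_centred_bins(spectrum, 0.5, 1)` returns `[(3, 7), (9, 13)]`: one integral per peak, with the empty region skipped.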
Affiliation(s)
- Xin Chai
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Caixiang Liu
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xinyu Fan
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; State Key Laboratory of Component-based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
- Tao Huang
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xu Zhang
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China; Optics Valley Laboratory, Wuhan 430074, China
- Bin Jiang
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China; Optics Valley Laboratory, Wuhan 430074, China.
- Maili Liu
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement of Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China; Optics Valley Laboratory, Wuhan 430074, China.
4
Schmid N, Bruderer S, Paruzzo F, Fischetti G, Toscano G, Graf D, Fey M, Henrici A, Ziebart V, Heitmann B, Grabner H, Wegner JD, Sigel RKO, Wilhelm D. Deconvolution of 1D NMR spectra: A deep learning-based approach. J Magn Reson 2023; 347:107357. [PMID: 36563418 DOI: 10.1016/j.jmr.2022.107357] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 10/21/2022] [Revised: 12/01/2022] [Accepted: 12/04/2022] [Indexed: 06/17/2023]
Abstract
The analysis of nuclear magnetic resonance (NMR) spectra to detect peaks and characterize their parameters, often referred to as deconvolution, is a crucial step in the quantification, elucidation, and verification of the structure of molecular systems. However, deconvolution of 1D NMR spectra is a challenge for both experts and machines. We propose a robust deep learning-based deconvolution algorithm of expert-level quality for experimental 1D NMR spectra. The algorithm is based on a neural network trained on synthetic spectra. Our customized pre-processing and labeling of the synthetic spectra enable the estimation of critical peak parameters. Furthermore, the neural network model transfers well to experimental spectra. It yields low fitting errors and sparse peak lists in challenging scenarios such as crowded regions, regions with high dynamic range or shoulder peaks, and broad peaks. We demonstrate on challenging spectra that the proposed algorithm produces results superior to those of experts.
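The abstract notes that the network is trained on synthetic spectra with known peak parameters. As a rough illustration of what such labelled training data can look like, the sketch below sums randomly parameterised Lorentzian lines and adds Gaussian noise. It returns both the spectrum and its ground-truth peak list. Several choices here are assumptions made for illustration: the Lorentzian line shape, the parameter ranges, the normalised frequency axis, and the names `lorentzian` and `synthetic_spectrum`. The paper's actual generator and its customized pre-processing and labeling are not described in the abstract.

```python
import random


def lorentzian(x, centre, fwhm, height):
    """Lorentzian line with peak value `height` at `centre` and full width at half maximum `fwhm`."""
    hw = fwhm / 2.0
    return height * hw * hw / ((x - centre) ** 2 + hw * hw)


def synthetic_spectrum(n_points, n_peaks, noise_sd, seed=0):
    """Return (x, y, peaks): a noisy sum of random Lorentzians and its ground-truth labels.

    peaks is a list of (centre, fwhm, height) tuples, i.e. the parameters a
    deconvolution model would be trained to recover. The ranges are arbitrary
    illustrative choices. n_points must be at least 2.
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    x = [i / (n_points - 1) for i in range(n_points)]  # normalised axis in [0, 1]
    peaks = [(rng.uniform(0.05, 0.95),   # centre
              rng.uniform(0.002, 0.02),  # FWHM
              rng.uniform(0.1, 1.0))     # height: a tenfold dynamic range
             for _ in range(n_peaks)]
    # Sum all lines at each grid point, then add white Gaussian noise.
    y = [sum(lorentzian(xi, c, w, h) for c, w, h in peaks) + rng.gauss(0.0, noise_sd)
         for xi in x]
    return x, y, peaks
```

Because the generator is seeded, the same call always yields the same spectrum and peak list. This is convenient for building reproducible training and evaluation sets.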
Affiliation(s)
- N Schmid
- Zurich University of Applied Sciences (ZHAW), Switzerland; University of Zurich (UZH), Switzerland.
- D Graf
- Bruker Switzerland AG, Switzerland
- M Fey
- Bruker Switzerland AG, Switzerland
- A Henrici
- Zurich University of Applied Sciences (ZHAW), Switzerland
- V Ziebart
- Zurich University of Applied Sciences (ZHAW), Switzerland
- H Grabner
- Zurich University of Applied Sciences (ZHAW), Switzerland
- D Wilhelm
- Zurich University of Applied Sciences (ZHAW), Switzerland
5
Li DW, Hansen AL, Bruschweiler-Li L, Yuan C, Brüschweiler R. Fundamental and practical aspects of machine learning for the peak picking of biomolecular NMR spectra. J Biomol NMR 2022; 76:49-57. [PMID: 35389128 PMCID: PMC9246764 DOI: 10.1007/s10858-022-00393-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/14/2021] [Accepted: 02/28/2022] [Indexed: 06/14/2023]
Abstract
Rapid progress in machine learning offers new opportunities for the automated analysis of multidimensional NMR spectra, ranging from protein NMR to metabolomics applications. Most recently, it has been demonstrated that deep neural networks (DNNs) designed for spectral peak picking can deconvolute highly crowded NMR spectra, rivaling the capabilities of human experts. Superior DNN-based peak picking is one of a series of critical steps in NMR spectral processing, analysis, and interpretation where machine learning is expected to have a major impact. In this perspective, we lay out some of the unique strengths as well as challenges of machine learning approaches in this new era of automated NMR spectral analysis. Such a discussion seems timely and should help the NMR community define common goals, share software tools, standardize protocols, and calibrate expectations. It will also help prepare for an NMR future in which machine learning and artificial intelligence tools will be commonplace.
Affiliation(s)
- Da-Wei Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA.
- Alexandar L Hansen
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA
- Lei Bruschweiler-Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA
- Chunhua Yuan
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA
- Rafael Brüschweiler
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, 43210, USA.
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH, 43210, USA.
- Department of Biological Chemistry and Pharmacology, The Ohio State University, Columbus, OH, 43210, USA.
6
NMR in Metabolomics: From Conventional Statistics to Machine Learning and Neural Network Approaches. Appl Sci 2022. [DOI: 10.3390/app12062824] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022]
Abstract
NMR measurements combined with chemometrics provide a wealth of information for identifying potential biomarkers associated with a specific metabolic pathway. Such data are useful in fields ranging from food science to biomedicine, including health science. The investigation of the whole set of metabolites in a sample, representing its fingerprint under the considered condition, is known as metabolomics and can take advantage of various statistical tools. The new frontier is to adopt self-learning techniques to enhance clustering or classification, which can improve predictive power over large amounts of data. Although machine learning is already employed in metabolomics, deep learning and artificial neural network approaches have only recently been applied successfully. In this work, we give an overview of the statistical approaches underlying the wide range of opportunities that machine learning and neural networks offer for accurate metabolite assignment and quantification. Various current challenges are discussed, such as proper metabolomics, deep learning architectures, and model accuracy.
7
Li DW, Hansen AL, Yuan C, Bruschweiler-Li L, Brüschweiler R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat Commun 2021; 12:5229. [PMID: 34471142 PMCID: PMC8410766 DOI: 10.1038/s41467-021-25496-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/13/2021] [Accepted: 08/09/2021] [Indexed: 11/09/2022]
Abstract
The analysis of nuclear magnetic resonance (NMR) spectra for the comprehensive and unambiguous identification and characterization of peaks is a difficult, but critically important step in all NMR analyses of complex biological molecular systems. Here, we introduce DEEP Picker, a deep neural network (DNN)-based approach for peak picking and spectral deconvolution which semi-automates the analysis of two-dimensional NMR spectra. DEEP Picker includes 8 hidden convolutional layers and was trained on a large number of synthetic spectra of known composition with variable degrees of crowdedness. We show that our method is able to correctly identify overlapping peaks, including ones that are challenging for expert spectroscopists and existing computational methods alike. We demonstrate the utility of DEEP Picker on NMR spectra of folded and intrinsically disordered proteins as well as a complex metabolomics mixture, and show how it provides access to valuable NMR information. DEEP Picker should facilitate the semi-automation and standardization of protocols for better consistency and sharing of results within the scientific community.
Affiliation(s)
- Da-Wei Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA.
- Alexandar L Hansen
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA
- Chunhua Yuan
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA
- Lei Bruschweiler-Li
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA
- Rafael Brüschweiler
- Campus Chemical Instrument Center, The Ohio State University, Columbus, OH, USA.
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH, USA.
- Department of Biological Chemistry and Pharmacology, The Ohio State University, Columbus, OH, USA.
8
Vincenzi M, Mercurio FA, Leone M. NMR Spectroscopy in the Conformational Analysis of Peptides: An Overview. Curr Med Chem 2021; 28:2729-2782. [PMID: 32614739 DOI: 10.2174/0929867327666200702131032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/09/2020] [Revised: 05/21/2020] [Accepted: 05/28/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND NMR spectroscopy is one of the most powerful tools for studying the structure and interaction properties of peptides and proteins from a dynamic perspective. Knowing the bioactive conformations of peptides is crucial in drug discovery for designing more efficient analogue ligands and inhibitors of protein-protein interactions targeting therapeutically relevant systems. OBJECTIVE This review provides a toolkit for investigating peptide conformational properties by NMR. METHODS Articles cited herein that relate to NMR studies of peptides and proteins were mainly searched through PubMed and the web. Both recent and older books on NMR spectroscopy written by eminent scientists in the field were consulted as well. RESULTS The review is mainly focused on NMR tools for obtaining the 3D structure of small unlabeled peptides. It is application-oriented, as delivering a profound theoretical background is beyond its scope. However, the basic principles of 2D homonuclear and heteronuclear experiments are briefly described. Protocols to obtain isotopically labeled peptides and the principal triple resonance experiments needed to study them are discussed as well. CONCLUSION NMR is a leading technique for studying the conformational preferences of small flexible peptides, whose structure can often only be described by an ensemble of conformations. NMR studies of peptides can be performed easily and quickly with canonical protocols established a few decades ago. More recently, however, we have witnessed tremendous improvements in NMR spectroscopy for investigating large systems and overcoming its molecular weight limit.
Affiliation(s)
- Marian Vincenzi
- Institute of Biostructures and Bioimaging, National Research Council of Italy, Via Mezzocannone 16, 80134, Naples, Italy
- Flavia Anna Mercurio
- Institute of Biostructures and Bioimaging, National Research Council of Italy, Via Mezzocannone 16, 80134, Naples, Italy
- Marilisa Leone
- Institute of Biostructures and Bioimaging, National Research Council of Italy, Via Mezzocannone 16, 80134, Naples, Italy
9
Chen D, Wang Z, Guo D, Orekhov V, Qu X. Review and Prospect: Deep Learning in Nuclear Magnetic Resonance Spectroscopy. Chemistry 2020; 26:10391-10401. [PMID: 32251549 DOI: 10.1002/chem.202000246] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 01/15/2020] [Revised: 04/03/2020] [Indexed: 01/08/2023]
Abstract
Since the concept of deep learning (DL) was formally proposed in 2006, it has had a major impact on academic research and industry. Nowadays, DL provides an unprecedented way to analyze and process data, with demonstrated success in computer vision, medical imaging, natural language processing, and other fields. Herein, applications of DL in NMR spectroscopy are summarized. We also outline a perspective on DL as an entirely new approach that is likely to transform NMR spectroscopy into a much more efficient and powerful technique in chemistry and the life sciences.
Affiliation(s)
- Dicheng Chen
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China
- Zi Wang
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China
- Di Guo
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, P.R. China
- Vladislav Orekhov
- Department of Chemistry and Molecular Biology, University of Gothenburg, Box 465, Gothenburg, 40530, Sweden
- Xiaobo Qu
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, P.O. Box 979, Xiamen, 361005, P.R. China
10
Computational methods for NMR and MS for structure elucidation III: More advanced approaches. Phys Sci Rev 2019. [DOI: 10.1515/psr-2018-0109] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
The structural assignment of natural products is still a tedious and time-consuming task, even with the very sophisticated one-dimensional and two-dimensional (1D and 2D) spectroscopic methods available today. Mass spectrometry (MS) is generally used for molecular mass determination, molecular formula generation, and MS/MSn fragmentation patterns of molecules. Meanwhile, nuclear magnetic resonance (NMR) spectroscopy provides spectra (e.g. 1H, 13C, and correlation spectra) whose interpretation allows the structure determination of known or unknown compounds. With the advance of high-throughput studies such as metabolomics, fast and automated identification or annotation of natural products has become highly sought after. Some emerging tools that address this demand apply computational methods for structure elucidation. These methods act on characteristic parameters in the structural determination of small molecules. Herein we list and present existing, well-established computational methods for peak picking analysis, resonance assignment, nuclear Overhauser effect (NOE) assignment, combinatorial fragmentation, and structure calculation and prediction. Fully automated programs for structure determination are also mentioned, together with the integrated algorithms they use to elucidate the structure of a metabolite. The use of these automated tools has helped to significantly reduce errors introduced by manual processing and has hence accelerated the structure identification or annotation of compounds.
11
Monaretto T, Souza A, Moraes TB, Bertucci-Neto V, Rondeau-Mouro C, Colnago LA. Enhancing signal-to-noise ratio and resolution in low-field NMR relaxation measurements using post-acquisition digital filters. Magn Reson Chem 2019; 57:616-625. [PMID: 30443995 DOI: 10.1002/mrc.4806] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/12/2018] [Revised: 11/09/2018] [Accepted: 11/11/2018] [Indexed: 06/09/2023]
Abstract
The traditional way to enhance the signal-to-noise ratio (SNR) of nuclear magnetic resonance (NMR) signals is to increase the number of scans. However, this procedure increases the measuring time, which can be prohibitive for some applications. We have therefore tested several post-acquisition digital filters to enhance the SNR by up to one order of magnitude in time-domain NMR (TD-NMR) relaxation measurements. The procedures were studied using continuous wave free precession (CWFP-T1) signals acquired with very low flip angles. These signals contain six times more noise than the Carr-Purcell-Meiboom-Gill (CPMG) signal of the same sample and experimental time. The following post-acquisition filters enhanced the SNR of the CWFP-T1 signals by at least six times: linear (LI) and logarithmic (LO) data compression, low-pass infinite impulse response (LP), Savitzky-Golay (SG), and wavelet transform (WA). The best filters were LO, SG, and WA, which give a high SNR enhancement without significant distortions in the inverse Laplace transform (ILT) relaxation distribution data. These post-acquisition digital filters can therefore be a useful way to denoise CWFP-T1 as well as noisy CPMG signals, and consequently to reduce the experimental time. It was also demonstrated that the filtered CWFP-T1 method has the potential to be a rapid and nondestructive method for measuring fat content in beef and certainly in other meat samples.
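One of the filter families named above is Savitzky-Golay smoothing. As an illustration, the sketch below applies the classic 5-point Savitzky-Golay smoothing weights (-3, 12, 17, 12, -3)/35 to a simulated noisy mono-exponential decay, which stands in for a relaxation signal. Several elements are assumptions: the decay model, the noise level, the window length, and the helper names `savgol5`, `rms`, and `demo`. This is not the filter configuration used in the paper, and it does not reproduce the paper's reported SNR gains.

```python
import math
import random

# Classic 5-point Savitzky-Golay smoothing weights (quadratic/cubic fit).
SG5 = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol5(y):
    """Smooth y with a 5-point Savitzky-Golay filter.

    The two points at each end are copied unchanged (simplest edge handling).
    """
    out = list(y)
    for i in range(2, len(y) - 2):
        # Weighted sum over the window y[i-2] .. y[i+2].
        out[i] = sum(w * y[i + k - 2] for k, w in enumerate(SG5)) / SG5_NORM
    return out


def rms(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def demo(n=2000, t2=200.0, noise_sd=0.05, seed=0):
    """Filter a noisy mono-exponential decay (time constant t2, in samples).

    Returns (rms_before, rms_after), both measured against the noise-free curve.
    """
    rng = random.Random(seed)
    clean = [math.exp(-i / t2) for i in range(n)]
    noisy = [c + rng.gauss(0.0, noise_sd) for c in clean]
    return rms(noisy, clean), rms(savgol5(noisy), clean)
```

For white noise, these weights scale the noise standard deviation by √595/35 ≈ 0.70. Meanwhile, interior points of any polynomial of degree three or lower pass through unchanged. Longer windows reduce noise further, but they also increase the risk of distorting fast-decaying components.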
Affiliation(s)
- Tatiana Monaretto
- Universidade de São Paulo, Instituto de Química de São Carlos, São Carlos, Brazil
- Andre Souza
- Schlumberger Brazil Technology Integration Center, Rio de Janeiro, Brazil
- Tiago Bueno Moraes
- Universidade de São Paulo, Instituto de Física de São Carlos, São Carlos, Brazil
12
Huang X, Dong H, Tao Q, Yu M, Li Y, Rong L, Krause HJ, Offenhäusser A, Xie X. Sensor Configuration and Algorithms for Power-Line Interference Suppression in Low Field Nuclear Magnetic Resonance. Sensors (Basel) 2019; 19:E3566. [PMID: 31443310 PMCID: PMC6721142 DOI: 10.3390/s19163566] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/21/2019] [Revised: 08/12/2019] [Accepted: 08/14/2019] [Indexed: 11/16/2022]
Abstract
Low-field (LF) nuclear magnetic resonance (NMR) shows potential advantages for studying pure heteronuclear J-coupling and observing the fine structure of matter. Power-line harmonic interference and fixed-frequency noise can introduce discrete noise peaks into the LF-NMR spectrum, whether in an open environment or in a conductively shielded room. These peaks may disturb the J-coupling spectra recorded at LF. In this paper, we describe a multi-channel sensor configuration of superconducting quantum interference devices (SQUIDs) and measure the multiple peaks of the 2,2,2-trifluoroethanol J-coupling spectrum. For the case of a low signal-to-noise ratio (SNR < 1), we suggest two noise suppression algorithms using discrete wavelet analysis (DWA), combined with either the least squares method (LSM) or gradient descent (GD). The de-noising methods are based on the spatial correlation of the interferences among the superconducting sensors and are experimentally demonstrated. The DWA-LSM algorithm shows a significant effect in noise reduction and recovers SNR > 1 for most of the signal peaks. The DWA-GD algorithm improves the SNR further but takes more computational time. The choice of algorithm should therefore depend on whether accuracy or speed of the de-noising process is more important in a given LF-NMR application.
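The least-squares step can be illustrated with a single reference channel. Suppose a reference sensor records only the interference r and the primary sensor records p = s + a·r. The gain a that minimises the residual power is then a = Σ p·r / Σ r·r, and subtracting a·r removes the interference. The sketch below shows only this regression step. It assumes one reference channel, interference with identical phase in both channels, and no wavelet decomposition, so it is a simplification of the DWA-LSM method rather than the authors' algorithm. The names `ls_gain`, `suppress`, and `demo` are hypothetical.

```python
import math


def ls_gain(primary, reference):
    """Least-squares scale factor a minimising sum((primary - a * reference) ** 2)."""
    num = sum(p * r for p, r in zip(primary, reference))
    den = sum(r * r for r in reference)
    return num / den


def suppress(primary, reference):
    """Subtract the least-squares-scaled reference channel from the primary channel."""
    a = ls_gain(primary, reference)
    return [p - a * r for p, r in zip(primary, reference)]


def demo(fs=1000, n=1000):
    """Synthetic check: a weak 97 Hz signal under 50 Hz + 150 Hz power-line interference.

    Returns (estimated_gain, cleaned_signal, true_signal).
    """
    t = [i / fs for i in range(n)]
    interference = [math.sin(2 * math.pi * 50 * ti) + 0.5 * math.sin(2 * math.pi * 150 * ti)
                    for ti in t]
    signal = [0.1 * math.sin(2 * math.pi * 97 * ti) for ti in t]
    primary = [s + 3.0 * r for s, r in zip(signal, interference)]  # true gain is 3
    return ls_gain(primary, interference), suppress(primary, interference), signal
```

In this synthetic check, the 97 Hz signal is orthogonal to the 50 Hz and 150 Hz interference over a whole number of periods. As a result, the estimated gain matches the injected value of 3 and the cleaned trace equals the signal to within rounding error. With real data, any correlation between the signal and the reference biases the estimate.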
Affiliation(s)
- Xiaolei Huang
- State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and Information Technology (SIMIT), Chinese Academy of Sciences (CAS), Shanghai 200050, China
- CAS Center for ExcelleNce in Superconducting Electronics (CENSE), Shanghai 200050, China
- Institute of Complex System (ICS-8), Forschungszentrum Jülich (FZJ), D-52425 Jülich, Germany
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
- University of Chinese Academy of Sciences, Beijing 100049, China
- Hui Dong
- State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and Information Technology (SIMIT), Chinese Academy of Sciences (CAS), Shanghai 200050, China
- CAS Center for ExcelleNce in Superconducting Electronics (CENSE), Shanghai 200050, China
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
- Quan Tao
- State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and Information Technology (SIMIT), Chinese Academy of Sciences (CAS), Shanghai 200050, China
- CAS Center for ExcelleNce in Superconducting Electronics (CENSE), Shanghai 200050, China
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
- Mengmeng Yu
- State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and Information Technology (SIMIT), Chinese Academy of Sciences (CAS), Shanghai 200050, China
- CAS Center for ExcelleNce in Superconducting Electronics (CENSE), Shanghai 200050, China
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yongqiang Li
- State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and Information Technology (SIMIT), Chinese Academy of Sciences (CAS), Shanghai 200050, China
- CAS Center for ExcelleNce in Superconducting Electronics (CENSE), Shanghai 200050, China
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liangliang Rong
- State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and Information Technology (SIMIT), Chinese Academy of Sciences (CAS), Shanghai 200050, China
- CAS Center for ExcelleNce in Superconducting Electronics (CENSE), Shanghai 200050, China
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
- Hans-Joachim Krause
- Institute of Complex System (ICS-8), Forschungszentrum Jülich (FZJ), D-52425 Jülich, Germany
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
- Andreas Offenhäusser
- Institute of Complex System (ICS-8), Forschungszentrum Jülich (FZJ), D-52425 Jülich, Germany
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
- Xiaoming Xie
- State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and Information Technology (SIMIT), Chinese Academy of Sciences (CAS), Shanghai 200050, China
- CAS Center for ExcelleNce in Superconducting Electronics (CENSE), Shanghai 200050, China
- Joint Research Institute on Functional Materials and Electronics, Collaboration between SIMIT and FZJ, D-52425 Jülich, Germany
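The least-squares half of the DWA-LSM idea can be illustrated with a minimal sketch. It assumes that some SQUID channels act as references that record only the interference, that the interference in the signal channel is a linear combination of those references, and that the NMR signal is uncorrelated with them. The wavelet decomposition step is omitted (the fit is done directly in the time domain), and the helper names `solve_linear`, `dot`, and `lsm_cancel` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b for a small dense matrix by Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def lsm_cancel(signal_channel, reference_channels):
    """Fit signal_channel ~ sum_k c_k * ref_k by least squares and subtract the fit.

    Assumption: the references carry interference only, no NMR signal.
    Returns (coefficients, cleaned_channel).
    """
    k = len(reference_channels)
    # Normal equations (R^T R) c = R^T s.
    gram = [[dot(reference_channels[i], reference_channels[j]) for j in range(k)] for i in range(k)]
    rhs = [dot(ref, signal_channel) for ref in reference_channels]
    coeffs = solve_linear(gram, rhs)
    cleaned = [
        s - sum(c * ref[t] for c, ref in zip(coeffs, reference_channels))
        for t, s in enumerate(signal_channel)
    ]
    return coeffs, cleaned


# Synthetic example: 50 Hz and 150 Hz hum seen by two reference sensors,
# plus a weak 123 Hz line standing in for the NMR signal.
fs, n = 1000, 1000
t = [i / fs for i in range(n)]
hum50 = [math.sin(2 * math.pi * 50 * ti) for ti in t]
hum150 = [math.sin(2 * math.pi * 150 * ti + 0.3) for ti in t]
nmr = [0.2 * math.sin(2 * math.pi * 123 * ti) for ti in t]
measured = [s + 3.0 * a + 2.0 * b for s, a, b in zip(nmr, hum50, hum150)]
coeffs, cleaned = lsm_cancel(measured, [hum50, hum150])
print([round(c, 6) for c in coeffs])  # [3.0, 2.0]
```

Because the fit uses only the reference channels, any NMR signal that leaks into them would be partly subtracted as well, so the approach depends on the references seeing the interference but not the sample.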
13
Cobas C. Applications of the Whittaker smoother in NMR spectroscopy. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2018; 56:1140-1148. [PMID: 29719068 DOI: 10.1002/mrc.4747] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/31/2018] [Revised: 04/09/2018] [Accepted: 04/16/2018] [Indexed: 05/26/2023]
Abstract
The Whittaker smoother, a special case of penalized least squares, is a multipurpose algorithm that has proven very useful in many scientific fields, including image processing, chromatography, and optical spectroscopy. It shares many similarities with the Savitzky-Golay algorithm but is significantly faster and easier to automate. Its use in nuclear magnetic resonance, however, is not yet widespread, although several applications have recently been published. This review discusses the mathematical background of the method and its main applications in nuclear magnetic resonance spectroscopy.
Affiliation(s)
- Carlos Cobas
- Mestrelab Research S.L., Santiago de Compostela, A Coruña, 15706, Spain
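The penalized least squares problem behind the Whittaker smoother can be written explicitly: for data y and smoothed series z, it minimizes sum((y - z)^2) + lam * sum((D z)^2), where D takes order-d differences, so z solves (I + lam D^T D) z = y. The sketch below builds that system densely in pure Python for clarity; a practical implementation would use a sparse banded solver. The function names are illustrative and not taken from the review.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b for a small dense matrix by Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def difference_coefficients(order):
    """Weights of the order-th forward difference, e.g. order 2 -> [1, -2, 1]."""
    coef = [1.0]
    for _ in range(order):
        coef = [p - q for p, q in zip([0.0] + coef, coef + [0.0])]
    return coef


def whittaker_smooth(y, lam, order=2):
    """Whittaker smoother: solve (I + lam * D^T D) z = y, with D the order-th difference matrix."""
    n = len(y)
    coef = difference_coefficients(order)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Row r of D holds coef[p] in column r + p; accumulate lam * D^T D into a.
    for r in range(n - order):
        for p, cp in enumerate(coef):
            for q, cq in enumerate(coef):
                a[r + p][r + q] += lam * cp * cq
    return solve_linear(a, list(y))


rng = random.Random(1)
noisy = [math.sin(i / 8) + rng.gauss(0, 0.3) for i in range(60)]
smooth = whittaker_smooth(noisy, lam=50.0)
```

Larger `lam` gives a smoother curve. With order 2, a straight line is not penalized at all, so linear trends pass through unchanged, and because D annihilates constants the smoother also preserves the sum of the data.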
14
Alazmi M, Abbas A, Guo X, Fan M, Li L, Gao X. A Slice-based 13C-detected NMR Spin System Forming and Resonance Assignment Method. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1999-2008. [PMID: 29994483 DOI: 10.1109/tcbb.2018.2849728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is attracting increasing attention in computational structural biology. Until recently, 1H-detected experiments were the dominant NMR technique because of the high sensitivity of 1H nuclei. However, the current availability of high magnetic fields and cryogenically cooled probe heads allows researchers to overcome the low sensitivity of 13C nuclei. Consequently, 13C-detected experiments have become popular in different NMR applications, especially resonance assignment and structure determination of large proteins. In this paper, we propose the first spin system forming method for 13C-detected NMR spectra. Our method accurately forms spin systems from as few as two 13C-detected spectra, CBCACON and CBCANCO. It picks slices from the more trusted spectrum and uses them as feedback to direct slice picking in the less trusted one. This feedback yields more accurate slice picking, which in turn produces better spin systems. We tested our method on a real dataset of ubiquitin and on a benchmark simulated dataset of 12 proteins. Feeding our spin systems into a genetic algorithm to generate the chemical shift assignment gave 92 percent correct chemical shift assignment for ubiquitin. For the simulated dataset, we obtained an average recall of 86 percent and an average precision of 88 percent. Finally, our chemical shift assignment of ubiquitin was given as input to the CS-ROSETTA server, which generated structures close to the experimentally determined structure.
15
Klukowski P, Augoff M, Zięba M, Drwal M, Gonczarek A, Walczak MJ. NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 2018; 34:2590-2597. [DOI: 10.1093/bioinformatics/bty134] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 08/30/2017] [Accepted: 03/09/2018] [Indexed: 01/13/2023]
Affiliation(s)
- Piotr Klukowski
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Michał Augoff
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Maciej Zięba
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Maciej Drwal
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Adam Gonczarek
- Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeze Wyspianskiego 27, Wrocław, Poland
- Alphamoon Ltd., ul. Wlodkowica 21/3, Wrocław, Poland
- Michał J Walczak
- Captor Therapeutics Ltd., ul. Dunska 11, Wrocław, Poland
- Alphamoon Ltd., ul. Wlodkowica 21/3, Wrocław, Poland
16
Han R, Zhang F, Gao X. A fast fiducial marker tracking model for fully automatic alignment in electron tomography. Bioinformatics 2018; 34:853-863. [PMID: 29069299 PMCID: PMC6030832 DOI: 10.1093/bioinformatics/btx653] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/02/2017] [Revised: 09/28/2017] [Accepted: 10/20/2017] [Indexed: 11/25/2022]
Abstract
Motivation Automatic alignment, especially fiducial marker-based alignment, has become increasingly important due to the high demand for subtomogram averaging and the rapid development of large-field electron microscopy. Among the alignment steps, fiducial marker tracking is a crucial one that determines the quality of the final alignment. Yet tracking fiducial markers accurately and effectively in a fully automatic manner remains challenging. Results In this paper, we propose a robust and efficient scheme for fiducial marker tracking. First, we prove an upper bound on the transformation deviation incurred when aligning the positions of fiducial markers on two micrographs by an affine transformation. Second, we design an automatic algorithm based on a Gaussian mixture model to accelerate fiducial marker tracking. Third, we propose a divide-and-conquer strategy against lens distortions to ensure the reliability of our scheme. To our knowledge, this is the first attempt to theoretically relate the projection model to the tracking model. Real-world experimental results further support our theoretical bound and demonstrate the effectiveness of our algorithm. This work facilitates fully automatic tracking for datasets with a massive number of fiducial markers. Availability and implementation The C/C++ source code that implements the fast fiducial marker tracking is available at https://github.com/icthrm/gmm-marker-tracking. Markerauto version 1.6 or later (also integrated in the AuTom platform at http://ear.ict.ac.cn/) offers a complete implementation for fast alignment, in which fast fiducial marker tracking is available via the '-t' option. Contact xin.gao@kaust.edu.sa. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Renmin Han
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
- Fa Zhang
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
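The affine model mentioned in the abstract maps a marker at (x, y) on one micrograph to (a x + b y + tx, c x + d y + ty) on the other, and once marker correspondences are known the six parameters follow from ordinary least squares. The sketch below assumes the correspondences are already given; it does not reproduce the authors' Gaussian-mixture tracking, their deviation bound, or the divide-and-conquer distortion handling, and `fit_affine` and `rms_residual` are hypothetical names.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b for a small dense matrix by Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map from matched src points to dst points.

    Returns (a, b, tx, c, d, ty) with u ~ a*x + b*y + tx and v ~ c*x + d*y + ty.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)  # design-matrix row for one marker
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    a, b, tx = solve_linear(ata, atu)
    c, d, ty = solve_linear(ata, atv)
    return a, b, tx, c, d, ty


def rms_residual(params, src, dst):
    """Root-mean-square distance between mapped src points and dst points."""
    a, b, tx, c, d, ty = params
    sq = [(a * x + b * y + tx - u) ** 2 + (c * x + d * y + ty - v) ** 2
          for (x, y), (u, v) in zip(src, dst)]
    return math.sqrt(sum(sq) / len(sq))


true_params = (0.98, -0.17, 5.0, 0.17, 0.98, -3.0)
markers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (7.0, 3.0), (2.0, 8.0)]
moved = [(true_params[0] * x + true_params[1] * y + true_params[2],
          true_params[3] * x + true_params[4] * y + true_params[5]) for x, y in markers]
params = fit_affine(markers, moved)
```

At least three non-collinear markers are needed; with more markers the residual gives a simple check of how well a single affine map explains the data.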
17
Exploring an optimal wavelet-based filter for cryo-ET imaging. Sci Rep 2018; 8:2582. [PMID: 29416100 PMCID: PMC5803242 DOI: 10.1038/s41598-018-20945-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/14/2017] [Accepted: 01/22/2018] [Indexed: 01/22/2023]
Abstract
Cryo-electron tomography (cryo-ET) is one of the most advanced technologies for the in situ visualization of molecular machines, producing three-dimensional (3D) biological structures. However, cryo-ET imaging has two serious disadvantages, low dose and low image contrast, which cause high-resolution information to be obscured by noise and image quality to be degraded, leading to errors in biological interpretation. The purpose of this research is to explore an optimal wavelet denoising technique for reducing noise in cryo-ET images. We perform tests using simulated data and design a filter using the optimal wavelet parameters (three-level decomposition, level 1 zeroed out, subband-dependent threshold, soft thresholding, and a spline-based discrete dyadic wavelet transform (DDWT)), which we call a modified wavelet shrinkage filter; this filter is suitable for noisy cryo-ET data. When tested on real cryo-ET experimental data, the modified wavelet shrinkage filter yields higher-quality images and more accurate measurements of a biological structure than conventional processing. Because the proposed method has an inherent advantage for cryo-ET images, it can extend the current state of the art in all aspects of cryo-ET studies: visualization, reconstruction, structural analysis, and interpretation.
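The parameter recipe quoted above (three-level decomposition, level 1 zeroed out, subband-dependent soft thresholds) can be sketched in one dimension. This is an illustration under stated assumptions: it swaps the authors' spline-based dyadic transform for the orthonormal Haar wavelet, and it sets each subband threshold with the universal rule sigma * sqrt(2 ln n), estimating sigma from the median absolute coefficient of that subband. The paper's exact threshold rule is not given in the abstract, and the function names are hypothetical.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail


def haar_inverse_step(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(coeffs, t):
    """Shrink every coefficient toward zero by t (soft thresholding)."""
    return [math.copysign(max(abs(w) - t, 0.0), w) for w in coeffs]


def wavelet_shrink(x, levels=3):
    """Haar wavelet shrinkage; len(x) must be divisible by 2**levels."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest level (level 1)
    shrunk = [[0.0] * len(details[0])]  # level 1 zeroed out
    for d in details[1:]:
        # Assumption: MAD-based noise estimate and universal threshold per subband.
        sigma = statistics.median(abs(w) for w in d) / 0.6745
        shrunk.append(soft_threshold(d, sigma * math.sqrt(2.0 * math.log(len(d)))))
    for d in reversed(shrunk):  # coarsest detail first
        approx = haar_inverse_step(approx, d)
    return approx


rng = random.Random(0)
clean = [1.0 if (i // 64) % 2 else -1.0 for i in range(256)]
noisy = [c + rng.gauss(0, 0.5) for c in clean]
denoised = wavelet_shrink(noisy)
```

Zeroing level 1 discards the finest detail band outright, which is reasonable when that band is dominated by noise, as the abstract argues for cryo-ET data; the coarser bands are only shrunk.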
18
NMR-based automated protein structure determination. Arch Biochem Biophys 2017; 628:24-32. [PMID: 28263718 DOI: 10.1016/j.abb.2017.02.011] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 01/11/2017] [Revised: 02/18/2017] [Accepted: 02/28/2017] [Indexed: 11/21/2022]
Abstract
NMR spectra analysis for protein structure determination can now in many cases be performed by automated computational methods. This overview of computational methods for NMR protein structure analysis presents recent automated methods for signal identification in multidimensional NMR spectra, sequence-specific resonance assignment, collection of conformational restraints, and structure calculation, as implemented in the CYANA software package. These algorithms are reliable enough, and sufficiently integrated into a single software package, to enable fully automated structure determination of proteins starting from NMR spectra, without manual intervention or correction at intermediate steps, with an accuracy of 1-2 Å backbone RMSD relative to manually solved reference structures.
19
Banelli T, Vuano M, Fogolari F, Fusiello A, Esposito G, Corazza A. Automation of peak-tracking analysis of stepwise perturbed NMR spectra. JOURNAL OF BIOMOLECULAR NMR 2017; 67:121-134. [PMID: 28213793 DOI: 10.1007/s10858-017-0088-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/14/2016] [Accepted: 01/19/2017] [Indexed: 06/06/2023]
Abstract
We describe a new algorithmic approach that automatically picks and tracks the NMR resonances of a large number of 2D NMR spectra acquired during a stepwise variation of a physical parameter. The method is named Trace in Track (TINT), referring to the idea that a Gaussian decomposition traces peaks within the tracks recognised through 3D mathematical morphology. It determines the evolution of the chemical shift, intensity, and linewidth of each tracked peak. On realistic synthetic spectra with noise levels similar to those of experimental data, performance in terms of track reconstruction and correct assignment was above 90%. TINT was applied successfully to several protein systems during a temperature ramp in isotope exchange experiments. A comparison with a state-of-the-art algorithm showed promising results for large numbers of spectra and low signal-to-noise ratios, provided the perturbation is sufficiently gradual. TINT can be applied to different kinds of high-throughput chemical shift mapping experiments with quasi-continuous variations, in which quantitative automated recognition is crucial.
Affiliation(s)
- Tommaso Banelli
- Dipartimento di Area Medica, Università di Udine, P.le Kolbe, 4, 33100, Udine, Italy
- Marco Vuano
- Dipartimento di Area Medica, Università di Udine, P.le Kolbe, 4, 33100, Udine, Italy
- Federico Fogolari
- INBB, Viale Medaglie d'Oro, 306, 00136, Roma, Italy
- Dipartimento di Scienze Matematiche Informatiche e Fisiche, Università di Udine, Via delle Scienze, 206, 33100, Udine, Italy
- Andrea Fusiello
- Dipartimento Politecnico di Ingegneria e Architettura, Università di Udine, Via delle Scienze, 208, 33100, Udine, Italy
- Gennaro Esposito
- INBB, Viale Medaglie d'Oro, 306, 00136, Roma, Italy
- Dipartimento di Scienze Matematiche Informatiche e Fisiche, Università di Udine, Via delle Scienze, 206, 33100, Udine, Italy
- Science & Math Division, New York University Abu Dhabi, Saadiyat Campus, PO Box 129188, Abu Dhabi, UAE
- Alessandra Corazza
- Dipartimento di Area Medica, Università di Udine, P.le Kolbe, 4, 33100, Udine, Italy
- INBB, Viale Medaglie d'Oro, 306, 00136, Roma, Italy
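Why a gradual perturbation matters can be seen with a much simpler tracker than TINT. The sketch below is not TINT's Gaussian decomposition or 3D morphology; it greedily links each peak to the nearest peak in the next spectrum and accepts a link only if the weighted shift change stays below a gate. When each step is small compared with the peak separation, the links are unambiguous; a large jump breaks the track. The 15N weight of 0.2, the 0.05 ppm gate, and the function names are illustrative assumptions.

```python
import math


def weighted_shift_change(p, q, n_weight=0.2):
    """Combined (1H, 15N) shift change in ppm; n_weight scales the 15N axis (assumption)."""
    return math.hypot(p[0] - q[0], n_weight * (p[1] - q[1]))


def track_peaks(peak_lists, max_step=0.05):
    """Greedy nearest-neighbour linking of (1H, 15N) peaks across successive spectra.

    Returns one track per peak; None marks a spectrum where that peak was not linked.
    """
    tracks = [[p] for p in peak_lists[0]]
    for step, peaks in enumerate(peak_lists[1:], start=1):
        # Last observed position of every existing track.
        last = [next(q for q in reversed(t) if q is not None) for t in tracks]
        pairs = sorted(
            (weighted_shift_change(last[ti], p), ti, pi)
            for ti in range(len(tracks))
            for pi, p in enumerate(peaks)
        )
        used_tracks, used_peaks = set(), set()
        for change, ti, pi in pairs:
            if change > max_step:
                break  # all remaining pairs are even farther apart
            if ti in used_tracks or pi in used_peaks:
                continue
            tracks[ti].append(peaks[pi])
            used_tracks.add(ti)
            used_peaks.add(pi)
        for ti in range(len(tracks)):
            if ti not in used_tracks:
                tracks[ti].append(None)  # peak not found in this spectrum
        for pi, p in enumerate(peaks):
            if pi not in used_peaks:
                tracks.append([None] * step + [p])  # newly appearing peak
    return tracks


# Two peaks drifting in opposite directions by 0.01 ppm per perturbation step.
series = [[(8.00 + 0.01 * k, 120.0), (8.30 - 0.01 * k, 118.0)] for k in range(5)]
for track in track_peaks(series):
    print(track[0], "->", track[-1])
```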
20
Würz JM, Güntert P. Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. JOURNAL OF BIOMOLECULAR NMR 2017; 67:63-76. [PMID: 28160195 DOI: 10.1007/s10858-016-0084-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/11/2016] [Accepted: 12/19/2016] [Indexed: 06/06/2023]
Abstract
The automated identification of signals in multidimensional NMR spectra is a challenging task, complicated by signal overlap, noise, and spectral artifacts, for which no universally accepted method is available. Here, we present a new peak picking algorithm, CYPICK, that follows, as far as possible, the manual approach taken by a spectroscopist who analyzes peak patterns in contour plots of the spectrum, but is fully automated. Human visual inspection is replaced by the evaluation of geometric criteria applied to contour lines, such as local extremality, approximate circularity (after appropriate scaling of the spectrum axes), and convexity. The performance of CYPICK was evaluated for a variety of spectra from different proteins by systematic comparison with peak lists obtained by other manual or automated peak picking methods, and by analyzing the results of automated chemical shift assignment and structure calculation based on input peak lists from CYPICK. The results show that, in most cases, CYPICK yielded peak lists that compare favorably with those of other automated peak pickers with respect to finding a maximal number of real signals, a minimal number of artifact peaks, and maximal correctness of the chemical shift assignments and of the three-dimensional structure obtained by fully automated assignment and structure calculation.
Affiliation(s)
- Julia M Würz
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
- Peter Güntert
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, Goethe University Frankfurt am Main, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany
- Laboratory of Physical Chemistry, ETH Zürich, Zürich, Switzerland
- Graduate School of Science and Engineering, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
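Of the geometric criteria listed, local extremality is the easiest to express in code: a grid point is a candidate peak if it exceeds a noise threshold in magnitude and is strictly more extreme than all eight neighbours. The sketch below implements only that first filter on a 2D spectrum; CYPICK's contour-based circularity and convexity tests are not reproduced, and `local_extrema` and `gaussian` are hypothetical helpers.

```python
import math


def local_extrema(spectrum, threshold):
    """Return (row, col, value) for strict local maxima >= +threshold and strict
    local minima <= -threshold in the 8-neighbourhood (border points are skipped)."""
    rows, cols = len(spectrum), len(spectrum[0])
    found = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            v = spectrum[i][j]
            neighbours = [spectrum[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
            if (v >= threshold and all(v > n for n in neighbours)) or (
                v <= -threshold and all(v < n for n in neighbours)
            ):
                found.append((i, j, v))
    return found


def gaussian(i, j, ci, cj, amp, width=1.5):
    """Synthetic 2D Gaussian peak centred at (ci, cj)."""
    return amp * math.exp(-((i - ci) ** 2 + (j - cj) ** 2) / (2 * width ** 2))


grid = [[gaussian(i, j, 5, 6, 10.0) + gaussian(i, j, 14, 12, 6.0) + gaussian(i, j, 4, 15, -4.0)
         for j in range(20)] for i in range(20)]
peaks = local_extrema(grid, threshold=1.0)
print([(i, j) for i, j, _ in peaks])  # [(4, 15), (5, 6), (14, 12)]
```

A pure local-extremum test cannot separate two overlapping peaks that merge into one maximum, nor reject a narrow artifact that happens to be a maximum; that is the gap the contour-shape criteria in the abstract are meant to close.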
21
Chen P, Hu S, Zhang J, Gao X, Li J, Xia J, Wang B. A Sequence-Based Dynamic Ensemble Learning System for Protein Ligand-Binding Site Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:901-912. [PMID: 26661785 DOI: 10.1109/tcbb.2015.2505286] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 06/05/2023]
Abstract
BACKGROUND Proteins have the fundamental ability to bind selectively to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most successful protein-ligand binding predictions have been based on known structures. However, structural information is often unavailable in practice because of the huge gap between the number of known protein sequences and the number of experimentally solved structures. RESULTS This paper proposes a dynamic ensemble approach to identify protein-ligand binding residues using sequence information only. To avoid problems resulting from the highly imbalanced numbers of ligand-binding and non-ligand-binding sites, we constructed several balanced data sets and trained a random forest classifier on each of them. We dynamically selected a subset of classifiers according to the similarity between the target protein and the proteins in the training data set. Combining the predictions of this classifier subset for each query protein yielded the final predictions. The ensemble of these classifiers formed a sequence-based predictor of protein-ligand binding sites. CONCLUSIONS Experimental results on two Critical Assessment of protein Structure Prediction datasets and the ccPDB dataset demonstrated that our proposed method compares favorably with the state of the art. AVAILABILITY http://www2.ahu.edu.cn/pchen/web/LigandDSES.htm.
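One simple way to build several balanced training sets from imbalanced data is sketched below, under the assumption that the majority (non-binding) class is shuffled and cut into disjoint chunks the size of the minority (binding) class, each chunk paired with all minority samples. The abstract does not say exactly how the authors built their sets; training one random forest per subset and the similarity-based dynamic selection are not shown, `balanced_subsets` is a hypothetical helper, and leftover majority samples that do not fill a whole chunk are simply dropped here.

```python
import random


def balanced_subsets(minority, majority, seed=0):
    """Pair all minority samples with disjoint, equally sized chunks of the majority class."""
    if not minority:
        raise ValueError("minority class is empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = len(minority)
    n_sets = len(shuffled) // size  # leftover majority samples are dropped (assumption)
    return [list(minority) + shuffled[k * size:(k + 1) * size] for k in range(n_sets)]


binding = [("bind", i) for i in range(5)]
non_binding = [("non", i) for i in range(23)]
subsets = balanced_subsets(binding, non_binding)
print(len(subsets), [len(s) for s in subsets])  # 4 [10, 10, 10, 10]
```

Each subset is exactly 50/50, and together the subsets still cover most of the majority class, so little negative information is discarded compared with a single undersampled set.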
22
Sun S, Wang X, Gao X, Ren L, Su X, Bu D, Ning K. Condensing Raman spectrum for single-cell phenotype analysis. BMC Bioinformatics 2015; 16 Suppl 18:S15. [PMID: 26681607 PMCID: PMC4682421 DOI: 10.1186/1471-2105-16-s18-s15] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/28/2023]
Abstract
Background In recent years, high-throughput, non-invasive Raman spectrometry has matured into an effective approach for identifying individual cells by species, even in complex, mixed populations. Raman profiling is an appealing optical microscopic method to achieve this. To fully utilize Raman profiling for single-cell analysis, an extensive understanding of Raman spectra is necessary to answer questions such as which filtering methods are effective for pre-processing Raman spectra, which strains can be distinguished by Raman spectra, and which features serve best as Raman-based biomarkers for single cells. Results In this work, we propose an approach called rDisc that discretizes the original Raman spectrum into only a few (usually fewer than 20) representative peaks (Raman shifts). The approach has advantages in removing noise and condensing the original spectrum. In particular, effective signal processing procedures were designed to eliminate noise, using wavelet transform denoising, baseline correction, and signal normalization. In the discretizing process, representative peaks were selected to significantly decrease the Raman data size. More importantly, the selected peaks are suitable to serve as key biological markers to differentiate species and other cellular features. Additionally, the classification performance of discretized spectra was found to be comparable to that of the full spectrum with more than 1000 Raman shifts. Overall, the discretized spectrum needs about 5% of the storage space of a full spectrum, and processing is considerably faster. This makes rDisc clearly superior to other methods for single-cell classification.
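A toy version of the discretization step might normalize the spectrum, find its local maxima, and keep only the k most intense ones as (Raman shift, intensity) pairs. This is a sketch under assumptions: area normalization and a plain intensity ranking stand in for rDisc's actual selection criteria, the wavelet denoising and baseline correction mentioned in the abstract are omitted, and `discretize_spectrum` is a hypothetical name.

```python
def discretize_spectrum(intensities, shifts, k=20):
    """Keep the k most intense local maxima of an area-normalized spectrum.

    Returns (shift, normalized intensity) pairs sorted by shift.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum must have positive total intensity")
    norm = [v / total for v in intensities]  # assumption: area normalization
    maxima = [
        i for i in range(1, len(norm) - 1)
        if norm[i] > norm[i - 1] and norm[i] >= norm[i + 1]
    ]
    strongest = sorted(maxima, key=lambda i: norm[i], reverse=True)[:k]
    return sorted((shifts[i], norm[i]) for i in strongest)


# Flat baseline with four peaks on a Raman shift axis in cm^-1 (illustrative values).
shifts = [600 + 10 * i for i in range(100)]
spectrum = [1.0] * 100
for idx, height in [(10, 50.0), (40, 30.0), (70, 20.0), (90, 5.0)]:
    spectrum[idx] = height
peaks = discretize_spectrum(spectrum, shifts, k=3)
print([s for s, _ in peaks])  # [700, 1000, 1300]
```

Keeping k pairs instead of the full axis is what produces the storage reduction; whether the retained peaks are biologically informative depends on the selection rule, which is where rDisc's own criteria matter.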
23
Akhmedov M, Çatay B, Apaydın MS. Automating unambiguous NOE data usage in NVR for NMR protein structure-based assignments. J Bioinform Comput Biol 2015; 13:1550020. [PMID: 26260854 DOI: 10.1142/s0219720015500201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is an important technique for determining protein structure in solution. An important problem in protein structure determination using NMR spectroscopy is mapping peaks to the corresponding amino acids, also known as the assignment problem. Structure-Based Assignment (SBA) is an approach to this problem that uses a template structure homologous to the target. Our previously developed approach, Nuclear Vector Replacement-Binary Integer Programming (NVR-BIP), computed the optimal solution for small proteins but was unable to solve the assignments of large proteins. NVR-Ant Colony Optimization (ACO) extended the applicability of the NVR approach to such proteins. One of the inputs used in these approaches is Nuclear Overhauser Effect (NOE) data. The NOE is an interaction observed between two protons located close in space. These protons could be amide protons, protons attached to the alpha-carbon atom in the protein backbone, or side chain protons. NVR uses only backbone protons. In this paper, we reformulate the NVR-BIP model to distinguish the type of proton in the NOE data and use the corresponding proton coordinates in the extended formulation. In addition, the threshold on interproton distances is set in a standard manner for all proteins by extracting the NOE upper-bound distance information from the data. We also convert NOE intensities into distance thresholds. Our new approach thus handles the NOE data correctly and without manually determined parameters. We adapt the NVR-ACO solution methodology accordingly. Computational results show that our approaches obtain optimal solutions for small proteins. For large proteins, our ant colony optimization-based approach obtains promising results.
Affiliation(s)
- Murodzhon Akhmedov
- Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno-Lugano, Switzerland
- Bülent Çatay
- Sabanci University, Faculty of Engineering and Natural Sciences, 34956 Orhanlı, Istanbul, Turkey
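Converting NOE intensities into distance thresholds is commonly done with the isolated spin-pair approximation, in which the NOE intensity scales as r^-6, so r = r_ref * (I_ref / I)^(1/6) for a reference proton pair of known distance. The abstract does not state whether the authors used exactly this calibration. The sketch below, with hypothetical function names, shows the conversion and a check of the resulting bound against proton coordinates from a template structure.

```python
import math


def noe_upper_bound(intensity, ref_intensity, ref_distance):
    """Distance implied by an NOE intensity (same units as ref_distance),
    assuming intensity is proportional to r**-6 (isolated spin-pair approximation)."""
    if intensity <= 0:
        raise ValueError("NOE intensity must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def noe_satisfied(coord_a, coord_b, upper_bound):
    """True if two protons in a template structure lie within the bound."""
    return math.dist(coord_a, coord_b) <= upper_bound


# A peak 64 times weaker than a 2.5 Å reference corresponds to about 5.0 Å.
bound = noe_upper_bound(1.0, 64.0, 2.5)
print(round(bound, 3))  # 5.0
print(noe_satisfied((0.0, 0.0, 0.0), (3.0, 3.0, 0.0), bound))  # True
```

The sixth-root dependence makes the derived distances forgiving: a factor of two error in intensity changes the distance by only about 12 percent, which is one reason NOE-derived bounds are usually used as upper limits rather than exact distances.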
24
Castillo AM, Bernal A, Patiny L, Wist J. Fully automatic assignment of small molecules' NMR spectra without relying on chemical shift predictions. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2015; 53:603-611. [PMID: 26053353 DOI: 10.1002/mrc.4272] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/18/2015] [Revised: 04/30/2015] [Accepted: 05/06/2015] [Indexed: 06/04/2023]
Abstract
We present a method for the automatic assignment of small molecules' NMR spectra. The method includes a novel automatic, self-consistent peak-picking routine that validates NMR peaks in each spectrum against peaks in the same or other spectra that arise from the same resonances. The auto-assignment routine is based on branch-and-bound optimization and relies predominantly on integration and correlation data; chemical shift information may be included when available to speed up the search and shorten the list of viable assignments, but in most cases tested it is not required to find the correct assignment. This automatic assignment method is implemented as a web-based tool that runs without any user input other than the acquired spectra.
Affiliation(s)
- Andrés M Castillo
- Facultad de Ingeniería, Universidad Nacional de Colombia, Bogotá D.C., Colombia
- Chemistry Department, Universidad del Valle, Cali, Valle, A.A. 25360, Colombia
- Andrés Bernal
- Chemistry Department, Universidad del Valle, Cali, Valle, A.A. 25360, Colombia
- Luc Patiny
- Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, CH-1015, Switzerland
- Julien Wist
- Chemistry Department, Universidad del Valle, Cali, Valle, A.A. 25360, Colombia
25
Yang X, Yang F. Completing tags by local learning: a novel image tag completion method based on neighborhood tag vector predictor. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-1983-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
26
Klukowski P, Walczak MJ, Gonczarek A, Boudet J, Wider G. Computer vision-based automated peak picking applied to protein NMR spectra. Bioinformatics 2015; 31:2981-8. [PMID: 25995228 DOI: 10.1093/bioinformatics/btv318] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/28/2014] [Accepted: 05/18/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION A detailed analysis of multidimensional NMR spectra of macromolecules requires the identification of individual resonances (peaks). This task can be tedious and time-consuming and often requires support by experienced users. Automated peak picking algorithms were introduced more than 25 years ago, but major deficiencies still often prevent complete and error-free peak picking of biological macromolecule spectra. The major challenges for automated peak picking algorithms are distinguishing artifacts from real peaks, particularly from those with irregular shapes, and picking peaks in spectral regions with overlapping resonances, which are very hard to resolve with existing computer algorithms. In both cases, a visual inspection approach could be more effective than a 'blind' algorithm. RESULTS We present a novel approach using computer vision (CV) methodology that could be better adapted to the problem of peak recognition. After suitable 'training', we successfully applied the CV algorithm to spectra of medium-sized soluble proteins with molecular weights up to 26 kDa and to a 130 kDa complex of a tetrameric membrane protein in detergent micelles. Our CV approach outperforms commonly used programs. With suitable training datasets, the presented method can be extended to automated peak picking in multidimensional spectra of nucleic acids or carbohydrates and adapted to solid-state NMR spectra. AVAILABILITY AND IMPLEMENTATION CV-Peak Picker is available upon request from the authors. CONTACT gsw@mol.biol.ethz.ch; michal.walczak@mol.biol.ethz.ch; adam.gonczarek@pwr.edu.pl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Piotr Klukowski
- Department of Computer Science, Wroclaw University of Technology, Wroclaw, Poland
- Michal J Walczak
- Institute of Molecular Biology and Biophysics, ETH Zurich, 8093 Zurich, Switzerland
- Adam Gonczarek
- Department of Computer Science, Wroclaw University of Technology, Wroclaw, Poland
- Julien Boudet
- Institute of Molecular Biology and Biophysics, ETH Zurich, 8093 Zurich, Switzerland
- Gerhard Wider
- Institute of Molecular Biology and Biophysics, ETH Zurich, 8093 Zurich, Switzerland
27
28
Chen ST, Huang HN, Kung WM, Hsu CY. Optimization-based image watermarking with integrated quantization embedding in the wavelet-domain. MULTIMEDIA TOOLS AND APPLICATIONS 2015. [DOI: 10.1007/s11042-015-2522-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
29
Cannistraci CV, Abbas A, Gao X. Median Modified Wiener Filter for nonlinear adaptive spatial denoising of protein NMR multidimensional spectra. Sci Rep 2015; 5:8017. [PMID: 25619991 PMCID: PMC4306135 DOI: 10.1038/srep08017] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 09/18/2014] [Accepted: 12/29/2014] [Indexed: 11/21/2022]
Abstract
Denoising multidimensional NMR spectra is a fundamental step in NMR protein structure determination. The state-of-the-art method uses wavelet denoising, which may perform poorly on non-stationary signals affected by Gaussian white noise mixed with strong impulsive artifacts, such as those in multidimensional NMR spectra. Moreover, the performance of wavelet denoising depends on a combinatorial search over wavelet shapes and parameters, and extending wavelet denoising to multiple dimensions is highly non-trivial, which hampers its application to multidimensional NMR spectra. Here, we advocate a different philosophy for denoising NMR spectra: less is more! We consider spatial filters that have only one parameter to tune: the window size. We propose, for the first time, the 3D extension of the median-modified Wiener filter (MMWF), an adaptive variant of the median filter, and also a novel variation named MMWF*. We test the proposed filters and the Wiener filter, an adaptive variant of the mean filter, on a benchmark set that contains 16 two-dimensional and three-dimensional NMR spectra extracted from eight proteins. Our results demonstrate that the adaptive spatial filters significantly outperform their non-adaptive versions. On 2D and 3D spectra, the new MMWF* performs even better than wavelet denoising. Notably, MMWF* delivers consistently high performance that is almost invariant across window-size settings, which is a clear advantage for automatic pipelines for protein NMR spectra analysis.
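A minimal sketch of the median-modified Wiener filter idea, assuming the common formulation in which the local mean in the adaptive Wiener update is replaced by the local median, and assuming the noise variance is estimated as the average of all local variances (as MATLAB's `wiener2` does for the plain Wiener filter). The function name `mmwf` and the reflect padding are illustrative choices, not the authors' code, and MMWF* is not reproduced. Because the window is a cube in however many dimensions the array has, the single window-size parameter covers 2D and 3D spectra alike.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mmwf(spectrum: np.ndarray, window: int = 3) -> np.ndarray:
    """Median-modified Wiener filter for an N-dimensional spectrum (sketch).

    Assumed update: out = med + max(var - noise, 0) / max(var, noise) * (x - med),
    where med and var are the local median and local variance in a cubic
    window and noise is the mean of all local variances.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    x = np.asarray(spectrum, dtype=float)
    half = window // 2
    # Reflect-pad every axis so the output keeps the input shape.
    padded = np.pad(x, half, mode="reflect")
    windows = sliding_window_view(padded, (window,) * x.ndim)
    win_axes = tuple(range(x.ndim, 2 * x.ndim))

    local_median = np.median(windows, axis=win_axes)
    local_var = windows.var(axis=win_axes)
    noise_var = local_var.mean()  # global noise estimate (assumption)

    # Adaptive gain: 0 in flat, noise-dominated regions; close to 1 near strong signal.
    numer = np.maximum(local_var - noise_var, 0.0)
    denom = np.maximum(local_var, noise_var)
    gain = np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)
    return local_median + gain * (x - local_median)
```

Called as `mmwf(spectrum_3d, window=3)`, the filter falls back to the local median wherever the local variance does not exceed the noise estimate, and keeps more of the original intensity where it does; that variance-driven gain is what distinguishes it from a plain median filter.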
Affiliation(s)
- Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Ahmed Abbas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia

30
Chen P, Huang JZ, Gao X. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone. BMC Bioinformatics 2014; 15 Suppl 15:S4. [PMID: 25474163 PMCID: PMC4271564 DOI: 10.1186/1471-2105-15-s15-s4] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 12/28/2022]
Abstract
Background Some proteins must bind ligands to perform their functions. Protein-ligand binding sites are the residues of a protein that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods search for similar, known structures of the query protein and predict the binding sites from those solved structures. However, such structural information is often unavailable. Results In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows when encoding input feature vectors. Moreover, because ligand-binding sites are heavily outnumbered by non-binding sites, we construct several balanced data sets and train a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods.
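The balanced-ensemble idea can be sketched under stated assumptions: the majority (non-binding) class is split into disjoint chunks roughly the size of the minority (binding) class, each chunk is paired with all minority samples, one classifier is trained per balanced set, and predictions are combined by majority vote. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the random forests used in the paper, labels are assumed to be 0/1, and the sliding-window feature encoding is not reproduced. All function names are hypothetical.

```python
import numpy as np


def balanced_subsets(y, rng):
    """Split the majority class into chunks, each paired with all minority samples."""
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = rng.permutation(np.flatnonzero(y != minority))
    n_chunks = max(1, len(maj_idx) // len(min_idx))
    return [np.concatenate([min_idx, chunk]) for chunk in np.array_split(maj_idx, n_chunks)]


def fit_centroids(X, y):
    """Stand-in base learner (nearest centroid), used here instead of a random forest."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])


def predict_centroids(model, X):
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def fit_ensemble(X, y, seed=0):
    """Train one base learner per balanced subset."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    return [fit_centroids(X[idx], y[idx]) for idx in balanced_subsets(y, rng)]


def predict_ensemble(models, X):
    """Majority vote over the ensemble members (labels assumed to be 0/1)."""
    X = np.asarray(X, dtype=float)
    votes = np.stack([predict_centroids(m, X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

With 10 binding and 100 non-binding samples, `balanced_subsets` yields 10 sets of 20 samples each, so every member sees the minority class at a 1:1 ratio while the ensemble as a whole still uses all majority samples.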

31

32
Abbas A, Guo X, Jing BY, Gao X. An automated framework for NMR resonance assignment through simultaneous slice picking and spin system forming. J Biomol NMR 2014; 59:75-86. [PMID: 24748536 DOI: 10.1007/s10858-014-9828-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/03/2014] [Accepted: 04/05/2014] [Indexed: 06/03/2023]
Abstract
Despite significant advances in automated nuclear magnetic resonance-based protein structure determination, the high numbers of false positives and false negatives among the peaks selected by fully automated methods remain a problem. These false positives and negatives impair the performance of resonance assignment methods. One of the main reasons for this problem is that the computational research community often treats peak picking and resonance assignment as two separate problems, whereas spectroscopists use expert knowledge to pick peaks and assign their resonances at the same time. We propose a novel framework that simultaneously performs slice picking and spin system forming, an essential step in resonance assignment. Our framework then employs a genetic algorithm, guided by both connectivity information and amino acid typing information from the spin systems, to assign the spin systems to residues. The input to our framework can be as few as two commonly used spectra, i.e., CBCA(CO)NH and HNCACB. Unlike existing peak picking and resonance assignment methods, which treat peaks as the units, our method is based on 'slices', which are one-dimensional vectors in three-dimensional spectra that correspond to certain ([Formula: see text]) values. Experimental results on both benchmark simulated data sets and four real protein data sets demonstrate that our method significantly outperforms state-of-the-art methods while using fewer spectra than those methods. Our method is freely available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
Affiliation(s)
- Ahmed Abbas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia

33
Cheng Y, Gao X, Liang F. Bayesian peak picking for NMR spectra. Genomics Proteomics Bioinformatics 2013; 12:39-47. [PMID: 24184964 PMCID: PMC4411369 DOI: 10.1016/j.gpb.2013.07.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 06/01/2013] [Accepted: 07/29/2013] [Indexed: 11/29/2022]
Abstract
Protein structure determination is a very important topic in structural genomics, which helps in understanding a variety of biological functions, such as protein-protein and protein–DNA interactions. Nowadays, nuclear magnetic resonance (NMR) is often used to determine the three-dimensional structures of proteins. This study aims to automate the peak picking step, the most important and challenging step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo (SAMC) algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using a Bayesian method.
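The spectrum model described above can be illustrated with a short sketch: a 2D spectrum is represented as a sum of bivariate Gaussian peaks evaluated on the spectral grid, and a candidate peak set can be scored with a Gaussian-noise log-likelihood. The SAMC sampler and the priors that turn peak picking into variable selection are not reproduced; the per-peak 2x2 covariance parameterisation, the i.i.d. Gaussian noise model, and the function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_mixture_spectrum(grid_x, grid_y, amplitudes, centers, covariances):
    """Evaluate sum_k a_k * N2(x, y; mu_k, Sigma_k) on a 2D grid (sketch)."""
    X, Y = np.meshgrid(grid_x, grid_y, indexing="ij")
    points = np.stack([X, Y], axis=-1)  # shape (nx, ny, 2)
    spectrum = np.zeros(X.shape)
    for a, mu, cov in zip(amplitudes, centers, covariances):
        cov = np.asarray(cov, dtype=float)
        diff = points - np.asarray(mu, dtype=float)
        inv = np.linalg.inv(cov)
        # Squared Mahalanobis distance of every grid point from the peak centre.
        mahal = np.einsum("...i,ij,...j->...", diff, inv, diff)
        norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
        spectrum += a * norm * np.exp(-0.5 * mahal)
    return spectrum


def log_likelihood(observed, model, noise_sd):
    """Log-likelihood of the observed spectrum under i.i.d. Gaussian noise (assumed)."""
    resid = np.asarray(observed, dtype=float) - model
    n = resid.size
    return -0.5 * n * np.log(2.0 * np.pi * noise_sd**2) - 0.5 * np.sum(resid**2) / noise_sd**2
```

In a variable-selection setting, each candidate peak is switched on or off; a sampler would compare `log_likelihood` (plus prior terms) for spectra rendered from different on/off configurations.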
Affiliation(s)
- Yichen Cheng
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Faming Liang
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA.

34
Gao X. Recent advances in computational methods for nuclear magnetic resonance data processing. Genomics Proteomics Bioinformatics 2013; 11:29-33. [PMID: 23453016 PMCID: PMC4357661 DOI: 10.1016/j.gpb.2012.12.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/22/2012] [Revised: 12/12/2012] [Accepted: 12/28/2012] [Indexed: 11/28/2022]
Abstract
Although three-dimensional protein structure determination using nuclear magnetic resonance (NMR) spectroscopy is a computationally costly and tedious process that would benefit from advanced computational techniques, it has not garnered much research attention from specialists in bioinformatics and computational biology. In this paper, we review recent advances in computational methods for NMR protein structure determination. We summarize the advantages of and bottlenecks in the existing methods and outline some open problems in the field. We also discuss current trends in NMR technology development and suggest directions for research on future computational methods for NMR.
Affiliation(s)
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.

35
Abbas A, Kong XB, Liu Z, Jing BY, Gao X. Automatic peak selection by a Benjamini-Hochberg-based algorithm. PLoS One 2013; 8:e53112. [PMID: 23308147 PMCID: PMC3538655 DOI: 10.1371/journal.pone.0053112] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 07/27/2012] [Accepted: 11/26/2012] [Indexed: 11/25/2022]
Abstract
A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted by confidence score. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming increasingly common, yet expert knowledge remains the method of choice for deciding how many of thousands of candidate peaks should be considered in order to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volume or intensity, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed-number approach, our approach returns significantly more true peaks. For instance, combining WaVPeak or PICKY with the proposed method reduces the missing peak rate by 20% and 26% on average, respectively, on a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to other prediction selection problems in bioinformatics.
The source code, documentation and example data of the proposed method are available at http://sfb.kaust.edu.sa/pages/software.aspx.
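The selection step can be sketched with the standard Benjamini-Hochberg step-up rule. The abstract does not say how volumes or intensities are converted into p-values, so the sketch assumes a one-sided, zero-mean Gaussian noise model with a known noise standard deviation; `intensity_p_values`, `select_peak_count`, and the `alpha` default are hypothetical.

```python
import math
from typing import Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> int:
    """Return k, the number of hypotheses rejected by the B-H step-up rule."""
    m = len(p_values)
    ranked = sorted(p_values)
    k = 0
    # k is the largest 1-based rank i with p_(i) <= i * alpha / m.
    for i, p in enumerate(ranked, start=1):
        if p <= i * alpha / m:
            k = i
    return k


def intensity_p_values(intensities: Sequence[float], noise_sd: float) -> list[float]:
    """Assumed noise model: P(noise >= I) for zero-mean Gaussian noise."""
    return [0.5 * math.erfc(i / (noise_sd * math.sqrt(2.0))) for i in intensities]


def select_peak_count(intensities: Sequence[float], noise_sd: float, alpha: float = 0.05) -> int:
    """Number of top-ranked candidate peaks to keep (hypothetical helper)."""
    return benjamini_hochberg(intensity_p_values(intensities, noise_sd), alpha)
```

Because the assumed p-value decreases monotonically with intensity, keeping the k smallest p-values is the same as keeping the first k entries of a candidate list sorted by decreasing intensity, so the procedure only has to output the cut-off k.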
Affiliation(s)
- Ahmed Abbas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Xin-Bing Kong
- Department of Statistics, Fudan University, Shanghai, China
- Zhi Liu
- Department of Mathematics, Faculty of Science and Technology, University of Macau, Taipa, Macau
- Bing-Yi Jing
- Department of Mathematics, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

36
Yu B, Zhang Y. A simple method for predicting transmembrane proteins based on wavelet transform. Int J Biol Sci 2012; 9:22-33. [PMID: 23289014 PMCID: PMC3535531 DOI: 10.7150/ijbs.5371] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/11/2012] [Accepted: 12/02/2012] [Indexed: 11/05/2022]
Abstract
The growing number of protein sequences from genome projects calls for theoretical methods to predict transmembrane helical segments (TMHs). Several prediction methods have been reported so far, but they have deficiencies in prediction accuracy and adaptability. In this paper, a method based on the discrete wavelet transform (DWT) is developed to predict the number and location of TMHs in membrane proteins. The protein with PDB code 1KQG is used as an example to describe the prediction process. Eighty proteins with known 3D structures (containing 325 TMHs) are chosen at random from the MPtopo database as the data set, and the 80 sequences are divided into 13 groups according to their function and type. TMH prediction carried out for each group of membrane protein sequences gives satisfactory results. To verify the feasibility of the method, the 80 membrane protein sequences are used as a test set: 308 TMHs are predicted, with a prediction accuracy of 96.3%. Compared with seven popular prediction methods, the proposed method achieves higher prediction accuracy.
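The general idea of smoothing a hydrophobicity profile with a discrete wavelet transform and reading off long hydrophobic stretches as candidate TMHs can be sketched with a hand-written Haar transform. The hydrophobicity scale (Kyte-Doolittle here), the decomposition level, the threshold, the minimum segment length, and the function names are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale (assumed; the paper's scale may differ).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_smooth(signal: np.ndarray, level: int) -> np.ndarray:
    """Keep only the level-`level` Haar approximation and reconstruct."""
    n = len(signal)
    block = 2 ** level
    padded_len = -(-n // block) * block  # round up to a multiple of 2**level
    x = np.pad(signal.astype(float), (0, padded_len - n), mode="edge")
    approx = x
    for _ in range(level):  # forward transform: approximation coefficients only
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
    for _ in range(level):  # inverse transform with every detail coefficient set to 0
        up = np.empty(2 * len(approx))
        up[0::2] = up[1::2] = approx / np.sqrt(2.0)
        approx = up
    return approx[:n]


def predict_tmh(sequence: str, level: int = 3, threshold: float = 1.0, min_len: int = 16):
    """Return half-open (start, end) index pairs of predicted helical segments."""
    profile = np.array([KD[aa] for aa in sequence.upper()])
    smooth = haar_smooth(profile, level)
    segments, start = [], None
    for i, above in enumerate(smooth > threshold):
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(smooth) - start >= min_len:
        segments.append((start, len(smooth)))
    return segments
```

Zeroing the Haar detail coefficients is equivalent to averaging the profile over blocks of 2**level residues, so short hydrophobic blips are suppressed while a helix-length stretch survives; for example, `predict_tmh("K" * 16 + "L" * 24 + "K" * 24)` reports one segment covering the leucine run.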
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, Shandong, China.