1
Wu YT, Lu PW, Lin CA, Chang LY, Jaihao C, Peng TY, Lee WF, Teng NC, Lee SY, Dwivedi RP, Negi P, Yang JC. Development of a zinc chloride-based chemo-mechanical system for potential minimally invasive dental caries removal system. J Dent Sci 2024; 19:919-928. [PMID: 38618085 PMCID: PMC11010630 DOI: 10.1016/j.jds.2023.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 08/01/2023] [Indexed: 04/16/2024] Open
Abstract
Background/purpose: The chemo-mechanical caries-removal technique offers selective treatment of dentin caries while leaving healthy dental tissues intact. However, current sodium hypochlorite-based reagents usually damage dentin collagen excessively. Therefore, the purpose of this study was to develop a novel chemo-mechanical caries-removal system that preserves the collagen network for subsequent prosthetic restorations. Materials and methods: Calfskin-derived collagen was chosen as a model system to investigate the dissolution behavior of collagen under different operating conditions of chemical-ultrasonic treatment systems. The molecular weight, triple-helix structure, morphology, and functional groups of collagen after treatment were investigated. Results: Various concentrations of sodium hypochlorite or zinc chloride were tested together with ultrasonic equipment. Circular dichroism (CD) spectra demonstrated the stability of the triple-helix structure after treatment with a zinc chloride solution. In addition, two apparent bands at molecular weights (MWs) of 130 and 121 kDa evidenced the stability of the collagen network. The positive CD band at 222 nm and the negative band at 195 nm indicated the existence of a triple-helix structure for type I collagen. The preservation of the morphology and functional groups of the collagen network on the etched dentin surface was investigated with an in vitro dentin decalcification model. Conclusion: Unlike NaOCl, the 5 wt% zinc chloride solution combined with ultrasonication dissolved the dentin collagen network rather than denaturing or degrading it. Additional in vivo evaluations are needed to verify its usefulness in clinical applications.
Affiliation(s)
- Yu-Tzu Wu
  - Graduate Institute of Nanomedicine and Medical Engineering, Taipei Medical University, Taipei, Taiwan
- Po-Wen Lu
  - Graduate Institute of Biomedical Materials and Tissue Engineering, Taipei Medical University, Taipei, Taiwan
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, Shuang Ho Hospital, New Taipei, Taiwan
- Chih-An Lin
  - Graduate Institute of Nanomedicine and Medical Engineering, Taipei Medical University, Taipei, Taiwan
- Liang-Yu Chang
  - Graduate Institute of Nanomedicine and Medical Engineering, Taipei Medical University, Taipei, Taiwan
- Chonlachat Jaihao
  - Graduate Institute of Biomedical Materials and Tissue Engineering, Taipei Medical University, Taipei, Taiwan
- Tzu-Yu Peng
  - School of Dentistry, Taipei Medical University, Taipei, Taiwan
- Wei-Fang Lee
  - School of Dental Technology, Taipei Medical University, Taipei, Taiwan
- Nai-Chia Teng
  - School of Dentistry, Taipei Medical University, Taipei, Taiwan
- Sheng-Yang Lee
  - School of Dentistry, Taipei Medical University, Taipei, Taiwan
- Ram Prakash Dwivedi
  - School of Electrical and Computer Science Engineering, Shoolini University, Himachal Pradesh, India
- Poonam Negi
  - School of Pharmaceutical Sciences, Biotechnology and Management Sciences, Shoolini University, Himachal Pradesh, India
- Jen-Chang Yang
  - Graduate Institute of Nanomedicine and Medical Engineering, Taipei Medical University, Taipei, Taiwan
  - International Ph.D. Program in Biomedical Engineering, Taipei Medical University, Taipei, Taiwan
  - Research Center of Biomedical Device, Taipei Medical University, Taipei, Taiwan
  - Research Center of Digital Oral Science and Technology, Taipei Medical University, Taipei, Taiwan
2
Brix KV, Baken S, Poland CA, Blust R, Pope LJ, Tyler CR. Challenges and Recommendations in Assessing Potential Endocrine-Disrupting Properties of Metals in Aquatic Organisms. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY 2023; 42:2564-2579. [PMID: 37671843 DOI: 10.1002/etc.5741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 07/25/2023] [Accepted: 08/29/2023] [Indexed: 09/07/2023]
Abstract
New tools and refined frameworks for identifying and regulating endocrine-disrupting chemicals (EDCs) are being developed as our scientific understanding of how they work advances. Although focus has largely been on organic chemicals, the potential for metals to act as EDCs in aquatic systems is receiving increasing attention. Metal interactions with the endocrine system are complicated because some metals are essential to physiological systems, including the endocrine system, and nonessential metals can have similar physicochemical attributes that allow substitution into or interference with these systems. Consequently, elevated metal exposure could potentially cause endocrine disruption (ED) but can also cause indirect effects on the endocrine system via multiple pathways or elicit physiologically appropriate compensatory endocrine-mediated responses (endocrine modulation). These latter two effects can be confused with, but are clearly not, ED. In the present study, we provide several case studies that exemplify the challenges encountered in evaluating the endocrine-disrupting (ED) potential of metals, followed by recommendations on how to meet them. Given that metals have multiple modes of action (MOAs), we recommend that assessments use metal-specific adverse outcome pathway networks to ensure that accurate causal links are made between MOAs and effects on the endocrine system. We recommend more focus on establishing molecular initiating events for chronic metal toxicity because these are poorly understood and would reduce uncertainty regarding the potential for metals to be EDCs. Finally, more generalized MOAs such as oxidative stress could be involved in metal interactions with the endocrine system, and we suggest it may be experimentally efficient to evaluate these MOAs when ED is inferred. These experiments, however, must provide explicit linkage to the ED endpoints of interest. Environ Toxicol Chem 2023;42:2564-2579. © 2023 The Authors. Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.
Affiliation(s)
- Kevin V Brix
  - EcoTox, Miami, Florida, USA
  - Rosenstiel School of Marine, Atmospheric & Earth Science, University of Miami, Miami, Florida, USA
- Stijn Baken
  - International Copper Association, Brussels, Belgium
- Craig A Poland
  - Regulatory Compliance Limited, Loanhead, United Kingdom
  - Centre for Inflammation Research, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom
- Ronny Blust
  - Department of Biology, University of Antwerp, Antwerp, Belgium
- Charles R Tyler
  - Biosciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
3
Zhang J, Zhou F, Liang X, Yang G. SCAMPER: Accurate Type-Specific Prediction of Calcium-Binding Residues Using Sequence-Derived Features. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1406-1416. [PMID: 35536812 DOI: 10.1109/tcbb.2022.3173437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Understanding the molecular mechanisms involved in calcium-protein interactions and modeling the corresponding docking rely on the accurate identification of calcium-binding residues (CaBRs). The limitations of experimental annotation of protein functions motivate the development of computational approaches that correctly identify calcium-binding interactions. Studies have reported that current methods severely cross-predict residues that interact with other types of molecules (e.g., nucleic acids, proteins, and small ligands) as CaBRs. In this study, a novel predictor named SCAMPER (Selective CAlciuM-binding PrEdictoR) is proposed for the accurate and specific prediction of CaBRs. SCAMPER is designed using a newly compiled dataset with complete UniProt sequences and annotations, which include calcium-binding, nucleic acid-binding, protein-binding, and small ligand-binding residues. We use a newly designed two-layer scheme to perform predictions as well as penalize cross-predictions. Empirical tests on an independent test dataset reveal that the proposed method significantly outperforms state-of-the-art predictors. SCAMPER is shown to be capable of distinguishing CaBRs from different types of metal-ion binding residues. We further perform CaBR predictions on the whole human proteome, and use the results to hypothesize calcium-binding proteins (CaBPs). The latest experimentally verified CaBPs and GO analysis support the accuracy of our predictions. We implement the proposed method and share the data at http://www.inforstation.com/webservers/SCAMPER/.
4
Sun K, Hu X, Feng Z, Wang H, Lv H, Wang Z, Zhang G, Xu S, You X. Predicting Ca2+ and Mg2+ ligand binding sites by deep neural network algorithm. BMC Bioinformatics 2022; 22:324. [PMID: 35045825 PMCID: PMC8772041 DOI: 10.1186/s12859-021-04250-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2021] [Accepted: 06/09/2021] [Indexed: 11/25/2022] Open
Abstract
Background: Alkaline earth metal ions are important protein binding ligands in the human body, and it is of great significance to predict their binding residues. Results: In this paper, Mg2+ and Ca2+ ligands are taken as the research objects. Based on the characteristic parameters of protein sequences, amino acids, physicochemical characteristics of amino acids and predicted structural information, a deep neural network algorithm is used to predict the binding sites of proteins. After optimizing the hyper-parameters of the deep learning algorithm, the fivefold cross-validation results are better than those of the Ionseq method. In addition, to further verify the performance of the proposed model, an undersampling data processing method is adopted, and the prediction results on the independent test are better than those obtained by the support vector machine algorithm. Conclusions: An efficient method for predicting Mg2+ and Ca2+ ligand binding sites was presented.
Affiliation(s)
- Kai Sun
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
- Xiuzhen Hu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
- Zhenxing Feng
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
- Hongbin Wang
  - College of Data Science and Application, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
- Haotian Lv
  - College of Data Science and Application, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
- Ziyang Wang
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
- Gaimei Zhang
  - Hohhot First Hospital, Hohhot, 010051, People's Republic of China
- Shuang Xu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
- Xiaoxiao You
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
5
Zhou J, Bo S, Wang H, Zheng L, Liang P, Zuo Y. Identification of Disease-Related 2-Oxoglutarate/Fe (II)-Dependent Oxygenase Based on Reduced Amino Acid Cluster Strategy. Front Cell Dev Biol 2021; 9:707938. [PMID: 34336861 PMCID: PMC8323781 DOI: 10.3389/fcell.2021.707938] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 06/10/2021] [Indexed: 11/17/2022] Open
Abstract
The 2-oxoglutarate/Fe (II)-dependent (2OG) oxygenase superfamily is mainly responsible for protein modification, nucleic acid repair and/or modification, and fatty acid metabolism and plays important roles in cancer, cardiovascular disease, and other diseases. Its members are likely to become new targets for the treatment of cancer and other diseases, so the accurate identification of 2OG oxygenases is of great significance. Many computational methods have been proposed to predict functional proteins to compensate for the time-consuming and expensive experimental identification. However, machine learning has not been applied to the study of 2OG oxygenases. In this study, we developed OGFE_RAAC, a prediction model to identify whether a protein is a 2OG oxygenase. To improve the performance of OGFE_RAAC, 673 amino acid reduction alphabets were used to determine the optimal feature representation scheme by recoding the protein sequence. The 10-fold cross-validation test showed that the accuracy of the model in identifying 2OG oxygenases is 91.04%. In addition, the independent dataset results showed that the model has excellent generalization and robustness. It is expected to become an effective tool for the identification of 2OG oxygenases. With further research, we have also found that the function of 2OG oxygenases may be related to their polarity and hydrophobicity, which will help the follow-up study on the catalytic mechanism of 2OG oxygenases and the way they interact with the substrate. Based on the model we built, a user-friendly web server was established and can be accessed at http://bioinfor.imu.edu.cn/ogferaac.
Affiliation(s)
- Jian Zhou
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Suling Bo
  - College of Computer and Information, Inner Mongolia Medical University, Hohhot, China
- Hao Wang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Pengfei Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
6
Stanton JE, Malijauskaite S, McGourty K, Grabrucker AM. The Metallome as a Link Between the "Omes" in Autism Spectrum Disorders. Front Mol Neurosci 2021; 14:695873. [PMID: 34290588 PMCID: PMC8289253 DOI: 10.3389/fnmol.2021.695873] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/22/2021] [Accepted: 06/14/2021] [Indexed: 12/26/2022] Open
Abstract
Metal dyshomeostasis plays a significant role in various neurological diseases such as Alzheimer's disease, Parkinson's disease, Autism Spectrum Disorders (ASD), and many more. Like studies investigating the proteome, transcriptome, epigenome, microbiome, etc., for years, metallomics studies have focused on data from their domain, i.e., trace metal composition, only. Still, few have considered the links between other "omes," which may together result in an individual's specific pathologies. In particular, ASD have been reported to have multitudes of possible causal effects. Metallomics data focusing on metal deficiencies and dyshomeostasis can be linked to functions of metalloenzymes, metal transporters, and transcription factors, thus affecting the proteome and transcriptome. Furthermore, recent studies in ASD have emphasized the gut-brain axis, with alterations in the microbiome being linked to changes in the metabolome and inflammatory processes. However, the microbiome and other "omes" are heavily influenced by the metallome. Thus, here, we will summarize the known implications of a changed metallome for other "omes" in the body in the context of "omics" studies in ASD. We will highlight possible connections and propose a model that may explain the so far independently reported pathologies in ASD.
Affiliation(s)
- Janelle E Stanton
  - Department of Biological Sciences, University of Limerick, Limerick, Ireland
  - Bernal Institute, University of Limerick, Limerick, Ireland
- Sigita Malijauskaite
  - Bernal Institute, University of Limerick, Limerick, Ireland
  - Department of Chemical Sciences, University of Limerick, Limerick, Ireland
- Kieran McGourty
  - Bernal Institute, University of Limerick, Limerick, Ireland
  - Department of Chemical Sciences, University of Limerick, Limerick, Ireland
  - Health Research Institute, University of Limerick, Limerick, Ireland
- Andreas M Grabrucker
  - Department of Biological Sciences, University of Limerick, Limerick, Ireland
  - Bernal Institute, University of Limerick, Limerick, Ireland
  - Health Research Institute, University of Limerick, Limerick, Ireland
7
Ireland SM, Martin ACR. ZincBindPredict-Prediction of Zinc Binding Sites in Proteins. Molecules 2021; 26:molecules26040966. [PMID: 33673040 PMCID: PMC7918553 DOI: 10.3390/molecules26040966] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/27/2020] [Revised: 01/26/2021] [Accepted: 02/09/2021] [Indexed: 11/21/2022] Open
Abstract
Background: Zinc binding proteins make up a significant proportion of the proteomes of most organisms and, within those proteins, zinc performs rôles in catalysis and structure stabilisation. Identifying the ability to bind zinc in a novel protein can offer insights into its functions and the mechanism by which it carries out those functions. Computational means of doing so are faster than spectroscopic means, allowing for searching at much greater speeds and scales, and thereby guiding complementary experimental approaches. Typically, computational models of zinc binding predict zinc binding for individual residues rather than as a single binding site, and typically do not distinguish between different classes of binding site, thereby missing crucial properties indicative of zinc binding. Methods: Previously, we created ZincBindDB, a continuously updated database of known zinc binding sites, categorised by family (the set of liganding residues). Here, we use this dataset to create ZincBindPredict, a set of machine learning methods to predict the most common zinc binding site families for both structure and sequence. Results: The models all achieve an MCC ≥ 0.88, recall ≥ 0.93 and precision ≥ 0.91 for the structural models (mean MCC = 0.97), while the sequence models have MCC ≥ 0.64, recall ≥ 0.80 and precision ≥ 0.83 (mean MCC = 0.87), with the models for binding sites containing four liganding residues performing much better than this. Conclusions: The predictors outperform competing zinc binding site predictors and are available online via a web interface and a GraphQL API.
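The abstract above reports MCC, recall and precision. As a reader's aid, here is a minimal sketch of how these three metrics follow from a binary confusion matrix. The function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration; they are not taken from ZincBindPredict.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return precision, recall and MCC from confusion-matrix counts.

    Hypothetical helper for illustration only. When a denominator is
    zero the corresponding metric is reported as 0.0 (an assumed
    convention, not one stated in the abstract).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient:
    # (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}


if __name__ == "__main__":
    # Made-up counts, not data from the paper.
    print(binary_metrics(tp=3, fp=1, tn=3, fn=1))
```

Unlike precision and recall, MCC uses all four cells of the confusion matrix, which is why it is often preferred when binding sites are much rarer than non-binding sites.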
8
Grosjean N, Blaby-Haas CE. Leveraging computational genomics to understand the molecular basis of metal homeostasis. THE NEW PHYTOLOGIST 2020; 228:1472-1489. [PMID: 32696981 DOI: 10.1111/nph.16820] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/22/2020] [Accepted: 07/03/2020] [Indexed: 06/11/2023]
Abstract
Genome-based data is helping to reveal the diverse strategies plants and algae use to maintain metal homeostasis. In addition to acquisition, distribution and storage of metals, acclimating to feast or famine can involve a wealth of genes that we are just now starting to understand. The fast-paced acquisition of genome-based data, however, is far outpacing our ability to experimentally characterize protein function. Computational genomic approaches are needed to fill the gap between what is known and unknown. To avoid misconstruing bioinformatically derived data, which is the root cause of the inaccurate functional annotations that plague databases, functional inferences from diverse sources and contextualization of that evidence with a robust understanding of protein family evolution are needed. Phylogenomic- and comparative-genomic-based studies can aid in the interpretation of experimental data or provide a spark for the discovery of a new function. These analyses not only lead to novel insight into a target protein's function but can generate thought-provoking insights across protein families.
Affiliation(s)
- Nicolas Grosjean
  - Biology Department, Brookhaven National Laboratory, Upton, NY, 11973, USA
9
Liu L, Hu X, Feng Z, Wang S, Sun K, Xu S. Recognizing Ion Ligand-Binding Residues by Random Forest Algorithm Based on Optimized Dihedral Angle. Front Bioeng Biotechnol 2020; 8:493. [PMID: 32596216 PMCID: PMC7303464 DOI: 10.3389/fbioe.2020.00493] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/20/2019] [Accepted: 04/28/2020] [Indexed: 11/26/2022] Open
Abstract
The prediction of ion ligand–binding residues in protein sequences is a challenging task that contributes to understanding the specific functions of proteins in life processes. In this article, we selected binding residues of 14 ion ligands as research objects, including four acid radical ion ligands and 10 metal ion ligands. Based on the amino acid sequence information, we selected the composition and position conservation information of amino acids, the predicted structural information, and physicochemical properties of amino acids as basic feature parameters. We then performed a statistical analysis and reclassification of dihedral angles and proposed new methods for the extraction of feature parameters. The methods mainly included applying information entropy to the extraction of polarization charge and hydrophilic–hydrophobic information of amino acids and using position weight matrices for the extraction of position conservation information. In the prediction model, we used the random forest algorithm and obtained better prediction results than previous works. On the independent test, the Matthews correlation coefficient and accuracy of 10 metal ion ligand–binding residues were larger than 0.07 and 52%, respectively; the corresponding evaluation values of four acid radical ion ligand–binding residues were larger than 0.15 and 86%, respectively. Further, we classified and combined the phi and psi angles and optimized the prediction model for each ion ligand–binding residue.
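The abstract above mentions position weight matrices and information entropy as feature-extraction tools. The sketch below shows, under simplifying assumptions, how per-position frequencies and Shannon entropy can be computed from equal-length sequence windows. The function names, the pseudocount default and the toy windows are hypothetical and do not reproduce the authors' feature definitions.

```python
import math
from collections import Counter


def position_frequencies(windows, alphabet, pseudocount=1.0):
    """Per-position symbol frequencies for equal-length windows.

    Illustrative only: symbols outside `alphabet` are ignored, and a
    pseudocount (an assumed smoothing choice) avoids zero frequencies.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(w[i] for w in windows if w[i] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        if total == 0:
            raise ValueError("no countable symbols at position %d" % i)
        matrix.append(
            {a: (counts.get(a, 0) + pseudocount) / total for a in alphabet}
        )
    return matrix


def shannon_entropy(column):
    """Shannon entropy in bits of one frequency column."""
    return -sum(p * math.log2(p) for p in column.values() if p > 0)


if __name__ == "__main__":
    # Toy windows over a two-letter alphabet, not real binding-site data.
    freqs = position_frequencies(["AC", "CA"], "AC", pseudocount=0.0)
    print([shannon_entropy(col) for col in freqs])
```

A low entropy at a position indicates strong conservation there, which is the kind of signal a position-conservation feature is meant to capture.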
Affiliation(s)
- Liu Liu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Xiuzhen Hu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Zhenxing Feng
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Shan Wang
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Kai Sun
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Shuang Xu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
10
Xi B, Tao J, Liu X, Xu X, He P, Dai Q. RaaMLab: A MATLAB toolbox that generates amino acid groups and reduced amino acid modes. Biosystems 2019; 180:38-45. [PMID: 30904554 DOI: 10.1016/j.biosystems.2019.03.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/25/2018] [Revised: 12/25/2018] [Accepted: 03/06/2019] [Indexed: 01/31/2023]
Abstract
Amino acid (AA) classification and its different biophysical and chemical characteristics have been widely applied to analyze and predict the structural, functional, expression and interaction profiles of proteins and peptides. We present RaaMLab, a free and open-source MATLAB toolbox, to facilitate studies on proteins and peptides, to generate AA groups and to extract the structural and physicochemical features of reduced AAs (RedAA). This toolbox offers 4 kinds of databases, including the physicochemical properties of AAs and their groupings, 49 AA classification methods and 5 types of biophysicochemical features of RedAAs. These factors can be easily computed based on user-defined alphabet size and AA properties of AA groupings. RaaMLab is open source and freely available at https://github.com/bioinfo0706/RaaMLab. This website also contains a tutorial, extensive documentation and examples.
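To make the idea of a reduced amino acid alphabet concrete, the following sketch recodes a sequence with a small grouping and computes the composition of the reduced sequence. It is written in Python rather than MATLAB and does not use RaaMLab; the six-group scheme in `GROUPS` and all function names are assumptions chosen for illustration, not one of the toolbox's 49 classification methods.

```python
from collections import Counter

# Hypothetical six-group reduction (assumed for this example only).
# Each group is represented by its first letter.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(groups):
    """Map every residue to the representative letter of its group."""
    return {res: group[0] for group in groups for res in group}


def reduce_sequence(seq, mapping, unknown="X"):
    """Recode a protein sequence in the reduced alphabet."""
    return "".join(mapping.get(res, unknown) for res in seq.upper())


def composition(reduced_seq, groups):
    """Fraction of each reduced letter in the recoded sequence."""
    counts = Counter(reduced_seq)
    n = len(reduced_seq)
    return {g[0]: (counts.get(g[0], 0) / n if n else 0.0) for g in groups}


if __name__ == "__main__":
    mapping = build_mapping(GROUPS)
    reduced = reduce_sequence("MKVDE", mapping)
    print(reduced, composition(reduced, GROUPS))
```

Shrinking the alphabet from 20 letters to a handful of groups reduces the dimensionality of composition-style features, which is the motivation for searching over many candidate alphabets.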
Affiliation(s)
- Baohang Xi
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Jin Tao
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Xiaoqing Liu
  - College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Xinnan Xu
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Pingan He
  - College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Qi Dai
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
11
Haberal İ, Oğul H. Prediction of Protein Metal Binding Sites Using Deep Neural Networks. Mol Inform 2019; 38:e1800169. [PMID: 30977960 DOI: 10.1002/minf.201800169] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/19/2018] [Accepted: 03/29/2019] [Indexed: 11/06/2022]
Abstract
Metals have crucial roles in many physiological, pathological and diagnostic processes. Metal binding proteins or metalloproteins are important for metabolic functions. The three-dimensional structure that a protein reaches by folding determines which vital function it fulfills. The prediction of metal binding in proteins can be considered a step in function assignment for new proteins, which helps to obtain functional proteins in genomic studies and is critical to protein function annotation and drug discovery. Computational predictions made by applying machine learning methods to data obtained from amino acid sequences are widely used in protein metal-binding prediction and various bioinformatics fields. In this work, we present three different deep learning architectures for predicting metal binding of histidine (HIS) and cysteine (CYS) residues. These architectures are as follows: 2D Convolutional Neural Network, Long Short-Term Memory and Recurrent Neural Network. Their comparison is carried out on three different sets of attributes derived from a public dataset of protein sequences. These three sets of features extracted from the protein sequence were obtained using the PAM scoring matrix, protein composition server, and binary representation methods. The results show that a better performance for prediction of protein metal-binding sites is obtained through the Convolutional Neural Network architecture.
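One of the feature sets named in the abstract above is a binary representation of the sequence. A minimal sketch of what such an encoding can look like is given here: windows are cut around each histidine or cysteine and one-hot encoded. The window size, the padding symbol and the function names are assumptions for illustration and are not the settings used by the authors.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def windows_around(seq, targets="HC", flank=2, pad="-"):
    """Yield (index, window) for every target residue in the sequence.

    The flank size and padding symbol are assumed values; the window
    has length 2 * flank + 1 and is centred on the target residue.
    """
    padded = pad * flank + seq + pad * flank
    for i, res in enumerate(seq):
        if res in targets:
            yield i, padded[i:i + 2 * flank + 1]


def one_hot(window):
    """Binary (one-hot) encoding: one 20-element row per position.

    Padding or unknown symbols become all-zero rows.
    """
    return [[1 if res == aa else 0 for aa in AMINO_ACIDS] for res in window]


if __name__ == "__main__":
    # Toy sequence, not taken from the dataset used in the paper.
    for index, window in windows_around("ACHK", flank=1):
        print(index, window, one_hot(window))
```

The resulting matrix of shape (window length, 20) is the kind of input a 2D convolutional network can consume directly.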
Affiliation(s)
- İsmail Haberal
  - Department of Computer Engineering, Başkent University, Fatih Sultan Mahallesi Eskişehir Yolu 18. km, 06790, Etimesgut, Ankara, Turkey
- Hasan Oğul
  - Department of Computer Engineering, Başkent University, Fatih Sultan Mahallesi Eskişehir Yolu 18. km, 06790, Etimesgut, Ankara, Turkey
  - Faculty of Computer Sciences, Østfold University College, Halden, Norway
12
Greener JG, Moffat L, Jones DT. Design of metalloproteins and novel protein folds using variational autoencoders. Sci Rep 2018; 8:16189. [PMID: 30385875 PMCID: PMC6212568 DOI: 10.1038/s41598-018-34533-1] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 06/28/2018] [Accepted: 10/19/2018] [Indexed: 12/26/2022] Open
Abstract
The design of novel proteins has many applications but remains an attritional process with success in isolated cases. Meanwhile, deep learning technologies have exploded in popularity in recent years and are increasingly applicable to biology due to the rise in available data. We attempt to link protein design and deep learning by using variational autoencoders to generate protein sequences conditioned on desired properties. Potential copper and calcium binding sites are added to non-metal binding proteins without human intervention and compared to a hidden Markov model. In another use case, a grammar of protein structures is developed and used to produce sequences for a novel protein topology. One candidate structure is found to be stable by molecular dynamics simulation. The ability of our model to confine the vast search space of protein sequences and to scale easily has the potential to assist in a variety of protein design tasks.
Affiliation(s)
- Joe G Greener
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
  - Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Lewis Moffat
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
  - Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- David T Jones
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
  - Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
13
Shaik NA, Awan ZA, Verma PK, Elango R, Banaganapalli B. Protein phenotype diagnosis of autosomal dominant calmodulin mutations causing irregular heart rhythms. J Cell Biochem 2018; 119:8233-8248. [PMID: 29932249 DOI: 10.1002/jcb.26834] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/02/2018] [Accepted: 03/09/2018] [Indexed: 12/21/2022]
Abstract
The life-threatening group of irregular cardiac rhythmic disorders, also known as Cardiac Arrhythmias (CA), are caused by mutations in highly conserved Calmodulin (CALM/CaM) genes. Herein, we present a multidimensional approach to diagnose changes in phenotypic, stability, and Ca2+ ion binding properties of CA-causing mutations. Mutation pathogenicity was determined by diverse computational machine learning approaches. We further modeled the mutations in 3D protein structure and analyzed residue level phenotype plasticity. We have also examined the influence of torsion angles, number of H-bonds, and free energy dynamics on the stability, near-native simulation dynamic potential of residue fluctuations in protein structures, and Ca2+ ion binding potentials of CaM mutants. Our study recommends using the M-CAP method for measuring the pathogenicity of CA-causing CaM variants. Interestingly, most CA-causing variants we analyzed exist in either the third (V/H-96, S/I-98, V-103) or fourth (G/V-130, V/E/H-132, H-134, P-136, G-141, and L-142) EF-hands located in carboxyl domains of the CaM molecule. We observed that the minor structural fluctuations caused by these variants are likely tolerable owing to the highly flexible nature of calmodulin's globular domains. However, our molecular docking results support that these variants disturb the affinity of CaM toward Ca2+ ions and corroborate previous findings from functional studies. Taken together, these computational findings can explain the molecular reasons for subtle changes in structure, flexibility, and stability aspects of the mutant CaM molecule. Our comprehensive molecular scanning approach demonstrates the utility of computational methods in quick preliminary screening of CA-CaM mutations before undertaking time-consuming and complicated functional laboratory assays.
Affiliation(s)
- Noor A Shaik
  - Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
  - Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD), King Abdulaziz University, Jeddah, Saudi Arabia
- Zuhier A Awan
  - Department of Clinical Biochemistry, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Prashant K Verma
  - Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Ramu Elango
  - Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
  - Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD), King Abdulaziz University, Jeddah, Saudi Arabia
- Babajan Banaganapalli
  - Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
  - Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD), King Abdulaziz University, Jeddah, Saudi Arabia
|
14
|
Cao X, Hu X, Zhang X, Gao S, Ding C, Feng Y, Bao W. Identification of metal ion binding sites based on amino acid sequences. PLoS One 2017; 12:e0183756. [PMID: 28854211 PMCID: PMC5576659 DOI: 10.1371/journal.pone.0183756] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 05/13/2017] [Accepted: 08/10/2017] [Indexed: 11/26/2022] Open
Abstract
The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html.
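As a rough illustration of the two figures quoted in this abstract (accuracy and the Matthews correlation coefficient), the sketch below computes both from confusion-matrix counts using their standard definitions. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly (binding or non-binding)."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one metal ion type (illustrative only).
tp, tn, fp, fn = 40, 40, 10, 10
print(accuracy(tp, tn, fp, fn))
print(matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, the coefficient stays informative when binding residues are far rarer than non-binding ones, which is the usual situation in binding-site prediction.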
Affiliation(s)
- Xiaoyong Cao
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Xiuzhen Hu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Xiaojin Zhang
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Sujuan Gao
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
  - College of Sciences, Inner Mongolia Agricultural University, Hohhot, 010021, China
- Changjiang Ding
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Yonge Feng
  - College of Sciences, Inner Mongolia Agricultural University, Hohhot, 010021, China
- Weihua Bao
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
|
15
|
Computational approaches for de novo design and redesign of metal-binding sites on proteins. Biosci Rep 2017; 37:BSR20160179. [PMID: 28167677 PMCID: PMC5482196 DOI: 10.1042/bsr20160179] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 08/22/2016] [Revised: 02/06/2017] [Accepted: 02/06/2017] [Indexed: 12/25/2022] Open
Abstract
Metal ions play pivotal roles in protein structure, function and stability. The functional and structural diversity of proteins in nature expanded with the incorporation of metal ions or clusters in proteins. Approximately one-third of these proteins in the databases contain metal ions. Many biological and chemical processes in nature involve metal ion-binding proteins, aka metalloproteins. Many cellular reactions that underpin life require metalloproteins. Most of the remarkable, complex chemical transformations are catalysed by metalloenzymes. Realization of the importance of metal-binding sites in a variety of cellular events led to the advancement of various computational methods for their prediction and characterization. Furthermore, as structural and functional knowledgebase about metalloproteins is expanding with advances in computational and experimental fields, the focus of the research is now shifting towards de novo design and redesign of metalloproteins to extend nature’s own diversity beyond its limits. In this review, we will focus on the computational toolbox for prediction of metal ion-binding sites, de novo metalloprotein design and redesign. We will also give examples of tailor-made artificial metalloproteins designed with the computational toolbox.
|
16
|
Valasatava Y, Rosato A, Banci L, Andreini C. MetalPredator: a web server to predict iron-sulfur cluster binding proteomes. Bioinformatics 2016; 32:2850-2. [PMID: 27273670 DOI: 10.1093/bioinformatics/btw238] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The prediction of the iron-sulfur proteome is highly desirable for biomedical and biological research but a freely available tool to predict iron-sulfur proteins has not been developed yet. RESULTS We developed a web server to predict iron-sulfur proteins from protein sequence(s). This tool, called MetalPredator, is able to process complete proteomes rapidly with high recall and precision. AVAILABILITY AND IMPLEMENTATION The web server is freely available at: http://metalweb.cerm.unifi.it/tools/metalpredator/ CONTACT andreini@cerm.unifi.it SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Antonio Rosato
  - Magnetic Resonance Center (CERM) Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy
- Lucia Banci
  - Magnetic Resonance Center (CERM) Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy
- Claudia Andreini
  - Magnetic Resonance Center (CERM) Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy
|
17
|
Tiwari AK, Srivastava R. A survey of computational intelligence techniques in protein function prediction. International Journal of Proteomics 2014; 2014:845479. [PMID: 25574395 PMCID: PMC4276698 DOI: 10.1155/2014/845479] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/10/2014] [Revised: 10/31/2014] [Accepted: 11/07/2014] [Indexed: 02/08/2023]
Abstract
During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction.
Affiliation(s)
- Arvind Kumar Tiwari
  - Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India
- Rajeev Srivastava
  - Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India
|
18
|
Mazumder M, Padhan N, Bhattacharya A, Gourinath S. Prediction and analysis of canonical EF hand loop and qualitative estimation of Ca²⁺ binding affinity. PLoS One 2014; 9:e96202. [PMID: 24760183 PMCID: PMC3997525 DOI: 10.1371/journal.pone.0096202] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 02/25/2014] [Accepted: 04/04/2014] [Indexed: 12/31/2022] Open
Abstract
The diversity of functions carried out by EF hand-containing calcium-binding proteins is due to various interactions made by these proteins as well as the range of affinity levels for Ca2+ displayed by them. However, accurate methods are not available for prediction of binding affinities. Here, amino acid patterns of canonical EF hand sequences obtained from available crystal structures were used to develop a classifier that distinguishes Ca2+-binding loops and non Ca2+-binding regions with 100% accuracy. To investigate further, we performed a proteome-wide prediction for E. histolytica, and classified known EF-hand proteins. We compared our results with published methods on the E. histolytica proteome scan, and demonstrated our method to be more specific and accurate for predicting potential canonical Ca2+-binding loops. Furthermore, we annotated canonical EF-hand motifs and classified them based on their Ca2+-binding affinities using support vector machines. Using a novel method generated from position-specific scoring metrics and then tested against three different experimentally derived EF-hand-motif datasets, predictions of Ca2+-binding affinities were between 87 and 90% accurate. Our results show that the tool described here is capable of predicting Ca2+-binding affinity constants of EF-hand proteins. The web server is freely available at http://202.41.10.46/calb/index.html.
Affiliation(s)
- Mohit Mazumder
  - School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
- Narendra Padhan
  - School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
  - Department of Immunology, Genetics, and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
- Alok Bhattacharya
  - School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Samudrala Gourinath
  - School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
|
19
|
Govindan G, Nair AS. Bagging with CTD--a novel signature for the hierarchical prediction of secreted protein trafficking in eukaryotes. Genomics Proteomics & Bioinformatics 2013; 11:385-90. [PMID: 24316328 PMCID: PMC4357838 DOI: 10.1016/j.gpb.2013.07.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/02/2013] [Revised: 07/01/2013] [Accepted: 07/17/2013] [Indexed: 11/19/2022]
Abstract
Protein trafficking or protein sorting in eukaryotes is a complicated process and is carried out based on the information contained in the protein. Many methods reported prediction of the subcellular location of proteins from sequence information. However, most of these prediction methods use a flat structure or parallel architecture to perform prediction. In this work, we introduce ensemble classifiers with features that are extracted directly from full length protein sequences to predict locations in the protein-sorting pathway hierarchically. Sequence driven features, sequence mapped features and sequence autocorrelation features were tested with ensemble learners and their performances were compared. When evaluated by independent data testing, ensemble based-bagging algorithms with sequence feature composition, transition and distribution (CTD) successfully classified two datasets with accuracies greater than 90%. We compared our results with similar published methods, and our method equally performed with the others at two levels in the secreted pathway. This study shows that the feature CTD extracted from protein sequences is effective in capturing biological features among compartments in secreted pathways.
Affiliation(s)
- Geetha Govindan
  - Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram 695581, India
- Achuthsankar S Nair
  - Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram 695581, India
|
20
|
Zhou Y, Xue S, Yang JJ. Calciomics: integrative studies of Ca2+-binding proteins and their interactomes in biological systems. Metallomics 2013; 5:29-42. [PMID: 23235533 DOI: 10.1039/c2mt20009k] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 12/15/2022]
Abstract
Calcium ion (Ca(2+)), the fifth most common chemical element in the earth's crust, represents the most abundant mineral in the human body. By binding to a myriad of proteins distributed in different cellular organelles, Ca(2+) impacts nearly every aspect of cellular life. In prokaryotes, Ca(2+) plays an important role in bacterial movement, chemotaxis, survival reactions and sporulation. In eukaryotes, Ca(2+) has been chosen through evolution to function as a universal and versatile intracellular signal. Viruses, as obligate intracellular parasites, also develop smart strategies to manipulate the host Ca(2+) signaling machinery to benefit their own life cycles. This review focuses on recent advances in applying both bioinformatic and experimental approaches to predict and validate Ca(2+)-binding proteins and their interactomes in biological systems on a genome-wide scale (termed "calciomics"). Calmodulin is used as an example of Ca(2+)-binding protein (CaBP) to demonstrate the role of CaBPs on the regulation of biological functions. This review is anticipated to rekindle interest in investigating Ca(2+)-binding proteins and Ca(2+)-modulated functions at the systems level in the post-genomic era.
Affiliation(s)
- Yubin Zhou
  - Center for Translational Cancer Research, Institute of Biosciences and Technology, Texas A&M University System Health Science Center, Houston, TX 77030, USA
|
21
|
Liu Z, Wang Y, Zhou C, Xue Y, Zhao W, Liu H. Computationally characterizing and comprehensive analysis of zinc-binding sites in proteins. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:171-80. [PMID: 23499845 DOI: 10.1016/j.bbapap.2013.03.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/03/2012] [Revised: 03/02/2013] [Accepted: 03/04/2013] [Indexed: 10/27/2022]
Abstract
Zinc is one of the most essential metals utilized by organisms, and zinc-binding proteins play an important role in a variety of biological processes such as transcription regulation, cell metabolism and apoptosis. Thus, characterizing the precise zinc-binding sites is fundamental to an elucidation of the biological functions and molecular mechanisms of zinc-binding proteins. Using systematic analyses of structural characteristics, we observed that 4-residue and 3-residue zinc-binding sites have distinctly specific geometric features. Based on the results, we developed the novel computational program Geometric REstriction for Zinc-binding (GRE4Zn) to characterize the zinc-binding sites in protein structures, by restricting the distances between zinc and its coordinating atoms. The comparison between GRE4Zn and analogous tools revealed that it achieved a superior performance. A large-scale prediction for structurally characterized proteins was performed with this powerful predictor, and statistical analyses of the results indicated that zinc-binding proteins have come to be significantly involved in more complicated biological processes in higher species than in simpler species during the course of evolution. Further analyses suggested that zinc-binding proteins are preferentially implicated in a variety of diseases and highly enriched in known drug targets, and the prediction of zinc-binding sites can be helpful for the investigation of molecular mechanisms. In this regard, these prediction and analysis results should prove to be highly useful for further biomedical study and drug design. The online service of GRE4Zn is freely available at: http://biocomp.ustc.edu.cn/gre4zn/. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.
Affiliation(s)
- Zexian Liu
  - Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science & Technology of China, Hefei, Anhui 230027, China
|
22
|
An integrative computational framework based on a two-step random forest algorithm improves prediction of zinc-binding sites in proteins. PLoS One 2012; 7:e49716. [PMID: 23166753 PMCID: PMC3499040 DOI: 10.1371/journal.pone.0049716] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 05/22/2012] [Accepted: 10/12/2012] [Indexed: 11/30/2022] Open
Abstract
Zinc-binding proteins are the most abundant metalloproteins in the Protein Data Bank where the zinc ions usually have catalytic, regulatory or structural roles critical for the function of the protein. Accurate prediction of zinc-binding sites is not only useful for the inference of protein function but also important for the prediction of 3D structure. Here, we present a new integrative framework that combines multiple sequence and structural properties and graph-theoretic network features, followed by an efficient feature selection to improve prediction of zinc-binding sites. We investigate what information can be retrieved from the sequence, structure and network levels that is relevant to zinc-binding site prediction. We perform a two-step feature selection using random forest to remove redundant features and quantify the relative importance of the retrieved features. Benchmarking on a high-quality structural dataset containing 1,103 protein chains and 484 zinc-binding residues, our method achieved >80% recall at a precision of 75% for the zinc-binding residues Cys, His, Glu and Asp on 5-fold cross-validation tests, which is a 10%-28% higher recall at the 75% equal precision compared to SitePredict and zincfinder at residue level using the same dataset. The independent test also indicates that our method has achieved recall of 0.790 and 0.759 at residue and protein levels, respectively, which is a performance better than the other two methods. Moreover, AUC (the Area Under the Curve) and AURPC (the Area Under the Recall-Precision Curve) by our method are also respectively better than those of the other two methods. Our method can not only be applied to large-scale identification of zinc-binding sites when structural information of the target is available, but also give valuable insights into important features arising from different levels that collectively characterize the zinc-binding sites. 
The scripts and datasets are available at http://protein.cau.edu.cn/zincidentifier/.
|
23
|
Lim CK, Hassan KA, Penesyan A, Loper JE, Paulsen IT. The effect of zinc limitation on the transcriptome of Pseudomonas protegens Pf-5. Environ Microbiol 2012; 15:702-15. [DOI: 10.1111/j.1462-2920.2012.02849.x] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 04/04/2012] [Revised: 07/09/2012] [Accepted: 07/21/2012] [Indexed: 02/03/2023]
Affiliation(s)
- Chee Kent Lim
  - Department of Chemistry and Biomolecular Sciences; Macquarie University; Sydney; NSW; Australia
- Karl A. Hassan
  - Department of Chemistry and Biomolecular Sciences; Macquarie University; Sydney; NSW; Australia
- Anahit Penesyan
  - Department of Chemistry and Biomolecular Sciences; Macquarie University; Sydney; NSW; Australia
- Joyce E. Loper
  - USDA-ARS Horticultural Crops Research Laboratory and Department of Botany and Plant Pathology; Oregon State University; Corvallis; OR; USA
- Ian T. Paulsen
  - Department of Chemistry and Biomolecular Sciences; Macquarie University; Sydney; NSW; Australia
|
24
|
Lu CH, Lin YF, Lin JJ, Yu CS. Prediction of metal ion-binding sites in proteins using the fragment transformation method. PLoS One 2012; 7:e39252. [PMID: 22723976 PMCID: PMC3377655 DOI: 10.1371/journal.pone.0039252] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 09/29/2011] [Accepted: 05/21/2012] [Indexed: 11/19/2022] Open
Abstract
The structure of a protein determines its function and its interactions with other factors. Regions of proteins that interact with ligands, substrates, and/or other proteins tend to be conserved both in sequence and structure, and the residues involved are usually in close spatial proximity. More than 70,000 protein structures are currently found in the Protein Data Bank, and approximately one-third contain metal ions essential for function. Identifying and characterizing metal ion-binding sites experimentally is time-consuming and costly. Many computational methods have been developed to identify metal ion-binding sites, and most use only sequence information. For the work reported herein, we developed a method that uses sequence and structural information to predict the residues in metal ion-binding sites. Six types of metal ion-binding templates (those involving Ca(2+), Cu(2+), Fe(3+), Mg(2+), Mn(2+), and Zn(2+)) were constructed using the residues within 3.5 Å of the center of the metal ion. Using the fragment transformation method, we then compared known metal ion-binding sites with the templates to assess the accuracy of our method. Our method achieved an overall 94.6% accuracy with a true positive rate of 60.5% at a 5% false positive rate and therefore constitutes a significant improvement in metal-binding site prediction.
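The template-construction step mentioned in this abstract (collecting the residues within 3.5 Å of the metal ion) can be sketched as a simple distance filter. This is only an illustrative sketch: the function name, the atom-level reading of the cutoff, and the coordinates are assumptions, not the authors' implementation.

```python
import math


def residues_near_metal(metal_xyz, atoms, cutoff=3.5):
    """Return residue IDs that have at least one atom within `cutoff`
    angstroms of the metal ion, in order of first appearance.

    `atoms` is an iterable of (residue_id, (x, y, z)) pairs. Applying
    the cutoff per atom is an assumption made for this sketch.
    """
    near = []
    for res_id, xyz in atoms:
        if math.dist(metal_xyz, xyz) <= cutoff and res_id not in near:
            near.append(res_id)
    return near


# Hypothetical coordinates (angstroms) around a metal ion at the origin.
atoms = [
    ("HIS94", (2.0, 0.0, 0.0)),
    ("HIS96", (0.0, 2.1, 0.0)),
    ("HIS94", (3.0, 0.0, 0.0)),   # second atom of a residue already found
    ("GLU106", (5.0, 0.0, 0.0)),  # outside the cutoff
]
print(residues_near_metal((0.0, 0.0, 0.0), atoms))
```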
Affiliation(s)
- Chih-Hao Lu
  - Graduate Institute of Molecular Systems Biomedicine, China Medical University, Taichung, Taiwan
|
25
|
Mohan Abhilash, Anishetty Sharmila, Gautam Pennathur. Global metal-ion binding protein fingerprint: a method to identify motif-less metal-ion binding proteins. J Bioinform Comput Biol 2011. [DOI: 10.1142/s0219720010004884] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Metal-ion binding proteins play a vital role in biological processes. Putative metal-ion binding proteins are identified through knowledge-based methods. These involve the identification of specific motifs that characterize a specific class of metal-ion binding protein. Metal-ion binding motifs have been identified for the common metal ions. A robust global fingerprint that is useful in distinguishing a metal-ion binding protein from a non-metal-ion binding protein has not been devised. Such a method will help in identifying novel metal-ion binding proteins and proteins that do not possess a canonical metal-ion binding motif. We have used a set of physico-chemical parameters of metal-ion binding proteins encoded by the genes CzcA, CzcB and CzcD as a training set for supervised classifiers and have been able to identify several other metal-ion binding proteins, leading us to believe that metal-ion binding proteins have a global fingerprint, which cannot be pinned down to a single feature of the protein sequence.
Affiliation(s)
- Abhilash Mohan
  - Centre for Biotechnology, Anna University, Chennai, Tamilnadu, 600 025, India
- Sharmila Anishetty
  - Centre for Biotechnology, Anna University, Chennai, Tamilnadu, 600 025, India
- Pennathur Gautam
  - Centre for Biotechnology, Anna University, Chennai, Tamilnadu, 600 025, India
|
26
|
Zhao W, Xu M, Liang Z, Ding B, Niu L, Liu H, Teng M. Structure-based de novo prediction of zinc-binding sites in proteins of unknown function. Bioinformatics 2011; 27:1262-8. [PMID: 21414989 DOI: 10.1093/bioinformatics/btr133] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION Zinc-binding proteins are the most abundant metallo-proteins in Protein Data Bank (PDB). Accurate prediction of zinc-binding sites in proteins of unknown function may provide important clues for the inference of protein function. As zinc binding is often associated with characteristic 3D arrangements of zinc ligand residues, its prediction may benefit from using not only the sequence information but also the structure information of proteins. RESULTS In this work, we present a structure-based method, TEMSP (3D TEmplate-based Metal Site Prediction), to predict zinc-binding sites. TEMSP significantly improves over previously reported best methods in predicting as many as possible true ligand residues for zinc with minimum overpredictions: if only those results in which all zinc ligand residues have been correctly predicted are defined as true positives, our method improves sensitivity from less than 30% to above 60%, and selectivity from around 25% to 80%. These results are for predictions based on apo state structures. In addition, the method can predict the zinc-bound local structures reliably, generating predictions useful for function inference. We applied TEMSP to 1888 protein structures of the 'Unknown Function' class in the PDB database. A number of zinc-binding sites have been discovered de novo, i.e. based solely on the protein structures. Using the predicted local structures of these sites, possible functional roles were analyzed. AVAILABILITY TEMSP is freely available from http://netalign.ustc.edu.cn/temsp/.
Affiliation(s)
- Wei Zhao
  - Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China and Key Laboratory of Structural Biology, Chinese Academy of Sciences, 96 Jinzhai Road, Hefei, Anhui, China
|
27
|
Dutta A, Bahar I. Metal-binding sites are designed to achieve optimal mechanical and signaling properties. Structure 2011; 18:1140-8. [PMID: 20826340 DOI: 10.1016/j.str.2010.06.013] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 02/19/2010] [Revised: 05/21/2010] [Accepted: 06/17/2010] [Indexed: 11/29/2022]
Abstract
Many proteins require bound metals to achieve their function. We take advantage of increasing structural data on metal-binding proteins to elucidate three properties: the involvement of metal-binding sites in the global dynamics of the protein, predicted by elastic network models, their exposure/burial to solvent, and their signal-processing properties indicated by Markovian stochastics analysis. Systematic analysis of a data set of 145 structures reveals that the residues that coordinate metal ions enjoy remarkably efficient and precise signal transduction properties. These properties are rationalized in terms of their physical properties: participation in hinge sites that control the softest modes collectively accessible to the protein and occupancy of central positions minimally exposed to solvent. Our observations suggest that metal-binding sites may have been evolutionarily selected to achieve optimum allosteric communication. They also provide insights into basic principles for designing metal-binding sites, which are verified to be met by recently designed de novo metal-binding proteins.
Affiliation(s)
- Anindita Dutta
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, 3064 BST3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
|
28
|
Brylinski M, Skolnick J. FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins 2010; 79:735-51. [PMID: 21287609 DOI: 10.1002/prot.22913] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 08/04/2010] [Revised: 09/27/2010] [Accepted: 10/07/2010] [Indexed: 12/13/2022]
Abstract
The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure-based approaches showing considerable promise. In this article, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal-binding sites in modeled protein structures. Comprehensive benchmarks using different quality protein structures show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal-binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal-binding sites are detected with the best predicted binding site at rank 1 and within the top two ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium, and magnesium ions, the binding metal can be predicted with high, typically 70% to 90%, accuracy. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal that quantifies the metal-binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|
29
|
Wang X, Zhao K, Kirberger M, Wong H, Chen G, Yang JJ. Analysis and prediction of calcium-binding pockets from apo-protein structures exhibiting calcium-induced localized conformational changes. Protein Sci 2010; 19:1180-90. [PMID: 20512971 DOI: 10.1002/pro.394] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022]
Abstract
Calcium binding in proteins exhibits a wide range of polygonal geometries that relate directly to an equally diverse set of biological functions. The binding process stabilizes protein structures and typically results in local conformational change and/or global restructuring of the backbone. Previously, we established the MUG program, which utilized multiple geometries in the Ca(2+)-binding pockets of holoproteins to identify such pockets, ignoring possible Ca(2+)-induced conformational change. In this article, we first report our progress in the analysis of Ca(2+)-induced conformational changes followed by improved prediction of Ca(2+)-binding sites in the large group of Ca(2+)-binding proteins that exhibit only localized conformational changes. The MUG(SR) algorithm was devised to incorporate side chain torsional rotation as a predictor. The output from MUG(SR) presents groups of residues where each group, typically containing two to five residues, is a potential binding pocket. MUG(SR) was applied to both X-ray apo structures and NMR holo structures, which did not use calcium distance constraints in structure calculations. Predicted pockets were validated by comparison with homologous holo structures. Defining a "correct hit" as a group of residues containing at least two true ligand residues, the sensitivity was at least 90%; whereas for a "correct hit" defined as a group of residues containing at least three true ligand residues, the sensitivity was at least 78%. These data suggest that Ca(2+)-binding pockets are at least partially prepositioned to chelate the ion in the apo form of the protein.
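The "correct hit" criterion in this abstract can be made concrete with a small sketch. The code below is not the MUG(SR) implementation; it only illustrates how sensitivity could be computed when a predicted residue group counts as a hit if it shares at least a chosen number of residues with the true ligand set. The function names, the residue labels, and the example data are all hypothetical.

```python
def is_correct_hit(predicted_group, true_ligands, min_overlap):
    """A predicted group is a hit if it contains at least `min_overlap` true ligand residues."""
    return len(set(predicted_group) & set(true_ligands)) >= min_overlap


def sensitivity(sites, min_overlap):
    """Fraction of known sites recovered by at least one predicted group.

    `sites` is a list of (true_ligands, predicted_groups) pairs, one per known binding site.
    """
    if not sites:
        raise ValueError("at least one known site is required")
    found = sum(
        any(is_correct_hit(group, true_ligands, min_overlap) for group in predicted_groups)
        for true_ligands, predicted_groups in sites
    )
    return found / len(sites)


# Hypothetical data: three known sites with their predicted residue groups.
example_sites = [
    ({"D20", "D22", "D24", "E31"}, [{"D20", "D22", "E31"}, {"K5", "S6"}]),  # best overlap: 3
    ({"E50", "D54", "N56"}, [{"E50", "D54", "G60"}]),                        # best overlap: 2
    ({"D90", "D92", "E99"}, [{"D90", "A91"}]),                               # best overlap: 1
]

sens_two = sensitivity(example_sites, min_overlap=2)
sens_three = sensitivity(example_sites, min_overlap=3)
```

Raising the overlap threshold from two to three residues can only keep or lower the sensitivity, which matches the direction of the figures reported in the abstract (at least 90% versus at least 78%).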
Affiliation(s)
- Xue Wang
- Department of Computer Science, Georgia State University, Atlanta, Georgia 30303, USA
|
30
|
Kirberger M, Wang X, Zhao K, Tang S, Chen G, Yang JJ. Integration of Diverse Research Methods to Analyze and Engineer Ca-Binding Proteins: From Prediction to Production. Curr Bioinform 2010; 5:68-80. [PMID: 20802832 PMCID: PMC2927018 DOI: 10.2174/157489310790596358] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 02/01/2023]
Abstract
In recent years, increasingly sophisticated computational and bioinformatics tools have evolved for the analyses of protein structure, function, ligand interactions, modeling and energetics. This includes the development of algorithms to recursively evaluate side-chain rotamer permutations, identify regions in a 3D structure that meet some set of search parameters, calculate and minimize energy values, and provide high-resolution visual tools for theoretical modeling. Here we discuss the interdependency between different areas of bioinformatics, the evolution of different algorithm design approaches, and finally the transition from theoretical models to real-world design and application as they relate to Ca(2+)-binding proteins. Within this context, it has become evident that significant pre-experimental design and calculations can be modeled through computational methods, thus eliminating potentially unproductive research and increasing our confidence in the correlation between real and theoretical models. Moving from prediction to production, it is anticipated that bioinformatics tools will play an increasingly significant role in research and development, improving our ability to both understand the physiological roles of Ca(2+) and other metals and to extend that knowledge to the design of function-specific synthetic proteins capable of fulfilling different roles in medical diagnostics and therapeutics.
Affiliation(s)
- Michael Kirberger
- Department of Chemistry, Center for Drug Design and Biotechnology, Georgia State University, Atlanta, GA 30303, USA
- Xue Wang
- Department of Computer Science, Georgia State University, Atlanta, Georgia
- Kun Zhao
- Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia, USA
- Shen Tang
- Department of Chemistry, Center for Drug Design and Biotechnology, Georgia State University, Atlanta, GA 30303, USA
- Guantao Chen
- Department of Computer Science, Georgia State University, Atlanta, Georgia
- Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia, USA
- Jenny J. Yang
- Department of Chemistry, Center for Drug Design and Biotechnology, Georgia State University, Atlanta, GA 30303, USA
|
31
|
Tang ZQ, Lin HH, Zhang HL, Han LY, Chen X, Chen YZ. Prediction of functional class of proteins and peptides irrespective of sequence homology by support vector machines. Bioinform Biol Insights 2009; 1:19-47. [PMID: 20066123 PMCID: PMC2789692 DOI: 10.4137/bbi.s315] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022] Open
Abstract
Various computational methods have been used for the prediction of protein and peptide function based on their sequences. A particular challenge is to derive functional properties from sequences that show low or no homology to proteins of known function. Recently, a machine learning method, support vector machines (SVM), has been explored for predicting the functional class of proteins and peptides from amino acid sequence-derived properties independent of sequence similarity, and it has shown promising potential for a wide spectrum of protein and peptide classes, including some low- and non-homologous proteins. This method can thus be explored as a potential tool to complement alignment-based, clustering-based, and structure-based methods for predicting protein function. This article reviews the strategies, current progress, and underlying difficulties in using SVM for predicting the functional class of proteins. The relevant software and web-servers are described. The reported prediction performances in the application of these methods are also presented.
Affiliation(s)
- Zhi Qun Tang
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Hong Huang Lin
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Hai Lei Zhang
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Lian Yi Han
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Xin Chen
- Department of Biotechnology, Zhejiang University, Hang Zhou, Zhejiang Province, P. R. China, 310029
- Yu Zong Chen
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Shanghai Center for Bioinformatics Technology, Shanghai, P. R. China, 201203
|
32
|
Abstract
Bioinformatics is a central discipline in modern life sciences aimed at describing the complex properties of living organisms starting from large-scale data sets of cellular constituents such as genes and proteins. In order for this wealth of information to provide useful biological knowledge, databases and software tools for data collection, analysis and interpretation need to be developed. In this paper, we review recent advances in the design and implementation of bioinformatics resources devoted to the study of metals in biological systems, a research field traditionally at the heart of bioinorganic chemistry. We show how metalloproteomes can be extracted from genome sequences, how structural properties can be related to function, how databases can be implemented, and how hints on interactions can be obtained from bioinformatics.
Affiliation(s)
- Ivano Bertini
- Magnetic Resonance Center (CERM)-University of Florence, Via L. Sacconi 6, Sesto Fiorentino, Italy.
|
33
|
Wang X, Kirberger M, Qiu F, Chen G, Yang JJ. Towards predicting Ca2+-binding sites with different coordination numbers in proteins with atomic resolution. Proteins 2009; 75:787-98. [PMID: 19003991 DOI: 10.1002/prot.22285] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 12/22/2022]
Abstract
Ca(2+)-binding sites in proteins exhibit a wide range of polygonal geometries that directly relate to an equally-diverse set of biological functions. Although the highly-conserved EF-Hand motif has been studied extensively, non-EF-Hand sites exhibit much more structural diversity which has inhibited efforts to determine the precise location of Ca(2+)-binding sites, especially for sites with few coordinating ligands. Previously, we established an algorithm capable of predicting Ca(2+)-binding sites using graph theory to identify oxygen clusters comprised of four atoms lying on a sphere of specified radius, the center of which was the predicted calcium position. Here we describe a new algorithm, MUG (MUltiple Geometries), which predicts Ca(2+)-binding sites in proteins with atomic resolution. After first identifying all the possible oxygen clusters by finding maximal cliques, a calcium center (CC) for each cluster, corresponding to the potential Ca(2+) position, is located to maximally regularize the structure of the (cluster, CC) pair. The structure is then inspected by geometric filters. An unqualified (cluster, CC) pair is further handled by recursively removing oxygen atoms and relocating the CC until its structure is either qualified or contains fewer than four ligand atoms. Ligand coordination is then determined for qualified structures. This algorithm, which predicts both Ca(2+) positions and ligand groups, has been shown to successfully predict over 90% of the documented Ca(2+)-binding sites in three datasets of highly-diversified protein structures with 0.22 to 0.49 Å accuracy. All multiple-binding sites (i.e. sites with a single ligand atom associated with multiple calcium ions) were predicted, as were half of the low-coordination sites (i.e. sites with fewer than four protein ligand atoms) and 14/16 cofactor-coordinating sites. Additionally, this algorithm has the flexibility to incorporate surface water molecules and protein cofactors to further improve the prediction for low-coordination and cofactor-coordinating Ca(2+)-binding sites.
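The clique-finding step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the MUG code: the 4.5 Å oxygen-oxygen cutoff is a hypothetical value, the basic Bron-Kerbosch recursion is one standard way to enumerate maximal cliques, and the plain centroid is only a stand-in for the abstract's calcium center, which is placed to "maximally regularize" the cluster. The geometric filters and the recursive atom removal are omitted.

```python
import math
from itertools import combinations


def build_adjacency(coords, cutoff):
    """Connect two oxygen atoms when they lie within `cutoff` angstroms (assumed value)."""
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


def centroid(points):
    """Plain centroid, used here only as a placeholder for the calcium center."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


# Hypothetical oxygen coordinates: four mutually close atoms plus one distant atom.
oxygens = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 3.0), (20.0, 20.0, 20.0)]
adjacency = build_adjacency(oxygens, cutoff=4.5)

# Keep clusters with at least four oxygen atoms, mirroring the four-ligand threshold in the abstract.
clusters = [c for c in maximal_cliques(adjacency) if len(c) >= 4]
centers = [centroid([oxygens[i] for i in c]) for c in clusters]
```

In this toy input the four nearby atoms form the only cluster that passes the size threshold, and the distant atom is discarded as a singleton clique.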
Affiliation(s)
- Xue Wang
- Department of Computer Science, Georgia State University, Atlanta, Georgia 30303, USA
|
34
|
Faria D, Ferreira AEN, Falcão AO. Enzyme classification with peptide programs: a comparative study. BMC Bioinformatics 2009; 10:231. [PMID: 19630945 PMCID: PMC2724424 DOI: 10.1186/1471-2105-10-231] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/02/2009] [Accepted: 07/24/2009] [Indexed: 11/29/2022] Open
Abstract
Background Efficient and accurate prediction of protein function from sequence is one of the standing problems in Biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded by protein sequences because they need to transform them to obtain data of fixed length. Results We have developed a machine learning methodology, called peptide programs (PPs), to deal directly with protein sequences and compared its performance with that of Support Vector Machines (SVMs) and BLAST in detailed enzyme classification tasks. Overall, the PPs and SVMs had a similar performance in terms of Matthews Correlation Coefficient, but the PPs had generally a higher precision. BLAST performed globally better than both methodologies, but the PPs had better results than BLAST and SVMs for the smaller datasets. Conclusion The higher precision of the PPs in comparison to the SVMs suggests that dealing with sequences is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. The fact that the PPs performed better than BLAST for the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them using a bagging strategy.
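Because this abstract compares methods by Matthews Correlation Coefficient and by precision, a short sketch of both metrics may help. The functions below compute them from binary confusion-matrix counts using the standard definitions. The convention of returning 0.0 when a denominator is zero is an assumption made here for convenience, and the counts in the example are invented, not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Invented counts for two classifiers evaluated on the same 100 positives and 100 negatives.
cautious = {"tp": 60, "tn": 98, "fp": 2, "fn": 40}   # few positive calls, rarely wrong
liberal = {"tp": 90, "tn": 70, "fp": 30, "fn": 10}   # many positive calls, more false alarms

mcc_cautious = matthews_corrcoef(**cautious)
mcc_liberal = matthews_corrcoef(**liberal)
precision_cautious = precision(cautious["tp"], cautious["fp"])
precision_liberal = precision(liberal["tp"], liberal["fp"])
```

The two invented classifiers end up with similar MCC values but clearly different precision, which is the kind of situation the abstract describes when it says precision matters for avoiding annotation errors.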
Affiliation(s)
- Daniel Faria
- Department of Informatics, Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal.
|
35
|
Muthukrishnan S, Garg A, Raghava GPS. Oxypred: prediction and classification of oxygen-binding proteins. GENOMICS PROTEOMICS & BIOINFORMATICS 2008; 5:250-2. [PMID: 18267306 PMCID: PMC5054225 DOI: 10.1016/s1672-0229(08)60012-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
This study describes a method for predicting and classifying oxygen-binding proteins. Firstly, support vector machine (SVM) modules were developed using amino acid composition and dipeptide composition for predicting oxygen-binding proteins, and achieved maximum accuracy of 85.5% and 87.8%, respectively. Secondly, an SVM module was developed based on amino acid composition, classifying the predicted oxygen-binding proteins into six classes with accuracy of 95.8%, 97.5%, 97.5%, 96.9%, 99.4%, and 96.0% for erythrocruorin, hemerythrin, hemocyanin, hemoglobin, leghemoglobin, and myoglobin proteins, respectively. Finally, an SVM module was developed using dipeptide composition for classifying the oxygen-binding proteins, and achieved maximum accuracy of 96.1%, 98.7%, 98.7%, 85.6%, 99.6%, and 93.3% for the above six classes, respectively. All modules were trained and tested by five-fold cross validation. Based on the above approach, a web server Oxypred was developed for predicting and classifying oxygen-binding proteins (available from http://www.imtech.res.in/raghava/oxypred/).
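The two feature sets named in this abstract, amino acid composition and dipeptide composition, are simple to compute. The sketch below is not the Oxypred code; it assumes the 20 standard amino acids, expresses composition as fractions rather than percentages, and skips any residue or residue pair containing a non-standard letter. The resulting 20- and 400-element vectors are the kind of fixed-length input an SVM would then be trained on.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(sequence):
    """Return 20 fractions, one per standard amino acid."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for residue in sequence.upper():
        if residue in counts:  # non-standard letters are ignored (assumption)
            counts[residue] += 1
            total += 1
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return 400 fractions, one per ordered pair of adjacent residues."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # pairs containing non-standard letters are ignored (assumption)
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


# Toy sequence: residues A, C, A, C give the adjacent pairs AC, CA, AC.
aac = amino_acid_composition("ACAC")
dpc = dipeptide_composition("ACAC")
```

Dipeptide composition keeps some local order information that plain amino acid composition discards, at the cost of a 20-fold larger feature vector.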
Affiliation(s)
- S Muthukrishnan
- Institute of Microbial Technology, Sector 39-A, Chandigarh 160036, India
|
36
|
Vilasi S, Ragone R. Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle. FEBS J 2008; 275:763-74. [DOI: 10.1111/j.1742-4658.2007.06242.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
|
37
|
Andreini C, Banci L, Bertini I, Rosato A. Occurrence of Copper Proteins through the Three Domains of Life: A Bioinformatic Approach. J Proteome Res 2008; 7:209-16. [DOI: 10.1021/pr070480u] [Citation(s) in RCA: 153] [Impact Index Per Article: 9.6] [Indexed: 11/28/2022]
|
38
|
Abstract
Metals play a variety of roles in biological processes, and hence their presence in a protein structure can yield vital functional information. Because the residues that coordinate a metal often undergo conformational changes upon binding, detection of binding sites based on simple geometric criteria in proteins without bound metal is difficult. However, aspects of the physicochemical environment around a metal binding site are often conserved even when this structural rearrangement occurs. We have developed a Bayesian classifier using known zinc binding sites as positive training examples and nonmetal binding regions that nonetheless contain residues frequently observed in zinc sites as negative training examples. In order to allow variation in the exact positions of atoms, we average a variety of biochemical and biophysical properties in six concentric spherical shells around the site of interest. At a specificity of 99.8%, this method achieves 75.5% sensitivity in unbound proteins at a positive predictive value of 73.6%. We also test its accuracy on predicted protein structures obtained by homology modeling using templates with 30%-50% sequence identity to the target sequences. At a specificity of 99.8%, we correctly identify at least one zinc binding site in 65.5% of modeled proteins. Thus, in many cases, our model is accurate enough to identify metal binding sites in proteins of unknown structure for which no high sequence identity homologs of known structure exist. Both the source code and a Web interface are available to the public at http://feature.stanford.edu/metals.
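The shell-averaging idea in this abstract, which tolerates small shifts in atom positions, can be sketched as below. This is not the published classifier: the shell width of 1.25 Å is a hypothetical value, each atom carries a single illustrative property, and the Bayesian scoring that would consume these features is omitted. Only the count of six shells comes from the abstract.

```python
import math


def shell_averages(site, atoms, n_shells=6, shell_width=1.25):
    """Average one atomic property in concentric spherical shells around `site`.

    `atoms` is a list of ((x, y, z), value) pairs. Shell k covers distances in
    [k * shell_width, (k + 1) * shell_width). The width is an assumed value.
    Empty shells are reported as 0.0.
    """
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for xyz, value in atoms:
        shell = int(math.dist(site, xyz) // shell_width)
        if shell < n_shells:  # atoms beyond the outermost shell are ignored
            sums[shell] += value
            counts[shell] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Hypothetical atoms along the x axis, each with an illustrative property value.
site = (0.0, 0.0, 0.0)
atoms = [
    ((0.5, 0.0, 0.0), 1.0),   # shell 0
    ((1.0, 0.0, 0.0), 3.0),   # shell 0
    ((2.0, 0.0, 0.0), -1.0),  # shell 1
    ((7.4, 0.0, 0.0), 5.0),   # shell 5
    ((9.0, 0.0, 0.0), 9.0),   # outside all six shells
]
features = shell_averages(site, atoms)
```

Because each feature is an average over a shell rather than a value at an exact position, moving an atom slightly within its shell leaves the feature vector unchanged, which is the robustness to rearrangement that the abstract motivates.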
Affiliation(s)
- Jessica C Ebert
- Department of Genetics, Stanford University, Stanford, California 94305, USA
|
39
|
Bertini I, Cavallaro G. Metals in the “omics” world: copper homeostasis and cytochrome c oxidase assembly in a new light. J Biol Inorg Chem 2007; 13:3-14. [DOI: 10.1007/s00775-007-0316-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 08/06/2007] [Accepted: 10/25/2007] [Indexed: 01/20/2023]
|
40
|
Ong SAK, Lin HH, Chen YZ, Li ZR, Cao Z. Efficacy of different protein descriptors in predicting protein functional families. BMC Bioinformatics 2007; 8:300. [PMID: 17705863 PMCID: PMC1997217 DOI: 10.1186/1471-2105-8-300] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 11/01/2006] [Accepted: 08/17/2007] [Indexed: 02/02/2023] Open
Abstract
Background Sequence-derived structural and physicochemical descriptors have frequently been used in machine learning prediction of protein functional families. There is thus a need to comparatively evaluate the effectiveness of these descriptor-sets by using the same method and parameter optimization algorithm, and to examine whether the combined use of these descriptor-sets helps to improve predictive performance. Six individual descriptor-sets and four combination-sets were evaluated in support vector machines (SVM) prediction of six protein functional families. Results The performance of these descriptor-sets was ranked by Matthews correlation coefficient (MCC), and categorized into two groups based on their performance. While there is no overwhelmingly favourable choice of descriptor-sets, certain trends were found. The combination-sets tend to give slightly but consistently higher MCC values and thus overall best performance such that three out of four combination-sets show slightly better performance compared to one out of six individual descriptor-sets. Conclusion Our study suggests that currently used descriptor-sets are generally useful for classifying proteins and the prediction performance may be enhanced by exploring combinations of descriptors.
Affiliation(s)
- Serene AK Ong
- Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 08-14, 3 Science Drive 2, Singapore 117543, Singapore
- Hong Huang Lin
- Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 08-14, 3 Science Drive 2, Singapore 117543, Singapore
- Yu Zong Chen
- Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 08-14, 3 Science Drive 2, Singapore 117543, Singapore
- Ze Rong Li
- College of Chemistry, Sichuan University, Chengdu, 610064, P.R. China
- Zhiwei Cao
- Shanghai Center for Bioinformatics Technology, 100, Qinzhou Road, Shanghai 200235 P.R. China
|
41
|
Ranganathan S, Tammi M, Gribskov M, Tan TW. Establishing bioinformatics research in the Asia Pacific. BMC Bioinformatics 2006. [PMCID: PMC1764485 DOI: 10.1186/1471-2105-7-s5-s1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022] Open
Abstract
In 1998, the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, was set up to champion the advancement of bioinformatics in the Asia Pacific. By 2002, APBioNet was able to gain sufficient critical mass to initiate the first International Conference on Bioinformatics (InCoB), bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2006 Conference was organized as the 5th annual conference of the Asia-Pacific Bioinformatics Network, on Dec. 18–20, 2006 in New Delhi, India, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand) and Busan (South Korea). This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. It offers a snapshot of the growing research excellence in bioinformatics of the region as we embark on a trajectory of establishing a solid bioinformatics research culture in the Asia Pacific that is able to contribute fully to the global bioinformatics community.
|