1
Le NQK, Yapp EKY, Yeh HY. ET-GRU: using multi-layer gated recurrent units to identify electron transport proteins. BMC Bioinformatics 2019; 20:377. [PMID: 31277574 PMCID: PMC6612191 DOI: 10.1186/s12859-019-2972-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 03/03/2019] [Accepted: 06/27/2019] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND The electron transport chain is a series of protein complexes embedded in the process of cellular respiration, which is an important process for transferring electrons and other macromolecules throughout the cell. It is also the major process for extracting energy via redox reactions in the oxidation of sugars. Many studies have implicated electron transport proteins in a variety of human diseases, e.g., diabetes, Parkinson's disease, and Alzheimer's disease. A few bioinformatics studies have been conducted to identify electron transport proteins, but their performance leaves considerable room for improvement. Here, we present a novel deep neural network architecture to address this problem. RESULTS Most previous studies could not feed the original position-specific scoring matrix (PSSM) profiles into neural networks, which led to a loss of information and kept the networks from achieving the best results. In this paper, we present a novel approach that applies deep gated recurrent units (GRU) to full PSSMs to resolve this problem. Our approach predicts electron transporters with cross-validation and independent test accuracies of 93.5% and 92.3%, respectively, and demonstrates superior performance to all of the state-of-the-art predictors for electron transport proteins. CONCLUSIONS Through this study, we provide ET-GRU, a web server for discriminating electron transport proteins in particular and other protein functions in general. Our results could also promote the use of GRU in computational biology, especially in protein function prediction.
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore, 639798 Singapore
- Edward Kien Yee Yapp
  - Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, #08-04, Innovis, Singapore, 138634 Singapore
- Hui-Yuan Yeh
  - Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore, 639798 Singapore
2
Li T, Chen Y, Li T, Jia C. Recognition of Protein Pupylation Sites by Adopting Resampling Approach. Molecules 2018; 23:3097. [PMID: 30486421 PMCID: PMC6321382 DOI: 10.3390/molecules23123097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/13/2018] [Revised: 11/21/2018] [Accepted: 11/22/2018] [Indexed: 12/28/2022] Open
Abstract
With the in-depth study of posttranslational modification sites, protein ubiquitination has become a key problem in studying the molecular mechanism of posttranslational modification. Pupylation is a widely used process in which a prokaryotic ubiquitin-like protein (Pup) is attached to a substrate through a series of biochemical reactions. However, experimental methods for identifying pupylation sites are often time-consuming and laborious. This study proposes an improved approach for predicting pupylation sites. First, the Pearson correlation coefficient was used to reflect the correlation among different amino acid pairs, calculated from the frequency of each amino acid. Second, the multiple types of features were filtered separately, in descending ranked order of their Pearson correlation coefficient values. Third, to obtain a balanced dataset, the K-means principal component analysis (KPCA) oversampling technique was employed to synthesize new positive samples, and a fuzzy undersampling method was employed to reduce the number of negative samples. Finally, the performance of the method was verified by jackknife and 10-fold cross-validation tests. The average results of 10-fold cross-validation showed a sensitivity (Sn) of 90.53%, a specificity (Sp) of 99.8%, an accuracy (Acc) of 95.09%, and a Matthews correlation coefficient (MCC) of 0.91. Moreover, an independent test dataset was used to further measure performance, and the predictions achieved an Acc of 83.75% and an MCC of 0.49, which was superior to previous predictors. The better performance and stability of the proposed method show that it is an effective way to predict pupylation sites.
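The four measures reported in this abstract follow the standard confusion-matrix definitions. The sketch below is only an illustration of those definitions, not the authors' code: the function name `confusion_metrics` and the example counts are hypothetical, and it assumes both classes are present so that no denominator for Sn or Sp is zero.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, Acc and MCC from confusion-matrix counts.

    Assumes at least one positive and one negative sample (hypothetical helper).
    """
    sn = tp / (tp + fn)                      # sensitivity: recall on positives
    sp = tn / (tn + fp)                      # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in confusion_metrics(tp=90, fn=10, tn=95, fp=5).items():
        print(f"{name}: {value:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it can drop sharply on an imbalanced independent test set even when accuracy stays high, which is consistent with the gap between the cross-validation and independent-test figures quoted above.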
Affiliation(s)
- Tao Li
  - School of Transportation Management, Dalian Maritime University, Dalian 116026, China.
  - China Waterborne Transport Research Institute, Beijing 100088, China.
- Yan Chen
  - School of Transportation Management, Dalian Maritime University, Dalian 116026, China.
- Taoying Li
  - School of Transportation Management, Dalian Maritime University, Dalian 116026, China.
- Cangzhi Jia
  - College of Science, Dalian Maritime University, Dalian 116026, China.
3
Nan X, Bao L, Zhao X, Zhao X, Sangaiah AK, Wang GG, Ma Z. EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites. Molecules 2017; 22:1463. [PMID: 28872627 PMCID: PMC6151806 DOI: 10.3390/molecules22091463] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/23/2017] [Revised: 08/29/2017] [Accepted: 08/30/2017] [Indexed: 01/20/2023] Open
Abstract
Protein pupylation is a type of post-translational modification that plays a crucial role in the cellular function of bacterial organisms. To gain better insight into the mechanisms underlying pupylation, an initial but important step is to identify pupylation sites. To date, several computational methods have been established for the prediction of pupylation sites; these usually construct the negative samples artificially from the verified pupylation proteins to train the classifiers. However, if this process is not done properly, it can dramatically affect the performance of the final predictor. In this work, unlike previous computational methods, we propose an enhanced positive-unlabeled learning algorithm (EPuL) for the pupylation site prediction problem, which uses only positive and unlabeled samples. First, we separate the training dataset into the positive dataset and the unlabeled dataset, which contains the remaining non-annotated lysine residues. Then, the EPuL algorithm is used to select the reliably negative initial dataset and to iteratively pick out the non-pupylation sites. The proposed method achieved an accuracy of 90.24%, an area under the curve (AUC) of 0.93, and an MCC of 0.81 in 10-fold cross-validation. A user-friendly web server for predicting pupylation sites was developed and made freely available at http://59.73.198.144:8080/EPuL.
Affiliation(s)
- Xuanguo Nan
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China.
- Lingling Bao
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China.
- Xiaosa Zhao
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China.
- Xiaowei Zhao
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China.
- Arun Kumar Sangaiah
  - School of Computing Science and Engineering, VIT University, Vellore 632014, Tamil Nadu, India.
- Gai-Ge Wang
  - School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China.
- Zhiqiang Ma
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China.
4
Unhelkar MH, Duong VT, Enendu KN, Kelly JE, Tahir S, Butts CT, Martin RW. Structure prediction and network analysis of chitinases from the Cape sundew, Drosera capensis. Biochim Biophys Acta Gen Subj 2017; 1861:636-643. [PMID: 28040565 PMCID: PMC6679993 DOI: 10.1016/j.bbagen.2016.12.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/17/2016] [Revised: 12/06/2016] [Accepted: 12/09/2016] [Indexed: 12/28/2022]
Abstract
BACKGROUND Carnivorous plants possess diverse sets of enzymes with novel functionalities applicable to biotechnology, proteomics, and bioanalytical research. Chitinases constitute an important class of such enzymes, with future applications including human-safe antifungal agents and pesticides. Here, we compare chitinases from the genome of the carnivorous plant Drosera capensis to those from related carnivorous plants and model organisms. METHODS Using comparative modeling, in silico maturation, and molecular dynamics simulation, we produce models of the mature enzymes in aqueous solution. We utilize network analytic techniques to identify similarities and differences in chitinase topology. RESULTS Here, we report molecular models and functional predictions from protein structure networks for eleven new chitinases from D. capensis, including a novel class IV chitinase with two active domains. This architecture has previously been observed in microorganisms but not in plants. We use a combination of comparative and de novo structure prediction followed by molecular dynamics simulation to produce models of the mature forms of these proteins in aqueous solution. Protein structure network analysis of these and other plant chitinases reveals characteristic features of the two major chitinase families. GENERAL SIGNIFICANCE This work demonstrates how computational techniques can facilitate moving quickly from raw sequence data to refined structural models and comparative analysis, and selecting promising candidates for subsequent biochemical characterization. This capability is increasingly important given the large and growing body of data from high-throughput genome sequencing, which makes experimental characterization of every target impractical.
Affiliation(s)
- Megha H Unhelkar
  - Department of Chemistry, University of California, Irvine, Irvine, CA 92697, USA
- Vy T Duong
  - Department of Chemistry, University of California, Irvine, Irvine, CA 92697, USA; Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA
- Kaosoluchi N Enendu
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA
- John E Kelly
  - Department of Chemistry, University of California, Irvine, Irvine, CA 92697, USA
- Seemal Tahir
  - Department of Chemistry, University of California, Irvine, Irvine, CA 92697, USA
- Carter T Butts
  - Department of Sociology, University of California, Irvine, Irvine, CA 92697, USA; Department of Electrical Engineering and Computer Science, University of California, Irvine, Irvine, CA 92697, USA; Department of Statistics, University of California, Irvine, CA 92697, USA.
- Rachel W Martin
  - Department of Chemistry, University of California, Irvine, Irvine, CA 92697, USA; Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA.
5
Zhang X, Liu S. RBPPred: predicting RNA-binding proteins from sequence using SVM. Bioinformatics 2016; 33:854-862. [DOI: 10.1093/bioinformatics/btw730] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 06/25/2016] [Accepted: 11/16/2016] [Indexed: 11/13/2022] Open
6
Saravanan KM, Suvaithenamudhan S, Parthasarathy S, Selvaraj S. Pairwise contact energy statistical potentials can help to find probability of point mutations. Proteins 2016; 85:54-64. [PMID: 27761949 DOI: 10.1002/prot.25191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/04/2016] [Revised: 06/16/2016] [Accepted: 10/13/2016] [Indexed: 11/10/2022]
Abstract
To adopt a particular fold, a protein requires several interactions between its amino acid residues. The energetic contribution of these residue-residue interactions can be approximated by extracting statistical potentials from known high-resolution structures. Several methods based on statistical potentials extracted from unrelated proteins have been found to improve prediction of the probability of point mutations. We postulate that statistical potentials extracted from known structures of similar folds with varying sequence identity can be a powerful tool for examining the probability of point mutation. With this in mind, we have derived pairwise residue and atomic contact energy potentials for the different functional families that adopt the (α/β)8 TIM-barrel fold. We carried out computational point mutations at various conserved residue positions in the yeast triose phosphate isomerase enzyme, for which experimental results have already been reported. We also performed molecular dynamics simulations on a subset of point mutants for comparison. The difference in pairwise residue and atomic contact energy between the wild type and the various point mutants reveals the probability of mutation at a particular position. Interestingly, we found that our computational prediction agrees with the experimental studies of Silverman et al. (Proc Natl Acad Sci 2001;98:3092-3097) and performs better than iMutant and the Cologne University Protein Stability Analysis Tool. The present work thus suggests that deriving pairwise contact energy potentials and running molecular dynamics simulations of functionally important folds could help predict the probability of point mutations, which may ultimately reduce the time and cost of mutation experiments.
Affiliation(s)
- K M Saravanan
  - Centre of Advanced Study in Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai, Tamilnadu, 600 025, India
- S Suvaithenamudhan
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tirchirappalli, Tamilnadu, 620 024, India
- S Parthasarathy
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tirchirappalli, Tamilnadu, 620 024, India
- S Selvaraj
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tirchirappalli, Tamilnadu, 620 024, India
7
Computational Prediction of RNA-Binding Proteins and Binding Sites. Int J Mol Sci 2015; 16:26303-17. [PMID: 26540053 PMCID: PMC4661811 DOI: 10.3390/ijms161125952] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 10/01/2015] [Revised: 10/20/2015] [Accepted: 10/23/2015] [Indexed: 11/19/2022] Open
Abstract
Protein–RNA interactions play vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%–8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. A number of experimental methods have been developed for determining protein–RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein–RNA binding sites by combining various machine learning methods with abundant sequence and/or structural features. These computational approaches are of three kinds: prediction from protein sequence, prediction from protein structure, and protein–RNA docking. In this paper, we review all existing studies of predictions of RNA-binding sites, RBPs, and complexes, including the data sets used in different approaches, the sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions.
8
Hasan MM, Zhou Y, Lu X, Li J, Song J, Zhang Z. Computational Identification of Protein Pupylation Sites by Using Profile-Based Composition of k-Spaced Amino Acid Pairs. PLoS One 2015; 10:e0129635. [PMID: 26080082 PMCID: PMC4469302 DOI: 10.1371/journal.pone.0129635] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 02/05/2015] [Accepted: 05/10/2015] [Indexed: 11/20/2022] Open
Abstract
Prokaryotic proteins are regulated by pupylation, a type of post-translational modification that contributes to cellular function in bacterial organisms. In the pupylation process, tagging with the prokaryotic ubiquitin-like protein (Pup) is functionally analogous to ubiquitination in that it marks target proteins for proteasomal degradation. To date, several experimental methods have been developed to identify pupylated proteins and their pupylation sites, but these experimental methods are generally laborious and costly. Therefore, computational methods that can accurately predict potential pupylation sites based on protein sequence information are highly desirable. In this paper, a novel predictor termed pbPUP has been developed for accurate prediction of pupylation sites. In particular, a sophisticated sequence encoding scheme [i.e. the profile-based composition of k-spaced amino acid pairs (pbCKSAAP)] is used to represent the sequence patterns and evolutionary information of the sequence fragments surrounding pupylation sites. Then, a Support Vector Machine (SVM) classifier is trained using the pbCKSAAP encoding scheme. The final pbPUP predictor achieves an AUC value of 0.849 in 10-fold cross-validation tests and outperforms other existing predictors on a comprehensive independent test dataset. The proposed method is anticipated to be a helpful computational resource for the prediction of pupylation sites. The web server and curated datasets in this study are freely available at http://protein.cau.edu.cn/pbPUP/.
Affiliation(s)
- Md. Mehedi Hasan
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Yuan Zhou
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xiaotian Lu
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Jinyan Li
  - Advanced Analytics Institute and Centre for Health Technologies, University of Technology, Sydney, 81 Broadway, NSW 2007, Australia
- Jiangning Song
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - Monash Bioinformatics Platform and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC 3800, Australia
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
9
Vijayabaskar MS, Vishveshwara S. Insights into the fold organization of TIM barrel from interaction energy based structure networks. PLoS Comput Biol 2012; 8:e1002505. [PMID: 22615547 PMCID: PMC3355060 DOI: 10.1371/journal.pcbi.1002505] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 10/06/2011] [Accepted: 03/12/2012] [Indexed: 11/17/2022] Open
Abstract
There are many well-known examples of proteins with low sequence similarity adopting the same structural fold. This aspect of the sequence-structure relationship has been extensively studied both experimentally and theoretically, but with limited success. Most of the studies consider remote homology or “sequence conservation” as the basis for their understanding. Recently, an “interaction energy” based network formalism (Protein Energy Networks (PENs)) was developed to understand the determinants of protein structures. In this paper we have used these PENs to investigate the common non-covalent interactions and their collective features which stabilize the TIM barrel fold. We have also developed a method of aligning PENs in order to understand the spatial conservation of interactions in the fold. We have identified key common interactions responsible for the conservation of the TIM fold, despite high sequence dissimilarity. For instance, the central beta barrel of the TIM fold is stabilized by long-range high-energy electrostatic interactions and low-energy contiguous vdW interactions in certain families. The other interfaces, like the helix-sheet or the helix-helix, seem to be devoid of any high-energy conserved interactions. Conserved interactions in the loop regions around the catalytic site of the TIM fold have also been identified, pointing out their significance in both structural and functional evolution. Based on these investigations, we have developed a novel network-based phylogenetic analysis for remote homologues, which can perform better than sequence-based phylogeny. Such an analysis is more meaningful from both structural and functional evolutionary perspectives. We believe that the information obtained through the “interaction conservation” viewpoint, and the subsequently developed method of structure network alignment, can shed new light on the fields of fold organization and de novo computational protein design.
Proteins are polymers of amino acids that fold into unique three-dimensional structures to perform cellular functions. This structure formation has been shown to depend on the amino acid sequences. But examples of proteins with diverse sequences retaining a similar structural fold are so numerous that we can no longer consider this phenomenon an exception. Therefore, this non-canonical relationship has been studied extensively, mostly by studying the remote sequence similarities between proteins. Here we have attempted to address this problem by analyzing the similarities in the spatial interactions among amino acids. Since the protein structure is a result of different interactions, we have considered the proteins as networks of interacting amino acids to derive the common interactions within a popular structural fold called the TIM barrel fold. We were able to find common interactions among different families of the TIM fold and generalize the patterns of interactions by which the fold is maintained despite sequence diversity. The results substantiate our hypothesis that interaction conservation might be a driving factor in fold formation, and this new outlook can be used extensively in engineering proteins with better biophysical characteristics.
Affiliation(s)
- M S Vijayabaskar
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
10
Chen Z, Chen YZ, Wang XF, Wang C, Yan RX, Zhang Z. Prediction of ubiquitination sites by using the composition of k-spaced amino acid pairs. PLoS One 2011; 6:e22930. [PMID: 21829559 PMCID: PMC3146527 DOI: 10.1371/journal.pone.0022930] [Citation(s) in RCA: 136] [Impact Index Per Article: 10.5] [Received: 04/12/2011] [Accepted: 07/01/2011] [Indexed: 01/08/2023] Open
Abstract
As one of the most important reversible protein post-translational modifications, ubiquitination has been reported to be involved in many biological processes and to be closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and the corresponding ubiquitination sites. Here, a new bioinformatics tool named CKSAAP_UbSite was developed to predict ubiquitination sites from protein sequences. The key feature of CKSAAP_UbSite is that it uses the composition of k-spaced amino acid pairs surrounding a query site (i.e. any lysine in a query sequence) as input to a Support Vector Machine (SVM). When trained and tested on the dataset of yeast ubiquitination sites (Radivojac et al, Proteins, 2010, 78: 365-380), a 100-fold cross-validation on a 1:1 ratio of positive and negative samples revealed that the accuracy and MCC of CKSAAP_UbSite reached 73.40% and 0.4694, respectively. CKSAAP_UbSite has also been intensively benchmarked and exhibits better performance than some existing predictors, suggesting that it can serve as a useful tool for the community. Currently, CKSAAP_UbSite is freely accessible at http://protein.cau.edu.cn/cksaap_ubsite/. Moreover, we also found that the sequence patterns around ubiquitination sites are not conserved across different species. To ensure a reasonable prediction performance, the application of the current CKSAAP_UbSite should be limited to the proteome of yeast.
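The composition of k-spaced amino acid pairs (CKSAAP) can be illustrated with a short Python sketch. This is a generic reading of the encoding, not the CKSAAP_UbSite implementation: the function name `cksaap`, the default `k_max=2`, the normalization by the number of pair positions, and the choice to skip any non-standard character (such as padding symbols) are all assumptions made for the example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment, k_max=2):
    """Encode a sequence fragment as k-spaced pair frequencies for k = 0..k_max.

    Hypothetical helper: returns a flat list of 400 * (k_max + 1) floats.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(fragment) - k - 1   # pair positions at this spacing
        for i in range(n_positions):
            pair = fragment[i] + fragment[i + k + 1]  # residues separated by k others
            if pair in counts:                # skip pairs with non-standard symbols
                counts[pair] += 1
        total = max(n_positions, 1)           # avoid division by zero on short input
        features.extend(counts[p] / total for p in PAIRS)
    return features


if __name__ == "__main__":
    # Made-up fragment centred on a lysine (K), for illustration only.
    vec = cksaap("MSTAVKLPQRE", k_max=2)
    print(len(vec))  # 400 features for each of the three spacings
```

A vector like this, computed for the window around each candidate lysine, is the kind of fixed-length input that can be passed to an SVM classifier as described in the abstract.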
Affiliation(s)
- Zhen Chen
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
  - Bioinformatics Center, College of Biological Sciences, China Agricultural University, Beijing, China
- Yong-Zi Chen
  - Tianjin Cancer Institute, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China
- Xiao-Feng Wang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
  - Bioinformatics Center, College of Biological Sciences, China Agricultural University, Beijing, China
- Chuan Wang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
  - Bioinformatics Center, College of Biological Sciences, China Agricultural University, Beijing, China
- Ren-Xiang Yan
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
  - Bioinformatics Center, College of Biological Sciences, China Agricultural University, Beijing, China
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
  - Bioinformatics Center, College of Biological Sciences, China Agricultural University, Beijing, China
11
Outer membrane proteins can be simply identified using secondary structure element alignment. BMC Bioinformatics 2011; 12:76. [PMID: 21414186 PMCID: PMC3072342 DOI: 10.1186/1471-2105-12-76] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 08/02/2010] [Accepted: 03/17/2011] [Indexed: 02/04/2023] Open
Abstract
Background Outer membrane proteins (OMPs) are frequently found in the outer membranes of gram-negative bacteria, mitochondria and chloroplasts and have been found to play diverse functional roles. Computational discrimination of OMPs from globular proteins and other types of membrane proteins helps accelerate new genome annotation and drug discovery. Results Based on the observation that almost all OMPs consist of antiparallel β-strands in a barrel shape and that their secondary structure arrangements differ from those of other types of proteins, we propose a simple method called SSEA-OMP to identify OMPs using secondary structure element alignment. In intensive benchmark experiments, the proposed SSEA-OMP method performs better than some well-established OMP detection methods. Conclusions The major advantage of SSEA-OMP is its good prediction performance considering its simplicity. The web server implementing the method is freely accessible at http://protein.cau.edu.cn/SSEA-OMP/index.html.