1
Tang T, Zhang X, Li W, Wang Q, Liu Y, Cao X. Co-training based prediction of multi-label protein-protein interactions. Comput Biol Med 2024; 177:108623. [PMID: 38788374 DOI: 10.1016/j.compbiomed.2024.108623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 05/01/2024] [Accepted: 05/16/2024] [Indexed: 05/26/2024]
Abstract
Prediction of protein-protein interaction (PPI) types improves the understanding of the underlying structural characteristics and functions of proteins, and it is naturally a multi-label classification problem. Nominal features describe the physicochemical characteristics of proteins directly and therefore correlate more strongly with the interaction types between proteins than ordered features do. Motivated by this, we propose a multi-label PPI prediction model, CoMPPI (Co-training based Multi-Label prediction of Protein-Protein Interaction), which aims to make full use of both the ordered and the nominal features extracted from protein sequences. Specifically, CoMPPI uses a graph convolutional network (GCN) and a 1D convolution to process the two complementary feature subsets separately, exploiting both local and contextualized information more efficiently. In addition, two multi-type PPI datasets were constructed to eliminate the duplication present in previous datasets. We compared CoMPPI with three state-of-the-art methods on three datasets partitioned using distinct schemes (breadth-first search, depth-first search, and random). CoMPPI consistently outperforms the other methods in all cases, with improvements ranging from 3.81% to 32.40% in Micro-F1. A subsequent ablation experiment confirms the efficacy of the co-training framework for multi-label PPI prediction, indicating promising avenues for future advances in this domain.
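Micro-F1, the metric in which the improvements above are reported, pools true positives, false positives and false negatives over every (protein pair, interaction type) decision before computing precision and recall. The following is a minimal illustration, not CoMPPI's evaluation code; the interaction-type names in the example are assumed STRING-style labels chosen only for illustration.

```python
from typing import Sequence, Set


def micro_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Each element is the set of interaction types for one protein pair.
    Counts are pooled over all (pair, type) decisions before averaging.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for true_types, pred_types in zip(y_true, y_pred):
        tp += len(true_types & pred_types)  # types predicted and present
        fp += len(pred_types - true_types)  # types predicted but absent
        fn += len(true_types - pred_types)  # types present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two protein pairs with STRING-style type labels.
truth = [{"binding", "reaction"}, {"catalysis"}]
preds = [{"binding"}, {"catalysis", "inhibition"}]
print(round(micro_f1(truth, preds), 4))  # 0.6667
```

Because the counts are pooled, frequent interaction types dominate micro-F1; macro-F1, which averages per-type F1 scores, weights rare types equally.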
Affiliation(s)
- Tao Tang
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Nanjing, 210023, Jiangsu, China
- Xiaocai Zhang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, Singapore, 138632, Singapore
- Weizhuo Li
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Nanjing, 210023, Jiangsu, China
- Qing Wang
- School of Management, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Nanjing, 210023, Jiangsu, China
- Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan Rd, Changsha, 410086, Hunan, China; Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China.
- Xiaofeng Cao
- School of Artificial Intelligence, Jilin University, 2699 Qianjin St, Changchun, 130012, Jilin, China
2
Reaching optimized parameter set: protein secondary structure prediction using neural network. Neural Comput Appl 2016. [DOI: 10.1007/s00521-015-2150-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
3
Yousef A, Moghadam Charkari N. SFM: A novel sequence-based fusion method for disease genes identification and prioritization. J Theor Biol 2015. [DOI: 10.1016/j.jtbi.2015.07.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 10/23/2022]
4
Colombatti A, Spessotto P, Doliana R, Mongiat M, Bressan GM, Esposito G. The EMILIN/Multimerin family. Front Immunol 2012; 2:93. [PMID: 22566882 PMCID: PMC3342094 DOI: 10.3389/fimmu.2011.00093] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 11/30/2011] [Accepted: 12/21/2011] [Indexed: 01/12/2023]
Abstract
Elastin microfibrillar interface proteins (EMILINs) and Multimerins (EMILIN1, EMILIN2, Multimerin1, and Multimerin2) constitute a four-member family. In addition to the C-terminal gC1q domain that is typical of the gC1q/TNF superfamily members, they contain a unique N-terminal cysteine-rich EMI domain. These glycoproteins are homotrimeric and assemble into high-molecular-weight multimers. They are predominantly expressed in the extracellular matrix and contribute to several cellular functions, some associated with the gC1q domain and others not yet assigned to, or linked with, other specific regions of the sequence. Among the latter are the control of arterial blood pressure, the inhibition of Bacillus anthracis cell cytotoxicity, the promotion of cell death, a proangiogenic function, and a role in platelet hemostasis. The focus of this review is to highlight the multiplicity of functions and domains of the EMILIN/Multimerin family, with particular emphasis on the regulatory role played by the ligand-receptor interactions of the gC1q domain. EMILIN1 is the most extensively studied member from both the structural and the functional point of view. The NMR structure of the EMILIN1 gC1q domain reveals unique characteristics compared with other gC1q domains: the contact surface of the trimeric assembly is markedly smaller, and although the jelly-roll topology with two β-sheets of antiparallel strands is conserved, the domain has a nine-stranded β-sandwich fold instead of the usual ten-stranded fold. This is likely due to the insertion of nine residues that disrupt the ordered strand organization and form a highly dynamic protruding loop. In this loop, residue E933 is the site of interaction between gC1q and the α4β1 and α9β1 integrins, and, contrary to integrin occupancy, which usually upregulates cell growth, ligation of gC1q by the integrin reduces the proliferative activity of cells.
Affiliation(s)
- Alfonso Colombatti
- Experimental Oncology 2, Centro di Riferimento Oncologico, Istituto di Ricerca e Cura a Carattere Scientifico Aviano, Italy.
5
Doliana R, Veljkovic V, Prljic J, Veljkovic N, De Lorenzo E, Mongiat M, Ligresti G, Marastoni S, Colombatti A. EMILINs interact with anthrax protective antigen and inhibit toxin action in vitro. Matrix Biol 2007; 27:96-106. [PMID: 17988845 DOI: 10.1016/j.matbio.2007.09.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 06/13/2007] [Revised: 09/13/2007] [Accepted: 09/26/2007] [Indexed: 11/29/2022]
Abstract
The informational spectrum method (ISM) is a virtual spectroscopy method for the fast analysis of potential protein-protein relationships. By applying the ISM approach to the GenBank protein database, the vascular proteins EMILIN1 (Elastin Microfibril Interface Located ProteIN), EMILIN2, MMN1, and MMN2 were identified as additional molecules interacting with anthrax protective antigen (PA). This predicted interaction was confirmed experimentally by solid-phase assays using recombinant proteins. The interaction is independent of the presence of divalent cations and does not involve the PA aspartic acid residue at position 683, a critical residue for receptor binding. Indeed, the D683A point mutation completely abolished the ability of PA to intoxicate cells in the presence of Lethal Factor, but it had no effect on the binding of the mutated PA to EMILIN1 and EMILIN2. The ISM approach also led to the identification of the potential interaction sites between PA and EMILINs. Solid-phase protein-protein interaction studies with a PA mutant carrying a deletion at residue D425, together with a deletion mutant of EMILIN2, confirmed the hypothesized interaction site. Our findings imply that the interaction of PA with its cell-surface receptor is unlikely to fully explain the vascular lesions and prominent hemorrhages that follow Bacillus anthracis infection and spread, and they implicate vascular-associated proteins such as EMILINs as potential inhibitory proteins.
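The core ISM computation can be sketched as a cross-spectrum: each sequence is encoded as a numeric signal, its amplitude spectrum is computed with a discrete Fourier transform, and the product of the two spectra peaks at frequencies the sequences share. In the published ISM the per-residue values are electron-ion interaction potentials (EIIP); that table is not reproduced here, so this minimal sketch takes already-encoded signals, and a naive O(N²) DFT is used for self-containment rather than speed.

```python
import cmath
from typing import List, Sequence, Tuple


def amplitude_spectrum(signal: Sequence[float], n: int) -> List[float]:
    """|DFT| of `signal` zero-padded to length n, at frequencies k/n for k = 1..n//2."""
    padded = list(signal) + [0.0] * (n - len(signal))
    spectrum = []
    for k in range(1, n // 2 + 1):  # skip k = 0, which only reflects the mean
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(padded))
        spectrum.append(abs(coeff))
    return spectrum


def cross_spectrum_peak(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Normalized frequency (0 to 0.5) and height of the largest cross-spectral peak."""
    n = max(len(a), len(b))  # pad the shorter signal so both spectra share one grid
    if n < 2:
        raise ValueError("signals must contain at least two values")
    cross = [x * y for x, y in zip(amplitude_spectrum(a, n), amplitude_spectrum(b, n))]
    k = max(range(len(cross)), key=lambda i: cross[i])
    return (k + 1) / n, cross[k]


# Toy signals with a period-4 pattern, standing in for EIIP-encoded sequences.
signal = [1.0, 0.0, -1.0, 0.0] * 5
freq, height = cross_spectrum_peak(signal, signal)
print(freq)  # 0.25
```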
Affiliation(s)
- Roberto Doliana
- Divisione di Oncologia Sperimentale 2, CRO-IRCCS, Aviano, Italy.
6
A procedure for identifying homologous alternative splicing events. BMC Bioinformatics 2007; 8:260. [PMID: 17640387 PMCID: PMC1950890 DOI: 10.1186/1471-2105-8-260] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/22/2007] [Accepted: 07/19/2007] [Indexed: 01/11/2023]
Abstract
Background The study of the functional role of alternative splice isoforms of a gene is a very active area of research in biology. The difficulty of the experimental approach (in particular, in its high-throughput version) leaves ample room for the development of bioinformatics tools that can provide a useful first picture of the problem. Among the possible approaches, one of the simplest is to follow classical protein function annotation protocols and annotate target alternative splice events with the information available from conserved events in other species. However, applying this protocol requires a procedure capable of recognising such events. Here we present a simple but accurate method developed for this purpose. Results We have developed a method for identifying homologous, or equivalent, alternative splicing events, based on the combined use of neural networks and sequence searches. The procedure comprises four steps: (i) BLAST search for homologues of the two isoforms defining the target alternative splicing event; (ii) construction of all possible candidate events; (iii) scoring of these candidates with a series of neural networks; and (iv) filtering of the results. When tested on a set of 473 manually annotated pairs of homologous events, our method performed well, with an accuracy of 0.99, a precision of 0.98 and a sensitivity of 0.93. When no candidates were available, the specificity of our method varied between 0.81 and 0.91. Conclusion The method described in this article identifies homologous alternative splicing events with a good success rate, indicating that such a method could support the functional annotation of alternative splice isoforms.
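The four figures quoted above are standard confusion-matrix ratios. The sketch below shows how they relate to one another; the counts in the usage lines are invented for illustration and are not taken from the 473-pair test set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for a binary decision (here: is a candidate a homologous event?)."""
    tp: int  # homologous events correctly accepted
    fp: int  # non-homologous candidates wrongly accepted
    tn: int  # non-homologous candidates correctly rejected
    fn: int  # homologous events wrongly rejected

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    def accuracy(self) -> float:
        return self._ratio(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def sensitivity(self) -> float:  # also called recall
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return self._ratio(self.tn, self.tn + self.fp)


# Invented counts, for illustration only.
c = Confusion(tp=8, fp=2, tn=85, fn=5)
print(c.accuracy(), c.precision())  # 0.93 0.8
```

If a target has no true homologue, tp and fn are both zero and sensitivity is undefined, so specificity is the natural measure for that case.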
7
Sivan S, Filo O, Siegelmann H. Application of expert networks for predicting proteins secondary structure. Biomol Eng 2007; 24:237-43. [PMID: 17236807 DOI: 10.1016/j.bioeng.2006.12.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/05/2006] [Revised: 12/05/2006] [Accepted: 12/06/2006] [Indexed: 02/02/2023]
Abstract
The present study uses expert neural networks to predict protein secondary structure. We use three independent networks, one for each structure (alpha, beta and coil), as the first-level processing unit; the structure assigned to each residue is decided by a second-level post-processing unit, which uses the Chou and Fasman frequency values Falpha and Fbeta to strengthen or weaken the probability of the specific structure under investigation. The highest prediction accuracy obtained was 76%. Our method requires only modest computational means and a relatively small training set, while still being comparable to previous work. It is not meant to be an alternative to the determination of secondary structure by free energy minimization, integration of the dynamic equations of motion or crystallography, which are expensive, time-consuming and complicated, but rather to provide additional constraints that could be incorporated into larger computing setups to reduce the initial search space for these methods.
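The two-level design can be sketched as follows: the three expert outputs for a residue are rescaled by that residue's Chou-Fasman frequencies and the best-scoring state is kept. The abstract does not give the exact combination rule, so the multiplicative rule below is a hypothetical stand-in, and the propensity values in the example are placeholders rather than the published Chou-Fasman table. The 76% figure is assumed to be a per-residue three-state accuracy (Q3), which `q3` computes.

```python
from typing import Dict, Sequence

STATES = ("H", "E", "C")  # alpha helix, beta strand, coil


def second_level(scores: Dict[str, float], f_alpha: float, f_beta: float) -> str:
    """Choose one state for a residue from the three expert-network outputs.

    Hypothetical rule: helix and strand scores are scaled by the residue's
    Chou-Fasman frequencies; the coil score is left unchanged.
    """
    adjusted = {
        "H": scores["H"] * f_alpha,
        "E": scores["E"] * f_beta,
        "C": scores["C"],
    }
    return max(STATES, key=lambda s: adjusted[s])


def q3(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Fraction of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Placeholder propensities: the helix network wins on raw score,
# but a strand-favouring residue tips the decision to "E".
print(second_level({"H": 0.50, "E": 0.45, "C": 0.40}, f_alpha=0.8, f_beta=1.2))  # E
print(q3("HHEC", "HHCC"))  # 0.75
```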
Affiliation(s)
- Sarit Sivan
- Department of Biomedical Engineering, Technion, Israel Institute of Technology, IIT, Haifa 32000, Israel.
8
Li ZR, Lin HH, Han LY, Jiang L, Chen X, Chen YZ. PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 2006; 34:W32-7. [PMID: 16845018 PMCID: PMC1538821 DOI: 10.1093/nar/gkl305] [Citation(s) in RCA: 203] [Impact Index Per Article: 11.3] [Received: 12/23/2005] [Revised: 01/17/2006] [Accepted: 04/10/2006] [Indexed: 02/01/2023]
Abstract
Sequence-derived structural and physicochemical features have frequently been used in the development of statistical learning models for predicting proteins and peptides of different structural, functional and interaction profiles. PROFEAT (Protein Features) is a web server for computing commonly used structural and physicochemical features of proteins and peptides from amino acid sequence. It computes six feature groups composed of ten features that include 51 descriptors and 1447 descriptor values. The computed features include amino acid composition, dipeptide composition, normalized Moreau-Broto autocorrelation, Moran autocorrelation, Geary autocorrelation, sequence-order-coupling number, quasi-sequence-order descriptors, and the composition, transition and distribution of various structural and physicochemical properties. It can also compute the above autocorrelation descriptors based on user-defined properties. Our computational algorithms were extensively tested, and the computed protein features have been used in a number of published works for predicting proteins of functional classes, protein-protein interactions and MHC-binding peptides. PROFEAT is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/prof/prof.cgi.
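Of the descriptor groups listed above, normalized Moreau-Broto autocorrelation is compact enough to sketch: each residue is replaced by a standardized property value P, and for each lag d the descriptor averages P_i * P_(i+d) over the N - d residue pairs. PROFEAT's own property tables are not reproduced; Kyte-Doolittle hydropathy is used here only as an example scale, and standardizing with the population standard deviation over the 20 amino acids is an assumption of this sketch.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Kyte-Doolittle hydropathy, used only as an example property scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale: Dict[str, float]) -> Dict[str, float]:
    """Rescale a property to mean 0 and unit variance over the 20 residues."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def moreau_broto(seq: str, scale: Dict[str, float], max_lag: int) -> List[float]:
    """Normalized Moreau-Broto autocorrelation for lags 1..max_lag."""
    p = standardize(scale)
    values = [p[aa] for aa in seq]  # raises KeyError for non-standard residues
    n = len(values)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    return [
        sum(values[i] * values[i + d] for i in range(n - d)) / (n - d)
        for d in range(1, max_lag + 1)
    ]


print(len(moreau_broto("MKTAYIAKQRQISFVKSHFSRQ", KYTE_DOOLITTLE, max_lag=5)))  # 5
```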
Affiliation(s)
- Z. R. Li
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- College of Chemistry, Sichuan University, Chengdu, 610064, P. R. China
- H. H. Lin
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- L. Y. Han
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- L. Jiang
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- X. Chen
- Department of Biotechnology, Zhejiang University, Hangzhou, 310029, P. R. China
- Y. Z. Chen
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- Shanghai Center for Bioinformation Technology, Shanghai, 201203, P. R. China
9
Guo YZ, Li M, Lu M, Wen Z, Wang K, Li G, Wu J. Classifying G protein-coupled receptors and nuclear receptors on the basis of protein power spectrum from fast Fourier transform. Amino Acids 2006; 30:397-402. [PMID: 16773242 DOI: 10.1007/s00726-006-0332-z] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Received: 11/23/2005] [Accepted: 01/04/2006] [Indexed: 10/24/2022]
Abstract
As potential drug targets, G protein-coupled receptors (GPCRs) and nuclear receptors (NRs) are a focus of pharmaceutical research. It is of great practical significance to develop an automated and reliable method to facilitate the identification of novel receptors. In this study, a support vector machine method based on the fast Fourier transform was proposed to classify GPCRs and NRs from the hydrophobicity of proteins. The models for all the GPCR families and NR subfamilies were trained and validated using the jackknife test, and the results obtained are quite promising. The method also performed well on independent GPCR and NR datasets, which indicates its applicability. Two web servers implementing the prediction are available at http://chem.scu.edu.cn/blast/Pred-GPCR and http://chem.scu.edu.cn/blast/Pred-NR.
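A minimal sketch of the Fourier step: the hydrophobicity profile of a sequence is transformed with a discrete Fourier transform, and the power at the lowest frequencies is kept as a fixed-length feature vector for the SVM. The abstract does not specify the hydrophobicity scale, the normalization, or how many coefficients were used, so the profile is taken as input here and `n_features` is a hypothetical parameter; a plain O(N²) DFT stands in for a library FFT to keep the example dependency-free.

```python
import cmath
from typing import List, Sequence


def power_spectrum_features(profile: Sequence[float], n_features: int) -> List[float]:
    """Power |X_k|^2 / N for k = 1..n_features of a numeric residue profile."""
    n = len(profile)
    if not 1 <= n_features <= n // 2:
        raise ValueError("n_features must be between 1 and len(profile) // 2")
    feats = []
    for k in range(1, n_features + 1):
        x_k = sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(profile))
        feats.append(abs(x_k) ** 2 / n)
    return feats


# Toy profile alternating hydrophobic/hydrophilic residues (period 2).
profile = [1.0, -1.0] * 4
print([round(x, 6) for x in power_spectrum_features(profile, 4)])  # [0.0, 0.0, 0.0, 8.0]
```

With the period-2 toy profile, all of the power falls at the highest frequency (k = N/2), which is the kind of periodic signal a hydrophobicity spectrum is meant to expose.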
Affiliation(s)
- Y-Z Guo
- College of Chemistry, Sichuan University, Chengdu, China
10
Gupta N, Mangal N, Biswas S. Evolution and similarity evaluation of protein structures in contact map space. Proteins 2006; 59:196-204. [PMID: 15726585 DOI: 10.1002/prot.20415] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/05/2022]
Abstract
Prediction of fold from the amino acid sequence of a protein has been an active area of research in the past few years, but the limited accuracy of existing techniques emphasizes the need for new approaches. In this study, we use contact map prediction as an intermediate step in fold prediction from sequence. A contact map is a reduced, graph-theoretic representation of a protein that models the local and global inter-residue contacts in its structure. We start with a population of random contact maps for the protein sequence and "evolve" the population to a "high-feasibility" configuration using a genetic algorithm. A neural network is employed to assess the feasibility of contact maps based on four physically relevant properties. We also introduce five parameters, based on algebraic graph theory and physical considerations, that can be used to judge the structural similarity between proteins through their contact maps. To predict the fold of a given amino acid sequence, we predict a contact map that sufficiently approximates the structure of the corresponding protein. We then assess the similarity of this contact map to the representative contact map of each fold; the fold with the closest match is the predicted fold for the input sequence. We found that our feasibility measure can differentiate between feasible and infeasible contact maps. Further, this novel approach predicts folds from sequences significantly better than a random predictor.
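A contact map can be sketched directly from Cα coordinates: residues i and j are in contact when their distance is below a cutoff. The 8 Å cutoff and the minimum sequence separation are common conventions assumed here, not values from the paper, and `contact_overlap` (a Jaccard overlap of contact sets) is a simple illustrative similarity, not one of the five graph-theoretic parameters the authors introduce.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
ContactMap = List[List[int]]


def contact_map(ca: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 2) -> ContactMap:
    """Symmetric 0/1 matrix: 1 where |i - j| >= min_sep and distance <= cutoff (in Å)."""
    n = len(ca)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(ca[i], ca[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def contact_overlap(a: ContactMap, b: ContactMap) -> float:
    """Jaccard overlap of the contact sets of two equally sized maps."""
    if len(a) != len(b):
        raise ValueError("maps must have the same size")
    n = len(a)
    contacts_a = {(i, j) for i in range(n) for j in range(i + 1, n) if a[i][j]}
    contacts_b = {(i, j) for i in range(n) for j in range(i + 1, n) if b[i][j]}
    union = contacts_a | contacts_b
    return len(contacts_a & contacts_b) / len(union) if union else 1.0


# Three residues on a line, 5 Å apart: residues 0 and 2 are 10 Å apart.
trace = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(contact_map(trace, cutoff=12.0)[0][2], contact_map(trace)[0][2])  # 1 0
```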
Affiliation(s)
- Nitin Gupta
- Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India.
11
Jia J, Yang L, Zhang ZZ. EHPred: an SVM-based method for epoxide hydrolases recognition and classification. J Zhejiang Univ Sci B 2005; 7:1-6. [PMID: 16365918 PMCID: PMC1361752 DOI: 10.1631/jzus.2006.b0001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
A two-layer method based on support vector machines (SVMs) has been developed to distinguish epoxide hydrolases (EHs) from other enzymes and to classify their subfamilies from their primary protein sequences. SVM classifiers were built using three different feature vectors extracted from the primary sequence of EHs: the amino acid composition (AAC), the dipeptide composition (DPC), and the pseudo-amino acid composition (PAAC). Validated by 5-fold cross-validation, the first-layer SVM classifier differentiates EHs from non-EHs with an accuracy of 94.2% and a Matthews correlation coefficient (MCC) of 0.84. Using 2-fold cross-validation, the PAAC-based second-layer SVM further classifies EH subfamilies with an overall accuracy of 90.7% and an MCC of 0.87, compared with 80.0% for AAC and 84.9% for DPC. A program called EHPred has also been developed to help users recognize EHs and classify their subfamilies from primary protein sequences with greater accuracy.
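The Matthews correlation coefficient reported above combines all four confusion-matrix counts into a single value between -1 and 1, which makes it more informative than accuracy when one class greatly outnumbers the other. A minimal sketch of the binary (first-layer) case follows; returning 0 when a row or column total is zero is a common convention assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # undefined when any row or column total is zero
    return (tp * tn - fp * fn) / denom


print(mcc(10, 10, 0, 0))  # 1.0
print(mcc(5, 5, 5, 5))    # 0.0
# A classifier that labels everything negative on a 10/90 split scores
# 90% accuracy but an MCC of 0.
print(mcc(tp=0, tn=90, fp=0, fn=10))  # 0.0
```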
Affiliation(s)
- Jia Jia
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
- Liang Yang
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
- Zi-zhang Zhang
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
- Department of Chemistry, Zhejiang University, Hangzhou 310027, China
12
Gupta K, Thomas D, Vidya SV, Venkatesh KV, Ramakumar S. Detailed protein sequence alignment based on Spectral Similarity Score (SSS). BMC Bioinformatics 2005; 6:105. [PMID: 15850477 PMCID: PMC1131888 DOI: 10.1186/1471-2105-6-105] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 02/01/2005] [Accepted: 04/23/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND The chemical properties and biological function of a protein are a direct consequence of its primary structure. Several algorithms have been developed to determine the alignment and similarity of primary protein sequences. However, character-based similarity cannot provide insight into the structural aspects of a protein. We present a method based on spectral similarity to compare subsequences of amino acids that behave similarly but align poorly when amino acids are treated as mere characters. This approach computes a similarity score between sequences based on any given attribute, such as the hydrophobicity of amino acids, from spectral information obtained after partial conversion to the frequency domain. RESULTS Distance matrices were developed for various branches of the human kinome, that is, the full complement of human kinases, and they matched the phylogenetic tree of the human kinome, establishing the efficacy of the algorithm for global alignment. PKCd and PKCe kinases share close biological properties and structural similarities but do not give high scores with character-based alignments. Detailed comparison established close similarities between subsequences that have no significant character identity. Comparison of their known 3D structures established that the algorithm can pick out subsequences that character-based matching algorithms do not consider similar but that share structural similarities. Similarly, many subsequences with low character identity were picked out between the xyna-theau and xyna-clotm F/10 xylanases, and comparison of the 3D structures of these subsequences confirmed their structural similarity. CONCLUSION An algorithm has been developed, inspired by the successful application of spectral similarity to music sequences. The method captures subsequences that are not aligned by traditional character-based alignment tools but give rise to similar secondary and tertiary structures. The Spectral Similarity Score (SSS) is an extension of conventional similarity methods, and the results indicate that it holds strong potential for the analysis of various biological sequences and structural variations in proteins.
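The central idea of comparing subsequences through their spectra rather than residue by residue can be illustrated with a short sketch: two equal-length windows of a numeric property profile (for example, hydrophobicity) are converted to DFT magnitude spectra and compared by cosine similarity. This is not the published SSS scoring scheme, whose details the abstract does not give. It only shows why a spectral comparison can match windows that character alignment misses: a circular shift of a window leaves its magnitude spectrum unchanged.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(window: Sequence[float]) -> List[float]:
    """|DFT| of a window for frequencies k = 0..len(window)//2."""
    n = len(window)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def spectral_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of the magnitude spectra of two equal-length windows."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and of equal length")
    sa, sb = magnitude_spectrum(a), magnitude_spectrum(b)
    dot = sum(x * y for x, y in zip(sa, sb))
    norm = math.sqrt(sum(x * x for x in sa)) * math.sqrt(sum(y * y for y in sb))
    return dot / norm if norm else 0.0


window = [1.8, -3.5, 4.5, -0.4, 2.8, -3.9]  # toy property values
shifted = window[2:] + window[:2]            # same pattern, circularly shifted
print(round(spectral_similarity(window, shifted), 6))  # 1.0
```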
Affiliation(s)
- Kshitiz Gupta
- Department of Computer Science & Engineering, Indian Institute of Technology, Bombay, Mumbai, India
- Department of Chemical Engineering, Indian Institute of Technology, Bombay, Mumbai, India
- School of Biosciences & Bioengineering, Indian Institute of Technology, Bombay, Mumbai, India
- Dina Thomas
- Department of Computer Science & Engineering, Indian Institute of Technology, Bombay, Mumbai, India
- SV Vidya
- Department of Physics, Indian Institute of Science, Bangalore, India
- KV Venkatesh
- Department of Chemical Engineering, Indian Institute of Technology, Bombay, Mumbai, India
- School of Biosciences & Bioengineering, Indian Institute of Technology, Bombay, Mumbai, India
- S Ramakumar
- Department of Physics, Indian Institute of Science, Bangalore, India
- Bioinformatics Center, Indian Institute of Science, Bangalore, India
13
Bhasin M, Raghava GPS. ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res 2004; 32:W414-9. [PMID: 15215421 PMCID: PMC441488 DOI: 10.1093/nar/gkh350] [Citation(s) in RCA: 208] [Impact Index Per Article: 10.4] [Received: 01/14/2004] [Accepted: 02/09/2004] [Indexed: 11/13/2022]
Abstract
Automated prediction of the subcellular localization of proteins is an important step in the functional annotation of genomes. Existing subcellular localization prediction methods are based on either the amino acid composition or the N-terminal characteristics of proteins. In this paper, a support vector machine (SVM) was used to predict the subcellular location of eukaryotic proteins from different features such as amino acid composition, dipeptide composition and physico-chemical properties. The SVM module based on dipeptide composition performed better than the SVM modules based on amino acid composition or physico-chemical properties. In addition, PSI-BLAST was used to search the query sequence against a dataset of experimentally annotated proteins to predict its subcellular location. To improve prediction accuracy, we developed a hybrid module using all features of a protein, with an input vector of 458 dimensions (400 dipeptide compositions, 33 properties, 20 amino acid compositions of the protein and 5 values from the PSI-BLAST output). Using this hybrid approach, the prediction accuracies for nuclear, cytoplasmic, mitochondrial and extracellular proteins reached 95.3, 85.2, 68.2 and 88.9%, respectively. The overall prediction accuracies of the SVM modules based on amino acid composition, physico-chemical properties, dipeptide composition and the hybrid approach were 78.1, 77.8, 82.9 and 88.0%, respectively. The accuracy of all modules was evaluated using 5-fold cross-validation. When only predictions with a reliability index ≥ 3 are considered, 73.5% of predictions can be made with an accuracy of 96.4%. Based on this approach, an online web server, ESLpred, was developed, which is available at http://www.imtech.res.in/raghava/eslpred/.
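The two composition features that make up most of the 458-dimensional hybrid vector, the 20 amino acid compositions and the 400 dipeptide compositions, can be computed as below. This minimal sketch expresses both as fractions (the paper may scale them differently, for example as percentages) and omits the 33 physico-chemical properties and the 5 PSI-BLAST-derived values, which the abstract does not define.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def _check(seq: str, min_len: int) -> None:
    if len(seq) < min_len or any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError(f"need at least {min_len} standard residues")


def aac(seq: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids (20 values)."""
    _check(seq, 1)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dpc(seq: str) -> List[float]:
    """Fraction of each of the 400 overlapping dipeptides (400 values)."""
    _check(seq, 2)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    return [counts[dp] / total for dp in DIPEPTIDES]


features = aac("MKTAYIAKQR") + dpc("MKTAYIAKQR")
print(len(features))  # 420
```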
Affiliation(s)
- Manoj Bhasin
- Bioinformatics Centre, Institute of Microbial Technology, Sector 39A, Chandigarh, India
14
Bhasin M, Raghava GPS. Classification of nuclear receptors based on amino acid composition and dipeptide composition. J Biol Chem 2004; 279:23262-6. [PMID: 15039428 DOI: 10.1074/jbc.m401932200] [Citation(s) in RCA: 175] [Impact Index Per Article: 8.8] [Indexed: 11/06/2022]
Abstract
Nuclear receptors are key transcription factors that regulate crucial gene networks responsible for cell growth, differentiation, and homeostasis. Nuclear receptors form a superfamily of phylogenetically related proteins and control functions associated with major diseases (e.g. diabetes, osteoporosis, and cancer). In this study, a novel method has been developed for classifying the subfamilies of nuclear receptors. The classification was achieved with support vector machines on the basis of the amino acid and dipeptide composition of receptor sequences. Training and testing were done on a non-redundant data set of 282 proteins obtained from the NucleaRDB database (1). The performance of all classifiers was evaluated using a 5-fold cross-validation test, in which the data set was randomly partitioned into five equal sets and each classifier was evaluated five times, each time on a distinct set, while the remaining four sets were used for training. It was found that different subfamilies of nuclear receptors were quite closely correlated in terms of amino acid composition as well as dipeptide composition. The overall accuracies of the amino acid composition-based and dipeptide composition-based classifiers were 82.6 and 97.5%, respectively. Therefore, our results show that different subfamilies of nuclear receptors can be predicted with considerable accuracy using amino acid or dipeptide composition. Furthermore, based on the above approach, an online web service, NRpred, was developed, which is available at www.imtech.res.in/raghava/nrpred.
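The 5-fold procedure described above can be sketched as follows. Because 282 proteins cannot be split into five exactly equal sets, the folds in this sketch contain 57, 57, 56, 56 and 56 proteins. The fixed random seed and the round-robin assignment of shuffled indices are choices of this sketch, not details from the paper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 5, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; every item appears in exactly one test fold."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)       # random partition
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most 1
    splits = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [items[j] for j in order if j not in held_out]
        test = [items[j] for j in test_idx]
        splits.append((train, test))
    return splits


proteins = [f"protein_{i}" for i in range(282)]  # placeholder identifiers
print([len(test) for _, test in k_fold_splits(proteins)])  # [57, 57, 56, 56, 56]
```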
Affiliation(s)
- Manoj Bhasin
- Institute of Microbial Technology, Chandigarh 160036, India