1
Zhao X, Wang X, Jin Z, Wang R. A normalized differential sequence feature encoding method based on amino acid sequences. Mathematical Biosciences and Engineering: MBE 2023; 20:14734-14755. [PMID: 37679156 DOI: 10.3934/mbe.2023659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Protein interactions are the foundation of all metabolic activities of cells, such as apoptosis, the immune response, and metabolic pathways. To improve the performance of protein interaction prediction, a coding method based on normalized difference sequence features (NDSF) of amino acid sequences is proposed. NDSF jointly encodes the positional relationships between amino acids within each sequence and the correlation characteristics between sequence pairs. Features are then extracted from the encoded 174-dimensional human protein sequence vectors using two dimensionality reduction methods, principal component analysis (PCA) and local linear embedding (LLE). This study compares the classification performance of four ensemble learning methods (AdaBoost, Extra Trees, LightGBM, XGBoost) applied to the PCA and LLE features. Cross-validation and grid search are used to find the best combination of parameters. The results show that the accuracy of NDSF is generally higher than that of the sequence matrix-based coding method (MOS), and that loss and coding time are greatly reduced. The bar chart of feature extraction results shows that classification accuracy is significantly higher with the linear dimensionality reduction method, PCA, than with the nonlinear method, LLE. With XGBoost as the classifier, the model accuracy reaches 99.2%, the best performance among all models. This study suggests that NDSF combined with PCA and XGBoost may be an effective strategy for classifying different human protein interactions.
Affiliation(s)
- Xiaoman Zhao
  - Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
  - University of Science and Technology of China, Hefei 230026, China
- Xue Wang
  - Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
- Zhou Jin
  - Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
- Rujing Wang
  - Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
  - University of Science and Technology of China, Hefei 230026, China
2
Bi J, Wang Y, Gao R, Liu P, Jiang Y, Gao L, Li B, Song Q, Ning M. Functional Analysis of a CTL-X-Type Lectin CTL16 in Development and Innate Immunity of Tribolium castaneum. Int J Mol Sci 2023; 24:10700. [PMID: 37445878 DOI: 10.3390/ijms241310700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 06/23/2023] [Accepted: 06/24/2023] [Indexed: 07/15/2023] Open
Abstract
C-type lectins (CTLs) are a class of proteins containing carbohydrate recognition domains (CRDs), which are characteristic modules that recognize various glycoconjugates and function primarily in immunity. CTLs have been reported to affect growth and development and to positively regulate innate immunity in Tribolium castaneum. However, the regulatory mechanisms of TcCTL16 proteins are still unclear. Here, spatiotemporal analyses showed that TcCTL16 was highly expressed in late pupae and early adults. TcCTL16 RNA interference in early larvae shortened their body length and narrowed their body width, leading to the death of 98% of the larvae in the pupal stage. Further analysis found that the expression levels of muscle-regulation-related genes, including cut, vestigial, erect wing, apterous, and spalt major, and of muscle-composition-related genes, including Myosin heavy chain and Myosin light chain, were markedly down-regulated after TcCTL16 silencing in T. castaneum. In addition, TcCTL16 transcripts were mainly distributed in the hemolymph. TcCTL16 was significantly upregulated after challenges with lipopolysaccharides, peptidoglycans, Escherichia coli, and Staphylococcus aureus. Recombinant CRDs of TcCTL16 bound directly to the tested bacteria (except Bacillus subtilis) and also induced extensive bacterial agglutination in the presence of Ca2+. In contrast, after TcCTL16 silencing in the late larval stage, T. castaneum was able to develop normally. Moreover, the transcript levels of seven antimicrobial peptide (AMP) genes (attacin2, defensins1, defensins2, coleoptericin1, coleoptericin2, cecropins2, and cecropins3) and one transcription factor gene (relish) were significantly increased under E. coli challenge, which led to an increased survival rate of T. castaneum infected with S. aureus or E. coli. This suggests that TcCTL16 deficiency could be compensated for by increasing AMP expression via the IMD pathways in T. castaneum.
In conclusion, this study found that TcCTL16 could be involved in developmental regulation in early larvae and compensate for the loss of CTL function by regulating the expression of AMPs in late larvae, thus laying a solid foundation for further studies on T. castaneum CTLs.
Affiliation(s)
- Jingxiu Bi
  - Laboratory of Quality and Safety Risk Assessment for Agro-Products of the Ministry of Agriculture (Jinan), Institute of Quality Standard and Testing Technology for Agro-Products, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Yutao Wang
  - Laboratory of Quality and Safety Risk Assessment for Agro-Products of the Ministry of Agriculture (Jinan), Institute of Quality Standard and Testing Technology for Agro-Products, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Rui Gao
  - Laboratory of Quality and Safety Risk Assessment for Agro-Products of the Ministry of Agriculture (Jinan), Institute of Quality Standard and Testing Technology for Agro-Products, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Pingxiang Liu
  - Laboratory of Quality and Safety Risk Assessment for Agro-Products of the Ministry of Agriculture (Jinan), Institute of Quality Standard and Testing Technology for Agro-Products, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Yuying Jiang
  - Laboratory of Quality and Safety Risk Assessment for Agro-Products of the Ministry of Agriculture (Jinan), Institute of Quality Standard and Testing Technology for Agro-Products, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Lei Gao
  - Laboratory of Quality and Safety Risk Assessment for Agro-Products of the Ministry of Agriculture (Jinan), Institute of Quality Standard and Testing Technology for Agro-Products, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Bin Li
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
- Qisheng Song
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO 65211, USA
- Mingxiao Ning
  - Laboratory of Quality and Safety Risk Assessment for Agro-Products of the Ministry of Agriculture (Jinan), Institute of Quality Standard and Testing Technology for Agro-Products, Shandong Academy of Agricultural Sciences, Jinan 250100, China
3
Nielsen H, Petsalaki EI, Zhao L, Stühler K. Predicting eukaryotic protein secretion without signals. Biochimica et Biophysica Acta - Proteins and Proteomics 2019; 1867:140174. [DOI: 10.1016/j.bbapap.2018.11.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/01/2018] [Revised: 10/30/2018] [Accepted: 11/29/2018] [Indexed: 10/27/2022]
4
Zhang J, Zhang Y, Ma Z. In silico Prediction of Human Secretory Proteins in Plasma Based on Discrete Firefly Optimization and Application to Cancer Biomarkers Identification. Front Genet 2019; 10:542. [PMID: 31244885 PMCID: PMC6563772 DOI: 10.3389/fgene.2019.00542] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/04/2019] [Accepted: 05/21/2019] [Indexed: 12/20/2022] Open
Abstract
Early control and prevention of cancer contribute to effective interventions and cancer therapies. Secretory proteins, among the richest sources of biomarkers, have proved important as molecular signposts of the physiological state of a cell. In this work, we aim to propose a proteomic high-throughput technology platform to facilitate the early detection of cancer by means of biomarkers that are secreted into the bloodstream. We compile a new benchmark dataset of human secretory proteins in plasma. A series of sequence-derived features, which have been shown to be involved in the structure and function of secretory proteins, are collected to mathematically encode these proteins. Considering the influence of potentially irrelevant or redundant features, we introduce a discrete firefly optimization algorithm to perform feature selection. We evaluate and compare the proposed method, SCRIP (Secretory proteins in plasma), with state-of-the-art approaches on benchmark datasets and independent testing datasets. SCRIP achieves average AUC values of 0.876 and 0.844 in five-fold cross-validation and the independent test, respectively. In addition, we test SCRIP on proteins in four types of cancer tissue and successfully detect 66% to 77% of potential cancer biomarkers.
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
  - Henan Key Laboratory of Education Big Data Analysis and Application, Xinyang, China
- Yu Zhang
  - Information Engineering College, Huanghuai University, Zhumadian, China
  - Henan Key Laboratory of Smart Lighting, Zhumadian, China
- Zhiqiang Ma
  - Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun, China
5
Zhang J, Chai H, Guo S, Guo H, Li Y. High-Throughput Identification of Mammalian Secreted Proteins Using Species-Specific Scheme and Application to Human Proteome. Molecules 2018; 23:1448. [PMID: 29903999 PMCID: PMC6099666 DOI: 10.3390/molecules23061448] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/13/2018] [Revised: 05/29/2018] [Accepted: 05/30/2018] [Indexed: 02/02/2023] Open
Abstract
Secreted proteins are widespread in living organisms and cells. Because secreted proteins are easy to detect in body fluids such as urine and saliva in clinical diagnosis, they play important roles as biomarkers for disease diagnosis and in vaccine production. In this study, we propose a novel predictor for accurate high-throughput identification of mammalian secreted proteins that is based on sequence-derived features. We combine the features of amino acid composition, sequence motifs, and physicochemical properties to encode the collected proteins. Detailed feature analyses demonstrate the effectiveness of the considered features. Based on the differences among secreted proteins across species, we introduce a species-specific scheme, which is expected to further explore the intrinsic attributes of specific secreted proteins. Experiments on benchmark datasets demonstrate the effectiveness of our proposed method. The test on an independent testing dataset also indicates good generalization capability. In comparison with the traditional universal model, we experimentally demonstrate that the species-specific scheme significantly improves prediction performance. We use our method to make predictions on the unreviewed human proteome and find 272 potential secreted proteins with probabilities higher than 99%. A user-friendly web server named iMSPs (identification of Mammalian Secreted Proteins), which implements our proposed method, is available free for academic use at: http://www.inforstation.com/webservers/iMSP/.
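As a rough illustration of one sequence-derived feature named in this abstract, amino acid composition can be computed as the fraction of each residue type in a sequence. The following is a minimal sketch only and is not the iMSPs implementation; the function name `amino_acid_composition`, the ordering of the 20-letter alphabet, and the example sequence are assumptions made for illustration.

```python
from collections import Counter
from typing import List

# Standard 20 amino acids in one-letter code; this ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard residues are ignored. A sequence
    with no standard residues yields an all-zero vector.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from the paper's dataset.
    vector = amino_acid_composition("MKTAYIAKQR")
    for aa, frac in zip(AMINO_ACIDS, vector):
        if frac > 0:
            print(aa, round(frac, 3))
```

The result is a fixed-length 20-dimensional vector regardless of sequence length, which is what makes this kind of feature convenient as classifier input; the other feature types mentioned (sequence motifs, physicochemical properties) are not covered by this sketch.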
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China.
- Haiting Chai
  - College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK.
- Song Guo
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China.
- Huaping Guo
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China.
- Yanling Li
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China.
6
Jeiranikhameneh M, Razavi MR, Irani S, Siadat SD, Oloomi M. Designing novel construction for cell surface display of protein E on Escherichia coli using non-classical pathway based on Lpp-OmpA. AMB Express 2017; 7:53. [PMID: 28247289 PMCID: PMC5331024 DOI: 10.1186/s13568-017-0350-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/15/2016] [Accepted: 02/20/2017] [Indexed: 01/30/2023] Open
Abstract
Today, the display of recombinant proteins on the outer surface of bacteria is regarded as a valuable process for various applications in biotechnology, including the preparation of vaccines. In this study, the Lpp-OmpA structure was used to present outer membrane protein E of Haemophilus influenzae on the E. coli outer membrane. In addition, a structure derived from Lpp-OmpA and based on the non-classical secretion pathway was designed using bioinformatics software such as MEMSAT-SVM, SecretomeP, and SignalP; this structure lacked any signal peptide at its N-terminus. The potential of this structure to present protein E on the surface of E. coli through the non-classical pathway was shown by western blotting, SDS-PAGE, and fluorescence microscopy, and its effectiveness was compared with that of the Lpp-OmpA system. The results showed that the new structure had higher efficiency than Lpp-OmpA and could transport protein E to the outer membrane well. This study is the first report of the presentation of H. influenzae PE on the surface of E. coli by Lpp-OmpA and by a structure derived from Lpp-OmpA according to the non-classical secretion pathway. Our results suggest that the non-classical secretion pathway may be exploited as a new secretory pathway for displaying recombinant proteins on the outer surface of the cell.
7
Huang WL. Ranking Gene Ontology terms for predicting non-classical secretory proteins in eukaryotes and prokaryotes. J Theor Biol 2012; 312:105-13. [PMID: 22967952 DOI: 10.1016/j.jtbi.2012.07.027] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 01/08/2012] [Revised: 05/30/2012] [Accepted: 07/28/2012] [Indexed: 11/24/2022]
Abstract
Protein secretion is an important biological process for both eukaryotes and prokaryotes. Several sequence-based methods rely mainly on various types of complementary features to design accurate classifiers for predicting non-classical secretory proteins. Gene Ontology (GO) terms are increasingly informative in predicting protein functions. However, the number of GO terms used is often very large. For example, 60,020 GO terms are used in the prediction method Euk-mPLoc 2.0 for subcellular localization. This study proposes a novel approach that identifies a small set of m top-ranked GO terms, which serve as the only type of input feature for a support vector machine (SVM) based method, Sec-GO, to predict non-classical secretory proteins in both eukaryotes and prokaryotes. To evaluate the Sec-GO method, two existing methods and their datasets are adopted for performance comparisons. The Sec-GO method using m=436 GO terms yields an independent test accuracy of 96.7% on mammalian proteins, much better than the existing method SPRED (82.2%), which uses frequencies of tri-peptides and short peptides, secondary structure, and physicochemical properties as input features of a random forest classifier. Furthermore, when applied to Gram-positive bacterial proteins, Sec-GO with m=158 GO terms has a test accuracy of 94.5%, superior to NClassG+ (90.0%), which uses an SVM with several feature types comprising amino acid composition, di-peptides, physicochemical properties, and the position specific weighting matrix. Analysis of the distribution of secretory proteins in a GO database indicates that the percentage of non-classical secretory proteins annotated by GO is larger than that of classical secretory proteins in both eukaryotes and prokaryotes. Of the m top-ranked GO features, the top four GO terms are all annotated with subcellular locations such as GO:0005576 (Extracellular region).
Additionally, the Sec-GO method is easily implemented, and its web prediction tool is available at iclab.life.nctu.edu.tw/secgo.
Affiliation(s)
- Wen-Lin Huang
  - Department of Management Information System, Asia Pacific Institute of Creativity, No. 110 XueFu Rd., Tou Fen, Miaoli, Taiwan, ROC.