1
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] [Open access]
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins lack experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functions, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms for harnessing the abundance of sequence data to predict subcellular locations. During the last decade, considerable research effort has been devoted to developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the eukaryotic, prokaryotic, and virus-based categories, followed by a detailed analysis. Each predictor is discussed in terms of its main features, strengths, weaknesses, algorithms, prediction techniques, and analysis. The review is supported by taxonomies of prediction tools that highlight their relevant areas, with examples for straightforward categorization and ease of understanding. These taxonomies help users find suitable tools for their needs. Furthermore, current research gaps and challenges are discussed to identify the areas that need the most attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers identifying and exploring recent advances in the literature.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
2
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] [Open access]
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell can give insights into tailored drug design for a disease. As the gold standard for validation, conventional wet-lab approaches use fluorescence microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags to identify protein subcellular locations. However, the booming era of proteomics and high-throughput sequencing has generated vast numbers of newly discovered proteins, making it impractical to determine protein subcellular localization by wet-lab experiments alone. To tackle this problem, artificial intelligence (AI) and machine learning (ML) methods, especially deep learning methods, have made significant progress in this research area over the past decades. In this article, we review the latest advances in AI-based method development in three typical types of approaches: sequence-based, knowledge-based, and image-based methods. We also discuss in detail the existing challenges and future directions for AI-based method development in this research field.
Affiliation(s)
- Hanyu Xiao
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Yijin Zou
  - College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Jieqiong Wang
  - Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Shibiao Wan
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
3
Wang R, Yang X, Wang T, Kou R, Liu P, Huang Y, Chen C. Synergistic effects on oxidative stress, apoptosis and necrosis resulting from combined toxicity of three commonly used pesticides on HepG2 cells. Ecotoxicol Environ Saf 2023; 263:115237. [PMID: 37451096 DOI: 10.1016/j.ecoenv.2023.115237] [Received: 05/09/2023] [Revised: 07/02/2023] [Accepted: 07/05/2023] [Indexed: 07/18/2023]
Abstract
The widespread use of pesticides plays a vital role in safeguarding crop yields and quality, but it also allows multiple pesticides to co-exist, which poses a significant potential risk to human health. To assess the toxic effects of exposure to individual pesticides (chlorpyrifos, carbofuran, and acetamiprid) and to their binary and ternary combinations, individual and combined exposure models were developed using HepG2 cells. The types of combined effects of the pesticide mixtures were assessed using the concentration addition (CA), independent action (IA), and combination index (CI) models, and the expression of biomarkers related to oxidative stress, apoptosis, and cell necrosis was further examined. Our results showed that both individual pesticides and mixtures exerted toxic effects on HepG2 cells. The CI model indicated that the pesticide mixtures had synergistic toxic effects. The lactate dehydrogenase (LDH) release and apoptosis assays revealed that the pesticide mixture increased LDH release and apoptosis levels. Moreover, both individual pesticides and mixtures disrupted redox homeostasis, and the mixtures produced more intense oxidative stress effects. In conclusion, these in vitro experiments illustrate the enhanced combined toxicity of pesticide mixtures and provide a theoretical and scientific basis for further toxicological studies.
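The three mixture models named above rest on standard formulas: the Chou-Talalay combination index built on the median-effect equation, concentration addition, and independent action. The sketch below is a minimal Python illustration of those textbook formulas, assuming the median-effect parameters (Dm, m) of each pesticide have already been fitted from single-agent dose-response data. The function names and all numbers are hypothetical; they are not the study's code or results.

```python
import math


def median_effect_dose(fa, dm, m):
    """Dose of one agent alone that produces fraction affected `fa`,
    from the median-effect equation Dx = Dm * (fa / (1 - fa)) ** (1 / m)."""
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(mix_doses, fa, single_params):
    """Chou-Talalay CI at effect level fa: sum_i d_i / (Dx)_i, where d_i is
    the dose of agent i in the mixture that jointly produces fa and
    single_params[i] = (Dm_i, m_i) was fitted for agent i alone.
    CI < 1 suggests synergy, CI = 1 additivity, CI > 1 antagonism."""
    return sum(d / median_effect_dose(fa, dm, m)
               for d, (dm, m) in zip(mix_doses, single_params))


def ca_mixture_ecx(fractions, ecx_values):
    """Concentration addition: predicted ECx of a mixture whose components
    make up fractions p_i (summing to 1), ECx_mix = 1 / sum(p_i / ECx_i)."""
    return 1.0 / sum(p / e for p, e in zip(fractions, ecx_values))


def ia_mixture_effect(single_effects):
    """Independent action: predicted fractional effect of the mixture,
    E_mix = 1 - prod(1 - E_i), where E_i is the effect of component i
    alone at its concentration in the mixture."""
    return 1.0 - math.prod(1.0 - e for e in single_effects)


# Hypothetical numbers for two agents (not data from the study).
params = [(10.0, 1.0), (20.0, 1.0)]  # (Dm, m) for each agent alone
ci = combination_index([2.0, 4.0], 0.5, params)
print(round(ci, 3))                                         # 0.4 -> synergy
print(round(ca_mixture_ecx([0.5, 0.5], [10.0, 30.0]), 3))   # 15.0
print(round(ia_mixture_effect([0.2, 0.3]), 3))              # 0.44
```

CA and IA give the "expected" mixture toxicity under additivity and independence respectively, while CI compares observed mixture doses against single-agent doses at the same effect level, which is why CI is the model that can flag synergy directly.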
Affiliation(s)
- Ruike Wang
  - School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China
- Xi Yang
  - Key Laboratory of Agro-Product Quality and Safety of Ministry of Agriculture, Institute of Quality Standards and Testing Technology for Agro-Products, Chinese Academy of Agricultural Sciences, No. 12 Zhong-guan-cun South Street, Haidian District, Beijing 100081, China
- Tiancai Wang
  - Key Laboratory of Agro-Product Quality and Safety of Ministry of Agriculture, Institute of Quality Standards and Testing Technology for Agro-Products, Chinese Academy of Agricultural Sciences, No. 12 Zhong-guan-cun South Street, Haidian District, Beijing 100081, China
- Ruirui Kou
  - School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China
- Panpan Liu
  - School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China
- Yueqing Huang
  - Department of General Medicine, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Nanjing Medical University, Suzhou 215026, China
- Chen Chen
  - School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China
4
Zhang T, Gu J, Wang Z, Wu C, Liang Y, Shi X. Protein Subcellular Localization Prediction Model Based on Graph Convolutional Network. Interdiscip Sci 2022; 14:937-946. [PMID: 35713780 DOI: 10.1007/s12539-022-00529-9] [Received: 02/03/2022] [Revised: 05/12/2022] [Accepted: 05/17/2022] [Indexed: 06/15/2023]
Abstract
Protein subcellular localization prediction is an important research area in bioinformatics that plays an essential role in understanding protein function and mechanism. Many machine learning and deep learning algorithms have been employed for this task, but most of them do not use the structural information of proteins. With advances in protein structure research in recent years, protein contact map prediction has improved dramatically. In this paper, we present GraphLoc, a deep learning model that predicts protein localization at the subcellular level. The core of the model consists of a graph convolutional network (GCN) module and a multi-head attention module. A protein topology graph is constructed from a contact map predicted from the protein sequence and used as the input of the GCN module, taking full advantage of the structural information of proteins. The multi-head attention module learns the weighted contribution of different amino acids to subcellular localization in different feature representation subspaces. Experiments on the benchmark dataset show that our model outperforms the compared methods. The code can be accessed at https://github.com/GoodGuy398/GraphLoc .
The proposed GraphLoc model consists of three parts. The first part is a GCN module, which uses the predicted contact maps to construct the protein graph and thereby exploits the structural information of the protein. The second part is the multi-head attention module, which learns the weighted contribution of different amino acids in different feature representation subspaces and computes a weighted average of the feature map across all amino acid nodes. The last part is a fully connected layer that maps the flattened graph representation vector to a vector whose dimension equals the number of categories, followed by a softmax layer that predicts the protein subcellular localization.
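The three-part architecture described above can be illustrated with a forward pass in plain Python. This is a minimal sketch, not the GraphLoc implementation: the 0.5 contact threshold, the symmetric adjacency normalization, the single GCN layer, the one-query-vector-per-head form of attention pooling, and the random untrained weights are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def normalize_adjacency(contact_map, threshold=0.5):
    """Binarize a predicted residue contact map (probabilities), add
    self-loops, and apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(contact_map)
    a = [[1.0 if i == j or contact_map[i][j] >= threshold else 0.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution over residue nodes: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(h, queries):
    """Multi-head attention pooling: each head scores every residue node
    against its own query vector, softmax-normalizes the scores, and takes
    the weighted average of node features; head outputs are concatenated."""
    pooled = []
    for q in queries:
        alpha = softmax([sum(a * b for a, b in zip(node, q)) for node in h])
        pooled.extend(sum(alpha[i] * h[i][d] for i in range(len(h)))
                      for d in range(len(h[0])))
    return pooled


def predict(contact_map, features, n_classes, hidden=8, heads=2, seed=0):
    """Forward pass with random, untrained weights (illustration only)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    a_hat = normalize_adjacency(contact_map)
    h = gcn_layer(a_hat, features, rand(len(features[0]), hidden))
    graph_vec = attention_pool(h, rand(heads, hidden))  # length heads * hidden
    w_fc = rand(len(graph_vec), n_classes)              # fully connected layer
    logits = [sum(graph_vec[k] * w_fc[k][c] for k in range(len(graph_vec)))
              for c in range(n_classes)]
    return softmax(logits)


# Toy input: 4 residues with 3 features each, 5 candidate compartments.
contacts = [[1.0, 0.9, 0.1, 0.0],
            [0.9, 1.0, 0.7, 0.2],
            [0.1, 0.7, 1.0, 0.8],
            [0.0, 0.2, 0.8, 1.0]]
feats = [[0.2, 0.1, 0.5], [0.3, 0.7, 0.1], [0.9, 0.4, 0.2], [0.6, 0.8, 0.3]]
probs = predict(contacts, feats, n_classes=5)
print(len(probs), round(sum(probs), 6))  # 5 1.0
```

Because the weights are random, the output probabilities carry no biological meaning; the point is the data flow from contact map to graph convolution, attention-weighted pooling over residues, and a softmax over compartments.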
Affiliation(s)
- Tianhao Zhang
  - College of Computer Science and Technology, University of Jilin, Changchun, 130012, China
- Jiawei Gu
  - College of Computer Science and Technology, University of Jilin, Changchun, 130012, China
- Zeyu Wang
  - College of Computer Science and Technology, University of Jilin, Changchun, 130012, China
- Chunguo Wu
  - College of Computer Science and Technology, University of Jilin, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, 130012, China
- Yanchun Liang
  - College of Computer Science and Technology, University of Jilin, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, 130012, China
  - School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, 519041, China
- Xiaohu Shi
  - College of Computer Science and Technology, University of Jilin, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, 130012, China
  - School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, 519041, China
5
Cong H, Liu H, Cao Y, Chen Y, Liang C. Multiple Protein Subcellular Locations Prediction Based on Deep Convolutional Neural Networks with Self-Attention Mechanism. Interdiscip Sci 2022; 14:421-438. [PMID: 35066812 DOI: 10.1007/s12539-021-00496-7] [Received: 08/28/2021] [Revised: 12/06/2021] [Accepted: 12/13/2021] [Indexed: 12/12/2022]
Abstract
As an important research field in bioinformatics, protein subcellular location prediction is critical for revealing protein functions and provides insightful information for disease diagnosis and drug development. Predicting protein subcellular locations remains challenging because it is difficult to find representative features and robust classifiers. Many feature fusion methods have been applied to tackle these issues, but they still suffer from accuracy loss due to feature redundancy. Furthermore, predicting multiple protein subcellular locations is more complicated because it is fundamentally a multi-label classification problem, on which traditional binary or even multi-class classifiers cannot achieve satisfactory results. This paper proposes a novel method based on deep convolutional neural networks for predicting the subcellular locations of proteins with both single and multiple sites. Specifically, we first obtain integrated features by simultaneously considering the pseudo amino acid composition, amino acid index distribution, and physicochemical properties. We then adopt deep convolutional neural networks to extract high-dimensional features from the fused features, removing redundant preliminary features and obtaining better representations of the raw sequences. Moreover, we use a self-attention mechanism and a customized loss function to make the model more inclined toward positive data. In addition, we use random k-label sets to reduce the number of prediction labels, and we employ a hybrid strategy of over-sampling and under-sampling to tackle the data imbalance problem. We compare our model with three representative classification alternatives. The experimental results show that our model achieves the best accuracy, demonstrating the efficacy of the proposed model.
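To show how random k-label sets shrink the label space, the sketch below applies the disjoint RAkEL variant (RAkELd): the labels are randomly split into subsets of at most k labels, and each subset becomes a small label-powerset (multi-class) problem whose classes are the label combinations seen in that subset. The abstract does not state which RAkEL variant, value of k, or aggregation rule the authors used, so these choices, the function names, and the example compartments are assumptions; the CNN feature extractor and the per-subset classifiers are omitted.

```python
import random


def rakel_partition(labels, k, seed=0):
    """RAkELd-style split: shuffle the labels and cut them into disjoint
    subsets of at most k labels each."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


def powerset_targets(label_sets, subset):
    """Label-powerset transform for one subset: the labels of each sample
    that fall inside the subset become one multi-class target (sorted tuple)."""
    return [tuple(sorted(lab for lab in labels if lab in subset))
            for labels in label_sets]


def decode(predicted_combos):
    """Merge the label combinations predicted for each subset into one
    multi-label prediction."""
    merged = set()
    for combo in predicted_combos:
        merged.update(combo)
    return merged


locations = ["nucleus", "cytoplasm", "mitochondrion",
             "membrane", "secreted", "ER"]
samples = [{"nucleus", "cytoplasm"}, {"membrane"}, {"ER", "secreted", "membrane"}]

subsets = rakel_partition(locations, k=3)
for subset in subsets:
    # At most 2**3 = 8 classes per subset instead of 2**6 = 64 overall.
    print(subset, powerset_targets(samples, subset))

# Pretend each subset classifier predicted the combination it saw for sample 0.
print(decode(powerset_targets([samples[0]], s)[0] for s in subsets))
```

The original RAkEL formulation instead draws several overlapping random k-subsets and combines them by averaging per-label votes; the disjoint version is used here only because it makes the reduction in output classes easy to see.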
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
6
He F, Wan J, Chu S, Li X, Zong W, Liu R. Toxic mechanism on phenanthrene-triggered cell apoptosis, genotoxicity, immunotoxicity and activity changes of immunity protein in Eisenia fetida: Combined analysis at cellular and molecular levels. Sci Total Environ 2022; 819:153167. [PMID: 35051481 DOI: 10.1016/j.scitotenv.2022.153167] [Received: 11/14/2021] [Revised: 01/05/2022] [Accepted: 01/11/2022] [Indexed: 06/14/2023]
Abstract
Phenanthrene (PHE) is a harmful organic contaminant that is widespread in the soil environment. Accumulated PHE potentially threatens soil invertebrates, including earthworms, and its toxicity is high. Currently, the possible mechanisms underlying the apoptotic pathways induced by PHE, as well as its immunotoxicity and genotoxicity in earthworms, remain unclear. Thus, Eisenia fetida coelomocytes and the immunity protein lysozyme (LYZ) were chosen as target receptors to reveal the apoptotic pathways, genotoxicity, and immunotoxicity triggered by PHE and its binding mechanism with LYZ, using cellular, biochemical, and molecular methods. The results indicated that PHE exposure can cause cell membrane damage, increase cell membrane permeability, and ultimately trigger mitochondria-mediated apoptosis. Increased 8-hydroxy-2'-deoxyguanosine (8-OHdG) levels indicated that PHE exposure triggered oxidative DNA damage in the cells. PHE also had detrimental effects on the immune system of E. fetida coelomocytes, decreasing phagocytic efficacy and damaging the lysosomal membrane. The LYZ activity in coelomocytes after PHE exposure was consistent with the molecular results, which showed that LYZ activity was inhibited. After PHE binding, the protein structure (secondary structure and protein skeleton) and protein environment (the microenvironment of aromatic amino acids) of LYZ were disrupted, a PHE-LYZ complex of larger particle size was formed, and LYZ fluorescence was significantly sensitized. Molecular simulation indicated that Glu 35, Asp 52, and Trp 62, key residues for protein function, are located in the binding pocket, suggesting that PHE preferentially binds to the active center of LYZ. Additionally, the primary driving forces for the binding interaction between PHE and LYZ are hydrophobic forces and hydrogen bonds.
Taken together, PHE exposure can induce apoptosis through the mitochondria-mediated pathway, disrupt the normal immune system, and trigger oxidative DNA damage in earthworms. This study provides a comprehensive evaluation of phenanthrene toxicity to earthworms at the molecular and cellular levels.
Affiliation(s)
- Falin He
  - School of Environmental Science and Engineering, Shandong University, China-America CRC for Environment & Health, Shandong Province, 72# Jimo Binhai Road, Qingdao, Shandong 266237, PR China
- Jingqiang Wan
  - School of Environmental Science and Engineering, Shandong University, China-America CRC for Environment & Health, Shandong Province, 72# Jimo Binhai Road, Qingdao, Shandong 266237, PR China
- Shanshan Chu
  - School of Environmental Science and Engineering, Shandong University, China-America CRC for Environment & Health, Shandong Province, 72# Jimo Binhai Road, Qingdao, Shandong 266237, PR China
- Xiangxiang Li
  - School of Environmental Science and Engineering, Shandong University, China-America CRC for Environment & Health, Shandong Province, 72# Jimo Binhai Road, Qingdao, Shandong 266237, PR China
- Wansong Zong
  - College of Geography and Environment, Shandong Normal University, 88# East Wenhua Road, Jinan, Shandong 250014, PR China
- Rutao Liu
  - School of Environmental Science and Engineering, Shandong University, China-America CRC for Environment & Health, Shandong Province, 72# Jimo Binhai Road, Qingdao, Shandong 266237, PR China
7
Jin Y, Zhang A. Total glucosides of paeony ameliorates oxidative stress, apoptosis and inflammatory response by regulating the Smad7-TGF-β pathway in allergic rhinitis. Mol Med Rep 2022; 25:83. [PMID: 35029288 PMCID: PMC8778736 DOI: 10.3892/mmr.2022.12599] [Received: 05/21/2021] [Accepted: 11/29/2021] [Indexed: 11/25/2022] [Open access]
Abstract
Total glucosides of paeony (TGP), an active ingredient extracted from the root of Paeonia alba, has been reported to display an anti-inflammatory effect. However, the effect of TGP on allergic rhinitis (AR) is still unknown. The present study aimed to assess the role of TGP in an AR mouse model. An AR mouse model was established using the ovalbumin method. The expression levels of Smad7/TGF-β pathway-related proteins in nasal mucosa tissues were determined by immunofluorescence, immunohistochemistry and western blotting. The severity of nasal allergic symptoms was assessed by recording the frequency of sneezing and nose rubbing in all mice for 20 min. The serum levels of IgE and inflammatory cytokines, including IL-4, IL-5, IL-17 and IFN-γ, were measured by ELISA. H&E staining, periodic acid-Schiff staining and Masson staining were used to detect histopathological changes in mice. The concentrations of malondialdehyde and glutathione, and the activities of superoxide dismutase and catalase, in tissue supernatant and serum were quantified using commercial assay kits. Apoptosis of nasal tissue cells was detected by TUNEL assays and western blotting. The expression of Smad7 was upregulated and that of TGF-β was downregulated in the nasal tissue of AR mice. Additionally, TGP regulated the Smad7/TGF-β pathway in the nasal tissue of AR mice. TGP reduced serum IgE levels and alleviated nasal symptoms and histopathological changes in AR mice. Moreover, TGP ameliorated oxidative stress, cell apoptosis and the inflammatory response. Smad7 small interfering RNA intervention aggravated the symptoms of AR mice via activation of the TGF-β pathway and reversed the protective effect of TGP in AR mice. In conclusion, TGP ameliorated oxidative stress, apoptosis and the inflammatory response via the Smad7/TGF-β pathway in AR.
Affiliation(s)
- Yangzi Jin
  - Department of Otolaryngology, The First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310000, P.R. China
- Aichun Zhang
  - Department of Otolaryngology, The First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310000, P.R. China
8
Wattanapornprom W, Thammarongtham C, Hongsthong A, Lertampaiporn S. Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization. Life (Basel) 2021; 11:life11040293. [PMID: 33808227 PMCID: PMC8066735 DOI: 10.3390/life11040293] [Received: 01/29/2021] [Revised: 03/16/2021] [Accepted: 03/25/2021] [Indexed: 12/17/2022] [Open access]
Abstract
The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for predicting plant protein subcellular localization based on multiple classifiers, to improve prediction results in terms of both accuracy and reliability. Predicting plant protein subcellular localization is challenging because the underlying problem is not only multiclass but also multilabel. Generally, plant proteins can be found in 10–14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in others (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization, forming a total of 479 feature spaces. Feature selection methods were then used to reduce these features to smaller, informative feature subsets, which were used to train three different individual models. The outputs of the three classifiers were combined by average voting to return the final probability prediction. Based on the voting probability, the method can predict subcellular localization for both single- and multilabel proteins. Experimental results on the testing dataset indicated that the proposed ensemble method achieved an overall accuracy of 84.58% across 11 compartments.
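Average voting over heterogeneous classifiers reduces to averaging their per-compartment probabilities and then deciding which compartments to report. The sketch below assumes a 0.5 threshold for multilabel assignment and a fall-back to the single most probable compartment when nothing passes; the abstract does not specify the decision rule, so both choices, the function names, and the probability values are hypothetical.

```python
def average_vote(prob_lists):
    """Average the per-compartment probabilities output by several
    (heterogeneous) classifiers for one protein."""
    n = len(prob_lists)
    return [sum(probs[j] for probs in prob_lists) / n
            for j in range(len(prob_lists[0]))]


def assign_locations(avg_probs, compartments, threshold=0.5):
    """Multilabel decision: keep every compartment whose averaged probability
    reaches the threshold; if none does, fall back to the most likely one."""
    chosen = [c for c, p in zip(compartments, avg_probs) if p >= threshold]
    if not chosen:
        best = max(range(len(avg_probs)), key=avg_probs.__getitem__)
        chosen = [compartments[best]]
    return chosen


compartments = ["nucleus", "cytoplasm", "mitochondrion", "vacuole"]
# Hypothetical outputs from three different classifiers for one protein.
clf_a = [0.7, 0.6, 0.1, 0.2]
clf_b = [0.8, 0.7, 0.2, 0.1]
clf_c = [0.6, 0.5, 0.3, 0.1]
avg = average_vote([clf_a, clf_b, clf_c])
print(assign_locations(avg, compartments))  # ['nucleus', 'cytoplasm']
```

Averaging probabilities rather than counting hard votes lets a confident classifier outweigh two uncertain ones, and the same averaged vector supports both single-location (argmax) and multi-location (threshold) outputs.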
Affiliation(s)
- Warin Wattanapornprom
  - Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
- Chinae Thammarongtham
  - Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Apiradee Hongsthong
  - Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Supatcha Lertampaiporn
  - Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
9
Jing XY, Li FM. Predicting Cell Wall Lytic Enzymes Using Combined Features. Front Bioeng Biotechnol 2021; 8:627335. [PMID: 33585423 PMCID: PMC7874139 DOI: 10.3389/fbioe.2020.627335] [Received: 11/09/2020] [Accepted: 12/04/2020] [Indexed: 11/13/2022] [Open access]
Abstract
Because of the overuse of antibiotics and the rapid rise of antibiotic-resistant strains, there is growing concern that existing antibiotics will become ineffective against pathogens. Using cell wall lytic enzymes to destroy bacteria has become a viable alternative for avoiding the crisis of antimicrobial resistance. In this paper, an improved method for predicting cell wall lytic enzymes is proposed, in which amino acid composition (AAC), dipeptide composition (DC), position-specific scoring matrix auto-covariance (PSSM-AC), and auto-covariance average chemical shift (acACS) features are used to predict cell wall lytic enzymes with a support vector machine (SVM). To overcome the imbalanced data classification problem, the synthetic minority over-sampling technique (SMOTE) was used to balance the dataset; to remove redundant or irrelevant features, the F-score was used to select features. With the jackknife test and the optimized feature combination AAC+DC+acACS+PSSM-AC, the sensitivity (Sn), specificity (Sp), Matthews correlation coefficient (MCC), and accuracy (Acc) were 99.35%, 99.02%, 0.98, and 99.19%, respectively. These Sn, Sp, MCC, and Acc values were higher than those of existing methods. This improved method may be helpful for protein function prediction.
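Two of the sequence features above, AAC and DC, and the four reported evaluation metrics can be computed directly from their standard definitions. The sketch below is a plain-Python illustration of those definitions only; PSSM-AC and acACS are omitted because they need external inputs (PSI-BLAST profiles and chemical shift values), and SMOTE, F-score selection, and the SVM are also left out. Function names and the confusion-matrix counts are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: fraction of each of the 400 ordered residue
    pairs among the len(seq) - 1 overlapping adjacent pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def evaluate(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient
    from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


features = aac("ACDA") + dc("ACDA")  # 20 + 400 = 420-dimensional vector
print(len(features))                 # 420
print(evaluate(tp=90, tn=85, fp=15, fn=10))
```

MCC is reported alongside accuracy because, on imbalanced data, accuracy can stay high while the minority class is mostly misclassified; MCC uses all four confusion-matrix cells and drops toward 0 in that case.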
Affiliation(s)
- Xiao-Yang Jing
  - College of Science, Inner Mongolia Agricultural University, Hohhot, China
- Feng-Min Li
  - College of Science, Inner Mongolia Agricultural University, Hohhot, China
10
Lv Z, Cui F, Zou Q, Zhang L, Xu L. Anticancer peptides prediction with deep representation learning features. Brief Bioinform 2021; 22:6126754. [PMID: 33529337 DOI: 10.1093/bib/bbab008] [Received: 10/06/2020] [Revised: 12/20/2020] [Accepted: 01/05/2021] [Indexed: 12/13/2022] [Open access]
Abstract
Anticancer peptides constitute one of the most promising classes of therapeutic agents for combating common human cancers. Using wet-lab experiments to verify whether a peptide displays anticancer characteristics is time-consuming and costly. Hence, in this study, we propose a computational method, iACP-DRLF (identify anticancer peptides via deep representation learning features), which combines the light gradient boosting machine (LightGBM) algorithm with deep representation learning features. Two sequence embedding techniques were used, namely soft symmetric alignment embedding and unified representation (UniRep) embedding, both of which involve deep neural network models based on long short-term memory networks and their derived networks. The results showed that deep representation learning features greatly improved the ability of the models to discriminate anticancer peptides from other peptides. In addition, UMAP (uniform manifold approximation and projection for dimension reduction) and SHAP (Shapley additive explanations) analyses showed that UniRep features have an advantage over other features for anticancer peptide identification. The Python script and pretrained models can be downloaded from https://github.com/zhibinlv/iACP-DRLF or from http://public.aibiochem.net/iACP-DRLF/.
Affiliation(s)
- Zhibin Lv
  - University of Electronic Science and Technology of China
- Feifei Cui
  - University of Electronic Science and Technology of China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences at University of Electronic Science and Technology of China
- Lichao Zhang
  - School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic