1
|
Wen JW, Zhang HL, Du PF. Vislocas: Vision transformers for identifying protein subcellular mis-localization signatures of different cancer subtypes from immunohistochemistry images. Comput Biol Med 2024; 174:108392. [PMID: 38608321 DOI: 10.1016/j.compbiomed.2024.108392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 03/22/2024] [Accepted: 04/01/2024] [Indexed: 04/14/2024]
Abstract
Proteins must be sorted to specific subcellular compartments to perform their functions. Abnormal protein subcellular localizations are related to many diseases. Although many efforts have been made to predict protein subcellular localization from various kinds of static information, including sequences, structures and interactions, such static information cannot predict protein mis-localization events in diseases. In contrast, IHC (immunohistochemistry) images, which are widely applied in clinical diagnosis, contain information that can be used to find protein mis-localization events in disease states. In this study, we created the Vislocas method, which is capable of finding mis-localized proteins from IHC images as markers of cancer subtypes. By combining CNNs and vision transformer encoders, Vislocas can automatically extract image features at both the global and local levels. Vislocas can be trained from scratch with full-sized IHC images. It is the first attempt to create an end-to-end IHC image-based protein subcellular location predictor. Vislocas achieved comparable or better performance than state-of-the-art methods. We applied Vislocas to find significant protein mis-localization events in different subtypes of glioma, melanoma and skin cancer. The mis-localized proteins, which were found purely from IHC images by Vislocas, are consistent with clinical or experimental results in the literature. All code for Vislocas has been deposited in a GitHub repository (https://github.com/JingwenWen99/Vislocas). All datasets for Vislocas have been deposited in Zenodo (https://zenodo.org/records/10632698).
Affiliation(s)
- Jing-Wen Wen
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Han-Lin Zhang
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
|
2
|
Yajie H, Shenglan W, Wei Z, Rufang L, Tingting Y, Yunhui Z, Jie S. Global quantitative proteomic analysis profiles of host protein expression in response to Enterovirus A71 infection in bronchial epithelial cells based on tandem mass tag (TMT) peptide labeling coupled with LC-MS/MS uncovers the key role of proteasome in virus replication. Virus Res 2023; 330:199118. [PMID: 37072100 DOI: 10.1016/j.virusres.2023.199118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 03/30/2023] [Accepted: 04/15/2023] [Indexed: 04/20/2023]
Abstract
Enterovirus A71 (EV-A71) is a neurotropic human pathogen that mainly causes hand, foot and mouth disease (HFMD), mostly in children under 5 years old. Generally, EV-A71-associated HFMD is a relatively self-limiting febrile disease, but a small percentage of patients still show rapid disease progression and severe neurological complications. To date, the mechanism by which EV-A71 induces pathological injury of the central nervous system (CNS) remains largely unclear. Our previous studies investigated the changes in mRNA, miRNA and circRNA expression profiles during EV-A71 infection. However, those studies analyzed only the RNA level, not the protein level, and it is proteins that ultimately do the work in the body. To address this, we used tandem mass tag (TMT) peptide labeling coupled with LC-MS/MS to quantitatively identify cellular proteome changes at 24 h post-infection (hpi) in EV-A71-infected 16HBE cells. In total, 6615 proteins were identified by TMT coupled with LC-MS/MS in this study. Between the EV-A71- and mock-infected groups, 210 differentially expressed proteins were found at 24 hpi, including 86 upregulated and 124 downregulated proteins. To ensure the validity and reliability of the proteomics data, 3 randomly selected proteins were verified by Western blot and immunofluorescence analysis, and the results were consistent with the TMT results. Subsequently, functional enrichment analysis indicated that the up-regulated and down-regulated proteins were individually involved in various biological processes and signaling pathways, including metabolic process, AMPK signaling pathway, Neurotrophin signaling pathway, Viral myocarditis, GABAergic synapse, and so on. Moreover, among the enriched pathways, the "Proteasome" pathway was up-regulated, which caught our attention. Inhibition of the proteasome was found to markedly suppress EV-A71 replication.
Finally, further in-depth analysis revealed that these differentially expressed proteins contained distinct domains and localized in different subcellular components. Taken together, our data provide a comprehensive view of the host cell response to EV-A71, and the identified host proteins may lead to a better understanding of the pathogenic mechanisms and host responses to EV-A71 infection, as well as to the identification of new therapeutic targets for EV-A71 infection.
Affiliation(s)
- Hu Yajie
- Department of Pulmonary and Critical Care Medicine, The First People's Hospital of Yunnan Province; The Affiliated Hospital of Kunming University of Science and Technology, Kunming, Yunnan, China; Yunnan Provincial Key Laboratory of Clinical Virology
- Wang Shenglan
- Department of Pulmonary and Critical Care Medicine, The First People's Hospital of Yunnan Province; The Affiliated Hospital of Kunming University of Science and Technology, Kunming, Yunnan, China
- Zhao Wei
- Department of Pulmonary and Critical Care Medicine, The First People's Hospital of Yunnan Province; The Affiliated Hospital of Kunming University of Science and Technology, Kunming, Yunnan, China
- Li Rufang
- Department of Pulmonary and Critical Care Medicine, The First People's Hospital of Yunnan Province; The Affiliated Hospital of Kunming University of Science and Technology, Kunming, Yunnan, China
- Yang Tingting
- Department of Pulmonary and Critical Care Medicine, The First People's Hospital of Yunnan Province; The Affiliated Hospital of Kunming University of Science and Technology, Kunming, Yunnan, China
- Zhang Yunhui
- Department of Pulmonary and Critical Care Medicine, The First People's Hospital of Yunnan Province; The Affiliated Hospital of Kunming University of Science and Technology, Kunming, Yunnan, China.
- Song Jie
- Institute of Medical Biology, Chinese Academy of Medical Science and Peking Union Medical College, Yunnan Key Laboratory of Vaccine Research and Development on Severe Infectious Diseases, Kunming, China.
|
3
|
Wang RH, Luo T, Zhang HL, Du PF. PLA-GNN: Computational inference of protein subcellular location alterations under drug treatments with deep graph neural networks. Comput Biol Med 2023; 157:106775. [PMID: 36921458 DOI: 10.1016/j.compbiomed.2023.106775] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/24/2023] [Revised: 02/21/2023] [Accepted: 03/09/2023] [Indexed: 03/12/2023]
Abstract
Aberrant protein sorting has been observed in many conditions, including complex diseases, drug treatments, and environmental stresses. It is important to systematically identify protein mis-localization events in a given condition. Experimental methods for finding mis-localized proteins are always costly and time-consuming. Predicting protein subcellular localizations has been studied for many years. However, only a handful of existing works have considered protein subcellular location alterations. We proposed a computational method for identifying alterations of protein subcellular locations under drug treatments. We took three drugs, TSA (trichostatin A), bortezomib and tacrolimus, as instances for this study. By introducing dynamic protein-protein interaction networks, graph neural network algorithms were applied to aggregate topological information under different conditions. We systematically reported potential protein mis-localization events under drug treatments. As far as we know, this is the first attempt to find protein mis-localization events computationally in drug treatment conditions. The literature supports that a number of proteins, which are highly related to the pharmacological mechanisms of these drugs, may undergo protein localization alterations. We named our method PLA-GNN (Protein Localization Alteration by Graph Neural Networks). It can be extended to other drugs and other conditions. All datasets and code of this study have been deposited in a GitHub repository (https://github.com/quinlanW/PLA-GNN).
Affiliation(s)
- Ren-Hua Wang
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Tao Luo
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Han-Lin Zhang
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
|
4
|
Xie Q, Wang D, Luo X, Li Z, Hu A, Yang H, Tang J, Gao P, Sun T, Kong L. Proteome profiling of formalin-fixed, paraffin-embedded lung adenocarcinoma tissues using a tandem mass tag-based quantitative proteomics approach. Oncol Lett 2021; 22:706. [PMID: 34457061 PMCID: PMC8358594 DOI: 10.3892/ol.2021.12967] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Accepted: 06/22/2021] [Indexed: 12/18/2022] Open
Abstract
Over the past few decades, increasing efforts have been made to improve the understanding of, and treatment options for, lung adenocarcinoma (LUAD). However, considering the heterogeneity of LUAD, precise proteomics-based characterization at the molecular level is an urgent clinical requirement for effective treatment. Formalin-fixed, paraffin-embedded (FFPE) tissue is a good option as the working tool for proteomics studies. The present study aimed to obtain a global protein profile using LUAD FFPE tissue samples. Using a quantitative proteomics approach, the study revealed that 360 proteins were significantly more highly expressed in LUAD than in adjacent nontumor lung tissues. Also, 19 differentially expressed membrane proteins were found to be primarily responsible for immune processes. Epidermal growth factor (EGF)-like domain and laminin EGF domain showed markedly different expression levels between cancer tissues and tumor-adjacent normal tissues. Furthermore, Gene Ontology functional enrichment analysis showed that significantly upregulated proteins were associated with the endoplasmic reticulum lumen, protein disulfide isomerase activity, vitamin binding, cell cycle G1/S phase transition, to name but a few. Also, numerous kinases and post-translational modification enzymes were significantly upregulated across all eight LUAD samples compared with paracarcinoma tissues. Proteomics analysis revealed that AAA domain containing 3A (ATAD3a), a member of the ATPase family, was highly expressed in LUAD tissues, which was supported by immunohistochemical analysis. Furthermore, the study confirmed that ATAD3a enhanced the cisplatin sensitivity of LUAD cells. Collectively, the findings of the present study provide new potential candidate targets in patients with LUAD, and may aid auxiliary LUAD diagnosis and surveillance in a noninvasive manner.
Affiliation(s)
- Qi Xie
- Department of Pathology, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Dan Wang
- Department of Neurology, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Xiao Luo
- International Medical Center, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Zhen Li
- Department of Pathology, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Aixia Hu
- Department of Pathology, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Hui Yang
- Department of Thoracic Surgery, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Jinxing Tang
- Department of Thoracic Surgery, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Peiyu Gao
- Department of Thoracic Surgery, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Tingyi Sun
- Department of Pathology, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
- Lingfei Kong
- Department of Pathology, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, People's Hospital of Henan University, Zhengzhou, Henan 450003, P.R. China
|
5
|
Imai K, Nakai K. Tools for the Recognition of Sorting Signals and the Prediction of Subcellular Localization of Proteins From Their Amino Acid Sequences. Front Genet 2020; 11:607812. [PMID: 33324450 PMCID: PMC7723863 DOI: 10.3389/fgene.2020.607812] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/18/2020] [Accepted: 11/03/2020] [Indexed: 12/13/2022] Open
Abstract
At the time of translation, nascent proteins are thought to be sorted into their final subcellular localization sites, based on parts of their amino acid sequences (i.e., sorting or targeting signals). Thus, it is interesting to computationally recognize these signals from the amino acid sequence of any given protein and to predict its final subcellular localization with such information, supplemented with additional information (e.g., k-mer frequency). This field has a long history and many prediction tools have been released. Even in this era of proteomic atlases at the single-cell level, researchers continue to develop new algorithms, aiming, for example, at assessing the impact of disease-causing mutations or cell type-specific alternative splicing. In this article, we overview the entire field and discuss its future direction.
Affiliation(s)
- Kenichiro Imai
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Kenta Nakai
- The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
6
|
Li GP, Du PF, Shen ZA, Liu HY, Luo T. DPPN-SVM: Computational Identification of Mis-Localized Proteins in Cancers by Integrating Differential Gene Expressions With Dynamic Protein-Protein Interaction Networks. Front Genet 2020; 11:600454. [PMID: 33193746 PMCID: PMC7644922 DOI: 10.3389/fgene.2020.600454] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/30/2020] [Accepted: 10/07/2020] [Indexed: 12/29/2022] Open
Abstract
Eukaryotic cells contain numerous components, which are known as subcellular compartments or subcellular organelles. Proteins must be sorted to the proper subcellular compartments to carry out their molecular functions. Mis-localized proteins are related to various cancers. Identifying mis-localized proteins is important in understanding the pathology of cancers and in developing therapies. However, experimental methods for determining protein subcellular locations are always costly and time-consuming. We tried to identify cancer-related mis-localized proteins in three different cancers using computational approaches. By integrating gene expression profiles and dynamic protein-protein interaction networks, we established DPPN-SVM (Dynamic Protein-Protein Network with Support Vector Machine), a predictive model using the SVM classifier with diffusion kernels. With this predictive model, we identified a number of mis-localized proteins. Since we introduced the dynamic protein-protein network, which has never been considered in existing works, our model is capable of identifying more mis-localized proteins than existing studies. As far as we know, this is the first study to incorporate dynamic protein-protein interaction networks in identifying mis-localized proteins in cancers.
Affiliation(s)
- Guang-Ping Li
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Zi-Ang Shen
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Hang-Yu Liu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Tao Luo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
|
7
|
Miao YY, Zhao W, Li GP, Gao Y, Du PF. Predicting Endoplasmic Reticulum Resident Proteins Using Auto-Cross Covariance Transformation With a U-Shaped Residue Weight-Transfer Function. Front Genet 2020; 10:1231. [PMID: 31921288 PMCID: PMC6932965 DOI: 10.3389/fgene.2019.01231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2019] [Accepted: 11/06/2019] [Indexed: 11/13/2022] Open
Abstract
Background: The endoplasmic reticulum (ER) is an important organelle in eukaryotic cells. It is involved in many important biological processes, such as cell metabolism, protein synthesis, and post-translational modification. The proteins that reside within the ER are called ER-resident proteins. These proteins are closely related to the biological functions of the ER. The difference between ER-resident proteins and non-resident proteins should be carefully studied. Methods: We developed a support vector machine (SVM)-based method. We developed a U-shaped weight-transfer function and used it, along with the positional-specific physicochemical properties (PSPCP), to integrate sequence order information, signal peptide information, and evolutionary information. Result: Our method achieved over 86% accuracy in a jackknife test. We also achieved roughly 86% sensitivity and 67% specificity in an independent dataset test. Our method is capable of identifying ER-resident proteins.
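The accuracy, sensitivity and specificity quoted in this abstract follow the standard confusion-matrix definitions. A minimal sketch of those definitions is given here; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper or its data.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels.

    1 marks the positive class (e.g. an ER-resident protein) and 0 the
    negative class. The function and data are illustrative only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: fraction of true positives that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Toy labels, not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
accuracy, sensitivity, specificity = binary_metrics(y_true, y_pred)
```

In a jackknife (leave-one-out) test, each sample would be predicted by a model trained on all the others, and the pooled predictions would then be passed to a function like this one.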
Affiliation(s)
- Yang-Yang Miao
- College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Chemical Engineering, Tianjin University, Tianjin, China
- Wei Zhao
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Guang-Ping Li
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yang Gao
- School of Medicine, Nankai University, Tianjin, China
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, China
|
8
|
Javed F, Hayat M. Predicting subcellular localization of multi-label proteins by incorporating the sequence features into Chou's PseAAC. Genomics 2019; 111:1325-1332. [DOI: 10.1016/j.ygeno.2018.09.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 09/02/2018] [Accepted: 09/04/2018] [Indexed: 12/13/2022]
|
9
|
Abstract
Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for efficiently and effectively identifying protein subcellular locations. In particular, the rapidly increasing number of protein sequences entering the genome databases has called for the development of automated analysis methods.
Methods: In this review, we describe the recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) Protein subcellular location benchmark dataset construction, ii) Protein feature representation and feature descriptors, iii) Common machine learning algorithms, iv) Cross-validation test methods and assessment metrics, v) Web servers.
Result & Conclusion: With the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. One direction is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. Another is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.
Affiliation(s)
- Ting-He Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Shao-Wu Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
|
10
|
Qiao S, Yan B, Li J. Ensemble learning for protein multiplex subcellular localization prediction based on weighted KNN with different features. APPL INTELL 2017. [DOI: 10.1007/s10489-017-1029-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
|
11
|
Du PF. Predicting Protein Submitochondrial Locations: The 10th Anniversary. Curr Genomics 2017; 18:316-321. [PMID: 29081687 PMCID: PMC5635615 DOI: 10.2174/1389202918666170228143256] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/01/2016] [Revised: 10/16/2016] [Accepted: 11/02/2016] [Indexed: 12/16/2022] Open
Abstract
Predicting protein submitochondrial location has been studied for about ten years. A number of methods have been developed. The prediction performances have been improved to an almost perfect level. In this review, we introduce the background of this research topic. We also compare the methods, the performances and the datasets that have been used by these studies. Towards the end, we provide hints for the future directions of this research topic.
Affiliation(s)
- Pu-Feng Du
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
|
12
|
Jiao YS, Du PF. Predicting protein submitochondrial locations by incorporating the positional-specific physicochemical properties into Chou's general pseudo-amino acid compositions. J Theor Biol 2017; 416:81-87. [DOI: 10.1016/j.jtbi.2016.12.026] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 09/21/2016] [Revised: 12/06/2016] [Accepted: 12/30/2016] [Indexed: 11/26/2022]
|
13
|
Hasan MAM, Ahmad S, Molla MKI. Protein subcellular localization prediction using multiple kernel learning based support vector machine. MOLECULAR BIOSYSTEMS 2017; 13:785-795. [DOI: 10.1039/c6mb00860g] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 01/05/2023]
Abstract
An efficient multi-label protein subcellular localization prediction system was developed by introducing multiple kernel learning (MKL) based support vector machine (SVM).
Affiliation(s)
- Md. Al Mehedi Hasan
- Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh
- Shamim Ahmad
- Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh
|
14
|
Hasan MAM, Ahmad S, Molla MKI. iMulti-HumPhos: a multi-label classifier for identifying human phosphorylated proteins using multiple kernel learning based support vector machines. MOLECULAR BIOSYSTEMS 2017; 13:1608-1618. [DOI: 10.1039/c7mb00180k] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/22/2022]
Abstract
An efficient multi-label classifier for identifying human phosphorylated proteins has been developed by introducing multiple kernel learning based support vector machines.
Affiliation(s)
- Md. Al Mehedi Hasan
- Department of Computer Science & Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
- Shamim Ahmad
- Department of Computer Science & Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
|
15
|
Jiao Y, Du P. Performance measures in evaluating machine learning based bioinformatics predictors for classifications. QUANTITATIVE BIOLOGY 2016. [DOI: 10.1007/s40484-016-0081-2] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Indexed: 10/20/2022]
|
16
|
Qu X, Wang D, Chen Y, Qiao S, Zhao Q. Predicting the Subcellular Localization of Proteins with Multiple Sites Based on Multiple Features Fusion. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:36-42. [PMID: 26452288 DOI: 10.1109/tcbb.2015.2485207] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
Protein sub-cellular localization prediction has attracted much attention in recent years because of its importance for protein function studies and targeted drug discovery, which makes it an important research field in bioinformatics. Traditional experimental methods for ascertaining protein sub-cellular locations are costly and time-consuming. In the last two decades, machine learning methods have developed rapidly, and a large number of machine learning based protein sub-cellular location predictors have been developed. However, most such predictors can handle only proteins with a single subcellular location. With the development of biological techniques, more and more proteins with two or more sub-cellular locations have been found. Studying such proteins is especially significant because they have extremely useful implications for both basic biology and bioinformatics research. To improve prediction accuracy, more feature information representing the protein sequence should be extracted. In this paper, several feature extraction methods were fused together to extract the feature information, and then the multi-label k nearest neighbors (ML-KNN) algorithm was used to predict protein sub-cellular locations. The best overall accuracies we obtained were 66.7304 percent for dataset s1, used in constructing Gpos-mPLoc, and 59.9206 percent for dataset s2, used in constructing Virus-mPLoc.
|
17
|
Jiao YS, Du PF. Predicting Golgi-resident protein types using pseudo amino acid compositions: Approaches with positional specific physicochemical properties. J Theor Biol 2015; 391:35-42. [PMID: 26702543 DOI: 10.1016/j.jtbi.2015.11.009] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 08/11/2015] [Revised: 11/17/2015] [Accepted: 11/19/2015] [Indexed: 11/24/2022]
Abstract
Knowing the type of a Golgi-resident protein is an important step in understanding its molecular functions as well as its role in biological processes. In this paper, we developed a novel computational method to predict Golgi-resident protein types using positional specific physicochemical properties and analysis of variance based feature selection methods. Our method achieved 86.9% prediction accuracy in leave-one-out cross-validations with only 59 features. Our method has the potential to be applied in predicting a wide range of protein attributes.
Affiliation(s)
- Ya-Sen Jiao
- School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
- Pu-Feng Du
- School of Computer Science and Technology, Tianjin University, Tianjin 300072, China.
|
18
|
Predicting human protein subcellular locations by the ensemble of multiple predictors via protein-protein interaction network with edge clustering coefficients. PLoS One 2014; 9:e86879. [PMID: 24466278 PMCID: PMC3900678 DOI: 10.1371/journal.pone.0086879] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/14/2013] [Accepted: 12/18/2013] [Indexed: 12/14/2022] Open
Abstract
One of the fundamental tasks in biology is to identify the functions of all proteins to reveal the primary machinery of a cell. Knowledge of the subcellular locations of proteins will provide key hints to reveal their functions and to understand the intricate pathways that regulate biological processes at the cellular level. Protein subcellular location prediction has been extensively studied in the past two decades. Many methods have been developed based on protein primary sequences as well as protein-protein interaction networks. In this paper, we propose to use the protein-protein interaction network as an infrastructure to integrate existing sequence-based predictors. When predicting the subcellular locations of a given protein, not only the protein itself but also all its interacting partners were considered. Unlike existing methods, our method requires neither comprehensive knowledge of the protein-protein interaction network nor experimentally annotated subcellular locations for most proteins in the network. In addition, our method can be used as a framework to integrate multiple predictors. Our method achieved an absolute-true rate of 56% on the human proteome, which is higher than the state-of-the-art methods.
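The absolute-true rate reported in this abstract is a strict multi-label measure. A minimal sketch follows, assuming the definition commonly used in the multi-label localization literature: a protein counts as correct only when its predicted location set exactly equals its annotated set. The function name and the toy annotations are hypothetical.

```python
def absolute_true_rate(true_sets, predicted_sets):
    """Fraction of proteins whose predicted location set matches exactly.

    Assumes the common exact-match definition; a partially correct
    prediction (missing or extra locations) earns no credit.
    """
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true and predicted lists must have equal length")
    if not true_sets:
        return 0.0
    exact = sum(
        1 for truth, pred in zip(true_sets, predicted_sets)
        if set(truth) == set(pred)
    )
    return exact / len(true_sets)


# Toy annotations, not data from the study.
true_sets = [
    {"nucleus"},
    {"cytoplasm", "nucleus"},
    {"mitochondrion"},
    {"membrane"},
]
predicted_sets = [
    {"nucleus"},                # exact match
    {"cytoplasm"},              # one location missing, counts as wrong
    {"mitochondrion"},          # exact match
    {"membrane", "cytoplasm"},  # one extra location, counts as wrong
]
rate = absolute_true_rate(true_sets, predicted_sets)
```

Because over-prediction and under-prediction are both penalized in full, this measure is usually much lower than per-label accuracy on the same predictions.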
|
19
|
Li X, Wu X, Wu G. Robust feature generation for protein subchloroplast location prediction with a weighted GO transfer model. J Theor Biol 2014; 347:84-94. [PMID: 24423409 DOI: 10.1016/j.jtbi.2014.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/12/2013] [Revised: 10/17/2013] [Accepted: 01/03/2014] [Indexed: 10/25/2022]
Abstract
Chloroplasts are crucial organelles of green plants and eukaryotic algae since they conduct photosynthesis. Predicting the subchloroplast location of a protein can provide important insights for understanding its biological functions. The performance of subchloroplast location prediction algorithms often depends on deriving predictive and succinct features from genomic and proteomic data. In this work, a novel weighted Gene Ontology (GO) transfer model is proposed to generate discriminating features from sequence data and GO categories. This model contains two components. First, we transfer the GO terms of the homologous protein and assign the bit-score as weights to the GO features. Second, we employ term-selection methods to determine weights for GO terms. This model improves prediction accuracy because it tolerates the noise introduced by homolog knowledge transfer. The proposed weighted GO transfer method based on bit-score and a logarithmic transformation of chi-square (WS-LCHI) performs better than the baseline models and also outperforms four off-the-shelf subchloroplast prediction methods.
Affiliation(s)
- Xiaomei Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China.
- Xindong Wu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China; Department of Computer Science, University of Vermont, Burlington, VT 50405, USA.
- Gongqing Wu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China.
|
20
|
A novel approach for protein subcellular location prediction using amino acid exposure. BMC Bioinformatics 2013; 14:342. [PMID: 24283794 PMCID: PMC4219330 DOI: 10.1186/1471-2105-14-342] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 08/23/2013] [Accepted: 11/25/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Proteins perform their functions in associated cellular locations. Therefore, the study of protein function can be facilitated by predictions of protein location. Protein location can be predicted either from the sequence of a protein alone, by identifying targeting peptide sequences and motifs, or by homology to proteins of known location. A third, complementary approach exploits the differences in amino acid composition between proteins associated with different cellular locations, and can be useful if motif and homology information are missing. Here we expand this approach by taking into account amino acid composition at different levels of amino acid exposure. Results: Our method has two stages. In stage one, we trained multiple Support Vector Machines (SVMs) to score eukaryotic protein sequences for membership of each of three categories (nuclear, cytoplasmic and extracellular) plus an extra category, nucleocytoplasmic, which accounts for the fact that a large number of proteins shuttle between those two locations. In stage two, we use an artificial neural network (ANN) to propose a category from the scores given to the four locations in stage one. The method reaches an accuracy of 68% when using 3D-derived values of amino acid exposure as input. Calibrating the method with predicted values of amino acid exposure allows proteins without 3D information to be classified with an accuracy of 62%, and proteins in different locations to be discerned even if they share high levels of identity. Conclusions: In this study we explored the relationship between residue exposure and protein subcellular location. We developed a new algorithm for subcellular location prediction that uses residue exposure signatures. Our algorithm uses a novel approach to address the multiclass classification problem. The algorithm is implemented as the web server 'NYCE' and can be accessed at http://cbdm.mdc-berlin.de/~amer/nyce.
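The composition-based approach mentioned in this abstract starts from a per-sequence amino acid composition vector. The sketch below computes only that plain 20-dimensional composition; it does not model the exposure levels, SVMs, or ANN described by the authors, and the function name and example sequence are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence."""
    # Non-standard symbols (e.g. X or gaps) are ignored rather than counted.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from the paper.
comp = amino_acid_composition("MKKLAA")
```

A vector of this kind can be fed to any standard classifier; splitting residues by exposure level, as the abstract describes, would yield one such vector per exposure class.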
|
21
|
SubMito-PSPCP: predicting protein submitochondrial locations by hybridizing positional specific physicochemical properties with pseudoamino acid compositions. BioMed Research International 2013; 2013:263829. [PMID: 24027753 PMCID: PMC3763570 DOI: 10.1155/2013/263829] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/12/2013] [Revised: 07/10/2013] [Accepted: 07/20/2013] [Indexed: 11/17/2022]
Abstract
Knowing the submitochondrial location of a mitochondrial protein is an important step in understanding its function. We developed a new method for predicting protein submitochondrial locations by introducing a new concept: positional specific physicochemical properties. Within the framework of general-form pseudoamino acid compositions, our method uses only about 100 features to represent protein sequences, which makes it much simpler than existing methods. On the SubMito dataset, our method achieved over 93% overall accuracy, with 98.60% for inner membrane, 93.90% for matrix, and 70.70% for outer membrane, which is comparable to state-of-the-art methods. As our method can be used as a general way to upgrade all pseudoamino-acid-composition-based methods, it should be very useful in future studies. We implemented our method as an online service: SubMito-PSPCP.
|