1
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold validation standard, the conventional wet lab uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags for protein subcellular location identification. However, the booming era of proteomics and high-throughput sequencing generates tons of newly discovered proteins, making protein subcellular localization by wet-lab experiments a mission impossible. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches, including sequence-based, knowledge-based, and image-based methods. We also elaborately discuss existing challenges and future directions in AI-based method development in this research field.
Affiliation(s)
- Hanyu Xiao
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Yijin Zou
- College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Jieqiong Wang
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Shibiao Wan
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
2
Ardern Z, Chakraborty S, Lenk F, Kaster AK. Elucidating the functional roles of prokaryotic proteins using big data and artificial intelligence. FEMS Microbiol Rev 2023; 47:fuad003. [PMID: 36725215 PMCID: PMC9960493 DOI: 10.1093/femsre/fuad003] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/12/2022] [Revised: 01/11/2023] [Accepted: 01/31/2023] [Indexed: 02/03/2023] Open
Abstract
Annotating protein sequences according to their biological functions is one of the key steps in understanding microbial diversity, metabolic potentials, and evolutionary histories. However, even in the best-studied prokaryotic genomes, not all proteins can be characterized by classical in vivo, in vitro, and/or in silico methods, a challenge that is growing rapidly with the advent of next-generation sequencing technologies and their enormous extension of 'omics' data in public databases. These so-called hypothetical proteins (HPs) represent a huge knowledge gap and hidden potential for biotechnological applications. Opportunities for leveraging the available 'Big Data' have recently proliferated with the use of artificial intelligence (AI). Here, we review the aims and methods of protein annotation and explain the different principles behind machine and deep learning algorithms, including recent research examples, in order to assist both biologists wishing to apply AI tools in developing comprehensive genome annotations and computer scientists who want to contribute to this leading edge of biological research.
Affiliation(s)
- Zachary Ardern
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Wellcome Trust Sanger Institute, Hinxton, Saffron Walden CB10 1RQ, United Kingdom
- Sagarika Chakraborty
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Florian Lenk
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Anne-Kristin Kaster
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
3
Jiang Z, Wang D, Chen Y. Automatic classification of nerve discharge rhythms based on sparse auto-encoder and time series feature. BMC Bioinformatics 2022; 22:619. [PMID: 35168551 PMCID: PMC8848584 DOI: 10.1186/s12859-022-04592-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2022] [Accepted: 01/11/2022] [Indexed: 12/01/2022] Open
Abstract
Background: Nerve discharge is the carrier of information transmission and can reveal the basic rules of various nerve activities. Recognizing the nerve discharge rhythm is key to correctly understanding the dynamic behavior of the nervous system. Previous methods for nerve discharge recognition depended almost entirely on traditional statistical features and the nonlinear dynamical features of the discharge activity, which required manual feature extraction and empirical judgment. These methods therefore suffered from subjective factors and were not suited to identifying a large number of discharge rhythms.
Results: Automatic feature extraction has improved greatly with the development of neural networks. In this paper, an effective discharge rhythm classification model based on a sparse auto-encoder was proposed. The sparse auto-encoder was used to construct the feature learning network. Simulated discharge data from the Chay model and its variants were taken as the input of the network, and the fused features, including the features learned by the network and the covariance and approximate entropy of the nerve discharge, were classified by Softmax. The classification accuracy on the testing data was 87.5%. Compared with other methods for identifying nerve discharge types, this method could extract the characteristics of the nerve discharge rhythm automatically, without manual design, and showed a higher accuracy.
Conclusions: Neither the sparse auto-encoder nor neural networks in general had previously been used to classify basic nerve discharge from either biological experiment data or model simulation data. The automatic classification method of nerve discharge rhythm based on the sparse auto-encoder in this paper reduced the subjectivity and misjudgment of manual feature extraction, saved time compared with the traditional method, and improved the intelligence of the classification of discharge types. It could further help to recognize and identify nerve discharge activities in a new way.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04592-3.
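Approximate entropy, one of the fused features named in this abstract, can be illustrated with a minimal Python sketch. The abstract does not give the authors' parameters or implementation, so the function name, the defaults `m=2` and `r=0.2`, and the use of `r` as an absolute tolerance (in practice it is often scaled by the standard deviation of the series) are all assumptions made for illustration.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a 1-D sequence (self-matches counted).

    Hypothetical helper for illustration; `r` is an absolute tolerance here.
    """
    n = len(series)
    if n < m + 1:
        raise ValueError("series too short for the chosen m")

    def phi(dim):
        # All overlapping templates of length `dim`.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of `a` (including `a`).
            close = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(close / len(templates))
        return total / len(templates)

    # Regular signals give values near zero; irregular signals give larger values.
    return phi(m) - phi(m + 1)
```

A constant or strictly periodic discharge train yields a value close to zero, which is why a feature of this kind can help separate periodic from chaotic or random rhythms.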
Affiliation(s)
- Zhongting Jiang
- School of Information Science and Engineering, University of Jinan, Jinan, 250022, China
- Dong Wang
- School of Information Science and Engineering, University of Jinan, Jinan, 250022, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, 250022, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, 250022, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, 250022, China
4
Xiong E, Cao D, Qu C, Zhao P, Wu Z, Yin D, Zhao Q, Gong F. Multilocation proteins in organelle communication: Based on protein-protein interactions. PLANT DIRECT 2022; 6:e386. [PMID: 35229068 PMCID: PMC8861329 DOI: 10.1002/pld3.386] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/10/2021] [Revised: 12/17/2021] [Accepted: 01/18/2022] [Indexed: 05/25/2023]
Abstract
Protein-protein interaction (PPI) plays a crucial role in most biological processes, including signal transduction and cell apoptosis. Importantly, knowledge of PPIs can be useful for identifying multimeric protein complexes and elucidating uncharacterized protein functions. For Arabidopsis thaliana, the best-characterized dicotyledonous plant, the steadily increasing amount of information on its proteome and signaling pathways is progressively enabling more researchers to construct models of cellular processes for the plant, which in turn encourages the generation of more experimental data. In this study, we performed an overview analysis of the 10 major organelles and their associated proteins in the dicotyledonous model plant Arabidopsis thaliana via a PPI network and found that PPI may play an important role in organelle communication. Further, multilocation proteins, especially phosphorylation-related multilocation proteins, can function as a "needle and thread" via PPIs and play an important role in organelle communication. Similar results were obtained in a monocotyledonous model crop, rice. Furthermore, we provide a research strategy for multilocation proteins using the LOPIT technique, proteomics, and bioinformatics analysis and describe their potential role in the field of plant science. The results provide a new view that phosphorylation-related multilocation proteins play an important role in organelle communication and provide new insight into PPIs and novel directions for proteomic research. Research on phosphorylation-related multilocation proteins may advance the study of organelle communication and provide an important theoretical basis for plant responses to external stress.
Affiliation(s)
- Erhui Xiong
- College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Di Cao
- College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Chengxin Qu
- College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Pengfei Zhao
- College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Zhaokun Wu
- College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Dongmei Yin
- College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Quanzhi Zhao
- College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Fangping Gong
- College of Agronomy, Henan Agricultural University, Zhengzhou, China
5
Savulescu AF, Bouilhol E, Beaume N, Nikolski M. Prediction of RNA subcellular localization: Learning from heterogeneous data sources. iScience 2021; 24:103298. [PMID: 34765919 PMCID: PMC8571491 DOI: 10.1016/j.isci.2021.103298] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022] Open
Abstract
RNA subcellular localization has recently emerged as a widespread phenomenon, which may apply to the majority of RNAs. The two main sources of data for characterization of RNA localization are sequence features and microscopy images, such as obtained from single-molecule fluorescent in situ hybridization-based techniques. Although such imaging data are ideal for characterization of RNA distribution, these techniques remain costly, time-consuming, and technically challenging. Given these limitations, imaging data exist only for a limited number of RNAs. We argue that the field of RNA localization would greatly benefit from complementary techniques able to characterize location of RNA. Here we discuss the importance of RNA localization and the current methodology in the field, followed by an introduction on prediction of location of molecules. We then suggest a machine learning approach based on the integration between imaging localization data and sequence-based data to assist in characterization of RNA localization on a transcriptome level.
Affiliation(s)
- Anca Flavia Savulescu
- Division of Chemical, Systems & Synthetic Biology, Institute for Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, 7925 Cape Town, South Africa
- Emmanuel Bouilhol
- Université de Bordeaux, Bordeaux Bioinformatics Center, Bordeaux, France
- Université de Bordeaux, CNRS, IBGC, UMR 5095, Bordeaux, France
- Nicolas Beaume
- Division of Medical Virology, Faculty of Health Sciences, University of Cape Town, 7925 Cape Town, South Africa
- Macha Nikolski
- Université de Bordeaux, Bordeaux Bioinformatics Center, Bordeaux, France
- Université de Bordeaux, CNRS, IBGC, UMR 5095, Bordeaux, France
6
Abstract
Background:
Thermophilic proteins can maintain good activity at high temperature; studying them is therefore important for understanding the thermal stability of proteins.
Objective:
To address the low precision and low efficiency of thermophilic protein prediction, a prediction method based on feature fusion and machine learning was proposed in this paper.
Methods:
For the selected thermophilic datasets, the protein sequences were first characterized by feature fusion, combining g-gap dipeptide, entropy density and autocorrelation coefficient features. Then, Kernel Principal Component Analysis (KPCA) was used to reduce the dimension of the sequence features in order to reduce training time and improve efficiency. Finally, the classification model was built using a classification algorithm.
Results:
A variety of classification algorithms were trained and tested on the selected thermophilic dataset. By comparison, the accuracy of the Support Vector Machine (SVM) under the jackknife method was over 92%. The other evaluation indicators also showed that the SVM performed best.
Conclusion:
Because it uses an effective feature representation method and a robust classifier, the proposed method is suitable for predicting thermophilic proteins and is superior to most reported methods.
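The g-gap dipeptide feature mentioned in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and normalizing by the number of valid residue pairs (which equals L - g - 1 for a sequence of length L made up only of the 20 standard residues) is an assumption. The entropy density, autocorrelation, KPCA and SVM steps are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional frequency vector of residue pairs
    separated by `g` intervening positions (g=0 is the adjacent dipeptide)."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]
```

Vectors computed for several values of `g` could be concatenated with the other descriptors before dimensionality reduction; which gaps the paper actually used is not stated in the abstract.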
Affiliation(s)
- Xian-Fang Wang
- School of Computer and Information Engineering, Henan Normal University, Henan, China
- Peng Gao
- School of Computer and Information Engineering, Henan Normal University, Henan, China
- Yi-Feng Liu
- School of Computer and Information Engineering, Henan Normal University, Henan, China
- Hong-Fei Li
- School of Computer and Information Engineering, Henan Normal University, Henan, China
- Fan Lu
- School of Computer and Information Engineering, Henan Normal University, Henan, China
7
Jiang Z, Wang D, Shang H, Chen Y. Effect of potassium channel noise on nerve discharge based on the Chay model. Technol Health Care 2020; 28:371-381. [PMID: 32364170 PMCID: PMC7369062 DOI: 10.3233/thc-209038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022]
Abstract
BACKGROUND: The nervous system senses and transmits information through the firing behavior of neurons, and this process is affected by various noises. However, in previous studies of the influence of noise on nerve discharge, the channel through which some noise acts was not clear, and the difference from other noises was not examined. OBJECTIVE: To construct ion channel noise that is more biologically significant, and to clarify the basic characteristics of the random firing rhythm of neurons generated by different types of noise acting on ion channels. METHOD: Based on the dynamics of the ion channel, we constructed ion channel noise. We simulated nerve discharge using the Chay model with potassium ion channel noise and used nonlinear time series analysis to measure the certainty and randomness of the nerve discharge. RESULTS: In the Chay model with potassium ion noise, the chaotic rhythm defined by the original model could be effectively unified with the random rhythm simulated by the previous random Chay model into a periodic bifurcation process. CONCLUSION: This method clarified the influence of ion channel noise on nerve discharge, improved understanding of the randomness of nerve discharge, and provided a more reasonable explanation for the mechanism of nerve discharge.
Affiliation(s)
- Zhongting Jiang
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
- Dong Wang
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, Shandong, China
- Key Laboratory of Medicinal Plant and Animal Resources of Qinghai-Tibet Plateau in Qinghai Province, Qinghai Normal University, Xining, Qinghai, China
- Huijie Shang
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
8
PSO-LocBact: A Consensus Method for Optimizing Multiple Classifier Results for Predicting the Subcellular Localization of Bacterial Proteins. BIOMED RESEARCH INTERNATIONAL 2019; 2019:5617153. [PMID: 31886228 PMCID: PMC6925685 DOI: 10.1155/2019/5617153] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/12/2019] [Revised: 10/03/2019] [Accepted: 10/30/2019] [Indexed: 02/06/2023]
Abstract
Several computational approaches for predicting subcellular localization have been developed and proposed. These approaches provide diverse performance because of their different combinations of protein features, training datasets, training strategies, and computational machine learning algorithms. In some cases, these tools may yield inconsistent and conflicting prediction results. It is important to consider such conflicting or contradictory predictions from multiple prediction programs during protein annotation, especially in the case of a multiclass classification problem such as subcellular localization. Hence, to address this issue, this work proposes the use of the particle swarm optimization (PSO) algorithm to combine the prediction outputs from multiple different subcellular localization predictors with the aim of integrating diverse prediction models to enhance the final predictions. Herein, we present PSO-LocBact, a consensus classifier based on PSO that can be used to combine the strengths of several preexisting protein localization predictors specially designed for bacteria. Our experimental results indicate that the proposed method can resolve inconsistency problems in subcellular localization prediction for both Gram-negative and Gram-positive bacterial proteins. The average accuracy achieved on each test dataset is over 98%, higher than that achieved with any individual predictor.
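The idea of combining conflicting predictor outputs can be illustrated with a weighted vote, sketched below in Python. This is not the PSO-LocBact implementation: the function name, the example labels and the weights are hypothetical, and the weights are set by hand here, whereas the abstract describes particle swarm optimization as the means of combining the predictors.

```python
from collections import defaultdict


def weighted_consensus(predictions, weights):
    """Pick the label with the largest total weight across predictors.

    `predictions[i]` is the label from predictor i and `weights[i]` its
    (hypothetical) trust weight.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per predictor is required")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Ties resolve to the label that was seen first.
    return max(scores, key=scores.get)
```

With weights `[0.7, 0.2, 0.3]`, a single strongly trusted predictor voting "cytoplasm" outweighs two weaker predictors voting "periplasm"; with more even weights the majority label wins instead.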