1. Singh A, Tiwari AK. Machine learning-based approach for prediction of ion channels and their subclasses. J Cell Biochem 2023; 124:72-88. [PMID: 36271914 DOI: 10.1002/jcb.30343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 10/10/2022] [Accepted: 10/12/2022] [Indexed: 01/25/2023]
Abstract
Ion channels are ion-permeable protein pores found in the lipid membranes of all cells. Distinct ion channels play multiple roles in biological processes. With the rapid growth of mass spectrometry, proteomic data are accumulating quickly, giving us the chance to comprehensively explore ion channel classes and their subclasses. This paper proposes an eXtreme Gradient Boosting (XGBoost)-based method to predict ion channel classes and their subclasses. Twelve feature descriptors are used to characterize protein sequences: amino acid composition, pseudo-amino acid composition, normalized Moreau-Broto autocorrelation, amphiphilic pseudo-amino acid composition, dipeptide composition, Geary autocorrelation, tripeptide composition, sequence-order-coupling number, composition/transition/distribution, conjoint triad, Moran autocorrelation, and quasi-sequence-order descriptors. In total, 9920 features are extracted from the protein sequence. Principal component analysis is applied to determine the optimal number of features. In 10-fold cross-validation, the proposed XGBoost-based approach with the optimal 50 features achieved accuracies of 100%, 98.70%, 98.77%, 97.26%, 87.40%, 97.39%, 98.03%, and 96.42%, and F1-scores of 100%, 99%, 99%, 97%, 87%, 97%, 98%, and 97%, for the prediction of ion channels versus non-ion channels, voltage-gated versus ligand-gated ion channels, subclasses of voltage-gated ion channels (VGICs), subclasses of ligand-gated ion channels (LGICs), subclasses of voltage-gated calcium channels (VGCCs), subclasses of voltage-gated potassium channels (VGKCs), subclasses of voltage-gated sodium channels (VGSCs), and subclasses of voltage-gated chloride channels, respectively. The proposed approach is also compared with other classifiers, namely support vector machine (SVM), k-nearest neighbor, Gaussian Naïve Bayes, and random forest, and with existing methods: an SVM with maximum relevance maximum distance, with accuracies of 86.6%, 83.7%, and 85.1% for ion channels, non-ion channels, and overall, respectively, and an SVM with a radial basis function kernel, with accuracies of 100%, 97%, and 99.9% for ion channels, non-ion channels, and overall, respectively.
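To make two of the twelve descriptors concrete, the following is a minimal sketch of amino acid composition (20 values) and dipeptide composition (400 values) for a single sequence. The function names and the example sequence are illustrative assumptions, not the authors' code, and the remaining descriptors, the principal component analysis step, and the XGBoost classifier are deliberately left out.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


def featurize(seq):
    """Concatenate both descriptors into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


# Hypothetical toy sequence, used only to show the output dimensionality.
features = featurize("MKTAYIAKQR")
print(len(features))
```

In a full pipeline along the lines the abstract describes, vectors like these would be stacked with the other descriptors, reduced, and passed to a classifier.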
Affiliation(s)
- Anuj Singh
  - Department of Computer Science and Engineering, Kamla Nehru Institute of Technology, Sultanpur, Uttar Pradesh, India
- Arvind Kumar Tiwari
  - Department of Computer Science and Engineering, Kamla Nehru Institute of Technology, Sultanpur, Uttar Pradesh, India
2. Apostolakou AE, Nastou KC, Petichakis GN, Litou ZI, Iconomidou VA. LiGIoNs: A computational method for the detection and classification of ligand-gated ion channels. BIOCHIMICA ET BIOPHYSICA ACTA. BIOMEMBRANES 2022; 1864:183956. [PMID: 35577076 DOI: 10.1016/j.bbamem.2022.183956] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/18/2021] [Revised: 04/19/2022] [Accepted: 05/02/2022] [Indexed: 06/15/2023]
Abstract
Ligand-gated ion channels (LGICs) are one of the largest groups of transmembrane proteins. Due to their major role in synaptic transmission, in both the nervous system and the somatic neuromuscular junction, LGICs present attractive therapeutic targets. During the last few years, several computational methods for the detection of LGICs have been developed. These methods are based on machine learning approaches utilizing features extracted solely from the amino acid composition. Here we report the development of LiGIoNs, a profile Hidden Markov Model (pHMM) method for the prediction and ligand-based classification of LGICs. The method consists of a library of 10 pHMMs, one per LGIC subfamily, built from the alignment of representative LGIC sequences. In addition, 14 Pfam pHMMs are used to further annotate and classify unknown protein sequences into one of the 10 LGIC subfamilies. Evaluation of the method showed that it outperforms existing methods in the detection of LGICs. Moreover, LiGIoNs is the only currently available method that classifies LGICs into subfamilies. The method is available online at http://bioinformatics.biol.uoa.gr/ligions/.
Affiliation(s)
- Avgi E Apostolakou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Katerina C Nastou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Georgios N Petichakis
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Zoi I Litou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Vassiliki A Iconomidou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
3. Nguyen TTD, Le NQK, Tran TA, Pham DM, Ou YY. Incorporating a transfer learning technique with amino acid embeddings to efficiently predict N-linked glycosylation sites in ion channels. Comput Biol Med 2021; 130:104212. [PMID: 33454535 DOI: 10.1016/j.compbiomed.2021.104212] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/30/2020] [Revised: 12/21/2020] [Accepted: 01/04/2021] [Indexed: 11/27/2022]
Abstract
Glycosylation is a dynamic enzymatic process that attaches glycans to proteins or other organic molecules such as lipoproteins. Research has shown that this process plays a fundamental role in modulating the functions of ion channel proteins. This study used a computational method to predict N-linked glycosylation sites, the most common type, in ion channel proteins. From segments of ion channel proteins centered around N-linked glycosylation sites, the amino acid embedding vectors of each residue were concatenated to create features for prediction. We experimented with two different models for converting amino acids to their corresponding embeddings: one was fed with ion channel sequences and the other with a large dataset composed of more than one million protein sequences. The latter model stemmed from the idea of transfer learning and emerged as a more efficient feature extractor. Our best model was obtained from this transfer learning approach and a hyperparameter tuning process with a random search on 5-fold cross-validation data. It achieved an accuracy, specificity, sensitivity, and Matthews correlation coefficient of 93.4%, 92.8%, 98.6%, and 0.726, respectively. Corresponding scores on an independent test were 92.9%, 92.2%, 99%, and 0.717. These results outperform the position-specific scoring matrix features that are predominantly employed in post-translational modification site predictions. Furthermore, compared to N-GlyDE, GlycoEP, and SPRINT-Gly, the most recent N-linked glycosylation site predictors, our model yields higher scores on the above four metrics, further demonstrating the efficiency of our approach.
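The feature construction described above (a segment centered on a candidate site, with per-residue embedding vectors concatenated) can be sketched as follows. This is an illustration under stated assumptions: the embedding table is filled with random toy values rather than learned embeddings, the padding symbol `X`, the flank size of 3, and the choice of every asparagine as a candidate site are all hypothetical and not taken from the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def toy_embeddings(dim=4, seed=0):
    """Build a random embedding table; a real model would supply learned vectors."""
    rng = random.Random(seed)
    table = {a: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for a in AMINO_ACIDS}
    table[PAD] = [0.0] * dim  # padding maps to the zero vector
    return table


def window(seq, center, flank):
    """Return the segment of length 2*flank+1 centered at index `center`."""
    residues = []
    for i in range(center - flank, center + flank + 1):
        residues.append(seq[i] if 0 <= i < len(seq) else PAD)
    return "".join(residues)


def site_features(seq, center, flank, table):
    """Concatenate the embedding vectors of every residue in the window."""
    features = []
    for residue in window(seq, center, flank):
        features.extend(table.get(residue, table[PAD]))
    return features


# Hypothetical toy sequence; every asparagine (N) is treated as a candidate site.
seq = "MKNGTAYNLSQ"
table = toy_embeddings()
candidates = [i for i, residue in enumerate(seq) if residue == "N"]
X = [site_features(seq, i, 3, table) for i in candidates]
print(candidates, len(X[0]))
```

Each row of `X` would then be one training or prediction example for a downstream classifier.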
Affiliation(s)
- Nguyen-Quoc-Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 106, Taiwan
- Dinh-Minh Pham
  - Institute of Biotechnology, Vietnam Academy of Science and Technology, Hanoi, Viet Nam
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
4. Gao J, Miao Z, Zhang Z, Wei H, Kurgan L. Prediction of Ion Channels and their Types from Protein Sequences: Comprehensive Review and Comparative Assessment. Curr Drug Targets 2020; 20:579-592. [PMID: 30360734 DOI: 10.2174/1389450119666181022153942] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/25/2018] [Revised: 10/03/2018] [Accepted: 10/04/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND: Ion channels are a large and growing protein family. Many of them are associated with diseases, and consequently, they are targets for over 700 drugs. Discovery of new ion channels is facilitated by computational methods that predict ion channels and their types from protein sequences. However, these methods have never been comprehensively compared and evaluated. OBJECTIVE: We offer a first-of-its-kind comprehensive survey of the sequence-based predictors of ion channels. We describe eight predictors that include five methods that predict ion channels, their types, and four classes of the voltage-gated channels. We also develop and use a new benchmark dataset to perform a comparative empirical analysis of the three currently available predictors. RESULTS: While several methods that rely on different designs have been published, only a few of them are currently available and offer a broad scope of predictions. Support and availability after publication should be required when new methods are considered for publication. Empirical analysis shows strong performance for the prediction of ion channels and modest performance for the prediction of ion channel types and voltage-gated channel classes. We identify a substantial weakness of current methods: they cannot accurately predict ion channels that are categorized into multiple classes/types. CONCLUSION: Several predictors of ion channels are available to end users. They offer practical levels of predictive quality. Methods that rely on a larger and more diverse set of predictive inputs (such as PSIONplus) are more accurate. New tools that address multi-label prediction of ion channels should be developed.
Affiliation(s)
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Zhen Miao
  - College of Life Sciences, Nankai University, Tianjin, China
- Zhaopeng Zhang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Hong Wei
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, United States
5. PSIONplusm Server for Accurate Multi-Label Prediction of Ion Channels and Their Types. Biomolecules 2020; 10:biom10060876. [PMID: 32517331 PMCID: PMC7355608 DOI: 10.3390/biom10060876] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/25/2020] [Revised: 05/28/2020] [Accepted: 06/04/2020] [Indexed: 11/26/2022] Open
Abstract
Computational prediction of ion channels facilitates the identification of putative ion channels from protein sequences. Several predictors of ion channels and their types have been developed over the last fifteen years. While they offer reasonably accurate predictions, they also suffer from a few shortcomings, including lack of availability, parallel prediction mode, single-label prediction (inability to predict multiple channel subtypes), and incomplete scope (inability to predict subtypes of the voltage-gated channels). We developed a first-of-its-kind PSIONplusm method that performs sequential multi-label prediction of ion channels and their subtypes for both voltage-gated and ligand-gated channels. PSIONplusm sequentially combines the outputs produced by three support vector machine-based models from the PSIONplus predictor and is available as a webserver. Empirical tests show that PSIONplusm outperforms current methods for the multi-label prediction of the ion channel subtypes. These include the existing single-label methods that are available to users, a naïve multi-label predictor that combines results produced by multiple single-label methods, and methods that make predictions based on sequence alignment and domain annotations. We also found that the current methods (including PSIONplusm) fail to accurately predict a few of the least frequently occurring ion channel subtypes. Thus, new predictors should be developed when a larger quantity of annotated ion channels becomes available to train predictive models.
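The idea of sequentially combining model outputs into a multi-label prediction can be illustrated with a small cascade. Everything here is a hypothetical sketch: the abstract does not specify how the three models are staged, so the three stages (channel detection, channel type, voltage-gated subtype), the 0.5 threshold, and the stub scoring functions are assumptions made for illustration and do not reproduce PSIONplusm.

```python
def cascade_predict(seq, channel_score, type_scores, vg_subtype_scores, threshold=0.5):
    """Return every label whose score passes the threshold, stage by stage.

    channel_score(seq)      -> float, score that seq is an ion channel
    type_scores(seq)        -> dict mapping channel type to score
    vg_subtype_scores(seq)  -> dict mapping voltage-gated subtype to score
    """
    labels = []
    # Stage 1: stop early when the sequence does not look like an ion channel.
    if channel_score(seq) < threshold:
        return labels
    labels.append("ion channel")
    # Stage 2: keep every type above the threshold, so several can be returned.
    types = [t for t, score in type_scores(seq).items() if score >= threshold]
    labels.extend(types)
    # Stage 3: subtypes are only considered for voltage-gated predictions.
    if "voltage-gated" in types:
        subtypes = vg_subtype_scores(seq)
        labels.extend(s for s, score in subtypes.items() if score >= threshold)
    return labels


# Stub models with fixed scores stand in for trained classifiers.
def stub_channel(seq):
    return 0.9


def stub_types(seq):
    return {"voltage-gated": 0.8, "ligand-gated": 0.2}


def stub_subtypes(seq):
    return {"potassium": 0.7, "calcium": 0.6, "sodium": 0.1}


predicted = cascade_predict("MKTAYIAKQR", stub_channel, stub_types, stub_subtypes)
print(predicted)
```

Because each stage filters by threshold rather than taking a single best label, a sequence can receive more than one subtype, which is the multi-label behaviour the abstract contrasts with single-label predictors.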
6. Han K, Wang M, Zhang L, Wang Y, Guo M, Zhao M, Zhao Q, Zhang Y, Zeng N, Wang C. Predicting Ion Channels Genes and Their Types With Machine Learning Techniques. Front Genet 2019; 10:399. [PMID: 31130983 PMCID: PMC6510169 DOI: 10.3389/fgene.2019.00399] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/14/2019] [Accepted: 04/12/2019] [Indexed: 02/01/2023] Open
Abstract
Motivation: The number of ion channels is increasing rapidly. As many of them are associated with diseases, they are the targets of more than 700 drugs. The discovery of new ion channels is facilitated by computational methods that predict ion channels and their types from protein sequences. Methods: We used the SVMProt and the k-skip-n-gram methods to extract the feature vectors of ion channels, and obtained 188- and 400-dimensional features, respectively. The 188- and 400-dimensional features were combined to obtain 588-dimensional features. We then employed the maximum-relevance-maximum-distance method to reduce the dimensions of the 588-dimensional features. Finally, the support vector machine and random forest methods were used to build the prediction models to evaluate the classification effect. Results: Different methods were employed to extract various feature vectors, and after effective dimensionality reduction, different classifiers were used to classify the ion channels. We extracted the ion channel data from the Universal Protein Resource (UniProt, http://www.uniprot.org/) and Ligand-Gated Ion Channel databases (http://www.ebi.ac.uk/compneur-srv/LGICdb/LGICdb.php), and then verified the performance of the classifiers after screening. The findings of this study could inform the research and development of drugs.
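As an illustration of where a 400-dimensional k-skip-n-gram vector can come from, the sketch below counts residue pairs separated by 0 to k intervening residues (n = 2, giving 20 × 20 = 400 pair types) and then concatenates the result with a 188-dimensional vector. This is one common formulation and an assumption on our part; the value of k is arbitrary, and the 188-dimensional SVMProt features are represented by a zero-filled placeholder because they are not computed here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def k_skip_bigram(seq, k=1):
    """Normalized counts of residue pairs with 0..k residues skipped between them."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    for skip in range(k + 1):
        step = skip + 1  # distance between the two residues of the pair
        for i in range(len(seq) - step):
            pair = seq[i] + seq[i + step]
            if pair in counts:  # ignore pairs with non-standard symbols
                counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def combine(first, second):
    """Concatenate two feature vectors into one."""
    return list(first) + list(second)


# Hypothetical placeholder for the 188-dimensional SVMProt descriptor.
svmprot_placeholder = [0.0] * 188
combined = combine(svmprot_placeholder, k_skip_bigram("MKTAYIAKQR", k=1))
print(len(combined))
```

A dimensionality reduction step, such as the maximum-relevance-maximum-distance method named in the abstract, would then operate on the combined 588-dimensional vectors.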
Affiliation(s)
- Ke Han
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, China
- Miao Wang
  - Life Sciences and Environmental Sciences Development Center, Harbin University of Commerce, Harbin, China
- Lei Zhang
  - Life Sciences and Environmental Sciences Development Center, Harbin University of Commerce, Harbin, China
- Ying Wang
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Mian Guo
  - Department of Neurosurgery, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Ming Zhao
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, China
- Qian Zhao
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, China
- Yu Zhang
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, China
- Nianyin Zeng
  - Department of Instrumental and Electrical Engineering, Xiamen University, Xiamen, China
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
7. Taju SW, Ou Y. DeepIon: Deep learning approach for classifying ion transporters and ion channels from membrane proteins. J Comput Chem 2019; 40:1521-1529. [DOI: 10.1002/jcc.25805] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/20/2018] [Revised: 01/19/2019] [Accepted: 01/30/2019] [Indexed: 01/20/2023]
Affiliation(s)
- Semmy Wellem Taju
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
8. Zhao YW, Su ZD, Yang W, Lin H, Chen W, Tang H. IonchanPred 2.0: A Tool to Predict Ion Channels and Their Types. Int J Mol Sci 2017; 18:ijms18091838. [PMID: 28837067 PMCID: PMC5618487 DOI: 10.3390/ijms18091838] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 08/07/2017] [Revised: 08/21/2017] [Accepted: 08/21/2017] [Indexed: 12/11/2022] Open
Abstract
Ion channels (IC) are ion-permeable protein pores located in the lipid membranes of all cells. Different ion channels have unique functions in different biological processes. Due to the rapid development of high-throughput mass spectrometry, proteomic data are rapidly accumulating and provide us with an opportunity to systematically investigate and predict ion channels and their types. In this paper, we constructed a support vector machine (SVM)-based model to quickly predict ion channels and their types. By considering the residue sequence information and the physicochemical properties of the residues, a novel feature extraction method that combined dipeptide composition with the physicochemical correlation between two residues was employed. A feature selection strategy was used to improve the performance of the model. Comparison results in jackknife cross-validation demonstrated that our method was superior to other methods for predicting ion channels and their types. Based on the model, we built a web server called IonchanPred, which can be freely accessed from http://lin.uestc.edu.cn/server/IonchanPredv2.0.
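Jackknife cross-validation, as used in the comparison above, holds out each sample once, trains on all the others, and scores the held-out prediction. The sketch below shows the procedure only: a simple nearest-centroid classifier is swapped in for the SVM to keep the example dependency-free, and the two-dimensional toy data and labels are hypothetical.

```python
def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Return the label of the centroid closest to x (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, x))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def jackknife_accuracy(X, y):
    """Leave each sample out once, train on the rest, and test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        centroids = fit_centroids(train_X, train_y)
        if predict(centroids, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical, well-separated toy data standing in for sequence features.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = ["non-channel"] * 3 + ["channel"] * 3
print(jackknife_accuracy(X, y))
```

With n samples the model is trained n times, which makes the procedure deterministic but costly for large datasets or slow classifiers.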
Affiliation(s)
- Ya-Wei Zhao
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Zhen-Dong Su
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Wuritu Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
  - Development and Planning Department, Inner Mongolia University, Hohhot 010021, China.
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Wei Chen
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
  - Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China.
- Hua Tang
  - Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China.