1
|
Nath G, Coursey A, Ekong J, Rastegari E, Sengupta S, Dag AZ, Delen D. Determining the temporal factors of survival associated with brain and nervous system cancer patients: A hybrid machine learning methodology. INTERNATIONAL JOURNAL OF HEALTHCARE MANAGEMENT 2023. [DOI: 10.1080/20479700.2023.2196101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
Affiliation(s)
- Gopal Nath
  - Department of Mathematics and Statistics, Murray State University, Murray, KY, USA
- Austin Coursey
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Joseph Ekong
  - Department of Industrial Engineering, Western New England University, Springfield, MA, USA
- Elham Rastegari
  - Department of Business, Intelligence and Analytics, Creighton University, Omaha, NE, USA
- Saptarshi Sengupta
  - Department of Computer Science, San José State University, San José, CA, USA
- Asli Z. Dag
  - Heider College of Business, Creighton University, Omaha, NE, USA
- Dursun Delen
  - Spears School of Business, Oklahoma State University, Stillwater, OK, USA
  - Faculty of Engineering and Natural Sciences, Istinye University, Istanbul, Turkey
|
2
|
Sepúlveda-Torres R, Vicente M, Saquete E, Lloret E, Palomar M. Leveraging relevant summarized information and multi-layer classification to generalize the detection of misleading headlines. DATA KNOWL ENG 2023. [DOI: 10.1016/j.datak.2023.102176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
|
3
|
Apostolakou AE, Nastou KC, Petichakis GN, Litou ZI, Iconomidou VA. LiGIoNs: A computational method for the detection and classification of ligand-gated ion channels. BIOCHIMICA ET BIOPHYSICA ACTA. BIOMEMBRANES 2022; 1864:183956. [PMID: 35577076 DOI: 10.1016/j.bbamem.2022.183956] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/18/2021] [Revised: 04/19/2022] [Accepted: 05/02/2022] [Indexed: 06/15/2023]
Abstract
Ligand-Gated Ion Channels (LGICs) are one of the largest groups of transmembrane proteins. Due to their major role in synaptic transmission, both in the nervous system and at the somatic neuromuscular junction, LGICs are attractive therapeutic targets. During the last few years, several computational methods for the detection of LGICs have been developed. These methods are based on machine learning approaches utilizing features extracted solely from the amino acid composition. Here we report the development of LiGIoNs, a profile Hidden Markov Model (pHMM) method for the prediction and ligand-based classification of LGICs. The method consists of a library of 10 pHMMs, one per LGIC subfamily, built from the alignment of representative LGIC sequences. In addition, 14 Pfam pHMMs are used to further annotate and classify unknown protein sequences into one of the 10 LGIC subfamilies. Evaluation of the method showed that it outperforms existing methods in the detection of LGICs. Moreover, LiGIoNs is the only currently available method that classifies LGICs into subfamilies. The method is available online at http://bioinformatics.biol.uoa.gr/ligions/.
Affiliation(s)
- Avgi E Apostolakou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Katerina C Nastou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Georgios N Petichakis
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Zoi I Litou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Vassiliki A Iconomidou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
|
4
|
Yuan SS, Gao D, Xie XQ, Ma CY, Su W, Zhang ZY, Zheng Y, Ding H. IBPred: A sequence-based predictor for identifying ion binding protein in phage. Comput Struct Biotechnol J 2022; 20:4942-4951. [PMID: 36147670 PMCID: PMC9474292 DOI: 10.1016/j.csbj.2022.08.053] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/03/2022] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 11/16/2022]
Abstract
Ion binding proteins (IBPs) can selectively and non-covalently interact with ions. IBPs in phages also play an important role in biological processes. Therefore, accurate identification of IBPs is necessary for understanding their biological functions and the molecular mechanisms that involve binding to ions. Since experimental methods in molecular biology are still labor-intensive and costly for identifying IBPs, it is helpful to develop computational methods that identify IBPs quickly and efficiently. In this work, a random forest (RF)-based model was constructed to identify IBPs. Based on protein sequence information and the physicochemical properties of residues, the dipeptide composition combined with the physicochemical correlation between two residues was proposed for feature extraction. A feature selection technique, analysis of variance (ANOVA), was used to exclude redundant information. Comparison with other classification methods demonstrated that our method can identify IBPs accurately. Based on the model, a Python package named IBPred was built; its source code can be accessed at https://github.com/ShishiYuan/IBPred.
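The two feature steps named in this abstract, dipeptide composition and ANOVA-based feature ranking, can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical and it is not the IBPred implementation, which also uses physicochemical correlation features and a random forest classifier that are not shown here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a protein sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs that contain non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def anova_f(values_by_class):
    """One-way ANOVA F statistic for one feature, given its values grouped by class."""
    groups = [g for g in values_by_class if g]
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two non-empty classes")
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if within == 0:
        # Constant within every class: perfectly separating or uninformative.
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(X, y):
    """Return feature indices sorted by decreasing ANOVA F score."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Keeping only the first few indices returned by `rank_features` corresponds to the selection step; how many features to keep is a tuning choice that this sketch leaves open.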
Affiliation(s)
- Shi-Shi Yuan
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dong Gao
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xue-Qin Xie
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cai-Yi Ma
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Su
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Yan Zheng
  - Baotou Medical College, Baotou 014040, China
- Hui Ding
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
5
|
Zou H. iAHTP-LH: Integrating Low-Order and High-Order Correlation Information for Identifying Antihypertensive Peptides. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10414-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
6
|
Zou H, Yang F, Yin Z. iTTCA-MFF: identifying tumor T cell antigens based on multiple feature fusion. Immunogenetics 2022; 74:447-454. [PMID: 35246701 DOI: 10.1007/s00251-022-01258-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 02/26/2022] [Indexed: 11/05/2022]
Abstract
Cancer is a terrible disease, and recent studies have reported that tumor T cell antigens (TTCAs) may play a promising role in cancer treatment. Since experimental methods are still expensive and time-consuming, it is highly desirable to develop automatic computational methods to identify tumor T cell antigens from the huge number of natural and synthetic peptides. Hence, in this study, a novel computational model called iTTCA-MFF was proposed to identify TTCAs. To describe the sequence effectively, the physicochemical (PC) properties of amino acids and the residue pairwise energy content matrix (RECM) were first employed to encode peptide sequences. Then, two approaches, covariance and Pearson's correlation coefficient (PCC), were used to collect discriminative information from the PC and RECM matrices. Next, a feature selection approach called the least absolute shrinkage and selection operator (LASSO) was adopted to select the optimal features. These selected features were fed into a support vector machine (SVM) for identifying TTCAs. Experiments on two different datasets indicated that the proposed method is promising and may play a complementary role to the existing methods for identifying TTCAs. The datasets and code are available at https://figshare.com/articles/online_resource/iTTCA-MFF/17636120.
Affiliation(s)
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
- Fan Yang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
- Zhijian Yin
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
|
7
|
Nasiri H, Alavi SA. A Novel Framework Based on Deep Learning and ANOVA Feature Selection Method for Diagnosis of COVID-19 Cases from Chest X-Ray Images. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:4694567. [PMID: 35013680 PMCID: PMC8742147 DOI: 10.1155/2022/4694567] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/12/2021] [Accepted: 12/20/2021] [Indexed: 12/12/2022]
Abstract
Background and Objective. The new coronavirus disease (known as COVID-19) was first identified in Wuhan and quickly spread worldwide, wreaking havoc on the economy and people's everyday lives. As the number of COVID-19 cases is rapidly increasing, a reliable detection technique is needed to identify affected individuals and care for them in the early stages of COVID-19 and reduce the virus's transmission. The most accessible method for COVID-19 identification is Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR); however, it is time-consuming and has false-negative results. These limitations encouraged us to propose a novel framework based on deep learning that can aid radiologists in diagnosing COVID-19 cases from chest X-ray images. Methods. In this paper, a pretrained network, DenseNet169, was employed to extract features from X-ray images. Features were chosen by a feature selection method, i.e., analysis of variance (ANOVA), to reduce computations and time complexity while overcoming the curse of dimensionality to improve accuracy. Finally, selected features were classified by the eXtreme Gradient Boosting (XGBoost). The ChestX-ray8 dataset was employed to train and evaluate the proposed method. Results and Conclusion. The proposed method reached 98.72% accuracy for two-class classification (COVID-19, No-findings) and 92% accuracy for multiclass classification (COVID-19, No-findings, and Pneumonia). The proposed method's precision, recall, and specificity rates on two-class classification were 99.21%, 93.33%, and 100%, respectively. Also, the proposed method achieved 94.07% precision, 88.46% recall, and 100% specificity for multiclass classification. The experimental results show that the proposed framework outperforms other methods and can be helpful for radiologists in the diagnosis of COVID-19 cases.
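The accuracy, precision, recall, and specificity rates quoted above are all ratios of confusion-matrix counts. A minimal sketch of those standard definitions follows; the function name is hypothetical, and the counts in the usage note are made-up illustration values, not data from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),    # share of predicted positives that are correct
        "recall": ratio(tp, tp + fn),       # share of actual positives that are found
        "specificity": ratio(tn, tn + fp),  # share of actual negatives that are rejected
    }
```

For instance, `binary_metrics(tp=8, fp=2, tn=85, fn=5)` yields a precision of 0.8 and an accuracy of 0.93. A specificity of 100% corresponds to zero false positives.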
Affiliation(s)
- Hamid Nasiri
  - Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
- Seyed Ali Alavi
  - Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
|
8
|
Ismail H, White C, Al-Barakati H, Newman RH, Kc DB. FEPS: A Tool for Feature Extraction from Protein Sequence. Methods Mol Biol 2022; 2499:65-104. [PMID: 35696075 DOI: 10.1007/978-1-0716-2317-6_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Machine learning has become one of the most popular choices for developing computational approaches in protein structural bioinformatics. The ability to extract features from protein sequence/structure often becomes one of the crucial steps for the development of machine learning-based approaches. Over the years, various sequence, structural, and physicochemical descriptors have been developed for proteins and these descriptors have been used to predict/solve various bioinformatics problems. Hence, several feature extraction tools have been developed over the years to help researchers to generate numeric features from protein sequences. Most of these tools have some limitations regarding the number of sequences they can handle and the subsequent preprocessing that is required for the generated features before they can be fed to machine learning methods. Here, we present Feature Extraction from Protein Sequences (FEPS), a toolkit for feature extraction. FEPS is a versatile software package for generating various descriptors from protein sequences and can handle several sequences: the number of which is limited only by the computational resources. In addition, the features extracted from FEPS do not require subsequent processing and are ready to be fed to the machine learning techniques as it provides various output formats as well as the ability to concatenate these generated features. FEPS is made freely available via an online web server as well as a stand-alone toolkit. FEPS, a comprehensive toolkit for feature extraction, will help spur the development of machine learning-based models for various bioinformatics problems.
Affiliation(s)
- Hamid Ismail
  - Department of Animal Science, North Carolina A&T State University, Greensboro, NC, USA
- Clarence White
  - Computational Science and Engineering Department, North Carolina A&T State University, Greensboro, NC, USA
- Hussam Al-Barakati
  - Department of Computer Science, Jamoum University College, Umm Al-Qura University, Jamoum, Saudi Arabia
- Robert H Newman
  - Department of Biology, North Carolina A&T State University, Greensboro, NC, USA
- Dukka B Kc
  - Department of Computer Science, Michigan Technological University, Houghton, MI, USA
|
9
|
Meher PK, Satpathy S. Improved recognition of splice sites in A. thaliana by incorporating secondary structure information into sequence-derived features: a computational study. 3 Biotech 2021; 11:484. [PMID: 34790508 DOI: 10.1007/s13205-021-03036-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2021] [Accepted: 10/18/2021] [Indexed: 10/19/2022]
Abstract
Identification of splice sites is an important aspect of gene structure prediction. In most existing splice site prediction studies, machine learning algorithms coupled with sequence-derived features have been successfully employed for splice site recognition. However, splice site identification that incorporates secondary structure information is lacking, particularly in plant species. Thus, in this study we evaluated the effect of structural features on splice site prediction accuracy in Arabidopsis thaliana. Prediction accuracies were evaluated with the sequence-derived features alone as well as with the structural features incorporated into the sequence-derived features, where a support vector machine (SVM) was employed as the prediction algorithm. Both short (40 base pairs) and long (105 base pairs) sequence datasets were considered for evaluation. After incorporating the secondary structure features, improvements in accuracy were observed only for the longer sequence dataset, and the improvement was higher with the sequence-derived features that account for nucleotide dependencies. On the other hand, little or no improvement in accuracy was found for the short sequence dataset. The performance of SVM was further compared with that of the LogitBoost, Random Forest (RF), AdaBoost and XGBoost machine learning methods. The prediction accuracies of SVM, AdaBoost and XGBoost were observed to be on par with each other and higher than those of the RF and LogitBoost algorithms. When prediction was performed with all the sequence-derived features together with the structural features, a small improvement in accuracy was found compared to the combination of individual sequence-based features and structural features.
To the best of our knowledge, this is the first attempt at the computational prediction of splice sites using machine learning methods that incorporate secondary structure information into the sequence-derived features. All the source code is available at https://github.com/meher861982/SSFeature. Supplementary information: the online version contains supplementary material available at 10.1007/s13205-021-03036-8.
|
10
|
Zou H. Identifying blood-brain barrier peptides by using amino acids physicochemical properties and features fusion method. Pept Sci (Hoboken) 2021. [DOI: 10.1002/pep2.24247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/22/2022]
Affiliation(s)
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
|
11
|
Zou H, Yang F, Yin Z. Identifying N7-methylguanosine sites by integrating multiple features. Biopolymers 2021; 113:e23480. [PMID: 34709657 DOI: 10.1002/bip.23480] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/27/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 11/10/2022]
Abstract
Recent studies reported that N7-methylguanosine (m7G) plays a vital role in gene expression regulation. As a consequence, determining the distribution of m7G is a crucial step towards further understanding its biological functions. Although biological experimental approaches are capable of accurately locating m7G sites, they are labor-intensive, costly, and time-consuming. Therefore, it is necessary to develop more effective and robust computational methods to replace, or at least complement current experimental methods. In this study, we developed a novel sequence-based computational tool to identify RNA m7G sites. In this model, 22 kinds of dinucleotide physicochemical (PC) properties were employed to encode the RNA sequence. Three types of descriptors, including auto-covariance, cross-covariance, and discrete wavelet transform were adopted to extract effective features from the PC matrix. The least absolute shrinkage and selection operator (LASSO) algorithm was utilized to reduce the influence of irrelevant or redundant features. Finally, these selected features were fed into a support vector machine (SVM) for distinguishing m7G from non-m7G sites. The proposed method significantly outperforms existing predictors across all evaluation metrics. It indicates that the approach is effective in identifying RNA m7G sites.
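The auto-covariance and cross-covariance descriptors mentioned in this abstract can be sketched as below. This assumes one common formulation (mean-centered products at a fixed lag, averaged over the available positions); the paper's exact normalization may differ, and the function names and any property table passed in are hypothetical.

```python
def encode_dinucleotides(rna, property_table):
    """Map each overlapping dinucleotide of an RNA sequence to one property value.

    property_table is a caller-supplied dict such as {"AC": 0.12, ...}; real
    physicochemical values are not reproduced here.
    """
    return [property_table[rna[i:i + 2]] for i in range(len(rna) - 1)]


def cross_covariance(series_a, series_b, lag):
    """Covariance between series_a at position i and series_b at position i + lag."""
    n = len(series_a)
    if len(series_b) != n:
        raise ValueError("both property series must have the same length")
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < series length")
    mean_a = sum(series_a) / n
    mean_b = sum(series_b) / n
    total = sum(
        (series_a[i] - mean_a) * (series_b[i + lag] - mean_b)
        for i in range(n - lag)
    )
    return total / (n - lag)


def auto_covariance(series, lag):
    """Auto-covariance is the cross-covariance of a property series with itself."""
    return cross_covariance(series, series, lag)
```

Computing these values for every property (or property pair) and every lag up to a chosen maximum gives a fixed-length feature vector regardless of sequence length, which is what makes the descriptors usable as classifier input.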
Affiliation(s)
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Fan Yang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Zhijian Yin
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
|
12
|
Ashrafuzzaman M. Artificial Intelligence, Machine Learning and Deep Learning in Ion Channel Bioinformatics. MEMBRANES 2021; 11:membranes11090672. [PMID: 34564489 PMCID: PMC8467682 DOI: 10.3390/membranes11090672] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/03/2021] [Revised: 08/20/2021] [Accepted: 08/30/2021] [Indexed: 11/28/2022]
Abstract
Ion channels are linked to important cellular processes. For more than half a century, we have been learning various structural and functional aspects of ion channels using biological, physiological, biochemical, and biophysical principles and techniques. Recently, bioinformaticians and biophysicists with the necessary expertise and interest in computer science techniques, including versatile algorithms, have started covering a multitude of physiological aspects, especially the evolution, mutations, and genomics of functional channels and channel subunits. In these focused research areas, artificial intelligence (AI), machine learning (ML), and deep learning (DL) algorithms and associated models have become very popular. With the help of available articles and information, this review provides an introduction to this novel research trend. Ion channels are usually understood from structural and functional perspectives, gating mechanisms, transport properties, channel protein mutations, etc. Focused research on ion channels over many decades has accumulated a huge amount of data, which may be utilized in a specialized scientific manner to quickly draw conclusions about specific aspects of channels. AI, ML, and DL techniques and models may serve as helpful tools. This review aims to explain how bioinformatics techniques may be used, and thus to draw a few lines across the avenue to let the ion channel features appear clearer.
Affiliation(s)
- Md Ashrafuzzaman
  - Department of Biochemistry, College of Science, King Saud University, Riyadh 11451, Saudi Arabia
|
13
|
Jiang M, Zhao B, Luo S, Wang Q, Chu Y, Chen T, Mao X, Liu Y, Wang Y, Jiang X, Wei DQ, Xiong Y. NeuroPpred-Fuse: an interpretable stacking model for prediction of neuropeptides by fusing sequence information and feature selection methods. Brief Bioinform 2021; 22:6350884. [PMID: 34396388 DOI: 10.1093/bib/bbab310] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/16/2021] [Revised: 07/01/2021] [Accepted: 07/18/2021] [Indexed: 12/13/2022]
Abstract
Neuropeptides, which act as signaling molecules in the nervous systems of various animals, play crucial roles in a wide range of physiological functions and hormone regulation behaviors. Neuropeptides offer many opportunities for the discovery of new drugs and targets for the treatment of neurological diseases. In recent years, several data-driven computational predictors of various types of bioactive peptides have been developed, but little work has addressed neuropeptides so far. In this work, we developed an interpretable stacking model, named NeuroPpred-Fuse, for the prediction of neuropeptides by fusing a variety of sequence-derived features and feature selection methods. Specifically, we used six types of sequence-derived features to encode the peptide sequences and then combined them. In the first layer, we ensembled three base classifiers and four feature selection algorithms, which select non-redundant important features complementarily. In the second layer, the output of the first layer was merged and fed into a logistic regression (LR) classifier to train the model. Moreover, we analyzed the selected features and explained their feasibility. Experimental results show that our model achieved 90.6% accuracy and 95.8% AUC on the independent test set, outperforming the state-of-the-art models. In addition, we showed the distribution of the features selected by these tree models and compared the results on the training set with those on the test set. These results showed that our model has a certain generalization ability. Therefore, we expect that our model will provide important advances in the discovery of neuropeptides as new drugs for the treatment of neurological diseases.
Affiliation(s)
- Mingming Jiang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Bowen Zhao
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shenggan Luo
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Qiankun Wang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanyi Chu
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Tianhang Chen
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xueying Mao
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yatong Liu
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xue Jiang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
|
14
|
ANOX: A robust computational model for predicting the antioxidant proteins based on multiple features. Anal Biochem 2021; 631:114257. [PMID: 34043981 DOI: 10.1016/j.ab.2021.114257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 05/12/2021] [Accepted: 05/14/2021] [Indexed: 11/20/2022]
Abstract
As an indispensable component of various living organisms, the antioxidant proteins have been studied for anti-aging and prevention of various diseases, such as altitude sickness, coronary heart disease, and even cancer. However, the traditional experimental methods for identifying the antioxidant proteins are very expensive and time-consuming. Thus, to address the challenge, a new predictor, named ANOX, was developed in this study. Multiple features, such as frequency matrix features (FRE), amino acid and dipeptide composition (AADP), evolutionary difference formula features (EEDP), k-separated bigrams (KSB), and PSI-PRED secondary structure (PRED), were extracted to generate the original feature space. To find the optimized feature subset, the Max-Relevance-Max-Distance (MRMD) algorithm was implemented for feature ranking and our model received the best performance with the top 1170 features. Rigorous tests were performed to evaluate the performance of ANOX, and the results showed that ANOX achieved a major improvement in the prediction accuracy of the antioxidant proteins (AUC:0.930 and 0.935 using 5-fold cross-validation or the jackknife test) compared to the state-of-the-art predictor AOPs-SVM (AUC:0.869 and 0.885). The dataset used in this study and the source code of ANOX are all available at https://github.com/NWAFU-LiuLab/ANOX.
|
15
|
Nanni L, Brahnam S. Robust ensemble of handcrafted and learned approaches for DNA-binding proteins. APPLIED COMPUTING AND INFORMATICS 2021. [DOI: 10.1108/aci-03-2021-0051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Purpose
Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.
Design/methodology/approach
Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.
Findings
The best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.
Originality/value
Most DNA-BP methods proposed in the literature are validated on only one (rarely two) datasets/tasks. In this work, the authors report the performance of their general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field.
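The weighted sum rule named in the methodology above combines the scores of several classifiers linearly. A minimal sketch follows, assuming every classifier emits one score per sample on a comparable scale; the function names, weights, and threshold are hypothetical, and the normalization and weights actually used in the paper are not stated in this abstract.

```python
def weighted_sum_fusion(score_lists, weights):
    """Fuse per-classifier score lists into one score per sample.

    score_lists[c][i] is the score classifier c assigns to sample i;
    weights[c] is the (assumed, user-chosen) weight of classifier c.
    """
    if not score_lists or len(score_lists) != len(weights):
        raise ValueError("need one weight per classifier")
    n_samples = len(score_lists[0])
    fused = [0.0] * n_samples
    for scores, weight in zip(score_lists, weights):
        if len(scores) != n_samples:
            raise ValueError("all classifiers must score the same samples")
        for i, score in enumerate(scores):
            fused[i] += weight * score
    return fused


def predict_labels(fused_scores, threshold):
    """Label a sample positive (1) when its fused score reaches the threshold."""
    return [1 if score >= threshold else 0 for score in fused_scores]
```

With SVM scores `[0.2, 0.9]` and CNN scores `[0.6, 0.4]` weighted 1.0 and 2.0, the fused scores are approximately `[1.4, 1.7]`. In practice the scores are usually normalized before fusion so that no classifier dominates purely because of its output range.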
|
16
|
ANPrAod: Identify Antioxidant Proteins by Fusing Amino Acid Clustering Strategy and N-Peptide Combination. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5518209. [PMID: 33927782 PMCID: PMC8049822 DOI: 10.1155/2021/5518209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/02/2021] [Revised: 03/02/2021] [Accepted: 03/10/2021] [Indexed: 11/18/2022]
Abstract
Antioxidant proteins play significant roles in disease control and in delaying aging because they can prevent free radicals from damaging organisms. Accurate identification of antioxidant proteins has important implications for the development of new drugs and the treatment of related diseases, as they play a critical role in the control or prevention of cancer and aging-related conditions. Since experimental identification techniques are time-consuming and expensive, many computational methods have been proposed to identify antioxidant proteins. Although the accuracy of these methods is acceptable, some challenges remain. In this study, we developed a computational model called ANPrAod to identify antioxidant proteins based on a support vector machine. To eliminate potentially redundant features and improve prediction accuracy, we computed 673 reduced amino acid alphabets to find the optimal feature representation scheme. The final model achieved an overall accuracy of 87.53% with an ROC of 0.7266 in five-fold cross-validation, which was better than the existing methods. The results on the independent dataset also demonstrated the robustness and reliability of ANPrAod, which could be a promising tool for antioxidant protein identification and contribute to hypothesis-driven experimental design.
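The idea of combining a reduced amino acid alphabet with an N-peptide composition can be sketched as below. The six-cluster grouping is a hypothetical example chosen for illustration; it is not one of the alphabets reported for ANPrAod, and the function names are assumptions.

```python
from collections import Counter
from itertools import product
from typing import List

# Hypothetical clustering of the 20 standard amino acids into 6 groups.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
RESIDUE_TO_CLUSTER = {
    residue: str(index)
    for index, group in enumerate(CLUSTERS)
    for residue in group
}


def reduce_sequence(sequence: str) -> str:
    """Replace each residue by its cluster symbol; unknown residues are dropped."""
    return "".join(
        RESIDUE_TO_CLUSTER[residue]
        for residue in sequence.upper()
        if residue in RESIDUE_TO_CLUSTER
    )


def n_peptide_composition(sequence: str, n: int = 2) -> List[float]:
    """Frequency of every length-n word over the reduced alphabet."""
    reduced = reduce_sequence(sequence)
    symbols = [str(index) for index in range(len(CLUSTERS))]
    keys = ["".join(word) for word in product(symbols, repeat=n)]
    total = len(reduced) - n + 1
    if total <= 0:
        return [0.0] * len(keys)
    counts = Counter(reduced[i:i + n] for i in range(total))
    return [counts[key] / total for key in keys]


# With 6 clusters and n = 2 the feature vector has 6**2 = 36 entries,
# compared with 20**2 = 400 for the full alphabet.
features = n_peptide_composition("MKVLAAGDE", n=2)
```

The reduction in dimensionality is the point of the design: fewer, denser features for the downstream classifier.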
|
17
|
Li X, Tang Q, Tang H, Chen W. Identifying Antioxidant Proteins by Combining Multiple Methods. Front Bioeng Biotechnol 2020; 8:858. [PMID: 32793581 PMCID: PMC7391787 DOI: 10.3389/fbioe.2020.00858] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2020] [Accepted: 07/03/2020] [Indexed: 11/13/2022] Open
Abstract
Antioxidant proteins play important roles in preventing free radical oxidation from damaging cells and DNA. They have become ideal candidates for disease prevention and treatment. Therefore, it is urgent to identify antioxidants from natural compounds. Since experimental methods are still not cost-effective, a series of computational methods have been proposed to identify antioxidant proteins. However, the performance of the current methods is still not satisfactory. In this study, a support vector machine-based method, called Vote9, was proposed to identify antioxidants, in which the sequences were encoded using the features generated from 9 optimal individual models. Results from the jackknife test demonstrated that Vote9 is comparable with the best of the existing predictors for this task. We hope that Vote9 will become a useful tool, or at least play a complementary role to the existing methods for identifying antioxidants.
Affiliation(s)
- Xianhai Li
- School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China.,Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Qiang Tang
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hua Tang
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Wei Chen
- School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China.,Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China.,School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
| |
|
18
|
Abstract
During the last three decades or so, many efforts have been made to study the protein cleavage sites of disease-causing enzymes, such as HIV (Human Immunodeficiency Virus) protease and SARS (Severe Acute Respiratory Syndrome) coronavirus main proteinase. It has become increasingly clear via this mini-review that the motivation driving the aforementioned studies is quite wise, and that the results acquired through these studies are very rewarding, particularly for developing peptide drugs.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
|
19
|
Zhang D, Guan ZX, Zhang ZM, Li SH, Dao FY, Tang H, Lin H. Recent Development of Computational Predicting Bioluminescent Proteins. Curr Pharm Des 2020; 25:4264-4273. [PMID: 31696804 DOI: 10.2174/1381612825666191107100758] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2019] [Accepted: 11/04/2019] [Indexed: 12/22/2022]
Abstract
Bioluminescent proteins (BLPs) are widely distributed in many living organisms and play a key role in light emission in bioluminescence. Bioluminescence serves various functions, such as finding food and protecting organisms from predators. With the routine biotechnological application of bioluminescence, it is recognized as essential for many medical, commercial and other general technological advances. Therefore, the prediction and characterization of BLPs are significant and can help to uncover more about bioluminescence and promote the development of its applications. Since experimental methods for BLP identification are costly and time-consuming, bioinformatics tools have played an important role in the fast and accurate prediction of BLPs by combining their sequence information with machine learning methods. In this review, we summarize and compare the application of machine learning methods in the prediction of BLPs from different aspects. We hope that this review will provide insights and inspiration for research on BLPs.
Affiliation(s)
- Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
20
|
Gao J, Miao Z, Zhang Z, Wei H, Kurgan L. Prediction of Ion Channels and their Types from Protein Sequences: Comprehensive Review and Comparative Assessment. Curr Drug Targets 2020; 20:579-592. [PMID: 30360734 DOI: 10.2174/1389450119666181022153942] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2018] [Revised: 10/03/2018] [Accepted: 10/04/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND Ion channels are a large and growing protein family. Many of them are associated with diseases, and consequently, they are targets for over 700 drugs. Discovery of new ion channels is facilitated by computational methods that predict ion channels and their types from protein sequences. However, these methods have never been comprehensively compared and evaluated. OBJECTIVE We offer a first-of-its-kind comprehensive survey of the sequence-based predictors of ion channels. We describe eight predictors that include five methods that predict ion channels, their types, and four classes of the voltage-gated channels. We also develop and use a new benchmark dataset to perform a comparative empirical analysis of the three currently available predictors. RESULTS While several methods that rely on different designs were published, only a few of them are currently available and offer a broad scope of predictions. Support and availability after publication should be required when new methods are considered for publication. Empirical analysis shows strong performance for the prediction of ion channels and modest performance for the prediction of ion channel types and voltage-gated channel classes. We identify a substantial weakness of current methods: they cannot accurately predict ion channels that are categorized into multiple classes/types. CONCLUSION Several predictors of ion channels are available to end users. They offer practical levels of predictive quality. Methods that rely on a larger and more diverse set of predictive inputs (such as PSIONplus) are more accurate. New tools that address multi-label prediction of ion channels should be developed.
Affiliation(s)
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Zhen Miao
- College of Life Sciences, Nankai University, Tianjin, China
| | - Zhaopeng Zhang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Hong Wei
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, United States
| |
|
21
|
Gong J, Chen Y, Pu F, Sun P, He F, Zhang L, Li Y, Ma Z, Wang H. Understanding Membrane Protein Drug Targets in Computational Perspective. Curr Drug Targets 2020; 20:551-564. [PMID: 30516106 DOI: 10.2174/1389450120666181204164721] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2018] [Revised: 09/03/2018] [Accepted: 09/04/2018] [Indexed: 01/16/2023]
Abstract
Membrane proteins play crucial physiological roles in vivo and are the major category of drug targets for pharmaceuticals. Research on membrane proteins is a significant part of drug discovery. The biological process is a cyclic network, and the membrane protein is a vital hub in the network since most drugs achieve their therapeutic effect by interacting with membrane proteins. In this review, typical membrane protein targets are described, including GPCRs, transporters and ion channels. We also summarize network servers and databases that cover drugs, drug-target information and their relevant data. Furthermore, we introduce the development and practice of modern medicines, in particular a series of state-of-the-art computational models for the prediction of drug-target interactions, including network-based and machine-learning-based approaches, and show current achievements. Finally, we discuss the prospective direction of drug repurposing and drug discovery and propose improved frameworks for bioactivity data, new or improved prediction approaches, and alternative approaches to understanding drug bioactivity and the underlying biological processes.
Affiliation(s)
- Jianting Gong
- School of Information Science and Technology, Northeast Normal University, Changchun, China.,Institution of Computational Biology, Northeast Normal University, Changchun, China
| | - Yongbing Chen
- School of Information Science and Technology, Northeast Normal University, Changchun, China.,Institution of Computational Biology, Northeast Normal University, Changchun, China
| | - Feng Pu
- School of Information Science and Technology, Northeast Normal University, Changchun, China.,Institution of Computational Biology, Northeast Normal University, Changchun, China
| | - Pingping Sun
- School of Information Science and Technology, Northeast Normal University, Changchun, China.,Institution of Computational Biology, Northeast Normal University, Changchun, China
| | - Fei He
- School of Information Science and Technology, Northeast Normal University, Changchun, China.,Institution of Computational Biology, Northeast Normal University, Changchun, China
| | - Li Zhang
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, China
| | - Yanwen Li
- School of Information Science and Technology, Northeast Normal University, Changchun, China.,Institution of Computational Biology, Northeast Normal University, Changchun, China
| | - Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, China.,Institution of Computational Biology, Northeast Normal University, Changchun, China
| | - Han Wang
- School of Information Science and Technology, Northeast Normal University, Changchun, China.,Institution of Computational Biology, Northeast Normal University, Changchun, China
| |
|
22
|
PSIONplusm Server for Accurate Multi-Label Prediction of Ion Channels and Their Types. Biomolecules 2020; 10:biom10060876. [PMID: 32517331 PMCID: PMC7355608 DOI: 10.3390/biom10060876] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2020] [Revised: 05/28/2020] [Accepted: 06/04/2020] [Indexed: 11/26/2022] Open
Abstract
Computational prediction of ion channels facilitates the identification of putative ion channels from protein sequences. Several predictors of ion channels and their types were developed in the last quindecennial. While they offer reasonably accurate predictions, they also suffer from a few shortcomings, including lack of availability, parallel prediction mode, single-label prediction (inability to predict multiple channel subtypes), and incomplete scope (inability to predict subtypes of the voltage-gated channels). We developed a first-of-its-kind PSIONplusm method that performs sequential multi-label prediction of ion channels and their subtypes for both voltage-gated and ligand-gated channels. PSIONplusm sequentially combines the outputs produced by three support vector machine-based models from the PSIONplus predictor and is available as a webserver. Empirical tests show that PSIONplusm outperforms current methods for the multi-label prediction of the ion channel subtypes. This includes the existing single-label methods that are available to the users, a naïve multi-label predictor that combines results produced by multiple single-label methods, and methods that make predictions based on sequence alignment and domain annotations. We also found that the current methods (including PSIONplusm) fail to accurately predict a few of the least frequently occurring ion channel subtypes. Thus, new predictors should be developed when a larger quantity of annotated ion channels becomes available to train predictive models.
|
23
|
Smolarczyk T, Roterman-Konieczna I, Stapor K. Protein Secondary Structure Prediction: A Review of Progress and Directions. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017104639] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:
Over the last few decades, a search for the theory of protein folding has grown into a full-fledged research field at the intersection of biology, chemistry and informatics. Despite enormous effort, there are still open questions and challenges, like understanding the rules by which amino acid sequence determines protein secondary structure.
Objective:
In this review, we depict the progress of the prediction methods over the years and identify sources of improvement.
Methods:
The protein secondary structure prediction problem is described followed by the discussion on theoretical limitations, description of the commonly used data sets, features and a review of three generations of methods with the focus on the most recent advances. Additionally, methods with available online servers are assessed on the independent data set.
Results:
The state-of-the-art methods are currently reaching almost 88% for 3-class prediction and 76.5% for an 8-class prediction.
Conclusion:
This review summarizes recent advances and outlines further research directions.
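Assuming the percentages quoted in the Results refer to per-residue accuracy (often called Q3 for 3-class and Q8 for 8-class prediction), the measure can be sketched as follows. The 8-to-3 state reduction shown is one commonly used convention for DSSP codes; other reductions exist, and the review may use a different one. The example strings are made up.

```python
# One common reduction of 8 DSSP states to 3 classes (an assumption here):
# H, G, I -> helix (H); E, B -> strand (E); everything else -> coil (C).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "-": "C",
}


def reduce_to_three_states(states: str) -> str:
    """Map an 8-state secondary structure string to 3 states."""
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in states)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Made-up example: observed 8-state assignment and a 3-state prediction.
observed_8 = "HHHGTEEB-S"
predicted_3 = "HHHHCEECCC"

observed_3 = reduce_to_three_states(observed_8)
q3 = per_residue_accuracy(predicted_3, observed_3)
```

Applying `per_residue_accuracy` directly to two 8-state strings gives the corresponding 8-class figure.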
Affiliation(s)
- Tomasz Smolarczyk
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| | - Irena Roterman-Konieczna
- Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Krakow, Poland
| | - Katarzyna Stapor
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| |
|
24
|
Shao Y, Chou KC. pLoc_Deep-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by Deep Learning. ACTA ACUST UNITED AC 2020. [DOI: 10.4236/ns.2020.126034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
25
|
Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019; 20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and in studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular issue. METHODS We systematically review the recent work on identifying drug targets from the perspectives of data and method. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories, biological experiments and machine learning, each of which is subdivided into different subclasses and described in detail. RESULTS Machine learning algorithms account for the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. CONCLUSION The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Ningyi Zhang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| |
|
26
|
Khan YD, Amin N, Hussain W, Rasool N, Khan SA, Chou KC. iProtease-PseAAC(2L): A two-layer predictor for identifying proteases and their types using Chou's 5-step-rule and general PseAAC. Anal Biochem 2019; 588:113477. [PMID: 31654612 DOI: 10.1016/j.ab.2019.113477] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2019] [Revised: 10/02/2019] [Accepted: 10/18/2019] [Indexed: 12/16/2022]
Abstract
Proteases are enzymes that perform proteolysis. Proteolysis normally refers to protein and peptide degradation, which is crucial for the survival, growth and wellbeing of a cell. Moreover, proteases have a strong association with therapeutics and drug development. Proteases are classified into five different types according to their nature and physicochemical characteristics. Most methods used to differentiate proteases from other proteins and identify their class require a clinical test, which is usually time-consuming and operator dependent. Herein, we report a classifier named iProtease-PseAAC(2L) for identifying proteases and their classes. The predictor is developed following the 5-step rule, starting from the collection of a benchmark dataset and ending with the development of the predictor. Rigorous verification and validation tests are performed and metrics are collected to assess the validity of the trained model. The self-consistency validation gives 98.32% accuracy, cross-validation gives 90.71% accuracy and the jackknife test gives 96.07% accuracy. The average accuracy for level 2, i.e. protease classification, is 95.77%. Based on these results, it is concluded that iProtease-PseAAC(2L) has a strong ability to identify proteases and their classes from a given protein sequence.
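The jackknife test reported above is a leave-one-out procedure: each sample is held out once, the model is built on the remaining samples, and the held-out sample is predicted. The sketch below illustrates only the validation loop; the 1-nearest-neighbour classifier and the toy data are stand-ins chosen for brevity and are not the classifier or features used by iProtease-PseAAC(2L).

```python
from typing import List, Sequence


def nearest_neighbor_predict(train_x: Sequence[Sequence[float]],
                             train_y: Sequence[int],
                             query: Sequence[float]) -> int:
    """Stand-in classifier: return the label of the closest training sample."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def jackknife_accuracy(xs: List[List[float]], ys: List[int]) -> float:
    """Leave-one-out accuracy: every sample is tested exactly once."""
    correct = 0
    for i in range(len(xs)):
        # Train on everything except sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy two-class data with hypothetical 2-dimensional feature vectors.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = [0, 0, 0, 1, 1, 1]

accuracy = jackknife_accuracy(xs, ys)
```

Unlike k-fold cross-validation, the jackknife involves no random partitioning, so it yields the same result on every run for a given dataset.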
Affiliation(s)
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, P.O. Box 10033, C-II, Johar Town, Lahore, 54770, Pakistan.
| | - Najm Amin
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, P.O. Box 10033, C-II, Johar Town, Lahore, 54770, Pakistan
| | - Waqar Hussain
- National Center of Artificial Intelligence, Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan
| | - Nouman Rasool
- Dr Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
| | - Sher Afzal Khan
- Faculty of Computing and Information Technology in Rabigh, Jeddah, 21577, Saudi Arabia; Abdul Wali Khan University, Department of Computer Sciences, Mardan, Pakistan
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, 02478, USA
| |
|
27
|
Zhang M, Li F, Marquez-Lago TT, Leier A, Fan C, Kwoh CK, Chou KC, Song J, Jia C. MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics 2019; 35:2957-2965. [PMID: 30649179 PMCID: PMC6736106 DOI: 10.1093/bioinformatics/btz016] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2018] [Revised: 12/09/2018] [Accepted: 01/05/2019] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
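The abstract names F-score feature selection without giving a formula. The sketch below assumes one commonly used definition, in which the F-score of a feature is the between-class separation of its means divided by the sum of its within-class sample variances. Whether MULTiPly uses exactly this form is an assumption, and the feature values are invented.

```python
from typing import List, Sequence


def f_score(positive: Sequence[float], negative: Sequence[float]) -> float:
    """F-score of a single feature given its values in the two classes."""
    n_pos, n_neg = len(positive), len(negative)
    mean_pos = sum(positive) / n_pos
    mean_neg = sum(negative) / n_neg
    mean_all = (sum(positive) + sum(negative)) / (n_pos + n_neg)

    # Numerator: how far each class mean lies from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sample variance of the feature inside each class.
    within = (
        sum((v - mean_pos) ** 2 for v in positive) / (n_pos - 1)
        + sum((v - mean_neg) ** 2 for v in negative) / (n_neg - 1)
    )
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(pos_rows: List[List[float]],
                  neg_rows: List[List[float]]) -> List[int]:
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Invented data: feature 0 separates the classes, feature 1 does not.
pos_rows = [[1.0, 1.0], [2.0, 5.0], [3.0, 9.0]]
neg_rows = [[7.0, 2.0], [8.0, 5.0], [9.0, 8.0]]

ranking = rank_features(pos_rows, neg_rows)
```

A larger F-score indicates a more discriminative feature, so features are typically added to the model in ranking order until performance stops improving.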
Affiliation(s)
- Meng Zhang
- School of Science, Dalian Maritime University, Dalian, China
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
| | - Tatiana T Marquez-Lago
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - André Leier
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Cunshuo Fan
- College of Information Engineering, Northwest A&F University, Yangling, China
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
| | | | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
| | - Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian, China
- College of Information Engineering, Northwest A&F University, Yangling, China
| |
|
28
|
Chou KC. Proposing Pseudo Amino Acid Components is an Important Milestone for Proteome and Genome Analyses. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09910-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
|
29
|
Lin J, Chen H, Li S, Liu Y, Li X, Yu B. Accurate prediction of potential druggable proteins based on genetic algorithm and Bagging-SVM ensemble classifier. Artif Intell Med 2019; 98:35-47. [DOI: 10.1016/j.artmed.2019.07.005] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2018] [Revised: 03/03/2019] [Accepted: 07/18/2019] [Indexed: 12/14/2022]
|
30
|
Han K, Wang M, Zhang L, Wang Y, Guo M, Zhao M, Zhao Q, Zhang Y, Zeng N, Wang C. Predicting Ion Channels Genes and Their Types With Machine Learning Techniques. Front Genet 2019; 10:399. [PMID: 31130983 PMCID: PMC6510169 DOI: 10.3389/fgene.2019.00399] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2019] [Accepted: 04/12/2019] [Indexed: 02/01/2023] Open
Abstract
Motivation: The number of ion channels is increasing rapidly. As many of them are associated with diseases, they are the targets of more than 700 drugs. The discovery of new ion channels is facilitated by computational methods that predict ion channels and their types from protein sequences. Methods: We used the SVMProt and the k-skip-n-gram methods to extract the feature vectors of ion channels, and obtained 188- and 400-dimensional features, respectively. The 188- and 400-dimensional features were combined to obtain 588-dimensional features. We then employed the maximum-relevance-maximum-distance method to reduce the dimensions of the 588-dimensional features. Finally, the support vector machine and random forest methods were used to build the prediction models to evaluate the classification effect. Results: Different methods were employed to extract various feature vectors, and after effective dimensionality reduction, different classifiers were used to classify the ion channels. We extracted the ion channel data from the Universal Protein Resource (UniProt, http://www.uniprot.org/) and Ligand-Gated Ion Channel databases (http://www.ebi.ac.uk/compneur-srv/LGICdb/LGICdb.php), and then verified the performance of the classifiers after screening. The findings of this study could inform the research and development of drugs.
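The 400-dimensional feature set mentioned above is consistent with a k-skip-2-gram over the 20 standard amino acids (20 × 20 = 400 residue pairs). The sketch below is written under that assumption: it counts residue pairs separated by 0 to k intervening residues and normalizes by the total number of pairs. The normalization and the handling of non-standard residues are choices made here, not details taken from the paper.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Fixed position of each ordered residue pair in the 400-dimensional vector.
PAIR_INDEX = {
    first + second: index
    for index, (first, second) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def k_skip_bigram_features(sequence: str, k: int = 1) -> List[float]:
    """Frequencies of residue pairs with 0..k residues skipped between them."""
    sequence = sequence.upper()
    counts = [0] * len(PAIR_INDEX)
    total = 0
    for skip in range(k + 1):
        step = skip + 1
        for i in range(len(sequence) - step):
            index = PAIR_INDEX.get(sequence[i] + sequence[i + step])
            if index is not None:  # ignore pairs with non-standard residues
                counts[index] += 1
                total += 1
    if total == 0:
        return [0.0] * len(PAIR_INDEX)
    return [count / total for count in counts]


# "ACDA" with k = 1 yields the pairs AC, CD, DA (no skip) and AD, CA (one skip).
features = k_skip_bigram_features("ACDA", k=1)
```

Such a vector could then be concatenated with other descriptors before dimensionality reduction, as the abstract describes for the combined 588-dimensional features.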
Affiliation(s)
- Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, China
| | - Miao Wang
- Life Sciences and Environmental Sciences Development Center, Harbin University of Commerce, Harbin, China
| | - Lei Zhang
- Life Sciences and Environmental Sciences Development Center, Harbin University of Commerce, Harbin, China
| | - Ying Wang
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Mian Guo
- Department of Neurosurgery, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Ming Zhao
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, China
| | - Qian Zhao
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, China
| | - Yu Zhang
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, China
| | - Nianyin Zeng
- Department of Instrumental and Electrical Engineering, Xiamen University, Xiamen, China
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|
31
|
Akbar S, Hayat M, Kabir M, Iqbal M. iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins. LETT ORG CHEM 2019. [DOI: 10.2174/1570178615666180816101653] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Antifreeze proteins (AFPs) play distinct roles in maintaining homeostasis in living organisms and protect their cells and bodies from freezing in extremely cold conditions. Owing to the high diversity of protein sequences and structures, discriminating AFPs from non-AFPs through experimental approaches is expensive and lengthy. It is, therefore, highly desirable to propose an intelligent, high-throughput computational model that identifies AFPs quickly and accurately. To this end, a new predictor called “iAFP-gap-SMOTE” is proposed for the identification of AFPs. Protein sequences are expressed using three numerical feature extraction schemes, namely Split Amino Acid Composition, G-gap Dipeptide Composition, and Reduced Amino Acid Alphabet Composition. With an imbalanced dataset, the classification hypothesis is usually biased towards the majority class. The Synthetic Minority Over-sampling Technique (SMOTE) is therefore employed to increase the number of minority-class instances and control this bias. A 10-fold cross-validation test is applied to assess the success rates of the “iAFP-gap-SMOTE” model. In the empirical investigation, the “iAFP-gap-SMOTE” model obtained 95.02% accuracy. The comparison suggests that the accuracy of the “iAFP-gap-SMOTE” model is higher than that of the existing techniques in the literature. The proposed “iAFP-gap-SMOTE” model might be helpful for the research community and academia.
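The oversampling step mentioned in this abstract can be illustrated with a minimal sketch of the core SMOTE idea: a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote_sample`, the toy points, and the parameter values are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic

# Hypothetical 2-D minority-class feature vectors (toy data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sample(minority)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class.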
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
| | - Muhammad Kabir
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
| | - Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
| |
|
32
|
Abstract
Background: DNA-binding proteins are widespread in living cells and participate in many DNA-related cellular activities, such as DNA replication, transcription, recombination, and DNA repair. Objective: Given their importance, the prediction of DNA-binding proteins has been a popular research topic over the past decades. In this article, we review current machine-learning methods for predicting DNA-binding proteins in terms of feature representation methods, classifiers, performance measures, datasets, and existing web servers. Method: Prediction methods for DNA-binding proteins can be divided into two types: those based on amino acid composition and those based on protein structure. We follow this division to introduce the application of machine learning to DNA-binding protein prediction. Results: Machine learning plays an important role in the classification of DNA-binding proteins and yields good results; the best ACC is above 80%. Conclusion: Machine learning can be widely used in many aspects of biological information, especially in protein classification. Some issues should be considered in future work. First, the relationship between the number of features and performance must be explored. Second, because many features are used to predict DNA-binding proteins, solutions for high-dimensional feature spaces should be proposed.
Affiliation(s)
- Kaiyang Qu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Leyi Wei
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Quan Zou
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
|
33
|
Characterization of human proteins with different subcellular localizations by topological and biological properties. Genomics 2018; 111:1831-1838. [PMID: 30543849 DOI: 10.1016/j.ygeno.2018.12.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2018] [Revised: 12/02/2018] [Accepted: 12/07/2018] [Indexed: 11/20/2022]
Abstract
Knowing a protein's localization can provide valuable information for elucidating its function. In recent years, with advances in human genomics and proteomics, it has become possible to characterize human proteins that are located in different subcellular compartments. In this study, we used topological and biological properties to characterize human proteins with six subcellular localizations. Almost all of these properties were found to be significantly different among the six protein categories. Network topology analysis indicated that several significant topological properties, including the degree and k-core, were higher for the mitochondrial proteins. Biological property analysis showed that the nuclear proteins appeared to be correlated with important biological functions. We hope these findings will help in comprehensively understanding the biological function of proteins and in predicting protein subcellular localization in humans.
|
34
|
Zhang W, Yue X, Tang G, Wu W, Huang F, Zhang X. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions. PLoS Comput Biol 2018; 14:e1006616. [PMID: 30533006 PMCID: PMC6331124 DOI: 10.1371/journal.pcbi.1006616] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2018] [Revised: 01/14/2019] [Accepted: 11/02/2018] [Indexed: 01/12/2023] Open
Abstract
LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. Existing computational methods utilize multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but these features are not available for all lncRNAs or proteins; most existing methods cannot predict interacting proteins (or lncRNAs) for new lncRNAs (or proteins) that have no known interactions. In this paper, we propose the sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts lncRNA sequence-based features and protein sequence-based features. Second, SFPEL-LPI calculates multiple lncRNA-lncRNA similarities and protein-protein similarities by using lncRNA sequences, protein sequences and known lncRNA-protein interactions. Then, SFPEL-LPI combines multiple similarities and multiple features within a feature projection ensemble learning framework. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein associations and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). The case studies demonstrate that our method can find novel lncRNA-protein interactions, which are confirmed by the literature. Finally, we construct a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.
Author summary
LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. In this paper, we propose a novel computational method, “SFPEL-LPI”, to predict lncRNA-protein interactions. SFPEL-LPI makes use of lncRNA sequences, protein sequences and known lncRNA-protein associations to extract features and calculate similarities for lncRNAs and proteins, and then combines them within a feature projection ensemble learning framework. SFPEL-LPI can predict unobserved interactions between lncRNAs and proteins, and can also make predictions for new lncRNAs (or proteins) that have no known interactions with any proteins (or lncRNAs). SFPEL-LPI achieves high accuracy on the benchmark dataset when evaluated by five-fold cross-validation, and outperforms state-of-the-art methods. The case studies demonstrate that SFPEL-LPI can find novel associations, which are confirmed by the literature. To facilitate lncRNA-protein interaction prediction, we develop a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.
Affiliation(s)
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- School of Computer Science, Wuhan University, Wuhan, China
- * E-mail: , (WZ); (XZ)
| | - Xiang Yue
- Department of Computer Science and Engineering, The Ohio State University, Columbus, United States of America
| | - Guifeng Tang
- School of Computer Science, Wuhan University, Wuhan, China
| | - Wenjian Wu
- Electronic Information School, Wuhan University, Wuhan, China
| | - Feng Huang
- School of Computer Science, Wuhan University, Wuhan, China
| | - Xining Zhang
- School of Computer Science, Wuhan University, Wuhan, China
- * E-mail: , (WZ); (XZ)
| |
|
35
|
Set of approaches based on 3D structure and position specific-scoring matrix for predicting DNA-binding proteins. Bioinformatics 2018; 35:1844-1851. [DOI: 10.1093/bioinformatics/bty912] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Revised: 10/08/2018] [Accepted: 10/31/2018] [Indexed: 11/14/2022] Open
|
36
|
Chen W, Feng P, Ding H, Lin H. Classifying Included and Excluded Exons in Exon Skipping Event Using Histone Modifications. Front Genet 2018; 9:433. [PMID: 30327665 PMCID: PMC6174203 DOI: 10.3389/fgene.2018.00433] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2018] [Accepted: 09/12/2018] [Indexed: 12/15/2022] Open
Abstract
Alternative splicing (AS) not only ensures the diversity of gene expression products, but is also closely correlated with genetic diseases. Therefore, knowledge about the regulatory mechanisms of AS will provide useful clues for understanding its biological functions. In the current study, a random forest based method was developed to classify included and excluded exons in exon skipping events. In this method, the samples in the dataset were encoded by using optimal histone modification features, which were selected with the Maximum Relevance Maximum Distance (MRMD) feature selection technique. The proposed method obtained an accuracy of 72.91% in a 10-fold cross-validation test and outperformed existing methods. We also systematically analyzed the distribution of histone modifications between included and excluded exons and discovered their preference for each kind of exon, which might provide insights into research on the regulatory mechanisms of alternative splicing.
Affiliation(s)
- Wei Chen
- Center for Genomics and Computational Biology, School of Life Science, North China University of Science and Technology, Tangshan, China.,Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
37
|
Yang H, Qiu WR, Liu G, Guo FB, Chen W, Chou KC, Lin H. iRSpot-Pse6NC: Identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC. Int J Biol Sci 2018; 14:883-891. [PMID: 29989083 PMCID: PMC6036749 DOI: 10.7150/ijbs.24616] [Citation(s) in RCA: 135] [Impact Index Per Article: 22.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2017] [Accepted: 02/04/2018] [Indexed: 02/06/2023] Open
Abstract
Meiotic recombination is caused by meiotic double-strand DNA breaks. In some regions the frequency of DNA recombination is relatively high, while in other regions it is lower: the former are usually called "recombination hotspots" and the latter "recombination coldspots". Information about hot and cold spots may provide important clues for understanding the mechanism of genome evolution. Therefore, it is important to predict these spots accurately. In this study, we rebuilt the benchmark dataset by unifying its samples to the same length (131 bp). On this foundation, and using an SVM (Support Vector Machine) classifier, a new predictor called "iRSpot-Pse6NC" was developed by incorporating the key hexamer features into the general PseKNC (Pseudo K-tuple Nucleotide Composition) via the binomial distribution approach. Rigorous cross-validations showed that the proposed predictor is superior to its counterparts in overall accuracy, stability, sensitivity and specificity. For the convenience of most experimental scientists, the web server for iRSpot-Pse6NC has been established at http://lin-group.cn/server/iRSpot-Pse6NC, by which users can easily obtain their desired results without the need to go through the detailed mathematical equations involved.
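As a rough illustration of the hexamer composition that this abstract builds on, the sketch below counts overlapping 6-mers in a DNA sequence and normalizes them into a 4^6 = 4096-dimensional frequency vector. The function name `kmer_frequencies` and the toy sequence are assumptions; the binomial-distribution feature selection and the PseKNC formulation used by the authors are not reproduced here.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k=6):
    """Return the normalized frequency of every possible k-mer over ACGT."""
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping windows
    counts = Counter(seq[i:i + k] for i in range(total))
    return {
        "".join(p): counts.get("".join(p), 0) / total
        for p in product("ACGT", repeat=k)
    }

# Hypothetical 131 bp sequence (toy data, same length as the benchmark samples).
toy_sequence = "ACGT" * 32 + "ACG"
freqs = kmer_frequencies(toy_sequence)
```

A 131 bp sample yields 126 overlapping hexamer windows, so most of the 4096 entries are zero for any single sequence, which is one reason a feature selection step is useful.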
Affiliation(s)
- Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wang-Ren Qiu
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.,Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, 333403, China
| | - Guoqing Liu
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
| | - Feng-Biao Guo
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.,Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China.,Gordon Life Science Institute, Boston, MA 02478, USA
| | - Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.,Gordon Life Science Institute, Boston, MA 02478, USA
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.,Gordon Life Science Institute, Boston, MA 02478, USA
| |
|
38
|
Tang H, Zhao YW, Zou P, Zhang CM, Chen R, Huang P, Lin H. HBPred: a tool to identify growth hormone-binding proteins. Int J Biol Sci 2018; 14:957-964. [PMID: 29989085 PMCID: PMC6036759 DOI: 10.7150/ijbs.24174] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2017] [Accepted: 01/15/2018] [Indexed: 12/19/2022] Open
Abstract
Hormone-binding protein (HBP) is a kind of soluble carrier protein that can selectively and non-covalently interact with hormones. HBP plays an important role in life growth, but its function is still unclear. Correct recognition of HBPs is the first step towards further studying their function and understanding their biological processes. However, it is difficult to correctly recognize HBPs among the growing number of proteins through traditional biochemical experiments because of high experimental cost and long experimental periods. To overcome these disadvantages, we designed a computational method for identifying HBPs accurately. First, we collected HBP data from UniProt to establish a high-quality benchmark dataset. Based on the dataset, the dipeptide composition was extracted from HBP residue sequences. To find the optimal features that provide key clues for HBP identification, analysis of variance (ANOVA) was performed for feature ranking. The optimal features were selected through the incremental feature selection strategy. Subsequently, the features were input into a support vector machine (SVM) to construct the prediction model. Jackknife cross-validation results showed that 88.6% of HBPs and 81.3% of non-HBPs were correctly recognized, suggesting that our proposed model is powerful. This study provides a new strategy to identify HBPs. Moreover, based on the proposed model, we established a web server called HBPred, which can be freely accessed at http://lin-group.cn/server/HBPred.
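The feature extraction and ranking steps described here can be sketched as follows: each sequence is encoded as a 400-dimensional dipeptide composition vector, and each feature is scored with a one-way ANOVA F statistic between the positive and negative classes. The helper names and the four toy sequences are hypothetical, and the incremental feature selection and SVM training steps are left out.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 features

def dipeptide_composition(seq):
    """Frequency of each adjacent residue pair in the sequence."""
    total = len(seq) - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # skip non-standard residues
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]

def anova_f(groups):
    """One-way ANOVA F statistic for a single feature."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (len(groups) - 1)
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - len(groups))
    if within > 0:
        return between / within
    return float("inf") if between > 0 else 0.0

def rank_features(pos_vectors, neg_vectors):
    """Sort dipeptide features by descending F value."""
    scores = [
        (anova_f([[v[j] for v in pos_vectors], [v[j] for v in neg_vectors]]), name)
        for j, name in enumerate(DIPEPTIDES)
    ]
    return sorted(scores, reverse=True)

# Hypothetical toy sequences standing in for positive and negative samples.
pos = [dipeptide_composition(s) for s in ["AAAA", "AAAC"]]
neg = [dipeptide_composition(s) for s in ["CCCC", "CCCA"]]
ranking = rank_features(pos, neg)
```

In an incremental feature selection loop, features would then be added one at a time in this ranked order, keeping the subset that gives the best cross-validated performance.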
Affiliation(s)
- Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Ya-Wei Zhao
- Key Laboratory for NeuroInformation of Ministry of Education, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Ping Zou
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Chun-Mei Zhang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Rong Chen
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Po Huang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
39
|
|
40
|
Kang J, Fang Y, Yao P, Li N, Tang Q, Huang J. NeuroPP: A Tool for the Prediction of Neuropeptide Precursors Based on Optimal Sequence Composition. Interdiscip Sci 2018. [DOI: 10.1007/s12539-018-0287-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
41
|
|
42
|
Zou Q, He W. Special Protein Molecules Computational Identification. Int J Mol Sci 2018; 19:ijms19020536. [PMID: 29439426 PMCID: PMC5855758 DOI: 10.3390/ijms19020536] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2018] [Revised: 02/02/2018] [Accepted: 02/10/2018] [Indexed: 01/29/2023] Open
Abstract
Computational identification of special protein molecules is a key issue in understanding protein function. It can guide molecular experiments and help to save costs. I assessed 18 papers published in the special issue of Int. J. Mol. Sci., and also discussed the related works. The computational methods employed in this special issue focused on machine learning, network analysis, and molecular docking. New methods and new topics were also proposed. In addition, there were several wet-lab experiments, with proven results showing promise. I hope our special issue will help research on the identification of protein molecules.
Affiliation(s)
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin 300354, China.
| | - Wenying He
- School of Computer Science and Technology, Tianjin University, Tianjin 300354, China.
| |
|
43
|
Meher PK, Sahu TK, Gahoi S, Rao AR. ir-HSP: Improved Recognition of Heat Shock Proteins, Their Families and Sub-types Based On g-Spaced Di-peptide Features and Support Vector Machine. Front Genet 2018; 8:235. [PMID: 29379521 PMCID: PMC5770798 DOI: 10.3389/fgene.2017.00235] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2017] [Accepted: 12/27/2017] [Indexed: 12/24/2022] Open
Abstract
Heat shock proteins (HSPs) play a pivotal role in cell growth and variability. Since conventional approaches are expensive and voluminous protein sequence information is available in the post-genomic era, development of an automated and accurate computational tool is highly desirable for prediction of HSPs, their families and sub-types. Thus, we propose a computational approach for reliable prediction of all these components in a single framework and with higher accuracy. The proposed approach achieved an overall accuracy of ~84% in predicting HSPs, ~97% in predicting six different families of HSPs, and ~94% in predicting four types of DnaJ proteins on benchmark datasets. The developed approach also achieved higher accuracy than most of the existing approaches. For easy prediction of HSPs by experimental scientists, a user-friendly web server, ir-HSP, is made freely accessible at http://cabgrid.res.in:8080/ir-hsp. The ir-HSP was further evaluated for proteome-wide identification of HSPs by using proteome datasets of eight different species, and ~50% of the predicted HSPs in each species were found to be annotated with InterPro HSP families/domains. Thus, the developed computational method is expected to supplement the currently available approaches for prediction of HSPs, to the extent of their families and sub-types.
Affiliation(s)
- Prabina K Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Tanmaya K Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Shachi Gahoi
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Atmakuri R Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| |
|
44
|
iACP: a sequence-based tool for identifying anticancer peptides. Oncotarget 2017; 7:16895-909. [PMID: 26942877 PMCID: PMC4941358 DOI: 10.18632/oncotarget.7815] [Citation(s) in RCA: 300] [Impact Index Per Article: 42.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2016] [Accepted: 02/11/2016] [Indexed: 02/07/2023] Open
Abstract
Cancer remains a major killer worldwide. Traditional methods of cancer treatment are expensive and have some deleterious side effects on normal cells. Fortunately, the discovery of anticancer peptides (ACPs) has paved a new way for cancer treatment. With the explosive growth of peptide sequences generated in the post genomic age, it is highly desired to develop computational methods for rapidly and effectively identifying ACPs, so as to speed up their application in treating cancer. Here we report a sequence-based predictor called iACP developed by the approach of optimizing the g-gap dipeptide components. It was demonstrated by rigorous cross-validations that the new predictor remarkably outperformed the existing predictors for the same purpose in both overall accuracy and stability. For the convenience of most experimental scientists, a publicly accessible web-server for iACP has been established at http://lin.uestc.edu.cn/server/iACP, by which users can easily obtain their desired results.
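To make the g-gap dipeptide idea concrete, here is a minimal sketch in which a g-gap dipeptide is a pair of residues separated by g intervening positions, so g = 0 reduces to the ordinary adjacent dipeptide. The function name and the toy peptide are assumptions, and the optimization over g and over feature subsets reported for iACP is not shown.

```python
def g_gap_dipeptide_composition(seq, g=1):
    """Frequency of residue pairs separated by exactly g intervening residues."""
    total = len(seq) - g - 1  # number of (i, i + g + 1) pairs
    counts = {}
    for i in range(total):
        pair = seq[i] + seq[i + g + 1]
        counts[pair] = counts.get(pair, 0) + 1
    return {pair: c / total for pair, c in counts.items()}

# Hypothetical five-residue peptide (toy data).
composition = g_gap_dipeptide_composition("ACDEF", g=1)
```

Only observed pairs are stored here; a fixed-length 400-dimensional vector can be obtained by filling in zeros for the pairs that do not occur.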
|
45
|
Rahman MS, Rahman MK, Kaykobad M, Rahman MS. isGPT: An optimized model to identify sub-Golgi protein types using SVM and Random Forest based feature selection. Artif Intell Med 2017; 84:90-100. [PMID: 29183738 DOI: 10.1016/j.artmed.2017.11.003] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2017] [Revised: 11/13/2017] [Accepted: 11/17/2017] [Indexed: 10/18/2022]
Abstract
The Golgi Apparatus (GA) is a key organelle for protein synthesis within the eukaryotic cell. The main task of the GA is to modify and sort proteins for transport throughout the cell. Proteins permeate through the GA on the ER (Endoplasmic Reticulum) facing side (cis side) and depart on the other side (trans side). Based on this phenomenon, there are two types of GA proteins, namely, cis-Golgi proteins and trans-Golgi proteins. Any dysfunction of GA proteins can result in congenital glycosylation disorders and other difficulties that may lead to neurodegenerative and inherited diseases like diabetes, cancer and cystic fibrosis. So, the exact classification of GA proteins may contribute to drug development, which will further help in medication. In this paper, we focus on building a new computational model that not only introduces easy ways to extract features from protein sequences but also optimizes classification of trans-Golgi and cis-Golgi proteins. After feature extraction, we employed a Random Forest (RF) model to rank the features based on the importance scores obtained from it. After selecting the top-ranked features, we applied a Support Vector Machine (SVM) to classify the sub-Golgi proteins. We trained a regression model as well as a classification model and found the former to be superior. The model shows improved performance over all previous methods. As the benchmark dataset is significantly imbalanced, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to the dataset to make it balanced and conducted experiments on both versions. Our method, namely, identification of sub-Golgi Protein Types (isGPT), achieves accuracy values of 95.4%, 95.9% and 95.3% for the 10-fold cross-validation test, jackknife test and independent test, respectively. According to different performance metrics, isGPT performs better than state-of-the-art techniques. The source code of isGPT, along with the relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/isGPT.
Affiliation(s)
- M Saifur Rahman
- Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh.
| | | | - M Kaykobad
- Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh.
| | - M Sohel Rahman
- Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh.
| |
|
46
|
Kumar R, Kumari B, Kumar M. Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine. PeerJ 2017; 5:e3561. [PMID: 28890846 PMCID: PMC5588793 DOI: 10.7717/peerj.3561] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2017] [Accepted: 06/20/2017] [Indexed: 12/15/2022] Open
Abstract
Background: The endoplasmic reticulum plays an important role in many cellular processes, including protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site for quality control of misfolded proteins and the entry point of extracellular proteins to the secretory pathway. Hence, at any given point in time, the endoplasmic reticulum contains two different cohorts of proteins: (i) proteins involved in endoplasmic reticulum-specific functions, which reside in the lumen of the endoplasmic reticulum and are called endoplasmic reticulum resident proteins, and (ii) proteins that are in the process of moving to the extracellular space. Thus, endoplasmic reticulum resident proteins must somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Only approximately 50% of the proteins used in this study as training data had an endoplasmic reticulum retention signal, which shows that these signals are not necessarily present in all endoplasmic reticulum resident proteins. This also strongly indicates the role of additional factors in the retention of endoplasmic reticulum-specific proteins inside the endoplasmic reticulum. Methods: We developed a support vector machine based method, using different forms of protein features as inputs to the support vector machine to build the prediction models. During training, the leave-one-out approach to cross-validation was used. Maximum performance was obtained with a combination of amino acid compositions of different parts of the proteins. Results: In this study, we report a novel support vector machine based method for predicting endoplasmic reticulum resident proteins, named ERPred. During training, we achieved a maximum accuracy of 81.42% with leave-one-out cross-validation. When evaluated on an independent dataset, ERPred achieved a sensitivity of 72.31% and a specificity of 83.69%. We have also annotated six different proteomes to predict the candidate endoplasmic reticulum resident proteins in them. A web server, ERPred, was developed to make the method available to the scientific community; it can be accessed at http://proteininformatics.org/mkumar/erpred/index.html. Discussion: We found that out of the 124 proteins in the training dataset, only 66 had endoplasmic reticulum retention signals, which shows that these signals are not an absolute necessity for endoplasmic reticulum resident proteins to remain inside the endoplasmic reticulum. This observation also strongly indicates the role of additional factors in the retention of proteins inside the endoplasmic reticulum. Our proposed predictor, ERPred, is a signal-independent tool. It is tuned for the prediction of endoplasmic reticulum resident proteins, even if the query protein does not contain a specific ER-retention signal.
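For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity and accuracy follow from confusion-matrix counts, and checks the "approximately 50%" figure against the 66 of 124 training proteins stated in the Discussion. The confusion counts passed to `binary_metrics` are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, for illustration only.
sens, spec, acc = binary_metrics(tp=8, fn=2, tn=15, fp=5)

# Share of training proteins carrying a retention signal (66 of 124, from the abstract).
signal_fraction = 66 / 124
```

The last value works out to about 53.2%, consistent with the abstract's statement that roughly half of the training proteins carry a retention signal.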
Affiliation(s)
- Ravindra Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India; current affiliation: Newe-Ya'ar Research Center, Agricultural Research Organization, Ramat Yishay, Israel
| | - Bandana Kumari
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
| | - Manish Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
| |
|
47
|
Zhao YW, Su ZD, Yang W, Lin H, Chen W, Tang H. IonchanPred 2.0: A Tool to Predict Ion Channels and Their Types. Int J Mol Sci 2017; 18:ijms18091838. [PMID: 28837067 PMCID: PMC5618487 DOI: 10.3390/ijms18091838] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 08/07/2017] [Revised: 08/21/2017] [Accepted: 08/21/2017] [Indexed: 12/11/2022]
Abstract
Ion channels (IC) are ion-permeable protein pores located in the lipid membranes of all cells. Different ion channels have unique functions in different biological processes. Owing to the rapid development of high-throughput mass spectrometry, proteomic data are accumulating rapidly and provide an opportunity to systematically investigate and predict ion channels and their types. In this paper, we constructed a support vector machine (SVM)-based model to quickly predict ion channels and their types. By considering the residue sequence information and the physicochemical properties of the residues, we employed a novel feature extraction method that combines dipeptide composition with the physicochemical correlation between two residues. A feature selection strategy was used to improve the performance of the model. Comparison results from jackknife cross-validation demonstrated that our method was superior to other methods for predicting ion channels and their types. Based on the model, we built a web server called IonchanPred, which can be freely accessed from http://lin.uestc.edu.cn/server/IonchanPredv2.0.
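The dipeptide composition mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal illustration of the general technique only: it omits the physicochemical correlation term and the feature selection step described in the abstract, and the function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order: AA, AC, AD, ..., YY.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the frequency of each of the 400 dipeptides in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    # Slide a window of width 2 over the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Because the vector always has 400 entries, sequences of any length map to the same feature space, which is what makes this encoding convenient as SVM input.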
Affiliation(s)
- Ya-Wei Zhao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Zhen-Dong Su
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Wuritu Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Development and Planning Department, Inner Mongolia University, Hohhot 010021, China.
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China.
| | - Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China.
| |
|
48
|
Moreira IS, Koukos PI, Melo R, Almeida JG, Preto AJ, Schaarschmidt J, Trellet M, Gümüş ZH, Costa J, Bonvin AMJJ. SpotOn: High Accuracy Identification of Protein-Protein Interface Hot-Spots. Sci Rep 2017; 7:8007. [PMID: 28808256 PMCID: PMC5556074 DOI: 10.1038/s41598-017-08321-2] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 04/18/2017] [Accepted: 07/07/2017] [Indexed: 12/21/2022]
Abstract
We present SpotOn, a web server to identify and classify interfacial residues as Hot-Spots (HS) and Null-Spots (NS). SpotOn implements a robust algorithm with a demonstrated accuracy of 0.95 and sensitivity of 0.98 on an independent test set. The predictor was developed using an ensemble machine learning approach with up-sampling of the minority class. It was trained on 53 complexes using various features based on both protein 3D structure and sequence. The SpotOn web interface is freely available at: http://milou.science.uu.nl/services/SPOTON/.
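Up-sampling of the under-represented class, as mentioned in the abstract above, can be illustrated with plain random oversampling. The sketch below is an assumption about the general technique, not the procedure used by SpotOn: the abstract does not say which up-sampling scheme was applied, and the function name and `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def upsample_minority(samples, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest one.

    This is simple random oversampling with replacement, used here only
    as an illustration of class balancing.
    """
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        # Draw the missing number of rows from this class, with replacement.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([label] * len(extra))
    return out_x, out_y
```

In a typical workflow, resampling of this kind is applied to the training portion only, so that duplicated rows do not leak into the data used for evaluation.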
Affiliation(s)
- Irina S Moreira
- CNC - Center for Neuroscience and Cell Biology; Rua Larga, FMUC, Polo I, 1°andar, Universidade de Coimbra, 3004-517, Coimbra, Portugal; Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands.
| | - Panagiotis I Koukos
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands
| | - Rita Melo
- CNC - Center for Neuroscience and Cell Biology; Rua Larga, FMUC, Polo I, 1°andar, Universidade de Coimbra, 3004-517, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, Estrada Nacional 10 (ao km 139,7), 2695-066, Bobadela LRS, Portugal
| | - Jose G Almeida
- CNC - Center for Neuroscience and Cell Biology; Rua Larga, FMUC, Polo I, 1°andar, Universidade de Coimbra, 3004-517, Coimbra, Portugal
| | - Antonio J Preto
- CNC - Center for Neuroscience and Cell Biology; Rua Larga, FMUC, Polo I, 1°andar, Universidade de Coimbra, 3004-517, Coimbra, Portugal
| | - Joerg Schaarschmidt
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands
| | - Mikael Trellet
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands
| | - Zeynep H Gümüş
- Department of Genetics and Genomics and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Joaquim Costa
- CMUP/FCUP, Centro de Matemática da Universidade do Porto, Faculdade de Ciências, Rua do Campo Alegre, 4169-007, Porto, Portugal
| | - Alexandre M J J Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands.
| |
|
49
|
Dao FY, Yang H, Su ZD, Yang W, Wu Y, Hui D, Chen W, Tang H, Lin H. Recent Advances in Conotoxin Classification by Using Machine Learning Methods. Molecules 2017; 22:molecules22071057. [PMID: 28672838 PMCID: PMC6152242 DOI: 10.3390/molecules22071057] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 05/17/2017] [Revised: 06/12/2017] [Accepted: 06/19/2017] [Indexed: 11/16/2022]
Abstract
Conotoxins are small, disulfide-rich peptides that are invaluable for targeting ion channels and neuronal receptors. Conotoxins have been demonstrated to be potent pharmaceuticals in the treatment of a series of diseases, such as Alzheimer's disease, Parkinson's disease, and epilepsy. In addition, conotoxins are ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research. Thus, the accurate identification of conotoxin types will provide key clues for biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, it is time-consuming and costly to acquire structural and functional information through biochemical experiments. Therefore, it is important to develop computational tools for efficiently and effectively recognizing conotoxin types based on sequence information. In this work, we reviewed the current progress in the computational identification of conotoxins in the following aspects: (i) construction of benchmark datasets; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides a basis for in-depth study of conotoxins and for drug therapy research.
Affiliation(s)
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Zhen-Dong Su
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Wuritu Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Development and Planning Department, Inner Mongolia University, Hohhot 010021, China.
| | - Yun Wu
- College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China.
| | - Ding Hui
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China.
| | - Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China.
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
|
50
|
Saidijam M, Karimi Dermani F, Sohrabi S, Patching SG. Efflux proteins at the blood-brain barrier: review and bioinformatics analysis. Xenobiotica 2017; 48:506-532. [PMID: 28481715 DOI: 10.1080/00498254.2017.1328148] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 01/06/2023]
Abstract
1. Efflux proteins at the blood-brain barrier provide a mechanism for export of waste products of normal metabolism from the brain and help to maintain brain homeostasis. They also prevent entry into the brain of a wide range of potentially harmful compounds such as drugs and xenobiotics. 2. Conversely, efflux proteins also hinder delivery to the brain and central nervous system of therapeutic drugs used to treat brain tumours and neurological disorders. Bypassing efflux proteins requires a comprehensive understanding of their structures, functions and molecular mechanisms, along with new strategies and technologies for delivery of drugs across the blood-brain barrier. 3. We review efflux proteins at the blood-brain barrier, classified as either ATP-binding cassette (ABC) transporters (P-gp, BCRP, MRPs) or solute carrier (SLC) transporters (OATP1A2, OATP1A4, OATP1C1, OATP2B1, OAT3, EAATs, PMAT/hENT4 and MATE1). 4. This includes information about substrate and inhibitor specificity, structural organisation and mechanism, membrane localisation, regulation of expression and activity, effects of diseases and conditions, and the principal technique used for in vivo analysis of efflux protein activity: positron emission tomography (PET). 5. We also performed analyses of evolutionary relationships, membrane topologies and amino acid compositions of the proteins, and linked these to structure and function.
Affiliation(s)
- Massoud Saidijam
- Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Fatemeh Karimi Dermani
- Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Sareh Sohrabi
- Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Simon G Patching
- School of BioMedical Sciences and the Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, UK
| |
|