51
|
Lou C, Zhao J, Shi R, Wang Q, Zhou W, Wang Y, Wang G, Huang L, Feng X, Zhou F. sefOri: selecting the best-engineered sequence features to predict DNA replication origins. Bioinformatics 2019; 36:49-55. [DOI: 10.1093/bioinformatics/btz506] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2019] [Revised: 05/25/2019] [Accepted: 06/13/2019] [Indexed: 01/08/2023] Open
Abstract
AbstractMotivationCell divisions start from replicating the double-stranded DNA, and the DNA replication process needs to be precisely regulated both spatially and temporally. The DNA is replicated starting from the DNA replication origins. A few successful prediction models were generated based on the assumption that the DNA replication origin regions have sequence level features like physicochemical properties significantly different from the other DNA regions.ResultsThis study proposed a feature selection procedure to further refine the classification model of the DNA replication origins. The experimental data demonstrated that as large as 26% improvement in the prediction accuracy may be achieved on the yeast Saccharomyces cerevisiae. Moreover, the prediction accuracies of the DNA replication origins were improved for all the four yeast genomes investigated in this study.Availability and implementationThe software sefOri version 1.0 was available at http://www.healthinformaticslab.org/supp/resources.php. An online server was also provided for the convenience of the users, and its web link may be found in the above-mentioned web page.Supplementary informationSupplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Chenwei Lou
- BioKnow Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
| | - Jian Zhao
- BioKnow Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
| | - Ruoyao Shi
- BioKnow Health Informatics Lab, College of Life Sciences, Jilin University, Changchun 130012, China
| | - Qian Wang
- BioKnow Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
| | - Wenyang Zhou
- BioKnow Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
| | - Yubo Wang
- BioKnow Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
| | - Guoqing Wang
- Department of Pathogenobiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun 130012, China
| | - Lan Huang
- BioKnow Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
| | - Xin Feng
- BioKnow Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
| | - Fengfeng Zhou
- BioKnow Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
| |
Collapse
|
52
|
Gao R, Wang M, Zhou J, Fu Y, Liang M, Guo D, Nie J. Prediction of Enzyme Function Based on Three Parallel Deep CNN and Amino Acid Mutation. Int J Mol Sci 2019; 20:E2845. [PMID: 31212665 PMCID: PMC6600291 DOI: 10.3390/ijms20112845] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2019] [Revised: 06/03/2019] [Accepted: 06/04/2019] [Indexed: 01/28/2023] Open
Abstract
During the past decade, due to the number of proteins in PDB database being increased gradually, traditional methods cannot better understand the function of newly discovered enzymes in chemical reactions. Computational models and protein feature representation for predicting enzymatic function are more important. Most of existing methods for predicting enzymatic function have used protein geometric structure or protein sequence alone. In this paper, the functions of enzymes are predicted from many-sided biological information including sequence information and structure information. Firstly, we extract the mutation information from amino acids sequence by the position scoring matrix and express structure information with amino acids distance and angle. Then, we use histogram to show the extracted sequence and structural features respectively. Meanwhile, we establish a network model of three parallel Deep Convolutional Neural Networks (DCNN) to learn three features of enzyme for function prediction simultaneously, and the outputs are fused through two different architectures. Finally, The proposed model was investigated on a large dataset of 43,843 enzymes from the PDB and achieved 92.34% correct classification when sequence information is considered, demonstrating an improvement compared with the previous result.
Collapse
Affiliation(s)
- Ruibo Gao
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Mengmeng Wang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Jiaoyan Zhou
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Yuhang Fu
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Meng Liang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Dongliang Guo
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| | - Junlan Nie
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
| |
Collapse
|
53
|
The preliminary efficacy evaluation of the CTLA-4-Ig treatment against Lupus nephritis through in-silico analyses. J Theor Biol 2019; 471:74-81. [DOI: 10.1016/j.jtbi.2019.03.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Accepted: 03/22/2019] [Indexed: 01/04/2023]
|
54
|
Barukab O, Khan YD, Khan SA, Chou KC. iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments via Chou's 5-steps Rule and Pseudo Components. Curr Genomics 2019; 20:306-320. [PMID: 32030089 PMCID: PMC6983959 DOI: 10.2174/1389202920666190819091609] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2019] [Revised: 08/04/2019] [Accepted: 08/06/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND The amino acid residues, in protein, undergo post-translation modification (PTM) during protein synthesis, a process of chemical and physical change in an amino acid that in turn alters behavioral properties of proteins. Tyrosine sulfation is a ubiquitous posttranslational modification which is known to be associated with regulation of various biological functions and pathological pro-cesses. Thus its identification is necessary to understand its mechanism. Experimental determination through site-directed mutagenesis and high throughput mass spectrometry is a costly and time taking process, thus, the reliable computational model is required for identification of sulfotyrosine sites. METHODOLOGY In this paper, we present a computational model for the prediction of the sulfotyrosine sites named iSulfoTyr-PseAAC in which feature vectors are constructed using statistical moments of protein amino acid sequences and various position/composition relative features. These features are in-corporated into PseAAC. The model is validated by jackknife, cross-validation, self-consistency and in-dependent testing. RESULTS Accuracy determined through validation was 93.93% for jackknife test, 95.16% for cross-validation, 94.3% for self-consistency and 94.3% for independent testing. CONCLUSION The proposed model has better performance as compared to the existing predictors, how-ever, the accuracy can be improved further, in future, due to increasing number of sulfotyrosine sites in proteins.
Collapse
Affiliation(s)
| | | | - Sher Afzal Khan
- Address correspondence to this author at the Department of Information Technology, Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh, 21911, Saudi Arabia; and Department of Computer Sciences, Abdul Wali Khan University, Mardan, Pakistan; E-mail:
| | | |
Collapse
|
55
|
Ilyas S, Hussain W, Ashraf A, Khan YD, Khan SA, Chou KC. iMethylK_pseAAC: Improving Accuracy of Lysine Methylation Sites Identification by Incorporating Statistical Moments and Position Relative Features into General PseAAC via Chou's 5-steps Rule. Curr Genomics 2019; 20:275-292. [PMID: 32030087 PMCID: PMC6983956 DOI: 10.2174/1389202920666190809095206] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2019] [Revised: 07/02/2019] [Accepted: 07/26/2019] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Methylation is one of the most important post-translational modifications in the human body which usually arises on lysine among the most intensely modified residues. It performs a dynamic role in numerous biological procedures, such as regulation of gene expression, regulation of protein function and RNA processing. Therefore, to identify lysine methylation sites is an important challenge as some experimental procedures are time-consuming. OBJECTIVE Herein, we propose a computational predictor named iMethylK_pseAAC to identify lysine methylation sites. METHODS Firstly, we constructed feature vectors based on PseAAC using position and composition rel-ative features and statistical moments. A neural network is trained based on the extracted features. The performance of the proposed method is then validated using cross-validation and jackknife testing. RESULTS The objective evaluation of the predictor showed accuracy of 96.7% for self-consistency, 91.61% for 10-fold cross-validation and 93.42% for jackknife testing. CONCLUSION It is concluded that iMethylK_pseAAC outperforms the counterparts to identify lysine methylation sites such as iMethyl_pseACC, BPB_pPMS and PMeS.
Collapse
Affiliation(s)
| | | | | | - Yaser Daanial Khan
- Address correspondence to this author at the Department of Computer Science, School of Systems and Technology, University of Management and Technology, P.O. Box 10033, C-II, Johar Town, Lahore, Pakistan; Tel: +923054440271; E-mail:
| | | | | |
Collapse
|
56
|
SPrenylC-PseAAC: A sequence-based model developed via Chou's 5-steps rule and general PseAAC for identifying S-prenylation sites in proteins. J Theor Biol 2019; 468:1-11. [DOI: 10.1016/j.jtbi.2019.02.007] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2018] [Revised: 02/07/2019] [Accepted: 02/11/2019] [Indexed: 11/22/2022]
|
57
|
Pan Q, Guo Y, Guo L, Liao S, Zhao C, Wang S, Liu HF. Mechanistic Insights of Chemicals and Drugs as Risk Factors for Systemic Lupus Erythematosus. Curr Med Chem 2019; 27:5175-5188. [PMID: 30947650 DOI: 10.2174/0929867326666190404140658] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2018] [Revised: 03/25/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
Systemic Lupus Erythematosus (SLE) is a chronic and relapsing heterogenous autoimmune disease that primarily affects women of reproductive age. Genetic and environmental risk factors are involved in the pathogenesis of SLE, and susceptibility genes have recently been identified. However, as gene therapy is far from clinical application, further investigation of environmental risk factors could reveal important therapeutic approaches. We systematically explored two groups of environmental risk factors: chemicals (including silica, solvents, pesticides, hydrocarbons, heavy metals, and particulate matter) and drugs (including procainamide, hydralazine, quinidine, Dpenicillamine, isoniazid, and methyldopa). Furthermore, the mechanisms underlying risk factors, such as genetic factors, epigenetic change, and disrupted immune tolerance, were explored. This review identifies novel risk factors and their underlying mechanisms. Practicable measures for the management of these risk factors will benefit SLE patients and provide potential therapeutic strategies.
Collapse
Affiliation(s)
- Qingjun Pan
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Yun Guo
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Linjie Guo
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Shuzhen Liao
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Chunfei Zhao
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Sijie Wang
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Hua-Feng Liu
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| |
Collapse
|
58
|
Abstract
Background:DNA-binding proteins, binding to DNA, widely exist in living cells, participating in many cell activities. They can participate some DNA-related cell activities, for instance DNA replication, transcription, recombination, and DNA repair.Objective:Given the importance of DNA-binding proteins, studies for predicting the DNA-binding proteins have been a popular issue over the past decades. In this article, we review current machine-learning methods which research on the prediction of DNA-binding proteins through feature representation methods, classifiers, measurements, dataset and existing web server.Method:The prediction methods of DNA-binding protein can be divided into two types, based on amino acid composition and based on protein structure. In this article, we accord to the two types methods to introduce the application of machine learning in DNA-binding proteins prediction.Results:Machine learning plays an important role in the classification of DNA-binding proteins, and the result is better. The best ACC is above 80%.Conclusion:Machine learning can be widely used in many aspects of biological information, especially in protein classification. Some issues should be considered in future work. First, the relationship between the number of features and performance must be explored. Second, many features are used to predict DNA-binding proteins and propose solutions for high-dimensional spaces.
Collapse
Affiliation(s)
- Kaiyang Qu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Leyi Wei
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Quan Zou
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
Collapse
|
59
|
Yang W, Zhu XJ, Huang J, Ding H, Lin H. A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181113131415] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Background:The location of proteins in a cell can provide important clues to their functions in various biological processes. Thus, the application of machine learning method in the prediction of protein subcellular localization has become a hotspot in bioinformatics. As one of key organelles, the Golgi apparatus is in charge of protein storage, package, and distribution.Objective:The identification of protein location in Golgi apparatus will provide in-depth insights into their functions. Thus, the machine learning-based method of predicting protein location in Golgi apparatus has been extensively explored. The development of protein sub-Golgi apparatus localization prediction should be reviewed for providing a whole background for the fields.Method:The benchmark dataset, feature extraction, machine learning method and published results were summarized.Results:We briefly introduced the recent progresses in protein sub-Golgi apparatus localization prediction using machine learning methods and discussed their advantages and disadvantages.Conclusion:We pointed out the perspective of machine learning methods in protein sub-Golgi localization prediction.
Collapse
Affiliation(s)
- Wuritu Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
| | - Xiao-Juan Zhu
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
| | - Jian Huang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
| |
Collapse
|
60
|
SPalmitoylC-PseAAC: A sequence-based model developed via Chou's 5-steps rule and general PseAAC for identifying S-palmitoylation sites in proteins. Anal Biochem 2019; 568:14-23. [DOI: 10.1016/j.ab.2018.12.019] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2018] [Revised: 12/19/2018] [Accepted: 12/22/2018] [Indexed: 02/06/2023]
|
61
|
HRGPred: Prediction of herbicide resistant genes with k-mer nucleotide compositional features and support vector machine. Sci Rep 2019; 9:778. [PMID: 30692561 PMCID: PMC6349872 DOI: 10.1038/s41598-018-37309-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2018] [Accepted: 12/03/2018] [Indexed: 02/07/2023] Open
Abstract
Herbicide resistance (HR) is a major concern for the agricultural producers as well as environmentalists. Resistance to commonly used herbicides are conferred due to mutation(s) in the genes encoding herbicide target sites/proteins (GETS). Identification of these genes through wet-lab experiments is time consuming and expensive. Thus, a supervised learning-based computational model has been proposed in this study, which is first of its kind for the prediction of seven classes of GETS. The cDNA sequences of the genes were initially transformed into numeric features based on the k-mer compositions and then supplied as input to the support vector machine. In the proposed SVM-based model, the prediction occurs in two stages, where a binary classifier in the first stage discriminates the genes involved in conferring the resistance to herbicides from other genes, followed by a multi-class classifier in the second stage that categorizes the predicted herbicide resistant genes in the first stage into any one of the seven resistant classes. Overall classification accuracies were observed to be ~89% and >97% for binary and multi-class classifications respectively. The proposed model confirmed higher accuracy than the homology-based algorithms viz., BLAST and Hidden Markov Model. Besides, the developed computational model achieved ~87% accuracy, while tested with an independent dataset. An online prediction server HRGPred (http://cabgrid.res.in:8080/hrgpred) has also been established to facilitate the prediction of GETS by the scientific community.
Collapse
|
62
|
Wei L, Su R, Wang B, Li X, Zou Q, Gao X. Integration of deep feature representations and handcrafted features to improve the prediction of N6-methyladenosine sites. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.04.082] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
63
|
Rout S, Mahapatra RK. In silico analysis of plasmodium falciparum CDPK5 protein through molecular modeling, docking and dynamics. J Theor Biol 2019; 461:254-267. [DOI: 10.1016/j.jtbi.2018.10.045] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Revised: 10/15/2018] [Accepted: 10/22/2018] [Indexed: 10/28/2022]
|
64
|
Jia J, Li X, Qiu W, Xiao X, Chou KC. iPPI-PseAAC(CGR): Identify protein-protein interactions by incorporating chaos game representation into PseAAC. J Theor Biol 2019; 460:195-203. [DOI: 10.1016/j.jtbi.2018.10.021] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2018] [Revised: 09/16/2018] [Accepted: 10/08/2018] [Indexed: 01/11/2023]
|
65
|
Dao FY, Lv H, Wang F, Ding H. Recent Advances on the Machine Learning Methods in Identifying DNA Replication Origins in Eukaryotic Genomics. Front Genet 2018; 9:613. [PMID: 30619452 PMCID: PMC6295579 DOI: 10.3389/fgene.2018.00613] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2018] [Accepted: 11/21/2018] [Indexed: 01/01/2023] Open
Abstract
The initiate site of DNA replication is called origins of replication (ORI) which is regulated by a set of regulatory proteins and plays important roles in the basic biochemical process during cell growth and division in all living organisms. Therefore, the study of ORIs is essential for understanding the cell-division cycle and gene expression regulation so that scholars can develop a new strategy against genetic diseases by using the knowledge of DNA replication. Thus, the accurate identification of ORIs will provide key clues for DNA replication research and clinical medicine. Although, the conventional experiments could provide accurate results, they are time-consuming and cost ineffective. On the contrary, bioinformatics-based methods can overcome these shortcomings. Especially, with the emergence of DNA sequences in the post-genomic era, it is highly expected to develop high throughput tools to identify ORIs based on sequence information. In this review, we will summarize the current progress in computational prediction of eukaryotic ORIs including the collection of benchmark dataset, the application of machine learning-based techniques, the results obtained by these methods, and the construction of web servers. Finally, we gave the future perspectives on ORIs prediction. The review provided readers with a whole background of ORIs prediction based on machine learning methods, which will be helpful for researchers to study DNA replication in-depth and drug therapy of genetic defect.
Collapse
Affiliation(s)
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
66
|
Xiao X, Xu ZC, Qiu WR, Wang P, Ge HT, Chou KC. iPSW(2L)-PseKNC: A two-layer predictor for identifying promoters and their strength by hybrid features via pseudo K-tuple nucleotide composition. Genomics 2018; 111:1785-1793. [PMID: 30529532 DOI: 10.1016/j.ygeno.2018.12.001] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2018] [Revised: 11/20/2018] [Accepted: 12/04/2018] [Indexed: 12/20/2022]
Abstract
The promoter is a regulatory DNA region about 81-1000 base pairs long, usually located near the transcription start site (TSS) along upstream of a given gene. By combining a certain protein called transcription factor, the promoter provides the starting point for regulated gene transcription, and hence plays a vitally important role in gene transcriptional regulation. With explosive growth of DNA sequences in the post-genomic age, it has become an urgent challenge to develop computational method for effectively identifying promoters because the information thus obtained is very useful for both basic research and drug development. Although some prediction methods were developed in this regard, most of them were limited at merely identifying whether a query DNA sequence being of a promoter or not. However, based on their strength-distinct levels for transcriptional activation and expression, promoter should be divided into two categories: strong and weak types. Here a new two-layer predictor, called "iPSW(2L)-PseKNC", was developed by fusing the physicochemical properties of nucleotides and their nucleotide density into PseKNC (pseudo K-tuple nucleotide composition). Its 1st-layer serves to predict whether a query DNA sequence sample is of promoter or not, while its 2nd-layer is able to predict the strength of promoters. It has been observed through rigorous cross-validations that the 1st-layer sub-predictor is remarkably superior to the existing state-of-the-art predictors in identifying the promoters and non-promoters, and that the 2nd-layer sub-predictor can do what is beyond the reach of the existing predictors. Moreover, the web-server for iPSW(2L)-PseKNC has been established at http://www.jci-bioinfo.cn/iPSW(2L)-PseKNC, by which the majority of experimental scientists can easily get the results they need.
Collapse
Affiliation(s)
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA.
| | - Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.
| | - Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA
| | - Peng Wang
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Hui-Ting Ge
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Kuo-Chen Chou
- The Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
Collapse
|
67
|
Pan Y, Wang S, Zhang Q, Lu Q, Su D, Zuo Y, Yang L. Analysis and prediction of animal toxins by various Chou's pseudo components and reduced amino acid compositions. J Theor Biol 2018; 462:221-229. [PMID: 30452961 DOI: 10.1016/j.jtbi.2018.11.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2018] [Revised: 11/06/2018] [Accepted: 11/15/2018] [Indexed: 01/19/2023]
Abstract
The animal toxin proteins are one of the disulfide rich small peptides that detected in venomous species. They are used as pharmacological tools and therapeutic agents in medicine for the high specificity of their targets. The successful analysis and prediction of toxin proteins may have important signification for the pharmacological and therapeutic researches of toxins. In this study, significant differences were found between the toxins and the non-toxins in amino acid compositions and several important biological properties. The random forest was firstly proposed to predict the animal toxin proteins by selecting 400 pseudo amino acid compositions and the dipeptide compositions of reduced amino acid alphabet as the input parameters. Based on dipeptide composition of reduced amino acid alphabet with 13 reduced amino acids, the best overall accuracy of 85.71% was obtained. These results indicated that our algorithm was an efficient tool for the animal toxin prediction.
Collapse
Affiliation(s)
- Yi Pan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Qi Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yongchun Zuo
- The State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| |
Collapse
|
68
|
Dao FY, Lv H, Wang F, Feng CQ, Ding H, Chen W, Lin H. Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique. Bioinformatics 2018; 35:2075-2083. [DOI: 10.1093/bioinformatics/bty943] [Citation(s) in RCA: 147] [Impact Index Per Article: 24.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2018] [Revised: 11/06/2018] [Accepted: 11/13/2018] [Indexed: 02/07/2023] Open
Affiliation(s)
- Fu-Ying Dao
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lv
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fang Wang
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Chao-Qin Feng
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Ding
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wei Chen
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
69
|
Zhang S, Liang Y. Predicting apoptosis protein subcellular localization by integrating auto-cross correlation and PSSM into Chou’s PseAAC. J Theor Biol 2018; 457:163-169. [DOI: 10.1016/j.jtbi.2018.08.042] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2018] [Revised: 08/25/2018] [Accepted: 08/31/2018] [Indexed: 10/28/2022]
|
70
|
Mei J, Fu Y, Zhao J. Analysis and prediction of ion channel inhibitors by using feature selection and Chou's general pseudo amino acid composition. J Theor Biol 2018; 456:41-48. [DOI: 10.1016/j.jtbi.2018.07.040] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2018] [Revised: 07/20/2018] [Accepted: 07/29/2018] [Indexed: 12/23/2022]
|
71
|
Ariaeenejad S, Mousivand M, Moradi Dezfouli P, Hashemi M, Kavousi K, Hosseini Salekdeh G. A computational method for prediction of xylanase enzymes activity in strains of Bacillus subtilis based on pseudo amino acid composition features. PLoS One 2018; 13:e0205796. [PMID: 30346964 PMCID: PMC6197662 DOI: 10.1371/journal.pone.0205796] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2017] [Accepted: 10/02/2018] [Indexed: 01/09/2023] Open
Abstract
Xylanases are hydrolytic enzymes which based on physicochemical properties, structure, mode of action and substrate specificities are classified into various glycoside hydrolase (GH) families. The purpose of this study is to show that the activity of the members of the xylanase family in the specified pH and temperature conditions can be computationally predicted. The proposed computational regression model was trained and tested with the Pseudo Amino Acid Composition (PseAAC) features extracted solely from the amino acid sequences of enzymes. The xylanases with experimentally determined activities were used as the training dataset to adjust the model parameters. To develop the model, 41 strains of Bacillus subtilis isolated from field soil were screened. From them, 28 strains with the highest halo diameter were selected for further studies. The performance of the model for prediction of xylanase activity was evaluated in three different temperature and pH conditions using stratified cross-validation and jackknife methods. The trained model can be used for determining the activity of newly found xylanases in the specified condition. Such computational models help to scale down the experimental costs and save time by identifying enzymes with appropriate activity for scientific and industrial usage. Our methodology for activity prediction of xylanase enzymes can be potentially applied to the members of the other enzyme families. The availability of sufficient experimental data in specified pH and temperature conditions is a prerequisite for training the learning model and to achieve high accuracy.
Collapse
Affiliation(s)
- Shohreh Ariaeenejad
- Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| | - Maryam Mousivand
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| | - Parinaz Moradi Dezfouli
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| | - Maryam Hashemi
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| | - Kaveh Kavousi
- Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Ghasem Hosseini Salekdeh
- Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| |
Collapse
|
72
|
Tahir M, Hayat M, Khan SA. iNuc-ext-PseTNC: an efficient ensemble model for identification of nucleosome positioning by extending the concept of Chou's PseAAC to pseudo-tri-nucleotide composition. Mol Genet Genomics 2018; 294:199-210. [PMID: 30291426 DOI: 10.1007/s00438-018-1498-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2018] [Accepted: 09/28/2018] [Indexed: 10/28/2022]
Abstract
Nucleosome is a central element of eukaryotic chromatin, which composes of histone proteins and DNA molecules. It performs vital roles in many eukaryotic intra-nuclear processes, for instance, chromatin structure and transcriptional regulation formation. Identification of nucleosome positioning via wet lab is difficult; so, the attention is diverted towards the accurate intelligent automated prediction. In this regard, a novel intelligent automated model "iNuc-ext-PseTNC" is developed to identify the nucleosome positioning in genomes accurately. In this predictor, the sequences of DNA are mathematically represented by two different discrete feature extraction techniques, namely pseudo-tri-nucleotide composition (PseTNC) and pseudo-di-nucleotide composition. Several contemporary machine learning algorithms were examined. Further, the predictions of individual classifiers were integrated through an evolutionary genetic algorithm. The success rates of the ensemble model are higher than individual classifiers. After analyzing the prediction results, it is noticed that iNuc-ext-PseTNC model has achieved better performance in combination with PseTNC feature space, which are 94.3%, 93.14%, and 88.60% of accuracies using six-fold cross-validation test for the three benchmark datasets S1, S2, and S3, respectively. The achieved outcomes exposed that the results of iNuc-ext-PseTNC model are prominent compared to the existing methods so far notifiable in the literature. It is ascertained that the proposed model might be more fruitful and a practical tool for rudimentary academia and research.
Collapse
Affiliation(s)
- Muhammad Tahir
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, KP, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, KP, Pakistan.
| | - Sher Afzal Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, KP, Pakistan
| |
Collapse
|
73
|
Xu Y, Yang Y, Ding J, Li C. iGlu-Lys: A Predictor for Lysine Glutarylation Through Amino Acid Pair Order Features. IEEE Trans Nanobioscience 2018; 17:394-401. [DOI: 10.1109/tnb.2018.2848673] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
74
|
Butt AH, Rasool N, Khan YD. Predicting membrane proteins and their types by extracting various sequence features into Chou's general PseAAC. Mol Biol Rep 2018; 45:2295-2306. [PMID: 30238411 DOI: 10.1007/s11033-018-4391-5] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2018] [Accepted: 09/14/2018] [Indexed: 11/30/2022]
Abstract
For many biological functions membrane proteins (MPs) are considered crucial. Due to this nature of MPs, many pharmaceutical agents have reflected them as attractive targets. It bears indispensable importance that MPs are predicted with accurate measures using effective and efficient computational models (CMs). Annotation of MPs using in vitro analytical techniques is time-consuming and expensive; and in some cases, it can prove to be intractable. Due to this scenario, automated prediction and annotation of MPs through CM based techniques have appeared to be useful. Based on the use of computational intelligence and statistical moments based feature set, an MP prediction framework is proposed. Furthermore, the previously used dataset has been enhanced by incorporating new MPs from the latest release of UniProtKB. Rigorous experimentation proves that the use of statistical moments with a multilayer neural network, trained using back-propagation based prediction techniques allows more thorough results.
Collapse
Affiliation(s)
- Ahmad Hassan Butt
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, C-II, Johar Town, P.O. Box 10033, Lahore, 54770, Pakistan.
| | - Nouman Rasool
- Department of Life Sciences, School of Science, University of Management and Technology, C-II, Johar Town, P.O. Box 10033, Lahore, 54770, Pakistan
| | - Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, C-II, Johar Town, P.O. Box 10033, Lahore, 54770, Pakistan
| |
Collapse
|
75
|
Chen W, Ding H, Zhou X, Lin H, Chou KC. iRNA(m6A)-PseDNC: Identifying N 6-methyladenosine sites using pseudo dinucleotide composition. Anal Biochem 2018; 561-562:59-65. [PMID: 30201554 DOI: 10.1016/j.ab.2018.09.002] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Revised: 08/31/2018] [Accepted: 09/03/2018] [Indexed: 01/28/2023]
Abstract
As a prevalent post-transcriptional modification, N6-methyladenosine (m6A) plays key roles in a series of biological processes. Although experimental technologies have been developed and applied to identify m6A sites, they are still cost-ineffective for transcriptome-wide detections of m6A. As good complements to the experimental techniques, some computational methods have been proposed to identify m6A sites. However, their performance remains unsatisfactory. In this study, we firstly proposed an Euclidean distance based method to construct a high quality benchmark dataset. By encoding the RNA sequences using pseudo nucleotide composition, a new predictor called iRNA(m6A)-PseDNC was developed to identify m6A sites in the Saccharomyces cerevisiae genome. It has been demonstrated by the 10-fold cross validation test that the performance of iRNA(m6A)-PseDNC is superior to the existing methods. Meanwhile, for the convenience of most experimental scientists, established at the site http://lin-group.cn/server/iRNA(m6A)-PseDNC.php is its web-server, by which users can easily get their desired results without need to go through the detailed mathematics. It is anticipated that iRNA(m6A)-PseDNC will become a useful high throughput tool for identifying m6A sites in the S. cerevisiae genome.
Collapse
Affiliation(s)
- Wei Chen
- School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China; Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611730, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Xu Zhou
- School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China.
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
| | - Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
| |
Collapse
|
76
|
Rahman MS, Aktar U, Jani MR, Shatabda S. iPro70-FMWin: identifying Sigma70 promoters using multiple windowing and minimal features. Mol Genet Genomics 2018; 294:69-84. [PMID: 30187132 DOI: 10.1007/s00438-018-1487-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2018] [Accepted: 08/29/2018] [Indexed: 01/16/2023]
Abstract
In bacterial DNA, there are specific sequences of nucleotides called promoters that can bind to the RNA polymerase. Sigma70 ([Formula: see text]) is one of the most important promoter sequences due to its presence in most of the DNA regulatory functions. In this paper, we identify the most effective and optimal sequence-based features for prediction of [Formula: see text] promoter sequences in a bacterial genome. We used both short-range and long-range DNA sequences in our proposed method. A very small number of effective features are selected from a large number of the extracted features using multi-window of different sizes within the DNA sequences. We call our prediction method iPro70-FMWin and made it freely accessible online via a web application established at http://ipro70.pythonanywhere.com/server for the sake of convenience of the researchers. We have tested our method using a standard benchmark dataset. In the experiments, iPro70-FMWin has achieved an area under the curve of the receiver operating characteristic and accuracy of 0.959 and 90.57%, respectively, which significantly outperforms the state-of-the-art predictors.
Collapse
Affiliation(s)
- Md Siddiqur Rahman
- Department of Computer Science and Engineering, United International University, Madani Avenue, Satarkul, Badda, Dhaka, 1212, Bangladesh
| | - Usma Aktar
- Department of Computer Science and Engineering, United International University, Madani Avenue, Satarkul, Badda, Dhaka, 1212, Bangladesh
| | - Md Rafsan Jani
- Department of Computer Science and Engineering, United International University, Madani Avenue, Satarkul, Badda, Dhaka, 1212, Bangladesh
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Madani Avenue, Satarkul, Badda, Dhaka, 1212, Bangladesh.
| |
Collapse
|
77
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Jia JH, Chou KC. iKcr-PseEns: Identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier. Genomics 2018; 110:239-246. [DOI: 10.1016/j.ygeno.2017.10.008] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2017] [Revised: 10/23/2017] [Accepted: 10/25/2017] [Indexed: 01/23/2023]
|
78
|
Kumar R, Kumari B, Kumar M. Proteome-wide prediction and annotation of mitochondrial and sub-mitochondrial proteins by incorporating domain information. Mitochondrion 2018; 42:11-22. [DOI: 10.1016/j.mito.2017.10.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2017] [Revised: 07/21/2017] [Accepted: 10/06/2017] [Indexed: 12/22/2022]
|
79
|
Tan JX, Dao FY, Lv H, Feng PM, Ding H. Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods. Molecules 2018; 23:molecules23082000. [PMID: 30103458 PMCID: PMC6222849 DOI: 10.3390/molecules23082000] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2018] [Revised: 07/30/2018] [Accepted: 08/08/2018] [Indexed: 12/31/2022] Open
Abstract
Accurate identification of phage virion protein is not only a key step for understanding the function of the phage virion protein but also helpful for further understanding the lysis mechanism of the bacterial cell. Since traditional experimental methods are time-consuming and costly for identifying phage virion proteins, it is extremely urgent to apply machine learning methods to accurately and efficiently identify phage virion proteins. In this work, a support vector machine (SVM) based method was proposed by mixing multiple sets of optimal g-gap dipeptide compositions. The analysis of variance (ANOVA) and the minimal-redundancy-maximal-relevance (mRMR) with an increment feature selection (IFS) were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful method for scientists concerning phage virion proteins.
Collapse
Affiliation(s)
- Jiu-Xin Tan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Peng-Mian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China.
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
Collapse
|
80
|
He W, Jia C, Zou Q. 4mCPred: machine learning methods for DNA N4-methylcytosine sites prediction. Bioinformatics 2018; 35:593-601. [DOI: 10.1093/bioinformatics/bty668] [Citation(s) in RCA: 103] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2018] [Revised: 07/07/2018] [Accepted: 07/24/2018] [Indexed: 11/14/2022] Open
Affiliation(s)
- Wenying He
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| | - Cangzhi Jia
- Department of Mathematics, Dalian Maritime University, Dalian, China
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| |
Collapse
|
81
|
Al Maruf MA, Shatabda S. iRSpot-SF: Prediction of recombination hotspots by incorporating sequence based features into Chou's Pseudo components. Genomics 2018; 111:966-972. [PMID: 29935224 DOI: 10.1016/j.ygeno.2018.06.003] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2018] [Revised: 06/09/2018] [Accepted: 06/13/2018] [Indexed: 11/28/2022]
Abstract
Recombination hotspots in a genome are unevenly distributed. Hotspots are regions in a genome that show higher rates of meiotic recombinations. Computational methods for recombination hotspot prediction often use sophisticated features that are derived from physico-chemical or structure based properties of nucleotides. In this paper, we propose iRSpot-SF that uses sequence based features which are computationally cheap to generate. Four feature groups are used in our method: k-mer composition, gapped k-mer composition, TF-IDF of k-mers and reverse complement k-mer composition. We have used recursive feature elimination to select 17 top features for hotspot prediction. Our analysis shows the superiority of gapped k-mer composition and reverse complement k-mer composition features over others. We have used SVM with RBF kernel as a classification algorithm. We have tested our algorithm on standard benchmark datasets. Compared to other methods iRSpot-SF is able to produce significantly better results in terms of accuracy, Mathew's Correlation Coefficient and sensitivity which are 84.58%, 0.6941 and 84.57%. We have made our method readily available to use as a python based tool and made the datasets and source codes available at: https://github.com/abdlmaruf/iRSpot-SF. An web application is developed based on iRSpot-SF and freely available to use at: http://irspot.pythonanywhere.com/server.html.
Collapse
Affiliation(s)
- Md Abdullah Al Maruf
- Department of Computer Science and Engineering, United International University, Madani Aveneue, Satarkul, Badda, Dhaka 1212, Bangladesh
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Madani Aveneue, Satarkul, Badda, Dhaka 1212, Bangladesh.
| |
Collapse
|
82
|
Lim LWK, Chung HH, Chong YL, Lee NK. A survey of recently emerged genome-wide computational enhancer predictor tools. Comput Biol Chem 2018; 74:132-141. [DOI: 10.1016/j.compbiolchem.2018.03.019] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2017] [Revised: 03/13/2018] [Accepted: 03/13/2018] [Indexed: 12/19/2022]
|
83
|
Zhang M, Xu Y, Li L, Liu Z, Yang X, Yu DJ. Accurate RNA 5-methylcytosine site prediction based on heuristic physical-chemical properties reduction and classifier ensemble. Anal Biochem 2018; 550:41-48. [DOI: 10.1016/j.ab.2018.03.027] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2017] [Revised: 03/27/2018] [Accepted: 03/28/2018] [Indexed: 11/25/2022]
|
84
|
Khan YD, Rasool N, Hussain W, Khan SA, Chou KC. iPhosT-PseAAC: Identify phosphothreonine sites by incorporating sequence statistical moments into PseAAC. Anal Biochem 2018; 550:109-116. [DOI: 10.1016/j.ab.2018.04.021] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2018] [Revised: 04/19/2018] [Accepted: 04/21/2018] [Indexed: 01/29/2023]
|
85
|
Tang H, Zhao YW, Zou P, Zhang CM, Chen R, Huang P, Lin H. HBPred: a tool to identify growth hormone-binding proteins. Int J Biol Sci 2018; 14:957-964. [PMID: 29989085 PMCID: PMC6036759 DOI: 10.7150/ijbs.24174] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2017] [Accepted: 01/15/2018] [Indexed: 12/19/2022] Open
Abstract
Hormone-binding protein (HBP) is a kind of soluble carrier protein and can selectively and non-covalently interact with hormone. HBP plays an important role in life growth, but its function is still unclear. Correct recognition of HBPs is the first step to further study their function and understand their biological process. However, it is difficult to correctly recognize HBPs from more and more proteins through traditional biochemical experiments because of high experimental cost and long experimental period. To overcome these disadvantages, we designed a computational method for identifying HBPs accurately in the study. At first, we collected HBP data from UniProt to establish a high-quality benchmark dataset. Based on the dataset, the dipeptide composition was extracted from HBP residue sequences. In order to find out the optimal features to provide key clues for HBP identification, the analysis of various (ANOVA) was performed for feature ranking. The optimal features were selected through the incremental feature selection strategy. Subsequently, the features were inputted into support vector machine (SVM) for prediction model construction. Jackknife cross-validation results showed that 88.6% HBPs and 81.3% non-HBPs were correctly recognized, suggesting that our proposed model was powerful. This study provides a new strategy to identify HBPs. Moreover, based on the proposed model, we established a webserver called HBPred, which could be freely accessed at http://lin-group.cn/server/HBPred.
Collapse
Affiliation(s)
- Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Ya-Wei Zhao
- Key Laboratory for NeuroInformation of Ministry of Education, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Ping Zou
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Chun-Mei Zhang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Rong Chen
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Po Huang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
Collapse
|
86
|
DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC. J Theor Biol 2018; 452:22-34. [PMID: 29753757 DOI: 10.1016/j.jtbi.2018.05.006] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2018] [Revised: 04/21/2018] [Accepted: 05/04/2018] [Indexed: 11/21/2022]
Abstract
A DNA-binding protein (DNA-BP) is a protein that can bind and interact with a DNA. Identification of DNA-BPs using experimental methods is expensive as well as time consuming. As such, fast and accurate computational methods are sought for predicting whether a protein can bind with a DNA or not. In this paper, we focus on building a new computational model to identify DNA-BPs in an efficient and accurate way. Our model extracts meaningful information directly from the protein sequences, without any dependence on functional domain or structural information. After feature extraction, we have employed Random Forest (RF) model to rank the features. Afterwards, we have used Recursive Feature Elimination (RFE) method to extract an optimal set of features and trained a prediction model using Support Vector Machine (SVM) with linear kernel. Our proposed method, named as DNA-binding Protein Prediction model using Chou's general PseAAC (DPP-PseAAC), demonstrates superior performance compared to the state-of-the-art predictors on standard benchmark dataset. DPP-PseAAC achieves accuracy values of 93.21%, 95.91% and 77.42% for 10-fold cross-validation test, jackknife test and independent test respectively. The source code of DPP-PseAAC, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/DNABinding. A publicly accessible web interface has also been established at: http://77.68.43.135:8080/DPP-PseAAC/.
Collapse
|
87
|
Zhang LQ, Li QZ. Estimating the effects of transcription factors binding and histone modifications on gene expression levels in human cells. Oncotarget 2018; 8:40090-40103. [PMID: 28454114 PMCID: PMC5522221 DOI: 10.18632/oncotarget.16988] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2016] [Accepted: 03/11/2017] [Indexed: 12/22/2022] Open
Abstract
Transcription factors and histone modifications are vital for the regulation of gene expression. Hence, to estimate the effects of transcription factors binding and histone modifications on gene expression, we construct a statistical model for the genome-wide 15 transcription factors binding data, 10 histone modifications profiles and DNase-I hypersensitivity data in three mammalian. Remarkably, our results show POLR2A and H3K36me3 can highly and consistently predict gene expression in three cell lines. And H3K4me3, H3K27me3 and H3K9ac are more reliable predictors than other histone modifications in human embryonic stem cells. Moreover, genome-wide statistical redundancies exist within and between transcription factors and histone modifications, and these phenomena may be caused by the regulation mechanism. In further study, we find that even though transcription factors and histone modifications offer similar effects on expression levels of genome-wide genes, the effects of transcription factors and histone modifications on predictive abilities are different for genes in independent biological processes.
Collapse
Affiliation(s)
- Lu-Qiang Zhang
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
| | - Qian-Zhong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
| |
Collapse
|
88
|
Zhang S, Zhuang W, Xu Z. Prediction of DNase I hypersensitive sites in plant genome using multiple modes of pseudo components. Anal Biochem 2018; 549:149-156. [DOI: 10.1016/j.ab.2018.03.025] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2018] [Revised: 03/23/2018] [Accepted: 03/27/2018] [Indexed: 12/25/2022]
|
89
|
Muthu Krishnan S. Using Chou's general PseAAC to analyze the evolutionary relationship of receptor associated proteins (RAP) with various folding patterns of protein domains. J Theor Biol 2018; 445:62-74. [DOI: 10.1016/j.jtbi.2018.02.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2017] [Revised: 01/24/2018] [Accepted: 02/12/2018] [Indexed: 01/31/2023]
|
90
|
Prediction of the aquatic toxicity of aromatic compounds to tetrahymena pyriformis through support vector regression. Oncotarget 2018; 8:49359-49369. [PMID: 28467816 PMCID: PMC5564774 DOI: 10.18632/oncotarget.17210] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2017] [Accepted: 03/30/2017] [Indexed: 01/24/2023] Open
Abstract
Toxicity evaluation is an extremely important process during drug development. It is usually initiated by experiments on animals, which is time-consuming and costly. To speed up such a process, a quantitative structure-activity relationship (QSAR) study was performed to develop a computational model for correlating the structures of 581 aromatic compounds with their aquatic toxicity to tetrahymena pyriformis. A set of 68 molecular descriptors derived solely from the structures of the aromatic compounds were calculated based on Gaussian 03, HyperChem 7.5, and TSAR V3.3. A comprehensive feature selection method, minimum Redundancy Maximum Relevance (mRMR)-genetic algorithm (GA)-support vector regression (SVR) method, was applied to select the best descriptor subset in QSAR analysis. The SVR method was employed to model the toxicity potency from a training set of 500 compounds. Five-fold cross-validation method was used to optimize the parameters of SVR model. The new SVR model was tested on an independent dataset of 81 compounds. Both high internal consistent and external predictive rates were obtained, indicating the SVR model is very promising to become an effective tool for fast detecting the toxicity.
Collapse
|
91
|
Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, Chou KC. iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform 2018. [DOI: 10.1093/bib/bby028 epub ahead of print].] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Yanan Wang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
| | - Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA and Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
Collapse
|
92
|
Liu B, Weng F, Huang DS, Chou KC. iRO-3wPseKNC: identify DNA replication origins by three-window-based PseKNC. Bioinformatics 2018; 34:3086-3093. [DOI: 10.1093/bioinformatics/bty312] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2018] [Accepted: 04/18/2018] [Indexed: 12/16/2022] Open
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Gordon Life Science Institute, Belmont, MA, USA
| | - Fan Weng
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, MA, USA
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
93
|
Jia C, Yang Q, Zou Q. NucPosPred: Predicting species-specific genomic nucleosome positioning via four different modes of general PseKNC. J Theor Biol 2018; 450:15-21. [PMID: 29678692 DOI: 10.1016/j.jtbi.2018.04.025] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2018] [Revised: 04/13/2018] [Accepted: 04/16/2018] [Indexed: 11/20/2022]
Abstract
The nucleosome is the basic structure of chromatin in eukaryotic cells, with essential roles in the regulation of many biological processes, such as DNA transcription, replication and repair, and RNA splicing. Because of the importance of nucleosomes, the factors that determine their positioning within genomes should be investigated. High-resolution nucleosome-positioning maps are now available for organisms including Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans, enabling the identification of nucleosome positioning by application of computational tools. Here, we describe a novel predictor called NucPosPred, which was specifically designed for large-scale identification of nucleosome positioning in C. elegans and D. melanogaster genomes. NucPosPred was separately optimized for each species for four types of DNA sequence feature extraction, with consideration of two classification algorithms (gradient-boosting decision tree and support vector machine). The overall accuracy obtained with NucPosPred was 92.29% for C. elegans and 88.26% for D. melanogaster, outperforming previous methods and demonstrating the potential for species-specific prediction of nucleosome positioning. For the convenience of most experimental scientists, a web-server for the predictor NucPosPred is available at http://121.42.167.206/NucPosPred/index.jsp.
Collapse
Affiliation(s)
- Cangzhi Jia
- Science of College, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.
| | - Qing Yang
- Science of College, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China.
| |
Collapse
|
94
|
iRNAm5C-PseDNC: identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition. Oncotarget 2018; 8:41178-41188. [PMID: 28476023 PMCID: PMC5522291 DOI: 10.18632/oncotarget.17104] [Citation(s) in RCA: 146] [Impact Index Per Article: 24.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2017] [Accepted: 03/15/2017] [Indexed: 01/24/2023] Open
Abstract
Occurring at cytosine (C) of RNA, 5-methylcytosine (m5C) is an important post-transcriptional modification (PTCM). The modification plays significant roles in biological processes by regulating RNA metabolism in both eukaryotes and prokaryotes. It may also, however, cause cancers and other major diseases. Given an uncharacterized RNA sequence that contains many C residues, can we identify which one of them can be of m5C modification, and which one cannot? It is no doubt a crucial problem, particularly with the explosive growth of RNA sequences in the postgenomic age. Unfortunately, so far no user-friendly web-server whatsoever has been developed to address such a problem. To meet the increasingly high demand from most experimental scientists working in the area of drug development, we have developed a new predictor called iRNAm5C-PseDNC by incorporating ten types of physical-chemical properties into pseudo dinucleotide composition via the auto/cross-covariance approach. Rigorous jackknife tests show that its anticipated accuracy is quite high. For most experimental scientists’ convenience, a user-friendly web-server for the predictor has been provided at http://www.jci-bioinfo.cn/iRNAm5C-PseDNC along with a step-by-step user guide, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the approach presented here can also be used to deal with many other problems in genome analysis.
Collapse
|
95
|
iMem-2LSAAC: A two-level model for discrimination of membrane proteins and their types by extending the notion of SAAC into chou's pseudo amino acid composition. J Theor Biol 2018; 442:11-21. [DOI: 10.1016/j.jtbi.2018.01.008] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Revised: 12/23/2017] [Accepted: 01/10/2018] [Indexed: 02/08/2023]
|
96
|
Chen W, Feng P, Yang H, Ding H, Lin H, Chou KC. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget 2018; 8:4208-4217. [PMID: 27926534 PMCID: PMC5354824 DOI: 10.18632/oncotarget.13758] [Citation(s) in RCA: 199] [Impact Index Per Article: 33.2] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Accepted: 11/23/2016] [Indexed: 01/14/2023] Open
Abstract
Catalyzed by adenosine deaminase (ADAR), the adenosine to inosine (A-to-I) editing in RNA is not only involved in various important biological processes, but also closely associated with a series of major diseases. Therefore, knowledge about the A-to-I editing sites in RNA is crucially important for both basic research and drug development. Given an uncharacterized RNA sequence that contains many adenosine (A) residues, can we identify which one of them can be of A-to-I editing, and which one cannot? Unfortunately, so far no computational method whatsoever has been developed to address such an important problem based on the RNA sequence information alone. To fill this empty area, we have proposed a predictor called iRNA-AI by incorporating the chemical properties of nucleotides and their sliding occurrence density distribution along a RNA sequence into the general form of pseudo nucleotide composition (PseKNC). It has been shown by the rigorous jackknife test and independent dataset test that the performance of the proposed predictor is quite promising. For the convenience of most experimental scientists, a user-friendly web-server for iRNA-AI has been established at http://lin.uestc.edu.cn/server/iRNA-AI/, by which users can easily get their desired results without the need to go through the mathematical details.
Collapse
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, Tangshan, China.,Gordon Life Science Institute, Belmont, Massachusetts, United States of America
| | - Pengmian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,Gordon Life Science Institute, Belmont, Massachusetts, United States of America
| | - Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,Gordon Life Science Institute, Belmont, Massachusetts, United States of America
| |
Collapse
|
97
|
A Novel Modeling in Mathematical Biology for Classification of Signal Peptides. Sci Rep 2018; 8:1039. [PMID: 29348418 PMCID: PMC5773712 DOI: 10.1038/s41598-018-19491-y] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2017] [Accepted: 01/02/2018] [Indexed: 11/17/2022] Open
Abstract
The molecular structure of macromolecules in living cells is ambiguous unless we classify them in a scientific manner. Signal peptides are of vital importance in determining the behavior of newly formed proteins towards their destined path in cellular and extracellular location in both eukaryotes and prokaryotes. In the present research work, a novel method is offered to foreknow the behavior of signal peptides and determine their cleavage site. The proposed model employs neural networks using isolated sets of prokaryote and eukaryote primary sequences. Protein sequences are classified as secretory or non-secretory in order to investigate secretory proteins and their signal peptides. In comparison with the previous prediction tools, the proposed algorithm is more rigorous, well-organized, significantly appropriate and highly accurate for the examination of signal peptides even in extensive collection of protein sequences.
Collapse
|
98
|
Borowska M, Brzozowska E, Kuć P, Oczeretko E, Mosdorf R, Laudański P. Identification of preterm birth based on RQA analysis of electrohysterograms. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 153:227-236. [PMID: 29157455 DOI: 10.1016/j.cmpb.2017.10.018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/08/2017] [Revised: 10/10/2017] [Accepted: 10/12/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE Common methods for data analysis are mainly based on linear concepts, but in recent years nonlinear dynamics methods have been introduced. It is a well-known fact that In typical biological systems lack of stationarity and rather sudden changes of state are the properties distinguishing them from each other. There is an urgent need to better understand the mechanical activity of the myometrium (its contractility) to find a solution for preterm delivery problem, the largest cause of neonatal deaths and morbidity. The electrohysterographic signal (EHG) is a good non-linear, bioelectrical indicator for the detection and identification of term and preterm birth. METHODS The material of the study consists of EHG signals, obtained from 20 patients between the 24th and the 28th week of pregnancy with threatened preterm labor. The women were divided into two groups: those delivering after more than 7 days - group A (n = 10) and women delivering within 7 days - group B (n = 10). In this paper, an analysis of bioelectrical signals was performed by recurrence quantification analysis (RQA) and principal component analysis (PCA) to distinguish particular patterns for term and preterm birth. To date, these methods have not been used for the evaluation of bioelectrical activity in the uterus. To train novel classifiers for the EHG signals Support Vectors Machine classifications (multiclass SVM) was used. Statistical analysis was performed by means of non-parametric Mann-Whitney test. RESULTS From among eleven parameters obtained from recurrence quantification analysis, five most appropriate were chosen: Recurrence Rate, Determinism, Laminarity, Entropy and Recurrence Period Density Entropy. Significant increase (p < .001) of Recurrence Rate was found in patients from group B, while increase of parameters, besides Laminarity, was found in patients from group A. The accuracy of classification obtained as a result of the analysis increased to 83,32%. CONCLUSION We showed that the respectively selected recurrence quantificators obtained for that time series could be used to classify all those signals to the appropriate group. The proposed analysis could help in detecting preterm labor based on the EHG signal dynamics.
Collapse
Affiliation(s)
- Marta Borowska
- Faculty of Mechanical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Białystok, Poland.
| | - Ewelina Brzozowska
- Faculty of Mechanical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Białystok, Poland
| | - Paweł Kuć
- Department of Perinatology, Medical University of Bialystok, M. Skłodowskiej-Curie 24A, 15-276 Białystok, Poland
| | - Edward Oczeretko
- Faculty of Mechanical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Białystok, Poland
| | - Romuald Mosdorf
- Faculty of Mechanical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Białystok, Poland
| | - Piotr Laudański
- Department of Perinatology, Medical University of Bialystok, M. Skłodowskiej-Curie 24A, 15-276 Białystok, Poland
| |
Collapse
|
99
|
pLoc-mEuk: Predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC. Genomics 2018; 110:50-58. [DOI: 10.1016/j.ygeno.2017.08.005] [Citation(s) in RCA: 180] [Impact Index Per Article: 30.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2017] [Revised: 08/10/2017] [Accepted: 08/11/2017] [Indexed: 11/22/2022]
|
100
|
Rahman MS, Rahman MK, Kaykobad M, Rahman MS. isGPT: An optimized model to identify sub-Golgi protein types using SVM and Random Forest based feature selection. Artif Intell Med 2017; 84:90-100. [PMID: 29183738 DOI: 10.1016/j.artmed.2017.11.003] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2017] [Revised: 11/13/2017] [Accepted: 11/17/2017] [Indexed: 10/18/2022]
Abstract
The Golgi Apparatus (GA) is a key organelle for protein synthesis within the eukaryotic cell. The main task of GA is to modify and sort proteins for transport throughout the cell. Proteins permeate through the GA on the ER (Endoplasmic Reticulum) facing side (cis side) and depart on the other side (trans side). Based on this phenomenon, we get two types of GA proteins, namely, cis-Golgi protein and trans-Golgi protein. Any dysfunction of GA proteins can result in congenital glycosylation disorders and some other forms of difficulties that may lead to neurodegenerative and inherited diseases like diabetes, cancer and cystic fibrosis. So, the exact classification of GA proteins may contribute to drug development which will further help in medication. In this paper, we focus on building a new computational model that not only introduces easy ways to extract features from protein sequences but also optimizes classification of trans-Golgi and cis-Golgi proteins. After feature extraction, we have employed Random Forest (RF) model to rank the features based on the importance score obtained from it. After selecting the top ranked features, we have applied Support Vector Machine (SVM) to classify the sub-Golgi proteins. We have trained regression model as well as classification model and found the former to be superior. The model shows improved performance over all previous methods. As the benchmark dataset is significantly imbalanced, we have applied Synthetic Minority Over-sampling Technique (SMOTE) to the dataset to make it balanced and have conducted experiments on both versions. Our method, namely, identification of sub-Golgi Protein Types (isGPT), achieves accuracy values of 95.4%, 95.9% and 95.3% for 10-fold cross-validation test, jackknife test and independent test respectively. According to different performance metrics, isGPT performs better than state-of-the-art techniques. The source code of isGPT, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/isGPT.
Collapse
Affiliation(s)
- M Saifur Rahman
- Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh.
| | | | - M Kaykobad
- Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh.
| | - M Sohel Rahman
- Department of CSE, BUET, ECE Building, West Palasi, Dhaka 1205, Bangladesh.
| |
Collapse
|