1
Jin J, Feng J. iDHS-RGME: Identification of DNase I hypersensitive sites by integrating information on nucleotide composition and physicochemical properties. Biochem Biophys Res Commun 2024; 734:150618. [PMID: 39222575 DOI: 10.1016/j.bbrc.2024.150618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Revised: 08/19/2024] [Accepted: 08/28/2024] [Indexed: 09/04/2024]
Abstract
As pivotal markers of chromatin accessibility, DNase I hypersensitive sites (DHSs) intimately link to fundamental biological processes encompassing gene expression regulation and disease pathogenesis. Developing efficient and precise algorithms for DHSs identification holds paramount importance for unraveling genome functionality and elucidating disease mechanisms. This study innovatively presents iDHS-RGME, an Extremely Randomized Trees (Extra-Trees)-based algorithm that integrates unique feature extraction techniques for enhanced DHSs prediction. Specifically, iDHS-RGME utilizes two feature extraction approaches: Reverse Complementary Kmer (RCKmer) and Geary Spatial Autocorrelation (GSA), which comprehensively capture sequence attributes from diverse angles, bolstering information richness and accuracy. To address data imbalance, Borderline-SMOTE is employed, followed by Maximum Information Coefficient (MIC) for meticulous feature selection. Comparative evaluations underscored the superiority of the Extra-Trees classifier, which was subsequently adopted for model prediction. Through rigorous five-fold cross-validation, iDHS-RGME achieved remarkable accuracies of 94.71 % and 95.07 % on two independent datasets, outperforming previous models in terms of both precision and effectiveness.
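The RCKmer encoding named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default `k`, and the use of the lexicographically smaller strand as the shared key are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Collapse a k-mer and its reverse complement onto one key.

    Assumption: the lexicographically smaller of the two is used as the key.
    """
    return min(kmer, reverse_complement(kmer))


def rckmer_features(seq: str, k: int = 2) -> dict:
    """Frequency of each reverse-complement-collapsed k-mer in seq."""
    seq = seq.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with ambiguous bases
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


features = rckmer_features("ACGT", k=2)
```

Collapsing strands shrinks the feature space (10 keys instead of 16 for k = 2), which is the usual motivation for this family of encodings.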
Affiliation(s)
- Jian Jin
  - School of Science, Minzu University of China, Beijing, 100081, China
- Jie Feng
  - School of Science, Minzu University of China, Beijing, 100081, China
2
Zawisza-Álvarez M, Peñuela-Melero J, Vegas E, Reverter F, Garcia-Fernàndez J, Herrera-Úbeda C. Exploring functional conservation in silico: a new machine learning approach to RNA-editing. Brief Bioinform 2024; 25:bbae332. [PMID: 38980372 PMCID: PMC11232462 DOI: 10.1093/bib/bbae332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 06/09/2024] [Accepted: 06/25/2024] [Indexed: 07/10/2024] Open
Abstract
Around 50 years ago, molecular biology opened the path to understand changes in forms, adaptations, complexity, or the basis of human diseases through myriads of reports on gene birth, gene duplication, gene expression regulation, and splicing regulation, among other relevant mechanisms behind gene function. Here, with the advent of big data and artificial intelligence (AI), we focus on an elusive and intriguing mechanism of gene function regulation, RNA editing, in which a single nucleotide from an RNA molecule is changed, with a remarkable impact in the increase of the complexity of the transcriptome and proteome. We present a new generation approach to assess the functional conservation of the RNA-editing targeting mechanism using two AI learning algorithms, random forest (RF) and bidirectional long short-term memory (biLSTM) neural networks with an attention layer. These algorithms, combined with RNA-editing data coming from databases and variant calling from same-individual RNA and DNA-seq experiments from different species, allowed us to predict RNA-editing events using both primary sequence and secondary structure. Then, we devised a method for assessing conservation or divergence in the molecular mechanisms of editing completely in silico: the cross-testing analysis. This novel method not only helps to understand the conservation of the editing mechanism through evolution but could set the basis for achieving a better understanding of the adenosine-targeting mechanism in other fields.
Affiliation(s)
- Michał Zawisza-Álvarez
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
  - Institut de Biomedicina (IBUB), Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
- Jesús Peñuela-Melero
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
- Esteban Vegas
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Fragilidad y Envejecimiento Saludable (CIBERFES), Instituto de Salud Carlos III, Calle Sinesio Delgado 4, 28029 Madrid, Spain
- Ferran Reverter
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
- Jordi Garcia-Fernàndez
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
  - Institut de Biomedicina (IBUB), Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
- Carlos Herrera-Úbeda
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
  - Institut de Biomedicina (IBUB), Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain
3
Chen R, Li F, Guo X, Bi Y, Li C, Pan S, Coin LJM, Song J. ATTIC is an integrated approach for predicting A-to-I RNA editing sites in three species. Brief Bioinform 2023; 24:bbad170. [PMID: 37150785 PMCID: PMC10565902 DOI: 10.1093/bib/bbad170] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/31/2023] [Revised: 04/12/2023] [Accepted: 04/14/2023] [Indexed: 05/09/2023] Open
Abstract
A-to-I editing is the most prevalent RNA editing event, which refers to the change of adenosine (A) bases to inosine (I) bases in double-stranded RNAs. Several studies have revealed that A-to-I editing can regulate cellular processes and is associated with various human diseases. Therefore, accurate identification of A-to-I editing sites is crucial for understanding RNA-level (i.e. transcriptional) modifications and their potential roles in molecular functions. To date, various computational approaches for A-to-I editing site identification have been developed; however, their performance is still unsatisfactory and needs further improvement. In this study, we developed a novel stacked-ensemble learning model, ATTIC (A-To-I ediTing predICtor), to accurately identify A-to-I editing sites across three species, including Homo sapiens, Mus musculus and Drosophila melanogaster. We first comprehensively evaluated 37 RNA sequence-derived features combined with 14 popular machine learning algorithms. Then, we selected the optimal base models to build a series of stacked ensemble models. The final ATTIC framework was developed based on the optimal models improved by the feature selection strategy for specific species. Extensive cross-validation and independent tests illustrate that ATTIC outperforms state-of-the-art tools for predicting A-to-I editing sites. We also developed a web server for ATTIC, which is publicly available at http://web.unimelb-bioinfortools.cloud.edu.au/ATTIC/. We anticipate that ATTIC can be utilized as a useful tool to accelerate the identification of A-to-I RNA editing events and help characterize their roles in post-transcriptional regulation.
Affiliation(s)
- Ruyi Chen
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Yue Bi
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Chen Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Shirui Pan
  - School of Information and Communication Technology, Griffith University, QLD 4222, Australia
- Lachlan J M Coin
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
4
RNA modifications in aging-associated cardiovascular diseases. Aging (Albany NY) 2022; 14:8110-8136. [PMID: 36178367 PMCID: PMC9596201 DOI: 10.18632/aging.204311] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/07/2022] [Accepted: 09/17/2022] [Indexed: 11/25/2022]
Abstract
Cardiovascular disease (CVD) is a leading cause of morbidity and mortality worldwide that bears an enormous healthcare burden, and aging is a major contributing factor to CVDs. The functional gene expression network during aging is regulated by mRNAs transcriptionally and by non-coding RNAs epi-transcriptionally. RNA modifications alter the stability and function of both mRNAs and non-coding RNAs and are involved in differentiation, development, and diseases. Here we review major chemical RNA modifications on mRNAs and non-coding RNAs, including N6-adenosine methylation, N1-adenosine methylation, 5-methylcytidine, pseudouridylation, 2′-O-ribose-methylation, and N7-methylguanosine, in the aging process with an emphasis on cardiovascular aging. We also summarize the currently available methods to detect RNA modifications and the bioinformatic tools to study RNA modifications. More importantly, we discuss the specific implications of RNA modifications on mRNAs and non-coding RNAs in the pathogenesis of aging-associated CVDs, including atherosclerosis, hypertension, coronary heart diseases, congestive heart failure, atrial fibrillation, peripheral artery disease, venous insufficiency, and stroke.
5
Qin S, Fan Y, Hu S, Wang Y, Wang Z, Cao Y, Liu Q, Tan S, Dai Z, Zhou W. iPReditor-CMG: Improving a predictive RNA editor for crop mitochondrial genomes using genomic sequence features and an optimal support vector machine. Phytochemistry 2022; 200:113222. [PMID: 35561852 DOI: 10.1016/j.phytochem.2022.113222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 04/29/2022] [Accepted: 04/30/2022] [Indexed: 06/15/2023]
Abstract
In crops, RNA editing is one of the most important post-transcriptional processes in which specific cytidines (C) in virtually all mitochondrial protein-coding genes are converted to uridines (U). Despite extensive recent research in RNA editing, exploring all of the C-to-U editing events efficiently on the genomic scale remains challenging. Developing accurate prediction methods for the detection of RNA editing sites would dramatically reduce experimental determination. Therefore, we propose a novel method, iPReditor-CMG (improved predictive RNA editor for crop mitochondrial genomes), to predict crop mitochondrial editing sites using genome sequence and an optimised support vector machine (SVM). We first selected three mitochondrial genomes with known RNA editing sites from Arabidopsis thaliana, Brassica napus and Oryza sativa, released by NCBI, as the training and test sets. The genes and their transcripts from self-sequenced tobacco mitochondrial ATPase were selected as the validation set. The iPReditor-CMG first coded the genome sequences as numerical vectors and then performed an efficient feature selection on the high-dimensional feature space, where the SVM was employed in feature selection and subsequent modelling. The average independent prediction accuracy of intraspecific editing sites across three species was 0.85, and up to 0.91 in A. thaliana, which outperformed the reference models. For the interspecific independent prediction, the prediction accuracy between dicotyledons was 0.78 and the accuracy between dicotyledons and monocotyledons was 0.56, which implies that there might be similarity in the C-to-U editing mechanism in close relatives. Finally, the best model was identified with an independent test accuracy of 0.91 and an AUC of 0.88, which suggested that five unreported feature sequences, i.e. TGACA, ACAAC, GTAGA, CCGTT and TAACA, are closely associated with the editing phenomenon.
Multiple tests supported that the iPReditor-CMG could be effectively applied to predict editing sites in crop mitochondria, which may further contribute to understanding the mechanisms of site editing and post-transcriptional events in crop mitochondria.
Affiliation(s)
- Sidong Qin
  - Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
- Yanjun Fan
  - Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China; Shanxi Province Jincheng City Landscaping Service Center, Shanxi, 048000, China
- Shengnan Hu
  - Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
- Yongqiang Wang
  - Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
- Ziqi Wang
  - Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
- Yixiang Cao
  - Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
- Qiyuan Liu
  - Key Laboratory of Crop Physiology, Ecology and Genetic Breeding, Ministry of Education, College of Agronomy, Jiangxi Agricultural University, Nanchang, 330045, China
- Siqiao Tan
  - College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China
- Zhijun Dai
  - Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
- Wei Zhou
  - Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China
6
An Effective Deep Learning-Based Architecture for Prediction of N7-Methylguanosine Sites in Health Systems. Electronics 2022. [DOI: 10.3390/electronics11121917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
N7-methylguanosine (m7G) is one of the most important epigenetic modifications found in rRNA, mRNA, and tRNA, and performs a promising role in gene expression regulation. Owing to its significance, well-equipped traditional laboratory-based techniques have been performed for the identification of N7-methylguanosine (m7G). Consequently, these approaches were found to be time-consuming and cost-ineffective. To move on from these traditional approaches to predict N7-methylguanosine sites with high precision, the concept of artificial intelligence has been adopted. In this study, an intelligent computational model called N7-methylguanosine-Long short-term memory (m7G-LSTM) is introduced for the prediction of N7-methylguanosine sites. One-hot encoding and word2vec feature schemes are used to express the biological sequences while the LSTM and CNN algorithms have been employed for classification. The proposed “m7G-LSTM” model obtained an accuracy value of 95.95%, a specificity value of 95.94%, a sensitivity value of 95.97%, and Matthew’s correlation coefficient (MCC) value of 0.919. The proposed predictive m7G-LSTM model has significantly achieved better outcomes than previous models in terms of all evaluation parameters. The proposed m7G-LSTM computational system aims to support the drug industry and help researchers in the fields of bioinformatics to enhance innovation for the prediction of the behavior of N7-methylguanosine sites.
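The reported metrics can be cross-checked against one another. The derivation below is a sketch that assumes a balanced benchmark (equal numbers $n$ of positive and negative samples); the abstract does not state the class ratio, so this is an assumption rather than a fact about the dataset. Here $s$ is sensitivity and $p$ is specificity.

```latex
\mathrm{MCC}
  = \frac{TP \cdot TN - FP \cdot FN}
         {\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}

% Balanced classes: TP = s n,\; FN = (1-s) n,\; TN = p n,\; FP = (1-p) n
\mathrm{MCC}
  = \frac{s p - (1-s)(1-p)}{\sqrt{(1 + s - p)(1 - s + p)}}
  = \frac{s + p - 1}{\sqrt{1 - (s - p)^2}}

% With s = 0.9597 and p = 0.9594:
\mathrm{MCC} \approx \frac{0.9191}{\sqrt{1 - 0.0003^2}} \approx 0.919,
\qquad
\mathrm{Acc} = \frac{s + p}{2} \approx 0.9596
```

Under that assumption the reported MCC of 0.919 and accuracy of 95.95% are consistent with the stated sensitivity and specificity, up to rounding.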
7
Wang H, Wang S, Zhang Y, Bi S, Zhu X. A brief review of machine learning methods for RNA methylation sites prediction. Methods 2022; 203:399-421. [DOI: 10.1016/j.ymeth.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2021] [Revised: 02/15/2022] [Accepted: 03/01/2022] [Indexed: 02/07/2023] Open
8
Qiao H, Zhang S, Xue T, Wang J, Wang B. iPro-GAN: A novel model based on generative adversarial learning for identifying promoters and their strength. Computer Methods and Programs in Biomedicine 2022; 215:106625. [PMID: 35038653 DOI: 10.1016/j.cmpb.2022.106625] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/11/2021] [Revised: 12/13/2021] [Accepted: 01/06/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND AND OBJECTIVE Promoter is a component of the gene, which can specifically bind with RNA polymerase and determine where transcription starts, and also determine the transcription efficiency of the gene. Promoters can be divided into strong promoters and weak promoters because their structures and the interaction time interval are quite different. The functional variation of the promoter can lead to a variety of diseases. Therefore, identifying promoters and their strength is necessary and has important biological significance. A novel and promising model based on deep learning is proposed to achieve it. METHODS In this work, we build a power model named iPro-GAN for identification of promoters and their strength. First, we collect benchmark datasets and independent datasets for training and testing. Then, Moran-based spatial auto-cross correlation method is used as feature extraction method. Finally, deep convolution generative adversarial network with 10-fold cross validation is applied for classifying. The first layer of the model is used to identify the promoter and the second layer is used to determine its type. RESULTS On the benchmark data set, the accuracy of the first layer predictor is 93.15%, and the accuracy of the second layer predictor is 92.30%. On the independent data set, the accuracy of the first layer predictor is 86.77%, and the accuracy of the second layer predictor is 91.66%. In particular, breakthrough progress has been made in the identification of promoters' strength. CONCLUSIONS These results are far higher than the existing best predictor, which indicate that our model is serviceable and practicable to identify promoters and their strength. Furthermore, the datasets and source codes are available from this link: https://github.com/Bovbene/iPro-GAN.
Affiliation(s)
- Huijuan Qiao
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Tian Xue
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Jinyue Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Bowei Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
9
Tahir M, Khan F, Hayat M, Alshehri MD. An effective machine learning-based model for the prediction of protein–protein interaction sites in health systems. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07024-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
10
Zhang Z, Wang L. Using Chou's 5-steps rule to identify N6-methyladenine sites by ensemble learning combined with multiple feature extraction methods. J Biomol Struct Dyn 2022; 40:796-806. [PMID: 32948102 DOI: 10.1080/07391102.2020.1821778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
N6-methyladenine (m6A), a type of modification mostly affecting the downstream biological functions and determining the levels of gene expression, is mediated by the methylation of adenine in nucleic acids. It is also a key factor influencing biological processes and has attracted attention as a target for treating diseases. Here, an ensemble predictor named TL-Methy was developed to identify m6A sites across the genome. TL-Methy is a 2-level machine learning method developed by combining the support vector machine model and multiple feature extraction methods, including nucleic acid composition, di-nucleotide composition, tri-nucleotide composition, position-specific trinucleotide propensity, Bi-profile Bayes, binary encoding, and accumulated nucleotide frequency. For Homo sapiens, the TL-Methy method reached an accuracy of 91.68% on the jackknife test and 92.23% on the 10-fold cross-validation test; for Mus musculus, it achieved 93.66% on the jackknife test and 97.07% on the 10-fold cross-validation test; for Saccharomyces cerevisiae, it obtained 81.57% on the jackknife test and 82.54% on the 10-fold cross-validation test; for the rice genome, it achieved 91.87% on the jackknife test and 93.04% on the 10-fold cross-validation test. The results from these two test approaches demonstrated the robustness and practicality of the TL-Methy model. The TL-Methy model may serve as a potential method for m6A site identification. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Zhongwang Zhang
  - College of Science, Dalian Maritime University, Dalian, P.R. China
- Lidong Wang
  - College of Science, Dalian Maritime University, Dalian, P.R. China
11
Genome-Wide Scanning of Potential Hotspots for Adenosine Methylation: A Potential Path to Neuronal Development. Life (Basel) 2021; 11:life11111185. [PMID: 34833061 PMCID: PMC8618456 DOI: 10.3390/life11111185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/26/2021] [Revised: 10/23/2021] [Accepted: 10/30/2021] [Indexed: 12/27/2022] Open
Abstract
Methylation of adenosines at N6 position (m6A) is the most frequent internal modification in mRNAs of the human genome and attributable to diverse roles in physiological development, and pathophysiological processes. However, studies on the role of m6A in neuronal development are sparse and not well-documented. The m6A detection remains challenging due to its inconsistent pattern and less sensitivity by the current detection techniques. Therefore, we applied a sliding window technique to identify the consensus site (5′-GGACT-3′) n ≥ 2 and annotated all m6A hotspots in the human genome. Over 6.78 × 10⁷ hotspots were identified and 96.4% were found to be located in the non-coding regions, suggesting that methylation occurs before splicing. Several genes, RPS6K, NRP1, NRXN, EGFR, YTHDF2, have been involved in various stages of neuron development and their functioning. However, the contribution of m6A in these genes needs further validation in the experimental model. Thus, the present study elaborates the location of m6A in the human genome and its function in neuron physiology.
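A scan for the consensus site described above might look like the sketch below. Two assumptions are made and should be read as such: the notation "n ≥ 2" is interpreted as two or more consecutive copies of GGACT, and a regular expression is substituted for the sliding-window procedure, whose details the abstract does not give. The function name and return format are hypothetical.

```python
import re

MOTIF = "GGACT"  # consensus site named in the abstract


def find_hotspots(seq: str, min_repeats: int = 2) -> list:
    """Return (start, end, copies) for runs of at least min_repeats tandem motifs.

    Coordinates are 0-based and end-exclusive. Only the given strand is
    scanned; handling of the reverse strand is left out of this sketch.
    """
    pattern = re.compile(f"(?:{MOTIF}){{{min_repeats},}}")
    hits = []
    for match in pattern.finditer(seq.upper()):
        copies = (match.end() - match.start()) // len(MOTIF)
        hits.append((match.start(), match.end(), copies))
    return hits


# A single GGACT at the end is ignored because it is not repeated.
example_hits = find_hotspots("AAGGACTGGACTTTGGACT")
```

For genome-scale input, the same function could be applied chromosome by chromosome, since the regex engine processes each sequence in a single linear pass.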
12
El Allali A, Elhamraoui Z, Daoud R. Machine learning applications in RNA modification sites prediction. Comput Struct Biotechnol J 2021; 19:5510-5524. [PMID: 34712397 PMCID: PMC8517552 DOI: 10.1016/j.csbj.2021.09.025] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/30/2021] [Revised: 09/24/2021] [Accepted: 09/25/2021] [Indexed: 12/15/2022] Open
Abstract
Ribonucleic acid (RNA) modifications are post-transcriptional chemical composition changes that have a fundamental role in regulating the main aspect of RNA function. Recently, large datasets have become available thanks to the recent development in deep sequencing and large-scale profiling. This availability of transcriptomic datasets has led to increased use of machine learning based approaches in epitranscriptomics, particularly in identifying RNA modifications. In this review, we comprehensively explore machine learning based approaches used for the prediction of 11 RNA modification types, namely, m1A, m6A, m5C, 5hmC, ψ, 2′-O-Me, ac4C, m7G, A-to-I, m2G, and D. This review covers the life cycle of machine learning methods to predict RNA modification sites including available benchmark datasets, feature extraction, and classification algorithms. We compare available methods in terms of datasets, target species, approach, and accuracy for each RNA modification type. Finally, we discuss the advantages and limitations of the reviewed approaches and suggest future perspectives.
Affiliation(s)
- A. El Allali
  - African Genome Center, University Mohamed VI Polytechnic, Morocco
- Zahra Elhamraoui
  - African Genome Center, University Mohamed VI Polytechnic, Morocco
- Rachid Daoud
  - African Genome Center, University Mohamed VI Polytechnic, Morocco
13
RDDSVM: accurate prediction of A-to-I RNA editing sites from sequence using support vector machines. Funct Integr Genomics 2021; 21:633-643. [PMID: 34529170 DOI: 10.1007/s10142-021-00805-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/21/2021] [Revised: 08/30/2021] [Accepted: 08/31/2021] [Indexed: 10/20/2022]
Abstract
Adenosine to inosine (A-to-I) editing in RNA is involved in various biological processes like gene expression, alternative splicing, and mRNA degradation associated with carcinogenesis and various human diseases. Therefore, accurate identification of RNA editing sites in the transcriptome is valuable for research and medicine. RNA-seq is very useful for the detection of RNA editing events in condition-specific cells. However, computational analysis methods of RNA-seq data have considerable false-positive risks due to mapping errors. In this study, we developed a simple machine learning method using support vector machines to train sequence and structure information derived from flanking sequences of experimentally verified A-to-I editing sites to predict new A-to-I editing sites in RNA. The highest performance results were obtained by the model that utilizes the composition of the triplet sequence elements in the flanking regions of the A-to-I editing sites. Using this model, the SVM classifier also showed high performance on experimentally verified data, providing a sensitivity of 92.8%, specificity of 77.1%, and accuracy of 90.2%. To compare the predictive capacity of our method with other classifiers that use sequence information, we have used human A-to-I RNA editing sites validated by Sanger sequencing. Out of 58 validated editing sites, our method recognized 53 of them correctly with an accuracy of 91.4%, outperforming other classifiers. To our knowledge, this is the first case of utilization of the composition of the triplet sequence elements neighboring A-to-I editing sites for the prediction of new A-to-I editing sites in RNA. The methodology is very easy to perform and computationally undemanding, making it a convenient and valuable choice for facilities with limited resources.
To facilitate the usage of the method publicly, we developed an open-source program called RDDSVM to perform prediction on candidate A-to-I RNA editing sites using support vector machines.
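As a quick arithmetic check of the Sanger-validated comparison quoted in the abstract (53 of 58 sites recognized):

```latex
\frac{53}{58} \approx 0.9138 \approx 91.4\%
```

Because all 58 sites in that set are validated editing sites, this figure is effectively the fraction of true sites recovered. That reading is an interpretation of the abstract, not a statement taken from the paper.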
14
Akmal MA, Hussain W, Rasool N, Khan YD, Khan SA, Chou KC. Using CHOU'S 5-Steps Rule to Predict O-Linked Serine Glycosylation Sites by Blending Position Relative Features and Statistical Moment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2045-2056. [PMID: 31985438 DOI: 10.1109/tcbb.2020.2968441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/10/2023]
Abstract
Glycosylation of proteins in eukaryote cells is an important and complicated post-translation modification due to its pivotal role and association with crucial physiological functions within most of the proteins. Identification of glycosylation sites in a polypeptide chain is not an easy task due to multiple impediments. Analytical identification of these sites is expensive and laborious. There is a dire need to develop a reliable computational method for precise determination of such sites which can help researchers to save time and effort. Herein, we propose a novel predictor namely iGlycoS-PseAAC by integrating the Chou's Pseudo Amino Acid Composition (PseAAC) and relative/absolute position-based features. The self-consistency results show that the accuracy revealed by the model using the benchmark dataset for prediction of O-linked glycosylation having serine sites is 98.8 percent. The overall accuracy of predictor achieved through 10-fold cross validation by combining the positive and negative results is 97.2 percent. The overall accuracy achieved through Jackknife test is 96.195 percent by aggregating of all the prediction results. Thus the proposed predictor can help in predicting the O-linked glycosylated serine sites in an efficient and accurate way. The overall results show that the accuracy of the iGlycoS-PseAAC is higher than the existing tools.
15
Feng P, Feng L, Tang C. Comparison and Analysis of Computational Methods for Identifying N6-Methyladenosine Sites in Saccharomyces cerevisiae. Curr Pharm Des 2021; 27:1219-1229. [PMID: 33167827 DOI: 10.2174/1381612826666201109110703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2020] [Accepted: 07/20/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND N6-methyladenosine (m6A) plays critical roles in a broad range of biological processes. Knowledge of the precise locations of m6A sites in the transcriptome is vital for deciphering its biological functions. Although experimental techniques have contributed substantially to identifying m6A, they remain labor intensive and time consuming. As a complement to experimental methods, a series of computational approaches have been proposed in the past few years to identify m6A sites. METHODS To help researchers select appropriate methods for identifying m6A sites, it is necessary to conduct a comprehensive review and comparison of existing methods. RESULTS Since research on m6A in Saccharomyces cerevisiae is relatively clear, in this review we summarized recent progress in the computational prediction of m6A sites in S. cerevisiae and assessed the performance of existing computational methods. Finally, future directions for computationally identifying m6A sites are presented. CONCLUSION Taken together, we anticipate that this review will serve as an important guide for computational analysis of m6A modifications.
Affiliation(s)
- Pengmian Feng
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
- Lijing Feng
  - School of Sciences, North China University of Science and Technology, Tangshan 063000, China
- Chaohui Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
16
Li G, Du X, Li X, Zou L, Zhang G, Wu Z. Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning. PeerJ 2021; 9:e11262. [PMID: 33986992 PMCID: PMC8101451 DOI: 10.7717/peerj.11262] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/31/2020] [Accepted: 03/22/2021] [Indexed: 12/12/2022] Open
Abstract
DNA-binding proteins (DBPs) play pivotal roles in many biological functions such as alternative splicing, RNA editing, and methylation. Many traditional machine learning (ML) methods and deep learning (DL) methods have been proposed to predict DBPs. However, these methods either rely on manual feature extraction or fail to capture long-term dependencies in the DNA sequence. In this paper, we propose a method, called PDBP-Fusion, to identify DBPs based on the fusion of local features and long-term dependencies only from primary sequences. We utilize convolutional neural network (CNN) to learn local features and use bi-directional long-short term memory network (Bi-LSTM) to capture critical long-term dependencies in context. Besides, we perform feature extraction, model training, and model prediction simultaneously. The PDBP-Fusion approach can predict DBPs with 86.45% sensitivity, 79.13% specificity, 82.81% accuracy, and 0.661 MCC on the PDB14189 benchmark dataset. The MCC of our proposed methods has been increased by at least 9.1% compared to other advanced prediction models. Moreover, the PDBP-Fusion also gets superior performance and model robustness on the PDB2272 independent dataset. It demonstrates that the PDBP-Fusion can be used to predict DBPs from sequences accurately and effectively; the online server is at http://119.45.144.26:8080/PDBP-Fusion/.
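The four figures quoted in the abstract above (sensitivity, specificity, accuracy, MCC) all derive from one binary confusion matrix. The following Python sketch shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper, and the code assumes both classes are present so that no ratio divides by zero.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes at least one positive and one negative sample (tp + fn > 0, tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)                  # true-positive rate
    specificity = tn / (tn + fp)                  # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when a marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=40, fn=10, tn=30, fp=20)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is commonly reported alongside accuracy when comparing predictors.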
Affiliation(s)
- Guobin Li
  - School of Artificial Intelligence and Big Data, Hefei University, Hefei, China
- Xiuquan Du
  - School of Computer Science and Technology, Anhui University, Hefei, China
- Xinlu Li
  - School of Artificial Intelligence and Big Data, Hefei University, Hefei, China
- Le Zou
  - School of Artificial Intelligence and Big Data, Hefei University, Hefei, China
- Guanhong Zhang
  - School of Artificial Intelligence and Big Data, Hefei University, Hefei, China
- Zhize Wu
  - School of Artificial Intelligence and Big Data, Hefei University, Hefei, China
17
Awais M, Hussain W, Khan YD, Rasool N, Khan SA, Chou KC. iPhosH-PseAAC: Identify Phosphohistidine Sites in Proteins by Blending Statistical Moments and Position Relative Features According to the Chou's 5-Step Rule and General Pseudo Amino Acid Composition. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:596-610. [PMID: 31144645 DOI: 10.1109/tcbb.2019.2919025] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Indexed: 06/09/2023]
Abstract
Protein phosphorylation is one of the key mechanisms in prokaryotes and eukaryotes and is responsible for various biological functions such as protein degradation, intracellular localization, a multitude of cellular processes, molecular association, cytoskeletal dynamics, and enzymatic inhibition/activation. Phosphohistidine (PhosH) has a key role in a number of biological processes, ranging from central metabolism to signalling in eukaryotes and bacteria. Thus, identification of phosphohistidine sites in a protein sequence is crucial, and experimental identification can be expensive, time-consuming, and laborious. To address this problem, we propose a novel computational model, iPhosH-PseAAC, for prediction of phosphohistidine sites in a given protein sequence using pseudo amino acid composition (PseAAC), statistical moments, and position relative features. The results of the proposed predictor are validated through self-consistency testing, 10-fold cross-validation, and jackknife testing. Self-consistency validation gave 100 percent accuracy, whereas cross-validation achieved 94.26 percent accuracy. Moreover, jackknife testing gave 97.07 percent accuracy for the proposed model. Thus, the proposed model iPhosH-PseAAC has a strong ability to predict PhosH sites in given proteins.
18
Wang H, Chen S, Wei J, Song G, Zhao Y. A-to-I RNA Editing in Cancer: From Evaluating the Editing Level to Exploring the Editing Effects. Front Oncol 2021; 10:632187. [PMID: 33643923 PMCID: PMC7905090 DOI: 10.3389/fonc.2020.632187] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/22/2020] [Accepted: 12/21/2020] [Indexed: 12/21/2022] Open
Abstract
As an important regulatory mechanism at the posttranscriptional level in metazoans, adenosine deaminase acting on RNA (ADAR)-induced A-to-I RNA editing modification of double-stranded RNA has been widely detected and reported. Editing may lead to non-synonymous amino acid mutations, RNA secondary structure alterations, pre-mRNA processing changes, and microRNA-mRNA redirection, thereby affecting multiple cellular processes and functions. In recent years, researchers have successfully developed several bioinformatics software tools and pipelines to identify RNA editing sites. However, there are still no widely accepted editing site standards due to the variety of parallel optimization and RNA high-seq protocols and programs. It is also challenging to identify RNA editing by normal protocols in tumor samples due to the high DNA mutation rate. Numerous RNA editing sites have been reported to be located in non-coding regions and can affect the biosynthesis of ncRNAs, including miRNAs and circular RNAs. Predicting the function of RNA editing sites located in non-coding regions and ncRNAs is significantly difficult. In this review, we aim to provide a better understanding of bioinformatics strategies for human cancer A-to-I RNA editing identification and briefly discuss recent advances in related areas, such as the oncogenic and tumor suppressive effects of RNA editing.
Affiliation(s)
- Heming Wang
  - Clinical Medical College, Changchun University of Chinese Medicine, Changchun, China
  - Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
  - Shanghai Institute of Liver Diseases, Shanghai, China
- Sinuo Chen
  - Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
  - Shanghai Institute of Liver Diseases, Shanghai, China
- Jiayi Wei
  - Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
  - Shanghai Institute of Liver Diseases, Shanghai, China
- Guangqi Song
  - Department of Gastroenterology and Hepatology, Zhongshan Hospital of Fudan University, Shanghai, China
  - Shanghai Institute of Liver Diseases, Shanghai, China
- Yicheng Zhao
  - Clinical Medical College, Changchun University of Chinese Medicine, Changchun, China
19
Khan YD, Alzahrani E, Alghamdi W, Ullah MZ. Sequence-based Identification of Allergen Proteins Developed by Integration of PseAAC and Statistical Moments via 5-Step Rule. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200424085947] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 01/24/2023]
Abstract
Background:
Allergens are antigens that can stimulate an atopic type I human hypersensitivity reaction by an immunoglobulin E (IgE) reaction. Some proteins are naturally more allergenic than others. The challenge for toxicologists is to identify properties that allow proteins to cause allergic sensitization and allergic diseases. The identification of allergen proteins is a very critical and pivotal task. The experimental identification of protein functions is a hectic, laborious and costly task; therefore, computer scientists have proposed various methods in the field of computational biology and bioinformatics using various data science approaches.
Objectives:
Herein, we report a novel predictor for the identification of allergen proteins.
Methods:
For feature extraction, statistical moments and various position-based features have been incorporated into Chou’s pseudo amino acid composition (PseAAC), and are used for training of a neural network.
Results:
The predictor is validated through 10-fold cross-validation and Jackknife testing, which gave accuracies of 99.43% and 99.87%, respectively.
Conclusions:
Thus, the proposed predictor can help in predicting allergen proteins in an efficient and accurate way and can provide baseline data for the discovery of new drugs and biomarkers.
Affiliation(s)
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, C II Johar Town, Lahore 54770, Pakistan
- Ebraheem Alzahrani
  - Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 80221, Jeddah, Saudi Arabia
- Malik Zaka Ullah
  - Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
20
Liu GH, Zhang BW, Qian G, Wang B, Mao B, Bichindaritz I. Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1966-1980. [PMID: 31107658 DOI: 10.1109/tcbb.2019.2917429] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/09/2023]
Abstract
Prediction of protein subcellular location has currently become a hot topic because it has been proven to be useful for understanding both the disease mechanisms and novel drug design. With the rapid development of automated microscopic imaging technology in recent years, classification methods of bioimage-based protein subcellular location have attracted considerable attention for images can describe the protein distribution intuitively and in detail. In the current study, a prediction method of protein subcellular location was proposed based on multi-view image features that are extracted from three different views, including the four texture features of the original image, the global and local features of the protein extracted from the protein channel images after color segmentation, and the global features of DNA extracted from the DNA channel image. Finally, the extracted features were combined together to improve the performance of subcellular localization prediction. From the performance comparison of different combination features under the same classifier, the best ensemble features could be obtained. In this work, a classifier based on Stacked Auto-encoders and the random forest was also put forward. To improve the prediction results, the deep network was combined with the traditional statistical classification methods. Stringent cross-validation and independent validation tests on the benchmark dataset demonstrated the efficacy of the proposed method.
21
Liu K, Chen W. iMRM: a platform for simultaneously identifying multiple kinds of RNA modifications. Bioinformatics 2020; 36:3336-3342. [PMID: 32134472 DOI: 10.1093/bioinformatics/btaa155] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Received: 01/08/2020] [Revised: 02/26/2020] [Accepted: 02/28/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION RNA modifications play critical roles in a series of cellular and developmental processes. Knowledge about the distributions of RNA modifications in the transcriptomes will provide clues to revealing their functions. Since experimental methods are time consuming and laborious for detecting RNA modifications, computational methods have been proposed for this aim in the past five years. However, there are some drawbacks for both experimental and computational methods in simultaneously identifying modifications occurred on different nucleotides. RESULTS To address such a challenge, in this article, we developed a new predictor called iMRM, which is able to simultaneously identify m6A, m5C, m1A, ψ and A-to-I modifications in Homo sapiens, Mus musculus and Saccharomyces cerevisiae. In iMRM, the feature selection technique was used to pick out the optimal features. The results from both 10-fold cross-validation and jackknife test demonstrated that the performance of iMRM is superior to existing methods for identifying RNA modifications. AVAILABILITY AND IMPLEMENTATION A user-friendly web server for iMRM was established at http://www.bioml.cn/XG_iRNA/home. The off-line command-line version is available at https://github.com/liukeweiaway/iMRM. CONTACT greatchen@ncst.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kewei Liu
  - School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China
- Wei Chen
  - School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
22
Amanat S, Ashraf A, Hussain W, Rasool N, Khan YD. Identification of Lysine Carboxylation Sites in Proteins by Integrating Statistical Moments and Position Relative Features via General PseAAC. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190723114923] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 01/13/2023]
Abstract
Background:
Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Among these three, the covalent attachment of the carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. For studying such biological functions, it is essential to correctly determine the lysine sites sensitive to carboxylation.
Objective:
Herein, we present a machine learning-based computational model for the prediction of carboxylysine sites.
Methods:
Various position and composition relative features have been incorporated into the PseAAC for construction of feature vectors, and a neural network is employed as a classifier. The model is validated by jackknife, cross-validation, self-consistency, and independent testing.
Results:
The self-consistency test showed that the model has 99.76% Acc, 99.76% Sp, 99.76% Sp, and 0.99 MCC. Using the jackknife method, prediction model validation gave 97.07% Acc, while 10-fold cross-validation gave 95.16% Acc.
Conclusion:
The results of independent dataset testing were 94.3%, which illustrated that the proposed model performs better than the existing model PreLysCar; however, the accuracy can be improved further in the future, due to the increasing number of carboxylysine sites in proteins.
Affiliation(s)
- Saba Amanat
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Adeel Ashraf
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Waqar Hussain
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Nouman Rasool
  - Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
- Yaser D. Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
23
DeepPred-SubMito: A Novel Submitochondrial Localization Predictor Based on Multi-Channel Convolutional Neural Network and Dataset Balancing Treatment. Int J Mol Sci 2020; 21:ijms21165710. [PMID: 32784927 PMCID: PMC7460811 DOI: 10.3390/ijms21165710] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/05/2020] [Revised: 08/05/2020] [Accepted: 08/07/2020] [Indexed: 12/18/2022] Open
Abstract
Mitochondrial proteins are physiologically active in different compartments, and their abnormal location will trigger the pathogenesis of human mitochondrial pathologies. Correctly identifying submitochondrial locations can provide information for disease pathogenesis and drug design. A mitochondrion has four submitochondrial compartments, the matrix, the outer membrane, the inner membrane, and the intermembrane space, but various existing studies ignored the intermembrane space. The majority of researchers used traditional machine learning methods for predicting mitochondrial protein localization. Those predictors required expert-level knowledge of biology to be encoded as features rather than allowing the underlying predictor to extract features through a data-driven procedure. Besides, few researchers have considered the imbalance in datasets. In this paper, we propose a novel end-to-end predictor employing deep neural networks, DeepPred-SubMito, for protein submitochondrial location prediction. First, we utilize random over-sampling to decrease the influence caused by unbalanced datasets. Next, we train a multi-channel bilayer convolutional neural network for multiple subsequences to learn high-level features. Third, the prediction result is outputted through the fully connected layer. The performance of the predictor is measured by 10-fold cross-validation and 5-fold cross-validation on the SM424-18 dataset and the SubMitoPred dataset, respectively. Experimental results show that the predictor outperforms state-of-the-art predictors. In addition, the prediction of results in the M983 dataset also confirmed its effectiveness in predicting submitochondrial locations.
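The abstract above names random over-sampling as its first step against class imbalance. A minimal Python sketch of that general technique follows; it is an illustration under stated assumptions, not the DeepPred-SubMito implementation. The function name, the seed parameter, and the choice to grow every class to the size of the largest one are assumptions of the example.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes
    are as large as the largest class.

    Assumes a non-empty dataset; returns new (samples, labels) lists grouped by class.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement from the class's own samples to fill the gap.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

When over-sampling is combined with cross-validation, it is usually applied to the training folds only, so that duplicated samples cannot appear on both sides of a train/test split.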
24
Gachpazan M, Kashani H, Khazaei M, Hassanian SM, Rezayi M, Asgharzadeh F, Ghayour-Mobarhan M, Ferns GA, Avan A. The Impact of Statin Therapy on the Survival of Patients with Gastrointestinal Cancer. Curr Drug Targets 2020; 20:738-747. [PMID: 30539694 DOI: 10.2174/1389450120666181211165449] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/21/2018] [Revised: 10/25/2018] [Accepted: 12/05/2018] [Indexed: 12/13/2022]
Abstract
Statins are 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase inhibitors that may play an important role in the evolution of cancers, due to their effects on cancer cell metabolism. Statins affect several potential pathways, including cell proliferation, angiogenesis, apoptosis and metastasis. The number of trials assessing the putative clinical benefits of statins in cancer is increasing. Currently, there are several trials listed on the global trial identifier website clinicaltrials.gov. Given the compelling evidence from these trials in a variety of clinical settings, there have been calls for a clinical trial of statins in the adjuvant gastrointestinal cancer setting. However, randomized controlled trials on specific cancer types in relation to statin use, as well as studies on populations without a clinical indication for using statins, have elucidated some potential underlying biological mechanisms, and the investigation of different statins is probably warranted. It would be useful for these trials to incorporate the assessment of tumour biomarkers predictive of statin response in their design. This review summarizes the recent preclinical and clinical studies that assess the application of statins in the treatment of gastrointestinal cancers, with particular emphasis on their association with cancer risk.
Affiliation(s)
- Meysam Gachpazan
  - Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Modern Sciences and Technologies; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Hoda Kashani
  - Department of Modern Sciences and Technologies; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Majid Khazaei
  - Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Seyed Mahdi Hassanian
  - Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Medical Biochemistry; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Majid Rezayi
  - Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Modern Sciences and Technologies; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Fereshteh Asgharzadeh
  - Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Majid Ghayour-Mobarhan
  - Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
  - Brighton & Sussex Medical School, Division of Medical Education, Falmer, Brighton, Sussex BN1 9PH, United Kingdom
- Amir Avan
  - Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Modern Sciences and Technologies; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - Cancer Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
25
Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE One of the most challenging problems is how to formulate a biological sequence as a vector while still retaining considerable sequence-order information. METHODS To address this problem, the approach of Pseudo Amino Acid Components, or PseAAC, has been developed. RESULTS AND CONCLUSION The 10-year recollection makes it increasingly clear that this proposal has indeed been very powerful.
Affiliation(s)
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, Massachusetts 02478, United States
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
26
Saikia S, Bordoloi M, Sarmah R. Established and In-trial GPCR Families in Clinical Trials: A Review for Target Selection. Curr Drug Targets 2020; 20:522-539. [PMID: 30394207 DOI: 10.2174/1389450120666181105152439] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/18/2018] [Revised: 08/28/2018] [Accepted: 10/22/2018] [Indexed: 12/14/2022]
Abstract
GPCRs (G-protein coupled receptors) constitute the largest family of drug targets in clinical trials and account for about 34% of FDA (Food and Drug Administration) approved drugs acting on 108 unique GPCRs. Factors such as readily identifiable conserved motifs in structures, 127 orphan GPCRs despite various de-orphaning techniques, directed functional antibodies for validation as drug targets, etc. have widened their therapeutic windows. The availability of 44 crystal structures of unique receptors, unexplored non-olfactory GPCRs (encoded by 50% of the human genome) and 205 ligand receptor complexes now presents a strong foundation for structure-based drug discovery and design. The growing impact of polypharmacology for complex diseases like schizophrenia, cancer etc. warrants the need for novel targets, and considering the undiscriminating and selectivity of GPCRs, they can fulfill this purpose. Again, natural genetic variations within the human genome sometimes delude the therapeutic expectations of some drugs, resulting in medication response differences and ADRs (adverse drug reactions). Around 30 billion US dollars are dumped annually for poor accounting of ADRs in the US alone. To curb such undesirable reactions, knowledge of established GPCR families and of those currently in clinical trials can greatly improve understanding of drug design prospects, including "off-target" effects, reducing the economic resources and time required. The druggability of GPCR protein families and the critical roles they play in complex diseases are explained. Class A, class B1, class C and class F are generally established families, and GPCRs in phase I (19%), phase II (29%) and phase III (52%) studies are also reviewed. Among in-trial targets, frizzled receptors accounted for the most in phase I studies, neuropeptides in phase II and melanocortin in phase III. Also, the bioapplications of nanoparticles, along with future prospects for both nanomedicine and the GPCR drug industry, are discussed. Further, the computational techniques and methods employed for different target validations are reviewed, along with their future potential for GPCR-based drug discovery.
Affiliation(s)
- Surovi Saikia
  - Natural Products Chemistry Group, CSIR North East Institute of Science & Technology, Jorhat-785006, Assam, India
- Manobjyoti Bordoloi
  - Natural Products Chemistry Group, CSIR North East Institute of Science & Technology, Jorhat-785006, Assam, India
- Rajeev Sarmah
  - Allied Health Sciences, Assam Down Town University, Panikhaiti, Guwahati 781026, Assam, India
27
Hu Y, Lu Y, Wang S, Zhang M, Qu X, Niu B. Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs. Curr Drug Targets 2020; 20:488-500. [PMID: 30091413 DOI: 10.2174/1389450119666180809122244] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/17/2018] [Revised: 06/19/2018] [Accepted: 06/25/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND Globally the number of cancer patients and deaths are continuing to increase yearly, and cancer has, therefore, become one of the world's highest causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. OBJECTIVE In this review, in order to study the application of machine learning in predicting anticancer drugs activity, some machine learning approaches such as Linear Discriminant Analysis (LDA), Principal components analysis (PCA), Support Vector Machine (SVM), Random forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and the examples of their applications in anticancer drugs design are listed. RESULTS Machine learning contributes a lot to anticancer drugs design and helps researchers by saving time and is cost effective. However, it can only be an assisting tool for drug design. CONCLUSION This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drugs activity prediction are discussed, and the anticancer drugs research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.
Affiliation(s)
- Yan Hu
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Yi Lu
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Shuo Wang
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Mengying Zhang
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Xiaosheng Qu
  - National Engineering Laboratory of Southwest Endangered Medicinal Resources Development, Guangxi Botanical Garden of Medicinal Plants, 530023, Nanning, China
- Bing Niu
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
28
Liu L, Song B, Ma J, Song Y, Zhang SY, Tang Y, Wu X, Wei Z, Chen K, Su J, Rong R, Lu Z, de Magalhães JP, Rigden DJ, Zhang L, Zhang SW, Huang Y, Lei X, Liu H, Meng J. Bioinformatics approaches for deciphering the epitranscriptome: Recent progress and emerging topics. Comput Struct Biotechnol J 2020; 18:1587-1604. [PMID: 32670500 PMCID: PMC7334300 DOI: 10.1016/j.csbj.2020.06.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/08/2020] [Revised: 06/02/2020] [Accepted: 06/07/2020] [Indexed: 12/13/2022] Open
Abstract
Post-transcriptional RNA modification occurs on all types of RNA and plays a vital role in regulating every aspect of RNA function. Thanks to the development of high-throughput sequencing technologies, transcriptome-wide profiling of RNA modifications has been made possible. With the accumulation of a large number of high-throughput datasets, bioinformatics approaches have become increasingly critical for unraveling the epitranscriptome. We review here the recent progress in bioinformatics approaches for deciphering the epitranscriptome, including epitranscriptome data analysis techniques, RNA modification databases, disease-association inference, general functional annotation, and studies on RNA modification site prediction. We also discuss the limitations of existing approaches and offer some future perspectives.
Affiliation(s)
- Lian Liu
- School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi 710119, China
| | - Bowen Song
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Jiani Ma
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
| | - Yi Song
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Song-Yao Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China
| | - Yujiao Tang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Xiangyu Wu
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
| | - Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
| | - Kunqi Chen
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
| | - Jionglong Su
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
| | - Rong Rong
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Zhiliang Lu
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - João Pedro de Magalhães
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
| | - Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
| | - Shao-Wu Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
| | - Yufei Huang
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
- Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
| | - Xiujuan Lei
- School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi 710119, China
| | - Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
| | - Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| |
|
29
|
Prediction of N6-methyladenosine sites using convolution neural network model based on distributed feature representations. Neural Netw 2020; 129:385-391. [PMID: 32593932 DOI: 10.1016/j.neunet.2020.05.027] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2020] [Revised: 05/21/2020] [Accepted: 05/24/2020] [Indexed: 01/24/2023]
Abstract
N6-methyladenosine (m6A) is a well-studied and the most common internal messenger RNA (mRNA) modification, and it plays an important role in cell development. m6A is found in all kingdoms of life and is involved in many other cellular processes, such as RNA splicing, immune tolerance, regulatory functions, RNA processing, and cancer. Despite the crucial role of m6A in cells, earlier computational attempts to identify it produced unsatisfactory results. It is imperative to develop an efficient computational model that can truly represent m6A sites. To this end, a highly discriminative computational model named m6A-word2vec is introduced for the discrimination of m6A sites. Here, the natural language processing concept of word2vec is used to represent the motifs of the target class automatically. These motifs (numerical descriptors) are extracted automatically from the human genome without any explicit definition. The extracted feature space is then passed to a convolutional neural network model as input for prediction. The developed model obtained 83.17%, 92.69%, and 90.50% accuracy on benchmark datasets S1, S2, and S3, respectively, using a 10-fold cross-validation test. These results indicate that the developed model performs better than existing computational models. It is thus anticipated that "m6A-word2vec" may be a supportive and practical tool for basic and pharmaceutical research, such as drug design, as well as for academia.
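As a rough illustration of how a sequence can be turned into "words" before a word2vec-style embedding is trained, the sketch below splits a sequence into overlapping k-mers. The abstract does not say how m6A-word2vec tokenizes its input, so the k-mer scheme, the value k = 3, and the function name `kmer_tokens` are all assumptions made for illustration only.

```python
def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mer 'words'.

    Hypothetical helper: the k-mer scheme and default k are assumptions,
    not details taken from the m6A-word2vec abstract.
    """
    seq = sequence.upper()
    # Slide a window of width k along the sequence, moving `stride` bases each step.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


if __name__ == "__main__":
    # Toy RNA fragment; each token would be one 'word' in the embedding corpus.
    print(kmer_tokens("GGACUAGGACU", k=3))
```

The resulting token lists play the role of sentences; any embedding library could then be trained on them, with the learned vectors serving as input features for a downstream classifier.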
|
30
|
|
31
|
Zheng L, Huang S, Mu N, Zhang H, Zhang J, Chang Y, Yang L, Zuo Y. RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule. Database: The Journal of Biological Databases and Curation 2020; 2019:5650975. [PMID: 31802128 PMCID: PMC6893003 DOI: 10.1093/database/baz131] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/20/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/12/2022]
Abstract
Reducing the amino acid alphabet can significantly simplify protein complexity, which could improve computational efficiency, decrease information redundancy and reduce the chance of overfitting. Although some reduced alphabets have been proposed, different classification rules can produce distinct results in protein sequence analysis. Thus, a systematic framework for reduced alphabets is urgently needed. In this work, we constructed a comprehensive web server called RAACBook for protein sequence analysis and machine learning applications by integrating reduced alphabets. The web server contains three parts: (i) 74 types of reduced amino acid alphabet were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with unique protein problems. Users can easily select the desired RAACs from a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequence of a protein. The tool can produce K-tuple reduced amino acid composition by defining three correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as sequence alignment, merging of RAA composition, feature distribution and a logo of the reduced sequence. (iii) A machine learning server is provided to train protein classification models based on K-tuple RAAC. The optimal model can be selected according to the evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook presents a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook. Database URL: http://bioinfor.imu.edu.cn/raacbook
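To make the idea of a K-tuple reduced amino acid composition concrete, here is a minimal sketch that maps each residue to a cluster label and then counts contiguous K-tuples of labels. The five-cluster alphabet below is a hypothetical grouping chosen for illustration and is not claimed to be one of RAACBook's 673 clusters; the g-gap and λ-correlation parameters are not modelled, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical 5-cluster reduction of the 20 standard amino acids
# (illustrative only; not taken from RAACBook).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]
AA_TO_GROUP = {aa: str(idx) for idx, group in enumerate(CLUSTERS) for aa in group}


def reduce_sequence(seq: str) -> str:
    """Replace each residue by its cluster label, skipping unknown symbols."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def ktuple_composition(seq: str, k: int = 2) -> dict[str, float]:
    """Relative frequency of each contiguous K-tuple in the reduced sequence."""
    reduced = reduce_sequence(seq)
    total = len(reduced) - k + 1
    if total <= 0:
        return {}
    counts = Counter(reduced[i:i + k] for i in range(total))
    return {tup: n / total for tup, n in counts.items()}


if __name__ == "__main__":
    print(reduce_sequence("AKDF"))
    print(ktuple_composition("MKVLAAGIVK", k=2))
```

With five clusters and K = 2 the feature space has at most 25 dimensions instead of 400 for the full alphabet, which is the kind of reduction in dimensionality the abstract refers to.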
Affiliation(s)
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Nengjiang Mu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Haoyue Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Jiayu Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Yu Chang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Baojian Road No.157, Harbin 150081, China
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| |
|
32
|
Ahmad W, Arafat E, Taherzadeh G, Sharma A, Dipta SR, Dehzangi A, Shatabda S. Mal-Light: Enhancing Lysine Malonylation Sites Prediction Problem Using Evolutionary-based Features. IEEE Access: Practical Innovations, Open Solutions 2020; 8:77888-77902. [PMID: 33354488 PMCID: PMC7751949 DOI: 10.1109/access.2020.2989713] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
Post-translational modification (PTM) is an important biological process with a tremendous impact on the function of proteins in both eukaryotic and prokaryotic cells. During the past decades, a wide range of PTMs has been identified. Among them, malonylation is a recently identified PTM that plays a vital role in a wide range of biological interactions. Moreover, this modification plays a potential role in energy metabolism in different species, including Homo sapiens. The identification of PTM sites using experimental methods is time-consuming and costly. Hence, there is a demand for fast and cost-effective computational methods. In this study, we propose a new machine learning method, called Mal-Light, to address this problem. To build this model, we extract local evolutionary-based information according to the interaction of neighboring amino acids using a bi-peptide based method. We then use Light Gradient Boosting Machine (LightGBM) as our classifier to predict malonylation sites. Our results demonstrate that Mal-Light significantly improves malonylation site prediction performance compared to previous studies found in the literature. Using Mal-Light, we achieve a Matthews correlation coefficient (MCC) of 0.74 and 0.60, accuracy of 86.66% and 79.51%, sensitivity of 78.26% and 67.27%, and specificity of 95.05% and 91.75% for Homo sapiens and Mus musculus proteins, respectively. Mal-Light is implemented as an online predictor that is publicly available at http://brl.uiu.ac.bd/MalLight/.
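The abstract reports MCC, accuracy, sensitivity and specificity without defining them. The sketch below computes these four measures from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not Mal-Light's actual confusion matrix.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification measures from confusion-matrix counts.

    Assumes each class has at least one sample (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in binary_metrics(tp=80, fp=10, tn=90, fn=20).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a more informative single number when positive and negative sites are imbalanced.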
Affiliation(s)
- Wakil Ahmad
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
| | - Easin Arafat
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
| | - Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD, 20742, USA
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD-4111, Australia
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Kanagawa, Japan
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
- CREST, JST, Tokyo, 102-8666, Japan
| | - Shubhashis Roy Dipta
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
| | - Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, 21251, USA
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
| |
|
33
|
Qian Y, Ye S, Zhang Y, Zhang J. SUMO-Forest: A Cascade Forest based method for the prediction of SUMOylation sites on imbalanced data. Gene 2020; 741:144536. [PMID: 32160959 DOI: 10.1016/j.gene.2020.144536] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2019] [Revised: 03/03/2020] [Accepted: 03/06/2020] [Indexed: 11/30/2022]
Affiliation(s)
- Ying Qian
- School of Computer Science & Technology, East China Normal University, North Zhongshan Road, 200062 Shanghai, China.
| | - Shasha Ye
- School of Computer Science & Technology, East China Normal University, North Zhongshan Road, 200062 Shanghai, China.
| | - Yu Zhang
- School of Computer Science & Technology, East China Normal University, North Zhongshan Road, 200062 Shanghai, China.
| | - Jiongmin Zhang
- School of Computer Science & Technology, East China Normal University, North Zhongshan Road, 200062 Shanghai, China.
| |
|
34
|
Identifying FL11 subtype by characterizing tumor immune microenvironment in prostate adenocarcinoma via Chou's 5-steps rule. Genomics 2020; 112:1500-1515. [DOI: 10.1016/j.ygeno.2019.08.021] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Revised: 08/03/2019] [Accepted: 08/26/2019] [Indexed: 12/14/2022]
|
35
|
Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, Chou KC. iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform 2020; 20:638-658. [PMID: 29897410 PMCID: PMC6556904 DOI: 10.1093/bib/bby028] [Citation(s) in RCA: 128] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2018] [Revised: 03/02/2018] [Indexed: 01/03/2023] Open
Abstract
Regulation of proteolysis plays a critical role in a myriad of important cellular processes. The key to better understanding the mechanisms that control this process is to identify the specific substrates that each protease targets. To address this, we have developed iProt-Sub, a powerful bioinformatics tool for the accurate prediction of protease-specific substrates and their cleavage sites. Importantly, iProt-Sub represents a significantly advanced version of its successful predecessor, PROSPER. It provides optimized cleavage site prediction models with better prediction performance and coverage for more species-specific proteases (4 major protease families and 38 different proteases). iProt-Sub integrates heterogeneous sequence and structural features and uses a two-step feature selection procedure to further remove redundant and irrelevant features in an effort to improve the cleavage site prediction accuracy. Features used by iProt-Sub are encoded by 11 different sequence encoding schemes, including local amino acid sequence profile, secondary structure, solvent accessibility and native disorder, which will allow a more accurate representation of the protease specificity of approximately 38 proteases and training of the prediction models. Benchmarking experiments using cross-validation and independent tests showed that iProt-Sub is able to achieve a better performance than several existing generic tools. We anticipate that iProt-Sub will be a powerful tool for proteome-wide prediction of protease-specific substrates and their cleavage sites, and will facilitate hypothesis-driven functional interrogation of protease-specific substrate cleavage and proteolytic events.
Affiliation(s)
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Yanan Wang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
| | - Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA and Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
36
|
Zheng H, Yang H, Gong D, Mai L, Qiu X, Chen L, Su X, Wei R, Zeng Z. Progress in the Mechanism and Clinical Application of Cilostazol. Curr Top Med Chem 2020; 19:2919-2936. [PMID: 31763974 DOI: 10.2174/1568026619666191122123855] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2019] [Revised: 07/27/2019] [Accepted: 08/02/2019] [Indexed: 12/20/2022]
Abstract
Cilostazol is a unique platelet inhibitor that has been used clinically for more than 20 years. As a phosphodiesterase type III inhibitor, cilostazol reversibly inhibits platelet aggregation, induces vasodilation, and has antiproliferative effects. It is widely used in the treatment of peripheral arterial disease and cerebrovascular disease, in percutaneous coronary intervention, etc. This article briefly reviews the pharmacological mechanisms and clinical application of cilostazol.
Affiliation(s)
- Huilei Zheng
- Department of Medical Examination & Health Management, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China; Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
| | - Hua Yang
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China; Department of Critical Care Medicine, Second People's Hospital of Nanning, Nanning, Guangxi, China
| | - Danping Gong
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China; Elderly Cardiology Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
| | - Lanxian Mai
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China; Disciplinary Construction Office, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
| | - Xiaoling Qiu
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
| | - Lidai Chen
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
| | - Xiaozhou Su
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
| | - Ruoqi Wei
- Department of Computer Science and Engineering, University of Bridgeport, 126 Park Ave, Bridgeport, CT 06604, United States
| | - Zhiyu Zeng
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China; Elderly Cardiology Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
| |
|
37
|
Pugalenthi G, Nithya V, Chou KC, Archunan G. Nglyc: A Random Forest Method for Prediction of N-Glycosylation Sites in Eukaryotic Protein Sequence. Protein Pept Lett 2020; 27:178-186. [DOI: 10.2174/0929866526666191002111404] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2019] [Revised: 07/26/2019] [Accepted: 07/29/2019] [Indexed: 01/29/2023]
Abstract
Background: N-glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs in the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understand the N-glycosylation mechanism. Objective: In this article, our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. Methods: We report a random forest method, Nglyc, to predict N-glycosylation sites from protein sequence using 315 sequence features. The method was trained on a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on a dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc predictions were compared with the NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: Nglyc achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with NetNGlyc, EnsembleGly and GPP shows that Nglyc performs better than the other methods, with high sensitivity and specificity. Conclusion: Our method achieved an overall accuracy of 0.8248, with 0.8305 sensitivity and 0.8182 specificity. The comparison study shows that our method performs better than the other methods. The applicability and success of our method were further evaluated using human and mouse N-glycosylation sites. Nglyc is freely available at https://github.com/bioinformaticsML/Ngly.
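The N-X-[S/T] sequon rule stated in the abstract is easy to express as a pattern search, which is the natural first step for collecting candidate sites before any classifier is applied. The sketch below is an illustration of that rule only and is not the Nglyc method; the function name `find_sequons` and the toy sequence are assumptions.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[int]:
    """Return 0-based positions of the asparagine in each N-X-[S/T] sequon."""
    return [match.start() for match in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy sequence: "NGT" at 1, "NPS" at 5 (rejected, X is proline),
    # "NNS" at 8 and "NST" at 9.
    print(find_sequons("MNGTANPSNNST"))  # [1, 8, 9]
```

As the abstract notes, not every sequon found this way is actually glycosylated, which is why a trained predictor is needed on top of the pattern match.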
Affiliation(s)
- Ganesan Pugalenthi
- Pheromone Technology Laboratory, Department of Animal Science, Bharathidasan University, Tiruchirappalli- 620024, India
| | - Varadharaju Nithya
- Department of Animal Health Management, Alagappa University, Karaikudi-630003, India
| | - Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, CA 92130, United States
| | - Govindaraju Archunan
- Pheromone Technology Laboratory, Department of Animal Science, Bharathidasan University, Tiruchirappalli- 620024, India
| |
|
38
|
Ju Z, Wang SY. Prediction of lysine formylation sites using the composition of k-spaced amino acid pairs via Chou's 5-steps rule and general pseudo components. Genomics 2020; 112:859-866. [DOI: 10.1016/j.ygeno.2019.05.027] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2019] [Revised: 05/13/2019] [Accepted: 05/30/2019] [Indexed: 11/30/2022]
|
39
|
Shao YT, Liu XX, Lu Z, Chou KC. pLoc_Deep-mHum: Predict Subcellular Localization of Human Proteins by Deep Learning. Natural Science 2020. [DOI: 10.4236/ns.2020.127042] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
40
|
Shao Y, Chou KC. pLoc_Deep-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by Deep Learning. Natural Science 2020. [DOI: 10.4236/ns.2020.126034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
41
|
iQSP: A Sequence-Based Tool for the Prediction and Analysis of Quorum Sensing Peptides via Chou's 5-Steps Rule and Informative Physicochemical Properties. Int J Mol Sci 2019; 21:ijms21010075. [PMID: 31861928 PMCID: PMC6981611 DOI: 10.3390/ijms21010075] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 01/18/2023] Open
Abstract
Understanding the functional mechanism of quorum-sensing peptides (QSPs) plays an essential role in finding new opportunities to combat bacterial infections through drug design. With the avalanche of newly available peptide sequences in the post-genomic age, it is highly desirable to develop a computational model for efficient, rapid and high-throughput QSP identification based purely on peptide sequence information. Although a few methods have been developed for predicting QSPs, their prediction accuracy and interpretability still require further improvement. Thus, in this work, we proposed an accurate sequence-based predictor (called iQSP) and a set of interpretable rules (called IR-QSP) for predicting and analyzing QSPs. In iQSP, we utilized a support vector machine (SVM) together with 18 informative features from physicochemical properties (PCPs). A rigorous independent validation test showed that iQSP achieved a maximum accuracy and MCC of 93.00% and 0.86, respectively. Furthermore, a set of interpretable rules, IR-QSP, was extracted using a random forest model and the 18 informative PCPs. Finally, for the convenience of experimental scientists, the iQSP web server was established and made freely available online. It is anticipated that iQSP will become a useful tool, or at least a complement to existing methods, for predicting and analyzing QSPs.
|
42
|
pLoc_bal-mHum: Predict subcellular localization of human proteins by PseAAC and quasi-balancing training dataset. Genomics 2019; 111:1274-1282. [DOI: 10.1016/j.ygeno.2018.08.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2018] [Revised: 08/14/2018] [Accepted: 08/16/2018] [Indexed: 12/17/2022]
|
43
|
iRSpot-DTS: Predict recombination spots by incorporating the dinucleotide-based spare-cross covariance information into Chou's pseudo components. Genomics 2019; 111:1760-1770. [DOI: 10.1016/j.ygeno.2018.11.031] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Revised: 11/29/2018] [Accepted: 11/30/2018] [Indexed: 12/16/2022]
|
44
|
Ju Z, Wang SY. Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction via the Chou's 5-steps Rule and General Pseudo Components. Curr Genomics 2019; 20:592-601. [PMID: 32581647 PMCID: PMC7290059 DOI: 10.2174/1389202921666191223154629] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2019] [Revised: 10/19/2019] [Accepted: 11/07/2019] [Indexed: 01/06/2023] Open
Abstract
Introduction: Neddylation is a highly dynamic and reversible post-translational modification. Abnormal neddylation has previously been shown to be closely related to some human diseases. The detection of neddylation sites is essential for elucidating the regulation mechanisms of protein neddylation. Objective: As the detection of lysine neddylation sites by traditional experimental methods is often expensive and time-consuming, it is imperative to design computational methods to identify neddylation sites. Methods: In this study, a bioinformatics tool named NeddPred is developed to identify underlying protein neddylation sites. A bi-profile Bayes feature extraction is used to encode neddylation sites, and a fuzzy support vector machine model is utilized to overcome the problems of noise and class imbalance in the prediction. Results: NeddPred achieved a Matthews correlation coefficient of 0.7082 and an area under the receiver operating characteristic curve of 0.9769. Independent tests show that NeddPred significantly outperforms the existing lysine neddylation site predictor NeddyPreddy. Conclusion: NeddPred can therefore complement the existing tools for the prediction of neddylation sites. A user-friendly webserver for NeddPred is accessible at 123.206.31.171/NeddPred/.
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| | - Shi-Yun Wang
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| |
|
45
|
Chou KC. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis. Curr Top Med Chem 2019; 19:2283-2300. [DOI: 10.2174/1568026619666191018100141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Revised: 08/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023]
Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progress in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for an in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one is for inspecting the global prediction quality, while the other is for the local prediction quality. All the predictors covered in this review have a user-friendly web-server, through which the majority of experimental scientists can easily obtain their desired data without the need to go through the complicated mathematics.
Affiliation(s)
- Kuo-Chen Chou
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
|
46
|
Malebary SJ, Rehman MSU, Khan YD. iCrotoK-PseAAC: Identify lysine crotonylation sites by blending position relative statistical features according to the Chou's 5-step rule. PLoS One 2019; 14:e0223993. [PMID: 31751380 PMCID: PMC6874067 DOI: 10.1371/journal.pone.0223993] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 07/08/2019] [Accepted: 10/02/2019] [Indexed: 01/22/2023] Open
Abstract
Among the different post-translational modifications (PTMs), one of the most important is lysine crotonylation in proteins. Its importance in various diseases and essential biological processes cannot be underestimated. The key step for finding the hidden mechanisms of crotonylation along with their occurrence sites is to completely apprehend the mechanism behind this biological process. In previously reported studies, researchers have used different techniques, like position weighted matrix (PWM), support vector machine (SVM), k nearest neighbors (KNN), and many others. However, the maximum prediction accuracy achieved was not very high. To address this, herein, we propose an improved predictor for lysine crotonylation sites named iCrotoK-PseAAC, in which we have incorporated various position and composition relative features along with statistical moments into PseAAC. The results of self-consistency testing were 100% accurate, while the 10-fold cross validation gave 99.0% accuracy. Based on the validation and comparison of the model, it is concluded that iCrotoK-PseAAC is more accurate than the previously proposed models.
Affiliation(s)
- Sharaf Jameel Malebary
  - Department of Information Technology, King Abdul Aziz University, Rabigh, Kingdom of Saudi Arabia
- Muhammad Safi ur Rehman
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
|
47
|
Xie NZ, Li JX, Huang RB. Biological Production of (S)-acetoin: A State-of-the-Art Review. Curr Top Med Chem 2019; 19:2348-2356. [PMID: 31648637 DOI: 10.2174/1568026619666191018111424] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/31/2018] [Revised: 08/28/2019] [Accepted: 09/02/2019] [Indexed: 12/24/2022]
Abstract
Acetoin is an important four-carbon compound that has many applications in foods, chemical synthesis, cosmetics, cigarettes, soaps, and detergents. Its stereoisomer (S)-acetoin, a high-value chiral compound, can also be used to synthesize optically active drugs, which could enhance targeting properties and reduce side effects. Recently, considerable progress has been made in the development of biotechnological routes for (S)-acetoin production. In this review, various strategies for biological (S)-acetoin production are summarized, and their constraints and possible solutions are described. Furthermore, future prospects of biological production of (S)-acetoin are discussed.
Affiliation(s)
- Neng-Zhong Xie
  - National Engineering Research Center for Non-Food Biorefinery, State Key Laboratory of Non-Food Biomass and Enzyme Technology, Guangxi Key Laboratory of Bio-refinery, Guangxi Biomass Engineering Technology Research Center, Guangxi Academy of Sciences, 98 Daling Road, Nanning, 530007, China
- Jian-Xiu Li
  - National Engineering Research Center for Non-Food Biorefinery, State Key Laboratory of Non-Food Biomass and Enzyme Technology, Guangxi Key Laboratory of Bio-refinery, Guangxi Biomass Engineering Technology Research Center, Guangxi Academy of Sciences, 98 Daling Road, Nanning, 530007, China
- Ri-Bo Huang
  - National Engineering Research Center for Non-Food Biorefinery, State Key Laboratory of Non-Food Biomass and Enzyme Technology, Guangxi Key Laboratory of Bio-refinery, Guangxi Biomass Engineering Technology Research Center, Guangxi Academy of Sciences, 98 Daling Road, Nanning, 530007, China
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China
|
48
|
Chou KC. Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs. Curr Med Chem 2019; 26:4918-4943. [PMID: 31060481 DOI: 10.2174/0929867326666190507082559] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 09/28/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 12/16/2022]
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most of the functions critical to the cell’s survival are performed by these proteins located in its different organelles, usually called “subcellular locations”. Information on the subcellular localization of a protein can provide useful clues about its function. To reveal the intricate pathways at the cellular level, knowledge of the subcellular localization of proteins in a cell is a prerequisite. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing and selecting the right targets for drug development. Unfortunately, it is both time-consuming and costly to determine the subcellular locations of proteins purely based on experiments. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying the subcellular locations of uncharacterized proteins based on their sequence information alone. Considerable progress has been achieved in this regard. This review focuses on those methods that can deal with multi-label proteins, which may simultaneously exist in two or more subcellular location sites. Protein molecules with this kind of characteristic are vitally important for finding multi-target drugs, a current hot trend in drug development. This review also focuses on those methods that have user-friendly web-servers established so that the majority of experimental scientists can use them to get the desired results without the need to go through the detailed mathematics involved.
Affiliation(s)
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, United States
|
49
|
Kang C. 19F-NMR in Target-based Drug Discovery. Curr Med Chem 2019; 26:4964-4983. [PMID: 31187703 DOI: 10.2174/0929867326666190610160534] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 06/30/2018] [Revised: 08/14/2018] [Accepted: 03/13/2019] [Indexed: 02/06/2023]
Abstract
Solution NMR spectroscopy plays important roles in understanding protein structures, dynamics and protein-protein/ligand interactions. In a target-based drug discovery project, NMR can serve an important function in hit identification and lead optimization. Fluorine is a valuable probe for evaluating protein conformational changes and protein-ligand interactions. Accumulated studies demonstrate that 19F-NMR can play important roles in fragment-based drug discovery (FBDD) and probing protein-ligand interactions. This review summarizes the application of 19F-NMR in understanding protein-ligand interactions and drug discovery. Several examples are included to show the roles of 19F-NMR in confirming identified hits/leads in the drug discovery process. In addition to identifying hits from fluorine-containing compound libraries, 19F-NMR will play an important role in drug discovery by providing a fast and robust way to identify novel hits. This technique can be used for ranking compounds with different binding affinities and is particularly useful for screening competitive compounds when a reference ligand is available.
Affiliation(s)
- CongBao Kang
  - Experimental Drug Development Centre (EDDC), Agency for Science, Technology and Research (A*STAR), 10 Biopolis Road, #05-01, Singapore, 138670, Singapore
|
50
|
Su ZD, Huang Y, Zhang ZY, Zhao YW, Wang D, Chen W, Chou KC, Lin H. iLoc-lncRNA: predict the subcellular location of lncRNAs by incorporating octamer composition into general PseKNC. Bioinformatics 2019; 34:4196-4204. [PMID: 29931187 DOI: 10.1093/bioinformatics/bty508] [Citation(s) in RCA: 129] [Impact Index Per Article: 25.8] [Received: 05/02/2018] [Accepted: 06/19/2018] [Indexed: 12/20/2022] Open
Abstract
Motivation: Long non-coding RNAs (lncRNAs) are a class of RNA molecules with more than 200 nucleotides. They have important functions in cell development and metabolism, such as genetic markers, genome rearrangements, chromatin modifications, cell cycle regulation, transcription and translation. Their functions are generally closely related to their localization in the cell. Therefore, knowledge about their subcellular locations can provide very useful clues or preliminary insight into their biological functions. Although biochemical experiments could determine the localization of lncRNAs in a cell, they are both time-consuming and expensive. Therefore, it is highly desirable to develop bioinformatics tools for fast and effective identification of their subcellular locations. Results: We developed a sequence-based bioinformatics tool called 'iLoc-lncRNA' to predict the subcellular locations of lncRNAs by incorporating the 8-tuple nucleotide features into the general PseKNC (Pseudo K-tuple Nucleotide Composition) via the binomial distribution approach. Rigorous jackknife tests have shown that the overall accuracy achieved by the new predictor on a stringent benchmark dataset is 86.72%, which is over 20% higher than that of the existing state-of-the-art predictor evaluated on the same tests. Availability and implementation: A user-friendly webserver has been established at http://lin-group.cn/server/iLoc-LncRNA, by which users can easily obtain their desired results. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhen-Dong Su
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yan Huang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zhao-Yue Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Ya-Wei Zhao
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Dong Wang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wei Chen
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
  - Gordon Life Science Institute, Boston, MA, USA
- Kuo-Chen Chou
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Gordon Life Science Institute, Boston, MA, USA
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Gordon Life Science Institute, Boston, MA, USA
|