1
Babaiha NS, Aghdam R, Ghiam S, Eslahchi C. NN-RNALoc: Neural network-based model for prediction of mRNA sub-cellular localization using distance-based sub-sequence profiles. PLoS One 2023; 18:e0258793. [PMID: 37708177 PMCID: PMC10501558 DOI: 10.1371/journal.pone.0258793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2021] [Accepted: 05/12/2023] [Indexed: 09/16/2023] Open
Abstract
The localization of messenger RNAs (mRNAs) is a frequently observed phenomenon and a crucial aspect of gene expression regulation. It is also a mechanism for targeting proteins to a specific cellular region. Moreover, prior studies have shown the significance of intracellular RNA positioning during embryonic and neural dendrite formation. Incorrect RNA localization, which can be caused by a variety of factors, such as mutations in trans-regulatory elements, has been linked to the development of certain neuromuscular diseases and cancer. In this study, we introduce NN-RNALoc, a neural network-based method for predicting the cellular location of mRNA using novel features extracted from mRNA sequence data and protein interaction patterns. Specifically, we developed a distance-based subsequence profile for RNA sequence representation that is more memory- and time-efficient than the well-known k-mer sequence representation. Combining protein-protein interaction data, which is essential for numerous biological processes, with our novel distance-based subsequence profiles of mRNA sequences produces more accurate features. On two benchmark datasets, CeFra-Seq and RNALocate, the performance of NN-RNALoc is compared to powerful predictive models proposed in previous works (mRNALoc, RNATracker, mLoc-mRNA, DM3Loc, iLoc-mRNA, and EL-RMLocNet) and a baseline neural network (DNN5-mer). Compared to the previous methods, NN-RNALoc significantly reduces computation time and also outperforms them in terms of accuracy. This study's source code and datasets are freely accessible at https://github.com/NeginBabaiha/NN-RNALoc.
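For orientation, the k-mer representation that the abstract uses as its point of comparison can be sketched as follows. This is only an illustration of the generic k-mer baseline, not the authors' distance-based subsequence profile (which the abstract does not define); the function name `kmer_profile` and the toy sequence are hypothetical.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Return normalised k-mer frequencies over the RNA alphabet ACGU.

    Illustrative baseline only; the feature length is 4**k, which is why
    k-mer profiles become memory-hungry as k grows.
    """
    alphabet = "ACGU"
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in index:  # skip windows containing ambiguous bases such as N
            counts[index[word]] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)


profile = kmer_profile("AUGGCUAUGG", k=2)  # toy sequence, not from the paper
print(len(profile))  # number of features: 4**k
```

The exponential growth of the vector with k is the cost that a more compact sequence representation would aim to avoid.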
Affiliation(s)
- Negin Sadat Babaiha
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany
- Rosa Aghdam
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States of America
- Shokoofeh Ghiam
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
2
Yang R, Liu J, Zhang L. ECAmyloid: An amyloid predictor based on ensemble learning and comprehensive sequence-derived features. Comput Biol Chem 2023; 104:107853. [PMID: 36990028 DOI: 10.1016/j.compbiolchem.2023.107853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 03/17/2023] [Accepted: 03/20/2023] [Indexed: 03/30/2023]
Abstract
Amyloid fibrils formed by the mis-aggregation of amyloid proteins can lead to neuronal degeneration in Alzheimer's disease. Predicting amyloid proteins not only contributes to understanding the physicochemical properties and formation mechanism of amyloid proteins, but also has significant implications for the treatment of amyloid diseases and the development of new uses for amyloid materials. In this study, an ensemble learning model with sequence-derived features, ECAmyloid, is proposed to identify amyloids. The sequence-derived features, including Pseudo Position Specificity Score Matrix (Pse-PSSM), Split Amino Acid Composition (SAAC), Solvent Accessibility (SA), and Secondary Structure Information (SSI), are employed to incorporate sequence composition, evolutionary and structural information. The individual learners of the ensemble learning model are selected by an increment classifier selection strategy. The final prediction is determined by voting over the predictions of the individual learners. In view of the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted to generate positive samples. To eliminate irrelevant and redundant features, correlation-based feature subset (CFS) selection combined with a heuristic search strategy is performed to obtain the optimal feature subset. Experimental results indicate that the ensemble classifier achieves an accuracy of 98.29%, a sensitivity of 0.992, and a specificity of 0.974 on the training dataset using 10-fold cross validation, far higher than the results obtained by its individual learners. Compared with the original feature set, the accuracy, sensitivity, specificity, MCC, F1-score, and G-Mean of the ensemble method trained on the optimal feature subset are improved by 1.05%, 0.012, 0.01, 0.021, 0.011 and 0.011, respectively. Moreover, comparison with existing methods on the same two independent test datasets demonstrates that the proposed method is an effective and promising predictor for large-scale determination of amyloid proteins. The data and code used to develop ECAmyloid have been shared on GitHub and can be freely downloaded at https://github.com/KOALA-L/ECAmyloid.git.
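The evaluation measures named in this abstract (accuracy, sensitivity, specificity, MCC, F1-score, G-Mean) all follow from a binary confusion matrix. A minimal sketch using the standard textbook definitions is given below; the function name `binary_metrics` and the counts are hypothetical toy values, not results from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute common binary classification metrics from confusion counts.

    Assumes a non-degenerate matrix (at least one positive, one negative,
    and one positive prediction), so the divisions below are defined.
    """
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Toy counts chosen only for illustration.
print(binary_metrics(tp=90, fp=20, tn=80, fn=10))
```

G-Mean and MCC are the measures most sensitive to class imbalance here, which is presumably why they are reported alongside accuracy for an imbalanced benchmark.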
Affiliation(s)
- Runtao Yang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Jiaming Liu
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Lina Zhang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
3
Genç M, Özkale MR. Lasso regression under stochastic restrictions in linear regression: An application to genomic data. Commun Stat-Theor M 2022. [DOI: 10.1080/03610926.2022.2149243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Affiliation(s)
- Murat Genç
  - Department of Management Information Systems, Faculty of Economics and Administrative Sciences, Tarsus University, Mersin, Turkey
- M. Revan Özkale
  - Department of Statistics, Faculty of Science and Letters, Çukurova University, Adana, Turkey
4
Ning Q, Zhao X, Ma Z. A Novel Method for Identification of Glutarylation Sites Combining Borderline-SMOTE With Tomek Links Technique in Imbalanced Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2632-2641. [PMID: 34236968 DOI: 10.1109/tcbb.2021.3095482] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/13/2023]
Abstract
Glutarylation is a type of post-translational modification that occurs on lysine residues. It plays an irreplaceable role in various cellular functions. Therefore, identification of glutarylation sites is significant for understanding the molecular mechanism of glutarylation. In this study, we proposed a method named DEXGB_Glu to identify lysine glutarylation sites using XGBoost as the classifier, optimized by a differential evolution algorithm. To address the imbalance between positive and negative samples, the Borderline-SMOTE method was employed to synthesize positive samples until their number equaled that of the negative samples. Then, the Tomek links technique was applied to filter out noisy data. Analysis of this method and its results showed that the differential evolution algorithm clearly improved performance and that the combination of Borderline-SMOTE and Tomek links effectively resolved the imbalance between positive and negative samples. Finally, the performance of this method was much better than that of other methods in predicting glutarylation sites. The data and code are available at https://github.com/ningq669/DEXGB_Glu.
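The Tomek links step mentioned in this abstract can be illustrated with a small sketch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; such pairs tend to mark noisy or borderline points. The code below only detects the links with a brute-force search. Which member of a link is removed, and how this is combined with Borderline-SMOTE in DEXGB_Glu, is not stated in the abstract, so nothing beyond detection is assumed. The function name `tomek_links` and the toy points are hypothetical.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair qualifies when the two samples are mutual nearest neighbours
    (Euclidean distance) and carry different class labels.
    """
    def nearest(i):
        # Brute-force nearest neighbour; fine for a small illustration.
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    nn = [nearest(i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


# Toy 2-D data: two clean clusters plus one mixed-class pair.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.2, 5.0)]
labels = [0, 0, 1, 1, 0, 1]
print(tomek_links(points, labels))
```

In practice a library implementation would normally be preferred over this quadratic search; the sketch is meant only to make the definition concrete.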
5
Cong H, Liu H, Cao Y, Chen Y, Liang C. Multiple Protein Subcellular Locations Prediction Based on Deep Convolutional Neural Networks with Self-Attention Mechanism. Interdiscip Sci 2022; 14:421-438. [PMID: 35066812 DOI: 10.1007/s12539-021-00496-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/28/2021] [Revised: 12/06/2021] [Accepted: 12/13/2021] [Indexed: 12/12/2022]
Abstract
As an important research field in bioinformatics, protein subcellular location prediction is critical to reveal the protein functions and provide insightful information for disease diagnosis and drug development. Predicting protein subcellular locations remains a challenging task due to the difficulty of finding representative features and robust classifiers. Many feature fusion methods have been widely applied to tackle the above issues. However, they still suffer from accuracy loss due to feature redundancy. Furthermore, multiple protein subcellular locations prediction is more complicated since it is fundamentally a multi-label classification problem. The traditional binary classifiers or even multi-class classifiers cannot achieve satisfactory results. This paper proposes a novel method for protein subcellular location prediction with both single and multiple sites based on deep convolutional neural networks. Specifically, we first obtain the integrated features by simultaneously considering the pseudo amino acid, amino acid index distribution, and physicochemical property. We then adopt deep convolutional neural networks to extract high-dimensional features from the fused feature, removing the redundant preliminary features and gaining better representations of the raw sequences. Moreover, we use the self-attention mechanism and a customized loss function to ensure that the model is more inclined to positive data. In addition, we use random k-label sets to reduce the number of prediction labels. Meanwhile, we employ a hybrid strategy of over-sampling and under-sampling to tackle the data imbalance problem. We compare our model with three representative classification alternatives. The experiment results show that our model achieves the best performance in terms of accuracy, demonstrating the efficacy of the proposed model.
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
6
Hasan MM, Tsukiyama S, Cho JY, Kurata H, Alam MA, Liu X, Manavalan B, Deng HW. Deepm5C: A deep learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy. Mol Ther 2022; 30:2856-2867. [PMID: 35526094 PMCID: PMC9372321 DOI: 10.1016/j.ymthe.2022.05.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 10/09/2021] [Revised: 04/25/2022] [Accepted: 05/03/2022] [Indexed: 11/30/2022] Open
Abstract
As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application is quite limited in genome-wide detection. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method to identify RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature encoding algorithms and a feature derived from word embedding approaches. Afterwards, four variants of deep learning classifiers and four commonly used conventional classifiers were employed and trained with the four encodings, ultimately obtaining 32 baseline models. A stacking strategy is then used, in which the predicted outputs of the optimal baseline models are integrated and used to train a 1-D convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation with a Matthews correlation coefficient and accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and formulating novel testable biological hypotheses.
Affiliation(s)
- Md Mehedi Hasan
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
- Sho Tsukiyama
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Jae Youl Cho
  - Molecular Immunology Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Md Ashad Alam
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
- Xiaowen Liu
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
- Hong-Wen Deng
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
7
Nguyen TTD, Ho QT, Le NQK, Phan VD, Ou YY. Use Chou's 5-Steps Rule With Different Word Embedding Types to Boost Performance of Electron Transport Protein Prediction Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1235-1244. [PMID: 32750894 DOI: 10.1109/tcbb.2020.3010975] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
Living organisms obtain the energy substances they need directly from cellular respiration. Completing electron storage and transport requires cellular respiration with the aid of electron transport chains. Deciphering electron transport proteins is therefore necessary. Identifying these proteins with high performance depends directly on the choice of feature extraction method and machine learning algorithm. In this study, protein sequences were treated as natural-language sentences composed of words. The proposed word embedding-based feature sets, which rely on word embedding modulation and protein motif frequencies, were useful for feature selection. Five word embedding types and a variety of conjoint features were examined for this feature selection. The support vector machine algorithm was then employed to perform classification. In 5-fold cross-validation, the average accuracy, specificity, sensitivity, and MCC all surpass 0.95. The corresponding metrics in the independent test are 96.82, 97.16, 95.76 percent, and 0.9, respectively. Compared to state-of-the-art predictors, the proposed method performs better on all metrics, indicating its effectiveness in determining electron transport proteins. Furthermore, this study provides insights into the applicability of various word embeddings for understanding the surveyed sequences.
8
Xiong E, Cao D, Qu C, Zhao P, Wu Z, Yin D, Zhao Q, Gong F. Multilocation proteins in organelle communication: Based on protein-protein interactions. Plant Direct 2022; 6:e386. [PMID: 35229068 PMCID: PMC8861329 DOI: 10.1002/pld3.386] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/10/2021] [Revised: 12/17/2021] [Accepted: 01/18/2022] [Indexed: 05/25/2023]
Abstract
Protein-protein interaction (PPI) plays a crucial role in most biological processes, including signal transduction and cell apoptosis. Importantly, the knowledge of PPIs can be useful for identification of multimeric protein complexes and elucidation of uncharacterized protein functions. For Arabidopsis thaliana, the best-characterized dicotyledonous plant, the steadily increasing amount of information on its proteome and signaling pathways is progressively enabling more researchers to construct models of cellular processes for the plant, which in turn encourages the generation of more experimental data. In this study, we performed an overview analysis of the 10 major organelles and their associated proteins of the dicotyledonous model plant Arabidopsis thaliana via a PPI network, and found that PPI may play an important role in organelle communication. Further, multilocation proteins, especially phosphorylation-related multilocation proteins, can function as a "needle and thread" via PPIs and play an important role in organelle communication. Similar results were obtained in a monocotyledonous model crop, rice. Furthermore, we provide a research strategy for multilocation proteins using the LOPIT technique, proteomics, and bioinformatics analysis, and also describe their potential role in the field of plant science. The results provide a new view that phosphorylation-related multilocation proteins play an important role in organelle communication and provide new insight into PPIs and novel directions for proteomic research. Research on phosphorylation-related multilocation proteins may advance the study of organelle communication and provide an important theoretical basis for plant responses to external stress.
Affiliation(s)
- Erhui Xiong
  - College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Di Cao
  - College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Chengxin Qu
  - College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Pengfei Zhao
  - College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Zhaokun Wu
  - College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Dongmei Yin
  - College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Quanzhi Zhao
  - College of Agronomy, Henan Agricultural University, Zhengzhou, China
- Fangping Gong
  - College of Agronomy, Henan Agricultural University, Zhengzhou, China
9
Ensemble of classifier chains and decision templates for multi-label classification. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-021-01647-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
10
Shahraki S, Samareh Delarami H, Poorsargol M, Sori Nezami Z. Structural and functional changes of catalase through interaction with Erlotinib hydrochloride. Use of Chou's 5-steps rule to study mechanisms. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2021; 260:119940. [PMID: 34038867 DOI: 10.1016/j.saa.2021.119940] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/04/2021] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 06/12/2023]
Abstract
Erlotinib hydrochloride (Erlo) is used in the treatment of non-small cell lung cancer, pancreatic cancer and other types of cancer. Interaction of small molecules with bio-macromolecules can lead to changes in their structure and function, which is one of the possible side effects of drugs. In this study, the interaction of Erlo with bovine liver catalase (BLC) using spectroscopic and computational methods is presented in detail. The enzymatic function of BLC decreased to 58.7% when the concentration of Erlo was 0.5 × 10⁻⁷ M. Fluorescence results revealed that the combination of BLC with Erlo undergoes a static quenching mechanism (Kb = 1.15 × 10⁴ M⁻¹ at 300 K). The interaction process was spontaneous, exothermic and enthalpy-driven, and van der Waals forces and hydrogen bonds played major roles in this process. UV-Vis, CD, 3D, and synchronous fluorescence measurements indicated changes in the microenvironment residues and α-helix contents of BLC in the presence of Erlo. Docking and molecular dynamics presented a stable binding configuration and their results were perfectly consistent with the spectroscopic results. Theoretical calculations and experimental analysis help to fully understand drug interactions with important biological molecules such as enzymes.
11
Liao Z, Pan G, Sun C, Tang J. Predicting subcellular location of protein with evolution information and sequence-based deep learning. BMC Bioinformatics 2021; 22:515. [PMID: 34686152 PMCID: PMC8539821 DOI: 10.1186/s12859-021-04404-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/20/2021] [Accepted: 09/24/2021] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: Protein subcellular localization prediction plays an important role in biology research. Since traditional methods are laborious and time-consuming, many machine learning-based prediction methods have been proposed. However, most of the proposed methods ignore the evolution information of proteins. In order to improve the prediction accuracy, we present a deep learning-based method to predict protein subcellular locations. RESULTS: Our method utilizes not only amino acid compositions sequence but also evolution matrices of proteins. Our method uses a bidirectional long short-term memory network that processes the entire protein sequence and a convolutional neural network that extracts features from protein sequences. The position specific scoring matrix is used as a supplement to protein sequences. Our method was trained and tested on two benchmark datasets. The experiment results show that our method yields accurate results on the two datasets with an average precision of 0.7901, ranking loss of 0.0758 and coverage of 1.2848. CONCLUSION: The experiment results show that our method outperforms five methods currently available. According to those experiments, we can see that our method is an acceptable alternative to predict protein subcellular location.
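Ranking loss and coverage, two of the multi-label measures reported above, can be sketched as follows. The abstract does not give the exact definitions the authors used, so the code adopts one common convention as an assumption: coverage is the average depth in the ranked label list needed to include every true label, and ranking loss is the average fraction of (true, false) label pairs in which the false label scores higher. The function names and toy scores are hypothetical.

```python
def coverage_error(y_true, y_score):
    """Average number of top-ranked labels needed to cover all true labels.

    Assumes every sample has at least one true label.
    """
    total = 0
    for truth, scores in zip(y_true, y_score):
        lowest_true = min(s for t, s in zip(truth, scores) if t)
        total += sum(1 for s in scores if s >= lowest_true)
    return total / len(y_true)


def ranking_loss(y_true, y_score):
    """Average fraction of (true, false) label pairs ranked in the wrong order.

    Assumes every sample has at least one true and one false label.
    """
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        pos = [s for t, s in zip(truth, scores) if t]
        neg = [s for t, s in zip(truth, scores) if not t]
        wrong = sum(1 for p in pos for n in neg if n > p)
        total += wrong / (len(pos) * len(neg))
    return total / len(y_true)


# Toy example: two proteins, three candidate locations each.
y_true = [[1, 0, 0], [0, 1, 1]]
y_score = [[0.9, 0.2, 0.1], [0.6, 0.7, 0.3]]
print(coverage_error(y_true, y_score), ranking_loss(y_true, y_score))
```

Lower values are better for both measures; some papers report coverage minus one, so published figures are comparable only when the convention is known.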
Affiliation(s)
- Zhijun Liao
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, 1 Xuefu North Road, University Town, Fuzhou, 350122 FJ China
  - Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC 29208 USA
- Gaofeng Pan
  - Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC 29208 USA
- Chao Sun
  - Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC 29208 USA
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC 29208 USA
  - College of Electrical and Power Engineering, Taiyuan University of Technology, No. 79 Yinze West Street, Taiyuan, 030024 SX China
12
Akmal MA, Hussain W, Rasool N, Khan YD, Khan SA, Chou KC. Using CHOU'S 5-Steps Rule to Predict O-Linked Serine Glycosylation Sites by Blending Position Relative Features and Statistical Moment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2045-2056. [PMID: 31985438 DOI: 10.1109/tcbb.2020.2968441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/10/2023]
Abstract
Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification due to its pivotal role and association with crucial physiological functions within most proteins. Identification of glycosylation sites in a polypeptide chain is not an easy task due to multiple impediments. Analytical identification of these sites is expensive and laborious. There is a dire need to develop a reliable computational method for precise determination of such sites, which can help researchers save time and effort. Herein, we propose a novel predictor, iGlycoS-PseAAC, that integrates Chou's Pseudo Amino Acid Composition (PseAAC) with relative/absolute position-based features. The self-consistency results show that the accuracy of the model on the benchmark dataset for prediction of O-linked glycosylation at serine sites is 98.8 percent. The overall accuracy of the predictor achieved through 10-fold cross validation by combining the positive and negative results is 97.2 percent. The overall accuracy achieved through the jackknife test, aggregating all the prediction results, is 96.195 percent. Thus the proposed predictor can help in predicting O-linked glycosylated serine sites in an efficient and accurate way. The overall results show that the accuracy of iGlycoS-PseAAC is higher than that of the existing tools.
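The jackknife test referred to in this abstract is commonly understood as leave-one-out validation: each sample is held out once, the model is fitted on the rest, and the held-out prediction is scored. The sketch below illustrates that protocol with a stand-in 1-nearest-neighbour rule on one-dimensional toy data; it is not the iGlycoS-PseAAC model, and all names and values are hypothetical.

```python
def jackknife_accuracy(samples, labels, predict):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(train_x, train_y, samples[i]) == labels[i]
    return hits / len(samples)


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier: label of the closest training value."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[best]


# Toy 1-D features; the last sample sits near the other class on purpose.
samples = [0, 1, 2, 10, 11, 8]
labels = [0, 0, 0, 1, 1, 0]
print(jackknife_accuracy(samples, labels, nearest_neighbour))
```

Because every sample is tested exactly once, the jackknife gives a single deterministic figure for a given dataset, unlike k-fold cross-validation, whose result depends on how the folds are drawn.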
13
Tabassum H, Ahmad IZ. Molecular Docking and Dynamics Simulation Analysis of Thymoquinone and Thymol Compounds from Nigella sativa L. that Inhibit Cag A and Vac A Oncoprotein of Helicobacter pylori: Probable Treatment of H. pylori Infections. Med Chem 2021; 17:146-157. [PMID: 32116195 DOI: 10.2174/1573406416666200302113729] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/24/2019] [Revised: 10/24/2019] [Accepted: 12/04/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Helicobacter pylori infection is accountable for most peptic ulcers and intestinal cancers. Owing to the rising resistance of H. pylori infection to the present, commonly used proton pump inhibitor regimens, the investigation of novel candidates is inevitable. Medicinal plants have always been a source of lead compounds for drug discovery. The research of the related effective enzymes linked with this gram-negative bacterium is critical for the discovery of novel drug targets. OBJECTIVE: The aim of the study is to identify the best candidate to evaluate the inhibitory effect of thymoquinone and thymol against the H. pylori oncoproteins Cag A and Vac A, in comparison to the standard drug, metronidazole, by using a computational approach. MATERIALS AND METHODS: The targeted oncoproteins, Cag A and Vac A, were retrieved from RCSB PDB. Lipinski's rule and ADMET toxicity profiling were carried out on the phytoconstituents of N. sativa. The two compounds of N. sativa were further analyzed by molecular docking and MD simulation studies. The reported phytoconstituents, thymoquinone and thymol, present in N. sativa were docked with H. pylori Cag A and Vac A oncoproteins. Structures of ligands were prepared using ChemDraw Ultra 10 software and then changed into their 3D PDB structures using Molinspiration, followed by energy minimization using Discovery Studio client 2.5. RESULTS: The docking results revealed the promising inhibitory potential of thymoquinone against Cag A and Vac A with docking energy of -5.81 kcal/mole and -3.61 kcal/mole, respectively. On the contrary, the inhibitory potential of thymol against Cag A and Vac A in terms of docking energy was -5.37 kcal/mole and -3.94 kcal/mole as compared to the standard drug, metronidazole, having docking energy of -4.87 kcal/mole and -3.20 kcal/mole, respectively. Further, molecular dynamics simulations were conducted for 5 ns for optimization, flexibility prediction, and determination of folded Cag A and Vac A oncoproteins stability. The Cag A and Vac A oncoprotein-TQ complexes were found to be quite stable with a root mean square deviation value of 0.2 nm. CONCLUSION: The computational approaches suggested that thymoquinone and thymol may play an effective pharmacological role to treat H. pylori infection. Hence, it could be summarized that the ligands thymoquinone and thymol bound and interacted well with the proteins Cag A and Vac A as compared to the ligand MTZ. Our study showed that all lead compounds had good interaction with Cag A and Vac A proteins and suggested them to be a useful target to inhibit H. pylori infection.
Affiliation(s)
- Heena Tabassum
  - Natural Products Laboratory, Department of Bioengineering, Integral University, Dasauli, Kursi Road, Lucknow- 226026, Uttar Pradesh, India
- Iffat Zareen Ahmad
  - Natural Products Laboratory, Department of Bioengineering, Integral University, Dasauli, Kursi Road, Lucknow- 226026, Uttar Pradesh, India
14
Dai X, Xu F, Wang S, Mundra PA, Zheng J. PIKE-R2P: Protein-protein interaction network-based knowledge embedding with graph neural network for single-cell RNA to protein prediction. BMC Bioinformatics 2021; 22:139. [PMID: 34078261 PMCID: PMC8170782 DOI: 10.1186/s12859-021-04022-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/08/2021] [Accepted: 02/11/2021] [Indexed: 12/05/2022] Open
Abstract
Background Recent advances in simultaneous measurement of RNA and protein abundances at single-cell level provide a unique opportunity to predict protein abundance from scRNA-seq data using machine learning models. However, existing machine learning methods have not considered relationship among the proteins sufficiently. Results We formulate this task in a multi-label prediction framework where multiple proteins are linked to each other at the single-cell level. Then, we propose a novel method for single-cell RNA to protein prediction named PIKE-R2P, which incorporates protein–protein interactions (PPI) and prior knowledge embedding into a graph neural network. Compared with existing methods, PIKE-R2P could significantly improve prediction performance in terms of smaller errors and higher correlations with the gold standard measurements. Conclusion The superior performance of PIKE-R2P indicates that adding the prior knowledge of PPI to graph neural networks can be a powerful strategy for cross-modality prediction of protein abundances at the single-cell level.
Affiliation(s)
- Xinnan Dai
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, Shanghai, 201210, China
- Fan Xu
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, Shanghai, 201210, China
- Shike Wang
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, Shanghai, 201210, China
- Piyushkumar A Mundra
- Molecular Oncology Group, Cancer Research UK Manchester Institute, The University of Manchester, Alderley Park, Manchester, UK
- Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, Shanghai, 201210, China
|
15
|
Islam MR, Islam MS, Sakeef N. RNA Secondary Structure Prediction with Pseudoknots Using Chemical Reaction Optimization Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1195-1207. [PMID: 31443047 DOI: 10.1109/tcbb.2019.2936570] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
RNA molecules, especially those containing pseudoknots, play a significant role in cell function. In past decades, several methods have been developed to predict RNA secondary structure with pseudoknots, and the most popular approach minimizes free energy. This is a nondeterministic polynomial-time hard (NP-hard) problem. We propose an approach based on a metaheuristic algorithm named Chemical Reaction Optimization (CRO) to solve the RNA pseudoknotted structure prediction problem. The reaction operators of the CRO algorithm have been redesigned and applied to the generated population to find the structure with the minimum free energy. In addition, we have developed an extra operator, called the Repair operator, which greatly increases the accuracy of our algorithm: it helps to increase the true positive base pairs while decreasing the false positive and false negative base pairs. Four energy models have been applied to calculate the energy. To evaluate performance, we used four datasets containing RNA pseudoknotted sequences taken from the RNA STRAND and Pseudobase++ databases. We compared the proposed approach with some existing algorithms and showed that our CRO-based model is a better prediction method in terms of accuracy and speed.
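The abstract above measures accuracy through true positive, false positive, and false negative base pairs. As a minimal sketch of how such counts are commonly turned into scores, the function below compares a predicted set of base pairs with a reference set. The function name, the (i, j) index-pair representation, and the choice of sensitivity, positive predictive value, and F1 are assumptions made for illustration; the abstract does not state which measures the authors report.

```python
def base_pair_scores(predicted, reference):
    """Score predicted base pairs against reference pairs.

    Both arguments are iterables of (i, j) position tuples; the order of
    i and j inside a pair is ignored. The representation is illustrative.
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}

    tp = len(pred & ref)   # pairs present in both structures
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs the prediction missed

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * sensitivity * ppv / (sensitivity + ppv)
          if sensitivity + ppv else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "ppv": ppv, "f1": f1}


if __name__ == "__main__":
    predicted = [(1, 10), (2, 9), (3, 8)]
    reference = [(1, 10), (9, 2), (4, 7)]
    print(base_pair_scores(predicted, reference))
```

Under this scoring, an operator that adds correct pairs and removes incorrect ones raises both sensitivity and positive predictive value.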
|
16
|
Khan YD, Alzahrani E, Alghamdi W, Ullah MZ. Sequence-based Identification of Allergen Proteins Developed by Integration of PseAAC and Statistical Moments via 5-Step Rule. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200424085947] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 01/24/2023]
Abstract
Background:
Allergens are antigens that can stimulate an atopic type I human hypersensitivity reaction by an immunoglobulin E (IgE) reaction. Some proteins are naturally more allergenic than others. The challenge for toxicologists is to identify properties that allow proteins to cause allergic sensitization and allergic diseases. The identification of allergen proteins is a critical and pivotal task. The experimental identification of protein functions is a hectic, laborious and costly task; therefore, computer scientists have proposed various methods in the field of computational biology and bioinformatics using various data science approaches.
Objectives:
Herein, we report a novel predictor for the identification of allergen proteins.
Methods:
For feature extraction, statistical moments and various position-based features have been incorporated into Chou’s pseudo amino acid composition (PseAAC) and are used to train a neural network.
Results:
The predictor is validated through 10-fold cross-validation and Jackknife testing, which gave accuracies of 99.43% and 99.87%, respectively.
Conclusions:
Thus, the proposed predictor can help in predicting allergen proteins in an efficient and accurate way and can provide baseline data for the discovery of new drugs and biomarkers.
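The two validation schemes named in the Results differ only in how the samples are partitioned. The sketch below is an illustrative, standard-library-only way to generate the index splits; it is not the authors' code. The function names are hypothetical, folds are taken as contiguous blocks without shuffling, and the jackknife is treated as the leave-one-out special case in which the number of folds equals the number of samples.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def jackknife_indices(n_samples):
    """Leave-one-out splits: every sample is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 5):
        print("10-fold style split, test indices:", test)
    for train, test in jackknife_indices(4):
        print("jackknife split, held-out sample:", test)
```

In practice the samples would usually be shuffled or stratified by class before splitting; that step is left out here to keep the sketch short.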
Affiliation(s)
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, C II Johar Town, Lahore 54770, Pakistan
- Ebraheem Alzahrani
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 80221, Jeddah, Saudi Arabia
- Malik Zaka Ullah
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
|
17
|
Du X, Hu J, Li S. Using Chou's 5-Step Rule to Predict DNA-Protein Binding with Multi-scale Complementary Feature. J Proteome Res 2021; 20:1639-1656. [PMID: 33522829 DOI: 10.1021/acs.jproteome.0c00864] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
It is well known that DNA-protein binding (DPB) prediction is not only beneficial to understand the regulation mechanism of gene expression but also a challenging task in the field of computational biology. Traditional methods for DPB prediction that depend on manually extracted features may lead to classification errors. Recently, deep learning such as convolutional neural network (CNN) has been successfully applied to classification tasks and improved DPB prediction performance significantly. Yet, these methods are based on the original DNA sequence modeling, ignoring the hidden complex dependency and complementarity between multiple sequence features. In consideration of this problem, we propose a method to fuse different sequence features and analyze them systematically through multi-scale CNN. First, sliding windows of specified lengths are set on distinct DNA sequences to generate multiple sequence features with unequal lengths. Second, multiple feature sequences are fused and encoded for feature representation. Third, multi-scale CNN with different binding motif lengths is used to automatically learn and mine the influence of internal attributes and hidden complex relations between the fusion sequence features and make full use of the complementary advantages of extracted CNN features to predict DPB. When our model is applied to 690 ChIP-seq datasets, it achieves an average AUC of 0.9112, which is significantly better than the latest methods. The results show that our method is effective for DPB prediction and is freely available at http://121.5.71.120/mscDPB/.
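The first step described above, sliding windows of several lengths over a DNA sequence, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the window widths, the stride, the one-hot encoding, and all function names are hypothetical, and the fusion and CNN stages are not shown.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Characters outside ACGT, such as N, become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def sliding_windows(seq, width, stride=1):
    """Return every substring of the given width, moving by stride."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, stride)]


def multi_scale_features(seq, widths=(3, 5)):
    """Map each window width to the one-hot encoded windows of that width."""
    return {w: [one_hot(win) for win in sliding_windows(seq, w)]
            for w in widths}


if __name__ == "__main__":
    features = multi_scale_features("ACGTAC")
    for width, windows in features.items():
        print(width, "->", len(windows), "windows")
```

Each width yields a different number of windows, which matches the abstract's remark that the generated sequence features have unequal lengths.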
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, China; School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China
- Jiajia Hu
- School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China
- Shuo Li
- Department of Medical Imaging, Western University, London, ON N6A 3K7, Canada
|
18
|
Hasan MM, Shoombuatong W, Kurata H, Manavalan B. Critical evaluation of web-based DNA N6-methyladenine site prediction tools. Brief Funct Genomics 2021; 20:258-272. [PMID: 33491072 DOI: 10.1093/bfgp/elaa028] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/31/2020] [Revised: 12/11/2020] [Accepted: 12/15/2020] [Indexed: 12/13/2022] Open
Abstract
Methylation of DNA N6-methyladenosine (6mA) is a type of epigenetic modification that plays pivotal roles in various biological processes. Accurate genome-wide identification of 6mA is a challenging task that is key to understanding its biological functions. Over the last 5 years, a number of bioinformatics approaches and tools for 6mA site prediction have been established, and some of them are easily accessible as web applications. These tools implement diverse encoding schemes, machine learning algorithms and feature selection methods, yet few systematic performance comparisons of 6mA site predictors have been reported, which matters especially in practical applications. In this review, 11 publicly available 6mA predictors were evaluated on seven species-specific datasets (Arabidopsis thaliana, Tolypocladium, Diospyros lotus, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans and Escherichia coli). Of those, a few species are close homologs, and the remaining datasets are distant sequences. Our independent validation tests demonstrated that the Meta-i6mA and MM-6mAPred models achieved excellent overall performance for A. thaliana, Tolypocladium, S. cerevisiae and D. melanogaster when compared with their counterparts. However, none of the existing methods was suitable for E. coli, C. elegans and D. lotus. The feasibility of the existing predictors is also discussed for the seven species. Our evaluation provides useful guidelines for the development of 6mA site predictors and helps biologists select suitable prediction tools.
Affiliation(s)
- Watshara Shoombuatong
- Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics in the Kyushu Institute of Technology, Japan
|
19
|
Identification of antioxidant proteins using a discriminative intelligent model of k-space amino acid pairs based descriptors incorporating with ensemble feature selection. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.10.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/20/2023]
|
20
|
Imai K, Nakai K. Tools for the Recognition of Sorting Signals and the Prediction of Subcellular Localization of Proteins From Their Amino Acid Sequences. Front Genet 2020; 11:607812. [PMID: 33324450 PMCID: PMC7723863 DOI: 10.3389/fgene.2020.607812] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/18/2020] [Accepted: 11/03/2020] [Indexed: 12/13/2022] Open
Abstract
At the time of translation, nascent proteins are thought to be sorted into their final subcellular localization sites based on parts of their amino acid sequences (i.e., sorting or targeting signals). Thus, it is of interest to computationally recognize these signals in the amino acid sequence of any given protein and to predict its final subcellular localization from such information, supplemented with additional information (e.g., k-mer frequency). This field has a long history and many prediction tools have been released. Even in this era of proteomic atlases at the single-cell level, researchers continue to develop new algorithms, aiming, for example, at assessing the impact of disease-causing mutations or cell type-specific alternative splicing. In this article, we overview the entire field and discuss its future direction.
Affiliation(s)
- Kenichiro Imai
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Kenta Nakai
- The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
21
|
Identification of ligand-binding residues using protein sequence profile alignment and query-specific support vector machine model. Anal Biochem 2020; 604:113799. [DOI: 10.1016/j.ab.2020.113799] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/01/2020] [Revised: 05/23/2020] [Accepted: 05/26/2020] [Indexed: 12/23/2022]
|
22
|
DNA6mA-MINT: DNA-6mA Modification Identification Neural Tool. Genes (Basel) 2020; 11:genes11080898. [PMID: 32764497 PMCID: PMC7463462 DOI: 10.3390/genes11080898] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 07/03/2020] [Revised: 07/28/2020] [Accepted: 07/28/2020] [Indexed: 11/16/2022] Open
Abstract
DNA N6-methyladenine (6mA) is part of numerous biological processes including DNA repair, DNA replication, and DNA transcription. The 6mA modification sites have a great impact on biological function. Biochemical experiments have been carried out to identify these sites and have demonstrated good results. However, they proved impractical when assessed in terms of cost and time. This led researchers to develop computational models for modification identification. Accordingly, we have developed a computational model following Chou’s 5-steps rule. The Neural Network (NN) model uses convolution layers to extract high-level features from the encoded binary sequence. These extracted features are then interpreted by a Long Short-Term Memory (LSTM) layer. The proposed architecture showed higher performance compared to state-of-the-art techniques. The proposed model is evaluated on Mus musculus, Rice, and “Combined-species” genomes with 5- and 10-fold cross-validation. Further, a user-friendly web server is publicly available and can be accessed freely.
|
23
|
Use Chou’s 5-steps rule to identify DNase I hypersensitive sites via dinucleotide property matrix and extreme gradient boosting. Mol Genet Genomics 2020; 295:1431-1442. [DOI: 10.1007/s00438-020-01711-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/14/2020] [Accepted: 07/11/2020] [Indexed: 01/08/2023]
|
24
|
Ju Z, Wang SY. Computational Identification of Lysine Glutarylation Sites Using Positive-Unlabeled Learning. Curr Genomics 2020; 21:204-211. [PMID: 33071614 PMCID: PMC7521029 DOI: 10.2174/1389202921666200511072327] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/25/2019] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/27/2022] Open
Abstract
Background
As a new type of protein acylation modification, lysine glutarylation has been found to play a crucial role in metabolic processes and mitochondrial functions. To further explore the biological mechanisms and functions of glutarylation, it is significant to predict the potential glutarylation sites. In the existing glutarylation site predictors, experimentally verified glutarylation sites are treated as positive samples and non-verified lysine sites as the negative samples to train predictors. However, the non-verified lysine sites may contain some glutarylation sites which have not been experimentally identified yet.
Methods
In this study, experimentally verified glutarylation sites are treated as the positive samples, whereas the remaining non-verified lysine sites are treated as unlabeled samples. A bioinformatics tool named PUL-GLU was developed to identify glutarylation sites using a positive-unlabeled learning algorithm.
Results
Experimental results show that PUL-GLU significantly outperforms the current glutarylation site predictors. Therefore, PUL-GLU can be a powerful tool for accurate identification of protein glutarylation sites.
Conclusion
A user-friendly web-server for PUL-GLU is available at http://bioinform.cn/pul_glu/.
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
- Shi-Yun Wang
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
|
25
|
Abstract
During the last three decades or so, many efforts have been made to study the protein cleavage sites by some disease-causing enzyme, such as HIV (Human Immunodeficiency Virus) protease and SARS (Severe Acute Respiratory Syndrome) coronavirus main proteinase. It has become increasingly clear via this mini-review that the motivation driving the aforementioned studies is quite wise, and that the results acquired through these studies are very rewarding, particularly for developing peptide drugs.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
|
26
|
Bouziane H, Chouarfia A. Use of Chou's 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multi-label learning based on gene ontology annotation and profile alignment. J Integr Bioinform 2020; 18:51-79. [PMID: 32598314 PMCID: PMC8035964 DOI: 10.1515/jib-2019-0091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/08/2019] [Accepted: 04/08/2020] [Indexed: 12/31/2022] Open
Abstract
To date, many proteins generated by large-scale genome sequencing projects are still uncharacterized and subject to intensive investigation by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for protein function elucidation. However, SCL prediction remains a challenging task, especially for multiple-site proteins known to shuttle between cell compartments to perform their proper biological functions and for proteins that have no significant homology to proteins of known subcellular locations. Due to their low cost and reasonable accuracy, machine learning-based methods have gained much attention in this context, with the availability of a plethora of biological databases and annotated proteins for analysis and benchmarking. Various predictive models have been proposed to tackle the SCL problem using different protein sequence features pertaining to subcellular localization; however, the overwhelming majority of them focus on single localization and cover very limited cellular locations. Prediction was basically established on sorting signals, amino acid compositions, and homology. To improve prediction quality, the focus is now on knowledge extracted from annotation databases, such as protein-protein interactions and Gene Ontology (GO) functional domain annotation, which has recently become widely adopted and essential information for learning systems. To deal with this problem, in the present study we treated the SCL prediction task as a multi-label learning problem and tried to label both single-site and multiple-site unannotated bacterial protein sequences by mining protein homology relationships using both the GO terms of protein homologs and PSI-BLAST profiles. The experiments using 5-fold cross-validation tests on the benchmark datasets showed a significant improvement in the results obtained by the proposed consensus multi-label prediction model, which discriminates six compartments for Gram-negative and five compartments for Gram-positive bacterial proteins.
Affiliation(s)
- Hafida Bouziane
- Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB BP 1505, El M’Naouer, 31000, Oran, Algeria
- Abdallah Chouarfia
- Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB BP 1505, El M’Naouer, 31000, Oran, Algeria
|
27
|
Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE One of the most challenging and also the most difficult problems is how to formulate a biological sequence with a vector but considerably keep its sequence order information. METHODS To address such a problem, the approach of Pseudo Amino Acid Components or PseAAC has been developed. RESULTS AND CONCLUSION It has become increasingly clear via the 10-year recollection that the aforementioned proposal has been indeed very powerful.
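For orientation, the pseudo amino acid composition mentioned above is usually written in the following general form. This is a sketch of the commonly cited formulation, not text from the abstract; the symbols $f_i$ (normalized occurrence frequency of the $i$-th amino acid), $w$ (weight factor), $\lambda$ (number of correlation tiers), $N$ (sequence length), $R_i$ (residue at position $i$) and the correlation function $\Theta$ are introduced here for illustration.

```latex
\[
\mathbf{X} = \bigl[x_1, \dots, x_{20}, x_{20+1}, \dots, x_{20+\lambda}\bigr]^{\mathsf T}
\]
\[
x_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{20} f_i + w \sum_{j=1}^{\lambda} \theta_j}, & 1 \le u \le 20, \\[2ex]
\dfrac{w\,\theta_{u-20}}{\sum_{i=1}^{20} f_i + w \sum_{j=1}^{\lambda} \theta_j}, & 21 \le u \le 20+\lambda,
\end{cases}
\]
\[
\theta_k = \frac{1}{N-k} \sum_{i=1}^{N-k} \Theta\bigl(R_i, R_{i+k}\bigr), \qquad k < N .
\]
```

The first 20 components carry the ordinary amino acid composition, while the remaining $\lambda$ components retain part of the sequence-order information through the tiered correlation factors $\theta_k$, which is the problem the OBJECTIVE sentence describes.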
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, Massachusetts 02478, United States; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
28
|
Mohabatkar H, Ebrahimi S, Moradi M. Using Chou’s Five-steps Rule to Classify and Predict Glutathione S-transferases with Different Machine Learning Algorithms and Pseudo Amino Acid Composition. Int J Pept Res Ther 2020. [DOI: 10.1007/s10989-020-10087-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/24/2023]
|
29
|
Pandey RP, Kumar S, Ahmad S, Vibhuti A, Raj VS, Verma AK, Sharma P, Leal E. Use Chou's 5-steps rule to evaluate protective efficacy induced by antigenic proteins of Mycobacterium tuberculosis encapsulated in chitosan nanoparticles. Life Sci 2020; 256:117961. [PMID: 32534039 DOI: 10.1016/j.lfs.2020.117961] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022]
Abstract
The study focuses on whether antigenic proteins encapsulated in biopolymeric nanoparticles can augment protective efficacy. Chitosan nanoparticles (ChN) were prepared by the ionic gelation method, and the Culture Filtrate Proteins (CFP) CFP-10 and CFP-21 of Mycobacterium tuberculosis (Mtb) were encapsulated in ChN. The binding efficiency of the nanoparticles with the CFP-10 and CFP-21 proteins was confirmed with a UV spectrophotometer. The efficacy of nanoparticle-encapsulated antigenic proteins administered intraperitoneally against Mtb aerosol infection was evaluated in Balb/c mice. Protection was assessed by bacterial counts [CFU]. Cells primed with CFP-10 and CFP-21 proteins demonstrated a Th1-biased T cell response in an ex vivo assay. ChN-CFP-10 and ChN-CFP-21 nanoparticles have both protective and therapeutic potential against Mtb. In the group of mice immunized with ChN-CFP-10, the number of colonies decreased significantly from day 15 to day 60. Mice immunized with ChN-CFP-21 showed maximum protection. ChN-CFP-10 and ChN-CFP-21 clearly showed enhanced protection against Mtb.
Affiliation(s)
- Ramendra Pati Pandey
- Centre for Drug Design Discovery and Development (C4D), SRM University, Delhi-NCR, Rajiv Gandhi Education City, Sonepat 131 029, Haryana, India
- Santosh Kumar
- ICGEB (International Centre For Genetic Engineering And Biotechnology), New Delhi 110067, India
- Saheem Ahmad
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, University of Ha'il, Ha'il, 55476, Saudi Arabia
- Arpana Vibhuti
- Centre for Drug Design Discovery and Development (C4D), SRM University, Delhi-NCR, Rajiv Gandhi Education City, Sonepat 131 029, Haryana, India
- V Samuel Raj
- Centre for Drug Design Discovery and Development (C4D), SRM University, Delhi-NCR, Rajiv Gandhi Education City, Sonepat 131 029, Haryana, India
- Anita Kamra Verma
- Nano-Biotech Laboratory, Department of Zoology, Kirori Mal College, University of Delhi, New Delhi 110003, India
- Pawan Sharma
- ICGEB (International Centre For Genetic Engineering And Biotechnology), New Delhi 110067, India
- Elcio Leal
- Institute of Biological Sciences, Federal University of Para, Para 66075-000, Brazil
|
30
|
Prediction of N6-methyladenosine sites using convolution neural network model based on distributed feature representations. Neural Netw 2020; 129:385-391. [PMID: 32593932 DOI: 10.1016/j.neunet.2020.05.027] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/26/2020] [Revised: 05/21/2020] [Accepted: 05/24/2020] [Indexed: 01/24/2023]
Abstract
N6-methyladenosine (m6A) is a well-studied and the most common internal messenger RNA (mRNA) modification, and it plays an important role in cell development. m6A is found in all kingdoms of life and is involved in many other cellular processes, such as RNA splicing, immune tolerance, regulatory functions, RNA processing, and cancer. Despite the crucial role of m6A in cells, earlier computational attempts to identify it yielded unsatisfactory results. It is imperative to develop an efficient computational model that can truly represent m6A sites. In this regard, an intelligent and highly discriminative computational model, m6A-word2vec, is introduced for the discrimination of m6A sites. Here, a concept from natural language processing, word2vec, is used to represent the motifs of the target class automatically. These motifs (numerical descriptors) are extracted automatically from the human genome without any explicit definition. The extracted feature space is then passed to the convolution neural network model as input for prediction. The developed computational model obtained 83.17%, 92.69%, and 90.50% accuracy on benchmark datasets S1, S2, and S3, respectively, using a 10-fold cross-validation test. The predictive outcomes show that the developed model performed better than existing computational models. It is thus anticipated that the introduced computational model "m6A-word2vec" may be a supportive and practical tool for basic and pharmaceutical research, such as drug design, as well as for academia.
|
31
|
Zhao X, Min Z, Wei X, Ju Y, Fang Y. Using the Chou's 5-steps rule, transient overexpression technique, subcellular location, and bioinformatic analysis to verify the function of Vitis vinifera O-methyltranferase 3 (VvOMT3) protein. Plant Physiology and Biochemistry 2020; 151:621-629. [PMID: 32335385 DOI: 10.1016/j.plaphy.2020.04.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2020] [Revised: 04/10/2020] [Accepted: 04/10/2020] [Indexed: 06/11/2023]
Abstract
3-Isobutyl-2-methoxypyrazine (IBMP) is an important odor compound that revives unripe grapes or poor-quality wine. The biosynthesis of IBMP in grape berries is under the catalysis of Vitis vinifera O-methyltranferase 3 (VvOMT3). The homologous verification in this paper was carried out with the transient overexpression technique. The results showed that both the expression levels of the VvOMT3 gene and the IBMP concentration in 'Red globe' grapes increased significantly, which suggested that VvOMT3 could function in the biosynthesis of IBMP. Based on β-glucuronidase (GUS) staining results, blue color was only observed in grape pulp, not in grape skin, which indicated that VvOMT3 was expressed in grape pulp. The outcomes of the subcellular location examination performed on the protoplasts of Arabidopsis thaliana showed that the VvOMT3 protein was located on the inner surface of the cytoplasmic membrane. In summary, the VvOMT3 enzyme may function at the inner surface of the cytoplasmic membrane of pulp cells during grape development. These results will provide a background for future research on the catalytic mechanisms of VvOMT3.
Affiliation(s)
- Xianfang Zhao
- College of Enology, Heyang Viti-viniculture Station, Northwest A & F University, Yangling, 712100, Shaanxi, China; Life School of Science and Technology, Henan Institute of Science and Technology, Xinxiang, 453003, Henan, China
- Zhuo Min
- College of Enology, Heyang Viti-viniculture Station, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Xiaofeng Wei
- College of Enology, Heyang Viti-viniculture Station, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Yanlun Ju
- College of Enology, Heyang Viti-viniculture Station, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Yulin Fang
- College of Enology, Heyang Viti-viniculture Station, Northwest A & F University, Yangling, 712100, Shaanxi, China
|
32
|
|
33
|
|
34
|
Zheng L, Huang S, Mu N, Zhang H, Zhang J, Chang Y, Yang L, Zuo Y. RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule. Database: The Journal of Biological Databases and Curation 2020; 2019:5650975. [PMID: 31802128 PMCID: PMC6893003 DOI: 10.1093/database/baz131] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 06/20/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/12/2022]
Abstract
By reducing the amino acid alphabet, protein complexity can be significantly simplified, which can improve computational efficiency, decrease information redundancy and reduce the chance of overfitting. Although some reduced alphabets have been proposed, different classification rules can produce distinct results for protein sequence analysis. Thus, a systematic framework for reduced alphabets is urgently needed. In this work, we constructed a comprehensive web server called RAACBook for protein sequence analysis and machine learning applications by integrating reduced alphabets. The web server contains three parts: (i) 74 types of reduced amino acid alphabet were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with unique protein problems. Users can easily select the desired RAACs from a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequence of a protein. The tool can produce the K-tuple reduced amino acid composition by defining three correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as sequence alignment, mergence of RAA composition, feature distribution and logo of the reduced sequence. (iii) A machine learning server is provided to train protein classification models based on K-tuple RAAC. The optimal model can be selected according to the evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook presents a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook. Database URL: http://bioinfor.imu.edu.cn/raacbook
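The idea of a K-tuple composition over a reduced alphabet with a gap parameter can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the five-cluster grouping is hypothetical and is not one of the 74 alphabets collected by RAACBook, the gap is interpreted as the number of residues skipped between consecutive tuple positions, and the λ-correlation parameter is not modeled.

```python
from collections import Counter
from itertools import product

# Hypothetical 5-cluster reduced alphabet, for illustration only.
# Each cluster is represented by its first letter.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]
REDUCE = {aa: cluster[0] for cluster in CLUSTERS for aa in cluster}
LETTERS = [cluster[0] for cluster in CLUSTERS]


def reduce_sequence(seq):
    """Map each standard residue to its cluster letter; skip other symbols."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def gapped_k_tuple_composition(seq, k=2, gap=0):
    """Return the frequency of every reduced K-tuple.

    Consecutive tuple positions are separated by `gap` skipped residues,
    so gap=0 gives ordinary contiguous K-tuples.
    """
    reduced = reduce_sequence(seq)
    step = gap + 1
    span = (k - 1) * step + 1  # residues covered by one gapped tuple
    counts = Counter(reduced[i:i + span:step]
                     for i in range(len(reduced) - span + 1))
    total = sum(counts.values())
    return {"".join(t): (counts["".join(t)] / total if total else 0.0)
            for t in product(LETTERS, repeat=k)}


if __name__ == "__main__":
    comp = gapped_k_tuple_composition("AVKRDE", k=2, gap=0)
    print({tup: freq for tup, freq in comp.items() if freq > 0})
```

With five clusters and K = 2 the feature vector has 25 entries instead of the 400 needed for the full 20-letter alphabet, which is the kind of dimensionality reduction the abstract refers to.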
Affiliation(s)
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Nengjiang Mu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Haoyue Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Jiayu Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Yu Chang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Baojian Road No.157, Harbin 150081, China
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
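As a rough illustration of the K-tuple reduced amino acid composition with a g-gap mentioned in the RAACBook abstract above, the sketch below maps residues onto cluster labels and counts gapped K-tuples. The five-cluster grouping and the function names `reduce_sequence` and `ktuple_composition` are assumptions made for this example only; they are not taken from RAACBook, and the λ-correlation parameter is not modelled.

```python
from collections import Counter
from itertools import product

# Hypothetical 5-cluster grouping of the 20 standard residues (illustration only).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]
RESIDUE_TO_CLUSTER = {aa: str(i) for i, group in enumerate(CLUSTERS) for aa in group}


def reduce_sequence(seq):
    """Replace each residue with the label of its cluster."""
    return "".join(RESIDUE_TO_CLUSTER[aa] for aa in seq.upper())


def ktuple_composition(reduced, k=2, gap=0):
    """Frequency of every K-tuple whose members are `gap` positions apart."""
    step = gap + 1
    span = (k - 1) * step  # distance from first to last member of a tuple
    counts = Counter(
        reduced[i:i + span + 1:step] for i in range(len(reduced) - span)
    )
    total = sum(counts.values())
    alphabet = sorted(set(RESIDUE_TO_CLUSTER.values()))
    return {
        "".join(t): (counts["".join(t)] / total if total else 0.0)
        for t in product(alphabet, repeat=k)
    }


reduced = reduce_sequence("AKDF")            # one label per residue
features = ktuple_composition(reduced, k=2)  # 5**2 = 25 features
```

With five clusters and K = 2 the feature vector has 25 entries instead of the 400 needed for the full 20-letter alphabet, which is the kind of dimensionality reduction the abstract refers to.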
35
Rehman AU, Olof Olsson P, Khan N, Khan K. Identification of Human Secretome and Membrane Proteome-Based Cancer Biomarkers Utilizing Bioinformatics. J Membr Biol 2020; 253:257-270. [PMID: 32415382 DOI: 10.1007/s00232-020-00122-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/15/2020] [Accepted: 05/02/2020] [Indexed: 12/12/2022]
Abstract
Cellular secreted proteins (the secretome) and cellular membrane proteins, collectively referred to as secretory and membrane proteins (SMPs), are a large potential source of biomarkers because they can indicate cell types and conditions. SMPs have been shown to be ideal candidates for several clinically approved drug regimens, including for cancer. This study aimed to perform a functional analysis of SMPs within different cancer subtypes to identify clinical targets for potential prognostic, diagnostic and therapeutic use. Using an innovative majority decision-based algorithm and transcriptomic data spanning 5 cancer types and over 3000 samples, we quantified the relative difference in SMP gene expression compared to normal adjacent tissue. A detailed data mining analysis revealed a consistent group of downregulated SMP isoforms, enriched in hematopoietic cell lineages (HCL), in multiple cancer types. HCL-associated genes were frequently downregulated in successive cancer stages, and high expression was associated with good patient prognosis. In addition, we suggest a potential mechanism by which cancer cells suppress HCL signaling by reducing the expression of immune-related genes. Our data identified potential biomarkers for cancer immunotherapy. We conclude that our approach may be applicable to the delineation of other types of cancer and may illuminate specific targets for therapeutics and diagnostics.
Affiliation(s)
- Adeel Ur Rehman
- Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Diseases, School of Life Sciences, University of Science and Technology of China, Hefei, 230027, China.
- Naveed Khan
- Max Planck Partner Institute of Computational Biology, Shanghai Institute of Biological Sciences, Shanghai, 200032, China
- Khalid Khan
- Department of Respiratory and Critical Care Medicine, The Second Clinical Medical College (Shenzhen People's Hospital) of Jinan University, Shenzhen Institute of Respiratory Diseases, Shenzhen, China.,Integrated Chinese and Western Medicine Postdoctoral Research Station, Jinan University, Guangzhou, China
36
Hasan MM, Manavalan B, Shoombuatong W, Khatun MS, Kurata H. i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation. Plant Mol Biol 2020; 103:225-234. [PMID: 32140819 DOI: 10.1007/s11103-020-00988-y] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 02/06/2020] [Accepted: 02/29/2020] [Indexed: 05/28/2023]
Abstract
DNA N6-methyladenine (6mA) is one of the most vital epigenetic modifications and is involved in controlling various gene expression levels. With the avalanche of DNA sequences generated in numerous databases, the accurate identification of 6mA plays an essential role in understanding molecular mechanisms. Because experimental approaches are time-consuming and costly, it is desirable to develop a computational model for rapidly and accurately identifying 6mA. To the best of our knowledge, we are the first to propose a computational model, named i6mA-Fuse, to predict 6mA sites in the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca. We implemented five encoding schemes, i.e., mononucleotide binary, dinucleotide binary, k-space spectral nucleotide, k-mer, and electron-ion interaction pseudo potential compositions, to build five single-encoding random forest (RF) models. i6mA-Fuse uses a linear regression model to combine the predicted probability scores of the five single-encoding RF models. The resultant species-specific i6mA-Fuse achieved remarkably high performance, with AUCs of 0.982 and 0.978 and MCCs of 0.869 and 0.858 on the independent datasets of Rosa chinensis and Fragaria vesca, respectively. In the F. vesca-specific i6mA-Fuse, MBE and EIIP contributed 75% and 25% of the total prediction; in the R. chinensis-specific i6mA-Fuse, Kmer, MBE, and EIIP contributed 15%, 65%, and 20% of the total prediction. To assist high-throughput prediction of DNA 6mA sites, i6mA-Fuse is publicly accessible at https://kurata14.bio.kyutech.ac.jp/i6mA-Fuse/.
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
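The score-fusion step described in the i6mA-Fuse abstract above can be pictured as a weighted sum of per-encoding probability scores. The sketch below is only an illustration: it reads the reported contributions (75% MBE and 25% EIIP for F. vesca; 15% Kmer, 65% MBE and 20% EIIP for R. chinensis) as linear weights, which is an assumption, and the function name `fuse_scores` and the example probabilities are hypothetical.

```python
def fuse_scores(scores, weights):
    """Weighted sum of per-encoding probability scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for encodings: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Contributions reported in the abstract, read here (as an assumption)
# as the weights of a linear combination.
F_VESCA_WEIGHTS = {"MBE": 0.75, "EIIP": 0.25}
R_CHINENSIS_WEIGHTS = {"Kmer": 0.15, "MBE": 0.65, "EIIP": 0.20}

# Hypothetical probability scores from two single-encoding RF models.
example_scores = {"MBE": 0.8, "EIIP": 0.4}
fused = fuse_scores(example_scores, F_VESCA_WEIGHTS)
```

Because each set of weights sums to one, the fused value stays in the same 0 to 1 range as the individual probability scores.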
37
Wang S, Wang Y, Yu C, Cao Y, Yu Y, Pan Y, Su D, Lu Q, Yang W, Zuo Y, Yang L. Characterization of the relationship between FLI1 and immune infiltrate level in tumour immune microenvironment for breast cancer. J Cell Mol Med 2020; 24:5501-5514. [PMID: 32249526 PMCID: PMC7214163 DOI: 10.1111/jcmm.15205] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/20/2019] [Revised: 01/31/2020] [Accepted: 03/06/2020] [Indexed: 12/24/2022] Open
Abstract
Breast cancer is the most common cancer and the leading cause of cancer death among women worldwide. Tumour-infiltrating lymphocytes are defined as white blood cells that have left the vasculature and localized in tumours. Recently, tumour-infiltrating lymphocytes were found to be associated with good prognosis and response to immunotherapy in tumours. In this study, to examine the influence of FLI1 on the immune system in breast cancer, we interrogated the relationship between FLI1 expression levels and the infiltration levels of 28 immune cell types. By splitting the breast cancer samples into high and low FLI1 expression subtypes, we found that the high FLI1 expression subtype was enriched in many immune cell types, and that the up-regulated differentially expressed genes between the two subtypes were enriched in immune system processes, immune-related KEGG pathways and biological processes. In addition, many important immune-related features were found to be positively correlated with the FLI1 expression level. Furthermore, we found that FLI1 was correlated with immune-related genes. Our findings may help clarify the relationship between the tumour immune microenvironment and FLI1, and may shed light on clinical outcomes and immunotherapy utility for FLI1 in breast cancer.
Affiliation(s)
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yakun Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chunlu Yu
- Public Health College, Harbin Medical University, Harbin, China
- Yiyin Cao
- Public Health College, Harbin Medical University, Harbin, China
- Yao Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yi Pan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wuritu Yang
- The State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Yongchun Zuo
- The State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
38
Identifying FL11 subtype by characterizing tumor immune microenvironment in prostate adenocarcinoma via Chou's 5-steps rule. Genomics 2020; 112:1500-1515. [DOI: 10.1016/j.ygeno.2019.08.021] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/28/2019] [Revised: 08/03/2019] [Accepted: 08/26/2019] [Indexed: 12/14/2022]
39
Zheng H, Yang H, Gong D, Mai L, Qiu X, Chen L, Su X, Wei R, Zeng Z. Progress in the Mechanism and Clinical Application of Cilostazol. Curr Top Med Chem 2020; 19:2919-2936. [PMID: 31763974 DOI: 10.2174/1568026619666191122123855] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/27/2019] [Revised: 07/27/2019] [Accepted: 08/02/2019] [Indexed: 12/20/2022]
Abstract
Cilostazol is a unique platelet inhibitor that has been used clinically for more than 20 years. As a phosphodiesterase type III inhibitor, cilostazol is capable of reversible inhibition of platelet aggregation and vasodilation, has antiproliferative effects, and is widely used in the treatment of peripheral arterial disease, cerebrovascular disease, percutaneous coronary intervention, etc. This article briefly reviews the pharmacological mechanisms and clinical application of cilostazol.
Affiliation(s)
- Huilei Zheng
- Department of Medical Examination & Health Management, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China.,Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention,Nanning, Guangxi, China.,Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Hua Yang
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention,Nanning, Guangxi, China.,Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China.,Department of Critical Care Medicine, Second People's Hospital of Nanning, Nanning, Guangxi, China
- Danping Gong
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention,Nanning, Guangxi, China.,Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China.,Elderly Cardiology Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Lanxian Mai
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention,Nanning, Guangxi, China.,Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China.,Disciplinary Construction Office, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Xiaoling Qiu
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention,Nanning, Guangxi, China.,Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Lidai Chen
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention,Nanning, Guangxi, China.,Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Xiaozhou Su
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention,Nanning, Guangxi, China.,Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Ruoqi Wei
- Department of Computer Science and Engineering, University of Bridgeport,126 Park Ave, BRIDGEPORT, CT 06604, United States
- Zhiyu Zeng
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention,Nanning, Guangxi, China.,Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China.,Elderly Cardiology Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
40
Charoenkwan P, Kanthawong S, Schaduangrat N, Yana J, Shoombuatong W. PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method. Cells 2020; 9:E353. [PMID: 32028709 PMCID: PMC7072630 DOI: 10.3390/cells9020353] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 12/26/2019] [Revised: 01/20/2020] [Accepted: 01/27/2020] [Indexed: 12/16/2022] Open
Abstract
Although existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two methods do not allow interpretability. However, the characterization and analysis of PVPs might be of great significance to understanding the molecular mechanisms of bacteriophage genetics and the development of antibacterial drugs. Hence, we herein propose a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using a statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM utilizing only dipeptide composition yielded an accuracy of 77.56%, indicating that PVPred-SCM performed well relative to the state-of-the-art method utilizing a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, PVPred-SCM was found to be superior to the existing methods considering its simplicity, interpretability, and implementation. Finally, to facilitate high-throughput prediction of PVPs, we provide a user-friendly web server for estimating the likelihood that query sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool for, or at least a complement to existing methods for, predicting and analyzing PVPs.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand;
- Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand;
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
- Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand;
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
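To make the dipeptide-based scoring idea in the PVPred-SCM abstract above more concrete, the sketch below computes the 400-dimensional dipeptide composition of a sequence and combines it with a table of propensity scores. The weighted-sum form of the score, the function names, and the propensity values in the usage lines are assumptions for illustration; the actual propensity scores and decision threshold would come from the published method.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 entries


def dipeptide_composition(seq):
    """Fraction of each of the 400 dipeptides among all overlapping pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return {dp: (n / total if total else 0.0) for dp, n in counts.items()}


def scm_score(seq, propensity):
    """Composition-weighted sum of dipeptide propensity scores (assumed form)."""
    comp = dipeptide_composition(seq)
    return sum(comp[dp] * propensity.get(dp, 0.0) for dp in DIPEPTIDES)


# Hypothetical propensity scores for two dipeptides; all others default to 0.
toy_propensity = {"AA": 800.0, "AK": 200.0}
score = scm_score("AAK", toy_propensity)
```

A sequence would then be labelled positive when its score exceeds a chosen threshold, and the per-dipeptide scores stay directly inspectable, which is the interpretability advantage the abstract emphasizes.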
41
Some illuminating remarks on molecular genetics and genomics as well as drug development. Mol Genet Genomics 2020; 295:261-274. [PMID: 31894399 DOI: 10.1007/s00438-019-01634-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023]
Abstract
Facing the explosive growth of biological sequences unearthed in the post-genomic age, one of the most important but also most difficult problems in computational biology is how to express a biological sequence with a discrete model or a vector while still retaining considerable sequence-order information or its special pattern. To deal with this challenging problem, the ideas of "pseudo amino acid components" and "pseudo K-tuple nucleotide composition" have been proposed. These ideas and their approaches have further stimulated the birth of the "distorted key theory" and the "wenxiang diagram", substantially strengthened the power to treat multi-label systems, and led to the establishment of the famous "5-steps rule". All these developments follow naturally from one another and are very useful not only for theoretical scientists but also for experimental scientists conducting genetics/genomics analysis and drug development. This review paper also presents their future perspectives; i.e., their impacts will become even more significant and profound.
42
Using Chou’s 5-Step Rule to Evaluate the Stability of Tautomers: Susceptibility of 2-[(Phenylimino)-methyl]-cyclohexane-1,3-diones to Tautomerization Based on the Calculated Gibbs Free Energies. Energies 2020. [DOI: 10.3390/en13010183] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/19/2023]
Abstract
Gibbs free energies, based on DFT (Density Functional Theory) calculations, prove that enaminone (2-(anilinemethylidene)cyclohexane-1,3-dione) and ketimine (2-[(phenylimino)-methyl]cyclohexane-1,3-dione) are the most and least stable tautomeric forms of the studied systems, respectively. 1H and 13C NMR spectra prove that 2-(anilinemethylidene)cyclohexane-1,3-diones are the only tautomeric species present in dimethylsulfoxide solution (a very weak signal can be seen only for the p-methoxy derivatives). The zwitterionic character of these enaminones is strengthened by naphthoannulation and by the insertion of an electron-withdrawing substituent into the benzene ring (the latter weakens the intramolecular hydrogen bond in the compound). Substituents and naphthoannulation have no effect on the stability of the studied tautomers. Slight twisting of the benzene ring with respect to the CArNC plane (seen in the crystalline state) was shown to also take place in vacuum and in solution.
43
iQSP: A Sequence-Based Tool for the Prediction and Analysis of Quorum Sensing Peptides via Chou's 5-Steps Rule and Informative Physicochemical Properties. Int J Mol Sci 2019; 21:ijms21010075. [PMID: 31861928 PMCID: PMC6981611 DOI: 10.3390/ijms21010075] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 11/18/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 01/18/2023] Open
Abstract
Understanding the functional mechanisms of quorum-sensing peptides (QSPs) plays an essential role in finding new opportunities to combat bacterial infections through drug design. With the avalanche of newly available peptide sequences in the post-genomic age, it is highly desirable to develop a computational model for efficient, rapid and high-throughput QSP identification based purely on peptide sequence information. Although a few methods have been developed for predicting QSPs, their prediction accuracy and interpretability still require further improvement. Thus, in this work, we propose an accurate sequence-based predictor (called iQSP) and a set of interpretable rules (called IR-QSP) for predicting and analyzing QSPs. In iQSP, we utilized a powerful support vector machine (SVM) together with 18 informative features from physicochemical properties (PCPs). A rigorous independent validation test showed that iQSP achieved a maximum accuracy and MCC of 93.00% and 0.86, respectively. Furthermore, a set of interpretable rules, IR-QSP, was extracted using a random forest model and the 18 informative PCPs. Finally, for the convenience of experimental scientists, the iQSP web server was established and made freely available online. It is anticipated that iQSP will become a useful tool for, or at least a complement to existing methods for, predicting and analyzing QSPs.
44
Chou KC. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis. Curr Top Med Chem 2019; 19:2283-2300. [DOI: 10.2174/1568026619666191018100141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/07/2019] [Revised: 08/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023]
Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progress in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one for inspecting the global prediction quality, and the other for the local prediction quality. All the predictors covered in this review have a user-friendly web server, through which the majority of experimental scientists can easily obtain their desired data without needing to go through the complicated mathematics.
Affiliation(s)
- Kuo-Chen Chou
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
45
Xuan P, Cui H, Shen T, Sheng N, Zhang T. HeteroDualNet: A Dual Convolutional Neural Network With Heterogeneous Layers for Drug-Disease Association Prediction via Chou's Five-Step Rule. Front Pharmacol 2019; 10:1301. [PMID: 31780934 PMCID: PMC6856670 DOI: 10.3389/fphar.2019.01301] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 11/14/2022] Open
Abstract
Identifying new treatments for existing drugs can help reduce drug development costs and explore novel indications of drugs. The prediction of associations between drugs and diseases is challenging because their similarities and relations are complicated and non-linear. We propose a HeteroDualNet model to address this issue. Firstly, three types of matrices are extracted to represent intra-drug similarities, intra-disease similarities and drug-disease associations. The intra-drug similarities consider three drug features and a newly introduced drug-related disease correlation. Secondly, an embedding mechanism is proposed to integrate these matrices in a heterogeneous drug-disease association layer (hetero-layer). Further, a neighbouring heterogeneous layer (hetero-layer-N) is constructed to incorporate the biological premise that similar drugs can often treat related diseases. Finally, a dual convolutional neural network is built with hetero-layer and hetero-layer-N as two branches to learn from the characteristics of drug-disease pairs and the relations of their neighbours simultaneously. HeteroDualNet outperformed the four other methods in the comparison on a public dataset of 763 drugs and 681 diseases in terms of the areas under the receiver operating characteristic and precision-recall curves, and the recall rate at top k. Case studies of five drugs further demonstrated the capacity of HeteroDualNet to find reliable disease candidates for drugs, as validated by database records or literature. Our findings show that the embedded heterogeneous layers of original and neighbouring drug-disease representations in a dual neural network improved the association prediction performance.
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Bundoora, VIC, Australia
- Tonghui Shen
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Nan Sheng
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin, China
46
Zhou GP. The Impact of Biophysics on Medicinal Chemistry. Curr Med Chem 2019; 26:4916-4917. [PMID: 37020360 DOI: 10.2174/092986732626190930142417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
47
Lan J, Liu Z, Liao C, Merkler DJ, Han Q, Li J. A Study for Therapeutic Treatment against Parkinson's Disease via Chou's 5-steps Rule. Curr Top Med Chem 2019; 19:2318-2333. [PMID: 31629395 DOI: 10.2174/1568026619666191019111528] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/22/2019] [Revised: 08/05/2019] [Accepted: 08/22/2019] [Indexed: 11/22/2022]
Abstract
The enzyme L-DOPA decarboxylase (DDC), also called aromatic-L-amino-acid decarboxylase, catalyzes the biosynthesis of dopamine, serotonin, and trace amines. Its deficiency or perturbations in expression result in severe motor dysfunction or a range of neurodegenerative and psychiatric disorders. A DDC substrate, L-DOPA, combined with an inhibitor of the enzyme is still the most effective treatment for symptoms of Parkinson's disease. In this review, we provide an update regarding the structures, functions, and inhibitors of DDC, particularly with regards to the treatment of Parkinson's disease. This information will provide insight into the pharmacological treatment of Parkinson's disease.
Affiliation(s)
- Jianqiang Lan
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, Hainan 570228, China
- Zhongqiang Liu
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, Hainan 570228, China
- Chenghong Liao
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, Hainan 570228, China
- David J Merkler
- Department of Chemistry, University of South Florida, Tampa, FL, 33620, United States
- Qian Han
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, Hainan 570228, China
- Jianyong Li
- Department of Biochemistry, Virginia Tech, Blacksburg, VA 24061, United States
48
Liang R, Xie J, Zhang C, Zhang M, Huang H, Huo H, Cao X, Niu B. Identifying Cancer Targets Based on Machine Learning Methods via Chou's 5-steps Rule and General Pseudo Components. Curr Top Med Chem 2019; 19:2301-2317. [PMID: 31622219 DOI: 10.2174/1568026619666191016155543] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/14/2019] [Revised: 07/19/2019] [Accepted: 08/26/2019] [Indexed: 01/09/2023]
Abstract
In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental and lifestyle factors should be studied together to understand cancer, owing to the complexity and various forms of the disease. The increasing availability and growth rate of 'big data' derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods in handling cancer big data, including the use of artificial neural networks, support vector machines, ensemble learning and naïve Bayes classifiers.
Affiliation(s)
- Ruirui Liang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Jiayang Xie
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Chi Zhang
- Foshan Huaxia Eye Hospital, Huaxia Eye Hospital Group, Foshan 528000, China
- Mengying Zhang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Hai Huang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Haizhong Huo
- Department of General Surgery, Shanghai Ninth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
- Xin Cao
- Zhongshan Hospital, Institute of Clinical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Bing Niu
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
49
Liu Z, Dong W, Jiang W, He Z. csDMA: an improved bioinformatics tool for identifying DNA 6 mA modifications via Chou's 5-step rule. Sci Rep 2019; 9:13109. [PMID: 31511570 PMCID: PMC6739324 DOI: 10.1038/s41598-019-49430-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/26/2019] [Accepted: 08/24/2019] [Indexed: 12/31/2022] Open
Abstract
DNA N6-methyldeoxyadenosine (6 mA) modifications were first found more than 60 years ago but were thought to be widespread only in prokaryotes and unicellular eukaryotes. With the development of high-throughput sequencing technology, 6 mA modifications have been found experimentally in various multicellular eukaryotes. However, the experimental methods are time-consuming and costly, which makes it necessary to develop computational methods as an alternative. In this study, a machine learning-based prediction tool, named csDMA, was developed for predicting 6 mA modifications. First, three feature encoding schemes, Motif, Kmer, and Binary, were used to generate the feature matrix. Second, different algorithms were evaluated for the prediction model, and the ExtraTrees model achieved the best AUC of 0.878 under 5-fold cross-validation on the training dataset. The ExtraTrees model also achieved the best AUC of 0.893 on the independent testing dataset. Finally, we compared our method with state-of-the-art predictors, and the results showed that our model achieved better performance than existing tools.
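The Kmer and Binary encodings named in the abstract can be illustrated with a minimal sketch. The function names, the example sequence, and the choice of k = 2 are assumptions made for illustration and are not taken from the csDMA implementation; the Motif features and the ExtraTrees classifier are left out.

```python
from itertools import product

BASES = "ACGT"

def kmer_frequencies(seq, k=2):
    """Return normalised k-mer frequencies in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]

def binary_encoding(seq):
    """One-hot encode each base as four bits (A, C, G, T); unknown bases map to zeros."""
    vec = []
    for base in seq:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec

def encode(seq, k=2):
    """Concatenate the k-mer and binary views into one feature vector."""
    return kmer_frequencies(seq, k) + binary_encoding(seq)

# Hypothetical 5-nt window: 16 dinucleotide frequencies + 5 * 4 one-hot bits.
features = encode("ACGTA")
```

A feature matrix built this way, one row per fixed-length sequence window, is the kind of input a tree-ensemble classifier would then be trained and cross-validated on.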
Affiliation(s)
- Ze Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Wei Dong
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Wei Jiang
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Zili He
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, 712100, Shaanxi, China
|
50
|
FKRR-MVSF: A Fuzzy Kernel Ridge Regression Model for Identifying DNA-Binding Proteins by Multi-View Sequence Features via Chou's Five-Step Rule. Int J Mol Sci 2019; 20:ijms20174175. [PMID: 31454964 PMCID: PMC6747228 DOI: 10.3390/ijms20174175] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/30/2019] [Revised: 08/10/2019] [Accepted: 08/19/2019] [Indexed: 12/22/2022] Open
Abstract
DNA-binding proteins play an important role in cell metabolism. In biological laboratories, detection methods for DNA-binding proteins include yeast one-hybrid methods, bacterial singles, X-ray crystallography, and others, but these methods demand considerable labor, material, and time. In recent years, many computational approaches have been proposed to detect DNA-binding proteins. In this paper, a machine learning-based method, called the Fuzzy Kernel Ridge Regression model based on Multi-View Sequence Features (FKRR-MVSF), is proposed to identify DNA-binding proteins. First, multi-view sequence features are extracted from protein sequences. Next, a Multiple Kernel Learning (MKL) algorithm is employed to combine the multiple features. Finally, a Fuzzy Kernel Ridge Regression (FKRR) model is built to detect DNA-binding proteins. Compared with other methods, our model achieves good results. Our method obtains accuracies of 83.26% and 81.72% on two benchmark datasets (PDB1075 and PDB186), respectively.
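The combine-kernels-then-regress pipeline described in the abstract can be sketched with plain kernel ridge regression over a weighted sum of per-view kernels. This is only an illustration under stated assumptions: the fuzzy membership weighting and the MKL weight-learning step of FKRR-MVSF are not reproduced, the kernel weights are fixed by hand, labels are assumed to be +1 and -1, and all function names and the toy data are hypothetical.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * d2)

def combined_kernel(views_a, views_b, gammas, weights):
    """Weighted sum of one RBF kernel per feature view (weights fixed, not learned)."""
    return sum(w * rbf_kernel(va, vb, g)
               for va, vb, g, w in zip(views_a, views_b, gammas, weights))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(samples, labels, gammas, weights, lam=0.1):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    n = len(samples)
    K = [[combined_kernel(samples[i], samples[j], gammas, weights)
          for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam
    return solve(K, labels)

def predict(sample, samples, alpha, gammas, weights):
    """Decision score; its sign gives the predicted class."""
    return sum(a * combined_kernel(sample, s, gammas, weights)
               for a, s in zip(alpha, samples))

# Toy data: each sample holds two feature views (hypothetical values).
train = [([0.0, 0.1], [1.0]), ([0.1, 0.0], [0.9]),
         ([1.0, 0.9], [0.0]), ([0.9, 1.0], [0.1])]
labels = [1.0, 1.0, -1.0, -1.0]
gammas, weights = [1.0, 1.0], [0.5, 0.5]
alpha = fit(train, labels, gammas, weights)
score_pos = predict(([0.05, 0.05], [0.95]), train, alpha, gammas, weights)
score_neg = predict(([0.95, 0.95], [0.05]), train, alpha, gammas, weights)
```

In a fuzzy variant, each training sample would additionally carry a membership weight that scales its contribution to the loss; how those memberships are computed is not stated in the abstract, so it is omitted here.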
|