1
|
Weckbecker M, Anžel A, Yang Z, Hattab G. Interpretable molecular encodings and representations for machine learning tasks. Comput Struct Biotechnol J 2024; 23:2326-2336. [PMID: 38867722 PMCID: PMC11167246 DOI: 10.1016/j.csbj.2024.05.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2024] [Revised: 05/13/2024] [Accepted: 05/19/2024] [Indexed: 06/14/2024] Open
Abstract
Molecular encodings and their usage in machine learning models have demonstrated significant breakthroughs in biomedical applications, particularly in the classification of peptides and proteins. To this end, we propose a new encoding method: Interpretable Carbon-based Array of Neighborhoods (iCAN). Designed to address machine learning models' need for more structured and less flexible input, it captures the neighborhoods of carbon atoms in a counting array and improves the utility of the resulting encodings for machine learning models. The iCAN method provides interpretable molecular encodings and representations, enabling the comparison of molecular neighborhoods, identification of repeating patterns, and visualization of relevance heat maps for a given data set. When reproducing a large biomedical peptide classification study, it outperforms its predecessor encoding. When extended to proteins, it outperforms a lead structure-based encoding on 71% of the data sets. Our method offers interpretable encodings that can be applied to all organic molecules, including exotic amino acids, cyclic peptides, and larger proteins, making it highly versatile across various domains and data sets. This work establishes a promising new direction for machine learning in peptide and protein classification in biomedicine and healthcare, potentially accelerating advances in drug discovery and disease diagnosis.
Collapse
Affiliation(s)
- Moritz Weckbecker
- Center for Artificial Intelligence in Public Health Research, (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
| | - Aleksandar Anžel
- Center for Artificial Intelligence in Public Health Research, (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
| | - Zewen Yang
- Center for Artificial Intelligence in Public Health Research, (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
| | - Georges Hattab
- Center for Artificial Intelligence in Public Health Research, (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Department of Mathematics and Computer science Freie Universität, Arnimallee 14, Berlin, 14195, Berlin, Germany
| |
Collapse
|
2
|
Iwaniak A, Minkiewicz P, Darewicz M. Bioinformatics and bioactive peptides from foods: Do they work together? ADVANCES IN FOOD AND NUTRITION RESEARCH 2024; 108:35-111. [PMID: 38461003 DOI: 10.1016/bs.afnr.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/11/2024]
Abstract
We live in the Big Data Era which affects many aspects of science, including research on bioactive peptides derived from foods, which during the last few decades have been a focus of interest for scientists. These two issues, i.e., the development of computer technologies and progress in the discovery of novel peptides with health-beneficial properties, are closely interrelated. This Chapter presents the example applications of bioinformatics for studying biopeptides, focusing on main aspects of peptide analysis as the starting point, including: (i) the role of peptide databases; (ii) aspects of bioactivity prediction; (iii) simulation of peptide release from proteins. Bioinformatics can also be used for predicting other features of peptides, including ADMET, QSAR, structure, and taste. To answer the question asked "bioinformatics and bioactive peptides from foods: do they work together?", currently it is almost impossible to find examples of peptide research with no bioinformatics involved. However, theoretical predictions are not equivalent to experimental work and always require critical scrutiny. The aspects of compatibility of in silico and in vitro results are also summarized herein.
Collapse
Affiliation(s)
- Anna Iwaniak
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland.
| | - Piotr Minkiewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
| | - Małgorzata Darewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
| |
Collapse
|
3
|
Harun-Or-Roshid M, Maeda K, Phan LT, Manavalan B, Kurata H. Stack-DHUpred: Advancing the accuracy of dihydrouridine modification sites detection via stacking approach. Comput Biol Med 2024; 169:107848. [PMID: 38145601 DOI: 10.1016/j.compbiomed.2023.107848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2023] [Revised: 11/14/2023] [Accepted: 12/11/2023] [Indexed: 12/27/2023]
Abstract
Dihydrouridine (DHU, D) is one of the most abundant post-transcriptional uridine modifications found in tRNA, mRNA, and snoRNA, closely associated with disease pathogenesis and various biological processes in eukaryotes. Identifying D sites is important for understanding the modification mechanisms and/or epigenetic regulation. However, biological experiments for detecting D sites are time-consuming and expensive. Given these challenges, computational methods have been developed for accurately identifying the D sites in genome-wide datasets. However, existing methods have some limitations, and their prediction performance needs to be improved. In this work, we have developed a new computational predictor for accurately identifying D sites called Stack-DHUpred. Briefly, we trained 66 baseline models or single-feature models by connecting six machine learning classifiers with eleven different feature encoding methods and stacked different baseline models to build stacked ensemble learning models. Subsequently, the optimal combination of the baseline models was identified for the construction of the final stacked model. Remarkably, the Stack-DHUpred outperformed the existing predictors on our new independent dataset, indicating that the stacking approach significantly improved the prediction performance. We have made Stack-DHUpred available to the public through a web server (http://kurata35.bio.kyutech.ac.jp/Stack-DHUpred) and a standalone program (https://github.com/kuratahiroyuki/Stack-DHUpred). We believe that Stack-DHUpred will be a valuable tool for accelerating the discovery of D modifications and understanding their role in post-transcriptional regulation.
Collapse
Affiliation(s)
- Md Harun-Or-Roshid
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Kazuhiro Maeda
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Le Thi Phan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea
| | - Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea.
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
| |
Collapse
|
4
|
Xu J, Li F, Li C, Guo X, Landersdorfer C, Shen HH, Peleg AY, Li J, Imoto S, Yao J, Akutsu T, Song J. iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities. Brief Bioinform 2023; 24:bbad240. [PMID: 37369638 PMCID: PMC10359087 DOI: 10.1093/bib/bbad240] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 05/30/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
Antimicrobial peptides (AMPs) are short peptides that play crucial roles in diverse biological processes and have various functional activities against target organisms. Due to the abuse of chemical antibiotics and microbial pathogens' increasing resistance to antibiotics, AMPs have the potential to be alternatives to antibiotics. As such, the identification of AMPs has become a widely discussed topic. A variety of computational approaches have been developed to identify AMPs based on machine learning algorithms. However, most of them are not capable of predicting the functional activities of AMPs, and those predictors that can specify activities only focus on a few of them. In this study, we first surveyed 10 predictors that can identify AMPs and their functional activities in terms of the features they employed and the algorithms they utilized. Then, we constructed comprehensive AMP datasets and proposed a new deep learning-based framework, iAMPCN (identification of AMPs based on CNNs), to identify AMPs and their related 22 functional activities. Our experiments demonstrate that iAMPCN significantly improved the prediction performance of AMPs and their corresponding functional activities based on four types of sequence features. Benchmarking experiments on the independent test datasets showed that iAMPCN outperformed a number of state-of-the-art approaches for predicting AMPs and their functional activities. Furthermore, we analyzed the amino acid preferences of different AMP activities and evaluated the model on datasets of varying sequence redundancy thresholds. To facilitate the community-wide identification of AMPs and their corresponding functional types, we have made the source codes of iAMPCN publicly available at https://github.com/joy50706/iAMPCN/tree/master. We anticipate that iAMPCN can be explored as a valuable tool for identifying potential AMPs with specific functional activities for further experimental validation.
Collapse
Affiliation(s)
- Jing Xu
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Fuyi Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3800, Australia
| | - Chen Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Xudong Guo
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
| | - Cornelia Landersdorfer
- Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC 3800, Australia
| | - Hsin-Hui Shen
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Department of Materials Science and Engineering, Faculty of Engineering, Monash University, Clayton, VIC, 3800, Australia
| | - Anton Y Peleg
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Department of Infectious Diseases, Alfred Hospital, Alfred Health, Melbourne, Victoria, Australia
| | - Jian Li
- Monash Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
| | - Seiya Imoto
- Division of Health Medical Intelligence, Human Genome Center, Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | | | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
| |
Collapse
|
5
|
Teimouri H, Medvedeva A, Kolomeisky AB. Bacteria-Specific Feature Selection for Enhanced Antimicrobial Peptide Activity Predictions Using Machine-Learning Methods. J Chem Inf Model 2023; 63:1723-1733. [PMID: 36912047 DOI: 10.1021/acs.jcim.2c01551] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/14/2023]
Abstract
There are several classes of short peptide molecules, known as antimicrobial peptides (AMPs), which are produced during the immune responses of living organisms against various infections. In recent years, substantial progress has been achieved in applying machine-learning methods to predict the activities of AMPs against bacteria. In most investigated cases, however, the outcome is not bacterium-specific since the specific features of bacteria, such as chemical composition and structure of membranes, are not considered. To overcome this problem, we developed a new computational approach that allowed us to train several supervised machine-learning models using a specific set of data associated with peptides targeting E. coli bacteria. LASSO regression and Support Vector Machine techniques have been utilized to select, among more than 1500 physicochemical descriptors, the most important features that can be used to classify a peptide as antimicrobial or ineffective against E. coli. We then performed the classification of active versus inactive AMPs using the Support Vector classifiers, Logistic Regression, and Random Forest methods. This computational study allows us to make recommendations of how to design more efficient antibacterial drug therapies.
Collapse
Affiliation(s)
- Hamid Teimouri
- Department of Chemistry, Rice University, Houston, Texas 77005, United States.,Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - Angela Medvedeva
- Department of Chemistry, Rice University, Houston, Texas 77005, United States.,Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - Anatoly B Kolomeisky
- Department of Chemistry, Rice University, Houston, Texas 77005, United States.,Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States.,Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States.,Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
| |
Collapse
|
6
|
MPMABP: A CNN and Bi-LSTM-Based Method for Predicting Multi-Activities of Bioactive Peptides. Pharmaceuticals (Basel) 2022; 15:ph15060707. [PMID: 35745625 PMCID: PMC9231127 DOI: 10.3390/ph15060707] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 12/30/2022] Open
Abstract
Bioactive peptides are typically small functional peptides with 2–20 amino acid residues and play versatile roles in metabolic and biological processes. Bioactive peptides are multi-functional, so it is vastly challenging to accurately detect all their functions simultaneously. We proposed a convolution neural network (CNN) and bi-directional long short-term memory (Bi-LSTM)-based deep learning method (called MPMABP) for recognizing multi-activities of bioactive peptides. The MPMABP stacked five CNNs at different scales, and used the residual network to preserve the information from loss. The empirical results showed that the MPMABP is superior to the state-of-the-art methods. Analysis on the distribution of amino acids indicated that the lysine preferred to appear in the anti-cancer peptide, the leucine in the anti-diabetic peptide, and the proline in the anti-hypertensive peptide. The method and analysis are beneficial to recognize multi-activities of bioactive peptides.
Collapse
|
7
|
Qi L, Gao X, Pan D, Sun Y, Cai Z, Xiong Y, Dang Y. Research progress in the screening and evaluation of umami peptides. Compr Rev Food Sci Food Saf 2022; 21:1462-1490. [PMID: 35201672 DOI: 10.1111/1541-4337.12916] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2021] [Revised: 12/22/2021] [Accepted: 01/03/2022] [Indexed: 12/22/2022]
Abstract
Umami is an important element affecting food taste, and the development of umami peptides is a topic of interest in food-flavoring research. The existing technology used for traditional screening of umami peptides is time-consuming and labor-intensive, making it difficult to meet the requirements of high-throughput screening, which limits the rapid development of umami peptides. The difficulty in performing a standard measurement of umami intensity is another problem that restricts the development of umami peptides. The existing methods are not sensitive and specific, making it difficult to achieve a standard evaluation of umami taste. This review summarizes the umami receptors and umami peptides, focusing on the problems restricting the development of umami peptides, high-throughput screening, and establishment of evaluation standards. The rapid screening of umami peptides was realized based on molecular docking technology and a machine learning method, and the standard evaluation of umami could be realized with a bionic taste sensor. The progress of rapid screening and evaluation methods significantly promotes the study of umami peptides and increases its application in the seasoning industry.
Collapse
Affiliation(s)
- Lulu Qi
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of AgroProducts, College of Food and Pharmaceutical Sciences, Ningbo University, Ningbo, Zhejiang, China
| | - Xinchang Gao
- Department of Chemistry, Tsinghua University, Beijing, China
| | - Daodong Pan
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of AgroProducts, College of Food and Pharmaceutical Sciences, Ningbo University, Ningbo, Zhejiang, China.,National R&D Center for Freshwater Fish Processing, Jiangxi Normal University, Nanchang, Jiangxi, China
| | - Yangying Sun
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of AgroProducts, College of Food and Pharmaceutical Sciences, Ningbo University, Ningbo, Zhejiang, China
| | - Zhendong Cai
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of AgroProducts, College of Food and Pharmaceutical Sciences, Ningbo University, Ningbo, Zhejiang, China
| | - Yongzhao Xiong
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of AgroProducts, College of Food and Pharmaceutical Sciences, Ningbo University, Ningbo, Zhejiang, China
| | - Yali Dang
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of AgroProducts, College of Food and Pharmaceutical Sciences, Ningbo University, Ningbo, Zhejiang, China
| |
Collapse
|
8
|
Tasmia SA, Kibria MK, Tuly KF, Islam MA, Khatun MS, Hasan MM, Mollah MNH. Prediction of serine phosphorylation sites mapping on Schizosaccharomyces Pombe by fusing three encoding schemes with the random forest classifier. Sci Rep 2022; 12:2632. [PMID: 35173235 PMCID: PMC8850546 DOI: 10.1038/s41598-022-06529-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Accepted: 02/01/2022] [Indexed: 11/08/2022] Open
Abstract
Serine phosphorylation is one type of protein post-translational modifications (PTMs), which plays an essential role in various cellular processes and disease pathogenesis. Numerous methods are used for the prediction of phosphorylation sites. However, the traditional wet-lab based experimental approaches are time-consuming, laborious, and expensive. In this work, a computational predictor was proposed to predict serine phosphorylation sites mapping on Schizosaccharomyces pombe (SP) by the fusion of three encoding schemes namely k-spaced amino acid pair composition (CKSAAP), binary and amino acid composition (AAC) with the random forest (RF) classifier. So far, the proposed method is firstly developed to predict serine phosphorylation sites for SP. Both the training and independent test performance scores were used to investigate the success of the proposed RF based fusion prediction model compared to others. We also investigated their performances by 5-fold cross-validation (CV). In all cases, it was observed that the recommended predictor achieves the largest scores of true positive rate (TPR), true negative rate (TNR), accuracy (ACC), Mathew coefficient of correlation (MCC), Area under the ROC curve (AUC) and pAUC (partial AUC) at false positive rate (FPR) = 0.20. Thus, the prediction performance as discussed in this paper indicates that the proposed approach may be a beneficial and motivating computational resource for predicting serine phosphorylation sites in the case of Fungi. The online interface of the software for the proposed prediction model is publicly available at http://mollah-bioinformaticslab-stat.ru.ac.bd/PredSPS/ .
Collapse
Affiliation(s)
- Samme Amena Tasmia
- Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
| | - Md Kaderi Kibria
- Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
| | - Khanis Farhana Tuly
- Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
| | - Md Ariful Islam
- Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
| | - Mst Shamima Khatun
- Department of Microbiology and Immunology, Tulane University School of Medicine, Tulane University, New Orleans, LA, 70112, USA
| | - Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
| | - Md Nurul Haque Mollah
- Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh.
| |
Collapse
|
9
|
Bin Hafeez A, Jiang X, Bergen PJ, Zhu Y. Antimicrobial Peptides: An Update on Classifications and Databases. Int J Mol Sci 2021; 22:11691. [PMID: 34769122 PMCID: PMC8583803 DOI: 10.3390/ijms222111691] [Citation(s) in RCA: 100] [Impact Index Per Article: 33.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2021] [Revised: 10/24/2021] [Accepted: 10/25/2021] [Indexed: 02/06/2023] Open
Abstract
Antimicrobial peptides (AMPs) are distributed across all kingdoms of life and are an indispensable component of host defenses. They consist of predominantly short cationic peptides with a wide variety of structures and targets. Given the ever-emerging resistance of various pathogens to existing antimicrobial therapies, AMPs have recently attracted extensive interest as potential therapeutic agents. As the discovery of new AMPs has increased, many databases specializing in AMPs have been developed to collect both fundamental and pharmacological information. In this review, we summarize the sources, structures, modes of action, and classifications of AMPs. Additionally, we examine current AMP databases, compare valuable computational tools used to predict antimicrobial activity and mechanisms of action, and highlight new machine learning approaches that can be employed to improve AMP activity to combat global antimicrobial resistance.
Collapse
Affiliation(s)
- Ahmer Bin Hafeez
- Centre of Biotechnology and Microbiology, University of Peshawar, Peshawar 25120, Pakistan;
| | - Xukai Jiang
- Infection and Immunity Program, Department of Microbiology, Biomedicine Discovery Institute, Monash University, Clayton, VIC 3800, Australia; (X.J.); (P.J.B.)
- National Glycoengineering Research Center, Shandong University, Qingdao 266237, China
| | - Phillip J. Bergen
- Infection and Immunity Program, Department of Microbiology, Biomedicine Discovery Institute, Monash University, Clayton, VIC 3800, Australia; (X.J.); (P.J.B.)
| | - Yan Zhu
- Infection and Immunity Program, Department of Microbiology, Biomedicine Discovery Institute, Monash University, Clayton, VIC 3800, Australia; (X.J.); (P.J.B.)
| |
Collapse
|
10
|
Akbar S, Ahmad A, Hayat M, Rehman AU, Khan S, Ali F. iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model. Comput Biol Med 2021; 137:104778. [PMID: 34481183 DOI: 10.1016/j.compbiomed.2021.104778] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 11/26/2022]
Abstract
Tuberculosis (TB) is a worldwide illness caused by the bacteria Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented. The effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative due to its excellent selectivity in targeting specific cells without affecting the normal cells. However, due to the rapid growth of the peptide samples, predicting TB accurately has become a challenging task. To effectively identify antitubercular peptides, an intelligent and reliable prediction model is indispensable. An ensemble learning approach was used in this study to improve expected results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physiochemical properties, and one-hot encoding. The feature vectors of the applied feature extraction methods are then combined to generate a heterogeneous vector. Finally, utilizing individual and heterogeneous vectors, five distinct nature classification models were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the suggested model's prediction and training capabilities. Using Training and independent datasets, the proposed ensemble model achieved an accuracy of 94.47% and 92.68%, respectively. It was observed that our proposed "iAtbP-Hyb-EnC" model outperformed and reported ~10% highest training accuracy than existing predictors. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.
Collapse
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Ashfaq Ahmad
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Ateeq Ur Rehman
- Department of Information Technology, The University of Haripur, KP, Pakistan.
| | - Salman Khan
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
| |
Collapse
|
11
|
Charoenkwan P, Chiangjong W, Hasan MM, Nantasenamat C, Shoombuatong W. Review and comparative analysis of machine learning-based predictors for predicting and analyzing of anti-angiogenic peptides. Curr Med Chem 2021; 29:849-864. [PMID: 34375178 DOI: 10.2174/0929867328666210810145806] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
Cancer is one of the leading causes of death worldwide and underlying this is angiogenesis that represents one of the hallmarks of cancer. Ongoing effort is already under way in the discovery of anti-angiogenic peptides (AAPs) as a promising therapeutic route by tackling the formation of new blood vessels. As such, the identification of AAPs constitutes a viable path for understanding their mechanistic properties pertinent for the discovery of new anti-cancer drugs. In spite of the abundance of peptide sequences in public databases, experimental efforts in the identification of anti-angiogenic peptides have progressed very slowly owing to its high expenditures and laborious nature. Owing to its inherent ability to make sense of large volumes of data, machine learning (ML) represents a lucrative technique that can be harnessed for peptide-based drug discovery. In this review, we conducted a comprehensive and comparative analysis of ML-based AAP predictors in terms of their employed feature descriptors, ML algorithms, cross-validation methods and prediction performance. Moreover, the common framework of these AAP predictors and their inherent weaknesses are also discussed. Particularly, we explore future perspectives for improving the prediction accuracy and model interpretability, which represents an interesting avenue for overcoming some of the inherent weaknesses of existing AAP predictors. We anticipate that this review would assist researchers in the rapid screening and identification of promising AAPs for clinical use.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand
| | - Wararat Chiangjong
- Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
| | - Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, United States
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
| |
Collapse
|
12
|
Charoenkwan P, Anuwongcharoen N, Nantasenamat C, Hasan MM, Shoombuatong W. In Silico Approaches for the Prediction and Analysis of Antiviral Peptides: A Review. Curr Pharm Des 2021; 27:2180-2188. [PMID: 33138759 DOI: 10.2174/1381612826666201102105827] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2020] [Accepted: 08/20/2020] [Indexed: 11/22/2022]
Abstract
In light of the growing resistance toward current antiviral drugs, efforts to discover novel and effective antiviral therapeutic agents remain a pressing scientific effort. Antiviral peptides (AVPs) represent promising therapeutic agents due to their extraordinary advantages in terms of potency, efficacy and pharmacokinetic properties. The growing volume of newly discovered peptide sequences in the post-genomic era requires computational approaches for timely and accurate identification of AVPs. Machine learning (ML) methods such as random forest and support vector machine represent robust learning algorithms that are instrumental in successful peptide-based drug discovery. Therefore, this review summarizes the current state-of-the-art application of ML methods for identifying AVPs directly from the sequence information. We compare the efficiency of these methods in terms of the underlying characteristics of the dataset used along with feature encoding methods, ML algorithms, cross-validation methods and prediction performance. Finally, guidelines for the development of robust AVP models are also discussed. It is anticipated that this review will serve as a useful guide for the design and development of robust AVP and related therapeutic peptide predictors in the future.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Nuttapat Anuwongcharoen
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| |
Collapse
|
13
|
Khatun MS, Alam MA, Shoombuatong W, Mollah MNH, Kurata H, Hasan MM. Recent development of bioinformatics tools for microRNA target prediction. Curr Med Chem 2021; 29:865-880. [PMID: 34348604 DOI: 10.2174/0929867328666210804090224] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2021] [Revised: 06/10/2021] [Accepted: 06/15/2021] [Indexed: 11/22/2022]
Abstract
MicroRNAs (miRNAs) are central players that regulate the post-transcriptional processes of gene expression. Binding of miRNAs to target mRNAs can repress their translation by inducing the degradation or by inhibiting the translation of the target mRNAs. High-throughput experimental approaches for miRNA target identification are costly and time-consuming, depending on various factors. It is vitally important to develop the bioinformatics methods for accurately predicting miRNA targets. With the increase of RNA sequences in the post-genomic era, bioinformatics methods are being developed for miRNA studies specially for miRNA target prediction. This review summarizes the current development of state-of-the-art bioinformatics tools for miRNA target prediction, points out the progress and limitations of the available miRNA databases, and their working principles. Finally, we discuss the caveat and perspectives of the next-generation algorithms for the prediction of miRNA targets.
Collapse
Affiliation(s)
- Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502. Japan
| | - Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112. United States
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700. Thailand
| | - Md Nurul Haque Mollah
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh. 5Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083. Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502. Japan
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502. Japan
| |
Collapse
|
14
|
He B, Yang S, Long J, Chen X, Zhang Q, Gao H, Chen H, Huang J. TUPDB: Target-Unrelated Peptide Data Bank. Interdiscip Sci 2021; 13:426-432. [PMID: 33993461 DOI: 10.1007/s12539-021-00436-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 04/29/2021] [Accepted: 05/06/2021] [Indexed: 11/29/2022]
Abstract
The isolation of target-unrelated peptides (TUPs) through biopanning remains as a major problem of phage display selection experiments. These TUPs do not have any actual affinity toward targets of interest, which tend to be mistakenly identified as target-binding peptides. Therefore, an information portal for storing TUP data is urgently needed. Here, we present a TUP data bank (TUPDB), which is a comprehensive, manually curated database of approximately 73 experimentally verified TUPs and 1963 potential TUPs collected from TUPScan, the BDB database, and public research articles. The TUPScan tool has been integrated in TUPDB to facilitate TUP analysis. We believe that TUPDB can help identify and remove TUPs in future reports in the biopanning community. The database is of great importance to improving the quality of phage display-based epitope mapping and promoting the development of vaccines, diagnostics, and therapeutics. The TUPDB database is available at http://i.uestc.edu.cn/tupdb .
Collapse
Affiliation(s)
- Bifang He
- School of Medicine, Guizhou University, Guiyang, 550025, China. .,Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China.
| | - Shanshan Yang
- School of Medicine, Guizhou University, Guiyang, 550025, China
| | - Jinjin Long
- School of Medicine, Guizhou University, Guiyang, 550025, China
| | - Xue Chen
- School of Medicine, Guizhou University, Guiyang, 550025, China
| | - Qianyue Zhang
- School of Medicine, Guizhou University, Guiyang, 550025, China
| | - Hui Gao
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Heng Chen
- School of Medicine, Guizhou University, Guiyang, 550025, China.
| | - Jian Huang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China.
| |
Collapse
|
15
|
Islam MM, Alam MJ, Ahmed FF, Hasan MM, Mollah MNH. Improved Prediction of Protein-Protein Interaction Mapping on Homo Sapiens by Using Amino Acid Sequence Features in a Supervised Learning Framework. Protein Pept Lett 2021; 28:74-83. [PMID: 32520672 DOI: 10.2174/0929866527666200610141258] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2020] [Revised: 05/03/2020] [Accepted: 05/04/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Protein-Protein Interaction (PPI) has emerged as a key role in the control of many biological processes including protein function, disease incidence, and therapy design. However, the identification of PPI by wet lab experiment is a challenging task, since it is laborious, time consuming and expensive. Therefore, computational prediction of PPI is now given emphasis before going to the experimental validation, since it is simultaneously less laborious, time saver and cost minimizer. OBJECTIVE The objective of this study is to develop an improved computational method for PPI prediction mapping on Homo sapiens by using the amino acid sequence features in a supervised learning framework. METHODS The experimentally validated 91 positive-PPI pairs of human protein sequences were collected from IntAct Molecular Interaction Database. Then we constructed three balanced datasets with ratios 1:1, 1:2 and 1:3 of positive and negative PPI samples. Then we partitioned each dataset into training (80%) and independent test (20%) datasets. Again each training dataset was partitioned into four mutually exclusive groups of equal sizes for interchanging each group with independent test group to perform 5-fold cross validation (CV). Then we trained candidate seven classifiers (NN, SVM, LR, NB, KNN, AB and RF) with each ratio case to obtain the better PPI predictor by comparing their performance scores. RESULTS The random forest (RF) based predictor that was trained with 1:2 ratio of positive-PPI and negative-PPI samples based on AAC encoding features provided the most accurate PPI prediction by producing the highest average performance scores of accuracy (93.50%), sensitivity (95.0%), MCC (85.2%), AUC (0.941) and pAUC (0.236) with the 5-fold cross-validation. It also achieved the highest average performance scores of accuracy (92.0%), sensitivity (94.0%), MCC (83.6%), AUC (0.922) and pAUC (0.207) with the independent test datasets in a comparison of the other candidate and existing predictors. CONCLUSION The final resultant prediction strongly recommend that the RF based predictor is a better prediction model of PPI mapping on Homo sapiens.
Collapse
Affiliation(s)
- Md Merajul Islam
- Bioinformatics Laboratory, Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh
| | - Md Jahangir Alam
- Bioinformatics Laboratory, Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh
| | - Fee Faysal Ahmed
- Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh
| | - Md Mehedi Hasan
- Deptartment of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
| | - Md Nurul Haque Mollah
- Bioinformatics Laboratory, Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh
| |
Collapse
|
16
|
Winkler DA. Use of Artificial Intelligence and Machine Learning for Discovery of Drugs for Neglected Tropical Diseases. Front Chem 2021; 9:614073. [PMID: 33791277 PMCID: PMC8005575 DOI: 10.3389/fchem.2021.614073] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2020] [Accepted: 01/18/2021] [Indexed: 12/11/2022] Open
Abstract
Neglected tropical diseases continue to create high levels of morbidity and mortality in a sizeable fraction of the world’s population, despite ongoing research into new treatments. Some of the most important technological developments that have accelerated drug discovery for diseases of affluent countries have not flowed down to neglected tropical disease drug discovery. Pharmaceutical development business models, cost of developing new drug treatments and subsequent costs to patients, and accessibility of technologies to scientists in most of the affected countries are some of the reasons for this low uptake and slow development relative to that for common diseases in developed countries. Computational methods are starting to make significant inroads into discovery of drugs for neglected tropical diseases due to the increasing availability of large databases that can be used to train ML models, increasing accuracy of these methods, lower entry barrier for researchers, and widespread availability of public domain machine learning codes. Here, the application of artificial intelligence, largely the subset called machine learning, to modelling and prediction of biological activities and discovery of new drugs for neglected tropical diseases is summarized. The pathways for the development of machine learning methods in the short to medium term and the use of other artificial intelligence methods for drug discovery is discussed. The current roadblocks to, and likely impacts of, synergistic new technological developments on the use of ML methods for neglected tropical disease drug discovery in the future are also discussed.
Collapse
Affiliation(s)
- David A Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia.,Latrobe Institute for Molecular Science, La Trobe University, Bundoora, VIC, Australia.,School of Pharmacy, University of Nottingham, Nottingham, United Kingdom.,CSIRO Data61, Pullenvale, QLD, Australia
| |
Collapse
|
17
|
Nilamyani AN, Auliah FN, Moni MA, Shoombuatong W, Hasan MM, Kurata H. PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features. Int J Mol Sci 2021; 22:2704. [PMID: 33800121 PMCID: PMC7962192 DOI: 10.3390/ijms22052704] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2021] [Revised: 03/02/2021] [Accepted: 03/03/2021] [Indexed: 12/15/2022] Open
Abstract
Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role before the biological experimentation. Herein, we developed a computational predictor PredNTS by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different, single encoding-employing RF models. The resultant PredNTS predictor achieved an area under a curve (AUC) of 0.910 using five-fold cross validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. The PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web-application with the curated datasets of the PredNTS is publicly available.
Collapse
Affiliation(s)
- Andi Nur Nilamyani
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (A.N.N.); (F.N.A.)
| | - Firda Nurul Auliah
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (A.N.N.); (F.N.A.)
| | - Mohammad Ali Moni
- WHO Collaborating Centre on eHealth, UNSW Digital Health, School of Public Health and Community Medicine, Faculty of Medicine, UNSW Sydney, Sydney, NSW 2052, Australia;
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (A.N.N.); (F.N.A.)
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (A.N.N.); (F.N.A.)
| |
Collapse
|
18
|
Auliah FN, Nilamyani AN, Shoombuatong W, Alam MA, Hasan MM, Kurata H. PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations. Int J Mol Sci 2021; 22:ijms22042120. [PMID: 33672741 PMCID: PMC7924619 DOI: 10.3390/ijms22042120] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2021] [Revised: 02/12/2021] [Accepted: 02/18/2021] [Indexed: 12/30/2022] Open
Abstract
Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms are highly needed that can predict potential pupylation sites using sequence features. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. Meanwhile, we explored the five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. The PUP-Fuse achieved a Mathew correlation value of 0.768 by a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The web server of the PUP-Fuse with curated datasets is freely available.
Collapse
Affiliation(s)
- Firda Nurul Auliah
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
| | - Andi Nur Nilamyani
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA;
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
- Correspondence:
| |
Collapse
|
19
|
Hasan MM, Shoombuatong W, Kurata H, Manavalan B. Critical evaluation of web-based DNA N6-methyladenine site prediction tools. Brief Funct Genomics 2021; 20:258-272. [PMID: 33491072 DOI: 10.1093/bfgp/elaa028] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2020] [Revised: 12/11/2020] [Accepted: 12/15/2020] [Indexed: 12/13/2022] Open
Abstract
Methylation of DNA N6-methyladenosine (6mA) is a type of epigenetic modification that plays pivotal roles in various biological processes. The accurate genome-wide identification of 6mA is a challenging task that leads to understanding the biological functions. For the last 5 years, a number of bioinformatics approaches and tools for 6mA site prediction have been established, and some of them are easily accessible as web application. Nevertheless, the accurate genome-wide identification of 6mA is still one of the challenging works that lead to understanding the biological functions. Especially in practical applications, these tools have implemented diverse encoding schemes, machine learning algorithms and feature selection methods, whereas few systematic performance comparisons of 6mA site predictors have been reported. In this review, 11 publicly available 6mA predictors evaluated with seven different species-specific datasets (Arabidopsis thaliana, Tolypocladium, Diospyros lotus, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans and Escherichia coli). Of those, few species are close homologs, and the remaining datasets are distant sequences. Our independent, validation tests demonstrated that Meta-i6mA and MM-6mAPred models for A. thaliana, Tolypocladium, S. cerevisiae and D. melanogaster achieved excellent overall performance when compared with their counterparts. However, none of the existing methods were suitable for E. coli, C. elegans and D. lotus. A feasibility of the existing predictors is also discussed for the seven species. Our evaluation provides useful guidelines for the development of 6mA site predictors and helps biologists selecting suitable prediction tools.
Collapse
Affiliation(s)
| | - Watshara Shoombuatong
- Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics in the Kyushu Institute of Technology, Japan
| | | |
Collapse
|
20
|
Pirtskhalava M, Amstrong AA, Grigolava M, Chubinidze M, Alimbarashvili E, Vishnepolsky B, Gabrielian A, Rosenthal A, Hurt DE, Tartakovsky M. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res 2021; 49:D288-D297. [PMID: 33151284 PMCID: PMC7778994 DOI: 10.1093/nar/gkaa991] [Citation(s) in RCA: 208] [Impact Index Per Article: 69.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Revised: 10/09/2020] [Accepted: 10/14/2020] [Indexed: 12/30/2022] Open
Abstract
The Database of Antimicrobial Activity and Structure of Peptides (DBAASP) is an open-access, comprehensive database containing information on amino acid sequences, chemical modifications, 3D structures, bioactivities and toxicities of peptides that possess antimicrobial properties. DBAASP is updated continuously, and at present, version 3.0 (DBAASP v3) contains >15 700 entries (8000 more than the previous version), including >14 500 monomers and nearly 400 homo- and hetero-multimers. Of the monomeric antimicrobial peptides (AMPs), >12 000 are synthetic, about 2700 are ribosomally synthesized, and about 170 are non-ribosomally synthesized. Approximately 3/4 of the entries were added after the initial release of the database in 2014 reflecting the recent sharp increase in interest in AMPs. Despite the increased interest, adoption of peptide antimicrobials in clinical practice is still limited as a consequence of several factors including side effects, problems with bioavailability and high production costs. To assist in developing and optimizing de novo peptides with desired biological activities, DBAASP offers several tools including a sophisticated multifactor analysis of relevant physicochemical properties. Furthermore, DBAASP has implemented a structure modelling pipeline that automates the setup, execution and upload of molecular dynamics (MD) simulations of database peptides. At present, >3200 peptides have been populated with MD trajectories and related analyses that are both viewable within the web browser and available for download. More than 400 DBAASP entries also have links to experimentally determined structures in the Protein Data Bank. DBAASP v3 is freely accessible at http://dbaasp.org.
Collapse
Affiliation(s)
- Malak Pirtskhalava
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
| | - Anthony A Amstrong
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Maia Grigolava
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
| | - Mindia Chubinidze
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
| | | | - Boris Vishnepolsky
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
| | - Andrei Gabrielian
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Alex Rosenthal
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Darrell E Hurt
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Michael Tartakovsky
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| |
Collapse
|
21
|
Hasan MM, Alam MA, Shoombuatong W, Kurata H. IRC-Fuse: improved and robust prediction of redox-sensitive cysteine by fusing of multiple feature representations. J Comput Aided Mol Des 2021; 35:315-323. [PMID: 33392948 DOI: 10.1007/s10822-020-00368-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2020] [Accepted: 12/06/2020] [Indexed: 12/11/2022]
Abstract
Redox-sensitive cysteine (RSC) thiol contributes to many biological processes. The identification of RSC plays an important role in clarifying some mechanisms of redox-sensitive factors; nonetheless, experimental investigation of RSCs is expensive and time-consuming. The computational approaches that quickly and accurately identify candidate RSCs using the sequence information are urgently needed. Herein, an improved and robust computational predictor named IRC-Fuse was developed to identify the RSC by fusing of multiple feature representations. To enhance the performance of our model, we integrated the probability scores evaluated by the random forest models implementing different encoding schemes. Cross-validation results exhibited that the IRC-Fuse achieved accuracy and AUC of 0.741 and 0.807, respectively. The IRC-Fuse outperformed exiting methods with improvement of 10% and 13% on accuracy and MCC, respectively, over independent test data. Comparative analysis suggested that the IRC-Fuse was more effective and promising than the existing predictors. For the convenience of experimental scientists, the IRC-Fuse online web server was implemented and publicly accessible at http://kurata14.bio.kyutech.ac.jp/IRC-Fuse/ .
Collapse
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan. .,Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan.
| | - Md Ashad Alam
- Tulane Center of Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
| |
Collapse
|
22
|
Meng Q, Wu Y, Sui X, Meng J, Wang T, Lin Y, Wang Z, Zhou X, Qi Y, Du J, Gao Y. POTN: A Human Leukocyte Antigen-A2 Immunogenic Peptides Screening Model and Its Applications in Tumor Antigens Prediction. Front Immunol 2020; 11:02193. [PMID: 33133063 PMCID: PMC7579403 DOI: 10.3389/fimmu.2020.02193] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Accepted: 08/11/2020] [Indexed: 12/23/2022] Open
Abstract
Whole genome/exome sequencing data for tumors are now abundant, and many tumor antigens, especially mutant antigens (neoantigens), have been identified for cancer immunotherapy. However, only a small fraction of the peptides from these antigens induce cytotoxic T cell responses. Therefore, efficient methods to identify these antigenic peptides are crucial. The current models of major histocompatibility complex (MHC) binding and antigenic prediction are still inaccurate. In this study, 360 9-mer peptides with verified immunological activity were selected to construct a prediction of tumor neoantigen (POTN) model, an immunogenic prediction model specifically for the human leukocyte antigen-A2 allele. Based on the physicochemical properties of amino acids, such as the residue propensity, hydrophobicity, and organic solvent/water, we found that the predictive capability of POTN is superior to that of the prediction programs SYPEITHI, IEDB, and NetMHCpan 4.0. We used POTN to screen peptides for the cancer-testis antigen located on the X chromosome, and we identified several peptides that may trigger immunogenicity. We synthesized and measured the binding affinity and immunogenicity of these peptides and found that the accuracy of POTN is higher than that of NetMHCpan 4.0. Identifying the properties related to the T cell response or immunogenicity paves the way to understanding the MHC/peptide/T cell receptor complex. In conclusion, POTN is an efficient prediction model for screening high-affinity immunogenic peptides from tumor antigens, and thus provides useful information for developing cancer immunotherapy.
Collapse
Affiliation(s)
- Qingqing Meng
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Yahong Wu
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Xinghua Sui
- School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen, China
| | - Jingjie Meng
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Tingting Wang
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Yan Lin
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Zhiwei Wang
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Xiuman Zhou
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Yuanming Qi
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Jiangfeng Du
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
| | - Yanfeng Gao
- School of Life Sciences, Zhengzhou University, Zhengzhou, China.,School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen, China
| |
Collapse
|
23
|
Khatun MS, Hasan MM, Shoombuatong W, Kurata H. ProIn-Fuse: improved and robust prediction of proinflammatory peptides by fusing of multiple feature representations. J Comput Aided Mol Des 2020; 34:1229-1236. [DOI: 10.1007/s10822-020-00343-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2020] [Accepted: 09/16/2020] [Indexed: 12/11/2022]
|
24
|
Khatun MS, Shoombuatong W, Hasan MM, Kurata H. Evolution of Sequence-based Bioinformatics Tools for Protein-protein Interaction Prediction. Curr Genomics 2020; 21:454-463. [PMID: 33093807 PMCID: PMC7536797 DOI: 10.2174/1389202921999200625103936] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2020] [Revised: 03/19/2020] [Accepted: 05/27/2020] [Indexed: 12/22/2022] Open
Abstract
Protein-protein interactions (PPIs) are the physical connections between two or more proteins via electrostatic forces or hydrophobic effects. Identification of the PPIs is pivotal, which contributes to many biological processes including protein function, disease incidence, and therapy design. The experimental identification of PPIs via high-throughput technology is time-consuming and expensive. Bioinformatics approaches are expected to solve such restrictions. In this review, our main goal is to provide an inclusive view of the existing sequence-based computational prediction of PPIs. Initially, we briefly introduce the currently available PPI databases and then review the state-of-the-art bioinformatics approaches, working principles, and their performances. Finally, we discuss the caveats and future perspective of the next generation algorithms for the prediction of PPIs.
Collapse
Affiliation(s)
| | | | - Md. Mehedi Hasan
- Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan; Tel: +81-948-297-828; E-mail: and Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828; E-mail:
| | - Hiroyuki Kurata
- Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan; Tel: +81-948-297-828; E-mail: and Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828; E-mail:
| |
Collapse
|
25
|
Nava Lara RA, Beltrán JA, Brizuela CA, Del Rio G. Relevant Features of Polypharmacologic Human-Target Antimicrobials Discovered by Machine-Learning Techniques. Pharmaceuticals (Basel) 2020; 13:ph13090204. [PMID: 32825532 PMCID: PMC7559829 DOI: 10.3390/ph13090204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2020] [Revised: 08/07/2020] [Accepted: 08/07/2020] [Indexed: 11/16/2022] Open
Abstract
Polypharmacologic human-targeted antimicrobials (polyHAM) are potentially useful in the treatment of complex human diseases where the microbiome is important (e.g., diabetes, hypertension). We previously reported a machine-learning approach to identify polyHAM from FDA-approved human targeted drugs using a heterologous approach (training with peptides and non-peptide compounds). Here we discover that polyHAM are more likely to be found among antimicrobials displaying a broad-spectrum antibiotic activity and that topological, but not chemical features, are most informative to classify this activity. A heterologous machine-learning approach was trained with broad-spectrum antimicrobials and tested with human metabolites; these metabolites were labeled as antimicrobials or non-antimicrobials based on a naïve text-mining approach. Human metabolites are not commonly recognized as antimicrobials yet circulate in the human body where microbes are found and our heterologous model was able to classify those with antimicrobial activity. These results provide the basis to develop applications aimed to design human diets that purposely alter metabolic compounds proportions as a way to control human microbiome.
Collapse
Affiliation(s)
- Rodrigo A. Nava Lara
- Department of Biochemistry and Structural Biology, Instituto de Fisiologia Celular, UNAM, Mexico City 04510, Mexico;
| | - Jesús A. Beltrán
- Department of Computer Science, CICESE Research Center, Ensenada 22860, Mexico; (J.A.B.); (C.A.B.)
| | - Carlos A. Brizuela
- Department of Computer Science, CICESE Research Center, Ensenada 22860, Mexico; (J.A.B.); (C.A.B.)
| | - Gabriel Del Rio
- Department of Biochemistry and Structural Biology, Instituto de Fisiologia Celular, UNAM, Mexico City 04510, Mexico;
- Correspondence:
| |
Collapse
|
26
|
Hasan MM, Manavalan B, Shoombuatong W, Khatun MS, Kurata H. i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation. PLANT MOLECULAR BIOLOGY 2020; 103:225-234. [PMID: 32140819 DOI: 10.1007/s11103-020-00988-y] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/06/2020] [Accepted: 02/29/2020] [Indexed: 05/28/2023]
Abstract
DNA N6-methyladenine (6 mA) is one of the most vital epigenetic modifications and involved in controlling the various gene expression levels. With the avalanche of DNA sequences generated in numerous databases, the accurate identification of 6 mA plays an essential role for understanding molecular mechanisms. Because the experimental approaches are time-consuming and costly, it is desirable to develop a computation model for rapidly and accurately identifying 6 mA. To the best of our knowledge, we first proposed a computational model named i6mA-Fuse to predict 6 mA sites from the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca. We implemented the five encoding schemes, i.e., mononucleotide binary, dinucleotide binary, k-space spectral nucleotide, k-mer, and electron-ion interaction pseudo potential compositions, to build the five, single-encoding random forest (RF) models. The i6mA-Fuse uses a linear regression model to combine the predicted probability scores of the five, single encoding-based RF models. The resultant species-specific i6mA-Fuse achieved remarkably high performances with AUCs of 0.982 and 0.978 and with MCCs of 0.869 and 0.858 on the independent datasets of Rosa chinensis and Fragaria vesca, respectively. In the F. vesca-specific i6mA-Fuse, the MBE and EIIP contributed to 75% and 25% of the total prediction; in the R. chinensis-specific i6mA-Fuse, Kmer, MBE, and EIIP contribute to 15%, 65%, and 20% of the total prediction. To assist high-throughput prediction for DNA 6 mA identification, the i6mA-Fuse is publicly accessible at https://kurata14.bio.kyutech.ac.jp/i6mA-Fuse/.
Collapse
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
| | | | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
| |
Collapse
|
27
|
i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes. Comput Struct Biotechnol J 2020; 18:906-912. [PMID: 32322372 PMCID: PMC7168350 DOI: 10.1016/j.csbj.2020.04.001] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2020] [Revised: 03/31/2020] [Accepted: 04/03/2020] [Indexed: 12/12/2022] Open
Abstract
N4-methylcytosine (4mC) is one of the most important DNA modifications and involved in regulating cell differentiations and gene expressions. The accurate identification of 4mC sites is necessary to understand various biological functions. In this work, we developed a new computational predictor called i4mC-Mouse to identify 4mC sites in the mouse genome. Herein, six encoding schemes of k-space nucleotide composition (KSNC), k-mer nucleotide composition (Kmer), mono nucleotide binary encoding (MBE), dinucleotide binary encoding, electron–ion interaction pseudo potentials (EIIP) and dinucleotide physicochemical composition were explored that cover different characteristics of DNA sequence information. Subsequently, we built six RF-based encoding models and then linearly combined their probability scores to construct the final predictor. Among the six RF-based models, the Kmer, KSNC, MBE, and EIIP encodings are sufficient, which contributed to 10%, 45%, 25%, and 20% of the prediction performance, respectively. On the independent test the i4mC-Mouse predicted the 4mC sites with accuracy and MCC of 0.816 and 0.633, respectively, which were approximately 2.5% and 5% higher than those of the existing method (4mCpred-EL). For experimental biologists, a freely available web application was implemented at http://kurata14.bio.kyutech.ac.jp/i4mC-Mouse/.
Collapse
|
28
|
Rashid MM, Shatabda S, Hasan MM, Kurata H. Recent Development of Machine Learning Methods in Microbial Phosphorylation Sites. Curr Genomics 2020; 21:194-203. [PMID: 33071613 PMCID: PMC7521030 DOI: 10.2174/1389202921666200427210833] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2020] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 01/10/2023] Open
Abstract
A variety of protein post-translational modifications has been identified that control many cellular functions. Phosphorylation studies in mycobacterial organisms have shown critical importance in diverse biological processes, such as intercellular communication and cell division. Recent technical advances in high-precision mass spectrometry have determined a large number of microbial phosphorylated proteins and phosphorylation sites throughout the proteome analysis. Identification of phosphorylated proteins with specific modified residues through experimentation is often labor-intensive, costly and time-consuming. All these limitations could be overcome through the application of machine learning (ML) approaches. However, only a limited number of computational phosphorylation site prediction tools have been developed so far. This work aims to present a complete survey of the existing ML-predictors for microbial phosphorylation. We cover a variety of important aspects for developing a successful predictor, including operating ML algorithms, feature selection methods, window size, and software utility. Initially, we review the currently available phosphorylation site databases of the microbiome, the state-of-the-art ML approaches, working principles, and their performances. Lastly, we discuss the limitations and future directions of the computational ML methods for the prediction of phosphorylation.
Collapse
Affiliation(s)
| | | | - Md. Mehedi Hasan
- Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828;, E-mail: and Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828; E-mail:
| | - Hiroyuki Kurata
- Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828;, E-mail: and Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828; E-mail:
| |
Collapse
|
29
|
Mosharaf MP, Hassan MM, Ahmed FF, Khatun MS, Moni MA, Mollah MNH. Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana. Comput Biol Chem 2020; 85:107238. [DOI: 10.1016/j.compbiolchem.2020.107238] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2018] [Revised: 01/22/2020] [Accepted: 02/18/2020] [Indexed: 02/06/2023]
|
30
|
Basith S, Manavalan B, Hwan Shin T, Lee G. Machine intelligence in peptide therapeutics: A next‐generation tool for rapid disease screening. Med Res Rev 2020; 40:1276-1314. [DOI: 10.1002/med.21658] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2019] [Revised: 11/26/2019] [Accepted: 12/16/2019] [Indexed: 12/12/2022]
Affiliation(s)
- Shaherin Basith
- Department of PhysiologyAjou University School of MedicineSuwon Republic of Korea
| | | | - Tae Hwan Shin
- Department of PhysiologyAjou University School of MedicineSuwon Republic of Korea
| | - Gwang Lee
- Department of PhysiologyAjou University School of MedicineSuwon Republic of Korea
| |
Collapse
|
31
|
Hasan MM, Manavalan B, Khatun MS, Kurata H. i4mC-ROSE, a bioinformatics tool for the identification of DNA N4-methylcytosine sites in the Rosaceae genome. Int J Biol Macromol 2019; 157:752-758. [PMID: 31805335 DOI: 10.1016/j.ijbiomac.2019.12.009] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2019] [Revised: 11/29/2019] [Accepted: 12/02/2019] [Indexed: 12/18/2022]
Abstract
One of the most important epigenetic modifications is N4-methylcytosine, which regulates many biological processes including DNA replication and chromosome stability. Identification of N4-methylcytosine sites is pivotal to understand specific biological functions. Herein, we developed the first bioinformatics tool called i4mC-ROSE for identifying N4-methylcytosine sites in the genomes of Fragaria vesca and Rosa chinensis in the Rosaceae, which utilizes a random forest classifier with six encoding methods that cover various aspects of DNA sequence information. The i4mC-ROSE predictor achieves area under the curve scores of 0.883 and 0.889 for the two genomes during cross-validation. Moreover, the i4mC-ROSE outperforms other classifiers tested in this study when objectively evaluated on the independent datasets. The proposed i4mC-ROSE tool can serve users' demand for the prediction of 4mC sites in the Rosaceae genome. The i4mC-ROSE predictor and utilized datasets are publicly accessible at http://kurata14.bio.kyutech.ac.jp/i4mC-ROSE/.
Collapse
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 443380, Republic of Korea
| | - Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
| |
Collapse
|
32
|
Hasan MM, Manavalan B, Khatun MS, Kurata H. Prediction of S-nitrosylation sites by integrating support vector machines and random forest. Mol Omics 2019; 15:451-458. [DOI: 10.1039/c9mo00098d] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Cysteine S-nitrosylation is a type of reversible post-translational modification of proteins, which controls diverse biological processes.
Collapse
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics
- Kyushu Institute of Technology
- Iizuka
- Japan
- Japan Society for the Promotion of Science
| | | | - Mst. Shamima Khatun
- Department of Bioscience and Bioinformatics
- Kyushu Institute of Technology
- Iizuka
- Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics
- Kyushu Institute of Technology
- Iizuka
- Japan
- Biomedical Informatics R&D Center
| |
Collapse
|