1. Weckbecker M, Anžel A, Yang Z, Hattab G. Interpretable molecular encodings and representations for machine learning tasks. Comput Struct Biotechnol J 2024; 23:2326-2336. [PMID: 38867722 PMCID: PMC11167246 DOI: 10.1016/j.csbj.2024.05.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 05/13/2024] [Accepted: 05/19/2024] [Indexed: 06/14/2024] [Open]
Abstract
Molecular encodings and their usage in machine learning models have demonstrated significant breakthroughs in biomedical applications, particularly in the classification of peptides and proteins. To this end, we propose a new encoding method: Interpretable Carbon-based Array of Neighborhoods (iCAN). Designed to address machine learning models' need for more structured and less flexible input, it captures the neighborhoods of carbon atoms in a counting array and improves the utility of the resulting encodings for machine learning models. The iCAN method provides interpretable molecular encodings and representations, enabling the comparison of molecular neighborhoods, identification of repeating patterns, and visualization of relevance heat maps for a given data set. When reproducing a large biomedical peptide classification study, it outperforms its predecessor encoding. When extended to proteins, it outperforms a lead structure-based encoding on 71% of the data sets. Our method offers interpretable encodings that can be applied to all organic molecules, including exotic amino acids, cyclic peptides, and larger proteins, making it highly versatile across various domains and data sets. This work establishes a promising new direction for machine learning in peptide and protein classification in biomedicine and healthcare, potentially accelerating advances in drug discovery and disease diagnosis.
Affiliation(s)
- Moritz Weckbecker
  - Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Aleksandar Anžel
  - Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Zewen Yang
  - Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Georges Hattab
  - Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität, Arnimallee 14, Berlin, 14195, Berlin, Germany
2. Rong Y, Feng B, Cai X, Song H, Wang L, Wang Y, Yan X, Sun Y, Zhao J, Li P, Yang H, Wang Y, Wang F. Predicting variable-length ACE inhibitory peptides based on graph convolutional network. Int J Biol Macromol 2024:137060. [PMID: 39481706 DOI: 10.1016/j.ijbiomac.2024.137060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 10/07/2024] [Accepted: 10/28/2024] [Indexed: 11/02/2024]
Abstract
Traditional molecular descriptors have contributed to the prediction of angiotensin I-converting enzyme (ACE) inhibitory peptides, but they often fall short in capturing the complex structure of the molecule. To address these limitations, this study introduces molecular graphs as an advanced method for peptide characterization. Peptides containing 2-10 amino acids were represented using molecular graphs, and a graph convolutional network (GCN) model was constructed to predict variable-length peptides. This model was compared with machine learning (ML) models based on molecular descriptors, including Random Forest (RF), Support Vector Machine (SVM), and k-Nearest Neighbor (kNN), under the same benchmark. Notably, the GCN model outperformed the other models with an accuracy of 0.78, effectively identifying ACE inhibitory potential. Furthermore, the GCN model demonstrated superior performance, exceeding existing methods with an accuracy rate of over 98% on an independent test set. To validate our predictions, we synthesized the peptides VAPE and AQQKEP, which had high predicted probabilities; their IC50 values were 2.25 ± 0.11 and 3.75 ± 0.17 μM, respectively, indicating potent ACE inhibitory activity. The developed GCN model presents a powerful tool for the rapid screening and identification of ACE inhibitory peptides, offering promising opportunities for developing antihypertensive components in functional foods.
Affiliation(s)
- Yating Rong
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, China
- Baolong Feng
  - Center for Education Technology, Northeast Agricultural University, Harbin 150030, PR China
- Xiaoshuang Cai
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, China
- Hongjie Song
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, China
- Lili Wang
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, China
- Yehui Wang
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, China
- Xinxu Yan
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, China
- Yulin Sun
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, China
- Jinyong Zhao
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
- Ping Li
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
- Huihui Yang
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
- Yutang Wang
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji 831100, China
- Fengzhong Wang
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
3. Rathore AS, Choudhury S, Arora A, Tijare P, Raghava GPS. ToxinPred 3.0: An improved method for predicting the toxicity of peptides. Comput Biol Med 2024; 179:108926. [PMID: 39038391 DOI: 10.1016/j.compbiomed.2024.108926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 05/17/2024] [Accepted: 07/17/2024] [Indexed: 07/24/2024]
Abstract
Toxicity emerges as a prominent challenge in the design of therapeutic peptides, causing the failure of numerous peptides during clinical trials. In 2013, our group developed ToxinPred, a computational method that has been extensively adopted by the scientific community for predicting peptide toxicity. In this paper, we propose a refined variant of ToxinPred that showcases improved reliability and accuracy in predicting peptide toxicity. Initially, we utilized a similarity/alignment-based approach employing BLAST to predict toxic peptides, which yielded satisfactory accuracy; however, the method suffered from inadequate coverage. Subsequently, we employed a motif-based approach using MERCI software to uncover specific patterns or motifs that are exclusively observed in toxic peptides. The search for these motifs in peptides allowed us to predict toxic peptides with a high level of specificity but with poor sensitivity. To overcome the coverage limitations, we developed alignment-free methods using machine/deep learning techniques to balance sensitivity and specificity of prediction. A deep learning model (ANN-LSTM with fixed sequence length) developed using one-hot encoding achieved a maximum AUROC of 0.93 with an MCC of 0.71 on an independent dataset. A machine learning model (extra tree) developed using compositional features of peptides achieved a maximum AUROC of 0.95 with an MCC of 0.78. We also developed large language models and achieved a maximum AUC of 0.93 using ESM2-t33. Finally, we developed hybrid or ensemble methods combining two or more methods to enhance performance. Our specific hybrid method, which combines a motif-based approach with a machine learning-based model, achieved a maximum AUROC of 0.98 with an MCC of 0.81 on an independent dataset. In this study, all models were trained and tested on 80% of the data using five-fold cross-validation and evaluated on the remaining 20% of the data, referred to as the independent dataset. The evaluation of all methods on an independent dataset revealed that the method proposed in this study exhibited better performance than existing methods. To cater to the needs of the scientific community, we have developed a standalone software, pip package and web-based server ToxinPred3 (https://github.com/raghavagps/toxinpred3 and https://webs.iiitd.edu.in/raghava/toxinpred3/).
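The evaluation protocol described in this abstract (five-fold cross-validation on 80% of the data, with the remaining 20% held out as an independent set) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function name `holdout_and_folds`, the shuffling seed, and the interleaved fold assignment are all assumptions made for the example.

```python
import random


def holdout_and_folds(n_samples, holdout_frac=0.2, k=5, seed=0):
    """Split sample indices into a held-out set and k cross-validation folds.

    Returns (holdout_idx, folds), where folds is a list of
    (train_idx, val_idx) pairs drawn only from the non-held-out samples.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # hypothetical fixed seed for repeatability

    n_holdout = int(round(n_samples * holdout_frac))
    holdout, dev = idx[:n_holdout], idx[n_holdout:]

    folds = []
    for i in range(k):
        val = dev[i::k]  # every k-th development sample, starting at offset i
        val_set = set(val)
        train = [j for j in dev if j not in val_set]
        folds.append((train, val))
    return holdout, folds


holdout, folds = holdout_and_folds(100)
print(len(holdout), [len(val) for _, val in folds])
```

The held-out indices never appear in any fold, so a score computed on them is not influenced by model selection during cross-validation.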
Affiliation(s)
- Anand Singh Rathore
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Shubham Choudhury
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Akanksha Arora
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Purva Tijare
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
4. Periwal N, Arora P, Thakur A, Agrawal L, Goyal Y, Rathore AS, Anand HS, Kaur B, Sood V. Antiprotozoal peptide prediction using machine learning with effective feature selection techniques. Heliyon 2024; 10:e36163. [PMID: 39247292 PMCID: PMC11380031 DOI: 10.1016/j.heliyon.2024.e36163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/09/2024] [Accepted: 08/11/2024] [Indexed: 09/10/2024] [Open]
Abstract
Background: Protozoal pathogens pose a considerable threat, leading to notable mortality rates and the ongoing challenge of developing resistance to drugs. This situation underscores the urgent need for alternative therapeutic approaches. Antimicrobial peptides stand out as promising candidates for drug development. However, there is a lack of published research focusing on predicting antimicrobial peptides specifically targeting protozoal pathogens. In this study, we introduce a successful machine learning-based framework designed to predict potential antiprotozoal peptides effective against protozoal pathogens. Objective: The primary objective of this study is to classify and predict antiprotozoal peptides using diverse negative datasets. Methods: A comprehensive literature review was conducted to gather experimentally validated antiprotozoal peptides, forming the positive dataset for our study. To construct a robust machine learning classifier, multiple negative datasets were incorporated, including (i) non-antimicrobial, (ii) antiviral, (iii) antibacterial, (iv) antifungal, and (v) antimicrobial peptides excluding those targeting protozoal pathogens. Various compositional features of the peptides were extracted using the pfeature algorithm. Two feature selection methods, SVC-L1 and mRMR, were employed to identify highly relevant features crucial for distinguishing between the positive and negative datasets. Additionally, five popular classifiers, i.e., Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, and XGBoost, were used to build efficient decision models. Results: XGBoost was the most effective in classifying antiprotozoal peptides from each negative dataset based on the features selected by the mRMR feature selection method. The proposed machine learning framework efficiently differentiates the antiprotozoal peptides from (i) non-antimicrobial, (ii) antiviral, (iii) antibacterial, (iv) antifungal, and (v) antimicrobial peptides with accuracies of 97.27%, 93.64%, 86.36%, 90.91%, and 89.09%, respectively, on the validation dataset. Conclusion: The models are incorporated in a user-friendly web server (www.soodlab.com/appred) to predict the antiprotozoal activity of given peptides.
Affiliation(s)
- Neha Periwal
  - Department of Biochemistry, Jamia Hamdard, India
- Pooja Arora
  - Department of Zoology, Hansraj College, University of Delhi, India
- Yash Goyal
  - Department of Computer Science, Hansraj College, University of Delhi, India
- Anand S Rathore
  - Department of Zoology, Hansraj College, University of Delhi, India
- Baljeet Kaur
  - Department of Computer Science, Hansraj College, University of Delhi, India
- Vikas Sood
  - Department of Biochemistry, Jamia Hamdard, India
5. Bajiya N, Choudhury S, Dhall A, Raghava GPS. AntiBP3: A Method for Predicting Antibacterial Peptides against Gram-Positive/Negative/Variable Bacteria. Antibiotics (Basel) 2024; 13:168. [PMID: 38391554 PMCID: PMC10885866 DOI: 10.3390/antibiotics13020168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/03/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024] [Open]
Abstract
Most existing methods for predicting antibacterial peptides (ABPs) are designed to target either gram-positive or gram-negative bacteria. In this study, we describe a method that allows us to predict ABPs against gram-positive, gram-negative, and gram-variable bacteria. Firstly, we developed an alignment-based approach using BLAST to identify ABPs but achieved poor sensitivity. Secondly, we employed a motif-based approach to predict ABPs and obtained high precision with low sensitivity. To address the issue of poor sensitivity, we developed alignment-free methods for predicting ABPs using machine/deep learning techniques. In the case of alignment-free methods, we utilized a wide range of peptide features that include different types of composition, binary profiles of terminal residues, and fastText word embedding. In this study, a five-fold cross-validation technique has been used to build machine/deep learning models on training datasets. These models were evaluated on an independent dataset with no common peptide between training and independent datasets. Our machine learning-based model developed using the amino acid binary profile of terminal residues achieved maximum AUCs of 0.93, 0.98, and 0.94 for gram-positive, gram-negative, and gram-variable bacteria, respectively, on an independent dataset. Our method performs better than existing methods on an independent dataset. A user-friendly web server, standalone package and pip package have been developed to facilitate peptide-based therapeutics.
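A binary profile of terminal residues, one of the feature types named in this abstract, is commonly built by one-hot encoding each residue in a fixed-length window at the start of the peptide. The sketch below shows that general idea only. The window length of 5, the zero-padding of short peptides, and the function name `binary_profile` are assumptions for illustration and are not taken from the paper.

```python
# The 20 standard amino acids in a fixed order; each residue maps to one position.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(peptide, n_terminal=5):
    """One-hot encode the first n_terminal residues of a peptide.

    Each residue becomes a 20-element 0/1 vector; peptides shorter than the
    window are padded with all-zero vectors (an assumption of this sketch).
    """
    window = peptide[:n_terminal]
    profile = []
    for residue in window:
        profile.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    # Pad so that every peptide yields a vector of the same length.
    profile.extend([0] * (len(AMINO_ACIDS) * (n_terminal - len(window))))
    return profile


features = binary_profile("ACDK")
print(len(features), sum(features))
```

Because every peptide produces a vector of the same length, the output can be passed directly to a classifier that expects fixed-size input.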
Affiliation(s)
- Nisha Bajiya
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi 110020, India
- Shubham Choudhury
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi 110020, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi 110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi 110020, India
6. Yao L, Guan J, Li W, Chung CR, Deng J, Chiang YC, Lee TY. Identifying Antitubercular Peptides via Deep Forest Architecture with Effective Feature Representation. Anal Chem 2024; 96:1538-1546. [PMID: 38226973 DOI: 10.1021/acs.analchem.3c04196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Abstract
Tuberculosis (TB) is a severe disease caused by Mycobacterium tuberculosis that poses a significant threat to human health. The emergence of drug-resistant strains has made the global fight against TB even more challenging. Antituberculosis peptides (ATPs) have shown promising results as a potential treatment for TB. However, conventional wet lab-based approaches to ATP discovery are time-consuming and costly and often fail to discover peptides with desired properties. To address these challenges, we propose a novel machine learning-based framework called ATPfinder that can significantly accelerate the discovery of ATP. Our approach integrates various efficient peptide descriptors and utilizes the deep forest algorithm to construct the model. This neural network-like cascading structure can effectively process and mine features without complex hyperparameter tuning. Our experimental results show that ATPfinder outperforms existing ATP prediction tools, achieving state-of-the-art performance with an accuracy of 89.3% and an MCC of 0.70. Moreover, our framework exhibits better robustness than baseline algorithms commonly used for other sequence analysis tasks. Additionally, the excellent interpretability of our model can assist researchers in understanding the critical features of ATP. Finally, we developed a downloadable desktop application to simplify the use of our framework for researchers. Therefore, ATPfinder can facilitate the discovery of peptide drugs and provide potential solutions for TB treatment. Our framework is freely available at https://github.com/lantianyao/ATPfinder/ (data sets and code) and https://awi.cuhk.edu.cn/dbAMP/ATPfinder.html (software).
Affiliation(s)
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Wenshuo Li
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, 320317 Taoyuan, Taiwan
- Junyang Deng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Ying-Chih Chiang
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
7. Zou H. iHBPs-VWDC: variable-length window-based dynamic connectivity approach for identifying hormone-binding proteins. J Biomol Struct Dyn 2023:1-10. [PMID: 37978902 DOI: 10.1080/07391102.2023.2283150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/08/2023] [Indexed: 11/19/2023]
Abstract
Hormone-binding proteins (HBPs) are soluble carrier proteins that play a vital role in the growth and development of living organisms. Identifying HBPs accurately is crucial for understanding their functions. However, traditional wet lab experimental methods are labor intensive and cost ineffective. Therefore, there is a need for computational methods to efficiently identify HBPs. In this study, a machine learning method based on support vector machine (SVM) was proposed for the accurate and efficient identification of HBPs. The encoding of protein sequences involved using fifty different physicochemical (PC) properties. A variable-length window-based dynamic connectivity method was applied to capture the connection information between two different PC properties through two distinct strategies. The canonical correlation analysis algorithm was then used to fuse features obtained from these approaches. Feature selection was performed using the F-score approach to choose the most discriminative features. Finally, these selected features were fed into the SVM to discriminate between HBPs and non-HBPs. The proposed method achieved high classification accuracies of 99.19%, 96.77%, and 94.57% on the main dataset and two independent datasets, respectively, as demonstrated in the jackknife test. Comparative results showed that our proposed method outperforms existing approaches on the same datasets, indicating its potential as a useful tool for identifying HBPs. The Matlab codes and datasets used in the current study are freely available at https://figshare.com/articles/online_resource/iHBPs-VWDC/23559834. Communicated by Ramaswamy H. Sarma.
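The abstract names an F-score approach for feature selection without defining it. One widely used definition for two-class feature ranking (assumed here, and possibly different from the paper's) divides the between-class separation of a feature's means by the sum of its within-class sample variances. The following plain-Python sketch ranks features by that score; the function names and the toy data are hypothetical.

```python
def f_score(pos_values, neg_values):
    """F-score of one feature from its values in the positive and negative class.

    Each class needs at least two samples, because the within-class
    variance uses an (n - 1) denominator.
    """
    n_pos, n_neg = len(pos_values), len(neg_values)
    mean_pos = sum(pos_values) / n_pos
    mean_neg = sum(neg_values) / n_neg
    mean_all = (sum(pos_values) + sum(neg_values)) / (n_pos + n_neg)

    # Numerator: how far each class mean lies from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sample variance of the feature inside each class.
    var_pos = sum((v - mean_pos) ** 2 for v in pos_values) / (n_pos - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg_values) / (n_neg - 1)
    within = var_pos + var_neg
    return between / within if within > 0 else 0.0


def rank_features(pos_rows, neg_rows):
    """Return (feature indices sorted by descending F-score, list of scores)."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[i] for row in pos_rows], [row[i] for row in neg_rows])
        for i in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data: feature 0 separates the classes, feature 1 does not.
pos_rows = [[1.0, 0.5], [1.2, 0.4]]
neg_rows = [[0.0, 0.5], [0.2, 0.4]]
order, scores = rank_features(pos_rows, neg_rows)
print(order)
```

Keeping only the top-ranked indices from `order` gives the reduced feature set that would then be passed to a classifier such as an SVM.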
Affiliation(s)
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
  - Jiangxi Engineering Research Center of Unattended Perception System and Artificial Intelligence Technology, Jiangxi Science and Technology Normal University, Nanchang, China
8. Yu Z, Yin Z, Zou H. iAMY-RECMFF: Identifying amyloidgenic peptides by using residue pairwise energy content matrix and features fusion algorithm. J Bioinform Comput Biol 2023; 21:2350023. [PMID: 37899353 DOI: 10.1142/s0219720023500233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2023]
Abstract
Various diseases, including Huntington's disease, Alzheimer's disease, and Parkinson's disease, have been reported to be linked to amyloid. Therefore, it is crucial to distinguish amyloid from non-amyloid proteins or peptides. While experimental approaches are typically preferred, they are costly and time-consuming. In this study, we have developed a machine learning framework called iAMY-RECMFF to discriminate amyloidgenic from non-amyloidgenic peptides. In our model, we first encoded the peptide sequences using the residue pairwise energy content matrix. We then utilized Pearson's correlation coefficient and distance correlation to extract useful information from this matrix. Additionally, we employed an improved similarity network fusion algorithm to integrate features from different perspectives. The Fisher approach was adopted to select the optimal feature subset. Finally, the selected features were inputted into a support vector machine for identifying amyloidgenic peptides. Experimental results demonstrate that our proposed method significantly improves the identification of amyloidgenic peptides compared to existing predictors. This suggests that our method may serve as a powerful tool in identifying amyloidgenic peptides. To facilitate academic use, the dataset and codes used in the current study are accessible at https://figshare.com/articles/online_resource/iAMY-RECMFF/22816916.
Affiliation(s)
- Zizheng Yu
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
- Zhijian Yin
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
  - Jiangxi Engineering Research Center of Unattended Perception System and Artificial Intelligence Technology, Jiangxi Science and Technology Normal University, Jiangxi 330088, P. R. China
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
  - Jiangxi Engineering Research Center of Unattended Perception System and Artificial Intelligence Technology, Jiangxi Science and Technology Normal University, Jiangxi 330088, P. R. China
9. Polinário G, Primo LMDG, Rosa MABC, Dett FHM, Barbugli PA, Roque-Borda CA, Pavan FR. Antimicrobial peptides as drugs with double response against Mycobacterium tuberculosis coinfections in lung cancer. Front Microbiol 2023; 14:1183247. [PMID: 37342560 PMCID: PMC10277934 DOI: 10.3389/fmicb.2023.1183247] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/09/2023] [Accepted: 05/16/2023] [Indexed: 06/23/2023] [Open]
Abstract
Tuberculosis and lung cancer are, in many cases, correlated diseases that can be confused because they have similar symptoms. Many meta-analyses have proven that there is a greater chance of developing lung cancer in patients who have active pulmonary tuberculosis. It is, therefore, important to monitor the patient for a long time after recovery and search for combined therapies that can treat both diseases, as well as face the great problem of drug resistance. Peptides are molecules derived from the breakdown of proteins, and the membranolytic class is already being studied. It has been proposed that these molecules destabilize cellular homeostasis, performing a dual antimicrobial and anticancer function and offering several possibilities of adaptation for adequate delivery and action. In this review, we focus on two important reasons for the use of multifunctional peptides or peptides, namely the double activity and no harmful effects on humans. We review some of the main antimicrobial and anti-inflammatory bioactive peptides and highlight four that have anti-tuberculosis and anti-cancer activity, which may contribute to obtaining drugs with this dual functionality.
Affiliation(s)
- Giulia Polinário
  - School of Pharmaceutical Sciences, São Paulo State University (UNESP), Araraquara, São Paulo, Brazil
- Paula Aboud Barbugli
  - School of Pharmaceutical Sciences, São Paulo State University (UNESP), Araraquara, São Paulo, Brazil
- Fernando Rogério Pavan
  - School of Pharmaceutical Sciences, São Paulo State University (UNESP), Araraquara, São Paulo, Brazil
10. Pande A, Patiyal S, Lathwal A, Arora C, Kaur D, Dhall A, Mishra G, Kaur H, Sharma N, Jain S, Usmani SS, Agrawal P, Kumar R, Kumar V, Raghava GPS. Pfeature: A Tool for Computing Wide Range of Protein Features and Building Prediction Models. J Comput Biol 2023; 30:204-222. [PMID: 36251780 DOI: 10.1089/cmb.2022.0241] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Indexed: 01/24/2023] [Open]
Abstract
In the last three decades, a wide range of protein features have been discovered to annotate a protein. Numerous attempts have been made to integrate these features in a software package/platform so that the user may compute a wide range of features from a single source. To complement the existing methods, we developed a method, Pfeature, for computing a wide range of protein features. Pfeature allows users to compute more than 200,000 features required for predicting the overall function of a protein, residue-level annotation of a protein, and function of chemically modified peptides. It has six major modules, namely, composition, binary profiles, evolutionary information, structural features, patterns, and model building. The composition module computes most of the existing compositional features, plus novel features. The binary profile of amino acid sequences makes it possible to compute the fraction of each type of residue as well as its position. The evolutionary information module computes the evolutionary information of a protein in the form of a position-specific scoring matrix profile generated using Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), which is fit for annotation of a protein and its residues. A structural module was developed for computing structural features/descriptors from the tertiary structure of a protein. These features are suitable to predict the therapeutic potential of a protein containing non-natural or chemically modified residues. The model-building module implements various machine learning techniques for developing classification and regression models as well as feature selection. Pfeature also allows the generation of overlapping patterns and features from a protein. A user-friendly Pfeature is available as a web server, Python library, and stand-alone package.
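The simplest compositional feature of the kind mentioned in this abstract is the amino acid composition: the fraction of each of the 20 standard residues in a sequence. The sketch below computes it in plain Python and deliberately does not use the Pfeature library or mirror its interface. The function name `amino_acid_composition` and the choice to divide by the full sequence length are assumptions for illustration.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a dict mapping each standard residue to its fraction of the sequence.

    Non-standard characters still count toward the length, so the fractions
    sum to less than 1 when the sequence contains them.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return {aa: counts.get(aa, 0) / len(sequence) for aa in AMINO_ACIDS}


composition = amino_acid_composition("AAC")
print(composition["A"], composition["C"])
```

The result is a fixed-length, 20-dimensional description of a sequence of any length, which is why composition features are a common baseline input for peptide classifiers.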
Affiliation(s)
- Akshara Pande
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Lathwal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Dilraj Kaur
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gaurav Mishra
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Department of Electrical Engineering, Shiv Nadar University, Greater Noida, India
- Harpreet Kaur
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Neelam Sharma
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Shipra Jain
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Salman Sadullah Usmani
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Piyush Agrawal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Rajesh Kumar
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Vinod Kumar
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
11
|
Kong W, Huang W, Peng C, Zhang B, Duan G, Ma W, Huang Z. Multiple machine learning methods aided virtual screening of NaV1.5 inhibitors. J Cell Mol Med 2022; 27:266-276. [PMID: 36573431 PMCID: PMC9843531 DOI: 10.1111/jcmm.17652] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Revised: 10/30/2022] [Accepted: 12/06/2022] [Indexed: 12/28/2022] Open
Abstract
NaV1.5 sodium channels contribute to the generation of the rapid upstroke of the myocardial action potential and thereby play a central role in the excitability of myocardial cells. At present, the patch clamp method is the gold standard for ion channel inhibitor screening. However, this method has disadvantages such as high technical difficulty, high cost and low speed. In this study, novel machine learning models to screen chemical blockers were developed to overcome these shortcomings. Data from the ChEMBL database were used to build the models. First, six molecular fingerprints were combined with five machine learning algorithms to develop 30 classification models for predicting effective inhibitors. A validation set and a test set were used to evaluate model performance. Subsequently, the privileged substructures tightly associated with inhibition of the NaV1.5 ion channel were extracted using the bioalerts Python package. In the validation set, the RF-Graph model performed best. RF-Graph also produced the best result in the test set, with a prediction accuracy (Q) of 0.9309 and a Matthews correlation coefficient of 0.8627, further indicating that the model had high classification ability. The privileged-substructure results indicated that sulfa structures and fragments with large steric hindrance tend to block NaV1.5. In the unsupervised learning task of identifying sulfa drugs, MACCS and Graph fingerprints gave good results. In summary, effective machine learning models have been constructed that help to screen potential inhibitors of the NaV1.5 ion channel, and key privileged substructures with high affinity were also extracted.
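The two test-set metrics quoted in this abstract, prediction accuracy (Q) and the Matthews correlation coefficient, are both functions of a binary confusion matrix. The following is a minimal sketch of the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (Q, MCC) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    q = (tp + tn) / total
    # MCC denominator; it is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q, mcc

# Hypothetical counts for illustration only (not the paper's results).
q, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
print(f"Q = {q:.4f}, MCC = {mcc:.4f}")
```

Unlike Q, the coefficient stays informative when the active and inactive classes are imbalanced, which is a common reason for reporting both.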
Affiliation(s)
- Weikaixin Kong
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Institute Sanqu Technology (Hangzhou) Co., Ltd., Hangzhou, China
- Weiran Huang
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
- Chao Peng
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
- Bowen Zhang
- ComMedX (Computational Medicine Beijing Co., Ltd.), Beijing, China
- Guifang Duan
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
- Weining Ma
- Department of Neurology, Shengjing Hospital affiliated to China Medical University, Shenyang, China
- Zhuo Huang
- Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
- State Key Laboratory of Natural and Biomimetic Drugs, Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
12
Hemmati S, Rasekhi Kazerooni H. Polypharmacological Cell-Penetrating Peptides from Venomous Marine Animals Based on Immunomodulating, Antimicrobial, and Anticancer Properties. Mar Drugs 2022; 20:md20120763. [PMID: 36547910 PMCID: PMC9787916 DOI: 10.3390/md20120763] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/21/2022] [Revised: 11/25/2022] [Accepted: 11/30/2022] [Indexed: 12/09/2022] Open
Abstract
Complex pathological diseases, such as cancer, infection, and Alzheimer's, need to be targeted by multipronged curative. Various omics technologies, with a high rate of data generation, demand artificial intelligence to translate these data into druggable targets. In this study, 82 marine venomous animal species were retrieved, and 3505 cryptic cell-penetrating peptides (CPPs) were identified in their toxins. A total of 279 safe peptides were further analyzed for antimicrobial, anticancer, and immunomodulatory characteristics. Protease-resistant CPPs with endosomal-escape ability in Hydrophis hardwickii, nuclear-localizing peptides in Scorpaena plumieri, and mitochondrial-targeting peptides from Synanceia horrida were suitable for compartmental drug delivery. A broad-spectrum S. horrida-derived antimicrobial peptide with a high binding-affinity to bacterial membranes was an antigen-presenting cell (APC) stimulator that primes cytokine release and naïve T-cell maturation simultaneously. While antibiofilm and wound-healing peptides were detected in Synanceia verrucosa, APC epitopes as universal adjuvants for antiviral vaccination were in Pterois volitans and Conus monile. Conus pennaceus-derived anticancer peptides showed antiangiogenic and IL-2-inducing properties with moderate BBB-permeation and were defined to be a tumor-homing peptide (THP) with the ability to inhibit programmed death ligand-1 (PDL-1). Isoforms of RGD-containing peptides with innate antiangiogenic characteristics were in Conus tessulatus for tumor targeting. Inhibitors of neuropilin-1 in C. pennaceus are proposed for imaging probes or therapeutic delivery. A Conus betulinus cryptic peptide, with BBB-permeation, mitochondrial-targeting, and antioxidant capacity, was a stimulator of anti-inflammatory cytokines and non-inducer of proinflammation proposed for Alzheimer's. 
In conclusion, we have considered the dynamic interaction of cells, their microenvironment, and proportional-orchestrating-host-immune pathways by multi-target-directed CPPs resembling single-molecule polypharmacology. This strategy might fill the therapeutic gap in complex resistant disorders and increase the candidates' chance of clinical translation.
Affiliation(s)
- Shiva Hemmati
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz 71345-1583, Iran
- Department of Pharmaceutical Biology, Faculty of Pharmaceutical Sciences, UCSI University, Cheras, Kuala Lumpur 56000, Malaysia
- Biotechnology Research Center, Shiraz University of Medical Sciences, Shiraz 71345-1583, Iran
- Correspondence: Tel.: +98-7132-424-128
13
Accurate Prediction of Anti-hypertensive Peptides Based on Convolutional Neural Network and Gated Recurrent Unit. Interdiscip Sci 2022; 14:879-894. [PMID: 35474167 DOI: 10.1007/s12539-022-00521-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/05/2021] [Revised: 03/30/2022] [Accepted: 04/06/2022] [Indexed: 12/30/2022]
Abstract
Hypertension (HT) is a common disease and one of the major causes of cardiovascular disease. High blood pressure can lead to impaired heart and kidney function, cerebral hemorrhage and myocardial infarction. Because of the limitations of laboratory methods, identifying bioactive peptides for the treatment of HT takes a long time, so the identification of anti-hypertensive peptides (AHTPs) is of immediate significance. With the prevalence of machine learning, it has been suggested as a supplementary method for AHTP classification. We therefore develop a new model to identify AHTPs based on multiple features and deep learning. The deep model combines a convolutional neural network (CNN) and a gated recurrent unit (GRU). The convolution structure is used to reduce the feature dimension and running time. The data processed by the CNN are fed into the recurrent GRU structure, where important information is filtered through the reset gate and update gate. Finally, the output layer uses a sigmoid activation function. To extract features, we use Kmer, the deviation between the dipeptide frequency and the expected mean (DDE), encoding based on grouped weight (EBGW), enhanced grouped amino acid composition (EGAAC) and dipeptide binary profile and frequency (DBPF). Kmer, DDE, EBGW and EGAAC are widely used in protein research. DBPF is a new feature representation method designed by us: it maps dipeptides to binary numbers and produces a binary coding file and a frequency file. These features are then concatenated and input into the proposed model for prediction and analysis. In tenfold cross-validation, the model compares favorably with previous methods, with accuracies of 96.23% and 99.10%, respectively, a substantial improvement over those methods. The results show that combining convolution with a recurrent structure has a positive impact on the classification of AHTPs, and that this method is a feasible, efficient and competitive sequence analysis tool for AHTPs. We also provide a user-friendly online prediction tool, freely accessible at http://ahtps.zhanglab.site/.
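Of the feature extractors listed in this abstract, Kmer is the simplest to illustrate. The sketch below computes normalized dipeptide (k = 2) frequencies over the 20 standard amino acids. It is an assumption about the general technique, not the authors' code; the function name, the alphabet ordering and the example peptide are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical by letter

def kmer_frequencies(sequence, k=2):
    """Return a fixed-length vector of k-mer frequencies (20**k entries)."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(sequence) - k + 1
    for i in range(windows):
        kmer = sequence[i:i + k]
        if kmer in counts:  # windows with non-standard residues are skipped
            counts[kmer] += 1
    total = max(windows, 1)  # avoid dividing by zero for very short peptides
    return [counts[kmer] / total for kmer in kmers]

# Arbitrary example peptide, used only to show the output shape.
vector = kmer_frequencies("AAC")
print(len(vector), vector[:2])
```

Because the vector length depends only on k, peptides of different lengths map to inputs of the same size, which is what allows such features to be concatenated with the other encodings before being passed to a network.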
14
Onah E, Uzor PF, Ugwoke IC, Eze JU, Ugwuanyi ST, Chukwudi IR, Ibezim A. Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors. BMC Bioinformatics 2022; 23:466. [DOI: 10.1186/s12859-022-05017-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022] Open
Abstract
Background
In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) still remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the drug discovery process of novel HIV-1 protease inhibitors. In this work, we built and compared the performance of selected machine learning models for the prediction of HIV-1 protease cleavage site utilizing a hybrid of octapeptide sequence information comprising bond composition, amino acid binary profile (AABP), and physicochemical properties as numerical descriptors serving as input variables for some selected machine learning algorithms. Our work differs from antecedent studies exploring the same subject in the combination of octapeptide descriptors and method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models.
Results
Among the 8 models evaluated in the “stratified” 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier with AUC, F-score, and B. Acc. scores in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, have the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80 and B. Acc. ~ 80.0%). Whereas, the perceptron classifier and the K-nearest neighbors had statistically lower performance (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%) at p < 0.05. On the other hand, logistic regression, and multi-layer perceptron classifier (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%) had the best performance on further evaluation on the testing set, though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier equally performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%).
Conclusions
Logistic regression and multi-layer perceptron classifiers have comparable predictive performances to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition and standard physicochemical properties are used as input variables. In our future work, we hope to develop a standalone software for HIV-1 protease cleavage site prediction utilizing the logistic regression algorithm and the aforementioned octapeptide sequence descriptors.
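The "stratified" 10-fold cross-validation mentioned in the Background keeps the ratio of cleaved to non-cleaved octapeptides roughly constant in every fold. A minimal sketch of one way to build such folds follows; the function name, the round-robin assignment and the 20/80 label split are assumptions for illustration and do not reproduce the study's data or code.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold has similar class ratios."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds

# Hypothetical imbalanced label vector: 20 positives, 80 negatives.
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels)
for fold in folds[:3]:
    print(len(fold), sum(labels[i] for i in fold))
```

Each fold then serves once as the held-out set while the remaining nine are used for training, and the scores are averaged over the ten runs.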
15
Dasmahapatra U, Chanda K. Synthetic approaches to potent heterocyclic inhibitors of tuberculosis: A decade review. Front Pharmacol 2022; 13:1021216. [PMID: 36386156 PMCID: PMC9661889 DOI: 10.3389/fphar.2022.1021216] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/17/2022] [Accepted: 10/03/2022] [Indexed: 09/08/2024] Open
Abstract
Tuberculosis (TB) continues to be a significant global health concern with about 1.5 million deaths annually. Despite efforts to develop more efficient vaccines, reliable diagnostics, and chemotherapeutics, tuberculosis has become a concern to world health due to HIV, the rapid growth of bacteria that are resistant to treatment, and the recently introduced COVID-19 pandemic. As is well known, advances in synthetic organic chemistry have historically enabled the production of important life-saving medications that have had a tremendous impact on patients' lives and health all over the world. Small-molecule research as a novel chemical entity for a specific disease target offers in-depth knowledge and potential therapeutic targets. In this viewpoint, we concentrated on the synthesis of a number of heterocycles reported in the previous decade and the screening of their inhibitory action against diverse strains of Mycobacterium tuberculosis. These findings offer specific details on the structure-based activity of several heterocyclic scaffolds backed by their in vitro tests as a promising class of antitubercular medicines, which will be further useful to build effective treatments to prevent this terrible illness.
Affiliation(s)
- Kaushik Chanda
- Department of Chemistry, School of Advanced Sciences, Vellore Institute of Technology, Vellore, India
16
Rodrigues CHM, Garg A, Keizer D, Pires DEV, Ascher DB. CSM-peptides: A computational approach to rapid identification of therapeutic peptides. Protein Sci 2022; 31:e4442. [PMID: 36173168 PMCID: PMC9518225 DOI: 10.1002/pro.4442] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/17/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]
Abstract
Peptides are attractive alternatives for the development of new therapeutic strategies due to their versatility and low complexity of synthesis. Increasing interest in these molecules has led to the creation of large collections of experimentally characterized therapeutic peptides, which greatly contribute to the development of data-driven computational approaches. Here we propose CSM-peptides, a novel machine learning method for rapid identification of eight different types of therapeutic peptides: anti-angiogenic, anti-bacterial, anti-cancer, anti-inflammatory, anti-viral, cell-penetrating, quorum sensing, and surface binding. Our method has been shown to outperform existing approaches, achieving an AUC of up to 0.92 on independent blind tests and consistent performance on cross-validation. We anticipate that CSM-peptides will be of great value in screening large libraries to identify novel peptides with therapeutic potential, and we have made it freely available as a user-friendly web server and Application Programming Interface at https://biosig.lab.uq.edu.au/csm_peptides.
Affiliation(s)
- Carlos H. M. Rodrigues
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
- Anjali Garg
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- David Keizer
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Douglas E. V. Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B. Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
17
Vishnepolsky B, Grigolava M, Managadze G, Gabrielian A, Rosenthal A, Hurt DE, Tartakovsky M, Pirtskhalava M. Comparative analysis of machine learning algorithms on the microbial strain-specific AMP prediction. Brief Bioinform 2022; 23:6611915. [PMID: 35724561 PMCID: PMC9294419 DOI: 10.1093/bib/bbac233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2021] [Revised: 05/17/2022] [Accepted: 05/18/2022] [Indexed: 12/29/2022] Open
Abstract
The evolution of drug-resistant pathogenic microbial species is a major global health concern. Naturally occurring antimicrobial peptides (AMPs) are considered promising candidates to address antibiotic resistance problems. A variety of computational methods have been developed to accurately predict AMPs. The majority of such methods are not microbial strain specific (MSS): they can predict whether a given peptide is active against some microbe, but cannot accurately calculate whether such a peptide would be active against a particular microbial strain (MS). Due to insufficient data on most MS, only a few MSS predictive models have been developed so far. To overcome this problem, we developed a novel approach for improving MSS predictive models (MSSPM), based on properties computed for AMP sequences and genome characteristics computed for the target MS. The new models can predict AMPs for MS on which no peptides have been tested. We tested various types of feature engineering as well as different machine learning (ML) algorithms to compare the predictive abilities of the resulting models. Among the ML algorithms, Random Forest and AdaBoost performed best. By using genome characteristics as additional features, the performance of all models increased relative to models relying on AMP sequence-based properties only. Our novel MSS AMP predictor is freely accessible as part of the DBAASP database resource at http://dbaasp.org/prediction/genome.
Affiliation(s)
- Boris Vishnepolsky
- Laboratory of Bioinformatics, Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi, Georgia (corresponding author; Tel: +995595771363)
- Maya Grigolava
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
- Grigol Managadze
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
- Andrei Gabrielian
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Alex Rosenthal
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Darrell E Hurt
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Michael Tartakovsky
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Malak Pirtskhalava
- Laboratory of Bioinformatics, Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi, Georgia (corresponding author; Tel: +995574162397)
18
Li Y, Li X, Liu Y, Yao Y, Huang G. MPMABP: A CNN and Bi-LSTM-Based Method for Predicting Multi-Activities of Bioactive Peptides. Pharmaceuticals (Basel) 2022; 15:707. [PMID: 35745625 PMCID: PMC9231127 DOI: 10.3390/ph15060707] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/22/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 12/30/2022] Open
Abstract
Bioactive peptides are typically small functional peptides with 2-20 amino acid residues and play versatile roles in metabolic and biological processes. Bioactive peptides are multi-functional, so it is very challenging to accurately detect all their functions simultaneously. We propose a convolutional neural network (CNN) and bi-directional long short-term memory (Bi-LSTM)-based deep learning method, called MPMABP, for recognizing multi-activities of bioactive peptides. MPMABP stacks five CNNs at different scales and uses a residual network to prevent information loss. The empirical results show that MPMABP is superior to state-of-the-art methods. Analysis of the amino acid distribution indicates that lysine tends to appear in anti-cancer peptides, leucine in anti-diabetic peptides, and proline in anti-hypertensive peptides. The method and analysis are useful for recognizing multi-activities of bioactive peptides.
Affiliation(s)
- You Li
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Xueyong Li
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Yuewu Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
19
Lertampaiporn S, Hongsthong A, Wattanapornprom W, Thammarongtham C. Ensemble-AHTPpred: A Robust Ensemble Machine Learning Model Integrated With a New Composite Feature for Identifying Antihypertensive Peptides. Front Genet 2022; 13:883766. [PMID: 35571042 PMCID: PMC9096110 DOI: 10.3389/fgene.2022.883766] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/25/2022] [Accepted: 04/04/2022] [Indexed: 11/13/2022] Open
Abstract
Hypertension or elevated blood pressure is a serious medical condition that significantly increases the risks of cardiovascular disease, heart disease, diabetes, stroke, kidney disease, and other health problems, that affect people worldwide. Thus, hypertension is one of the major global causes of premature death. Regarding the prevention and treatment of hypertension with no or few side effects, antihypertensive peptides (AHTPs) obtained from natural sources might be useful as nutraceuticals. Therefore, the search for alternative/novel AHTPs in food or natural sources has received much attention, as AHTPs may be functional agents for human health. AHTPs have been observed in diverse organisms, although many of them remain underinvestigated. The identification of peptides with antihypertensive activity in the laboratory is time- and resource-consuming. Alternatively, computational methods based on robust machine learning can identify or screen potential AHTP candidates prior to experimental verification. In this paper, we propose Ensemble-AHTPpred, an ensemble machine learning algorithm composed of a random forest (RF), a support vector machine (SVM), and extreme gradient boosting (XGB), with the aim of integrating diverse heterogeneous algorithms to enhance the robustness of the final predictive model. The selected feature set includes various computed features, such as various physicochemical properties, amino acid compositions (AACs), transitions, n-grams, and secondary structure-related information; these features are able to learn more information in terms of analyzing or explaining the characteristics of the predicted peptide. In addition, the tool is integrated with a newly proposed composite feature (generated based on a logistic regression function) that combines various feature aspects to enable improved AHTP characterization. Our tool, Ensemble-AHTPpred, achieved an overall accuracy above 90% on independent test data. 
Additionally, the approach was applied to novel experimentally validated AHTPs, obtained from recent studies, which did not overlap with the training and test datasets, and the tool could precisely predict these AHTPs.
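The abstract describes an ensemble of three heterogeneous classifiers (RF, SVM and XGB) but does not state how their outputs are combined. One common option is soft voting, sketched below under that assumption; the function name, the 0.5 threshold and the per-model probabilities are hypothetical and are not taken from Ensemble-AHTPpred.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average per-model positive-class probabilities and apply a cutoff."""
    mean = sum(probabilities.values()) / len(probabilities)
    return mean, mean >= threshold

# Hypothetical positive-class probabilities for a single peptide.
scores = {"RF": 0.9, "SVM": 0.6, "XGB": 0.3}
mean_probability, is_ahtp = soft_vote(scores)
print(f"mean = {mean_probability:.2f}, predicted AHTP = {is_ahtp}")
```

Averaging lets a confident model outweigh a weakly dissenting one, whereas majority (hard) voting would count each model equally regardless of confidence.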
Affiliation(s)
- Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- *Correspondence: Chinae Thammarongtham
20
Humanizing plant-derived snakins and their encrypted antimicrobial peptides. Biochimie 2022; 199:92-111. [PMID: 35472564 DOI: 10.1016/j.biochi.2022.04.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/13/2021] [Revised: 04/16/2022] [Accepted: 04/20/2022] [Indexed: 12/11/2022]
Abstract
Due to safety restrictions, plant-derived antimicrobial peptides (AMPs) need optimization to be consumed beyond preservatives. Herein, 175 GASA-domain-containing snakins were analyzed. Factors including charge, hydrophobicity, helicity, hydrophobic moment (μH), folding enthalpy, folding heat capacity, folding free energy, therapeutic index, allergenicity, and bitterness were considered. The most optimal snakins for oral consumption as preservatives were from Cajanus cajan, Cucumis melo, Durio zibethinus, Glycine soja, Herrania umbratica, and Ziziphus jujuba. Virtual digestion of snakins predicted ACE1 and DPPIV inhibitory as dominant effects upon oral use with antihypertensive and antidiabetic properties. To be applied as a therapeutic in parenteral administration, snakins were browsed for short 20-mer encrypted fragments that were non-toxic or with eliminated toxicity using directed mutagenesis yet retaining the AMP property. The most promising 20-mer AMPs were Mr-SNK2-1a in Morella rubra with BBB permeation, Na-SNK2-2a(C18W), and Na-SNK2-2b(C16F) from Nicotiana attenuata. These AMPs were cell-penetrating peptides (CPPs), with a charge of +6, a μH of about 0.40, and a Boman-index higher than 2.48 kcal mol⁻¹. Na-SNK2-2a(C18W) had putative activity against gram-negative bacteria with MIC lower than 25 μg ml⁻¹, and Na-SNK2-2b(C16F) was a potential anti-HIV with an IC50 of 3.04 μM. Other 20-mer AMPs, such as Cc-SNK1-2a from Cajanus cajan displayed an anti-HCV property with an IC50 of 13.91 μM. While Si-SNK2-3a(C17P) from Sesamum indicum was a cationic anti-angiogenic CPP targeting the acidic microenvironment of tumors, Cme-SNK2-1a(C11F) from Cucumis melo was an immunomodulator CPP applicable as a vaccine adjuvant. Because of combined mechanisms, investigating cysteine-rich peptides can nominate effective biotherapeutics.
21
Zou H, Yang F, Yin Z. Identification of tumor homing peptides by utilizing hybrid feature representation. J Biomol Struct Dyn 2022; 41:3405-3412. [PMID: 35262448 DOI: 10.1080/07391102.2022.2049368] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022]
Abstract
Cancer is a serious disease, and recent studies have reported that tumor homing peptides (THPs) play a key role in its treatment. Because experimental methods are time-consuming and expensive, there is an urgent need for automatic computational approaches to identify THPs. In this study, we propose a novel machine learning method to distinguish THPs from non-THPs, in which the peptide sequences are first encoded by the pseudo residue pairwise energy content matrix (PseRECM) and pseudo physicochemical property (PsePC). The least absolute shrinkage and selection operator (LASSO) is then employed to select optimal features from the extracted features. The selected features are fed into a support vector machine (SVM) for identifying THPs. We achieved 89.02%, 88.49%, and 94.58% classification accuracy on the Main, Small, and Main90 datasets, respectively. Experimental results show that the proposed method outperforms existing predictors on the same benchmark datasets, indicating that it may be a useful tool for identifying THPs. The datasets and code used in the current study are available at https://figshare.com/articles/online_resource/iTHPs/16778770. Communicated by Ramaswamy H. Sarma.
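The feature-selection step named in this abstract relies on the fact that an L1 penalty drives the coefficients of uninformative features exactly to zero. The sketch below shows this with a small coordinate-descent solver for the objective (1/2n)·||y − Xw||² + λ·||w||₁. The solver, the function names and the toy data are assumptions for illustration, not the authors' implementation.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = sum(X[i][m] * w[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, penalty) / (z / n) if z else 0.0
    return w

# Toy data: the target depends only on the first feature.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
weights = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, coefficient in enumerate(weights) if coefficient != 0.0]
print(weights, selected)
```

Only the features with non-zero coefficients would be passed on to the downstream classifier, which in the setting described above is an SVM.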
Collapse
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| |
|
22
|
Mckenna A, P N Dubey S. Machine Learning Based Predictive Model for the Analysis of Sequence Activity Relationships Using Protein Spectra and Protein Descriptors. J Biomed Inform 2022; 128:104016. [PMID: 35143999 DOI: 10.1016/j.jbi.2022.104016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/03/2021] [Revised: 12/13/2021] [Accepted: 02/03/2022] [Indexed: 11/26/2022]
Abstract
Accurately establishing the connection between a protein sequence and its function remains a focal point within the field of protein engineering, especially in the context of predicting the effects of mutations. From this, there has been a continued drive to build accurate and reliable predictive models via machine learning that allow for the virtual screening of many protein mutant sequences, measuring the relationship between sequence and 'fitness' or 'activity', commonly known as a Sequence-Activity-Relationship (SAR). An important preliminary stage in the building of these predictive models is the encoding of the chosen sequences. Evaluated in this work is a plethora of encoding strategies using the Amino Acid Index database, where the indices are transformed into their spectral form via Digital Signal Processing (DSP) techniques, as well as numerous protein structural and physiochemical descriptors. The encoding strategies are explored on a dataset curated to measure the thermostability of various mutants from a recombination library, designed from parental cytochrome P450s. In this work it was concluded that the implementation of protein spectra in concatenation with protein descriptors, together with the Partial Least Squares Regression (PLS) algorithm, gave the most noteworthy increase in the quality of the predictive models (as described in Encoding Strategy C), highlighting their utility in identifying an SAR. The accompanying software produced for this paper is termed pySAR (Python Sequence-Activity-Relationship), which allows a user to find the optimal arrangement of structural and/or physiochemical properties to encode their specific mutant library dataset; the source code is available at: https://github.com/amckenna41/pySAR.
Affiliation(s)
- Adam Mckenna
- School of Electronics, Electrical Engineering and Computer Science, Queen's University of Belfast, University Road, BT7 1NN, Belfast, United Kingdom.
| | - Sandhya P N Dubey
- Department of Data Science and Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, Karnataka 576104, India.
| |
|
23
|
Abstract
Antibiotic resistance constitutes a global threat and could lead to a future pandemic. One strategy is to develop a new generation of antimicrobials. Naturally occurring antimicrobial peptides (AMPs) are recognized templates and some are already in clinical use. To accelerate the discovery of new antibiotics, it is useful to predict novel AMPs from the sequenced genomes of various organisms. The antimicrobial peptide database (APD) provided the first empirical peptide prediction program. It also facilitated the testing of the first machine-learning algorithms. This chapter provides an overview of machine-learning predictions of AMPs. Most of the predictors, such as AntiBP, CAMP, and iAMPpred, involve a single-label prediction of antimicrobial activity. This type of prediction has been expanded to antifungal, antiviral, antibiofilm, anti-TB, hemolytic, and anti-inflammatory peptides. The multiple functional roles of AMPs annotated in the APD also enabled multi-label predictions (iAMP-2L, MLAMP, and AMAP), which include antibacterial, antiviral, antifungal, antiparasitic, antibiofilm, anticancer, anti-HIV, antimalarial, insecticidal, antioxidant, chemotactic, spermicidal activities, and protease inhibiting activities. Also considered in predictions are peptide posttranslational modification, 3D structure, and microbial species-specific information. We compare important amino acids of AMPs implied from machine learning with the frequently occurring residues of the major classes of natural peptides. Finally, we discuss advances, limitations, and future directions of machine-learning predictions of antimicrobial peptides. Ultimately, we may assemble a pipeline of such predictions beyond antimicrobial activity to accelerate the discovery of novel AMP-based antimicrobials.
Affiliation(s)
- Guangshun Wang
- Department of Pathology and Microbiology, College of Medicine, University of Nebraska Medical Center, 985900 Nebraska Medical Center, Omaha, NE 68198-5900, USA (corresponding author)
| | - Iosif I. Vaisman
- School of Systems Biology, George Mason University, 10920 George Mason Circle, Manassas, VA, 20110, USA (corresponding author)
| | - Monique L. van Hoek
- School of Systems Biology, George Mason University, 10920 George Mason Circle, Manassas, VA, 20110, USA (corresponding author)
| |
|
24
|
Wani MA, Garg P, Roy KK. Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides. Med Biol Eng Comput 2021; 59:2397-2408. [PMID: 34632545 DOI: 10.1007/s11517-021-02443-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/10/2020] [Accepted: 09/14/2021] [Indexed: 10/20/2022]
Abstract
The ubiquitous antimicrobial peptides (AMPs), with a broad range of antimicrobial activities, hold great promise for combating multi-drug resistant infections. In this study, using a large and diverse set of AMPs (2638) and non-AMPs (3700), we have explored a variety of machine learning classifiers to build in silico models for AMP prediction, including Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), and ensemble learning. Among the various models generated, the RF classifier-based model performed best in both the internal [Accuracy: 91.40%, Precision: 89.37%, Sensitivity: 90.05%, and Specificity: 92.36%] and external validations [Accuracy: 89.43%, Precision: 88.92%, Sensitivity: 85.21%, and Specificity: 92.43%]. In addition, the RF classifier-based model correctly predicted the known AMPs and non-AMPs that were kept aside as an additional external validation set. The performance assessment revealed three features, viz. ChargeD2001, PAAC12 (pseudo amino acid composition), and polarity T13, that are likely to play vital roles in the antimicrobial activity of AMPs. The developed RF-based classification model may further be useful in the design and prediction of novel potential AMPs.
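The four validation metrics quoted in this abstract all derive from a binary confusion matrix. The following is a minimal sketch of those definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, sensitivity and specificity
    from confusion-matrix counts (AMP = positive class)."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),      # predicted AMPs that are AMPs
        "sensitivity": _ratio(tp, tp + fn),    # true AMPs that were recovered
        "specificity": _ratio(tn, tn + fp),    # non-AMPs correctly rejected
    }


# Hypothetical counts for illustration only (not from the paper).
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity together, as the abstract does, matters here because the AMP and non-AMP classes are not the same size.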
Affiliation(s)
- Mushtaq Ahmad Wani
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India
| | - Prabha Garg
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, 160062, Punjab, India
| | - Kuldeep K Roy
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India; Department of Pharmaceutical Sciences, School of Health Sciences, University of Petroleum and Energy Studies (UPES), P.O. Bidholi, Dehradun, 248007, Uttarakhand, India.
| |
|
25
|
Zou H, Yin Z. m7G-DPP: Identifying N7-methylguanosine sites based on dinucleotide physicochemical properties of RNA. Biophys Chem 2021; 279:106697. [PMID: 34628276 DOI: 10.1016/j.bpc.2021.106697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/18/2021] [Revised: 10/01/2021] [Accepted: 10/02/2021] [Indexed: 11/17/2022]
Abstract
N7-methylguanosine (m7G) modification is one of the most common post-transcriptional RNA modifications and plays a vital role in the regulation of gene expression. Dysfunction of m7G may result in developmental defects and the appearance of some serious diseases. Thus, fast and accurate identification of m7G sites is an urgent task. Because experimental approaches are costly and time-consuming, researchers have focused their attention on computational models. Hence, in the current study, we proposed a novel predictor called m7G-DPP to identify m7G sites. In the predictor, the RNA sequences were first encoded by physicochemical (PC) properties of dinucleotides. Then, a sliding window approach was adopted to divide the PC matrix into multiple matrices, and Pearson's correlation coefficient (PCC), dynamic time warping (DTW), and distance correlation (DC) were employed to extract classification features at each window. Next, the least absolute shrinkage and selection operator (LASSO) algorithm was applied to select discriminative features. Finally, the selected features were fed into a support vector machine to identify m7G sites. Experimental results showed that the proposed method is effective and may play a complementary role in current m7G site prediction studies. The MATLAB codes and dataset can be obtained from https://figshare.com/articles/online_resource/m7G-DPP/15000348.
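The windowing step described in this abstract can be illustrated with a short sketch. It covers only the Pearson-correlation part, and it assumes (the abstract does not say) that the PC matrix has one row per physicochemical property and one column per dinucleotide position, and that the correlation is taken between each pair of property rows inside a window. All function names and the toy matrix are hypothetical; the authors' released code is in MATLAB and may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant row has no defined correlation; return 0.0 by convention.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def sliding_windows(matrix, width, step=1):
    """Yield sub-matrices made of `width` consecutive columns."""
    n_cols = len(matrix[0])
    for start in range(0, n_cols - width + 1, step):
        yield [row[start:start + width] for row in matrix]


def window_pcc_features(matrix, width, step=1):
    """Concatenate pairwise row correlations from every window."""
    features = []
    for window in sliding_windows(matrix, width, step):
        for i in range(len(window)):
            for j in range(i + 1, len(window)):
                features.append(pearson(window[i], window[j]))
    return features


# Toy 3-property x 5-position matrix (illustrative values only).
pc_matrix = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
features = window_pcc_features(pc_matrix, width=3)
print(len(features))  # 3 windows x 3 row pairs
```

With 3 rows and a window width of 3 over 5 columns, there are 3 windows and 3 row pairs per window, so the feature vector has 9 entries. In the described pipeline, such a vector would then go through feature selection before classification.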
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, China.
| | - Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, China
| |
|
26
|
Rashidi HH, Dang LT, Albahra S, Ravindran R, Khan IH. Automated machine learning for endemic active tuberculosis prediction from multiplex serological data. Sci Rep 2021; 11:17900. [PMID: 34504228 PMCID: PMC8429671 DOI: 10.1038/s41598-021-97453-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2020] [Accepted: 08/25/2021] [Indexed: 11/09/2022] Open
Abstract
Serological diagnosis of active tuberculosis (TB) is enhanced by detection of multiple antibodies due to variable immune responses among patients. Clinical interpretation of these complex datasets requires development of suitable algorithms, a time consuming and tedious undertaking addressed by the automated machine learning platform MILO (Machine Intelligence Learning Optimizer). MILO seamlessly integrates data processing, feature selection, model training, and model validation to simultaneously generate and evaluate thousands of models. These models were then further tested for generalizability on out-of-sample secondary and tertiary datasets. Out of 31 antigens evaluated, a 23-antigen model was the most robust on both the secondary dataset (TB vs healthy) and the tertiary dataset (TB vs COPD) with sensitivity of 90.5% and respective specificities of 100.0% and 74.6%. MILO represents a user-friendly, end-to-end solution for automated generation and deployment of optimized models, ideal for applications where rapid clinical implementation is critical such as emerging infectious diseases.
Affiliation(s)
- Hooman H Rashidi
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA.
| | - Luke T Dang
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA
| | - Samer Albahra
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA
| | - Resmi Ravindran
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA
| | - Imran H Khan
- Department of Pathology and Laboratory Medicine, University of California Davis, 4400 V Street, Sacramento, CA, 95817, USA.
| |
|
27
|
Akbar S, Ahmad A, Hayat M, Rehman AU, Khan S, Ali F. iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model. Comput Biol Med 2021; 137:104778. [PMID: 34481183 DOI: 10.1016/j.compbiomed.2021.104778] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 06/13/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 11/26/2022]
Abstract
Tuberculosis (TB) is a worldwide illness caused by the bacterium Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented. The effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative due to its excellent selectivity in targeting specific cells without affecting normal cells. However, due to the rapid growth in the number of peptide samples, predicting antitubercular peptides accurately has become a challenging task. To effectively identify antitubercular peptides, an intelligent and reliable prediction model is indispensable. An ensemble learning approach was used in this study to improve expected results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physiochemical properties, and one-hot encoding. The feature vectors of the applied feature extraction methods are then combined to generate a heterogeneous vector. Finally, utilizing individual and heterogeneous vectors, five classification models of distinct nature were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the suggested model's prediction and training capabilities. Using training and independent datasets, the proposed ensemble model achieved an accuracy of 94.47% and 92.68%, respectively. It was observed that our proposed "iAtbP-Hyb-EnC" model outperformed existing predictors, with ~10% higher training accuracy. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.
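To make the idea of a heterogeneous vector concrete, the sketch below builds two of the three representations named in this abstract and concatenates them. It is an illustration under assumptions, not the authors' implementation: "k-space amino acid composition" is interpreted here as the normalized frequency of residue pairs separated by k positions, the composite physiochemical properties are omitted, and the function names, `max_len` and `k` values are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq, max_len):
    """Flattened one-hot encoding, zero-padded or truncated to max_len."""
    vector = []
    for pos in range(max_len):
        residue = seq[pos] if pos < len(seq) else None
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def k_spaced_pair_composition(seq, k):
    """Frequencies of residue pairs with k residues between them
    (assumed reading of 'k-space amino acid composition')."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


def heterogeneous_vector(seq, max_len=10, k=1):
    """Concatenate the individual encodings into one feature vector."""
    return one_hot(seq, max_len) + k_spaced_pair_composition(seq, k)


vec = heterogeneous_vector("ACDKKA")  # toy peptide, illustrative only
print(len(vec))  # 10 * 20 one-hot entries + 400 pair frequencies
```

Concatenation keeps each encoding's block at a fixed offset, so a downstream classifier or feature selector can still be traced back to the representation a given feature came from.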
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Ashfaq Ahmad
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Ateeq Ur Rehman
- Department of Information Technology, The University of Haripur, KP, Pakistan.
| | - Salman Khan
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
| |
|
28
|
ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations. Comput Biol Chem 2021; 93:107537. [PMID: 34217007 DOI: 10.1016/j.compbiolchem.2021.107537] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/14/2020] [Revised: 05/09/2021] [Accepted: 06/26/2021] [Indexed: 01/08/2023]
Abstract
MOTIVATION: Primary and secondary active transport are two types of active transport that use energy to move substances. Active transport mechanisms use proteins to assist in transport and play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are classified from transmembrane transport proteins. We propose a Support Vector Machine (SVM) with contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a powerful model in transfer learning, a deep learning language representation model developed by Google and one of the highest performing pre-trained models for Natural Language Processing (NLP) tasks. The idea of transfer learning with a pre-trained BERT model is applied to extract fixed feature vectors from the hidden layers and learn contextual relations between amino acids in the protein sequence. Therefore, the contextualized word representations of proteins are introduced to effectively model complex structures of amino acids in the sequence and the variations of these amino acids in the context. By generating context information, we capture multiple meanings for the same amino acid to reveal the importance of specific residues in the protein sequence. RESULTS: The performance of the proposed method is evaluated using five-fold cross-validation and an independent test. The proposed method achieves an accuracy of 85.44%, 88.74% and 92.84% for Class-1, Class-2, and Class-3, respectively. Experimental results show that this approach can outperform other feature extraction methods by using context information, effectively classify two types of active transport and improve the overall performance.
|
29
|
Wan Y, Wang Z, Lee TY. Incorporating support vector machine with sequential minimal optimization to identify anticancer peptides. BMC Bioinformatics 2021; 22:286. [PMID: 34051755 PMCID: PMC8164238 DOI: 10.1186/s12859-021-03965-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/20/2020] [Accepted: 01/08/2021] [Indexed: 12/09/2022] Open
Abstract
BACKGROUND: Cancer is one of the major causes of death worldwide. To treat cancer, the use of anticancer peptides (ACPs) has attracted increased attention in recent years. ACPs are a unique group of small molecules that can target and kill cancer cells quickly and directly. However, identifying ACPs by wet-lab experiments is time-consuming and labor-intensive. Therefore, it is important to develop computational tools for ACP prediction. Though some ACP prediction tools have been developed recently, their performance is not good enough and most of them do not offer a function to distinguish ACPs from antimicrobial peptides (AMPs). Considering that a growing number of studies have shown that some AMPs exhibit anticancer function, this work tries to build a model for distinguishing AMPs from ACPs in addition to a model that predicts ACPs from whole peptides. RESULTS: This study chooses amino acid composition, N5C5, k-space, and position-specific scoring matrix (PSSM) as features, and analyzes them by machine learning methods, including support vector machine (SVM) and sequential minimal optimization (SMO), to build a model (model 2) for distinguishing ACPs from whole peptides. Another model (model 1) that distinguishes ACPs from AMPs is also developed. Compared to previous models, models developed in this research show better performance (accuracy: 85.5% for model 1 and 95.2% for model 2). CONCLUSIONS: This work utilizes a new feature, PSSM, which contributes to better performance than other features. In addition to SVM, SMO is used in this research for optimizing SVM, and the SMO-optimized models show better performance than non-optimized models. Last but not least, this work provides two different functions: distinguishing ACPs from AMPs and distinguishing ACPs from all peptides. The second SMO-optimized model, which utilizes PSSM as a feature, performs better than all other existing tools.
Affiliation(s)
- Yu Wan
- School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, People's Republic of China
| | - Zhuo Wang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, People's Republic of China
| | - Tzong-Yi Lee
- School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, People's Republic of China.
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, People's Republic of China.
| |
|
30
|
Shah SMA, Taju SW, Dlamini BB, Ou YY. DeepSIRT: A deep neural network for identification of sirtuin targets and their subcellular localizations. Comput Biol Chem 2021; 93:107514. [PMID: 34058657 DOI: 10.1016/j.compbiolchem.2021.107514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Accepted: 05/12/2021] [Indexed: 09/30/2022]
Abstract
Sirtuins are a family of proteins that play a key role in regulating a wide range of cellular processes including DNA regulation, metabolism, aging/longevity, cell survival, apoptosis, and stress resistance. Sirtuins are protein deacetylases and belong to the class III family of histone deacetylase enzymes (HDACs). The class III HDACs contain seven members of the sirtuin family, from SIRT1 to SIRT7. The seven members of the sirtuin family have various substrates and are present in nearly all subcellular localizations including the nucleus, cytoplasm, and mitochondria. In this study, a deep neural network approach using one-dimensional Convolutional Neural Networks (CNN) was proposed to build a prediction model that can accurately identify the outcome of the sirtuin protein by targeting their subcellular localizations. Therefore, the function and localization of sirtuin targets were analyzed and annotated to compartmentalize them into distinct subcellular localizations. We further reduced the sequence similarity between protein sequences, and three feature extraction methods were applied to the datasets. Finally, the proposed method was tested and compared with various machine-learning algorithms. The proposed method was validated on two independent datasets and showed an average of up to 85.77% sensitivity, 97.32% specificity, and 0.82 MCC for seven members of the sirtuin family of proteins.
Affiliation(s)
- Syed Muazzam Ali Shah
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
| | - Semmy Wellem Taju
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
| | - Bongani Brian Dlamini
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
| | - Yu-Yen Ou
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan.
| |
|
31
|
Maryam L, Usmani SS, Raghava GPS. Computational resources in the management of antibiotic resistance: Speeding up drug discovery. Drug Discov Today 2021; 26:2138-2151. [PMID: 33892146 DOI: 10.1016/j.drudis.2021.04.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/17/2020] [Revised: 12/24/2020] [Accepted: 04/12/2021] [Indexed: 01/19/2023]
Abstract
This article reviews more than 50 computational resources developed in the past two decades for forecasting antibiotic resistance (AR)-associated mutations, genes and genomes. More than 30 databases have been developed for AR-associated information, but only a fraction of them are updated regularly. A large number of methods have been developed to find AR genes, mutations and genomes, with most of them based on similarity-search tools such as BLAST and HMMER. In addition, methods have been developed to predict the inhibition potential of antibiotics against a bacterial strain from the whole-genome data of bacteria. This review also discusses computational resources that can be used to manage the treatment of AR-associated diseases.
Affiliation(s)
- Lubna Maryam
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi 110020, India
| | - Salman Sadullah Usmani
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi 110020, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi 110020, India.
| |
|
32
|
Winkler DA. Use of Artificial Intelligence and Machine Learning for Discovery of Drugs for Neglected Tropical Diseases. Front Chem 2021; 9:614073. [PMID: 33791277 PMCID: PMC8005575 DOI: 10.3389/fchem.2021.614073] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/05/2020] [Accepted: 01/18/2021] [Indexed: 12/11/2022] Open
Abstract
Neglected tropical diseases continue to create high levels of morbidity and mortality in a sizeable fraction of the world’s population, despite ongoing research into new treatments. Some of the most important technological developments that have accelerated drug discovery for diseases of affluent countries have not flowed down to neglected tropical disease drug discovery. Pharmaceutical development business models, cost of developing new drug treatments and subsequent costs to patients, and accessibility of technologies to scientists in most of the affected countries are some of the reasons for this low uptake and slow development relative to that for common diseases in developed countries. Computational methods are starting to make significant inroads into discovery of drugs for neglected tropical diseases due to the increasing availability of large databases that can be used to train ML models, increasing accuracy of these methods, lower entry barrier for researchers, and widespread availability of public domain machine learning codes. Here, the application of artificial intelligence, largely the subset called machine learning, to modelling and prediction of biological activities and discovery of new drugs for neglected tropical diseases is summarized. The pathways for the development of machine learning methods in the short to medium term and the use of other artificial intelligence methods for drug discovery are discussed. The current roadblocks to, and likely impacts of, synergistic new technological developments on the use of ML methods for neglected tropical disease drug discovery in the future are also discussed.
Affiliation(s)
- David A Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia; Latrobe Institute for Molecular Science, La Trobe University, Bundoora, VIC, Australia; School of Pharmacy, University of Nottingham, Nottingham, United Kingdom; CSIRO Data61, Pullenvale, QLD, Australia
| |
|
33
|
Ali Shah SM, Taju SW, Ho QT, Nguyen TTD, Ou YY. GT-Finder: Classify the family of glucose transporters with pre-trained BERT language models. Comput Biol Med 2021; 131:104259. [PMID: 33581474 DOI: 10.1016/j.compbiomed.2021.104259] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/04/2020] [Revised: 02/04/2021] [Accepted: 02/04/2021] [Indexed: 12/14/2022]
Abstract
Recently, language representation models have drawn a lot of attention in the field of natural language processing (NLP) due to their remarkable results. Among them, BERT (Bidirectional Encoder Representations from Transformers) has proven to be a simple, yet powerful language model that has achieved novel state-of-the-art performance. BERT adopted the concept of contextualized word embeddings to capture the semantics and context in which words appear. We utilized pre-trained BERT models to extract features from protein sequences for discriminating three families of glucose transporters: the major facilitator superfamily of glucose transporters (GLUTs), the sodium-glucose linked transporters (SGLTs), and the sugars will eventually be exported transporters (SWEETs). We treated protein sequences as sentences and transformed them into fixed-length meaningful vectors where a 768- or 1024-dimensional vector represents each amino acid. We observed that BERT-Base and BERT-Large models improved the performance by more than 4% in terms of average sensitivity and Matthews correlation coefficient (MCC), indicating the efficiency of this approach. We also developed a bidirectional transformer-based protein model (TransportersBERT) for comparison with existing pre-trained BERT models.
Affiliation(s)
- Syed Muazzam Ali Shah
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
| | - Semmy Wellem Taju
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
| | - Quang-Thai Ho
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
| | | | - Yu-Yen Ou
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan.
| |
|
34
|
Pirtskhalava M, Amstrong AA, Grigolava M, Chubinidze M, Alimbarashvili E, Vishnepolsky B, Gabrielian A, Rosenthal A, Hurt DE, Tartakovsky M. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res 2021; 49:D288-D297. [PMID: 33151284 PMCID: PMC7778994 DOI: 10.1093/nar/gkaa991] [Citation(s) in RCA: 232] [Impact Index Per Article: 77.3] [Received: 09/11/2020] [Revised: 10/09/2020] [Accepted: 10/14/2020] [Indexed: 12/30/2022] Open
Abstract
The Database of Antimicrobial Activity and Structure of Peptides (DBAASP) is an open-access, comprehensive database containing information on amino acid sequences, chemical modifications, 3D structures, bioactivities and toxicities of peptides that possess antimicrobial properties. DBAASP is updated continuously, and at present, version 3.0 (DBAASP v3) contains >15 700 entries (8000 more than the previous version), including >14 500 monomers and nearly 400 homo- and hetero-multimers. Of the monomeric antimicrobial peptides (AMPs), >12 000 are synthetic, about 2700 are ribosomally synthesized, and about 170 are non-ribosomally synthesized. Approximately 3/4 of the entries were added after the initial release of the database in 2014 reflecting the recent sharp increase in interest in AMPs. Despite the increased interest, adoption of peptide antimicrobials in clinical practice is still limited as a consequence of several factors including side effects, problems with bioavailability and high production costs. To assist in developing and optimizing de novo peptides with desired biological activities, DBAASP offers several tools including a sophisticated multifactor analysis of relevant physicochemical properties. Furthermore, DBAASP has implemented a structure modelling pipeline that automates the setup, execution and upload of molecular dynamics (MD) simulations of database peptides. At present, >3200 peptides have been populated with MD trajectories and related analyses that are both viewable within the web browser and available for download. More than 400 DBAASP entries also have links to experimentally determined structures in the Protein Data Bank. DBAASP v3 is freely accessible at http://dbaasp.org.
Affiliation(s)
- Malak Pirtskhalava
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
- Anthony A Amstrong
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Maia Grigolava
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
- Mindia Chubinidze
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
- Boris Vishnepolsky
- Ivane Beritashvili Center of Experimental Biomedicine, Tbilisi 0160, Georgia
- Andrei Gabrielian
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Alex Rosenthal
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Darrell E Hurt
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Michael Tartakovsky
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
35
Sharma N, Patiyal S, Dhall A, Pande A, Arora C, Raghava GPS. AlgPred 2.0: an improved method for predicting allergenic proteins and mapping of IgE epitopes. Brief Bioinform 2020; 22:5985292. [PMID: 33201237 DOI: 10.1093/bib/bbaa294] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 06/21/2020] [Revised: 10/02/2020] [Accepted: 10/05/2020] [Indexed: 12/22/2022]
Abstract
AlgPred 2.0 is a web server developed for predicting allergenic proteins and allergenic regions in a protein. It is an updated version of AlgPred, which was developed in 2006. The dataset used for training, testing and validation consists of 10 075 allergens and 10 075 non-allergens. In addition, 10 451 experimentally validated immunoglobulin E (IgE) epitopes were used to identify antigenic regions in a protein. All models were trained on 80% of the data (the training dataset), and their performance was evaluated using 5-fold cross-validation. The final model trained on the training dataset was then evaluated on the remaining 20% of the data (the validation dataset); no two proteins in any two sets have more than 40% similarity. First, a Basic Local Alignment Search Tool (BLAST) search was performed against the dataset, and allergens were predicted based on the level of similarity with known allergens. Second, IgE epitopes obtained from the IEDB database were searched in the dataset to predict allergens based on their presence in a protein. Third, motif-based approaches such as Multiple EM for Motif Elicitation (MEME) and the Motif Alignment and Search Tool (MAST) were used to predict allergens. Fourth, allergen prediction models were developed using a wide range of machine learning techniques. Finally, an ensemble approach was used to predict allergenic proteins by combining the prediction scores of the different approaches. Our best model achieved an area under the receiver operating characteristic curve of 0.98 and a Matthews correlation coefficient of 0.85 on the validation dataset. The AlgPred 2.0 web server allows the prediction of allergens, mapping of IgE epitopes, motif search and BLAST search (https://webs.iiitd.edu.in/raghava/algpred2/).
Affiliation(s)
- Neelam Sharma
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Akshara Pande
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
36
Enhanced prediction of anti-tubercular peptides from sequence information using divergence measure-based intuitionistic fuzzy-rough feature selection. Soft Comput 2020. [DOI: 10.1007/s00500-020-05363-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/23/2022]
37
Barigye SJ, Gómez-Ganau S, Serrano-Candelas E, Gozalbes R. PeptiDesCalculator: Software for computation of peptide descriptors. Definition, implementation and case studies for 9 bioactivity endpoints. Proteins 2020; 89:174-184. [PMID: 32881068 DOI: 10.1002/prot.26003] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/12/2020] [Revised: 08/05/2020] [Accepted: 08/27/2020] [Indexed: 11/09/2022]
Abstract
We present a novel Java-based program, PeptiDesCalculator, for computing peptide descriptors. These descriptors include redefinitions of known protein parameters to suit the peptide domain, generalization schemes for the global description of peptide characteristics, and empirical descriptors based on experimental evidence on peptide stability and interaction propensity. The PeptiDesCalculator software provides a user-friendly graphical user interface (GUI) and is parallelized to maximize the use of the computational resources available in current workstations. The PeptiDesCalculator indices are employed in modeling 8 peptide bioactivity endpoints, demonstrating satisfactory behavior. Moreover, we compare the performance of a support vector machine (SVM) classifier built using 15 PeptiDesCalculator indices with that of a recently reported deep neural network (DNN) antimicrobial activity classifier, demonstrating comparable test set performance notwithstanding the remarkably lower degrees of freedom of the former. This software will facilitate the development of in silico models for the prediction of peptide properties.
Affiliation(s)
- Stephen J Barigye
- ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, Valencia, Spain; MolDrug AI Systems SL, Valencia, Spain
- Sergi Gómez-Ganau
- ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, Valencia, Spain; Eurofins Agroscience Services Regulatory Spain SL, Valencia, Spain
- Eva Serrano-Candelas
- ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, Valencia, Spain
- Rafael Gozalbes
- ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, Valencia, Spain; MolDrug AI Systems SL, Valencia, Spain
38
Chen W, Feng P, Nie F. iATP: A Sequence Based Method for Identifying Anti-tubercular Peptides. Med Chem 2020; 16:620-625. [DOI: 10.2174/1573406415666191002152441] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 04/09/2019] [Revised: 05/15/2019] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
Background:
Tuberculosis is one of the biggest threats to human health. Recent studies have demonstrated that anti-tubercular peptides are promising candidates for the discovery of new anti-tubercular drugs. Since experimental methods are still labor intensive, accurate and fast computational methods are needed to identify anti-tubercular peptides among the huge number of natural and synthetic peptides.
Methods and Results:
In this study, a support vector machine based method was proposed to identify anti-tubercular peptides, in which the peptides were encoded using the optimal g-gap dipeptide compositions. Comparative results demonstrated that our method outperforms existing methods on the same benchmark dataset. For the convenience of the scientific community, a freely accessible web server was built, which is available at http://lin-group.cn/server/iATP.
Conclusion:
It is anticipated that the proposed method will become a useful tool for identifying anti-tubercular peptides.
Affiliation(s)
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
- Pengmian Feng
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
- Fulei Nie
- Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
39
Manavalan B, Basith S, Shin TH, Wei L, Lee G. mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation. Bioinformatics 2020; 35:2757-2765. [PMID: 30590410 DOI: 10.1093/bioinformatics/bty1047] [Citation(s) in RCA: 165] [Impact Index Per Article: 41.3] [Received: 11/02/2018] [Revised: 12/05/2018] [Accepted: 12/20/2018] [Indexed: 11/13/2022]
Abstract
Motivation:
Cardiovascular disease is the primary cause of death globally, accounting for approximately 17.7 million deaths per year. One of the risk factors linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there has been no comprehensive analysis and assessment of diverse features, nor an implementation of various machine-learning (ML) algorithms, for antihypertensive peptide (AHTP) model construction.
Results:
In this study, we utilized six different ML algorithms, namely AdaBoost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM), using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. Because ERT-based models performed consistently better than the other algorithms regardless of the feature descriptor, we treated them as baseline predictors, whose predicted probabilities of AHTPs were then used as input features separately for four different ML algorithms (ERT, GB, RF and SVM) to develop the corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of the four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance, with an overall improvement of approximately 6-7% on both the benchmarking and independent datasets.
Availability and Implementation:
The user-friendly online prediction tool, mAHTPred, is freely accessible at http://thegleelab.org/mAHTPred.
Supplementary Information:
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
- Leyi Wei
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
40
Jamal S, Khubaib M, Gangwar R, Grover S, Grover A, Hasnain SE. Artificial Intelligence and Machine learning based prediction of resistant and susceptible mutations in Mycobacterium tuberculosis. Sci Rep 2020; 10:5487. [PMID: 32218465 PMCID: PMC7099008 DOI: 10.1038/s41598-020-62368-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/06/2019] [Accepted: 03/13/2020] [Indexed: 11/09/2022]
Abstract
Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis (M.tb), causes the highest number of deaths globally of any bacterial disease, necessitating novel diagnosis and treatment strategies. High-throughput sequencing methods generate a large amount of data that could be exploited to determine mutations associated with multi-drug resistant TB (MDR-TB). The present work is a computational framework that uses artificial intelligence (AI) based machine learning (ML) approaches for predicting resistance in the genes rpoB, inhA, katG, pncA, gyrA and gyrB for the drugs rifampicin, isoniazid, pyrazinamide and fluoroquinolones. The single nucleotide variations were represented by several sequence and structural features that indicate the influence of mutations on the target protein coded by each gene. We used four ML algorithms (naïve Bayes, k-nearest neighbor, support vector machine, and artificial neural network) to build the prediction models. The classification models had an average accuracy of 85% across all examined genes and were evaluated on an external unseen dataset to demonstrate their application. Further, molecular docking and molecular dynamics simulations were performed for complexes of anti-TB drugs with the wild-type and the predicted resistance-causing mutant proteins, to study the impact of the mutations on protein conformation and confirm the observed phenotype.
Affiliation(s)
- Salma Jamal
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Mohd Khubaib
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Rishabh Gangwar
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Sonam Grover
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110 067, India
- Seyed E Hasnain
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India; Dr. Reddy's Institute of Life Sciences, University of Hyderabad Campus, Professor C.R. Rao Road, Hyderabad, 500046, India
41
Kaur D, Arora C, Raghava GPS. A Hybrid Model for Predicting Pattern Recognition Receptors Using Evolutionary Information. Front Immunol 2020; 11:71. [PMID: 32082326 PMCID: PMC7002473 DOI: 10.3389/fimmu.2020.00071] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/18/2019] [Accepted: 01/13/2020] [Indexed: 12/17/2022]
Abstract
This study describes a method developed for predicting pattern recognition receptors (PRRs), which are an integral part of the immune system. The models developed here were trained and evaluated on the largest possible set of non-redundant PRRs, obtained from PRRDB 2.0, and non-pattern recognition receptors (Non-PRRs), obtained from Swiss-Prot. Firstly, a similarity-based approach using BLAST was used to predict PRRs, with limited success owing to a large number of no-hits. Secondly, machine learning-based models developed using sequence composition achieved a maximum MCC of 0.63. In addition, models developed using evolutionary information in the form of PSSM composition achieved a maximum MCC of 0.66. Finally, we developed hybrid models that combined a similarity-based approach using BLAST with machine learning-based models. Our best model, which combined BLAST with the PSSM-based model, achieved a maximum MCC of 0.82 with an AUROC of 0.95, exploiting the potential of both similarity-based search and machine learning techniques. To serve the scientific community, we also developed a web server, "PRRpred", based on the best model developed in this study (http://webs.iiitd.edu.in/raghava/prrpred/).
Affiliation(s)
- Dilraj Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
42
Basith S, Manavalan B, Hwan Shin T, Lee G. Machine intelligence in peptide therapeutics: A next‐generation tool for rapid disease screening. Med Res Rev 2020; 40:1276-1314. [DOI: 10.1002/med.21658] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Received: 10/18/2019] [Revised: 11/26/2019] [Accepted: 12/16/2019] [Indexed: 12/12/2022]
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
43
Kaur D, Patiyal S, Sharma N, Usmani SS, Raghava GPS. PRRDB 2.0: a comprehensive database of pattern-recognition receptors and their ligands. Database (Oxford) 2020; 2019:5523871. [PMID: 31250014 PMCID: PMC6597477 DOI: 10.1093/database/baz076] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 01/30/2019] [Revised: 05/21/2019] [Accepted: 05/22/2019] [Indexed: 12/19/2022]
Abstract
PRRDB 2.0 is an updated version of PRRDB that maintains comprehensive information about pattern-recognition receptors (PRRs) and their ligands. The current version of the database has ~2700 entries, nearly five times as many as the previous version. It contains extensive information about 467 unique PRRs and 827 pathogen-associated molecular patterns (PAMPs), manually extracted from ~600 research articles and from public databases. Each entry provides comprehensive details about PRRs and PAMPs, including their name, sequence, origin, source, type, etc. We have provided internal and external links to various databases and resources (such as Swiss-Prot and PubChem) for further information about PRRs and their ligands. The database also provides links to ~4500 experimentally determined structures of various PRRs and their complexes in the Protein Data Bank. In addition, the structures of 110 PRRs with unknown structures have been predicted, which is important for understanding the structure-function relationship between receptors and their ligands. Numerous web-based tools have been integrated into PRRDB 2.0 to help users perform tasks such as (i) extensive searching of the database; (ii) browsing or categorization of data based on receptors, ligands, source, etc.; and (iii) similarity search using BLAST and the Smith-Waterman algorithm.
Affiliation(s)
- Dilraj Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India
- Neelam Sharma
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India
- Salman Sadullah Usmani
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, Chandigarh 160036, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India
44
Khatun S, Hasan M, Kurata H. Efficient computational model for identification of antitubercular peptides by integrating amino acid patterns and properties. FEBS Lett 2019; 593:3029-3039. [PMID: 31297788 DOI: 10.1002/1873-3468.13536] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 05/21/2019] [Revised: 06/25/2019] [Accepted: 07/05/2019] [Indexed: 12/30/2022]
Abstract
Tuberculosis (TB) is a leading killer caused by Mycobacterium tuberculosis. Recently, anti-TB peptides have provided an alternative approach to combat antibiotic tolerance. We have developed an effective computational predictor, identification of antitubercular peptides (iAntiTB), by the integration of multiple feature vectors deriving from the amino acid sequences via random forest (RF) and support vector machine (SVM) classifiers. The iAntiTB combines the RF and SVM scores via linear regression to enhance the prediction accuracy. To make a robust and accurate predictor, we prepared the two datasets with different types of negative samples. The iAntiTB achieved area under the ROC curve values of 0.896 and 0.946 on the training datasets of the first and second datasets, respectively. The iAntiTB outperformed the other existing predictors.
Affiliation(s)
- Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
- Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
45
AtbPpred: A Robust Sequence-Based Prediction of Anti-Tubercular Peptides Using Extremely Randomized Trees. Comput Struct Biotechnol J 2019; 17:972-981. [PMID: 31372196 PMCID: PMC6658830 DOI: 10.1016/j.csbj.2019.06.024] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 02/21/2019] [Revised: 06/27/2019] [Accepted: 06/28/2019] [Indexed: 01/01/2023]
Abstract
Mycobacterium tuberculosis is one of the most dangerous pathogens in humans. It acts as an etiological agent of tuberculosis (TB), infecting almost one-third of the world's population. Owing to the high incidence of multidrug-resistant TB and extensively drug-resistant TB, there is an urgent need for novel and effective alternative therapies. Peptide-based therapy has several advantages, such as diverse mechanisms of action, low immunogenicity, and selective affinity to bacterial cell envelopes. However, the identification of anti-tubercular peptides (AtbPs) via experimentation is laborious and expensive; hence, the development of an efficient computational method is necessary for the prediction of AtbPs prior to both in vitro and in vivo experiments. To this end, we developed a two-layer machine learning (ML)-based predictor called AtbPpred for the identification of AtbPs. In the first layer, we applied a two-step feature selection procedure and identified the optimal feature set individually for nine different feature encodings, whose corresponding models were developed using extremely randomized tree (ERT). In the second layer, the predicted probabilities of AtbPs from these nine models were used as input features to ERT to develop the final predictor. AtbPpred achieved average accuracies of 88.3% and 87.3% during cross-validation and independent evaluation, respectively, which were ~8.7% and 10.0% higher than those of the state-of-the-art method. Furthermore, we established a user-friendly webserver, which is currently available at http://thegleelab.org/AtbPpred. We anticipate that this predictor could be useful in the high-throughput prediction of AtbPs and also provide mechanistic insights into their functions.
We developed a novel computational framework for the identification of anti-tubercular peptides using extremely randomized tree. AtbPpred displayed superior performance compared to the existing method on both benchmark and independent datasets. We constructed a user-friendly web server that implements the proposed AtbPpred method.
46
Manavalan B, Basith S, Shin TH, Wei L, Lee G. Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation. Mol Ther Nucleic Acids 2019; 16:733-744. [PMID: 31146255 PMCID: PMC6540332 DOI: 10.1016/j.omtn.2019.04.019] [Citation(s) in RCA: 164] [Impact Index Per Article: 32.8] [Received: 12/10/2018] [Revised: 04/16/2019] [Accepted: 04/22/2019] [Indexed: 11/19/2022]
Abstract
DNA N4-methylcytosine (4mC) is an important genetic modification and plays crucial roles in differentiation between self and non-self DNA and in controlling DNA replication, cell cycle, and gene-expression levels. Accurate 4mC site identification is fundamental to improving the understanding of 4mC biological functions and mechanisms. Hence, it is necessary to develop in silico approaches for efficient and high-throughput 4mC site identification. Although some bioinformatic tools have been developed in this regard, their prediction accuracy and generalizability require improvement to optimize their usability in practical applications. For this purpose, we propose Meta-4mCpred, a meta-predictor for 4mC site prediction. In Meta-4mCpred, we employed a feature representation learning scheme and generated 56 probabilistic features based on four different machine-learning algorithms and seven feature encodings covering diverse sequence information, including compositional, physicochemical, and position-specific information. Subsequently, the probabilistic features were used as input to a support vector machine to develop the final meta-predictor. To the best of our knowledge, this is the first meta-predictor for 4mC site prediction. Cross-validation results show that Meta-4mCpred achieved an overall average accuracy of 84.2% across six different species, which is ∼2%–4% higher than that attainable using the state-of-the-art predictors. Furthermore, Meta-4mCpred achieved an overall average accuracy of 86% in the independent dataset evaluation, which is over 4% higher than that yielded by the state-of-the-art predictors. The user-friendly webserver that implements the proposed Meta-4mCpred is freely accessible at http://thegleelab.org/Meta-4mCpred.
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
- Leyi Wei
- School of Computer Science and Technology, Tianjin University, China
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
47
Spänig S, Heider D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min 2019; 12:7. [PMID: 30867681 PMCID: PMC6399931 DOI: 10.1186/s13040-019-0196-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 12/21/2018] [Accepted: 02/24/2019] [Indexed: 01/10/2023]
Abstract
Antimicrobial peptides (AMPs) are part of the innate immune system. In fact, they occur in almost all organisms, including plants, animals, and humans. Remarkably, they are also effective against multi-resistant pathogens, with high selectivity. This is especially crucial at a time when society faces the major threat of an ever-increasing number of antibiotic-resistant microbes. In addition, AMPs can also exhibit antitumor and antiviral effects; thus, a variety of scientific studies have dealt with the prediction of active peptides in recent years. Due to their potential, even the pharmaceutical industry is keen on discovering and developing novel AMPs. However, AMPs are difficult to verify in vitro, hence researchers conduct sequence similarity experiments against known, active peptides. Unfortunately, this approach is very time-consuming and limits potential candidates to sequences with a high similarity to known AMPs. Machine learning methods offer the opportunity to explore the huge space of sequence variations in a timely manner. These algorithms have, in principle, paved the way for an automated discovery of AMPs. However, machine learning models require a numerical input, so an informative encoding is very important. Unfortunately, developing an appropriate encoding is a major challenge, which has not been entirely solved so far. For this reason, the development of novel amino acid encodings has become established as a stand-alone research branch. The present review introduces state-of-the-art encodings of amino acids as well as their properties in sequence and structure based aggregation. Moreover, although a well-chosen encoding is essential, performant classifiers are also required, which is reflected by a tendency towards specifically designed models in the literature. Furthermore, we introduce these models with a particular focus on encodings derived from support vector machines and deep learning approaches.
Although a strong focus has been set on AMP predictions, not all of the mentioned encodings have been elaborated as part of antimicrobial research studies, but rather as general protein or peptide representations.
Affiliation(s)
- Sebastian Spänig
- Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
- Dominik Heider
- Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
48
Usmani SS, Agrawal P, Sehgal M, Patel PK, Raghava GPS. ImmunoSPdb: an archive of immunosuppressive peptides. Database (Oxford) 2019; 2019:5309009. [PMID: 30753476 PMCID: PMC6367516 DOI: 10.1093/database/baz012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2018] [Accepted: 01/15/2019] [Indexed: 11/12/2022]
Abstract
Immunosuppression has proved to be an attractive therapy for several autoimmune disorders and asthma, as well as in organ transplantation. Immunosuppressive peptides specifically reduce the efficacy of the immune system and have a wide range of therapeutic applications. 'ImmunoSPdb' is a comprehensive, manually curated database of around 500 experimentally verified immunosuppressive peptides compiled from 79 research articles and 32 patents. The current version comprises 553 entries providing extensive information, including peptide name, sequence, chirality, chemical modification, origin, nature of the peptide, its target and mechanism of action, amino acid frequency and composition, etc. Data analysis revealed that most of the immunosuppressive peptides are linear (91%), are short, i.e. up to 20 amino acids in length (62%), and consist of L-form amino acids (81%). About 30% of the peptides are either chemically modified or have a terminal modification. Most of the peptides are either derived from proteins (41%) or occur naturally (27%). Blockage of the potassium ion channel (24%) is a major target of immunosuppressive peptides. In addition, we have annotated tertiary structure using PEPstrMOD and I-TASSER. Many user-friendly, web-based tools have been integrated to facilitate searching, browsing and analyzing the data. We have developed a user-friendly, responsive website to assist a wide range of users.
Affiliation(s)
- Salman Sadullah Usmani
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Piyush Agrawal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Manika Sehgal
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Pradeep Kumar Patel
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India